Microsoft launches ASSERT, an AI behavior testing tool

BTC

$63,734.01

-3.09%

NewsNewsDetails

TechCrunch

06-03 03:06

Ai Focus

Microsoft has released the open-source framework ASSERT for behavioral testing and regression evaluation of AI applications.

Helpful

No.Help

Microsoft released the open-source framework ASSERT on Tuesday, aiming to simplify AI behavior testing. This tool is designed for application-level scenarios, helping developers check whether models or agents are functioning as required by the product.

Natural Language Direct Generation Test

ASSERT stands for Adaptive Spec-driven Scoring for Evaluation and Regression Testing. Microsoft states that developers simply need to write down goals, policies, or expected behaviors in natural language, and the system will convert these descriptions into scoreable test cases.

The tool first breaks down the rules into acceptable and unacceptable behaviors, then generates scenarios and test tasks, and finally runs tests on the target system and scores them. It also records the model's execution path, including intermediate actions and tool calls, making it easier for developers to pinpoint failure points.

More suitable for application-level AI scenarios

Microsoft states that ASSERT is suitable for systems whose behavior is constrained by product context, policies, and tools. Compared to general assessments, it places greater emphasis on whether specific business rules are followed.

For example, if a document research agent is required not to send emails outside the company, to disclose confidential information only to senior executives, and to provide concise summaries, ASSERT can continuously generate tests around these requirements.

It can be used for development, post-deployment, and monitoring.

Sarah Bird, Microsoft's chief product officer in charge of Responsible AI, said that evaluation is crucial for making the right decisions. She stated that without understanding how an AI system behaves, it's difficult to determine whether it meets organizational requirements.

Bird also stated that ASSERT can be used during model development, post-deployment, and ongoing monitoring. Microsoft said this release also reflects the AI industry's increasing emphasis on repeatability and regression testing.

Evaluation programs such as Stanford's HELM, MLCommons' AILuminate, and METR are also pushing for similar standards for testing model behavior.

Follow CoinMeta official accounts to stay updated

@CoinMeta_Labs

Tip

Save

CoinMeta reminds readers to view blockchain rationally, stay aware of risks, and beware of virtual token issuance and speculation. All content on this site represents market information or related viewpoints only and does not constitute any form of investment advice. If you find sensitive content, please click“Report”，and we will handle it promptly。

Submit

Comment 0

Hot

Latest

No comments yet. Be the first!

Revolut launches limited testing in India

Revolut has begun a limited rollout in India, with initial users able to use UPI payments, e-wallets, and multi-currency cards.

TechCrunch

·2026-06-01 23:06:31

807

Nvidia launches RTX Spark, joining forces with Microsoft, Dell, and HP to develop AI PCs.

NVIDIA launched RTX Spark, partnering with Microsoft and several PC manufacturers to advance Windows AI PCs capable of running AI agents locally, targeting a new CPU market.

TechCrunch

·2026-06-02 05:44:56

158

Solana launches native on-chain subscription payment tool

Solana has launched a native on-chain subscription payment infrastructure that supports recurring billing, delegated budgeting, and payroll automation, and is expanding into AI agent and stablecoin payment scenarios.

SolanaFloor

·2026-06-04 05:07:03

246

Microsoft releases enterprise AI agent Scout

Microsoft launched Scout, an enterprise AI agent at Build 2026, which is based on OpenClaw and designed for Microsoft 365 office scenarios.

Coinpaper

·2026-06-03 04:55:46

899

Microsoft released seven AI models, claiming that some outperformed Claude and Google.

Microsoft released seven MAI series AI models, claiming that some test results surpassed those of Anthropic and Google products, indicating that it is accelerating the development of its own cutting-edge models.

Coinpaper

·2026-06-03 06:45:42

209

Latest from Author

Refresh

Apple approves Poke integration with enterprise messaging platform

17m ago 172peopleViews

Helion raises $465 million to advance Microsoft's fusion power project.

37m ago 218peopleViews

Meta's oversight committee criticized the lack of transparency in the account banning process.

2h ago 366peopleViews

Recommended