Microsoft launches ASSERT, an AI behavior testing tool
TechCrunch
06-03 03:06
Microsoft has released the open-source framework ASSERT for behavioral testing and regression evaluation of AI applications.

Microsoft released the open-source framework ASSERT on Tuesday, aiming to simplify AI behavior testing. The tool targets application-level scenarios, helping developers check whether models or agents behave as their product requires.

Tests generated directly from natural language

ASSERT stands for Adaptive Spec-driven Scoring for Evaluation and Regression Testing. According to Microsoft, developers only need to write down goals, policies, or expected behaviors in natural language, and the system converts these descriptions into scoreable test cases.

The tool first breaks the rules down into acceptable and unacceptable behaviors, then generates scenarios and test tasks, and finally runs the tests against the target system and scores the results. It also records the model's execution path, including intermediate actions and tool calls, so developers can pinpoint where a failure occurred.
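The article does not show ASSERT's interfaces, so what follows is only a minimal Python sketch of the final stages described above: running test tasks against a system, keeping the recorded execution path, and scoring the results while noting which step failed. The names `TestCase`, `run_suite`, `toy_agent`, and `first_external_email`, the trace format, and the `corp.example` domain are all hypothetical, and the test case is written by hand rather than generated from a natural-language policy as ASSERT reportedly does.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical structures for illustration; these are not ASSERT's APIs.
Trace = List[dict]  # one dict per recorded step (actions, tool calls, ...)

@dataclass
class TestCase:
    task: str
    # Returns the index of the first step that violates the expected
    # behavior, or None when the whole trace is acceptable.
    find_violation: Callable[[Trace], Optional[int]]

def run_suite(agent: Callable[[str], Trace], cases: List[TestCase]):
    """Run every task, keep its trace, and return (pass_rate, failures)."""
    failures = []
    for case in cases:
        trace = agent(case.task)  # full execution path of the system under test
        bad = case.find_violation(trace)
        if bad is not None:
            # Keep the failing step so it can be inspected later.
            failures.append((case.task, bad, trace[bad]))
    pass_rate = 1 - len(failures) / len(cases) if cases else 1.0
    return pass_rate, failures

# Toy usage: a stand-in agent that searches, then emails an outside address.
def toy_agent(task: str) -> Trace:
    return [
        {"action": "search", "query": task},
        {"action": "send_email", "to": "someone@partner.example"},
    ]

def first_external_email(trace: Trace) -> Optional[int]:
    # Assumed rule: only addresses in the hypothetical corp.example domain are internal.
    for i, step in enumerate(trace):
        if step["action"] == "send_email" and not step["to"].endswith("@corp.example"):
            return i
    return None

rate, failures = run_suite(toy_agent, [TestCase("Summarize the Q3 report", first_external_email)])
print(rate)         # 0.0
print(failures[0])  # ('Summarize the Q3 report', 1, {'action': 'send_email', 'to': 'someone@partner.example'})
```

Returning the index of the offending step, not just a pass/fail flag, is what lets a report point at the specific action or tool call that went wrong.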

Better suited to application-level AI scenarios

Microsoft says ASSERT is suited to systems whose behavior is constrained by product context, policies, and tools. Compared with general-purpose evaluations, it focuses more on whether specific business rules are followed.

For example, if a document research agent is required not to send emails outside the company, to disclose confidential information only to senior executives, and to provide concise summaries, ASSERT can continuously generate tests around these requirements.
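The three requirements in this example can be expressed as checks over a recorded trace. The Python sketch below is hypothetical: the trace format, the internal domain `corp.example`, the set of "senior executive" roles, and the 120-word limit standing in for "concise" are assumptions for illustration. ASSERT reportedly derives its tests from natural-language requirements instead of hand-written rules like these.

```python
# Hypothetical trace format and thresholds; the article does not describe ASSERT's.
INTERNAL_DOMAIN = "@corp.example"     # assumed company email domain
SENIOR_ROLES = {"ceo", "cfo", "coo"}  # assumed "senior executive" roles
MAX_SUMMARY_WORDS = 120               # assumed reading of "concise"

def check_trace(trace, requester_role):
    """Return (step_index, rule) for every requirement the trace violates."""
    violations = []
    for i, step in enumerate(trace):
        kind = step["type"]
        # Rule 1: no email may leave the company.
        if kind == "send_email" and not step["to"].endswith(INTERNAL_DOMAIN):
            violations.append((i, "external email"))
        # Rule 2: confidential material goes only to senior executives.
        if kind == "reveal" and step.get("confidential") and requester_role not in SENIOR_ROLES:
            violations.append((i, "confidential info to non-executive"))
        # Rule 3: the final summary must stay short.
        if kind == "final_answer" and len(step["text"].split()) > MAX_SUMMARY_WORDS:
            violations.append((i, "summary too long"))
    return violations

trace = [
    {"type": "search", "query": "merger status"},
    {"type": "reveal", "doc": "merger-plan.docx", "confidential": True},
    {"type": "send_email", "to": "advisor@lawfirm.example"},
    {"type": "final_answer", "text": "The merger is on track for Q4."},
]
print(check_trace(trace, requester_role="analyst"))
# [(1, 'confidential info to non-executive'), (2, 'external email')]
print(check_trace(trace, requester_role="ceo"))
# [(2, 'external email')]
```

The same trace yields different results depending on who is asking, which is the kind of context-dependent rule the article says general-purpose evaluations tend to miss.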

Use across development, deployment, and monitoring

Sarah Bird, Microsoft's chief product officer for Responsible AI, said evaluation is crucial to making the right decisions: without understanding how an AI system behaves, it is difficult to determine whether it meets an organization's requirements.

Bird added that ASSERT can be used during model development, after deployment, and for ongoing monitoring. Microsoft said the release also reflects the AI industry's growing emphasis on repeatability and regression testing.

Evaluation efforts such as Stanford's HELM, MLCommons' AILuminate, and the work of the research organization METR are also pushing for similar standards in testing model behavior.
