Stanford study: AI outperforms law professors in legal reasoning.
Coinpaper
22h ago
Ai Focus
A Stanford study found that law professors more often chose AI-generated answers in blind tests of contract law questions, with models such as Gemini and NotebookLM generally outperforming human answers.

A study led by Stanford University shows that in contract law reasoning tasks, law professors are more likely to choose AI-generated answers than answers written by their peers. The research team believes this demonstrates that large language models, in certain specialized scenarios, are already approaching the standards commonly used to evaluate work in the legal field.

Nearly 3,000 blind test comparisons

The study invited 16 professors from 14 U.S. law schools, including Stanford, Yale, New York University, University of Chicago, Georgetown University, UCLA, and the University of Virginia, to help create the questions. The 40 questions covered contract law principles, case law, hypothetical questions, and policy discussions.

In 2,918 blind tests, reviewing professors were asked to choose which of two anonymous responses they would prefer to give to their students. The results showed that Google's Gemini 2.5 Pro was preferred over human responses in 75.92% of comparisons, while NotebookLM had a win rate of 74.75%.
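A win rate of this kind is simply the share of pairwise comparisons in which the AI answer was chosen. The article does not describe how the study computed or qualified its figures, so the following is only a minimal sketch under stated assumptions: each blind comparison is recorded as a boolean, the function names `win_rate` and `wilson_interval` are hypothetical, and the sample data is made up for illustration rather than taken from the study.

```python
from math import sqrt


def win_rate(outcomes):
    """Share of pairwise comparisons in which the AI answer was preferred.

    `outcomes` is a list of booleans (assumed encoding): True means the
    reviewer picked the AI answer, False means the human answer.
    """
    if not outcomes:
        raise ValueError("need at least one comparison")
    return sum(outcomes) / len(outcomes)


def wilson_interval(wins, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = wins / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# Made-up illustration: four blind comparisons, AI preferred in three.
sample = [True, True, True, False]
rate = win_rate(sample)
low, high = wilson_interval(sum(sample), len(sample))
print(f"win rate {rate:.2%}, interval {low:.2%} to {high:.2%}")
```

With only a handful of comparisons the interval is very wide; it narrows as the number of comparisons grows, which is one reason a large number of blind tests matters for figures like those reported above.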

Dominant in multiple question types

The study found that AI outperformed human answers across multiple question types, including memorization-based questions involving case law, legal provisions, and legal principles, as well as hypothetical analysis and policy discussions. Researchers also examined whether professors' judgments were merely a matter of personal preference, and found that their choices were more consistent with one another than chance alone would produce.

To rule out the possibility that professors were merely responding to a more formal writing style, the team further analyzed characteristics such as answer length, structure, level of reasoning, legal basis, tone, clarity, and pedagogical support. The study concludes that these superficial factors are insufficient to fully explain professors' preferences for AI answers.

Fewer harmful content tags

The study also compared the proportion of answers marked as harmful: 3.41% of Gemini's answers, 3.64% of NotebookLM's, and 12.06% of human answers. In a separate set of additional model comparisons, Anthropic's Claude Opus 4.7 ranked first, followed by OpenAI's ChatGPT 5.4.

However, the study also notes that the test did not measure whether the answers aligned with each professor's individual teaching preferences. Therefore, while AI answers may be generally acceptable, they may not precisely match the teaching style of a particular instructor.

Legal industry still weighing the pace of adoption

This research comes as courts, law firms, and law schools are still debating how AI should be integrated into legal workflows. Supporters argue that AI can improve the efficiency of legal services and will become one of the fundamental tools for future legal roles.

However, the legal industry remains wary of the potential for AI hallucinations. The report mentions that in April of this year, the law firm Sullivan & Cromwell admitted in a U.S. bankruptcy court that one of its documents contained false, AI-generated quotations.
