AI Detector False Positives Are Mathematically Unavoidable
Even a perfect AI detector wrongly accuses real students—by design. The base-rate math, the signal-detection tradeoff, and why no update fixes it.
Every few months a detector vendor announces a new model that's "99% accurate" and "virtually eliminates false positives." Every few months, more students get wrongly accused. That's not a coincidence, and it's not a temporary growing pain that the next version will fix. A floor on false accusations is baked into the math of what these tools are trying to do. You can move it around. You cannot engineer it to zero.
Here's the argument—and it doesn't rest on hating the technology. It rests on three things that are true no matter how good the detector gets.
1. The signal-detection tradeoff
An AI detector is a binary classifier: it draws a line and says "AI" on one side, "human" on the other. The problem is that human and machine writing aren't two separate clouds with a gap between them; they overlap. Plenty of genuine human writing is formulaic, grammatically smooth, and low in surprise (in language-model terms, low "perplexity"), which is exactly what the "AI" side of the line looks like.
That overlap creates an unbreakable tradeoff. Move the line to catch more real AI (raise sensitivity), and you sweep in more humans whose authentic style sits in the overlap (raise false positives). Move it to spare those humans, and more AI slips through. You don't get to lower both at once; you only get to choose which error you'd rather make. This is the ROC tradeoff, and it's a property of overlapping distributions, not of any particular vendor's code.
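The tradeoff can be sketched with two overlapping bell curves. This is a toy model, not any vendor's scoring method: the Gaussian shapes, the 1.5-unit gap between the means, and the names `HUMAN`, `AI`, `rates_at`, and `tpr_at_fpr` are all assumptions chosen for illustration.

```python
from statistics import NormalDist

# Hypothetical score distributions: higher score = "more AI-like".
# The means and spreads are made-up illustration values, not measurements.
HUMAN = NormalDist(mu=0.0, sigma=1.0)
AI = NormalDist(mu=1.5, sigma=1.0)


def rates_at(threshold: float) -> tuple[float, float]:
    """Return (false_positive_rate, true_positive_rate) when every
    score above `threshold` is flagged as AI."""
    fpr = 1.0 - HUMAN.cdf(threshold)  # humans scoring above the line
    tpr = 1.0 - AI.cdf(threshold)     # AI text scoring above the line
    return fpr, tpr


def tpr_at_fpr(target_fpr: float) -> float:
    """Catch rate left over once the line is placed so that only
    `target_fpr` of human writing is flagged."""
    threshold = HUMAN.inv_cdf(1.0 - target_fpr)
    return 1.0 - AI.cdf(threshold)


if __name__ == "__main__":
    # Slide the line from strict to lenient and watch both rates rise.
    for t in (2.5, 2.0, 1.5, 1.0, 0.5, 0.0):
        fpr, tpr = rates_at(t)
        print(f"threshold={t:4.1f}  FPR={fpr:6.1%}  TPR={tpr:6.1%}")
    print(f"TPR when FPR is capped at 1%: {tpr_at_fpr(0.01):.1%}")
```

Sliding the threshold down raises both rates together. In this model the only way to pull them apart is to widen the gap between the two means, and that gap depends on the writing itself rather than on where the line is drawn.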
2. The base-rate problem makes a "tiny" error rate huge
Now scale it up, and a second, more brutal piece of arithmetic kicks in. When the thing you're hunting is rare, even a small false-positive rate produces mostly false accusations.
The cleanest real-world example comes from Vanderbilt University, which disabled Turnitin's AI detector in 2023 and showed its work: at Turnitin's claimed false-positive rate of about 1%, applied to the roughly 75,000 papers Vanderbilt processed in 2022, you'd expect about 750 students wrongly flagged—each one a misconduct investigation that shouldn't exist. That's the optimistic case, using the vendor's own number. Push the false-positive rate down and you blunt the detector's ability to catch anything; a 2026 mathematical analysis illustrates the squeeze—demanding a false-positive rate under 1% can drop a detector's catch rate to single digits. (More on that paper below, with the appropriate caveats.)
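The base-rate trap in one line: if only a small fraction of essays are actually AI-written, then even a 99%-accurate detector produces a pile of flags that are largely false. At a 1% false-positive rate and 99% sensitivity, a population in which 1% of essays are AI-written yields about one false flag for every true one; below that prevalence, false flags are the majority. Accuracy on a balanced test set is not the same as trustworthiness on a real student population.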
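The arithmetic is short enough to check by hand or in a few lines of code. The sketch below is a plain expected-value calculation; `flag_breakdown` is a hypothetical helper, and every prevalence and sensitivity value passed to it is an assumption, since nobody knows the true share of AI-written essays in a given class.

```python
def flag_breakdown(n_papers: int, prevalence: float,
                   sensitivity: float, fpr: float) -> dict[str, float]:
    """Expected flag counts for a detector applied to n_papers.

    prevalence  : fraction of papers that really are AI-written
    sensitivity : fraction of AI-written papers the detector flags
    fpr         : fraction of human-written papers it wrongly flags
    """
    ai_papers = n_papers * prevalence
    human_papers = n_papers - ai_papers
    true_flags = ai_papers * sensitivity
    false_flags = human_papers * fpr
    total = true_flags + false_flags
    return {
        "true_flags": true_flags,
        "false_flags": false_flags,
        # Share of all flags that land on a human author.
        "false_share": false_flags / total if total else 0.0,
    }


if __name__ == "__main__":
    # Prevalence 0.0 mirrors the Vanderbilt arithmetic, which applies
    # the 1% rate to all 75,000 papers. The other prevalence values
    # and the 99% sensitivity are assumed for illustration.
    for prevalence in (0.0, 0.01, 0.05, 0.20):
        result = flag_breakdown(75_000, prevalence, 0.99, 0.01)
        print(prevalence, result)
```

The false share shrinks as the assumed prevalence rises, which is the point: the same detector is more or less trustworthy depending on a number the institution cannot observe.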
3. The information-theoretic limit: better AI makes detection worse
The third leg is the one vendors never put on the slide. The whole field is built on the idea that AI text has a detectable statistical fingerprint. But the entire point of a frontier language model is to write more like a human—which, by construction, shrinks that fingerprint. A widely cited analysis ("Can AI-Generated Text Be Reliably Detected?", published in the peer-reviewed Transactions on Machine Learning Research) showed that as language models get better at emulating human writing, the best possible detector's accuracy drifts toward a coin flip. Improving the model and defeating the detector are, mathematically, the same motion. Every detector "win" is temporary because the next model erases the edge it exploited—the same arms-race dynamic we mapped in the AI-humanizer economy.
A March 2026 preprint by Nathan Garland, an applied mathematician at Griffith University in Australia, ties these threads into one formal claim: any text-only detector with useful power "must produce false accusations at a rate governed by the distributional overlap between student writing and AI output"—a floor that is "logically independent of AI model quality and cannot be overcome by better detector engineering." Two honest caveats: it's a preprint, not peer-reviewed, and its eye-catching figures (≈750 false accusations per 10,000 students; a 1% cap collapsing power to ~6%) are, in the author's own words, illustrative worked examples, not measurements. But the underlying machinery—overlap sets a floor—is the same total-variation-distance reasoning as the peer-reviewed work above. The preprint is the clean statement of an idea that was already established.
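The "drifts toward a coin flip" claim has a compact form. As we read the TMLR paper, its central inequality caps any detector's AUROC at 1/2 + TV - TV²/2, where TV is the total variation distance between the human and machine text distributions. The function below is our transcription of that inequality, not code from the paper, and the sample TV values are hypothetical; check the paper for the exact conditions before relying on it.

```python
def auroc_upper_bound(tv: float) -> float:
    """Upper bound on any detector's AUROC given the total variation
    distance `tv` (0 = identical, 1 = fully separable) between the
    distributions of human and machine text."""
    if not 0.0 <= tv <= 1.0:
        raise ValueError("total variation distance must be in [0, 1]")
    return 0.5 + tv - tv * tv / 2.0


if __name__ == "__main__":
    # Hypothetical overlap levels, from fully separable to identical.
    for tv in (1.0, 0.5, 0.2, 0.1, 0.0):
        print(f"TV={tv:.1f}  best possible AUROC <= {auroc_upper_bound(tv):.3f}")
```

An AUROC of 0.5 is random guessing, so as the two distributions converge (TV toward 0) the ceiling on detector quality falls to chance regardless of how the detector is built.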
Who the floor lands on
A floor on false accusations would be tolerable if it fell randomly. It doesn't. It falls on the writers whose authentic style sits inside the overlap: the people who write in plainer, more predictable English. That's not a hypothetical. Peer-reviewed Stanford research found detectors flagged 61% of essays by non-native English speakers as AI while barely touching native writers, and the false-positive rate fell sharply when the same essays were rewritten with more sophisticated vocabulary. The detectors were reading style, not authorship. We cover that in depth in how AI detectors are biased against international students.
The same mechanism is now surfacing for a second group—autistic and ADHD students, whose consistent, literal, structured prose produces the same low-perplexity signature. That frontier is newer and not yet quantified, but the cases are already in court; we examine it in AI detectors and neurodivergent students.
Even the people who build them concede the point
The most telling evidence isn't a critic's paper—it's the builders backing away:
- OpenAI killed its own AI-text classifier in July 2023 for a "low rate of accuracy" (its own numbers: 26% of AI text caught, 9% of human text wrongly flagged). The company that makes the models couldn't reliably detect them.
- Vanderbilt and the University of Waterloo turned their detectors off, citing unreliability and bias.
- Turnitin and GPTZero both state, in their own documentation, that a score is an indicator, not proof, and should never be the sole basis for an accusation.
- A federal regulator got involved: in 2025 the FTC ordered the maker of one detector to stop advertising "98% accuracy" after independent testing put it around 53%—"no better than a coin toss." (That finding was about one company's inflated marketing claim, not a verdict on every tool—but it's the government now policing detector accuracy.)
The honest counterarguments
To be fair: detectors are improving, and independent evaluations show the best current tools holding a genuinely low false-positive rate on ordinary essays. Used as one signal among several (a flag that triggers a conversation and a look at your drafts, not an automatic charge), a detector is far less dangerous than one treated as a verdict. And the impossibility papers are theoretical and contested; none proves that every deployed tool is useless today.
All true. But none of it moves the three facts above. A low average false-positive rate still produces hundreds of wrong flags at population scale; the errors still concentrate on the most vulnerable writers; and a probability is still not proof. (It's also worth not conflating this with plagiarism detection, which matches your text against real, verifiable sources—that's concrete evidence, where an AI score is a statistical guess.)
The fix was never a better detector
If the false-positive floor is structural, then waiting for an accurate-enough detector is waiting for something the math won't deliver. The honest institutional response is to stop treating a score as a conviction:
- Keep your receipts. Drafts, outlines, and Google Docs version history are evidence of how you wrote—worth more than any guess about the finished text.
- Demand a human, and a real hearing. A score can start a conversation; it can't end one. That's the line a growing number of professors refuse to cross, and it's why students who fight these cases tend to win on due process, not on the detector being "debunked."
- Write like yourself. The students in our collection of false-accusation stories did nothing wrong; the tool did. Authenticity isn't a detector-evasion strategy—it's just the only thing that's actually true.
The detector vendors will keep shipping a "more accurate" model. Believe the math instead of the press release: the floor moves; it never disappears.
Sources
- Vanderbilt University, "Guidance on AI Detection and Why We're Disabling Turnitin's AI Detector" (2023)
- Sadasivan et al., "Can AI-Generated Text be Reliably Detected?" (TMLR; arXiv:2303.11156)
- Weber-Wulff et al., "Testing of detection tools for AI-generated text" (International Journal for Educational Integrity, 2023)
- Liang et al., "GPT detectors are biased against non-native English writers" (Patterns, 2023)
- N. Garland, "AI Detectors Fail Diverse Student Populations: A Mathematical Framing of Structural Detection Limits" (arXiv:2603.20254, 2026; preprint, illustrative figures)
- OpenAI's classifier retirement (July 2023)
- FTC v. Workado order (2025)
Detector self-checking and per-tool false-positive rates are compared in our AI detector false-positive data. Nothing here is legal advice.
Related Reading
- AI Detectors Flag Autistic and ADHD Writers — The Evidence
- AI Detection Tools Are Biased Against International Students
- AI Detector False-Positive Rates Compared
- Do AI Humanizers Actually Work?
- AI Cheating Lawsuit Myths — Students Win on Due Process