CASPA AI Detection — What PAEA Says About False Positives
PAEA will not investigate CASPA applicants on AI-detection-only evidence. Stanford found a 61% false-positive rate for ESL writers. What to know.
CASPA AI Detection: What PAEA's Own Research Says About False Positives
Short answer: PAEA will not centrally investigate a CASPA applicant on AI-detection-only evidence. Its own published guidance to PA programs acknowledges that current detectors have high false-positive rates and recommends in-person essay writing during interviews as the real verification mechanism. The one exception: MEDEX Northwest is the single program in our 20-program PA-admissions survey that reserves the right to use detection anyway.
If you are losing sleep over whether the sentence you are most proud of will get flagged by some algorithm at PAEA central — stop. The algorithm is not the thing that will decide your application, and PAEA has said so in writing.
This article assembles the primary-source evidence. It is the article I wish existed when I first went looking for what PAEA actually does. For the binding rule itself — the CASPA certification statement that every applicant signs — read our companion piece, CASPA AI Certification Decoded. This piece is about the enforcement question: given that CASPA bans AI use, will a detector be the thing that catches you?
What PAEA actually says about AI detection
PAEA, the Physician Assistant Education Association that runs CASPA, published a member-facing guidance document titled What Your Program Should Know About AI and Admissions. This is the primary source every PA applicant should know exists, and almost none do.
The document says three things that matter for applicants worrying about false positives:
"Current tools available for detecting material written by generative AI have fairly high false positive and false negative rates."
"PAEA will not investigate an applicant if the only evidence that the applicant did not write his or her personal essay comes from AI detection software."
"The developers of the AI detectors themselves warn in their terms of service against using them as the sole basis to make important decisions about students."
Read that second sentence again. PAEA will not investigate on detection-only evidence. This is not "PAEA discourages." This is not "PAEA advises caution." PAEA explicitly states it will not act. The organization that operates the centralized application service for every PA program in the United States has said on the record that a detector score, standing alone, is not grounds for investigation.
That sentence — buried in a member-facing page most applicants never find — is the single most reassuring piece of primary-source evidence available to any CASPA applicant reading this article. It is the reason this article exists.
What PAEA recommends instead
Rather than detection, PAEA's guidance document suggests three alternative verification methods for programs that want to test essay authenticity:
- Short-response, interview-style prompts answered on a remotely proctored online platform through a lockdown browser: think a 15-minute timed essay written while the program watches.
- Short-form digital video statements of two minutes or less in response to program-specific prompts, submitted alongside the written application.
- Asking applicants to write an essay during the in-person interview — the classic "write about X for thirty minutes while we sit next to you" method.
All three methods have something in common: they compare your live writing voice to your submitted essay's voice. A candidate who wrote an eloquent, specific, deeply personal CASPA personal statement but then produces generic, disorganized prose under live conditions has a problem — and that problem is revealed by the comparison, not by a detector. PAEA's preferred verification method is human judgment against a live writing sample, not algorithmic analysis.
If you are worried about detection, this is important: the risk PAEA has designed its system to catch is a voice mismatch, not a stylometric fingerprint. Write your CASPA essay yourself and you will be able to match it under any conditions.
Why detectors fail: the Hollman PT-school study
The closest published research to the question "can detection tools reliably identify AI-written health-professions personal statements" comes from physical therapy — close enough to PA that it is the best analog in the literature.
Hollman et al., Physical Therapy, April 2024. Researchers at Mayo Clinic analyzed 152 deidentified personal statements from applicants who interviewed for a doctoral physical therapy program in 2020-2021 — crucially, before ChatGPT was publicly available, so these were all guaranteed to be human-written. They then generated 20 ChatGPT personal statements in response to the same standardized PTCAS prompt. Both sets were run through recurrence quantification analysis (RQA), a lexical method that measures how repetitive and predictable a piece of writing is.
The paper asked: can a sophisticated, well-designed lexical method reliably distinguish human from AI personal statements for this applicant pool?
Here is the answer, from the published abstract:
"The strongest discriminator was a 13.04% determinism rate, which differentiated ChatGPT from human-generated writing samples with 70% sensitivity and 91.4% specificity."
Translate that from statistics into English:
- Sensitivity 70% means the method caught 70% of the actual AI essays. It missed 30%. Three out of every ten AI-written personal statements slipped through.
- Specificity 91.4% means the method correctly cleared 91.4% of human essays — but it wrongly flagged 8.6% of human applicants as AI users.
Project that onto a realistic CASPA applicant pool. CASPA received roughly 30,000 applicants in the 2024-2025 cycle. If you applied a Hollman-style detector to every essay, you would expect around 2,580 innocent applicants to be falsely flagged — in the best-performing academic method published in the literature. And you would still miss 30% of the actual AI-written essays.
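If you want to check that arithmetic yourself, here is a minimal Python sketch. The pool size is an approximation, the assumption that every essay in the pool is human-written is deliberately generous, and the rates are the Hollman figures quoted above.

```python
# Back-of-envelope projection, not a PAEA calculation: apply the Hollman
# study's specificity and sensitivity to a CASPA-sized, all-human essay pool.
applicants = 30_000      # approximate 2024-2025 CASPA cycle size
specificity = 0.914      # share of human-written essays correctly cleared
sensitivity = 0.70       # share of AI-written essays correctly caught

false_flags = applicants * (1 - specificity)  # innocent applicants flagged
print(f"Expected false flags across the pool: {false_flags:,.0f}")                # ~2,580
print(f"AI-written essays that would still slip through: {1 - sensitivity:.0%}")  # 30%
```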
That is not a detector you can build an enforcement system around. And PAEA knows it.
The authors are blunt about what their method can and cannot do. Their own conclusion:
"RQA indices detected personal statements generated by ChatGPT as being less lexically sophisticated and more predictable, repetitive, and disordered than those composed by human applicants to a professional physical therapist education program."
A finding about populations (AI essays, on average, look different from human essays, on average) is not a finding that lets you classify individuals. A 91.4% specificity is a result you can publish in a peer-reviewed journal. It is not a result you can use to end a PA applicant's career over a personal statement they wrote themselves.
Why detectors fail worse for ESL writers: the Stanford 61% finding
The Hollman paper is about PT-school applicants generally. The Stanford research is about something more specific — and more damning for anyone who learned English as a second language.
In 2023, researchers at Stanford HAI published a now-famous study, AI-Detectors Biased Against Non-Native English Writers. They ran several mainstream AI detectors against TOEFL essays written by non-native English speakers and against essays by US-born eighth-graders.
The findings:
- The detectors were "near-perfect" on essays by US-born eighth-graders.
- The detectors incorrectly labeled more than half of the TOEFL essays as AI-generated.
- The average false-positive rate on TOEFL essays was 61.3%.
- 19.8% of human-written TOEFL essays were unanimously flagged as AI by every single detector.
- 97.8% of the TOEFL essays were flagged as AI-generated by at least one detector.
Sit with those numbers for a second. Nearly one in five TOEFL essays — essays written by real international students about their real experiences — was classified as AI by every detector the researchers tested. Nearly ninety-eight percent of them were flagged by at least one.
The root cause is mechanical, not ideological. AI detectors measure perplexity (how predictable a piece of text is, word by word) and burstiness (how much sentence length and structure varies). Non-native English speakers are often taught formal, structured, formulaic academic English — because that is what teachers and textbooks reward, everywhere in the world. Short sentences. Common vocabulary. Transitions like "Furthermore" and "Moreover." Hedging language like "It can be argued that."
Those are exactly the patterns AI detectors associate with machine-generated text. The detectors are not picking up "AI-ness." They are picking up "English taught as a second language."
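To make "burstiness" concrete, here is a minimal sketch; it is illustrative only, not any vendor's actual algorithm. It scores sentence-length variation for two made-up samples (both assumptions for demonstration) and ignores perplexity, which would require a language model to compute.

```python
# Illustrative only: a crude "burstiness" proxy measured as the spread of
# sentence lengths. Formulaic prose scores low, which is one reason detectors
# mistake taught-in-school academic English for machine output.
import re
from statistics import mean, pstdev

def burstiness(text: str) -> float:
    """Coefficient of variation of sentence length (higher = more varied)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    return pstdev(lengths) / mean(lengths) if len(lengths) > 1 else 0.0

formulaic = ("I want to be a PA. I have worked as a nurse. I enjoy helping "
             "patients. Furthermore, I am a hard worker. Moreover, I learn fast.")
varied = ("The monitor alarm went off at three in the morning. I froze. Then "
          "the charge nurse's voice cut through the noise, and I remembered, "
          "step by step, everything my preceptor had drilled into me.")

print(f"formulaic sample: {burstiness(formulaic):.2f}")  # low variation
print(f"varied sample:    {burstiness(varied):.2f}")     # higher variation
```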
What this means for CASPA applicants. A significant share of PA applicants are internationally trained healthcare workers (nurses, physicians, medics, paramedics) who cross-trained into the US PA pathway. Many wrote their personal statements in English as a second or third language. If any PA program ran their essays through a naive detector, these applicants would be disproportionately false-flagged. That is why PAEA's guidance is categorical: a detector score can never be the only evidence.
For a deeper look at the false-positive landscape across the whole AI-detection ecosystem — why it is rational to be anxious about it and how that anxiety is changing how students write — read our pillar piece on flagxiety.
How the commercial detectors perform
Let's go through the tools you would actually worry about, tool by tool, using each vendor's own published numbers or well-documented independent testing.
Turnitin
Turnitin is the detector most likely to be used by any program that adopts one, because Turnitin is already integrated into many university LMS platforms. Turnitin has been unusually transparent about its own rates:
- Document-level false-positive rate: less than 1%, according to Turnitin's own published guidance, validated against an 800,000-document pre-GPT test set.
- Sentence-level false-positive rate: approximately 4%, with a higher incidence at the transitions between human- and AI-written segments.
Turnitin itself recommends that scores in the 1%-19% range not be attributed or highlighted — the company explicitly recognizes that this band produces too many false positives to be actionable. That is a significant disclosure from a vendor whose business model depends on the tool's perceived reliability.
Turnitin's own CPO has published multiple blog posts warning educators not to use the tool as the sole basis for academic-integrity decisions. This is the detector that most closely aligns with PAEA's central guidance: Turnitin's own position is that its output is a signal, not a verdict.
ZeroGPT
ZeroGPT is one of the free online detectors students run their own writing through in a panic. Independent testing has documented ZeroGPT at a 16.9% false-positive rate, meaning roughly one in six human-written essays is misclassified. If you are running your authentic CASPA essay through ZeroGPT "just to see," you are taking a roughly one-in-six chance of a false flag for no good reason.
GPTZero
GPTZero, another consumer-facing detector, has been documented at roughly a 22% false-positive rate on certain text types, which is worse than ZeroGPT on average. The text types most affected are the same ones the Stanford research identified: formal academic prose, TOEFL-style essays, and (not coincidentally) the exact register of a careful CASPA personal statement.
The UC Davis case: 15 of 17 false positives
The most quotable real-world failure is the UC Davis incident reported by The California Aggie. A linguistics professor flagged 17 students for AI use based on the university's detector output. After manual review of each case, 15 of the 17 flags turned out to be false positives.
The flagged students were disproportionately non-native English speakers and students who had worked with writing tutors — both groups whose prose tends to be more formulaic than typical, and both groups the detector treated as suspect. That one case, if it had been enforced on the basis of the detector score alone, would have produced a 15-student miscarriage of justice at a single institution in a single classroom.
Now imagine that multiplied across 30,000 CASPA applicants. That is the math PAEA is staring at, and that is why PAEA has declined to enforce on detection alone.
The MEDEX exception
One PA program has chosen to go against PAEA's central guidance.
University of Washington MEDEX Northwest is the only one of the 20 prominent PA programs in our program-by-program survey that publishes its own AI policy, and the only one that explicitly reserves the right to use detection tools. The policy says:
"MEDEX may use tools that detect AI generated or AI modified content, and may use AI supported systems during admissions review."
This is the direct contradiction. PAEA central: do not act on detector output alone. MEDEX: we reserve the right to use detection. The two positions coexist because PAEA publishes central guidance, not central rules — individual programs can layer their own policies on top.
We covered this contradiction in depth in our MEDEX Northwest AI policy breakdown, which is the only existing deep dive on the one PA program with its own rules. The short version for this article: if MEDEX is on your list, you should assume your essay will be run through a detector even though PAEA centrally will not do that. MEDEX's policy does not commit to investigating on detection-only evidence either — but it leaves the door open, and that is more than any other program in the survey has done.
If MEDEX is not on your list, your risk from detection is materially lower. Every other program we surveyed is silent on detection, which in practice means they follow PAEA's central guidance.
What CASPA still bans even if they won't run the detector
None of what you just read changes the binding rule. PAEA not enforcing on detection output does not mean CASPA allows AI writing. It absolutely does not. The CASPA applicant certification is the strictest published AI prohibition in any US centralized health-professions application:
"I am strictly prohibited from using Generative AI to create, write and/or modify any content, in whole or part, submitted in CASPA and/or provided to PA programs on behalf through any means of communication."
We broke down every clause of the certification — including the word "modify," which does heavy lifting — in our companion piece CASPA AI Certification Decoded. The short version: CASPA bans generative AI from every written passage in the application, including the 600-character experience descriptions most applicants treat as boilerplate.
The certification is an affirmative attestation. You sign it, and that signature is binding regardless of whether any detector ever runs. If a program investigates for any reason — a recommender tip-off, a voice mismatch in an interview, the in-person writing exercise PAEA recommends — the certification you signed becomes the basis for a potential misrepresentation claim.
So the practical stance is this: a detector is probably not what would catch you. What would catch you is a program's interview writing exercise, or the voice mismatch between a polished personal statement and a live conversation. Those are the enforcement vectors that actually work. The applicant who cannot defend their own sentences under questioning is the applicant who gets caught.
For the full four-system comparison — AMCAS, AACOMAS, CASPA, and TMDSAS — read our breakdown of which AI uses each application service actually permits. CASPA is the strictest, but the other three are not blanket permissions either.
Practical guidance for the AI-anxious CASPA applicant
You read all of the above and you are still worried. Here is what to actually do.
1. Write your essay yourself. Actually.
This is not a workaround; it is the solution. An essay you wrote yourself will survive any enforcement vector PAEA can use. It will match your interview voice. It will match your live writing sample. It will read as yours because it is yours. No detector on earth can take away an application you wrote from first principles.
Every other tactic in this list is secondary to this one.
2. Save your drafting history — every version, with timestamps
Turn on version history in Google Docs or use a text editor that auto-saves. Keep your rough outline, your first draft, the paragraphs you cut, the revisions you made after your pre-health advisor's feedback. This is not paranoia; this is evidence. If your essay is ever questioned, a complete drafting history is the single most persuasive rebuttal you can offer — far more persuasive than any detector's counter-score.
A CASPA personal statement is typically written over weeks, not hours. If someone has to defend yours later, the chronological record of how it came together is the proof that a machine did not write it in a single pass.
3. Prepare for the in-interview writing exercise
PAEA's preferred verification mechanism is that you write an essay at your interview. Some PA programs already do this. Prepare for it: practice writing a 15-minute response to a standardized prompt (about your path to PA, about a clinical experience, about an ethical dilemma you encountered) without any reference materials. Your live writing should sound like your personal statement. If it does not, the mismatch is the actual detector — and it is the one PAEA trusts.
4. Do not run your essay through consumer AI detectors
Really. Do not. The ZeroGPT 16.9% false-positive rate, the GPTZero 22% rate, the Stanford 61% rate for ESL writers — these are the rates you are throwing your essay at when you paste it into a free online tool. A false positive there does not mean a false positive at PAEA. All it does is spike your anxiety and push you into dumbcrafting your essay down to a worse version.
PAEA is not running your essay through ZeroGPT. Stop doing it to yourself.
5. If you are falsely flagged by a program
If a PA program ever contacts you with a detection concern:
- Ask for the specific evidence. The detector name, the score, the sections flagged.
- Reference PAEA's own guidance that detection software is not sufficient basis for an investigation. Cite the PAEA resource page directly.
- Provide your drafting history. Version timestamps, earlier drafts, notes from your advisors.
- Request a live writing sample as an alternative verification. You are volunteering for the exact method PAEA recommends.
- Document the process. Keep a record of every communication, in case the program's decision later requires an appeal.
The few reported cases of AI-detection false-positive disputes at health-professions programs have generally been resolved when the applicant could document their writing process. The applicants who struggle are the ones who did not save drafts.
The flagxiety paradox
Here is the trap. You read all the research above. You understand that PAEA will not enforce on detection. You understand that the academic method with the best published specificity still false-flagged 8.6% of human writers. You understand that the Stanford research shows 61% false positives for ESL writers.
And you still open your essay and start "humanizing" it to make it sound less like AI.
That is flagxiety, the term we coined for the anxiety of being falsely flagged, because that anxiety was reshaping how students write across every application cycle. And it has a cousin: dumbcrafting, the practice of deliberately weakening your writing to appease detectors. Students cut their strongest paragraphs. They add deliberate typos. They replace strong verbs with weak ones. They shorten every complex sentence.
All of it makes the essay worse. None of it changes the actual enforcement landscape — because the actual enforcement landscape is the in-interview writing exercise, not a detector.
The hardest mental shift for an AI-anxious applicant is this one: your best writing is your defense, not your vulnerability. Specific sensory details about a patient, a paragraph that captures exactly what you were thinking during a code, a sentence structure that sounds like how you actually talk — these are the exact things that cannot be generated by a prompt. AI-written personal statements fail because they are generic. Yours should be the opposite of generic.
If you want a model of what actually-good-writing-that-cannot-be-generated looks like, read through the sample CASPA personal statements we've analyzed. Every one of them has moments a language model could not produce. That specificity is what gets applicants in. That specificity is also what survives any detection scare.
How we score essays — and why it is not the same as detection
At GradPilot, our AI-assisted medical-school essay review scores applicant essays against a rubric: specificity, narrative coherence, clinical insight, authenticity of voice. We do not run detection on the essay to tell you if it "looks AI." We read the essay against the criteria that actually differentiate strong personal statements from weak ones.
If what you want is a second set of eyes on your CASPA essay — scored the way admissions committees actually score — that is what we built the tool for. We track AI policies for 60+ medical and health-professions schools so we can match our feedback to the rule each program actually enforces. We have a whole cluster of medical school essay guides covering AMCAS, AACOMAS, CASPA, and TMDSAS.
The one sentence to remember
If you remember nothing else from this article, remember this sentence from PAEA's own published guidance:
PAEA will not investigate an applicant if the only evidence that the applicant did not write his or her personal essay comes from AI detection software.
That is the sentence that should make you stop running your essay through ZeroGPT. That is the sentence that should let you stop rewriting your strongest paragraph four times. That is the sentence that matters more than any detector's false-positive rate, because it tells you the detector is not the enforcement mechanism to begin with.
The thing that will decide your CASPA application is the essay you actually wrote, read by a human admissions reader, compared against your live writing voice in the interview. Write it yourself. Keep your drafts. And trust that the system PAEA built was explicitly designed not to punish you for the way an algorithm pattern-matches your sentences.
Related Reading
The CASPA + AI cluster:
- CASPA AI Certification Decoded — What It Actually Bans — the verbatim certification statement and what each clause means
- MEDEX Northwest AI Policy — The One PA Program with Rules — the one program that reserves the right to use detection
- PA School AI Policies 2026 — Program-by-Program Survey — what 20 PA programs published
- CASPA AI and Technology Essay 2026-2027 — Prompt + 7 Angles — the new CASPA AI essay
- Can You Use ChatGPT for Your Medical School Application? — AMCAS, AACOMAS, CASPA, and TMDSAS compared
Detection and flagxiety:
- What Is Flagxiety? — the term we coined for AI-detection anxiety
- The Dumbcrafting Epidemic — How Flagxiety Is Making Students Write Worse — why simplifying your prose to appease detectors backfires
Hubs and resources:
- Medical School Essays — The Complete Guide to AMCAS, AACOMAS, CASPA & TMDSAS — every medical school essay guide on the site
- Medical School AI Policies 2026 — our AI policy hub for health-professions admissions
- How GradPilot's AI-Assisted Medical School Essay Review Works — how we score personal statements without running detection
Worried about your CASPA essay? Run it through GradPilot for a rubric-based review that scores what admissions committees actually read — specificity, coherence, voice — not a detector score that PAEA itself has said not to trust.
Review Your Personal Statement
See how your AMCAS or secondary essay scores before you submit.