20% of UC Applicants Wrote in Spanish. ChatGPT Wrote Zero.
Cornell + Stanford analyzed 35,789 Latinx UC applicants: 20% used some Spanish in their essays. Across ~26,000 GPT-3.5 and GPT-4 essays on the same prompts, 0% did.
In a Cornell + Stanford analysis of 35,789 Latinx in-state University of California applicants from the 2016–2017 admissions cycle — a full five years before ChatGPT existed — about 20% of human-written essays included some Spanish: a phrase from a grandmother, the name of a holiday, a song lyric, a beloved nickname, the word for a food no English translation does justice to. When the same researchers fed the same UC essay prompts to GPT-3.5 and GPT-4 and generated roughly 26,000 synthetic essays, 0% of the AI essays included any Spanish words at all (Alvero et al., Cornell + Stanford, Sep 2024).
This is not a story about an AI detector misreading bilingual writing — that's a different problem, covered in our deep dive on AI detection bias against international students. This is the inverse: ask a frontier LLM to write a college essay out of the box, and its default voice is monolingual English in a context where one in five real applicants chose otherwise. The model isn't refusing Spanish. It just doesn't reach for it.
Updated: May 10, 2026. We revise this post as new research emerges.
What the researchers actually did
The paper is Alvero, Lee, Regla-Vargas, Kizilcec, Joachims, and Antonio's "Large language models, social demography, and hegemony", published in Journal of Big Data in September 2024 (volume 11, article 138). It is the same Cornell team behind three of the four major studies in our research pillar on AI in college admissions, with Stanford's Anthony Antonio joining for this one.
The Latinx UC corpus — the part that produced the 20% / 0% finding — has three pieces:
- Human baseline: 35,789 in-state Latinx UC applicants from the 2016–2017 cycle, each submitting up to four short personal-statement responses, for a total of 143,156 essays. It is one of the largest demographically tagged college-essay corpora researchers have, and it predates ChatGPT's public release by more than five years.
- AI corpus: the same UC prompts fed to two models — 12,964 essays from GPT-3.5 and 13,088 from GPT-4, roughly 26,000 synthetic essays generated under "out-of-the-box" conditions. No persona instructions, no demographic priming, just the prompt.
- Comparison method: LIWC features computed for both corpora, plus word-level checks for Spanish-language terms. The team released the LIWC feature data publicly on Harvard Dataverse; a SocArXiv preprint is also available.
The result was a clean comparison of two distributions on the same task: same prompts, same essay length, two very different writers. This was not a controlled experiment in which researchers told the model "do not use Spanish." The default behavior of GPT-3.5 and GPT-4 — asked to write a UC personal statement on a prompt where one in five real Latinx applicants chose to write at least partly in Spanish — was to write in English only, every time, across roughly 26,000 generations.
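The word-level check the paper describes can be sketched in a few lines. This is a hedged illustration, not the authors' code: the term set below is a tiny hypothetical lexicon for demonstration, where the actual study would have drawn on a far larger Spanish vocabulary.

```python
import re

# Illustrative only: a tiny hypothetical lexicon, NOT the study's actual word list.
SPANISH_TERMS = {"mija", "abuela", "familia", "quinceañera", "pan dulce", "barrio"}

def contains_spanish(essay, terms=SPANISH_TERMS):
    """True if the essay contains at least one whole-word match from the lexicon."""
    text = essay.lower()
    return any(re.search(rf"\b{re.escape(term)}\b", text) for term in terms)

def spanish_rate(essays):
    """Share of essays containing at least one Spanish-language term."""
    if not essays:
        return 0.0
    return sum(contains_spanish(e) for e in essays) / len(essays)

human_essays = [
    "Mija, you are tired, my abuela would say every Sunday.",
    "I spent the summer preparing for my exams.",
]
ai_essays = [
    "I learned resilience through my experiences.",
    "My family has always supported my education.",
]
print(spanish_rate(human_essays))  # 0.5
print(spanish_rate(ai_essays))    # 0.0
```

Note that a whole-word check like this would count "familia" but not "family," which is exactly the distinction the 20% / 0% comparison rests on.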
What 1 in 5 real applicants chose to do
To understand why the zero matters, picture what the 20% looks like. Code-switching in a college essay is rarely a paragraph in Spanish followed by an English translation. It is closer to texture — a word, a phrase, a name — placed where English alone would lose something:
- A relative's name. "Mija, you are tired" from a grandmother. The diminutive does work no English equivalent does.
- A holiday or ritual. Quinceañera. Día de los Muertos. Las Posadas. Naming the thing in Spanish signals participation, not observation.
- A food, a song lyric, a phrase of advice. Pan dulce on Sunday morning. A line from a ranchera at a wedding. Échale ganas. Sí se puede. No te rajes. Untranslatable as a unit; translatable only by losing the cultural register.
- A community term. Familia. Barrio. Compadres. The English near-equivalents — family, neighborhood, friends — are flatter.
None of these are stunts. They are how a bilingual writer who lives in two registers actually writes about her own life. A reader who has never used the word mija doesn't need a footnote — context carries it. A reader who has used the word mija gets a small flash of recognition no equivalent English sentence can produce.
In the UC corpus, roughly one in five Latinx applicants made at least one of these moves. In the GPT-3.5 and GPT-4 corpus on the same prompts, the rate was zero. Not low. Zero.
Why ChatGPT defaulted to monolingual English
The easy framing — "AI is racist against Latinx students" — both overstates and misdirects. The Cornell team's own framing is "linguistic hegemony" and "cultural homogenization," and that's the right register. The mechanism is not a moral failure. It's a probability distribution.
Frontier language models are trained on a distribution of text in which standard, formal, monolingual English dominates the "college essay" register. Asked to write a UC personal statement, the model samples the most likely next words for that genre as it exists in its training data — and those words, in a register associated with selective US universities, are English. The model is doing exactly what it was trained to do.
The broader 4-paper Cornell synthesis reinforces the point. Across roughly 170,000 essays, AI-generated text aligns with male-coded writing 65–79% of the time, continuing-generation writing 76–81%, and high-economic-connectedness writing 80–92% on the LIWC features that significantly differ between groups. The "default voice" of an out-of-the-box LLM essay is not a neutral middle. It tracks the dominant register of the genre — and that register, in the model's training data, is the writing of students from privileged, continuing-generation, English-dominant households.
The Spanish gap is a particularly visible instance of the same pattern. Code-switching is not standard in a "selective US college personal essay" as the genre exists on the open web. The model reaches for the genre it knows. The genre it knows is monolingual. And the more capable model doesn't fix it — the same paper found GPT-4 was more skewed than GPT-3.5, not less. Capability is not the binding constraint. The training distribution is.
What this means for bilingual applicants
If you are a bilingual applicant whose real voice on the page includes Spanish — or Mandarin, Tagalog, Yoruba, Vietnamese, Hindi, Haitian Creole, or any other home language you actually live in — the operative finding is straightforward. AI ghostwriting will flatten that out. Not because it can't reach for your home language, but because by default it doesn't.
- Brainstorming with AI is not the same as drafting with AI. Foundry10's July 2024 survey of 523 student applicants found that 50% of AI users used it for brainstorming, 32% for first drafts, and 10% for translation. Brainstorming preserves your voice; first-drafting hands it to the model's default. For bilingual writers, the second mode costs a layer of who you are that the model is structurally unlikely to put back.
- Asking the model to "include some Spanish" is not the same as writing it yourself. The Cornell team notes explicitly that it is "easy" to instruct an LLM to use Spanish — the constraint is not capability. But when prompted, the model tends to deploy Spanish vocabulary mechanically: italicized stock phrases like familia and abuela dropped in spots that read as decorative rather than lived. Authentic code-switching is conditioned on memory and context. AI Spanish is conditioned on the prompt.
- The cost lands where the income cliff lands. Lower-income and first-generation students adopt AI for their essays at higher rates than wealthier peers. Multilingual applicants disproportionately come from those same households. The default-monolingual behavior of AI ghostwriting hits hardest where the multilingual texture would have been most present.
This isn't an argument for never using AI. It's an argument for being deliberate about which parts of the drafting process you hand over.
How this is different from AI detection bias against ESL writers
Two AI-and-language problems are getting confused in the public conversation. They are not the same problem.
The other one — covered at AI Detection Tools Are Biased Against International Students — is the input side: a 2023 Stanford study found 61% of TOEFL essays written by non-native English speakers were misclassified as AI-generated by popular detectors, and Vanderbilt disabled Turnitin's AI detector entirely in part because of that bias. ESL applicants write authentic prose; detectors flag it as machine-generated because the statistical features of careful second-language English overlap with the features of generic AI text.
This post is about the output side: multilingual applicants who let ChatGPT draft for them lose the multilingual texture of their own voice, because the model's default English doesn't reach for code-switching even when one in five real peers in the same pool did.
Both are real. Both compound the cost of multilingualism in admissions. A bilingual applicant can be hit by both in the same cycle: AI-erased on the way in, AI-flagged on the way back out.
Can you instruct ChatGPT to include Spanish?
Yes. The Cornell paper is explicit: the finding is not that LLMs can't output Spanish, only that they don't by default. Prompt GPT-4 with "include the Spanish phrase my grandmother used to say to me, mija, in this paragraph," and it will. Two qualifiers matter.
First, most users don't think to do that. The whole reason Cornell tested "out-of-the-box" prompts is that out-of-the-box is how most people use these tools. Students typing prompts into ChatGPT for the first time are not loading them up with linguistic style instructions. The default-behavior gap is the gap that matters in the population of actual users.
Second, even when explicitly prompted, the model deploys home-language vocabulary in stylized ways. This connects to the finding we cover in why "I'm first-gen Latina" doesn't make ChatGPT sound like you. When the same Cornell team identity-prompted 8 frontier LLMs with race, gender, and first-generation status, the resulting essays did not move meaningfully closer to how real applicants from those identities write — and for the Black applicant subgroup, identity-prompted essays were further from real Black applicants' writing than the default output (t = 2.327, p = 0.020). Prompted Spanish vocabulary tends to read the same way: as performance rather than texture.
You can write in Spanish and English in the same paragraph. The question is whether AI ghostwriting can do it for you authentically. The empirical answer, in 2024 and 2026, is: not without you, and not very well even with you.
Caveats
Every caveat on the table.
- The UC data is from 2016–2017 — pre-ChatGPT. The human baseline reflects pre-AI applicant behavior; the "0% of AI essays" number is from out-of-the-box generations on the same prompts. Both halves are clean, but neither half is "what 2026 applicants do today."
- Out-of-the-box, no demographic priming. Frontier 2026 models, if explicitly prompted to reflect a bilingual voice, may behave differently. The finding is about default behavior, not capability.
- UC sample is Latinx-only, in-state. Roughly 36,000 applicants from one state, one cycle, one system. The pattern likely generalizes to Asian-American students who code-switch in Mandarin or Tagalog or Korean and to other multilingual applicants, but the paper does not test that directly.
- 2016–2017 essay prompts may differ slightly from today's UC prompts. Genre conventions are similar; specific prompts have evolved.
- LIWC + descriptive comparison, not a causal experiment. The paper does not run an A/B test of "what if we instructed the LLM to include Spanish?"
- Same Cornell team as three of our four papers. Alvero, Lee, Kizilcec, and Joachims appear on Papers 1, 2, and 4. Stanford's Antonio joins this one. This is one lab's program of work, not independent corroboration. The field needs replication from other groups.
- Only GPT-3.5 and GPT-4 were tested. Claude, Gemini, Mistral, and Llama variants were tested in the Jan 2026 paper but not on the Spanish-language question specifically. The "0%" number is specific to GPT-3.5 and GPT-4.
None of these unmake the finding. They constrain how far it should be generalized.
What we'll update next
The literature on multilingual default behavior in LLM-generated essays is thin. We are tracking a few specific questions:
- Does the finding hold for 2026 frontier models? GPT-4o, Claude Sonnet 4.5, and Gemini 2.5 may default differently — or may not. The relevant test is unchanged: out-of-the-box UC prompts, count the Spanish.
- Does explicit multilingual prompting close the gap, or produce stylized, decorative Spanish? The Cornell identity-prompting result suggests the latter, but a direct test on Spanish specifically would be useful.
- Do other home languages show the same pattern? Mandarin, Tagalog, Vietnamese, Hindi, Yoruba, Haitian Creole, Arabic. The pattern likely generalizes; the data does not exist yet. AAVE, Spanglish, Chicano English and other US English varieties belong in the same family of questions.
- Does iterative AI-assisted drafting (paste, edit, regenerate) preserve more multilingual texture than one-shot generation? Real students don't one-shot. They iterate.
If you are a researcher working on any of these, we would love to read your draft. Email is in the footer.
The bottom line
One in five real Latinx UC applicants in 2016–2017 chose to put some Spanish in their college essays. Asked to write the same essays on the same prompts, GPT-3.5 and GPT-4 chose Spanish zero times out of roughly 26,000 generations. The model was reaching for the register it knows — and that register, in the modern US college essay, is monolingual standard English.
If you are bilingual and code-switching is part of how you actually write about your own life, AI ghostwriting will quietly remove that. For the broader pattern this finding sits inside — AI essays tracking male, continuing-generation, and high-economic-connectedness applicants — see the broader-style companion piece and our cross-study research synthesis.
Write the way you remember things. Including the words your grandmother used.
See also: AI in College Admissions Research: 4 Major Studies · AI college essays sound male and privileged · Why "I'm first-gen Latina" doesn't make ChatGPT sound like you · The words that fingerprint AI essays · AI detection tools are biased against international students · ChatGPT vs. real college essays