
Telling ChatGPT Your Identity Won't Make Essays Sound Like You

Cornell tested 8 LLMs: telling ChatGPT your race or first-gen status doesn't make essays sound like you. For Black applicants, it actively backfired.

Nirmal Thacker, CS, Georgia Tech · Cerebras Systems AI · May 8, 2026 · 12 min read


The most common workaround students share for "making ChatGPT sound like them" is to load up the prompt with identity context: race, gender, first-generation status, hometown, family background. The intuition is reasonable. If the model knows who you are, surely it can write in your voice. A Cornell team tested that intuition directly across 8 large language models and 30,000 real Common App essays — and found that for the Black applicant subgroup, identity-prompted essays were further from how real Black applicants actually write than the default un-prompted output (t = 2.327, p = 0.020) (Cornell, Jan 2026).

The strategy doesn't just fail to deliver. For at least one demographic group, it backfires.

Updated: May 10, 2026. We revise this post as new research emerges.

This post is a research-driven explanation of why identity prompting fails. If you want a side-by-side comparison of ChatGPT output against successful student essays, read ChatGPT vs Real College Essays. For the lexical companion piece on which words betray AI authorship, see the words that fingerprint AI essays. For the full cross-study synthesis on AI in admissions research, see our pillar on the major studies.

What students try (and why it makes intuitive sense)

The pitch you'll see in TikToks and Reddit threads goes something like this:

"Don't just ask ChatGPT to write your Common App essay. Tell it who you are. 'I am a Black, first-generation immigrant student from Houston applying to engineering programs.' Then it'll write like you."

This is a steerability hypothesis — the belief that LLMs can be aimed at a specific writing voice by feeding them demographic labels. It's not crazy. Modern LLMs do respond to system prompts and persona instructions. They will switch registers. They will adopt different vocabularies. If you ask GPT-4o to write "in the style of a 17-year-old who grew up in a Vietnamese household in San Jose," the output will look surface-level different from the default "write a college essay about a challenge you overcame."

So students try it. The output sounds different. They feel like they've personalized the essay.

The Cornell team built the experiment to test exactly this — not whether identity prompting changes the output (it does, a little), but whether the change moves the output toward how real students from those identities actually write.

What Cornell found

The team behind the Jan 2026 arXiv paper paired 30,000 real Common App essays from a single highly selective US engineering school with 87,696 synthetic essays from 8 LLMs (GPT-4o, GPT-4o-mini, Mistral Large, Mistral Nemo, Claude Sonnet, Claude Haiku, Llama 3.1 70B, Llama 3 8B) (Cornell, Jan 2026). For the identity-prompting test, they generated essays under two conditions: a default prompt with no demographic context, and an identity-prompted version that supplied the model with the applicant's self-reported race, gender, and first-generation status.

Two findings, both worth pausing on.

First, the change in output is small. A classifier trying to separate identity-prompted essays from default LLM output performs at F1 = 0.816. For comparison, a classifier separating any LLM essay from a real human essay scores F1 = 0.998 with T5 and 0.999 with TF-IDF (Cornell, Jan 2026). The identity-prompted version is meaningfully different from the default — but the gap between the two ChatGPT modes is much smaller than the gap between either ChatGPT mode and a real student. Identity prompting is a small dial, not a re-write.
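
For a sense of what that separability test involves, here is a minimal sketch in the spirit of the TF-IDF classifier the paper mentions. It is not the Cornell team's pipeline, and `default_essays` / `identity_prompted_essays` are hypothetical lists of essay strings standing in for their data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

# Label default LLM essays 0 and identity-prompted LLM essays 1, then see how
# well a simple bag-of-words classifier can tell them apart on held-out data.
texts = default_essays + identity_prompted_essays
labels = [0] * len(default_essays) + [1] * len(identity_prompted_essays)

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, stratify=labels, random_state=0
)

vectorizer = TfidfVectorizer(max_features=20_000, ngram_range=(1, 2))
clf = LogisticRegression(max_iter=1000)
clf.fit(vectorizer.fit_transform(X_train), y_train)

preds = clf.predict(vectorizer.transform(X_test))
print(f"F1 = {f1_score(y_test, preds):.3f}")
# An F1 around 0.8 means the two prompting modes are only partially separable;
# an F1 near 1.0 (like LLM vs. human) means they are almost trivially separable.
```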

Second, for Black applicants, the small dial pointed the wrong way. When the team measured cosine similarity between identity-prompted LLM essays and real essays from the same demographic group, the Black subgroup showed a statistically significant decrease in alignment compared to default un-prompted output (t = 2.327, p = 0.020). In plain English: telling the model "I am a Black student" produced essays that read less like real Black applicants' essays than the default version did.

This is one statistical test from one school's applicant pool, and we'll come back to that caveat. But the direction is the opposite of what the workaround promises.
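
For readers who want the mechanics behind "alignment," here is a rough sketch of a centroid-similarity comparison like the one described above. It is not the paper's exact method: the embedding model and the three essay lists (`real_group_essays`, `default_llm_essays`, `identity_prompted_essays`) are assumptions for illustration.

```python
import numpy as np
from scipy import stats
from sentence_transformers import SentenceTransformer  # assumed embedding backend

model = SentenceTransformer("all-MiniLM-L6-v2")

def embed_unit(texts):
    """Embed texts and normalize each row to unit length for cosine math."""
    vecs = model.encode(texts)
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

real = embed_unit(real_group_essays)            # real essays from one demographic group
default_llm = embed_unit(default_llm_essays)    # LLM essays, no identity in the prompt
identity_llm = embed_unit(identity_prompted_essays)

# Cosine similarity of each LLM essay to the centroid of the real group's essays.
centroid = real.mean(axis=0)
centroid /= np.linalg.norm(centroid)
sim_default = default_llm @ centroid
sim_identity = identity_llm @ centroid

# If identity prompting helped, sim_identity should come out higher. A significantly
# lower mean would mirror the direction of the reported backfire (t = 2.327, p = 0.020).
t, p = stats.ttest_ind(sim_default, sim_identity)
print(f"default {sim_default.mean():.3f} vs identity {sim_identity.mean():.3f}  (t={t:.3f}, p={p:.3f})")
```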

Why identity prompting fails — three reasons

1. The model uses demographic words as decoration

When you tell an LLM "I am a first-generation immigrant student," it doesn't reach for the texture of that experience. It reaches for the vocabulary of that experience. Identity-prompted essays in the Cornell sample overused demographic terms verbatim: words like "parent," "Asian," "first-generation," and "immigrant" appeared at much higher rates than in real student essays from those groups (Cornell, Jan 2026).

Real applicants from those demographic groups rarely reach for the labels at all. They write about their grandmother's hands kneading dough at 5 a.m., the specific intersection where the family auto-shop was, the SAT prep book they shared with a cousin because the family could only buy one. The label is implicit in the scene. The LLM can't infer the scene from the label, so it just types the label more often.

2. The model writes a stereotype of the identity, not a person from it

LLMs are trained on text about demographic groups, much of which is media coverage, opinion writing, advocacy material, and, increasingly, LLM-written content. When prompted with an identity, the model retrieves the average textual representation of that identity in its training corpus, which is closer to a generic essay-style portrayal than to how an actual 17-year-old in that group writes their Common App.

The result reads like a thoughtful op-ed about a first-gen Latina student. It doesn't read like the student herself.

3. The abstract-prompt-keyword problem doesn't go away

LLM essays disproportionately favor abstract prompt-keywords — "challenge," "growth," "journey," "resilience" — while real students lean on temporal and personal words like "year," "time," "friend," and "would" (Cornell, Jan 2026). Identity prompting doesn't fix this. It just staples demographic adjectives onto the same generic scaffolding: "this challenge as a first-generation student," "my journey as an immigrant," "the resilience my parents modeled."

That's not a personal essay. It's a default LLM essay with extra adjectives. We covered this lexical pattern in detail in the AI essay vocabulary fingerprint piece.
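
You can see this lexical pattern with nothing fancier than a word-rate count. The sketch below is a back-of-the-envelope check, not the paper's analysis; `llm_essays` and `real_essays` are hypothetical lists of essay text, and the word lists are the terms mentioned in this post.

```python
import re

DEMOGRAPHIC_TERMS = {"parent", "asian", "first-generation", "immigrant"}
PROMPT_KEYWORDS = {"challenge", "growth", "journey", "resilience"}
PERSONAL_WORDS = {"year", "time", "friend", "would"}

def rate_per_1000(essays, vocab):
    """Occurrences of any word in `vocab` per 1,000 tokens across the essays."""
    tokens = [t for e in essays for t in re.findall(r"[a-z]+(?:-[a-z]+)*", e.lower())]
    hits = sum(t in vocab for t in tokens)
    return 1000 * hits / max(len(tokens), 1)

for name, vocab in [("demographic terms", DEMOGRAPHIC_TERMS),
                    ("prompt keywords", PROMPT_KEYWORDS),
                    ("personal words", PERSONAL_WORDS)]:
    print(f"{name}: LLM {rate_per_1000(llm_essays, vocab):.1f} "
          f"vs real {rate_per_1000(real_essays, vocab):.1f} per 1,000 words")
```

If the LLM rates for the first two rows run well above the real-essay rates while the personal-word row runs below, you're looking at the same fingerprint the Cornell classifiers pick up.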

Why race and gender prompting can backfire harder than you'd think

Here's where we have to be careful. The Cornell team measured a statistically significant decrease in alignment for the Black subgroup when identity prompting was applied (t = 2.327, p = 0.020). The Black-applicant identity-prompted essays were further from the centroid of real Black-applicant writing than the default LLM output was.

The authors are cautious about how to interpret this, and we should be too. A few important framing points:

  • It's one test on one school's pool. This is the same single highly-selective engineering school used throughout the paper. We don't know if the same direction would hold at a liberal-arts college, an HBCU, or in a graduate-school applicant pool.
  • It's not a finding that "AI is racist." What it suggests is that LLMs hold a particular textual representation of demographic identity — likely shaped by which texts about which groups dominate the training corpus — and that representation is further from how real young people in some groups actually write.
  • The effect is modest. A p-value of 0.020 signals statistical significance, not a large effect, and the Cornell team emphasizes that LLM-to-LLM cosine similarity is 0.952–0.957 vs. human-to-human at 0.882–0.889, meaning all LLM output is far more uniform than human writing regardless of prompting (Cornell, Jan 2026). The identity-prompting backfire is happening inside an already very narrow output distribution (the uniformity comparison is sketched just after this list).
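
The uniformity comparison referenced in that last point is easy to illustrate: mean pairwise cosine similarity within a set of essay embeddings. The sketch below reuses the unit-normalized embedding arrays from the earlier centroid sketch (hypothetical data) and is purely illustrative, not the paper's computation.

```python
import numpy as np

def mean_pairwise_cosine(vectors):
    """Mean cosine similarity over all pairs of unit-normalized rows."""
    sims = vectors @ vectors.T              # all pairwise cosines
    n = len(vectors)
    off_diag = sims.sum() - np.trace(sims)  # drop the self-similarity diagonal
    return off_diag / (n * (n - 1))

# `default_llm` and `real` are the normalized embedding matrices from the earlier
# sketch. Values near the paper's ~0.95 vs. ~0.88 would show the same pattern:
# LLM output clusters far more tightly than human writing does.
print("LLM-to-LLM:     ", round(float(mean_pairwise_cosine(default_llm)), 3))
print("human-to-human: ", round(float(mean_pairwise_cosine(real)), 3))
```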

The honest takeaway: identity prompting is not a reliable way to inject your voice into an LLM essay, and for at least one tested subgroup it can move the output in the wrong direction. The mechanism is most plausibly that LLMs write a stereotype of demographic identity — drawn from their training data's textual portrayal of that group — rather than channeling how individuals in that group actually write about themselves.

What actually does sound like you

If the prompt-with-labels approach doesn't work, what does?

The pattern that survives in real student essays is specificity that an LLM can't fabricate. Concrete details about your particular life that aren't predictable from any demographic profile. Some examples of what to put in your essay (or your prompt, if you're using AI to brainstorm):

  • The exact scene, with sensory detail. Not "I grew up in a working-class household." Instead: the smell of the laundromat where your mom worked second shift, the sound of the dryer cycle that became your homework timer.
  • The names of specific people who shaped you. Mrs. Velazquez, the fifth-grade teacher who let you read ahead. Coach Park, who told you the truth about your shot in a way no one else would. Your cousin Hassan who taught you Python on a hand-me-down ThinkPad.
  • Dialogue you actually remember. The exact phrase your grandfather said when you told him you were applying to engineering school, in the language he said it in.
  • Mistakes, with specifics. Not "I learned from failure." Instead: the specific grant proposal you botched, the exact sentence in your reviewer feedback that gutted you, what you did the next morning.
  • Numbers that aren't round. "I ran the after-school tutoring program for 14 months and we went from 6 students to 31" carries more weight than "for over a year, dramatically increasing enrollment."

Specificity is the one thing AI can't fake. The model has no access to the smell of your kitchen at 6 a.m., the name of your soccer coach, the actual paragraph your AP Lit teacher wrote in the margin. This is also why dumbcrafting your essay to sound less polished is the wrong move — the goal isn't to write worse, it's to write more specifically.

What this means for the "AI as ghostwriter" workflow

Identity prompting is the workaround students were taught. It's the answer to "how do I make ChatGPT sound like me." The Cornell data says the workaround doesn't deliver — and at scale, may make output further from authentic voice for some groups.

The implication is uncomfortable for the AI-as-ghostwriter pitch: the personal essay form requires what AI structurally can't supply. Not because LLMs aren't sophisticated enough yet, but because the form's purpose is to put a specific person on the page, and the model has no access to that person beyond what the prompt provides. Demographic labels don't give it that access. Even iterative back-and-forth gives it only what the student types — which means a student who could type the specifics into the prompt could just write the essay.

This is the same conclusion the Foundry10 experiment reached from a different angle: when readers were told an essay had ChatGPT help (the same paragraph, three different vignettes), authenticity ratings dropped from 3.98 (no help) to 3.09 (ChatGPT) on a 5-point scale, p < 0.001 (Foundry10, July 2024). We dig into the reader-side evidence in how admissions officers evaluate AI-assisted essays.

The two findings reinforce each other from opposite ends of the pipeline. The writing side: AI can't produce the specific texture of a person's life, even when prompted with their identity labels. The reading side: when readers detect AI involvement, the essay loses authenticity even if the content is identical. There's no version of the AI-ghostwriter workflow that survives both pressures.

A few caveats before you generalize

We've been careful to flag these in line, but they're worth pulling together:

  1. Single-school sample. The Cornell paper analyzed essays from one highly selective US engineering school, 2019–2023 cycles. Findings may not generalize to liberal-arts colleges, less selective schools, or graduate programs.
  2. One-shot prompting only. The study tested single-prompt LLM generation, not how students iteratively co-write with AI in real life. Iterative editing might reduce some of the surface tells — though the underlying problem (AI has no access to your specific life) persists.
  3. 8 LLMs as of mid-2024 training cutoffs. Mostly GPT-4o, Claude 3.5 Sonnet, and contemporaries. Frontier model quality has improved since, and longer-context models with the ability to process student writing samples may do better at voice mimicry.
  4. The Black-subgroup finding is one statistical test, p = 0.020. Meaningful, but not a sweeping claim about racial bias in LLMs. The honest reading is "identity prompting writes a stereotype of identity rather than an instance of it, and for at least one tested group, that stereotype is measurably off."

What we'll update next

We're tracking three open questions and will revise this post as the evidence comes in:

  • Does long-context personalization change the result? If a student feeds an LLM 10,000 words of their own prior writing as context (journal entries, school essays, college-app brainstorms), does the model start producing output that aligns with real-applicant centroids? Replicating Cornell's design with that setup is the obvious next experiment.
  • Do frontier models trained after 2025 show the same demographic-stereotype pattern? Newer models with more diverse training data and better RLHF may handle identity prompting differently. We'll re-run the F1 = 0.816 separability test as new model snapshots become available.
  • Does the Black-subgroup directional finding replicate? A single significance test at p = 0.020 needs replication across applicant pools and schools. We're watching for any independent team to attempt the same analysis on a different essay corpus.

The bottom line

Identity prompting is the workaround you were probably taught. It moves the output a little, but in the directions that don't matter — and in at least one case, in the wrong direction entirely. The Cornell data is clear that LLM output is much more uniform than human writing across the board (cosine 0.952–0.957 vs. 0.882–0.889) and that demographic labels in the prompt don't crack that uniformity (Cornell, Jan 2026). They just add demographic vocabulary to the same generic scaffolding.

If you want your essay to sound like you, the path isn't through the prompt. It's through the specifics only you have access to — the names, the scenes, the dialogue, the smell of your kitchen at 6 a.m. That's the part AI can't fake, and it's also the part admissions readers are looking for.


See also: ChatGPT vs Real College Essays · The Words That Make a College Essay Sound AI-Written · The Dumbcrafting Epidemic · What the Research Says About AI in College Admissions
