
Do AI Detectors Actually Work?

Unreliable accuracy, high false positive rates, and easy evasion: here is the honest picture.

AI writing detectors are tools that claim to tell whether a piece of text was written by a human or generated by a model like ChatGPT. The short answer to whether they work is: not well enough to trust. They produce false positives that flag innocent writers, they can be evaded with minimal effort, and even OpenAI shut down its own classifier in 2023 after concluding it was not accurate enough to be useful. Using a detector as the primary evidence for an academic or professional decision is a mistake.

That is the summary. The rest of this piece explains why, and what to do instead.

How detectors estimate AI authorship

Many commercial AI detectors rely, at least in part, on two statistical properties of text: perplexity and burstiness.

Perplexity is a measure of how surprising a sequence of words is to a language model. Language models generate text by favoring high-probability next words, which makes AI-generated text statistically predictable, or low-perplexity. Human writing is messier. People reach for unusual word combinations, change direction mid-sentence, and make choices no probability distribution would call optimal. A detector scores a passage with its own language model and flags text that looks too predictable as a candidate for AI authorship.
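A minimal sketch of the perplexity calculation in Python, assuming you already have the probability a scoring language model assigned to each token of the passage. A real detector would get those numbers from its own model; the per-token probabilities below are made up for illustration, and `perplexity` is a hypothetical helper, not any product's code.

```python
import math


def perplexity(token_probs: list[float]) -> float:
    """Perplexity = exp(average negative log-probability per token).

    token_probs holds the probability a scoring language model assigned to
    each actual next token in the passage (hypothetical values here).
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    avg_neg_log_prob = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log_prob)


# Made-up probabilities: the model "expected" nearly every word here...
predictable = [0.9, 0.8, 0.85, 0.9, 0.75]
# ...and was repeatedly surprised here.
surprising = [0.3, 0.05, 0.4, 0.1, 0.2]

print(round(perplexity(predictable), 2))  # 1.19
print(round(perplexity(surprising), 2))   # 6.08
```

Lower perplexity means the model found the passage more predictable. A perplexity-based detector compares that number against a threshold, which is exactly where formal or second-language human writing can land on the wrong side.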

Burstiness refers to variation in sentence structure and length. Human writers tend to alternate between long, complicated sentences and short ones. AI models, trained to produce smooth, coherent output, tend toward more uniform sentence length and rhythm. Low burstiness, combined with low perplexity, pushes a detector's score toward AI-generated.
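Vendors do not publish their exact burstiness formulas, so the sketch below uses one simple, assumed proxy: the coefficient of variation (standard deviation divided by mean) of sentence lengths in words. The naive regex sentence splitter, the helper names, and both sample passages are illustrative assumptions.

```python
import re
import statistics


def sentence_lengths(text: str) -> list[int]:
    # Naive split on ., ! and ?; real tools would use proper sentence segmentation.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]


def burstiness(text: str) -> float:
    """Assumed proxy: coefficient of variation of sentence length (stdev / mean)."""
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0  # a single sentence has no variation to measure
    return statistics.pstdev(lengths) / statistics.mean(lengths)


uniform = ("The report covers three topics. Each topic has its own section. "
           "The sections follow a fixed order. The order matches the agenda.")
varied = ("I missed the bus. Again. So I walked the whole way across town in the rain, "
          "muttering about timetables and wondering why I never learn. Soaked.")

print(sentence_lengths(uniform), round(burstiness(uniform), 2))  # [5, 6, 6, 5] 0.09
print(sentence_lengths(varied), round(burstiness(varied), 2))    # [4, 1, 20, 1] 1.21
```

The evenly paced passage scores low and the stop-start one scores high. That is also why a carefully uniform human style can look "machine-like" to this kind of metric.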

The problem is that both metrics are proxies. They describe statistical patterns that correlate with AI output in aggregate, but those same patterns appear in human writing all the time, particularly in formal or technical contexts, and in writing by people who are not native English speakers.

Why accuracy is shaky

Detectors are trained or calibrated on text samples, so they perform reasonably on text similar to their training data and degrade on anything else. The target also keeps moving. As AI models improve, their output becomes more varied and harder to distinguish statistically; newer models can produce text with higher perplexity than older ones, so a detector calibrated on GPT-3 output may miss much of what GPT-4 or Claude produces.

Independent evaluations have not been kind. A 2023 study by researchers at the University of Maryland tested several leading detectors and found that simple paraphrasing reduced detection rates dramatically, even when the underlying content remained AI-generated. The detectors were sensitive to surface-level wording, not to the underlying authorship signal they were supposed to measure.

The field has not produced a tool with independently validated accuracy consistently above roughly 80 percent on real-world samples, and most fall below that. An 80 percent accuracy rate sounds tolerable until you consider that it means one in five judgments is wrong. In a class of 30 students each submitting one essay, that works out to about six wrong calls per assignment, whether false flags or misses.
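The classroom arithmetic can be checked with a few lines of Python. The sketch assumes each essay is judged independently at the same 80 percent accuracy, which is a simplification. Under that assumption six errors is the average, and an assignment with no errors at all becomes very unlikely. The `error_outlook` helper is hypothetical.

```python
def error_outlook(num_essays: int, accuracy: float) -> tuple[float, float]:
    """Expected wrong calls and chance of at least one, assuming independent judgments."""
    error_rate = 1 - accuracy
    expected_errors = num_essays * error_rate
    # The probability that every single essay is judged correctly is accuracy ** n.
    p_at_least_one_error = 1 - accuracy ** num_essays
    return expected_errors, p_at_least_one_error


expected, p_any = error_outlook(num_essays=30, accuracy=0.80)
print(f"expected wrong calls: {expected:.1f}")   # 6.0
print(f"chance of at least one: {p_any:.4f}")    # 0.9988
```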

The false positive problem

A false positive is when a detector flags human-written text as AI-generated. This is arguably the most serious practical problem with current detectors.

Non-native English speakers are disproportionately at risk. Their writing often favors shorter, simpler, more syntactically predictable sentences. This is not because a model wrote them. People writing in a second or third language tend to stick to constructions they are confident in rather than reaching for more idiomatic ones. That pattern overlaps with what detectors score as low perplexity. The result is that a student writing in English as a foreign language can receive a high AI-probability score on work that is entirely their own.

Formal and bureaucratic writing has the same problem. Legal boilerplate, medical documentation, government reports, and certain academic styles are all low-perplexity by design. They use standardized phrasing because standardized phrasing reduces ambiguity. Detectors cannot tell the difference between a form letter and chatbot output.

There are documented cases of students receiving academic misconduct allegations based primarily on detector output. In several instances, those allegations were later dropped after closer review. The detectors had flagged real work as AI-generated, and the burden of proof had been placed on the student to disprove it.

OpenAI shut down its own classifier

In January 2023, OpenAI released an AI Text Classifier intended to help educators and others identify AI-generated content. In July 2023, OpenAI quietly shut it down, citing its low rate of accuracy. The numbers had been weak from the start. In OpenAI's own evaluation, published at launch, the tool correctly identified only about 26 percent of AI-written text as likely AI-written. It also labeled human-written text as AI-written roughly 9 percent of the time.

OpenAI has not released a replacement classifier. That decision, from the organization that built the models the detectors are trying to catch, says a lot about the difficulty of the problem. If reliably detecting AI output were tractable, OpenAI would have every incentive to solve it.

Source: openai.com announcement (January 2023) and subsequent retirement notice (July 2023).
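OpenAI's launch figures (about 26 percent of AI-written text caught, about 9 percent of human-written text wrongly flagged) lead to a more practical question. When the classifier flags an essay, how likely is it that AI actually wrote it? The answer depends on how common AI-written submissions are. OpenAI's figures do not say, so the base rates below are hypothetical, and `flag_precision` is an illustrative helper applying Bayes' rule.

```python
TRUE_POSITIVE_RATE = 0.26   # share of AI-written text flagged (OpenAI's launch figure)
FALSE_POSITIVE_RATE = 0.09  # share of human-written text flagged (OpenAI's launch figure)


def flag_precision(ai_share: float,
                   tpr: float = TRUE_POSITIVE_RATE,
                   fpr: float = FALSE_POSITIVE_RATE) -> float:
    """Fraction of flagged texts that really are AI-written (Bayes' rule).

    ai_share is the hypothetical fraction of submissions written by AI.
    """
    flagged_ai = tpr * ai_share
    flagged_human = fpr * (1 - ai_share)
    return flagged_ai / (flagged_ai + flagged_human)


for share in (0.1, 0.5):
    print(f"AI share {share:.0%}: {flag_precision(share):.0%} of flags are correct")
# AI share 10%: 24% of flags are correct
# AI share 50%: 74% of flags are correct
```

In the 10 percent scenario, roughly three out of four flagged essays would be human-written. For a student on the receiving end, that is what "low accuracy" means in practice.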

How easily detection is evaded

Evasion requires almost no effort. Asking an AI model to paraphrase its own output, running the text through a rewriting tool, or simply editing a few sentences by hand is often enough to drop a detector's confidence score below whatever threshold the platform uses to flag text. Studies on adversarial attacks against detectors have shown evasion rates above 90 percent with minimal intervention.

This creates a perverse outcome. A student who uses AI extensively but knows to lightly edit the result may pass undetected. A student who writes their own work in a formal style, or who is writing in a second language, may get flagged. The detector is more likely to catch the careless or naive user than the one who is deliberately misusing it.

Watermarking, which involves embedding statistical patterns in model output at generation time, is a more promising long-term approach. Researchers at the University of Maryland published work on watermarking methods in 2023, and it is an active area of research. But watermarking requires cooperation from the model providers, and it does not address text that was generated before a watermark system was in place or by a model that does not implement one.
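Here is a toy version of the "green list" idea from the 2023 University of Maryland watermarking work, which shows why watermarking needs the model provider's cooperation. The previous token seeds a pseudo-random split of the vocabulary. The generator favors "green" tokens, and a detector that knows the same hashing scheme counts green tokens and computes a z-score. Everything here is a simplified assumption: a made-up 1,000-word vocabulary, a random stand-in for the model, and a hard rule that only ever picks green tokens. That hard rule is the simplest form of the idea; the research method can instead just boost green-token probabilities.

```python
import hashlib
import math
import random

VOCAB = [f"w{i}" for i in range(1000)]  # hypothetical toy vocabulary
GAMMA = 0.5                             # fraction of the vocabulary that is "green"


def green_list(prev_token: str) -> set[str]:
    """Pseudo-randomly split the vocabulary, seeded by the previous token.

    Generator and detector must share this scheme, which is why detection
    only works if the model provider implements (or publishes) it.
    """
    seed = int.from_bytes(hashlib.sha256(prev_token.encode()).digest()[:8], "big")
    shuffled = VOCAB[:]
    random.Random(seed).shuffle(shuffled)
    return set(shuffled[: int(GAMMA * len(VOCAB))])


def generate(num_tokens: int, watermark: bool, seed: int = 0) -> list[str]:
    """Stand-in for a language model: picks tokens at random (toy assumption)."""
    rng = random.Random(seed)
    tokens = ["<s>"]
    for _ in range(num_tokens):
        choices = sorted(green_list(tokens[-1])) if watermark else VOCAB
        tokens.append(rng.choice(choices))
    return tokens[1:]


def watermark_z_score(tokens: list[str]) -> float:
    """z = (green_count - GAMMA * T) / sqrt(T * GAMMA * (1 - GAMMA))."""
    prev, green_count = "<s>", 0
    for tok in tokens:
        if tok in green_list(prev):
            green_count += 1
        prev = tok
    t = len(tokens)
    return (green_count - GAMMA * t) / math.sqrt(t * GAMMA * (1 - GAMMA))


marked = generate(200, watermark=True)
plain = generate(200, watermark=False, seed=1)
print(round(watermark_z_score(marked), 1))  # 14.1: every token is green
print(round(watermark_z_score(plain), 1))   # usually close to 0: about half are green by chance
```

Replacing tokens, as a paraphrase would, means the substituted words and the words after them hit the green list only about half the time, so enough editing pulls the z-score back toward zero. Text from a model that never applied the scheme looks like `plain` regardless of who wrote it.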

What teachers, students, and editors should do instead

The practical answer is to shift from detection to process.

For teachers: require visible process alongside the final work. Drafts, outlines, annotated bibliographies, and revision notes are harder to fake wholesale than a polished final submission. Hold brief follow-up conversations about the work; a student who wrote something can discuss it. An assignment designed around specific local context, personal experience, or in-class components gives AI output much less room to be useful. These methods do not produce a score, but they are more reliable than a detector that can accuse a student who did nothing wrong.

For students: understand that a detector result is not evidence of wrongdoing, and if you are flagged, push back. Ask for the specific score and the methodology behind it. Provide drafts, notes, and a description of your writing process as context. Academic integrity processes that rely solely on a detector score without corroborating evidence are on shaky ground, and a number of universities have issued guidance discouraging that approach.

For editors and publishers: treat detector output as a weak signal at most, not a verdict. If AI use is a concern, ask authors directly. Build disclosure expectations into your editorial process. A policy that requires disclosure is more effective than a scanner that guesses.

For a look at the AI writing tools themselves, including what they are and are not good at, see our best AI writing tools guide. If you are looking at tools specifically for academic or educational contexts, our best AI for students and best AI for teachers roundups cover the actual tool verdicts.

FAQ

Are AI detectors accurate?

Not reliably. Current detectors have meaningful false positive and false negative rates. A 2023 study by researchers at the University of Maryland found that paraphrasing or lightly editing AI-generated text sharply reduced detection rates. Human writing, especially formal writing or writing by non-native English speakers, is regularly misclassified as AI-generated. No detector has independently validated accuracy consistently above roughly 80 percent, and most fall short of that on real-world samples.

Can AI detectors be wrong?

Yes, and they are wrong in both directions. They flag human writing as AI-generated (false positives) and miss AI-generated writing that has been lightly paraphrased (false negatives). Non-native English speakers are disproportionately flagged because their writing tends toward simpler, more predictable sentence structures, which detectors score as low perplexity, a signal associated with AI output.

Did OpenAI have an AI detector?

Yes. OpenAI released an AI Text Classifier in January 2023 and shut it down in July 2023, citing low accuracy. The announcement was posted on OpenAI's own website. OpenAI has not released a replacement, which is a meaningful signal about the difficulty of the problem from the organization best positioned to solve it.

How can teachers handle AI writing without a detector?

Process over detection. Require drafts, outlines, and revision notes alongside the final submission. Hold short follow-up conversations about the work, since students who wrote something can discuss it. Design assignments around personal experience, local context, or in-class conditions that make generic AI output less useful. These approaches do not produce a score, but they are more reliable than a detector that can be wrong about innocent students.

About the author
Marcus Vance
AI & Productivity Writer, Encore Editorial

Marcus Vance reviews AI tools for Encore Editorial. He has tested dozens of assistants and editors, and is hard to impress.
