Requisite Variety Index

A benchmark measuring the diversity and variance of AI-generated text. Human writers are erratic. AI is smooth. The variance is the signal.

An anti-RLHF metric for prose authenticity

The Problem with AI Prose

RLHF-trained language models produce text that is competent, coherent, and suspiciously uniform. Human writers vary wildly: Melville runs to 60-word sentences, while Carroll favors 10-word ones. AI output clusters in a narrow band. The Requisite Variety Index quantifies this difference.

Burstiness

The coefficient of variation in sentence length. Human prose is "bursty": short, punchy sentences followed by sprawling, complex ones. AI maintains a steady rhythm.

burstiness = σ(lengths) / μ(lengths)
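A minimal Python sketch of the burstiness formula above. The sentence splitter (a naive split on `.`, `!` and `?`) and the choice of population rather than sample standard deviation are assumptions made for illustration; the page does not say which the benchmark uses.

```python
import re
import statistics


def sentence_lengths(text: str) -> list[int]:
    """Word count per sentence, using a naive split on ., ! and ?."""
    sentences = re.split(r"[.!?]+", text)
    return [len(s.split()) for s in sentences if s.strip()]


def burstiness(text: str) -> float:
    """Coefficient of variation of sentence length: sigma / mu."""
    lengths = sentence_lengths(text)
    if not lengths:
        return 0.0
    mean = statistics.fmean(lengths)
    # Assumption: population standard deviation (pstdev), not sample (stdev).
    return statistics.pstdev(lengths) / mean
```

A text whose sentences all have the same length scores 0; mixing one-word sentences with long ones pushes the value up.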

Vocabulary Richness

Type-Token Ratio (TTR) measures unique words per total words. Higher = richer vocabulary. The hapax ratio tracks words used only once, a hallmark of natural writing.

TTR = unique_words / total_words
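Both vocabulary measures can be sketched in a few lines of Python. The tokenizer (lowercased runs of letters and apostrophes) is an assumption, and so is the hapax denominator: the page does not say whether single-use words are divided by total words or by unique words, so this sketch picks total words.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercased word tokens (illustrative tokenizer, not the benchmark's)."""
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(tokens: list[str]) -> float:
    """TTR = unique_words / total_words."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def hapax_ratio(tokens: list[str]) -> float:
    """Share of tokens that are words occurring exactly once.

    Assumption: the denominator is total tokens; the source does not specify.
    """
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    hapaxes = sum(1 for c in counts.values() if c == 1)
    return hapaxes / len(tokens)
```

TTR falls as a text gets longer, so values are only comparable between texts of similar length.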

Bigram Entropy

How unpredictable are word-to-word transitions? Human prose surprises; AI follows well-worn paths. Higher entropy = less predictable.

H = -Σ p(bigram) × log₂(p(bigram))
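The entropy formula above can be computed directly from bigram counts. This is a sketch under stated assumptions: the tokenizer is illustrative, and p(bigram) is taken to be the relative frequency of each adjacent word pair within the text being scored.

```python
import math
import re
from collections import Counter


def bigram_entropy(text: str) -> float:
    """Shannon entropy (in bits) of the adjacent-word-pair distribution."""
    tokens = re.findall(r"[a-z']+", text.lower())
    bigrams = list(zip(tokens, tokens[1:]))
    if not bigrams:
        return 0.0
    counts = Counter(bigrams)
    total = len(bigrams)
    # H = -sum(p * log2(p)) over the observed bigrams
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

A text that alternates between two words has only two distinct bigrams and scores 1 bit; a text in which every bigram is distinct reaches the maximum for its length, log₂ of the number of bigrams.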

RVI Score

A composite index combining burstiness, vocabulary richness, and entropy into a single 0–100 score. Higher = more human-like variance.

RVI = weighted_composite(metrics)
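The page does not publish the weights or the scaling behind `weighted_composite`. The Python sketch below shows only the general shape such a composite could take: rescale each metric to 0–1, take a weighted average, and multiply by 100. Every number in `WEIGHTS` and `RANGES` is a hypothetical placeholder, not the benchmark's actual configuration.

```python
# Hypothetical placeholders: the real RVI weights and ranges are not published here.
WEIGHTS = {"burstiness": 0.4, "ttr": 0.3, "bigram_entropy": 0.3}
RANGES = {
    "burstiness": (0.0, 1.2),
    "ttr": (0.3, 0.8),
    "bigram_entropy": (6.0, 9.0),
}


def rescale(value: float, low: float, high: float) -> float:
    """Map value from [low, high] onto [0, 1], clamping anything outside."""
    if high <= low:
        raise ValueError("high must be greater than low")
    return min(1.0, max(0.0, (value - low) / (high - low)))


def composite_score(metrics: dict[str, float]) -> float:
    """Weighted average of rescaled metrics, expressed on a 0-100 scale."""
    total_weight = sum(WEIGHTS.values())
    weighted = sum(
        WEIGHTS[name] * rescale(metrics[name], *RANGES[name]) for name in WEIGHTS
    )
    return 100.0 * weighted / total_weight
```

Clamping keeps the result inside 0–100 even when a metric falls outside its assumed range.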

The Evidence

We analyzed 319 chunks from 25 sources: 14 Project Gutenberg texts, one contemporary encyclical, 8 default AI-generated passages, and 2 experimental AI passages prompted for variance. The separation is stark.

Figure 1: Seismograph-style plot of sentence-by-sentence word counts. Human writing (Melville) shows dramatic variation; AI maintains a consistent rhythm.
Figure 2: Scatter plot of burstiness vs vocabulary richness across 25 sources. Default AI clusters in a tight band; human authors scatter widely. Experimental AI breaks out of the cluster.
Figure 3: RVI score distribution across human, default AI, and experimental AI sources. Default AI clusters tightly at 37–39. Human writing spans a wide range. Experimental AI, prompted for variance, scores in the human range.
Figure 4: RVI by source, ranked. Default AI clusters at 37–39. Adam Smith’s Wealth of Nations scores below the AI cluster at 33: formal economic prose with less variance than RLHF-trained output.

Model Leaderboard

Aggregated metrics by source. Higher RVI indicates more human-like textual variance.

Test Your Text

Paste any text to analyze its variance metrics. Longer texts are chunked into 500-word segments (matching the baseline methodology) and metrics are averaged across chunks. Text is sent to the server for analysis and is not stored.
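The chunk-and-average step described above might look like the following Python sketch. The function names are hypothetical, and how the analyzer treats a final chunk shorter than 500 words is not stated; here it is kept and weighted equally with the others.

```python
from statistics import fmean
from typing import Callable


def chunk_words(text: str, size: int = 500) -> list[str]:
    """Split text into consecutive segments of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def averaged_metric(
    text: str, metric: Callable[[str], float], size: int = 500
) -> float:
    """Mean of `metric` over consecutive chunks of the text.

    Assumption: a short trailing chunk is kept and counts the same as a full one.
    """
    chunks = chunk_words(text, size)
    if not chunks:
        return 0.0
    return fmean(metric(chunk) for chunk in chunks)
```

Any function that maps a string to a number, such as a burstiness or entropy function, can be passed as `metric`.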

* Magnifica Humanitas, Pope Leo XIV's first encyclical (May 2026)

Analysis Results

RVI Score
Burstiness
Type-Token Ratio
Hapax Ratio
Bigram Entropy
Word Distribution Entropy
Mean Sentence Length
Sentence Count

Note: Scores may differ by 2 to 4 points from leaderboard values because the two use different chunk sampling methods. The leaderboard uses evenly spaced samples; this analyzer uses consecutive chunks.
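The two sampling methods in this note can be contrasted with a short Python sketch that returns the word offsets at which chunks start. The function names and the exact spacing rule are assumptions; the page does not describe how the leaderboard places its samples.

```python
def consecutive_starts(n_words: int, size: int, count: int) -> list[int]:
    """Start offsets of up to `count` back-to-back chunks from the beginning."""
    return [i * size for i in range(count) if (i + 1) * size <= n_words]


def evenly_spaced_starts(n_words: int, size: int, count: int) -> list[int]:
    """Start offsets of `count` chunks spread from the first word to the last.

    Assumption: the first chunk starts at 0 and the last ends at the final word.
    Chunks overlap if count * size exceeds n_words.
    """
    last_start = n_words - size
    if last_start < 0 or count <= 0:
        return []
    if count == 1:
        return [0]
    return [round(i * last_start / (count - 1)) for i in range(count)]
```

For a 10,000-word text and five 500-word chunks, consecutive sampling covers only the opening 2,500 words, while evenly spaced sampling reaches the end of the document. That difference in coverage is one plausible source of the score gap.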

Full Document View

Per-chunk metrics across the document. Shows variance within the text, not just the average.

Burstiness
TTR
Bigram Entropy (normalized)

Cherry Pick Detector

Sliding window analysis (~125 words). Shows which sections would flag as AI-like vs human-like.

← AI-like (RVI < 45) | Ambiguous (RVI 45–65) | Human-like (RVI > 65) →
Why this matters: Every sufficiently long human text contains sections that a paragraph-level detector would flag as "AI-generated". This is one way tools like Turnitin can produce false positives: a low-variance section is judged in isolation, and the document's overall diversity is ignored.
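The sliding-window analysis in this section can be sketched in Python as follows. The 125-word window and the 45/65 thresholds come from the text above; the 25-word step and the `scorer` callback are assumptions, since the page specifies neither the step size nor the exact RVI formula.

```python
from typing import Callable


def classify(score: float) -> str:
    """Label a window using the thresholds in the legend above."""
    if score < 45:
        return "AI-like"
    if score > 65:
        return "human-like"
    return "ambiguous"


def sliding_windows(
    text: str,
    scorer: Callable[[str], float],
    window: int = 125,
    step: int = 25,  # assumption: the page does not state the step size
) -> list[tuple[int, str]]:
    """Return (start_word_index, label) for each window across the text."""
    words = text.split()
    if not words:
        return []
    last_start = max(len(words) - window, 0)
    results = []
    for start in range(0, last_start + 1, step):
        segment = " ".join(words[start:start + window])
        results.append((start, classify(scorer(segment))))
    return results
```

Running this over a long human-written text with a real scorer would show how many windows land in the AI-like band even when the document-level score does not.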