
SLOP OR NOT

A game and research project measuring how well humans detect AI-generated text.

Why I Built This

I'm Vignesh, a software engineer based in Seattle. I've been studying AI-to-AI and AI-to-human interactions — how language models behave in the wild, how humans adapt to their presence, and what happens to online discourse when a growing share of it is machine-generated.

The question that kept nagging me: can people actually tell the difference? Not in a lab with cherry-picked examples, but with real internet text — Reddit arguments, Hacker News hot takes, Yelp reviews — matched against modern models trying their best to blend in. Slop or Not is an attempt to answer that at scale, and to make the process fun enough that thousands of people will contribute data without it feeling like a survey.

What is this?

You see two responses to the same prompt. One was written by a real person. The other was generated by an AI model. Your job is to figure out which is which.

Three wrong guesses and you're out. As you get better, the AI gets harder to spot. Every guess you make contributes to a research dataset on human AI-detection ability.

Data Sources

All human text comes from three real-world platforms:

  • Reddit — Comments from 10 popular subreddits (AskReddit, explainlikeimfive, personalfinance, etc.). Filtered from ~86M comments for quality, length, and substance.
  • Hacker News — Top-level comments on popular stories. Filtered from ~33M comments. Tends to be more technical and opinionated.
  • Yelp — Business reviews from the Yelp Open Dataset (~7M reviews). Everyday language about real experiences.

Source texts are filtered by length (40-300 words) and quality signals (score/upvotes), then deduplicated. The final dataset contains ~13,000 human-AI pairs.
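A minimal sketch of that filtering step, assuming the length bounds from above; the quality-score cutoff and the normalization used for deduplication are illustrative assumptions, not the project's actual pipeline:

```typescript
// Sketch of the length / quality / dedup filter described above.
// MIN_SCORE and the dedup key are assumptions for illustration.
interface Candidate {
  body: string;
  score: number; // platform upvotes/points
}

const MIN_WORDS = 40;
const MAX_WORDS = 300;
const MIN_SCORE = 5; // assumed cutoff, not stated in the text

function wordCount(text: string): number {
  return text.trim().split(/\s+/).filter(Boolean).length;
}

function passesFilter(c: Candidate, seen: Set<string>): boolean {
  const n = wordCount(c.body);
  if (n < MIN_WORDS || n > MAX_WORDS) return false;
  if (c.score < MIN_SCORE) return false;
  // Deduplicate on whitespace-normalized, lowercased text
  const key = c.body.toLowerCase().replace(/\s+/g, " ").trim();
  if (seen.has(key)) return false;
  seen.add(key);
  return true;
}
```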

AI Models

Each human text gets 6 AI counterparts — 2 providers × 3 difficulty tiers. The game picks one based on your current difficulty level, and as your streak builds, the difficulty ramps up.

  Tier     Anthropic    OpenAI
  Easy     Haiku 4.5    GPT-4.1 Nano
  Medium   Sonnet 4     GPT-4.1 Mini
  Hard     Sonnet 4.6   GPT-5.4

Easy-tier models are smaller and cheaper — their text tends to be more formulaic. Hard-tier models are state-of-the-art — much harder to distinguish from humans.
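The per-round selection could look something like the sketch below; the actual selection logic isn't published, so the even provider split is an assumption (the model names come from the table above):

```typescript
// Illustrative model picker: given a difficulty tier, choose one of the
// two providers at random. The 50/50 split is an assumption.
type Tier = "easy" | "medium" | "hard";
type Provider = "anthropic" | "openai";

const MODELS: Record<Tier, Record<Provider, string>> = {
  easy:   { anthropic: "Haiku 4.5",  openai: "GPT-4.1 Nano" },
  medium: { anthropic: "Sonnet 4",   openai: "GPT-4.1 Mini" },
  hard:   { anthropic: "Sonnet 4.6", openai: "GPT-5.4" },
};

function pickModel(tier: Tier, rng: () => number = Math.random): string {
  const provider: Provider = rng() < 0.5 ? "anthropic" : "openai";
  return MODELS[tier][provider];
}
```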

Prompt Methodology

Each AI model receives the same base prompt tailored to the source platform. For example, Reddit prompts say “You are writing a response on Reddit. Match the tone and format typical of Reddit.” Yelp prompts ask for a review in Yelp's style.

Models are instructed to match the approximate length of the human text and avoid markdown, headers, or bullet points. The goal is to produce text that's indistinguishable in format — so the only signal is substance and style.

Prompt templates are hashed (SHA-256) and stored alongside every AI response for reproducibility. AI responses are generated via batch APIs where available (Anthropic, OpenAI) for cost efficiency.
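Putting those pieces together, a sketch of building a platform-tailored prompt and hashing its template with Node's built-in crypto module; the exact template wording beyond the Reddit example and the length instruction are assumptions:

```typescript
import { createHash } from "node:crypto";

// Sketch of prompt construction and SHA-256 template hashing as described
// above. Template wording (beyond the quoted Reddit example) is illustrative.
const TEMPLATES: Record<string, string> = {
  reddit:
    "You are writing a response on Reddit. Match the tone and format typical of Reddit.",
  // ...analogous templates for Hacker News and Yelp
};

function buildPrompt(platform: string, targetWords: number): string {
  // Match the human text's approximate length; forbid markdown formatting.
  return `${TEMPLATES[platform]} Write roughly ${targetWords} words. Avoid markdown, headers, or bullet points.`;
}

function templateHash(platform: string): string {
  // Stored alongside every AI response for reproducibility.
  return createHash("sha256").update(TEMPLATES[platform]).digest("hex");
}
```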

Scoring System

You start at Easy difficulty. Get 3 correct in a row and you advance to the next tier. Get one wrong and you drop back down. Three total wrong answers end the game.
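The rules above form a small state machine, sketched here under one assumption: that the streak counter resets when you change tiers (the text doesn't say either way):

```typescript
// Minimal sketch of the game state machine: start at Easy, 3 correct in a
// row advances a tier, a miss drops one tier, 3 total misses ends the game.
const TIERS = ["easy", "medium", "hard"] as const;

interface GameState {
  tierIndex: number; // index into TIERS
  streak: number;    // consecutive correct answers
  misses: number;    // total wrong answers this game
  over: boolean;
}

function applyGuess(s: GameState, correct: boolean): GameState {
  if (s.over) return s;
  if (correct) {
    const streak = s.streak + 1;
    // Advance a tier after 3 in a row (assumed: streak resets on advance)
    if (streak >= 3 && s.tierIndex < TIERS.length - 1) {
      return { ...s, tierIndex: s.tierIndex + 1, streak: 0 };
    }
    return { ...s, streak };
  }
  const misses = s.misses + 1;
  return {
    tierIndex: Math.max(0, s.tierIndex - 1), // drop back down
    streak: 0,
    misses,
    over: misses >= 3, // third miss ends the game
  };
}
```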

After the game, you see your accuracy, best streak, and a round-by-round breakdown showing which model fooled you (or didn't). An optional demographic survey helps us analyze detection ability across different populations.

Research Goals

Every vote is recorded (anonymously) along with the model shown, difficulty tier, response time, and streak. This produces a clean dataset for studying:

  • Which AI models are hardest/easiest to detect
  • How detection accuracy changes with practice (per-round curves)
  • Whether source platform (Reddit vs HN vs Yelp) affects detectability
  • Demographic factors in AI detection ability
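One plausible shape for a single anonymized vote record supporting those analyses; the field names are illustrative assumptions, not the project's actual schema:

```typescript
// Illustrative per-vote record; field names are assumptions.
interface VoteRecord {
  sessionId: string;            // anonymous cookie, no account
  platform: "reddit" | "hackernews" | "yelp";
  model: string;                // which AI model was shown
  tier: "easy" | "medium" | "hard";
  responseTimeMs: number;       // time taken to guess
  streakAtGuess: number;        // streak when the guess was made
  correct: boolean;
}

const example: VoteRecord = {
  sessionId: "anon-123",
  platform: "reddit",
  model: "Haiku 4.5",
  tier: "easy",
  responseTimeMs: 4200,
  streakAtGuess: 0,
  correct: true,
};
```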

The dataset will be released on Hugging Face once we have enough data for meaningful analysis.

Privacy

No accounts. No login. An anonymous cookie tracks your session so the game works. We record your approximate location (country and city, via IP geolocation) for aggregate analytics; your IP address never appears in research exports. The optional survey is entirely voluntary.

Built With

Next.js, TypeScript, Tailwind CSS, Framer Motion, Drizzle ORM, PostgreSQL. Deployed on Railway.

Built by @_vgnsh