The Category Problem: Why Most Benchmarks Are Wrong — Friday Notes Audio
The Benchmark Citation Epidemic
Walk through any five Series A decks and you will find the same six benchmark reports cited in ways their authors never intended. Among the most common: Amplitude’s DAU/MAU study, the Bessemer Cloud Index, Andreessen Horowitz’s consumer app retention curves, and Mary Meeker’s annual Internet Trends report. These are legitimate sources. They are also widely misapplied.
The misapplication is almost never deliberate. It is the result of a founder searching for “SaaS DAU/MAU benchmark”, finding a number that looks reasonable, and citing it without reading the methodology section that specifies exactly which class of product the data covers.
“A benchmark is only as good as the category it describes. Most founders are citing the right number from the wrong study.”
How Category Mismatches Happen
The root cause is aggregation. Research firms publish benchmark reports that cover broad product categories because their clients buy research on broad markets. “Mobile apps” is a reportable market. “B2B utility apps with daily notification prompts targeting compliance teams at companies with 50–500 employees” may describe your product exactly, but it is not a reportable category.
So you look at the “mobile apps” report and find a DAU/MAU ratio. Your product is a mobile app. The ratio looks defensible. The problem is that your product’s engagement pattern is nothing like the average across gaming apps, social apps, and consumer utility apps that make up the benchmark population.
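To make the aggregation effect concrete, here is a minimal Python sketch. The category names and every DAU/MAU value in it are hypothetical, chosen only to show how a pooled “headline” median can land far from the median of the sub-category that actually matches your product.

```python
from statistics import median

# Hypothetical DAU/MAU ratios (as fractions) for products in a broad
# "mobile apps" benchmark population. Illustrative numbers only.
population = {
    "gaming": [0.42, 0.38, 0.51, 0.45],
    "social": [0.55, 0.48, 0.60],
    "consumer_utility": [0.30, 0.27, 0.35],
    "b2b_workflow": [0.18, 0.22, 0.15],
}

def pooled_median(groups: dict[str, list[float]]) -> float:
    """Median across every product in every category: the headline figure."""
    return median(v for values in groups.values() for v in values)

def category_median(groups: dict[str, list[float]], category: str) -> float:
    """Median restricted to the sub-category that matches your product."""
    return median(groups[category])

headline = pooled_median(population)
matched = category_median(population, "b2b_workflow")
print(f"headline median: {headline:.2f}, matched median: {matched:.2f}")
# prints: headline median: 0.38, matched median: 0.18
```

The pooled figure is dominated by whichever categories make up most of the sample, which is exactly why the population definition matters more than the headline number.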
The methodology section of a well-run benchmark report defines exactly which products were included. The founders who cite the report’s headline figures almost never read it. The analysts who check those citations almost always do.
The Five Most Misused Benchmarks in Founder Decks
1. DAU/MAU ratios from broad “mobile app” reports. Gaming, social, and consumer utility apps systematically inflate this metric relative to B2B tools. For a B2B tool, the correct comparator population is B2B workflow tools with mandatory daily usage patterns.
2. Churn rates from “SaaS” benchmarks. SaaS churn varies by 15× between SMB and enterprise segments. A benchmark that does not segment by ACV is not a useful comparator for either segment.
3. CAC from “startup” surveys. CAC is stage-dependent, channel-dependent, and geography-dependent. A pooled startup CAC average from a survey that includes both Stripe and a 3-person B2B vertical SaaS is meaningless for your model.
4. NPS scores from consumer benchmarks. Consumer and B2B NPS use the same -100 to +100 scale, but they are not judged against the same baseline. A 40 NPS is exceptional in B2B and mediocre in consumer. Citing the wrong baseline turns a strength into a liability.
5. TAM figures from analyst reports that aggregate adjacent markets. Market sizing reports frequently combine markets that are not actually accessible to your product. The definition of “Total Addressable Market” in an analyst report is often total global spend across a broad sector, not the serviceable market for your specific product at your price point.
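Criterion-level mistakes often start with the metric definition itself. As one example, here is a minimal Python sketch of the standard Net Promoter Score calculation (percentage of promoters scoring 9–10 minus percentage of detractors scoring 0–6). The formula is identical for B2B and consumer products; what differs is the distribution of scores across products, so the baseline you compare against must come from the matching segment. The sample responses are hypothetical.

```python
def nps(scores: list[int]) -> float:
    """Net Promoter Score: % promoters (9-10) minus % detractors (0-6).

    Returns a value on the -100..+100 scale. Passives (7-8) count toward
    the total number of responses but toward neither group.
    """
    if not scores:
        raise ValueError("need at least one response")
    if any(s < 0 or s > 10 for s in scores):
        raise ValueError("scores must be on the 0-10 scale")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100 * (promoters - detractors) / len(scores)

# Hypothetical survey: 6 promoters, 2 passives, 2 detractors out of 10.
responses = [10, 9, 9, 10, 9, 10, 8, 7, 6, 3]
print(nps(responses))  # 40.0
```

A result of 40 means nothing on its own; it only becomes a strength or a weakness once it is placed against a baseline drawn from the same segment.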
How to Find the Right Benchmark
The correct approach starts with the methodology section, not the headline figure. Before citing any benchmark, answer four questions:
Who was surveyed? What is the sample size, and how were participants recruited? A survey of 40 founders is not a benchmark. A panel study of 3,000 products over 12 months with defined selection criteria is.
What product category is included? Does the report define its product population in a way that includes your exact type of product? If it covers “B2B SaaS” without segmenting by vertical, ACV, or deployment model, it may not be representative of your position.
What year is the data from? The publication date of the report and the data collection date are different. Check when the underlying data was collected, not when the PDF was published.
How wide is the distribution? Headline figures in benchmark reports often obscure wide distributions. A reported “median of 35%” DAU/MAU might have a 10th–90th percentile range of 12%–68%. Knowing where you sit in that distribution matters more than the median.
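When a report publishes only a few percentile points, you can still get a rough sense of your position. The sketch below linearly interpolates between published (percentile, value) pairs, using the illustrative 12% / 35% / 68% figures above. It assumes the metric rises monotonically with percentile and is roughly linear between the reported points; real distributions are rarely that tidy, so treat the output as an approximation, not a measurement.

```python
from bisect import bisect_right

def estimate_percentile(value: float, points: list[tuple[float, float]]) -> float:
    """Rough percentile estimate by linear interpolation between published
    (percentile, metric value) points.

    Values outside the reported range return the outermost reported
    percentile; the true percentile is only known to lie beyond it.
    """
    pts = sorted(points, key=lambda p: p[1])
    values = [v for _, v in pts]
    if value <= values[0]:
        return pts[0][0]
    if value >= values[-1]:
        return pts[-1][0]
    i = bisect_right(values, value)  # values[i - 1] <= value < values[i]
    (p_lo, v_lo), (p_hi, v_hi) = pts[i - 1], pts[i]
    return p_lo + (p_hi - p_lo) * (value - v_lo) / (v_hi - v_lo)

# Reported points from the example: 10th = 12%, median = 35%, 90th = 68%.
reported = [(10, 0.12), (50, 0.35), (90, 0.68)]
print(round(estimate_percentile(0.20, reported), 1))  # 23.9
```

A product at 20% DAU/MAU sits well below the “median of 35%”, yet comfortably above the bottom decile, which is a more honest statement than either number alone.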
Recency: The Other Problem Nobody Talks About
Category mismatch gets most of the attention. Recency misalignment is just as dangerous and less discussed.
Markets move. A SaaS churn benchmark from 2022 was published into a market with different competitive dynamics, different interest rate expectations, and different buyer behaviour than the market in 2026. In AI infrastructure, the entire landscape changed between Q1 2023 and Q3 2024. Any benchmark whose data was collected before that shift is measuring a different market.
Eighteen months is the maximum defensible recency window for fast-moving markets. For AI, developer tooling, and fintech compliance, the window may be shorter. If a benchmark’s data is older than this, either replace it with a more recent source or acknowledge the gap explicitly and explain why the older data is still directionally relevant.
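The recency check is simple arithmetic, but it is easy to run against the wrong date. Here is a minimal Python sketch that measures from the end of data collection (not the publication date) to your raise date, with the 18-month window as a default you can shorten for faster markets. The function names and example dates are hypothetical.

```python
from datetime import date

def months_between(start: date, end: date) -> int:
    """Whole calendar months from start to end; a partial month is not counted."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1
    return months

def is_recent_enough(collected_until: date, raise_date: date,
                     window_months: int = 18) -> bool:
    """True if data collection ended within the window before the raise.

    Pass the end of the data collection period, not the report's
    publication date.
    """
    return months_between(collected_until, raise_date) <= window_months

# Hypothetical examples against a March 2026 raise.
print(is_recent_enough(date(2024, 6, 30), date(2026, 3, 1)))  # False (20 months)
print(is_recent_enough(date(2025, 1, 31), date(2026, 3, 1)))  # True (13 months)
```

For AI or fintech compliance, calling `is_recent_enough(..., window_months=12)` is one way to encode the shorter window the text suggests.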
A Framework for Defensible Benchmarks
A defensible benchmark satisfies five criteria: (1) published by a research firm or institutional source with a defined methodology; (2) based on a sample that includes your exact product sub-type; (3) collected within 18 months of your raise date; (4) cited from the primary document, not a secondary summary; and (5) used to represent a metric your product actually measures in the same way the research defines it.
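The five criteria work well as a pre-submission checklist. This minimal Python sketch records each criterion for one citation and returns the ones it fails. The `BenchmarkCitation` record and its field names are hypothetical; recency is stored as a pre-computed month count so the check stays independent of how you measure dates.

```python
from dataclasses import dataclass

@dataclass
class BenchmarkCitation:
    """Hypothetical record describing one benchmark you plan to cite."""
    has_defined_methodology: bool  # (1) research firm or institution with a defined methodology
    sample_includes_subtype: bool  # (2) sample covers your exact product sub-type
    months_since_collection: int   # (3) months from end of data collection to raise date
    cited_from_primary: bool       # (4) cited from the primary document
    same_metric_definition: bool   # (5) you measure the metric the way the research defines it

def failed_criteria(c: BenchmarkCitation, window_months: int = 18) -> list[str]:
    """Return the criteria this citation fails; an empty list means defensible."""
    checks = [
        (c.has_defined_methodology, "no defined methodology"),
        (c.sample_includes_subtype, "sample does not include your product sub-type"),
        (c.months_since_collection <= window_months,
         f"data older than {window_months} months"),
        (c.cited_from_primary, "cited from a secondary summary"),
        (c.same_metric_definition, "metric defined differently from yours"),
    ]
    return [reason for ok, reason in checks if not ok]

# Hypothetical citation: right metric, primary source, wrong population, stale data.
citation = BenchmarkCitation(True, False, 20, True, True)
print(failed_criteria(citation))
```

Any non-empty result is a citation to replace, or one to qualify explicitly in the deck.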
- Always read the methodology section before citing a benchmark’s headline figure
- Confirm the sample population matches your exact product sub-type, not just the parent category
- 18 months is the maximum defensible recency window for fast-moving markets
- Cite primary sources; secondary summaries often misquote or recontextualise the original data
- Know where you sit in the distribution — the median is less useful than your percentile
In 64% of decks reviewed in Q1 2026, the benchmark used for DAU/MAU or retention was sourced from a product category that did not match the founder’s product. In most cases, the mismatched benchmark made the metric look better than the correct comparator would have. This gap is what investors notice when they check.