# Methodology

> How the Verasight Data Library is built, how featured topics are assembled, and how source reports remain canonical.

The Data Library presents key findings from Verasight survey research. Source reports remain canonical at https://www.verasight.io/reports.

## Library building blocks

- Category: the top-level grouping for public navigation.
- Featured topic: a key-finding page built from multiple supporting questions.
- Indexed question: a canonical question not absorbed into a published featured topic.
- Supporting data: the underlying questions, crosstabs, methodology notes, and report links that justify a featured topic.

## Pipeline stages

- 01 Ingest: raw Verasight files become long-format intermediate tables with metadata sidecars.
- 02 Normalize: null handling, PII scrubbing, demographic normalization, and respondent context propagation.
- 03 Weight: row-level weight validation and reusable weighted summaries.
- 04 Crosstab: banner and extra demographic breakdowns with low-N flags and canonical dimension names.
- 05 Enrich: questions are packaged into a wave-scoped bundle with toplines, crosstabs, methodology, citation, and slug.
- 06 Emit: canonical question JSON, long-format per-question CSV, per-wave summary, and a site-wide index.
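Stages 03 and 04 can be illustrated with a minimal Python sketch. It assumes long-format rows that carry a respondent ID, question ID, response, survey weight, and one demographic group value. The `Row` type, the `weighted_crosstab` function, and the default low-N threshold of 30 are hypothetical illustrations, not the pipeline's actual schema or cutoff.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical cutoff: groups with fewer unweighted respondents are flagged as low-N.
LOW_N_THRESHOLD = 30


@dataclass(frozen=True)
class Row:
    """One long-format record (hypothetical schema): one respondent's answer to one question."""
    respondent_id: str
    question_id: str
    response: str
    weight: float
    group: str  # value of a single demographic dimension, e.g. an age band


def weighted_crosstab(rows, question_id, low_n=LOW_N_THRESHOLD):
    """Return {group: {"n": unweighted count, "low_n": bool, "shares": {response: weighted share}}}."""
    weights = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for row in rows:
        if row.question_id != question_id:
            continue
        # Row-level weight validation: reject non-positive weights outright.
        if row.weight <= 0:
            raise ValueError(f"invalid weight for respondent {row.respondent_id}")
        weights[row.group][row.response] += row.weight
        counts[row.group] += 1

    table = {}
    for group, by_response in weights.items():
        total = sum(by_response.values())
        table[group] = {
            "n": counts[group],
            "low_n": counts[group] < low_n,
            "shares": {resp: w / total for resp, w in by_response.items()},
        }
    return table
```

Shares come from the weights, while the low-N flag uses the unweighted respondent count, since a handful of heavily weighted respondents can still yield an unstable estimate.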

## Publication model

- Canonical layer: one JSON and one CSV per question, plus per-wave and site-wide summaries.
- Curation layer: category mappings, topic proposals, and curated featured topics are committed on top of the canonical layer.
- Publication layer: the site reads committed artifacts. Published featured topics drive home, category, and search surfaces.
- Public URL layer: featured topics and indexed questions publish at category-scoped routes shaped as /[category]/[public-title-slug].
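One way to derive a category-scoped route is sketched below in Python. The `slugify` and `public_route` helpers are hypothetical, and the sketch assumes slugs are lowercase ASCII joined by hyphens, which may differ from the site's actual slug rules.

```python
import re
import unicodedata


def slugify(title: str) -> str:
    """Hypothetical slug rule: strip accents, lowercase, collapse other characters to '-'."""
    ascii_title = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-")
    if not slug:
        raise ValueError(f"title {title!r} produces an empty slug")
    return slug


def public_route(category: str, public_title: str) -> str:
    """Build a route shaped as /[category]/[public-title-slug]."""
    return f"/{slugify(category)}/{slugify(public_title)}"
```

For example, `public_route("Economy", "Most Americans Expect Prices to Rise")` returns `/economy/most-americans-expect-prices-to-rise`. Raising on an empty slug surfaces titles that would otherwise produce a broken route.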

## Output quality gates

- Featured topics must read as clear multi-question key-finding pages, not source-question restatements.
- Indexed questions must use a natural title and one or two prose paragraphs grounded in the question text and measured responses.
- The hard audit blocks generic standalone copy, missing standalone summaries, broken category-scoped routes, and no-ship generated phrasing.
- The review audit adds a per-wave report for suspicious titles, suspicious body copy, duplicate standalone titles, and source-wave coverage.
- Dynamic categories discovered from new data must ship with a category description before the site build can complete.
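A subset of these gates can be expressed as simple checks. The Python sketch below is hypothetical: the `Page` fields, the kind labels, the assumption that `category` holds the route's first path segment, and the placeholder `NO_SHIP_PHRASES` list are illustrations rather than the actual audit rules.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical placeholder list; the real no-ship phrasing rules are not documented here.
NO_SHIP_PHRASES = ("as an ai", "in conclusion")


@dataclass
class Page:
    kind: str      # hypothetical labels: "featured_topic" or "indexed_question"
    category: str  # assumed to equal the first route segment
    title: str
    summary: str
    route: str


def hard_audit(pages, category_descriptions):
    """Return blocking problems; an empty list means the build may proceed."""
    problems = []
    for page in pages:
        text = page.summary.casefold()
        for phrase in NO_SHIP_PHRASES:
            if phrase in text:
                problems.append(f"no-ship phrasing {phrase!r}: {page.route}")
        if page.kind == "indexed_question" and not page.summary.strip():
            problems.append(f"missing standalone summary: {page.route}")
        # A valid route has exactly two segments: /[category]/[public-title-slug].
        parts = page.route.strip("/").split("/")
        if len(parts) != 2 or parts[0] != page.category or not parts[1]:
            problems.append(f"route not category-scoped: {page.route}")
    # Every category in use, including newly discovered ones, needs a description.
    for category in sorted({p.category for p in pages}):
        if not category_descriptions.get(category, "").strip():
            problems.append(f"category without description: {category}")
    return problems


def duplicate_titles(pages):
    """Review-audit helper: case-insensitive titles shared by more than one indexed question."""
    counts = Counter(p.title.casefold() for p in pages if p.kind == "indexed_question")
    return sorted(title for title, n in counts.items() if n > 1)
```

Keeping blocking checks (`hard_audit`) separate from review findings (`duplicate_titles`) mirrors the split between the hard audit and the per-wave review audit described above.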

## Source policy

- Canonical destination: the upstream report at https://www.verasight.io/reports.
- Question-level citation: canonical questions link back to the relevant report anchor when available.
- Raw data on site: V1 does not surface raw downloads on-site, though the pipeline emits per-question CSV artifacts.
- Transparency: methodology is presented in line with AAPOR transparency standards and attributed to Verasight upstream.

## Machine-readable surface

- llms.txt: [https://data.verasight.io/llms.txt](https://data.verasight.io/llms.txt)
- Home markdown mirror: [https://data.verasight.io/index.html.md](https://data.verasight.io/index.html.md)
- Categories overview: [https://data.verasight.io/categories.md](https://data.verasight.io/categories.md)
- Featured topics overview: [https://data.verasight.io/featured-topics.md](https://data.verasight.io/featured-topics.md)
- Indexed questions overview: [https://data.verasight.io/indexed-questions.md](https://data.verasight.io/indexed-questions.md)
- Structured output manifest: [https://data.verasight.io/structured-output.json](https://data.verasight.io/structured-output.json)