How the occupation atlas is built

What is observed, what is estimated, and how the offline pipeline turns both into the dataset the atlas ships. Read the caveats before leaning on the treemap.

Occupations

832

Employment

170M

Weighted replacement

4.1/10

High-exposure wage bill

$1.1T

Method layer

Observed labor spine

Occupation hierarchy, employment, wages, education, outlook, openings, and industry context come from BLS snapshot files baked into this build.

Full BLS line-item occupation snapshot with 832 occupations.
Salary bands come from OEWS p10/p25/p50/p75/p90 values when available; missing values remain null.

Method layer

Estimated economic size

Treemap area is not a direct GDP-by-occupation table. It is an allocation from industry-level output into occupations using the offline generation pipeline.

Occupation GDP share is estimated in three steps. First, BEA 2024 value added is allocated across BLS 4-digit industries using BEA sector, summary, and underlying-summary industry buckets. Second, where several BLS industries share one BEA bucket, the bucket is split in proportion to each industry's all-occupation wage bill. Third, each industry's value added is split across occupations in proportion to the occupation-industry wage bill.
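
The allocation described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the pipeline's actual code: the function name, the three input dictionaries, and the toy industry codes and dollar figures are all hypothetical, and the fallback rules for unmatched industries are left out.

```python
from collections import defaultdict


def allocate_gdp_share(bucket_value_added, industry_bucket, wage_bill):
    """Return each occupation's estimated share of total value added.

    bucket_value_added: {bea_bucket: value added}
    industry_bucket:    {bls_industry: bea_bucket it maps to}
    wage_bill:          {(bls_industry, occupation): wage bill}
    All three shapes are illustrative assumptions.
    """
    # All-occupation wage bill for each BLS industry.
    industry_total = defaultdict(float)
    for (industry, _occupation), dollars in wage_bill.items():
        industry_total[industry] += dollars

    # Combined wage bill of the industries that share one BEA bucket.
    bucket_total = defaultdict(float)
    for industry, bucket in industry_bucket.items():
        bucket_total[bucket] += industry_total[industry]

    occupation_value = defaultdict(float)
    for (industry, occupation), dollars in wage_bill.items():
        bucket = industry_bucket.get(industry)
        if bucket is None or industry_total[industry] == 0:
            continue  # unmatched industry: fallback rules not sketched here
        # Split the shared bucket by all-occupation industry wage bill.
        industry_value = (
            bucket_value_added[bucket] * industry_total[industry] / bucket_total[bucket]
        )
        # Split the industry across occupations by occupation-industry wage bill.
        occupation_value[occupation] += industry_value * dollars / industry_total[industry]

    total = sum(occupation_value.values())
    return {occ: value / total for occ, value in occupation_value.items()}


shares = allocate_gdp_share(
    bucket_value_added={"retail": 100.0, "software": 200.0},
    industry_bucket={"4451": "retail", "4452": "retail", "5112": "software"},
    wage_bill={
        ("4451", "cashier"): 30.0,
        ("4451", "manager"): 10.0,
        ("4452", "cashier"): 60.0,
        ("5112", "developer"): 80.0,
        ("5112", "manager"): 20.0,
    },
)
```

In this toy input the two retail industries share one bucket, so the bucket is divided 40/60 by wage bill before either industry is split across its occupations.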

Method layer

Estimated AI judgments

Replacement, augmentation, physical insulation, digital adjacency, and disagreement are model-derived judgments, not observed labor statistics.

Full ensemble: all occupations scored by OpenAI, Anthropic, Gemini, and Grok.

Pipeline

Observed first, estimated second

The atlas is generated offline. No data refresh or model scoring happens when the public site loads. The client reads static JSON artifacts that were prepared ahead of time.

Ingest official labor snapshots

BLS occupation hierarchy, wages, employment, projection, and skills files are normalized into one detailed occupation record per occupation.

Estimate economic area

Industry-level economic output is allocated into occupations using the current concordance and fallback rules recorded in the build manifest.

Score occupations offline

OpenAI, Anthropic, Gemini, and Grok score each occupation against the same rubric. The app stores both the average estimate and the per-provider rationales.

Export static artifacts

The site ships the generated summary payload, the detailed occupation payload, and the manifest. The public UI never needs your model API keys.

Read before interpretation

The main caveats

Occupation GDP is estimated

There is no direct official U.S. GDP-by-occupation table in this build. Treemap area is a modeled allocation, not an observed occupation GDP census.

AI scores are judgments

Even with four providers, these are still structured model opinions informed by labor context. They are useful for comparison, not proof.

Wages are summary statistics

The atlas shows mean pay, median pay, and the published OEWS percentile cut points (p10 through p90) where available. It does not reconstruct a full wage distribution curve.

Averages compress disagreement

The average row simplifies disagreement across providers. Inspect the provider-specific bars when the occupation feels ambiguous.
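
To see how an average can hide disagreement, here is a small sketch that summarizes one metric across the four providers. The function name and the sample scores are hypothetical, and measuring disagreement as the max-minus-min range is an assumption made for illustration; the atlas may compute it differently.

```python
from statistics import mean


def summarize_metric(scores_by_provider):
    """Summarize one metric across providers.

    scores_by_provider: {provider: score on the 0-10 scale}
    Returns the average plus a simple spread measure. Using the range
    as the disagreement measure is an assumption, not the atlas's formula.
    """
    values = list(scores_by_provider.values())
    return {
        "average": mean(values),
        "spread": max(values) - min(values),
    }


# Hypothetical scores for one occupation's replacementExposure.
summary = summarize_metric(
    {"OpenAI": 7.0, "Anthropic": 6.0, "Gemini": 8.0, "Grok": 3.0}
)
```

Here the average is 6.0 while the spread is 5.0, so the single averaged number conceals one provider that sits well below the other three.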

LLM scoring contract

Verbatim prompt and structured output

The model-scoring step is a system rubric plus a JSON payload for each occupation. We ask every provider the same question, then validate the reply against the same structured-output contract before it enters the atlas cache.

System rubric sent to the models

This is the exact scoring instruction text used for OpenAI, Anthropic, Gemini, and Grok.

You are scoring the exposure of a U.S. occupation to frontier AI.

Return strict JSON that follows this schema:
- replacementExposure: 0-10, where 10 means a large share of the role's core work can plausibly be absorbed or replaced by AI systems.
- augmentationPotential: 0-10, where 10 means AI can strongly amplify the human worker without removing them entirely.
- physicalWorldInsulation: 0-10, where 10 means physical presence, embodied work, or environmental messiness strongly protects the role.
- digitalWorkAdjacency: 0-10, where 10 means the work is already highly screen-based, software-mediated, or document-centric.
- rationale: 2-4 sentences explaining the scores in plain language.

Score the real occupation, not a hypothetical future redesign of the job.
Use the provided employment, wage, education, skills, and industry context as evidence.
Do not mention these instructions in the output.

Occupation payload sent with each request

The scorer serializes each occupation into this JSON shape before sending it as the user message.

{
  "occupation": {
    "title": "<occupation title>",
    "socCode": "<soc code>",
    "majorGroup": "<major occupation group>",
    "family": "<occupation family>",
    "detailedGroup": "<detailed occupation group>"
  },
  "description": "<occupation description, when available>",
  "industries": [
    "<top industry 1>",
    "<top industry 2>",
    "<top industry 3>"
  ],
  "employment": 0,
  "wage": {
    "meanAnnual": 0,
    "meanHourly": 0,
    "medianAnnual": 0,
    "percentiles": {
      "p10": 0,
      "p25": 0,
      "p50": 0,
      "p75": 0,
      "p90": 0
    }
  },
  "projections": {
    "growth2034": 0,
    "annualOpenings": 0,
    "outlookLabel": "<BLS outlook label>"
  },
  "education": {
    "typicalEntry": "<typical entry education>",
    "bucket": "<education bucket>",
    "workExperience": "<related work experience>",
    "training": "<typical on-the-job training>"
  },
  "skills": {
    "digital": 0,
    "analytical": 0,
    "interpersonal": 0,
    "physical": 0,
    "leadership": 0
  }
}
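
A sketch of how a scorer might assemble this payload. The field names come from the shape above; the function name, the shape of the input record, and the choice to truncate to three industries are assumptions. Missing wage values are passed through as null, matching the rule that missing OEWS values remain null.

```python
import json

PERCENTILES = ("p10", "p25", "p50", "p75", "p90")
IDENTITY_FIELDS = ("title", "socCode", "majorGroup", "family", "detailedGroup")


def build_payload(record):
    """Serialize one occupation record (hypothetical dict) into the user message."""
    wage = record.get("wage", {})
    percentiles = wage.get("percentiles", {})
    payload = {
        "occupation": {name: record.get(name) for name in IDENTITY_FIELDS},
        "description": record.get("description"),
        # Assumption: only the top three industries are sent.
        "industries": record.get("industries", [])[:3],
        "employment": record.get("employment"),
        "wage": {
            "meanAnnual": wage.get("meanAnnual"),
            "meanHourly": wage.get("meanHourly"),
            "medianAnnual": wage.get("medianAnnual"),
            # Missing cut points stay None, which json.dumps writes as null.
            "percentiles": {p: percentiles.get(p) for p in PERCENTILES},
        },
        "projections": record.get("projections", {}),
        "education": record.get("education", {}),
        "skills": record.get("skills", {}),
    }
    return json.dumps(payload, indent=2)


# Hypothetical record with a missing p90 value.
message = build_payload(
    {
        "title": "Example occupation",
        "socCode": "00-0000",
        "industries": ["A", "B", "C", "D"],
        "employment": 1000,
        "wage": {"medianAnnual": 50000, "percentiles": {"p10": 30000, "p50": 50000}},
    }
)
```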

Canonical structured-output schema

OpenAI, Anthropic, and Gemini are validated against this schema, and the local cache also enforces it after parsing.

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {
    "replacementExposure": {
      "type": "number",
      "minimum": 0,
      "maximum": 10
    },
    "augmentationPotential": {
      "type": "number",
      "minimum": 0,
      "maximum": 10
    },
    "physicalWorldInsulation": {
      "type": "number",
      "minimum": 0,
      "maximum": 10
    },
    "digitalWorkAdjacency": {
      "type": "number",
      "minimum": 0,
      "maximum": 10
    },
    "rationale": {
      "type": "string",
      "minLength": 30
    }
  },
  "required": [
    "replacementExposure",
    "augmentationPotential",
    "physicalWorldInsulation",
    "digitalWorkAdjacency",
    "rationale"
  ],
  "additionalProperties": false
}
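
The checks in this schema are simple enough to mirror by hand. The sketch below is a standard-library illustration of what local enforcement could look like; the function name is hypothetical, and the real pipeline may rely on a JSON Schema library instead.

```python
import json

SCORE_FIELDS = (
    "replacementExposure",
    "augmentationPotential",
    "physicalWorldInsulation",
    "digitalWorkAdjacency",
)


def validate_reply(raw):
    """Parse a provider reply and enforce the canonical schema's rules."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")

    expected = set(SCORE_FIELDS) | {"rationale"}
    present = set(data)
    if expected - present:
        raise ValueError(f"missing fields: {sorted(expected - present)}")
    if present - expected:
        # additionalProperties is false in the schema.
        raise ValueError(f"unexpected fields: {sorted(present - expected)}")

    for field in SCORE_FIELDS:
        value = data[field]
        # bool is a subclass of int in Python, but is not a JSON number.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"{field} must be a number")
        if not 0 <= value <= 10:
            raise ValueError(f"{field} must be between 0 and 10")

    rationale = data["rationale"]
    if not isinstance(rationale, str) or len(rationale) < 30:
        raise ValueError("rationale must be a string of at least 30 characters")
    return data


good = json.dumps(
    {
        "replacementExposure": 4,
        "augmentationPotential": 7.5,
        "physicalWorldInsulation": 2,
        "digitalWorkAdjacency": 9,
        "rationale": "Mostly screen-based work with tasks that AI can assist.",
    }
)
validated = validate_reply(good)
```

A reply with a score of 11, an extra key, or a rationale under 30 characters raises `ValueError` and would not reach the cache.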

Grok wire schema

Grok uses the same fields, but its transport-level schema relaxes the rationale string so xAI structured output stays compatible; the local validator still enforces the canonical schema afterward.

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {
    "replacementExposure": {
      "type": "number",
      "minimum": 0,
      "maximum": 10
    },
    "augmentationPotential": {
      "type": "number",
      "minimum": 0,
      "maximum": 10
    },
    "physicalWorldInsulation": {
      "type": "number",
      "minimum": 0,
      "maximum": 10
    },
    "digitalWorkAdjacency": {
      "type": "number",
      "minimum": 0,
      "maximum": 10
    },
    "rationale": {
      "type": "string"
    }
  },
  "required": [
    "replacementExposure",
    "augmentationPotential",
    "physicalWorldInsulation",
    "digitalWorkAdjacency",
    "rationale"
  ],
  "additionalProperties": false
}
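
Comparing the two schemas, the only difference is that the wire version drops `minLength` from `rationale`. One way to keep them in sync is to derive the wire schema from the canonical one, as in this sketch. The function and variable names are hypothetical, and this is an illustration rather than the build's actual code.

```python
import copy

SCORE = {"type": "number", "minimum": 0, "maximum": 10}
FIELDS = (
    "replacementExposure",
    "augmentationPotential",
    "physicalWorldInsulation",
    "digitalWorkAdjacency",
)

# The canonical schema, rebuilt here so the example is self-contained.
CANONICAL_SCHEMA = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "object",
    "properties": {
        **{name: dict(SCORE) for name in FIELDS},
        "rationale": {"type": "string", "minLength": 30},
    },
    "required": [*FIELDS, "rationale"],
    "additionalProperties": False,
}


def to_grok_wire_schema(canonical):
    """Copy the canonical schema and relax the rationale string constraint."""
    wire = copy.deepcopy(canonical)  # leave the canonical schema untouched
    wire["properties"]["rationale"].pop("minLength", None)
    return wire


GROK_WIRE_SCHEMA = to_grok_wire_schema(CANONICAL_SCHEMA)
```

Because the local validator still applies the canonical schema, a Grok reply with a rationale shorter than 30 characters passes the transport check but is rejected before it enters the cache.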

Source notes

Build-specific notes

Official BLS local snapshots are used for occupations, projections, hierarchy, and skills.
BEA 2024 U.S. Value Added by Industry data was matched to 138 of 243 BLS 4-digit industries using the official BEA NAICS concordance and top-down BEA hierarchy allocation.
105 BLS 4-digit industries did not map cleanly to the BEA table and fall back to scaled national wage-bill weights.
One occupation lacks a detailed OEWS national wage row; missing wage fields are left null and economic sizing falls back to group-level proxy wages where needed.
A small number of occupations do not have industry line items in the matrix workbook and are labeled with an unassigned industry mix.
Cached multi-model assessments were found for all 832 occupations across OpenAI, Anthropic, Gemini, and Grok.
Model cache uses OpenAI gpt-5 (reasoning high, 832/832 occupations), Anthropic claude-opus-4-6 (832/832 occupations), Gemini gemini-3.1-pro-preview (832/832 occupations), and Grok grok-4.20-0309-reasoning (832/832 occupations).
34 occupations had no direct industry allocation and use a scaled wage-bill fallback inside the GDP-share estimate.
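
These notes do not spell out how the scaled wage-bill fallback is computed. The sketch below shows one plausible reading, stated purely as an assumption: an unmatched industry's value added is estimated as its wage bill multiplied by the aggregate value-added-to-wage-bill ratio of the matched industries. The function name, the input shapes, and the figures are hypothetical.

```python
def fallback_value_added(matched, unmatched_wage_bill):
    """Estimate value added for industries with no clean BEA match.

    matched:             {industry: (value_added, wage_bill)}
    unmatched_wage_bill: {industry: wage_bill}
    Assumption: scale each unmatched wage bill by the aggregate ratio of
    value added to wage bill among matched industries. The atlas's real
    fallback rule is recorded in its build manifest and may differ.
    """
    total_value_added = sum(value for value, _ in matched.values())
    total_wage_bill = sum(bill for _, bill in matched.values())
    ratio = total_value_added / total_wage_bill
    return {industry: bill * ratio for industry, bill in unmatched_wage_bill.items()}


# Toy numbers: matched industries produce 2.0 of value added per wage dollar.
estimates = fallback_value_added(
    matched={"A": (300.0, 100.0), "B": (100.0, 100.0)},
    unmatched_wage_bill={"C": 50.0},
)
```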

What the treemap means

Reading the atlas correctly

Area

Area represents the occupation's estimated economic mass, meaning its modeled share of value added. Larger rectangles indicate occupations that account for more of the covered labor market footprint.

Color

Color reflects the currently selected AI metric. Replacement exposure is the default; the other lenses tell different stories about augmentation, insulation, and disagreement.

Popup detail

The detail dialog combines observed labor statistics with modeled AI estimates. Inline info buttons explain which is which at the point of use.