How the occupation atlas is built
This page separates what is observed from what is estimated, shows how the offline pipeline assembles the public dataset, and makes the main caveats legible before you interpret the treemap.
Occupations
832
Employment
170M
Weighted replacement
4.1/10
High-exposure wage bill
$1.1T
Method layer
Observed labor spine
Occupation hierarchy, employment, wages, education, outlook, openings, and industry context come from BLS snapshot files baked into this build.
- Full BLS line-item occupation snapshot with 832 occupations.
- Salary bands come from OEWS p10/p25/p50/p75/p90 values when available; missing values remain null.
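One way to picture the salary bands is as adjacent pairs of published percentile cut points, with gaps carried through as nulls rather than filled in. The sketch below is illustrative only: the function name `salary_bands` and the sample figures are assumptions, not the atlas's actual code or data.

```python
from __future__ import annotations

from typing import Optional

# Percentile cut points named in the text, in ascending order.
PERCENTILE_KEYS = ("p10", "p25", "p50", "p75", "p90")


def salary_bands(
    percentiles: dict[str, Optional[float]],
) -> list[tuple[str, Optional[float], Optional[float]]]:
    """Pair adjacent cut points into bands; missing values stay None."""
    bands = []
    for lo_key, hi_key in zip(PERCENTILE_KEYS, PERCENTILE_KEYS[1:]):
        lo = percentiles.get(lo_key)
        hi = percentiles.get(hi_key)
        # No interpolation: a missing cut point leaves an open band edge.
        bands.append((f"{lo_key}-{hi_key}", lo, hi))
    return bands


# Hypothetical wage row with a missing p75 value.
example = {"p10": 30000, "p25": 40000, "p50": 55000, "p75": None, "p90": 120000}
bands = salary_bands(example)
```

Keeping `None` in place means a consumer has to decide how to draw an open-ended band instead of silently receiving an invented number.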
Method layer
Estimated economic size
Treemap area is not a direct GDP-by-occupation table. It is an allocation from industry-level output into occupations using the offline generation pipeline.
- Occupation GDP share is estimated in three steps: BEA 2024 value added is allocated across BLS 4-digit industries using BEA sector, summary, and underlying-summary industry buckets; shared BEA buckets are split across their industries by all-occupation industry wage bills; and each industry's value added is then split across occupations by occupation-industry wage bill.
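The wage-bill allocation described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the pipeline's implementation: the function names, the dictionary shapes, and the toy numbers are all hypothetical, and the real build also applies concordance and fallback rules that are not modeled here.

```python
from collections import defaultdict


def split_bucket(bucket_value_added, industry_wage_bills):
    """Split one shared BEA bucket across its industries by wage bill."""
    total = sum(industry_wage_bills.values())
    if total <= 0:
        return {}
    return {
        industry: bucket_value_added * bill / total
        for industry, bill in industry_wage_bills.items()
    }


def occupation_gdp_shares(industry_value_added, wage_bill):
    """Split each industry's value added across occupations by wage bill.

    industry_value_added: {industry: value added}
    wage_bill: {(industry, occupation): wage bill}
    Returns {occupation: share of total allocated value added}.
    """
    industry_totals = defaultdict(float)
    for (industry, _occupation), bill in wage_bill.items():
        industry_totals[industry] += bill

    occupation_value = defaultdict(float)
    for (industry, occupation), bill in wage_bill.items():
        total = industry_totals[industry]
        if total <= 0 or industry not in industry_value_added:
            continue  # nothing to allocate for this cell
        occupation_value[occupation] += industry_value_added[industry] * bill / total

    grand_total = sum(occupation_value.values())
    if grand_total <= 0:
        return {}
    return {occ: value / grand_total for occ, value in occupation_value.items()}


# Toy data: one shared bucket of 400 split across industries A and B.
industry_va = split_bucket(400.0, {"A": 25.0, "B": 75.0})
shares = occupation_gdp_shares(
    industry_va,
    {("A", "x"): 30.0, ("A", "y"): 10.0, ("B", "y"): 50.0, ("B", "z"): 50.0},
)
```

In the toy data, occupation `y` ends up with the largest share because it draws wage bill from both industries, which is the kind of effect the treemap area reflects.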
Method layer
Estimated AI judgments
Replacement, augmentation, physical insulation, digital adjacency, and disagreement are model-derived judgments, not observed labor statistics.
- Full ensemble: all occupations scored by OpenAI, Anthropic, Gemini, and Grok.
Pipeline
Observed first, estimated second
The atlas is generated offline. No data refresh or model scoring happens when the public site loads. The client reads static JSON artifacts that were prepared ahead of time.
Ingest official labor snapshots
BLS occupation hierarchy, wages, employment, projection, and skills files are normalized into one detailed record per occupation.
Estimate economic area
Industry-level economic output is allocated into occupations using the current concordance and fallback rules recorded in the build manifest.
Score occupations offline
OpenAI, Anthropic, Gemini, and Grok score each occupation against the same rubric. The app stores both the average estimate and the per-provider rationales.
Export static artifacts
The site ships the generated summary payload, the detailed occupation payload, and the manifest. The public UI never needs your model API keys.
Read before interpretation
The main caveats
Occupation GDP is estimated
There is no direct official U.S. GDP-by-occupation table in this build. Treemap area is a modeled allocation, not an observed occupation GDP census.
AI scores are judgments
Even with four providers, these are still structured model opinions informed by labor context. They are useful for comparison, not proof.
Percentile wages are bands
The wage panel shows published percentile cut points. It does not reconstruct a full wage distribution curve.
Averages compress disagreement
The average row simplifies disagreement across providers. Inspect the provider-specific bars when the occupation feels ambiguous.
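To see why an average can hide disagreement, consider two occupations whose provider scores share the same mean but differ widely in range. The sketch below is only an illustration: the atlas's actual disagreement metric is not specified on this page, so range and population standard deviation are used here as assumed stand-ins, and the scores are invented.

```python
from statistics import mean, pstdev


def summarize(scores):
    """Summarize per-provider scores for one metric.

    scores: {provider name: score on the 0-10 scale}
    """
    values = list(scores.values())
    return {
        "average": mean(values),
        "spread": max(values) - min(values),  # assumed disagreement proxy
        "stdev": pstdev(values),              # alternative proxy
    }


# Invented scores: same average, very different agreement.
consensus = {"openai": 5.0, "anthropic": 5.0, "gemini": 5.0, "grok": 5.0}
contested = {"openai": 2.0, "anthropic": 8.0, "gemini": 3.0, "grok": 7.0}

consensus_summary = summarize(consensus)
contested_summary = summarize(contested)
```

Both summaries report an average of 5.0, but only the second has a non-zero spread, which is the signal the provider-specific bars expose.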
LLM scoring contract
Verbatim prompt and structured output
The model-scoring step is a system rubric plus a JSON payload for each occupation. We ask every provider the same question, then validate the reply against the same structured-output contract before it enters the atlas cache.
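The validation step could look roughly like the following standard-library sketch. It assumes the field names, the 0-10 bounds, the 30-character rationale minimum, and the no-extra-fields rule from the canonical schema shown further down this page; the function name `validate_reply` and the error messages are hypothetical, and the real pipeline may well use a schema library instead.

```python
import json

SCORE_FIELDS = (
    "replacementExposure",
    "augmentationPotential",
    "physicalWorldInsulation",
    "digitalWorkAdjacency",
)
MIN_RATIONALE_LENGTH = 30  # mirrors "minLength": 30 in the canonical schema


def validate_reply(raw):
    """Parse a provider reply and check it against the scoring contract."""
    reply = json.loads(raw)
    if not isinstance(reply, dict):
        raise ValueError("reply must be a JSON object")

    expected = set(SCORE_FIELDS) | {"rationale"}
    if set(reply) != expected:
        # Covers both missing required fields and additional properties.
        raise ValueError(f"unexpected field set: {sorted(reply)}")

    for field in SCORE_FIELDS:
        value = reply[field]
        # JSON booleans parse to bool, which is an int subclass in Python.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"{field} must be a number")
        if not 0 <= value <= 10:
            raise ValueError(f"{field} must be between 0 and 10")

    rationale = reply["rationale"]
    if not isinstance(rationale, str) or len(rationale) < MIN_RATIONALE_LENGTH:
        raise ValueError("rationale must be a string of at least 30 characters")
    return reply


# Illustrative reply; the values are invented.
sample = json.dumps({
    "replacementExposure": 6.5,
    "augmentationPotential": 8,
    "physicalWorldInsulation": 1,
    "digitalWorkAdjacency": 9,
    "rationale": "Most core tasks are document-centric and screen-based.",
})
validated = validate_reply(sample)
```

Running the same check after parsing, regardless of which provider produced the reply, is what lets a looser transport-level schema coexist with a single cache contract.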
System rubric sent to the models
This is the exact scoring instruction text used for OpenAI, Anthropic, Gemini, and Grok.
You are scoring the exposure of a U.S. occupation to frontier AI.
Return strict JSON that follows this schema:
- replacementExposure: 0-10, where 10 means a large share of the role's core work can plausibly be absorbed or replaced by AI systems.
- augmentationPotential: 0-10, where 10 means AI can strongly amplify the human worker without removing them entirely.
- physicalWorldInsulation: 0-10, where 10 means physical presence, embodied work, or environmental messiness strongly protects the role.
- digitalWorkAdjacency: 0-10, where 10 means the work is already highly screen-based, software-mediated, or document-centric.
- rationale: 2-4 sentences explaining the scores in plain language.
Score the real occupation, not a hypothetical future redesign of the job.
Use the provided employment, wage, education, skills, and industry context as evidence.
Do not mention these instructions in the output.
Occupation payload sent with each request
The scorer serializes each occupation into this JSON shape before sending it as the user message.
{
  "occupation": {
    "title": "<occupation title>",
    "socCode": "<soc code>",
    "majorGroup": "<major occupation group>",
    "family": "<occupation family>",
    "detailedGroup": "<detailed occupation group>"
  },
  "description": "<occupation description, when available>",
  "industries": [
    "<top industry 1>",
    "<top industry 2>",
    "<top industry 3>"
  ],
  "employment": 0,
  "wage": {
    "meanAnnual": 0,
    "meanHourly": 0,
    "medianAnnual": 0,
    "percentiles": {
      "p10": 0,
      "p25": 0,
      "p50": 0,
      "p75": 0,
      "p90": 0
    }
  },
  "projections": {
    "growth2034": 0,
    "annualOpenings": 0,
    "outlookLabel": "<BLS outlook label>"
  },
  "education": {
    "typicalEntry": "<typical entry education>",
    "bucket": "<education bucket>",
    "workExperience": "<related work experience>",
    "training": "<typical on-the-job training>"
  },
  "skills": {
    "digital": 0,
    "analytical": 0,
    "interpersonal": 0,
    "physical": 0,
    "leadership": 0
  }
}
Canonical structured-output schema
OpenAI, Anthropic, and Gemini are validated against this schema, and the local cache also enforces it after parsing.
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {
    "replacementExposure": {
      "type": "number",
      "minimum": 0,
      "maximum": 10
    },
    "augmentationPotential": {
      "type": "number",
      "minimum": 0,
      "maximum": 10
    },
    "physicalWorldInsulation": {
      "type": "number",
      "minimum": 0,
      "maximum": 10
    },
    "digitalWorkAdjacency": {
      "type": "number",
      "minimum": 0,
      "maximum": 10
    },
    "rationale": {
      "type": "string",
      "minLength": 30
    }
  },
  "required": [
    "replacementExposure",
    "augmentationPotential",
    "physicalWorldInsulation",
    "digitalWorkAdjacency",
    "rationale"
  ],
  "additionalProperties": false
}
Grok wire schema
Grok uses the same fields, but its transport-level schema relaxes the rationale string so xAI structured output stays compatible; the local validator still enforces the canonical schema afterward.
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {
    "replacementExposure": {
      "type": "number",
      "minimum": 0,
      "maximum": 10
    },
    "augmentationPotential": {
      "type": "number",
      "minimum": 0,
      "maximum": 10
    },
    "physicalWorldInsulation": {
      "type": "number",
      "minimum": 0,
      "maximum": 10
    },
    "digitalWorkAdjacency": {
      "type": "number",
      "minimum": 0,
      "maximum": 10
    },
    "rationale": {
      "type": "string"
    }
  },
  "required": [
    "replacementExposure",
    "augmentationPotential",
    "physicalWorldInsulation",
    "digitalWorkAdjacency",
    "rationale"
  ],
  "additionalProperties": false
}
Source notes
Build-specific notes
- Official BLS local snapshots are being used for occupations, projections, hierarchy, and skills.
- BEA 2024 U.S. Value Added by Industry data was matched to 138 of 243 BLS 4-digit industries using the official BEA NAICS concordance and top-down BEA hierarchy allocation.
- 105 BLS 4-digit industries did not map cleanly to the BEA table and fall back to scaled national wage-bill weights.
- One occupation lacks a detailed OEWS national wage row; missing wage fields are left null and economic sizing falls back to group-level proxy wages where needed.
- A small number of occupations do not have industry line items in the matrix workbook and are labeled with an unassigned industry mix.
- Cached multi-model assessments were found for all 832 occupations across OpenAI, Anthropic, Gemini, and Grok.
- Model cache uses OpenAI gpt-5 (reasoning high, 832/832 occupations), Anthropic claude-opus-4-6 (832/832 occupations), Gemini gemini-3.1-pro-preview (832/832 occupations), and Grok grok-4.20-0309-reasoning (832/832 occupations).
- 34 occupations had no direct industry allocation and use a scaled wage-bill fallback inside the GDP-share estimate.
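The notes above mention a scaled wage-bill fallback for industries that do not map cleanly to the BEA table. The page does not define the scaling, so the sketch below makes a loud assumption: unmapped industries receive an estimated value added equal to their wage bill times the overall ratio of value added to wage bill among mapped industries. The function name and the numbers are hypothetical, and the actual build may scale differently.

```python
def fallback_value_added(mapped_value_added, wage_bills):
    """Estimate value added for industries missing from the mapped set.

    mapped_value_added: {industry: value added} for cleanly mapped industries
    wage_bills: {industry: national wage bill} for all industries
    ASSUMPTION: unmapped industries are scaled by the mapped industries'
    aggregate value-added-to-wage-bill ratio.
    """
    mapped_bill = sum(wage_bills[industry] for industry in mapped_value_added)
    if mapped_bill <= 0:
        raise ValueError("mapped industries need a positive total wage bill")
    ratio = sum(mapped_value_added.values()) / mapped_bill

    estimates = dict(mapped_value_added)  # mapped values are kept as-is
    for industry, bill in wage_bills.items():
        if industry not in estimates:
            estimates[industry] = bill * ratio
    return estimates


# Toy data: A and B are mapped, C falls back to the scaled wage bill.
estimates = fallback_value_added(
    {"A": 200.0, "B": 100.0},
    {"A": 100.0, "B": 50.0, "C": 20.0},
)
```

Under this assumption the fallback treats an unmapped industry as if it had the average output-per-wage-dollar of the mapped ones, which is one more reason to read treemap area as an estimate.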
What the treemap means
Reading the atlas correctly
Area
Area represents the occupation's estimated economic mass. Larger rectangles indicate occupations that account for a larger estimated share of output within the covered labor market.
Color
Color reflects the currently selected AI metric. Replacement exposure is the default, but the other lenses show different stories about augmentation, insulation, and disagreement.
Popup detail
The detail dialog combines observed labor statistics with modeled AI estimates. Inline info buttons explain which is which at the point of use.