How the atlas was grounded in recent AI labor research

The atlas is model-generated, but it is not meant to stand alone. This page explains the external research layer behind it: which public papers were reviewed, what kind of evidence each one contributes, which cross-paper claims recur, and where those claims can be attached back to occupations without pretending they are the same thing as the atlas score.

Papers reviewed

5

Labs represented

2

Recurring themes

4

Occupation seeds

5

Assembly

How this research layer was assembled

This is not a paper dump. The goal was to build a small, legible evidence layer for the atlas: enough external grounding to interpret the map better, without blurring together evidence that means very different things.

Step 1

Start with a narrow public source set

The research layer uses a deliberately small set of recent OpenAI and Anthropic papers that say something concrete about workplace use, task capability, or labor-market effects.

Step 2

Separate evidence by what it actually measures

Some papers show observed usage, some benchmark frontier capability, and some frame workforce or policy implications. Keeping those buckets separate avoids false certainty.

Step 3

Pull out claims that can travel to occupations

For each paper, the useful outputs are recurring claims, caveats, and any occupations or occupation families the paper names directly enough to support a specific note.

Step 4

Use papers as interpretation, not hidden scoring

The papers help explain, challenge, or sharpen atlas estimates. They do not quietly overwrite the atlas score unless the scoring method itself changes.

Reading guide

How to read the papers against the atlas

The papers and the atlas answer related but different questions. The useful move is to let each source do the job it is actually good at.

What the papers contribute

They contribute observed product use, benchmark tasks built from expert work, and policy context. That is stronger evidence for what frontier models are doing now, or are plausibly close to doing soon.

What the atlas contributes

The atlas covers the full BLS occupation taxonomy in one place, keeps labor-market context like pay and projected growth attached to each occupation, and lets readers compare several AI lenses across the whole U.S. job structure.

What to keep separate

Observed usage, benchmark capability, and workforce-policy proposals answer different questions. This page keeps them separate instead of pretending they collapse into one universal exposure score.

Cross-paper synthesis

Four claims that recur across the source set

Observed use is still concentrated in digital, language-heavy work

Across Anthropic's task-level usage papers and OpenAI's narrative blueprint, the strongest early signal is still software, writing, analysis, and other screen-based knowledge work rather than the whole labor market at once.

Sources: Anthropic, Anthropic, OpenAI

Augmentation currently edges pure automation in product usage

The observed usage studies consistently show people using AI to draft, iterate, explain, and review more often than to fully hand over end-to-end work, even if API usage skews more automated.

Sources: Anthropic, Anthropic, OpenAI

Capability is ahead of adoption

The Anthropic labor note and OpenAI's GDPval benchmark point the same way: current models can do more than current usage implies, but that gap is mediated by tools, workflow, regulation, verification, and organizational change.

Sources: Anthropic, OpenAI, OpenAI

Exposure is uneven across occupations and worker groups

Physical-world jobs remain relatively insulated in current usage studies, while more educated and higher-paid white-collar roles show more observed exposure and more evidence of workflow change.

Sources: Anthropic, Anthropic, Anthropic

Paper-by-paper review

What each paper contributes

OpenAI · Policy blueprint

AI at Work: OpenAI's Workforce Blueprint

October 2025

What this source measures

Combines product-usage observations, early market interpretation, and workforce-transition proposals.

How it informs the atlas

Useful as a policy and timing lens for interpreting replacement scores cautiously and for explaining why current usage can lag capability.

It is not an occupation-by-occupation empirical exposure table, so it should not be treated as a direct labeling source for the full atlas.

Near-term use looks more collaborative than fully substitutive

OpenAI frames current workplace use as decision support, writing, research, and routine streamlining, and argues the observed pattern is still more enabling than replacing.

Foreword, pp. 2-4.

Workplace adoption often starts bottom-up

The paper says employees often begin using ChatGPT before formal enterprise deployment, with writing, market research, and data analysis showing up early across business functions.

Foreword, pp. 2-3.

Capability is moving faster than labor-market measurement

OpenAI links its GDPval benchmark to a claim that GPT-5-level systems already match or exceed professionals on about half of the benchmarked economically valuable tasks.

Foreword, p. 3.

Anthropic · Empirical labor note

Labor Market Impacts of AI: A New Measure and Early Evidence

March 5, 2026

What this source measures

Introduces an "observed exposure" measure, combining theoretical LLM feasibility with real usage weighted toward work-related and more automated use.

How it informs the atlas

This is the closest external analogue to our replacement metric because it explicitly tries to bridge theoretical exposure and observed use.

It is still built from Claude-centered usage and a custom weighting scheme, so it is informative context rather than a drop-in target variable for our own labels.
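The note describes the measure qualitatively rather than as a formula. As a purely illustrative sketch, in which the function name, the 0.5 weights, and the geometric blend are all assumptions rather than Anthropic's actual method, a feasibility-times-weighted-usage combination might look like:

```python
# Illustrative sketch only: this is NOT Anthropic's published formula.
# The weights and functional form are assumptions chosen for clarity.

def observed_exposure(theoretical_feasibility: float,
                      usage_share: float,
                      work_related_frac: float,
                      automated_frac: float) -> float:
    """Blend theoretical feasibility with real usage, upweighting the
    portion of usage that is work-related and more automation-like."""
    # Upweight work-related and automated usage, per the note's description.
    weighted_usage = usage_share * (0.5 * work_related_frac + 0.5 * automated_frac)
    # Geometric-style blend: exposure is high only when a task is both
    # feasible for current models and actually showing up in weighted usage.
    return (theoretical_feasibility * weighted_usage) ** 0.5
```

Under any such blend, an occupation scores low when either ingredient is missing, which is the property that separates observed exposure from capability-only exposure indices.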

Current deployment still trails theoretical capability

Anthropic argues actual task coverage remains only a fraction of what current models could theoretically do, which is a strong warning against equating capability with labor displacement.

Key findings, p. 2.

Higher observed exposure lines up with weaker projected growth

The note reports that occupations with higher observed exposure are projected by BLS to grow less through 2034.

Key findings, p. 2.

Early labor effects are subtle rather than dramatic

Anthropic reports no broad unemployment spike for exposed workers since late 2022, but it does see suggestive evidence that hiring of younger workers has slowed in exposed occupations.

Key findings, p. 2.

OpenAI · Capability benchmark

GDPval: Evaluating AI Model Performance on Real-World Economically Valuable Tasks

October 2025

What this source measures

Benchmarks frontier models on expert-authored, economically valuable tasks from predominantly digital occupations.

How it informs the atlas

Strong input for our digital-adjacency and augmentation reasoning because it measures capability on serious real-world deliverables.

Because GDPval covers only 44 occupations and is intentionally digital, it cannot stand in for exposure across manual, care, or field-heavy occupations.

The benchmark targets high-value digital work, not the full labor market

GDPval covers 44 occupations across the top 9 GDP sectors and deliberately focuses on predominantly digital roles.

Abstract and Section 2.1, pp. 1-3.

The task set is grounded in real expert work product

Tasks are built from work contributed by experienced practitioners and are evaluated with human expert pairwise comparisons rather than only automatic grading.

Abstract and Sections 2.2-2.5, pp. 1-4.

Frontier models are approaching expert-quality output on this narrow slice

OpenAI reports that the strongest frontier systems are approaching industry experts on the GDPval gold subset, which raises the upper bound on near-term exposure for digital occupations.

Section 3.1, pp. 4-5.

Anthropic · Usage analysis

Which Economic Tasks are Performed with AI? Evidence from Millions of Claude Conversations

February 2025

What this source measures

Maps millions of Claude conversations onto O*NET tasks to show where AI is already being used in the economy.

How it informs the atlas

One of the best external anchors for our augmentation and physical-world insulation metrics.

It is platform-specific usage evidence, so it can understate occupations where capability exists but product adoption, regulation, or workflow integration lag.

Observed use is concentrated in software and writing

Anthropic finds software development and writing tasks together account for nearly half of total observed Claude usage.

Abstract, pp. 1-2.

Adoption is broad but shallow across many occupations

The paper reports that about 36% of occupations show AI use in at least a quarter of their tasks, but only a small share show deep task penetration.

Abstract and contributions, pp. 1-3.

Augmentation edges automation in observed product use

Anthropic estimates 57% of usage is augmentative and 43% is more automation-like, while occupations involving physical manipulation show minimal current use.

Abstract and Section 1 contributions, pp. 1-3.

AnthropicUsage and success-rate report

The Anthropic Economic Index Report: Economic Primitives

January 15, 2026

What this source measures

Adds task complexity, autonomy, success rate, and work-versus-coursework distinctions to Claude usage analysis.

How it informs the atlas

Best source in this set for occupation-specific nuance beyond a single headline score, especially around what kind of work remains after AI takes on some tasks.

It still reflects Claude usage and success rather than a cross-provider equilibrium, so it should complement rather than override our ensemble estimates.

Success rates meaningfully change occupational exposure

When Anthropic weights tasks by both importance and Claude success rate, some occupations such as data entry keyers and database architects show large swaths of work within reach.

Introduction, pp. 3-4.
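The report's exact weighting scheme is not reproduced here, but the idea of weighting each task by both its importance and a model success rate can be sketched minimally; the task tuples below are hypothetical, not data from the report:

```python
# Minimal sketch, not Anthropic's actual method: weight each of an
# occupation's tasks by its O*NET-style importance and an observed model
# success rate, then report the share of importance-weighted work in reach.

def weighted_task_coverage(tasks):
    """tasks: list of (importance, success_rate) pairs for one occupation."""
    total = sum(imp for imp, _ in tasks)
    covered = sum(imp * success for imp, success in tasks)
    return covered / total if total else 0.0

# Hypothetical profile for a data-entry-like occupation: two important,
# high-success tasks and one less important, low-success task.
data_entry = [(5.0, 0.9), (4.0, 0.85), (2.0, 0.3)]
print(round(weighted_task_coverage(data_entry), 2))
```

The point of the weighting is that an occupation only looks "within reach" when the tasks the model succeeds at are also the tasks that matter most, not merely numerous.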

Observed use remains mixed between collaboration and delegation

Anthropic reports augmentation again exceeds automation on Claude.ai, even while automated use remains stronger in first-party API traffic.

Chapter 1 overview, pp. 4-5.

Task removal can imply deskilling or upskilling depending on the occupation

The report uses travel agents and property managers to show that removing AI-covered tasks can either hollow out the most complex work or strip away bookkeeping-heavy work and leave more strategic responsibilities.

Introduction, pp. 3-4.

Occupation citation layer

How to attach paper evidence to occupations

A good next layer is a short paper-backed note inside each occupation detail view: evidence that sits beside the atlas estimate to explain it, pressure-test it, or add nuance that a single score cannot carry on its own.

Step 1

Start with explicit mentions, not fuzzy semantic matching

Seed the feature only with occupations or occupation families the papers name directly. That keeps the first pass auditable and avoids inventing authority where the papers were actually more general.

Step 2

Store statement, evidence type, and locator separately

Each note should keep a short paraphrased statement, the paper citation, page locator, and whether it reflects observed usage, benchmark capability, or policy framing.
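One way to keep those fields separate is a small record type; the field names and the sample note below are illustrative, not a fixed atlas schema:

```python
# Illustrative note record: statement, citation, locator, and evidence type
# are stored as separate fields so each can be audited on its own.
from dataclasses import dataclass

EVIDENCE_TYPES = {"observed_usage", "benchmark_capability", "policy_framing"}

@dataclass(frozen=True)
class OccupationNote:
    occupation: str      # e.g. a BLS/SOC occupation title
    statement: str       # short paraphrase, not a verbatim quote
    paper: str           # citation for the source paper
    locator: str         # section and page, so the claim is checkable
    evidence_type: str   # one of EVIDENCE_TYPES

    def __post_init__(self):
        if self.evidence_type not in EVIDENCE_TYPES:
            raise ValueError(f"unknown evidence type: {self.evidence_type}")

note = OccupationNote(
    occupation="Travel Agents",
    statement="AI-covered tasks may remove complex planning work first.",
    paper="The Anthropic Economic Index Report: Economic Primitives",
    locator="Introduction, pp. 3-4",
    evidence_type="observed_usage",
)
```

Keeping the evidence type explicit is what lets the detail view render usage, benchmark, and policy notes differently instead of flattening them into one citation style.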

Step 3

Attach notes as secondary evidence, not as score overrides

A citation should sit beside the atlas estimate to explain or challenge it. It should not silently rewrite a score unless the scoring method itself has changed to incorporate that evidence.

Occupation hooks

Occupation families with the clearest paper trail

These are strong first candidates because the papers either name the occupations directly or describe a narrow enough occupation family to support an auditable note.

Software Developers · Data Scientists · Technical Writers · Writers and Authors

Observed Claude usage is especially concentrated in software development, writing, and analytical work, so these occupations are good candidates for paper-backed exposure notes.

These are direct occupation families where our high digital-adjacency and replacement scores can be paired with external observed-usage evidence.

Usage · Anthropic · Abstract and contributions, pp. 1-3.

Anesthesiologists · Construction Laborers · Construction Workers

Occupations requiring physical manipulation of the environment show minimal current Claude usage, making them strong examples for the physical-world insulation metric.

This gives the atlas a concrete external citation for why some low-replacement cells stay relatively green even when they are economically large.

Usage · Anthropic · Abstract and contributions, pp. 1-3.

Data Entry Keyers · Database Architects

When Anthropic factors in task success rates, data entry keyers and database architects are examples where Claude appears capable across a large share of the job.

These are unusually clean occupation-level hooks for the modal because the report names them directly and says something more specific than a generic exposure score.

Usage · Anthropic · Introduction, pp. 3-4.

Travel Agents

Anthropic uses travel agents as an example where AI-covered tasks may remove more complex planning work and leave more routine ticketing and payment work behind.

This is exactly the kind of nuance a single replacement score cannot carry on its own.

Usage · Anthropic · Introduction, pp. 3-4.

Property, Real Estate, and Community Association Managers · Property Managers

Anthropic uses property managers as the opposite case, where removing bookkeeping-heavy tasks can leave more negotiation and stakeholder management work behind.

This supports adding citation-backed notes that distinguish augmentation from deskilling even within occupations that look similarly exposed on a single color scale.

Usage · Anthropic · Introduction, pp. 3-4.