How the atlas was grounded in the full AI labor review
The atlas is model-generated, but it is not meant to stand alone. This page now synthesizes the broader research work captured in `research/notes` and `research/outputs`, not only the original handful of papers that first anchored occupation notes.
The current review consulted 24 source items, accepted 17 into the final synthesis, keeps 5 core source cards on this page, and now uses 11 sources for direct occupation-note mapping.
How this broader review was assembled
This is no longer just a paper shelf for the note layer. The goal is to keep the operational source subset visible while also bringing the stronger academic and institutional evidence into the same reading frame.
Step 1
Start with the full reviewed evidence base
The local review now synthesizes the research memos and final output artifact, not only the original five core papers that first anchored the occupation-note layer.
Step 2
Keep the operational note layer separate from the wider review
The app still foregrounds a core layer of five source cards, but occupation notes can now also draw on selected additional-reading sources when the occupation mapping is specific enough to audit.
Step 3
Separate exposure, adoption, productivity, and employment
Some sources estimate task overlap, some measure firm-level productivity, some track online labor-market substitution, and some offer official cross-country synthesis. The review keeps those objects distinct.
Step 4
Use the literature to sharpen the atlas, not overwrite it
The broader review adds context, caveats, and stronger ranges for interpretation. It still does not silently replace the atlas score unless the scoring method changes.
How to read the full review against the atlas
The useful move is to let each layer do the job it is actually good at: the broader review for ranges and caveats, the selected note sources for occupation-facing context, and the atlas for comparable coverage across the BLS taxonomy.
What the broader review contributes
The full research review brings together exposure studies, field productivity evidence, online labor-market evidence, and institutional synthesis. That wider frame helps interpret the atlas without pretending a single paper settles the question.
What the note layer contributes
The note layer now combines the five core OpenAI and Anthropic sources with selected additional-reading studies where the occupation link is specific enough to audit rather than hand-wave.
What the atlas contributes
The atlas covers the full BLS occupation taxonomy in one place, keeps labor-market context like pay and projected growth attached to each occupation, and lets readers compare several AI lenses across the whole U.S. job structure.
What to keep separate
Observed usage, benchmark capability, task exposure, productivity gains, and employment effects answer different questions. This page keeps them separate instead of collapsing them into one universal exposure score.
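Here is a minimal sketch of what keeping these questions separate can look like in data, assuming each piece of evidence is tagged with the kind of claim it makes. `EvidenceKind`, `EvidenceItem`, and `group_by_kind` are hypothetical names for illustration, not the atlas's actual schema. The three entries paraphrase findings summarized elsewhere on this page.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class EvidenceKind(Enum):
    # Each kind answers a different question, so evidence of different kinds
    # is never added or averaged together.
    OBSERVED_USAGE = "observed usage"
    BENCHMARK_CAPABILITY = "benchmark capability"
    TASK_EXPOSURE = "task exposure"
    PRODUCTIVITY = "productivity gain"
    EMPLOYMENT = "employment effect"


@dataclass(frozen=True)
class EvidenceItem:
    source: str         # short citation label
    kind: EvidenceKind  # which question this evidence answers
    claim: str          # one-line paraphrase of the finding


def group_by_kind(items):
    """Bucket evidence by kind instead of collapsing it into one score."""
    buckets = defaultdict(list)
    for item in items:
        buckets[item.kind].append(item)
    return dict(buckets)


evidence = [
    EvidenceItem("Eloundou et al.", EvidenceKind.TASK_EXPOSURE,
                 "about 80% of U.S. workers have at least 10% of tasks exposed"),
    EvidenceItem("Brynjolfsson, Li, Raymond", EvidenceKind.PRODUCTIVITY,
                 "about 14% average productivity gain in customer support"),
    EvidenceItem("Hui, Reshef, Zhou", EvidenceKind.EMPLOYMENT,
                 "lower employment and earnings in highly affected freelance work"),
]

grouped = group_by_kind(evidence)
for kind, items in grouped.items():
    print(f"{kind.value}: {[item.source for item in items]}")
```

Because nothing in this structure sums across kinds, a reader (or any later scoring step) has to decide explicitly how, or whether, to relate a productivity finding to an exposure ranking.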
Three evidence layers now sit behind the page
11 note sources
Occupation-note layer
The note system still starts from the five core OpenAI and Anthropic papers, but it now adds selected exposure, productivity, employment, and official sources when the occupation mapping is tight enough to stay auditable.
Atlas role
These sources power the note text shown inside occupation detail views and the occupation-hook examples on this page.
7 core studies
Academic empirical backbone
The broader review adds the strongest exposure papers, field productivity studies, and online labor-market evidence from Science, NBER, and peer-reviewed or widely cited working-paper sources.
Atlas role
This is the layer that stabilizes claims about which jobs move first, what gains are measurable, and where early substitution is actually showing up.
7 official reports
Institutional synthesis layer
ILO, IMF, and OECD reports add global exposure ranges, firm-adoption context, distributional patterns, and the strongest official cautions against equating exposure with layoffs.
Atlas role
This layer calibrates the atlas against cross-country evidence, firm heterogeneity, and labor-policy framing that the core paper-card subset cannot cover on its own.
What recurs across the full evidence base
Exposure is broad, but realized displacement is narrower so far
Across the strongest academic and institutional sources, technical exposure is large, but the most concrete negative labor effects remain selective and concentrated in specific task markets rather than broad economy-wide employment collapse.
Office-heavy, digitized, language-intensive work moves first
The repeated cross-source pattern is still clerical, administrative, customer-support, writing, legal-support, finance-support, and other highly digitized professional work at the high end of exposure, with physical and in-person roles relatively insulated in the near term.
Augmentation has the strongest measured evidence in structured workflows
The cleanest measured gains come from codified service and writing tasks where AI can draft, retrieve, summarize, or coach inside an existing workflow rather than trying to replace the whole job all at once.
Early substitution is clearest in modular external labor markets
The strongest short-run negative labor evidence comes from online labor and freelance markets, where work is easier to unbundle, price-compare, and substitute quickly.
Adoption and impact vary sharply across firms and countries
Capability alone does not determine labor outcomes. Firm maturity, workflow integration, regulation, and labor-market context shape whether AI appears as a productivity tool, a contractor substitute, or a slow-moving organizational change.
Distributional effects are real, but they do not all point the same way
Women are often more exposed in official data because they are concentrated in clerical work, higher-educated workers are often more exposed because they do more cognitive work, and less-experienced workers sometimes gain more from AI assistance even when exposed occupations as a whole face pressure.
Quantitative ranges the broader review supports
Global exposure floor
25%
ILO estimates one in four workers globally are in occupations with some degree of generative-AI exposure.
Highest global exposure tier
3.3%
ILO places about 3.3% of global employment in its highest exposure tier, a much smaller group than the broader exposed population, which is one reason exposure should not be read as job loss.
Global vs advanced-economy exposure
~40% / ~60%
IMF's framing puts about 40% of global employment and about 60% of advanced-economy employment in exposed jobs.
U.S. task exposure potential
80% / 19%
Eloundou et al. (OpenAI) estimate that about 80% of U.S. workers could have at least 10% of their tasks affected, while about 19% could have at least half of their tasks affected.
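Headline shares like these are threshold counts over an employment-weighted occupation table. The sketch below shows only the arithmetic, on made-up data. `share_of_workers_above` and the toy rows are hypothetical, and the published estimates rest on the authors' own task-level exposure labels rather than anything shown here.

```python
def share_of_workers_above(occupations, threshold):
    """Employment-weighted share of workers whose occupation has at least
    `threshold` (a fraction between 0 and 1) of its tasks labeled exposed."""
    total = sum(o["employment"] for o in occupations)
    above = sum(
        o["employment"]
        for o in occupations
        if o["exposed_tasks"] / o["total_tasks"] >= threshold
    )
    return above / total


# Made-up occupations for illustration only (not real data).
toy = [
    {"name": "A", "employment": 500, "exposed_tasks": 3, "total_tasks": 20},   # 15% exposed
    {"name": "B", "employment": 300, "exposed_tasks": 12, "total_tasks": 20},  # 60% exposed
    {"name": "C", "employment": 200, "exposed_tasks": 0, "total_tasks": 20},   # 0% exposed
]

print(share_of_workers_above(toy, 0.10))  # 0.8
print(share_of_workers_above(toy, 0.50))  # 0.3
```

The same toy table yields a large share at the 10% threshold and a much smaller one at 50%, which is why the two headline numbers describe very different depths of exposure.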
Customer-support productivity gains
+14-15% avg.
Brynjolfsson, Li, and Raymond find measured productivity gains around 14-15% on average, with about 34-35% gains for novice or lower-skilled workers.
Writing-task productivity gains
-40% time, +18% quality
In a controlled experiment, Noy and Zhang find that ChatGPT access cut time on professional writing tasks by about 40% and raised output quality by about 18%, supporting the augmentation case for codifiable language work.
SME adoption and staffing
31% use, 83% no staff-need change
OECD's SME survey evidence points to meaningful adoption and performance gains without a corresponding broad reduction in staffing needs at most surveyed firms.
Early freelance and platform pressure
-2% to -50%
Across the strongest platform studies, exposed markets show declines ranging from low single-digit contract and earnings losses to double-digit or larger demand drops in substitutable skill clusters.
The broader review rests on academic and official source backbones
Academic and empirical backbone
These are the strongest non-operational sources from the memos and final review for exposure rankings, productivity measurement, and early labor-demand effects.
2023-2024
Foundational exposure paper for the claim that task overlap can be broad even when realized labor outcomes are still uncertain.
2023
Useful for occupation and industry rankings, especially clerical, legal, finance, and customer-support-heavy roles.
2023 / 2025
Strongest measured workplace productivity evidence for augmentation in structured customer-support work.
2023
Shows large writing-task productivity gains in a controlled setting rather than a firm or labor-market equilibrium.
2024
One of the cleanest early indications of reduced contracts and earnings in AI-exposed freelance work.
2024
Shows larger drops in demand and transaction volume in AI-exposed submarkets, alongside worker reallocation toward programming.
2025
Sharpest evidence that substitutable freelance skill clusters can see double-digit or larger demand losses while complementary ones grow.
Institutional and official synthesis
These official sources anchor the review's global ranges, adoption context, distributional findings, and caution around over-interpreting near-term employment effects.
2025
Best official source for global occupational exposure ranges and the continued primacy of clerical exposure.
2025
Useful for methodology, multimodal updates, and keeping the global exposure frame current.
2024
Important for the global, advanced-economy, emerging-market, and low-income-country exposure ranges and complementarity framing.
2023
Strongest official caution that broad negative employment effects are still hard to establish in aggregate.
2024
Adds survey evidence on worker-perceived performance gains, enjoyment, and ongoing anxieties around job loss and inequality.
2025
Best current evidence that firms can report meaningful GenAI use and performance gains without broad staffing cuts.
2024
Important for the claim that adoption is concentrated in frontier firms and could widen productivity and wage gaps.
Where the research still needs to be read carefully
Exposure is not the same thing as displacement
The strongest exposure papers estimate task overlap or technical reach, not whether firms actually cut jobs or wages. The full review keeps those categories separate on purpose.
Capability is ahead of adoption
OpenAI and Anthropic both point to a gap between what current models can do and what current firms or workers have actually integrated into reliable production workflows.
Firm studies and platform studies tell different stories
Structured enterprise settings often show augmentation and productivity gains, while online labor platforms are where substitution pressure appears first and most clearly.
Cross-country headline percentages are range markers, not a single shared truth
ILO, IMF, OECD, and OpenAI-style studies use different taxonomies, exposure concepts, and capability assumptions, so their top-line percentages are informative ranges rather than interchangeable facts.
Employment evidence still lags task and productivity evidence
There is much more credible evidence on exposure and short-run workflow gains than on whole-economy hiring, wage, and occupation-transition effects over the medium term.
What the occupation-note papers contribute directly
The page now reflects the broader review, but the app still operationalizes a smaller source subset for occupation notes. These are the papers that currently map most directly into note text, evidence hooks, and atlas interpretation.
AI at Work: OpenAI's Workforce Blueprint
October 2025
What this source measures
Combines product-usage observations, early market interpretation, and workforce-transition proposals.
How it informs the atlas
Useful as a policy and timing lens for interpreting replacement scores cautiously and for explaining why current usage can lag capability.
It is not an occupation-by-occupation empirical exposure table, so it should not be treated as a direct labeling source for the full atlas.
Near-term use looks more collaborative than fully substitutive
OpenAI frames current workplace use as decision support, writing, research, and routine streamlining, and argues the observed pattern is still more enabling than replacing.
Foreword, pp. 2-4.
Workplace adoption often starts bottom-up
The paper says employees often begin using ChatGPT before formal enterprise deployment, with writing, market research, and data analysis showing up early across business functions.
Foreword, pp. 2-3.
Capability is moving faster than labor-market measurement
OpenAI links its GDPval benchmark to a claim that GPT-5-level systems already match or exceed professionals on about half of the benchmarked economically valuable tasks.
Foreword, p. 3.
Labor Market Impacts of AI: A New Measure and Early Evidence
March 5, 2026
What this source measures
Introduces "observed exposure," a measure that combines theoretical LLM feasibility with real Claude usage, weighted toward work-related and more automated use.
How it informs the atlas
This is the closest external analogue to our replacement metric because it explicitly tries to bridge theoretical exposure and observed use.
It is still built from Claude-centered usage and a custom weighting scheme, so it is informative context rather than a drop-in target variable for our own labels.
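To make the idea of bridging theoretical exposure and observed use concrete, here is a hypothetical blend. It is not Anthropic's formula: the field names, the 0.5 weight on augmentation-like use, and the time-share weighting are all assumptions, chosen only to show why an observed measure can sit well below a capability-only one.

```python
def theoretical_exposure(tasks):
    """Time-weighted share of tasks an LLM could in principle perform."""
    return sum(t["time_share"] * t["feasible"] for t in tasks)


def observed_exposure(tasks, automation_weight=1.0, augmentation_weight=0.5):
    """Hypothetical observed-exposure blend (an assumption, not the paper's method).

    Each task dict holds values between 0 and 1:
      feasible:   1.0 if an LLM could theoretically do the task, else 0.0
      work_usage: intensity of observed work-related usage for the task
      automated:  share of that usage that is automation-like
      time_share: share of the occupation's time spent on the task
    """
    score = 0.0
    for t in tasks:
        # Assumed weighting: automation-like use counts more than augmentation.
        mode_weight = (t["automated"] * automation_weight
                       + (1 - t["automated"]) * augmentation_weight)
        # A task contributes only if it is feasible AND actually used for work.
        score += t["time_share"] * t["feasible"] * t["work_usage"] * mode_weight
    return score


example = [
    {"feasible": 1.0, "work_usage": 0.8, "automated": 0.5, "time_share": 0.5},
    {"feasible": 1.0, "work_usage": 0.0, "automated": 0.0, "time_share": 0.3},
    {"feasible": 0.0, "work_usage": 0.9, "automated": 1.0, "time_share": 0.2},
]

print(round(theoretical_exposure(example), 3))  # 0.8
print(round(observed_exposure(example), 3))     # 0.3
```

In this toy occupation the capability-only view puts 0.8 of working time within reach, while the blended measure lands at about 0.3, because one feasible task shows no work use and the used task is only half automation-like. That gap mirrors the source's point that deployment trails capability.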
Current deployment still trails theoretical capability
Anthropic argues actual task coverage remains only a fraction of what current models could theoretically do, which is a strong warning against equating capability with labor displacement.
Key findings, p. 2.
Higher observed exposure lines up with weaker projected growth
The note reports that occupations with higher observed exposure are projected by BLS to grow less through 2034.
Key findings, p. 2.
Early labor effects are subtle rather than dramatic
Anthropic reports no broad unemployment spike for exposed workers since late 2022, but it does see suggestive evidence that hiring of younger workers has slowed in exposed occupations.
Key findings, p. 2.
GDPval: Evaluating AI Model Performance on Real-World Economically Valuable Tasks
October 2025
What this source measures
Benchmarks frontier models on expert-authored, economically valuable tasks from predominantly digital occupations.
How it informs the atlas
Strong input for our digital-adjacency and augmentation reasoning because it measures capability on serious real-world deliverables.
Because GDPval covers only 44 occupations and is intentionally digital, it cannot stand in for exposure across manual, care, or field-heavy occupations.
The benchmark targets high-value digital work, not the full labor market
GDPval covers 44 occupations across the top 9 GDP sectors and deliberately focuses on predominantly digital roles.
Abstract and Section 2.1, pp. 1-3.
The task set is grounded in real expert work product
Tasks are built from work contributed by experienced practitioners and are evaluated with human expert pairwise comparisons rather than only automatic grading.
Abstract and Sections 2.2-2.5, pp. 1-4.
Frontier models are approaching expert-quality output on this narrow slice
OpenAI reports that the strongest frontier systems are approaching industry experts on the GDPval gold subset, which raises the upper bound on near-term exposure for digital occupations.
Section 3.1, pp. 4-5.
Which Economic Tasks are Performed with AI? Evidence from Millions of Claude Conversations
February 2025
What this source measures
Maps millions of Claude conversations onto O*NET tasks to show where AI is already being used in the economy.
How it informs the atlas
One of the best external anchors for our augmentation and physical-world insulation metrics.
It is platform-specific usage evidence, so it can understate occupations where capability exists but product adoption, regulation, or workflow integration lag.
Observed use is concentrated in software and writing
Anthropic finds software development and writing tasks together account for nearly half of total observed Claude usage.
Abstract, pp. 1-2.
Adoption is broad but shallow across many occupations
The paper reports that about 36% of occupations show AI use in at least a quarter of their tasks, but only a small share show deep task penetration.
Abstract and contributions, pp. 1-3.
Augmentation edges automation in observed product use
Anthropic estimates 57% of usage is augmentative and 43% is more automation-like, while occupations involving physical manipulation show minimal current use.
Abstract and Section 1 contributions, pp. 1-3.
The Anthropic Economic Index Report: Economic Primitives
January 15, 2026
What this source measures
Adds task complexity, autonomy, success rate, and work-versus-coursework distinctions to Claude usage analysis.
How it informs the atlas
Best source in this set for occupation-specific nuance beyond a single headline score, especially around what kind of work remains after AI takes on some tasks.
It still reflects Claude usage and success rather than a cross-provider equilibrium, so it should complement rather than override our ensemble estimates.
Success rates meaningfully change occupational exposure
When Anthropic weights tasks by both importance and Claude success rate, some occupations such as data entry keyers and database architects show large swaths of work within reach.
Introduction, pp. 3-4.
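The effect of adding success rates can be shown with a short calculation. This is a minimal sketch under assumed inputs: each task carries an importance weight, a flag for whether it appears in observed usage, and a success rate, and coverage is an importance-weighted share. `weighted_coverage` and the toy tasks are hypothetical, and the report's own weighting may differ in detail.

```python
def weighted_coverage(tasks, use_success=True):
    """Importance-weighted share of an occupation's work within AI reach.

    tasks: list of (importance, observed, success_rate) tuples, where
    `observed` says whether the task appears in usage data and
    `success_rate` is between 0 and 1.
    """
    total_importance = sum(importance for importance, _, _ in tasks)
    covered = 0.0
    for importance, observed, success_rate in tasks:
        if not observed:
            continue  # never seen in usage, so it adds no coverage
        # With use_success=False, any observed task counts as fully covered.
        covered += importance * (success_rate if use_success else 1.0)
    return covered / total_importance


toy_tasks = [
    (5.0, True, 0.9),   # important, used, usually succeeds
    (3.0, True, 0.4),   # used, but often fails
    (2.0, False, 0.0),  # not seen in usage
]

print(round(weighted_coverage(toy_tasks, use_success=False), 2))  # 0.8
print(round(weighted_coverage(toy_tasks, use_success=True), 2))   # 0.57
```

Here success weighting lowers coverage because one used task often fails. For occupations whose tasks are completed reliably, the weighted share can stay high, which is the pattern the report describes for the occupations it names.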
Observed use remains mixed between collaboration and delegation
Anthropic reports augmentation again exceeds automation on Claude.ai, even while automated use remains stronger in first-party API traffic.
Chapter 1 overview, pp. 4-5.
Task removal can imply deskilling or upskilling depending on the occupation
The report uses travel agents and property managers to show that removing AI-covered tasks can either hollow out the most complex work or strip away bookkeeping-heavy work and leave more strategic responsibilities.
Introduction, pp. 3-4.
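One simple way to express the deskilling-versus-upskilling distinction is to compare the average complexity of an occupation's tasks before and after removing the AI-covered ones. The sketch assumes a hypothetical 1-5 complexity score per task. The two toy profiles are made up and only echo the two directions in the report's examples, not its data.

```python
from statistics import mean


def remaining_complexity_shift(tasks):
    """Change in mean task complexity after removing AI-covered tasks.

    tasks: list of (complexity, ai_covered) pairs on a consistent scale
    (here an assumed 1-5 score). A negative result means the remaining work
    is simpler on average (deskilling); a positive result means it is more
    complex (upskilling).
    """
    before = mean(complexity for complexity, _ in tasks)
    remaining = [complexity for complexity, covered in tasks if not covered]
    if not remaining:
        raise ValueError("every task is AI-covered; no remaining work to compare")
    return mean(remaining) - before


# Toy profile where AI covers the most complex tasks.
hollowed_out = [(5, True), (4, True), (2, False), (1, False)]
# Toy profile where AI covers routine bookkeeping-style tasks.
bookkeeping_heavy = [(1, True), (2, True), (4, False), (5, False)]

print(remaining_complexity_shift(hollowed_out))       # -1.5
print(remaining_complexity_shift(bookkeeping_heavy))  # 1.5
```

The function treats every task equally; weighting tasks by time share or importance would be a natural refinement but is not shown here.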
How to read the paper-backed notes inside occupations
The note layer is a compact interpretation aid for readers, not an implementation plan. The notes now mix core vendor studies with selected exposure, productivity, employment, and official sources from the broader review where the occupation link is strong enough to defend.
Step 1
Treat occupation notes as context, not a verdict
A paper-backed note helps explain why an occupation looks exposed, insulated, augmentable, or ambiguous. It should sharpen your reading of the atlas rather than replace it.
Step 2
Pay attention to what kind of evidence you are reading
Some notes reflect observed usage, some reflect exposure rankings, some reflect benchmark capability, some report productivity or employment effects, and some are policy framing. Those are related, but they are not the same thing.
Step 3
Use the note together with the atlas metrics
The strongest reading is comparative: look at the occupation's replacement, augmentation, physical, and disagreement scores, then use the note to understand what kind of pressure or caveat the literature adds.
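The comparative reading can be sketched as a percentile lookup that asks where one occupation sits on each metric relative to every other occupation in the atlas. The metric names come from this page. The 0-1 scale, the row layout, and `comparative_profile` are assumptions for illustration, and the toy rows are not atlas data.

```python
from bisect import bisect_right

METRICS = ("replacement", "augmentation", "physical", "disagreement")


def percentile_rank(value, population):
    """Percent of population values less than or equal to `value`."""
    ordered = sorted(population)
    return 100.0 * bisect_right(ordered, value) / len(ordered)


def comparative_profile(target, atlas):
    """Percentile of each metric for `target` relative to all rows in `atlas`."""
    return {m: percentile_rank(target[m], [row[m] for row in atlas]) for m in METRICS}


# Toy rows on an assumed 0-1 scale (not real atlas scores).
atlas = [
    {"name": "occ_a", "replacement": 0.7, "augmentation": 0.6, "physical": 0.1, "disagreement": 0.2},
    {"name": "occ_b", "replacement": 0.3, "augmentation": 0.8, "physical": 0.2, "disagreement": 0.4},
    {"name": "occ_c", "replacement": 0.1, "augmentation": 0.2, "physical": 0.9, "disagreement": 0.1},
    {"name": "occ_d", "replacement": 0.5, "augmentation": 0.5, "physical": 0.4, "disagreement": 0.6},
]

profile = comparative_profile(atlas[1], atlas)
print(profile)
# {'replacement': 50.0, 'augmentation': 100.0, 'physical': 50.0, 'disagreement': 75.0}
```

A profile like this one (top augmentation, middling replacement) is the kind of position the occupation note is meant to explain rather than override.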
Occupation families with the clearest paper trail
These are strong first candidates because the papers either name the occupations directly or describe a narrow enough occupation family to support an auditable note.
Observed Claude usage is especially concentrated in software development, writing, and analytical work, so these occupations are good candidates for paper-backed exposure notes.
These are direct occupation families where our high digital-adjacency and replacement scores can be paired with external observed-usage evidence.
Occupations requiring physical manipulation of the environment show minimal current Claude usage, making them strong examples for the physical-world insulation metric.
This gives the atlas a concrete external citation for why some low-replacement cells stay relatively green even when they are economically large.
When Anthropic factors in task success rates, data entry keyers and database architects are examples where Claude appears capable across a large share of the job.
These are unusually clean occupation-level hooks for the occupation detail view because the report names them directly and says something more specific than a generic exposure score.
Felten, Raj, and Seamans rank telemarketers and several HR, analyst, finance, and postsecondary-teaching roles among the occupations most exposed to language modeling.
This gives the atlas a clean exposure-ranking hook for office-heavy occupations that predate the later product-usage studies.
Brynjolfsson, Li, and Raymond report that AI assistance raised customer-support productivity by about 14% on average and much more for novice or lower-skilled agents.
This is one of the clearest examples where high exposure does not automatically mean immediate displacement because measured augmentation gains arrive first.
Noy and Zhang show large productivity gains on professional writing tasks, making writing, marketing, and communications work strong candidates for productivity-style notes rather than pure replacement claims.
This adds experimental task evidence to occupations that otherwise only show up under generic exposure or usage notes.
Hui, Reshef, and Zhou find lower employment and earnings in highly affected freelance occupations, which is the clearest early warning for modular text and image work sold in external markets.
This is the cleanest way to put an actual measured downside next to occupations that are easy to unbundle and buy on demand.
The ILO's refined global index keeps clerical occupations at the top of exposure rankings and frames the likely outcome as job transformation more often than outright replacement.
This supplies an official cross-country anchor for office and clerical occupations that otherwise rely mostly on vendor-side or platform-side studies.
Other papers and reports worth reading on this topic
Each of these sources now has a local markdown summary in the research library. Some of them also feed the occupation-note layer where the mapping is narrow enough to stay defensible, while the rest remain broader framing and cross-check material.
2023-2024
Best starting point for the high-level claim that task exposure can be broad even when realized labor-market effects remain uncertain.
2023
Useful if you want a fast ranking-oriented view of which occupations and industries are likely to move first.
2023 / 2025
Still the strongest workplace evidence for augmentation and skill-gap compression in a structured service setting.
2023
Good companion to the field evidence if you want a clean experimental case for large productivity gains on codifiable language work.
2024
One of the clearest papers for why the first negative labor effects show up in freelance and task-market settings.
2025
Best official source for global occupational exposure ranges, clerical concentration, and gender differences in exposure.
2024
Strongest single report for comparing exposure across advanced, emerging-market, and low-income economies.
2023
Best official corrective to overclaiming near-term job destruction from exposure statistics alone.