Based on research from Anthropic / Princeton / Max Planck Institute

Why Taste Matters

How Resource Constraints Forge Irreplaceable Human Judgment

A counterintuitive conclusion: the human brain makes judgments LLMs cannot, not because it is more powerful, but because it is more constrained.

~4,255x

Human Brain Compression

~71x

LLM Compression

The human brain compresses experience far harder than LLMs. Taste lives in that gap.

2026-04 · Two rounds of fact-checking passed · 41 claims verified

The Origin

A Physics Professor's Discovery

Harvard physics professor Matthew Schwartz used Claude Opus 4.5 to complete a quantum field theory paper in two weeks. The same work would typically take a professor and graduate student one to two years.

Claude was extraordinarily capable, but Schwartz identified one critical missing ability:

"I am more confident that the bottleneck is not creativity. LLMs are profoundly creative. They simply lack a sense of which paths might be fruitful before walking them. I think we can distill what is missing in current LLMs to a single word: Taste."

— Matthew Schwartz, "Vibe Physics: The AI Grad Student", Anthropic Research, 2026-03-23

Taste is not a creativity problem. Claude's creativity is impressive. The bottleneck is filtering and judgment: which directions are worth pursuing, and which are not. Schwartz can quickly judge whether a research direction has potential; Claude mechanically attempts all possible directions.

This judgment is universally applicable: science, engineering, design, investing, any domain requiring directional decisions under uncertainty.

Surface vs Depth

What If the LLM's Advantage Is the Obstacle?

On the surface, the taste deficit has several explanations: training data shows only successful paths, training objectives optimize for "looking right" rather than "judging right", and models lack an exploration drive. These are largely solvable engineering problems.

But there's a deeper question: what if the LLM's architectural advantages, high capacity and full information access, are themselves the structural barrier to producing taste?

This leads to the core thesis: Resource constraints are an effective path to producing Taste. Humans are locked into this path by cognitive architecture, while LLMs have not yet found an equally effective alternative.

The Math of Compression

A Simple Arithmetic Problem

System	Input	Storage	Compression Ratio
LLaMA 3.1 405B	~57.7 TB text	~810 GB (BF16)	~71x
Human Brain (order-of-magnitude)	~250 PB sensory input	~58.75 TB (100T synapses × 4.7 bits)	~4,255x
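The two ratios in the table can be reproduced from the stated figures. The sketch below is only that arithmetic, assuming decimal units (1 TB = 10^12 bytes) and 2 bytes per BF16 parameter. The final lines compute how long a sustained ~1 Gbps stream would take to reach 250 PB; that duration is derived here for orientation and is not a figure stated in the article.

```python
# Decimal units are an assumption of this sketch.
GB, TB, PB = 1e9, 1e12, 1e15

def compression_ratio(input_bytes: float, storage_bytes: float) -> float:
    """Input volume divided by storage volume."""
    return input_bytes / storage_bytes

# LLM side: 405B parameters stored in BF16 (2 bytes each).
llm_storage = 405e9 * 2                 # 810 GB
llm_ratio = compression_ratio(57.7 * TB, llm_storage)

# Brain side: 100 trillion synapses x 4.7 bits, converted to bytes.
brain_storage = 100e12 * 4.7 / 8        # 58.75 TB
brain_ratio = compression_ratio(250 * PB, brain_storage)

# Derived, not stated in the article: the duration that 250 PB
# would imply at a sustained 1 Gbps (125 MB/s).
seconds = (250 * PB) / (1e9 / 8)
implied_years = seconds / (365.25 * 24 * 3600)

print(f"LLM storage:   {llm_storage / GB:.0f} GB, ratio ~{llm_ratio:.0f}x")
print(f"Brain storage: {brain_storage / TB:.2f} TB, ratio ~{brain_ratio:.0f}x")
print(f"Implied duration at 1 Gbps: ~{implied_years:.0f} years")
```

Both ratios are highly sensitive to the input estimate, which is why the figures are best read as orders of magnitude.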

The LLM's ~71x compression is essentially statistical pattern extraction: memorizing "which words co-occur." The brain's ~4,255x compression necessarily involves massive abstraction, hierarchical organization, and cross-modal integration. Given the same experience, the brain is forced to fit it into a far smaller "storage budget."

This gap is unlikely to be accidental. Four independent research threads help explain where it comes from; connected, they form a chain running from physical-layer constraint to directional judgment.

Important caveat: This comparison is an illustrative analogy, not a strict like-for-like measurement. The two "compressions" differ fundamentally at the operational level: LLMs encode text statistics into parameters in a one-shot batch process; the brain encodes multimodal sensory streams into synaptic connections through decades of online learning. The human-side "input" is also raw sensory throughput full of redundancy, while the LLM uses a cleaned corpus, which systematically inflates the human ratio. Placing these ratios side by side assumes they are comparable at the abstract level of "extracting usable representations from large inputs."

Sources: Synaptic 4.7 bits — Bartol et al. (2015, eLife, Salk Institute), hippocampal measurement, extrapolation to whole brain is a simplifying assumption; Sensory bandwidth ~1 Gbps — Zheng & Meister (Caltech/Neuron, 2024); LLM weights are not strictly equivalent to "compressed storage."

Core Argument

From Constraint to Taste in Four Steps

Four independent research threads, assembled together, form a logical chain from physical-layer constraints to directional judgment. Each step has its own empirical basis, but connecting them into a causal chain is this article's original synthesis, not any single paper's conclusion.

Gigerenzer & Brighton, Max Planck Institute, "Homo Heuristicus" (2009)

Less Is More: Ignoring Information Can Improve Judgment

A counterintuitive finding: using less information can yield higher prediction accuracy. The take-the-best heuristic, which decides on the first cue that discriminates between two options and ignores the rest, achieves higher average prediction accuracy than multiple regression across 20 cross-domain datasets (Czerlinski et al. 1999), and also outperforms neural networks (Brighton 2006).
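To make the heuristic concrete, here is a minimal sketch of take-the-best for paired comparisons. It assumes binary cues and the usual definition of cue validity (correct discriminations divided by all discriminations). The training pairs and cue meanings are hypothetical and serve only to show the mechanics, not to reproduce any dataset from the cited studies.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[Sequence[int], Sequence[int], bool]  # (cues_a, cues_b, a_is_larger)

def cue_validity(pairs: List[Pair], cue: int) -> float:
    """Share of discriminating pairs in which the cue points to the larger item."""
    right = wrong = 0
    for a, b, a_is_larger in pairs:
        if a[cue] == b[cue]:
            continue  # cue does not discriminate on this pair
        if (a[cue] > b[cue]) == a_is_larger:
            right += 1
        else:
            wrong += 1
    return right / (right + wrong) if right + wrong else 0.0

def take_the_best(a: Sequence[int], b: Sequence[int], cue_order: Sequence[int]) -> int:
    """Return 1 if a is predicted larger, -1 if b is, 0 if no cue discriminates."""
    for cue in cue_order:
        if a[cue] != b[cue]:
            # Stop at the first discriminating cue; all later cues are ignored.
            return 1 if a[cue] > b[cue] else -1
    return 0

# Hypothetical training pairs with three binary cues per item.
training = [
    ((1, 0, 1), (0, 0, 1), True),
    ((1, 1, 0), (0, 0, 1), True),
    ((0, 1, 1), (1, 0, 0), False),
    ((0, 0, 1), (0, 0, 0), True),
]
validities = [cue_validity(training, c) for c in range(3)]
order = sorted(range(3), key=lambda c: validities[c], reverse=True)

print(validities)                                   # cue 0 is the most valid here
print(take_the_best((1, 0, 0), (0, 1, 1), order))   # decided by cue 0 alone
```

In the last call, two of the three cues favor the second item, yet the heuristic sides with the first because the most valid cue settles the comparison. That is the sense in which it "ignores" information.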

The mechanism is the bias-variance tradeoff: full-information models have low bias but high variance, leading to overfitting. Heuristics have slightly higher bias but dramatically lower variance, enabling better generalization.
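For reference, the tradeoff can be written as the standard decomposition of expected squared error. This is the textbook form, not a formula taken from the cited papers. It assumes data generated as $y = f(x) + \varepsilon$ with zero-mean noise of variance $\sigma^2$, and $\hat f$ denotes a model fitted on a random training set.

```latex
\mathbb{E}\!\left[\big(y - \hat f(x)\big)^2\right]
  = \underbrace{\big(\mathbb{E}[\hat f(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\!\left[\big(\hat f(x) - \mathbb{E}[\hat f(x)]\big)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

A heuristic that ignores cues raises the first term slightly, but when samples are small or noisy it can lower the second term by more, which reduces the total.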

This provides the endpoint for the entire argument: if "decisively ignoring most information" can be an effective strategy under high uncertainty, then the mechanism that produces this "ignoring ability" is worth investigating.

Academic controversy: Less-is-More is not consensus. The Kahneman camp argues heuristics often produce systematic biases; the effect is also highly conditional on environmental structure (noise level, cue redundancy and weight skew, sample size). Dougherty et al. (2008) show it holds only under specific conditions, and Bröder (2010) reanalyzed Gigerenzer's own data to question its generality. This article adopts the framework for its explanatory power regarding "why constraints might be advantageous," but this is an active debate rather than settled science, and its conditionality propagates all the way to this article's conclusion.

Turner & Arumugam, Princeton 2025 + Lieder & Griffiths, Resource Rationality (2020)

Capacity Constraints Force Tradeoffs; Tradeoffs Produce Structure

Turner & Arumugam's formal conclusion: capacity constraints create structural trade-offs between tasks, forcing agents to choose which distinctions are worth preserving (state chunking). The paper treats this as an upper bound on the scope of intelligence, a cost, not an advantage — the constrained agent can only coarse-grain its environment, losing performance on specific tasks.
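A toy illustration of such a trade-off, which is not the paper's formalism: four states, two hypothetical tasks that each need a different two-way distinction, and an agent that can keep at most `capacity` chunks and must act identically within a chunk. Enumerating every chunking shows how the best achievable combined error depends on capacity.

```python
from itertools import product

STATES = [0, 1, 2, 3]
# Hypothetical tasks: each maps a state to its correct binary action.
TASK_A = {0: 0, 1: 0, 2: 1, 3: 1}   # needs {0,1} separated from {2,3}
TASK_B = {0: 0, 1: 1, 2: 0, 3: 1}   # needs {0,2} separated from {1,3}

def errors(chunking, task):
    """Fewest misclassified states if one action is chosen per chunk."""
    total = 0
    for chunk_id in set(chunking):
        members = [s for s in STATES if chunking[s] == chunk_id]
        ones = sum(task[s] for s in members)
        total += min(ones, len(members) - ones)  # best single action for this chunk
    return total

def min_total_error(capacity):
    """Best combined error over all assignments of states to at most `capacity` chunks."""
    return min(
        errors(chunking, TASK_A) + errors(chunking, TASK_B)
        for chunking in product(range(capacity), repeat=len(STATES))
    )

for capacity in (1, 2, 3, 4):
    print(capacity, min_total_error(capacity))
```

With two chunks the agent can serve either task perfectly but not both; only at four chunks does the conflict disappear. Which distinction to keep is exactly the choice the constraint forces.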

Lieder & Griffiths argue the same point from another angle: under limited computational resources, the optimal strategy is not exact solutions but "good enough" approximations, which correspond to the heuristics observed in cognitive psychology.

The human brain's working memory holds roughly 3-5 chunks (Cowan 2001), forcing it to retain only the most essential patterns. LLMs' high capacity means they don't need to make such tradeoffs.

This article's inference (not the paper's conclusion): Constraint-forced coarse-graining may, in open environments, produce exactly the kind of abstraction useful for directional judgment. The paper itself only establishes "constraints bring tradeoffs and a performance cost"; the step from cost to useful judgment is this article's reasoning leap, which readers should note.

Webb, Frankland & Cohen, Trends in Cognitive Sciences, 2024 (review)

Bottlenecks Force Networks to Learn Relations, Not Surfaces

A line of work (Webb et al. 2020, Kerg et al. 2022, Altabaa et al. 2023, surveyed by the 2024 review above) finds that adding a "relational bottleneck" to neural networks — blocking direct access to individual attributes and allowing only relational information through — causes them to automatically extract relational structure rather than surface features, with markedly better generalization.

One distinction matters here: this "bottleneck" constrains the type of information (only relations pass), which is not the same as the capacity/volume bottleneck the earlier steps discuss. One limits "what kind of information," the other limits "how much." Different mechanisms, but the same direction: a constrained system is forced to learn structure rather than memorize detail.
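A minimal sketch of the "only relations pass" idea. Exact equality stands in here for the learned similarity measures used in that line of work, which is a simplification made for this illustration. The downstream process sees only the matrix of same/different relations between positions, never the items themselves.

```python
def relation_matrix(sequence):
    """Keep only same/different relations between positions; item identity is discarded."""
    return [[int(x == y) for y in sequence] for x in sequence]

aba_letters = relation_matrix(["A", "B", "A"])
aba_shapes = relation_matrix(["square", "circle", "square"])
abb_letters = relation_matrix(["A", "B", "B"])

print(aba_letters)                 # [[1, 0, 1], [0, 1, 0], [1, 0, 1]]
print(aba_letters == aba_shapes)   # True: same pattern, unrelated surface features
print(aba_letters == abb_letters)  # False: different pattern, same surface features
```

Because letters and shapes collapse to the same matrix, anything learned behind the bottleneck about the ABA pattern transfers to items never seen before.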

The human brain is subject to both: hierarchical capacity bottlenecks (sensation → attention → working memory → long-term memory, each layer enforcing compression) layered on top of a bias toward relational structure. The standard Transformer is weaker on both — self-attention lets every token access all others, placing almost no gate on information flow at inference time. (The last sentence is this article's extension, not the paper's conclusion.)

Core: A bottleneck is not a defect; it is an inductive bias that forces structural reasoning over memorization.

Karl Friston, UCL, Free Energy Principle (2006/2009/2010)

Prediction-Driven Compression Naturally Tends Toward Parsimonious Causal Models

FEP holds that the brain's fundamental function is minimizing prediction error: building an internal model that predicts the most phenomena with the least information. Successful predictions are "explained away"; only surprises propagate upward.
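The "explaining away" idea can be sketched with a deliberately naive predictor. The assumption here is that the prediction is simply the previous observation; real predictive-coding models are hierarchical and learned. Only observations whose prediction error exceeds a threshold are passed on.

```python
def propagate_surprises(observations, threshold=0.5):
    """Return (index, error) for observations the running prediction failed to explain."""
    surprises = []
    prediction = None
    for i, obs in enumerate(observations):
        # With no prior prediction, the whole first observation counts as error.
        error = obs if prediction is None else obs - prediction
        if prediction is None or abs(error) > threshold:
            surprises.append((i, error))
        prediction = obs  # naive model: expect the next value to repeat this one
    return surprises

signal = [5, 5, 5, 5, 9, 9, 9, 9]
passed_up = propagate_surprises(signal)
print(passed_up)   # [(0, 5), (4, 4)]
```

Eight observations reduce to two messages: the initial value and the one jump. Everything the model already predicted stays local.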

This naturally tends toward Occam's razor, the biological implementation of "explain the most with the fewest assumptions." The brain doesn't try to remember every detail of the world; it continuously compresses toward the simplest-but-sufficient world model.
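The Occam's-razor reading is visible in the standard decomposition of variational free energy. This is the textbook form rather than a quotation from the cited papers: $o$ denotes observations, $s$ hidden causes, $p$ the generative model, and $q$ the brain's approximate posterior.

```latex
F[q] \;=\; \underbrace{D_{\mathrm{KL}}\!\big[\,q(s)\,\|\,p(s)\,\big]}_{\text{complexity}}
      \;-\; \underbrace{\mathbb{E}_{q(s)}\!\big[\ln p(o \mid s)\big]}_{\text{accuracy}}
 \;=\; D_{\mathrm{KL}}\!\big[\,q(s)\,\|\,p(s \mid o)\,\big] \;-\; \ln p(o)
 \;\ge\; -\ln p(o)
```

Minimizing $F$ rewards explanations that fit the observations (accuracy) while departing as little as possible from prior beliefs (complexity), so it favors the simplest model that still predicts well.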

LLMs also "predict the next token," but their compression objective (fidelity/perplexity) differs fundamentally from the brain's (behavioral fitness). Whether LLMs form internal causal models remains debated (Li et al. 2023).

Core: Prediction-driven compression produces parsimonious causal models, the intuition for "which variables truly matter," which is the cognitive foundation of judgment. Two notes: as a grand unified brain theory, FEP itself faces "unfalsifiable / circular" critiques (e.g. Williams 2018); this article takes only its more specific, predictive-coding-consistent layer that "prediction-driven compression tends toward parsimonious models." Generalizing from a perceptual theory to "directional judgment" is this article's extension.

Four steps combined: Constraint → Tradeoff → Structural representation → Parsimonious causal model → Directional intuition = Taste

The Reversal

Why the LLM Path Doesn't Work

Run the chain in reverse for an LLM and it becomes clear the problem isn't "not good enough yet." Compute is abundant, so the model need not compress aggressively. Not forced to abstract, it tends to retain as many statistical associations as possible. In an uncertain, open environment it therefore overfits more readily to surface patterns and struggles to prune irrelevant directions decisively. The result is the absence of Taste. In other words, the LLM's architectural advantages (high capacity, wide information flow) are precisely the structural barrier to producing Taste.

One distinction from Gigerenzer pins down the crux:

Accuracy-effort tradeoff: "More information is too expensive, so use heuristics." This is a cost problem, solvable with cheaper compute.

Less-is-more: "More information itself reduces accuracy." This cannot be solved with more compute.

LLMs face the latter. This is not a compute problem.

Boundary Conditions

But "Constraint Produces Taste" Overstates It

Two systems look like counterexamples and must be addressed before the thesis can stand.

AlphaGo / AlphaProof: AlphaGo developed superhuman "intuition" through self-play; AlphaProof reached silver-medal level at the 2024 International Mathematical Olympiad (28/42 points, one point short of gold). Aren't these unconstrained systems producing Taste?

The key variable may not be "constraint" itself, but feedback density × environmental closure. Go and competition math are hard, but share two features: clear success/failure signals and the ability to generate training samples at scale. This makes self-play an equivalent "compression path" — the value function produced by massive trial-and-error is functionally similar to human intuitive judgment.

The Taste Schwartz describes appears at the other end: feedback delayed by months or years, no clear right/wrong signal, impossible to generate samples at scale. In such open environments, human constraint-based compression (decades of slowly accumulated intuition) remains the only known effective path.

Refined boundary: This article's thesis is not "constraint is the only source of Taste," but rather "in open environments with sparse feedback, constraint-based compression is the only currently known effective path." AI has found alternative paths in feedback-dense closed and semi-closed domains (self-play), but not yet in truly open ones.

Positive Signals & Falsifiability

What Evidence Would Falsify It

Narrowing the boundary this far actually makes the thesis sturdier, but it is still a judgment that could be overturned. First, consider a signal that is already approaching the boundary.

The Fudan / OpenMOSS RLCF work (arXiv 2603.14473, 2026-03) constructed roughly 700K positive/negative sample pairs from citation differentials to train a Scientific Judge that surpasses GPT-5.2 and Gemini-3 Pro at judging paper impact. Notably, the same work also used that Judge as a reward model to train a Scientific Thinker that actively proposes research ideas of higher potential impact, not merely judging existing ones — and that step lands squarely on the first falsifiability condition listed below.

Yet it still falls short of overturning the thesis: the Thinker's "high impact" is still scored by that citation-trained Judge, a closed proxy loop, not the delayed, sparse, real-world feedback of an open environment; and it operates on abstract-level preference pairs rather than the long-horizon judgment Schwartz describes. It is genuine pressure on the thesis, but not yet a counterexample.
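To picture what "pairs from citation differentials" could look like, here is a hypothetical sketch. The function name, the `min_gap` threshold, and the sample data are all assumptions made for illustration; the actual RLCF pipeline may differ, for example by controlling for field or publication year.

```python
from itertools import combinations

def build_preference_pairs(papers, min_gap=10):
    """papers: list of (paper_id, citation_count).
    Returns (preferred_id, rejected_id) pairs for papers whose citation
    counts differ by at least min_gap (a hypothetical threshold)."""
    pairs = []
    for (id_a, cites_a), (id_b, cites_b) in combinations(papers, 2):
        if abs(cites_a - cites_b) < min_gap:
            continue  # difference too small to treat as a preference signal
        pairs.append((id_a, id_b) if cites_a > cites_b else (id_b, id_a))
    return pairs

# Hypothetical papers with made-up citation counts.
sample = [("p1", 120), ("p2", 15), ("p3", 22)]
pairs = build_preference_pairs(sample)
print(pairs)   # [('p1', 'p2'), ('p1', 'p3')]
```

Whatever the exact recipe, the label comes from an accumulated citation record, which is why the loop stays closed: the judge never receives the delayed, sparse feedback of an open research environment.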

Falsifiability conditions for this article's thesis: the argument is seriously weakened if either occurs:

1. An LLM without special constraint-based design, with no human directional guidance, systematically makes correct directional judgments in open domains with sparse feedback (e.g., theoretical physics, original product design), and this performance is systematic rather than statistical coincidence;

2. Model distillation or information bottleneck training is shown to systematically produce Taste-like directional judgment (this would actually support "constraint works," but show human biological constraint isn't the only path).

RLCF puts pressure on the first condition without meeting it; the second has not been demonstrated. Researchers should keep testing against these conditions rather than treating this article's thesis as an unfalsifiable axiom.

Until falsified, this judgment has direct implications for everyone who works with AI. That ends the theory; what follows is its concrete shape in everyday work.

Practice

Human + AI Collaboration Principles

Reframing the Relationship

Two Complementary Cognitive Systems

If resource constraint really is the source of Taste, then the human-AI relationship shouldn't be framed as "who replaces whom": the two cognitive architectures simply occupy different positions.

Popular narratives either cast "AI as a tool, human as user" (treating AI as a hammer) or "AI replaces humans" (putting both on the same axis of competition). Neither is quite right. A more accurate framing: given the current capability distribution, humans and AI each hold comparative advantages on different types of cognitive tasks.

AI's comparative advantage: Exhaustive search, logical deduction, tireless execution, broad coverage, akin to Kahneman's "slow thinking" (but far faster than humans).

Human's comparative advantage: Directional intuition, anomaly detection, fast judgment under constraint, the product of decades of trial-and-error compressed into heuristics.

This is not a direct application of Kahneman's System 1/2 theory (which describes two processing modes within the same agent), but a comparative-advantage observation based on current capability boundaries. When Schwartz completed a paper in two weeks, a professor with taste handled directional judgment while a tireless AI grad student handled execution, and together they exceeded what either could achieve alone.

Your Core Value: Being AI's System 1

Direction judgment: Is this path worth walking? (Taste)

Anomaly detection: This result "feels wrong." Schwartz discovered Claude fabricating data through exactly this intuition.

Quality standards: What counts as "good enough"? What "looks good but is actually problematic"?

Problem definition: The hardest part isn't solving problems, it's knowing which problem to solve.

"Expert wrongness-detection is the key human capability — the ability to feel when something is wrong."

— Nova Spivack

Collaboration Framework

The Human-AI Cognitive Stack

The higher up, the greater the uncertainty and the more taste is needed. The lower down, the more AI can operate autonomously.

Problem Definition & Direction Selection

What problem should we solve? Which path?

Human-led

Solution Design & Quality Standards

What counts as good? Where are the traps?

Human + AI

Execution & Implementation

Build it in this direction

AI + Human review

Detail Polish & Formatting

Fix formatting, minor issues

AI-led

Schwartz's practice perfectly fits this model: he chose topics, set directions, and judged correctness; Claude did computations, wrote code, and generated figures.

Evidence

From One Case to 400K Sessions

Schwartz is persuasive, but in the end he is one person's experience. If this division of labor is real, it should show the same shape across a large sample. In June 2026, a study from Anthropic's Economic Research team supplied exactly that scale: a privacy-preserving analysis of roughly 235K people and about 400K interactive Claude Code sessions (October 2025 to April 2026). The real-world division of labor turned out to be almost exactly the cognitive stack above.

70% / 20%

Decisions the human leads

At the median, people made ~70% of the "what to build" (planning) decisions but only ~20% of the "how to build it" (execution) ones. Direction to the human, execution to the agent.

2×+

Expert vs novice success

The expert tier reached verified success more than twice as often as the novice tier. Task-level domain expertise is what decides the outcome.

<7pp

Spread across top occupations

Among code-producing sessions, success rates across the top ten occupations fell within 7 percentage points. A coding background barely decides success.

The most telling part is how the study defined "expertise": at the task level and independent of job title, using three signals. Two of them are what the user asked the AI to verify, and whether the user was correcting the AI or the AI was correcting the user. That is exactly the directional judgment and anomaly detection this article describes. Schwartz used it to catch Claude fabricating data; across 400K sessions, it was associated with more than double the success rate.

A 2× success rate is the upside of judgment. But the same division of labor rests on a quiet condition: it holds only as long as the human keeps holding that top layer of directional judgment. And the more conveniently you hand work to AI, the more quietly that very layer atrophies.

The Paradox

The More You Use AI, the Easier It Is to Lose Taste

Three risks observed in research:

-17%

mastery decline · approx. two letter grades

Skill Atrophy

AI-assisted groups showed 17% lower programming mastery than hand-coding groups. The largest gap was in debugging, precisely the skill needed to catch AI errors.

-5.2pp

decline in detecting missing context

Scrutiny Erosion

The more polished AI output becomes, the less humans question its reasoning (-3.1pp) and the less they notice missing context. Better-looking output leads to less critical thinking.

∞

self-dissolving loop · not a measured value

Oversight Paradox

Effective AI use requires oversight ability → oversight comes from hands-on experience → AI use reduces hands-on opportunities. A self-dissolving loop.

"Having my skills atrophy is primarily gonna be problematic with respect to my ability to safely use AI for the tasks that I care about."

— Anthropic engineer, Internal Study, 2025-12

Action Guide

Three Principles for Preserving Taste

Because Taste gets easier to lose the more you use AI, preserving it can't rely on good intentions; it needs a few deliberate rules.

Principle 1: Be the Rider, Not the Passenger

Spivack's rider-horse metaphor: the rider doesn't just give direction, they continuously sense the horse's state — hesitation, drift, overconfidence. Don't "send a prompt and wait for results." Stay engaged throughout, correcting in real time. Break work into stages, and at each stage examine direction, challenge assumptions, correct course.

Principle 2: Maintain the "Having Done It" State

The compression-ratio insight: taste comes from the compression of firsthand trial-and-error, not from reading. Periodically do work "without AI," deliberately maintaining your System 1 judgment. Schwartz spent 25 years in theoretical physics, which is why he could catch Claude's errors.

"Every once in a while, even if I know that Claude can nail a problem, I will not ask it to. It helps me keep myself sharp."

— Anthropic engineer

Principle 3: Spend Energy Where AI Is Weakest

Don't burn attention on the execution layer where AI excels. Concentrate cognitive resources on problem definition, direction selection, anomaly detection, quality judgment. The higher the level of judgment, the less delegatable it is: strategy > design > implementation > details.

How It Lands by Role

Researchers / Experts

Your moat is not "how much you know" but "being able to judge what's worth doing." Use AI to expand execution bandwidth, while periodically doing deep work without AI to keep your feel for directional judgment.

Managers / Decision-Makers

Don't only measure "how much efficiency AI added"; also watch "whether the team's judgment is atrophying." Give teams (especially junior members) space to practice without AI, and redeploy people from the execution layer to the judgment layer.

Students / Newcomers

The most dangerous path is over-relying on AI from the start, skipping the trial-and-error period needed to build taste. First build basic judgment through hands-on work, then introduce AI as an accelerator.

"Get to know these models. Learn what they are good at and what they fail at."

— Matthew Schwartz

Conclusion

Constraints Shape Wisdom

Resource constraints are an effective path to producing Taste, possibly the only verified path in open environments so far. The brain took this path not because it chose the better algorithm, but because it had no choice.

The brain, forced to retain only the most essential patterns from an information flood, developed fast-judgment heuristics. LLMs, which tend to retain far more statistical associations than the brain, have not yet found an equivalent path to directional judgment in feedback-sparse open environments.

This is not an eternal conclusion; it is a judgment based on current evidence, accompanied by explicit falsifiability conditions. But until falsified, it has practical implications for everyone who uses AI:

In the AI era, human value lies not in doing more or faster, but in judging what is worth doing. Taste is the product of decades of trial-and-error compressed to its essence; it lets you reject in one second a direction that AI would need an hour to falsify. This judgment is the scarcest resource you contribute to the human-AI cognitive partnership, and it is worth preserving.

Appendix

Common Objections

Three facts that seem to directly refute this article's thesis, and the responses.

1. Reasoning models (o1/o3/Extended Thinking) introduce "deep thinking" — isn't that an artificial information bottleneck?

Chain-of-thought (CoT) reasoning is a compute-time constraint on reasoning, not a capacity constraint on representation. The model still retains its full parameter space and all its knowledge; CoT lets it "think longer" but does not force it to discard information. An analogy: a librarian spending more time selecting books (CoT) versus a reader with only one shelf who is forced to curate their collection (the brain). These are different sources of "taste." Reasoning models improve search efficiency, but the Taste Schwartz observed is knowing the direction without needing to search: a representation-level capability, not a search-level one.

2. Scaling Laws show larger models consistently generalize better. If "constraint produces wisdom," why do more parameters make models stronger?

The two operate at different levels. Larger models generalize better on known tasks (text prediction, reasoning, QA), which is undisputed. But the Taste deficit Schwartz observed appeared precisely when the model was already extremely powerful (Opus 4.5), suggesting what scaling solves and the Taste problem are not on the same dimension. Taste is not "better generalization" but "judging which direction to pursue under uncertainty," a meta-level directional intuition, not object-level task performance.

3. Isn't model distillation "forced compression"?

Distillation is indeed rate-distortion compression, but with a key difference: its optimization target is "preserve the teacher model's output distribution as faithfully as possible," i.e., fidelity. The brain's compression has no "fidelity" target; it only has indirect fitness feedback. Precisely because there's no fidelity constraint, the brain's compression produces highly subjective, ecologically-tuned heuristics. Distillation produces "a smaller generalist"; the brain produces "a biased specialist," and the latter is Taste.
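The fidelity target can be made concrete with a minimal sketch of a soft-target distillation loss: the KL divergence from the teacher's output distribution to the student's. This is one common formulation, assumed here for illustration. Practical recipes often add a temperature-squared scaling and a hard-label term, both omitted below.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same outcomes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """How far the student's output distribution is from the teacher's."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return kl_divergence(p_teacher, p_student)

faithful = distillation_loss([2.0, 1.0, 0.0], [2.0, 1.0, 0.0])
divergent = distillation_loss([2.0, 1.0, 0.0], [0.0, 1.0, 2.0])
print(faithful, divergent)
```

The loss is zero only when the student reproduces the teacher exactly, so any "opinionated" departure from the teacher is penalized by construction. That is the contrast with fitness-driven compression drawn above.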

References

Sources

Core Sources

Schwartz, M.D. "Vibe Physics: The AI Grad Student." Anthropic Research, 2026-03-23
Tong, J. et al. "AI Can Learn Scientific Taste." arXiv:2603.14473, 2026-03 (Fudan / OpenMOSS). Trains both a Scientific Judge (connoisseurship) and a Scientific Thinker (actively proposing high-impact ideas).
Pan, L. et al. "Large Language Models Think Too Fast To Explore Effectively." arXiv:2501.18009, 2025 (Georgia Tech)
Spivack, N. "The Horse Has No Rider: Why Autonomous AI Science Gets It Wrong." 2026-03-23
Ding, A.W. & Li, S. "Generative AI lacks the human creativity to achieve scientific discovery from scratch." Scientific Reports (Nature Portfolio), 2025-03

Cognitive Science Theory

Gigerenzer, G. & Brighton, H. "Homo Heuristicus: Why Biased Minds Make Better Inferences." Topics in Cognitive Science, 2009 (Max Planck Institute)
Bröder, A. Commentary and reanalysis of "Homo Heuristicus," Topics in Cognitive Science, 2010 (a representative challenge to heuristic generality)
Lieder, F. & Griffiths, T.L. "Resource-rational analysis: Understanding human cognition as the optimal use of limited computational resources." Behavioral and Brain Sciences, 43, e1, 2020
Turner, C.R. & Arumugam, D. "Trade-Offs Between Tasks Induced by Capacity Constraints Bound the Scope of Intelligence." Princeton, 2025
Webb, T.W., Frankland, S.M. & Cohen, J.D. "The Relational Bottleneck as an Inductive Bias for Efficient Abstraction." Trends in Cognitive Sciences, 28(9):829–843, 2024 (review; primary results in Webb et al. 2020, Kerg et al. 2022, Altabaa et al. 2023)
Friston, K. "The free-energy principle: a unified brain theory?" Nature Reviews Neuroscience, 2010 (UCL)
Williams, D. and others on the "unfalsifiable / circular" critique of the Free Energy Principle (2018; see also Colombo & Wright 2021)
Bartol, T.M. et al. "Nanoconnectomic upper bound on the variability of synaptic plasticity." eLife, 2015 (Salk Institute)
Cowan, N. "The magical number 4 in short-term memory." Behavioral and Brain Sciences, 24, 87-114, 2001
Zheng, J. & Meister, M. "The Unbearable Slowness of Being." Neuron, 2024 (Caltech)

Anthropic Research

Huang, S. et al. "How AI Is Transforming Work at Anthropic." Anthropic Research, 2025-12-02
Hitzig, Z., Massenkoff, M., Lyubich, E., Heller, R. & McCrory, P. "Agentic Coding and Persistent Returns to Expertise." Anthropic Economic Research, 2026-06-16 (~235K people / ~400K Claude Code sessions)
Shen, J.H. & Tamkin, A. "How AI Assistance Impacts the Formation of Coding Skills." arXiv:2601.20245, 2026
Swanson, K. et al. "Anthropic Education Report: The AI Fluency Index." Anthropic Research, 2026-02-23

Other

Delétang, G. et al. "Language Modeling Is Compression." DeepMind, 2023 (ICLR 2024)
Shannon, C.E. "Prediction and Entropy of Printed English." Bell System Technical Journal, 1951