Milestones to reach AGI

August 2, 2025

François Chollet is a widely recognized figure in the field of artificial intelligence, not only for creating Keras, one of the most widely used deep learning libraries in the world, but also for challenging the mainstream trajectory of AI research. While much of the field celebrates scaling laws and massive models as the path forward, Chollet has consistently pushed back, arguing that this approach represents a fundamental misunderstanding of what intelligence really is. His work goes beyond engineering; it is a call to rethink the foundations of the discipline and to anchor progress in a scientifically rigorous and philosophically coherent definition of intelligence. For Chollet, the problem is not that current models are powerful—they are—but that they are powerful in narrow, brittle ways that fail to address the real challenge of generalization and abstraction.

From his perspective, the current obsession with large language models and multimodal architectures reflects a dangerous illusion: that bigger datasets and more parameters will inevitably lead us to human-level general intelligence. These systems achieve state-of-the-art performance because they have absorbed nearly all available human-generated text and optimized over billions of gradient steps, not because they can autonomously reason about novel situations. Their competence is statistical, not conceptual; they interpolate within patterns rather than extrapolate beyond them. For Chollet, this is the crux of the issue. General intelligence cannot emerge from memorization, no matter how vast the dataset. True intelligence requires the ability to operate in open-ended domains, to form abstractions that compress experience into rules, and to apply these rules flexibly in entirely new contexts. Scaling brute force is a dead end because it sidesteps these requirements.

To provide a solid theoretical grounding for his critique, Chollet proposed one of the most precise formal definitions of intelligence to date: “Intelligence is a measure of skill acquisition efficiency over a scope of tasks, relative to priors, experience, and generalization difficulty.” This definition reframes the conversation by emphasizing efficiency, adaptability, and scope rather than raw performance. Intelligence, under this lens, is not the sum of skills a system possesses but the process that generates those skills efficiently. Humans are not born knowing language or algebra; they are born with the ability to learn these skills rapidly under constraints. By contrast, today’s AI systems require millions of labeled examples and enormous compute budgets to approximate abilities that humans learn from a handful of demonstrations. Chollet’s definition exposes this gap and shows why current evaluation metrics are inadequate for measuring real progress toward AGI.
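The spirit of this definition can be sketched as a toy scoring function. This is an illustrative simplification, not Chollet's actual formalism (his paper grounds the definition in algorithmic information theory); the function name and weighting scheme here are assumptions made for clarity.

```python
# Illustrative toy metric (NOT Chollet's exact formalism): score skill
# acquisition relative to the priors and experience consumed, weighted
# by how hard it is to generalize from training tasks to test tasks.

def acquisition_efficiency(skill, priors, experience, gen_difficulty):
    """Higher = more intelligent behavior under this toy definition.

    skill          -- performance attained over the task scope (0..1)
    priors         -- amount of built-in knowledge supplied (>= 0)
    experience     -- amount of training data / practice consumed (> 0)
    gen_difficulty -- how far test tasks lie from training tasks (>= 1)
    """
    cost = priors + experience
    return (skill * gen_difficulty) / cost

# A data-frugal learner that generalizes beats a data-hungry memorizer,
# even when both end up at the same final skill level.
frugal = acquisition_efficiency(skill=0.9, priors=1.0,
                                experience=10.0, gen_difficulty=2.0)
hungry = acquisition_efficiency(skill=0.9, priors=1.0,
                                experience=1_000_000.0, gen_difficulty=1.0)
assert frugal > hungry
```

The point of the toy is that the denominator matters: equal final skill at vastly different cost yields vastly different intelligence under this lens.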

To operationalize this vision, Chollet introduced the Abstraction and Reasoning Corpus (ARC), a benchmark explicitly designed to test the ability to generalize to unseen tasks using minimal examples. Unlike conventional benchmarks that can be conquered through memorization or pretraining on massive datasets, ARC presents problems drawn from a combinatorial design space so vast that no dataset can cover it. Solving ARC requires discovering abstract structural rules—such as symmetry, color grouping, or object persistence—from just three to five demonstrations and applying them to new cases. These are the very cognitive moves humans make instinctively. Yet, despite years of progress in deep learning, ARC remains a steep challenge for AI systems: humans routinely score above 95%, while cutting-edge models languish below 40%. This persistent gap is not an accident; it reveals what Chollet considers the true bottleneck for AGI—our failure to build systems capable of abstraction, compositionality, and causal reasoning.
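The flavor of an ARC task can be sketched in a few lines. The format below is simplified (real ARC tasks are JSON grids of colors 0-9), and the hand-coded candidate rules are exactly the shortcut Chollet argues AGI cannot take: here the hypothesis space is given, whereas ARC's point is that a system must invent it.

```python
# Minimal ARC-style few-shot rule induction (format simplified; rule set
# hand-coded for illustration only).

def mirror_h(grid):            # reflect each row left-to-right
    return [row[::-1] for row in grid]

def transpose(grid):           # swap rows and columns
    return [list(col) for col in zip(*grid)]

def flip_v(grid):              # reverse the row order
    return grid[::-1]

CANDIDATE_RULES = [mirror_h, transpose, flip_v]

def induce_rule(demos):
    """Return the first candidate rule consistent with all demonstrations."""
    for rule in CANDIDATE_RULES:
        if all(rule(inp) == out for inp, out in demos):
            return rule
    return None

# Two demonstrations suffice to pin down the rule...
demos = [
    ([[1, 2], [3, 4]], [[2, 1], [4, 3]]),
    ([[5, 0], [0, 5]], [[0, 5], [5, 0]]),
]
rule = induce_rule(demos)
assert rule is mirror_h
# ...which then transfers to an unseen input.
assert rule([[7, 8, 9]]) == [[9, 8, 7]]
```

Humans perform this induce-then-apply move instinctively; the open problem is building systems that construct the rule space themselves rather than enumerating one we wrote for them.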

Based on these insights, Chollet has laid out a roadmap of milestones that must be achieved for human-level general intelligence to become attainable. First, research priorities must shift away from static benchmarks and brute-force scaling toward dynamic evaluations of generalization. Systems should be measured by their ability to learn quickly, reason flexibly, and adapt autonomously, not by their ability to memorize ever-larger datasets. Second, AI architectures must incorporate mechanisms for autonomous abstraction formation—the ability to synthesize new concepts from raw observations without explicit programming. Third, compositional reasoning must be a core design principle: intelligent systems should build complex solutions by recombining simpler elements, mimicking the combinatorial creativity of human thought. Without these capabilities, models will remain trapped in statistical mimicry, unable to transcend the confines of their training distributions.
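The third milestone, compositional reasoning, can be illustrated as program search: solving a task no single primitive handles by composing simpler ones. The primitives below are hand-written assumptions; Chollet's claim is that an AGI would have to invent such primitives itself.

```python
# Sketch of compositional reasoning as search over programs built from
# simpler primitives (primitives are illustrative assumptions).

from itertools import product

def mirror_h(g):  return [row[::-1] for row in g]
def flip_v(g):    return g[::-1]
def transpose(g): return [list(c) for c in zip(*g)]

PRIMITIVES = [mirror_h, flip_v, transpose]

def search_program(demos, max_depth=2):
    """Find a composition of primitives consistent with all demos."""
    for depth in range(1, max_depth + 1):
        for steps in product(PRIMITIVES, repeat=depth):
            def run(g, steps=steps):
                for f in steps:
                    g = f(g)
                return g
            if all(run(inp) == out for inp, out in demos):
                return steps
    return None

# Target: a 180-degree rotation. No single primitive does it, but the
# composition mirror_h -> flip_v does.
demos = [([[1, 2], [3, 4]], [[4, 3], [2, 1]])]
program = search_program(demos)
assert program == (mirror_h, flip_v)
```

Combinatorial creativity, in this toy form, is just the ability to reach solutions outside the primitive set by recombination; the hard research problem is doing this when the primitive set itself must be learned.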

Equally important, Chollet insists that efficiency—not scale—defines intelligence. Current models consume terabytes of text and enormous compute budgets to achieve competence in language tasks, whereas humans achieve comparable mastery of natural language through a few years of sparse experience. The path to AGI requires architectures that are data-frugal, compute-efficient, and energy-conscious, reflecting the astonishing economy of the human brain. This efficiency imperative extends beyond resources to knowledge representation: systems must encode experience in modular, reusable abstractions rather than sprawling, entangled weight matrices. Without compact and transferable internal structures, adaptation will remain prohibitively expensive, making lifelong learning impossible.

Another pillar of Chollet’s vision is structural cognition: the integration of neural and symbolic paradigms. Pure pattern-matching systems, no matter how large, lack the algorithmic scaffolding required for systematic reasoning and causal inference. By combining the perceptual strengths of neural networks with the structured logic of symbolic systems, we can create architectures capable of both recognizing patterns and manipulating rules. This hybrid approach, coupled with meta-learning, would enable systems to reflect on their own strategies, improving their learning processes over time. For Chollet, meta-learning is the real lever of intelligence, because it transforms experience into acceleration: each solved problem makes the next one easier, closing the loop toward autonomous self-improvement.
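The idea that "experience becomes acceleration" can be sketched as a symbolic rule search guided by learned statistics: a crude stand-in for a neural guide. Everything here (class name, primitives, the success-count heuristic) is an illustrative assumption, not a real meta-learning system.

```python
# Sketch of meta-learning as acceleration: rules that succeeded on past
# tasks are tried first, so search cost drops as experience accumulates.

from collections import Counter

def mirror_h(g):  return [row[::-1] for row in g]
def flip_v(g):    return g[::-1]
def transpose(g): return [list(c) for c in zip(*g)]

class GuidedSolver:
    def __init__(self):
        self.rules = [mirror_h, flip_v, transpose]
        self.wins = Counter()   # meta-knowledge: which rules tend to work

    def solve(self, demos):
        tried = 0
        # The meta-learning step: order hypotheses by past success.
        for rule in sorted(self.rules, key=lambda r: -self.wins[r.__name__]):
            tried += 1
            if all(rule(inp) == out for inp, out in demos):
                self.wins[rule.__name__] += 1
                return rule, tried
        return None, tried

solver = GuidedSolver()
task = [([[1, 2], [3, 4]], [[3, 4], [1, 2]])]     # a flip_v task
_, cost_first = solver.solve(task)
_, cost_second = solver.solve(task)               # a later task of the same family
assert cost_second <= cost_first                  # experience made search cheaper
```

The loop Chollet describes is exactly this shape, scaled up: each solved problem reshapes the search over strategies for the next one.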

Finally, Chollet argues that intelligence is purposive. It is not enough for systems to respond passively to prompts; they must demonstrate agency—the ability to generate goals, prioritize them under constraints, and navigate trade-offs under uncertainty. Creativity emerges from this agency: the capacity to produce novel, useful, and contextually appropriate solutions beyond memorized patterns. Risk-aware decision-making, dynamic goal management, and autonomous planning are not peripheral features but core requirements for any system aspiring to human-level generality. In this sense, the milestones Chollet envisions are not incremental extensions of today’s deep learning—they demand a paradigm shift: from static pattern recognition to dynamic, adaptive, and self-directed intelligence; from monolithic architectures to modular hybrids; from brute-force scaling to elegant, resource-efficient design.

Chollet’s roadmap is both sobering and inspiring. It rejects the seductive simplicity of “just make it bigger” and calls for deeper questions: How do we formalize abstraction? How do we measure generalization fairly? How do we architect systems that learn as efficiently as humans? These questions define not only a technical challenge but a philosophical stance: that intelligence is a process, not a dataset; a system’s capacity to continually invent solutions, not its ability to replay them. For Chollet, the journey to AGI is not about adding layers but about building systems that learn how to learn, reason about their reasoning, and evolve their own capabilities with minimal supervision. Until we meet these milestones, what we call “intelligence” will remain an illusion painted by scale.

Summary

1. Fundamental Shift in AI Research Focus

Core Idea: Stop chasing benchmark scores and scaling models; start prioritizing generalization and skill acquisition efficiency.


2. Autonomous Abstraction & Concept Formation

Core Idea: Intelligence = ability to create and manipulate abstractions autonomously.


3. Data & Resource Efficiency

Core Idea: Intelligence = doing more with less.


4. Symbolic-Neural Hybrid & Structural Design

Core Idea: Pure deep learning won’t reach AGI; structured cognition is essential.


5. Autonomous Meta-Learning & Self-Improvement

Core Idea: AGI systems must learn how to learn—and do it autonomously.


6. Agency, Goal-Directedness & Creativity

Core Idea: Intelligence is purposive, not reactive.


The Milestones in Detail

Group 1: Fundamental Shift in AI Research Focus

Group Definition and Context

Chollet repeatedly stresses that the AI community’s current trajectory—dominated by scaling up deep learning and pursuing benchmark scores—is insufficient and misaligned with achieving AGI. Current systems excel at narrow tasks and pattern recognition but fail catastrophically at generalization and abstraction. This failure is central to his ARC work and his critique of benchmark-driven progress.

He argues that true intelligence is not about task-specific mastery but about the efficiency of acquiring new skills across a wide variety of novel tasks under resource constraints. To achieve this, he calls for a paradigm shift in research priorities away from “brute-force scaling” toward generalization-centric, resource-efficient, and autonomy-driven AI research.


Observation 1: Prioritize Generalization Over Narrow Skills

Definition

Redirect AI research from achieving high performance on narrow, well-defined benchmarks to building systems that generalize to new, unseen tasks, requiring minimal retraining and leveraging abstract reasoning.

Logic (Chollet’s Argument)

Implementation / Measurement

Current AGI Status


Observation 2: Abandon Pure Scaling and Memorization Approaches

Definition

Stop relying on brute-force data and compute scaling as the primary strategy for progress toward AGI.

Logic

Implementation / Measurement

Current AGI Status


Observation 3: Explicitly Measure Intelligence Relative to Priors

Definition

Design benchmarks that explicitly account for priors (innate knowledge) used by the system, ensuring fair comparisons across architectures and aligning with human cognition.

Logic

Implementation / Measurement

Current AGI Status


Observation 4: Pursue Open-Ended Intelligence Challenges

Definition

Develop benchmarks and frameworks that simulate open-ended problem spaces, forcing AI systems to tackle genuinely novel and diverse tasks that cannot be memorized or brute-forced.

Logic

Implementation / Measurement

Current AGI Status


Summary for Group 1:

Chollet insists that achieving AGI requires a foundational paradigm shift: away from narrow benchmarks and brute-force scaling, and toward generalization-centric evaluation, prior-aware measurement of intelligence, and genuinely open-ended challenges.

Bottom line: Stop measuring “who memorizes better”; start measuring who learns faster, reasons deeper, and adapts more flexibly with minimal priors and resources.


Group 2: Autonomous Abstraction & Concept Formation

Group Definition and Context

Chollet repeatedly emphasizes that abstraction is the engine of intelligence. While current AI models excel at pattern recognition, they fundamentally fail at creating new abstractions. This gap explains why models like GPT-4 can mimic reasoning patterns in text but collapse when facing tasks requiring genuine conceptual synthesis (e.g., ARC puzzles).
To achieve AGI, systems must autonomously discover representations, form abstractions, and recombine them compositionally across domains.


Observation 5: Develop Autonomous Abstraction Capabilities

Definition

Build systems capable of autonomously generating abstract rules or concepts from raw observations, without explicit human-coded templates or brute-force memorization.

Logic (Chollet’s Argument)

Implementation / Measurement

Current AGI Status


Observation 6: Emphasize Explicit Compositional Reasoning

Definition

Enable systems to compose new ideas or solutions by combining simpler concepts already known, producing novel but structured outputs.

Logic

Implementation / Measurement

Current AGI Status


Observation 7: Explicitly Improve Representation Learning

Definition

Equip AI systems to autonomously build efficient, interpretable internal representations that capture structure and enable reasoning.

Logic

Implementation / Measurement

Current AGI Status


Observation 8: Implement Explicit Hierarchical Reasoning

Definition

Design systems to reason across multiple abstraction layers, decomposing complex tasks into smaller steps and integrating sub-solutions.

Logic

Implementation / Measurement

Current AGI Status


Summary for Group 2

AGI demands abstraction as its foundation. Chollet insists that without autonomous abstraction formation, explicit compositionality, strong internal representation learning, and hierarchical reasoning, scaling will hit a wall. Current AI systems remain pattern matchers, interpolating within their training distributions rather than inventing and recombining new concepts.

Bottom line: To reach AGI, we must replace “brute-force pattern fitting” with structured, self-directed concept formation and combinatorial reasoning architectures.


Group 3: Data & Resource Efficiency

Group Definition and Context

Chollet explicitly defines intelligence as skill-acquisition efficiency, which inherently involves minimizing the cost of learning and problem-solving in terms of data, compute, energy, and memory.
Current large-scale AI systems achieve impressive results, but their approach—brute-force scaling—contradicts efficiency principles. GPT-4’s massive training regime (trillions of tokens, gigawatt-hours of energy) is an example of what Chollet argues is a dead end for achieving AGI.

To reach AGI, research must pivot from bigger models to smarter algorithms, emphasizing architectures that learn fast, reason with little data, and use resources optimally.


Observation 9: Explicitly Prioritize Data Efficiency

Definition

Build systems that can learn robust abstractions and generalize to unseen tasks using minimal examples—as humans do.
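One way to operationalize this is to report how many examples a learner needs to reach a target accuracy, rather than its final accuracy. The protocol and curves below are illustrative assumptions, not an established benchmark.

```python
# Illustrative data-efficiency metric: examples needed to reach a
# target accuracy, instead of accuracy at a fixed (huge) dataset size.

def examples_to_threshold(learning_curve, threshold=0.9):
    """learning_curve: list of (n_examples, accuracy) sorted by n_examples.
    Returns the smallest n reaching the threshold, or None if never reached."""
    for n, acc in learning_curve:
        if acc >= threshold:
            return n
    return None

# Hypothetical curves: a sample-efficient learner vs. a data-hungry one.
frugal_curve = [(1, 0.40), (3, 0.75), (5, 0.92), (10, 0.95)]
hungry_curve = [(10, 0.30), (1_000, 0.60), (100_000, 0.91)]

assert examples_to_threshold(frugal_curve) == 5
assert examples_to_threshold(hungry_curve) == 100_000
```

Under a leaderboard built on this metric, the two learners above differ by four orders of magnitude, even though their final accuracies look nearly identical.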

Logic (Chollet’s View)

Implementation / Measurement

Current AGI Status


Observation 10: Develop Explicit Computational Efficiency

Definition

Enable models to reason and learn using minimal compute, rather than relying on massive parameter counts and training steps.

Logic

Implementation / Measurement

Current AGI Status


Observation 11: Explicitly Reduce Energy Consumption

Definition

Design AI systems that minimize energy per inference and per training epoch, approximating the energy efficiency of biological systems.

Logic

Implementation / Measurement

Current AGI Status


Observation 12: Explicitly Optimize Memory Use

Definition

Ensure systems store and retrieve knowledge compactly, modularly, and with minimal redundancy.

Logic

Implementation / Measurement

Current AGI Status


Summary for Group 3

AGI cannot be brute-forced by throwing more compute, data, and energy at the problem.
Chollet insists progress depends on data efficiency, computational efficiency, energy economy, and compact, modular memory.

Bottom line: AGI must be elegant—a system that does more with less, like the human brain.


Group 4: Symbolic-Neural Hybrid & Structural Design

Group Definition and Context

Chollet argues that current deep learning models lack structured reasoning and operate almost entirely through pattern interpolation, which is insufficient for true generalization and abstraction.
He emphasizes that hybrid systems—combining symbolic reasoning with the representational power of neural networks—are essential for AGI. Why? Because human intelligence relies on both fluid perceptual pattern recognition and structured, rule-based manipulation of abstractions.


Observation 13: Explicitly Combine Symbolic and Neural Models

Definition

Develop architectures that integrate the statistical strength of neural networks with the structured, rule-based reasoning of symbolic systems.

Logic (Chollet’s Argument)

Implementation / Measurement

Current AGI Status


Observation 14: Incorporate Symbolic Meta-learning

Definition

Enable systems to adapt their learning strategies by dynamically building symbolic abstractions about tasks and reasoning paths.

Logic

Implementation / Measurement

Current AGI Status


Observation 15: Design Modular Architectures for Reuse

Definition

Build architectures where reasoning and perception modules are separable and reusable across domains, enabling flexible recombination of learned skills.
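The separability this observation calls for can be sketched with plain classes: one reasoning module recombined with different perception modules, so a skill learned once transfers across input domains. All names here are illustrative assumptions.

```python
# Sketch of module reuse: the same domain-agnostic reasoning module is
# recombined with different perception front-ends.

class SortReasoner:
    """Domain-agnostic reasoning module: operates on abstract numbers."""
    def rank(self, values):
        return sorted(values)

class TextPerception:
    """Perception module for one domain: digit strings -> numbers."""
    def encode(self, raw):
        return [int(tok) for tok in raw.split()]

class ListPerception:
    """Perception module for another domain: numeric sequences."""
    def encode(self, raw):
        return list(raw)

def pipeline(perception, reasoner, raw):
    return reasoner.rank(perception.encode(raw))

reasoner = SortReasoner()                                 # "learned" once...
assert pipeline(TextPerception(), reasoner, "3 1 2") == [1, 2, 3]
assert pipeline(ListPerception(), reasoner, (9, 4, 7)) == [4, 7, 9]  # ...reused as-is
```

The design choice is the interface: because the reasoner only sees abstract values, swapping the perception module costs nothing, which is the monolithic-network failure mode this milestone targets.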

Logic

Implementation / Measurement

Current AGI Status


Observation 16: Integrate Symbolic Reasoning for Causality

Definition

Equip systems with explicit causal reasoning engines, moving beyond statistical correlation to genuine cause-effect understanding.

Logic

Implementation / Measurement

Current AGI Status


Summary for Group 4

AGI cannot emerge from monolithic pattern-matching models. Chollet prescribes structural intelligence built on symbolic-neural integration, symbolic meta-learning, modular and reusable components, and explicit causal reasoning.

Bottom line: Brains are structured, modular, causal; AGI systems must be too.


Group 5: Autonomous Meta-Learning & Self-Improvement

Group Definition and Context

Chollet argues that intelligence is not just the ability to learn but the ability to improve learning itself—what he calls skill-acquisition efficiency. Humans excel because they learn how to learn: every new experience refines our meta-strategies, enabling faster, more general adaptation in the future.

Current AI systems lack this capability. They “learn” through static optimization on massive datasets, then freeze their parameters. Any update means costly retraining, not autonomous refinement. For AGI, systems must learn continuously, refine their own learning strategies, and detect and correct their own errors without external supervision.


Observation 17: Explicitly Implement Autonomous Meta-Learning

Definition

Enable systems to improve their own learning processes autonomously over time, using experience from diverse tasks to generalize faster in new ones.

Logic (Chollet’s View)

Implementation / Measurement

Current AGI Status


Observation 18: Prioritize Self-Reflection and Introspection

Definition

Develop AI that can evaluate its own knowledge gaps, reasoning errors, and confidence levels, enabling corrective action autonomously.

Logic

Implementation / Measurement

Current AGI Status


Observation 19: Create Explicitly Adaptive Systems

Definition

Systems must adapt incrementally to new tasks without catastrophic forgetting or full retraining, maintaining performance across old and new domains.
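The contrast between overwriting and accumulating knowledge can be shown with a deliberately tiny toy. This is not a real continual-learning algorithm (those operate on shared weights, where the problem is far harder); it only illustrates what "catastrophic forgetting" means behaviorally.

```python
# Toy contrast: a learner that overwrites its single rule forgets old
# tasks; one that stores task-indexed rules keeps old skills intact.

class OverwritingLearner:
    def learn(self, task, rule):
        self.rule = rule                 # clobbers whatever was there

    def solve(self, task, x):
        return self.rule(x)

class ModularLearner:
    def __init__(self):
        self.rules = {}

    def learn(self, task, rule):
        self.rules[task] = rule          # old rules remain untouched

    def solve(self, task, x):
        return self.rules[task](x)

double = lambda x: 2 * x
negate = lambda x: -x

naive, modular = OverwritingLearner(), ModularLearner()
for learner in (naive, modular):
    learner.learn("double", double)
    learner.learn("negate", negate)      # a new task, learned later

assert modular.solve("double", 3) == 6   # old skill preserved
assert naive.solve("double", 3) == -3    # old skill catastrophically lost
```

Real networks sit between these extremes: new gradients partially overwrite old knowledge, which is why incremental adaptation without forgetting remains open.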

Logic

Implementation / Measurement

Current AGI Status


Observation 20: Foster Autonomous Error Detection and Correction

Definition

Equip systems with mechanisms to identify mistakes, infer causes, and generate self-corrections—without external re-labeling or retraining.

Logic

Implementation / Measurement

Current AGI Status


Summary for Group 5

AGI demands systems that meta-learn autonomously, introspect on their own reasoning, adapt incrementally without catastrophic forgetting, and detect and correct their own errors.

Bottom line: True general intelligence is self-improving, not frozen at training time.
Chollet: “As long as learning is static, there is no intelligence—only a database of patterns.”


Group 6: Agency, Goal-Directedness & Creativity

Group Definition and Context

Chollet argues that intelligence is inherently active and purposive. Humans don’t just react—they set goals, plan, adapt strategies, and create novel solutions.
Current AI systems lack genuine agency: they execute externally specified tasks without self-generated objectives or adaptive decision-making. Chollet emphasizes that to reach AGI, systems must generate and manage their own goals, create genuinely novel solutions, and make risk-aware decisions under uncertainty.


Observation 21: Explicitly Foster Autonomous Goal Setting

Definition

Develop systems capable of generating, prioritizing, and modifying goals without explicit external commands, aligned with high-level objectives.

Logic (Chollet’s Argument)

Implementation / Measurement

Current AGI Status


Observation 22: Train Systems for Flexible Goal Management

Definition

Enable AI to reprioritize objectives dynamically, managing multiple goals under evolving constraints.

Logic

Implementation / Measurement

Current AGI Status


Observation 23: Prioritize Explicit Creativity and Innovation

Definition

Equip AI to produce genuinely novel, useful, and context-appropriate ideas, not just recombinations of memorized patterns.

Logic

Implementation / Measurement

Current AGI Status


Observation 24: Teach Systems Risk-Aware Decision Making

Definition

Develop systems capable of balancing exploration and exploitation, reasoning under uncertainty, and evaluating trade-offs between risks and rewards.
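One standard way to make the risk/reward trade-off explicit is a mean-variance criterion: score each option by its expected payoff minus a penalty on outcome spread. This is a generic decision-theory device, not a construction specific to Chollet.

```python
# Risk-aware choice via a mean-variance criterion (illustrative).

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def risk_adjusted_value(outcomes, risk_aversion):
    # Expected reward, penalized by how uncertain the outcome is.
    return mean(outcomes) - risk_aversion * variance(outcomes)

def choose(options, risk_aversion):
    """options: dict mapping option name -> list of possible outcomes."""
    return max(options, key=lambda o: risk_adjusted_value(options[o], risk_aversion))

options = {
    "safe":  [1.0, 1.1, 0.9],      # modest, reliable payoff
    "risky": [6.0, -3.0, 3.0],     # higher mean, huge spread
}
assert choose(options, risk_aversion=0.0) == "risky"   # risk-neutral agent
assert choose(options, risk_aversion=1.0) == "safe"    # risk-averse agent
```

The single `risk_aversion` parameter is where goal management meets uncertainty: the same agent can dial it up or down as stakes and constraints change.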

Logic

Implementation / Measurement

Current AGI Status


Summary for Group 6

To achieve AGI, systems must go beyond passive pattern completion. They must set and reprioritize goals autonomously, innovate beyond memorized patterns, and weigh risk against reward under uncertainty.

Bottom line: True general intelligence = active, purposive, and creative.
Chollet: “Without agency, adaptation and creativity, intelligence is an illusion.”