
August 4, 2025
In recent years, rapid advancements in artificial intelligence have prompted researchers to revisit fundamental questions about the nature and structure of intelligence itself. Despite extraordinary achievements, modern AI systems remain surprisingly brittle, excelling within narrow domains yet faltering dramatically when faced with genuinely novel situations. This reality underscores a crucial distinction: while specialized skill and intelligence are often conflated, they are fundamentally different constructs. Intelligence, in essence, refers to the capacity for efficient adaptation, abstraction, and flexible problem-solving across diverse and unforeseen circumstances, rather than the mere accumulation of static skills or memorized knowledge.
General intelligence (GI), therefore, involves more than just mastering predefined tasks—it encompasses the broader cognitive competencies that allow an agent, human or artificial, to consistently adapt and thrive in dynamic environments. Central to this adaptability is the ability to form abstractions: the cognitive capability to distill complex, high-dimensional information into simplified, reusable mental models. Abstraction not only facilitates efficient generalization but also underpins a system's ability to rapidly acquire and recombine skills in creative ways. Without this critical foundation, intelligence remains rigid and limited, bound by the constraints of prior experiences and unable to break free from learned patterns.
Understanding the capacities required for general intelligence thus necessitates an exploration of core cognitive faculties. Researchers, notably François Chollet, have argued convincingly that genuine intelligence is best characterized by skill-acquisition efficiency—how quickly and effectively an entity can learn to solve new problems given minimal prior exposure or data. This conception moves the goalposts for AI from performing well on extensively trained tasks to developing an intrinsic capability to grasp entirely new problems swiftly, employing minimal built-in cognitive priors such as basic notions of causality, numerosity, object permanence, and agentness.
In tandem with these foundational cognitive capabilities, general intelligence also demands substantial capacities for reasoning and hierarchical problem-solving. Complex real-world scenarios typically involve layers of interconnected sub-tasks and abstractions, demanding systematic hierarchical decomposition and compositional reasoning. Humans intuitively solve problems by dividing them into manageable pieces, leveraging prior knowledge to identify suitable abstractions, and flexibly recombining those abstractions in novel ways. Replicating this hierarchical and compositional reasoning in artificial systems remains a profound yet critical challenge on the path toward achieving GI.
Furthermore, general intelligence necessitates exceptional flexibility in managing uncertainty, ambiguity, and novelty—hallmarks of authentic real-world environments. True intelligence cannot rely on extensive training sets or exhaustive exposure to every potential scenario; it must robustly handle distributional shifts, incomplete information, and dynamic task requirements. Systems that fail under slight perturbations or minor variations, as contemporary deep learning models frequently do, illustrate clearly that current approaches lack the crucial resilience and adaptability required for genuinely generalizable intelligence.
Efficiency in resource utilization—computational, data-based, and representational—also emerges as a vital characteristic of genuinely intelligent systems. Human cognition demonstrates remarkable information efficiency, capable of mastering tasks quickly from sparse examples, often performing feats of abstraction and reasoning with minimal cognitive load. Replicating such efficiency in AI systems would mean moving beyond brute-force scaling of computational resources and toward approaches that emphasize compact, optimized representations, efficient memory usage, and economical learning strategies. Achieving intelligence thus involves not merely performing effectively but doing so within resource constraints comparable to human cognition.
Lastly, autonomy and the ability to self-modify define perhaps the most profound capacities underlying general intelligence. True GI systems must possess the capability for self-reflection, introspection, and continuous, incremental improvement. They must autonomously identify gaps in their own knowledge, detect and correct errors independently, and dynamically adapt their internal representations and reasoning methods without external reprogramming. Such autonomous self-improvement would mark a transformative shift in AI research, enabling artificial systems to progressively approach, and perhaps even surpass, the generality and flexibility of human intelligence. Exploring these critical dimensions not only clarifies the boundaries of our current achievements but also charts a meaningful path toward genuine artificial general intelligence.
Core cognitive capacities are foundational mental abilities necessary for intelligent reasoning, abstraction, and adaptation, independent of extensive experience.
Skill Acquisition Efficiency: Intelligence as the speed and effectiveness of acquiring new skills from limited experience.
Generalization Ability: Handling novel situations beyond prior experiences.
Abstraction Formation: Distilling complex data into simpler reusable patterns without explicit guidance.
Adaptability: Dynamically adjusting behaviors without external retraining.
Core Knowledge Utilization (Innate Priors): Leveraging innate minimal priors (e.g., causality, numerosity) to reason effectively.
Hierarchical Reasoning: Decomposing problems systematically across multiple abstraction levels.
Meta-Learning (Learning to Learn): Improving the skill acquisition process through experience.
Compositionality: Combining simpler learned concepts into novel complex solutions.
Causal Reasoning: Reasoning explicitly about cause-effect relationships.
Commonsense Understanding: Intuitive understanding of everyday physical and social dynamics.
Generalization & flexibility represent the capacity to handle new scenarios effectively, especially under uncertainty or novel circumstances.
Open-Ended Generalization: Robustly handling entirely new tasks without prior guidance.
Robustness to Distribution Shifts: Maintaining performance under significant changes from training conditions.
Cross-Domain Transfer: Applying learned knowledge effectively across fundamentally different domains.
Zero-Shot and Few-Shot Learning: Quickly performing new tasks with minimal examples.
Handling Uncertainty and Noise: Reasoning effectively despite incomplete or ambiguous information.
Continuous Adaptation: Incrementally updating knowledge without losing previously learned skills.
Flexible Goal Management: Dynamically adjusting and prioritizing goals based on changing conditions.
Novelty Detection: Explicitly recognizing and responding appropriately to novel inputs.
Information efficiency & resource optimization emphasizes the intelligent system's efficient use of computational, energy, data, memory, and representational resources.
Computational Efficiency: Achieving effective reasoning with minimal computational resources.
Energy Efficiency: Minimizing physical energy consumption in task completion and reasoning.
Data Efficiency: Learning effectively from minimal data examples.
Memory Optimization: Efficiently storing and managing learned information to minimize redundancy.
Representation Learning: Discovering meaningful internal abstractions autonomously from raw data.
Risk-Aware Decision Making: Balancing exploration and exploitation under uncertainty, optimizing risk and reward.
Autonomy & self-modification covers a system’s capacity for independent management, enhancement, and optimization without external intervention.
Autonomous Skill Generation: Independently identifying and acquiring new skills.
Self-Reflection and Introspection: Evaluating and reasoning about internal processes and performance autonomously.
Self-Update & Adaptation: Dynamically modifying internal structures to enhance learning efficiency autonomously.
Goal-Directed Behavior: Autonomously setting and dynamically pursuing explicit objectives.
Error Detection and Correction: Recognizing and rectifying errors or misconceptions autonomously.
Creativity and Innovation: Generating novel, contextually appropriate solutions independently.
Group Definition:
Core cognitive capacities represent foundational mental capabilities required to process, understand, reason about, and interact intelligently with various environments. They form the fundamental "building blocks" upon which intelligent behaviors and generalization rest, encompassing the autonomous ability to form abstractions, adapt efficiently, and reason using innate or minimal cognitive priors.
Definition:
Skill acquisition efficiency is the ability of a system to quickly and effectively acquire new skills. It is defined explicitly by Chollet as the central metric of intelligence: the more efficiently a system can acquire new skills, especially given limited experience and priors, the higher its intelligence.
Logic:
Intelligence fundamentally isn’t about possessing fixed or static skills (e.g., memorized chess openings). Instead, it’s about the speed and ease of acquiring new skills. The core argument Chollet emphasizes is that intelligence is a "skill-acquisition process," analogous to a factory that produces skills rather than the skills themselves.
Measurement:
Efficiency of few-shot or zero-shot learning tasks.
Performance on tasks like ARC, explicitly designed to measure rapid acquisition of novel concepts from minimal examples.
Learning curves: how quickly performance improves over minimal experiences.
Current AGI Performance:
Currently moderate-to-low. Models like GPT-4 perform adequately on tasks they’ve indirectly encountered in training but struggle severely with genuinely novel tasks, requiring significant data or task-specific fine-tuning.
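To make the ARC-based measurement concrete: ARC tasks are published as JSON files containing small "train" and "test" lists of input/output grid pairs, where a grid is a list of lists of integers 0–9. The sketch below is a minimal evaluation harness over that format; the solve function is a placeholder for whatever skill-acquisition method is under test, and the file path is illustrative.

```python
import json

def solve(train_pairs, test_input):
    """Placeholder solver: infer a rule from the few train pairs and apply
    it to the test input. The identity transform stands in for a real method."""
    return test_input

def score_task(path):
    """Fraction of a task's test pairs solved exactly
    (ARC scoring is all-or-nothing per output grid)."""
    with open(path) as f:
        task = json.load(f)  # {"train": [{"input": ..., "output": ...}], "test": [...]}
    correct = sum(
        solve(task["train"], pair["input"]) == pair["output"]
        for pair in task["test"]
    )
    return correct / len(task["test"])

# Usage (illustrative path): print(score_task("data/training/some_task.json"))
```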
Definition:
Generalization is the capability of a system to extend its knowledge beyond its prior experiences to handle situations or problems it has never encountered.
Logic:
Generalization is crucial to intelligence as it captures the essence of dealing with novelty and uncertainty. Chollet strongly argues that true intelligence lies in the capacity to generalize from limited data, using abstraction to handle previously unseen scenarios.
Measurement:
ARC benchmark, specifically designed to measure generalization to completely novel tasks.
Tests involving significant distribution shifts or out-of-distribution generalization challenges.
Current AGI Performance:
Generally weak. Chollet repeatedly points out (notably in the ARC papers and talks) that current AI systems fail dramatically at ARC-style generalization, indicating they remain far from human-level general intelligence.
Definition:
Abstraction formation is the autonomous ability to distill complex, high-dimensional data into simpler, reusable concepts or patterns independently, without explicit guidance from external agents.
Logic:
Chollet describes abstraction as the "engine" behind generalization. True abstraction means independently discovering representations of complex scenarios that facilitate efficient reasoning and skill transfer to new problems. It is fundamentally tied to intelligence because it transforms knowledge into flexible conceptual structures.
Measurement:
ARC challenges that explicitly require identifying abstract rules from minimal examples.
Bongard problems, Raven’s Progressive Matrices, or analogous tests measuring abstract pattern recognition and reasoning.
Current AGI Performance:
Poor. Current AI systems, especially deep learning models (including GPT-4), generally rely on learned statistical patterns rather than genuine independent abstraction formation. They struggle notably with ARC and other abstract reasoning benchmarks.
Definition:
Adaptability is the ability of a system to adjust its behavior, knowledge representations, or problem-solving strategies dynamically, without retraining or re-engineering by an external human developer.
Logic:
Chollet emphasizes adaptability as a hallmark of intelligent systems because real-world environments constantly evolve. Intelligence demands real-time adjustments to maintain functionality without losing performance due to changed circumstances.
Measurement:
Performance under distribution shifts and dynamic, continually evolving tasks.
Continuous learning tasks (online learning), evaluating if AGIs can incrementally adapt without forgetting or catastrophic degradation.
Current AGI Performance:
Limited. AI systems today largely require additional data, retraining, or explicit re-programming to adapt effectively to genuinely new or altered conditions, a limitation clearly outlined in Chollet's analysis of current AI failures.
Definition:
Core knowledge utilization refers to effectively leveraging minimal, innate cognitive priors (such as objectness, causality, numerosity, geometry, and agentness) to facilitate intelligent reasoning and learning.
Logic:
Inspired explicitly by Elizabeth Spelke’s work on human cognitive priors (core knowledge systems), Chollet emphasizes that intelligence should be measured relative to minimal, explicitly defined priors. Human intelligence relies on basic built-in knowledge structures to bootstrap learning, providing efficiency and stability for cognitive development.
Measurement:
Tasks explicitly constructed around core knowledge priors (object permanence, basic causality tests, intuitive numerosity).
ARC tasks explicitly testing understanding of elementary geometry, topology, and numerical concepts without external training data.
Current AGI Performance:
Moderate but superficial. Current AI systems can implicitly leverage certain priors (e.g., object detection) but lack explicit reasoning about core knowledge. Chollet highlights ARC as demonstrating severe weaknesses in explicitly understanding and manipulating these minimal cognitive priors.
Definition:
Hierarchical reasoning refers to the ability to reason about problems through multiple interconnected layers or levels of abstraction, systematically decomposing complex problems into simpler sub-problems.
Logic:
Chollet emphasizes hierarchical reasoning as essential for intelligent behavior because complex real-world problems inherently involve multiple abstraction layers. Effective intelligence integrates these layers smoothly to simplify and tackle complex scenarios efficiently.
Measurement:
Multi-step reasoning benchmarks (e.g., complex ARC tasks or math problems requiring explicit decomposition into simpler sub-steps).
Tasks explicitly designed to test hierarchical decomposition (e.g., complex Raven’s matrices, analogy-making tasks).
Current AGI Performance:
Limited-to-moderate. GPT-4 and related systems can perform step-by-step reasoning through carefully structured prompting (chain-of-thought), yet autonomous hierarchical decomposition and systematic multi-level reasoning remain significant weaknesses, as repeatedly noted in Chollet’s critiques and ARC analyses.
Definition:
Meta-learning, or "learning to learn," is the ability of a system to improve its overall skill-acquisition efficiency based on experiences from previously encountered learning tasks, making future learning processes faster or more efficient.
Logic:
Chollet explicitly frames intelligence as the process of acquiring skills. Therefore, meta-learning—the improvement of that very process—represents a fundamental form of intelligence. By continually refining its learning process, a truly intelligent system not only learns new tasks quickly but also learns how to become even better at learning over time.
Measurement:
Few-shot learning performance, specifically tasks designed to evaluate improvements across sequential tasks (e.g., continual learning benchmarks).
Efficiency improvements across repeated exposures to structured learning tasks.
Current AGI Performance:
Limited. Chollet argues current deep-learning approaches exhibit minimal genuine meta-learning capability, instead typically relying on memorizing or retrieving learned patterns. Systems today largely do not autonomously improve learning efficiency in meaningful ways across novel, unseen scenarios.
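As a toy illustration of the learning-to-learn loop, the following sketch applies a Reptile-style meta-update (after Nichol et al.) to a family of simple linear-regression tasks. The task family, model, and step sizes are all invented for illustration; this is a sketch of the idea, not anyone's production method.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_task():
    """One task from a family of linear problems y = a*x + b."""
    a = rng.uniform(0.5, 1.5)
    b = rng.uniform(-1.0, 1.0)
    return a, b

def adapt(w, task, steps=20, lr=0.05):
    """Inner loop: adapt parameters w = (slope, intercept) to one task by SGD."""
    a, b = task
    for _ in range(steps):
        x = rng.uniform(-1, 1)
        y = a * x + b
        err = (w[0] * x + w[1]) - y
        w = w - lr * 2 * err * np.array([x, 1.0])   # gradient of squared error
    return w

# Reptile outer loop: nudge the shared initialization toward task-adapted
# weights, so future inner-loop adaptation starts closer to every task.
w_init = np.zeros(2)
for _ in range(1000):
    task = sample_task()
    w_adapted = adapt(w_init.copy(), task)
    w_init += 0.1 * (w_adapted - w_init)            # meta-update

print(w_init)  # drifts toward the center of the task family (roughly [1.0, 0.0])
```

The point of the outer loop is that it improves the learning process itself: after meta-training, a handful of inner-loop steps suffices on a new task, which is exactly the skill-acquisition-efficiency gain the definition above describes.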
Definition:
Compositionality refers to the capability of a system to construct complex ideas, representations, or solutions by combining simpler, previously learned components or concepts in a novel manner.
Logic:
Chollet highlights compositionality as crucial because intelligent solutions are often inherently compositional: complex reasoning problems require breaking down and recombining simpler, previously acquired knowledge. A truly intelligent system autonomously leverages compositionality, flexibly recombining concepts across diverse tasks and contexts.
Measurement:
ARC-type tasks explicitly designed around compositional reasoning (e.g., complex input-output puzzles requiring multiple abstractions).
Tests like Raven’s Progressive Matrices or analogy tasks where new solutions require recombining simpler components.
Current AGI Performance:
Weak. Chollet emphasizes repeatedly that current AI largely lacks deep compositionality. Instead, it tends to solve problems through memorization of complex statistical patterns, not genuine recombination of learned simpler concepts.
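One way compositionality is operationalized in ARC-style solvers is as program search over a small domain-specific language of primitives, where a solution is a sequence of primitive transforms. The primitives and brute-force search below are an illustrative toy, not ARC's actual tooling.

```python
import itertools
import numpy as np

# A toy "DSL" of grid primitives; compositional solutions are sequences of these.
PRIMITIVES = {
    "rot90": lambda g: np.rot90(g),
    "flip_h": lambda g: np.fliplr(g),
    "flip_v": lambda g: np.flipud(g),
    "transpose": lambda g: g.T,
}

def apply_program(names, grid):
    """Apply a sequence of primitives left to right."""
    for name in names:
        grid = PRIMITIVES[name](grid)
    return grid

def search_program(pairs, max_depth=3):
    """Enumerate compositions, shortest first, until one fits every train pair."""
    for depth in range(1, max_depth + 1):
        for names in itertools.product(PRIMITIVES, repeat=depth):
            if all(
                np.array_equal(apply_program(names, np.array(x)), np.array(y))
                for x, y in pairs
            ):
                return names
    return None

# A task whose hidden rule is "rotate 90 degrees, then mirror horizontally":
x = [[1, 2], [3, 4]]
y = apply_program(("rot90", "flip_h"), np.array(x))
print(search_program([(x, y)]))  # finds a two-step composition
```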
Definition:
Causal reasoning is the ability to infer, reason about, and manipulate cause-and-effect relationships within and across diverse contexts, going beyond correlation or superficial statistical association.
Logic:
Chollet explicitly identifies causal reasoning as fundamental for intelligence because intelligent agents must navigate environments structured by cause-effect relationships. Efficient skill acquisition and transfer rely heavily on recognizing and leveraging causal relationships rather than mere patterns or correlations.
Measurement:
Tasks explicitly designed to test causal inference (e.g., Pearl’s causality benchmarks or tasks inspired by developmental psychology experiments).
Scenarios that explicitly require manipulating variables to achieve desired outcomes, demonstrating explicit causal understanding.
Current AGI Performance:
Limited. Chollet frequently notes that current AI systems have only a superficial grasp of causality, often mistaking correlation or statistical patterns for true causal relationships. Explicit causal reasoning benchmarks typically reveal substantial weaknesses.
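The correlation-versus-causation gap can be shown with a three-variable toy structural causal model: a confounder Z drives both X and Y, so X and Y correlate strongly even though X has no causal effect on Y, and intervening on X (Pearl's do-operator) makes the correlation vanish. All parameters below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Structural causal model: Z -> X and Z -> Y. X has no causal effect on Y.
z = rng.normal(size=n)
x_obs = z + 0.1 * rng.normal(size=n)
y_obs = z + 0.1 * rng.normal(size=n)
print(np.corrcoef(x_obs, y_obs)[0, 1])  # ~0.99: strong spurious correlation

# Intervention do(X): set X exogenously, severing the Z -> X edge.
x_do = rng.normal(size=n)               # X no longer tracks the confounder
y_do = z + 0.1 * rng.normal(size=n)     # Y is generated exactly as before
print(np.corrcoef(x_do, y_do)[0, 1])    # ~0: intervening on X does not move Y
```

A purely correlational learner would predict Y from X equally well in both regimes; a causal reasoner recognizes that only the first correlation is exploitable for prediction and neither is exploitable for control.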
Definition:
Commonsense understanding refers to the intuitive, baseline grasp of how the physical and social world operates, encompassing everyday expectations about objects, agents, physics, interactions, and basic reasoning about the natural world.
Logic:
Chollet clearly emphasizes the importance of commonsense as a critical baseline for general intelligence, as intelligence in real-world scenarios relies on fundamental assumptions about object permanence, basic physics, agent behaviors, and social contexts. Without commonsense, systems cannot reliably handle even simple real-world situations.
Measurement:
Commonsense reasoning benchmarks (e.g., Winograd Schema Challenge, ARC tasks focused on intuitive physics and everyday reasoning).
Evaluations explicitly testing agentness, basic object interactions, and simple intuitive physical predictions.
Current AGI Performance:
Limited-to-moderate but superficial. Chollet consistently criticizes current AI as displaying only superficial or "simulated" commonsense, largely acquired through memorizing training patterns rather than genuine understanding. When explicitly tested (e.g., ARC intuitive physics tasks), performance remains significantly below human baselines.
Group Definition:
Generalization & Flexibility represents the capacity of an intelligent system to handle tasks and scenarios that significantly differ from its previous experiences, maintaining robust performance under novel conditions, uncertainties, and distributional shifts. Chollet strongly argues that true intelligence is defined by this generalized capability, rather than performance on narrowly specified or previously encountered tasks.
Definition:
Open-ended generalization describes the capacity of a system to robustly handle entirely new, unforeseen tasks without explicit prior training or guidance.
Logic:
Chollet explicitly positions open-ended generalization as the central hallmark of intelligence. Intelligence isn't about memorizing skills but rather forming abstractions that let a system reason through entirely new scenarios autonomously. Chollet’s ARC benchmark explicitly tests for this form of generalization, highlighting its importance as the essence of intelligence.
Measurement:
ARC-AGI benchmark tasks specifically designed to test generalization to completely novel situations.
Tasks never encountered during training, assessing zero-shot or few-shot performance.
Current AGI Performance:
Poor. Current systems still struggle significantly on Chollet's ARC tasks and similar open-ended benchmarks. Chollet's recent ARC competition reports explicitly state that state-of-the-art scores (around 55%) remain far below human performance (97–99%), highlighting deep limitations in open-ended generalization.
Definition:
Robustness to distribution shifts measures an intelligent system's resilience in maintaining stable, effective performance when encountering inputs or environments substantially different from its original training distribution.
Logic:
Chollet stresses repeatedly that intelligence must handle realistic environments, where changes in data distribution frequently occur. Robustness is thus a critical property distinguishing genuinely intelligent systems from brittle models trained only on narrowly defined distributions.
Measurement:
Out-of-distribution (OOD) evaluations measuring how much performance degrades when the input distribution shifts significantly.
Tasks with adversarial perturbations or scenarios evolving significantly from training conditions.
Current AGI Performance:
Limited. Chollet explicitly critiques current deep-learning systems as brittle, failing rapidly under even minor distribution shifts. ARC and related benchmarks strongly reveal that current models struggle significantly with robustness.
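A minimal version of such an OOD protocol is to train on clean data and re-evaluate as the inputs are perturbed with increasing noise, reporting the accuracy drop at each level. The dataset, model, and noise scales in this sketch are arbitrary illustrations.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("in-distribution accuracy:", clf.score(X_te, y_te))

# Simulated distribution shift: additive Gaussian noise of growing scale.
rng = np.random.default_rng(0)
for scale in (0.5, 1.0, 2.0):
    X_shift = X_te + rng.normal(scale=scale, size=X_te.shape)
    print(f"accuracy at noise scale {scale}:", clf.score(X_shift, y_te))
```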
Definition:
Cross-domain transfer describes the ability of a system to apply concepts, knowledge, or skills learned in one domain effectively to entirely different, unrelated domains.
Logic:
Intelligence fundamentally involves extracting abstract, generalizable concepts from specific experiences. Cross-domain transfer is a clear test of whether learning involves genuine abstraction or mere memorization of domain-specific patterns.
Measurement:
Benchmarks explicitly designed for cross-domain generalization, applying concepts learned in one domain to fundamentally different tasks (e.g., visual reasoning tasks transferring to symbolic tasks).
Few-shot tasks explicitly requiring application of knowledge from unrelated domains.
Current AGI Performance:
Weak. Chollet repeatedly emphasizes the minimal genuine cross-domain capability of current models, which predominantly rely on superficial, domain-specific knowledge acquired during training, with limited genuine cross-domain applicability.
Definition:
Zero-shot and few-shot learning measure the system's ability to understand and perform well on tasks given very minimal or even no explicit examples.
Logic:
True intelligence, according to Chollet, means efficiently leveraging abstraction and minimal priors to solve new tasks quickly and effectively without extensive training examples. Zero-shot/few-shot performance is thus a critical indicator of true intelligence.
Measurement:
ARC-type tasks with explicitly limited training examples, measuring rapid skill acquisition.
Few-shot reasoning tasks such as analogies, Raven’s matrices, and tasks requiring inference from minimal examples.
Current AGI Performance:
Moderate but superficial. While some systems (like GPT-4) handle few-shot tasks that closely resemble previously seen examples, genuinely novel ARC-style few-shot problems expose significant deficiencies.
Definition:
Handling uncertainty and noise is the capability of a system to maintain effective reasoning, decision-making, and performance when faced with incomplete, noisy, or ambiguous information.
Logic:
Chollet emphasizes uncertainty and ambiguity as inherent characteristics of real-world environments. Thus, true intelligence must be robustly able to reason under uncertainty rather than relying exclusively on clean, certain data.
Measurement:
Tasks explicitly constructed to introduce ambiguity or incomplete information (e.g., noisy ARC tasks, ambiguous reasoning benchmarks).
Probabilistic reasoning tasks explicitly designed around uncertainty handling.
Current AGI Performance:
Limited-to-moderate. Chollet consistently criticizes current models as superficially handling uncertainty, often relying on statistical shortcuts rather than genuine reasoning about ambiguity or incomplete data. Performance in genuinely uncertain scenarios remains weak.
Definition:
Continuous adaptation is the capability of a system to incrementally update and modify its knowledge or skills continuously and autonomously, without losing previously learned information or requiring full retraining.
Logic:
Chollet stresses adaptability as essential, since environments constantly evolve. An intelligent system must continuously adapt in real-time, efficiently integrating new knowledge without catastrophic forgetting.
Measurement:
Continual learning benchmarks explicitly testing performance retention over multiple sequential tasks.
Dynamic tasks explicitly designed to introduce incremental, evolving changes requiring real-time adaptation.
Current AGI Performance:
Poor-to-moderate. Current AI systems typically suffer from catastrophic forgetting or limited incremental adaptability, requiring extensive retraining or explicit developer intervention.
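The standard forgetting measurement is: train on task A, record accuracy, train on task B, then re-test on A. Below is a minimal sketch with two synthetic tasks and an incrementally trained linear model; all specifics are invented for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

# Two synthetic "tasks" with different input-label structure.
Xa, ya = make_classification(n_samples=2000, n_features=20, random_state=1)
Xb, yb = make_classification(n_samples=2000, n_features=20, random_state=2)

clf = SGDClassifier(random_state=0)
clf.partial_fit(Xa, ya, classes=np.array([0, 1]))   # learn task A
acc_before = clf.score(Xa, ya)

for _ in range(20):                                  # then train only on task B
    clf.partial_fit(Xb, yb)

acc_after = clf.score(Xa, ya)
print(f"task A accuracy: {acc_before:.2f} -> {acc_after:.2f}")
# The drop on task A is the forgetting induced by learning task B.
```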
Definition:
Flexible goal management involves autonomously setting, adjusting, prioritizing, and pursuing multiple goals based on changing conditions or contexts.
Logic:
Chollet explicitly argues intelligence requires dynamically shifting and managing goals in complex environments, continually adapting priorities based on evolving circumstances.
Measurement:
Tasks explicitly designed to test goal adjustments and reprioritization in changing environments.
Reinforcement learning benchmarks explicitly testing multi-objective optimization and dynamic goal changes.
Current AGI Performance:
Weak. Chollet notes explicitly that current AI systems usually follow static or pre-specified goals, rarely autonomously shifting or reprioritizing goals effectively without significant developer guidance.
Definition:
Novelty detection is the capacity to autonomously recognize and explicitly identify novel, unfamiliar patterns or scenarios not previously encountered.
Logic:
Chollet identifies novelty detection as fundamental because intelligent systems must first recognize when they encounter genuinely novel situations before effectively responding or learning from them.
Measurement:
Benchmarks explicitly designed to introduce novel inputs that differ substantially from training data.
Tests evaluating how well systems distinguish between known and unknown tasks or inputs.
Current AGI Performance:
Limited. Chollet highlights that current AI models typically treat novel inputs as familiar based on superficial statistical patterns. They often fail to explicitly identify or adapt to genuine novelty, highlighting significant gaps in true novelty detection capabilities.
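A common baseline for explicit novelty detection is to fit an outlier model on in-distribution data and flag inputs it scores as anomalous. The sketch below uses an isolation forest on synthetic data; the data and setup are illustrative, not a claim about any particular system.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X_train = rng.normal(loc=0.0, scale=1.0, size=(1000, 5))  # "familiar" inputs

detector = IsolationForest(random_state=0).fit(X_train)

X_known = rng.normal(loc=0.0, scale=1.0, size=(5, 5))
X_novel = rng.normal(loc=6.0, scale=1.0, size=(5, 5))     # far from training

# predict() returns +1 for inputs judged familiar, -1 for flagged novelties.
print(detector.predict(X_known))  # mostly +1
print(detector.predict(X_novel))  # mostly -1
```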
Group Definition:
Information Efficiency & Resource Optimization describes an intelligent system's capability to use available resources—including computation, memory, energy, and data—in an optimized, economical way. Chollet repeatedly emphasizes efficiency as a hallmark of intelligence, arguing explicitly that intelligent behavior is not just about achieving outcomes, but doing so using minimal resources and information.
Chollet’s definition of intelligence directly incorporates the concept of information efficiency: intelligence is skill-acquisition efficiency relative to experience, priors, and task complexity. Thus, resource optimization is intrinsically linked to the very definition of intelligence.
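In On the Measure of Intelligence, Chollet makes this precise using Algorithmic Information Theory. Stripped of its task-weighting and curriculum terms (so this is a simplified paraphrase, not his full formula), the definition has roughly the shape

$$
I \;\propto\; \underset{T \,\in\, \text{scope}}{\operatorname{Avg}} \left[ \frac{GD_{T}}{P_{T} + E_{T}} \right],
$$

where, for each task $T$ in the evaluation scope, $GD_T$ is the generalization difficulty handled, $P_T$ the priors the system brings, and $E_T$ the experience it consumes: more generalization per unit of priors-plus-experience means more intelligence.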
Definition:
Computational efficiency refers to the ability to perform reasoning, decision-making, and skill acquisition using minimal computational resources, measured typically by time complexity or computational operations.
Logic (from Chollet):
Chollet strongly emphasizes computational efficiency as central to intelligence because a genuinely intelligent system must achieve skill acquisition and reasoning outcomes without resorting to exhaustive computations or brute-force search. True intelligence involves finding solutions efficiently, reflecting an optimized skill-acquisition algorithm rather than exhaustive search or brute-force memorization.
Measurement:
Benchmarks explicitly measuring algorithmic efficiency, complexity, or runtime on tasks (e.g., ARC tasks solved under strict computational constraints).
Performance in terms of computational budget: minimal required computations for solving novel reasoning tasks.
Current AGI Performance (Chollet’s evaluation):
Moderate-to-poor. Current AI systems typically rely on enormous computational resources, brute-force scaling, and extensive training data. Chollet explicitly criticizes current models as extremely inefficient, achieving skill primarily via massive computational scaling rather than genuinely efficient reasoning or abstraction.
Definition:
Energy efficiency represents the intelligent system’s ability to minimize physical energy consumption (in biological or electronic implementations) to perform tasks or acquire skills effectively.
Logic (from Chollet):
While Chollet doesn't extensively discuss physical energy specifically, the logic naturally follows from his resource-efficiency argument. Intelligence inherently involves performing reasoning and skill acquisition with minimal resource use—including energy. Biological intelligence (e.g., human brains) is explicitly highlighted as energy-efficient, contrasting strongly with today's highly energy-intensive AI training.
Measurement:
Energy consumption metrics explicitly evaluating total energy usage per task solved or skill acquired.
Comparative benchmarks between biological intelligence (human brains) and AI models in terms of energy per reasoning operation or abstraction formation.
Current AGI Performance (Chollet’s evaluation):
Poor. Current AI systems require massive energy expenditures (data centers, GPUs) to train and operate; Chollet's critique of brute-force scaling implicitly encompasses these inefficiencies.
Definition:
Data efficiency refers to achieving robust learning, generalization, and skill acquisition from minimal data inputs or examples.
Logic (from Chollet):
Chollet explicitly frames intelligence in terms of minimal data dependence. True intelligence acquires skills through efficient abstraction and reasoning, not vast memorization of training data. Hence, a key hallmark of intelligence is the minimal data required for generalizing robustly and effectively to novel tasks.
Measurement:
Tasks designed explicitly to test performance on extremely limited data (e.g., ARC tasks requiring minimal training examples).
Evaluations measuring rapid learning curves and generalization from few examples (few-shot or zero-shot benchmarks).
Current AGI Performance (Chollet’s evaluation):
Weak. Chollet consistently critiques modern deep-learning systems for requiring vast amounts of training data and explicitly points to poor performance on ARC-like few-shot tasks. Current systems show little genuine data efficiency.
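Data efficiency is usually summarized as a learning curve: performance as a function of the number of training examples. A minimal sketch on synthetic data (sizes and model chosen arbitrarily for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=10_000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

# Learning curve: how quickly does accuracy rise with more examples?
for n in (10, 30, 100, 300, 1000):
    clf = LogisticRegression(max_iter=1000).fit(X_tr[:n], y_tr[:n])
    print(f"{n:>5} examples -> test accuracy {clf.score(X_te, y_te):.3f}")
```

A more data-efficient learner reaches a given accuracy further to the left on this curve, which is the quantity few-shot and zero-shot benchmarks approximate.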
Definition:
Memory optimization involves the efficient storage, recall, and management of learned knowledge or experiences, ensuring that memory resources are utilized optimally without excessive redundancy.
Logic (from Chollet):
Chollet emphasizes that intelligent systems must compactly and efficiently store abstractions and knowledge. True intelligence creates optimized internal representations that minimize redundancy and maximize reuse across tasks, enhancing generalization and adaptability.
Measurement:
Evaluations explicitly measuring the compactness of internal representations (e.g., minimal program-length or description-length measures in ARC).
Tasks evaluating efficient retrieval and utilization of stored abstractions without extensive redundancy.
Current AGI Performance (Chollet’s evaluation):
Moderate-to-low. Current AI systems rely on massive, redundant internal representations (large sets of neural-network weights). Chollet explicitly highlights their lack of optimized or compact representations, pointing to ARC tasks to demonstrate current limitations in this regard.
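A crude but easily computed proxy for representational compactness, in the spirit of the description-length measures mentioned above, is compressed size: data with exploitable structure compresses far more than unstructured data. The connection to ARC's program-length measures is loose; this is only a sketch of the intuition.

```python
import os
import zlib

def description_length(data: bytes) -> int:
    """Crude minimum-description-length proxy: size of the zlib encoding."""
    return len(zlib.compress(data, level=9))

regular = b"abab" * 1000    # highly redundant pattern
random_ = os.urandom(4000)  # incompressible noise of the same length

print(description_length(regular))  # small: the structure compresses away
print(description_length(random_))  # ~4000: no structure to exploit
```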
Definition:
Representation learning describes autonomously discovering efficient, meaningful, and reusable internal abstractions or representations from raw sensory inputs or data.
Logic (from Chollet):
Chollet explicitly identifies representation learning as a cornerstone of intelligence. Effective abstraction and representation learning enable robust generalization, efficient skill acquisition, and resource-efficient reasoning. Intelligent systems autonomously form efficient internal representations rather than relying solely on explicit human-crafted features or extensive data memorization.
Measurement:
ARC benchmarks explicitly testing abstraction and representation quality.
Tasks measuring generalization quality based on learned internal representations (e.g., transfer-learning scenarios explicitly testing reuse of representations).
Current AGI Performance (Chollet’s evaluation):
Limited. Chollet explicitly critiques current AI systems for learning representations superficially, via statistical pattern matching over vast datasets, rather than forming genuinely abstract, reusable internal representations. ARC results directly reveal these deficiencies.
Definition:
Risk-aware decision-making refers to intelligently balancing exploration and exploitation under uncertainty, carefully considering potential risks and rewards in uncertain or ambiguous scenarios.
Logic (from Chollet):
Chollet explicitly emphasizes the critical role of uncertainty handling in intelligence. Intelligent systems must autonomously reason about potential risks and rewards under uncertainty, optimizing their decisions dynamically. Effective intelligence involves carefully balancing risks (potential losses) against opportunities (potential gains).
Measurement:
Explicitly constructed decision-making tasks that require reasoning about risk and uncertainty (probabilistic reasoning tasks, exploration/exploitation tests).
Benchmarks explicitly designed around risk, testing adaptability under ambiguity or incomplete information.
Current AGI Performance (Chollet’s evaluation):
Moderate-to-poor. Chollet explicitly critiques current AI for handling uncertainty superficially, typically using brute-force or statistical shortcuts rather than genuine risk-aware reasoning. Explicit uncertainty tasks reveal that current models cannot autonomously reason deeply about potential risks and rewards.
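The exploration/exploitation balance named here is classically studied with multi-armed bandits. Below is a sketch of the standard UCB1 rule, which adds an optimism bonus that shrinks as an arm is sampled more; the reward rates are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.2, 0.5, 0.7])  # hidden reward rates of three arms
counts = np.ones(3)                      # play each arm once to start
rewards = rng.binomial(1, true_means).astype(float)

for t in range(4, 5001):
    # UCB1: estimated mean plus an uncertainty bonus. Rarely pulled arms get
    # a large bonus (exploration); well-known arms compete on their mean
    # (exploitation).
    ucb = rewards / counts + np.sqrt(2 * np.log(t) / counts)
    arm = int(np.argmax(ucb))
    rewards[arm] += rng.binomial(1, true_means[arm])
    counts[arm] += 1

print(counts / counts.sum())  # pulls concentrate on the best arm (rate 0.7)
```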
Group Definition:
Autonomy & Self-Modification describes a system’s ability to independently manage, evaluate, and enhance its own learning processes, goals, skills, and internal structures without external intervention. Chollet explicitly highlights autonomy as central to genuine intelligence—intelligent systems must autonomously discover abstractions, dynamically adjust goals, and improve their own learning methods, independently of external guidance or manual intervention.
Definition:
Autonomous skill generation is the capacity of a system to independently identify, formulate, and acquire entirely new skills or procedures without explicit external instructions or intervention.
Logic (from Chollet):
Chollet explicitly positions the autonomous generation of skills as crucial for genuine intelligence, differentiating it clearly from systems dependent on human-engineered skill acquisition. Intelligence involves autonomously identifying skill-gaps and independently formulating skill-acquisition strategies.
Measurement:
Tasks explicitly designed to test systems' abilities to autonomously discover and acquire new skills (e.g., open-ended problem-solving tasks or ARC tasks requiring novel skill generation without explicit guidance).
Evaluations of autonomous curriculum or skill-selection methods in learning.
Current AGI Performance (Chollet’s evaluation):
Weak. Chollet explicitly critiques current systems for lacking genuine autonomous skill generation, emphasizing that modern AI typically depends heavily on externally provided tasks, curricula, or instructions.
Definition:
Self-reflection and introspection describe a system’s capacity to autonomously evaluate, reason about, and understand its own internal states, processes, knowledge limitations, and performance capabilities.
Logic (from Chollet):
Chollet explicitly identifies self-reflection as integral to intelligence, enabling autonomous identification of learning deficiencies, effective error detection, and continual improvement of learning processes. Genuine intelligence thus requires reflective self-evaluation capabilities.
Measurement:
Tasks explicitly constructed to test meta-cognition, where systems must evaluate their confidence or uncertainty about their own predictions (confidence calibration tasks, error detection, and reasoning tasks).
Evaluations testing explicit introspective abilities, such as identifying skill gaps autonomously.
Current AGI Performance (Chollet’s evaluation):
Limited. Chollet explicitly emphasizes current AI’s superficial or nonexistent introspection capabilities. Current models lack deep autonomous reasoning about their own knowledge limitations or internal state evaluations.
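The confidence-calibration measurement mentioned above is commonly quantified with expected calibration error (ECE): bin predictions by stated confidence and average the gap between confidence and accuracy within each bin. A small self-contained sketch:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: bin-weighted average gap between mean confidence and accuracy."""
    confidences = np.asarray(confidences)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap  # weight by fraction of samples in bin
    return ece

# A model that says "0.9" but is right only 60% of the time is miscalibrated:
print(expected_calibration_error([0.9] * 10, [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]))
# -> 0.3
```

Low ECE indicates the introspective property at issue: the system's stated confidence tracks how often it is actually right.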
Definition:
Self-update & adaptation refers to the autonomous capability of a system to dynamically and incrementally modify or optimize its internal processes, algorithms, or representations without explicit external re-programming or retraining.
Logic (from Chollet):
Chollet stresses explicitly that intelligence involves autonomous self-improvement, as intelligent systems must dynamically adapt learning methods or internal representations to improve their future learning efficiency independently of external intervention.
Measurement:
Continual learning benchmarks explicitly testing incremental self-modifications without catastrophic forgetting or extensive external intervention.
Explicit measures of improvement across sequential tasks based purely on autonomous self-updates (e.g., meta-learning and ARC-type incremental tasks).
Current AGI Performance (Chollet’s evaluation):
Limited-to-poor. Chollet explicitly critiques modern AI systems for depending heavily on external retraining, fine-tuning, or reprogramming, lacking genuinely autonomous internal updates or adaptations.
Definition:
Goal-directed behavior involves autonomously pursuing and achieving explicitly formulated objectives, dynamically adjusting strategies and plans to optimize goal attainment.
Logic (from Chollet):
Chollet explicitly identifies goal-directedness as integral to intelligence. True intelligence independently formulates goals, plans strategies, and dynamically adapts actions toward achieving objectives, reflecting genuine autonomous agency.
Measurement:
Explicit goal-directed tasks evaluating how effectively and autonomously a system sets and pursues objectives (e.g., complex reinforcement learning tasks explicitly requiring autonomous planning and strategy adjustments).
Evaluations explicitly assessing autonomous adaptation to dynamic goal changes.
Current AGI Performance (Chollet’s evaluation):
Moderate-to-weak. Chollet explicitly argues that current AI systems primarily rely on externally fixed goals, lacking deep autonomous goal formulation or strategic adaptation, as dynamic or open-ended ARC-style tasks clearly show.
Definition:
Error detection and correction is the autonomous ability to identify, reason about, and correct mistakes, misconceptions, or performance gaps independently, without external intervention.
Logic (from Chollet):
Chollet explicitly argues that genuine intelligence requires recognizing and autonomously rectifying errors. An intelligent system must autonomously detect reasoning mistakes, internal contradictions, or misunderstandings and independently initiate corrections or further learning.
Measurement:
Explicit benchmarks testing error detection capabilities, evaluating systems' abilities to autonomously reason about their own mistakes or knowledge gaps.
Tasks explicitly constructed around error identification, reasoning, and autonomous correction (e.g., self-supervised learning tasks designed to measure autonomous error correction explicitly).
Current AGI Performance (Chollet’s evaluation):
Limited. Chollet explicitly criticizes current models for shallow error awareness, often relying on brute-force solutions or statistical correlations rather than genuine autonomous error reasoning or rectification.
Definition:
Creativity and innovation represent the capacity of a system to autonomously generate genuinely novel, useful, and contextually appropriate solutions, ideas, or concepts beyond learned patterns or memorized data.
Logic (from Chollet):
Chollet explicitly views creativity and innovation as fundamental to intelligence, emphasizing that genuinely intelligent systems autonomously produce novel abstractions, solutions, and skill-generation strategies that were neither explicitly provided nor previously encountered.
Measurement:
Explicit creativity benchmarks (e.g., open-ended ARC tasks specifically testing novel solution generation without training precedents).
Evaluations explicitly measuring originality, novelty, and practical usefulness of generated ideas, abstractions, or problem-solving methods.
Current AGI Performance (Chollet’s evaluation):
Weak. Chollet consistently emphasizes that current AI systems show limited genuine creativity, typically generating outputs from memorized patterns rather than genuinely novel autonomous innovations. ARC-style benchmarks clearly demonstrate current limitations in explicit creative problem-solving.