In just this past week, we have seen the release of President Biden’s executive order on AI and the convening of the first-ever AI Safety Summit, hosted by the UK and attended by leaders from industry, research, and government.

These landmark events were prompted by the pressing need to discuss and take action on AI safety: the goal of developing this powerful new technology in a safe and responsible manner. What does it mean to develop AI safely, and what are the risks?

This is a hotly contested topic, with several prominent members of the AI community arguing that powerful AI could present an existential risk (a threat to the survival of the entire human species), while others (including the authors) are more modest in their predictions of AI risk.

Central to this debate is the notion of Artificial General Intelligence (AGI). First coined in the late 90s to refer to systems that could surpass human brains in speed and complexity, AGI has over the past decade become a more nebulous term, roughly referring to highly capable, general-purpose AI at the level of humans, as distinguished from previous generations of AI (“good old-fashioned AI”).

In recent years, however, there have been efforts to crystallize its definition as a general-purpose system capable of expert-level performance on many (or all) human tasks. For example, OpenAI has put forward the following definition:

Artificial General Intelligence (AGI) is defined as an AI that is capable of outperforming humans at most (all) tasks of economic value.

Concurrently with this post, a recent paper, Levels of AGI, proposes an AGI taxonomy in which the OpenAI definition of AGI can be seen as mapping to Expert or Virtuoso level AGI: AIs that can outperform highly skilled humans on a large variety of tasks, where value is broader than strict “economic value”.

Concretely, what would this look like? Such an AI would diagnose medical conditions better than the best doctor, manage customer relationships better than the best salesperson, navigate complex codebases like a 10x programmer, create masterful works of art or literature, and so on. Any task conducted by a highly skilled human, the AI would do better.

Note that this is still about tasks, not about automating entire jobs: the AGI would be a highly capable assistant, and might dramatically reduce the need for human labor (indeed a realistic risk of AGI), but there would still be human input in defining goals, determining tradeoffs, and deciding key long-term outcomes.

This need for human input could be the key difference between expert-level AGI and Artificial Superintelligence, where the AI outperforms humans to such a degree that it can form goals and make inferences beyond human comprehension. This is a very abstract definition, and at this stage it is less relevant than the more concrete goal of expert AGI, which is the focus of this post.

Achieving expert AGI would be both inspiring and humbling. But is this even a realistic goal for our current state of AI? And if so, what would we need to get there? These questions are also hotly debated, with some arguing that there are fundamental flaws in our current AI designs, while others contend that we have all that we need, and that further scaling will lead to AGI.

Certainly the past few years have shown phenomenal successes in scaling AI, with LLMs developing increasingly advanced and diverse capabilities. But scaling AI is also very challenging. Leading AI labs have raised many billions of dollars, which goes to support compute clusters, costly iteration on model architectures and training methodology (optimization process, feedback mechanisms, hyperparameters), and, of course, data collection.

This latter point is an important one when it comes to considering the viability of AGI. AI’s advanced capabilities in this scaling era have been closely tied to collecting and training on high-quality data. Serious AI efforts have spent many millions creating rich datasets through hired annotators, which can be used to provide granular feedback to the AI and substantially improve quality. For example, models with strong programming abilities, such as GPT-4 and CodeLlama, have been trained on carefully collected and processed code data.

Extrapolating this directly, achieving AGI would require high-quality data on many, many tasks of economic value. This would be a very challenging endeavor for any single organization! Data for many tasks of economic value is hard to obtain for sensitivity, privacy, or competitiveness reasons, and for some tasks it may not even exist (all knowledge and experience of the task is contained in humans). Even if all of this data could somehow be collected, there would have to be a human-directed training process for all of these new and diverse tasks, which is already an enormously difficult and expensive effort at our current stage.

In summary, this approach to AGI would simply be intractable.

Is there any alternative to this? The only possibility for reducing the immense human effort in getting to AGI would be self-improvement: the AI learns and adapts by itself.

Learning and Adaptation Gaps between Humans and AI

While modern AI has become increasingly capable, this is one area where humans and AI have a clear gap. Humans are able to learn and change “on the fly”, while this remains a challenge for AI.

For example, suppose a high school student is learning basic Euclidean geometry for mathematical olympiad problems. Having gained proficiency in solving such problems with the techniques they’ve learned, they then come across a more powerful geometry technique (e.g. barycentric coordinates). They would quickly learn the details of this new technique, and be able to apply this to future problems.

As another example closer to home, suppose a data scientist had learned many different techniques (decision trees, SVMs, naive Bayes, gradient boosting) for a family of classification problems. If they then came across a simple, new, neural-network-based approach, they would again be able to pick up the specifics and have a different approach to apply to future problems.

This rapid learning and adaptation is still a challenge for AI. For simple adaptations, in-context learning (prompting) can enable “on the fly” changes. But prompting does not persist (nothing is permanently learned), and it is not a substitute for more complex learning, where human-guided training and finetuning must be used. Furthermore, it’s hard to even measure AI adaptation accurately. Training datasets for AI contain a vast amount of human knowledge (all of the internet and more!), so what looks like “learning on the fly” might just be the AI outputting memorized knowledge from its training data.
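To make the distinction concrete, here is a minimal sketch (in Python, using Hugging Face Transformers) contrasting the two modes: prompting leaves the model’s parameters untouched, while a finetuning step writes the new knowledge into the weights. The model name and the toy “new knowledge” string are assumptions for illustration only.

```python
# A minimal sketch contrasting transient prompt-based adaptation with a persistent
# parameter update. Model name and the toy "new knowledge" are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumption: any small causal LM would do for this sketch
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

new_knowledge = "Barycentric coordinates express a point as a weighted combination of a triangle's vertices."
question = "How could barycentric coordinates help solve a triangle geometry problem?"

# 1) In-context "learning": the new knowledge lives only in the prompt.
inputs = tokenizer(new_knowledge + "\n\n" + question, return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
# Nothing in the model has changed: drop the prompt and the "knowledge" is gone.

# 2) Finetuning: a gradient step writes the new knowledge into the weights.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
batch = tokenizer(new_knowledge, return_tensors="pt")
loss = model(**batch, labels=batch["input_ids"]).loss
loss.backward()
optimizer.step()
# The update persists, but a human still chose the data, the loss, and the learning rate.
```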

Indeed, creating benchmarks that can truly measure AI learning on the fly is a key research problem for our times. One possible template is given by the examples above. Define a complex task, and then provide the AI truly new knowledge to help with solving that task. Can the AI incorporate this new knowledge? To make sure the new knowledge is not contained in the training data, the AI could be trained with a strict time cutoff, and the new knowledge drawn from more recent events. Alternatively, if the knowledge is sufficiently niche, it could be filtered from the training dataset. Of course, such approaches are easier described than done, and these are important problems to explore.
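As a rough illustration of this template, here is a minimal evaluation-harness sketch. The generate callable and the dataset fields are assumptions, not any particular benchmark’s API; measuring the gap between answers with and without the new knowledge in context is one way to separate genuine incorporation from memorized recall.

```python
# A minimal sketch of the benchmark template above: supply knowledge that is
# (by construction) absent from the model's training data, and measure whether
# it can be applied to a held-out task. All names here are hypothetical.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class AdaptationItem:
    new_knowledge: str            # e.g. drawn from events after the training cutoff
    task_prompt: str              # a problem that requires the new knowledge
    check: Callable[[str], bool]  # grading function for the model's answer

def adaptation_gap(generate: Callable[[str], str], items: List[AdaptationItem]) -> float:
    """Fraction solved with the new knowledge in context minus fraction solved without."""
    solved_with = solved_without = 0
    for item in items:
        solved_without += item.check(generate(item.task_prompt))
        solved_with += item.check(generate(item.new_knowledge + "\n\n" + item.task_prompt))
    # A large positive gap suggests genuine incorporation of new knowledge,
    # rather than recall of memorized training data.
    return (solved_with - solved_without) / len(items)
```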

Self-Improvement

Supposing we could measure AI adaptation, what would the mechanics of self-improvement look like? Crucially, self-improvement requires the AI to update its own parameters. So a possible process could look like the following (a code sketch follows below):

  • A human specifies a new task they want AI to learn
  • The AI (with the help of tools) assembles a training and evaluation dataset, along with appropriate feedback
  • The AI devises a training process for itself. The training process must allow the AI to learn the new task while maintaining high performance on existing capabilities (i.e. avoid catastrophic forgetting)
  • The AI evaluates itself
  • The AI repeats data collection, training and evaluation until high performance is achieved, and declares completion when ready

(In recursive self-improvement, the AI itself defines the tasks it wants to learn.)
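Here is a minimal sketch of that control flow, with every component (data assembly, self-devised training, self-evaluation) left as a hypothetical method. It is an illustration of the loop above, not a working system.

```python
# A minimal sketch of the self-improvement loop; all methods on `model` are
# hypothetical stand-ins for the components described in the list above.
def self_improve(model, task_spec, max_rounds=10, target_score=0.9):
    for _ in range(max_rounds):
        # The AI (with the help of tools) assembles training/evaluation data and feedback.
        train_data, eval_data = model.assemble_datasets(task_spec)       # hypothetical
        # The AI devises and runs a training process on its own parameters, with a
        # constraint intended to avoid catastrophic forgetting.
        model.train_on(train_data, preserve_existing_capabilities=True)  # hypothetical
        # The AI evaluates itself on the new task and on its existing capabilities.
        new_score = model.evaluate(eval_data)                            # hypothetical
        retained = model.evaluate_existing_benchmarks()                  # hypothetical
        if new_score >= target_score and retained >= model.baseline_score():  # hypothetical
            return model  # the AI declares completion
    return model  # budget exhausted without reaching the target
```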

Many of these steps have connections to ongoing lines of research. Synthetic data generation has been used to create training data, neural world models to fuzzily generate environments, and AI-based annotation to do automatic labeling. Work on learned optimizers has looked at automating parts of the training process such as hyperparameter selection, and critique LLMs have been used to automatically evaluate LLM output. Calibration methods have teased out AI’s estimated uncertainties.
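As one concrete example of the evaluation piece, a critique model can grade another model’s answers and turn the critique into an automatic label. The sketch below assumes a generic generate callable rather than any specific library’s API.

```python
# A minimal sketch of AI-based evaluation: one model critiques another's output,
# and the critique is parsed into an automatic label. `generate` is a hypothetical
# text-generation callable.
def critique_label(generate, question: str, candidate_answer: str) -> int:
    critique_prompt = (
        "You are grading an answer.\n"
        f"Question: {question}\n"
        f"Answer: {candidate_answer}\n"
        "Reply with a single integer score from 1 (poor) to 5 (excellent)."
    )
    reply = generate(critique_prompt)
    digits = [int(ch) for ch in reply if ch.isdigit()]
    return digits[0] if digits else 1  # fall back to the lowest score if unparseable
```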

But while each of these pieces has seen progress, with some meta-learning approaches even exploring end-to-end self-improvement for narrowly defined tasks, it’s crucial to expand the complexity and diversity of tasks amenable to self-improvement. For example, can we use self-improvement to generalize to unseen data modalities (e.g. from language and vision to waveforms)? These developments will have a crucial role to play if we are to get to AGI.

Can self-improvement take us to AGI?

But can we get to AGI with self-improvement? The answer depends on how good self-improvement can get. In particular, can we use self-improvement to “start from scratch” and get to “expert level” accuracy?

The process of starting from scratch and getting to expert (superhuman) accuracy is reminiscent of Reinforcement Learning (RL) successes in games, such as AlphaZero. However, games can be perfectly simulated, making it possible to do numerous playthroughs for data (trajectory) collection and RL rewards, as well as to employ hybrid neural search strategies. Transferring this to general real-world tasks, which lack high-fidelity simulation, has been challenging, and we have yet to see superhuman AI abilities developed purely with RL. Unless neural world models see unprecedented improvements, or real-world interaction data becomes incredibly cheap at the volumes needed for RL, this is likely to remain the case.

Some of the biggest recent successes of RL have come through RLHF (Reinforcement Learning from Human Feedback), where a powerful pretrained LLM is further tuned with RL on human preferences and other feedback. It’s possible that self-improvement will exhibit similar characteristics: it will be hard to use self-improvement alone to go from scratch to expert level, but self-improvement will be very helpful when starting at a moderate performance level.
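For a flavor of what tuning on human preferences means mechanically, here is a minimal sketch of the preference-modeling step commonly used in RLHF: a Bradley-Terry style loss that trains a reward model to score preferred responses above rejected ones. The reward_model here is an assumed module mapping a batch of response representations to scalar scores, not a specific library’s implementation.

```python
# A minimal sketch of preference modeling for RLHF: train a reward model so that
# human-preferred responses score higher than rejected ones. `reward_model` is an
# assumed module mapping a batch of response representations to scalar scores.
import torch.nn.functional as F

def preference_loss(reward_model, chosen_batch, rejected_batch):
    r_chosen = reward_model(chosen_batch)      # shape: (batch,)
    r_rejected = reward_model(rejected_batch)  # shape: (batch,)
    # Maximize the margin by which preferred responses out-score rejected ones.
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# The trained reward model then supplies the feedback signal for the RL stage
# that tunes the pretrained LLM.
```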

If this is the case, it seems likely that instead of one centralized AGI, we may have many expert-level AI systems, each covering a family of tasks. These expert AIs of the future will be much broader than our current superhuman narrow AIs, but not general enough to constitute an AGI.

In Conclusion

The crux of the recent debates on AI and AI safety has been predictions on the roadmap and timeline for AGI. Current definitions of AGI see it as an AI system that can outperform humans on all tasks of value. But while scaling AI has shown tremendous progress, its need for high-quality data and human-supervised training processes makes AGI a very difficult goal. Crucial to the possibility of AGI will be self-improvement, which finally addresses a longstanding gap: enabling AI to learn and adapt “on the fly”. While parts of self-improvement have been studied in different lines of research, there is potential to dramatically improve its potency, with many pressing open questions. Whether we reach AGI may hinge on how effective we can make self-improvement, a truly exciting topic for our times!