Long AI, Short AGI
Published on Dec 03, 2024
Nearly a year and a half ago, I wrote an (admittedly poorly titled) essay, “AGI’s already here, it’s just not evenly organised”, in which I laid out a somewhat contrarian, but what I felt was a painfully obvious, thesis on the then-current state and future development of safe autonomous systems. Here’s the first observation of that piece, from July 2023:
> The lack of some form of weak or strong process knowledge (high-quality data that confers expertise on its owner) will hinder the most common way the big AI labs are approaching building AGI systems: by brute-forcing ever larger foundational language models with low-signal (internet corpora) to somewhat higher-signal (SFT and RLHF) generalised data. Basically, there are some intuitive processes that humans currently perform that you can’t expect an AI to ‘know’, and therefore replicate or outperform, without better information, especially information that is not available in neat, well-defined packages. This could be anything from how the best traders set stop-losses and manage downside risk, to the subtleties in balance sheets and annual reports that the best equity investors and stock pickers look for, to how the best drivers drive, to a physician’s intuitive understanding of therapeutic best practices for their patients.
>
> Depending on the validity of this premise and the observed strength of the notion of ‘process knowledge’, a centralised model achieving ultra-human general performance using current methods would be, at the limit, perhaps impossible and, at the very least, operationally inefficient. GPT will achieve driving capabilities (if it does at all) long after we have self-driving cars.
I’m happy with how the essay has aged, particularly given recent developments, but I’ve had a striking case of writer’s remorse ever since I hit publish. The writing felt a little too contrived and in dire need of better structure and flow. I’m attempting to slay those demons here while also updating on current evidence and concretizing the pitch a little further.
Here are some facts as I see them:
- As of December 2024, it looks as though scaling pre-training is yielding, at best, diminishing returns. There have been plenty of reports on the subject, but the starkest tell is that some of its most ardent proponents have begun hedging their words and narratives as well. (A toy illustration of what ‘diminishing returns’ means under a power-law scaling curve follows this list.)
- This is a death knell for synthetic data (narrowly defined), which may have its uses but does not appear to be the yellow brick road, and for the other bits of discordant thinking that reigned all year. Pinning hopes on outsized gains from transfer learning at larger scale also seems wishful.
- The labs have seemingly moved on to a new paradigm. Scaling “test-time compute” is the topic du jour these days. Just make the models think for longer and all will be good in the world, right? Maybe not. Aidan McLaughlin makes a comprehensive case against unfettered progress from this paradigm here, but a simpler (maybe too naive) version is as follows: if synthetic data is inadequate for scaling pre-training, that should perhaps make us update negatively on these models, whose primary calling card amounts to something akin to fancier synthetic in-context learning and chain-of-thought. (A minimal sketch of what ‘scaling test-time compute’ looks like in practice is the second snippet below.)
- That line of thinking assumes that spinning up RL environments for these models will remain a tenuous proposition. Of course, there are domains where this doesn’t appear to be the case, math and coding most prominently. One can expect progress in those domains, conditional on current models being robust enough to navigate the associated search spaces.
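To put a number on ‘diminishing returns’: here is a toy illustration, assuming a Chinchilla-style power law in which loss decays polynomially in compute towards an irreducible floor. Every constant below is invented for illustration; nothing is fitted to any real model family.

```python
# Toy illustration of diminishing returns under a power-law scaling curve.
# L(C) = E + A * C^(-alpha): loss decays polynomially in compute C towards
# an irreducible floor E. All constants here are invented for illustration.

E, A, alpha = 1.7, 8.0, 0.05  # hypothetical floor, scale, and exponent

def loss(compute: float) -> float:
    return E + A * compute ** -alpha

# Each 10x jump in compute buys a smaller absolute drop in loss.
for exponent in range(20, 26):  # compute from 1e20 to 1e25 "FLOPs"
    c = 10.0 ** exponent
    print(f"compute=1e{exponent}: loss={loss(c):.3f}, "
          f"gain from last 10x={loss(c / 10) - loss(c):.4f}")
```

Under any curve of this shape, each additional order of magnitude of compute buys a geometrically shrinking drop in loss, which is the quantitative version of the reports above.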
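As for ‘scaling test-time compute’, the simplest version of the idea is best-of-n sampling against a verifier: spend more inference drawing candidate solutions and keep one that checks out. A minimal sketch, with `sample_solution` and `verifier` as hypothetical stand-ins for an LLM call and a domain checker (unit tests, a maths grader):

```python
import random
from typing import Callable, Optional

def best_of_n(problem: str,
              sample_solution: Callable[[str], str],  # stand-in for an LLM call
              verifier: Callable[[str, str], bool],   # e.g. unit tests, a math checker
              n: int = 32) -> Optional[str]:
    """Spend more test-time compute by drawing n candidates and
    returning the first one the verifier accepts."""
    for _ in range(n):
        candidate = sample_solution(problem)
        if verifier(problem, candidate):
            return candidate
    return None  # all n samples failed verification

# Toy demo: a "model" that is right about 10% of the time per sample.
demo_sample = lambda p: random.choice(["42"] + ["wrong"] * 9)
demo_verify = lambda p, ans: ans == "42"
print(best_of_n("toy problem", demo_sample, demo_verify, n=32))
```

This also explains why math and coding are the tractable RL domains from the last bullet: the same verifier doubles as a reward signal (1 if a candidate passes, 0 otherwise), and it is precisely that checker that fuzzier fields lack.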
I hope my unenthusiastic demeanor isn’t in any way taken as general bearishness on the field[1] as a whole. The title of this essay (and the previous one) should be evidence enough: AI will be tremendously valuable, but the value will possibly be more diffuse (spread across vertical winners) than the current consensus preaches. Again, my only intention with this piece is to formalize my current thinking, as an investment exercise, and to receive helpful commentary on the errors that may lie within.
Edge-case robustness is the thing. I wrote previously:
> With the premises in mind, I think the better way to approach the AGI problem, and deliver on its immense promise, is to look at it as a set of discrete, tractable tasks, each with its own process-knowledge (high-signal data and fine-tuning) requirements. I also think developing superhuman agentic systems in narrow domains, eventually all working together to automate the majority of intellectual labour, is possible without any breakthrough improvements to today’s (July 2023) technology. This has some anthropomorphic hints: humans, after all, only command expertise over very narrow domains of complex systems.
>
> Viewing our approach in the context of Dan’s example (mentioned in the first paragraph of that essay): sufficient data quality, granularity, and specificity should yield performance indistinguishable from that of the control group. In this case, that would mean giving the amateur (an AI) a recipe containing everything he could need (high-signal information and methodology): it would tell him exactly how to cut the vegetables, the visual cues for spotting doneness, and every other infinitesimally small written and unwritten detail that separates the best of chefs from the dilettantes. This should, in theory, allow him to come up with a dish that rivals an expert’s. Maybe training general foundational models gets to this point too, but one would assume that, if they do at all, they will arrive much later than the narrow, focused models.
The slow, halting progress of self-driving cars over the past decade or so illustrates the magnitude of the long tail of edge cases that needs tackling. Both Waymo and Cruise (and, when it gets there, Tesla too) rely on remote intervention teams who must navigate the model through out-of-distribution events, and who will continue to do so for the foreseeable future. Whatever the length of that tail, we can safely assume it will be much longer for a purported AGI.
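Mechanically, those setups amount to a confidence-gated fallback: act autonomously on in-distribution events, escalate the rest to a human. A hypothetical sketch (none of these names or thresholds come from Waymo, Cruise, or Tesla):

```python
from dataclasses import dataclass

@dataclass
class Decision:
    action: str
    confidence: float  # the model's own uncertainty estimate, in [0, 1]

CONFIDENCE_FLOOR = 0.85  # hypothetical threshold, tuned per deployment

def request_remote_operator_guidance(decision: Decision) -> str:
    # Stand-in for a human-in-the-loop call; in reality this is a
    # whole teleoperations stack, not a single function.
    return "await_operator"

def drive_step(model_decision: Decision) -> str:
    """Act autonomously when confident; escalate out-of-distribution
    events to a remote operator."""
    if model_decision.confidence >= CONFIDENCE_FLOOR:
        return model_decision.action
    # The long tail lives here: every escalation is an edge case the
    # model could not handle on its own.
    return request_remote_operator_guidance(model_decision)

print(drive_step(Decision("proceed_through_intersection", 0.97)))  # autonomous
print(drive_step(Decision("unclear_construction_zone", 0.40)))     # escalated
```

The length of the tail then shows up directly as the escalation rate, and the narrower the domain, the cheaper it is to drive that rate towards zero.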
Footnotes:
1. Especially the labs. They’ve already done, and will hopefully continue to do, tremendously innovative and innovation-enabling work.
Feel free to reach out on Twitter/X @sportquant if you have any suggestions/points of contention.