March 2025

Tech Talk

Situational Awareness: Understanding the rapid advancement of AGI

By Michael E. Duffy

Last month, I wrote about the state of developments in Artificial Intelligence (AI), and the drive toward Artificial General Intelligence (AGI), software that is as smart (or smarter!) than a human being across a broad range of subjects. Despite my skepticism about AGI, this paper (situational-awareness.ai/) from June 2024 makes some compelling arguments that it's nearly here. So, I thought I'd break it down for my readers in a way that cuts through the jargon.

The author, Leopold Aschenbrenner, contends that there are currently a few hundred people in the world with the situational awareness (the title of his paper) to really predict what is likely to happen with AI/AGI. Aschenbrenner is a smart twenty-something—graduated from Columbia University in 2021 at age 19 as valedictorian—who worked at OpenAI (he was fired in April 2024 for ostensibly leaking information), and believes he is one of those people.

His first argument is that AGI will become a reality by 2027. Specifically, Aschenbrenner states, "it is strikingly plausible that by 2027, models will be able to do the work of an AI researcher/engineer." He bases this on the roughly four-year span between GPT-2 (2019) and today's GPT-4, which he describes as a change in "intelligence" from a preschooler that could "string together a few plausible sentences" to a smart high school student that is "beating the vast majority of high schoolers on whatever tests we can give it."

This argument rests on "counting the OOMs," OOM being shorthand for Order Of Magnitude, i.e. a 10x change in something. To quote the paper, "With each OOM of effective compute, models predictably, reliably get better. If we can count the OOMs, we can (roughly, qualitatively) extrapolate capability improvements." Of course, this raises two questions: a) what is "effective compute," and b) does it really track a model getting "better" (a loose term)?

The amount of work ("compute") actually used to train large-language models (LLMs) like GPT can be measured in terms of the number of computing operations performed. By that measure, the difference between GPT-2 and GPT-4 is about five OOMs, or 100,000 times as much, enabled by faster hardware coupled together in larger and larger clusters. On top of this actual measurement, Aschenbrenner introduces the idea of "effective compute," which folds in changes that effectively multiply the actual compute cycles involved. He cites two distinct multipliers: algorithmic improvements (i.e. using the actual compute cycles in a more productive way) and various "unhobbling" improvements, representing strategic changes.

Aschenbrenner explains algorithmic improvements this way: "For example, a better algorithm might allow us to achieve the same performance but with 10x less training compute. In turn, that would act as a 10x (1 OOM) increase in effective compute." Algorithmic improvements reduce the amount of computation required by some part of the LLM, meaning you can do more with whatever amount of computation is used.

Unhobbling improvements are harder to quantify, though. Again, from the paper: "Imagine if when asked to solve a hard math problem, you had to instantly answer with the very first thing that came to mind. It seems obvious that you would have a hard time, except for the simplest problems. But, until recently, that's how we had LLMs solve math problems." Asking an LLM to explain its approach step by step, the so-called "Chain of Thought," is one approach used to "unhobble" LLMs. Another, "Reinforcement Learning from Human Feedback," is what you might expect—having humans tell the models what they get wrong. Unhobbling doesn't require extra computing power—it just allows an LLM to work smarter, not harder.

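If you like to see the arithmetic spelled out, here is a minimal sketch in Python of how "counting the OOMs" works. The way the gains are split across the three factors is purely illustrative (my numbers, not Aschenbrenner's); the only point is that orders of magnitude add, which means the underlying multipliers compound.

```python
# A minimal sketch of "counting the OOMs." One OOM (Order of Magnitude)
# is a 10x change, so n OOMs multiply whatever you are measuring by 10**n.
# The three-way split below is illustrative only, not a figure from the paper.

def ooms_to_multiplier(ooms: float) -> float:
    """Convert a count of orders of magnitude into a plain multiplier."""
    return 10.0 ** ooms

# GPT-2 (2019) to GPT-4: roughly five OOMs of effective compute.
print(ooms_to_multiplier(5))            # 100000.0, i.e. about 100,000x

# "Effective compute" compounds raw compute with the two kinds of multipliers,
# so OOMs from each source simply add. A purely hypothetical split:
physical_compute_ooms = 3.0             # bigger, faster training clusters
algorithmic_ooms = 1.5                  # same results with less training compute
unhobbling_ooms = 0.5                   # chain-of-thought, RLHF and similar gains

total_ooms = physical_compute_ooms + algorithmic_ooms + unhobbling_ooms
print(total_ooms)                       # 5.0 -- the OOMs add up...
print(ooms_to_multiplier(total_ooms))   # 100000.0 -- ...because the multipliers compound
```

In other words, five more OOMs, however they are achieved, mean roughly another 100,000-fold increase in effective compute.
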
Combining these three factors—more compute, algorithmic advances and unhobbling—Aschenbrenner estimates that by 2027, GPT's effective compute will increase by another five orders of magnitude (100,000x). He asserts that no great leap of faith is required to believe this. I'm going to re-quote him, adding a bit more context in the second sentence: "It is strikingly plausible that by 2027, models will be able to do the work of an AI researcher/engineer. That doesn't require believing in sci-fi; it just requires believing in straight lines on a graph."

That five-OOM jump, taking LLMs from smart high school student to autonomous AI researcher, is important. Imagine a tireless research worker, focused on the problem of improving its own software "brain." That means the rate of improvement could skyrocket. It is both awesome and terrifying.

Maybe Aschenbrenner is wrong in his estimates of how quickly we are, as he puts it, "moving through the OOMs." And there are possible blocks to further progress, foremost among them running out of data to train new generations of LLMs, which are already trained on a significant proportion of the entire internet's worth of content. But the trendline is very clear, and—barring such problems—AGI isn't just a possibility; it's a looming reality.

I know this has been a pretty dense read. But I want my readers to have a good understanding of what could be the greatest technological transition they've ever experienced—true thinking machines. With this gentle introduction in hand, I encourage you to read the original paper and form your own opinions.

Michael E. Duffy is a senior software engineer and lives in Sonoma County. He has been writing about technology and business for NorthBay biz since 2001.
