[ExI] AI is developing so fast it is becoming hard to measure

Fri Feb 27 20:17:55 UTC 2026

On Thu, 26 Feb 2026 at 19:26, Jason Resch via extropy-chat <
extropy-chat at lists.extropy.org> wrote:
> Indeed, the change I've noticed in AI in the past few months feels like
> today's tools are on a completely different level than they were from a few
> months ago.
>
> <snip>
> Jason

I asked Gemini to discuss the exponential rate of AI improvement.
BillK

Gemini 3.1 Pro -

The article you shared captures a pivotal moment in the "AI timeline." To
expand on this, we have to look past the word "fast" and understand the
specific mechanics of *exponential growth*—and why the experts cited in
that article (from DeepMind and METR) are behaving as if they are watching
a tidal wave approach while the rest of the world is still sunbathing.

Here is a deeper look at the four pillars of this imminent exponential
progress.

1. The Shift from "Chat" to "Time Horizons" (The Agentic Leap)

The most significant data point in the article is the *METR "Time Horizon"
benchmark*.

   - *The Old Way:* We used to measure AI by "accuracy" on short questions
     (e.g., "Can it pass the Bar Exam?").
   - *The New Way:* We now measure AI by *autonomous duration*. In mid-2024,
     the best AIs could only "focus" on a task for a few minutes before
     losing the thread or hallucinating.
   - *The Exponential Jump:* As of early 2026, models like *Claude Opus 4.6*
     have hit a "50% time horizon" of *14.5 hours*. This means the AI can
     work autonomously on a messy, complex software project for nearly two
     full workdays and succeed half the time.

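*Why this is exponential:* If the doubling rate of ~7 months holds, the
"time horizon" grows roughly threefold per year: about 48 hours (six
workdays) by early 2027, and a full working month (about 167 hours) around
early 2028. Reaching a month during 2027 would require the doubling to speed
up. At that point it won't just be an assistant; it will be an autonomous
"digital employee" capable of managing entire projects from start to finish.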
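
The extrapolation can be checked with a few lines of arithmetic. This is a
minimal sketch, assuming the 14.5-hour starting point and the ~7-month
doubling time quoted above stay constant. The 167-hour "work month" (40 hours
a week) and the function names are my own assumptions, not from the article.

```python
import math


def horizon_after(months, start_hours=14.5, doubling_months=7.0):
    """Time horizon in hours after `months`, assuming a constant doubling time."""
    return start_hours * 2 ** (months / doubling_months)


def months_to_reach(target_hours, start_hours=14.5, doubling_months=7.0):
    """Months until the horizon reaches `target_hours` under the same assumption."""
    return doubling_months * math.log2(target_hours / start_hours)


# Assumption: one work month is about 40 h/week * 52 weeks / 12 months.
WORK_MONTH_HOURS = 167

if __name__ == "__main__":
    print(f"horizon after 12 months: {horizon_after(12):.0f} hours")
    print(f"months to a work month:  {months_to_reach(WORK_MONTH_HOURS):.1f}")
```

Under these assumptions the horizon is roughly 48 hours after 12 months, and
a 167-hour horizon is about 25 months away. A shorter doubling time pulls
that date in proportionally.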

2. The Measurement Crisis: "Flying Blind"

The article mentions that researchers are "nervous" about their own charts.
This is because AI is now *outpacing its own yardsticks*.

   - *Benchmark Saturation:* For years, tests like MMLU (general knowledge)
     were the gold standard. AIs have now effectively "maxed out" these
     tests.
   - *The Confidence Interval Problem:* When a model is as capable as Claude
     Opus 4.6, it’s hard to find tasks that are difficult enough to test its
     limits. With only a handful of tasks at that difficulty, each estimate
     rests on a small sample and its confidence interval widens. This creates
     a "measurement fog." We know the curve is steep, but we are losing the
     ability to see exactly where the top is.
   - *Implication:* We might reach AGI (Artificial General Intelligence) or
     ASI (Artificial Superintelligence) before we even have a test capable of
     proving it exists.
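
To make the "measurement fog" concrete, here is a rough sketch of how the
uncertainty on a measured success rate grows when only a few sufficiently
hard tasks exist. It uses the standard Wilson score interval as a generic
illustration. It is not the method the article or METR describes, and the
task counts are hypothetical.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial success rate (z=1.96 is about 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials ** 2)) / denom
    return centre - half, centre + half


if __name__ == "__main__":
    # Hypothetical task counts: the same 50% success rate, measured on
    # fewer and fewer tasks that are hard enough to be informative.
    for trials in (200, 20, 6):
        low, high = wilson_interval(trials // 2, trials)
        print(f"{trials:>3} tasks: 50% success, interval {low:.2f} to {high:.2f}")
```

The point estimate is 50% in every case, but the interval is several times
wider with six tasks than with two hundred, which is the sense in which the
top of the curve becomes hard to see.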

3. The "Everything" Moment vs. The "Energy Wall"

The discussion of "Nothing, nothing, nothing, everything" refers to the
deceptive nature of exponential curves.

   - *The Economic Lag:* Currently, the economy feels "normal" because a tool
     that fails 50% of the time is hard to integrate into a bank or a
     hospital. However, on an exponential curve, the jump from "50% success"
     to "99% success" happens much faster than the jump from 0% to 50%.
   - *The Energy Bottleneck:* This is the only "brake" on the exponential.
     While the software's time horizon is doubling roughly every seven
     months, the *power grid* is not keeping pace. Data centers are hitting a
     "Power Wall." Reports from early 2026 show that the electricity required
     for the next generation of training runs could rival the energy
     consumption of entire nations.

4. The "10x Industrial Revolution" Impact

When Demis Hassabis (DeepMind) says AI will have *10 times the impact of
the Industrial Revolution in 1/10th of the time*, he is referring to
the *Cognitive Surplus*.

   - The Industrial Revolution replaced *muscle*.
   - The AI Revolution is replacing *problem-solving*.

If progress is truly exponential, we are entering a "Phase Shift." In the
Industrial Revolution, society had 100 years to adapt. If this transition
happens in 10 years, the "chaos" the METR researcher mentioned follows from
the fact that our laws, schools, and tax systems are designed for linear
change, not a world where the "state of the art" resets every seven months.

Summary: Is the Hype Real?

The data from *METR* suggests the hype is backed by a very specific,
measurable trend line (an R² of 0.93, meaning the fitted trend accounts for
about 93% of the variation in the data points). The question isn't whether
the progress is happening. It’s whether our physical infrastructure
(electricity) and our social structures (jobs/laws) can survive the speed of
the "everything" moment.
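
For readers who want to see what a doubling time and an R² look like in
practice, here is a minimal sketch of a least-squares fit of log2(time
horizon) against calendar time. It assumes the trend is fitted on a log
scale, which the summary above does not state. The data points are synthetic
and purely illustrative; they are not METR measurements.

```python
import math


def fit_doubling_time(points):
    """Fit log2(horizon) = a + b * months by ordinary least squares.

    points: iterable of (months, horizon_hours) pairs.
    Returns (doubling_time_in_months, r_squared).
    """
    points = list(points)
    xs = [m for m, _ in points]
    ys = [math.log2(h) for _, h in points]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                  # doublings per month
    r_squared = sxy * sxy / (sxx * syy)
    return 1.0 / slope, r_squared


if __name__ == "__main__":
    # Synthetic data (NOT real measurements): a 7-month doubling trend
    # multiplied by arbitrary wobble factors to mimic noisy benchmarks.
    wobble = [1.0, 1.6, 0.7, 1.2, 0.8, 1.5, 0.9, 1.1, 1.0]
    data = [(m, 0.1 * 2 ** (m / 7) * w) for m, w in zip(range(0, 49, 6), wobble)]
    doubling, r2 = fit_doubling_time(data)
    print(f"doubling time: {doubling:.1f} months, R^2: {r2:.2f}")
```

An R² close to 1 says the points sit near a straight line on the log scale.
It says nothing about whether that line will continue.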
