Enter AI agents: why 2025 rewrote the AI timeline

May 2026 / long read

Author

Gary Robinson, Joint manager, US Growth Trust

Key points

AI agents are handling longer, more complex tasks reliably
As agents do more real work, demand for chips, datacentres and digital tools could rise sharply
Anthropic, NVIDIA and software infrastructure holdings could benefit as older software models come under pressure

As with any investment, your capital is at risk

In late 2025, artificial intelligence (AI) models crossed a new threshold. We have been bullish on AI for some time, but what happened has caused us to revise our assumptions upward. We now think AI will have a large impact on the economy much earlier than we previously expected.

Agents are here

Large language models (LLMs), such as those built by Anthropic and OpenAI, are trained on vast amounts of text to predict the next word in a sequence. In doing so, they develop internal representations of language, logic, and knowledge that turn out to be surprisingly general.

This was the breakthrough that led to ChatGPT. These early models could write fluently and answer questions, making them effective as chatbots. However, they struggled to hold a plan in mind over many steps, couldn’t recover from mistakes, and were limited in their ability to use external tools.

Two innovations changed this. The first was chain-of-thought reasoning, which launched with OpenAI’s o1 model. Rather than blurt out an answer immediately, reasoning models could tackle problems methodically, step-by-step. This made them better at solving hard problems, particularly in areas that require complex logic, such as maths and coding.

The second big breakthrough was tool use. This gave models the ability to interact with the digital world by writing code, searching the web, reading and writing files, and controlling software.

These developments enabled AI systems to work autonomously at solving real problems. They could take a goal, break it down into sub-tasks, write code to solve the task, check the output and iterate until the job was done. This is what is meant by ‘agents’ and ‘agentic AI’.

Agents were unreliable until recently. Then, in November last year, something changed when Anthropic released Claude Opus 4.5. This new model brought these coding capabilities together at a level that surprised almost everyone in the industry. The benchmarked performance of Opus 4.5 was good, but standard benchmarks – designed to test discrete, predefined tasks – failed to capture the step change in sustained, real-world performance across long autonomous workflows.

Previously, if you set an application such as Claude Code on a coding task, it was hit-or-miss. Now it was mostly hit. The model could sustain autonomous coding sessions for 30 minutes-plus, navigating complex multi-file codebases, debugging its own errors and producing code good enough to push into production.

One chief executive (CEO) of an analytics platform described his ‘oh sh*t!’ moment in December to us. It was near year’s end, and his engineers finally had some time to experiment. A few of them suddenly became more productive – “10x engineers became 50-100x engineers”.

As a result, the CEO estimates the company is seeing 50 percent productivity gains in research and development (R&D) and 20 percent in sales. If the company fully deployed coding agents across its engineering organisation, its AI consumption would increase 25-50 times.

The CEO of another AI-related company has mentioned a similar timeline: “We reached a tipping point [late last year] based on a step change in agent capabilities in coding”.

Most traditional benchmarks struggle to reflect how quickly AI agents are improving at sustained, real-world work. One that has done a better job than most comes from the independent research group METR. It measures how long an AI system can work autonomously on a software task, using the time a skilled human would need to complete the same task as the yardstick.

The chart below shows this time horizon: the human time for a software task that different LLMs can complete with a 50 percent success rate. From Opus 4.5 onward, we have leapt off the trendline. From 2019 to 2024, the length of tasks models could handle roughly doubled every seven months. It then accelerated to every three months. And the latest points on the curve indicate a doubling every two months.

This suggests that the latest systems are advancing faster than earlier generations and that overall capability growth has accelerated. Today’s leading-edge models can complete tasks that would take a skilled human roughly 12 hours, succeeding about half the time.

Three years ago, that figure was closer to four minutes.

At the higher bar of 80 percent reliability, models have moved from handling tasks of about one minute to handling tasks that would have taken a skilled human over an hour.
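
The arithmetic behind these figures can be checked with a short sketch. It assumes steady exponential growth between the two data points quoted above (about four minutes three years ago, about 12 hours today), which is a simplification given that the pace has been quickening. The function names are illustrative and are not METR's.

```python
import math


def implied_doubling_months(start_minutes, end_minutes, elapsed_months):
    """Average doubling time implied by growth from start to end.

    Assumes smooth exponential growth over the whole period.
    """
    doublings = math.log2(end_minutes / start_minutes)
    return elapsed_months / doublings


def growth_per_year(doubling_months):
    """How many times longer the task horizon gets in 12 months."""
    return 2 ** (12 / doubling_months)


# Figures from the text: ~4 minutes three years ago, ~12 hours today.
average = implied_doubling_months(4, 12 * 60, 36)
print(f"Implied average doubling time: {average:.1f} months")

# Doubling times quoted in the text: seven, three and two months.
for months in (7, 3, 2):
    print(f"Doubling every {months} months = {growth_per_year(months):.1f}x per year")
```

On these inputs the implied average doubling time is just under five months, which sits between the seven-month historical trend and the two-to-three-month pace of the latest models. This is arithmetic on the quoted figures, not a forecast.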

AI is completing more complex tasks

Chart: time horizon of software tasks that different LLMs can complete 50% of the time

In the old paradigm, a user inputs some text, and the model generates a response. In this ‘one shot’ system, compute consumption is relatively modest, running to a few hundred or maybe a few thousand tokens (the chunks of text a model processes, which serve as a proxy for compute and cost).

Agents work differently. This time, the query is a goal, for example, ‘find the bug that’s causing the page to crash’, and then the model enters a loop. It reasons about where the bug might be, reads the code, forms a hypothesis, writes a fix, runs tests and checks the results.

If it fails, it runs the loop again. Each loop requires a fresh pass through the model. A task that takes an agent half an hour might loop dozens of times, consuming hundreds of thousands of tokens. A single agent session can burn through more compute than a thousand chatbot queries.
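
The loop described above can be sketched in a few lines. This is a minimal illustration, not how any particular agent product is built: `stub_attempt` stands in for a real model call, and the token counts (10,000 per pass, 300 per chatbot query, success on the 36th pass) are assumptions chosen only to show how the totals add up.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class LoopResult:
    tokens_used: int    # tokens consumed by this pass through the model
    tests_passed: bool  # did the fix survive the test run?


def run_agent(goal: str,
              attempt: Callable[[str, int], LoopResult],
              max_loops: int = 50) -> Tuple[int, int]:
    """Repeat reason -> edit -> test until the tests pass or loops run out.

    Returns (loops_run, total_tokens).
    """
    total_tokens = 0
    for loop in range(1, max_loops + 1):
        result = attempt(goal, loop)        # one fresh pass through the model
        total_tokens += result.tokens_used  # every pass adds to the bill
        if result.tests_passed:
            return loop, total_tokens
    return max_loops, total_tokens


def stub_attempt(goal: str, loop: int) -> LoopResult:
    # Hypothetical stand-in for a model call: each pass costs 10,000 tokens
    # and the fix only works on the 36th try.
    return LoopResult(tokens_used=10_000, tests_passed=(loop >= 36))


CHATBOT_QUERY_TOKENS = 300  # assumed size of a 'one shot' chatbot exchange

loops, tokens = run_agent("find the bug that's causing the page to crash",
                          stub_attempt)
print(f"{loops} loops, {tokens:,} tokens, "
      f"equivalent to {tokens // CHATBOT_QUERY_TOKENS:,} chatbot queries")
```

Under these assumed numbers, one session uses 360,000 tokens, or the equivalent of 1,200 chatbot queries. The point is the shape of the cost: it scales with the number of loops, not the number of user requests.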

At a recent conference, a CEO at one of our portfolio companies noted that some of his engineers are now spending more on AI tokens than their annual compensation.

The return on investment (RoI) on these tokens exceeds the return from spending the same amount on employee pay, and he expects token spending to increase: “The number of tokens will be infinite. There’s just so much to do. We’re not producing even a basis point of the tokens we should”.

We’re hearing the same message again and again. These companies are at the leading edge. Where they go, others will follow.

The effects of this inflexion are already evident in the revenue figures of the frontier labs. Most developers experienced this through Claude Code. Anthropic’s revenues have scaled from a $9bn run rate at year’s end to more than $30bn as of 6 April. It is the fastest growing software company in history. For context, the entire software-as-a-service (SaaS) industry tends to add about $2bn in net new annual recurring revenues (ARR) each quarter. Growth at this level and scale is truly unprecedented.

The hyperscalers – the big cloud providers – have responded to this acceleration in demand. Capital expenditure from the largest five (Amazon, Alphabet, Meta, Microsoft and Oracle) is now expected to be more than $750bn this calendar year, up more than 50 percent year-over-year, with about three quarters of this going on compute.

To put this into context, the Apollo program cost more than $300bn in today’s money, spread over more than a decade. All of the hyperscalers report that customers are absorbing AI capacity as fast as it can be deployed.

Going back a year or two, we were concerned that we might hit an air pocket. We no longer have this concern. Given the recent agentic shift, it now seems likely that AI demand will remain undersupplied for years.

The recent demand spike driven by agentic AI has mostly been in software development. This will likely spread to other domains.

The capabilities that made coding agents possible are general. The same model that can debug a complex piece of code can also read financial filings and build a model, read a legal contract and flag risks, or plan a logistics route. Each new domain requires fine-tuning and new tools, but both are being built at speed.

OpenClaw

In January, a software engineer asked an AI agent to buy him a car. The agent scraped dealer inventories and filled out contract forms using his phone number and email. It then spent several days playing dealers off against each other, forwarding competing PDF quotes and asking each one to beat the other's price. The final price was thousands of dollars less than the list price. The engineer just had to show up and sign the paperwork.

The agent was OpenClaw, an open-source project (formerly Clawdbot) that went viral earlier this year. OpenClaw is a locally hosted AI agent that connects to messaging apps, such as WhatsApp, and can autonomously manage email, calendars, files and commands on a user's machine.

There are hundreds of stories like the car negotiation. One of the scariest developments was the formation of an agent social network called Moltbook, where almost a million autonomous agents signed up and were interacting with each other.

NVIDIA’s CEO Jensen Huang has said it’s “definitely the next ChatGPT”, while another CEO we spoke to suggested it is “the most important thing anyone has ever created”.

OpenClaw is still very rough. There are huge security risks in giving an agent unfettered access to your digital life.

But the demand for a personal assistant that can act on your behalf and save you money and time is clearly enormous. And this is, again, an incredibly computationally intensive version of AI. We are going to need a lot of compute.

OpenAI hired the solo developer behind this project, Peter Steinberger. Anthropic has been working quickly to ship the components that make up OpenClaw, such as the ability to dispatch a task from your phone and run it autonomously on your computer, and the ability for the model to take complete control of your computer’s operating system and apps.

Sizing the opportunity

The cloud computing transition was big, amounting to several hundred billion dollars per year. However, it was ultimately constrained by the size of the existing IT workloads being shifted from on-premises servers.

AI is much more transformational. Instead of migrating existing compute workloads, agents are augmenting and, in some cases, substituting human labour. The scale of the market opportunity is in the trillions, not hundreds of billions.

And if agents are more efficient than people at certain tasks, the total volume of work done could grow rather than shift from one cost line, human capital, to another.

Gross domestic product (GDP) growth itself could accelerate. US productivity growth has already picked up from a 1.4 percent ten-year average to 2.7 percent last year. Stanford economist Erik Brynjolfsson thinks AI is in part responsible.

If he's right, we could see faster GDP growth, with software spending consuming a much larger share. The question is which software companies will capture that spending.

Not all software is equal

Agentic AI triggered a large selloff in public software stocks. In the first three months of 2026, the sector fell more than 20 percent, with many names down far more than this. Agents such as Claude Code have dramatically lowered the barriers to producing software.

All other things equal, this will increase competition and eat away at the industry’s moats. The market is also concerned about business model risk. Many software companies generate revenue via a price-per-user model.

If AI replaces people, per-user revenue shrinks. Companies may be able to adapt by shifting their business models towards being priced by usage, but this isn’t straightforward, and customers will likely resist.

Moreover, SaaS companies have been richly valued, in part on their dependability. They generate recurring revenue and customers churn infrequently, leading to highly predictable profits far into the future. But predictable cashflows are also the most vulnerable when investors start pricing in uncertainty.

We covered our thoughts on this in the US Growth quarterly letter and in these articles: ‘US growth investing amid AI fears’, ‘AI immunity and the acronym fallacy’ and ‘Is AI eating software’. In short, we think the market is being indiscriminate. There is a crucial difference between businesses that sit at the interface layer and those that provide the infrastructure, the gates and rails.

The former are at risk of becoming a commodity while the market opportunity for the latter type of company could increase in a world of agents because agents need infrastructure too. What matters is whether these companies have the will to adapt.

Pleasingly, we see this in most of our holdings. We have long favoured founder-led businesses, and it is at times like these when the attributes of an effective founder-leader are most valuable.

We shared a note last year that mentioned how Shopify has been positioning its business to benefit from agentic commerce. Shopify is infrastructure for commerce. It’s the operating system on which merchants manage their businesses.

As and when commerce switches from people shopping for things to agents doing it on their behalf, those agents will still need all of the functionality that Shopify provides, such as checkout, inventory management, shipping, taxes, returns handling and subscriptions. CEO Tobi Lütke thinks that half of all ecommerce will ultimately be handled by agents.

Cloudflare is another of our software infrastructure names that stands to benefit. AI has made writing software dramatically cheaper, so the number of new applications is exploding, and those apps need to be deployed somewhere.

Cloudflare has been rapidly adding functionality to its Workers service to position it as the default hosting platform for AI agents and vibe-coded applications. The company is also strategically positioned with regard to agent traffic, with 20 percent of the web sitting behind it.

It has launched AI Crawl Control to enable content owners to monitor agents and charge them for access, a critical service for websites which have historically monetised via advertising and need a new revenue model as agent traffic substitutes for human eyeballs.

Finally, Cloudflare benefits from the sheer volume of agent activity on the web. If humans look at a handful of sites when making a decision, an agent might look at thousands. In January, weekly agent requests more than doubled across its network, driving demand for its core security, performance and networking services.

Most of our large software holdings fall into the infrastructure category – Stripe, Databricks, Snowflake. They provide the infrastructure that powers the digital economy. Their revenue models are also already consumption based. With the right leadership, agents ought to be an opportunity for these companies, rather than a threat.

There are some companies which we own where there are greater question marks. Workday is one example. It’s a classic seat-based SaaS business. Its founder hasn’t been fully engaged with the business for several years. He has just announced his return.

Workday is a general-purpose HR and finance solution that serves customers across many industries with a single core product. It lacks the deep proprietary data moat that vertical SaaS businesses – those that target their products on the workflows, data and regulations of a single industry – benefit from.

So, while it is a system of record that isn’t easy to rip-and-replace, we are more cautious on this stock than our other software names and have been reducing our holding.

Lots of chips

One of the puzzling things about recent events is that NVIDIA’s share price hasn’t reacted positively to these developments. In our view, agentic AI is very positive for the outlook for compute demand. And yet NVIDIA’s share price fell around 10 percent in February and March, alongside SaaS stocks.

It seems inconsistent to us that the market would be so bullish on AI as to mark down the prices of many software names by 20-40 percent and yet indifferent regarding the implications for NVIDIA.

NVIDIA founder Jensen Huang corroborated this positive view on a recent podcast with Lex Fridman. He made a couple of comments on the podcast, one direct and one indirect, that indicate he believes NVIDIA could ultimately be a $3tn revenue business. That’s trillions. Revenues this year are forecast to be $366bn.

He argues that NVIDIA’s revenue is not constrained by market share in a fixed market. It is determined by the size of new markets that AI creates. NVIDIA’s revenue is a function of the number of tokens generated.

As AI shifts from chatbots to autonomous agents, and from a single domain like coding to every knowledge work domain in the economy, the volume of tokens the world needs grows by orders of magnitude.

One thing that could hold back revenue growth is deflation: as performance improves, the cost of each unit of compute falls. NVIDIA has improved system-level performance a millionfold over the past decade, yet demand has still grown meaningfully.

And there are no signs of this ending. We consistently hear from portfolio companies that, while token costs are falling, usage is growing faster. This is Jevons’ Paradox: as something gets cheaper, people can use so much more of it that total spending rises.
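
A rough way to see the arithmetic: total token spend is price multiplied by usage. The sketch below uses hypothetical figures chosen only for illustration. They are not taken from any portfolio company, and the function names are our own.

```python
def spend_multiple(price_multiple, usage_multiple):
    """Change in total spend when both price and usage change.

    A multiple of 0.5 means the figure halves; 3.0 means it triples.
    """
    return price_multiple * usage_multiple


def break_even_usage(price_multiple):
    """Usage growth needed to keep total spend flat after a price change."""
    return 1 / price_multiple


# Hypothetical example: the price per token halves while usage triples.
print(spend_multiple(0.5, 3.0))   # total spend rises 1.5x
print(break_even_usage(0.5))      # usage must at least double to hold spend flat
```

Whenever usage grows by more than the break-even multiple, falling prices go hand in hand with rising spend, which is the pattern the portfolio companies describe.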

Call it Jensen’s paradox: NVIDIA is sustaining hyper-growth while trading on ever-cheaper valuation multiples. As at 27 April 2026, the stock is trading at 25 times the current year’s earnings. Its forward price-to-earnings ratio (P/E) is about half that of Costco (49 times) and Walmart (44 times), yet it is growing many times faster.

Consensus estimates have the company’s growth slowing from 73 percent this year to 33 percent next year. We think a slowdown of this magnitude is unlikely.

The strongest bear case is that we’ve seen this before. Technology investment moves in cycles. Transformational demand can be real and still get pulled forward, leading to a capital expenditure (capex) overshoot and a painful correction.

The fibre optic buildout of the late 1990s was necessary. The world did need all that bandwidth. But the companies that laid the cable still went bankrupt because supply overshot demand by several years.

Could the same thing happen with AI? It’s possible. A recession that tightens enterprise budgets could cause the hyperscalers to scale back spending, and NVIDIA’s growth would decelerate sharply.

We take this seriously. The reason we are still bullish is that the demand picture today is different from what we saw even 12 months ago. The agentic shift has moved AI from something discretionary to a productivity tool with measurable RoI and potential existential implications for companies that fail to embrace it.

A year ago, we worried about competition eroding NVIDIA’s edge and about the risk of an air pocket in demand. We no longer hold these concerns.

While there is clearly more competition than there was a few years ago, the durability of AI compute demand looks solid for several years at least, and NVIDIA’s competitive position in inference – running the model to produce outputs (as opposed to training it) – has strengthened considerably.

US Growth Trust portfolio positioning

At the foundation model level, we see a three-horse race developing between Anthropic, OpenAI and Alphabet. We have added Anthropic and Alphabet to the portfolio in the last year.

Anthropic is arguably in the lead. It is now one of our largest holdings at 3.3 percent of the fund at the end of March. It is a private company.

Many of the breakthroughs that teed up this new agentic paradigm came from Anthropic. The company’s revenues have scaled faster than at any other company in history. The scary thing is, they are probably still nowhere near their potential.

In December 2025, we reinitiated a small holding in Alphabet at more than 1 percent of the fund. We think the company’s vertically integrated strategy, which combines frontier models, datacentres and chips, is promising. But this is balanced by disruption risk in the core advertising business and a culture that has been quite slow moving historically.

We have an indirect AI holding through our position in SpaceX, following its acquisition of Elon Musk’s AI venture, xAI. We are least optimistic on this foundation model player.

Unusually for a Musk-led company, xAI has struggled to attract and retain talent, and its models lag those of the leading-edge players. However, if datacentres in space turn out to be viable, xAI could be at a strategic advantage by being tied to SpaceX’s launch capability, Starlink satellite constellation and orbital infrastructure.

We also have a small holding in an AI video creation and editing company, Runway AI (0.6 percent of the fund).

As you will have gleaned from the earlier sections of this note, we are bullish on NVIDIA and have been adding recently. It’s a 4.4 percent holding at end March, and we are looking for further opportunities to add.

These are our largest direct AI plays. However, we believe many of the infrastructure software companies in the portfolio also stand to benefit. These names haven’t been in favour of late, but we are confident the market will eventually catch up as they demonstrate their value in the agent economy.

The opportunity is vast.

Important information

Baillie Gifford US Growth Trust has a significant investment in private companies. The Trust’s risk could be increased as these assets may be more difficult to sell, so changes in their price may be greater.

Investments with exposure to overseas securities can be affected by changing stock market conditions and currency exchange rates. The Trust’s exposure to a single market and currency could increase risk.

The views expressed in this article should not be considered as advice or a recommendation to buy, sell or hold a particular investment. The article contains information and opinion on investments that does not constitute independent investment research, and is therefore not subject to the protections afforded to independent research.

Some of the views expressed are not necessarily those of Baillie Gifford. Investment markets and conditions can change rapidly, therefore the views expressed should not be taken as statements of fact nor should reliance be placed on them when making investment decisions.

This communication does not constitute, and is not subject to the protections afforded to, independent research. Baillie Gifford and its staff may have dealt in the investments concerned. The views expressed are not statements of fact and should not be considered as advice or a recommendation to buy, sell or hold a particular investment.

Baillie Gifford & Co and Baillie Gifford & Co Limited are authorised and regulated by the Financial Conduct Authority (FCA). The investment trusts managed by Baillie Gifford & Co Limited are listed on the London Stock Exchange and are not authorised or regulated by the FCA.

A Key Information Document is available by visiting bailliegifford.com
