For years, increasing compute and data has been a straightforward way to improve AI. Larger models trained on vast datasets perform better on a wide range of tasks, from language understanding to image generation. At TEDAI San Francisco in October 2024, Noam Brown, a research scientist at OpenAI, said “The incredible progress in AI over the past five years can be summarized in one word: scale.”
However, recent trends suggest that scale may no longer be enough to drive major improvements. Despite continued investments in ever-larger models, some companies are achieving competitive performance through alternative approaches. Chinese AI startup DeepSeek, for example, demonstrated that top-tier performance can be achieved with significantly lower costs, highlighting a growing shift away from the assumption that bigger is always better.
Is AI Hitting a Plateau?
Early on, increasing model size and training data produced substantial performance improvements. But as models grew larger, the benefits became smaller.
In May 2024, OpenAI CEO Sam Altman told staff he expected their generative AI model Orion to be significantly better than the last flagship model, released a year earlier. While Orion’s performance ended up exceeding that of previous models, the improvement was far smaller than the jump between GPT-3 and GPT-4. Even the names of OpenAI’s models appeared to signal slowing progress: GPT-4, GPT-4o, o1, o3. And, in fact, Altman recently announced on X that Orion will ship as GPT-4.5 rather than the expected GPT-5.
One challenge with AI: high-quality training data is finite. Models have already consumed much of the high-quality text available online, making additional data less impactful. Researchers are experimenting with AI-generated training data, so far with limited success. Orion was trained in part on synthetic data produced by other OpenAI models, though some fear this approach could cause new models to simply resemble their predecessors in certain respects.
And while we can technically throw more compute (e.g. scaling up to larger GPU clusters) at an AI problem, the cost of that strategy can be prohibitive. Brown said at TEDAI, “After all, are we really going to train models that cost hundreds of billions of dollars or trillions of dollars? At some point, the scaling paradigm breaks down.”
What’s Next?
As the payoff from brute-force scaling appears to wane, AI innovators are exploring new approaches. Rather than relying on ever-larger models and massive budgets, the spotlight is shifting toward more efficient and cost-effective strategies.
Smaller, Task-Specific Models
Small language models are efficient, cost-effective, and can be fine-tuned for specific applications like fraud detection or personalized recommendations. One data expert noted in an AlphaSense transcript growing excitement around smaller, more tailored models, driven in part by the cost of running large models and by model size itself. Another expert predicts the emergence of new platforms that will let companies create and customize their own small language models.
Illustrating this trend is AI startup Writer, which claims its latest language model matches the performance of the largest top-tier models on many key metrics despite having, in some cases, only one-twentieth as many parameters.
These smaller architectures also pair well with hybrid strategies. For tasks requiring broad domain knowledge or more complex reasoning, organizations can tap into major models; for highly targeted use cases, they can rely on smaller ones.
Advancing Through Reasoning
Training bigger models isn’t the only way to boost performance. Increasingly, AI teams are exploring inference-stage innovations — optimizations applied when a trained model is used — rather than just piling on more training resources. Techniques like chain of thought reasoning allow models to break tasks into smaller steps, improving accuracy.
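In practice, chain-of-thought prompting often amounts to little more than rewording the request so the model is asked to reason before answering. The sketch below builds such a prompt in Python; the exact wording and the `build_cot_prompt` helper are illustrative assumptions, not any vendor’s official API.

```python
# Minimal sketch of a chain-of-thought prompt wrapper. The wording is an
# illustrative assumption; any instruction-following chat model could be
# given the resulting prompt.

def build_cot_prompt(question: str) -> str:
    """Wrap a question so the model is asked to reason in explicit steps."""
    return (
        f"Question: {question}\n"
        "Think through the problem step by step, showing each intermediate "
        "step, then give the final answer on its own line prefixed with "
        "'Answer:'."
    )

prompt = build_cot_prompt(
    "A train travels at 60 km/h for 2.5 hours. How far does it go?"
)
print(prompt)
```

Making the intermediate steps visible also makes it easier to spot where the reasoning goes wrong, which is part of why the technique improves accuracy on multi-step problems.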
OpenAI’s reasoning model o1 improves its performance by using more computing resources — and taking more time — as it answers users’ questions. Yet at $15 per million input tokens — compared with $1.25 for GPT-4o — o1 is twelve times more expensive, leading many to question whether that cost is justified for most use cases.
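The pricing gap is easy to verify from the per-token rates quoted above; the short calculation below uses those figures, plus an arbitrary 50-million-token workload chosen purely for illustration.

```python
# Back-of-the-envelope comparison using the input-token prices quoted
# in the text: $15 per million tokens for o1, $1.25 for GPT-4o.
O1_PRICE_PER_M = 15.00
GPT4O_PRICE_PER_M = 1.25

ratio = O1_PRICE_PER_M / GPT4O_PRICE_PER_M
print(f"o1 costs {ratio:.0f}x as much per input token as GPT-4o")

# Illustrative workload: 50 million input tokens.
tokens_m = 50
print(f"o1:     ${O1_PRICE_PER_M * tokens_m:,.2f}")
print(f"GPT-4o: ${GPT4O_PRICE_PER_M * tokens_m:,.2f}")
```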
Meanwhile, DeepSeek, founded in 2023, reached No. 1 on Apple’s App Store a week after the release of its R1 model, which works along similar lines to OpenAI’s o1. Presented with a complex challenge, DeepSeek takes time to consider alternate approaches before picking the best solution, explaining its chain of reasoning to users.
Architectural Innovations and Cost Intelligence
DeepSeek’s R1 model, mentioned above, doesn’t just match OpenAI’s o1 model in quality; it’s significantly cheaper and nearly twice as fast. For only $5.6 million in training costs, far less than OpenAI’s hundreds of millions of dollars, DeepSeek showed the world that a high-quality reasoning model doesn’t have to be extremely expensive. Part of its edge comes from multi-token prediction and advances in low-precision floating-point computation.
In addition, DeepSeek’s V3 model, introduced a month before R1, also uses a mixture of experts (MoE) architecture, where an AI model is divided into separate sub-networks, or “experts”. Each expert specializes in a subset of the input data, allowing the model to process information more efficiently.
Apple researchers recently offered further insight into DeepSeek’s “secret sauce”, explaining that part of its efficiency comes from sparsity. In a sparse model, many weights remain inactive during a given computation; only a small subset of experts is activated for each input, significantly reducing memory and compute overhead without sacrificing performance.
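The interaction between MoE routing and sparsity can be sketched in a few lines of plain Python. This is a toy illustration with made-up dimensions, not DeepSeek’s actual architecture: a gating layer scores every expert, but only the top-k are ever evaluated for a given input.

```python
# Toy sparse mixture-of-experts layer with top-k gating. Dimensions and
# weights are random placeholders for illustration only.
import math
import random

random.seed(0)
D, N_EXPERTS, TOP_K = 4, 8, 2  # toy sizes, not realistic ones

def rand_matrix(rows, cols):
    return [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

experts = [rand_matrix(D, D) for _ in range(N_EXPERTS)]  # each "expert" is a linear map
gate = rand_matrix(N_EXPERTS, D)                         # gating layer: one score per expert

def moe_forward(x):
    """Score all experts, but run only the TOP_K highest-scoring ones."""
    scores = matvec(gate, x)
    top = sorted(range(N_EXPERTS), key=lambda i: scores[i])[-TOP_K:]
    exps = [math.exp(scores[i]) for i in top]
    z = sum(exps)
    weights = [e / z for e in exps]          # softmax over the chosen experts
    out = [0.0] * D
    for w, i in zip(weights, top):           # the other experts stay inactive
        y = matvec(experts[i], x)
        out = [o + w * yi for o, yi in zip(out, y)]
    return out, top

x = [random.gauss(0, 1) for _ in range(D)]
out, active = moe_forward(x)
print(f"activated experts {sorted(active)} of {N_EXPERTS}")
```

Because only TOP_K of the N_EXPERTS matrices are multiplied per input, compute grows with the number of active experts rather than the total parameter count, which is the efficiency win sparsity buys.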
Another key technique lowering costs is distillation, where a small model learns from a larger model by asking it hundreds of thousands of questions and analyzing the answers. This allows companies to replicate much of a state-of-the-art model’s performance without incurring the same massive training overhead. OpenAI said there were indications that DeepSeek did this.
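In miniature, distillation is just supervised learning on a teacher’s outputs. The toy sketch below stands in for the real process: a plain function plays the “teacher” being queried, and a one-parameter “student” is fit to its answers (both are invented for illustration).

```python
# Toy sketch of distillation: a "student" fits the answers a "teacher"
# gives to many queries. The teacher here is a plain function and the
# student a one-parameter linear model -- stand-ins for a large model
# being queried and a small model being trained.

def teacher(x):
    return 3.0 * x  # the "large model" we can only query

# 1. Ask the teacher many questions and record its answers.
queries = [i / 10 for i in range(1, 1001)]
answers = [teacher(q) for q in queries]

# 2. Fit the student's single weight to mimic the teacher
#    (closed-form least squares for y = w * x).
w = sum(q * a for q, a in zip(queries, answers)) / sum(q * q for q in queries)

print(f"student weight: {w:.3f}")  # recovers ~3.0
print(f"student(7) = {w * 7:.1f}, teacher(7) = {teacher(7):.1f}")
```

A real distillation run would query a large language model and train a smaller network on its responses, but the shape of the procedure is the same: generate queries, collect the teacher’s answers, fit the student to them.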
Recently, NovaSky, a research lab at the University of California, Berkeley, released a distilled version of an open-source model from Alibaba, claiming performance on par with a recent OpenAI model, for just $450. Shortly after, researchers at Stanford and the University of Washington trained and open-sourced a reasoning model, s1, distilled from one of Google’s reasoning models for under $50.
Some Are Still Scaling Up
Not everyone has given up on brute-force scaling. As one AI expert noted in an AlphaSense transcript, the likelihood of a plateau happening is higher in the near term than the long term, as cost structures may improve over time.
In January, OpenAI announced the Stargate Project, a plan to invest $500 billion over the next four years building new AI infrastructure for OpenAI in the United States. That same month, Meta revealed plans to spend up to $65 billion this year to expand its AI infrastructure. Whether these massive investments will break through the likely plateau or simply reinforce the diminishing returns of brute-force expansion remains to be seen.
The Future of AI
As the returns from brute-force scaling begin to fade, AI stands ready to evolve in fresh, meaningful ways. By focusing on architectural innovations, inference-stage improvements, and domain-tailored models, organizations can chart new paths to high performance without incurring astronomical costs. In many ways, this shift signals a new era where clever engineering and adaptability matter as much as raw computational horsepower — one that empowers even smaller players to drive breakthroughs once reserved for tech giants.