BENGALURU: AI is a particularly well-suited tech trajectory for India: Cerebras’ Feldman

BENGALURU: Andrew Feldman, co-founder and CEO of US-based Cerebras Systems, which last month released what it touts as the world’s largest and fastest artificial intelligence (AI) ‘chip’ with 4 trillion transistors, insists that while AI is not a silver bullet, it can certainly solve many of the world’s problems. In a video interview from his US office, Feldman spoke about areas where his company’s third-generation wafer-scale AI accelerator—the CS-3—scores over Nvidia’s graphics processing units (GPUs). He also shared his thoughts on how India can leverage AI and Generative AI (GenAI) tools, and whether GenAI is being oversold, among other things. Edited excerpts:

How is this large chip (CS-3) helping train AI models faster, and how do you see businesses, academic institutions and governments leveraging it?

One of the fundamental challenges in AI right now is distributing a single model over hundreds or thousands of GPUs. You can’t fit the big matrix multipliers (matrix multiplication is a big part of the math done in deep learning models and requires significant computing power) on a single GPU. But we can fit this on a single wafer, and so we can bring to the enterprise and the academician the power of tens of thousands of GPUs with the programming simplicity of a single GPU, helping them do work they wouldn’t otherwise be able to do.
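
To make the distribution problem concrete, here is a minimal, purely illustrative NumPy sketch (not Cerebras or Nvidia code): the same layer computed on one large device versus sharded across several small ones, with the extra partitioning and gathering the sharded case requires. All sizes are toy values chosen for this example.

```python
# Illustrative only: why a large matrix multiply must be sharded when no single
# device can hold it, and why one large device removes that bookkeeping.
import numpy as np

d_model, d_ff, batch = 1024, 4096, 8
x = np.random.randn(batch, d_model).astype(np.float32)
w = np.random.randn(d_model, d_ff).astype(np.float32)

# Single large device: one call, no partitioning logic.
y_single = x @ w

# Many small devices: the weight matrix is split column-wise into shards,
# each "device" computes a partial result, and the pieces are gathered back.
n_devices = 4
w_shards = np.split(w, n_devices, axis=1)      # distribute weights
partials = [x @ shard for shard in w_shards]   # per-device compute
y_sharded = np.concatenate(partials, axis=1)   # communication/gather step

assert np.allclose(y_single, y_sharded, atol=1e-4)
```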

We are able to tie together dozens or hundreds of these (chips) into supercomputers and make it easy to train big models. The GPT-4 paper cited 240 contributors, of whom 35 were mostly doing distributed computing. Which enterprise company has 35 supercomputer jockeys whose job it is to do distributed computing? The answer is, very few. That means it’s very difficult for them (most companies) to do big AI work. We eliminate that need (with this big chip).

Please share some examples of how companies across sectors are working with you to leverage this big chip.

Companies like GlaxoSmithKline Pharmaceuticals are using us for genomic research and in the drug design workflow. Companies like Mayo Clinic, one of the leading medical institutions in the world, have multiple projects. Some of them are looking at using genetic data to predict which rheumatoid arthritis drug would be best for a given individual. Others are doing hospital administration—predicting how long a patient will stay in a hospital based on their medical history.

Customers like Total (TotalEnergies)—the giant French oil and gas company—are using us to do AI work in oil exploration. We also have government customers and those doing research on the Covid virus. We have government researchers who include our system in giant physics simulations, using what’s called Simulation plus AI, or HPC (high performance computing) plus AI, where the AI does some training work and recommends starting points for the simulator.

How’s your partnership with Abu Dhabi-based Group 42 Holding panning out for the Arabic LLM and supercomputers you’re building with them?

G42 is our strategic partner. We’ve completed two supercomputers in the US, four exaflops each, and we’ve just started the third supercomputer in Dallas, Texas. We announced that we will build nine supercomputers with them. We also saw the opportunity to train an Arabic LLM to cater to the 400 million native Arabic speakers. G42 had the data, and we both had researchers whom we brought together to train what is, head and shoulders, the best Arabic model in the world.

We have many projects underway with them. We also trained one of the best coding models called Crystal Coder. We have also worked with M42, a JV between G42 Healthcare and Mubadala Health, and trained a medical assistant. The aspirations in the Emirates are extraordinary, and the vision and desire to be a leader in AI, exceptional.

What about India, where you have already had talks with certain companies and government officials, too?

We have had lots of conversations with data centre owners, cloud providers, and government officials in New Delhi. We have a team of about 40 engineers in Bangalore (Bengaluru), and we’re growing as fast as we can there. India has some of the great university systems—the IITs and NITs of the world. And many of the researchers working on big compute problems around the world were trained in India.

Obviously, India is one of the most exciting markets in the world, but it does not have enough supercomputers for the talent it has. So, it’s both important for sovereignty and for a collection of national issues to have better infrastructure in India to create an opportunity to keep some of its world-class talent that wants to work on the biggest supercomputers.

I think AI is a particularly well-suited technology trajectory for India. It builds on a strength in CS (computer science) and in statistics that you’ve had in your university system for generations.

Talking about your big chip, companies now appear to be focused more on fine-tuning large language models (LLMs), building smaller language models (SLMs), and doing inference (using a pre-trained AI model to make predictions or decisions on new, unseen data), rather than building large multimodal models (LMMs) and foundational models. Many such customers could make do with fewer GPUs. Wouldn’t your big chip prove overkill, and too expensive, for such clients?

The amount of compute you need is approximately the product of the size of the model and the number of tokens you train on. Now, the cost to do inference is a function of the size of the model. And so, as we’re thinking about how to deploy these models in production, there’s a preference for smaller models. However, there isn’t a preference for less accuracy. So, while the models might be 7 billion (b), 13b or 30b parameters, the number of tokens they’re trained on is a trillion, two trillion (and more). So, the amount of compute you need hasn’t changed. In fact, in many instances, it’s gone up.

Hence, you still need huge amounts of compute, even though the models are smaller, because you’re running so many tokens, so much data, through them. In fact, you’re trading off parameter size against data. And you’re not using less compute, you’re just allocating it differently, because that has different ramifications for the cost of inference.
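
As a back-of-envelope illustration of this trade-off, the sketch below uses the widely cited approximation that transformer training compute is roughly 6 × parameters × tokens; the 6× constant is a standard rule of thumb, not a figure from the interview, and the model and token sizes are hypothetical.

```python
# Training compute scales with (parameters x tokens), as Feldman describes.
def train_flops(params: float, tokens: float) -> float:
    return 6 * params * tokens   # common approximation for transformer training

big_model   = train_flops(params=70e9, tokens=1e12)   # 70B params, 1T tokens
small_model = train_flops(params=7e9,  tokens=10e12)  # 7B params, 10T tokens

print(f"70B on 1T tokens : {big_model:.2e} FLOPs")
print(f"7B  on 10T tokens: {small_model:.2e} FLOPs")
# Both come to ~4.2e23 FLOPs: shrinking the model while scaling up data
# does not reduce the training compute bill, it only reallocates it.
```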

I also do not subscribe to the view that as inference grows, there will be less training. As inference grows, the importance of accuracy and training increases. And so, there will be more training. If you have a good model, say, for reading pathology slides, and it’s 93% accurate, and you don’t retrain, and someone else comes up with one that’s 94% accurate, who’s going to use your model? And so, as these models are deployed, there will be tremendous pressure to be better, and better, and better. Training will continue for years and years to come.

Inference will come in many flavours—there will be easy batch inference, and then there will be real-time inference, in which latency matters a huge amount. The obvious example is self-driving cars, which are making inference decisions in near real time. And as we move inference to harder problems and include it in a control system, the inference challenge is much harder. Those are problems we’re interested in. Most of the inference problems today are pretty straightforward, and we’ve partnered with Qualcomm because they have an excellent offering. And we wanted to be sure we could show up with a solution that did not include Nvidia.

But what about the cost comparison with GPUs?

Inference is on the rise, and our systems today are being used for what I call real-time, very hard inference problems—predominantly for defence and security. I think there will be more of those over time. But in the next 9-12 months, the market will be dominated by much easier problems.

That said, a CS-3 costs about the same as three DGX H100s (Nvidia’s AI system capable of handling demanding tasks such as generative AI, natural language processing, and deep learning recommendation models) and gives you the performance of seven or ten. And so you have a dramatic price-performance advantage.
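
For readers who want the arithmetic behind that claim, a rough sketch using only the ratios quoted above (the cost of about three DGX H100s, the performance of seven to ten); absolute prices are deliberately left out.

```python
# Price/performance ratio implied by the quoted figures.
cost_in_dgx_units = 3
perf_in_dgx_units = (7, 10)

for perf in perf_in_dgx_units:
    advantage = perf / cost_in_dgx_units
    print(f"Performance of {perf} DGX H100s at the cost of 3 -> "
          f"~{advantage:.1f}x price/performance")
# Prints roughly 2.3x and 3.3x, the advantage Feldman refers to.
```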

But if you want one GPU, we’re not a good choice. We begin being sort of equivalent to 40 or 50 GPUs. So, we have a higher entry point, but we’re designed for the AI practitioner who’s doing real work—you have to be interested in training models of some size or some complexity on interesting data sets. That’s where we enter.

But Nvidia is estimated to have 80-95% global market share in AI chips, and it will be very difficult to compete in this space.

I don’t think we have to take share. I think the market is growing so fast. I mean, Nvidia added $40 billion last year to their revenue. And that market is growing unbelievably quickly. The universe of AI is expanding at an extraordinary rate. And there’ll be many winners. We did big numbers last year. We’re going to do bigger numbers this year. We’ve raised about $750 million in total to date, and the last round’s valuation was $4.1 billion.

How water- and energy-efficient are your big chips?

There are several interesting elements of a big chip. Each chip uses about 18 kilowatts, but it replaces 40 or 50 chips that each use 600 watts. Moreover, when you build one chip that does a lot of work, you can afford more efficient cooling. GPUs use air, and air is an inefficient cooler. We use water, because we can amortize a more efficient and more expensive cooling system over more compute on a wafer. And so, per unit of compute, we generally run at somewhere between a third and half the power draw. Why is that? The big chip allows us to be more efficient in our compute, to keep information on the chip rather than move it and spend power in switches (electronic switches are the basic building blocks of microchips), and it also allows us to use a more efficient cooling mechanism.
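
A quick back-of-envelope check of the chip-level power figures quoted here; the further reduction to “a third to half the power per unit of compute” also depends on avoiding off-chip data movement and on cooling efficiency, which this simple arithmetic does not capture.

```python
# One wafer at ~18 kW versus the 40-50 GPUs (~600 W each) it is said to replace.
wafer_kw = 18.0
gpu_watts = 600

for n_gpus in (40, 50):
    cluster_kw = n_gpus * gpu_watts / 1000
    print(f"{n_gpus} GPUs: {cluster_kw:.0f} kW vs one wafer at {wafer_kw:.0f} kW "
          f"-> {wafer_kw / cluster_kw:.0%} of the GPU cluster's chip power")
```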

You’ve also said that this big chip has broken Moore’s law. What exactly do you mean?

Moore’s law said the number of transistors on a single chip would double every 18 months at lower cost. But, first, that required the shrinking of fab geometries. Second, the chips themselves got bigger. But the reticle limit, which constrains everybody but Cerebras, was about 815-820 square millimetres. We obliterated the reticle limit and went to 46,000 square millimetres. So, in a single chip, we were able to use more silicon to break Moore’s law. That was the insight for this workload: the cost of moving data off the chip, the cost of all these switches, the cost that forced Nvidia to buy Mellanox (a company Nvidia acquired in March 2019 to optimize datacentre-scale workloads), could all be avoided with a big chip. While everybody else is working with 60 billion, 80 billion, 120 billion transistors, we’re at 4 trillion.
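
A quick sanity check of the die-area and transistor figures mentioned above; the 80-billion value is taken as a midpoint of the 60-120 billion range quoted, purely for illustration.

```python
# Area and transistor-count ratios implied by the figures in the interview.
reticle_limit_mm2 = 820        # approximate reticle limit cited above
wafer_scale_mm2   = 46_000     # wafer-scale die area cited above
print(f"Area ratio: ~{wafer_scale_mm2 / reticle_limit_mm2:.0f}x the reticle limit")

gpu_transistors = 80e9         # midpoint of the 60-120 billion range quoted
cs3_transistors = 4e12
print(f"Transistor ratio: ~{cs3_transistors / gpu_transistors:.0f}x")
```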

Some people believe that GenAI is being overhyped and are becoming disillusioned, given limitations including hallucinations, lack of accuracy, and copyright, trademark and IP violations. The other school believes that GenAI models will iron out all these kinds of problems over time and achieve maturity by 2027 or 2028. What’s your view?

These arguments only exist if there are kernels of truth on both sides. AI is not a silver bullet. It allows us to attack a class of problems with computers that have historically been foreclosed to us—like images, like text. It allows us to find insight in data in a new and different way. And generally, the first step is to make existing work better—better summarization, replacing people who do character recognition, replacing analysts who look at satellite imagery with machines. You get a modest bump in performance, a sort of societal benefit in GDP (gross domestic product) growth. But it typically takes 3-7 years, following which you begin to reorganize things around the new technology, and you get the massive bump.

For instance, computers first replaced ledgers, then assistants, and then typewriters. And we got a little bump in productivity. But when we moved to the cloud and reorganized the delivery of software so you could gain access to compute anywhere, we suddenly got a huge jump in labour productivity.

So, there are kernels of truth in both arguments. But to people who say this is the answer to everything, you’re clearly going to be wrong. To people who say there are obviously large opportunities to have substantial impact, you’re right.

In my view, AI is the most important technology trajectory of our generation, bar none. But it’s not going to solve every problem—it will give us answers to many problems. It solved protein folding, a problem that humans had not been able to solve until then. It has made games like chess and poker, which had been interesting to people for hundreds and hundreds of years, trivial. It will change the way wars are fought. It will change the way drugs are discovered.

But will it make me a better husband? Probably not. Will it help my friendships? Will it help my dreams and aspirations? No. Sometimes we go crazy thinking about a new technology.

Talking about crazy, what are your quick thoughts on artificial general intelligence (AGI)?

I think we will definitely have machines that can do pretty thoughtful reasoning. But I don’t think that’s AGI. That’s data-driven logical learning. I am not optimistic about AGI, as most people conceive of it, in the next 5-10 years. I think we will get better and better at extracting insight from data, extracting logic from data, and reasoning. But I don’t think we’re close to some notion of self-awareness.

In this context, what should CXOs keep in mind when implementing AI and GenAI projects?

There are three fundamental elements when dealing with AI—the algorithm, the data, and the computing power. And you have to decide where you have an advantage. Many CXOs have data, and they are sitting on a gold mine if they’ve invested in curated data. And AI is a mechanism to extract insight from data. This can help them think about the partnerships they need.

Consider the case of OpenAI: they had algorithms, they partnered with Microsoft for compute, and they used open-source data. GlaxoSmithKline had data, partnered with us for compute, and had internal algorithm expertise. Thinking in these three parts will help shape your strategy, and your data will be enormously important for the construction of models that solve your problems.
