Trimming profits, delaying launches, begging friends. Companies are going to extreme lengths to cope with a shortage of GPUs, the chips at the heart of generative AI programs.
Around 11 am Eastern on weekdays, as Europe prepares to sign off, the US East Coast hits the midday slog, and Silicon Valley fires up, Tel Aviv-based startup Astria’s AI image generator is as busy as ever. The company doesn’t profit much from this burst of activity, however.
Companies like Astria that are developing AI technologies use graphics processing units (GPUs) to train software that learns patterns in photos and other media. The chips also handle inference, or the harnessing of those lessons to generate content in response to user prompts. But the global rush to integrate AI into every app and program, combined with lingering manufacturing challenges dating back to early in the pandemic, has put GPUs in short supply.
That supply crunch means that at peak times the ideal GPUs at Astria’s main cloud computing vendor (Amazon Web Services), which the startup needs to generate images for its clients, are at full capacity, and the company has to use more powerful, and more expensive, GPUs to get the job done. Costs quickly multiply. “It’s just like, how much more will you pay?” says Astria’s founder, Alon Burg, who jokes that he wonders whether buying shares in Nvidia, the world’s largest maker of GPUs, would be more lucrative than pursuing his startup. Astria charges its customers in a way that balances out those expensive peaks, but it is still spending more than it would like. “I would love to reduce costs and recruit a few more engineers,” Burg says.
There is no immediate end in sight for the GPU supply crunch. The market leader, Nvidia, which accounts for about 60 to 70 percent of the global supply of AI server chips, announced yesterday that it sold a record $10.3 billion worth of data center GPUs in the second quarter, up 171 percent from a year ago, and that sales should outpace expectations again in the current quarter. “Our demand is tremendous,” CEO Jensen Huang told analysts on an earnings call. Global spending on AI-focused chips is expected to hit $53 billion this year and to more than double over the next four years, according to market researcher Gartner.
The ongoing shortages mean that companies are having to innovate to maintain access to the resources they need. Some are pooling cash to ensure that they won’t be leaving users in the lurch. Everywhere, engineering terms like “optimization” and “smaller model size” are in vogue as companies try to cut their GPU needs, and investors this year have bet hundreds of millions of dollars on startups whose software helps companies make do with the GPUs they’ve got. One of those startups, Modular, has received inquiries from over 30,000 potential customers since launching in May, according to its cofounder and president, Tim Davis. Adeptness at navigating the crunch over the next year could become a determinant of survival in the generative AI economy.
“We live in a capacity-constrained world where we have to use creativity to wedge things together, mix things together, and balance things out,” says Ben Van Roo, CEO of AI-based business writing aid Yurts. “I refuse to spend a bunch of money on compute.”
Cloud computing providers are very aware that their customers are struggling for capacity. Surging demand has “caught the industry off guard a bit,” says Chetan Kapoor, a director of product management at AWS.
The time needed to acquire and install new GPUs in their data centers has put the cloud giants behind, and the configurations in highest demand add further strain. Whereas most applications can operate from processors loosely distributed across the world, the training of generative AI programs has tended to perform best when GPUs are physically clustered tightly together, sometimes 10,000 chips at a time. That ties up availability like never before.
Kapoor says AWS’ typical generative AI customer is accessing hundreds of GPUs. “If there’s an ask from a particular customer that needs 1,000 GPUs tomorrow, that’s going to take some time for us to slot them in,” Kapoor says. “But if they are flexible, we can work it out.”
AWS has suggested clients adopt more expensive, customized services through its Bedrock offering, where chip needs are baked into the service so clients don’t have to worry about them. Or customers could try AWS’ own AI chips, Trainium and Inferentia, which have registered an unspecified uptick in adoption, Kapoor says. Retrofitting programs to run on those chips instead of Nvidia’s has traditionally been a chore, though Kapoor says moving to Trainium now takes as little as changing two lines of software code in some cases.
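What might a two-line change look like in practice? Here is a minimal, hypothetical sketch, assuming a PyTorch training script and the PyTorch/XLA stack that AWS’ Neuron SDK for Trainium builds on; everything beyond the two device-related lines is placeholder code, not AWS’ actual migration path.

```python
# Hypothetical sketch, not AWS documentation: retargeting a PyTorch
# training step from CUDA GPUs to an XLA device such as Trainium.
import torch
import torch.nn as nn
import torch_xla.core.xla_model as xm  # changed line 1: XLA backend

device = xm.xla_device()  # changed line 2: was torch.device("cuda")

model = nn.Linear(512, 10).to(device)  # stand-in for a real model
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

x = torch.randn(32, 512, device=device)        # placeholder batch
y = torch.randint(0, 10, (32,), device=device)

optimizer.zero_grad()
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
optimizer.step()
xm.mark_step()  # XLA compiles and runs the queued graph at this barrier
```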
Challenges abound elsewhere too. Google Cloud hasn’t been able to keep up with demand for its homegrown GPU equivalent, known as a TPU, according to an employee not authorized to speak to media. A spokesperson didn’t respond to a request for comment. Microsoft’s Azure cloud unit has dangled refunds to customers who aren’t using GPUs they reserved, The Information reported in April. Microsoft declined to comment.
Cloud companies would prefer that customers reserve capacity months to years out so those providers can better plan their own GPU purchases and installations. But startups, which generally have minimal cash and intermittent needs as they sort out their products, have been reluctant to commit, preferring buy-as-you-go plans. That has led to a surge in business for alternative cloud providers, such as Lambda Labs and CoreWeave, which have pulled in nearly $500 million from investors this year between them. Astria, the image generator startup, is among their customers.
AWS isn’t exactly happy about losing out to new market entrants, so it’s considering additional options. “We’re thinking through different solutions in the short- and the long-term to provide the experience our customers are looking for,” Kapoor says, declining to elaborate.
Shortages at the cloud vendors are cascading down to their clients, which include some big names in tech. Social media platform Pinterest is expanding its use of AI to better serve users and advertisers, according to chief technology officer Jeremy King. The company is considering using Amazon’s new chips. “We need more GPUs, like everyone,” King says. “The chip shortage is a real thing.”
OpenAI, which develops ChatGPT and licenses the underlying technology to other companies, relies heavily on chips from Azure to provide its services. GPU shortages have forced OpenAI to set usage limits on the tools it sells. That’s been unfortunate for clients, such as the company behind AI assistant Jamie, which summarizes audio from meetings using OpenAI technology. Jamie has delayed plans for a public launch by at least five months, partly because it wanted to perfect its system, but also because of usage limits, says Louis Morgner, a cofounder of the startup. The issue hasn’t abated. “We’re only a few weeks out before going public and will then need to monitor closely how well our system can scale, given the limitations of our service providers,” Morgner says.
“The industry is seeing strong demand for GPUs,” OpenAI spokesperson Niko Felix says. “We continue to work on ensuring our API customers have the capacity to meet their needs.”
At this point, any connection that can give a startup access to computing power is vital. Investors, friends, neighbors—startup executives are drawing on a wide variety of relationships to get more AI firepower. Astria, for example, secured additional capacity at AWS with help from Emad Mostaque, CEO of Stability AI, which is a close partner of AWS and whose technology Astria builds upon.
Bookkeeping startup Pilot, which uses OpenAI tech for some mundane data sorting, gained early access to GPT-4 after asking for aid from university friends, employees, and venture capitalists with connections to OpenAI. Whether those ties accelerated Pilot’s move off a waiting list is unclear, but it now spends about $1,000 a month on OpenAI, and those connections could come in handy when it needs to increase its quota, CEO Waseem Daher says. “If you don’t take advantage of this [generative AI technology], someone else will, and it’s powerful enough you don’t want to risk that,” Daher says. “You want to deliver the best results for your customers and stay on top of what’s happening in the industry.”
As well as battling to get access to more computing power, companies are trying to do more with less. Companies experimenting with generative AI are now obsessing over “optimization”: making processing, with satisfactory results, possible on the most affordable GPUs. It’s analogous to saving money by ditching an old, energy-guzzling fridge that’s just storing a few drinks for a modern minifridge that can run on solar most of the time.
Companies are writing better instructions for how chips should process programming tasks, reformatting and limiting the amount of data used to train AI systems, and stripping inference code down to the bare minimum needed to handle the task at hand. That can mean building multiple, smaller systems, perhaps one image generator that outputs animals and another that creates images of humans, and switching between them depending on the user prompt.
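To illustrate the routing idea, here is a minimal Python sketch; the model names and the keyword heuristic are invented for the example, and a real system would more likely route with a small classifier than with keyword matching.

```python
# Illustrative sketch of routing prompts between two smaller,
# specialized generators instead of one large model.
from dataclasses import dataclass

ANIMAL_KEYWORDS = {"dog", "cat", "horse", "bird", "lion"}  # assumed list

@dataclass
class Generator:
    name: str

    def generate(self, prompt: str) -> str:
        # A real implementation would call a deployed model here.
        return f"[{self.name}] image for: {prompt}"

animal_model = Generator("animals-small")   # hypothetical model name
human_model = Generator("portraits-small")  # hypothetical model name

def route(prompt: str) -> str:
    """Send the prompt to whichever small model fits it."""
    words = set(prompt.lower().split())
    model = animal_model if words & ANIMAL_KEYWORDS else human_model
    return model.generate(prompt)

print(route("a dog surfing at sunset"))     # -> animals-small
print(route("portrait of a smiling chef"))  # -> portraits-small
```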
They are also scheduling processes that are not time-sensitive to run when GPU availability is highest and making compromises to balance speed with affordability.
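A bare-bones version of that scheduling pattern might look like the sketch below; the off-peak window, polling interval, and job are assumptions, since a real deployment would key off a provider’s actual capacity or spot-pricing signals.

```python
# Illustrative sketch: defer a non-urgent job until an assumed
# off-peak window, when GPU capacity tends to be easier to get.
import time
from datetime import datetime, timezone

OFF_PEAK_HOURS = range(2, 8)  # assumption: 02:00-08:00 UTC is off-peak

def run_when_off_peak(job, poll_seconds: int = 600):
    """Block until the off-peak window opens, then run the job."""
    while datetime.now(timezone.utc).hour not in OFF_PEAK_HOURS:
        time.sleep(poll_seconds)  # check again later
    job()

run_when_off_peak(lambda: print("running deferred batch inference"))
```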
Speech-generating startup Resemble AI is content with taking a tenth of a second longer to process a customer request on an older chip if it means spending a tenth of what higher-end options would command, with no noticeable difference in audio quality, says CEO Zohaib Ahmed. He’s also willing to look beyond Lambda and CoreWeave as their terms become less palatable, including nudges toward longer-term commitments. CoreWeave declined to comment, and Lambda did not respond to a request for comment.
Resemble turned to FluidStack, a tiny provider that welcomes one-week or one-month GPU reservations, and has recently joined San Francisco Compute Group, a consortium of startups jointly committing to buy and split GPU capacity. “The startup ecosystem is trying to get together and try to figure out ‘How do we battle, how do we fight for compute?’ Otherwise, it would be a really unfair game. Prices are just too high,” Ahmed says.
He gets a glimmer of hope about the shortages every Monday morning, he says. A sales representative at Lambda, the cloud provider, has been writing to him, asking whether Resemble wants to reserve any of Nvidia’s newest chips, the H100. That there is availability is exciting, Ahmed says, but those chips have only been widely available since March, and it’s just a matter of time before companies testing them perfect their code and go all-in on them. Nvidia will come out with its latest and greatest, the second-generation GH200, next year. Then the cycle of scarcity will start all over again.