year, a cloud service that tightly integrates GPUs and other hardware components with a growing suite of Nvidia software to develop and manage AI applications.

For many of Nvidia's top channel partners, DGX systems have become one of the main ways these solution providers fulfill the AI infrastructure needs of customers. Nvidia also steers partners to sell GPU systems it has certified from vendors like Hewlett Packard Enterprise and Dell Technologies.

To Brett Newman, an executive at Plymouth, Mass.-based Microway, selling GPU-equipped systems can be lucrative because they carry a much higher average selling price than standard servers. But what makes DGX systems even more appealing for the HPC systems integrator is that they are preconfigured and the software is highly optimized. This means Microway doesn't have to spend time sourcing components, testing them for compatibility and dealing with integration challenges. It also means less time is spent on the software side. As a result, DGX systems can have better margins than white-box GPU systems.

"One of the blessings of the DGX systems is that they come with a certain level of hardware and solution-style integration. Yes, we have to deploy the software stack on top of it. But the time required in doing the deployment of the software stack is less time than is required on a vanilla GPU-accelerated cluster," said Newman, who is Microway's vice president of marketing and customer engagement for HPC and AI.

Selling white-box GPU systems can come with its own margin benefits too if Microway can source components efficiently. "Both are good and healthy for companies like us," Newman said.

Nevertheless, Microway's investment in Nvidia's DGX systems has paid off, accounting for around one-third of its annual revenue since 2020, four years after the systems integrator first started selling the systems. "AI is a smaller base of our business, but it has this explosive growth of 50 percent or 100 percent annually and even stronger in those first days when DGX started to debut," Newman said.

Microway has grown its AI business not just with Nvidia's hardware but its software too. The GPU designer's growing suite of software now includes libraries, software development kits, toolkits, containers, and orchestration and management platforms. This means there is a lot for customers to navigate. For Microway, this translates into training services revenue, though Newman said making money isn't the goal.

"We don't treat it necessarily as the area where we want to make a huge profit center. We treat it as how do we do the right thing for the customer and their deployments and ensure they get the best value out of what they're buying?" Newman said.

Beyond DGX systems and other GPU systems, Microway can also make money by consulting on what else a customer may need to achieve its AI goals, which can open up other sources of compensation, such as reselling software it recommends.
"That's been value that helps us differentiate ourselves," he said.

While Nvidia has dominated the AI computing space with its GPUs for years, the chip designer is now facing challenges on multiple fronts, including large rivals like Intel and AMD as well as cloud service providers like AWS designing their own chips. Even newer generations of CPUs, including Intel's fourth-generation Xeon Scalable chips, are starting to come with built-in AI capabilities.

"If you look at the last generation of CPUs, [Intel] added [Advanced Matrix Extensions] that make them useful for training. They're not as great of a training device as an Nvidia GPU. However, they're always there in the deployment that you're buying, so all of a sudden you can get a percentage of an Nvidia GPU worth of training with very little extra effort," Newman said.

From App Maker To Systems Integrator To AWS Rival

In the realm of AI-focused systems integrators, none has had quite the journey of Lambda. Founded in 2012, the San Francisco-based startup spent its first few years developing AI software with an initial focus on facial recognition. But Lambda started down a different path when it released an AI-based image editor app called Dreamscope. The smartphone app got millions of downloads, but running all that GPU computing in the cloud was getting expensive.

"What we realized was we were paying AWS almost $60,000 a month in our cloud compute costs for it," said Mitesh Agarwal, Lambda's COO.

So Lambda's team decided to build its own GPU cluster. Assembling the collection of systems cost only around two months' worth of the company's AWS bills, saving it significant money. This led to a realization: A growing number of deep learning research teams just like Lambda could benefit from having their own on-premises GPU systems, so the company decided to pivot and start a systems integration business.

But as Lambda started selling GPU systems, the company noticed a common issue among customers: It was difficult to maintain all the necessary software components.

"If you upgraded CUDA, your PyTorch would break. Then if you upgraded PyTorch, some other dependencies would break. It was just a nightmare," Agarwal said. (A minimal sketch of that version coupling appears at the end of this section.)

This prompted Lambda to create a free repository of open-source AI software called Lambda Stack, which uses a one-line Linux command to install all the latest packages and manage all the dependencies. The repository's inclusion in every GPU system gave Lambda a reputation for making products that are easy to use.

"It just really helped make us stand out as a niche product," Agarwal said.

Soon enough, Lambda was racking up big names as customers: Apple, Amazon, Microsoft and Sony, to name a few. This was boosted by moves to provide clusters of GPU systems and partner with ISVs to provide extra value. As a result, Lambda's system revenue grew 60 times between 2017 and 2022.
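Lambda Stack's actual packaging is its own; the sketch below is only a minimal illustration of the CUDA/PyTorch version coupling Agarwal describes. It assumes a Python environment with PyTorch installed, and the expected CUDA version is a hypothetical example. PyTorch wheels are built against a specific CUDA release, so a script can fail fast when the two drift apart.

# A minimal sketch of the CUDA/PyTorch coupling Agarwal describes,
# not Lambda Stack's actual tooling. PyTorch wheels are built against
# a specific CUDA release; one common way to pin that pairing is:
#   pip install torch==2.0.1 --index-url https://download.pytorch.org/whl/cu118
import torch

EXPECTED_CUDA = "11.8"  # hypothetical target version for this illustration

print(f"PyTorch version:       {torch.__version__}")
print(f"Built against CUDA:    {torch.version.cuda}")
print(f"GPU runtime available: {torch.cuda.is_available()}")

# Upgrading CUDA without a matching PyTorch build (or vice versa) is
# exactly the breakage described above, so stop early on a mismatch.
if torch.version.cuda != EXPECTED_CUDA:
    raise SystemExit(
        f"Expected a CUDA {EXPECTED_CUDA} build of PyTorch, "
        f"got {torch.version.cuda}"
    )

Running the check after any driver or framework upgrade makes the dependency problem visible before a training job fails partway through, which is the class of maintenance burden Lambda Stack was built to remove.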