Inside a NEW AI Cluster - Tour with NVIDIA B200 — Full Transcript (June 6, 2026)

[00:00:00] Speaker 1: This is a cluster of thousands of the newest NVIDIA B200 GPUs housed in Supermicro servers. Now that we get to show you around this high-end Lambda AI cluster worth hundreds of millions of dollars, it is certainly time for another AI cluster tour. Today we're going to do something fun, touring the Lambda AI cluster in Columbus, Ohio, housed by Cologix. This is one of the first NVIDIA B200 clusters that you can get access to. And the cool thing is that you can start a cluster with just one click, ranging from 16 GPUs to over 1,500 GPUs. What's more, there is a VAST and Supermicro-powered storage array that reaches into the tens of petabytes to store a vast amount of data for Lambda's AI customers. I just want to say thank you to the folks that are sponsoring and supporting this video, including Supermicro, NVIDIA, Lambda, and Cologix. We couldn't do a video like this without all their support, so thank you so much. Let's get to it.
[00:00:58] Speaker 2: One of Lambda's unique differentiators is we built a cloud that not only serves machine learning teams with on-demand instances and clusters, but we also have a private cloud business, which allows hyperscalers and AI labs to deploy 50,000 to 100,000 GPU clusters at the gigawatt scale.
[00:01:14] Speaker 1: We cannot tell you the exact number of NVIDIA B200 GPUs here, but what we can say is that it's already several thousand GPUs strong, and the installation is growing.
[00:01:24] Speaker 2: Lambda's Columbus data center represents our second expansion in the Midwest. Ultimately, we're going to be growing into the hundreds of megawatts into 2026, and starting to deploy liquid direct-to-chip technology as well.
[00:01:35] Speaker 1: While liquid cooling in the next 18 months is going to be standard for AI clusters, the Supermicro HGX B200 servers can thrive even in air-cooled environments.
And using air cooling means that they can be deployed faster than if a facility has to be plumbed for liquid. On each side of the cluster, there are these giant heat exchangers that look super cool. They work in a very similar manner to your car's radiator, where the goal is to extract heat efficiently. Now, these heat exchangers work in tandem with the chillers that are on the roof of the data center. We didn't get to go up there, but that's the other side of this installation.
I'm now here in the power yard for this Cologix facility. Now, this is about a 36-megawatt facility. It has its own substation over there. You'll then see on this side that we have a number of generators. These are Rolls-Royce generators that are in their own little huts. And then on this side over here, we have all the battery backup systems and UPS supplies, all that kind of stuff for the data center. Now, each one of these is something like 1.6 megawatts. All of this is regularly inspected, and it's also designed for safety and security, so that way it can continue to power the data center no matter what's happening outside.
The design of this facility has cold aisles. There, the air is pulled through the Supermicro GPU servers and all the heat sinks and what have you inside, and then it goes to the rear of the racks, where it's in the hot aisle. In the hot aisle, the hot air is then contained and rises, only to be circulated through these giant heat exchangers on the walls. Once the air goes through these heat exchangers, it's then cooler, and that allows it to be recycled over and over again through the cluster.
Storage is provided by an all-flash array powered by VAST Data. VAST is focused mostly on software, so here, the NVMe storage nodes are Supermicro 1U servers. Part of the magic of this array is that it's both fast as well as dense.
The NVMe SSDs contained in these arrays are physically smaller than today's hard drives, but they're also much higher performance and have higher capacities than those larger hard drives. This is important since the job of this network storage is to deliver petabytes of data to the GPUs as fast as possible. The array needs to do that to make sure that the GPUs are not idle waiting for data. And the high density of flash storage also means that the arrays can reach tens of petabytes while also occupying only a fraction of the space that you would need if you wanted the same capacity and performance using hard drive arrays.
Networking here is really neat. We saw the NVIDIA Quantum-2 networking, which is built around NDR InfiniBand running at 400 gigabit per second speeds. As a result, there are eight NVIDIA ConnectX-7 NICs in the rear of each of the Supermicro HGX B200 servers. One NIC goes with one GPU. In each aisle, there are several NVIDIA Quantum-2 switches. Fiber connects the Supermicro HGX B200 AI servers to the scale-out GPU fabric.
There's also an Ethernet-based fabric. Each of the Supermicro HGX B200 servers also has an NVIDIA BlueField-3 DPU. These DPUs are not just simple networking cards like you would find in your own systems, maybe your desktop. Instead, each BlueField-3 DPU has 16 cores, 16 gigabytes of memory, and even runs its own operating system to efficiently offload networking tasks. These DPUs in each server can drive 400 gigabits of network bandwidth each and have built-in accelerators to ensure that these cards can keep pace. Having these high-end NICs is important in AI clusters like the ones Lambda is building because the goal is to service multiple customers from these clusters. That just puts more stress on the network cards. And let's face it, one of the coolest parts of these clusters is simply seeing all of the fiber cabling that goes into the networking side.
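
As a rough illustration of the networking figures quoted above, the sketch below adds up nominal link bandwidth per server: eight 400 Gb/s NICs on the east-west InfiniBand fabric, plus one 400 Gb/s DPU for north-south traffic. The function and constant names are made up for this example, the results are nominal line rates rather than measured throughput, the 8-GPUs-per-server figure comes from the system walkthrough later in the video, and 1,536 is only an assumed stand-in for "over 1,500 GPUs".

```python
import math

# Figures quoted in the tour; every name here is illustrative only.
GPUS_PER_SERVER = 8   # one HGX B200 server holds eight GPUs
NICS_PER_SERVER = 8   # one ConnectX-7 NIC per GPU
NIC_GBPS = 400        # NDR InfiniBand nominal line rate per NIC
DPU_GBPS = 400        # BlueField-3 DPU nominal line rate per server


def servers_needed(gpus: int) -> int:
    """Number of 8-GPU servers required to provide `gpus` GPUs."""
    return math.ceil(gpus / GPUS_PER_SERVER)


def east_west_gbps(servers: int) -> int:
    """Aggregate nominal NIC bandwidth into the scale-out GPU fabric."""
    return servers * NICS_PER_SERVER * NIC_GBPS


def north_south_gbps(servers: int) -> int:
    """Aggregate nominal DPU bandwidth on the Ethernet-based fabric."""
    return servers * DPU_GBPS


if __name__ == "__main__":
    # 16 is the smallest cluster size mentioned; 1536 is an assumed example.
    for gpus in (16, 1536):
        n = servers_needed(gpus)
        print(f"{gpus} GPUs -> {n} servers, "
              f"{east_west_gbps(n)} Gb/s east-west, "
              f"{north_south_gbps(n)} Gb/s north-south")
```

Under these assumptions, a single server has 3,200 Gb/s of nominal east-west bandwidth, and the smallest 16-GPU cluster is two servers.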
Now, of course, the cluster that we just saw, that is really running customer workloads right now. But there is this entire other section where even more NVIDIA HGX B200 systems from Supermicro are being built by Lambda because they are growing their clusters in Columbus.
Now, we've mentioned that this is one of the first NVIDIA B200 clusters based on Supermicro's air-cooled NVIDIA HGX B200 platform. So it's probably worth looking at the system, so we pulled one out of the rack so we can show you. Now, when we say that you're provisioning a GPU, what we really mean is that you're provisioning a box like this. This is a Supermicro NVIDIA HGX B200 8-GPU platform. So there are eight NVIDIA GPUs of the B200 generation in this, as well as all the CPUs, memory, networking that makes this available to the cloud infrastructure.
Now, this tray that we can pull out of the system, this really has all the GPUs. So this has our NVIDIA HGX platform. It also has our eight B200 GPUs with the air cooling. Now, of course, there are liquid-cooled data centers and air-cooled data centers. This particular one that we're at today is an air-cooled data center. So you're going to see these giant heat sinks that are really there to keep a couple of components cool. The first is, of course, those massive B200 GPUs, but there's also other components like the NVLink switches, as well as things like PCIe retimers on this board.
But these GPUs don't just work by themselves. They, of course, need the context of being in a host system, and that's really what the rest of this giant chassis is designed for. Now, the host system, that's really this bottom portion right here. And what you're going to see is a couple of key areas. The first right here is where we have all of our I/O expansion cards, or at least our expansion cards for the host systems. We have our boot drives, and then we also have a storage area and a fan area. So let's get into those sections really quickly.
Over here, we have our two boot SSDs, and these are really just there to go and boot the operating system and do those types of functions. We then have our north-south network ports, and these are really provided in this system by the NVIDIA BlueField-3 DPU. Then we have our storage, and you'll see this giant partition here. And these SSDs are really here for our local AI storage. Of course, as you'll see in the data center, there are multi-petabyte arrays for all of the massive cluster storage, but you also need a little bit of storage in these boxes as well. The bottom portion, that's really fans, because we need to go cool the dual CPUs and all of the system memory. And then just kind of a fun little one, we have our front VGA and USB ports, because these systems generate enough heat that you definitely want to be on the cold aisle when you service them. So you'll see those right here.
Okay, so taking a look at the back of the system, you're going to see a number of things that are super important to this. The first thing you're going to see on top here are the fans, because we need to move a ton of air through this system, especially in an air-cooled GPU server. Now, powering this system is super important, and that's why we have six titanium-level efficiency power supplies that are each 5,250 watts. This is awesome just in terms of the amount of power they put out, but also the efficiency level. Just as something fun on these power supplies, Supermicro switched to a design where there are two power inputs per power supply a couple of generations ago, and that allows you to use fewer, larger fans, but also reduce the number of power supply modules that you need in a system like this. Now, of course, all of the fans and power supplies, you can pop out. They're hot-swappable and designed to be redundant. But there's one other really cool feature on the back of this server that you can actually service by just pulling the thing out, and that is the NIC tray.
On the very bottom here, you're going to see a lot of networking. Now, for our low-speed networking that you just need in every server, we have two NIC ports here, plus one more management port, which really provides that out-of-band management for the entire server. And on the bottom, we have eight NVIDIA ConnectX NICs to run our NVIDIA Quantum-2 InfiniBand-based networking. Each GPU in the server gets a NIC to handle its east-west network traffic. And we've shown this before to you in many videos where we've actually pulled these NIC trays out. We're not going to do that here because, well, it has all the NICs and I don't want to break it. And a fun fact on this is that these servers are so big, they now are 130 kilograms or about 286 pounds, certainly something I don't want to lift by myself.
Now, of course, we're just looking at the B200 generation, but there are a lot of other types of GPUs and also a lot of different types of systems for AI out there. A good example of that, which we've looked at a number of times on the STH main site and also on our YouTube channel, is the NVIDIA GB200 NVL72 racks. Now, these, of course, are the fully integrated liquid-cooled racks. They also use a lot more power than what we're looking at here in terms of systems. And because of that, you generally need different types of facilities. Lambda actually has Supermicro GB200 NVL72 racks that are already deployed in production clusters. An example of that is this one that you're seeing now. This particular one is in Mountain View, California, just down the way from where STH started.
Now, this giant system integrates 72 Blackwell GPUs into an entire domain, so it's like a giant scale-up GPU. There's also 36 NVIDIA Grace CPUs here to really manage all the host processes. There are also three different types of networks. One is powered by the NVIDIA BlueField-3 DPUs. That's really for your north-south networking from your host CPUs to the rest of the cluster.
Then you have your east-west networking, which, of course, can be either InfiniBand or Ethernet. And the third type of network is NVLink. Now, NVLink is what interconnects all of the GPUs and makes this entire rack operate as a single GPU. It's an extraordinarily complex feat of engineering that has the interconnects between all of the NVLink components, like the GPUs, but also the NVLink switches.
In the center of this rack, you're going to see that we have our NVLink switch trays. These are flanked on top and bottom by the GB200 trays. Unlike a standard server where you have power supplies that are built into the server, you actually have the power supplies that sit in power supply shelves on the top and bottom of racks. You'll also see that this rack integrates liquid cooling. The reason for that is that to get this level of density, you obviously need to remove a lot of heat from the system, and you would need so many fans that you couldn't get the density of accelerators and really use that NVLink spine in all those cartridges. They just wouldn't reach far enough, so you literally need liquid cooling to be able to get that density and make this entire rack operate as a single GPU.
Now, these racks are so complex that they need to be built and tested at Supermicro and then delivered in their entirety to customers like Lambda. That means that there's tons of telemetry data and all kinds of aspects of this that have to work right to make sure that such a complex system works reliably.
And so while we're at a facility today looking at the HGX B200 systems, the key message here is that they're not the only types of AI systems that a company like Lambda is going to deploy and a company like Supermicro sells. Instead, there's a wide variety of systems. And we're at one location where we have the HGX B200, but there are other locations with things like the GB200 NVL72 racks for different types of customers, but also just geographies.
Folks like to have their GPU clusters doing inference, often close to their own infrastructure or their customers. And this is just an example of where you need to locate the right systems in the right location. And so while the HGX B200 platform that we're looking at today is that 8-GPU platform that we've seen for generations from companies like Supermicro and NVIDIA, it's also moving to a world where we're seeing more of these integrated rack solutions. And as we move to that, the mix between the 8-GPU systems and also the larger integrated racks, I think it's going to change over time. But as we stand today, both types of systems are super popular. And so that's why we're looking at both.
Now, there are certainly companies building their own AI clusters, but not everyone has hundreds of millions of dollars or billions of dollars to build them. Also, it takes time and expertise to procure and then also get the clusters up and running and performing well in the first place. So if you don't want to do that, instead, Lambda has an AI cluster solution that's super easy. It calls it 1-Click.
[00:13:53] Speaker 2: The great thing about Lambda's 1-Click Clusters is it allows machine learning teams to scale from 16 GPU clusters all the way to thousands, allowing them to go from R&D to production even faster.
[00:14:05] Speaker 1: And of course, if you just want way more GPUs and you want a three-year commitment and stuff like that, I'm sure that you can call the Lambda sales guys and they'll be happy to talk to you about setting up your own private cluster as well. But for everybody else, this is a super easy process that avoids having to hit up a salesperson and then wait. If you just want to get your project up and running and don't want that kind of delay, then this is a great solution.
Hey guys, I hope you like this look at an awesome NVIDIA Blackwell cluster at Lambda Labs with Supermicro and Cologix.
Just being able to see something that's so big in terms of a cluster, but also that's so easy to go and get on, I think it's super cool. And if you did like this video, well, you should definitely share it with your friends and colleagues, but I'm going to make another ask. If you did like this video or you have other things you want to see, let me know down in the comments because we're going to be doing a bunch more of these types of tour videos in the future. In fact, I'm actually in Eastern Europe right now, hopefully getting to do something really cool tomorrow that I'm super excited to bring you. And hey, if you did like this video and you want to see some more of our content in the future, well, definitely give this video a like, click subscribe and turn on those notifications so you can see whenever we come out with something awesome. With that, thanks for watching and have an awesome day.
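
For reference, the hardware figures quoted during the system walkthrough can be checked with a few lines of arithmetic. This is only a sketch with invented names: it multiplies out the six 5,250-watt power supplies as nameplate capacity (not measured draw, and the redundancy configuration is not stated in the video), converts the 130 kg chassis weight to pounds, and takes the ratio of the 72 Blackwell GPUs to the 36 Grace CPUs in a GB200 NVL72 rack.

```python
# Figures quoted in the video; all names are illustrative only.
PSU_COUNT = 6            # power supplies per HGX B200 server
PSU_WATTS = 5250         # nameplate rating per power supply
SERVER_KG = 130          # quoted server weight
LB_PER_KG = 2.20462      # standard kilogram-to-pound conversion factor
NVL72_GPUS = 72          # Blackwell GPUs per GB200 NVL72 rack
NVL72_CPUS = 36          # Grace CPUs per GB200 NVL72 rack


def nameplate_watts(count: int = PSU_COUNT, watts: int = PSU_WATTS) -> int:
    """Total installed PSU capacity; actual draw and usable capacity
    under redundancy will be lower."""
    return count * watts


def kg_to_lb(kg: float) -> float:
    """Convert kilograms to pounds."""
    return kg * LB_PER_KG


def gpus_per_cpu(gpus: int = NVL72_GPUS, cpus: int = NVL72_CPUS) -> float:
    """GPU-to-CPU ratio in the rack."""
    return gpus / cpus


if __name__ == "__main__":
    print(f"PSU nameplate total: {nameplate_watts()} W")
    print(f"Server weight: {kg_to_lb(SERVER_KG):.1f} lb")
    print(f"NVL72 GPUs per Grace CPU: {gpus_per_cpu():.0f}")
```

The totals work out to 31,500 W of installed power supply capacity, roughly 286.6 lb (in line with the "about 286 pounds" quoted), and two GPUs per Grace CPU.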
