The Infrastructure Behind AI Explained — AI Factory Insider Ep. 1 — Full Transcript (June 18, 2026)

[00:00:00] Kaushik Shiretti: I am super excited about this one. This is a brand new live series from NVIDIA where we are trying to pull back the curtain on how AI factories are actually being built. I'm your host, Kaushik Shiretti, and for every episode we're going to bring in folks who are actually in the trenches making AI work, folks within NVIDIA and also from the industry. So I want to quickly introduce our guest here. Our first topic today is enterprise reference architectures, and I couldn't think of a better person to bring in than Shashank Sabluk, who is a product manager in our enterprise product management group. Shashank is one of the architects behind these enterprise reference architectures. So Shashank, you're our first guest. No pressure. Welcome.

[00:00:53] Shashank Sabluk: That's awesome. Thank you so much for having me.

[00:00:56] Kaushik Shiretti: Of course. So before I go to Shashank, since this is our first series, I wanted to give a little bit of context on what we're trying to do here. We are going to look at the AI factory holistically, and at the different pieces that go into it. In each episode, we are going to go deeper into the different parts and components that make up that AI factory, with NVIDIA solutions as well as everything from our ecosystem. And what has really happened with AI, as most of you know, is that in the last two years things have changed very rapidly. ChatGPT came in and changed the world, right? It drove massive AI adoption. DeepSeek then followed up with model efficiencies. And now we are all blessed to live in this world of agentic AI, where things are changing constantly, sometimes too fast, and sometimes it can be overwhelming.
And with all these agent frameworks like OpenClaw, these agents are reasoning, they're thinking, they're working continuously. The agents are not sleeping, right? Some of us humans have to sleep. What that has done is make us think about infrastructure, both software and hardware, in a different light. The inference performance requirements have gone up 10x, 50x, 100x, and sometimes 500x. So what enterprises are really looking for is a predictable, scalable solution, where they can scale out and produce tokens in a consistent way, and at the same time think about governance, security, sovereignty, and all of the things that go into building an AI factory. For the purpose of this conversation, we're going to go deep into our enterprise reference architectures. Think about our reference architectures as the blueprint for AI factories: we have the enterprise reference architectures that focus on the hardware, and then we have the enterprise validated designs that really focus on all the goodness of software, not just what NVIDIA does, but every company, every partner in our ecosystem that is contributing to ultimately serve the purpose of AI. So with that, Shashank, let's start with the basics. What was the genesis of enterprise reference architectures? Why did you think about building these ERAs?

[00:03:28] Shashank Sabluk: The goal for the enterprise reference architectures, Kaushik, was straightforward. We wanted to accelerate time to deployment for enterprise customers while still getting predictable scale and performance out of their AI infrastructure. In support of this, we developed these guidelines that we call the enterprise reference architectures, and we work with our OEM partners.
The thing is, we actually build out these reference architectures internally and validate them. So the idea was to capture those patterns and make them reusable for enterprises. And they're even more relevant now, I would say, in this agentic AI era, Kaushik, because we are going through something called an inference tsunami. Agents, as you said, don't sleep. It's not like circa 2022, where you prompt ChatGPT in a tab on Chrome and it gives you a response. Agents reason, they retrieve information, they call tools, and they keep running, which means token consumption has risen dramatically. So the whole rationale behind the AI factory is that we want token generation to be efficient. That means we get the best cost per million tokens and the best tokens per watt. And this only happens with extreme co-design across network and storage, so that we can get close to linear efficiency.

[00:04:53] Kaushik Shiretti: I think that's a great point, but I just want to make sure it's clear for our viewers: when you talk about the enterprise reference architecture, you're not just talking about a white paper. Do you test them? Do we actually build them? How do we think about building those things?

[00:05:13] Shashank Sabluk: Precisely, right? All of our enterprise RAs, we actually test them end to end. We build them out in our labs, we validate them, and we base them on NVIDIA-Certified Systems servers. Those servers themselves are a known good quantity, because they've gone through single-node or multi-node tests before we are even in a position to use them or recommend them. On top of that, we have standardized on this concept of a scalable unit. A scalable unit is four nodes. So once a customer proves things out at a smaller scale, they can then grow in a very predictable manner from four nodes to eight nodes, 12, 16, and so on.
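
[Illustrative sketch, not part of the spoken conversation] The two efficiency metrics and the four-node scalable unit mentioned above can be written down as simple arithmetic. This is a minimal sketch under stated assumptions: the only figure taken from the conversation is four nodes per scalable unit. The function names, the definition of tokens per watt as throughput divided by power draw, and the example numbers are all hypothetical.

```python
NODES_PER_SCALABLE_UNIT = 4  # stated in the conversation: a scalable unit is four nodes


def nodes_for_units(units: int) -> int:
    """Total nodes in a cluster built from `units` scalable units."""
    if units < 1:
        raise ValueError("at least one scalable unit is required")
    return units * NODES_PER_SCALABLE_UNIT


def units_for_nodes(required_nodes: int) -> int:
    """Smallest number of scalable units that covers `required_nodes`."""
    if required_nodes < 1:
        raise ValueError("required_nodes must be positive")
    # Ceiling division: growth happens in whole scalable units.
    return -(-required_nodes // NODES_PER_SCALABLE_UNIT)


def tokens_per_watt(tokens_per_second: float, power_watts: float) -> float:
    """Assumed definition: sustained throughput divided by power draw."""
    if power_watts <= 0:
        raise ValueError("power_watts must be positive")
    return tokens_per_second / power_watts


def cost_per_million_tokens(hourly_cost: float, tokens_per_second: float) -> float:
    """Assumed definition: hourly cost spread over the tokens produced in an hour."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost * 1_000_000 / tokens_per_hour


if __name__ == "__main__":
    # Growth path quoted in the conversation: 4, 8, 12, 16 nodes.
    print([nodes_for_units(u) for u in range(1, 5)])
    # Hypothetical numbers, for illustration only.
    print(units_for_nodes(10))
    print(tokens_per_watt(10_000.0, 5_000.0))
    print(cost_per_million_tokens(36.0, 10_000.0))
```

The point of the scalable unit is that capacity is added in whole units, so a requirement of 10 nodes rounds up to three units (12 nodes) rather than being met with a one-off design.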
[00:05:57] Kaushik Shiretti: Whenever I see these things, I love validation, right? I love proof points. And I love that the emphasis here is on testing the system. Because a lot of times we put a BOM together, and it's supposed to work that way. But I'm sure all of us have lived through this: it's like trying to bake a cake. You have all the ingredients, the BOM, and it doesn't always turn out the same way, right? Is that how you think about it in production? What works, what doesn't work?

[00:06:32] Shashank Sabluk: Yeah, Kaushik, that's the thing. We don't want the enterprise RAs to be just a list of components, or just a BOM, a bill of materials, on a spreadsheet. We actually build out these clusters, and we run workloads on them. We make design choices for these clusters across our NVIDIA-Certified Systems, east-west networking, and north-south networking, which we also call storage networking. So when our system partners, or OEM partners as we call them, go to market with their AI factory solutions, our system partners being Cisco, Dell, HPE, Lenovo, and Supermicro, it behaves like a known quantity, and it's not just a science project. On top of that, when our partners go to market with their AI factory solutions, they bring them back to us and go through something called the design review board, where we thoroughly review their offering and give them a thumbs up. So in summary, Kaushik, like you always say to me, easy is hard. We do the hard work up front so that our OEM partners and enterprise customers don't have to live through that complexity themselves. They get very clean deployments, which ultimately leads to a faster time to first token.

[00:07:48] Kaushik Shiretti:
Makes sense, makes total sense. And how do we keep up with the hardware roadmap? I think the whole world knows that Jensen and the team have put NVIDIA on a very, very aggressive roadmap. We have a one-year cadence, and not just for the GPUs. A lot of times people think it's just GPUs, but it's the entire company: the software, all the networking, all the components. How do the RAs keep up with that?

[00:08:18] Shashank Sabluk: No, absolutely. We treat the RAs as dynamic, not static. On the GPU side, as you said, we've moved on from Hopper to Blackwell, and now we are very much entering the Rubin era. Similarly, on the networking side, switches have gone from Spectrum-4 to Spectrum-6, and DPUs have gone from BlueField-2 to BlueField-4. So we evolve our designs over time to incorporate these latest and greatest technologies, which ultimately form the backbone of a modern AI cluster. But Kaushik, what's interesting is that I've seen adoption of these RAs horizontally, across a variety of use cases and industries. We see these RAs showing up in manufacturing, healthcare, retail, life sciences, and then of course sectors such as financial services or government, where performance and clean deployments are absolutely critical.

[00:09:15] Kaushik Shiretti: Can you maybe share one or two concrete examples for our viewers? Obviously, I'm not sure what you can share.

[00:09:27] Shashank Sabluk: Yeah, no, sure. An example that stands out to me is one manufacturing customer we worked with that had, I would say, 22,000 practitioners of AI. I think everyone's going to be an AI practitioner moving forward, right?
They were leveraging an AI factory that was based on our enterprise RAs almost exactly as is. And the feedback we got was that it was exactly what they needed. It minimized guesswork; there were no missing pieces, no running around for cables, and the deployment was absolutely clean. Therefore, as I say, the time to first token was fast. They also appreciated some of the design decisions we made in our enterprise RAs, for example the Layer 3, dynamically routed networking model rather than a Layer 2-heavy design. At the end of the day, it meant that the deployment was clean and performance was optimal. And now they were in a position to test things out and then scale in the future.

[00:10:30] Kaushik Shiretti: Yeah. And I think that's my experience too with a lot of our customers when they first start out on their AI journey. It's not about just one hardware component, or the performance of that hardware component, or the different software layers. It's that factory coming together. And it comes through because there's a maniacal focus from NVIDIA and from all of our ecosystem partners to really test it out, to really think about it as a relay race. It's amazing if you have four sprinters, but it's when you really work and practice on the handoff every day, when nobody's looking, that the real magic happens. That's when the compounding effect of each component, whether it's hardware or software, really shines through. How do you think about connecting this to something tangible?
So now that folks understand the importance of the RA, what are the paths for enterprise buyers, or people who are making decisions, to go look at when it comes to different configurations?

[00:11:39] Shashank Sabluk: Yeah. Before I get there, I really love your example about the relay race. That co-design between all the systems working together is exactly what the RAs are there for. But coming to the tangible examples: if you want to buy an NVIDIA AI factory today, you absolutely go work with an OEM system partner or a channel partner. But enterprises were telling us they wanted to see example configurations to really guide their decision-making. So we've published three exemplar AI factories that are now also available on our enterprise RA docs page, namely the RTX PRO AI factory, the HGX AI factory, and the NVL72 AI factory. I'll go in depth into them in a minute, but I think it's important to know that for an enterprise customer, we will see a combination of these AI factories. For example, they could have an RTX PRO AI factory that they're using at a departmental level for inferencing, while at an organizational scale they're using an HGX AI factory for large-scale model training or large-scale model inference. In fact, what I just described is very similar to what we do at NVIDIA IT within our own AI factory. But maybe it makes sense to dive into the three AI factories really quickly. The RTX PRO AI factory is, as I say, how you start. It's a great on-ramp. It's RTX PRO Blackwell data center GPUs.
It's PCIe based, it's air cooled, and it fits into the existing data center setup that enterprises may have. It's perfect for agentic AI, inference, simulation, as well as visualization use cases. So it covers a broad variety of use cases, based on our RTX PRO 6000 Blackwell Server Edition GPUs with a frame buffer of 96 gigs. It's a great way to start and experiment.

Then the next AI factory, the HGX AI factory, is how you scale. I would say it's the high-performance enterprise AI platform, for when you have use cases focused on AI training, fine-tuning, and large-scale inference. This architecture is based on our HGX eight-way design and uses our Blackwell Ultra B300 GPUs with GPU memory of 270 gigs. It's more efficient from a cost-per-million-tokens or tokens-per-watt standpoint, and it's really designed for those large workloads.

And then the third one, which I say is for frontier-scale AI, is our NVL72 AI factory. That's all about rack-scale AI deployments delivering the ultimate performance. You would use the NVL72 AI factory for things like super large models, mixture-of-experts models, and real-time AI reasoning. It gives you the absolute best tokens per watt that we can deliver in a rack-scale system, scaling up to hundreds of megawatts or gigawatts. The NVL72 AI factory is based on Grace Blackwell now, and Vera Rubin tomorrow. It really pushes the envelope with big model counts and context windows.
So how I would like to think of these three, in summary: the RTX PRO AI factory is universal acceleration; the HGX AI factory is for when you're super serious about high-performance AI; and NVL72 is for when you're going up to the frontier and you want giga-scale training. And again, for enterprise customers we see a combination of these. They may start in one place, and then as they discover more use cases or their use cases scale, they can expand into HGX or even eventually NVL72 territory.

[00:15:32] Kaushik Shiretti: Yeah. That's amazing. And it's kind of crazy, all the nuances and differences that the team and all of our OEM partners have to deal with when it comes to all the different flavors. And I think that's a very important point, because I want to make sure it's very clear to everybody. A lot of times when people think about AI factories, they have this image of massive gigawatt factories. And we have tons of those factories, right? The foundation model builders are doing a hundred thousand GPUs plus at massive scale. But an AI factory can start with a DGX Spark. It can be an employee with a DGX Spark, and that's your mini AI factory, where essentially you're producing tokens, you're producing intelligence. And then it can scale up, as you said, from RTX PRO to HGX, to NVL8 systems, to NVL72 systems. But ultimately, you're always going to start with the use cases. You have to start with the workloads and then figure out what the right path is for you and for your company. And we have all those flavors. But what I do love is that the base tenet, the base model of the enterprise RA, is very similar, right?
It's based on a basic foundation model.

[00:16:44] Shashank Sabluk: Exactly. Right. These AI factory configurations are backed by our enterprise RAs, which define the hardware patterns, the networking topology, and all the operational assumptions that come with them. So again, that's NVIDIA-Certified Systems, the four-node scalable unit concept, and NVIDIA Spectrum-X Ethernet networking. So whether a customer starts by deploying an RTX PRO AI factory, jumps right to HGX, or ultimately challenges the frontier with NVL72, they can be sure of that consistent, validated architecture. And that consistency is what reduces risk, accelerates time to deployment, and delivers that high performance.

[00:17:28] Kaushik Shiretti: Got it. We have a lot of viewers here, and we'll have more as we do more of these episodes. What is the one key takeaway that you would like our viewers to leave with today?

[00:17:45] Shashank Sabluk: What I'd say is that AI factories, of course, require the AI apps and the models, but they also need to be built on a solid infrastructure foundation. And that's exactly what the enterprise RAs give you. They're tested, they're modular, and they're ready to deploy through our OEM partners, again, such as Cisco, Dell, HPE, Lenovo, and Supermicro. If you align to the RTX PRO, the HGX, or the NVL72 AI factory, you're not just buying hardware; you're adopting an architecture that's been exercised, tuned, and validated, and that's set up to evolve with NVIDIA's roadmap.
And that's eventually what will shorten your time from "Hey, I have an idea" to serving tokens to real users at scale.

[00:18:31] Kaushik Shiretti: I love that: idea to tokens, right? You've basically synthesized the whole show in five words and made my job easier. So thank you so much, Shashank. It was great having you here. This was our first episode. You broke down the enterprise RAs very nicely, and hopefully it gave our viewers a good understanding of why we build enterprise RAs. Like I always say, you cannot build any factory on shaky ground, and the enterprise RAs really lay the foundation for everything that our partners and our customers want to build when it comes to AI factories. So thank you again. That's a wrap for episode number one of AI Factory Insider. If this was useful, please like it, please share it with your friends who are knee-deep in building out AI factories, and we will be back with more conversations going deep on AI factories. So thank you again.
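
[Illustrative sketch, not part of the spoken conversation] The three exemplar AI factories described in this episode can be summarized as a small lookup table. This is a minimal sketch: the GPU names, per-GPU memory figures, and workload lists are the ones quoted by the speakers (the NVL72 memory figure was not stated, so it is left empty), while the `AIFactory` class, the `candidates` helper, and the exact workload strings are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class AIFactory:
    name: str
    platform: str
    gpu_memory_gb: Optional[int]  # per-GPU figure as quoted; None if not stated
    role: str                     # how the speaker positioned it
    workloads: Tuple[str, ...]    # use cases named in the conversation


FACTORIES = (
    AIFactory(
        name="RTX PRO AI factory",
        platform="RTX PRO 6000 Blackwell Server Edition, PCIe, air cooled",
        gpu_memory_gb=96,
        role="start",
        workloads=("agentic ai", "inference", "simulation", "visualization"),
    ),
    AIFactory(
        name="HGX AI factory",
        platform="HGX eight-way, Blackwell Ultra B300",
        gpu_memory_gb=270,
        role="scale",
        workloads=("training", "fine-tuning", "large-scale inference"),
    ),
    AIFactory(
        name="NVL72 AI factory",
        platform="rack scale, Grace Blackwell today, Vera Rubin next",
        gpu_memory_gb=None,
        role="frontier",
        workloads=("super large models", "mixture of experts", "real-time reasoning"),
    ),
)


def candidates(workload: str) -> List[str]:
    """Names of the exemplar factories whose quoted use cases include `workload`."""
    key = workload.strip().lower()
    return [f.name for f in FACTORIES if key in f.workloads]


if __name__ == "__main__":
    for w in ("training", "simulation", "mixture of experts"):
        print(w, "->", candidates(w))
```

A lookup like this mirrors the advice in the conversation to start from the workload and then pick the configuration, and an enterprise may well end up with more than one of these factories side by side.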
