Tech Talk: The future of liquid cooling for data centers

HPE June 14, 2026 25m 4,833 words

About this transcript: This is a full AI-generated transcript of Tech Talk: The future of liquid cooling for data centers from HPE, published June 14, 2026. The transcript contains 4,833 words with timestamps and was generated using Whisper AI.

[00:00:00] Speaker 1: Okay, good morning everybody. I'm happy to see the turnout to listen about liquid cooling today. This is a topic I care very much about. I'm a thermal engineer at HPE, so this is kind of my bread and butter. So today we're going to be talking about liquid cooling: a bit about how we got here and where we are today, a little about efficiency and how it is measured in a data center and at the IT level, and a little bit about HPE solutions in the liquid cooling space. I hope that by the end of this session you'll have a better understanding of the technology, some of the benefits, and how HPE can help you achieve those sustainability and efficiency goals in your data center.

I'd like to start with a very simple statement that kind of defines the essence of this presentation, which is that cooling matters more than ever today. If you go back 10, 15 years, in the space where I began my career in 2015 with SGI, liquid cooling was more of a value-add service that we did on systems. These were basically for the customers that run the biggest and fastest machines: the government labs, university research centers. It was far from a commodity. That has really changed today. Now liquid cooling is no longer just a nice-to-have; it's really a thermal necessity, and even a design restriction in some cases. I think the next slide shows that quite nicely.

I like this slide a lot. As I said, I started back in 2015, so this chart shows two things. Number one, it shows the GPU powers and CPU powers going up over time. But number two, it shows more or less the difficulty of my career since I started, so it's kind of a fun slide. Like I said, 10 years ago the top CPU SKUs that we were selling were 85 watts, 150 watts a year later. GPUs were around 300 watts of power. And in just 10 years it's really gone up. CPUs, where we're already selling
today CPUs at 500 watts; GPUs are pushing a thousand watts. Basically what happened was, in chip manufacturing, we can't make the transistors any smaller, so what we do is put more transistors in the same space. That's really the only way we can get to higher powers and higher performance. That allows us to get higher performance in a smaller footprint, but at a cost.

You may have heard an industry term called T-case. That's the case temperature of a device; it basically defines the maximum temperature that the device can operate at and function without overheating. Those numbers are actually going down as well. So as the power is going up, the maximum allowable temperature of the device is going down. And then we layer in a third complexity, which is that to meet these sustainability and efficiency goals in data centers, we have warmer and warmer facility coolant coming into the data center. It used to be 20 °C when I first started. Now 32 °C is kind of the standard, and we're seeing 40 °C, even higher than that in some cases. So basically three things are happening: the power is going up on the devices; the temperature limit, the T-case of the device, is going down; and the coolant temperature going into the server is going up. So you have this squeezing effect that's making it very, very difficult to do the actual heat transfer at the device. It can be done, we do it, but it's more and more difficult with each generation, even with liquid cooling.

I talked about liquid cooling being kind of a necessity today, but there are benefits of liquid cooling that don't just come from it being required to get the performance you need. Number one is performance. This enables you to have the top SKUs, the highest-power, highest-performance devices. The cooler the chips are, the happier they are and the better they perform.
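One rough way to picture the squeezing effect the speaker describes is a lumped thermal-resistance budget: the cooling path has to satisfy R ≤ (T-case limit − coolant inlet temperature) / device power. The sketch below is illustrative only. The power figures and coolant temperatures are the round numbers quoted in the talk, while the two T-case limits are hypothetical placeholders, since the talk gives no values, and the function name is made up for this example.

```python
def max_thermal_resistance(t_case_max_c: float, coolant_in_c: float, power_w: float) -> float:
    """Largest case-to-coolant thermal resistance (K/W) that keeps the
    device at or below its T-case limit in a simple lumped model."""
    headroom_k = t_case_max_c - coolant_in_c
    if power_w <= 0 or headroom_k <= 0:
        raise ValueError("need positive power and coolant below the T-case limit")
    return headroom_k / power_w


# Power and coolant temperatures are the round numbers from the talk;
# the T-case limits (85 and 75 degC) are hypothetical placeholders.
then_budget = max_thermal_resistance(t_case_max_c=85.0, coolant_in_c=20.0, power_w=150.0)
now_budget = max_thermal_resistance(t_case_max_c=75.0, coolant_in_c=40.0, power_w=1000.0)

print(f"then: {then_budget:.3f} K/W, now: {now_budget:.3f} K/W")
print(f"budget shrank by a factor of about {then_budget / now_budget:.1f}")
```

Under these assumed numbers the allowable thermal resistance shrinks by more than a factor of ten, which is the sense in which all three trends push in the same direction.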
So it just has to do with getting the heat out of the device, which is, like I said, extremely dense now. It's just getting the heat out. It's just physics, and you can't cheat physics, I always say. Speaking of physics, the coolant itself: a water-based coolant has almost four times the specific heat capacity of air, so it's absorbing the heat from the heat source almost four times better than air does, and it has a thousand times the density. So in the heat transfer math itself, per kilogram of fluid moved, a liquid-cooled solution is three to four thousand times more efficient. That's the number one reason why this is a topic today: the performance aspect. To use an example that Antonio likes to use, there's a reason that when you burn your finger you run it under water; you don't blow on your finger. It's the fastest way to cool your finger down, and the same is true with high performance computing.

Number two is density. This is more or less just how many chips can I get in a rack? What's my kilowatts per square meter, per square foot, of the data center? With traditional air cooling you have 10, 15, sometimes 20 kilowatts per rack of server density. That's just due to the capacity, the performance, of the air handlers in the traditional data center space. With liquid cooling, and you can see some of these products on the floor, we can deploy 50, 60, 80 kilowatts in that same space, depending on the technology, and that's just by utilizing liquid cooling. So the density story is a lot better: you can get the same performance with less than half of the racks in the data center.

And lastly is efficiency. I think this ties in really nicely with the others. If you can get the heat out of the device at the source, before you convert it through different mediums like you do with air cooling, you just get better heat transfer effectiveness. And with the fluid properties we talked about, you
don't have to move as much fluid in order to get the heat out. So it's just a very efficient way to do cooling.

So what does this look like? I'm a visual person, so I like to see exactly what that looks like physically. Like I said, mechanical engineer by training, so that's kind of where that comes from. What you see here is the EX2500. This is what I think of as the little brother to the EX4000, which is on the showroom floor. If you haven't had a chance to look at that, I highly suggest it; it's a very impressive piece of equipment. But it's the same EX servers, same cooling technologies, in a smaller system, so basically just a lower barrier of entry to get into the EX supercomputing space.

I'm going to start on the right side of the slide because, as a cooling engineer, this is more my area of expertise. So what you see in the middle is holistic server blade cooling. What we mean by that is not only the CPUs and GPUs, which are, to be honest, kind of the low-hanging fruit in terms of liquid cooling today. A lot of people offer this; we offer it as well. These are the devices where, if you put a liquid cooling solution on your server, you can get 70% of the heat out with just the CPUs and GPUs. The more difficult things are the last 30%: the memory, whether it's the DIMM memory that you find with CPUs or the high-bandwidth memory stacks in GPUs, or the power distribution. These are the things that are a bit more challenging to get the heat out of; they require a little bit more intricate design. Same with the fabric: the Slingshot switch that we have is 100% liquid-cooled. Local storage, disk drive cooling, all of that extra 30% brings us to what we call 100% fanless direct liquid cooling. So this can sit in your data center with zero airflow, quiet as a whistle, and it will actually run with just water cooling. It's a difficult thing to do, but it's the highest performance you can get.
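The fluid-property comparison from earlier in the talk can be checked with the basic heat balance Q = ṁ · c_p · ΔT. This is a minimal sketch that assumes plain water and room-temperature air with approximate textbook properties (real water-based coolants with additives differ somewhat) and a hypothetical 10 K temperature rise; the 80 kW rack load is the upper figure quoted in the talk. Note that the factor of several thousand appears when comparing the volume of fluid moved; per kilogram, the ratio is the roughly fourfold heat-capacity ratio.

```python
# Approximate textbook properties near room temperature (assumed values).
WATER = {"cp_j_per_kg_k": 4180.0, "density_kg_per_m3": 997.0}
AIR = {"cp_j_per_kg_k": 1005.0, "density_kg_per_m3": 1.2}


def volumetric_flow_m3_per_s(heat_w, delta_t_k, fluid):
    """Volume flow needed to carry heat_w with a delta_t_k fluid temperature rise."""
    # Mass flow in kg/s from the heat balance Q = m_dot * cp * dT.
    mass_flow = heat_w / (fluid["cp_j_per_kg_k"] * delta_t_k)
    return mass_flow / fluid["density_kg_per_m3"]


rack_heat_w = 80_000.0  # 80 kW rack, the upper figure quoted in the talk
delta_t_k = 10.0        # hypothetical coolant temperature rise

water_flow = volumetric_flow_m3_per_s(rack_heat_w, delta_t_k, WATER)
air_flow = volumetric_flow_m3_per_s(rack_heat_w, delta_t_k, AIR)
cp_ratio = WATER["cp_j_per_kg_k"] / AIR["cp_j_per_kg_k"]

print(f"specific heat ratio (per kg): {cp_ratio:.1f}x")
print(f"water: {water_flow * 1000:.2f} L/s, air: {air_flow:.2f} m^3/s")
print(f"volume of air vs water for the same heat: {air_flow / water_flow:.0f}x")
```

With these assumed properties, the same rack needs on the order of a couple of liters of water per second versus several cubic meters of air per second.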
So at the very bottom of this rack you see the coolant distribution unit; we call this the CDU in trade speak. This is the device that collects the heat on the secondary side, which is the IT side. We circulate a water-based coolant through the system, it collects that heat, and it transfers the heat to the facility cooling system by means of a brazed plate heat exchanger, which is a very high-efficiency heat exchanger, so that the two fluids stay separate: the fluid that we cool our IT with is not interacting with, it's not touching, the coolant that the data center is bringing in.

Okay, so that's just the cooling side. From a holistic point of view, you have integrated software and monitoring services on the solution that put you in control. I mentioned the Slingshot interconnect, liquid-cooled; this is for node-to-node communication. And one interesting thing about the EX is that it is a completely agnostic system design, so the latest accelerators that you need for your workload can be put into this design, the same form factor across generations.

So this is a slide looking at not the EX but rather the XD2000. I always say one test result is worth a thousand expert opinions, so let's actually look at an example of how this works. With the XD, we sell a liquid-cooled version that is 70% liquid-cooled, like I mentioned earlier, CPUs and GPUs. We also sell an air-cooled version, so it's a perfect apples-to-apples comparison; you can't really get any better than that. This study was done with the XD2000 chassis, specifically for XD220v compute nodes. And what we see: same benchmarks, same performance, and we get almost 15% less chassis power. So at the server itself we're consuming less power simply by virtue of running liquid cooling into the server instead of running fans. Not only that, you get cooler device temperatures and slightly better performance. And when you combine that,
when you combine the power consumed with the performance of the chip, it's almost a 21% improvement in performance per kilowatt spent. So that's just the server side; that's the impact at the server.

What does that mean for a data center? Well, we did a study. This is an example study, so obviously it depends on the exact data center, but in this example we get 86% cost savings just from the operational cost of the data center. You're not paying the cost to move all of that air. You're not paying the cost of the air handlers in the data center, or, if you have chilled water running into these air handlers, a very expensive system to run. So it can be up to 86% savings in operational cost. And depending on where that energy comes from, speaking of sustainability, this is an 87% carbon reduction in your data center for the same performance; actually better performance, as I mentioned. So bringing that together, it's really not possible to achieve the sustainability and efficiency goals in your data center without seriously considering liquid cooling.

So let's look at that same example but from a financial perspective. Depending on where you are in the world, obviously your electricity price may vary, but in this example we look at US versus UK. This example is 10 racks of XD2000, a fairly small deployment of servers actually, but depending on where you are we're looking at hundreds of thousands to millions of dollars of operational cost savings simply by using liquid cooling, and that comes from that 15% chassis energy reduction we talked about.

So I mentioned at the beginning that we'd talk a little bit about efficiency. The way we've always thought about and measured efficiency was with a metric called PUE, which stands for power usage effectiveness. This is another way of looking at what you get, so your IT performance, versus what you pay for, which is everything else: this is your cooling, power
distribution, lighting of the data center, everything. Lower is better in this case. So basically it's saying, if I have a PUE of 1.2 for example, that means for every 100 watts I spend on the server, I'm spending 20 watts to run the infrastructure to run that server, so a total of 120 watts.

Moving up from this legacy air-cooled data center, this is pretty typical, this 1.25 to 1.35; this is kind of the old way of doing things with an air-cooled data center. As we move up we get better and better air cooling technologies, so using free air cooling instead of chillers, for example, or any kind of direct expansion. Moving up to an optimized air-cooled data center, for example hot aisle/cold aisle containment, that can get you a little better, but still not close to what we get with water-cooled servers, which is really the best power usage effectiveness you can achieve in your data center.

The problem with PUE is that it is missing something. It doesn't actually look at how efficiently the server itself is using its resources to do the computation. So there's another metric I'm going to jump into, which is called ITUE, IT usage effectiveness. This basically looks inside the server and asks how much energy am I wasting on stuff that is not doing the computation. It works the same way as PUE: lower is better. As an example, an ITUE of 1.1 means for every watt I spend on the CPU, the GPU, the memory, etc., I'm spending 100 milliwatts, 0.1 watts, on the fans or the power conversion inside the server, etc. So it's really
just looking at how efficient the IT is at producing the result. With liquid cooling you are no longer spending all of this energy to move the air and run the fans, so your ITUE also benefits from liquid cooling. Getting heat out of the device more efficiently is improving this ITUE, and we saw that in the previous example as well.

Bringing those two metrics together is what we see on this slide. This will be the last metric I talk about today; we'll move on to some more interesting things after that. When you boil it all down, it comes down to what is the system value you get out of your IT system. It comes from enabling the highest performance at the lowest cost of operation. It's what you get versus what you pay for, kind of the simplest way to define efficiency. So in this case it is the productivity of your system, the performance or the time to solution, however you want to measure that, over the total cost of ownership. The total cost of ownership is more or less the product of these two, the PUE and ITUE we just talked about. Okay, so this measures the total operating efficiency of the data center.

The interesting thing that I'll talk about in just a second is that these can be competing metrics, and we as HPE only really have control over the ITUE. We can deliver a system that is highly efficient in utilizing server resources to produce computation. What we don't usually have control over is the PUE, the facility-side choices. Again, I said I'm kind of a visual learner, and I like this graphic a lot for that reason. The ITUE is what we can control on the design side, and the facility side is in the blue box, what the data center provider is giving us. They meet at the heat exchanger in the CDU that we talked about. The heat exchanger itself is very efficient at transferring heat between the two cooling mediums, but the point of this slide is that the choices on either side, in the green box and the blue box, do have a total effect on that
system value; they contribute to the efficiency of your system in different ways. So I'm going to walk through an example to illustrate that. This is something we see pretty commonly, especially today with sustainability goals: we have a trend toward more efficient facility cooling systems. Instead of using cooling towers, instead of using chillers, we move toward dry coolers, so we're not evaporating that facility coolant into the environment. We're enabling free cooling wherever you are in the world, so it's just sensible cooling with the local climate, the local air. So 40 °C facility coolant is not uncommon now. This is something that will increase the efficiency of the facility, so your PUE goes down; remember, lower is better in this case. But now let's look at how that affects the IT usage effectiveness. Many times we have to compensate for that: when we bring in warmer coolant to the CDU, we have to run the secondary coolant faster, increasing our flow rates to compensate for that higher temperature. In this case our ITUE goes up, so now we're actually less efficient on the server side.

So we bring two of these things together. Let's look at a third example: lowering the primary flow rate. When I say primary, that's the industry speak for the facility flow rate going into the CDU. Lowering that flow rate will increase the secondary temperature on our IT cooling side. That's just the physics of the heat exchanger, and there's nothing we can do about that; like I said earlier, you can't cheat physics. To compensate for that, we again need to spin our pumps up; we need to pump more coolant. So you have these competing metrics: ITUE is going up, PUE is going down. These are the types of things that are really important to think about when you're designing the holistic data center. A final note on this slide: these things are not necessarily one-to-one, so
the arrows are the same size here, but in reality, depending completely on the design of the data center, these will differ by orders of magnitude in some cases. So it's really important to understand exactly what the goal is for holistic system efficiency as you design for liquid cooling.

I'm going to come out of the weeds now. We saw the back end of this technology, so what does it look like on the front end? What do we offer? What does HPE do in terms of liquid cooling? The important thing to remember here is that there is no one-size-fits-all with liquid cooling. There is a spectrum, and each of these technologies has benefits, each has drawbacks, and it really depends on the workload and the type of computation you're doing in your data center. On the x-axis here we have cooling capacity. This is another way to think about density; this is kilowatts per rack. On the very far left you see our ProLiant stuff. These are not typically very dense solutions, but we do offer closed-loop liquid cooling, meaning that we can liquid cool just the CPUs in that device. We can enable the latest devices, the higher TDP (thermal design power) SKUs, with liquid cooling, but then ultimately we still reject the heat to the data center air, so it's not very effective. Which brings me to the y-axis. The y-axis on this graph shows basically how effective the system is at pulling the generated heat out of the chips, and how efficiently it transfers it back to the facility.

Moving up that spectrum, like I said, there are benefits and drawbacks to each, but an easy way to get into liquid cooling is with a rear door heat exchanger. We actually have one of these on the showroom floor; you might have seen it already. This is very simple. You have your air-cooled servers in the rack. It's literally a door with a radiator coil on the rear of it. You can
mount this to the back of your rack and run fluid through the door, and all of the heat generated by your servers is then transferred into a coolant that goes through the door. So this is a very accessible way to get into liquid cooling. Hybrid liquid cooling, we call it, because it's not doing direct-at-the-chip liquid cooling.

Moving up more, we have the ARCS, the Adaptive Rack Cooling System. I like this system a lot because there's a small but important difference between this and the rear door: the ARCS is closed loop, meaning we can bring warm water into it and we contain the hot air in the system. That way you can still run 25, 27, sometimes 30, 32 °C facility water, run your air-cooled servers inside the rack, and still collect all of that heat and put it into the water without raising the temperature in the data center. So that's the big distinction there.

And then moving on to the Cray XD line, which we talked about earlier. This is more or less 70% liquid-cooled, so it's really targeting the CPUs and GPUs. A lot of people do this; we do as well. This is a product that we offer with both air and liquid cooling options.

And then moving to the far right, the far top. This is really my background, my bread and butter. This is the Cray EX, also a system that we have on the showroom floor. If you haven't had a chance to see it, I highly recommend it; it's very impressive. This is a system that can do 400 kilowatts, direct liquid-cooled, 100% fanless. There's a caveat: it's a larger system, a wider rack, 1200 millimeters versus the 600 millimeter standard rack. But it's the highest performance, the highest density you can get.

So let's bring this all together. How do I get to liquid cooling in my data center today? Well, like I said earlier, there is not a one-size-fits-all solution for liquid cooling, which is why HPE offers multiple paths to get there. So the retrofitted data center, the new data center: this is, like I said, my bread
and butter. This is where I work; this is with the EX stuff. These tend to be the larger deployments. When you build a data center from scratch, or retrofit one, specifically to support liquid cooling, and you want to own the hardware and run it in your data center, that's where these two things come into play.

We also have colocation. The idea is that you want to own the system, but you don't want to sign up to support the facility cooling system and all of the complexity that comes with it. This is where you would work in an agreement with a service provider to have your hardware in somebody else's data center, and you just get access to it. Somebody else maintains the liquid cooling infrastructure.

Modular data centers are another really effective way to get into liquid cooling. I like this a lot. If you get a chance, there's an exhibit with Danfoss that we have on the floor. I really like this solution because it allows us to start with a clean sheet of paper and basically say: if we want to cool this IT, and we had complete control over the facility cooling system, how can we optimize those two metrics we talked about earlier? How can we find that optimum point between the facility efficiency and the IT efficiency? So the modular data center is essentially putting the hardware in more or less a shipping container that is designed to be deployed quickly, bringing the facility cooling into extremely close proximity to the IT that you're trying to cool, and in doing so enabling extremely high effectiveness, high efficiency, of the overall solution.

And then lastly, of course, we've talked about HPE AI cloud a lot, but more or less this is as a service. You're not owning the computer, but you do want access to the latest AI hardware, the latest GPUs for your workload. We have an ability to offer that as well. So all of these paths will get you to liquid cooling, and not
all of them involve building a data center from scratch. Like I said, there's no one-size-fits-all, so it's important to have flexibility as you consider liquid cooling.

In closing, I guess I would say that, like I said, I've been doing this for 10 years, which isn't necessarily long, but more than ever today we have a lot of noise in the industry around liquid cooling. It's really gotten to be very confusing. Where do you look for the right information? There are a lot of people saying this or saying that, a lot of competing narratives. I would argue that HPE is uniquely positioned in this space, and the reason is, like I said, I started with SGI, which was an acquisition of HPE back in 2017. I have many colleagues from Cray supercomputing; these are the people that were doing liquid cooling all the way back in the 1980s, when it was, again, a necessity for the type of technology at the time. So we have literally decades of liquid cooling experience that we've been able to leverage in all of our offerings, and we've learned a thing or two along the way. We weren't always perfect, but we've always learned from our mistakes and used that information to build better products and services for our customers. So I invite you to take advantage of that. There are some liquid cooling experts, including myself, on the floor to answer any questions you may have. I would close by asking you to leverage that history and look at the options you can explore to get liquid cooling in your data center, because it is truly the most effective way to do cooling, not only now, but it will be required in the future. So thank you.
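The efficiency metrics from the talk lend themselves to a short worked example. The sketch below reproduces the speaker's own arithmetic (a PUE of 1.2 turning 100 W of IT load into 120 W at the facility, and an ITUE of 1.1 adding 0.1 W of in-server overhead per watt of compute) and then multiplies the two; that product is commonly called TUE in published work on these metrics. The "warm water" scenario is entirely hypothetical, with invented numbers chosen only to illustrate how a lower PUE can be offset by a higher ITUE; the function names are likewise made up for this example.

```python
def total_power_w(useful_power_w: float, overhead_factor: float) -> float:
    """Total power drawn for a given useful load and a PUE- or ITUE-style factor."""
    return useful_power_w * overhead_factor


def combined_overhead(pue: float, itue: float) -> float:
    """Facility watts per watt of actual compute (the PUE x ITUE product)."""
    return pue * itue


# Figures quoted in the talk.
facility_total_w = total_power_w(100.0, 1.2)  # 100 W of IT load at PUE 1.2
server_total_w = total_power_w(1.0, 1.1)      # 1 W of compute at ITUE 1.1

# Hypothetical scenarios: warmer facility coolant lowers PUE, but faster
# pumps raise ITUE. These four numbers are invented for illustration.
baseline = combined_overhead(pue=1.10, itue=1.10)
warm_water = combined_overhead(pue=1.05, itue=1.16)

print(f"facility draw for 100 W of IT: {facility_total_w:.0f} W")
print(f"server draw for 1 W of compute: {server_total_w:.2f} W")
print(f"combined overhead, baseline:   {baseline:.3f}")
print(f"combined overhead, warm water: {warm_water:.3f}")
```

In this made-up case the PUE improves while the combined figure gets slightly worse, which is the competing-metrics point: whether the trade is worth it depends on the real magnitudes in a given data center.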
