Processing-intensive applications like artificial intelligence (AI) require the ultra-low latency only high-density racks can deliver. As businesses across multiple industries turn to AI to create competitive advantage, more data centers are making plans to support rack densities above 30 kilowatts (kW).
While in-row and containment solutions increase thermal transfer capacity, liquid cooling is an alternative approach worth considering for efficiently removing heat from these high-density environments. Cooling design should be optimized to prevent the central processing units (CPUs) and graphics processing units (GPUs) that power the application from throttling back their clock speeds to avoid overheating, which degrades application performance. The design should also factor in the impact of high-density racks on data center power usage effectiveness (PUE) and operating costs.
Liquid cooling offers an efficient and effective solution, yet many data center operators still have a strong emotional reaction to the idea of bringing liquid to the rack. That’s understandable considering the risks associated with exposing electronic equipment to liquids. However, the current generation of liquid cooling technologies can be deployed to minimize both the risk of leaks and the potential consequences of any leaks that do occur. The key is integrating risk mitigation into every step of the system design process.
Data Center Liquid Cooling Fluid Selection
Most readers are probably aware that three main types of liquid cooling technology are in use today: rear-door heat exchangers, direct-to-chip cold plate cooling, and immersion cooling. The “liquid” used in a liquid cooling system depends on the technology employed. Direct-to-chip and immersion cooling can each be implemented as either a single- or two-phase process, and that choice also determines which fluid is used. For a review of liquid cooling technologies, see the Vertiv white paper, Understanding Data Center Liquid Cooling Options and Infrastructure Requirements.
Fluid selection is an important decision that should be made as early in the process as possible. Different fluids have different costs, thermal capacities, and chemical compositions that must be considered during system design. For example, water delivers the highest heat capture capacity, but it is often mixed with glycol and corrosion inhibitor packages to protect the wetted materials, which reduces its heat capture capacity.
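To make the trade-off concrete: the heat a coolant loop carries is Q = ρ · V̇ · c_p · ΔT, so a fluid with a lower density-times-specific-heat product needs more flow to remove the same load. The sketch below estimates the flow a 30 kW rack would need with plain water and with a 25% propylene glycol mix. The property values are approximate room-temperature figures assumed for illustration, and the 10 K supply-to-return temperature rise is likewise an assumption; a real design should use the fluid supplier's data.

```python
# Approximate fluid properties near 20 °C. These values are assumptions for
# illustration; use the fluid supplier's data sheet for real design work.
FLUIDS = {
    "water": {"density_kg_m3": 998.0, "cp_kj_kg_k": 4.18},
    # Hypothetical 25% propylene glycol / water mix (typical textbook values).
    "pg25": {"density_kg_m3": 1020.0, "cp_kj_kg_k": 3.93},
}


def required_flow_lpm(heat_load_kw: float, delta_t_k: float, fluid: str) -> float:
    """Flow in litres per minute needed to remove heat_load_kw with a
    supply-to-return temperature rise of delta_t_k (Q = rho * V * cp * dT)."""
    props = FLUIDS[fluid]
    # kW / (kg/m^3 * kJ/(kg*K) * K) = (kJ/s) / (kJ/m^3) = m^3/s
    flow_m3_s = heat_load_kw / (props["density_kg_m3"] * props["cp_kj_kg_k"] * delta_t_k)
    return flow_m3_s * 1000.0 * 60.0  # m^3/s -> L/s -> L/min


if __name__ == "__main__":
    for name in FLUIDS:
        lpm = required_flow_lpm(30.0, 10.0, name)
        print(f"{name}: {lpm:.1f} L/min for a 30 kW rack at a 10 K rise")
```

With these assumed values the glycol mix needs roughly 4% more flow than water (about 45 versus 43 L/min), and its higher viscosity adds pumping energy on top of that.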
Non-conductive dielectric fluids can be used in direct-to-chip cooling and are required for liquid immersion cooling. These fluids eliminate the potential for equipment damage from a fluid leak, but they are expensive and may have environmental, health, and safety considerations, so the same risk mitigation strategies used with a water/glycol mixture should be applied to them. Service and maintenance requirements also vary according to the fluid selected and need to be accounted for in the design.
Fluid Distribution in a Data Center Liquid Cooling System
The central issue in liquid cooling system design is ensuring that fluids can be distributed to racks safely and efficiently with minimal risk of leaks. Being able to detect leaks if they occur is essential, but designing the system to minimize the likelihood of leaks is the key to a successful deployment. This requires careful attention to plumbing material compatibility, fittings, and system infrastructure design.
Any material that is in contact with the fluids used must be confirmed for wetted material compatibility based not only on the specific chemical composition of the fluid, but also on system temperatures and pressures. Fittings should get extra scrutiny, as poorly designed fittings can represent a weak spot in the fluid distribution system. Quick disconnect fittings are generally recommended to enable serviceability, and shutoff valves should be designed into the system to enable fitting disconnection and leak intervention.
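A compatibility review can be captured as a simple checklist run against the loop's actual operating point, so that a part approved for the fluid but not for the temperature or pressure still gets flagged. The sketch below assumes each wetted component has a vendor-confirmed list of fluids plus temperature and pressure ratings; the `Component` class, the component names, and every rating shown are hypothetical placeholders, not real product data.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """A wetted part of the distribution loop (hypothetical example data)."""
    name: str
    approved_fluids: frozenset  # fluids the vendor has confirmed compatibility with
    max_temp_c: float
    max_pressure_kpa: float


def compatibility_issues(components, fluid, temp_c, pressure_kpa):
    """Return a list of problems for components not confirmed for this fluid
    at this operating temperature and pressure."""
    issues = []
    for c in components:
        if fluid not in c.approved_fluids:
            issues.append(f"{c.name}: not confirmed for {fluid}")
        if temp_c > c.max_temp_c:
            issues.append(f"{c.name}: {temp_c} C exceeds {c.max_temp_c} C rating")
        if pressure_kpa > c.max_pressure_kpa:
            issues.append(f"{c.name}: {pressure_kpa} kPa exceeds {c.max_pressure_kpa} kPa rating")
    return issues


# Hypothetical components and ratings, for illustration only.
LOOP = [
    Component("hose", frozenset({"water", "pg25"}), 60.0, 700.0),
    Component("quick disconnect", frozenset({"water"}), 50.0, 500.0),
]

if __name__ == "__main__":
    for issue in compatibility_issues(LOOP, "pg25", 55.0, 400.0):
        print(issue)
```

In this made-up loop, switching to the glycol mix at 55 °C flags the quick disconnect twice (fluid and temperature), which is the kind of fitting-level weak spot the paragraph warns about.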
While liquid cooling is new to many data centers, it has been used in high-performance computing (HPC) environments for many years. The experience gained regarding material compatibility and fitting design in these applications is useful in mitigating risk in today’s liquid cooling deployments. Much of that experience is shared in a white paper published by the Open Compute Project. It’s also smart to seek out vendors and contractors with experience in liquid cooling beyond the technology used in the equipment rack.
Another piece of the puzzle is the design of key infrastructure components, especially cooling distribution units (CDUs). CDUs form the foundation of the secondary cooling loop that delivers fluid to liquid cooling systems and removes heat from the fluid being used. These systems play an important role in risk mitigation. By separating the liquid cooling system from the facility water system, the CDU provides more precise control of fluid volumes and pressure to minimize the potential impact of any leaks that do occur. Today’s CDUs also support leak detection sensors and can maintain liquid cooling supply temperature above the data center dew point to prevent condensation, which can trigger false alarms in leak detection systems.
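The dew-point check behind that condensation protection can be approximated with the Magnus formula, using the commonly cited coefficients b = 17.62 and c = 243.12 °C. The sketch computes the room dew point from dry-bulb temperature and relative humidity and derives a minimum supply temperature. The 2 K safety margin and the function names are assumptions for illustration; actual CDU control logic is vendor-specific.

```python
import math

# Magnus-formula coefficients (commonly cited values; valid roughly -45 to 60 °C).
MAGNUS_B = 17.62
MAGNUS_C = 243.12  # °C


def dew_point_c(dry_bulb_c: float, rh_percent: float) -> float:
    """Approximate dew point (°C) from air temperature and relative humidity."""
    gamma = math.log(rh_percent / 100.0) + MAGNUS_B * dry_bulb_c / (MAGNUS_C + dry_bulb_c)
    return MAGNUS_C * gamma / (MAGNUS_B - gamma)


def min_supply_temp_c(dry_bulb_c: float, rh_percent: float, margin_k: float = 2.0) -> float:
    """Lowest supply temperature that stays margin_k above the room dew point.
    The 2 K default margin is an illustrative assumption."""
    return dew_point_c(dry_bulb_c, rh_percent) + margin_k


if __name__ == "__main__":
    dp = dew_point_c(24.0, 50.0)
    print(f"dew point {dp:.1f} C, minimum supply {min_supply_temp_c(24.0, 50.0):.1f} C")
```

For a 24 °C room at 50% relative humidity this gives a dew point of about 12.9 °C, so under these assumptions the supply temperature would be held at or above roughly 14.9 °C.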
Leak Detection and Intervention
A leak detection system should be considered an essential component of every liquid cooling system and should be integrated into the system design. The robustness of a particular system can, however, be tailored to an organization’s comfort level with liquid cooling technology and risk tolerance.
Some organizations rely on indirect methods of leak detection in which pressures and flow are monitored across the fluid distribution system, and small changes in these precisely controlled variables are considered indicative of a potential leak. More commonly, direct leak detection systems are employed. Using strategically located sensors, or a cable that can detect leaks across the entire distribution system, these hardware systems trigger alarms when fluids are detected, enabling early intervention.
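Indirect detection amounts to a mass balance: in a closed loop, supply and return flow should match, so a sustained shortfall suggests fluid is escaping. A minimal sketch, assuming flow meters on the supply and return lines of one loop; the class name, the 0.5 L/min threshold, and the window length are hypothetical values that would need tuning against real sensor accuracy.

```python
from collections import deque


class FlowBalanceMonitor:
    """Hypothetical indirect leak check for one closed loop: flags a potential
    leak when average supply-minus-return flow over a window exceeds a threshold."""

    def __init__(self, threshold_lpm: float = 0.5, window: int = 5):
        self.threshold_lpm = threshold_lpm
        self.losses = deque(maxlen=window)  # most recent supply - return readings

    def update(self, supply_lpm: float, return_lpm: float) -> bool:
        """Record one pair of readings; return True if a potential leak is indicated."""
        self.losses.append(supply_lpm - return_lpm)
        if len(self.losses) < self.losses.maxlen:
            return False  # wait for a full window before judging
        return sum(self.losses) / len(self.losses) > self.threshold_lpm


if __name__ == "__main__":
    monitor = FlowBalanceMonitor(threshold_lpm=0.5, window=3)
    readings = [(45.0, 45.1), (45.0, 44.9), (45.0, 44.2), (45.0, 44.1), (45.0, 44.0)]
    for supply, ret in readings:
        print(supply, ret, monitor.update(supply, ret))
```

Averaging over a window is one simple way to keep meter noise from raising alarms: small readings that scatter around zero cancel out, while a persistent shortfall pushes the average over the threshold.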
The key to any successful leak detection system is minimizing false alarms without compromising the system’s ability to detect actual leaks that require intervention. Your infrastructure partner can help configure and tune the leak detection system to your application. Intervention is usually performed manually, but automated intervention systems are available. When intervention is automated, the control system will trigger appropriate responses when a leak is detected, such as shutting off liquid flow, or if the leak is close to the rack, powering down IT equipment.
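Suppressing false alarms and automating the response can be sketched together. The example below assumes a direct sensor that must report fluid on several consecutive polls before it alarms, which filters brief transients such as a drop of condensation. It also assumes a hypothetical response policy that isolates flow for any confirmed leak and additionally powers down IT equipment when the leak is at the rack. The location labels, the three-reading threshold, and the action names are illustrative assumptions, not part of any particular product.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DebouncedLeakSensor:
    """Direct leak sensor that alarms only after several consecutive wet readings."""
    location: str            # hypothetical labels such as "rack" or "cdu"
    confirm_count: int = 3   # consecutive wet polls needed to confirm a leak
    _wet_streak: int = field(default=0, init=False, repr=False)

    def poll(self, wet: bool) -> bool:
        """Feed one raw reading; return True once the leak is confirmed."""
        self._wet_streak = self._wet_streak + 1 if wet else 0
        return self._wet_streak >= self.confirm_count


def intervention_actions(location: str) -> List[str]:
    """Hypothetical automated response policy for a confirmed leak."""
    actions = ["raise_alarm", "close_supply_valve"]
    if location == "rack":
        # A leak close to the rack also warrants powering down IT equipment.
        actions.append("power_down_it_equipment")
    return actions


if __name__ == "__main__":
    sensor = DebouncedLeakSensor(location="rack")
    for reading in [True, False, True, True, True]:
        if sensor.poll(reading):
            print("confirmed leak:", intervention_actions(sensor.location))
            break
```

Raising `confirm_count` lowers the false alarm rate but delays the response, which is the tuning trade-off an infrastructure partner would help settle for a given application.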
The risks associated with bringing liquid to the rack can’t be ignored or taken lightly, but neither should they prevent data center operators from moving forward with liquid cooling. If you’re being asked to support racks with densities of 30 kW or higher, liquid cooling offers a viable approach to protecting the performance and availability of the applications those racks support.
When operators and their partners take the time to perform a detailed risk assessment at the front end of a project and incorporate risk mitigation strategies into every phase of the design, the risks associated with liquid cooling are far outweighed by the benefits realized by the data center and the business. For more information on liquid cooling and risk mitigation, contact your Vertiv representative.