Using SkySpark for Ongoing Commissioning in High Performance Data Centers

Using analytics software such as SkySpark in ongoing commissioning makes it possible to build performance metrics that help maintain energy savings over time. By integrating multiple monitoring systems, including meters, the building automation system, and data center control points, we were able to monitor and correlate performance across a variety of systems. SkySpark's customizability allowed us to create metrics flexible enough to keep up with a rapidly changing high performance data center.

Perhaps the most important benefits of a flexible analytics package for ongoing commissioning are tracking and communicating performance. This means not only looking for changes and tuning new and existing systems, but also showing that efficiency measures are being maintained and tying site improvements to better operations. The SkySpark package at NERSC is key to providing suggestions and tracking successes.

Data Center System Background

The NERSC facility does not use chilled water for any of its cooling. Cooling tower water is used to cool water in a closed loop. The closed loop cooling water is used to cool the air down between the server racks in the CRAY computers and at the air handling units serving the data center. The majority of the mechanical power required for cooling is spent by the cooling tower fans, the tower water pumps, the closed loop cooling water pumps and the CRAY fans.

Sustainable Berkeley Lab worked with the NERSC team to identify and meter key energy-using components. They developed a new calculation for the PUE and identified other key metrics that could better track efficiency and performance.

SkySpark Performance Metrics

To provide ongoing performance analysis for the system, we developed a series of metrics and corresponding views in SkySpark. These views are used to track system and energy performance at the facility. Some of the views are tied directly to measures that have been implemented, and others provide opportunities for ongoing improvements.

Figure 1: PUE vs. Outside Air Wetbulb Temperature – Scatter Chart

PUE

One metric calculated in the SkySpark package is power usage effectiveness (PUE). PUE is the ratio of the total energy used at a data center to the energy used for computing at that data center. Tracking it lets us make sure that there are no significant changes in energy use relative to the computing load.
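As a minimal sketch of the ratio described above, the following Python function divides total energy by compute energy for one interval. This is the generic PUE ratio only; it does not reproduce the site-specific calculation developed for NERSC, and the function name, units, and sample readings are assumptions for illustration.

```python
def pue(total_energy_kwh: float, compute_energy_kwh: float) -> float:
    """Return PUE: total facility energy divided by compute energy."""
    if compute_energy_kwh <= 0:
        # PUE is undefined when no compute energy is recorded.
        raise ValueError("compute energy must be positive to define PUE")
    return total_energy_kwh / compute_energy_kwh


# Hypothetical interval readings, for illustration only (not NERSC data).
total_kwh = 1100.0
compute_kwh = 1000.0
print(round(pue(total_kwh, compute_kwh), 3))
```

A value close to 1 means that nearly all of the energy goes to computing, so a rising PUE at a steady compute load points to growing overhead in cooling or other support systems.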

Because SkySpark has a flexible platform for developing trend views, we were able to adjust the way we looked at PUE to help us better identify possible issues. We modified the view in the following ways to better monitor performance.

  • We developed a scatter chart of PUE vs. outside air wetbulb temperature, rather than the standard drybulb temperature. Because cooling at the facility depends on cooling tower operation, and the compute load is mostly independent of outside air conditions, wetbulb temperature is a better predictor of cooling energy use than drybulb temperature.
  • We established a baseline of PUE performance between May and July of 2019. This gave us a good range of outside air temperatures with good performance from the facility.
  • We plot the 95th percentile of the binned average PUE for each whole-degree wetbulb temperature.
  • The most recent month of data is displayed on the same graph with the baseline and binned 95th percentile of the PUE data.
  • A filter is applied to the PUE trend based on compute load in order to exclude unusual periods where there are no computers running or a surge in load. This filter is displayed with the outside air wetbulb temperature and PUE denominator (equivalent to the compute load) on a line graph for the same trend period.
  • The data can be viewed as a scatter chart or as a time series; the view can be changed with a simple drop-down selector.
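The filtering, binning, and baseline comparison in the list above can be sketched in plain Python. This is an illustration, not the SkySpark implementation. It assumes that each sample is a dictionary with hypothetical keys `wetbulb`, `pue`, and `compute_kw`, that bins are formed by rounding the wetbulb temperature down to a whole degree, that the percentile is taken over the PUE samples in each bin with a nearest-rank method, and that the load limits are arbitrary example values. The sample data is invented.

```python
import math
from collections import defaultdict


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list (pct in 0-100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def load_filter(samples, min_kw, max_kw):
    """Keep samples whose compute load falls inside [min_kw, max_kw]."""
    return [s for s in samples if min_kw <= s["compute_kw"] <= max_kw]


def baseline_p95(samples):
    """Map each whole-degree wetbulb bin to the 95th percentile PUE in it."""
    bins = defaultdict(list)
    for s in samples:
        bins[math.floor(s["wetbulb"])].append(s["pue"])
    return {b: percentile(v, 95) for b, v in bins.items()}


def flag_above_baseline(recent, p95_by_bin):
    """Return recent samples whose PUE exceeds the baseline for their bin."""
    flagged = []
    for s in recent:
        limit = p95_by_bin.get(math.floor(s["wetbulb"]))
        # Bins with no baseline data cannot be judged, so they are skipped.
        if limit is not None and s["pue"] > limit:
            flagged.append(s)
    return flagged


# Invented samples for illustration only (not NERSC data).
baseline = [
    {"wetbulb": 55.2, "pue": 1.08, "compute_kw": 5000},
    {"wetbulb": 55.7, "pue": 1.10, "compute_kw": 5100},
    {"wetbulb": 56.1, "pue": 1.09, "compute_kw": 5050},
    {"wetbulb": 56.8, "pue": 1.11, "compute_kw": 4900},
    {"wetbulb": 55.4, "pue": 1.40, "compute_kw": 200},  # near-idle, filtered out
]
recent = [
    {"wetbulb": 55.5, "pue": 1.09, "compute_kw": 5000},
    {"wetbulb": 56.3, "pue": 1.15, "compute_kw": 5000},
]

p95 = baseline_p95(load_filter(baseline, 1000, 8000))
flagged = flag_above_baseline(load_filter(recent, 1000, 8000), p95)
print(p95)
print(flagged)
```

Applying the load filter before building the baseline keeps near-idle periods, which have an unusually high PUE, from inflating the 95th percentile line and hiding real drift.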

Additional Efficiency Performance Metrics

A variety of other metrics and related views were developed to track the performance of specific efficiency measures or systems. Most of the views are used together to diagnose energy drift or anomalous periods at the facility. The process for analysis can vary, but let's walk through an example.

In Figure 1: PUE vs. Outside Air Wetbulb Temperature, there is a small cluster of PUE points that appear above the 95th percentile baseline. To track down an isolated anomaly, one that appears constrained to a short time period, we would look at the time series data for that period.

Figure 2: PUE vs. Outside Air Wetbulb Temperature – Time Series

From this point, we look for the problem causing this shift in performance. Another metrics graph helps identify which energy-using system might be the culprit.

Figure 3: Component Power Monitoring – All Points

This is a fairly simple example of effective analytics. By totaling the power of each major energy-using mechanical system component and comparing those totals to their baseline operation, we can easily identify which system is leading to higher energy use. In this case, the tower water pump power (in purple) has a period of much higher operation than the baseline.
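The comparison described above can be sketched as a ranking of components by how far their power sits above the baseline. This is an illustrative Python sketch, not the SkySpark view itself. The component keys mirror the systems named in this post, but the function name, the use of average power per period, and all of the values are assumptions.

```python
def rank_component_deviation(current_kw, baseline_kw):
    """Rank components by how far current power sits above baseline power."""
    deltas = {
        name: current_kw[name] - baseline_kw[name]
        for name in current_kw
        if name in baseline_kw  # ignore components that have no baseline
    }
    # The largest positive deviation comes first.
    return sorted(deltas.items(), key=lambda item: item[1], reverse=True)


# Invented average power values in kW, for illustration only.
baseline_kw = {
    "tower_fans": 40.0,
    "tower_water_pumps": 60.0,
    "closed_loop_pumps": 80.0,
    "cray_fans": 120.0,
}
current_kw = {
    "tower_fans": 41.0,
    "tower_water_pumps": 95.0,
    "closed_loop_pumps": 79.0,
    "cray_fans": 121.0,
}

ranked = rank_component_deviation(current_kw, baseline_kw)
print(ranked[0])
```

With these invented numbers, the tower water pumps rank first, which matches the kind of result the chart makes visible at a glance.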

Figure 4: Component Power Monitoring – Tower Water Pumps – Before

Narrowing in on the period during and after the anomaly, we see that although the anomaly appeared short on the PUE graph, there is actually a smaller shift in tower water pump power that continues after the anomalous operation.

We have developed a number of diagnostic graphs that can give us a clue as to why the tower water pump power was higher.

Figure 5: Pump Control Trends

We can see that an increase in pump power is coincident with a float, or lack of control, of the cooling water supply temperature. We know enough from this to reach out to the site staff, explain what we are seeing, and inquire about any intentional or accidental changes in operation. Knowing the date of the anomaly and being able to explain the energy cost associated with the behavior that follows allows us to help the site staff see the reason for our inquiry and helps them pinpoint any change made on site. In this case, a valve was closed off for heat exchanger maintenance, and this was followed by a slight increase in minimum flow during the balance that followed.

We were able to work with the site staff to get the minimum flow adjusted. The same metrics graph from the next month was able to show the entire team the benefits.

Figure 6: Component Power Monitoring – Tower Water Pumps – After

Using SkySpark for advanced analytics in our ongoing commissioning efforts at LBNL NERSC has given us an invaluable tool for tracking performance, tuning operations, and communicating both the need for tuning and the wins this effort provides.

Thank you to Sustainable Berkeley Lab and the National Energy Research Scientific Computing Center (NERSC) Team at LBNL for the use of the images in this post.
