ARM Data Center Reaches One Petabyte


The first petabyte took 20 years; ARM now expects to add another every year

Giri Prakash, Oak Ridge National Laboratory

In late December 2016, ARM reached a milestone of 1 petabyte (PB) of stored data archived in the ARM Data Center (ADC), beginning with the earliest data file from September 1992.

To put that in context, 1 PB could hold the text from approximately 20 million four-drawer filing cabinets.

Bigger than Ben-Hur

The ARM Data Center currently holds just shy of 10,000 continuous datastreams and 1,412 principal investigator-contributed and field campaign data products. Data have been flooding into the ARM Facility as the types and amounts of instrumentation have increased, with much of that instrumentation running continuously in the field.

Perhaps the most amazing data statistic of all is that it took 20 years for ARM to reach the first petabyte, whereas ARM soon expects to archive the same amount every year!

Giri Prakash, ARM Data Services and Operations Manager (Oak Ridge National Laboratory, or ORNL), is enthusiastic about this growth. “It helps us provide scientific users with long-term and high-resolution observational data of various atmospheric states that span over 24 years. With our continued commitment to providing high-quality data, users can perform multi-year research with ease.”

Staying Ahead of the Wave

To prepare for the future, Prakash explains, the ARM Data Center has implemented a scalable computer architecture to handle the ever-growing volume of data.

The ARM Data Center uses Oak Ridge’s high-performance storage system to deep-archive the data. The storage system has a current capacity of 50 petabytes. The ARM Data Center also substantially increased the online storage resource as part of its recent upgrade.

ARM is now also deploying two high-performance computing clusters to help with data processing and analysis.

With these newly increased capacities in storage and computing clusters, both infrastructure and research users will be able to do big-data processing, analysis, and visualization.

User Friendly with a Vengeance

ARM users are responding: use is rising in step with the arrival and availability of more data. It’s a textbook case of “if we offer it, they will want it.”

At present, over 2 million files per month are being requested by the scientific community, which amounts to over 30 terabytes of data. So ARM data are seriously popular.

Asked if the ARM Data Center is the largest collection of climate data that’s open to the public, Prakash responds with a scientist’s concern for accuracy. “I don’t think we are the largest climate data center in the world.

“There are bigger climate data collections, such as climate model repositories and NASA satellite products,” Prakash continues. “But I am confident that we are one of the largest data centers for high-resolution climate observational data derived from sensors from various experimental sites.”

That’s a good platform to build on, and the data flowing in and out of the ADC at ever-increasing rates prove that ARM intends to do just that.

The ARM Climate Research Facility is a national scientific user facility funded through the U.S. Department of Energy’s Office of Science. The ARM Facility is operated by nine Department of Energy national laboratories, including Oak Ridge National Laboratory.