ARM Data Center Moves Forward on Priorities in Fiscal Year 2019

Published: 19 October 2019

Efforts include push toward new Data Discovery, enhanced high-performance computing access

Data Discovery search, new and old
Search results are shown in the new version of Data Discovery (top) and the current version.

The ARM Data Center remained busy in fiscal year 2019 (FY2019), archiving and delivering about 20 terabytes of data per month while working on several projects to improve the ARM data user experience.

Along with some key achievements tied to its high-performance computing resources and various tool developments, the ARM Data Center spent most of FY2019 developing a new version of ARM’s Data Discovery browser. Data Discovery helps users find and order ARM data.

“As part of managing Data Discovery, we have been continuously gathering feedback from our user community,” says Giri Prakash, ARM data services manager at Oak Ridge National Laboratory (ORNL) in Tennessee. “Also, during the last Triennial Review (in 2017), the reviewers encouraged us to continuously improve Data Discovery. All this led us to go back to the drawing board, and we wanted to start fresh to ensure we give the best experience to the users.”

The beta version is expected to be released alongside the current Data Discovery in November 2019. The ARM Data Center will update the new tool based on user feedback while phasing out the old version.

The new Data Discovery should be fully operational in the second quarter of FY2020.

“From the data perspective, we will provide the best data product that will satisfy the user up front,” says Prakash, “and the new user interface will be very intuitive to navigate over 11,000 data products that are currently archived at the ARM Data Center.”

Stakeholders Steer New Data Discovery

Data Discovery home, new and old
The home pages are shown for the new version of Data Discovery (top) and the current version.

Stakeholder groups have been—and will be—involved in every step of the Data Discovery process.

In early FY2019, the ARM Data Center compiled recommendations on Data Discovery from more than 20 stakeholder groups, including the ARM User Executive Committee and other constituent groups, working groups, and data translators. The ARM Data Center then developed a wireframe, or mockup of the new interface, that Prakash presented during the June 2019 Joint ARM User Facility/Atmospheric System Research (ASR) Principal Investigators Meeting.

Software development for the new Data Discovery is now underway, guided by stakeholder feedback on the wireframe and built with modern technologies. Once development is complete, the stakeholder groups will test the software and provide additional feedback before the beta version is made available to the wider user community.

The new Data Discovery will save time for users by reducing the number of results they see during a search.

Users can also customize their search by a specific instrument, measurement, or area of interest, allowing them to drill down to the ARM data they want.

“Actually, the new system will be recommending data products based on the search that they’re doing,” says Prakash.

High-Performance Computing Efforts

High-volume data users have come to rely on the ARM Data Center’s two high-performance computing clusters, Stratus and Cumulus. Introduced in 2017, the clusters support ARM operations, model simulations, petascale data storage, and big-data analytics.

In FY2019, the ARM Data Center made it possible to tap in remotely for large-scale data analysis. Wherever they are, users can access, develop, and run code on a cluster via Jupyter Notebook, an open-source application.

“They will be able to see the results from the luxury of their home computer,” says Prakash.

Another accomplishment this year was the migration of the Atmospheric Emitted Radiance Interferometer Optimal Estimation (AERIOE) value-added product (VAP) operational workflow from Pacific Northwest National Laboratory (PNNL) in Washington state to the Stratus cluster at ORNL.

AERIOE, which provides boundary layer profiles of temperature and water vapor mixing ratio and liquid cloud retrievals, is the first VAP running in routine operational mode on a cluster. The time it saves is significant.

“Stratus is allowing AERIOE runs to be completed in six hours, which would otherwise take about 3,888 hours if done as a serial process on a traditional system,” says Jitu Kumar, ARM Data Center high-performance computing lead.
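To put those two figures in perspective, the short sketch below works out the implied speedup and the serial estimate expressed in days. It uses only the numbers Kumar quotes; the variable and function names are illustrative and do not come from any ARM tool.

```python
# Figures quoted in the article; names here are illustrative only.
SERIAL_HOURS = 3888   # estimated runtime as a serial process on a traditional system
CLUSTER_HOURS = 6     # reported runtime on the Stratus cluster


def speedup(serial_hours: float, parallel_hours: float) -> float:
    """Return how many times faster the parallel run is than the serial run."""
    if parallel_hours <= 0:
        raise ValueError("parallel_hours must be positive")
    return serial_hours / parallel_hours


factor = speedup(SERIAL_HOURS, CLUSTER_HOURS)
serial_days = SERIAL_HOURS / 24  # convert the serial estimate to days

print(f"Speedup: {factor:.0f}x")                     # Speedup: 648x
print(f"Serial estimate: {serial_days:.0f} days")    # Serial estimate: 162 days
```

In other words, a job estimated at roughly 162 days of serial computing finishes in a quarter of a day on the cluster, about 648 times faster.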

Kumar and Zach Price from the ARM Data Center coordinated the migration effort with Tim Shippert and Brian Ermold, VAP developers from PNNL.

Bolstering Research and Data Support

Giri Prakash with Cumulus
ARM Data Services Manager Giri Prakash stands in front of Cumulus, the high-performance computing cluster used for operations of the Large-Eddy Simulation (LES) ARM Symbiotic Simulation and Observation (LASSO) workflow.

The ARM Data Center also led other activities to enhance research support in FY2019.

Before the Multidisciplinary Drifting Observatory for the Study of Arctic Climate (MOSAiC) field campaign started in September, the ARM Data Center registered ARM MOSAiC metadata with SensorWeb, the central portal for managing and sharing MOSAiC project data.

“This will give greater visibility of ARM data with international participants,” says Prakash.

For instrument mentors and scientists who need remote access to instruments, the ARM Site Data Systems team is providing flexible, secure access through a new tool called Beyond Trust. For the first time, users can enter their ARM credentials to connect to instruments through a web interface or an application installed on their computers. They can then monitor instruments and address issues in real time.

“The new system allows access from anywhere, eliminating the need for firewall updates and VPN (virtual private networks), and reduces the number of usernames and passwords our users have to remember,” says Cory Stuart, Site Data Systems team lead.

The Site Data Systems team also doubled the bandwidth of communications systems connected to the intermediate facilities at ARM’s Southern Great Plains atmospheric observatory.

“The biggest gain is that it will allow for more data to be sent, thus providing more flexibility and opening more potential use cases at those facilities,” says Stuart.

To streamline operations, the ARM Data Center combined databases running at five sites into a single database cluster. Located in the ARM Data Center, the cluster supports more than a dozen research support and infrastructure applications.

# # #

ARM is a DOE Office of Science user facility operated by nine DOE national laboratories, including Oak Ridge National Laboratory.