New Capability Offered as Part of Next-Generation ARM

 
Published: 15 May 2017

For Data Users, New Clusters Widen the ARM Data Cloud

ARM’s new high-performance computing facility integrates multiple levels of computing resources and storage systems.

Starting May 15, ARM data users who need more storage capacity and computational power can apply for direct access to ARM computing resources and data. This public software development space enables users to work with large volumes of ARM data without having to download them.

“As our data volume grows,” says ARM Technical Director Jim Mather, “we would like to provide the means to do intensive processing where the data are.”

To that end, the ARM Data Center has integrated two computational clusters it calls Stratus and Cumulus. The goal: Make it easy for users to migrate their computational scripts from single workstations to one of the new computing clusters.

The ARM computing clusters are available to ARM Facility science users who work with very high volumes of ARM data. The clusters can support model simulations, petascale data storage, and big-data analytics for ARM science research.

For Big Data, the New Stratus

Most users will access Stratus, a small-scale Cray cluster that has 30 nodes and a total of 1,080 processing cores, with 256 gigabytes of memory per node. Its tiered storage architecture, which enables high-performance parallel processing, includes 1.92 terabytes of solid-state drive storage per node that can be used for active data reads and writes.

The Stratus cluster also has 100 terabytes of Lustre parallel storage for temporarily staging and retaining data and 400 terabytes in the NFS file server for moving data from the ARM Data Center shared storage.
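
As a rough illustration of what these figures add up to, the short Python sketch below derives per-node and aggregate numbers from the specifications quoted above. It assumes that the 1,080 cores are the cluster-wide total and are spread evenly across the 30 nodes, and that storage is counted in decimal units (1 terabyte = 1,000 gigabytes). The function and constant names are illustrative only and are not part of any ARM tooling.

```python
# Figures quoted in the article for the Stratus cluster.
NODES = 30
TOTAL_CORES = 1080          # assumed to be the cluster-wide total
MEMORY_GB_PER_NODE = 256
SSD_TB_PER_NODE = 1.92


def stratus_totals():
    """Derive per-node and aggregate figures from the quoted specifications."""
    return {
        # Assumes cores are spread evenly across nodes.
        "cores_per_node": TOTAL_CORES // NODES,
        # Decimal units: 1 TB = 1,000 GB.
        "total_memory_tb": NODES * MEMORY_GB_PER_NODE / 1000,
        "total_ssd_tb": round(NODES * SSD_TB_PER_NODE, 2),
    }


if __name__ == "__main__":
    for name, value in stratus_totals().items():
        print(f"{name}: {value}")
```

Under those assumptions, each node would have 36 cores, and the cluster as a whole would offer about 7.7 terabytes of memory and about 57.6 terabytes of node-local solid-state storage.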

Giri Prakash, ARM Data Services and Strategy Team Manager at Oak Ridge National Laboratory, put Stratus in perspective.

“The typical workstation has 16 cores, with less than 10 terabytes of storage. It’s meant for small-scale data processing and visualizations,” he says. “If the user wants to process multi-year data that involves 10s to 100s of terabytes of data with computationally intense algorithms, the Stratus—a high-performance cluster—will be very effective.”

“Using this high-performance computing, or HPC, cluster will significantly reduce the data processing time for tasks like large-scale radar processing, complex synthesis product developments and advanced analytics and visualizations.”

Stratus will allow ARM users to analyze much bigger data sets (like those from radars) in a more reasonable time than was possible with a pilot cluster started eight years ago.

Prakash adds that the Stratus cluster is also attached to the ORNL High-Performance Storage System, where all ARM data—more than a petabyte so far—are archived.

For High-End Modeling, the Cumulus

The Cumulus cluster is a mid-range Cray system with 4,032 processing cores and a 2-petabyte general parallel file server. It will be used primarily for “high-end modeling,” says Mather, and for now will support routine operations of the Large-Eddy Simulation (LES) ARM Symbiotic Simulation and Observation (LASSO) project at ARM’s Southern Great Plains atmospheric observatory in Oklahoma.

LASSO is a two-year pilot project that is bringing measurements and models closer together by pairing ARM observations with LES output. Instead of getting the measurements from ARM and then going through the laborious process of running their own models, atmospheric scientists can now access simulations that are pre-matched to the observational data sets, which gives them a handy starting point for their own research.

LASSO’s routine high-resolution large-eddy model simulations for the ARM observatories will improve observational retrieval and parameterizations of clouds, aerosols, and radiation in climate models. These enhanced simulations are one of the major objectives of the next-generation ARM Decadal Vision.

Using the Computation Clusters

ARM science users can request access to these clusters by going to the Computing Resources web page and using the Request Access form. The deadline to submit proposals to use the resources is June 12.

If you are planning to submit a proposal to use an ARM cluster, here are a few considerations:

  • Requests should focus on activities that involve multiple terabytes of ARM data that are prohibitive to download to other computer systems, that require parallel processing of computationally intensive code applied to ARM data sets, or that directly use the LASSO simulations.
  • Requests will be reviewed quarterly by the ARM Infrastructure Management Board. Depending on the availability of resources, requests may also undergo scientific peer review.
  • Computational requests should clearly indicate the relevance of the proposed computational activities to the ARM Facility, should describe the ARM data sets to be analyzed, and should explain why ARM computational cluster resources are needed for achieving the science goals.
  • First decisions on proposals submitted by June 12 will be made by early July.

The ARM Data Center will provide “any technical help users might need for migrating and optimizing their scripts,” says Prakash. “You will be surprised to know how little effort it takes to successfully run your processes on the Stratus cluster.”

He adds that “we are looking for feedback from our users. We want to continuously improve our computing architecture.”
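
To give a sense of what migrating a script can involve, here is a minimal Python sketch that turns a serial per-file loop into a parallel one using the standard library's `concurrent.futures` module. It is not the ARM Data Center's actual workflow: the article does not describe the software environment or job scheduler on Stratus, and `process_file`, the file names, and the worker count are hypothetical placeholders. The sketch covers only the cores of a single node.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def process_file(path):
    """Hypothetical stand-in for per-file work, such as reading one data
    file and reducing it to a summary value."""
    return path, len(path)


def run_serial(paths):
    """Workstation-style loop: one file at a time."""
    return [process_file(p) for p in paths]


def run_parallel(paths, max_workers=4, executor_cls=ProcessPoolExecutor):
    """Same work spread across several workers on one node.

    ProcessPoolExecutor suits CPU-bound work; ThreadPoolExecutor can be
    passed instead for I/O-bound work. Results keep the input order.
    """
    with executor_cls(max_workers=max_workers) as pool:
        return list(pool.map(process_file, paths))


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    paths = [f"radar_scan_{i:03d}.nc" for i in range(8)]
    assert run_parallel(paths) == run_serial(paths)
    assert run_parallel(paths, executor_cls=ThreadPoolExecutor) == run_serial(paths)
    print(f"processed {len(paths)} files")
```

The per-file function stays unchanged and only the loop around it is replaced, which is one reason a migration of this kind can take little effort when each file can be processed independently.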

Towards a Next Generation of ARM Computing

The new computing will bring “a new level of computing capability that our users haven’t experienced in the past,” says Prakash.

A pilot cluster started eight years ago was a proof of concept that allowed ARM users to explore a large collection of data, says Prakash, but “it was just four times better than a standard workstation.” The new Stratus cluster has over 1,000 computing cores and more than 100 terabytes of high-performance storage, he adds. “It’s a true HPC cluster, optimized for parallel computing and attached to a high-performance parallel storage server.”

The Stratus cluster will likely be able to support 10 to 20 projects at any one time, including both internal ARM efforts designed to improve the efficiency of ARM data processing and science projects from external ARM users.

There were five internal pilot projects initially, says Prakash. Since then, a few other key projects have been identified that will speed the development of ARM data efforts: routine radar processing, large-scale reprocessing, complex value-added product development, NoSQL-based advanced visualizations, big-data extractions, and long-term data quality analysis.

Meanwhile, the significant computing power of Cumulus will be focused on simulations of shallow convection, using the System for Atmospheric Modeling and Weather Research and Forecasting models driven by ensembles of forcing inputs.

If You Need More

Not tied to a lot of ARM data? Need something beyond Stratus and Cumulus? Other significant resources are available for computational tasks.

If your large-scale simulations and data analytics applications do not require high volumes of ARM data, you could go to the DOE National Energy Research Scientific Computing Center (NERSC). Or you could use other DOE leadership computing user facilities, such as the Oak Ridge Leadership Computing Facility (OLCF) and Argonne Leadership Computing Facility (ALCF).

Allocations for these computational facilities should be requested directly from the individual facilities.

# # #

The ARM Climate Research Facility is a DOE Office of Science user facility. The ARM Facility is operated by nine DOE national laboratories, including Oak Ridge National Laboratory.