Sprinting for Discovery: A Multi-Lab Intensive Activity to Improve the Accessibility of ARM Data

Published: 18 February 2020

Editor’s note: Scott Collis, an atmospheric scientist and ARM radar data translator from Argonne National Laboratory, sent in this blog post.

Scott Collis, Giri Prakash, and Maggie Davis
Scott Collis (left), an ARM data translator, and ARM Data Services Manager Giri Prakash (right) discuss ideas for making Data Discovery easier for ARM users while data scientist Maggie Davis keeps them on track. Photos are by Dee King, Oak Ridge National Laboratory.

ARM collects a lot of data. It has been collecting these data for almost three decades. But, from a user perspective, not all of these data are equally useful. For example, a search for “atmospheric temperature” at the Central Facility of ARM’s Southern Great Plains atmospheric observatory in northern Oklahoma turns up 101 different measurements.

To assist ARM users searching for the most appropriate data for their science, ARM has been refining its recommended datastreams. For each core measurement (a core geophysical quantity such as precipitation), a number of datastreams that represent the measurement well are selected and ranked.

This all sounds easy, right? Well, ARM data are vast and complex. A team of three—a data scientist, an atmospheric scientist, and an archive scientist (start of a bad joke?) from Oak Ridge (ORNL) and Argonne national laboratories—got together to crack this problem. It soon became apparent that there were too many distractions at their home institutions, everything from strategic meetings to committees to choosing the new color of the document cover for travel policy recommendations (they kept it black).

The only way this key task, which was vital to the rollout of the new ARM Data Discovery interface (spoilers!), was ever going to get done was in a sprint!

What is a sprint, you ask? A sprint is an intensive, focused activity with set tasks. External distractions are minimized, and task lists are completed. Also known as hackathons, sprints are a very effective way to get work done.

Ric Cederwall at sprint
Ric Cederwall (left), ARM metadata team lead, demonstrates the use of various tools for discovering product metadata and instrument dependencies.

From February 4 to 6, 2020, Maggie Davis, Scott Collis, and Ric Cederwall locked themselves away in a meeting room at the ARM Data Center (ADC) at ORNL. Armed with laptops, phones, and a wonderful set of tools designed by the ADC, this team started to hack away at a list of core measurements, finding the very best datastreams to recommend.

One may ask, do Ric, Scott, and Maggie know all the best datastreams to recommend? Heck no! That’s why they made many phone calls to ARM translators, mentors, friends, and even one technical director.

Finally, after three days of sprinting, updates were ready for 30 project tasks detailing new recommended datastreams for the majority of core measurements. Then the atmospheric, archive, and data scientists headed home to partners, kids, dogs, and inboxes full of urgent emails asking where they had gone!