Atmospheric research turns to the power of computer programs that learn from data
Machine learning (ML), an algorithm-driven application of artificial intelligence (AI), is used to augment science and discovery, and it is beginning to supplant traditional statistical approaches. ML has the potential to revolutionize science, which is increasingly overwhelmed by big data sets that require analysis.
ML helps computers learn by automating some of the most complex parts of analysis. It sifts through data in search of correlations and predictors that would otherwise remain hidden or require intensive human labor to uncover.
Once ML is in motion and its algorithms are “trained” on data, it requires no explicit programming. In time, as more data become available, these algorithms learn to produce increasingly accurate solutions.
All this could drastically boost the productivity of researchers by allowing them to process larger and more comprehensive data sets than previously feasible.
In atmospheric science, emerging ML tools are important because weather and earth system modelers grapple with intersecting and complex variables.
Over two days in the fall of 2020, a star-power list of researchers affiliated with the U.S. Department of Energy (DOE) gathered for a virtual workshop on ML, statistical constraints, and other emerging methods for streamlining investigations of earth systems and weather.
The online meeting took the place of a breakout session that would have occurred in person at the June 2020 Joint Atmospheric Radiation Measurement (ARM) User Facility/Atmospheric System Research (ASR) Principal Investigators Meeting. That event was abbreviated by the need to meet virtually.
The October 19–20 workshop was also a natural follow-up to previous ARM/ASR joint meeting breakout sessions on ML.
A Good Match
ARM, a DOE scientific user facility, seems like a good match for ML applications. In operation for nearly three decades, ARM is among the largest sources of atmospheric data in the world, with petabytes in its archive.
In addition to fixed and mobile observatories, the user facility has a pair of high-powered computing clusters, Stratus and Cumulus. Both systems can be accessed by scientists funded by ASR or other research programs to work with ARM data through high-performance computing facility requests, which are much like requests for field campaigns.
The two clusters are also “data adjacent” and can easily stage large archives of ARM measurements for ML tasks.
In 2017, to acknowledge and encourage using ML in weather and earth system science, ARM issued a call asking for “applications of machine learning for improving ARM data quality and uncertainty.” The call yielded projects that illustrate the kind of ML work being done within atmospheric science.
For example, Shaocheng Xie, a research scientist at Lawrence Livermore National Laboratory in California, led a team of researchers to develop an ML framework for automating quality assessments of ARM data. The team focused on a model that detects signal noise from rain contamination on the wide, screen-like exterior radomes of microwave radiometers.
Edward Luke, a senior applications engineer at Brookhaven National Laboratory and a proponent of ML back in the 1990s, used the call to develop ML algorithms to detect errors introduced into scanning radar data by sea clutter. The term describes a non-meteorological phenomenon linked primarily to wave action on open water.
Luke says the proposed ML framework can help maximize the value of data from ARM’s Aerosol and Cloud Experiments in the Eastern North Atlantic (ACE-ENA) field campaign. The two-phase ground- and air-based campaign in the Azores spanned 2017 and 2018.
The online format of the 2020 workshop allowed for a significantly larger audience of researchers than would have been possible at an in-person meeting. Breakout sessions do not normally draw almost 200 participants, as the October workshop did, and they typically run two hours or less, not two half-days.
“We felt it was important to get practitioners of machine learning in ARM and ASR together,” says workshop co-organizer Joseph Hardin, a computational scientist at Pacific Northwest National Laboratory (PNNL) in Washington state. “We need to discuss approaches and limitations, and to build a community.”
The online workshop revealed emerging ML-centered partnerships among national laboratories, industry, and universities.
Combined with DOE’s interest in ML and AI, says Hardin, such collaborations “accelerate the speed at which we can understand how, and where, these techniques can best be used in our field.”
Bulk Ice, Cloud Particles, and More
The October workshop unleashed a fire hose of information. There were four hours of presentations each day, adding up to 22 talks, with deep-diving side chats and lively question-and-answer sessions.
To provide a small sample, there were workshop talks that touched on ML-assisted investigations of:
- aerosol activation (presented by Sam Silva, PNNL)
- turbulence and convection (Ignacio Lopez-Gomez, California Institute of Technology)
- ARM mass spectrometer data sets (Manjula Canagaratna, Aerodyne Research)
- cloud microphysics (Hugh Morrison, National Center for Atmospheric Research)
- bulk ice predictions (Matthew R. Kumjian, Pennsylvania State University).
Daniel Feldman, a research scientist at Lawrence Berkeley National Laboratory in California, delivered a summary of ML science opportunities during ARM’s upcoming Surface Atmosphere Integrated Field Laboratory (SAIL) field campaign. Feldman is the principal investigator for SAIL, slated to launch in September 2021 in the Colorado Rockies.
Vanessa Przybylo, a PhD student at the University at Albany, State University of New York, delivered an update on Classification of Cloud Particle Imagery and Thermodynamics (COCPIT). The ASR-funded project is developing an ML-aided scheme for classifying ice particle images captured by research aircraft during DOE field campaigns. COCPIT then co-locates these images with associated environmental properties.
Currently, says Przybylo, these abundant ice particle images “lack consolidation and are vastly underutilized.”
Saved by GPUs
To date, most ML applications involve neural networks and decision trees, which are nested sets of decisions through which data flow.
Neural networks were popular in the 1980s and periodically thereafter. They failed to live up to their data-analysis promise, however, until graphics processing units (GPUs) started to proliferate in high-performance computers. More GPUs made training the layered, brain-like neural networks practical.
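The description of a decision tree as “nested sets of decisions through which data flow” can be made concrete with a short sketch. The example below is illustrative only: the feature names, thresholds, and class labels are invented for this sketch and are not taken from any ARM product or workshop talk. It shows a tiny hand-built tree, a function that routes one sample through it, and a minimal one-threshold learner to suggest what “training on data” means in the simplest case.

```python
# Illustrative only: feature names, thresholds, and labels are invented,
# not taken from any ARM product or workshop talk.
from typing import Union

# An internal node asks one question: is feature <= threshold?
# A leaf is just a class label (a plain string).
Tree = Union[str, dict]

TOY_TREE: Tree = {
    "feature": "rain_rate_mm_per_hr",
    "threshold": 0.1,
    "left": {                      # rain rate <= 0.1: treat as not raining
        "feature": "cloud_fraction",
        "threshold": 0.3,
        "left": "clear",
        "right": "cloudy",
    },
    "right": "rainy",              # rain rate > 0.1
}


def predict(tree: Tree, sample: dict) -> str:
    """Walk from the root to a leaf, taking one branch per decision."""
    while isinstance(tree, dict):
        go_left = sample[tree["feature"]] <= tree["threshold"]
        tree = tree["left"] if go_left else tree["right"]
    return tree


def fit_stump(values: list, labels: list) -> float:
    """Learn a single threshold from labeled data.

    Tries the midpoint between each pair of neighboring distinct values
    and keeps the one with the fewest mistakes, where values above the
    threshold are predicted True. Assumes at least two distinct values.
    """
    ordered = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]

    def errors(threshold: float) -> int:
        return sum((v > threshold) != y for v, y in zip(values, labels))

    return min(candidates, key=errors)
```

With this toy tree, `predict(TOY_TREE, {"rain_rate_mm_per_hr": 0.0, "cloud_fraction": 0.8})` returns "cloudy". Production decision-tree libraries learn the full nested structure from data rather than a single threshold, but the flow of a sample from root to leaf follows the same idea.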
In August 2020, Hardin, a former ARM radar engineer, helped teach an online course on understanding and applying ML.
Finding the right algorithm and tuning it is only about one-third of the ML challenge, he says. “The rest is cleaning up and preparing the data.”
In addition to Hardin, other workshop co-organizers were:
- ARM Engineering and Process Manager Jennifer Comstock, Susannah Burrows, and ManishKumar Shrivastava, all from PNNL
- Rao Kotamarthi and ARM Instrument Operations Manager Adam Theisen, both from Argonne National Laboratory (ANL) in Illinois
- Jitendra Kumar, ARM’s reprocessing and high-performance computing operations lead at Oak Ridge National Laboratory in Tennessee
- Nicole Riemer, University of Illinois at Urbana-Champaign
- Marcus van Lier-Walqui, Columbia University.
Most of the organizers are involved in active ML projects.
Theisen, for example, is on a team working on a simple ML algorithm to calculate estimated rain rates from a variety of sensors. Their testbed is ARM’s Southern Great Plains (SGP) atmospheric observatory, where optical, mechanical, acoustic, and laser sensors measure precipitation.
‘We Want to Find Ways to Make This Work’
In the first of the workshop’s three keynote addresses, ANL computational scientist Rick Stevens outlined AI for Science, a DOE report that appeared in February 2020. Co-authored by Stevens and five other DOE researchers, the report is a baseline document for developing AI technologies over the next five to 10 years.
Day Two keynote speaker Amy McGovern of the University of Oklahoma outlined aspirations for the new $20 million, five-year National Science Foundation (NSF) institute she directs, the AI Institute for Research on Trustworthy AI in Weather, Climate, and Coastal Oceanography.
“We want to find ways to make this work across (atmospheric science) agencies,” she says.
Elizabeth Barnes of Colorado State University delivered the workshop’s last keynote, on the need for interpretable ML in climate science.
“As scientists, it’s our job to sift through piles and piles of data and try to extract useful relationships,” she says. “Machine learning methods are designed to help. These tools are no longer black boxes. They are a game-changer.”
Making ML Impacts
Within ARM and ASR, “machine learning is already making an impact,” says Hardin.
He cites investigations of proxy models, parameterizations, earth system forecasting, data fusion, super-resolution outputs, and even model replacement.
Hardin is working on ML applications for radars with PNNL postdoctoral researcher Andrew Geiss, who delivered a workshop talk. A key paper of theirs, on using a neural network to achieve super-resolution radar images of precipitation features, was published online in November.
In other ML-radar work, PNNL’s Samson Hagos, Zhe Feng, and others leveraged 15 years of ARM radar observations in the tropical western Pacific to develop a model that constrains poorly understood interactions between tropical convective and stratiform clouds. Hagos presented a workshop talk on the ML-assisted cloud population model, which enables improved representations in high-resolution regional and global earth system models.
For numerical weather forecast models, ANL’s Jiali Wang is developing a domain-aware neural network to emulate the planetary boundary layer. During the October workshop, she delivered a talk on the challenges of using deep learning—an extension of ML—to generate high-resolution earth system data.
ARM is also using ML to develop value-added products (VAPs) from its data. At PNNL, earth scientist Donna Flynn created a soon-to-be-released version of the Micropulse Lidar Cloud Mask (MPLCMASK) VAP. It uses ML to improve the cloud mask from the ARM micropulse lidar, which in turn improves the detection of hard-to-discern low-level clouds.
Sage and a Sister Project
ANL’s Scott Collis introduced the Sage project, funded by NSF. It will “push AI to the edge,” he says, by deploying a network of sensors to record images, audio, weather, air quality, and other data. (The “edge” refers to the part of a network as close to a sensor as possible.) The plan is to link existing sensors to computers embedded within ML frameworks.
A sister project called ARMing the Edge, with Collis as the lead scientist, is designed to improve Doppler lidar observations streaming from ARM’s SGP atmospheric observatory. (Work started in August 2020.)
During the workshop, ANL’s Robert Jackson outlined applications of edge computing to ARM. He ran case studies from historical Doppler lidar data collected at the SGP.
Jackson’s early results indicate that strategic lidar processing has the potential to predict clear, cloudy, and rainy conditions well over 90% of the time. That will improve the cloud processing studies now possible with SGP data, which must be compressed because of their large bandwidth.
With data preprocessing, he says, edge computing can preserve “a lot of vital information about microphysical cloud processes” not currently transmitted to ARM users.
In other ways too, adds Collis, Sage can be “tailored (to) the needs at ARM. I’m excited to see what we can do.”
He pointed to the planned deployment of pan-tilt-zoom cameras at the SGP, an effort to test ML-aided strategies for predicting cloud fraction.
Hardin, who praised “cross-community collaborations” with NSF and others, sees the workshop as “a way to get the ARM/ASR community to share cutting-edge results so that we can learn from each other.”
ARM is a DOE Office of Science user facility operated by nine DOE national laboratories.