Improving triggers in deep convective parameterizations using machine learning



Zhang, Tao — Brookhaven National Laboratory
Lin, Wuyin — Brookhaven National Laboratory

Area of research:

General Circulation and Single Column Models/Parameterizations

Journal Reference:

Zhang T, W Lin, A Vogelmann, M Zhang, S Xie, Y Qin, and J Golaz. 2021. "Improving Convection Trigger Functions in Deep Convective Parameterization Schemes Using Machine Learning." Journal of Advances in Modeling Earth Systems, 13(5), e2020MS002365, 10.1029/2020MS002365.


General circulation models (GCMs) often produce rain too frequently and at lower intensity than observed. These deficiencies are most conspicuous in the simulated diurnal cycle of precipitation, and they are known to be closely related to the convection trigger function: the set of conditions that determines whether convection is activated at a given time in the GCM convective parameterization. Traditional triggers are ad hoc and carry large uncertainties because the mechanisms that initiate deep convection are not fully understood. We use a machine learning (ML) model to construct a novel convection trigger function, trained on long-term data from the ARM user facility, and assess its dependencies and efficacy.


The ML convective trigger function greatly outperforms four convective available potential energy (CAPE)-based triggers at the Southern Great Plains (SGP) site in the central U.S. and the Manacapuru (MAO) site in the Amazon basin. A set of augmented rules is derived from the ML trigger to extract explicit knowledge from the otherwise black-box model; these rules could be used to improve existing CAPE-based triggers.


In this study, we implemented a novel deep convection trigger function using XGBoost, a state-of-the-art ML classification method. The ML trigger functions are trained and evaluated on the variationally constrained ARM forcing data set (VARANAL) for the SGP and MAO sites. Eleven boreal summer seasons (June, July, August) from 1999 to 2009 are used for SGP, and two years of data (2014 to 2015), covering the Green Ocean Amazon (GoAmazon2014/15) field campaign, are used for MAO. The ML models are evaluated both after training separately for each site and after joint training on the combined data from both sites. The training data set contains a number of large-scale predictors: surface heat fluxes, surface temperature and relative humidity, CAPE, lifting condensation level, and convective inhibition, as well as the vertical profiles of temperature, specific humidity, wind shear, and advective tendencies.

The performance of the ML trigger is compared with four convective trigger functions commonly used in GCMs: CAPE, undilute CAPE, dilute dynamic CAPE (dCAPE), and undilute dCAPE. The ML trigger substantially outperforms the four CAPE-based triggers in terms of the F1 score, a metric widely used to evaluate ML classifiers. The site-specific ML trigger functions achieve F1 scores of 91% at SGP and 93% at MAO. The unified trigger also achieves a 91% F1 score, with virtually no degradation relative to the site-specific training, suggesting the potential for a global ML trigger function. The ML trigger alleviates the GCM tendency to overpredict convection occurrence, offering a promising improvement to the simulated diurnal cycle of precipitation. Further, to address the black-box nature of ML methods, we discuss insights derived from the ML model that may be leveraged to improve traditional CAPE-based triggers.
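To make the F1 comparison concrete, the following minimal Python sketch scores a simple CAPE-threshold trigger against observed convection. The F1 score is the harmonic mean of precision and recall. Everything specific here is an assumption for illustration only: the function names, the 70 J/kg threshold, and the eight made-up samples do not come from the study or from VARANAL.

```python
def cape_trigger(cape_j_per_kg, threshold=70.0):
    """Fire the trigger when CAPE exceeds a threshold (illustrative value only)."""
    return cape_j_per_kg > threshold


def f1_score(observed, predicted):
    """Return F1 = 2 * precision * recall / (precision + recall) for boolean labels."""
    tp = sum(1 for o, p in zip(observed, predicted) if o and p)
    fp = sum(1 for o, p in zip(observed, predicted) if not o and p)
    fn = sum(1 for o, p in zip(observed, predicted) if o and not p)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up samples: (CAPE in J/kg, whether deep convection was observed).
samples = [
    (1500.0, True),
    (900.0, False),
    (50.0, False),
    (2200.0, True),
    (300.0, False),
    (1200.0, False),
    (30.0, False),
    (1800.0, True),
]

observed = [obs for _, obs in samples]
predicted = [cape_trigger(cape) for cape, _ in samples]
score = f1_score(observed, predicted)
print(f"F1 of the threshold trigger: {score:.2f}")  # prints 0.67
```

In this toy case the threshold trigger catches every observed event (recall of 1.0) but also fires on three non-events (precision of 0.5), which is the kind of overprediction of convection occurrence described above. A trained classifier would replace `cape_trigger` with a model that uses the full set of predictors.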