
ARM Climate Research Facility Operations Update - January 31, 2007

This bimonthly report provides a brief summary of significant accomplishments and activities in the operations area of the ARM Climate Research Facility (ACRF).

ARM Archive Sets Record for User Accounts

The ARM Archive stores and distributes the large quantities of data resulting from routine operations and scientific field campaigns conducted at the ACRF sites. Scientists use these data to study atmospheric radiation balance and cloud feedback processes, which are critical to understanding global climate change. In the first quarter of FY07 (October through December 2006), the ACRF recorded 961 Archive users, more than in any other quarter on record.

The U.S. Department of Energy requires national user facilities to report facility use in total visitor days, and requires each facility to track actual visitors and active user research computer accounts. Historical data show an apparent relationship between the total number of users and the "size" of field campaigns, or intensive operational periods (IOPs). Larger field campaigns draw more heavily on site facility resources, as reflected in the numbers of site visits, site visit days, research accounts, and device accounts. These users typically collect and analyze data in near-real time for an ongoing site-specific field campaign. Archive accounts, however, are tracked differently: an individual is counted as only one unique user per site, even if he or she opens and closes an account several times to obtain different data from one or more sites. Archive account holders represent persistent (year-to-year) ACRF data users who often mine the entire collection of ACRF data, which consists mostly of routine data from the fixed and mobile sites along with cumulative IOP data sets. The number of Archive data users continues to increase steadily, independent of field campaign size.
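
The "one unique user per site" counting rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Archive's actual accounting code: the record layout (`AccountRecord` with `user_id`, `site`, and `account_id`) and the sample users are hypothetical, and the site codes are used only as example labels. The key idea is that repeated account openings collapse into a single (user, site) pair.

```python
from typing import Iterable, NamedTuple


class AccountRecord(NamedTuple):
    """One Archive account opening (hypothetical record layout)."""
    user_id: str     # stable identifier for the individual
    site: str        # ACRF site whose data the account requested
    account_id: str  # a new ID each time an account is opened


def count_unique_users(records: Iterable[AccountRecord]) -> int:
    """Count each individual at most once per site, however many
    accounts he or she opened there."""
    return len({(r.user_id, r.site) for r in records})


def users_per_site(records: Iterable[AccountRecord]) -> dict[str, int]:
    """Break the unique-user count down by site."""
    seen: dict[str, set[str]] = {}
    for r in records:
        seen.setdefault(r.site, set()).add(r.user_id)
    return {site: len(users) for site, users in seen.items()}


# Illustrative sample data only.
records = [
    AccountRecord("alice", "SGP", "a1"),
    AccountRecord("alice", "SGP", "a2"),  # reopened: same user, same site
    AccountRecord("alice", "NSA", "a3"),  # different site: counted again
    AccountRecord("bob", "SGP", "b1"),
]
print(count_unique_users(records))  # 3
print(users_per_site(records))      # {'SGP': 2, 'NSA': 1}
```

Because the count is taken over distinct (user, site) pairs, closing and reopening accounts does not inflate the total, which is what makes the Archive figure a measure of persistent users rather than of account activity.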

Datastream Database Speeds Flow of Information

Scientific instrumentation at the ACRF sites generates massive amounts of data for atmospheric research. These data are processed at the Data Management Facility (DMF), housed at Pacific Northwest National Laboratory. In late 2006, after a 2-year process of incremental upgrades and metadata migration, the DMF completed the replacement of its database processing capabilities with a new Data System Database (DSDB). The final upgrade made datastream processing up to 120 times faster.

Since nearly the outset of the ARM Program more than 15 years ago, the DMF had relied on an internal "technical database" to store the datastream definitions (field names and attributes) and the configurations used in processing ARM data. The technical database was designed to store simple keyword/value pairs. While functional, it had room for improvement in performance, in security, and especially in capability. In 2004, ACRF data infrastructure staff began replacing the technical database with a modern database built on PostgreSQL. This database engine was chosen for its maturity and full-featured capabilities, as well as for its open-source license, which simplifies distribution to the various remote ACRF sites.
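
The difference between flat keyword/value storage and a structured relational design can be sketched in Python. This is a minimal illustration only: it uses the standard-library `sqlite3` module as a stand-in for PostgreSQL, and the table name, column names, key format, and example datastream and field names are hypothetical, not the actual technical database or DSDB schema. It does not claim to explain the measured speedup; it only shows how the two storage styles answer the same question.

```python
import sqlite3

# Flat keyword/value style (hypothetical key format): each key encodes
# datastream, field, and attribute name together in one string.
technical_db = {
    "sgpmetE13.b1:temp_mean:units": "degC",
    "sgpmetE13.b1:temp_mean:long_name": "Temperature mean",
    "sgpmetE13.b1:atmos_pressure:units": "kPa",
}


def fields_from_keywords(db: dict[str, str], datastream: str) -> set[str]:
    """Finding one datastream's fields means scanning and parsing every key."""
    prefix = datastream + ":"
    return {key.split(":")[1] for key in db if key.startswith(prefix)}


# Relational style: one row per field attribute, with a composite primary
# key (sqlite3 stands in for PostgreSQL; the schema is illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE field_attribute (
        datastream TEXT NOT NULL,
        field      TEXT NOT NULL,
        attribute  TEXT NOT NULL,
        value      TEXT NOT NULL,
        PRIMARY KEY (datastream, field, attribute)
    )
    """
)
# Migrate the keyword/value entries into structured rows.
conn.executemany(
    "INSERT INTO field_attribute VALUES (?, ?, ?, ?)",
    [tuple(key.split(":")) + (value,) for key, value in technical_db.items()],
)


def fields_from_table(conn: sqlite3.Connection, datastream: str) -> set[str]:
    """The primary-key index lets the engine locate one datastream's rows
    directly instead of parsing every stored key."""
    rows = conn.execute(
        "SELECT DISTINCT field FROM field_attribute WHERE datastream = ?",
        (datastream,),
    )
    return {field for (field,) in rows}


print(sorted(fields_from_table(conn, "sgpmetE13.b1")))
# ['atmos_pressure', 'temp_mean']
```

Both functions return the same answer; the relational version also gains typed columns, key constraints, and indexed queries, which are the kinds of capabilities a keyword/value store lacks.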

The DSDB was designed to store the configuration and status information previously held in the technical database in a logical, efficient structure. To retire the technical database, every datastream ingest had to be re-released and tested against the new database, a process completed in October 2006. After a copy of the DSDB was installed at the data reprocessing center at Oak Ridge National Laboratory, the ARM Archive reported a 120-fold increase in performance on long-running jobs. Replacing the technical database with the DSDB has yielded dramatic results, particularly in ingest run times, and maximizes the processing capacity available for data system growth.

All Ingests - Total Daily Run Time (chart)
With all data ingests implemented using the new database engine, the DMF's routine processing saw a drastic improvement in system performance.