Formatting and File Naming Protocols

 

File Type/Format

NetCDF is the preferred data format because it supports efficient data storage and reliable/robust documentation of the data structure. More information about netCDF is available at http://www.unidata.ucar.edu/packages/netcdf/faq.html. ASCII and HDF formats are used for some “External Data Products.” When using ASCII, a description of the file structure and its proposed documentation should be reviewed and approved by the ARM Data Center managers. HDF is the standard for most satellite data. More information about HDF is available at http://www.hdfgroup.org and http://www.hdfeos.org.

File Naming Conventions

File naming protocols

Processed Data

An example netCDF data file name is depicted below:
The sgp5mwravgB4.c1.20040706.020415.cdf file contains 5-minute averaged microwave radiometer data from the Southern Great Plains Vici site for July 6, 2004. The data level is “c1,” indicating that the data were derived or calculated via Value-Added Processing (see Data Levels).

ARM netCDF files shall be named according to the following naming convention:
(sss)(nn)(inst)(qqq)(Fn).(dl).YYYYMMDD.hhmmss.cdf

where:

sss
is the site identifier (e.g., sgp, twp, nsa)

nn
is the data integration period (e.g., 1, 5, 15, 30, 1440)

inst
is the instrument abbreviation (e.g., mwr, wsi, mpl)

qqq
is an optional qualifier that distinguishes these data from other data sets produced by the same instrument

Fn
is the facility designation (e.g., C1, E13, B4)

dl
is the data level (e.g., a0, a1, b1, c1)

The length constraints are:

  • sss: 3 characters
  • Fn: 2 or 3 characters
  • dl: 2 characters
  • (sss)(nn)(inst)(qqq)(Fn).(dl): MUST be 33 characters or less.

“The TOTAL length of a filename sent to the ARM Data Center MUST be 61 characters or less.”
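As a rough illustration of the convention and length limits above, the following Python sketch splits a processed file name into its parts. The regular expression is an assumption, not part of the ARM standard: it treats the facility designation as one uppercase letter followed by one or two digits, allows the integration period to be absent, and captures the instrument abbreviation and optional qualifier together because the convention gives no delimiter between them. The names PROCESSED_NAME and parse_processed_name are hypothetical.

```python
import re

# Assumed pattern (not an official ARM definition): facility is one uppercase
# letter plus 1-2 digits; instrument and qualifier form one lowercase run.
PROCESSED_NAME = re.compile(
    r"^(?P<site>[a-z]{3})"            # sss: site identifier
    r"(?P<period>\d*)"                # nn: integration period (may be absent)
    r"(?P<inst_qual>[a-z][a-z0-9]*?)" # inst + optional qqq, not separable
    r"(?P<facility>[A-Z]\d{1,2})"     # Fn: facility designation
    r"\.(?P<level>[a-z0-9]\d)"        # dl: data level
    r"\.(?P<date>\d{8})"              # YYYYMMDD
    r"\.(?P<time>\d{6})"              # hhmmss
    r"\.(?P<ext>[a-z0-9]+)$"          # cdf, or another format extension
)

MAX_PREFIX_LEN = 33  # limit on (sss)(nn)(inst)(qqq)(Fn).(dl)
MAX_TOTAL_LEN = 61   # limit on the whole file name


def parse_processed_name(name):
    """Return the fields of a processed file name, or raise ValueError."""
    match = PROCESSED_NAME.match(name)
    if match is None:
        raise ValueError(f"does not match the naming convention: {name!r}")
    prefix = name[: match.end("level")]
    if len(prefix) > MAX_PREFIX_LEN:
        raise ValueError(f"prefix longer than {MAX_PREFIX_LEN} characters")
    if len(name) > MAX_TOTAL_LEN:
        raise ValueError(f"name longer than {MAX_TOTAL_LEN} characters")
    return match.groupdict()
```

For the example file above, parse_processed_name("sgp5mwravgB4.c1.20040706.020415.cdf") yields site sgp, period 5, the combined instrument and qualifier mwravg, facility B4, and data level c1.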

Raw Data

Raw data files shall be named according to the following naming convention:
(sss)(inst)(Fn).00.YYYYMMDD.hhmmss.raw.(xxxx.zzz)

where:

sss
is the site identifier (e.g., sgp, twp, nsa)

inst
is the base instrument abbreviation (e.g., mwr, wsi, mpl) [as with the processed data above]

Fn
is the facility designation (e.g., C1, E13, B4)

xxxx.zzz
is the original raw data file name produced on the instrument

An example raw data file name is:
nsamwrC1.00.20021109.140000.raw.20_20021109_140000.dat

This file is from the North Slope of Alaska Barrow site. It contains raw microwave radiometer data for November 9, 2002, for the hour beginning 140000. Most raw instrument data are collected hourly, resulting in 24 raw data files per day. These files are bundled into daily tar files before archival.

Tar bundles shall be named according to the following naming convention:
(sss)(inst)(Fn).00.YYYYMMDD.000000.raw.(zzz).tar

where:

sss
is the site identifier (e.g., sgp, twp, nsa)

inst
is the base instrument abbreviation (e.g., mwr, wsi, mpl)

Fn
is the facility designation (e.g., C1, E13, B4)

zzz
is the extension from the original raw data file name, usually the format of the file or an instrument serial number.

The example raw file shown above will be archived in a tar bundle named
nsamwrC1.00.20021109.000000.raw.dat.tar.
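The mapping from a raw file name to its daily tar bundle name can be sketched in Python as follows. This is only an illustration under stated assumptions: the facility designation is taken to be one uppercase letter followed by one or two digits, and zzz is taken to be whatever follows the last dot of the original raw file name. RAW_NAME and tar_bundle_name are hypothetical names.

```python
import re

# Assumed pattern for (sss)(inst)(Fn).00.YYYYMMDD.hhmmss.raw.(xxxx.zzz)
RAW_NAME = re.compile(
    r"^(?P<site>[a-z]{3})"
    r"(?P<inst>[a-z][a-z0-9]*?)"
    r"(?P<facility>[A-Z]\d{1,2})"
    r"\.00\.(?P<date>\d{8})\.(?P<time>\d{6})\.raw\."
    r"(?P<original>.+)$"
)


def tar_bundle_name(raw_name):
    """Return the daily tar bundle name for a raw data file name."""
    match = RAW_NAME.match(raw_name)
    if match is None:
        raise ValueError(f"not a raw data file name: {raw_name!r}")
    # Assumption: zzz is the text after the last dot of the original name.
    _stem, dot, extension = match["original"].rpartition(".")
    if not dot or not extension:
        raise ValueError("original raw file name has no extension")
    # The bundle covers the whole day, so the time field becomes 000000.
    return (
        f"{match['site']}{match['inst']}{match['facility']}"
        f".00.{match['date']}.000000.raw.{extension}.tar"
    )
```

Applied to the example raw file name above, the function returns nsamwrC1.00.20021109.000000.raw.dat.tar.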

Guidelines for Original Raw File Naming

When possible, the original file name produced on the instrument or instrument data system should contain adequate information to determine the origin of the file including:

  • unique site/facility indicator
  • YYYYMMDD or YYYYJJJ (year and day of year)
  • hhmmss, hhmm, or sequence number if more than one raw file per day
  • minimal indication of instrument type.

Under the constraints of 8.3 file names (an eight-character name plus a three-character extension), it is probably not possible to include all this information. In these instances, it is important to include adequate header information inside the file to permit the user to determine the source/origin of the data and to provide a reference date (including year) and time.
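A date stamp taken from an original raw file name can be turned into a calendar date with a few lines of Python. The sketch below is a hypothetical helper, not an ARM tool. It assumes that an eight-digit stamp is ordered year, month, day, as in the ARM file names earlier in this section, and that a seven-digit stamp is a year followed by the day of year.

```python
from datetime import datetime


def parse_date_stamp(stamp):
    """Parse an 8-digit or 7-digit date stamp into a datetime.date."""
    if stamp.isdigit() and len(stamp) == 8:
        # Assumed order: year, month, day
        return datetime.strptime(stamp, "%Y%m%d").date()
    if stamp.isdigit() and len(stamp) == 7:
        # Year followed by day of year (001 to 366)
        return datetime.strptime(stamp, "%Y%j").date()
    raise ValueError(f"unrecognized date stamp: {stamp!r}")
```

Under these assumptions, the stamps 20021109 and 2002313 both resolve to November 9, 2002.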

File names are case sensitive. xxxxxx.DAT and xxxxxx.dat may be interpreted as two different names by ingests and bundling routines. Instruments should be consistent in the way the original file names are assigned, including case.
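One way to catch inconsistent case before files reach an ingest is to group names that differ only in capitalization. The following Python sketch shows the idea; find_case_collisions is a hypothetical helper and not part of any ARM ingest or bundling routine.

```python
from collections import defaultdict


def find_case_collisions(names):
    """Return groups of file names that differ only by letter case."""
    groups = defaultdict(set)
    for name in names:
        # Names that lowercase to the same string collide.
        groups[name.lower()].add(name)
    return [
        sorted(variants)
        for _key, variants in sorted(groups.items())
        if len(variants) > 1
    ]
```

Given the names 20_a.DAT, 20_a.dat, and 21_a.dat, the function reports the first two as a colliding group.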

Other Data Formats

Processed ARM data may be stored in a format other than netCDF. The basic naming convention for processed files will not change, but the final extension will change accordingly:

asc
ASCII data format

hdf
HDF data format (limited to satellite data)

png
PNG data format (standard ARM image format)

mpg
MPG data format (standard ARM movie format)

Other data formats (e.g., gif, jpg) may also exist, but are not recommended for future development.

Data Levels

Data levels are based on the “level of processing,” with the lowest level of data being designated as raw or “00” data. Each subsequent data level has minimum requirements, and the data level is not increased until ALL requirements of that level, as well as the requirements of all lower data levels, have been met.

00
raw data – primary raw data stream collected directly from instrument

01
raw data – redundant data stream or sneakernet data

a0
converted to netCDF

a1
calibration factors applied and converted to geophysical units

a2 to a9
further processing on a1 level data that does not merit b1 classification

b1
QC checks applied to measurements

b2 to b9
further processing on b1 level data that does not merit c1 classification

c0
intermediate value-added data product; this data level is always used as input to a higher level “VAP”

c1
derived or calculated value-added data product (VAP) using one or more measured or modeled data (a0 to c1) as input

c2 to c9
further processing applied to a “c1” level data stream

s1
summary file consisting of a subset of the parent .c1 file with simplified QC and known ‘bad’ values set to missing

s2
summary file consisting of further-processed s1 data

Notes:

  1. Not every data level need be produced for each instrument data set. For example, if conversion to netCDF, calibration, and conversion to engineering units are performed in a single processing step, no “a0” data product would be produced.
  2. Data level .cN is restricted to data derived or calculated through value-added processing.
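The data level table above lends itself to a simple lookup. The Python sketch below paraphrases the definitions from this section; the names LEVELS, FURTHER, and describe_level are hypothetical, and codes that the table does not define (for example, b0) are rejected as an assumption of this sketch.

```python
# Descriptions paraphrased from the data level table in this section.
LEVELS = {
    "00": "raw data, primary stream collected directly from the instrument",
    "01": "raw data, redundant data stream or sneakernet data",
    "a0": "converted to netCDF",
    "a1": "calibration factors applied and converted to geophysical units",
    "b1": "QC checks applied to measurements",
    "c0": "intermediate value-added data product",
    "c1": "derived or calculated value-added data product (VAP)",
    "s1": "summary file, subset of the parent c1 file",
    "s2": "summary file of further-processed s1 data",
}

# Levels 2 through 9 of each family share one description.
FURTHER = {
    "a": "further processing on a1 data that does not merit b1",
    "b": "further processing on b1 data that does not merit c1",
    "c": "further processing applied to a c1 data stream",
}


def describe_level(level):
    """Return a short description of a two-character data level code."""
    if level in LEVELS:
        return LEVELS[level]
    if len(level) == 2 and level[0] in FURTHER and level[1] in "23456789":
        return FURTHER[level[0]]
    raise ValueError(f"undefined data level: {level!r}")
```

For instance, describe_level("b1") returns the QC description, while describe_level("a5") falls into the a2 to a9 range.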

Graphic Data Formats

For formatted documents and graphics-rich documents, PDF file type is standard. For photographs, drawings, sketches, and data plots, PNG file type is standard. For movies, MPG file type is standard.

File Duration

To control the number of small files and to help facilitate the use of ARM data, the suggested file period is 24 hours. Very large data sets may be routinely split into two or more netCDF files per day to increase usability. Infrequently, daily data files may be split into two files when the global header information changes as a result of a maintenance action (e.g., instrument serial number or calibration change).

Measurement Metadata and Standard Measurement Names

A scientifically relevant “measurement description” is a structured description of a data stream; the description addresses why the data stream exists. Data streams also contain other information that is important in understanding or interpreting the data stream but is not considered significant for naming purposes. Examples include global information, such as location; calibration procedural information; and QC checks and flags. If relevant, other instrument details can be included:

  • Orientation: downwelling, upwelling, or dependent on installation.
  • Key information to characterize the measurement (e.g., diffuse or direct).
  • Characterization of the spectra: number of spectra and over what range of wavelengths in nm.
  • Relative position or location (e.g., height).
  • Time interval information (e.g., averaging time and measurement intervals).
  • The instrument used for the measurement (occasionally important, especially if the measurement comes from a data stream containing results from several instruments).
  • An indication that the data stream contains best-estimate or calculated values. Unless indicated otherwise, it is implicit that the measurement is observed.