
diagstandards

Technical standards for diagnostic tools

Definition of the standard

This document defines a standard intended to facilitate the sharing of source code used to analyze climate data and to produce evaluation results and figures. Such code is hereafter referred to as ‘diagnostic scripts’, or in short ‘diagnostics’ or ‘scripts’.

It is assumed that diagnostic scripts can be wrapped as standalone programs that run as independent processes, an approach that has been successfully tested in tools such as ESMValTool and CLiMAF. It is also assumed that diagnostic scripts can read CF-compliant NetCDF data files and produce output files in formats such as NetCDF, CSV, plain text, or Excel spreadsheets, as well as graphics files.

Glossary

This standard defines three tiers of enforceability for the guidelines to share diagnostics:

Diagnostic scripts are expected to be contributed along with a YAML file containing configuration options and a set of YAML files describing the input data to be used in the diagnostic. In addition to these files, it is recommended that a further YAML file be provided as the script’s formal description.

YAML is required as the standard format for these files because it is common practice in software development to write configuration files in it. Furthermore, its syntax is human-readable and relatively straightforward. Easy-to-understand configuration and supporting files are essential for sharing diagnostic scripts across the community. In terms of code quality, the availability of YAML linters to check the syntax is an added advantage over other file formats.

The standard also defines three different levels of enforceability for the contents of these YAML files:

The diagnostic configuration file

The diagnostic’s configuration file must be a YAML file containing key-value mappings for all the parameters needed to run the diagnostic. These parameters, defined below by enforceability level, specify the paths required to execute the diagnostic script and to store the generated output, the versions of the tool and diagnostic interfaces, and the verbosity level of the logs generated by the script.

Required options

The following options must be present in the configuration file:

It is suggested to provide different directory names for run_dir, data_dir and plot_dir, as it makes the interaction with the results easier. However, using the same directory for all purposes is accepted.
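As a minimal sketch of how a script might handle these directory options, the example below checks a configuration that has already been parsed from YAML into a Python dict (for instance with a YAML library, which is assumed and not shown). Only the names run_dir, data_dir and plot_dir come from the text above; the function names and example paths are hypothetical.

```python
from pathlib import Path

# Directory options named in the text above. The complete list of required
# options is defined by the standard, so this tuple is only an assumption.
DIRECTORY_OPTIONS = ("run_dir", "data_dir", "plot_dir")


def check_directories(config):
    """Return the directory options as Path objects, failing on missing keys."""
    missing = [key for key in DIRECTORY_OPTIONS if key not in config]
    if missing:
        raise KeyError(f"missing directory options: {', '.join(missing)}")
    return {key: Path(config[key]) for key in DIRECTORY_OPTIONS}


def shares_directories(config):
    """Report whether two or more purposes point at the same directory."""
    paths = check_directories(config)
    return len(set(paths.values())) < len(paths)


# Hypothetical configurations, as they might look after parsing the YAML file.
separate = {"run_dir": "out/run", "data_dir": "out/data", "plot_dir": "out/plots"}
shared = {"run_dir": "out", "data_dir": "out", "plot_dir": "out"}
```

Both configurations are accepted by check_directories; shares_directories only reports which of the two layouts is in use, since using the same directory for all purposes is allowed.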

Reserved options

The following options are not required; however, if present, they must follow the definitions listed below:

The data definition file

A data definition file must be provided for each variable required by the diagnostic. Each of these files must be a YAML file containing a mapping of mappings. The keys of the top-level mapping are the paths to the data files, and each value is a mapping containing the metadata for the corresponding dataset and variable pair.

Required options

Reserved options

Others

Therefore, if a diagnostic requires N variables [v1, v2, … , vN] and M datasets [d1, d2, …, dM] for each variable, the structure of the data definition files will be as follows:
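The layout can also be illustrated in code. The sketch below builds, for N variables and M datasets, one mapping of mappings per variable, each with M entries keyed by data file path. The path pattern and the metadata keys used here are hypothetical placeholders, not the options defined by the standard.

```python
def build_data_definitions(variables, datasets, path_for, metadata_for):
    """Return one mapping of mappings per variable, keyed by data file path."""
    definitions = {}
    for variable in variables:
        # One data definition per variable, with one entry per dataset.
        definitions[variable] = {
            path_for(dataset, variable): metadata_for(dataset, variable)
            for dataset in datasets
        }
    return definitions


def example_path(dataset, variable):
    """Hypothetical path layout, for illustration only."""
    return f"/data/{dataset}/{variable}.nc"


def example_metadata(dataset, variable):
    """Hypothetical metadata keys, for illustration only."""
    return {"dataset": dataset, "variable": variable}


# N = 2 variables and M = 3 datasets give 2 definitions with 3 entries each.
definitions = build_data_definitions(
    ["v1", "v2"], ["d1", "d2", "d3"], example_path, example_metadata
)
```

Each value in definitions corresponds to the content of one data definition file and could be written out as YAML by whichever YAML library the tool uses.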

The script formal description file

It is recommended that each script comes with a description file in YAML syntax.

The following labels should be provided in order to complete the documentation section:

The labels listed below should be provided in order to complete the formal description of the script:

The command line interface

The diagnostic must be implemented as a command line tool that accepts the path to the YAML configuration file as a parameter.

It is recommended that the diagnostic executable support a --help flag that prints the information defined in the section ‘The script formal description file’.

Additional flags may be included to complement the execution of the diagnostic. For example, a flag could indicate, when re-running a diagnostic, whether the output directory should be emptied of the results of previous runs.
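A command line interface of this kind could be sketched with Python's standard argparse module, which generates the --help flag automatically. The program name my_diagnostic, the description text, and the --clean-output flag are hypothetical choices made for this example; the standard only requires that the path to the YAML configuration file be accepted as a parameter.

```python
import argparse


def build_parser():
    """Build a parser for a diagnostic that takes a configuration file path."""
    parser = argparse.ArgumentParser(
        prog="my_diagnostic",  # hypothetical executable name
        # In practice this text would come from the formal description file.
        description="Example diagnostic (placeholder description).",
    )
    # The path to the YAML configuration file is the only required argument.
    parser.add_argument("config", help="path to the YAML configuration file")
    # Hypothetical optional flag controlling the handling of previous output.
    parser.add_argument(
        "--clean-output",
        action="store_true",
        help="empty the output directory before running",
    )
    return parser


# Parse an explicit argument list instead of sys.argv, for illustration.
args = build_parser().parse_args(["config.yml", "--clean-output"])
```

Invoked as `my_diagnostic config.yml`, such a script would leave previous output in place, since the hypothetical flag defaults to off.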