CMIP Data Access

Overview

ESGF Nodes and Current Status

Access Routes

Alternative Access Platforms

Errata and Known Issues With Datasets

Overview

Please note that this section of the website is tailored to CMIP6 output. For data from earlier phases, see the relevant links on the CMIP3 and CMIP5 pages.

CMIP6 model output is available through a distributed data archive developed and operated by the Earth System Grid Federation (ESGF). For more information on this, see the ESGF Nodes and Current Status section below.

An important first step in understanding CMIP output is understanding the vocabulary that is regularly used. Firstly, a project refers to either the CMIP phase (e.g. CMIP6) or the CMIP6-endorsed MIP for which the data has been produced. Experiments are requested by projects. A simulation is a specific integration of a specific model carried out for an experiment, which in turn is part of a MIP. Modelling centres sometimes perform the same experiment multiple times, varying the starting point slightly, to understand the role of internal climate variability; this set of simulations is called an ensemble.
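To make the relationships between these terms concrete, the sketch below models them as a small Python data structure. The `Simulation` class, its field names and the `group_into_ensembles` helper are hypothetical illustrations, not part of any ESGF or CMIP software, and the example values are illustrative only. Grouping simulations by model and experiment shows an ensemble as the set of runs of one model for one experiment.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Simulation:
    """One integration of one model for one experiment (hypothetical fields)."""
    project: str     # CMIP phase or CMIP6-endorsed MIP, e.g. "CMIP" or "ScenarioMIP"
    experiment: str  # experiment requested by the project, e.g. "historical"
    model: str       # model that carried out the integration, e.g. "CESM2"
    member: str      # label telling this run apart from its siblings, e.g. "r1i1p1f1"


def group_into_ensembles(simulations):
    """Group simulations of the same model and experiment into ensembles.

    Returns a dict mapping (model, experiment) to a sorted list of member labels.
    """
    ensembles = defaultdict(list)
    for sim in simulations:
        ensembles[(sim.model, sim.experiment)].append(sim.member)
    return {key: sorted(members) for key, members in ensembles.items()}


if __name__ == "__main__":
    runs = [
        Simulation("CMIP", "historical", "CESM2", "r2i1p1f1"),
        Simulation("CMIP", "historical", "CESM2", "r1i1p1f1"),
        Simulation("ScenarioMIP", "ssp245", "CESM2", "r1i1p1f1"),
    ]
    for (model, experiment), members in group_into_ensembles(runs).items():
        print(model, experiment, members)
```

Run as a script, this prints `CESM2 historical ['r1i1p1f1', 'r2i1p1f1']` followed by `CESM2 ssp245 ['r1i1p1f1']`: two ensembles, one with two members and one with a single member.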

Practical steps for accessing CMIP6 data can be found in the Access Routes section.

ESGF Nodes and Current Status

Earth System Grid Federation (ESGF) is a collaboration of groups, agencies and institutions around the world that are dedicated to the development and operation of a long-term system for the management, access and analysis of climate data. The ESGF architecture is based on a system of autonomous, distributed nodes. Data is hosted on a collection of nodes located at modelling centres or data centres across the world. Nodes exchange information about their data holdings and services, and trust each other for user registration and access-control decisions. The net result is that a user can connect to any node with a web browser or rich desktop client and seamlessly find and access data throughout the federation.

ESGF provides web access to CMIP5 and CMIP6 model data, enabling users to search for and download CMIP data from around the world. Nodes are divided into four categories, based on the data they host and their search capabilities. In the following descriptions, original CMIP data is data produced by the modelling centre that publishes it, as opposed to replicas of other centres’ data.

  1. Data replication nodes: publish original CMIP data and replicate other modelling centres’ CMIP data.
  2. Data indexing nodes: publish only original CMIP data, but allow indexing and searching of data on other nodes.
  3. Replication and indexing nodes: publish original CMIP data, replicate other modelling centres’ CMIP data, and allow indexing of data on other nodes.
  4. Original data nodes: publish only original CMIP data.
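Because every category publishes original data, the four categories differ only in whether a node replicates other centres’ data and whether it indexes data held on other nodes. A minimal Python sketch of that decision follows; the names `NodeCategory` and `classify_node` are hypothetical and do not correspond to any ESGF API.

```python
from enum import Enum


class NodeCategory(Enum):
    """The four ESGF node categories described above (hypothetical enum)."""
    DATA_REPLICATION = "Data replication node"
    DATA_INDEXING = "Data indexing node"
    REPLICATION_AND_INDEXING = "Replication and indexing node"
    ORIGINAL_DATA = "Original data node"


def classify_node(replicates: bool, indexes: bool) -> NodeCategory:
    """Map a node's capabilities to its category.

    replicates: the node holds replicas of other modelling centres' data.
    indexes: the node indexes (makes searchable) data held on other nodes.
    All categories publish original data, so that is not a parameter.
    """
    if replicates and indexes:
        return NodeCategory.REPLICATION_AND_INDEXING
    if replicates:
        return NodeCategory.DATA_REPLICATION
    if indexes:
        return NodeCategory.DATA_INDEXING
    return NodeCategory.ORIGINAL_DATA


if __name__ == "__main__":
    # Print the full capability table, one row per combination of flags
    for replicates in (True, False):
        for indexes in (True, False):
            print(replicates, indexes, classify_node(replicates, indexes).value)
```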

A full, alphabetised list of modelling centres and ESGF nodes providing CMIP6 data can be found here. A map showing all the ESGF nodes, along with the CMIP modelling centres, can be found here.

To view the current status of the ESGF nodes, click here. Answers to many questions about the data management and performance of the ESGF can be found here.

ESGF also provides statistics on CMIP6 data, namely the number and size of published datasets, data usage metrics, and some general ESGF-wide statistics.

ESGF user tutorials are available here.

Access Routes

To access CMIP6 data, navigate to the ESGF Metagrid search page here. This page is currently in beta testing. The older CoG search facility can be found by following this link (a list of all ESGF nodes with CMIP data is available by following this link).

From these pages, users can search for the data they need. On the left-hand side, select CMIP6 and search using the blue button. This brings up a number of options that can be used to filter the search. The filter options were defined by the CMIP6 Data Request, as follows:

  • MIP era: Which phase of CMIP the data is associated with.
  • Activity: Which MIP the data is associated with (for CMIP6 deck experiments, this will be CMIP). A full table of MIPs can be found here.
  • Institution ID: Which modelling centre has produced the data. An alphabetised list of modelling centres can be found here.
  • Source ID: The model which has produced the data. A list of CMIP6 source IDs can be found here. This table provides the Source IDs along with their associated Modelling centre, intended and actual MIP involvement and information about the atmospheric, ocean, land and sea ice models with their respective resolutions.
  • Experiment ID: The specific experiment which has been performed. A list of all experiments across CMIP6 can be found here.
  • Variant label: This has the format r<k>i<l>p<m>f<n> (for example, r1i1p1f1), where k is the realisation index, l the initialisation index, m the physics index and n the forcing index. It is used to uniquely identify each simulation in an ensemble of runs contributed by a single model. The realisation index distinguishes among ensemble members that differ only in their initial conditions. The initialisation index distinguishes among ensemble members that differ only in their initialisation procedures. The physics index identifies the physics version used by the model. The forcing index distinguishes ensemble members run with different variants of forcing.
  • Table ID: the MIP table used to define the variable. The CMIP6 MIP table can be found here. In this table you can filter by Table ID and variable to find metadata on the variable, including dimensions, frequency, units and cell methods. Additionally, you can use this MIP table to identify which data you wish to download from ESGF.
  • Variable: the variable contained in the data file. A full list of CMIP6 variables can be found here.
  • Grid label: this describes the model grid used. For example: global mean data (gm), data reported on a model’s native grid (gn), or regridded data reported on a grid other than the native grid (gr). See a full list of the available grid labels here.
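CMIP6 dataset identifiers are built from these same facets joined by dots, so it can help to split one back into its parts. The Python sketch below assumes the facet order `mip_era.activity_id.institution_id.source_id.experiment_id.member_id.table_id.variable_id.grid_label.version`, and that `member_id` is either a variant label or a sub-experiment prefix joined to it by a hyphen; check both assumptions against the CMIP6 Data Reference Syntax. The helper names are hypothetical, and the example identifier is illustrative rather than a specific published dataset.

```python
import re

# Assumed facet order of a CMIP6 dataset identifier (one name per dot-separated part)
FACETS = ("mip_era", "activity_id", "institution_id", "source_id",
          "experiment_id", "member_id", "table_id", "variable_id",
          "grid_label", "version")

# r<k>i<l>p<m>f<n>: realisation, initialisation, physics and forcing indices
VARIANT_RE = re.compile(r"r(\d+)i(\d+)p(\d+)f(\d+)")


def parse_variant_label(label: str) -> dict:
    """Split a variant label such as 'r1i1p1f1' into its four integer indices."""
    match = VARIANT_RE.fullmatch(label)
    if match is None:
        raise ValueError(f"not a variant label: {label!r}")
    r, i, p, f = (int(g) for g in match.groups())
    return {"realisation": r, "initialisation": i, "physics": p, "forcing": f}


def parse_dataset_id(dataset_id: str) -> dict:
    """Map each dot-separated component of a dataset ID to its facet name."""
    parts = dataset_id.split(".")
    if len(parts) != len(FACETS):
        raise ValueError(f"expected {len(FACETS)} components, got {len(parts)}")
    facets = dict(zip(FACETS, parts))
    # member_id may carry a sub-experiment prefix, e.g. 's1960-r1i1p1f1'
    facets["variant_label"] = facets["member_id"].split("-")[-1]
    return facets


if __name__ == "__main__":
    ds = parse_dataset_id("CMIP6.CMIP.NCAR.CESM2.historical.r1i1p1f1.Amon.tas.gn.v20190308")
    print(ds["source_id"], ds["experiment_id"], ds["grid_label"])
    print(parse_variant_label(ds["variant_label"]))
```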

The new Metagrid search facility currently offers multi-file download only through Wget scripts. To download with Globus, Synda, or Python, use the existing CoG search system. For large downloads, Globus provides better performance where available.
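The facets listed above can also be passed as query parameters to an ESGF index node’s `esg-search` REST interface, where the `search` endpoint returns dataset records and the `wget` endpoint returns a Wget download script. The Python sketch below only builds such a URL and downloads nothing. The host name is a placeholder and the helper name is hypothetical; confirm the endpoint and parameter names against the search API documentation of the index node you use.

```python
from urllib.parse import urlencode

# Placeholder host (assumption): substitute the ESGF index node you actually use
INDEX_NODE = "https://esgf-node.example.org"


def esgf_query_url(endpoint: str = "search", **facets) -> str:
    """Build an esg-search URL constrained by CMIP6 facets (hypothetical helper).

    endpoint="search" requests dataset records as JSON;
    endpoint="wget" requests a Wget script for the matching files.
    Facets are passed as keyword arguments, e.g. variable_id="tas".
    """
    if endpoint not in ("search", "wget"):
        raise ValueError("endpoint must be 'search' or 'wget'")
    params = {"project": "CMIP6", **facets}
    if endpoint == "search":
        # Ask for dataset-level records, latest versions only, as Solr-style JSON
        params.update({"type": "Dataset", "latest": "true",
                       "format": "application/solr+json"})
    return f"{INDEX_NODE}/esg-search/{endpoint}?{urlencode(params)}"


if __name__ == "__main__":
    print(esgf_query_url(source_id="CESM2", experiment_id="historical",
                         table_id="Amon", variable_id="tas"))
    print(esgf_query_url("wget", source_id="CESM2", variable_id="tas"))
```

Keeping URL construction separate from the network request makes it easy to inspect or log the exact query before running it.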

Selecting the arrow to the left of a file name shows additional information about the data, including its metadata and citation information. See the CMIP data citation and licensing page for more information.

ESGF user tutorials are available here.

Alternative Access Platforms

ESGF has formed a partnership with Pangeo to facilitate the storage of CMIP data on the cloud. Detailed instructions on how to find and access this data can be found on the Pangeo/ESGF Cloud Data Working Group webpages.
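Cloud-hosted CMIP data is typically located through a catalogue that lists one dataset per row. Assuming a CSV catalogue with facet columns (such as `source_id`, `experiment_id` and `variable_id`) and a `zstore` column giving each dataset’s Zarr store location, a minimal filter using only the Python standard library might look like this. The column layout, the helper name and the sample paths are assumptions for illustration; the actual catalogue location and format are documented on the Pangeo/ESGF Cloud Data Working Group pages.

```python
import csv
import io


def filter_catalogue(csv_text: str, **facets) -> list:
    """Return the 'zstore' value of every catalogue row matching all given facets.

    Assumes a CSV header with facet columns (e.g. source_id, variable_id)
    and a 'zstore' column; both the layout and this helper are hypothetical.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["zstore"] for row in reader
            if all(row.get(key) == value for key, value in facets.items())]


if __name__ == "__main__":
    # Tiny made-up catalogue; real catalogues have many more columns and rows
    sample = (
        "source_id,experiment_id,table_id,variable_id,zstore\n"
        "CESM2,historical,Amon,tas,gs://example-bucket/a/\n"
        "CESM2,ssp245,Amon,tas,gs://example-bucket/b/\n"
    )
    print(filter_catalogue(sample, experiment_id="historical", variable_id="tas"))
```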

CMIP data is also available in many storage facilities. Below are links to some of these. If you know of another place CMIP data is currently being stored, please submit this form to let us and the community know!

For all non-ESGF data access routes, we encourage users to verify that the data used is the latest version. This can be checked via the ESGF search index.
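One way to perform this check programmatically is to compare the version component of a locally held dataset identifier with the versions returned by an ESGF search. The sketch below assumes a Solr-style JSON response (`{"response": {"docs": [...]}}`) in which each record carries an `instance_id` ending in a `vYYYYMMDD` version. These field names and the helper functions are assumptions, so verify them against a real response before relying on this.

```python
def latest_version(search_response: dict) -> str:
    """Return the newest version string among datasets in a search response.

    Assumed layout: {"response": {"docs": [{"instance_id": "...vYYYYMMDD"}, ...]}}.
    Replicas of the same dataset on several nodes simply repeat a version.
    """
    docs = search_response["response"]["docs"]
    if not docs:
        raise LookupError("no matching datasets in the search index")
    versions = [doc["instance_id"].rsplit(".", 1)[-1] for doc in docs]
    # Equal-length 'vYYYYMMDD' strings sort chronologically as plain text
    return max(versions)


def is_latest(local_dataset_id: str, search_response: dict) -> bool:
    """True if the locally held dataset ID carries the newest indexed version."""
    local_version = local_dataset_id.rsplit(".", 1)[-1]
    return local_version == latest_version(search_response)


if __name__ == "__main__":
    # Made-up response containing two versions of the same dataset
    response = {"response": {"docs": [
        {"instance_id": "CMIP6.CMIP.X.Y.historical.r1i1p1f1.Amon.tas.gn.v20190308"},
        {"instance_id": "CMIP6.CMIP.X.Y.historical.r1i1p1f1.Amon.tas.gn.v20200101"},
    ]}}
    print(is_latest("CMIP6.CMIP.X.Y.historical.r1i1p1f1.Amon.tas.gn.v20190308", response))
```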

Errata and Known Issues With Datasets

The ES-DOC Errata platform tracks known issues with ESGF datasets. It also documents changes between data versions. It is important to check Errata regularly for potential issues with the data you have downloaded. Current issues submitted to the Errata system can be found here. Searches can be made by project (CMIP5/CMIP6), Experiment ID, Institution ID, Source ID, Variable ID, issue severity, and issue status (e.g. new/solved). Descriptions of these IDs can be found in the Access Routes section above.
