NOAA Unified Forecast System (UFS) Replay
Replaying UFS GEFSv13/GFSv17 to ERA5 and ORAS5
Purpose and scope
The NOAA Unified Forecast System (UFS) / Global Ensemble Forecast System version 13 (GEFSv13) replay dataset was developed to provide initial conditions for the retrospective forecast archive that supports the next implementation of NOAA's medium-range forecast system (GEFSv13 / GFSv17).
This dataset was produced by using the replay technique (Orbe et al., 2017) to constrain the coupled version of the new UFS model to external reanalyses: ERA5 for the atmosphere and ORAS5 for the ocean and sea ice. For the replay experiment, we used the HR1 tag of the NOAA UFS coupled model, which includes atmosphere, ocean, sea ice, land, and wave model components (see Table 1 for component descriptions).
Table 1: Summary of the UFS model components and replay constraints in the GEFSv13 replay dataset.
Earth system component | Model version | Resolution | Replay constraint |
---|---|---|---|
Atmosphere | UFS/FV3, HR1 physics | C384 cubed-sphere (¼ degree), 127 vertical levels | Temperature, specific humidity, horizontal winds, ozone, and surface pressure replayed to ERA5 |
Land | Noah-MP LSM | Four vertical levels | Snow depth assimilation from NCEI Global Historical Climatology Network (GHCN) and snow cover from U.S. National Ice Center Interactive Multisensor Snow and Ice Mapping System (IMS) |
Ocean | MOM6 | 72 hybrid levels, nominal ¼ degree tri-polar grid | Temperature, salinity, currents (horizontal velocity components) replayed to ORAS5 |
Ice | CICE6 | ¼ degree tri-polar grid, 5 ice categories, 7 model levels | Ice concentration inserted from ORAS5 using Sea-ice, Ocean, and Coupled Assimilation project (SOCA) |
Wave | WAVEWATCH III | ¼ degree | None |
The replay methodology lets the resulting dataset track the evolution of selected variables from the external reference datasets while allowing the coupled model to generate unconstrained variables consistent with UFS dynamics. For example, the replay constrains the three-dimensional temperature, humidity, and winds in the atmosphere and the temperature, salinity, and currents in the ocean. At the same time, the UFS model was free to compute its own surface fluxes between coupled components and its own land and wave states, which were not directly constrained by the external reanalyses.
In addition to replaying the atmospheric and oceanic states to the external reanalyses, we used a development version of NOAA's JEDI-based land data assimilation system (Gichamo and Draper, 2022) to constrain snow depth by assimilating snow depth observations from the NCEI Global Historical Climatology Network (GHCN) daily station data and satellite-derived snow cover from the U.S. National Ice Center Interactive Multisensor Snow and Ice Mapping System (IMS). The JEDI Sea-ice, Ocean, and Coupled Assimilation (SOCA) system was used to adjust the sea-ice thickness, concentration, and snow depth over ice to be consistent with the ORAS5 sea-ice analysis.
The original replay dataset covered January 1994 to October 2023 at a nominal ¼-degree resolution. It has since been extended several times toward the present, and we expect to continue these extensions until GEFSv13 / GFSv17 becomes operational. In addition to the ¼-degree version, we are producing a native 1-degree version by replaying the 1-degree configuration of the UFS model to the ERA5 and ORAS5 reanalyses for 1958 to 2023. A sample of the native 1-degree replay data is available and will be extended as model output is produced.
The replay dataset is staged in an egress-free environment on the AWS and GCP cloud services, courtesy of the NOAA Open Data Dissemination program. The original model output is stored on AWS in native model formats. To facilitate analysis of the replay dataset and to enable efficient AI training pipelines, a curated version of the replay dataset is also available in Zarr format on GCP (see the AWS and GCP dataset sections below for details).
References
Gichamo, T. Z., and C. S. Draper, 2022: An Optimal Interpolation–Based Snow Data Assimilation for NOAA’s Unified Forecast System (UFS). Wea. Forecasting, 37, 2209–2221, https://doi.org/10.1175/WAF-D-22-0061.1.
Orbe, C., L. D. Oman, S. E. Strahan, D. W. Waugh, S. Pawson, L. L. Takacs, and A. M. Molod, 2017: Large-scale atmospheric transport in GEOS replay simulations. J. Adv. Model. Earth Syst., 9, 2545–2560, https://doi.org/10.1002/2017MS001053.
Replay methods
Each stream of the replay simulation is initialized from a pre-existing reanalysis for each underlying model component of the NOAA Unified Forecast System (UFS). The Global Ensemble Forecast System (GEFSv13) configuration of the UFS uses the Community Mediator for Earth Prediction Systems (CMEPS) to couple the GFDL Finite-Volume Cubed-Sphere Dynamical Core (FV3; 25 km, 127 vertical levels), the Modular Ocean Model (MOM6; ¼-degree tri-polar grid, 72 vertical levels), the Noah-MP Open-Source Community Land Surface Model (Noah-MP LSM; 4 vertical levels), the CICE6 sea ice model, and the EMC WAVEWATCH III wave model. For the GEFSv13 replay, the atmosphere and ocean model initial conditions (ICs) were preprocessed from the ECMWF fifth-generation atmospheric reanalysis (ERA5) and Ocean Reanalysis System 5 (ORAS5).
From the cold-start ICs described above, the UFS model runs a 9-hour free-running forecast (the predictor segment). At forecast hour 6, analysis increments are computed from the difference between the component model backgrounds and the ERA5/ORAS5 reanalyses. Increments for the following variables are then used to constrain the coupled model during the subsequent corrector segment:
- Atmosphere (3D, from ERA5): temperature, specific humidity, horizontal wind velocity, surface pressure (3D pressure calculated for use with adjustments), and ozone concentrations.
- Ocean (3D, from ORAS5): temperature, salinity, and horizontal currents.
For the land and sea ice states, a different approach was used to incorporate observational information. The Noah-MP LSM, which is forced by radiation and precipitation from the atmosphere model, assimilates snow depth observations from the NCEI Global Historical Climatology Network (GHCN) daily data and snow cover from the U.S. National Ice Center Interactive Multisensor Snow and Ice Mapping System (IMS) using an optimal interpolation technique (Gichamo and Draper, 2022). Sea ice concentration, thickness, and snow depth over ice from ORAS5 were inserted directly into the initial conditions using the state adjustment and mapping functions of SOCA, the Sea-ice, Ocean, and Coupled Assimilation component of the JCSDA Joint Effort for Data assimilation Integration (JEDI) framework. The wave model was unconstrained but was forced by the atmosphere and coupled with the ocean's near-surface mixing parameterization.
The replay simulation proceeds as a sequence of predictor and corrector steps, each providing initial conditions for the next six-hour segment. In other words, while the very first forecast of each replay stream is initialized directly from the external reanalysis, the remainder of the simulation (as long as six years for some streams) is only nudged toward the external reference reanalysis through the analysis increments. The resulting dataset provides the cross-component consistency needed to initialize GEFSv13 retrospective forecasts.
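One predictor/corrector cycle can be summarized as follows, assuming the corrector spreads each increment uniformly in time as in the constant-weight incremental analysis update (IAU) of Bloom et al. (1996). Here x is a component model state, M its model tendency, x^ref the ERA5 or ORAS5 state, x^b the predictor background, t_a the analysis time (predictor forecast hour 6), and τ = 6 h the corrector window; these symbols are illustrative and are not taken from the replay software.

```latex
% Predictor: analysis increment at the analysis time t_a (forecast hour 6)
\delta\mathbf{x} = \mathbf{x}^{\mathrm{ref}}(t_a) - \mathbf{x}^{b}(t_a)

% Corrector (assumed constant-weight IAU): rerun the window with the
% increment applied as a uniform forcing term
\frac{\partial \mathbf{x}}{\partial t} = \mathcal{M}(\mathbf{x}) + \frac{\delta\mathbf{x}}{\tau},
\qquad t_a - \tfrac{\tau}{2} \le t \le t_a + \tfrac{\tau}{2}, \quad \tau = 6\,\mathrm{h}
```

Integrated over the window, the forcing term adds exactly δx to the state, so the model is pulled toward the reference reanalysis gradually rather than by direct insertion.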
References
Bloom, S. C., L. L. Takacs, A. M. da Silva, and D. Ledvina, 1996: Data Assimilation Using Incremental Analysis Updates. Mon. Wea. Rev., 124, 1256–1271, https://doi.org/10.1175/1520-0493(1996)124<1256:DAUIAU>2.0.CO;2.
Gichamo, T. Z., and C. S. Draper, 2022: An Optimal Interpolation–Based Snow Data Assimilation for NOAA’s Unified Forecast System (UFS). Wea. Forecasting, 37, 2209–2221, https://doi.org/10.1175/WAF-D-22-0061.1.
AWS Dataset Overview
The Unified Forecast System (UFS) / Global Ensemble Forecast System version 13 (GEFSv13) replay experiment input and output data are archived in an AWS Simple Storage Service (S3) bucket.
This dataset includes native netCDF and binary files produced by the UFS model components and the Gridpoint Statistical Interpolation (GSI) data assimilation system. Input directories contain snow depth observations, as well as European Centre for Medium-Range Weather Forecasts fifth-generation atmospheric reanalysis (ERA5) and Ocean Reanalysis System 5 (ORAS5) data preprocessed for use by the UFS replay software.
Data Structure
The AWS dataset stores model output files for both the free-running forecast (predictor) and analysis (corrector) segments of the replay. It consists of three main file types:
1. History Files
The history files in the AWS dataset are divided by model component and are produced at different frequencies. Each file is associated with a specific forecast hour, denoted "fhr," which is the number of hours after initialization.
Atmosphere model output (FV3), available every 3 hours:
- NetCDF files:
  - bfg_<YYYYMMDDHH>_fhr<FHR>_control (fields on single levels)
  - sfg_<YYYYMMDDHH>_fhr<FHR>_control (3D variables)
Explanation of fhr files: In general, fhr00 and fhr03 files represent output from the replay "analysis," which incorporates analysis increments into the dynamic model forecast as a constraint. The fhr06 and fhr09 files represent output from the model's free-running forecast (i.e., the model background). An example interpretation for the 12Z January 1, 1994 folder (1994010112) follows:
- fhr00 files contain data valid for the center of the previous cycle's 6-hour "corrector (replay) segment." As an example, in a 12Z folder (e.g., 1994010112), fhr00 files provide model output with incremental analysis updates applied from 03Z (1994010103) to 06Z (1994010106), with accumulated values representing 03Z to 06Z and timestamps at 06Z.
- fhr06 files contain data valid for the center of the current cycle's 6-hour free-running forecast. As an example, in the 1994010112 folder, fhr06 files provide output from the model background from 09Z to 12Z (1994010112), with accumulated values representing 09Z to 12Z and timestamps at 12Z.
- fhr09 files contain data valid for the end of the current cycle's 6-hour free-running forecast. As an example, in the 1994010112 folder, fhr09 files provide output from the model background from 12Z to 15Z (1994010115), with accumulated values representing 12Z to 15Z and timestamps at 15Z.
- For the next cycle (e.g., 18Z 1994010118), the same conventions apply but are shifted by 6 hours. For example, fhr00 files for an 18Z folder represent incremental analysis updates from 09Z to 12Z (1994010112), with accumulated values from 09Z to 12Z and timestamps at 12Z.
- GRIB2 files (available only for the corrector segment):
  - GFSFLX.GrbF<FHR> (fluxes)
  - GFSPRS.GrbF<FHR> (pressure-level data)
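The folder and fhr conventions above can be captured in a small shell sketch. It assumes, by extrapolating from the 1994010112 and 1994010118 examples, that a file in cycle folder CYCLE with forecast hour FHR is time-stamped at CYCLE minus 6 hours plus FHR, and that its accumulations cover the 3 hours ending at that timestamp. The function names (`to_epoch`, `from_epoch`, `valid_time`, `accum_window`) are hypothetical, and GNU `date` is required for its `-d` option.

```shell
#!/bin/sh
# Sketch only: the timing rules below are inferred from the documented
# examples, not taken from the replay software. Requires GNU date.

# to_epoch YYYYMMDDHH -> seconds since the Unix epoch, interpreted in UTC.
to_epoch() {
  iso=$(printf '%s\n' "$1" | sed 's/^\(....\)\(..\)\(..\)\(..\)$/\1-\2-\3 \4:00:00/')
  TZ=UTC0 date -d "$iso" +%s
}

# from_epoch SECONDS -> YYYYMMDDHH in UTC.
from_epoch() {
  TZ=UTC0 date -d "@$1" +%Y%m%d%H
}

# valid_time CYCLE FHR -> timestamp of a history file (assumed CYCLE - 6 h + FHR).
valid_time() {
  fhr=$(expr "$2" + 0)   # normalize "09" -> 9 so it is not read as octal
  from_epoch $(( $(to_epoch "$1") + ($fhr - 6) * 3600 ))
}

# accum_window CYCLE FHR -> "START END" of the assumed 3-hour accumulation period.
accum_window() {
  end=$(valid_time "$1" "$2")
  echo "$(from_epoch $(( $(to_epoch "$end") - 3 * 3600 ))) $end"
}
```

For example, `valid_time 1994010112 09` prints 1994010115 and `accum_window 1994010112 00` prints `1994010103 1994010106`, matching the examples above. Working in epoch seconds lets day and year boundaries (such as fhr00 in a 00Z folder) roll over correctly.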
Ocean model output (MOM6), available every 6 hours:
- ocn_<YYYY>_<MM>_<DD>_<HH>.nc (corrector segment and first guess); the timestamp marks the middle of the averaging window.
Wave model output, available every 3 hours:
- ww3.<YYYY><MM><DD>T<HH>Z.nc
Sea ice model:
- Note: Ice history files were not saved, but sea ice output is included in the atmospheric model's output.
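The history file naming patterns above can be generated programmatically, for instance to build download lists. This sketch fills the placeholders exactly as written in the patterns; the zero-padded two-digit fhr follows the `fhr03` example in the AWS CLI section, and the function names are hypothetical. Which ocean and wave timestamps exist in a given folder is not specified here, so the caller supplies them.

```shell
#!/bin/sh
# Sketch: build history file names from the documented patterns.

# atm_history CYCLE FHR -> bfg and sfg NetCDF names for one cycle and forecast hour.
atm_history() {
  fhr=$(printf '%02d' "$(expr "$2" + 0)")   # 3 -> 03, "09" -> 09
  echo "bfg_$1_fhr${fhr}_control"
  echo "sfg_$1_fhr${fhr}_control"
}

# ocn_history YYYYMMDDHH -> MOM6 history name ocn_<YYYY>_<MM>_<DD>_<HH>.nc
ocn_history() {
  printf '%s\n' "$1" | sed 's/^\(....\)\(..\)\(..\)\(..\)$/ocn_\1_\2_\3_\4.nc/'
}

# wav_history YYYYMMDDHH -> wave history name ww3.<YYYY><MM><DD>T<HH>Z.nc
wav_history() {
  printf '%s\n' "$1" | sed 's/^\(........\)\(..\)$/ww3.\1T\2Z.nc/'
}
```

For example, `atm_history 1994010100 3` prints `bfg_1994010100_fhr03_control` and `sfg_1994010100_fhr03_control`.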
2. Increment Files
Atmosphere Model:
- File Name: fv3_increment6.nc
- Availability: Every cycle
Ocean Model:
- File Name: mom6_increment.nc
- Availability: Only at 12Z
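Because the ocean increment is written only at 12Z while the atmospheric increment is written every cycle, a script that walks the archive has to expect different files in different cycle folders. A hypothetical helper, assuming 6-hourly cycle folders named YYYYMMDDHH:

```shell
#!/bin/sh
# Sketch: list the increment files expected in a cycle folder (YYYYMMDDHH),
# following the availability rules listed above.
increment_files() {
  echo "fv3_increment6.nc"              # atmosphere: every cycle
  case "$1" in
    *12) echo "mom6_increment.nc" ;;    # ocean: only the 12Z cycle (last two digits = HH)
  esac
}
```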
3. Restart Files
Atmosphere Model:
- File Types:
  - ca_data.tile<tile>.nc
  - fv_core.res.tile<tile>.nc
  - fv_srf_wnd.res.tile<tile>.nc
  - fv_tracer.res.tile<tile>.nc
  - phy_data.res.tile<tile>.nc
  - sfc_data.res.tile<tile>.nc
  - ca_data.nc
  - fv_core.res.nc
- Tile Range: <tile> from 1 to 6 (one per cubed-sphere face)
Ocean Model:
- File Types:
  - MOM.res.nc
  - MOM.res_1.nc
  - MOM.res_2.nc
  - MOM.res_3.nc
Ice Model:
- File Name: iced.<YYYY>-<MM>--<SS>-10800.nc
Wave Model:
- File Name: restart.ww3
Coupler:
- File Name: ufs.cpld.cpl.r.<YYYY>-<MM>--<SS>-10800.nc
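After downloading an atmosphere restart set, a quick completeness check helps catch missing tiles. This hypothetical `check_tiles` function assumes that each tiled restart file is named `<stem>.tile<N>.nc` for N = 1 to 6, as in the list above; the stems are passed by the caller, so they can be matched to the names actually present in the bucket.

```shell
#!/bin/sh
# Sketch: report tiled restart files missing from a directory.
# Usage: check_tiles DIR STEM...   prints each missing path; returns 1 if any are missing.
check_tiles() {
  dir=$1
  shift
  missing=0
  for stem in "$@"; do
    for tile in 1 2 3 4 5 6; do           # six cubed-sphere tiles
      f="$dir/$stem.tile$tile.nc"
      if [ ! -f "$f" ]; then
        echo "missing: $f"
        missing=1
      fi
    done
  done
  return "$missing"
}
```

For example, `check_tiles restart_dir fv_tracer.res sfc_data.res` prints nothing and succeeds only when all twelve files are present in the hypothetical `restart_dir`.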
Google Cloud Platform (GCP) Zarr Dataset
To facilitate access to the replay data for analysis and machine learning applications, we stored the analysis portion of the replay data as a continuous record in Zarr format. The Zarr archive is interpolated from the native model component grids onto a Gaussian grid with nominal ¼-degree resolution using a conservative regridding algorithm.
We also provide a nominal 1-degree dataset obtained by subsampling the ¼-degree gridded dataset, keeping every 4th grid point. We chose subsampling over conservative interpolation to preserve, as closely as possible, the relationships among atmospheric, microphysical, ocean, land, and ice variables that are collocated in a single vertical column.
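Written out, the subsampling keeps every fourth point in both the latitude and longitude directions. In this sketch f is any gridded field, i and j are zero-based latitude and longitude indices, and N_lat, N_lon are the ¼-degree grid dimensions; starting from index 0 is an assumption, since the offset used by the archive is not stated here.

```latex
f^{1^\circ}_{i,j} = f^{1/4^\circ}_{4i,\,4j},
\qquad i = 0, 1, \dots, \left\lceil N_{\mathrm{lat}}/4 \right\rceil - 1,
\quad j = 0, 1, \dots, \left\lceil N_{\mathrm{lon}}/4 \right\rceil - 1
```

Every 1-degree value is therefore an unmodified ¼-degree value, and no averaging is performed across neighboring columns.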
To facilitate use of the self-describing Zarr archive, we provide a Jupyter notebook example.
The GCP Zarr dataset was produced using the ufs2arco software.
Replay Observer Diagnostic (ROD)
In addition to the gridded replay output, we provide evaluations in observation space through the Replay Observer Diagnostic (ROD). This dataset pairs observations from the NOAA-NASA Joint Archive (NNJA) with their replay-model equivalents, computed with the Gridpoint Statistical Interpolation (GSI) data assimilation software. The output includes open-access observations, replay observation equivalents, original observation values and metadata, bias-corrected observations, and bias coefficients estimated by GSI. The ROD output for each six-hour cycle is located in the GSI subdirectory and is provided in the GSI ncdiag file format.
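To locate the ROD files for a cycle, one can list the GSI subdirectory with the AWS CLI. This sketch assumes the GSI subdirectory sits inside each cycle folder, using the `<YYYY>/<MM>/<YYYYMMDDHH>/` layout shown in the AWS CLI download example; both the directory name `GSI` and its location are assumptions, and the function names are hypothetical.

```shell
#!/bin/sh
# Sketch: locate Replay Observer Diagnostic (ROD) ncdiag files for one cycle.
# ASSUMPTION: ROD files live in <bucket>/<YYYY>/<MM>/<YYYYMMDDHH>/GSI/.
BUCKET="s3://noaa-ufs-gefsv13replay-pds"

# rod_prefix CYCLE -> assumed S3 prefix of the GSI subdirectory for that cycle.
rod_prefix() {
  y=$(printf '%s' "$1" | cut -c1-4)
  m=$(printf '%s' "$1" | cut -c5-6)
  echo "$BUCKET/$y/$m/$1/GSI/"
}

# list_rod CYCLE -> list the files under that prefix (requires network access).
list_rod() {
  aws s3 ls "$(rod_prefix "$1")" --no-sign-request
}
```

If the assumed layout holds, `list_rod 1994010100` lists the ncdiag files for 00Z January 1, 1994.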
License
The Global Ensemble Forecast System version 13 (GEFSv13) UFS-Replay data is distributed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.
Example Use Case: Downloading Files with AWS CLI
To download files from the AWS S3 bucket using the AWS Command Line Interface (CLI), follow these steps:
1. Install the AWS CLI:
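If you haven't already, install the AWS CLI by following the installation instructions in the AWS CLI User Guide.
2. Configure AWS CLI (optional):
The bucket is publicly readable, so the commands below use --no-sign-request and need no credentials. If you also want to set up AWS credentials and a default region, use the command: aws configure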
3. List Files in the Bucket:
To view the available files in the S3 bucket, use the following command:
aws s3 ls s3://noaa-ufs-gefsv13replay-pds/ --no-sign-request
4. Download Specific Files:
To download a specific file, use the aws s3 cp command. For example, to download a NetCDF atmospheric history file for a specific date and time:
aws s3 cp s3://noaa-ufs-gefsv13replay-pds/1994/01/1994010100/bfg_1994010100_fhr03_control . --no-sign-request
This command downloads the file bfg_1994010100_fhr03_control to your current directory.
5. Download Entire Directory:
To download all files in a specific directory (e.g., all files for January 1, 1994, at 00Z), use:
aws s3 cp s3://noaa-ufs-gefsv13replay-pds/1994/01/1994010100/ . --recursive --no-sign-request
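The single-cycle commands above can be looped to fetch a period of record. This sketch walks 6-hourly cycles between two dates and copies only files matching a pattern, using the `--exclude`/`--include` filters of `aws s3 cp` (later filters take precedence, so excluding everything and then including the pattern keeps only matches). The `<YYYY>/<MM>/<YYYYMMDDHH>/` layout is taken from the examples above; the function names, local `./<CYCLE>/` destination, and default pattern are illustrative choices, and GNU `date` is required.

```shell
#!/bin/sh
# Sketch: download matching files for every 6-hourly cycle in a date range.
# Requires GNU date and the AWS CLI.
BUCKET="s3://noaa-ufs-gefsv13replay-pds"

# to_epoch YYYYMMDDHH -> seconds since the Unix epoch (UTC).
to_epoch() {
  iso=$(printf '%s\n' "$1" | sed 's/^\(....\)\(..\)\(..\)\(..\)$/\1-\2-\3 \4:00:00/')
  TZ=UTC0 date -d "$iso" +%s
}

# cycles START END -> 6-hourly cycle names (YYYYMMDDHH) from START to END inclusive.
cycles() {
  t=$(to_epoch "$1")
  stop=$(to_epoch "$2")
  while [ "$t" -le "$stop" ]; do
    TZ=UTC0 date -d "@$t" +%Y%m%d%H
    t=$((t + 6 * 3600))
  done
}

# cycle_prefix CYCLE -> S3 prefix of the cycle folder.
cycle_prefix() {
  echo "$BUCKET/$(printf '%s' "$1" | cut -c1-4)/$(printf '%s' "$1" | cut -c5-6)/$1/"
}

# fetch START END [PATTERN] -> copy matching files into ./<CYCLE>/ for each cycle.
fetch() {
  pattern=${3:-"bfg_*_fhr03_control"}
  for c in $(cycles "$1" "$2"); do
    aws s3 cp "$(cycle_prefix "$c")" "./$c/" --recursive \
      --exclude "*" --include "$pattern" --no-sign-request
  done
}
```

For example, `fetch 1994010100 1994010118 'sfg_*_fhr06_control'` would download the 3D atmospheric fhr06 files for the four cycles of January 1, 1994. Adding `--dryrun` to the `aws s3 cp` call shows what would be copied without downloading anything.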
How to cite
The Global Ensemble Forecast System version 13 (GEFSv13) UFS-Replay data is distributed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.
To provide appropriate attribution under this license, please cite this PSL web page. For example:
The NOAA Global Ensemble Forecast System (version 13) replay data used in our study was created by the NOAA Physical Sciences Laboratory (NOAA, 2024).
Full citation:
NOAA, 2024: The Global Ensemble Forecast System (version 13) Replay dataset. NOAA Open Data Dissemination Program. Subset used: [MONTH YEAR – MONTH YEAR], accessed [DAY MONTH YEAR], https://psl.noaa.gov/data/ufs_replay/
UFS Replay team
The NOAA Global Ensemble Forecast System (version 13) replay data set was created by members of PSL's Modeling and Data Assimilation Division.