Generating Ensemble

Beyond data reconstruction, restoreio can also generate an ensemble of the velocity vector field. Such ensembles are essential for quantifying uncertainty, which matters in many applications. For an in-depth description of the ensemble generation algorithm, see [2].

To create a velocity ensemble, enable the uncertainty_quant option in restoreio.restore(). Note that an ensemble can be generated for a single time point only. This section describes how to use restoreio.restore() for ensemble generation.

Required Variables

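To generate an ensemble, your input file must also provide one of the following sets of variables:

  • the Ocean’s Surface East and North Velocity Error Variables, or

  • the Geometric Dilution of Precision (GDOP) Variables.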

If you choose to provide GDOP variables instead of the velocity error variables, the velocity errors are calculated from GDOP as follows:

\[\begin{split}\begin{align} \sigma_e &= \sigma_r \mathrm{GDOP}_e, \\ \sigma_n &= \sigma_r \mathrm{GDOP}_n, \end{align}\end{split}\]

where \(\sigma_e\) and \(\sigma_n\) are the east and north components of the velocity error, \(\mathrm{GDOP}_e\) and \(\mathrm{GDOP}_n\) are the east and north components of the GDOP, respectively, and \(\sigma_r\) is the radar’s radial error. You can specify \(\sigma_r\) with the scale_error argument of the function (see also the Scale Velocity Errors section below).

Ensemble Generation Settings

The following settings for ensemble generation can be set within the restoreio.restore() function:

Write Ensemble to Output

The write_samples option saves every member of the ensemble of velocity vector fields to the output file. If this option is disabled, only the mean and standard deviation of the ensemble are stored. For details, see the Output Variables section.
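When write_samples is disabled, the ensemble is reduced to its mean and standard deviation. A minimal sketch of that reduction, assuming the ensemble is held as a NumPy array of shape (num_samples, ny, nx) and that the sample standard deviation (ddof=1) is used; both the layout and the helper summarize_ensemble are hypothetical and not necessarily how restoreio stores or computes these fields:

```python
import numpy as np


def summarize_ensemble(samples):
    """Hypothetical helper: reduce an ensemble of shape (num_samples, ny, nx)
    to per-point mean and standard deviation.

    Uses the sample standard deviation (ddof=1). NaN entries, e.g. land
    points, are ignored by the nan-aware reductions.
    """
    samples = np.asarray(samples, dtype=float)
    mean = np.nanmean(samples, axis=0)
    std = np.nanstd(samples, axis=0, ddof=1)
    return mean, std


# Two made-up ensemble members on a 1 x 2 grid.
samples = np.array([[[1.0, 2.0]],
                    [[3.0, 4.0]]])
mean, std = summarize_ensemble(samples)
print(mean)  # [[2. 3.]]
print(std)   # [[1.41421356 1.41421356]]
```

Storing only these two fields keeps the output size independent of num_samples, whereas writing all samples grows the file linearly with it.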

Number of (Monte-Carlo) Samples

The num_samples argument specifies the number of samples to generate. This value should be greater than the number of velocity data points. Processing time increases linearly with the number of samples.

Number of Eigen-Modes

To generate an ensemble, the eigenvalues and eigenvectors of the covariance matrix of the velocity data must be computed. For velocity data with \(n\) points, this requires the eigendecomposition of an \(n \times n\) matrix, which has complexity \(\mathcal{O}(n^3)\) and can be infeasible for large datasets.

To make this tractable, we compute only a reduced set of \(m\) eigenvalues and eigenvectors of the covariance matrix, where \(m\) can be much smaller than \(n\). This reduces the complexity to \(\mathcal{O}(n m^2)\), which allows larger datasets to be processed while maintaining reasonable accuracy. For details, we refer the interested reader to Section 4 of [2].
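The idea of sampling with only \(m\) eigenmodes can be sketched in NumPy. This is a minimal illustration, not restoreio's implementation: it assumes Gaussian perturbations around a mean field, keeps the \(m\) largest eigenpairs, and, for brevity, calls the full solver numpy.linalg.eigh before truncating, so it does not realize the \(\mathcal{O}(n m^2)\) saving (a partial eigensolver would be needed for that). The helper sample_low_rank and the exponential covariance in the example are assumptions for illustration.

```python
import numpy as np


def sample_low_rank(mean, cov, num_modes, num_samples, seed=None):
    """Hypothetical helper: draw Gaussian samples using only the
    `num_modes` largest eigenpairs of `cov`.

    Each sample is mean + V_m @ (sqrt(lambda_m) * z) with z ~ N(0, I_m),
    so the ensemble covariance approximates V_m diag(lambda_m) V_m^T.
    """
    rng = np.random.default_rng(seed)

    # Full eigendecomposition for clarity (O(n^3)); a partial solver would
    # compute only the m leading eigenpairs instead.
    eigvals, eigvecs = np.linalg.eigh(cov)

    # eigh returns eigenvalues in ascending order: keep the largest num_modes.
    idx = np.argsort(eigvals)[::-1][:num_modes]
    lam = np.clip(eigvals[idx], 0.0, None)  # remove tiny negative round-off
    modes = eigvecs[:, idx]                  # shape (n, m)

    # One row of standard normal coefficients per ensemble member.
    z = rng.standard_normal((num_samples, num_modes))
    return mean + (z * np.sqrt(lam)) @ modes.T  # shape (num_samples, n)


# Example: 1-D grid of n = 200 points with an assumed exponential covariance.
n = 200
x = np.arange(n)
cov = np.exp(-np.abs(x[:, None] - x[None, :]) / 10.0)

m = n // 10  # keep 10% of the modes
samples = sample_low_rank(np.zeros(n), cov, num_modes=m,
                          num_samples=1000, seed=0)
print(samples.shape)  # (1000, 200)
```

Truncating to \(m\) modes discards the variance carried by the remaining eigenvalues; this is the accuracy trade-off controlled by ratio_num_modes.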

The ratio_num_modes argument specifies the number of eigenvectors of the data covariance used in the computation, given as the ratio \(m/n\) expressed as a percentage.

Processing time increases quadratically with the number of eigenmodes. We recommend setting this value to about 5% to 10% for most datasets.
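To make the percentage concrete, the sketch below converts a ratio_num_modes value to a mode count \(m\) and compares costs under the quadratic scaling stated above. The rounding rule (ceiling, clamped to \([1, n]\)) and the helper names are assumptions for illustration; restoreio may round differently.

```python
import math


def num_modes_from_ratio(n, ratio_percent):
    """Hypothetical helper: m = ceil(ratio_percent * n / 100), clamped to [1, n]."""
    m = math.ceil(ratio_percent * n / 100)
    return max(1, min(n, m))


def relative_mode_cost(m_new, m_old):
    """Relative cost under a cost that grows quadratically with m."""
    return (m_new / m_old) ** 2


n = 5000                           # number of velocity data points (made up)
m5 = num_modes_from_ratio(n, 5)    # 5% of n
m10 = num_modes_from_ratio(n, 10)  # 10% of n
print(m5, m10, relative_mode_cost(m10, m5))  # 250 500 4.0
```

Doubling the percentage from 5% to 10% therefore roughly quadruples the eigenmode part of the run time.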

Kernel Width

The kernel_width argument sets the width of the spatial kernel used to construct the covariance matrix of the velocity data. The kernel width is measured in units of the spacing between velocity data points. For example, a kernel width of 5 on an HF radar dataset with a 2 km spatial resolution corresponds to 10 km.

Velocity data at points separated by more than the kernel width are assumed to be uncorrelated. Reducing the kernel width therefore makes the covariance matrix sparser and processing more efficient. However, a kernel width that is too small may discard spatial correlations present in the data. As a general recommendation, set this value to 5 to 20 data points.
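The effect of the kernel width on sparsity can be illustrated with a small sketch. The compactly supported Wendland C2 function used here is an assumption, chosen because it is exactly zero beyond the kernel width and yields a valid (positive semi-definite) covariance in two dimensions; the kernel actually used by restoreio is described in [2] and may differ. The helpers wendland_c2 and kernel_covariance are hypothetical.

```python
import numpy as np


def wendland_c2(r):
    """Wendland C2 function: (1 - r)^4 (4r + 1) for r < 1, else 0."""
    r = np.asarray(r, dtype=float)
    return np.where(r < 1.0, (1.0 - r) ** 4 * (4.0 * r + 1.0), 0.0)


def kernel_covariance(coords, sigma, kernel_width):
    """Hypothetical covariance C_ij = sigma_i * sigma_j * k(d_ij / kernel_width).

    coords: (n, 2) grid indices, so distances are in data-point units.
    sigma:  (n,) velocity errors at each point.
    """
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    return np.outer(sigma, sigma) * wendland_c2(dist / kernel_width)


# 20 x 20 grid of data points with a uniform 0.1 m/s error (made-up values).
ii, jj = np.meshgrid(np.arange(20), np.arange(20), indexing="ij")
coords = np.column_stack([ii.ravel(), jj.ravel()]).astype(float)
sigma = np.full(coords.shape[0], 0.1)

for width in (5, 20):
    cov = kernel_covariance(coords, sigma, width)
    density = np.count_nonzero(cov) / cov.size
    print(f"kernel_width={width} points ({width * 2} km at 2 km spacing): "
          f"{density:.1%} nonzero")
```

The smaller width leaves far fewer nonzero entries, which is what makes sparse storage and faster processing possible.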

Scale Velocity Errors

The scale_error argument serves two purposes:

  • If the Ocean’s Surface East and North Velocity Error Variables are included in the input dataset, the velocity errors are multiplied by the given scale value. This is useful for converting the velocity errors to the unit of the velocity data when the two differ. If the velocity errors are already in the same unit as the velocity data, set this value to 1.

  • If the Geometric Dilution of Precision (GDOP) Variables are included in the input dataset, the given scale value is interpreted as the HF radar’s radial error, \(\sigma_r\). In this case, the velocity error is calculated by multiplying the radar’s radial error by the GDOP variables. Radial errors of HF radars typically range from 0.05 to 0.20 meters per second.
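The two roles of scale_error can be summarized in a short sketch. The helper velocity_errors is hypothetical: it prefers the velocity error variables when both kinds are supplied (an assumption, since the precedence is not specified here), and its warning for radial errors outside 0.05 to 0.20 m/s is only an illustrative sanity check, not restoreio behavior.

```python
import warnings

import numpy as np


def velocity_errors(scale_error, east_error=None, north_error=None,
                    gdop_east=None, gdop_north=None):
    """Hypothetical helper mirroring the two roles of scale_error.

    If velocity errors are given, scale_error is a unit-conversion factor.
    Otherwise, if GDOP is given, scale_error is the radial error sigma_r (m/s).
    """
    if east_error is not None and north_error is not None:
        # Case 1: rescale the given errors, e.g. 0.01 to convert cm/s to m/s.
        return (scale_error * np.asarray(east_error, dtype=float),
                scale_error * np.asarray(north_error, dtype=float))

    if gdop_east is not None and gdop_north is not None:
        # Case 2: sigma = sigma_r * GDOP; flag values outside the typical range.
        if not 0.05 <= scale_error <= 0.20:
            warnings.warn(f"radial error {scale_error} m/s is outside the "
                          "typical HF radar range 0.05-0.20 m/s")
        return (scale_error * np.asarray(gdop_east, dtype=float),
                scale_error * np.asarray(gdop_north, dtype=float))

    raise ValueError("provide either east/north velocity errors or GDOP")


# Velocity errors stored in cm/s while velocities are in m/s (made-up values).
sigma_e, sigma_n = velocity_errors(0.01, east_error=[5.0, 10.0],
                                   north_error=[2.0, 4.0])
print(sigma_e, sigma_n)
```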