restoreio.restore

restoreio.restore(input, min_file_index='', max_file_index='', output='', min_lon=nan, max_lon=nan, min_lat=nan, max_lat=nan, min_time='', max_time='', time='', diffusivity=20, sweep=False, detect_land=True, fill_coast=False, convex_hull=False, alpha=20, refine_grid=1, uncertainty_quant=False, num_samples=1000, ratio_num_modes=1, kernel_width=5, scale_error=0.08, write_samples=False, plot=False, save=True, verbose=False, terminate=False)

Restore incomplete oceanographic dataset and generate data ensemble.

Parameters:
input : str

Input filename. This can be either the path to a local file or the URL to a remote dataset. The file extension should be .nc or .ncml.

min_file_index : str, default=''

Start file iterator used when processing multiple input files. For instance, setting input='input', min_file_index='003', and max_file_index='012' reads the series of input files input003.nc, input004.nc, through input012.nc. If this option is used, the option max_file_index should also be given.

max_file_index : str, default=''

End file iterator used when processing multiple input files. For instance, setting input='input', min_file_index='003', and max_file_index='012' reads the series of input files input003.nc, input004.nc, through input012.nc. If this option is used, the option min_file_index should also be given.
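As an illustration of this naming scheme, the file series implied by the two index options can be sketched in plain Python (this helper is hypothetical and not part of the restoreio API):

```python
def file_series(base, min_index, max_index):
    # Expand base='input', min_index='003', max_index='012' into the
    # series input003.nc, input004.nc, ..., input012.nc. The zero-padding
    # width is inferred from the iterator strings.
    width = len(min_index)
    return [f"{base}{i:0{width}d}.nc"
            for i in range(int(min_index), int(max_index) + 1)]

files = file_series('input', '003', '012')
# files[0] is 'input003.nc' and files[-1] is 'input012.nc'
```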

output : str, default=''

Output filename. This can be either the path to a local file or the URL to a remote dataset. The file extension should be .nc or .ncml only. If no output file is provided, the output filename is constructed by appending the word _restored to the end of the input filename.
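For instance, the default naming can be sketched as follows (a hypothetical helper; in particular, the assumption that _restored is inserted before the file extension is one plausible reading of the rule above):

```python
import os

def default_output_name(input_filename):
    # Append '_restored' to the input filename, keeping the extension.
    base, ext = os.path.splitext(input_filename)
    return base + '_restored' + ext

name = default_output_name('WHOI_HFR_2014_original.nc')
# name is 'WHOI_HFR_2014_original_restored.nc'
```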

min_lon : float, default=float('nan')

Minimum longitude in the unit of degrees to subset the processing domain. If not provided or set to float('nan'), the minimum longitude of the input data is considered.

max_lon : float, default=float('nan')

Maximum longitude in the unit of degrees to subset the processing domain. If not provided or set to float('nan'), the maximum longitude of the input data is considered.

min_lat : float, default=float('nan')

Minimum latitude in the unit of degrees to subset the processing domain. If not provided or set to float('nan'), the minimum latitude of the input data is considered.

max_lat : float, default=float('nan')

Maximum latitude in the unit of degrees to subset the processing domain. If not provided or set to float('nan'), the maximum latitude of the input data is considered.

min_time : str, default=''

The start of the time interval within the dataset times to be processed. The time should be provided as a string with the format 'yyyy-mm-ddTHH:MM:SS' where yyyy is year, mm is month, dd is day, HH is hour from 00 to 23, MM is minutes and SS is seconds. If the given time does not exactly match any time in the dataset, the closest data time is used. If this argument is not given, the earliest available time in the dataset is used. Note that specifying a time interval cannot be used together with uncertainty quantification (using argument uncertainty_quant=True). For this case, use time argument instead which specifies a single time point.

max_time : str, default=''

The end of the time interval within the dataset times to be processed. The time should be provided as a string with the format 'yyyy-mm-ddTHH:MM:SS' where yyyy is year, mm is month, dd is day, HH is hour from 00 to 23, MM is minutes and SS is seconds. If the given time does not exactly match any time in the dataset, the closest data time is used. If this argument is not given, the latest available time in the dataset is used. Note that specifying a time interval cannot be used together with uncertainty quantification (using argument uncertainty_quant=True). For this case, use time argument instead which specifies a single time point.

time : str, default=''

Specify a single time point to process. The time should be provided as a string with the format 'yyyy-mm-ddTHH:MM:SS' where yyyy is year, mm is month, dd is day, HH is hour from 00 to 23, MM is minutes and SS is seconds. If the given time does not exactly match any time in the dataset, the closest data time is used. If this option is not given, the latest available time in the dataset is used. This option sets both min_time and max_time to the given time value. The argument is useful when performing uncertainty quantification (using argument uncertainty_quant=True) or plotting (using argument plot=True), as these require a single time rather than a time interval. In contrast, to specify a time interval, use the min_time and max_time arguments.
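The closest-time matching described above can be illustrated with a small sketch (a hypothetical helper, not part of the restoreio API):

```python
from datetime import datetime

def closest_time(requested, available):
    # Pick the dataset time nearest to the requested time string.
    fmt = '%Y-%m-%dT%H:%M:%S'
    req = datetime.strptime(requested, fmt)
    return min(available, key=lambda t: abs(datetime.strptime(t, fmt) - req))

times = ['2014-07-01T20:00:00', '2014-07-01T21:00:00', '2014-07-01T22:00:00']
nearest = closest_time('2014-07-01T20:40:00', times)
# nearest is '2014-07-01T21:00:00'
```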

diffusivity : float, default=20

Diffusivity of the PDE solver (real number). A large number leads to a diffusion-dominant solution; a small number leads to an advection-dominant solution.

sweep : bool, default=False

Sweeps the image data in all flipped directions. This ensures a solution that is independent of the sweep direction.

detect_land : bool or int, default=True

Detect land and exclude it from ocean’s missing data points. This option should be a boolean or an integer with the following values:

  • False: Same as 0. See below.

  • True: Same as 2. See below.

  • 0: Does not detect land from ocean. All land points are assumed to be a part of ocean’s missing points.

  • 1: Detect land. Most accurate, slowest.

  • 2: Detect land. Less accurate, fastest (preferred method).

  • 3: Detect land. Currently this option is not fully implemented.

fill_coast : bool, default=False

Fills the gaps between the data points in the ocean and between the ocean data and the coastline. This option is only effective if detect_land is not set to 0.

convex_hull : bool, default=False

Instead of using the concave hull (alpha shape) around the data points, this option uses the convex hull of the area around the data points.

alpha : float, default=20

The alpha number for the alpha shape. If not specified or set to a negative number, this value is computed automatically. This option is only relevant to concave shapes and is ignored if the convex hull is used (convex_hull=True).

refine_grid : int, default=1

Refines the grid by increasing the number of points on each axis by a multiple of a given integer. If this option is set to 1, no refinement is performed. If set to integer n, the number of grid points is refined by \(n^2\) times (that is, \(n\) times on each axis).
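Following the description above, the effect on the grid size can be sketched with plain arithmetic (hypothetical helper, not part of the restoreio API):

```python
def refined_grid_shape(num_lon, num_lat, refine_grid=1):
    # Each axis gets refine_grid times as many points, so the total
    # number of grid points grows by refine_grid ** 2.
    return num_lon * refine_grid, num_lat * refine_grid

shape = refined_grid_shape(100, 80, refine_grid=2)
# shape is (200, 160): four times as many grid points in total
```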

uncertainty_quant : bool, default=False

Performs uncertainty quantification on the data for the time frame given by time option.

num_samples : int, default=1000

Number of ensemble members used for uncertainty quantification. This option is relevant if uncertainty_quant is set to True.

ratio_num_modes : float, default=1

Ratio of the number of KL eigen-modes to be used in the truncation of the KL expansion. The ratio is defined by the number of modes to be used over the total number of modes. The ratio is a number between 0 and 1. For instance, if set to 1, all modes are used, hence the KL expansion is not truncated. If set to 0.5, half of the number of modes are used. This option is relevant if uncertainty_quant is set to True.
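For example, the number of retained modes implied by this ratio can be sketched as follows (hypothetical helper; the exact rounding behavior is an assumption):

```python
def num_retained_modes(total_modes, ratio_num_modes=1.0):
    # Keep the leading fraction of KL eigen-modes; at least one mode
    # is always retained.
    return max(1, int(total_modes * ratio_num_modes))

num_retained_modes(200, 1.0)   # all 200 modes: expansion not truncated
num_retained_modes(200, 0.5)   # 100 modes: expansion truncated to half
```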

kernel_width : int, default=5

Window of the kernel used to estimate the covariance of the data. The window width should be given as an integer number of pixels (data points). The non-zero extent of the kernel is a square area with twice the window length in both the longitude and latitude directions. This option is relevant if uncertainty_quant is set to True.

scale_error : float, default=0.08

Scale velocity error of the input data by a factor. Often, the input velocity error is the dimensionless GDOP which needs to be scaled by the standard deviation of the velocity error to represent the actual velocity error. This value scales the error. This option is relevant if uncertainty_quant is set to True.

write_samples : bool, default=False

If True, all generated ensemble members are written to the output file. This option is relevant if uncertainty_quant is set to True.

plot : bool, default=False

Plots the results. In this case, instead of iterating through all time frames, only one time frame (given with the option time) is restored and plotted. If, in addition, uncertainty quantification is enabled (with option uncertainty_quant=True), the statistical analysis for the given time frame is also plotted.

save : bool, default=True

If True, the plots are not displayed; rather, they are saved in the current directory in .pdf and .svg formats. This option is useful when executing this script in an environment without a display (such as a remote cluster). If False, the generated plots are displayed.

verbose : bool, default=False

If True, prints verbose information during the computation process.

terminate : bool, default=False

If True, on encountering errors, the program raises an error and exits with code 1, printing a message starting with the keyword ERROR. This is useful when this package is executed on a server that passes exit signals to a Node application. On the downside, this option causes an interactive Python environment to terminate both the script and the Python environment itself. To avoid this, set this option to False. In that case, upon an error, a ValueError is raised, which terminates the script, but an interactive Python environment is not exited.

See also

restoreio.scan

Notes

Output File:

The output is a NetCDF file in .nc format containing a selection of the following variables, contingent on the chosen configuration:

  1. Mask

  2. Reconstructed East and North Velocities

  3. East and North Velocity Errors

  4. East and North Velocity Ensemble

1. Mask:

The mask variable is a three-dimensional array with dimensions for time, longitude, and latitude. This variable is stored under the name mask in the output file.

Interpreting Mask Variable over Segmented Domains:

The mask variable includes information about the result of domain segmentation. This array contains integer values -1, 0, 1, and 2 that are interpreted as follows:

  • The value -1 indicates the location is identified to be on the land domain \(\Omega_l\). In these locations, the output velocity variable is masked.

  • The value 0 indicates the location is identified to be on the known domain \(\Omega_k\). These locations have velocity data in the input file. The same velocity values are preserved in the output file.

  • The value 1 indicates the location is identified to be on the missing domain \(\Omega_m\). These locations do not have velocity data in the input file, but they do have reconstructed velocity data in the output file.

  • The value 2 indicates the location is identified to be on the ocean domain \(\Omega_o\). In these locations, the output velocity variable is masked.
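A reader of the output file might act on these mask codes as sketched below (pure Python illustration; the helper names are hypothetical and not part of the restoreio API):

```python
# Mask codes as documented above.
LAND, KNOWN, MISSING, OCEAN = -1, 0, 1, 2

def is_velocity_masked(mask_value):
    # Velocity is masked on the land domain (-1) and ocean domain (2).
    return mask_value in (LAND, OCEAN)

def velocity_source(mask_value):
    # Where the output velocity at this location comes from.
    if mask_value == KNOWN:
        return 'preserved from input'
    if mask_value == MISSING:
        return 'reconstructed'
    return 'masked'
```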

2. Reconstructed East and North Velocities:

The reconstructed east and north velocity variables are stored in the output file under the names east_vel and north_vel, respectively. These variables are three-dimensional arrays with dimensions for time, longitude, and latitude.

Interpreting Velocity Variables over Segmented Domains:

The velocity variables on each of the segmented domains are defined as follows:

  • On locations where the mask value is -1 or 2, the output velocity variables are masked.

  • On locations where the mask value is 0, the output velocity variables have the same values as the corresponding variables in the input file.

  • On locations where the mask value is 1, the output velocity variables are reconstructed. If uncertainty_quant is enabled, these output velocity variables are obtained as the mean of the velocity ensemble, where the missing domain of each ensemble member is reconstructed.

3. East and North Velocity Errors:

If the uncertainty_quant option is enabled, the east and north velocity error variables will be included in the output file under the names east_err and north_err, respectively. These variables are three-dimensional arrays with dimensions for time, longitude, and latitude.

Interpreting Velocity Error Variable over Segmented Domains:

The velocity error variables on each of the segmented domains are defined as follows:

  • On locations where the mask value is -1 or 2, the output velocity error variables are masked.

  • On locations where the mask value is 0, the output velocity error variables are obtained from either the corresponding velocity error or GDOP variables in the input file scaled by the value of scale_error.

  • On locations where the mask value is 1, the output velocity error variables are obtained from the standard deviation of the ensemble, where the missing domain of each ensemble member is reconstructed.

4. East and North Velocity Ensemble:

When you activate the uncertainty_quant option, an ensemble of velocity fields is generated. By default, however, the output file contains only the mean and standard deviation of this ensemble. To include all ensemble members in the output file, additionally enable the write_samples option. This saves the east and north velocity ensemble variables in the output file as east_vel_ensemble and north_vel_ensemble, respectively. These variables are four-dimensional arrays with dimensions for ensemble, time, longitude, and latitude.

The ensemble dimension of the array has size \(s+1\), where \(s\) is the number of ensemble members specified by the num_samples argument. The first member, with index \(0\), corresponds to the original input dataset. The remaining members, with indices \(1, \dots, s\), correspond to the generated ensemble.

Interpreting Velocity Ensemble Variables over Segmented Domains:

The velocity ensemble variables on each of the segmented domains are defined similarly to those presented for the velocity variables. In particular, the missing domain of each ensemble member is reconstructed independently.

Mean and Standard Deviation of Ensemble:

Note that the mean and standard deviation of the velocity ensemble arrays over the ensemble dimension yield the velocity and velocity error variables, respectively.
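This relation can be illustrated at a single grid point with a toy ensemble (the numbers are made up; whether the original member at index 0 enters these statistics is an assumption here, and only the generated members are used):

```python
from statistics import mean, stdev

# Toy east-velocity ensemble at one grid point: index 0 is the original
# input value, indices 1..s are the generated ensemble members.
east_vel_ensemble = [0.30, 0.28, 0.31, 0.33, 0.29]

members = east_vel_ensemble[1:]   # generated members only
east_vel = mean(members)          # plays the role of east_vel
east_err = stdev(members)         # plays the role of east_err
```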

Examples

Restoring Data:

The following code is a minimalistic example of restoring the missing data of an HF radar dataset:

>>> # Import package
>>> from restoreio import restore

>>> # OpenDAP URL of HF radar data, south side of Martha's Vineyard
>>> input = 'https://transport.me.berkeley.edu/thredds/dodsC/' + \
...         'root/WHOI-HFR/WHOI_HFR_2014_original.nc'

>>> # Subsetting time
>>> min_time = '2014-07-01T20:00:00'
>>> max_time = '2014-07-03T20:00:00'

>>> # Specify output
>>> output = 'output.nc'

>>> # Restore missing velocity data
>>> restore(input, output=output, min_time=min_time, max_time=max_time,
...         detect_land=True, fill_coast=False, convex_hull=False,
...         alpha=20, plot=False, verbose=True)

Ensemble Generation:

The following code is an example of generating ensemble for an HF radar dataset:

>>> # Import package
>>> from restoreio import restore

>>> # OpenDAP URL of HF radar data, US west coast
>>> url = 'http://hfrnet-tds.ucsd.edu/thredds/dodsC/HFR/USWC/2km/' + \
...       'hourly/RTV/HFRADAR_US_West_Coast_2km_Resolution_Hou' + \
...       'rly_RTV_best.ncd'

>>> # Subsetting spatial domain to the Monterey Bay region, California
>>> min_lon = -122.344
>>> max_lon = -121.781
>>> min_lat = 36.507
>>> max_lat = 36.992

>>> # Time subsetting
>>> time_point = '2017-01-25T03:00:00'

>>> # Generate ensemble and reconstruct gaps
>>> restore(input=url, output='output.nc', min_lon=min_lon,
...         max_lon=max_lon, min_lat=min_lat, max_lat=max_lat,
...         time=time_point, uncertainty_quant=True, plot=False,
...         num_samples=2000, ratio_num_modes=1, kernel_width=5,
...         scale_error=0.08, detect_land=True, fill_coast=True,
...         write_samples=True, verbose=True)