restoreio.restore
restoreio.restore(input, min_file_index='', max_file_index='', output='', min_lon=nan, max_lon=nan, min_lat=nan, max_lat=nan, min_time='', max_time='', time='', diffusivity=20, sweep=False, detect_land=True, fill_coast=False, convex_hull=False, alpha=20, refine_grid=1, uncertainty_quant=False, num_samples=1000, ratio_num_modes=1, kernel_width=5, scale_error=0.08, write_samples=False, plot=False, save=True, verbose=False, terminate=False)
Restore an incomplete oceanographic dataset and generate a data ensemble.
- Parameters:
- input : str
  Input filename. This can be either the path to a local file or the URL to a remote dataset. The file extension should be .nc or .ncml.
- min_file_index : str, default=''
  Start of the file index iterator used for processing multiple input files. For instance, setting input=input, min_file_index=003, and max_file_index=012 means reading the series of input files input003.nc, input004.nc, up to input012.nc. If this option is used, the option max_file_index should also be given.
- max_file_index : str, default=''
  End of the file index iterator used for processing multiple input files. For instance, setting input=input, min_file_index=003, and max_file_index=012 means reading the series of input files input003.nc, input004.nc, up to input012.nc. If this option is used, the option min_file_index should also be given.
- output : str
  Output filename. This can be either the path to a local file or the URL to a remote dataset. The file extension should be .nc or .ncml only. If no output file is provided, the output filename is constructed by appending the word _restored to the end of the input filename.
- min_lon : float, default=float('nan')
  Minimum longitude, in degrees, to subset the processing domain. If not provided or set to float('nan'), the minimum longitude of the input data is used.
- max_lon : float, default=float('nan')
  Maximum longitude, in degrees, to subset the processing domain. If not provided or set to float('nan'), the maximum longitude of the input data is used.
- min_lat : float, default=float('nan')
  Minimum latitude, in degrees, to subset the processing domain. If not provided or set to float('nan'), the minimum latitude of the input data is used.
- max_lat : float, default=float('nan')
  Maximum latitude, in degrees, to subset the processing domain. If not provided or set to float('nan'), the maximum latitude of the input data is used.
- min_time : str, default=''
  The start of the time interval within the dataset times to be processed. The time should be provided as a string with the format 'yyyy-mm-ddTHH:MM:SS', where yyyy is the year, mm is the month, dd is the day, HH is the hour from 00 to 23, MM is the minutes, and SS is the seconds. If the given time does not exactly match any time in the dataset, the closest data time is used. If this argument is not given, the earliest available time in the dataset is used. Note that specifying a time interval cannot be combined with uncertainty quantification (argument uncertainty_quant=True); in that case, use the time argument instead, which specifies a single time point.
- max_time : str, default=''
  The end of the time interval within the dataset times to be processed. The time should be provided as a string with the format 'yyyy-mm-ddTHH:MM:SS', as described for min_time. If the given time does not exactly match any time in the dataset, the closest data time is used. If this argument is not given, the latest available time in the dataset is used. Note that specifying a time interval cannot be combined with uncertainty quantification (argument uncertainty_quant=True); in that case, use the time argument instead, which specifies a single time point.
- time : str, default=''
  A single time point to process. The time should be provided as a string with the format 'yyyy-mm-ddTHH:MM:SS', as described for min_time. If the given time does not exactly match any time in the dataset, the closest data time is used. If this option is not given, the latest available time in the dataset is used. This option sets both min_time and max_time to the given time value. The argument is useful when performing uncertainty quantification (argument uncertainty_quant=True) or plotting (argument plot=True), as these require a single time rather than a time interval. In contrast, to specify a time interval, use the min_time and max_time arguments.
- diffusivity : float, default=20
  Diffusivity of the PDE solver (a real number). Large values lead to a diffusion-dominant solution; small values lead to an advection-dominant solution.
- sweep : bool, default=False
  Sweeps the image data in all flipped directions. This ensures a solution that is independent of direction.
- detect_land : bool or int, default=True
  Detects land and excludes it from the ocean's missing data points. This option should be a boolean or an integer with the following values:
  - False: Same as 0 (see below).
  - True: Same as 2 (see below).
  - 0: Does not distinguish land from ocean. All land points are assumed to be part of the ocean's missing points.
  - 1: Detects land. Most accurate, slowest.
  - 2: Detects land. Less accurate, fastest (preferred method).
  - 3: Detects land. Currently this option is not fully implemented.
- fill_coast : bool, default=False
  Fills the gaps between the data in the ocean and between the ocean and the coastline. This option is only effective if detect_land is not set to 0.
- convex_hull : bool, default=False
  Uses the convex hull of the area around the data points instead of the concave hull (alpha shape).
- alpha : float, default=20
  The alpha number for the alpha shape. If not specified or set to a negative number, this value is computed automatically. This option is only relevant to concave shapes and is ignored if the convex hull is used (convex_hull=True).
- refine_grid : int, default=1
  Refines the grid by increasing the number of points on each axis by a multiple of the given integer. If this option is set to 1, no refinement is performed. If set to an integer n, the number of grid points is refined by \(n^2\) times (that is, \(n\) times on each axis).
- uncertainty_quant : bool, default=False
  Performs uncertainty quantification on the data for the time frame given by the time option.
- num_samples : int, default=1000
  Number of ensemble members used for uncertainty quantification. This option is relevant if uncertainty_quant is set to True.
- ratio_num_modes : float, default=1
  Ratio of the number of KL eigen-modes used in the truncation of the KL expansion, defined as the number of modes used over the total number of modes. The ratio is a number between 0 and 1. For instance, if set to 1, all modes are used and the KL expansion is not truncated; if set to 0.5, half of the modes are used. This option is relevant if uncertainty_quant is set to True.
- kernel_width : int, default=5
  Window of the kernel used to estimate the covariance of the data. The window width should be given as an integer number of pixels (data points). The non-zero extent of the kernel is a square area with twice the window length in both the longitude and latitude directions. This option is relevant if uncertainty_quant is set to True.
- scale_error : float, default=0.08
  Scales the velocity error of the input data by a factor. Often, the input velocity error is the dimensionless GDOP, which needs to be scaled by the standard deviation of the velocity error to represent the actual velocity error. This option is relevant if uncertainty_quant is set to True.
- write_samples : bool, default=False
  If True, all generated ensemble members are written to the output file. This option is relevant if uncertainty_quant is set to True.
- plot : bool, default=False
  Plots the results. In this case, instead of iterating through all time frames, only one time frame (given with the time option) is restored and plotted. If, in addition, uncertainty quantification is enabled (with uncertainty_quant=True), the statistical analysis for the given time frame is also plotted.
- save : bool, default=True
  If True, the plots are not displayed but are saved in the current directory in .pdf and .svg formats. This option is useful when executing this script in an environment without a display (such as a remote cluster). If False, the generated plots are displayed.
- verbose : bool, default=False
  If True, prints verbose information during the computation process.
- terminate : bool, default=False
  If True, on encountering an error, the program raises the error and exits with code 1, printing a message starting with the keyword ERROR. This is useful when this package is executed on a server to pass exit signals to a Node application. On the downside, this option causes an interactive Python environment to terminate both the script and the Python environment itself. To avoid this, set this option to False; in that case, upon an error, a ValueError is raised, which causes the script to terminate without exiting an interactive Python environment.
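As an illustration of how min_file_index and max_file_index select a series of input files, the following sketch expands a base name and index range into the list of filenames described above. The helper expand_file_series is hypothetical, not part of restoreio; it only mimics the documented naming behavior.

```python
# Hypothetical helper (not a restoreio function) mimicking the documented
# expansion of input=input, min_file_index=003, max_file_index=012.
def expand_file_series(base, min_index, max_index, ext='.nc'):
    """Expand a base name and index range into a list of filenames."""
    width = len(min_index)  # zero-padding width taken from the index string
    return [f'{base}{i:0{width}d}{ext}'
            for i in range(int(min_index), int(max_index) + 1)]

files = expand_file_series('input', '003', '012')
# files[0] is 'input003.nc', files[-1] is 'input012.nc', 10 files in total
```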
Notes
Output File:
The output is a NetCDF file in .nc format containing a selection of the following variables, contingent on the chosen configuration:
- Mask
- Reconstructed East and North Velocities
- East and North Velocity Errors
- East and North Velocity Ensemble
1. Mask:
The mask variable is a three-dimensional array with dimensions for time, longitude, and latitude. This variable is stored under the name mask in the output file.
Interpreting the Mask Variable over Segmented Domains:
The mask variable includes information about the result of the domain segmentation. This array contains the integer values -1, 0, 1, and 2, which are interpreted as follows:
- The value -1 indicates the location is identified to be on the land domain \(\Omega_l\). In these locations, the output velocity variable is masked.
- The value 0 indicates the location is identified to be on the known domain \(\Omega_k\). These locations have velocity data in the input file, and the same velocity values are preserved in the output file.
- The value 1 indicates the location is identified to be on the missing domain \(\Omega_m\). These locations do not have velocity data in the input file, but they do have reconstructed velocity data in the output file.
- The value 2 indicates the location is identified to be on the ocean domain \(\Omega_o\). In these locations, the output velocity variable is masked.
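The mask codes above can be used directly to segment the output arrays. A minimal sketch with a toy mask slice (in practice, the mask variable is read from the output NetCDF file):

```python
import numpy as np

# Toy 2D slice of the mask variable; illustrative values only
mask = np.array([[-1,  0,  1],
                 [ 2,  1,  0]])

n_land    = np.count_nonzero(mask == -1)  # land domain, velocities masked
n_known   = np.count_nonzero(mask == 0)   # known domain, input data preserved
n_missing = np.count_nonzero(mask == 1)   # missing domain, reconstructed
n_ocean   = np.count_nonzero(mask == 2)   # ocean domain, velocities masked
```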
2. Reconstructed East and North Velocities:
The reconstructed east and north velocity variables are stored in the output file under the names east_vel and north_vel, respectively. These variables are three-dimensional arrays with dimensions for time, longitude, and latitude.
Interpreting the Velocity Variables over Segmented Domains:
The velocity variables on each of the segmented domains are defined as follows:
- On locations where the mask value is -1 or 2, the output velocity variables are masked.
- On locations where the mask value is 0, the output velocity variables have the same values as the corresponding variables in the input file.
- On locations where the mask value is 1, the output velocity variables are reconstructed. If uncertainty_quant is enabled, these output velocity variables are obtained from the mean of the velocity ensemble, where the missing domain of each ensemble member is reconstructed.
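The masking rule for the velocity variables can be sketched with a numpy masked array; the toy arrays here are illustrative and not read from an actual output file:

```python
import numpy as np

# Toy mask slice and east velocity slice (illustrative values only)
mask = np.array([[-1, 0, 1],
                 [ 2, 1, 0]])
east_vel = np.array([[0.0, 0.1, 0.3],
                     [0.0, 0.2, 0.4]])

# Velocities are masked wherever the mask is -1 (land) or 2 (ocean domain)
vel = np.ma.masked_array(east_vel, mask=np.isin(mask, (-1, 2)))
# vel has 4 valid entries out of 6; entries at mask values 0 and 1 remain
```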
3. East and North Velocity Errors:
If the uncertainty_quant option is enabled, the east and north velocity error variables are included in the output file under the names east_err and north_err, respectively. These variables are three-dimensional arrays with dimensions for time, longitude, and latitude.
Interpreting the Velocity Error Variables over Segmented Domains:
The velocity error variables on each of the segmented domains are defined as follows:
- On locations where the mask value is -1 or 2, the output velocity error variables are masked.
- On locations where the mask value is 0, the output velocity error variables are obtained from either the corresponding velocity error or GDOP variables in the input file, scaled by the value of scale_error.
- On locations where the mask value is 1, the output velocity error variables are obtained from the standard deviation of the ensemble, where the missing domain of each ensemble member is reconstructed.
4. East and North Velocity Ensemble:
When the uncertainty_quant option is enabled, a collection of velocity field ensemble members is created. By default, however, the output file only contains the mean and standard deviation of these ensemble members. To include all ensemble members in the output file, additionally enable the write_samples option. This saves the east and north velocity ensemble variables in the output file as east_vel_ensemble and north_vel_ensemble, respectively. These variables are four-dimensional arrays with dimensions for ensemble, time, longitude, and latitude.
The ensemble dimension of the array has size \(s+1\), where \(s\) is the number of ensemble members specified by the num_samples argument. The first ensemble member, with index \(0\), corresponds to the original input dataset. The other ensemble members, with indices \(1, \dots, s\), correspond to the generated ensemble.
Interpreting the Velocity Ensemble Variables over Segmented Domains:
The velocity ensemble variables on each of the segmented domains are defined similarly to those presented for the velocity variables. In particular, the missing domain of each ensemble member is reconstructed independently.
Mean and Standard Deviation of Ensemble:
Note that the mean and standard deviation of the velocity ensemble arrays over the ensemble dimension yield the velocity and velocity error variables, respectively.
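This relation can be checked with a toy numpy array laid out as (ensemble, time, lon, lat), the layout described above for east_vel_ensemble; the values here are random placeholders, not actual output data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy ensemble array with shape (ensemble, time, lon, lat); index 0 along
# the ensemble axis plays the role of the original input dataset.
ensemble = rng.normal(size=(5, 1, 3, 4))

mean_vel = ensemble.mean(axis=0)  # corresponds to the velocity variable
std_err  = ensemble.std(axis=0)   # corresponds to the velocity error variable
# Both reductions have shape (1, 3, 4), matching (time, lon, lat)
```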
Examples
Restoring Data:
The following code is a minimalistic example of restoring the missing data of an HF radar dataset:
>>> # Import package
>>> from restoreio import restore

>>> # OpenDap URL of HF radar data, south side of Martha's Vineyard
>>> input = 'https://transport.me.berkeley.edu/thredds/dodsC/' + \
...     'root/WHOI-HFR/WHOI_HFR_2014_original.nc'

>>> # Subsetting time
>>> min_time = '2014-07-01T20:00:00'
>>> max_time = '2014-07-03T20:00:00'

>>> # Specify output
>>> output = 'output.nc'

>>> # Restore missing velocity data
>>> restore(input, output=output, min_time=min_time, max_time=max_time,
...         detect_land=True, fill_coast=False, convex_hull=False,
...         alpha=20, plot=False, verbose=True)
Ensemble Generation:
The following code is an example of generating an ensemble for an HF radar dataset:
>>> # Import package
>>> from restoreio import restore

>>> # OpenDap URL of HF radar data, US west coast
>>> url = 'http://hfrnet-tds.ucsd.edu/thredds/dodsC/HFR/USWC/2km/' + \
...     'hourly/RTV/HFRADAR_US_West_Coast_2km_Resolution_Hou' + \
...     'rly_RTV_best.ncd'

>>> # Subsetting spatial domain to the Monterey Bay region, California
>>> min_lon = -122.344
>>> max_lon = -121.781
>>> min_lat = 36.507
>>> max_lat = 36.992

>>> # Time subsetting
>>> time_point = '2017-01-25T03:00:00'

>>> # Generate ensemble and reconstruct gaps
>>> restore(input=url, output='output.nc', min_lon=min_lon,
...         max_lon=max_lon, min_lat=min_lat, max_lat=max_lat,
...         time=time_point, uncertainty_quant=True, plot=False,
...         num_samples=2000, ratio_num_modes=1, kernel_width=5,
...         scale_error=0.08, detect_land=True, fill_coast=True,
...         write_samples=True, verbose=True)