restore (Command Line Interface)

Restores incomplete oceanographic datasets. “restore” is provided by the “restoreio” Python package.

usage: restore [-h] -i INPUT -o OUTPUT [--min-lon MIN_LON] [--max-lon MAX_LON]
               [--min-lat MIN_LAT] [--max-lat MAX_LAT] [--min-time MIN_TIME]
               [--max-time MAX_TIME] [-t TIME] [-d DIFFUSIVITY] [-s] [-p] [-S]
               [-L DETECT_LAND] [-l] [-c] [-a ALPHA] [-r REFINE] [-u]
               [-e NUM_ENSEMBLE] [-m NUM_MODES] [-w WINDOW] [-E SCALE_ERROR]
               [-I START_FILE] [-J END_FILE] [-W] [-v] [-T] [-V]

required arguments

-i

Input filename. This can be either the path to a local file or the URL of a remote dataset. The file or URL may or may not have a file extension. If it does have an extension, the extension must be one of .nc, .nc4, .ncd, .nc.gz, .ncml, or .ncml.gz.

-o

Output filename. This can be either the path to a local file or the URL of a remote dataset. The file extension must be either .nc or .ncml. If no output file is provided, the output filename is constructed by appending _restored to the input filename.

Default: “”

Named Arguments

--min-lon

Minimum longitude in the unit of degrees to subset the processing domain. If not provided or set to nan, the minimum longitude of the input data is considered.

Default: nan

--max-lon

Maximum longitude in the unit of degrees to subset the processing domain. If not provided or set to nan, the maximum longitude of the input data is considered.

Default: nan

--min-lat

Minimum latitude in the unit of degrees to subset the processing domain. If not provided or set to nan, the minimum latitude of the input data is considered.

Default: nan

--max-lat

Maximum latitude in the unit of degrees to subset the processing domain. If not provided or set to nan, the maximum latitude of the input data is considered.

Default: nan

--min-time

The start of the time interval within the dataset times to be processed. The time should be provided as a string with the format yyyy-mm-ddTHH:MM:SS, where yyyy is the year, mm is the month, dd is the day, HH is the hour from 00 to 23, MM is the minutes, and SS is the seconds. If the given time does not exactly match any time in the dataset, the closest data time is used. If this argument is not given, the earliest available time in the dataset is used. Note that a time interval cannot be used together with uncertainty quantification (option -u). For that case, use the --time argument instead, which specifies a single time point.

Default: “”

--max-time

The end of the time interval within the dataset times to be processed. The time should be provided as a string with the format yyyy-mm-ddTHH:MM:SS, where yyyy is the year, mm is the month, dd is the day, HH is the hour from 00 to 23, MM is the minutes, and SS is the seconds. If the given time does not exactly match any time in the dataset, the closest data time is used. If this argument is not given, the latest available time in the dataset is used. Note that a time interval cannot be used together with uncertainty quantification (option -u). For that case, use the --time argument instead, which specifies a single time point.

Default: “”
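The "closest data time" rule described for --min-time and --max-time can be pictured with the short sketch below. It is only an illustration of nearest-time matching; the function name closest_time_index and the hourly sample times are hypothetical and are not taken from the restoreio source or from any real dataset.

```python
from datetime import datetime


def closest_time_index(dataset_times, requested):
    """Return the index of the dataset time nearest to the requested time."""
    return min(range(len(dataset_times)),
               key=lambda i: abs(dataset_times[i] - requested))


# Hypothetical hourly dataset times (illustrative values only).
times = [datetime(2017, 1, 25, hour) for hour in range(0, 6)]

# A requested time that falls between two dataset times.
requested = datetime.strptime("2017-01-25T03:20:00", "%Y-%m-%dT%H:%M:%S")

index = closest_time_index(times, requested)
print(index, times[index].isoformat())
```

In this sketch the request 03:20:00 lies 20 minutes from 03:00:00 and 40 minutes from 04:00:00, so the 03:00:00 frame is selected.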

-t, --time

Specify a single time point to process. The time should be provided as a string with the format yyyy-mm-ddTHH:MM:SS, where yyyy is the year, mm is the month, dd is the day, HH is the hour from 00 to 23, MM is the minutes, and SS is the seconds. If the given time does not exactly match any time in the dataset, the closest data time is used. If this option is not given, the latest available time in the dataset is used. This option sets both --min-time and --max-time to the given time value. It is useful when performing uncertainty quantification (option -u) or plotting (option -p), as these require a single time rather than a time interval. In contrast, to specify a time interval, use the --min-time and --max-time arguments.

Default: “”
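When the time arguments are generated from a script, the yyyy-mm-ddTHH:MM:SS string can be produced with the standard library. The sketch below assumes nothing about restoreio itself; the helper names to_time_argument and is_valid_time_argument are hypothetical.

```python
from datetime import datetime

# strftime/strptime pattern corresponding to yyyy-mm-ddTHH:MM:SS
TIME_FORMAT = "%Y-%m-%dT%H:%M:%S"


def to_time_argument(moment):
    """Format a datetime as a string suitable for --time, --min-time, or --max-time."""
    return moment.strftime(TIME_FORMAT)


def is_valid_time_argument(text):
    """Return True if the text can be parsed with the expected time format."""
    try:
        datetime.strptime(text, TIME_FORMAT)
    except ValueError:
        return False
    return True


print(to_time_argument(datetime(2017, 1, 25, 3, 0, 0)))
print(is_valid_time_argument("2017-01-25 03:00:00"))  # space instead of "T"
```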

-d

Diffusivity of the PDE solver (real number). A large value leads to a diffusion-dominant solution; a small value leads to an advection-dominant solution. (default: 20)

Default: 20

-s

Sweeps the image data in all flipped directions. This ensures an even solution independent of direction.

Default: False

-p

Plots the results. In this case, instead of iterating through all time frames, only one time frame (given with option --time) is restored and plotted. If uncertainty quantification is also enabled (with option -u), the statistical analysis for the given time frame is plotted as well.

Default: False

-S

If True, the plots are not displayed, but are instead saved in the current directory in .pdf and .svg formats. This option is useful when executing this script in an environment without a display (such as a remote cluster). If False, the generated plots are displayed.

Default: False

-L

Possible choices: 0, 1, 2, 3, False, True

Detect land and exclude it from the ocean’s missing data points. This option should be a boolean or an integer with one of the following values (default: True):

  • False: Same as 0. See below.

  • True: Same as 2. See below.

  • 0: Does not distinguish land from ocean. All land points are assumed to be a part of the ocean’s missing points.

  • 1: Detect land. Most accurate, slowest.

  • 2: Detect land. Less accurate, fastest (preferred method).

  • 3: Detect land. Currently this option is not fully implemented.

Default: True
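The correspondence between the boolean and integer choices of -L can be summarized in a small sketch. The function normalize_detect_land is hypothetical and only mirrors the table above (False is the same as 0, True is the same as 2); it is not the package's own argument parser.

```python
def normalize_detect_land(value):
    """Map a -L choice (0, 1, 2, 3, False, True, or its string form) to an integer."""
    # Compare by string first, because in Python True == 1 and False == 0,
    # which would otherwise map True to method 1 instead of method 2.
    aliases = {"False": 0, "True": 2}
    text = str(value)
    if text in aliases:
        return aliases[text]
    method = int(text)
    if method not in (0, 1, 2, 3):
        raise ValueError("detect-land choice must be 0, 1, 2, 3, False, or True")
    return method


for choice in (False, True, 0, 1, 2, 3):
    print(repr(choice), "->", normalize_detect_land(choice))
```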

-l

Fills the gaps between the data in the ocean and between the ocean and the coastline. This option is only effective if -L is not set to 0.

Default: False

-c

Instead of using the concave hull (alpha shape) around the data points, this option uses the convex hull of the area around the data points.

Default: False

-a

The alpha number for the alpha shape. If not specified, or if negative, this value is computed automatically. This option is only relevant to concave shapes and is ignored if the convex hull is used with the -c option.

Default: -1

-r

Refines the grid by multiplying the number of points on each axis by a given integer. If this option is set to 1, no refinement is performed. If set to an integer n, the number of points is increased n times on each axis, so the total number of grid points increases by a factor of n^2. (default: 1)

Default: 1
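The arithmetic behind -r can be sketched as follows. This is only a rough count under the assumption that each axis is multiplied exactly by the refinement factor; the exact number of points the package produces (for instance at the grid boundaries) may differ, and the function refined_grid_shape is hypothetical.

```python
def refined_grid_shape(num_lon, num_lat, refine=1):
    """Approximate grid shape after refining each axis by an integer factor."""
    if refine < 1:
        raise ValueError("refine must be a positive integer")
    return num_lon * refine, num_lat * refine


# Hypothetical 10 x 20 grid refined with -r 3.
lon_points, lat_points = refined_grid_shape(10, 20, refine=3)
print(lon_points, lat_points, lon_points * lat_points)
```

With a factor of 3, the 200 original points become 1800, which is 3^2 = 9 times as many.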

-u

Enables uncertainty quantification for the time frame given in option --time. This either produces results in output file as given in option -o, or plots the results if the option -p is specified.

Default: False

-e

Number of ensemble members used for uncertainty quantification. This option is only relevant to uncertainty quantification (when -u is used). (default: 1000)

Default: 1000

-m

Ratio of the number of KL eigen-modes to be used in the truncation of the KL expansion. The ratio is defined by the number of modes to be used over the total number of modes. The ratio is a number between 0 and 1. For instance, if set to 1, all modes are used, hence the KL expansion is not truncated. If set to 0.5, half of the number of modes are used. This option is only relevant to uncertainty quantification (when -u is used).

Default: 1.0

-w

Window of the kernel used to estimate the covariance of the data. The window width should be given as an integer number of pixels (data points). The non-zero extent of the kernel is a square area with twice the window length in both the longitude and latitude directions. This option is only relevant to uncertainty quantification (when -u is used). (default: 5)

Default: 5

-E

Scale velocity error of the input data by a factor. Often, the input velocity error is the dimensionless GDOP which needs to be scaled by the standard deviation of the velocity error to represent the actual velocity error. This value scales the error. This option is only relevant to uncertainty quantification (when -u is used). (default: 0.08)

Default: 0.08

-I

Start file iterator to be used for processing multiple input files. For instance, -i input -I 003 -J 012 means to read the series of input files with iterators input003.nc, input004.nc, to input012.nc. If this option is used, the option -J should also be given.

Default: “”

-J

End file iterator to be used for processing multiple input files. For instance, -i input -I 003 -J 012 means to read the series of input files with iterators input003.nc, input004.nc, to input012.nc. If this option is used, the option -I should also be given.

Default: “”
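To make the iterator convention concrete, the sketch below expands a start and end iterator into the list of file names given in the example above. It is an illustration only: the function expand_file_series is hypothetical, and the assumption that the zero-padding width is taken from the start iterator is ours, not something stated by the package.

```python
def expand_file_series(base, start, end, extension=".nc"):
    """List the file names from the start iterator to the end iterator, inclusive."""
    width = len(start)  # assumed: keep the zero-padding of the start iterator
    return [f"{base}{number:0{width}d}{extension}"
            for number in range(int(start), int(end) + 1)]


files = expand_file_series("input", "003", "012")
print(files[0], files[-1], len(files))
```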

-W

If True, all generated ensemble members are written to the output file. This option is only relevant to uncertainty quantification (when -u is used).

Default: False

-v

Prints verbose information.

Default: False

-T

If True, on encountering an error, the program raises the error and exits with code 1, printing a message that starts with the keyword ERROR. This is useful when the package is executed on a server and needs to pass exit signals to a Node application. On the downside, in an interactive Python environment this option terminates both the script and the Python environment itself. To avoid this, set this option to False. In that case, a ValueError is raised upon an error, which causes the script to terminate without exiting the interactive Python environment.

Default: False
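The difference between the two modes of -T can be illustrated with a minimal sketch. The function report_error is hypothetical and is not the package's error handler; in particular, printing the message to standard output is an assumption.

```python
import sys


def report_error(message, terminate=False):
    """Either exit the process with code 1 or raise ValueError, depending on terminate."""
    if terminate:
        # Assumed output stream; the documentation only says the message
        # starts with the keyword ERROR.
        print("ERROR: " + message)
        sys.exit(1)  # raises SystemExit, which also ends an interactive session
    raise ValueError(message)


try:
    report_error("input file not found")
except ValueError as error:
    # With terminate=False the caller can catch the error and keep running.
    print("caught:", error)
```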

Examples:

  1. This example reads a local file and restores all time frames in it. A convex hull around the data points is used as the working domain.

    $ restore -i input.ncml -o output.nc -d 20 -s -c

  2. In the next example, only time frame 20 is processed, and the results are also plotted.

    $ restore -i input.ncml -o output.nc -d 20 -s -c -t 20 -p

  3. In the previous examples, the restoration domain was inside the convex hull around the known data points. In the following example, a concave hull is used instead. The shape of the concave hull is controlled by the parameter alpha (given by the option -a). Here, we no longer specify the option -c, which selected the convex hull.

    $ restore -i input.ncml -o output.nc -d 20 -s -t 20 -p -a 10

  4. In the following example, we exclude the land from the ocean. That is, if a part of the concave hull (domain of restoration) intersects land, we exclude it. This is done by -L option followed by an integer. Here, -L 2 is the fastest method to detect land (but also the least accurate).

    $ restore -i input.ncml -o output.nc -d 20 -s -t 20 -p -a 10 -L 2

  5. There might be a gap area between the domain of restoration and the land area. By providing -l, this area can be filled.

    $ restore -i input.ncml -o output.nc -d 20 -s -t 20 -p -a 10 -L 2 -l

  6. The following example performs uncertainty quantification with 2000 ensemble members for restoring time frame 20.

    $ restore -i input.ncml -o output.nc -d 20 -s -t 20 -p -a 10 -L 2 -l -u -e 2000

  7. The input data might be stored in a series of files. In the following example, among the files input001.nc, input002.nc, …, input012.nc, we read the third to the tenth file, restore time frame 20 in each, and store the results in a single output file, output.zip. To do so, specify one of the file names (such as the first file) with the -i option, and specify the start and end iterators of the file names with -I and -J:

    $ restore -i input001.nc -o output.zip -d 20 -s -L 1 -l -a 10 -t 20 -u -e 2000 -I 003 -J 010
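When these commands are launched from a Python script rather than typed in a shell, the argument list can be assembled explicitly. The sketch below rebuilds the command of example 6 using only the options documented above. It merely prints the command; actually running it (for example with subprocess.run) assumes that the restore executable is installed and on the PATH.

```python
import shlex

# Arguments of example 6, one list item per command-line token.
args = [
    "restore",
    "-i", "input.ncml",
    "-o", "output.nc",
    "-d", "20",
    "-s",
    "-t", "20",
    "-p",
    "-a", "10",
    "-L", "2",
    "-l",
    "-u",
    "-e", "2000",
]

# shlex.join (Python 3.8+) quotes tokens where needed for a POSIX shell.
command = shlex.join(args)
print(command)

# To execute instead of printing, one could call:
#     subprocess.run(args, check=True)
```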