glearn.sample_data.generate_points

glearn.sample_data.generate_points(num_points, dimension=1, grid=False, a=None, b=None, contrast=0.0, seed=0)

Generate a set of points in the unit hypercube.

  • Points can be generated either randomly or on a lattice.

  • The density of the points can be either uniform over the entire unit hypercube, or higher in a smaller hypercube region embedded inside the unit hypercube.

Parameters:
num_pointsint

Determines the number of generated points as follows:

  • If grid is False, num_points is the number of random points to be generated in the unit hypercube.

  • If grid is True, num_points is the number of points along each axis of a grid of points. Thus, the total number of points is num_points**dimension.

dimensionint, default=1

The dimension of the space of points.

gridbool, default=False

If True, it generates the set of points on a lattice grid. Otherwise, it randomly generates points inside the unit hypercube.

afloat or array_like, default=None

The coordinates of a corner point of an embedded hypercube inside the unit hypercube. The point a is the closest point of the embedded hypercube to the origin. The coordinates of this point should be between the origin and the point with coordinates (1, 1, ..., 1). If None, it is assumed that a is the origin. When dimension is 1, a should be a scalar.

bfloat or array_like, default=None

The coordinates of another corner point of an embedded hypercube inside the unit hypercube. The point b is the furthest point of the embedded hypercube from the origin. The coordinates of this point should be between the point a and the point (1, 1, ..., 1). If None, it is assumed that the coordinates of this point are all 1. When dimension is 1, b should be a scalar.

contrastfloat, default=0.0

The extra concentration of points generated inside the embedded hypercube with the corner points a and b. Contrast is the relative difference between the density of the points inside and outside the embedded hypercube, and is between 0 and 1. When set to 0, all points are generated inside the unit hypercube with uniform distribution, so there is no difference between the density of the points inside and outside the inner hypercube. Conversely, when contrast is set to 1, all points are generated inside the embedded hypercube and no point is generated outside it.

seedint, default=0

Seed of the random number generator, which should be a non-negative integer. If set to None, the generated points are not repeatable.

Returns:
xnumpy.ndarray

A 2D array where each row contains the coordinates of a point. The shape of the array is (n, m), where m is the dimension of the space and n is the number of generated points inside the unit hypercube. The number n is determined as follows:

  • If grid is True, then n = num_points**dimension.

  • If grid is False, then n = num_points.

Notes

Grid versus Random Points:

Points are generated in a multi-dimensional space inside the unit hypercube, either randomly (when grid is False) or on a structured grid (if grid is True).
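As a rough illustration of the two modes, the following standard-library sketch builds both kinds of point sets. It is not glearn's implementation: the helper names lattice_points and random_points are hypothetical, and the even spacing of the lattice (including both endpoints 0 and 1 along each axis) is an assumption made here for the example.

```python
import itertools
import random


def lattice_points(num_points, dimension):
    """Hypothetical helper: points on a regular lattice in [0, 1]^dimension.

    Assumes evenly spaced coordinates that include both 0 and 1 on each axis.
    """
    if num_points == 1:
        axis = [0.0]
    else:
        axis = [i / (num_points - 1) for i in range(num_points)]
    # The Cartesian product of the axis with itself `dimension` times
    # yields num_points**dimension lattice points.
    return [list(p) for p in itertools.product(axis, repeat=dimension)]


def random_points(num_points, dimension, seed=0):
    """Hypothetical helper: uniformly random points in [0, 1)^dimension."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(dimension)]
            for _ in range(num_points)]


grid = lattice_points(30, 2)
rand = random_points(100, 2)
print(len(grid), len(grid[0]))   # 900 2
print(len(rand), len(rand[0]))   # 100 2
```

The point counts mirror the rule stated for num_points: num_points**dimension rows on a grid, and num_points rows for random points.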

Generate Higher Concentration of Points in a Region:

By default, the points are generated with uniform distribution inside the unit hypercube \([0, 1]^d\). However, it is possible to generate the points with higher concentration in a specific hypercube \(\mathcal{H} \subset [0, 1]^d\), which is embedded inside the unit hypercube. The inner hypercube \(\mathcal{H}\) is determined by its two opposite corner points given by the arguments a and b.

The argument contrast, here denoted by \(\gamma\), specifies the excess relative density of points in \(\mathcal{H}\) compared to the rest of the unit hypercube. Namely, if \(\rho_{1}\) is the density of points inside \(\mathcal{H}\) and \(\rho_2\) is the density of points outside \(\mathcal{H}\), then

\[\gamma = \frac{\rho_1 - \rho_2}{\rho_2}.\]

If \(\gamma = 0\), there is no difference between the density of the points inside and outside \(\mathcal{H}\), hence all points inside \([0, 1]^d\) are generated with uniform distribution. Conversely, if \(\gamma = 1\), the density of points outside \(\mathcal{H}\) is zero, so all points are generated inside \(\mathcal{H}\).
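The two limiting cases can be reproduced with a simple mixture scheme, sketched below using only the standard library. This scheme is an assumption made for illustration and is not necessarily the method glearn uses. For intermediate values of \(\gamma\), the densities it produces are not claimed to satisfy the formula above. The function name sample_with_contrast is hypothetical.

```python
import random


def sample_with_contrast(num_points, a, b, contrast, seed=0):
    """Hypothetical sketch: points in the unit hypercube with extra
    concentration in the inner box whose opposite corners are a and b.

    Assumed scheme: each point is drawn from the inner box with
    probability `contrast`, and uniformly from the whole unit hypercube
    otherwise.
    """
    rng = random.Random(seed)
    points = []
    for _ in range(num_points):
        if rng.random() < contrast:
            # Uniform inside the inner box [a_i, b_i] along each axis.
            point = [rng.uniform(lo, hi) for lo, hi in zip(a, b)]
        else:
            # Uniform over the whole unit hypercube.
            point = [rng.random() for _ in a]
        points.append(point)
    return points


# contrast=1.0: every point falls inside the inner box.
inner_only = sample_with_contrast(1000, a=(0.2, 0.3), b=(0.4, 0.5),
                                  contrast=1.0)
# contrast=0.0: every point is drawn uniformly from the unit square.
uniform = sample_with_contrast(1000, a=(0.2, 0.3), b=(0.4, 0.5),
                               contrast=0.0)
```

With contrast=1.0 the first branch is always taken, and with contrast=0.0 it is never taken, which matches the behavior described for \(\gamma = 1\) and \(\gamma = 0\).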

Examples

  • Generate 100 random points in the 1-dimensional interval \([0, 1]\):

    >>> from glearn.sample_data import generate_points
    >>> x = generate_points(100, dimension=1, grid=False)
    >>> x.shape
    (100, 1)
    
  • Generate 100 random points in the 1-dimensional interval \([0, 1]\) with a higher concentration of points inside \([0.2, 0.5]\), using a contrast of \(0.7\):

    >>> from glearn.sample_data import generate_points
    >>> x = generate_points(100, dimension=1, grid=False, a=0.2,
    ...                     b=0.5, contrast=0.7)
    
  • Generate 100 random points in the 2-dimensional square \([0, 1]^2\) where \(70 \%\) more points are inside the rectangle with opposite corner points \(a=(0.2, 0.3)\) and \(b=(0.4, 0.5)\):

    >>> from glearn.sample_data import generate_points
    >>> x = generate_points(100, dimension=2, grid=False, a=(0.2, 0.3),
    ...                     b=(0.4, 0.5), contrast=0.7)
    
  • Generate a two-dimensional grid of \(30 \times 30\) points in the square \([0, 1]^2\):

    >>> from glearn.sample_data import generate_points
    >>> x = generate_points(30, dimension=2, grid=True)
    >>> x.shape
    (900, 2)
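To check the effect of contrast on a generated set, one can count what fraction of the points falls inside the inner hypercube. The helper below is a hypothetical sketch and not part of glearn. It expects a and b as sequences (one value per dimension) and accepts any sequence of coordinate rows. Since it only iterates over rows, it should also accept the 2D array returned by generate_points, though that is not verified here.

```python
def fraction_inside(points, a, b):
    """Hypothetical helper: fraction of points lying in the box [a, b]."""
    count = 0
    for point in points:
        # A point is inside when every coordinate lies between the
        # corresponding coordinates of the corners a and b.
        if all(lo <= c <= hi for c, lo, hi in zip(point, a, b)):
            count += 1
    return count / len(points)


# Small hand-made example: two of the four points lie in the box.
demo = [[0.25, 0.35], [0.30, 0.45], [0.10, 0.90], [0.80, 0.40]]
print(fraction_inside(demo, a=(0.2, 0.3), b=(0.4, 0.5)))  # 0.5
```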