Data preparation¶

Note

This section is under constant development. Its content will change and improve over time.

Normalization¶

import matplotlib.pyplot as plt
import numpy as np

Let us generate abstract data in form of array of points (pairs of floats) bounded by a rectangle with sides of length 3 and 2 units:

W = 3
H = 2
np.random.seed(1)
data = np.random.rand(200, 2) * np.array([W, H])

plt.plot(*data.transpose(), '.')
plt.gca().set_aspect('equal')
plt.show()

_images/tutorial_data_preparation_6_0.png

In order to apply basic sums we first have to derive the complex number representation of data:

from basicsums.preparation import real_array_to_complex

data_c = real_array_to_complex(data)
data_c[:10]

array([1.25106601e+00+1.44064899j, 3.43124452e-04+0.60466515j,
       4.40267672e-01+0.18467719j, 5.58780634e-01+0.69112145j,
       1.19030242e+00+1.07763347j, 1.25758354e+00+1.370439  j,
       6.13356749e-01+1.75623487j, 8.21627796e-02+1.34093502j,
       1.25191441e+00+1.11737966j, 4.21160816e-01+0.39620298j])

Note that structural sums operate on a subset of complex plane being a unit cell (i.e. a parallelogram with the area equal to unity). Hence, we should scale bounding box and the data. Morover, scaled bounding box should be shifted to the origin of the coordinate axes:

from basicsums.preparation import normalize_data

w1, w2, A = normalize_data(data_c, W, H)

One can check that the function returned periods of unit cell proportional to the original lengths:

w1 * w2.imag, w1/W, w2.imag/H

(1.0000000000000002, 0.4082482904638631, 0.4082482904638631)

from basicsums.show_disks import show_disks

show_disks(A, 0.01*np.ones(len(A)), w1, w2)

_images/tutorial_data_preparation_16_0.png

On the other hand, function normalize_cell_periods which returns scaled periods and the scale factor that can be used to scale the data. Note that normalize_cell_periods requires complex periods:

from basicsums.preparation import normalize_cell_periods

w1x, w2x, a = normalize_cell_periods(W, H*1j)

w1x==w1, w2x==w2, a

(True, True, 0.4082482904638631)

Note also that if the lengths of the sides of the bounding box are not provided, the normalize_data function figures it out based on extremal values of real and imaginary parts of data points. Hence, the automatic scale coefficient and the output period may differ from the above ones. This behaviour is usefull, when we are provided with coordinates only, without any information about the original bounding box.

w1x, w2x, Ax = normalize_data(data_c)

w1, w1x, w1x*w2x.imag

(1.2247448713915892, 1.2303484201624637, 1.0)

show_disks(Ax, 0.01*np.ones(len(Ax)), w1x, w2x)

_images/tutorial_data_preparation_24_0.png

Regularization¶

[TO DO]

Poisson¶

Assume that we have completelly random pattern.

from basicsums.basic_sums import Cell, BasicSums

from basicsums.show_disks import show_disks

w1, w2 = 1, 1j

qmax = 9
cell = Cell(w1, w2, qmax)
sums_indexes = [(2,)] + [(p, p) for p in range(2, qmax+1)]

N = 256
np.random.seed(10)
patterns = [np.random.rand(N, 2).dot([1, 1j]) - (0.5 + 0.5j)
            for i in range(50)]

show_disks(patterns[0], 0.005*np.ones(N), w1, w2)

_images/tutorial_data_preparation_34_0.png

res_poisson = [BasicSums(A, cell).esums(sums_indexes)
           for A in patterns]

res_poisson = np.array(res_poisson)

res_poisson.mean(axis=0)

array([-6.73534493e+00-6.85838505e-01j,  1.12429746e+06+7.21233118e-11j,
       -8.09119105e+10+8.24778547e-06j,  6.31110807e+15-2.40904496e-01j,
       -4.99712773e+20+1.17624383e+04j,  3.97136959e+25+4.73732441e+09j,
       -3.15929260e+30-1.71719493e+13j,  2.51395972e+35+1.65042148e+18j,
       -2.00059950e+40-2.67949024e+23j])

res_poisson.std(axis=0)

array([9.39936881e+01, 6.94431976e+06, 5.51104993e+11, 4.38657524e+16,
       3.49121745e+21, 2.77845506e+26, 2.21116775e+31, 1.75969592e+36,
       1.40040254e+41])

import matplotlib.pyplot as plt
k = 1
plt.plot(np.real(res_poisson[:,k]), 'o')
plt.grid(True)
plt.show()

_images/tutorial_data_preparation_39_0.png

fig, ax = plt.subplots(figsize=(10, 20))
for k, tup in enumerate(sums_indexes):
    plt.subplot(5, 2, k+1)
    plt.plot(np.real(res_poisson[:,k]), 'o')
    plt.title('$e_{' + str(tup)[1:-1] + '}$')
    plt.grid(True)
plt.show()

_images/tutorial_data_preparation_40_0.png

import matplotlib.pyplot as plt
k = 1
plt.plot(np.real(res_poisson[:,k]), 'o')
plt.grid(True)
plt.show()

_images/tutorial_data_preparation_41_0.png

import matplotlib.pyplot as plt
k = 8
res_poisson_copy = np.copy(np.real(res_poisson[:,k]))
res_poisson_copy[10] = 0.5*(res_poisson_copy[9]+
                            res_poisson_copy[11])
plt.plot(res_d, 'o')
plt.grid(True)
plt.show()

_images/tutorial_data_preparation_42_0.png

from basicsums.preparation import regularized_radii

radii = [regularized_radii(A, w1, w2) for A in patterns]

show_disks(patterns[0], radii[0], w1, w2)

_images/tutorial_data_preparation_45_0.png

show_disks(patterns[10], radii[10], w1, w2)

_images/tutorial_data_preparation_46_0.png

res_poisson_reg = [BasicSums(A, cell, r).esums(sums_indexes)
           for A, r in zip(patterns, radii)]

res_poisson_reg = np.array(res_poisson_reg)

res_poisson_reg.mean(axis=0)

array([ 3.16933109e+00+1.49275831e-02j,  1.97901593e+01-8.78395566e-17j,
       -2.50309956e+01-1.76022820e-16j,  7.87999363e+01-4.79041937e-16j,
       -2.47817891e+02+1.59840507e-16j,  7.74982714e+02-2.73951440e-15j,
       -2.41347750e+03+1.74931512e-14j,  7.65714672e+03-7.31309807e-15j,
       -2.50475199e+04+2.21614666e-13j])

res_poisson_reg.std(axis=0)

array([2.86191236e-01, 1.59002127e+00, 5.07241307e+00, 1.86463881e+01,
       8.09375185e+01, 2.82116567e+02, 1.09898397e+03, 3.92697716e+03,
       1.44629673e+04])

import matplotlib.pyplot as plt
k = 1
plt.plot(np.real(res_poisson_reg[:,k]), 'o')
plt.grid(True)
plt.show()

_images/tutorial_data_preparation_52_0.png

import matplotlib.pyplot as plt
k = -1
plt.plot(np.real(res_poisson_reg[:,k]), 'o')
plt.grid(True)
plt.show()

_images/tutorial_data_preparation_53_0.png

fig, ax = plt.subplots(figsize=(10, 20))
for k, tup in enumerate(sums_indexes):
    plt.subplot(5, 2, k+1)
    plt.plot(np.real(res_poisson_reg[:,k]), 'o')
    plt.title('$e_{' + str(tup)[1:-1] + '}$')
    plt.grid(True)
plt.show()

_images/tutorial_data_preparation_54_0.png

Basic Sums

Table of Contents

Data preparation¶

Normalization¶

Regularization¶

Poisson¶