Computations of basic sums

For theoretical details on basic sums, see the Theory overview section of the online documentation and the papers cited therein.

Every application of basic sums starts with a specification of the considered cell and of the maximal order of the sums being used (i.e. the symbolic precision). Consider the square lattice and the symbolic precision \(q_\max=5\):

import numpy as np
w1, w2  = 1, 1j  # periods of the square lattice
qmax = 5         # symbolic precision

Then we create an instance of the Cell class, which computes and stores the functions \(\wp\) and \(\wp'\) for given periods \(\omega_1\), \(\omega_2\), as well as the corresponding Eisenstein functions \(E_n\) (\(n=2, 3, 4,\ldots,q_\max\)). For more details, refer to formula (5) and the section Weierstrass \(\wp\) and Eisenstein functions.

from basicsums.basic_sums import Cell
cell = Cell(w1, w2, qmax)

In applications, it is common to work with sets of disks of identical or different radii. Let us investigate both cases separately (an approach to the analysis of point distributions is described here).

Basic sums for identical disks

Let us import a sample configuration of identical disks:

from basicsums.examples import load_example01
A, r = load_example01()

where A is a NumPy array of complex numbers specifying the disk centres and r is a NumPy array of floats corresponding to the disk radii. One can apply the show_disks() function to visualize the sample:

from basicsums.show_disks import show_disks
show_disks(A, r, w1, w2)
show_disks(A, r, w1, w2, neighbs=False)
[Figures: the sample of disks visualized by show_disks(), with the default setting and with neighbs=False]

Then, for this specific configuration of disks, we create an instance of the BasicSums class, which computes all required matrices related to the functions \(E_n\) (\(n=2, 3, 4,\ldots,q_\max\)) and stores them as attributes in order to reuse them in computations of basic sums (for more details on the underlying algorithms, see [1]).

from basicsums.basic_sums import BasicSums
bso = BasicSums(A, cell)

The bso object provides two main methods: esum() and esums(). The former computes the value of a single basic sum. Let us compute \(e_2\), \(e_{2, 2, 5}\), \(e_{3, 3, 2}\) and \(e_{4, 4}\) separately:

bso.esum(2), bso.esum(2, 2, 5), bso.esum(3, 3, 2), bso.esum(4, 4)
((3.15438582262448-0.09180159581592136j),
 (-0.024658672489985198+1.2585946256954157j),
 (-10.80733726705118+1.833254246831019j),
 (7.9877213504124365-2.86102294921875e-16j))

The latter, the esums() method, calculates values of basic sums given by a list of tuples representing multi-indexes. The method returns a NumPy array of the corresponding results:

sums = [(2,), (2, 2, 5), (3, 3, 2), (4, 4)]
bso.esums(sums)
array([  3.15438582-9.18015958e-02j,  -0.02465867+1.25859463e+00j,
       -10.80733727+1.83325425e+00j,   7.98772135-2.86102295e-16j])

We will describe the difference between the two approaches a bit later.

An instance of the BasicSums class automatically computes matrices of values of the Eisenstein functions \(E_n\), \(n=2, 3, 4,\ldots,q_\max\), at the time of creation. On the other hand, one can optimize this process by passing a list of only those indexes of the functions \(E_n(z)\) that will be used in computations:

bso = BasicSums(A, cell, eisenstein_indexes=[2, 5])
bso.esum(2, 5, 5)
(-54.5563109397971+1.4082941022526196j)

One can check which Eisenstein functions are used by a given object:

bso.eis.keys()
dict_keys([2, 5])

Disks of different radii

In the case of disks of different radii (i.e. the polydispersed case), the corresponding BasicSums object needs some more information about the system in the form of the normalising factors \(\nu_j\), \(j = 1, 2, 3, \ldots, N\) (see (1)). These constants are automatically derived by the object from the radii of the disks. Let us import a sample with normally distributed radii:

from basicsums.examples import load_example02
A, r = load_example02()
show_disks(A, r, w1, w2)
[Figure: the sample of polydispersed disks visualized by show_disks()]

Let us create an instance of the BasicSums class:

bso = BasicSums(A, cell, r)
bso.esum(2), bso.esum(2, 3, 3)
((3.1318248855050785-0.06693636374418609j),
 (-10.346554736396879+0.9422742346896005j))

Computations of sequences of basic sums

Assume that we want to compute all basic sums from a given set, say defined by \(G'_q\) (see this section for more details). In such a case, one can use the multi_indexes module implemented in the package to get the list of required multi-indexes. Assume that we use \(G'_5\) in the following considerations:

from basicsums.multi_indexes import sums_in_Gq_prime
qmax = 5
sums = sums_in_Gq_prime(qmax)
sums
[(2,),
 (2, 2),
 (2, 2, 2),
 (3, 3),
 (2, 2, 2, 2),
 (3, 3, 2),
 (4, 4),
 (2, 2, 2, 2, 2),
 (2, 3, 3, 2),
 (3, 3, 2, 2),
 (3, 4, 3),
 (4, 4, 2),
 (5, 5)]
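
As a quick sanity check, one can group these multi-indexes by the order of the corresponding sums. The snippet below assumes that the order of \(e_{p_1, \ldots, p_k}\) equals \(\frac{1}{2}\sum_j p_j\) (see the theory section); it is a plain Python illustration, not part of the package's API:

from collections import Counter

# count the multi-indexes per order q, assuming q = sum(m) / 2
Counter(sum(m) // 2 for m in sums)
Counter({5: 6, 4: 3, 3: 2, 1: 1, 2: 1})

Indeed, all orders from 1 up to \(q_\max=5\) are represented.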

Let us import a random sample and create the corresponding instances of the Cell and BasicSums classes:

A, r = load_example01()
show_disks(A, r, w1, w2)
[Figure: the sample of disks visualized by show_disks()]
cell = Cell(w1, w2, qmax)
bso = BasicSums(A, cell)

One can set the dict_output parameter to True in order to force the esums() method to return the results as a dictionary (dict) instance, keyed by the multi-indexes:

sums_res = bso.esums(sums, dict_output=True)

Such an approach makes it easy to look up individual results:

sums_res[3, 3, 2]
(-10.80733726705118+1.833254246831019j)

Moreover, it allows transferring the results to a pandas [2] Series or DataFrame:

import pandas as pd
pd.Series(sums_res)
(2,)                  (3.15438582262448-0.09180159581592136j)
(2, 2)             (11.757827037864901+5.82076609134674e-17j)
(2, 2, 2)             (43.07907709266533-1.7681300908296955j)
(3, 3)            (-3.557295937476818-7.450580596923828e-17j)
(2, 2, 2, 2)            (173.12898896402592-6.103515625e-15j)
(3, 3, 2)             (-10.80733726705118+1.833254246831019j)
(4, 4)             (7.9877213504124365-2.86102294921875e-16j)
(2, 2, 2, 2, 2)        (670.9285779378052-33.06492019934968j)
(2, 3, 3, 2)             (-50.034946200507406-2.9296875e-15j)
(3, 3, 2, 2)          (-42.96183376801067+6.393431885893068j)
(3, 4, 3)            (1.0336374840805849-1.3287290405279937j)
(4, 4, 2)              (25.69722017834029-1.0060002202097529j)
(5, 5)                 (-16.913965412216037+2.44140625e-16j)
dtype: complex128
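
Having the results in a Series, the whole pandas toolbox applies to them. For instance, one can rank the sums by modulus (a post-processing illustration, not a feature of the package):

pd.Series(sums_res).abs().nlargest(3)  # the three sums largest in modulus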

Memoization and memory management

Memoization of intermediate results

What is the difference between the esum() and esums() methods? Instead of computing every sum separately, as esum() does, esums() is implemented as a recursive function that uses the memoization technique for storing vector representations (NumPy arrays of \(N\) complex numbers) of intermediate sums computed during its call (for more details on the algorithms and on vector representations of basic sums, see [1]). Briefly speaking, it iterates over the list of multi-indexes and computes each sum recursively, memoizing intermediate results to reuse them in computations of the remaining sums.
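
To illustrate the idea, below is a schematic sketch of such a recursion with memoization. It is an illustration only, not the package's implementation: the matrices E[n] are an assumption here, and the conjugation and normalisation steps of the actual algorithm [1] are omitted.

import numpy as np

def esums_sketch(multi_indexes, E, N):
    # E: dict mapping n to an N x N matrix related to the function E_n;
    # conjugation and normalisation of the real algorithm are left out.
    cache = {}
    def vector(m):
        if m not in cache:
            # peel off the last index; prefixes shared by several sums,
            # e.g. (3, 3) for (3, 3, 2) and (3, 3, 2, 2), are computed once
            tail = vector(m[:-1]) if len(m) > 1 else np.ones(N)
            cache[m] = E[m[-1]] @ tail  # one matrix-vector product per level
        return cache[m]
    return np.array([vector(m).sum() for m in multi_indexes])

Thanks to the cache, a list of multi-indexes with many shared prefixes costs roughly one matrix-vector product per distinct prefix, rather than one per index of every sum.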

By default, once an intermediate vector representation of a sum is computed, it is cached in an OrderedDict data structure, so that it can be reused in the forthcoming computations. Note that the cache structure is a local variable of esums(), hence all the results vanish as the function call ends.

To see this, let us measure the time of computing \(G'_{20}\), containing 263167 sums:

qmax = 20
sums = sums_in_Gq_prime(qmax)
len(sums)
263167

Let us prepare the required objects:

cell = Cell(w1, w2, qmax)
bso = BasicSums(A, cell)

Now, we can compare the running times of both approaches using the %timeit magic command:

%timeit -n 1 -r 1 np.array([bso.esum(*m) for m in sums])
1min 42s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
%timeit -n 1 -r 1 bso.esums(sums)
21.9 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)

Both approaches give the same results:

res1 = np.array([bso.esum(*m) for m in sums])
res2 = bso.esums(sums)
np.all(np.isclose(res1, res2))
True

Intermediate vector representations are cached in an OrderedDict data structure which is local to the esums() method. Hence, another run of esums() recomputes all the values. However, one can tell esums() to store all intermediate results in the object's attribute:

%timeit -n 1 -r 1 bso.esums(sums, nonlocal_cache=True)
22 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)

Then, another run only computes the sums of the cached vector representations:

%timeit -n 1 -r 1 bso.esums(sums)
1.98 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)

One can check how many vector representations were stored, compared with the number of considered basic sums. The cache is larger, since vector representations of intermediate sums that are not themselves elements of the considered set are stored as well:

len(sums), len(bso._cache)
(263167, 481364)

Let us clear the cache:

bso.clear_cache()

The nonlocal cache can be used in a scenario where basic sums are computed at different steps of a program. Let us first compute half of the considered sums:

half_index = len(sums)//2
%timeit -n 1 -r 1 bso.esums(sums[:half_index], nonlocal_cache=True)
11.1 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
len(bso._cache)
240681

And then, the other half. Once the nonlocal cache was established with the nonlocal_cache parameter, it is reused and updated during the forthcoming calculations:

%timeit -n 1 -r 1 bso.esums(sums[half_index:])
10.8 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
len(bso._cache)
481364
bso.clear_cache()

As we can see, the two running times add up to the time of computing all the sums from sums at once.

Memory load management

Note that the memory load of the cache grows as the order of \(G'_q\) increases. By default, the esums() method remembers all representations of intermediate sums. One can limit the number of cached arrays via the cache_only and maxsize parameters. However, it may result in an increased running time of computations.

The cache_only parameter indicates the multi-indexes of sums which should be cached:

%timeit -n 1 -r 1 bso.esums(sums, cache_only=sums, nonlocal_cache=True)
39.9 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)

The cache is much smaller now:

len(sums), len(bso._cache)
(263167, 263167)

Let us try another example:

bso.clear_cache()
%timeit -n 1 -r 1 bso.esums(sums, cache_only=sums[:5000])
1min 18s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)

As we mentioned earlier, esums() uses an OrderedDict for the cache. The maxsize parameter indicates that the cache will remember only the maxsize most recently stored vector representations:

%timeit -n 1 -r 1  bso.esums(sums, maxsize=1000, nonlocal_cache=True)
45.2 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)

One can see that the cache does not exceed 1000 elements:

len(sums), len(bso._cache)
(263167, 1000)
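
The eviction behaviour of maxsize can be mimicked in a few lines. Below is a minimal sketch of a size-bounded cache built on OrderedDict, keeping only the most recently stored items; it illustrates the idea and is not the package's actual code:

from collections import OrderedDict

class BoundedCache(OrderedDict):
    # keep only the `maxsize` most recently stored items
    def __init__(self, maxsize):
        super().__init__()
        self.maxsize = maxsize

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        if len(self) > self.maxsize:
            self.popitem(last=False)  # discard the oldest entry

With cache = BoundedCache(1000), storing the 1001st vector representation silently evicts the oldest one.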

References

[1] W. Nawalaniec, Efficient computation of basic sums for random polydispersed composites, Comp. Appl. Math. (2017). https://doi.org/10.1007/s40314-017-0449-6
[2] W. McKinney, Data Structures for Statistical Computing in Python, Proceedings of the 9th Python in Science Conference, 51-56 (2010).