Background
Basics
ILTpy implements an inversion algorithm to fit experimental or simulated noisy data of complex materials by computing distributions of underlying physical or chemical properties. It was initially developed for magnetic resonance relaxation and diffusion data and later also applied to electrochemical impedance spectroscopy (EIS) data. Such data often contain multiple components with varying distributions. Fitting a specific model, such as a mono-exponential, requires prior knowledge of the number of species present and yields only an effective characteristic constant. In contrast, inversion algorithms do not assume the shape or number of species in the system, but instead reveal the distribution of characteristic constants using a kernel suitable for modeling the response of a particular process.
A common approach to analyzing magnetic resonance data using Inverse Laplace Transform (ILT) methods involves applying a non-negativity constraint to prevent oscillatory solutions. This constraint assumes that all relaxation components have the same sign. However, in systems where cross-relaxation or exchange occurs, such a constraint is unjustified, as it suppresses any relaxation components with negative values, potentially introducing artificial features in the resulting distributions that do not correspond to actual physical processes. In contrast, ILTpy avoids the use of a non-negativity constraint, employing instead a zero-crossing penalty along with uniform penalty regularization to stabilize the inversion process. Parameterization and the algorithm are described by Granwehr and Roberts.3
Note
In the following, adjustable parameters in ILTpy are denoted as parameter, with the default value enclosed in brackets as parameter [=default value].
For information on changing the default parameters, see changing parameters.
Summary of the algorithm
Measured signals \(s(t)\) at sampling times \(t\) are related to an underlying distribution \(G(\tau)\) via an inhomogeneous Fredholm integral of the first kind as\[s(t) = \int K(t, \tau) \, G(\tau) \, \mathrm{d}\tau + \epsilon(t) \, ,\]
with kernel \(K\) and random noise term \(\epsilon\). For example, in a magnetic resonance relaxation experiment, \(t\) is an evolution or recovery time, and \(\tau\) a relaxation time constant. This expression can be discretized as\[\mathbf{s} = \mathbf{K} \mathbf{g} + \mathbf{e} \, ,\]
with data vector \(\mathbf{s}\), kernel matrix \(\mathbf{K}\), distribution vector (or ‘spectrum’) \(\mathbf{g}\), and noise vector \(\mathbf{e}\). Solving this equation for \(\mathbf{g}\) is ill-conditioned. ILTpy uses Tikhonov regularization in generalized form to stabilize the solution, solving the optimization problem\[\hat{\mathbf{g}} = \underset{\mathbf{g}}{\operatorname{argmin}} \left( \|\mathbf{K}\mathbf{g} - \mathbf{s}\|^2 + \alpha_{00} \, \|{\boldsymbol \Lambda} \mathbf{g}\|^2 \right)\]
of finding an estimate \(\hat{\mathbf{g}}\) for the true underlying \(\mathbf{g}\).
The regularization term contains a global scaling factor \(\alpha_{00}\) (alpha_00 [=1]).
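As an illustration of this optimization problem, the following NumPy sketch performs a single generalized-Tikhonov solve via the normal equations. It is not ILTpy's implementation: the synthetic data, the kernel, and the identity stand-in for \({\boldsymbol \Lambda}\) are assumptions for demonstration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic decay data: K_ij = exp(-t_i / tau_j)
t = np.linspace(0.01, 5.0, 200)              # sampling times
tau = np.logspace(-2, 1, 100)                # log-spaced time constants
K = np.exp(-np.outer(t, 1.0 / tau))          # kernel matrix
g_true = np.exp(-0.5 * ((np.log10(tau) + 0.3) / 0.1) ** 2)
s = K @ g_true + 0.01 * rng.standard_normal(t.size)

# One generalized-Tikhonov solve with a fixed regularization matrix.
# The identity is only a stand-in for the full Lambda described below.
alpha_00 = 1.0
Lam = np.eye(tau.size)
g_hat = np.linalg.solve(K.T @ K + alpha_00 * (Lam.T @ Lam), K.T @ s)
```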
Unlike with Tikhonov regularization in standard form, the regularization matrix is not an identity matrix. Instead, an energy norm\[\|\mathbf{g}\|_{\boldsymbol \Lambda}^2 = \mathbf{g}^{\rm T} {\boldsymbol \Lambda}^2 \mathbf{g}\]
is used. It consists of three independent regularization terms: (i) a uniform penalty (UP) second derivative term \({\boldsymbol \Lambda}_{\rm up} = \mathbf{C} \mathbf{L}_{2}\), where \(\mathbf{L}_{2}\) is the curvature (second derivative) operator that acts on \(\mathbf{g}\) and \(\mathbf{C}\) is the UP coefficient matrix, (ii) a zero-crossing first derivative term \({\boldsymbol \Lambda}_{\rm zc} = \mathbf{D} \mathbf{L}_{1}\), with slope (first derivative) operator \(\mathbf{L}_{1}\) and coefficient matrix \(\mathbf{D}\), and (iii) an amplitude regularization term \({\boldsymbol \Lambda}_{\rm amp} = \mathbf{A} + \mathbf{Q}\) with coefficient matrices \(\mathbf{A}\) and \(\mathbf{Q}\).
These terms are combined to the regularization matrix as\[{\boldsymbol \Lambda}^2 = {\boldsymbol \Lambda}_{\rm up}^{\rm T} {\boldsymbol \Lambda}_{\rm up} + {\boldsymbol \Lambda}_{\rm zc}^{\rm T} {\boldsymbol \Lambda}_{\rm zc} + {\boldsymbol \Lambda}_{\rm amp}^{\rm T} {\boldsymbol \Lambda}_{\rm amp} \, .\]
The coefficient matrices \(\mathbf{C}\) and \(\mathbf{D}\), and possibly also \(\mathbf{A}\), depend on \(\mathbf{g}\).
They are iteratively calculated until \({\boldsymbol \Lambda}^2\) converges or the number of iterations reaches maxloop [=100].
\(\mathbf{C}\) is a diagonal matrix whose elements are defined as
The denominator of \(\mathbf{C}\) contains three independent terms. Two of them act on the estimate \(\mathbf{g}\) from the previous iteration. \(\delta\) is the linear or logarithmic distance between neighboring points in \(\mathbf{g}\), depending on whether the corresponding vector tau contains linearly or logarithmically spaced elements, respectively. ILTpy assumes logarithmic spacing for inverted dimensions and linear spacing for non-inverted dimensions, but this behavior can be adjusted using the parameter base_tau [=0 (linear); =1 (logarithmic)].
\(\delta\) is included in the calculation of \(\mathbf{C}\) to obtain a regularization whose parameters are largely independent of the number of points \(M\) chosen for \(\mathbf{g}\),
while \(\mathbf{L}_{1}\) and \(\mathbf{L}_{2}\) are simply finite difference operators that do not take \(\delta\) into account.
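A minimal sketch of these finite difference operators and of \(\delta\) for linearly and logarithmically spaced tau vectors, assuming simple forward differences (ILTpy's internal conventions may differ):

```python
import numpy as np

M = 100
tau = np.logspace(-2, 1, M)                  # log-spaced independent variable

# Finite difference operators acting on g (no delta scaling)
L1 = np.diff(np.eye(M), axis=0)              # slope: (L1 @ g)[k] = g[k+1] - g[k]
L2 = np.diff(np.eye(M), n=2, axis=0)         # curvature: g[k+2] - 2 g[k+1] + g[k]

# delta: point spacing, linear (base_tau = 0) or logarithmic (base_tau = 1)
delta_lin = tau[1] - tau[0]
delta_log = np.log10(tau[1] / tau[0])        # constant on a logspace grid
```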
The three terms in the denominator of \(\mathbf{C}\), which can be individually scaled, can be summarized as follows (a sketch of their construction is given after the list):
The curvature term counteracts the \(\mathbf{L}_{2}\) term in the numerator, leading to a uniform regularization; it is responsible for the name uniform penalty. It is scaled with the adjustable curvature regularization parameter \(\alpha_{c}\) (alpha_c), whose default value is set to match the standard deviation \(\sigma\) of the data. Currently, ILTpy does not attempt to estimate \(\sigma\) from the data but assumes by default that \(\sigma=1\). Hence the standard value is \(\alpha_{c} = \delta M/N\), where \(N\) is the number of data points of \(\mathbf{s}\). Finally, a moving maximum of width \(2\eta_2+1\) is applied across the curvature term, with \(2\eta_2+1\) represented by c_nmax [=3].
When only a curvature term is used, distributions tend to assume rather unphysical features, such as box shapes or, for broad contributions, staircase-like distributions. The slope term prevents this by adding to the denominator of \(\mathbf{C}\) (and thereby reducing the regularization) in regions where the curvature is low. Trial and error indicated that good results are obtained by choosing the adjustable slope regularization parameter \(\alpha_{p}\) (alpha_p [= 5 * alpha_c]) as a multiple of \(\alpha_{c}\). A moving maximum of width \(2\eta_1+1\) is applied across the slope term, with \(2\eta_1+1\) represented by p_nmax [=3].
The compliance floor \(\alpha_{0}\) (alpha_0 [=1e-4]) provides a base bias in regions with no or negligible spectral density.
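The construction described in this list can be sketched as follows. The combination into \(\mathbf{C}\) shown in the last line is an assumption for illustration; the exact expression and scaling used by ILTpy may differ.

```python
import numpy as np
from scipy.ndimage import maximum_filter1d

M = 100
g_prev = np.exp(-0.5 * ((np.arange(M) - 50) / 8.0) ** 2)  # previous estimate
L1 = np.diff(np.eye(M), axis=0)
L2 = np.diff(np.eye(M), n=2, axis=0)

alpha_0 = 1e-4                     # compliance floor
alpha_c = 1.0                      # curvature term (assumes delta * M / N = 1)
alpha_p = 5 * alpha_c              # slope term
c_nmax, p_nmax = 3, 3              # moving-maximum window widths (2*eta + 1)

curv = maximum_filter1d((L2 @ g_prev) ** 2, size=c_nmax)
slope = maximum_filter1d((L1 @ g_prev) ** 2, size=p_nmax)
curv = np.pad(curv, 1, mode="edge")          # align lengths with g
slope = np.pad(slope, (0, 1), mode="edge")

# Illustrative combination only; ILTpy's exact formula for C may differ.
C_diag = 1.0 / np.sqrt(alpha_0 + curv / alpha_c + slope / alpha_p)
```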
The zero-crossing penalty replaces the more common non-negativity constraint. It penalizes sign changes of \(\mathbf{g}\), but without favoring one sign over the other. It consists of an \(\mathbf{L}_{1}\) slope operator; hence it acts more strongly if the slope of the distribution is large at positions where its sign changes. This adds a considerable penalty to strongly oscillating \(\mathbf{g}\), while smooth transitions are less affected. The diagonal coefficient matrix is calculated as
where \(\operatorname{sign}(\mathbf{g})\) is the vector of element-wise signs of \(\mathbf{g}\). \(\mathbf{L}_{1^-}\) and \(\mathbf{L}_{1^+}\) are the difference operators calculating the difference of each element of
\(\mathbf{g}\) with its previous and its subsequent element, respectively. The weight of the zero-crossing term can be adjusted with \(\alpha_{d}\), corresponding to alpha_d [=1e-5], and a moving maximum of width \(2\eta_D+1\), with \(2\eta_D+1\) represented by zc_nmax [=1], is applied across the zero-crossing term. The zero-crossing penalty can be switched off using reg_zc [=True], which can be helpful if instead a non-negativity penalty is applied.
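A sketch of how sign changes of \(\mathbf{g}\) produce a localized penalty pattern, using the backward and forward differences of \(\operatorname{sign}(\mathbf{g})\); the exact scaling by \(\alpha_d\) in ILTpy is not reproduced here.

```python
import numpy as np
from scipy.ndimage import maximum_filter1d

g_prev = np.array([0.5, 0.2, -0.1, -0.3, 0.05, 0.4])   # previous estimate
sgn = np.sign(g_prev)

# |difference of signs| is 2 at a sign change and 0 elsewhere
back = np.abs(np.diff(sgn, prepend=sgn[0]))   # element minus previous element
fwd = np.abs(np.diff(sgn, append=sgn[-1]))    # subsequent element minus element

zc_nmax = 1                                   # moving-maximum width 2*eta_D + 1
penalty = maximum_filter1d(back + fwd, size=zc_nmax)
# The diagonal of D would scale this pattern by the zero-crossing weight
# (cf. alpha_d); that scaling is an implementation detail not shown here.
D_diag = penalty
```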
Penalty at the boundaries
The third term penalizes features in \(\mathbf{g}\) that are expected to be absent (prior knowledge) or undesired. It includes a penalty at the boundaries, with an additional regularization for the first and last \(h_{0}\) points (reg_bc [=2 (inverted dimension); =0 (non-inverted, regularized dimension)]) of the distribution to avoid large fluctuations at the edges. The diagonal coefficient matrix is\[\mathbf{A}_k = \frac{H(h_{0}-k) + H(h_{0}-M+k)}{\sqrt{\alpha_{a} \delta}} \, ,\]where \(H(\cdot)\) denotes the Heaviside function. The weight of this term can be adjusted with parameter \(\alpha_a\). However, the current implementation uses a parameter alpha_bc [=50] that represents \(1/\sqrt{\alpha_{a} \delta}\).
Non-negativity Penalty
In addition, the amplitude term can also be used to include a non-negativity penalty, which does not entirely suppress negative contributions in the distribution but adds a penalty to effectively minimize them. This contribution is turned off by default (nn [=False]). The diagonal coefficient matrix\[\mathbf{Q}_k = \frac{1}{2} \, \alpha_Q \left(1-\operatorname{sign}(\mathbf{g})_k\right)\]is used, with regularization parameter \(\alpha_Q\) (alpha_nn [=1000]). This penalty can be used to test the validity of a non-negativity assumption by comparing the residuals obtained with the zero-crossing penalty and with the non-negativity penalty. See an example where the assumption of non-negativity is tested.
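Both amplitude terms can be written down directly from the formulas above. The previous estimate and the one-based indexing here are illustrative assumptions:

```python
import numpy as np

M = 50
k = np.arange(1, M + 1)                        # 1-based index into g
h0 = 2                                         # reg_bc: points regularized per edge
alpha_bc = 50.0                                # represents 1 / sqrt(alpha_a * delta)

H = lambda x: (x >= 0).astype(float)           # Heaviside step
A_diag = alpha_bc * (H(h0 - k) + H(h0 - M + k))

g_prev = np.linspace(-0.2, 1.0, M)             # previous estimate
alpha_Q = 1000.0                               # alpha_nn
Q_diag = 0.5 * alpha_Q * (1 - np.sign(g_prev)) # nonzero only where g < 0
```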
Static Baseline
For some experiments, such as longitudinal (\(T_1\)) relaxation time experiments in NMR or EPR, a static baseline must be considered, i.e. the signal does not relax to zero at infinite evolution time, but to some bias value.
Formally, this could be addressed by including a term with \(T_1 \rightarrow \infty\). However, for distributions with a logarithmically spaced independent variable vector tau, the UP regularization cannot be employed for such a term, hence only the compliance floor term alpha_0 remains. A term for the static baseline can be included into \(\mathbf{g}\) by setting sb [=False] to True. Provided that the signal-to-noise ratio (SNR) is high enough that the baseline can be clearly distinguished from the longest time constant in the target distribution vector, the exact value of alpha_0 is not too critical. Alternatively, the upper limit of tau can be set a few orders of magnitude higher than what can be accurately distinguished for a given data set; otherwise the baseline manifests as a peak with an unrealistically large time constant.
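A hedged sketch of the underlying idea: with an exponential decay kernel, the \(T_1 \rightarrow \infty\) term reduces to a constant column, so the static baseline corresponds to one extra element appended to \(\mathbf{g}\). This mimics what sb=True provides; it is not ILTpy's code.

```python
import numpy as np

t = np.linspace(0.01, 5.0, 200)
tau = np.logspace(-2, 1, 100)
K = np.exp(-np.outer(t, 1.0 / tau))           # exponential decay kernel

# tau -> infinity turns exp(-t/tau) into a column of ones, so a static
# baseline can be modeled by a single extra element appended to g.
K_sb = np.hstack([K, np.ones((t.size, 1))])
# After the regularized solve with K_sb, g_hat[-1] is the fitted bias.
```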
See an example where the static baseline feature is used for 1D inversion and an example where the baseline artifact is removed for 2D data.
Weighted inversion
While in NMR the assumption of independent and identically distributed (iid) Gaussian white noise characterized by a parameter-independent variance \(\sigma^2\) is often justified, this assumption cannot be made in general. For example, in electrochemical impedance spectroscopy (EIS), where data are recorded over a frequency range spanning several orders of magnitude, \(\sigma^2\) is frequency dependent. With the algorithm used by ILTpy, where the regularization is parametrized such that its norm matches the norm of the noise, a choice would have to be made to overregularize regions with low noise, underregularize regions with high noise, or perform some sort of piecewise inversion with varying regularization. As an alternative, Borgia et al. suggested a weighted inversion,1\[\hat{\mathbf{g}} = \underset{\mathbf{g}}{\operatorname{argmin}} \left( (\mathbf{K}\mathbf{g} - \mathbf{s})^{\rm T} \mathbf{B} \, (\mathbf{K}\mathbf{g} - \mathbf{s}) + \alpha_{00} \, \|{\boldsymbol \Lambda} \mathbf{g}\|^2 \right) ,\]
with the diagonal weight matrix \(\mathbf{B}\), whose elements \(\mathbf{B}_{kk} = {\boldsymbol \sigma}_k^{-2}\) are the inverse of the variance of the corresponding data point \(\mathbf{s}_k\). To facilitate a weighted inversion, a vector or matrix \(\boldsymbol \sigma\) (sigma [=None]) with the same shape as \(\mathbf{s}\) containing the respective standard deviations must be provided.
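Since \(\mathbf{B}_{kk} = {\boldsymbol \sigma}_k^{-2}\), the weighting is equivalent to dividing each row of \(\mathbf{K}\) and each element of \(\mathbf{s}\) by \(\sigma_k\), as the following illustrative sketch with synthetic data shows:

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0.01, 5.0, 200)
tau = np.logspace(-2, 1, 100)
K = np.exp(-np.outer(t, 1.0 / tau))

sigma = 0.005 + 0.02 * t / t.max()            # noise level varying across the data
s = K @ np.ones(tau.size) + sigma * rng.standard_normal(t.size)

# Row-wise scaling by 1/sigma_k is equivalent to applying B^(1/2)
Kw = K / sigma[:, None]
sw = s / sigma
# proceed with the regularized solve using Kw and sw instead of K and s
```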
See an example where weighted inversion is used.
Data compression
Particularly when using ILTpy with multi-dimensional data, the size of the matrices grows quickly. Data compression based on singular value decomposition (SVD) has been suggested as a mitigation by Venkataramanan et al.2 Thereby an SVD is applied to the data, and all singular values with a combined contribution below the noise level are discarded.
The number of retained singular values is specified using s [=15]. The success of this strategy largely depends on the kernel. The singular values of data described by an exponential kernel decay quickly, hence a substantial data reduction without loss of information can be achieved. In contrast, the singular values of a Debye kernel, as commonly used with impedance data, decay more slowly, hence either a large number of singular values must be retained, or some information contained in the data may be filtered out. By default, SVD-based data compression is turned off (compress [=False]). For large sparse matrices, there are alternative algorithms that are more efficient at determining a low number of singular values. These can be used by setting use_svds [=False] to True.
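A sketch of the compression step for a 2D data matrix, retaining the s largest singular values; this follows the idea of ref. 2, and ILTpy's actual implementation may differ in detail.

```python
import numpy as np

rng = np.random.default_rng(2)
S = rng.standard_normal((256, 128))           # stand-in for a 2D data set

s_keep = 15                                   # number of singular values (s)
U, sv, Vt = np.linalg.svd(S, full_matrices=False)
Ur, Vr = U[:, :s_keep], Vt[:s_keep, :].T
S_c = Ur.T @ S @ Vr                           # compressed s_keep x s_keep data
# Kernels are projected into the same subspace, e.g. K1_c = Ur.T @ K1.
# For large sparse matrices, scipy.sparse.linalg.svds(S, k=s_keep) computes
# only the leading singular triplets (cf. use_svds=True).
```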
See an example with data compression.
Multiple dimensions
Since many multi-dimensional magnetic resonance correlation experiments encode different parameters along different dimensions, the data can be described by a Fredholm integral of the first kind with separable kernels. This kernel structure can be exploited by combining kernels for multiple dimensions via an outer product and vectorizing the multi-dimensional distribution. This is less memory efficient than the approach by Venkataramanan et al.2 for the 2D case, but in principle scales to an arbitrary number of dimensions. ILTpy employs such an outer product kernel structure. To use it, the scalar parameters defined above can be provided as a list of parameters, with each element in the list corresponding to a specific dimension.
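The separable kernel structure can be made explicit with a Kronecker (outer) product, as the following sketch with synthetic exponential kernels verifies; forming the full product matrix is exactly the memory cost mentioned above.

```python
import numpy as np

rng = np.random.default_rng(3)
t1, t2 = np.linspace(0.01, 2, 40), np.linspace(0.01, 4, 30)
tau1, tau2 = np.logspace(-2, 1, 20), np.logspace(-2, 1, 25)
K1 = np.exp(-np.outer(t1, 1.0 / tau1))        # kernel of dimension 1 (40 x 20)
K2 = np.exp(-np.outer(t2, 1.0 / tau2))        # kernel of dimension 2 (30 x 25)

G = rng.random((tau1.size, tau2.size))        # 2D distribution (20 x 25)
S = K1 @ G @ K2.T                             # noiseless 2D data (40 x 30)

# Column-stacking vectorization: vec(K1 G K2^T) = (K2 kron K1) vec(G)
K = np.kron(K2, K1)                           # 1200 x 500 outer-product kernel
assert np.allclose(K @ G.flatten(order="F"), S.flatten(order="F"))
```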
While the different terms in the denominator of the UP regularization coefficient matrix are adjusted to reproduce broad features in the target distribution \(\mathbf{g}\), a practical problem arises in multi-dimensional data with broad ridges along a diagonal that are narrower along the main axes. These features, which are rather common in magnetic resonance, tend to be broken up into individual features with a width corresponding to their dimensions along the main axes. This can be prevented by constructing \({\boldsymbol \Lambda}_{\rm up}\) to also contain diagonal and anti-diagonal terms, i.e. an \(\mathbf{L}_{2}\) operator calculating the curvature along the diagonal and anti-diagonal, along with a corresponding coefficient matrix.
See an example for inverting 3D data.
Convergence
Since the zero-crossing penalty is an on/off regularization term, convergence of the algorithm can be delayed or prevented completely.
An example is a minor negative component that vanishes in one iteration when the zero-crossing penalty is active. In the next iteration there is no zero-crossing penalty at that position, and the component reappears, leading to oscillating behavior as iterations progress.
This problem has only a minor impact on the residuals and does not cause large-scale oscillations of the distribution but, depending on the objective function or metric, may prevent convergence. A possible measure to accelerate convergence is to only partially update the regularization matrix during each iteration. The updated terms then contain a fraction \(\beta\) (with \(0 < \beta \leq 1\)) from the current iteration and a contribution \((1-\beta)\) from the previous value. To stabilize convergence, \(\beta\) can be reduced as the number of iterations increases. ILTpy offers the possibility to independently adjust \(\beta\) for the uniform penalty term
and for the zero-crossing term of the regularization matrix (parameters reg_down [=29], reg_downrate [=0.05], reg_upd [=1], and reg_updmin [=0.2] for the uniform penalty term, and zc_on [=4], zc_down [=59], zc_downrate [=0.05], zc_upd [=1], and zc_updmin [=0.2] for the zero-crossing term). A sketch of this update schedule is shown below.
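The damped update and its decreasing schedule can be sketched as follows, written with the uniform penalty parameter names from above (the zero-crossing term uses its zc_* counterparts); the stand-in matrices are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
beta, beta_min = 1.0, 0.2           # reg_upd, reg_updmin
down_start, down_rate = 29, 0.05    # reg_down, reg_downrate

Lam2 = np.eye(4)                    # previous regularization term
for it in range(100):
    Lam2_new = np.diag(rng.random(4))          # stand-in for the recomputed term
    Lam2 = beta * Lam2_new + (1 - beta) * Lam2 # damped partial update
    if it >= down_start:                       # reduce beta to stabilize
        beta = max(beta - down_rate, beta_min)
```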
Overview of parameters
alpha_0 (float, optional, default=1e-4): In regions without spectral density of the inverted distribution, this parameter defines a baseline regularization.
alpha_00 (float, optional, default=1): Parameter to globally scale the regularization term. It equally affects all terms of the penalty.
alpha_p (float or None, optional, default=None): Second term of the uniform penalty regularization. It reduces the curvature penalty as a function of the slope. Primarily important for broad features with weak curvature.
alpha_c (float or None, optional, default=None): Main term of the uniform penalty regularization. It determines the weight of the curvature penalty.
alpha_d (float or None, optional, default=None): Determines the weight of the ZC penalty towards the full regularization matrix. If alpha_d is None, it is set equal to 1/alpha_a.
alpha_a (float or None, optional, default=1e5): Adjusts the contribution of zero crossings to the regularization matrix. If alpha_d is None, it is set equal to 1/alpha_a.
alpha_bc (float, optional, default=50): Boundary regularization helps to achieve a smooth, physically meaningful behavior of the resulting distribution at the edges. Depends on a suitable choice of the user-selectable range of the independent parameter of the distribution. If the chosen range is so narrow that features outside this range are implied by the data, systematic residuals will result.
alpha_nn (float, optional, default=1000): Alternatively to or together with a zero-crossing penalty, a non-negativity penalty can be applied. It is not a constraint that completely prevents negative components, but a regularization penalty to minimize them. This parameter determines its weight.
alt_g (int, optional, default=0, allowed={0,2,4,6}): Different numerical algorithms can be chosen for the time-limiting inversion step. The default (alt_g=0) is well suited for sparse kernel matrices (e.g. if the second dimension is regularized but not inverted). alt_g=6 performs better with dense kernel matrices, e.g. when all dimensions are inverted.
compress (bool or list of bool, optional, default=False): Turn singular value decomposition based data compression on or off.
use_svds (bool or list of bool, optional, default=False): Use an alternative sparse algorithm to perform singular value decomposition of highly sparse matrices.
s (int or list of int, optional, default=15): When using data compression based on singular value decomposition, this parameter contains the number of singular values, ordered according to their amplitude, to retain.
sigma (ndarray, optional, default=None): If noise is not independent and identically distributed, a weighted inversion can be conducted that takes into account the noise variance of each data point. sigma contains the noise standard deviation of each data point and must have the same shape as the data.
g_guess (ndarray, optional, default=None): If prior knowledge is available regarding the target distribution, an initial guess can be provided.
c_nmax (int, optional, default=3): The denominator of the coefficient matrix for the uniform penalty term contains a curvature term. This curvature term is calculated by applying a moving maximum over the result of a finite difference curvature operator applied to the estimate of the target distribution from the previous iteration. The size of the box for the moving maximum is c_nmax.
p_nmax (int, optional, default=3): The denominator of the coefficient matrix for the uniform penalty term contains a slope term. This slope term is calculated by applying a moving maximum over the result of a finite difference operator applied to the estimate of the target distribution from the previous iteration. The size of the box for the moving maximum is p_nmax.
reg_bc (int or None, optional, default=None): Specifies the number of data points that are regularized at the edges of the distribution to facilitate smooth boundary conditions. Default is 2 for inverted and 0 for non-inverted dimensions.
kernel (IltKernel, default=IltKernel()): Kernel function used for inversion. Input must be an instance of the IltKernel class. Users can specify arbitrary kernel functions in addition to a set of predefined kernels (see specifying kernels).
base_tau (int, {0, 1}, optional, default=1): Defines the spacing for tau: 1 = logarithmic, 0 = linear. If a distribution contains features over a large range (typically more than one order of magnitude), it is convenient to choose logarithmically spaced vectors for the independent variable. Data along non-inverted dimensions are often linearly spaced. This has some implications for calculations of slope and curvature, hence this parameter can be adjusted. ILTpy defaults to base_tau equal to 0 if an identity kernel is used and equal to 1 otherwise.
maxloop (int, optional, default=100): ILTpy finishes if the convergence criterion is not met after this number of iterations. Due to the discontinuous properties of the ZC penalty, it is not unusual that no exact convergence is achieved. If maxloop is chosen too large, overfitting may be observed.
nn (bool, optional, default=False): Apply a non-negativity penalty. It does not completely prevent negative components in the distribution, but suppresses them effectively if the penalty weight is chosen large enough.
reg_down (int, optional, default=29): After this iteration, the regularization update rate decreases to stabilize convergence.
reg_downrate (float, optional, default=0.05): After the iteration count has passed reg_down, the fraction of the regularization update decreases by this amount every iteration.
reg_upd (float, optional, default=1): Specifies the initial fraction of the regularization update compared to the previous iteration (0 < reg_upd <= 1).
reg_updmin (float, optional, default=0.2): Minimum regularization update fraction. Once this value is reached, the update rate does not decrease further.
sb (bool, optional, default=False): Include a bias term in the distribution. This term is not included in the UP regularization. With an exponential kernel, this term corresponds to an infinitely long relaxation time constant.
reg_zc (bool, optional, default=True): Enable/disable the zero-crossing (ZC) penalty.
zc_max (int, optional, default=1): (Deprecated) Previously used in ZC penalty calculations.
zc_down (int, optional, default=59): After this iteration, the ZC update rate decreases to stabilize convergence.
zc_upd (float, optional, default=1): Specifies the initial fraction of the ZC penalty update compared to the previous iteration (0 < zc_upd <= 1).
zc_updmin (float, optional, default=0.2): Minimum ZC penalty update fraction. Once this value is reached, the update rate does not decrease further.
zc_downrate (float, optional, default=0.05): After the iteration count has passed zc_down, the fraction of the ZC penalty update decreases by this amount every iteration.
zc_on (int, optional, default=4): The iterative procedure may converge more quickly if initially a few iterations are done without ZC regularization. This parameter specifies the first iteration with ZC regularization turned on.
zc_nmax (int, optional, default=3): To calculate the coefficient matrix for the zero-crossing penalty, a moving maximum is applied over the result of a finite difference operator applied to the sign of the target distribution estimate from the previous iteration. The size of the box for the moving maximum is zc_nmax.
dim_ndim (ndarray, optional, default=None): In case of a multi-dimensional data set, if the last dimension is not inverted but regularized, i.e. an identity kernel is chosen in that dimension, ILTpy will by default use an array with a spacing of 1 as the input (and output) sampling. This behavior can be changed by specifying dim_ndim; ILTpy then uses the array specified by dim_ndim as the input sampling. If features change smoothly along a non-inverted dimension, a smoother distribution with potentially narrower features may be obtained.
tau (ndarray or list of ndarray, optional, default=None): Vector containing the values of the independent variable along the dimension of the target distribution. Can be linearly or logarithmically spaced.
conv_limit (float, default=1e-6): Convergence is reached if the norm of the difference of the target distribution between two consecutive iterations, relative to the norm of the distribution, is below this value.
Cz (list of ndarray, optional, default=[np.array([0])]): Contains the zero-crossing term of the regularization. It is updated during each iteration and is only needed as an input if an inversion is interrupted after a certain number of iterations and later continued. The output can be used to inspect the ZC contribution to the regularization matrix.
store_g (bool, optional, default=False): If True, g (distribution or spectrum) is stored in IltData.g_list after each iteration.
store_gamma (bool, optional, default=False): If True, the regularization matrix is stored in IltData.gamma_list after each iteration.
force_sparse (bool, optional, default=False): If True, all operations for alt_g = 0 will use matrices in sparse format, irrespective of the sparsity of the kernel matrices.
sparse_threshold (float, optional, default=0.92): Defines the minimum sparsity level required for kernel matrices to be treated as sparse in case of alt_g = 0. For example, a value of 0.9 means that a matrix must have more than 90% zero entries to be considered sparse.
References
[1] Borgia, G. C.; Brown, R. J. S.; Fantazzini, P. J. Magn. Reson. 1998, 132, 65–77.
[2] Venkataramanan, L.; Song, Y.-Q.; Hürlimann, M. D. IEEE Trans. Signal Process. 2002, 50, 1017–1026.
[3] Granwehr, J.; Roberts, P. J. J. Chem. Theory Comput. 2012, 8, 3473–3482.