# Distributions

Heuristics for fitting duration distributions with sums of exponentially distributed waiting times. This module should be considered work in progress. It works, but it can be improved considerably, e.g. by imposing an ordering on the waiting times so that the permutation symmetry in parameter space can be exploited during fitting.

class `epipack.distributions.ExpChain`(durations)

Bases: `object`

A class that represents a chain of states where the waiting time between consecutive states is exponentially distributed with a predetermined mean. The class can be used to fit total waiting time distributions, i.e. distributions of sums of exponential random variables.

Parameters

durations (numpy.ndarray of float) -- mean waiting times between states in the chain

Example

```
>>> from epipack.distributions import ExpChain
>>> C = ExpChain([0.2,0.4])
>>> C.get_mean()
0.6
```
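The statement that the chain represents a sum of exponential random variables can be checked by direct sampling. The following is a minimal stand-alone sketch that does not use epipack; the variable names, the sample size, and the seed are illustrative assumptions.

```python
import numpy as np

# Stand-alone illustration (assumption: not part of epipack).
durations = np.array([0.2, 0.4])   # mean waiting time of each transition

rng = np.random.default_rng(seed=1)
# one row per sample, one column per transition in the chain
samples = rng.exponential(scale=durations, size=(200_000, len(durations)))
total_waiting_time = samples.sum(axis=1)

# the sample mean should lie close to the sum of the mean waiting times
sample_mean = total_waiting_time.mean()
print(sample_mean, durations.sum())
```

Because the total waiting time is a plain sum, its mean is the sum of the individual means, which is the value `get_mean()` reports in the example above.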
`dydt`(t, y)

The ODE that performs the transitions between states.
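For orientation, a linear chain with n transitions can be written as the following system. This is a sketch under the assumption that y_i is the probability of being in state i and τ_i is the mean waiting time for leaving it; the exact state layout used inside `dydt` is not stated here.

```latex
\begin{aligned}
\frac{dy_0}{dt} &= -\frac{y_0}{\tau_0}, \\
\frac{dy_i}{dt} &= \frac{y_{i-1}}{\tau_{i-1}} - \frac{y_i}{\tau_i}, \qquad 1 \le i \le n-1, \\
\frac{dy_n}{dt} &= \frac{y_{n-1}}{\tau_{n-1}}, \qquad y_0(0) = 1,\quad y_{i>0}(0) = 0 .
\end{aligned}
```

In this formulation the occupation of the final state, y_n(t), equals the CDF of the total waiting time.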

`get_cdf`(t=None, percentile_cutoff=0.9999, method='RK23')

Obtain the cumulative distribution function of the total waiting time this chain represents.

Parameters
• t (numpy.ndarray of float, default = None) -- Ordered array of time points for which the CDF should be returned. If `None`, `scipy.integrate.solve_ivp` will choose the points itself.

• percentile_cutoff (float, default = 1 - 1e-4) -- maximum value of the CDF

• method (str, default = 'RK23') -- Integration method that is passed to `scipy.integrate.solve_ivp`.

Returns

• t (numpy.ndarray of float) -- Ordered array of time points for which the CDF was computed.

• cdf (numpy.ndarray of float) -- the corresponding values of the cumulative distribution function
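How a CDF can be obtained by integrating a chain ODE is sketched below. This is not epipack's implementation: `chain_rhs`, the time grid, and the tolerances are assumptions made for illustration, and the closed form for two distinct mean waiting times serves only as a cross-check.

```python
import numpy as np
from scipy.integrate import solve_ivp

def chain_rhs(t, y, tau):
    """Hypothetical right-hand side of a linear chain; y[-1] is the final state."""
    flow = y[:-1] / tau          # probability flux out of each transient state
    dy = np.zeros_like(y)
    dy[:-1] -= flow              # probability leaving state i
    dy[1:] += flow               # probability arriving in state i + 1
    return dy

tau = np.array([0.2, 0.4])       # illustrative mean waiting times
y0 = np.zeros(len(tau) + 1)
y0[0] = 1.0                      # all probability starts in the first state

t = np.linspace(0.0, 5.0, 51)
sol = solve_ivp(chain_rhs, (t[0], t[-1]), y0, t_eval=t, args=(tau,),
                method="RK23", rtol=1e-8, atol=1e-10)
cdf = sol.y[-1]                  # probability of having reached the final state

# closed form for two distinct mean waiting times a and b (cross-check only)
a, b = tau
cdf_exact = 1.0 - (a * np.exp(-t / a) - b * np.exp(-t / b)) / (a - b)
print(np.max(np.abs(cdf - cdf_exact)))
```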

`get_cdf_at_percentiles`(percentiles, method='RK23')

Obtain the time points at which the cumulative distribution function of the total waiting time this chain represents reaches the given percentiles.

Parameters
• percentiles (numpy.ndarray of float) -- Ordered array of CDF values (between 0 and 1) for which the corresponding time points should be found.

• method (str, default = 'RK23') -- Integration method that is passed to `scipy.integrate.solve_ivp`.

Returns

• t (numpy.ndarray of float) -- Ordered array of time points at which the CDF reaches the given percentiles.
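One way to locate the times at which a CDF crosses given levels is the event mechanism of `scipy.integrate.solve_ivp`. The sketch below is an assumption about how such a lookup could be done, not a copy of epipack's code; `chain_rhs`, `make_event`, and the chosen levels are hypothetical.

```python
import numpy as np
from scipy.integrate import solve_ivp

def chain_rhs(t, y, tau):
    """Hypothetical right-hand side of a linear chain; y[-1] is the final state."""
    flow = y[:-1] / tau
    dy = np.zeros_like(y)
    dy[:-1] -= flow
    dy[1:] += flow
    return dy

def make_event(level):
    """Build an event function that is zero when the CDF equals `level`."""
    def event(t, y, tau):
        return y[-1] - level
    return event

tau = np.array([0.2, 0.4])       # illustrative mean waiting times
levels = [0.25, 0.5, 0.75]       # ordered CDF values to look up
events = [make_event(p) for p in levels]
events[-1].terminal = True       # stop once the largest level has been reached

y0 = np.zeros(len(tau) + 1)
y0[0] = 1.0
sol = solve_ivp(chain_rhs, (0.0, 1e3), y0, events=events, args=(tau,),
                method="RK23", rtol=1e-8, atol=1e-10)

# each entry of sol.t_events holds the crossing time(s) of one event function
t_at_levels = np.array([te[0] for te in sol.t_events])
print(t_at_levels)
```

Marking only the last event as terminal lets a single integration pass collect all crossing times, provided the levels are given in increasing order.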

`get_mean`()

Returns the mean waiting time of this chain.

`get_median_and_iqr`(method='RK23')

Returns the median and inter-quartile range of the waiting time distribution this chain represents.

Parameters

method (str, default = 'RK23') -- Integration method that is passed to `scipy.integrate.solve_ivp`.

Returns

• median (float) -- the median of the distribution

• iqr (numpy.ndarray of float) -- array of length 2 containing the inter-quartile range.
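For a chain with a single transition the median and quartiles have a closed form, which makes a convenient sanity check. The sketch below is stand-alone and uses an illustrative mean waiting time; whether epipack treats this case separately is not stated here.

```python
import numpy as np

tau = 2.0                                    # illustrative mean waiting time
percentiles = np.array([0.25, 0.5, 0.75])

# invert CDF(t) = 1 - exp(-t / tau) for each percentile
t_q = -tau * np.log(1.0 - percentiles)

median = t_q[1]
iqr = t_q[[0, 2]]                            # first and third quartile
print(median, iqr)
```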

`get_pdf`(t=None, percentile_cutoff=0.9999, method='RK23')

Obtain the probability density function of the total waiting time this chain represents. Uses `ExpChain.get_cdf()`.

Parameters
• t (numpy.ndarray of float, default = None) -- Ordered array of time points for which the CDF should be returned. If `None`, `scipy.integrate.solve_ivp` will choose the points itself.

• percentile_cutoff (float, default = 1 - 1e-4) -- maximum value of the CDF

• method (str, default = 'RK23') -- Integration method that is passed to `scipy.integrate.solve_ivp`.

Returns

• tmean (numpy.ndarray of float) -- Ordered array of bin midpoints for which the pdf was computed.

• pdf (numpy.ndarray of float) -- the corresponding values of the pdf.

• df (numpy.ndarray of float) -- the corresponding bin sizes
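The return values (bin midpoints, pdf values, bin sizes) suggest that the pdf is a finite difference of the CDF. A minimal sketch of that idea follows, assuming a single exponential with an illustrative mean so that the result can be compared with the exact density; the variable names mirror the documented return values but are otherwise hypothetical.

```python
import numpy as np

tau = 1.5                                  # illustrative mean waiting time
t = np.linspace(0.0, 10.0, 2001)
cdf = 1.0 - np.exp(-t / tau)               # CDF of a single exponential

dt = np.diff(t)                            # bin sizes
tmean = 0.5 * (t[1:] + t[:-1])             # bin midpoints
pdf = np.diff(cdf) / dt                    # probability mass per bin / bin size

pdf_exact = np.exp(-tmean / tau) / tau     # exact density at the midpoints
print(np.max(np.abs(pdf - pdf_exact)))
```

Evaluating the difference quotient at bin midpoints rather than at bin edges gives a second-order accurate estimate of the density.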

`epipack.distributions.fit_chain_by_cdf`(n, time_values, cdf, lower=1e-10, upper=1e10, percentile_cutoff=1 - 1e-15, x0=None)

Fit a chain of exponentially distributed random variables to a distribution where the cdf is known for several time points.

While there exist statistically sound measures to quantify the distance between two distributions, I found that the total mean squared distance consistently finds decent fits, so this function uses that measure until someone convinces me that another distance measure yields better results.

This whole thing should be considered heuristic patch work in any case.

Parameters
• n (int) -- number of transitions in the chain

• time_values (numpy.ndarray of float) -- Ordered array of time points for which the cdf is known.

• cdf (numpy.ndarray of float) -- Ordered array of corresponding CDF values.

• lower (float, default = 1e-10) -- lower bound of waiting times for each transition

• upper (float, default = 1e10) -- upper bound of waiting times for each transition

• percentile_cutoff (float, default = 1 - 1e-15) -- maximum value of the CDF up to which the ODE is integrated

• x0 (numpy.ndarray of float, default = None) -- array of length n that contains initial guesses of the chain's waiting times. If `None`, `x0` is going to contain the value `mean/n` n times, where `mean` is the mean of the distribution determined by `cdf`.

Returns

chain -- The chain that was fit to the given CDF.

Return type

ExpChain

Example

```
>>> from epipack.distributions import fit_chain_by_cdf
>>> median, iqr = 13.184775302968362, ( 7.81098765, 20.86713744)
>>> fit_C = fit_chain_by_cdf(3,[iqr[0], median, iqr[1]],[0.25,0.5,0.75])
>>> fit_C.get_median_and_iqr()
13.183969129892406, array([ 7.8109697 , 20.86699702])
>>> fit_C.tau
[9.22794388 0.75881288 5.72462722]
```
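The mean squared distance fit described above can be sketched without epipack. In the following, `chain_cdf` evaluates the CDF through a matrix exponential instead of an ODE solver, and the optimizer, the initial guess, and the target values are all assumptions chosen for illustration; epipack's own fitting routine may differ in each of these points.

```python
import numpy as np
from scipy.linalg import expm
from scipy.optimize import minimize

def chain_cdf(tau, times):
    """CDF of a sum of exponentials with means `tau` (hypothetical helper)."""
    rates = 1.0 / np.asarray(tau, dtype=float)
    n = len(rates)
    # transient part of the generator: leave state i at rate rates[i] towards i + 1
    S = np.diag(-rates) + np.diag(rates[:-1], k=1)
    start = np.zeros(n)
    start[0] = 1.0
    # survival probability = probability of still being in a transient state
    return np.array([1.0 - start @ expm(S * t) @ np.ones(n) for t in times])

def mean_squared_distance(tau, times, target_cdf):
    return np.mean((chain_cdf(tau, times) - target_cdf) ** 2)

# target CDF values generated from a known two-step chain (illustrative)
true_tau = np.array([0.2, 0.4])
times = np.array([0.25, 0.5, 1.0, 2.0])
target_cdf = chain_cdf(true_tau, times)

n = 2
x0 = np.array([0.1, 0.5])        # deliberately asymmetric initial guess
initial_distance = mean_squared_distance(x0, times, target_cdf)

result = minimize(mean_squared_distance, x0, args=(times, target_cdf),
                  bounds=[(1e-10, 1e10)] * n, method="L-BFGS-B")
fitted_tau = np.sort(result.x)   # ordering is arbitrary, so sort for comparison
print(initial_distance, result.fun, fitted_tau)
```

The distribution of a sum does not depend on the order of its terms, so every permutation of the fitted waiting times yields the same distance. In this sketch a perfectly symmetric initial guess would keep a gradient-based optimizer on the symmetric subspace, which is why an asymmetric guess is used.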
`epipack.distributions.fit_chain_by_median_and_iqr`(n, median, iqr, lower=1e-10, upper=1e10)

Fit a chain of exponentially distributed random variables to a distribution for which only the median and the inter-quartile range (IQR) are known.

Parameters
• n (int) -- number of transitions in the chain

• median (float) -- the median of the distribution to fit to

• iqr (2-tuple of float) -- the inter-quartile range of the distribution to fit to

• lower (float, default = 1e-10) -- lower bound of waiting times for each transition

• upper (float, default = 1e10) -- upper bound of waiting times for each transition

Returns

chain -- The chain that was fit to the median and iqr.

Return type

ExpChain

Example

```
>>> from epipack.distributions import ExpChain, fit_chain_by_median_and_iqr
>>> times = [0.3,6.,9,0.4]
>>> C = ExpChain(times)
>>> fit_C = fit_chain_by_median_and_iqr(3,*C.get_median_and_iqr())
>>> C.get_median_and_iqr()
13.184775302968362, array([ 7.81098765, 20.86713744])
>>> fit_C.get_median_and_iqr()
13.183969129892406, array([ 7.8109697 , 20.86699702])
>>> C.tau
[0.3, 6.0, 9, 0.4]
>>> fit_C.tau
[9.22794388 0.75881288 5.72462722]
```