Distributions

Heuristics for fitting duration distributions with sums of exponentially distributed waiting times. This module should be considered a work in progress. It works, but could be improved considerably, e.g. by constraining the ordering of the waiting times so that the permutation symmetry in parameter space can be exploited during fitting.
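The symmetry mentioned above can be illustrated with a two-step chain: swapping the two mean waiting times leaves the distribution of the total waiting time unchanged. The sketch below uses the textbook closed form for the sum of two exponentials with distinct means. The names two_step_cdf and canonical_order are hypothetical helpers for illustration and are not part of epipack; sorting is only one possible ordering constraint.

```python
import math

def two_step_cdf(t, tau_a, tau_b):
    """CDF of the sum of two exponential waiting times with distinct means."""
    weighted = tau_a * math.exp(-t / tau_a) - tau_b * math.exp(-t / tau_b)
    return 1.0 - weighted / (tau_a - tau_b)

def canonical_order(durations):
    """Hypothetical ordering constraint: always store durations sorted."""
    return sorted(durations)

forward = two_step_cdf(1.0, 0.2, 0.4)
swapped = two_step_cdf(1.0, 0.4, 0.2)

# Both orderings describe the same distribution, so a fit only needs to
# search one representative of each set of permutations.
print(math.isclose(forward, swapped))
print(canonical_order([0.4, 0.2]) == canonical_order([0.2, 0.4]))
```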

class epipack.distributions.ExpChain(durations)

Bases: object

A class that represents a chain of states in which the waiting time between consecutive states is exponentially distributed with a predetermined mean. The class can be used to fit total waiting time distributions, i.e. distributions of sums of exponential random variables.

Parameters: durations (numpy.ndarray of float) -- mean waiting times between states in the chain

Example

>>> C = ExpChain([0.2,0.4])
>>> C.get_mean()
0.6
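The mean in the example is simply the sum of the individual mean waiting times. A quick way to convince yourself is to sample the chain directly. This sketch uses only the standard library and does not call epipack; sample_total_waiting_time is a hypothetical helper.

```python
import random

def sample_total_waiting_time(durations, rng):
    """Draw one total waiting time for a chain with the given mean durations."""
    # random.expovariate expects the rate, i.e. 1 / mean
    return sum(rng.expovariate(1.0 / tau) for tau in durations)

rng = random.Random(42)
durations = [0.2, 0.4]
samples = [sample_total_waiting_time(durations, rng) for _ in range(100_000)]

# The empirical mean should be close to sum(durations)
empirical_mean = sum(samples) / len(samples)
print(empirical_mean, sum(durations))
```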

dydt(t, y): The ODE that performs the transitions between states.
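A minimal sketch of what such an ODE can look like, assuming a linear chain in which probability flows from state i to state i + 1 at rate 1/tau_i and the last entry of y is an absorbing end state. The function names, the extra durations argument and the fixed-step Runge-Kutta integrator are assumptions made for illustration. The class itself passes its ODE to solve_ivp and may be organised differently.

```python
import math

def chain_rhs(t, y, durations):
    """Rate of change of the occupation probabilities of a linear chain.

    y[i] is the probability of being in state i; the last entry is the
    absorbing end state. `durations` holds the mean waiting times.
    """
    dy = [0.0] * len(y)
    for i, tau in enumerate(durations):
        flow = y[i] / tau      # probability flux from state i to state i + 1
        dy[i] -= flow
        dy[i + 1] += flow
    return dy

def rk4_step(f, t, y, h, durations):
    """One classical Runge-Kutta step of size h."""
    k1 = f(t, y, durations)
    k2 = f(t + h / 2, [yi + h / 2 * k for yi, k in zip(y, k1)], durations)
    k3 = f(t + h / 2, [yi + h / 2 * k for yi, k in zip(y, k2)], durations)
    k4 = f(t + h, [yi + h * k for yi, k in zip(y, k3)], durations)
    return [yi + h / 6 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]

def integrate_chain(durations, t_end, steps=2000):
    """Start with all probability in state 0 and integrate up to t_end."""
    y = [1.0] + [0.0] * len(durations)
    h = t_end / steps
    for step in range(steps):
        y = rk4_step(chain_rhs, step * h, y, h, durations)
    return y

y_end = integrate_chain([2.0], 3.0)
cdf_numeric = y_end[-1]                 # probability of having left the chain
cdf_exact = 1.0 - math.exp(-3.0 / 2.0)  # single exponential with mean 2
print(cdf_numeric, cdf_exact)
```

In this picture the probability accumulated in the absorbing state at time t is the CDF of the total waiting time.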

get_cdf(t=None, percentile_cutoff=0.9999, method='RK23')

Obtain the cumulative distribution function of the total waiting time this chain represents.

Parameters

t (numpy.ndarray of float, default = None) -- Ordered array of time points for which the CDF should be returned. If None, scipy.integrate.solve_ivp will choose the points itself.
percentile_cutoff (float, default = 1 - 1e-4) -- maximum value of the CDF up to which to integrate
method (str, default = 'RK23') -- integration method that is passed to scipy.integrate.solve_ivp.

Returns

t (numpy.ndarray of float) -- Ordered array of time points for which the CDF was computed.
cdf (numpy.ndarray of float) -- the corresponding values of the cumulative distribution function
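For chains whose mean waiting times are pairwise distinct, the CDF also has a well-known closed form (the hypoexponential distribution), which can serve as an independent cross-check of the numerically integrated result. The sketch below is not how get_cdf works; hypoexponential_cdf is a hypothetical helper, and the formula breaks down when two durations coincide.

```python
import math

def hypoexponential_cdf(t, durations):
    """CDF of a sum of exponentials with pairwise distinct mean durations."""
    rates = [1.0 / tau for tau in durations]
    survival = 0.0
    for i, rate_i in enumerate(rates):
        # weight_i = product over j != i of rate_j / (rate_j - rate_i)
        weight = 1.0
        for j, rate_j in enumerate(rates):
            if j != i:
                weight *= rate_j / (rate_j - rate_i)
        survival += weight * math.exp(-rate_i * t)
    return 1.0 - survival

# Evaluate at the median that this page reports for the durations
# [0.3, 6., 9, 0.4]; the result should land near 0.5.
value_at_reported_median = hypoexponential_cdf(13.184775302968362,
                                               [0.3, 6.0, 9.0, 0.4])
print(value_at_reported_median)
```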

get_cdf_at_percentiles(percentiles, method='RK23')

Obtain the cumulative distribution function of the total waiting time this chain represents, evaluated at given percentiles.

Parameters

percentiles (numpy.ndarray of float) -- percentiles of the total waiting time distribution for which the CDF should be evaluated
method (str, default = 'RK23') -- integration method that is passed to scipy.integrate.solve_ivp.

Returns

t (numpy.ndarray of float) -- Ordered array of time points for which the CDF was computed.
cdf (numpy.ndarray of float) -- the corresponding values of the cumulative distribution function

get_mean(): Returns the mean waiting time of this chain.

get_median_and_iqr(method='RK23')

Returns the median and inter-quartile range of the waiting time distribution this chain represents.

Parameters

method (str, default = 'RK23') -- integration method that is passed to scipy.integrate.solve_ivp.

Returns

median (float) -- the median of the distribution
iqr (numpy.ndarray of float) -- array of length 2 containing the inter-quartile range.
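To make the returned quantities concrete, here is a sketch that extracts the median and the inter-quartile range from an arbitrary continuous CDF by bisection. This is a swapped-in technique for illustration (the method above relies on solve_ivp), and quantile_by_bisection and median_and_iqr are hypothetical names. A single exponential is used as the test case because its quartiles are known analytically.

```python
import math

def quantile_by_bisection(cdf, p, t_hi=1.0, tol=1e-12):
    """Find t with cdf(t) = p for a continuous, increasing CDF on t >= 0."""
    t_lo = 0.0
    while cdf(t_hi) < p:          # grow the bracket until it contains the quantile
        t_hi *= 2.0
    while t_hi - t_lo > tol * max(1.0, t_hi):
        mid = 0.5 * (t_lo + t_hi)
        if cdf(mid) < p:
            t_lo = mid
        else:
            t_hi = mid
    return 0.5 * (t_lo + t_hi)

def median_and_iqr(cdf):
    """Return the median and the pair (first quartile, third quartile)."""
    q25, q50, q75 = (quantile_by_bisection(cdf, p) for p in (0.25, 0.5, 0.75))
    return q50, (q25, q75)

tau = 2.0

def exponential_cdf(t):
    """CDF of a single exponential waiting time with mean tau."""
    return 1.0 - math.exp(-t / tau)

median, iqr = median_and_iqr(exponential_cdf)

# Analytical reference values for a single exponential with mean tau
print(median, tau * math.log(2.0))
print(iqr, (tau * math.log(4.0 / 3.0), tau * math.log(4.0)))
```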

get_pdf(t=None, percentile_cutoff=0.9999, method='RK23')

Obtain the probability density function of the total waiting time this chain represents. Uses ExpChain.get_cdf().

Parameters

t (numpy.ndarray of float, default = None) -- Ordered array of time points that is passed to ExpChain.get_cdf(). If None, scipy.integrate.solve_ivp will choose the points itself.
percentile_cutoff (float, default = 1 - 1e-4) -- maximum value of the CDF up to which to integrate
method (str, default = 'RK23') -- integration method that is passed to scipy.integrate.solve_ivp.

Returns

tmean (numpy.ndarray of float) -- Ordered array of bin midpoints for which the pdf was computed.
pdf (numpy.ndarray of float) -- the corresponding values of the pdf.
df (numpy.ndarray of float) -- the corresponding bin sizes
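The three return values (bin midpoints, density values, bin sizes) suggest a finite-difference estimate of the density from the CDF. The following sketch shows one plausible reading of that, assuming the density in each bin is the CDF increment divided by the bin size. pdf_from_cdf is a hypothetical helper and not the method's actual code.

```python
import math

def pdf_from_cdf(t, cdf):
    """Estimate a density from CDF samples by finite differences.

    Returns bin midpoints, density values and bin sizes, each one element
    shorter than the input arrays.
    """
    dt = [b - a for a, b in zip(t[:-1], t[1:])]
    tmean = [0.5 * (a + b) for a, b in zip(t[:-1], t[1:])]
    pdf = [(cb - ca) / h for ca, cb, h in zip(cdf[:-1], cdf[1:], dt)]
    return tmean, pdf, dt

# CDF of a single exponential with mean 1, sampled on a regular grid
t = [0.01 * k for k in range(1001)]
cdf = [1.0 - math.exp(-tk) for tk in t]

tmean, pdf, dt = pdf_from_cdf(t, cdf)

# Summing density times bin size recovers the probability mass covered by t
covered_mass = sum(p * h for p, h in zip(pdf, dt))
print(covered_mass, cdf[-1] - cdf[0])
```

Returning the bin sizes alongside the density makes it easy to integrate the result again, which matters when the time points are not equally spaced.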

epipack.distributions.fit_chain_by_cdf(n, time_values, cdf, lower=1e-10, upper=10000000000.0, percentile_cutoff=0.999999999999999, x0=None)

Fit a chain of exponentially distributed random variables to a distribution where the cdf is known for several time points.

While there exist statistically sound measures to quantify the distance between two distributions, I found that the total mean squared distance consistently finds decent fits, so this function uses it until someone convinces me that another distance measure yields better results.

This whole thing should be considered heuristic patchwork in any case.

Parameters

n (int) -- number of transitions in the chain
time_values (numpy.ndarray of float) -- Ordered array of time points for which the cdf is known.
cdf (numpy.ndarray of float) -- Ordered array of corresponding CDF values.
lower (float, default = 1e-10) -- lower bound of waiting times for each transition
upper (float, default = 1e10) -- upper bound of waiting times for each transition
percentile_cutoff (float, default = 1 - 1e-15) -- maximum value of the CDF up to which to integrate
x0 (numpy.ndarray of float, default = None) -- array of length n containing initial guesses for the chain's waiting times. If None, every entry of x0 is set to mean/n, where mean is the mean of the distribution determined by cdf.

Returns

chain -- The chain that was fit to the given CDF.

Return type

ExpChain
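The distance measure described above can be sketched as follows. The optimizer and the exact objective used by fit_chain_by_cdf are not spelled out here, so this example makes several assumptions: the helper names are hypothetical, the model is a single-step chain with a closed-form CDF, and a coarse grid search stands in for a proper bounded optimizer.

```python
import math

def mean_squared_distance(model_cdf, time_values, cdf):
    """Mean squared difference between a model CDF and target CDF values."""
    squared = [(model_cdf(t) - c) ** 2 for t, c in zip(time_values, cdf)]
    return sum(squared) / len(squared)

def exponential_cdf(tau):
    """Return the CDF of a single exponential waiting time with mean tau."""
    return lambda t: 1.0 - math.exp(-t / tau)

# Target: CDF values of a chain with one transition of mean duration 2
time_values = [1.0, 2.0, 4.0]
target_cdf = [exponential_cdf(2.0)(t) for t in time_values]

# Stand-in for the optimizer: scan candidate durations 0.5, 0.75, ..., 5.25
candidates = [0.5 + 0.25 * k for k in range(20)]
best_tau = min(
    candidates,
    key=lambda tau: mean_squared_distance(exponential_cdf(tau),
                                          time_values, target_cdf),
)
print(best_tau)
```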

Example

>>> median, iqr = 13.184775302968362, ( 7.81098765, 20.86713744)
>>> fit_C = fit_chain_by_cdf(3,[iqr[0], median, iqr[1]],[0.25,0.5,0.75])
>>> fit_C.get_median_and_iqr()
13.183969129892406, array([ 7.8109697 , 20.86699702])
>>> fit_C.tau
[9.22794388 0.75881288 5.72462722]

epipack.distributions.fit_chain_by_median_and_iqr(n, median, iqr, lower=1e-10, upper=10000000000.0)

Fit a chain of exponentially distributed random variables to a distribution for which only the median and the inter-quartile range (IQR) are known.

Parameters

n (int) -- number of transitions in the chain
median (float) -- the median of the distribution to fit to
iqr (2-tuple of float) -- the inter-quartile range of the distribution to fit to
lower (float, default = 1e-10) -- lower bound of waiting times for each transition
upper (float, default = 1e10) -- upper bound of waiting times for each transition

Returns

chain -- The chain that was fit to the median and iqr.

Return type

ExpChain
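The median and the two quartiles pin down the CDF at three points, which is the same input that the fit_chain_by_cdf example on this page uses. A small sketch of that conversion, with cdf_points_from_median_and_iqr as a hypothetical helper (whether the function does this internally is an assumption):

```python
def cdf_points_from_median_and_iqr(median, iqr):
    """Turn a median and an inter-quartile range into (time_values, cdf)."""
    q25, q75 = iqr
    if not q25 < median < q75:
        raise ValueError("expected first quartile < median < third quartile")
    # The quartiles and the median are the times at which the CDF
    # reaches 0.25, 0.5 and 0.75.
    return [q25, median, q75], [0.25, 0.5, 0.75]

time_values, cdf = cdf_points_from_median_and_iqr(
    13.184775302968362, (7.81098765, 20.86713744)
)
print(time_values, cdf)
```

The resulting pair could then be handed to fit_chain_by_cdf as time_values and cdf.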

Example

>>> times = [0.3,6.,9,0.4]
>>> C = ExpChain(times)
>>> fit_C = fit_chain_by_median_and_iqr(3,*C.get_median_and_iqr())
>>> C.get_median_and_iqr()
13.184775302968362, array([ 7.81098765, 20.86713744])
>>> fit_C.get_median_and_iqr()
13.183969129892406, array([ 7.8109697 , 20.86699702])
>>> C.tau
[0.3, 6.0, 9, 0.4]
>>> fit_C.tau
[9.22794388 0.75881288 5.72462722]