Distributions
Heuristics to fit duration distributions to sums of exponentially distributed waiting times. This module has to be considered work in progress. It works, but can be much improved, e.g. by putting conditions on the ordering of waiting times such that the symmetry in parameter space can be utilized for fitting.

class epipack.distributions.ExpChain(durations)
Bases: object
A class that represents a chain of states where the waiting time between consecutive states is exponentially distributed with a predetermined mean. The class can be used to fit total waiting time distributions, i.e. distributions of sums of exponential random variables.
Parameters
durations (numpy.ndarray of float): mean waiting times between states in the chain
Example
>>> C = ExpChain([0.2,0.4])
>>> C.get_mean()
0.6
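To illustrate what such a chain represents, the following minimal sketch draws total waiting times by summing independent exponential samples, using only the Python standard library. It does not use epipack; `sample_total_waiting_time` is a hypothetical helper introduced here for illustration. The sample mean should come out close to the sum of the mean waiting times.

```python
import random

def sample_total_waiting_time(durations, rng):
    """Draw one total waiting time as a sum of exponential variates.

    `durations` holds the mean waiting time of each transition, so the
    rate handed to expovariate is 1/mean. (Hypothetical helper, not part
    of epipack.)
    """
    return sum(rng.expovariate(1.0 / tau) for tau in durations)

rng = random.Random(42)  # fixed seed for reproducibility
durations = [0.2, 0.4]
samples = [sample_total_waiting_time(durations, rng) for _ in range(200_000)]
sample_mean = sum(samples) / len(samples)

print("sum of mean waiting times:", sum(durations))
print("sample mean of total time:", sample_mean)
```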

get_cdf(t=None, percentile_cutoff=0.9999, method='RK23')
Obtain the cumulative distribution function (CDF) of the total waiting time this chain represents.
Parameters
t (numpy.ndarray of float, default = None): Ordered array of time points for which the CDF should be returned. If None, scipy.integrate.solve_ivp will choose the points itself.
percentile_cutoff (float, default = 1 - 1e-4): maximum value of the CDF up to which to integrate
method (str, default = 'RK23'): integration method that is passed on to scipy.integrate.solve_ivp
Returns
t (numpy.ndarray of float): Ordered array of time points for which the CDF was computed.
cdf (numpy.ndarray of float): the corresponding values of the cumulative distribution function
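The reference to solve_ivp suggests that the CDF is obtained by integrating a system of ordinary differential equations. A minimal sketch under that assumption: probability flows through the chain state by state, and the probability that has reached the final (absorbing) state at time t is the CDF of the total waiting time. A hand-written fixed-step RK4 integrator stands in for scipy.integrate.solve_ivp here so that the example needs no third-party packages; all function names are hypothetical and this is not epipack's implementation. For two distinct mean waiting times the result can be checked against the known closed form.

```python
import math

def chain_rhs(p, tau):
    """Right-hand side of the master equation for a linear chain.

    p[i] is the probability of being in state i; p[-1] is the absorbing
    end state, so p[-1](t) is the CDF of the total waiting time.
    """
    dp = [0.0] * (len(tau) + 1)
    for i, mean_time in enumerate(tau):
        flow = p[i] / mean_time  # probability flux from state i to i+1
        dp[i] -= flow
        dp[i + 1] += flow
    return dp

def cdf_by_integration(tau, t_end, steps=2000):
    """Integrate the chain with classical RK4 and return (times, cdf)."""
    h = t_end / steps
    p = [1.0] + [0.0] * len(tau)  # all probability starts in state 0
    times, cdf = [0.0], [0.0]
    for k in range(steps):
        k1 = chain_rhs(p, tau)
        k2 = chain_rhs([x + 0.5 * h * d for x, d in zip(p, k1)], tau)
        k3 = chain_rhs([x + 0.5 * h * d for x, d in zip(p, k2)], tau)
        k4 = chain_rhs([x + h * d for x, d in zip(p, k3)], tau)
        p = [x + h / 6.0 * (a + 2 * b + 2 * c + d)
             for x, a, b, c, d in zip(p, k1, k2, k3, k4)]
        times.append((k + 1) * h)
        cdf.append(p[-1])
    return times, cdf

def cdf_closed_form(t, tau1, tau2):
    """Closed-form CDF of the sum of two exponentials with distinct means."""
    return 1.0 - (tau1 * math.exp(-t / tau1)
                  - tau2 * math.exp(-t / tau2)) / (tau1 - tau2)

times, cdf = cdf_by_integration([0.2, 0.4], t_end=4.0)
print("numerical CDF at t_end  :", cdf[-1])
print("closed-form CDF at t_end:", cdf_closed_form(times[-1], 0.2, 0.4))
```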

get_cdf_at_percentiles(percentiles, method='RK23')
Obtain the cumulative distribution function of the total waiting time this chain represents, evaluated at the given percentiles.
Parameters
percentiles (array of float): percentiles of the distribution at which the CDF is evaluated
method (str, default = 'RK23'): integration method that is passed on to scipy.integrate.solve_ivp
Returns
t (numpy.ndarray of float): Ordered array of time points for which the CDF was computed.
cdf (numpy.ndarray of float): the corresponding values of the cumulative distribution function

get_median_and_iqr(method='RK23')
Returns the median and interquartile range (IQR) of the waiting time distribution this chain represents.
Parameters
method (str, default = 'RK23'): integration method that is passed on to scipy.integrate.solve_ivp
Returns
median (float): the median of the distribution
iqr (numpy.ndarray of float): array of length 2 containing the bounds of the interquartile range, i.e. the 25th and 75th percentiles
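As an illustration of what the median and IQR mean for such a chain, the sketch below locates the times at which the CDF reaches 0.25, 0.5 and 0.75 by bisection. It assumes pairwise distinct mean waiting times, for which the CDF of the sum has a closed form; `hypoexp_cdf`, `quantile` and `median_and_iqr` are hypothetical names introduced here and are not part of epipack, which integrates the CDF numerically instead.

```python
import math

def hypoexp_cdf(t, tau):
    """Closed-form CDF of a sum of exponentials with distinct means tau."""
    total = 0.0
    for i, ti in enumerate(tau):
        coeff = 1.0
        for j, tj in enumerate(tau):
            if j != i:
                coeff *= ti / (ti - tj)  # requires ti != tj
        total += coeff * math.exp(-t / ti)
    return 1.0 - total

def quantile(tau, q, tol=1e-12):
    """Find t with CDF(t) = q by bisection on the monotone CDF."""
    lo, hi = 0.0, sum(tau)
    while hypoexp_cdf(hi, tau) < q:  # widen the bracket until it contains q
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if hypoexp_cdf(mid, tau) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def median_and_iqr(tau):
    """Return the median and the (25th, 75th) percentile pair."""
    return quantile(tau, 0.5), (quantile(tau, 0.25), quantile(tau, 0.75))

median, iqr = median_and_iqr([0.3, 6.0, 9.0, 0.4])
print("median:", median)
print("iqr   :", iqr)
```

For a single exponential with mean tau the same routine reproduces the textbook values tau*ln(2) for the median and tau*ln(4/3), tau*ln(4) for the quartiles.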

get_pdf(t=None, percentile_cutoff=0.9999, method='RK23')
Obtain the probability density function (PDF) of the total waiting time this chain represents. Uses ExpChain.get_cdf().
Parameters
t (numpy.ndarray of float, default = None): Ordered array of time points at which the underlying CDF is evaluated. If None, scipy.integrate.solve_ivp will choose the points itself.
percentile_cutoff (float, default = 1 - 1e-4): maximum value of the CDF up to which to integrate
method (str, default = 'RK23'): integration method that is passed on to scipy.integrate.solve_ivp
Returns
tmean (numpy.ndarray of float): Ordered array of bin midpoints for which the PDF was computed.
pdf (numpy.ndarray of float): the corresponding values of the PDF
df (numpy.ndarray of float): the corresponding bin sizes
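Since the return values are bin midpoints, density values and bin sizes, the PDF is presumably a finite-difference derivative of the CDF. A minimal sketch under that assumption, using a single exponential with known density as test input; `pdf_from_cdf` is a hypothetical helper and not epipack's implementation.

```python
import math

def pdf_from_cdf(t, cdf):
    """Approximate the density on each bin [t[k], t[k+1]] from CDF values.

    Returns the bin midpoints, the finite-difference density estimates
    and the bin sizes.
    """
    tmean = [0.5 * (a + b) for a, b in zip(t[:-1], t[1:])]
    dt = [b - a for a, b in zip(t[:-1], t[1:])]
    pdf = [(f_b - f_a) / h for f_a, f_b, h in zip(cdf[:-1], cdf[1:], dt)]
    return tmean, pdf, dt

tau = 2.0                                    # mean of the test distribution
t = [0.01 * k for k in range(1001)]          # time grid on [0, 10]
cdf = [1.0 - math.exp(-x / tau) for x in t]  # exact exponential CDF

tmean, pdf, dt = pdf_from_cdf(t, cdf)

# The probability mass summed over all bins equals the CDF increment.
mass = sum(p * h for p, h in zip(pdf, dt))
print("total mass on grid:", mass)
print("CDF increment     :", cdf[-1] - cdf[0])
```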

epipack.distributions.fit_chain_by_cdf(n, time_values, cdf, lower=1e-10, upper=10000000000.0, percentile_cutoff=0.999999999999999, x0=None)
Fit a chain of exponentially distributed random variables to a distribution whose CDF is known at several time points.
While there exist statistically sound measures to quantify the distance between two distributions, I found that the total mean squared distance consistently finds decent fits, so this function uses it until someone convinces me that another distance measure yields better results.
This whole thing should be considered heuristic patchwork in any case.
Parameters
n (int): number of transitions in the chain
time_values (numpy.ndarray of float): Ordered array of time points for which the CDF is known.
cdf (numpy.ndarray of float): Ordered array of corresponding CDF values.
lower (float, default = 1e-10): lower bound of waiting times for each transition
upper (float, default = 1e10): upper bound of waiting times for each transition
percentile_cutoff (float, default = 1 - 1e-15): maximum value of the CDF up to which to integrate
x0 (numpy.ndarray of float, default = None): array of length n that contains initial guesses of the chain's waiting times. If None, x0 will contain the value mean/n n times, where mean is the mean of the distribution determined by cdf.
Returns
chain: The chain that was fit to the given CDF.
Return type
ExpChain
Example
>>> median, iqr = 13.184775302968362, ( 7.81098765, 20.86713744)
>>> fit_C = fit_chain_by_cdf(3,[iqr[0], median, iqr[1]],[0.25,0.5,0.75])
>>> fit_C.get_median_and_iqr()
13.183969129892406, array([ 7.8109697 , 20.86699702])
>>> fit_C.tau
[9.22794388 0.75881288 5.72462722]
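The mean squared distance that this function is described as minimizing can be sketched as follows. This is only the objective, not the optimizer, and it is not epipack's code: `hypoexp_cdf` and `mean_squared_distance` are hypothetical helpers, and the closed-form CDF used here assumes pairwise distinct waiting times. The sketch compares the fitted waiting times from the example above with an arbitrary guess.

```python
import math

def hypoexp_cdf(t, tau):
    """Closed-form CDF of a sum of exponentials with distinct means tau."""
    total = 0.0
    for i, ti in enumerate(tau):
        coeff = 1.0
        for j, tj in enumerate(tau):
            if j != i:
                coeff *= ti / (ti - tj)  # requires ti != tj
        total += coeff * math.exp(-t / ti)
    return 1.0 - total

def mean_squared_distance(tau, time_values, cdf_values):
    """Mean squared difference between the chain CDF and the target CDF."""
    residuals = [hypoexp_cdf(t, tau) - target
                 for t, target in zip(time_values, cdf_values)]
    return sum(r * r for r in residuals) / len(residuals)

# Target: first quartile, median and third quartile from the example above.
time_values = [7.81098765, 13.184775302968362, 20.86713744]
cdf_values = [0.25, 0.5, 0.75]

fitted = [9.22794388, 0.75881288, 5.72462722]  # fit_C.tau from the example
guess = [1.0, 2.0, 3.0]                        # arbitrary waiting times

err_fitted = mean_squared_distance(fitted, time_values, cdf_values)
err_guess = mean_squared_distance(guess, time_values, cdf_values)
print("objective for fitted chain:", err_fitted)
print("objective for guess       :", err_guess)
```

An optimizer would vary the waiting times within the bounds lower and upper until this objective stops decreasing.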

epipack.distributions.fit_chain_by_median_and_iqr(n, median, iqr, lower=1e-10, upper=10000000000.0)
Fit a chain of exponentially distributed random variables to a distribution of which only the median and IQR are known.
Parameters
n (int): number of transitions in the chain
median (float): the median of the distribution to fit to
iqr (2-tuple of float): the interquartile range of the distribution to fit to
lower (float, default = 1e-10): lower bound of waiting times for each transition
upper (float, default = 1e10): upper bound of waiting times for each transition
Returns
chain: The chain that was fit to the median and IQR.
Return type
ExpChain
Example
>>> times = [0.3,6.,9,0.4]
>>> C = ExpChain(times)
>>> fit_C = fit_chain_by_median_and_iqr(3,*C.get_median_and_iqr())
>>> C.get_median_and_iqr()
13.184775302968362, array([ 7.81098765, 20.86713744])
>>> fit_C.get_median_and_iqr()
13.183969129892406, array([ 7.8109697 , 20.86699702])
>>> C.tau
[0.3, 6.0, 9, 0.4]
>>> fit_C.tau
[9.22794388 0.75881288 5.72462722]