Poisson binomial distribution

Probability distribution

In probability theory and statistics, the Poisson binomial distribution is the discrete probability distribution of a sum of independent Bernoulli trials that are not necessarily identically distributed. The concept is named after Siméon Denis Poisson.

Poisson binomial
Parameters — success probabilities for each of the n trials
Support k ∈ { 0, …, n }
PMF
CDF
Mean
Variance
Skewness
Excess kurtosis
MGF
CF
PGF

In other words, it is the probability distribution of the number of successes in a collection of n independent yes/no experiments with success probabilities . The ordinary binomial distribution is a special case of the Poisson binomial distribution, when all success probabilities are the same, that is .

Definitions

Probability Mass Function

The probability of having k successful trials out of a total of n can be written as the sum [1]

 

where   is the set of all subsets of k integers that can be selected from  . For example, if n = 3, then  .   is the complement of  , i.e.  .

  will contain   elements, the sum over which is infeasible to compute in practice unless the number of trials n is small (e.g. if n = 30,   contains over 1020 elements). However, there are other, more efficient ways to calculate  .

As long as none of the success probabilities are equal to one, one can calculate the probability of k successes using the recursive formula [2] [3]

 

where

 

The recursive formula is not numerically stable, and should be avoided if   is greater than approximately 20.

An alternative is to use a divide-and-conquer algorithm: if we assume Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle n=2^{b}} is a power of two, denoting by Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle f(p_{i:j})} the Poisson binomial of Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle p_{i},\dots ,p_{j}} and Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle *} the convolution operator, we have Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle f(p_{1:2^{b}})=f(p_{1:2^{b-1}})*f(p_{2^{b-1}+1:2^{b}})} .

More generally, the probability mass function of a Poisson binomial can be expressed as the convolution of the vectors Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle P_{0},\dots ,P_{n}} where Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle P_{i}=[1-p_{i},p_{i}]} . This observation leads to the Direct Convolution (DC) algorithm for computing Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \Pr(K=0)} through Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \Pr(K=n)} :

// PMF and nextPMF begin at index 0
function DC(Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle p_{1},\dots ,p_{n}}
) is 
     declare new PMF array of size 1
     PMF[0] = [1]
     for i = 1 to Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle n}
 do 
          declare new nextPMF array of size i + 1
          nextPMF[0] = (1 -  ) * PMF[0]
          nextPMF[i] = Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle p_{i}}
 * PMF[i - 1]
          for k = 1 to i - 1 do
               nextPMF[k] = Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle p_{i}}
 * PMF[k - 1] + (1 - Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle p_{i}}
) * PMF[k]
          repeat
          PMF = nextPMF
     repeat
     return PMF
end function

Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \Pr(K=k)} will be found in PMF[k]. DC is numerically stable, exact, and, when implemented as a software routine, exceptionally fast for Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle n\leq 2000} . It can also be quite fast for larger Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle n} , depending on the distribution of the Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle p_{i}} .[4]

Another possibility is using the discrete Fourier transform.[5]

Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \Pr(K=k)={\frac {1}{n+1}}\sum \limits _{l=0}^{n}C^{-lk}\prod \limits _{m=1}^{n}\left(1+(C^{l}-1)p_{m}\right)}

where   and Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle i={\sqrt {-1}}} .

Still other methods are described in "Statistical Applications of the Poisson-Binomial and conditional Bernoulli distributions" by Chen and Liu[6] and in "A simple and fast method for computing the Poisson binomial distribution function" by Biscarri et al.[4]

Cumulative distribution function

The cumulative distribution function (CDF) can be expressed as:

Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \Pr(K\leq k)=\sum _{l=0}^{k}\sum \limits _{A\in F_{l}}\prod \limits _{i\in A}p_{i}\prod \limits _{j\in A^{c}}(1-p_{j})} ,

where Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle F_{l}} is the set of all subsets of size   that can be selected from  .

It can be computed by invoking the DC function above, and then adding elements   through   of the returned PMF array.

Properties

Mean and Variance

Since a Poisson binomial distributed variable is a sum of n independent Bernoulli distributed variables, its mean and variance will simply be sums of the mean and variance of the n Bernoulli distributions:

 
 

Entropy

There is no simple formula for the entropy of a Poisson binomial distribution, but the entropy is bounded above by the entropy of a binomial distribution with the same number parameter and the same mean. Therefore, the entropy is also bounded above by the entropy of a Poisson distribution with the same mean.[7]

The Shepp–Olkin concavity conjecture, due to Lawrence Shepp and Ingram Olkin in 1981, states that the entropy of a Poisson binomial distribution is a concave function of the success probabilities  .[8] This conjecture was proved by Erwan Hillion and Oliver Johnson in 2015.[9] The Shepp–Olkin monotonicity conjecture, also from the same 1981 paper, is that the entropy is monotone increasing in  , if all  . This conjecture was also proved by Hillion and Johnson, in 2019.[10]

Chernoff bound

The probability that a Poisson binomial distribution gets large, can be bounded using its moment generating function as follows (valid when   and for any  ):

 

where we took  . This is similar to the tail bounds of a binomial distribution.

Approximation by Binomial Distribution

A Poisson binomial distribution   can be approximated by a binomial distribution   where  , the mean of the  , is the success probability of  . The variances of   and   are related by the formula

 

As can be seen, the closer the   are to  , that is, the more the   tend to homogeneity, the larger  's variance. When all the  are equal to  ,   becomes  ,  , and the variance is at its maximum.[1]

Ehm has determined bounds for the total variation distance of   and  , in effect providing bounds on the error introduced when approximating   with  . Let   and   be the total variation distance of   and  . Then

 

 

where  .

Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle d(PB,B)} tends to 0 if and only if Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle Var(PB)/Var(B)} tends to 1.[11]

Approximation by Poisson Distribution

A Poisson binomial distribution Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle PB} can also be approximated by a Poisson distribution Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle Po} with mean Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \lambda =\sum _{i=1}^{n}\displaystyle p_{i}} . Barbour and Hall have shown that

Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle {\frac {1}{32}}\min({\frac {1}{\lambda }},1)\textstyle \sum _{i=1}^{n}\displaystyle p_{i}^{2}\leq d(PB,Po)\leq {\frac {1-\epsilon ^{-\lambda }}{\lambda }}\sum _{i=1}^{n}\displaystyle p_{i}^{2}}

where Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle d(PB,B)} is the total variation distance of Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle PB} and Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle Po} .[12] It can be seen that the smaller the Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle p_{i}} , the better Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle Po} approximates Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle PB} .

As Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle Var(Po)=\lambda =\sum _{i=1}^{n}\displaystyle p_{i}} and Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle Var(PB)=\sum \limits _{i=1}^{n}p_{i}-\sum \limits _{i=1}^{n}p_{i}^{2}} , Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle Var(Po)>Var(PB)} ; so a Poisson binomial distribution's variance is bounded above by a Poisson distribution with Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \lambda =\sum _{i=1}^{n}\displaystyle p_{i}} , and the smaller the Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle p_{i}} , the closer Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle Var(Po)} will be to Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle Var(PB)} .

Computational methods

The reference [13] discusses techniques of evaluating the probability mass function of the Poisson binomial distribution. The following software implementations are based on it:

  • An R package poibin was provided along with the paper,[13] which is available for the computing of the cdf, pmf, quantile function, and random number generation of the Poisson binomial distribution. For computing the PMF, a DFT algorithm or a recursive algorithm can be specified to compute the exact PMF, and approximation methods using the normal and Poisson distribution can also be specified.
  • poibin - Python implementation - can compute the PMF and CDF, uses the DFT method described in the paper for doing so.

See also

Lua error in mw.title.lua at line 346: bad argument #2 to 'title.new' (unrecognized namespace name 'Portal').

References

  1. ^ 1.0 1.1 Wang, Y. H. (1993). "On the number of successes in independent trials" (PDF). Statistica Sinica. 3 (2): 295–312.
  2. ^ Shah, B. K. (1994). "On the distribution of the sum of independent integer valued random variables". American Statistician. 27 (3): 123–124. JSTOR 2683639.
  3. ^ Chen, X. H.; A. P. Dempster; J. S. Liu (1994). "Weighted finite population sampling to maximize entropy" (PDF). Biometrika. 81 (3): 457. doi:10.1093/biomet/81.3.457.
  4. ^ 4.0 4.1 Biscarri, William; Zhao, Sihai Dave; Brunner, Robert J. (2018-06-01). "A simple and fast method for computing the Poisson binomial distribution function". Computational Statistics & Data Analysis. 122: 92–100. doi:10.1016/j.csda.2018.01.007. ISSN 0167-9473.
  5. ^ Fernandez, M.; S. Williams (2010). "Closed-Form Expression for the Poisson-Binomial Probability Density Function". IEEE Transactions on Aerospace and Electronic Systems. 46 (2): 803–817. Bibcode:2010ITAES..46..803F. doi:10.1109/TAES.2010.5461658. S2CID 1456258.
  6. ^ Chen, S. X.; J. S. Liu (1997). "Statistical Applications of the Poisson-Binomial and conditional Bernoulli distributions". Statistica Sinica. 7: 875–892.
  7. ^ Harremoës, P. (2001). "Binomial and Poisson distributions as maximum entropy distributions" (PDF). IEEE Transactions on Information Theory. 47 (5): 2039–2041. doi:10.1109/18.930936.
  8. ^ Shepp, Lawrence; Olkin, Ingram (1981). "Entropy of the sum of independent Bernoulli random variables and of the multinomial distribution". In Gani, J.; Rohatgi, V.K. (eds.). Contributions to probability: A collection of papers dedicated to Eugene Lukacs. New York: Academic Press. pp. 201–206. ISBN 0-12-274460-8. MR 0618689.
  9. ^ Hillion, Erwan; Johnson, Oliver (2015-03-05). "A proof of the Shepp–Olkin entropy concavity conjecture". Bernoulli. 23 (4B): 3638–3649. arXiv:1503.01570. doi:10.3150/16-BEJ860. S2CID 8358662.
  10. ^ Hillion, Erwan; Johnson, Oliver (2019-11-09). "A proof of the Shepp–Olkin entropy monotonicity conjecture". Electronic Journal of Probability. 24 (126): 1–14. arXiv:1810.09791. doi:10.1214/19-EJP380.
  11. ^ Ehm, Werner (1991-01-01). "Binomial approximation to the Poisson binomial distribution". Statistics & Probability Letters. 11 (1): 7–16. doi:10.1016/0167-7152(91)90170-V. ISSN 0167-7152.
  12. ^ Barbour, A.D.; Hall, Peter (1984). "On the Rate of Poisson Convergence" (PDF). Zurich Open Repository andArchive. Mathematical Proceedings of the Cambridge Philosophical Society, 95(3). pp. 473–480.
  13. ^ 13.0 13.1 Hong, Yili (March 2013). "On computing the distribution function for the Poisson binomial distribution". Computational Statistics & Data Analysis. 59: 41–51. doi:10.1016/j.csda.2012.10.006.