Under what conditions does a binomial distribution tend to a Poisson distribution?

Main contents

Definitions
Standard errors
Skew & kurtosis
Their distributions
Assumptions
Mass probability functions

Definitions

Both the binomial and the Poisson distributions can arise in two ways: from the terms of a discrete mathematical series, and by repeated random samples of a binary variable. For example, if the two states of the binary variable are success and failure, their probability functions describe the average distribution of a simple statistic, the sum (or frequency) of successes in a sample, when it is calculated from many identical random samples of an infinite population. Although it is often used differently, the Poisson model is merely a limiting case of the binomial one - and makes very much the same assumptions.

Binomial distribution
The binomial distribution gives the number of 'successes' in a series of independent trials or random samples where the probability of success remains constant from one sample to the other - and where random selection is the only source of variation. A success is defined as having a particular characteristic; a failure lacks that characteristic. The expected probabilities of the binomial distribution are given by the expansion of the following expression:
Algebraically speaking -
(P1 + P0)^n = [P + (1 − P)]^n
where:
- P1 (or P) is the proportion of the population having the characteristic
- P0 (or Q) is the proportion lacking the characteristic, (1 − P)
- n is the number of observations comprising your sample
Simple arithmetic tells us that, irrespective of the sample size, [P + (1 − P)]^n = 1. This corresponds to the total probability under the binomial distribution. For a sample of n binary observations there are n+1 possible outcomes of that sample - ranging from no successes to n successes - and [P + (1 − P)]^n can be expanded into n+1 terms, each of which gives the probability of observing a given number of successes, r.
If we code a success as a one and failure as a zero, then the sum of a binary sample (ΣY) is equal to the frequency of successes ( f ). The mean (ΣY / n) of a binary population is equal to the proportion of successes (P), and the expected mean frequency of successes in samples of variable Y is equal to Pn. Assuming observations are independent and random samples of an infinite binary population, the sample sum ( f ) and mean ( p ) provide unbiased estimates of their population equivalents (Pn, and P).
The figure below shows the observed and expected frequencies of successes in random samples of 8 observations on a variable Y, where P = 0.333. Expected frequencies are calculated using the binomial mass probability function (given below).

{Fig. 1}

The distribution is unimodal, in this case with a mode at 2-3 successes for a sample size of 8, as would be expected for P=0.333.
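
The expected probabilities for a case like this one can be sketched in a few lines of Python. This is only an illustration, not the method used to draw the figure: the helper name binomial_pmf is hypothetical, and P is assumed to be exactly 1/3 rather than the rounded 0.333 quoted above.

```python
from math import comb

def binomial_pmf(r, n, p):
    """Probability of exactly r successes in n independent trials (hypothetical helper)."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

n, p = 8, 1 / 3  # assumed values, chosen to match the example in the text
probs = [binomial_pmf(r, n, p) for r in range(n + 1)]

for r, pr in enumerate(probs):
    print(f"r = {r}: {pr:.4f}")

print("number of terms:", len(probs))               # n + 1 possible outcomes
print("total probability:", round(sum(probs), 10))  # the terms sum to 1
print("expected frequency Pn:", n * p)
```

With P exactly 1/3 the terms for 2 and 3 successes are equal (each is 1792/6561), which is why the mode spans 2-3 successes.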

Poisson distribution
Sometimes, when sampling a binomial variable, the probability of observing the event is very small (that is, P tends to zero) and the sample size is large (that is, n tends to infinity). This might be the case, for example, if we were looking at the incidence of a rare disease, where only one in ten thousand people is affected. In such a situation it is difficult and tedious to estimate expected probabilities from the binomial distribution.
Fortunately another distribution approximates the results of the binomial distribution under these circumstances - this is known as the Poisson distribution.
Since the assumption of independence of events applies for the Poisson distribution, just as it does for the binomial, one of the commonest uses of the Poisson distribution is as a test of randomness in space or time.
Since n is tending to infinity, and P is tending to zero, the sample mean is not a meaningful statistic - and the sum is used instead. Hence the sum (ΣY, or f) is an estimate of Pn - which is the expected mean frequency (λ). Since P and n are combined, λ is the only parameter of the Poisson distribution.
In reality, P is assumed to be small (but not zero), and n to be large (but not infinite). Happily, as you can see from the graphs below, the Poisson distribution becomes a close approximation to the binomial distribution fairly rapidly - even at relatively modest sample sizes.

{Fig. 2}
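
The same convergence can be checked numerically. The sketch below is an illustration under stated assumptions, not a reproduction of the graphs: the expected frequency λ = 2 and the sample sizes are arbitrary choices, and the helper names are hypothetical. It holds λ = Pn constant while n grows, and reports the largest gap between the binomial and Poisson probabilities.

```python
from math import comb, exp, factorial

def binomial_pmf(r, n, p):
    """Binomial probability of r successes in n trials."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

def poisson_pmf(r, lam):
    """Poisson probability of r successes when the expected frequency is lam."""
    return lam**r / (factorial(r) * exp(lam))

def max_abs_difference(n, lam, r_max=20):
    """Largest gap between the two mass functions over r = 0 .. min(n, r_max)."""
    p = lam / n
    return max(abs(binomial_pmf(r, n, p) - poisson_pmf(r, lam))
               for r in range(min(n, r_max) + 1))

lam = 2.0  # assumed expected frequency, held constant while n grows
diffs = {n: max_abs_difference(n, lam) for n in (10, 50, 100, 1000)}
for n, d in diffs.items():
    print(f"n = {n:5d}, P = {lam / n:.4f}, max |binomial - Poisson| = {d:.5f}")
```

The gap shrinks steadily as n increases and P decreases, which is the sense in which the binomial tends to the Poisson.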

One important advantage of using the Poisson formulae is that the calculations tend to be rather more straightforward than those of the binomial. In addition the indeterminate sample size makes it applicable to a wider range of situations. For example, if you are attempting a visual count of wildlife, each animal seen can be counted as a success - each not seen is a failure. Then, assuming the probability of success is quite small, we can still estimate the standard error of our counts - provided our other assumptions are reasonable (see below).

Standard errors

Binomial distribution

The standard error of frequencies and proportions (sums and means of a binary variable) can be estimated in the same way as for those of a continuous variable - most conveniently from the sample variance formula, pq. Notice that this estimate of the standard error makes the same assumptions as the binomial distribution - and may be referred to as the binomial standard error.

Algebraically speaking -

The standard deviation of a sample of variable Y is: s = √(pq)
The estimated standard error of the frequency (f or ΣY) is: SE(f) = √(npq)
The estimated standard error of the proportion (p, or ΣY/n) is: SE(p) = √(pq/n)

Where

Y is a binary variable,
n is the number of observations in your sample,
f is the number (or frequency) of successes in that sample,
p = f/n = ΣY/n = the proportion of successes in that sample
q = [n − f]/n = [1 − p] = the proportion of failures in that sample.
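
As a rough sketch, these quantities can be computed directly from a binary sample. It assumes the usual binomial estimates built on the sample variance pq, namely √(pq) for the standard deviation, √(npq) for the frequency and √(pq/n) for the proportion. The function name and the sample are made up for illustration.

```python
from math import sqrt

def binomial_standard_errors(sample):
    """Return (p, s, SE(f), SE(p)) for a list of 0/1 observations (hypothetical helper)."""
    n = len(sample)
    f = sum(sample)          # frequency of successes, the sum of Y
    p = f / n                # proportion of successes
    q = 1 - p                # proportion of failures
    s = sqrt(p * q)          # standard deviation of Y, from the variance pq
    se_f = sqrt(n * p * q)   # standard error of the frequency
    se_p = sqrt(p * q / n)   # standard error of the proportion
    return p, s, se_f, se_p

# Made-up sample: 20 observations, 5 of them successes
sample = [1] * 5 + [0] * 15
p, s, se_f, se_p = binomial_standard_errors(sample)
print(f"p = {p:.3f}, s = {s:.4f}, SE(f) = {se_f:.4f}, SE(p) = {se_p:.4f}")
```

Note that SE(f) is simply n times SE(p), since the frequency is n times the proportion.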

Poisson distribution

Where the proportion of successes in a population is very small, and the sample size is very large, the frequency of successes in the sample is used as an estimate of their frequency in similarly large random samples of their population. In other words, f is used as an estimate of λ or Pn. Moreover, the variance of the observed frequencies is equal to the mean (expected) frequency.

In this situation the Poisson standard error formulae may be simplified as follows:

Algebraically speaking -

For a single sample, the estimated Poisson standard error of the frequency (f) is simply: SE(f) = √f
For N samples of the same population, of unknown but presumably equal size, the estimated standard error of their mean frequency (m) is: SE(m) = √(m/N)
For a single sample of the same population, of n observations, the estimated standard error of the proportion (or 'rate') is: SE(ρ) = √f / n

where:

f is the number of successes in your sample of the binary variable Y;
SE(f) is the estimated standard error of a single frequency;
SE(m) is the estimated standard error of the mean frequency of N samples;
SE(ρ) is the standard error of the proportion, or rate;
n is a known sample size.
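
A minimal sketch of these three estimates follows. It assumes the standard Poisson forms that follow from the variance being equal to the mean: √f for a single frequency, √(m/N) for the mean of N counts, and √f / n for a rate. The function names and the counts are hypothetical.

```python
from math import sqrt

def poisson_se_frequency(f):
    """Standard error of a single count f (variance assumed equal to the mean)."""
    return sqrt(f)

def poisson_se_mean(counts):
    """Standard error of the mean frequency m of N counts."""
    N = len(counts)
    m = sum(counts) / N
    return sqrt(m / N)

def poisson_se_rate(f, n):
    """Standard error of the rate f/n when the sample size n is known."""
    return sqrt(f) / n

# Made-up data: 16 cases among 10000 people, and five quadrat counts
print(poisson_se_frequency(16))
print(poisson_se_rate(16, 10000))
print(poisson_se_mean([3, 5, 4, 6, 2]))
```

None of these needs the sample size n except the rate, which is what makes the Poisson formulae convenient when n is large or undefined.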

Skew & kurtosis

The skew and kurtosis of binomial and Poisson populations, relative to a normal one, can be calculated as follows:

Binomial distribution
Skew = (Q − P) / √(nPQ)
Kurtosis = 3 − 6/n + 1/(nPQ)
where:
- n is the number of observations in each sample,
- P = the proportion of successes in that population,
- Q = the proportion of failures in that population.
Poisson distribution
Skew = 1 / √λ
Kurtosis = 3 + 1/λ
where:
- λ = Pn = the expected frequency of successes in samples of that population,
- P = the proportion of successes in that population (and approaches zero),
- n is the number of observations (failures + successes) in each sample (and approaches infinity).
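
The two sets of formulae can be compared numerically. The sketch below uses hypothetical function names and an arbitrary expected frequency λ = 4. It holds λ = Pn fixed while n increases, so that the binomial skew and kurtosis can be seen approaching the Poisson values.

```python
from math import sqrt

def binomial_skew_kurtosis(n, p):
    """Skew and kurtosis of a binomial distribution, using the formulae above."""
    q = 1 - p
    skew = (q - p) / sqrt(n * p * q)
    kurtosis = 3 - 6 / n + 1 / (n * p * q)
    return skew, kurtosis

def poisson_skew_kurtosis(lam):
    """Skew and kurtosis of a Poisson distribution, using the formulae above."""
    return 1 / sqrt(lam), 3 + 1 / lam

lam = 4.0  # assumed expected frequency
for n in (10, 100, 10000):
    skew, kurt = binomial_skew_kurtosis(n, lam / n)
    print(f"binomial n = {n:5d}: skew = {skew:.4f}, kurtosis = {kurt:.4f}")

p_skew, p_kurt = poisson_skew_kurtosis(lam)
print(f"Poisson lambda = {lam}: skew = {p_skew:.4f}, kurtosis = {p_kurt:.4f}")
```

When P = 0.5 the binomial skew is zero, and as n grows with Pn fixed both statistics converge on the Poisson ones.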

Their distributions

The binomial distribution
Although the binomial is a discrete distribution function, in some ways the sums (= frequencies) and means (= proportions) of binary variables behave very similarly to those of continuous variables.
In relation to sample size (n):
- The distribution of means and sums of the smallest samples approaches that of the observations they represent.
- The distribution of means and sums of large samples tends towards a normal distribution.
In relation to the proportion of successes (P):
- For small and large values of P, the distributions are skewed.
- As P approaches 0.5, the distribution approximates to the normal distribution.

{Fig. 3}

For computational convenience therefore, a normal distribution function is commonly used as an approximation to the binomial one, providing PQn is at least 5. However, we still have the problem that we are using a continuous distribution to approximate a discrete one. This is commonly dealt with by using a continuity correction, which consists of shifting the frequency by 0.5: subtracting 0.5 when finding the probability of that frequency or more, and adding 0.5 when finding the probability of that frequency or fewer.

For example, the first graph below shows a normal approximation to a binomial function. Even though PQn is less than 5, the normal density function is not too implausible a fit. However the cumulative function shows a clear difference (of half a unit) between their locations.

{Fig. 4}

For large sample sizes the continuity correction may be ignored, but for moderate sample sizes it is generally required when a continuous function is used to approximate a discrete one. Be aware, however, that although continuity corrections are built into quite a few textbook formulae, not all statisticians agree upon when (or indeed whether) they should be used.
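
The effect of the correction can be illustrated with a small calculation. This is a sketch under assumed values (n = 30, P = 0.4, so PQn = 7.2, and a cut-off of 10 successes), with hypothetical helper names. Because it evaluates a lower-tail probability, P(f ≤ k), half a unit is added to k; for an upper-tail probability half a unit would be subtracted instead.

```python
from math import comb, erf, sqrt

def binomial_cdf(k, n, p):
    """Exact probability of k or fewer successes in n trials."""
    return sum(comb(n, r) * p**r * (1 - p)**(n - r) for r in range(k + 1))

def normal_cdf(x, mean, sd):
    """Cumulative normal probability, via the error function."""
    return 0.5 * (1 + erf((x - mean) / (sd * sqrt(2))))

n, p = 30, 0.4                      # assumed example values
mean = n * p                        # expected frequency, Pn
sd = sqrt(n * p * (1 - p))          # binomial standard error of the frequency
k = 10

exact = binomial_cdf(k, n, p)
uncorrected = normal_cdf(k, mean, sd)
corrected = normal_cdf(k + 0.5, mean, sd)  # continuity correction for P(f <= k)

print(f"exact binomial      : {exact:.4f}")
print(f"normal, uncorrected : {uncorrected:.4f}")
print(f"normal, corrected   : {corrected:.4f}")
```

For these values the corrected approximation lies closer to the exact binomial probability than the uncorrected one.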

The Poisson distribution
Like the binomial distribution the Poisson is discrete, and for large values of λ it approaches normal. For small expected frequencies, like the binomial, it is markedly skewed. Where the frequency is 5 or above the normal distribution is often used as an approximation - usually with a continuity correction.

{Fig. 5}

Assumptions

Because the binomial and Poisson models are often used descriptively, or merely assumed to apply, their assumptions tend to get ignored. This is partly because they tend to be used in different ways, some of which require additional assumptions.

- There is a series of n experiments, or n observations - the outcome of which is unknown beforehand.
- There are two possible outcomes for each trial.
- The outcomes are mutually exclusive.
- The outcomes are independent of each other.
- The probability, P, of the outcome remains constant from one trial to the next.
- The Poisson model also assumes n is large and P is small.

Where observations are recorded as groups - for example using a quadrat - you are assuming the observations within each group behave as if they were selected individually at random. For this to be reasonable, some additional assumptions must be made:

- The groups you select represent the population you are interested in.
- The probability of the outcome, P, is the same for all groups - and any variation in either p or f is entirely due to chance.
- The sample size, n, is the same for all groups.
- Although n may be undefined for the Poisson model, Pn is assumed to remain constant.

If you use a complete count of events in a time period, or of organisms in an area, then the issue of how the sample is taken does not arise. But if you are taking a sample, for example counting the number of insects within each of a number of quadrats, the quadrats are assumed to be randomly located.

When the binomial or Poisson models are used to estimate standard errors, or to predict how p or f will vary, it is further assumed that there is no source of variation (such as measurement error) other than random selection.

If deviations from the binomial or Poisson distributions are used as estimates of non-randomness, it is assumed that all the other assumptions are met.

N.B. Although a Poisson model is expected to produce counts whose variance and mean are equal, the converse does not apply. Nor is a variance greater than the mean a reliable measure of 'aggregation'.

Mass probability functions

The binomial distribution
The parameters P and Q are usually estimated from the numbers in a sample with and without the characteristic. Alternatively they may be obtained from theory such as in genetics studies. The probability of getting r individuals with the characteristic in n observations can then be determined from the general formula for the binomial distribution:
Algebraically speaking -

P(ΣY = r) = [n! / (r! (n − r)!)] P^r Q^(n − r)   for r = 0, 1, 2 ... n

where:
- P(ΣY = r) is the probability of obtaining a given number (r) of successes in a series of independent trials or in a random sample,
- r is the number of successes in a sample (= ΣY),
- (n − r) is the number of failures in a sample (= n − ΣY),
- n is the number of observations,
- P is the proportion of successes in the population,
- Q is the proportion of failures in the population
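
As a worked check of the formula, take a sample of n = 8 observations and find the probability of r = 2 successes. P is assumed here to be exactly 1/3, so Q = 2/3; these values are chosen only for illustration.

```latex
\begin{aligned}
P(\Sigma Y = 2) &= \frac{8!}{2!\,6!}\left(\frac{1}{3}\right)^{2}\left(\frac{2}{3}\right)^{6} \\
                &= 28 \times \frac{1}{9} \times \frac{64}{729} \\
                &= \frac{1792}{6561} \approx 0.273
\end{aligned}
```

So a little over a quarter of such samples would be expected to contain exactly two successes.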
The Poisson distribution
The following expression gives the probabilities for each frequency class:
Algebraically speaking -

P(ΣY = r) = λ^r / (r! e^λ)   for r = 0, 1, 2 ... ∞

where