Complete Probability & Statistics 2 for Cambridge International AS & A Level
Second Edition
James Nicholson
OXFORD UNIVERSITY PRESS
Great Clarendon Street, Oxford, OX2 6DP, United Kingdom
Oxford University Press is a department of the University of Oxford. It furthers the University's objective of excellence in research, scholarship, and education by publishing worldwide. Oxford is a registered trade mark of Oxford University Press in the UK and in certain other countries.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without the prior permission in writing of Oxford University Press, or as expressly permitted by law, or under terms agreed with the appropriate reprographics rights organization. Enquiries concerning reproduction outside the scope of the above should be sent to the Rights Department, Oxford University Press, at the address above.
You must not circulate this book in any other binding or cover and you must impose this same condition on any acquirer.
British Library Cataloguing in Publication Data
Data available
ISBN 978-0-19-842517-5
10 9 8 7 6 5 4 3 2 1
Paper used in the production of this book is a natural, recyclable product made from wood grown in sustainable forests.
The manufacturing process conforms to the environmental regulations of the country of origin.
Printed in Great Britain by Bell and Bain Ltd, Glasgow.
The questions, all example answers and comments that appear in this book were written by the authors.
Acknowledgements
The publisher would like to thank the following for permission to reproduce photographs: p1: Norma Jean Gargasz/age fotostock; p2 (TL): Rackermann/iStockphoto; p2 (TR): Maica/iStockphoto; p2 (ML): dpa picture alliance/Alamy Stock Photo; p2 (MR): Tim Graham/Alamy Stock Photo; p2 (BL): Echo/Gettyimages; p2 (BR): Richard Wear/age fotostock; p13 (T): ERproductions Ltd/age fotostock; p13 (BL): Chungking/Shutterstock; p13 (BL): Phil Robinson/age fotostock; p14 (T): LoloStock/Shutterstock; p14 (BL): Dzimin/Fotolia; p14 (BR): Pixel Shepherd/age fotostock; p20: Leigh Prather/Shutterstock; p24: Gwright/Alamy Stock Photo; p28: Martin Plob/age fotostock; p47 (T): Bert de Ruiter/Alamy Stock Photo; p47 (M): Vadim Petrakov/Shutterstock; p47 (B): Danita Delmont/Shutterstock; p48: Caro Seeberg/Photoshot; p55 (TL): OJO_Images/iStockphoto; p55 (TR): Denis Kuvaev/Shutterstock; p60: Barna Tanko/Shutterstock; p62: Lucky-photographer/Shutterstock; p63: Dariazu/Shutterstock; p80: FloridaStock/Shutterstock; p86 (BL): James Steidl/Shutterstock; p86 (BR): Photology1971/Shutterstock; p112 (T): f11photo/Shutterstock; p112 (B): Ei Katsumata/age fotostock; p113: Frank Vetere/Alamy Stock Photo; p114: Ian Murray/age fotostock; p136: Javier Larrea/age fotostock; p160: Sarymsakov Andrey/Shutterstock; p176 (TL): MR1805/iStockphoto; p176 (TR): ssuaphoto/iStockphoto; p177: AshDesign/Shutterstock
Cover illustration by Ian Norris, Oxford University Press
1 The Poisson distribution
1.1 Introducing the Poisson distribution
1.2 The role of the parameter of the Poisson distribution
1.3 The recurrence relation for the Poisson distribution
1.4 Mean and variance of the Poisson distribution
1.5 Modelling with the Poisson distribution
2 Approximations involving the Poisson distribution
2.1 Poisson as an approximation to the binomial
2.2 The normal approximation to the Poisson distribution
3 Linear combination of random variables
3.1 Expectation and variance of a linear function of a random variable
3.2 Linear combination of two (or more) independent random variables
3.3 Expectation and variance of a sum of repeated independent observations of a random variable, and the mean of those observations
3.4 Comparing the sum of repeated independent observations with the multiple of a single observation
Maths in real-life: The mathematics of the past
4 Linear combination of Poisson and normal variables
4.1 The distribution of the sum of two independent Poisson random variables
4.2 Linear functions and combinations of normal random variables
5 Continuous random variables
6 Sampling
6.1 Populations, census and sampling
6.2 Advantages and disadvantages of sampling
6.3 Variability between samples and use of random numbers
6.4 The sampling distribution of a statistic
6.5 Sampling distribution of the mean of repeated observations of a random variable
6.6 Sampling distribution of the mean of a sample from a normal distribution
6.7 The Central Limit Theorem
6.8 Descriptions of some sampling methods
7 Estimation
7.1 Interval estimation
7.2 Unbiased estimate of the population mean
7.3 Unbiased estimate of the population variance
7.4 Confidence intervals for the mean of a normal distribution
7.5 Confidence intervals for the mean of a large sample from any distribution
7.6 Confidence intervals for a proportion
8 Hypothesis testing for discrete distributions
8.1 The logical basis for hypothesis testing
8.2 Critical region
8.3 Type I and Type II errors
8.4 Hypothesis test for the proportion p of a binomial distribution
8.5 Hypothesis test for the mean of a Poisson distribution
9 Hypothesis testing using the normal distribution
9.1 Hypothesis test for the mean of a normal distribution
9.2 Hypothesis test for the mean using a large sample
9.3 Using a confidence interval to carry out a hypothesis test
Introduction
About this book
This book has been written to cover the Cambridge International AS & A Level Mathematics (9709) course, and is fully aligned to the syllabus.
In addition to the main curriculum content, you will find:
● ‘Maths in real-life’, showing how principles learned in this course are used in the real world.
● Chapter openers, which outline how each topic in the Cambridge 9709 syllabus is used in real life.
The book contains the following features:
● Notes
● Did you know?
● Advice on calculator use
● Exam-Style Question
Throughout the book, you will encounter worked examples and a host of rigorous exercises. The examples show you the important techniques required to tackle questions. The exercises are carefully graded, starting from a basic level and going up to exam standard, allowing you plenty of opportunities to practise your skills. Together, the examples and exercises put maths in a real-world context, with a truly international focus.
At the start of each chapter, you will see a list of objectives covered in the chapter. These are drawn from the Cambridge AS and A Level syllabus. Each chapter begins with a Before you start section and ends with a Summary exercise and Chapter summary, ensuring that you fully understand each topic.
Each chapter contains key mathematical terms to improve understanding, highlighted in colour, with full definitions provided in the Glossary of terms at the end of the book.
The answers given at the back of the book are concise. However, you should show as many steps in your working as possible. All exam-style questions have been written by the author.
About the author
James Nicholson is an experienced teacher of mathematics at secondary level who taught for 12 years at Harrow School and spent 13 years as Head of Mathematics in a large Belfast grammar school. He is the author of two A Level statistics texts and editor of the Concise Oxford Dictionary of Mathematics. He has also contributed to a number of other sets of curriculum and assessment materials, is an experienced examiner and has acted as a consultant for UK government agencies on accreditation of new specifications.
James ran schools workshops for the Royal Statistical Society for many years, and has been a member of the Schools and Further Education Committee of the Institute of Mathematics and its Applications since 2000, including six years as chair, and is currently a member of the Community of Interest group for the Advisory Committee on Mathematics Education. He has served as a vice-president of the International Association for Statistics Education for four years, and is currently Chair of the Advisory Board to the International Statistical Literacy Project.
Student Book: Complete Probability & Statistics 2 for Cambridge International AS & A Level
Syllabus: Cambridge International AS & A Level Mathematics: Probability and Statistics 2 (9709)
PROBABILITY & STATISTICS 2
Syllabus overview Unit S2: Probability & Statistics 2 (Paper 6)
1. The Poisson distribution
• Calculate probabilities for the distribution Po(λ)
• Use the fact that if X ~ Po(λ) then the mean and variance of X are each equal to λ
• Understand the relevance of the Poisson distribution to the distribution of random events, and use the Poisson distribution as a model
• Use the Poisson distribution as an approximation to the binomial distribution where appropriate (n > 50 and np < 5, approximately)
• Use the normal distribution, with continuity correction, as an approximation to the Poisson distribution where appropriate (λ > 15, approximately)
2. Linear combinations of random variables
• Use, in the course of solving problems, the results that:
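– $\mathrm{E}(aX + b) = a\mathrm{E}(X) + b$ and $\mathrm{Var}(aX + b) = a^2\mathrm{Var}(X)$
– $\mathrm{E}(aX + bY) = a\mathrm{E}(X) + b\mathrm{E}(Y)$
– $\mathrm{Var}(aX + bY) = a^2\mathrm{Var}(X) + b^2\mathrm{Var}(Y)$ for independent X and Y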
– if X has a normal distribution then so does aX + b
– if X and Y have independent normal distributions then aX + bY has a normal distribution
– if X and Y have independent Poisson distributions then X + Y has a Poisson distribution
3. Continuous random variables
• Understand the concept of a continuous random variable, and recall and use properties of a probability density function (restricted to functions defined over a single interval)
• Use a probability density function to solve problems involving probabilities, and to calculate the mean and variance of a distribution (explicit knowledge of the cumulative distribution function is not included, but location of the median, for example, in simple cases by direct consideration of an area may be required)
4. Sampling and estimation
• Understand the distinction between a sample and a population, and appreciate the necessity for randomness in choosing samples
• Explain in simple terms why a given sampling method may be unsatisfactory (knowledge of particular sampling methods, such as quota or stratified sampling, is not required, but candidates should have an elementary understanding of the use of random numbers in producing random samples)
• Recognise that a sample mean can be regarded as a random variable, and use the facts that $\mathrm{E}(\bar{X}) = \mu$ and that $\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n}$
• Use the fact that $\bar{X}$ has a normal distribution if X has a normal distribution
• Use the Central Limit Theorem where appropriate
• Calculate unbiased estimates of the population mean and variance from a sample, using either raw or summarised data (only a simple understanding of the term ‘unbiased’ is required)
• Determine and interpret a confidence interval for a population mean in cases where the population is normally distributed with known variance or where a large sample is used
• Determine, from a large sample, an approximate confidence interval for a population proportion
5. Hypothesis tests
• Understand the nature of a hypothesis test, the difference between one-tail and two-tail tests, and the terms null hypothesis, alternative hypothesis, significance level, rejection region (or critical region), acceptance region and test statistic
• Formulate hypotheses and carry out a hypothesis test in the context of a single observation from a population which has a binomial or Poisson distribution, using
– direct evaluation of probabilities
– a normal approximation to the binomial or the Poisson distribution, where appropriate
• Formulate hypotheses and carry out a hypothesis test concerning the population mean in cases where the population is normally distributed with known variance or where a large sample is used
• Understand the terms Type I error and Type II error in relation to hypothesis tests
• Calculate the probabilities of making Type I and Type II errors in specific situations involving tests based on a normal distribution or direct evaluation of binomial or Poisson probabilities
1 The Poisson distribution
Objectives
After studying this chapter you should be able to:
● Calculate probabilities for the distribution Po(λ).
● Use the fact that if X ~ Po(λ) then the mean and variance of X are each equal to λ.
● Understand the relevance of the Poisson distribution to the distribution of random events, and use the Poisson distribution as a model.
The Poisson distribution can be used to model, at least approximately, a large number of natural and social phenomena. You might not expect the number of photons arriving at a cosmic ray observatory, the number of claims made to an insurance company, the number of earthquakes of a given intensity and the number of atoms decaying in a radioactive material to have much in common, but they are all examples of this distribution. The photo is of VERITAS (Very Energetic Radiation Imaging Telescope Array System) in Arizona, which is helping to shape our understanding of how subatomic particles like photons are accelerated to extremely high energy levels.
Before you start
You should know how to:
1. Use your calculator to work out values of exponential functions, e.g.
Find the value of $\mathrm{e}^{-2.5}$: $\mathrm{e}^{-2.5} = 0.0821$ (3 s.f.)
2. Substitute values into more complex formulae, e.g.
Find the value of $p = \frac{2.5^4\,\mathrm{e}^{-2.5}}{4!}$: $p = \frac{39.0625 \times 0.082085}{24} = 0.134$ (3 d.p.)
Skills check:
1. Find the value of: a) $\mathrm{e}^{-3}$ b) $\mathrm{e}^{-2.1}$
2. Find the value of $p = \frac{3^5\,\mathrm{e}^{-3}}{5!}$
1.1 Introducing the Poisson distribution
Think about the following random variables:
● The number of dandelions in a square metre of a piece of open ground.
● The number of errors in a page of a typed manuscript.
● The number of cars passing a point on a motorway in a minute.
● The number of telephone calls received by a company switchboard in half an hour.
● The number of lightning strikes in an area over a year.
Do they have any features in common? Does any one of them stand out as being rather different?
The behaviour in five of these photos follows the Poisson distribution.
Formally, the conditions are that i) events occur at random ii) events occur independently of one another iii) the average rate of occurrences remains constant iv) there is zero probability of simultaneous occurrences.
The Poisson distribution is defined as $\mathrm{P}(X = r) = \frac{\mathrm{e}^{-\lambda}\lambda^r}{r!}$ for r = 0, 1, 2, 3, …
You need to have a value for λ in order for this to make sense, so there is a family of Poisson distributions but there is only one parameter, λ, which is the mean number of occurrences in the time period (or length, area or volume) being considered.
You can write the Poisson distribution as X ~ Po(λ).
Example 1
If X ~ Po(3), find P(X = 2).
Example 2
The number of cars passing a point on a road during a 5-minute period may be modelled by the Poisson distribution with parameter 4. Find the probability that in a 5-minute period i) 2 cars go past ii) fewer than 3 cars go past. X ~ Po(4)
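Probabilities like those in Examples 1 and 2 can also be checked by coding the formula directly. In the Python sketch below, poisson_pmf and poisson_cdf are helper names chosen for this illustration (not a library API). Part ii) of Example 2 uses P(X < 3) = P(X ≤ 2) = P(X = 0) + P(X = 1) + P(X = 2).

```python
import math

def poisson_pmf(r: int, lam: float) -> float:
    """P(X = r) for X ~ Po(lam), using e^(-lam) * lam^r / r!."""
    if r < 0:
        return 0.0
    return math.exp(-lam) * lam**r / math.factorial(r)

def poisson_cdf(r: int, lam: float) -> float:
    """P(X <= r): add up the individual probabilities from 0 to r."""
    return sum(poisson_pmf(k, lam) for k in range(r + 1))

# Example 1: X ~ Po(3), P(X = 2)
print(round(poisson_pmf(2, 3), 3))   # 0.224

# Example 2: X ~ Po(4)
print(round(poisson_pmf(2, 4), 3))   # i)  P(X = 2) = 0.147
print(round(poisson_cdf(2, 4), 3))   # ii) P(X < 3) = P(X <= 2) = 0.238
```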
In the first photo of the middle pair the traffic is free flowing and cars can overtake when they want to but in the second photo there is much less randomness because the traffic is so heavy it is all travelling at almost the same speed.
You will look later in this chapter in more detail at cases where the four conditions listed here are not met exactly.
Mathematical note: It is not immediately obvious from the mathematics you cover in this course that this formula defines a probability distribution. Remember from S1 Chapter 5 that this requires all probabilities to be non-negative (which they obviously are here, because exp(–λ) > 0 for any value of λ) and also that the sum of the probabilities is 1.
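$\mathrm{P}(X = r) = \frac{\mathrm{e}^{-\lambda}\lambda^r}{r!}$ for r = 0, 1, 2, 3, … is a probability distribution because
$$\sum_{r=0}^{\infty} \frac{\mathrm{e}^{-\lambda}\lambda^r}{r!} = \mathrm{e}^{-\lambda}\left(1 + \lambda + \frac{\lambda^2}{2!} + \frac{\lambda^3}{3!} + \cdots\right) = \mathrm{e}^{-\lambda}\,\mathrm{e}^{\lambda} = 1,$$
using the power series $\mathrm{e}^{\lambda} = 1 + \lambda + \frac{\lambda^2}{2!} + \frac{\lambda^3}{3!} + \cdots$. This is an example of an advanced topic in Pure Maths, where functions like exponentials, logarithms and the trigonometric functions have (infinite) power series forms. Truncated versions of these infinite series give one way for electronic calculators to obtain values of these functions.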
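To see this note in action, the Python sketch below (truncated_total is a hypothetical helper written for this illustration) adds the first n Poisson probabilities for λ = 3. Each term of the power series is built from the previous one, and the totals get closer to 1 as more terms are included.

```python
import math

def truncated_total(lam: float, n_terms: int) -> float:
    """Sum of P(X = r) for r = 0, ..., n_terms - 1 when X ~ Po(lam).

    This is e^(-lam) times a truncated power series for e^lam,
    so it should approach 1 as more terms are included.
    """
    term = 1.0      # lam^0 / 0!
    series = 0.0
    for r in range(n_terms):
        series += term
        term *= lam / (r + 1)   # next term: lam^(r+1) / (r+1)!
    return math.exp(-lam) * series

for n in (3, 6, 10, 20):
    print(n, round(truncated_total(3, n), 6))
```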
Exercise 1.1
1. If X ~ Po(2) find i) P(X = 1) ii) P(X = 2) iii) P(X = 3).
2. If X ~ Po(1.8) find i) P(X = 0) ii) P(X = 1) iii) P(X = 2).
3. If X ~ Po(5.3) find i) P(X = 3) ii) P(X = 5) iii) P(X = 7).
4. If X ~ Po(0.4) find i) P(X = 0) ii) P(X = 1) iii) P(X = 2).
5. If X ~ Po(2.15) find i) P(X = 2) ii) P(X = 4) iii) P(X = 6).
6. If X ~ Po(3.2) find i) P(X = 2) ii) P(X ≤ 2) iii) P(X ≥ 2).
7. The number of telephone calls arriving at an office switchboard in a 5-minute period may be modelled by a Poisson distribution with parameter 3.2. Find the probability that in a 5-minute period
a) exactly 2 calls are received b) more than 2 calls are received.
8. The number of accidents which occur on a particular stretch of road in a day may be modelled by a Poisson distribution with parameter 1.3. Find the probability that on a particular day
a) exactly 2 accidents occur on that stretch of road b) fewer than 2 accidents occur.
Note: The mathematical note in Section 1.1 is beyond the requirements of the syllabus.
1.2 The role of the parameter of the Poisson distribution
The mean number of events in an interval of time or space is proportional to the size of the interval.
Example 2 in Section 1.1 looked at the number of cars passing a point on a road during a 5-minute period. This may be modelled by the Poisson distribution with parameter 4.
In this case, the number of cars passing that point in a 20-minute period may be modelled by the Poisson distribution with parameter 16, and in a 1-minute period may be modelled by the Poisson distribution with parameter 0.8.
If the conditions for a Poisson distribution are satisfied in a given period, they are also satisfied for periods of different length.
Example 3
The number of accidents in a week on a stretch of road is known to follow a Poisson distribution with mean 2.1.
Find the probability that
a) in a given week there is 1 accident
b) in a two week period there are 2 accidents
c) there is 1 accident in each of two successive weeks.
a) In one week, the number of accidents follows a Po(2.1) distribution, so the probability of 1 accident $= \frac{2.1^1\,\mathrm{e}^{-2.1}}{1!} = 0.257$ (3 s.f.).
b) In two weeks, the number of accidents follows a Po(4.2) distribution, so the probability of 2 accidents $= \frac{4.2^2\,\mathrm{e}^{-4.2}}{2!} = 0.132$ (3 s.f.).
If the average rate of occurrences remains constant, then the mean number of occurrences in an interval will be proportional to the length of the interval.
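c) This cannot be done directly with a single Poisson distribution, since it specifies what has to happen in each of the two weeks separately. However, each week on its own is the situation considered in part a), and the two weeks are independent.
So the probability this happens in two successive weeks is $0.2571\ldots \times 0.2571\ldots = 0.0661$ (3 s.f.).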
This is considerably less than the probability in part b), and this is because ‘1 accident in each of two successive weeks’ will give ‘2 accidents in a two week period’ but so will having none then two, or having two and then none, so the total probability that in a two week period there are 2 accidents must be higher than specifying there will be 1 in each week.
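Part c) and the comparison with part b) can be checked in a few lines. The Python sketch below (poisson_pmf is a helper written for this illustration) also adds up all three ways of getting 2 accidents in two weeks, (0, 2), (1, 1) and (2, 0), assuming the two weeks are independent, and confirms that the total agrees with the single Po(4.2) calculation in part b).

```python
import math

def poisson_pmf(r: int, lam: float) -> float:
    """P(X = r) for X ~ Po(lam)."""
    return math.exp(-lam) * lam**r / math.factorial(r)

weekly_rate = 2.1                     # mean accidents per week

a = poisson_pmf(1, weekly_rate)       # a) 1 accident in one week
b = poisson_pmf(2, 2 * weekly_rate)   # b) 2 accidents in two weeks: Po(4.2)
c = a * a                             # c) 1 in each of two independent weeks

# Every way of getting 2 accidents in two weeks: (0, 2), (1, 1), (2, 0)
b_by_cases = sum(poisson_pmf(k, weekly_rate) * poisson_pmf(2 - k, weekly_rate)
                 for k in range(3))

print(round(a, 3), round(b, 3), round(c, 4))   # 0.257 0.132 0.0661
print(math.isclose(b, b_by_cases))             # True
```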
Example 4
The number of flaws in a metre length of dress material is known to follow a Poisson distribution with parameter 0.4.
Find the probabilities that
a) there are no flaws in a 1 metre length
b) there is 1 flaw in a 3 metre length
c) there is 1 flaw in a piece of material which is half a metre long.
a) X ~ Po(0.4) ⇒ $\mathrm{P}(X = 0) = \frac{0.4^0\,\mathrm{e}^{-0.4}}{0!} = 0.670$ (3 s.f.).
b) Y ~ Po(1.2) ⇒ $\mathrm{P}(Y = 1) = \frac{1.2^1\,\mathrm{e}^{-1.2}}{1!} = 0.361$ (3 s.f.).
c) Z ~ Po(0.2) ⇒ $\mathrm{P}(Z = 1) = \frac{0.2^1\,\mathrm{e}^{-0.2}}{1!} = 0.164$ (3 s.f.).
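The rule that the parameter is proportional to the length of the interval can be written as a small function. In this Python sketch, RATE_PER_METRE and flaw_probability are illustrative names; the function rescales the per-metre parameter 0.4 to the length requested and reproduces the three answers in Example 4.

```python
import math

def poisson_pmf(r: int, lam: float) -> float:
    """P(X = r) for X ~ Po(lam)."""
    return math.exp(-lam) * lam**r / math.factorial(r)

RATE_PER_METRE = 0.4   # mean number of flaws in 1 m of material

def flaw_probability(r: int, length_m: float) -> float:
    """P(r flaws in a piece of the given length); the parameter scales with length."""
    return poisson_pmf(r, RATE_PER_METRE * length_m)

print(round(flaw_probability(0, 1), 3))     # a) 0.670 (printed as 0.67)
print(round(flaw_probability(1, 3), 3))     # b) 0.361
print(round(flaw_probability(1, 0.5), 3))   # c) 0.164
```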
Exercise 1.2
It is good practice to define new variable names when the interval changes. While all three of these calculations relate to the same basic situation, they all use different Poisson distributions and this is a simple way to stop confusion arising.
1. The number of telephone calls arriving at an office switchboard in a 5-minute period may be modelled by a Poisson distribution with parameter 1.4. Find the probability that in a 10-minute period
a) exactly 2 calls are received
b) more than 2 calls are received.
2. The number of accidents which occur on a particular stretch of road in a day may be modelled by a Poisson distribution with parameter 0.4.
Find the probability that during a week (7 days)
a) exactly 2 accidents occur on that stretch of road
b) fewer than 2 accidents occur.
3. The number of letters delivered to a house on a day may be modelled by a Poisson distribution with parameter 0.8.
a) Find the probability that there are 2 letters delivered on a particular day.
b) The home owner is away for 3 days. Find the probability that there will be more than 2 letters waiting for him when he gets back.
4. The number of errors on a page of a booklet can be modelled by a Poisson distribution with parameter 0.2.
a) Find the probability that there is exactly 1 error on a given page.
b) A section of the booklet has 7 pages. Find the probability that there are no more than 2 errors in the section.
c) The booklet has 25 pages altogether. Find the probability that the booklet contains exactly 6 errors altogether.
5. The number of people calling a car breakdown service can be modelled by a Poisson distribution, and the service has an average of 6 calls per hour. Find the probability that in a half-hour period
a) exactly 2 calls are received
b) more than 2 calls are received.
1.3 The recurrence relation for the Poisson distribution
This is not directly in the course, but it is a useful property of the Poisson to be aware of, and gives some insight into why the Poisson distribution has the shape that it does.
You can calculate probabilities for a Poisson distribution in sequence using a recurrence relation.
Example 5
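If X ~ Po(λ)
a) write down the probability that i) X = 3 and ii) X = 4
b) write P(X = 4) in terms of P(X = 3).
a) i) $\mathrm{P}(X = 3) = \frac{\mathrm{e}^{-\lambda}\lambda^3}{3!}$ ii) $\mathrm{P}(X = 4) = \frac{\mathrm{e}^{-\lambda}\lambda^4}{4!}$
b) $\mathrm{P}(X = 4) = \frac{\mathrm{e}^{-\lambda}\lambda^3}{3!} \times \frac{\lambda}{4} = \frac{\lambda}{4} \times \mathrm{P}(X = 3)$
The general relationship is $\mathrm{P}(X = r + 1) = \frac{\lambda}{r + 1} \times \mathrm{P}(X = r)$.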
The graphs below show the probability distributions for different values of λ and the effect that changing the value of λ has on the shape of a Poisson distribution.
[Figure: bar charts of the Poisson distributions with λ = 1.2, λ = 2.5 and λ = 4 (in each case λ = E(X)), for x = 0 to 12]
All Poisson variables have a sample space which is all of the non-negative integers. However, when λ is relatively low, the probabilities tail off very quickly.
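For λ = 1.2, the recurrence relation multiplies P(X = 0) by $\frac{1.2}{1} = 1.2$, then by $\frac{1.2}{2} = 0.6$, then by $\frac{1.2}{3} = 0.4$, $\frac{1.2}{4} = 0.3$, …, so the probabilities rise from x = 0 to x = 1 and fall after that, and the mode of X is 1.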
Here λ = 2.5 is larger than in the previous graph and the peak has moved across to the right. For values of x less than λ the probabilities increase, but once x is greater than λ they start to decrease. More values of x have a noticeable probability, so the highest individual probability is not as large as it was in the previous graph and the distribution is more spread out.
What happens when λ is an integer?
When λ = 4, $\mathrm{P}(X = 4) = \frac{4}{4} \times \mathrm{P}(X = 3) = \mathrm{P}(X = 3)$, and the distribution has two modes, at 3 and 4. Generally, the mode of the Po(λ) distribution is the integer just below λ when λ is not an integer, and there are two modes (at λ and λ – 1) when λ is an integer.
λ < 1 is a special case.
Here even the first time the recurrence relation is used you are multiplying by λ < 1, so the mode will be 0 and the probabilities are strictly decreasing for all values of x.
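The recurrence relation gives a cheap way to generate the whole distribution and read off the mode. In the Python sketch below (poisson_probs and modes are names chosen for this illustration), each probability is obtained from the previous one by multiplying by λ/(r + 1). A value counts as a mode if its probability matches the largest one up to floating-point rounding, which is what lets the two modes appear for λ = 4.

```python
import math

def poisson_probs(lam: float, r_max: int) -> list[float]:
    """P(X = 0), ..., P(X = r_max) for X ~ Po(lam), built with the recurrence
    P(X = r + 1) = lam / (r + 1) * P(X = r), starting from P(X = 0) = e^(-lam)."""
    probs = [math.exp(-lam)]
    for r in range(r_max):
        probs.append(probs[-1] * lam / (r + 1))
    return probs

def modes(lam: float, r_max: int = 60) -> list[int]:
    """Values of r whose probability equals the largest probability (within rounding)."""
    probs = poisson_probs(lam, r_max)
    top = max(probs)
    return [r for r, p in enumerate(probs) if math.isclose(p, top)]

for lam in (0.6, 1.2, 2.5, 4):
    print(lam, modes(lam))
# 0.6 [0]
# 1.2 [1]
# 2.5 [2]
# 4 [3, 4]
```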
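The general forms for the probabilities of 0 and 1 for a Poisson distribution are $\mathrm{P}(X = 0) = \mathrm{e}^{-\lambda}$ and $\mathrm{P}(X = 1) = \lambda\mathrm{e}^{-\lambda}$.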
Example 6
X ~ Po(λ) and P(X = 6) = 2 × P(X = 5). Find the value of λ.
$\mathrm{P}(X = 6) = \frac{\lambda}{6} \times \mathrm{P}(X = 5)$, so $\frac{\lambda}{6} = 2$ and λ = 12.
Example 7
X ~ Po(5.8). State the mode of X.
Since 5.8 is not an integer, the mode is the integer below it, i.e. the mode is 5.
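Examples 6 and 7 can be checked numerically with the Python sketch below (poisson_pmf is a helper written for this illustration). It confirms that λ = 12 gives the required ratio of 2, and finds the mode of Po(5.8) by looking for the largest probability directly rather than relying on the rule.

```python
import math

def poisson_pmf(r: int, lam: float) -> float:
    """P(X = r) for X ~ Po(lam)."""
    return math.exp(-lam) * lam**r / math.factorial(r)

# Example 6: P(X = 6) / P(X = 5) = lam / 6, so a ratio of 2 needs lam = 6 * 2
lam = 6 * 2
print(lam, round(poisson_pmf(6, lam) / poisson_pmf(5, lam), 10))   # 12 2.0

# Example 7: brute-force search for the most likely value of X ~ Po(5.8)
probs = {r: poisson_pmf(r, 5.8) for r in range(30)}
print(max(probs, key=probs.get))   # 5
```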
Exercise 1.3
1. X ~ Po(2.5)
a) Write down an expression for P(X = 4) in terms of P(X = 3).
b) If P(X = 3) = 0.214, calculate the value of your expression in part a).
c) Calculate P(X = 4) directly and check it is the same as your answer to b).
d) What is the mode of X?
2. X ~ Po(5)
a) Write down an expression for P(X = 5) in terms of P(X = 4).
b) Explain why X has two modes at 4 and 5.
3. X ~ Po(λ) and P(X = 4) = 1.2 × P(X = 3).
a) Find the value of λ.
b) What is the mode of X?
1.4 Mean and variance of the Poisson distribution
If X ~ Po(λ), then $\mathrm{E}(X) = \lambda$ and $\mathrm{Var}(X) = \lambda$, so the standard deviation of X is $\sqrt{\lambda}$.
A special property of the Poisson distribution is that the mean and variance are always equal.
Example 8
The number of calls arriving at a company’s switchboard in a 10-minute period can be modelled by a Poisson distribution with parameter 3.5.
Give the mean and variance of the number of calls which arrive in i) 10 minutes ii) an hour iii) 5 minutes.
i) Here λ = 3.5 so the mean and variance will both be 3.5.
ii) Here λ = 21 (= 3.5 × 6) so the mean and variance will both be 21.
iii) Here λ = 1.75 (= 3.5 ÷ 2) so the mean and variance will both be 1.75.
Example 9
A dual carriageway has one lane blocked off because of roadworks.
The numbers of cars passing a point on the road in a number of 1-minute intervals are summarised in the table.
a) Calculate the mean and variance of the number of cars passing in 1-minute intervals.
b) Is the Poisson likely to provide an adequate model for the distribution of the number of cars passing in 1-minute intervals?
a)
b) The mean and variance are not numerically close so it is unlikely the Poisson will be an adequate model (with only one lane open for traffic, overtaking cannot happen on this stretch of the road and the numbers of cars will be much more consistent than would happen in normal circumstances – hence the variance is much lower than would be expected if the Poisson model did apply).
Derivation of mean and variance of the Poisson distribution
You must be able to use these results but are not required to be able to prove them – they are included here for completeness, and as a nice manipulation using the power series expression for the exponential function.
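$\mathrm{E}(X) = \sum_{r=0}^{\infty} r\,\frac{\mathrm{e}^{-\lambda}\lambda^r}{r!} = \lambda\mathrm{e}^{-\lambda}\sum_{r=1}^{\infty} \frac{\lambda^{r-1}}{(r-1)!} = \lambda\mathrm{e}^{-\lambda}\mathrm{e}^{\lambda} = \lambda$, after discarding the zero case (the term for r = 0 is 0).
Similarly, $\mathrm{E}(X(X-1)) = \sum_{r=2}^{\infty} r(r-1)\,\frac{\mathrm{e}^{-\lambda}\lambda^r}{r!} = \lambda^2\mathrm{e}^{-\lambda}\sum_{r=2}^{\infty} \frac{\lambda^{r-2}}{(r-2)!} = \lambda^2$, so $\mathrm{E}(X^2) = \mathrm{E}(X(X-1)) + \mathrm{E}(X) = \lambda^2 + \lambda$.
Then $\mathrm{Var}(X) = \mathrm{E}(X^2) - [\mathrm{E}(X)]^2 = \lambda^2 + \lambda - \lambda^2 = \lambda$.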
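The results E(X) = Var(X) = λ can also be checked numerically. The Python sketch below is not a proof: it truncates the infinite sums at a cut-off r_max (assuming the remaining tail is negligible, which holds comfortably for the values used here) and uses the recurrence relation from Section 1.3 to step through the probabilities. The values 3.5 and 21 are those from Example 8.

```python
import math

def poisson_mean_var(lam: float, r_max: int = 200) -> tuple[float, float]:
    """E(X) and Var(X) for X ~ Po(lam) from the definitions
    E(X) = sum of r P(X = r) and Var(X) = E(X^2) - [E(X)]^2,
    with the infinite sums truncated at r_max."""
    p = math.exp(-lam)          # P(X = 0)
    mean = second_moment = 0.0
    for r in range(r_max + 1):
        mean += r * p
        second_moment += r * r * p
        p *= lam / (r + 1)      # recurrence gives P(X = r + 1)
    return mean, second_moment - mean**2

for lam in (0.4, 3.5, 21):
    m, v = poisson_mean_var(lam)
    print(lam, round(m, 6), round(v, 6))
```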
Exercise 1.4
1. If X ~ Po(3.2) find i) E(X) ii) Var(X).
2. If X ~ Po(49) find the mean and standard deviation of X.
3. X ~ Po(3.6)
a) Find the mean and standard deviation of X.
b) Find P(X > μ), where μ = E(X).
c) Find P(X > μ + 2σ), where σ is the standard deviation of X.
d) Find P(X < μ − 2σ).
4. X is the number of telephone calls arriving at an office switchboard in a 10-minute period. X may be modelled by a Poisson distribution with parameter 6.
a) Find the mean and standard deviation of X.
b) Find P(X > μ), where μ = E(X).
c) Find P(X > μ + 2σ), where σ is the standard deviation of X.
d) Find P(X < μ − 2σ).
5. Compare your answers to part d) of questions 3 and 4.
1.5 Modelling with the Poisson distribution
The Poisson distribution describes the number of occurrences in a fixed period of time or space if the events occur independently of one another, at random and at a constant average rate.
Standard examples of situations in real-life which can often be modelled reasonably by the Poisson distribution include: radioactive emissions, traffic passing a fixed point, telephone calls or letters arriving, and accidents occurring.
Example 10
The maternity ward of a hospital wanted to work out how many births would be likely to happen during a night.
The hospital has 3000 deliveries each year, so if these happen randomly around the clock, about 1000 deliveries would occur between midnight and 8 am. This is the time when many staff are off duty, and it is important to ensure that there will be enough people to cope with the workload on any particular night.
The average number of deliveries per night is $\frac{1000}{365}$, which is 2.74 (3 s.f.).
From this average rate the probability of delivering 0, 1, 2, etc. babies each night can be calculated using the Poisson distribution. If X is a random variable representing the number of deliveries per night, some probabilities are
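$\mathrm{P}(X = 0) = \frac{2.74^0\,\mathrm{e}^{-2.74}}{0!} = 0.065$
$\mathrm{P}(X = 1) = \frac{2.74^1\,\mathrm{e}^{-2.74}}{1!} = 0.177$
$\mathrm{P}(X = 2) = \frac{2.74^2\,\mathrm{e}^{-2.74}}{2!} = 0.242$
$\mathrm{P}(X = 3) = \frac{2.74^3\,\mathrm{e}^{-2.74}}{3!} = 0.221$.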
i) On how many days in the year would 5 or more deliveries be likely to occur?
ii) Over the course of one year, what is the greatest number of deliveries likely to occur at least once?
iii) Why might the pattern of deliveries not follow a Poisson distribution?
i) 365 × P(X ≥ 5) = 365 × 0.143 ≈ 52, so 5 or more deliveries would be expected on about 52 days in the year.
ii) 8, the largest value for which the probability is greater than $\frac{1}{365}$.
iii) Deliveries might not be random throughout the 24 hours, e.g. if a lot of women had labour induced or had elective caesareans done during the day.
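The answers to i) and ii) can be reproduced directly. The Python sketch below uses the exact rate 1000/365 rather than the rounded value 2.74 (this makes no difference to the answers) and, as in the solution, treats "likely to occur at least once in a year" as "probability greater than 1/365". The helper poisson_probs is written for this illustration.

```python
import math

def poisson_probs(lam: float, r_max: int) -> list[float]:
    """P(X = 0), ..., P(X = r_max) for X ~ Po(lam), via the recurrence relation."""
    probs = [math.exp(-lam)]
    for r in range(r_max):
        probs.append(probs[-1] * lam / (r + 1))
    return probs

lam = 1000 / 365                  # mean deliveries per night, about 2.74
probs = poisson_probs(lam, 20)

# i) expected number of nights in a year with 5 or more deliveries
p_five_or_more = 1 - sum(probs[:5])
print(round(365 * p_five_or_more))   # 52

# ii) largest number of deliveries whose probability exceeds 1/365
print(max(r for r, p in enumerate(probs) if p > 1 / 365))   # 8
```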
Did you know?
An elective caesarean is planned in advance for some births which are expected to be difficult.
In this real-life example, deliveries in fact followed the Poisson distribution very closely, and the hospital was able to predict the workload accurately.
The conditions for the Poisson distribution are that i) events occur at random ii) events occur independently of one another iii) the average rate of occurrences remains constant iv) there is zero probability of simultaneous occurrences. As with the distributions you met in S1, the Poisson can be a useful model for a situation even when these conditions are not met perfectly.
Be careful:
Some change in the underlying conditions may alter the nature of the distribution, e.g. traffic observed close to a junction, or where there are lane restrictions and traffic is funnelled into a queue travelling at constant speed.
The underlying conditions may be distorted by interference from other effects, e.g. if a birthday or Christmas occurs during the period considered then the Poisson conditions would not be reasonable for the arrival of letters by post, for instance.
Randomness or independence may be lost due to a difference in the average rate of occurrences, e.g. the rate of traffic accidents occurring would be expected to vary somewhat as road conditions vary.
Example 11
The number of cyclists passing a remote village post-office during the day can be modelled as a Poisson random variable. On average two cyclists pass by in an hour.
a) What is the probability that i) no cyclist passes ii) more than three cyclists pass by between 10 and 11 am?
b) What is the probability that exactly one passes by while the shop-keeper is on a 20-minute tea-break?
c) What is the probability that more than three cyclists pass by in an hour exactly once in a 6-hour period?
c) The situation is that of a binomial distribution: there are 6 ‘trials’ (the six one-hour periods), the number of cyclists in each hour is independent of the other hours, and the probability of more than 3 cyclists in an hour remains the same for all six hours. So if Y is the number of one-hour periods, out of the 6, in which more than 3 cyclists pass by, then Y ~ B(6, 0.1429) (using the probability calculated in part a) ii)), and the required probability is P(Y = 1).
This cannot be treated as a single Poisson with parameter 12 since it specifies a particular event to be considered in each one hour time period separately.
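All three parts of Example 11 can be computed as below. This is a Python sketch (the helper and constant names are our own): part b) rescales the hourly parameter to a 20-minute interval, and part c) feeds the probability from part a) ii) into the binomial distribution B(6, p), as described in the solution.

```python
import math

def poisson_pmf(r: int, lam: float) -> float:
    """P(X = r) for X ~ Po(lam)."""
    return math.exp(-lam) * lam**r / math.factorial(r)

HOURLY_RATE = 2   # mean number of cyclists per hour

# a) one hour, X ~ Po(2)
p_none = poisson_pmf(0, HOURLY_RATE)
p_more_than_3 = 1 - sum(poisson_pmf(r, HOURLY_RATE) for r in range(4))

# b) 20-minute tea-break: the parameter scales to 2 * 20/60
p_one_in_break = poisson_pmf(1, HOURLY_RATE * 20 / 60)

# c) Y ~ B(6, p) where p = P(more than 3 cyclists in an hour); find P(Y = 1)
p_exactly_once = math.comb(6, 1) * p_more_than_3 * (1 - p_more_than_3) ** 5

print(round(p_none, 3), round(p_more_than_3, 4),
      round(p_one_in_break, 3), round(p_exactly_once, 3))
```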
Example 12
At a certain harbour the number of boats arriving in a 15-minute period can be modelled by a Poisson distribution with parameter 1.5.
a) Find the probability that exactly six boats will arrive in a period of an hour.
b) Given that exactly six boats arrive in a period of an hour, find the conditional probability that twice as many arrive in the second half hour as arrive in the first half hour.
a) In an hour the average number of boats arriving is 6, so
$P(6 \text{ boats arrive in an hour}) = \frac{e^{-6}6^6}{6!} = 0.161$
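b) If twice as many arrive in the second half hour as in the first, then, since the total is 6, there must be 2 arrivals in the first half hour and 4 in the second. For a half hour the Poisson parameter is 2 × 1.5 = 3, and the two half hours are independent, so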
$P(2 \text{ boats in first half hour, then } 4 \text{ boats in next half hour}) = \frac{e^{-3}3^2}{2!} \times \frac{e^{-3}3^4}{4!} = 0.224 \times 0.168 = 0.0376.$
Since 2 boats followed by 4 boats means exactly 6 boats arrive in the hour, the conditional probability is
$P(2 \text{ then } 4 \mid 6 \text{ boats arrive in an hour}) = \frac{0.0376}{0.161} = 0.234.$
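The harbour calculation can be checked with the sketch below, assuming only the parameter 1.5 per 15 minutes given in the question and scaling it to 3 per half hour and 6 per hour. The last step adds a cross-check that is not used in the text: a standard property of the Poisson distribution is that, given a total of 6 arrivals over two independent half hours with equal rates, the number in the first half hour follows B(6, 0.5), so the conditional probability should be exactly $\binom{6}{2}(0.5)^6 = 15/64$.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Po(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)


LAM_15_MIN = 1.5
lam_half_hour = 2 * LAM_15_MIN  # 3.0: two 15-minute periods
lam_hour = 4 * LAM_15_MIN       # 6.0: four 15-minute periods

# a) exactly six boats in an hour
p_six = poisson_pmf(6, lam_hour)

# b) 2 in the first half hour and 4 in the second (independent half hours)
p_two_then_four = poisson_pmf(2, lam_half_hour) * poisson_pmf(4, lam_half_hour)

# "2 then 4" already implies 6 in the hour, so P(A and B) = P(A) here
p_conditional = p_two_then_four / p_six

# Cross-check: given 6 in total, the first-half count is B(6, 0.5)
p_binomial = math.comb(6, 2) * 0.5**6  # 15/64

print(f"a) {p_six:.3f}")
print(f"b) {p_two_then_four:.4f} / {p_six:.3f} = {p_conditional:.3f}")
print(f"binomial check: {p_binomial:.6f}")
```

This prints 0.161, then 0.0376 / 0.161 = 0.234, and finally 0.234375 for the binomial check, agreeing with the worked answer.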
Exercise 1.5
1. For the following random variables state whether they can be modelled by a Poisson distribution.
If they can, give the value of the parameter λ; if they cannot then explain why.
a) The average number of cars per minute passing a point on a road is 12. The traffic is flowing freely.
X = number of cars which pass in a 15-second period.
b) The average number of cars per minute passing a point on a road is 14.
There are roadworks blocking one lane of the road.
X = number of cars which pass in a 30-second period.
c) Amelie normally gets letters at an average rate of 1.5 per day.
X = number of letters Amelie gets on December 22nd.
Amelie lives in a country where Christmas is a major festival on December 25th each year.
d) A petrol station which stays open all the time gets an average of 832 customers in a 24-hour period.
X = number of customers in a quarter of an hour at the petrol station.
e) An A&E department in a hospital treats 32 patients an hour on average.
X = number of patients treated between 5 pm and 7 pm on a Friday evening.
2. For the following situations state what assumptions are needed if a Poisson distribution is to be used to model them, and give the value of λ that would be used.
You are not expected to do any calculations!
a) On average defects in a roll of cloth occur at a rate of 0.2 per metre. How many defects are there in a roll which is 8 m long?
b) On average defects in a roll of cloth occur once in 2 metres. How many defects are there in a roll which is 8 m long?
c) A small shop averages 8 customers per hour. How many customers does it have in 20 minutes?
3. An explorer thinks that the number of mosquito bites he gets when he is in the jungle will follow a Poisson distribution.
The explorer records the number of mosquito bites he gets in the jungle during a number of hour-long periods, and the results are summarised in the table.
Number of bites   0   1   2   3   4   5   6   ≥7
Frequency         3   7   9   6   6   3   1   0
a) Calculate the mean and variance of the number of bites the explorer gets in an hour in the jungle.
b) Do you think the Poisson is a good model for the number of bites the explorer gets in an hour in the jungle?
4. The number of emails Serena gets can be modelled by a Poisson distribution with a mean rate of 1.5 per hour.
a) i) What is the probability that Serena gets no emails between 4 pm and 5 pm?
ii) What is the probability that Serena gets more than 2 emails between 4 pm and 5 pm?
iii) What is the probability that Serena gets one email between 6 pm and 6.20 pm?
b) What is the probability that Serena gets more than 2 emails in an hour exactly twice in a 5-hour period?
c) Would it be sensible to use the Poisson distribution to find the probability that Serena gets no emails between 4 am and 5 am?
5. The number of lightning strikes in the neighbourhood of a campsite in a week can be modelled by a Poisson distribution with parameter 1.5.
a) Find the probability that there is exactly one lightning strike in the neighbourhood in a given week.
b) Alejandra spends three weeks at the campsite. Find the probability that there are exactly three lightning strikes in the neighbourhood during her holiday.
c) Given that the neighbourhood has exactly three lightning strikes during her holiday, find the conditional probability that each week has exactly one strike.
Summary exercise 1
1. If X ~ Po(1.45) find
a) P(X = 2)
b) P(X ≤ 2)
c) P(X ≥ 2).
2. If X ~ Po(3.2)
a) find i) P(X = 0) ii) P(X = 1) iii) P(X > 2).
b) For X, write down the i) mean ii) variance iii) standard deviation.
c) Explain why the mode of X is 3.
d) Find
i) P(X < μ) ii) P(|X – μ| < σ) where μ is the mean and σ is the standard deviation of X.
3. If X ~ Po(6.4)
a) find i) P(X ≤ 3) ii) P(X = 6).
b) For X, write down the i) mean ii) variance iii) standard deviation.
c) Write down the mode of X.
d) Find i) P(X < μ – σ)
ii) P(|X – μ| < σ), where μ is the mean and σ is the standard deviation of X.