Section 7.2 Sampling Distribution of the Sample Mean

The Central Limit Theorem

It seems reasonable to estimate the mean µ of a population by using the sample mean x̄ from a representative simple random sample (SRS) drawn from the population. For example, I might estimate the mean height µ for all APSU students enrolled in spring 2011 by taking an SRS of 50 students and finding the mean height for those 50 students. Recall that we call this a point estimate for µ.

So how good is our estimate? Well, the sample mean is a random variable (it varies from sample to sample) and so it has a distribution. Knowing the distribution of the sample mean helps us to know how good our estimate is. Let’s look at an example to see what we can say about the mean and standard deviation of the distribution of the sample mean x̄. Another example will help us see what the shape of the distribution should be.

Example 1
The heights in inches of the 5 starting players on a men’s basketball team are as follows.

Alfred: 76
Bob: 79
Carl: 85
Dennis: 82
Edgar: 78

We will now answer the following questions.

The population mean height µ = _____________________.

How many samples of size n = 2 players can be chosen? __________________

Now list all the possible samples of size 2 and then calculate the sample mean x̄ for each.

The mean of the sample means, µ(x̄), is _____________________.

What you have just noticed is not a coincidence! The mean of all the sample means is always equal to the population mean.

The population standard deviation of the players’ heights is ____________________.

The standard deviation of the sample mean heights is ______________________.

These numbers are different! It turns out that the standard deviation for the distribution of sample means depends on n, the sample size. If the sample size is small relative to the population size, then for samples of size n, the standard deviation for the distribution of the sample means is given by

σ(x̄) = σ/√n,

where σ is the population standard deviation.
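The questions in Example 1 can be checked by brute force. The following is a minimal sketch in Python (standard library only), not part of the original worksheet; the variable names are illustrative. It lists every sample of size 2 and compares the standard deviation of the sample means with σ/√n. Because 2 out of 5 players is not a small sample relative to the population, σ/√n is only a rough value here. The sketch also applies the standard finite population correction √((N − n)/(N − 1)), which the worksheet itself does not use.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, pstdev

# Heights (inches) from Example 1; these 5 players are the whole population.
heights = {"Alfred": 76, "Bob": 79, "Carl": 85, "Dennis": 82, "Edgar": 78}
n = 2  # sample size

population = list(heights.values())
N = len(population)
mu = mean(population)
sigma = pstdev(population)  # population SD (divides by N, not N - 1)

# Every possible sample of size n, chosen without replacement.
samples = list(combinations(heights, n))
sample_means = [mean(heights[name] for name in s) for s in samples]

mean_of_means = mean(sample_means)
sd_of_means = pstdev(sample_means)

# sigma / sqrt(n) assumes n is small relative to N. Here it is not, so we
# also compute the value with the finite population correction applied.
approx_se = sigma / sqrt(n)
corrected_se = approx_se * sqrt((N - n) / (N - 1))

for s, m in zip(samples, sample_means):
    print(f"{s[0]:>7} & {s[1]:<7} mean = {m}")
print("number of samples:     ", len(samples))
print("population mean:       ", mu)
print("mean of sample means:  ", mean_of_means)
print("population SD:         ", round(sigma, 4))
print("SD of sample means:    ", round(sd_of_means, 4))
print("sigma / sqrt(n):       ", round(approx_se, 4))
print("with finite correction:", round(corrected_se, 4))
```

Running the sketch lets you compare your hand calculations with the enumerated values, and shows how far σ/√n is from the true standard deviation of the sample means when the population is this small.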
We will call this quantity, σ(x̄) = σ/√n, the exact standard error of the mean. Notice that, because we are dividing by √n, the larger the size n of the sample, the more closely the sample means will be packed around the population mean.

Example
Suppose that in Tennessee the mean living space for a single-family detached home is µ = 1742 ft² with a standard deviation of σ = 568 ft².

a) For samples of size n = 25, give the mean and standard deviation for the distribution of sample means.
   µ(x̄) = ______________   σ(x̄) = __________________

b) For samples of size n = 500, give the mean and standard deviation for the distribution of sample means.
   µ(x̄) = ______________   σ(x̄) = __________________

Shape of the distribution of the sample means

So now we know what the mean and standard deviation are for the distribution of sample means from samples of size n. But what shape does that distribution have? Is it unimodal? Bimodal? Symmetric? Let’s look at an example to see if we can find out.

Example
Consider the following table giving the number of people per household and the relative frequency for each number.

# of people       1     2     3     4     5     6     7
Relative freq.  .232  .317  .175  .154  .073  .030  .019

What are the population mean and standard deviation? [Enter one column in L1 on your calculator, the other in L2 and do 1-var stat L1,L2 on your home screen to get the answers.] You should get µ = 2.685 persons and σ = 1.47 persons.

What shape does this population distribution have? To get an idea we’ll take a simple random sample of size n = 1000 (using Minitab) to see what it might look like. What shape do you see?

Now let’s take 10,000 samples of size n = 30, calculate the mean for each sample and look at the distribution of those means (again using Minitab). What shape does this distribution have? Is it the same as the population?
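If Minitab is not available, the household experiment can be imitated with a short simulation. This is only a sketch in Python (standard library), not the procedure used for the handout: it assumes each household is drawn independently according to the relative frequencies in the table, and the seed, the variable names and the crude text histogram are illustrative choices.

```python
import random
from collections import Counter
from math import sqrt
from statistics import mean, pstdev

sizes = [1, 2, 3, 4, 5, 6, 7]
rel_freq = [0.232, 0.317, 0.175, 0.154, 0.073, 0.030, 0.019]

# Population mean and SD computed directly from the table.
mu = sum(x * p for x, p in zip(sizes, rel_freq))
sigma = sqrt(sum(p * (x - mu) ** 2 for x, p in zip(sizes, rel_freq)))

rng = random.Random(2011)  # arbitrary seed, only so the run is repeatable
n, num_samples = 30, 10_000

# Draw 10,000 samples of size 30 and record each sample mean.
sample_means = [
    sum(rng.choices(sizes, weights=rel_freq, k=n)) / n
    for _ in range(num_samples)
]

print("population mean:     ", round(mu, 4))
print("population SD:       ", round(sigma, 4))
print("mean of sample means:", round(mean(sample_means), 4))
print("SD of sample means:  ", round(pstdev(sample_means), 4))
print("sigma / sqrt(n):     ", round(sigma / sqrt(n), 4))

# Crude text histogram of the sample means, in bins of width 0.1.
bins = Counter(int(m * 10) / 10 for m in sample_means)
for left in sorted(bins):
    print(f"{left:4.1f} {'#' * (bins[left] // 50)}")
```

The histogram printed at the end gives a rough picture of the shape of the sample means, which can be compared with the right-skewed shape of the population in the table.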
We have seen that the sample means are approximately normally distributed with mean µ(x̄) = 2.685 (the population mean) and standard deviation σ(x̄) = 0.2684 (the population standard deviation divided by the square root of the sample size). If we were to increase our sample size, the distribution would still be approximately normal, centered at the population mean of 2.685 persons, but the standard deviation would decrease, so the distribution ‘tightens’ around 2.685. (There is less variation in sample means as the sample size gets larger.) This is the content of the ‘fundamental theorem of statistics’, the Central Limit Theorem (CLT).

CLT
As the sample size n increases, the sample mean x̄ has a distribution that tends toward a normal distribution N(µ(x̄), σ(x̄)), where µ(x̄) = µ, the population mean, and σ(x̄) = σ/√n, the population standard deviation divided by the square root of the sample size.

Note: If the population distribution is itself normal or very nearly so, then the distribution of x̄ will have a normal model for samples of any size. As a rule of thumb, we can use a normal model if the sample size n is at least 30, regardless of the population distribution! If the population is ‘somewhat normal’, then we can use a normal model even for a sample size of 10 or 12.

Example
Suppose that for adults the mean weight is 175 lb with a standard deviation of 25 lb and that the weights have approximately a normal distribution. An elevator has a weight limit of 10 people or 2000 lb. What is the probability that the 10 people who get on the elevator will go over its weight limit?

Solution: We are really asking, ‘What is the probability that the mean weight of a sample of 10 people is more than 200 lb (2000 lb/10)?’ We will assume that the 10 people are a random sample and that the weights are independent. [Is this always necessarily so? Think of an elementary school field trip, a football team at a hotel, a weight loss clinic on the 4th floor, etc.]
Our population mean is 175 lb and our population standard deviation is 25 lb. Since our population distribution is approximately normal, we can use a normal model for the distribution of sample means even for samples as small as n = 10. This model will have mean = 175 lb (the population mean) and standard deviation = 25/√10 = 7.91 lb, correct to two decimal places. To see where a mean of 200 lb would be in this distribution we calculate its z-score:

z = (200 - 175)/7.91 = 3.16

Thus, P(x̄ > 200) = P(z > 3.16) = .0008. Our conclusion is that there is only a very slight chance that 10 people would overload the elevator.

Sampling Distribution Models: A Recap

The statistic (mean, proportion, etc.) is a random variable. The sampling distribution shows us the distribution of possible values the statistic could have.

For the sample mean and the sample proportion, the CLT tells us that we can model the sampling distribution with a normal model for samples of an appropriate size.

Key idea: The CLT states that the sampling distribution model for the sample mean (and the sample proportion) is approximately normal for large n, regardless of the shape of the population distribution, as long as the observations are independent.

Note: A proportion can always be viewed as a mean by letting a ‘success’ be indicated by a 1 and a ‘failure’ by a 0. Then the mean of the 1’s and 0’s gives the proportion of successes!
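Returning to the elevator example, the z-score and tail probability can be checked without a normal table. The sketch below uses Python's standard library and keeps the same assumptions as the solution (independent, approximately normal weights). The variable names are illustrative, and because nothing is rounded along the way the results may differ slightly from the hand-rounded values in the text.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 175, 25      # population mean and SD of adult weights (lb)
n = 10                   # people on the elevator
limit_total = 2000       # weight limit (lb)

threshold = limit_total / n   # mean weight per person at the limit
se = sigma / sqrt(n)          # standard deviation of the sample mean
z = (threshold - mu) / se     # z-score of a 200 lb sample mean

# P(sample mean > threshold) under the normal model N(mu, se).
p_over = 1 - NormalDist(mu, se).cdf(threshold)

print("threshold per person:", threshold)
print("standard error:      ", round(se, 2))
print("z-score:             ", round(z, 2))
print("P(overload):         ", round(p_over, 4))
```

Changing n or limit_total shows how sensitive the overload probability is to the number of people allowed on the elevator.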

© Copyright 2020