STAT 311 Week 14 Worksheet Solutions: Bringing Everything Together

Confidence Intervals

1. The student body president at a large Midwestern university would like to determine if the students at his school support increasing the cost of student parking permits if it would fund building a new parking garage. The student government takes a simple random sample of 100 students and finds that 47 of them support the increase.

a.) Calculate a 95% confidence interval for the true proportion of students at that university that support the increase. Be sure to clearly communicate about the parameter in presenting your interval.

The first step is always to identify the population and parameter. The population is the group of subjects that we are studying. The first sentence tells us that the people of interest, the population, are all students at a large Midwestern university. The parameter is the numerical summary of the population that we are interested in finding out about. In this case, the first sentence tells us that we want to know “the (TRUE) proportion of students at the school that support increasing the cost of student parking permits if it would fund building a new parking garage.”

Since we are dealing with a proportion, we can take out our formula sheets and find the two sections about proportions.

Figure 0.1: Formula Sheet Clip for Sampling Distribution of the Sample Proportion

On the front page of the formula sheet, we find information about the sampling distribution of the sample proportion. This section of the formula sheet contains the information necessary for determining if it is appropriate to conduct inference (calculate a confidence interval or perform a hypothesis test). It is appropriate to calculate a confidence interval for a proportion when the sampling distribution of the sample proportion is approximately normal, which happens when:

• We have a random sample.
• The sample size is large enough: $n\hat{p} \ge 10$ AND $n\hat{q} = n(1 - \hat{p}) \ge 10$.

For this problem, we are told that “the student government takes a simple random sample” and thus the first condition is satisfied. Is the sample large enough? We are told that the sample size is 100 ($n = 100$) and that 47 students supported the increase ($\hat{p} = 47/100 = 0.47$). If we then check $n\hat{p} = 100 \times 0.47 = 47 \ge 10$ and $n\hat{q} = n(1 - \hat{p}) = 100 \times 0.53 = 53 \ge 10$, we can see that it is appropriate to calculate the confidence interval in this case.

To actually calculate the interval, we find the section of the formula sheet dealing with confidence intervals for proportions.

Figure 0.2: Formula Sheet Clip for Hypothesis Testing and Confidence Intervals for Proportions

Here we see the formula:

$$\hat{p} \pm z^* \sqrt{\frac{\hat{p}\hat{q}}{n}} \quad \text{OR} \quad \hat{p} \pm \text{Margin of Error} \quad \text{OR} \quad \hat{p} \pm z^* \, SE(\hat{p})$$

We can calculate each of these pieces separately and put them together. The first step is to calculate the standard error, which is:

$$SE(\hat{p}) = \sqrt{\frac{\hat{p}\hat{q}}{n}} = \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = \sqrt{\frac{0.47 \times 0.53}{100}} = 0.04991$$

Next, we need to find the appropriate $z^*$ value. Remember, for a confidence interval, this is a value that you get off of the Z-table, NOT a quantity that you calculate. The $z^*$ value for a 95% interval is 1.96. (If you do not know how to find this number, make sure to find out before the final, either in the tutoring center or by asking another student.)

Now that we have a $z^*$ and a standard error, we can calculate the Margin of Error, $MOE = z^* \, SE(\hat{p}) = 1.96 \times 0.04991 = 0.0978$. Thus the confidence interval is:

$$\hat{p} \pm z^* \sqrt{\frac{\hat{p}\hat{q}}{n}} \;\rightarrow\; 0.47 \pm 1.96 \times 0.04991 \;\rightarrow\; 0.47 \pm 0.0978 \;\rightarrow\; (0.3722,\ 0.5678)$$

Once we have a confidence interval, the final step is to interpret it in the context of the problem! We are 95% confident that the TRUE proportion of students at the Midwestern university that support increasing the cost of student parking permits to fund a new parking garage is between 37.22% and 56.78%.

b.)
A student senator who opposes the increase saw the results and said “it is clear that less than half of the students support this measure, therefore we should not increase the permit costs.” Use your answer to part (a) to refute his statement. Be concise in your argument.

The senator's statement is not supported by the data. The confidence interval contains values above 0.5, so values above 0.5 are plausible values for the population proportion. It is therefore not “clear” that less than half of the students support the measure.

What is the Point?

The point of this problem is to get you to take your Statistics 311 knowledge out into the real world. In whatever field you will be working in, it is probable that at some point you will come across statistics of some sort. You may forget the specific formulas and some of the details, but hopefully you can remember the basic idea behind confidence intervals. The idea is that we have some parameter that we want to study. To study the parameter we take a sample and calculate a sample statistic. We know that every sample we take will give us a different sample statistic. If we can obtain a confidence interval (whether we calculate it ourselves or have it provided to us), we know that it tells us the plausible values of our parameter! You will often see confidence intervals around election time, when news organizations report the results of a poll. For example, suppose a news organization reports that President Obama leads in the polls with 48% of the vote with a margin of error of ± 4%. This means that the poll estimates that the TRUE proportion of all voters that will vote for President Obama is between 44% and 52%!

Hypothesis Testing

2. A real estate broker was interested in the typical size of homes in Wake County. Specifically, she had been told that homes were typically around 1500 square feet. She believes that perhaps the size was actually larger than 1500 square feet and took a random sample of 25 homes in Wake County to try to verify that idea.
Since home size is likely to be skewed, she decides to use the median to measure “typical.” Here is the output from a hypothesis test that was conducted:

Hypothesis test results:
Parameter: median of Variable
H0 : Parameter = 1500
HA : Parameter > 1500

Variable   n    n for test   Sample Median   Below   Equal   Above   p-val
Sq.Ft.     25   25           1740            9       0       16      0.1148

a.) If the median of the population is really 1500, how many of the 25 observations would you expect to be less than 1500?

First, let’s think for a second about why we are discussing the median as a parameter and not the mean. We are told that the population of home sizes in Wake County is skewed AND that we are only taking a sample of size 25. Thus, it is not appropriate to use the methods we have learned about for means! Recall that one of the strengths of the median is that it is not influenced by outliers, whereas the mean is!

Now let’s contemplate why the population of home sizes is skewed, which may provide some insight into the nature of our problem. Most people have a salary between $30,000 and $120,000 and thus live in a “smallish” to medium sized house (maybe between 1500 and 2500 sq. ft.), as that is what they can afford. A very small number of people make astronomical amounts of money and build very big houses (maybe around 10,000 sq. ft.). This makes for a population distribution that is skewed to the right and looks like the figure below.

[Figure: Population of Home Sizes in Wake County. Density plotted against Home Size (Sq. Ft.), from 0 to 10,000.]

Since the population is so heavily skewed and we are taking a “small” sample, we do not want our results to be distorted by one really big house that could happen to be randomly selected for our data. Thus, the median is a great choice for the parameter of interest in this case, as it helps us to learn about what a “typical” house size in Wake County is. The median is the “middle” of our sample.
Since we have 25 observations in our sample, the sample median is the thirteenth number when the observations are sorted. So if 1500 sq. ft. really is the population median, then we expect about half of the observations, around 12 or 13 of the 25, to be less than 1500.

b.) Since she is interested in the median, the broker conducts what is called a Sign test. The Sign test is about the population median rather than the mean. She correctly carried out the test in software and the resulting output is given above. What conclusion can we draw from this output?

For this problem, we are given output from a test (the Sign Test) that we never learned about. However, we do know something about hypothesis tests, p-values, and how to make conclusions about a hypothesis test. In this respect this problem is very much like every other hypothesis test we have done before: the overall idea is the same but the details are a little different, so we just need to adapt what we have learned to a new context.

Here the parameter is the TRUE median home size in Wake County, and thus the population is ALL homes in Wake County. The Null Hypothesis states that the TRUE median home size in Wake County is 1500 sq. ft., whereas the Alternative Hypothesis states that the TRUE median home size in Wake County is larger than 1500 sq. ft. In the output, we are given a p-value of 0.1148, which is above our default significance level of 0.05. Thus, at the 5% significance level, we fail to reject the null hypothesis. That is, there is insufficient evidence to conclude that the TRUE median home size in Wake County is larger than 1500 sq. ft.

What is the Point?

The point of this problem is to get you to take your Statistics 311 knowledge out into the real world. In whatever field you will be working in, it is probable that at some point you will come across statistics of some sort. You may forget about specific formulas and details, but hopefully you can remember the basic idea behind hypothesis testing and how to reach a conclusion about the population parameter.
There is some parameter that we are interested in studying. If someone can formulate a question about that parameter, we can turn that question into a set of hypotheses. A test can be done (usually by a software program) that provides a p-value. From the p-value we can decide to reject the null hypothesis or fail to reject it and make an intelligent conclusion about the parameter!

Common Mistakes on Midterm 2 and Things to Review

• When communicating about the parameter of interest, be sure to clearly distinguish between the population parameter and the sample statistic. Use the word TRUE or POPULATION! We have focused extensively on three types of parameters: proportion, mean, and slope. When reading through a problem where you need to calculate something, determine which of these parameters is relevant to the question and go to the appropriate place on the formula sheet before performing ANY CALCULATIONS!
• When performing a calculation, please show your work and be organized. You may know what you're doing, but if we cannot tell then we cannot give you credit.
• Do not mix up Standard Error and Standard Deviation. The Standard Error is an estimate of the Standard Deviation of a sampling distribution. Look on your formula sheet and identify both the Standard Deviation formulas and the Standard Error formulas. What is the difference?
• Practice correctly stating the hypotheses. The hypotheses are about population parameters and NOT about sample statistics. A population parameter represents an unknown quantity that we want to learn about, and a statistic is a number that we calculate from a sample. A hypothesis is a question that we ask about a parameter! The null hypothesis represents the “status quo” or “general consensus” whereas the alternative hypothesis is a claim for which we hope to find evidence.
• P-values are always between 0 and 1, so if you have calculated a p-value that is not between 0 and 1 then something has gone wrong in your calculation!
• Test statistics that are very large (greater than 25) should make you think to double check your calculations.
• Extrapolation is when you try to predict the response variable (y) at values of the explanatory variable (x) outside the range of your data. We CAN make predictions of the response for any value of the explanatory variable (x) that lies within the range of our data set. This is NOT extrapolation.
• Given a p-value, you need to be able to determine if you reject or do not reject the null, and then make an appropriate conclusion in the context of the problem. For any hypothesis test, you have a significance level α (usually α = 0.05) and a p-value. If your p-value is less than your significance level, then you reject the null hypothesis, as what you observed would be very unlikely to occur by random chance if the null hypothesis were true. (It is always important to remember that the hypothesis test is conducted assuming that the null hypothesis is true!) If your p-value is higher than your significance level, then you fail to reject the null hypothesis. DO NOT ACCEPT THE NULL! (See Worksheet 10 and 11 solutions)
• Review the connection between confidence intervals and two-sided hypothesis tests, as these are two ways of doing the same thing: reaching a conclusion about a population parameter. The confidence interval gives you a range of plausible values of your parameter. A hypothesis test looks at the null hypothesis and asks: Is this a plausible value for the parameter? Yes or No. If we are given a confidence interval, then for any two-sided hypothesis test at the matching significance level (for example, a 95% interval and α = 0.05), we will fail to reject the null hypothesis if the null value is contained in the interval and will reject the null hypothesis if the null value is not contained in the interval! (See Worksheet 11 Solutions)
• Review how to recognize and perform a paired-t test.
The machinery is very similar to a hypothesis test for means except that you instead work with the differenced data, so the hypotheses and the interpretation will change. (See the last problem on Worksheet 11 Solutions)
• Review how to find the p-value when the test statistic is negative, in particular for a t-test. The alternative tells us “where” in the null distribution to look. It almost ALWAYS helps to DRAW PICTURES. Students who draw pictures very rarely make mistakes when calculating p-values!
• Review how to find the p-value from the randomization test output. (There were a few quiz questions where you had to do this.)
• Review the relationship between p-values from one-sided and two-sided tests. If you have the p-value for a one-sided test, multiply it by 2 to get the p-value for the two-sided test. If you have a p-value from a two-sided test, divide it by 2 to get the one-sided p-value (provided the test statistic falls in the direction stated by the alternative). Draw the null distribution to see why this is true.
• Review the meaning of $R^2$: know what it is and how to find it in the output. Also, know how to use R-sq to compare two models. (See Worksheet 12 Solutions)
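
The arithmetic in Problems 1 and 2 can be double checked with a few lines of code. What follows is a minimal Python sketch and is not part of the original worksheet. The function names `proportion_ci` and `sign_test_upper_p_value` are made up for illustration, and the $z^*$ value of 1.96 is typed in from the Z-table rather than computed. The worksheet does not say how the software obtained the Sign test p-value, so the sketch assumes the usual approach: if the null hypothesis is true, each home is equally likely to fall above or below 1500 sq. ft., so the count of homes above 1500 follows a Binomial(25, 0.5) distribution.

```python
from math import comb, sqrt


def proportion_ci(successes, n, z_star):
    """Return (p_hat, standard error, margin of error, lower, upper)."""
    p_hat = successes / n
    se = sqrt(p_hat * (1 - p_hat) / n)  # SE(p_hat) = sqrt(p_hat * q_hat / n)
    moe = z_star * se                   # margin of error = z* times SE
    return p_hat, se, moe, p_hat - moe, p_hat + moe


def sign_test_upper_p_value(n_above, n_for_test):
    """P(X >= n_above) for X ~ Binomial(n_for_test, 0.5).

    Assumption: a one-sided (greater than) Sign test with no ties.
    """
    tail = sum(comb(n_for_test, k) for k in range(n_above, n_for_test + 1))
    return tail / 2 ** n_for_test


# Problem 1: 47 of 100 sampled students support the increase.
p_hat, se, moe, lower, upper = proportion_ci(47, 100, 1.96)
print(f"SE = {se:.5f}, MOE = {moe:.4f}, CI = ({lower:.4f}, {upper:.4f})")

# Problem 2: 16 of the 25 sampled homes are above 1500 sq. ft.
p_value = sign_test_upper_p_value(16, 25)
print(f"Sign test p-value = {p_value:.4f}")
```

Under these assumptions the printed values should agree with the worked solutions: a standard error of 0.04991, a margin of error of 0.0978, the interval (0.3722, 0.5678), and a p-value of 0.1148.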

© Copyright 2018