 # STAT 100, Section 4 Sample Final Exam Questions, Part 2 Fall 2012

```STAT 100, Section 4
Sample Final Exam Questions, Part 2
The following questions are similar to the types of questions you will see on the final
exam. The actual final will consist of 70 to 80 multiple-choice questions.
NOTE: On the actual exam, the choices will not require calculator math.
On this sample exam, some of the choices may require a calculator.
Question 61. It is known that in the presidential elections of 1992, 56% of all eligible
adults actually voted. If we collect a large number of simple random samples each of
voted will fall between:
(A) .56 ± 4 × √(.56(1 − .56)/1600), or 0.512 and 0.608.
(B) .56 ± 1 × √(.56(1 − .56)/1600), or 0.548 and 0.572.
(C) .56 ± 2 × √(.56(1 − .56)/1600), or 0.536 and 0.584.
(D) .56 ± 3 × √(.56(1 − .56)/1600), or 0.524 and 0.596.
Question 62. If a result is statistically significant, this means that it is an important
result.
(A) True
(B) False
Question 63. Lee Salk exposed one group of newly born infants, the treatment group, to
the sound of a human heartbeat. Next, Salk compared their weight gains to those of
a group of newly born infants not exposed, the control group. Salk concluded from his
data that the treatment group had higher mean weight gain than the control group.
The error which he possibly commits here is:
(A) In thinking that the sound of a heartbeat could have an effect on an infant’s weight.
(B) A Type II error.
(C) No error at all; we know that infants exposed to a heartbeat are healthier.
(D) A Type I error.
Question 64. If a data set is normally distributed then:
(A) The mean is smaller than the median or the mode.
(B) The mean is larger than the median but not larger than the mode.
(C) The mean, median and mode are the same.
(D) None of the above.
Question 65. A study of caffeine levels of farmers and doctors reported the following data:
           Sample Size   Sample Mean   Sample S.D.
Farmers    25            13            4
Doctors    25            10            3
A 95% confidence interval for the difference in population mean caffeine levels (farmers’
mean minus doctors’ mean) is:
(A) (13 − 10) ± 1.64 × (SE of the difference).
(B) (4 − 3) ± 2 × (SE of the difference).
(C) (4 − 3) ± 2 × (SE of the difference).
(D) (13 − 10) ± 2 × (SE of the difference).
Question 66. As measured by the Stanford-Binet test, IQ scores are approximately
normally distributed with mean 100 and standard deviation 16. A score of 108 on the
Stanford-Binet test falls in the:
(A) 69th percentile.
(B) 92nd percentile.
(C) 50th percentile.
(D) 80th percentile.
Question 67. To obtain a margin of error of 1% we need a sample of size:
(A) 1,000
(B) 100,000
(C) 10,000
(D) 100
Question 68. To study the effects of exercise on lean body (muscle) weight change, a
random sample of 36 students was placed on a two-month long exercise program. At
the end of the program, all 36 students’ changes in lean body weight were measured.
The sample mean change in muscle weight was 1.05 pounds and the sample standard
deviation was 3.6 pounds. The study organizers wish to know if the results of this
sample provide good evidence that this exercise program causes a statistically significant change in the population mean lean body weight. The value of this test statistic
(standardized score) is:
(A) 3.6/1.05 = 3.43
(B) 1.05/(3.6/√36) = 1.75
(C) 1.05/3.6 = 0.29
(D) 3.6/(1.05/√36) = 20.57
Question 69. A student claims that, for any data set of size two or more, the standard
error of the mean (SEM) is smaller than the sample standard deviation. This claim
is:
(A) Always true.
(B) Not always true; it depends on the actual numbers in the sample.
(C) Always false.
(D) Not always false; it depends on the sample size.
Question 70. Lee Salk exposed one group of newly born infants, the treatment group, to
the sound of a human heartbeat. Next, Salk compared their weight gains to those of
a group of newly born infants not exposed, the control group. In Salk’s experiment,
the alternative hypothesis is:
(A) Infants exposed to the sound of a human heartbeat will gain a higher mean weight
than infants not exposed to the sound of a heartbeat.
(B) Infants exposed to the sound of a human heartbeat will gain the same mean weight
as infants who are not exposed to the sound of a heartbeat.
(C) Infants not exposed to the sound of a human heartbeat will hear the heartbeat.
(D) Infants exposed to the sound of a human heartbeat will hear the heartbeat.
Question 71. When a distribution is greatly skewed to the right (i.e., has a long right
tail but a comparatively short left tail), the median will usually be:
(A) In no relation whatsoever to the mean.
(B) Smaller than the mean.
(C) Larger than the mean.
(D) Exactly equal to the mean.
Question 72. To survey the opinions of its customers, a supermarket grouped its customers by the days when they did most of their shopping. The supermarket randomly
selected two such groups, and asked all customers in those two groups to complete a
survey. This (biased) method of sampling is called:
(A) Stratified random sampling.
(B) Systematic sampling.
(C) Random digit dialing.
(D) Cluster sampling.
Question 73. A supermarket manager wants to know if customers would pay slightly
higher prices to have computers available throughout the store to help them locate
items. An interviewer is posted at the entrance door and asked to collect a sample of
100 opinions by asking questions of the next person who came to the door each time
she had completed an interview. This method of sampling is called:
(A) Cluster sampling.
(B) Stratified random sampling.
(C) Random digit dialing.
(D) Convenience sampling.
(E) Systematic sampling.
Question 74. The term “control” means that:
(A) There must be a basis for making comparisons.
(B) We control carefully the cost of performing the experiment.
(C) We control carefully the observed outcomes of the experiment.
(D) There is a need to control the subjects of the experiment.
Question 75. A dataset contains yearly measurements since 1960 of the divorce rate
(per 100,000 population) in the United States and the yearly number of people (per
100,000 population) sent to prison for drug offenses in the United States. We observe
a strong correlation of .67. Each of the following statements is true EXCEPT
(A) Years with higher divorce rates have tended to be years with higher lockup rates.
(B) Higher divorce rates lead to higher rates of drug lockups.
(C) Higher divorce rates are associated with higher rates of drug lockups.
(D) A positive correlation between divorce rates and drug lockups exists.
Question 76. The standard deviation of the histogram of a large number of sample means
is, approximately:
(A) The area below the normal curve and between -2 and +2.
(B) The mean of a proportion of the population which is never sampled.
(C) The true mean of the population.
(D) (population standard deviation)/√(sample size).
Question 77. Consider the research hypothesis: Working at least 5 hours per day at a
computer contributes to deterioration of eyesight. The null hypothesis is:
(A) working at least 5 hours per day does not affect your eyesight
(B) cannot determine the null hypothesis
(C) working at least 5 hours per day improves your eyesight
(D) working at least 5 hours per day contributes to the deterioration of eyesight
Question 78. Fiona receives a beautiful four-sided die for her eighteenth birthday. After
playing with it for two hours, she starts to suspect that her die is more favorable to
rolling “1” than to any other number. Fiona’s null hypothesis is that
(A) The die rolls “1” with probability not equal to 0.25.
(B) The die rolls “1” with probability less than 0.25.
(C) The die rolls “1” with probability equal to 0.25.
(D) The die rolls “1” with probability greater than 0.25.
Question 79. When measured with extreme accuracy, the variable “height of a building”
is:
(A) A discrete quantitative variable
(B) A nominal categorical variable
(C) A continuous quantitative variable
(D) An ordinal categorical variable
Question 80. In 1982, 490,000 subjects were asked about their drinking habits. Researchers tracked subjects’ death rates until 1991, and found that adults who regularly had one alcoholic drink daily had a lower death rate than those who did not
drink. Most of the subjects were middle-class, married, and college-educated. This
experiment is an example of:
(A) A block design; many groups were compared, e.g., married vs. not married, middle-class vs. not middle-class.
(B) A prospective study; after recording their drinking habits, subjects were studied into
the future.
(C) A matched pairs design; two groups were compared, those who had one drink daily
and those who did not.
(D) A retrospective study; after the nine-year period, survivors were asked about their
drinking habits.
Question 81. The term “randomization” means that we:
(A) Study only a random subset of all observed outcomes of the experiment.
(B) Allow subjects to assign themselves randomly to the placebo or treatment group.
(C) Use a chance mechanism to assign subjects to the treatment and control groups.
(D) Assign the subjects to the random experiment in a systematic manner.
Question 82. A radio advertiser wishes to choose a sample of size 100 from a population
of 5,000 listeners. After observing that 5,000 ÷ 50 = 100, he first selects a subject at
random from the first 50 names in the sampling frame, and then he selects every 50th
subject listed after that one. This method of sampling is called:
(A) Stratified random sampling.
(B) Simple random sampling.
(C) Cluster random sampling.
(D) Systematic random sampling.
Question 83. If a particular dataset has a five-number summary given by 10, 20, 30, 40,
50, then the interquartile range is:
(A) 30 − 5 = 25
(B) 40 − 20 = 20
(C) (10 + 20 + 30 + 40 + 50)/5 = 30
(D) 50 − 10 = 40
Question 84. All other things remaining constant, if the population size quadruples from
10 million to 40 million then the width of a confidence interval will:
(A) Increase by a factor of two.
(B) Decrease by half.
(C) Remain unchanged.
(D) Increase and then decrease.
Question 85. Consider the research hypothesis that there is a difference in the proportions
of men and women at PSU who own cell phones. Data from the class survey question:
Do you own a cell phone?
         No   Yes   Total
Female   26   51    77
Male     19   16    35
Total    45   67    112
If the chi-square statistic is 4.215 (which equals 2.053²), the p-value is:
(A) greater than .05
(B) can’t tell
(C) less than .05
(D) equal to .05
Question 86. Fiona receives a beautiful four-sided die for her eighteenth birthday. After
playing with it for two hours, she starts to suspect that her die is more favorable to
rolling “1” than to any other number. Fiona’s alternative hypothesis is
(A) One-sided
(B) Two-sided
Question 87. A random sample of 25 farmers was examined in a study of caffeine levels.
From the data collected, a 95% confidence interval for the population mean caffeine
level was calculated to be 21.5 to 23.0. We can conclude that:
(A) 23.3 is a plausible value for the population mean.
(B) 21.8 is not a plausible value for the population mean.
(C) 22.3 is a plausible value for the population mean.
(D) 19.7 is a plausible value for the population mean.
Question 88. Suppose that 1% of the population has hepatitis. Suppose we have a test for
the disease that has 80% sensitivity and 90% specificity. What is Pr(hepatitis given
that the test is positive)?
(A) .01
(B) .14
(C) .80
(D) .075
Question 89. The standard deviation of the histogram of a large number of sample
proportions is, approximately:
(A) The area below the normal curve and between -1.96 and +1.96.
(B) The square root of: (true proportion) × (1 − true proportion)/(sample size).
(C) A proportion of the population which is never sampled.
(D) The square root of: (true proportion) × (sample size)/(1 − true proportion).
(E) The true proportion of the population.
Question 90. Suppose you repeatedly toss a coin for which the probability of tossing
heads is .3, or 30%. The probability that you do not toss heads on any of your first
four tosses is:
(A) 1 − (.7)⁴ = .76
(B) .7 + .7 + .7 + .7 = 2.8
(C) .7
(D) (.7)⁴ = .24
(E) None of the above
Question 91. Consider the following two variables: Weight of a car and its gas mileage
(the number of miles it can drive on a gallon of gasoline). We would expect the
correlation to be
(A) negative
(B) positive
(C) zero
(D) one
Question 92. Lee Salk exposed one group of newly born infants, the treatment group, to
the sound of a human heartbeat. Next, Salk compared their weight gains to those of
a group of newly born infants not exposed, the control group. In Salk’s experiment, a
Type II error occurs if:
(A) Infants exposed to the sound of a human heartbeat actually hear the heartbeat.
(B) The study rejects the hypothesis that exposed infants have the same mean weight gain
as unexposed infants when, in fact, this hypothesis is valid.
(C) The study fails to reject the hypothesis that exposed infants have the same mean
weight gain as unexposed infants when, in fact, this hypothesis is not valid.
(D) Infants not exposed to the sound of a human heartbeat do not hear the heartbeat.
Question 93. All other things remaining constant, if the sample size increases by a factor
of nine then the confidence interval for the population mean will:
(A) Triple in width.
(B) Become one-ninth as wide.
(C) Become nine times as wide.
(D) Become one-third as wide.
Question 94. Suppose ten studies were conducted to assess the relationship between
watching violence on television and subsequent violent behavior in children. Suppose
that none of the ten studies detected a statistically significant relationship. What
would be the result of applying the vote-counting method to this example?
(A) Vote-counting might detect a relationship in this example, but we would need more
information.
(B) Vote-counting will not detect a relationship in this example.
(C) Vote-counting will detect a relationship in this example.
Question 95. Suppose that in a particular sample, we find that 80% of female college
students use cell phones while 75% of male college students use cell phones. Which
of the following is true?
(A) This difference between males and females is less likely to be statistically significant
if the sample is larger.
(B) This difference between males and females is more likely to be statistically significant
if the sample is larger.
(C) This difference between males and females is more likely to be practically significant
if the sample is larger.
(D) This difference between males and females is less likely to be practically significant if
the sample is larger.
Question 96.
         No Ticket   Ticket   Total
Female   52          25       77
Male     17          19       36
Total    69          44       113
The data show clearly that males received traffic tickets at a higher rate than females. From
these census data we conclude that the events “Female” and “Ticket” are:
(A) Independent
(B) Mutually exclusive
(C) Neither mutually exclusive nor independent
(D) Both mutually exclusive and independent
Question 97. Which of the following measures is valid and categorical?
(A) Time on a clock that is always 10 minutes fast
(B) The sale price of a house
(C) Sex (male or female)
(D) Weight of an individual on a scale that is sometimes 5 pounds too light, sometimes 5
pounds too heavy
Question 98. In a statistical study, the population is:
(A) The group of people from whom data cannot be collected.
(B) The people or objects studied in the sample survey.
(C) The group of people or objects for which conclusions are to be made.
(D) All people in the United States.
Question 99. The mean of a large number of sample means from equally-sized random
samples will be approximately:
(A) (population standard deviation)/√(sample size).
(B) The mean of a proportion of the population which is never sampled.
(C) The area below the normal curve and between -1.96 and +1.96.
(D) The true mean of the population.
Question 100. In 1982, 490,000 subjects were asked about their drinking habits. Researchers tracked subjects’ death rates until 1991, and found that adults who regularly
had one alcoholic drink daily had a lower death rate than those who did not drink.
Most of the subjects were middle-class, married, and college-educated.
This experiment was:
(A) A randomized experiment; subjects were assigned in a randomized manner to have
one alcoholic drink each day.
(B) An observational study; it would not be ethical for the researchers to randomly assign
subjects to drink alcohol or not.
(C) Based on a stratified random sample; subjects were stratified randomly into overlapping groups according to whether or not they had one drink daily.
(D) Accurate; there is fundamentally solid anecdotal evidence that people’s health will
improve if they have one drink daily.
Question 101. A researcher asks 1,600 randomly chosen doctors whether or not they
take aspirin regularly. She also asks them to estimate the number of headaches they
have had in the past six months, and compares the number of headaches reported by
those who take aspirin regularly to the number of headaches reported by those who
do not take aspirin regularly. In this study, the number of headaches is a:
(A) Response variable.
(B) Explanatory variable.
(C) Confounding variable.
(D) None of the above.
Question 102. All other things remaining constant, increasing the confidence coefficient
causes the width of a confidence interval to:
(A) Remain unchanged.
(B) Increase.
(C) Decrease.
(D) Increase for a while and then decrease.
Question 103. A researcher asks 1,600 randomly chosen doctors whether or not they
take aspirin regularly. She also asks them to estimate the number of headaches they
have had in the past six months, and compares the number of headaches reported by
those who take aspirin regularly to the number of headaches reported by those who
do not take aspirin regularly. This type of study design is a:
(A) Retrospective observational study.
(B) Randomized experiment.
(C) Census.
(D) Prospective study.
Question 104. If a simple random sample of 1,600 subjects is chosen then an approximate
margin of error for making inferences about a percentage of the entire population is:
(A) 1/1,600, or 0.06%.
(B) Impossible to calculate without knowing the percentage observed in the sample.
(C) Very high, because the sample size is a tiny percentage of the population size.
(D) Approximately zero because the sample size is so large.
(E) 1/√1,600, or 2.5%.
Question 105. Hypothetical research question, asked of a random sample of students:
Do you own a pet? (Data and related output below.)
         No Pet   Yes Pet   All
Female   26       50        76
Male     17       18        35
All      43       68        111
Chi-sq = 0.40 + 0.87 + 0.25 + xxxx = 2.08
If all the counts in the table above were multiplied by 10, the chi-squared value would
(A) change to .208
(B) change to 20.8
(C) change to 208
(D) stay the same
Question 106. To calculate a 99% confidence interval for a population proportion, we
use:
(A) Sample proportion ± 1.64× S.D.
(B) Sample proportion ± 2× S.D.
(C) Sample proportion ± 2.33× S.D.
(D) Sample proportion ± 2.576× S.D.
Question 107. In a study to see if a new variety of popcorn pops faster than the old
variety, we collected the following data on time to complete popping in minutes:
        Old variety   New variety
Mean    15            10
SEM     3             4
Which of the following is true? (Hint: The test statistic is very easy to compute here
without a calculator.)
(A) We would support the research advocate
(B) We would not support the research advocate
(C) Not enough information to decide
Question 108. A statistical study considers the question of whether the presence of plants
in an office might lead to fewer sick days. In this study, the null hypothesis is:
(A) The presence of sick people in an office leads to fewer plants.
(B) Insufficient information is given to allow us to determine the null hypothesis.
(C) The presence of plants in an office leads to fewer sick days.
(D) The presence of plants in an office does not lead to fewer sick days.
Question 109. A study of caffeine levels of farmers and doctors reported the following
data:
           Sample Size   Sample Mean   Sample S.D.
Farmers    25            13            4
Doctors    25            10            3
The standard error of the difference between the two sample means is:
(A) The standard deviation of the data in the combined samples.
(B) (Sample Mean1 − Sample Mean2)/√(SD1 − SD2) = (13 − 10)/√(4 − 3) = 3.0
(C) √((Sample Mean1)²/SD1 + (Sample Mean2)²/SD2) = √((13/√25)² + (10/√25)²) = 3.28
(D) √((SEM1)² + (SEM2)²) = √((4/√25)² + (3/√25)²) = 1.0
Question 110. In any large data set, the proportion of data falling at or below Q3 , the
third quartile, is:
(A) 25%
(B) 75%
(C) Dependent on the sample drawn from the population.
(D) 68%
(E) 99.7%
Question 111. True or False: Outliers in a scatter plot can increase the correlation.
(A) True
(B) False
Question 112. The following table presents experts’ and students’ rankings of the risks
of eight activities. (Hint: Sketch the scatterplot of these rankings with the experts’
rankings on the horizontal axis.)
The Eight Greatest Risks
Activity or Technology   Experts’ Rank (x)   Students’ Rank (y)
Motor Vehicles           1                   5
Smoking                  2                   3
Alcoholic Beverages      3                   7
Handguns                 4                   4
Surgery                  5                   11
Motorcycles              6                   6
X-rays                   7                   17
Pesticides               8                   2
If “Pesticides” were deleted from the data above, then we would expect that:
(A) The correlation and the slope of the regression line both will decrease.
(B) The correlation will increase and the slope will decrease.
(C) The correlation and the slope of the regression line both will remain unchanged.
(D) The correlation and the slope of the regression line both will increase.
(E) None of the above.
Question 113. In a study of caffeine levels of farmers and doctors, suppose that a 95%
confidence interval for the difference in the means did not contain the value zero. We
could infer that the population means of farmers’ and doctors’ caffeine levels:
(A) Are significantly different from each other.
(B) Are close to each other; there is no significant difference between them.
(C) Have no relationship to each other; we have insufficient evidence to make an inference
(D) Are equal to each other; we are 95% confident that they are equal.
Question 114. A sample of 28 temperature measurements in °F, all taken at 12:00 p.m.,
was collected in a coastal town in NC. A second sample of 28 temperature readings
in a GA coastal town was also collected. Each GA measurement was recorded at the
same time and date as the NC data value, and turned out to be exactly 5°F higher
than the corresponding NC measurement. We calculate the standard deviation (S.D.)
of the two data sets and conclude that:
(A) The two data sets have the same standard deviations.
(B) The S.D. of the NC data exceeds the S.D. of the GA data by 5°F.
(C) The S.D. of the GA data exceeds the S.D. of the NC data by 5°F.
(D) There is not enough information to determine the relationship between the two S.D.’s.
Question 115. For any data set, the proportion of the data that falls at or above the
median is:
(A) 95%
(B) 99.7%
(C) 25%
(D) 50%
(E) Dependent on the sample which was drawn from the population.
Question 116. A statistical study considers the question of whether highly educated
people are less likely to develop Alzheimer’s disease than others. In this study, the
alternative hypothesis is:
(A) There is no relationship between level of education and the development of Alzheimer’s
disease.
(B) Highly educated people are certain of developing Alzheimer’s disease.
(C) Insufficient information is given to allow us to determine the null hypothesis.
(D) There is a relationship between level of education and the development of Alzheimer’s
disease.
(E) Highly educated people are less likely than others to develop Alzheimer’s disease.
Question 117. For any data set, the standard deviation is:
(A) A measure of central tendency.
(B) The average of the deviations.
(C) A measure of the spread or variability of the data.
(D) The average of the sample mean and quartiles.
Question 118. It has been observed that participants in a statistical experiment sometimes respond differently than they otherwise would because they know that they are
in an experiment. This phenomenon is called the:
(A) Interacting effect.
(B) Confounding effect.
(C) Placebo effect.
(D) Hawthorne effect.
Question 119. Suppose ten studies were conducted to assess the relationship between
watching violence on television and subsequent violent behavior in children. Suppose
that none of the ten studies detected a statistically significant relationship. True or
false: It is possible for a meta-analysis to detect a statistically significant relationship
in this example.
(A) True
(B) False
Question 120. To estimate the unemployment rate, the Bureau of Labor Statistics
contacts about 60,000 households chosen randomly from a list of all known households.
The Bureau asks each adult in every sampled household whether they are in the labor
force, i.e., employed or seeking employment. The population of interest is:
(A) All known or unknown households.
(B) All adults who are in the labor force.
(D) All adults who refused to participate in the survey.
Question 121. Suppose that we run 10 separate hypothesis tests as part of an experiment.
Supposing that we wish to use the usual 0.05 significance cutoff for p-values, how would
a Bonferroni correction work?
(A) No result would be declared significant unless its p-value is smaller than 1/10 = 0.10.
(B) No result would be declared significant unless its p-value is smaller than 0.05/10 =
0.005.
(C) Only the smallest p-value would be declared significant as long as it is smaller than
0.05.
(D) No result would be declared significant unless its p-value is smaller than 0.05×10 = 0.5.
(E) No result would be declared significant unless its p-value is smaller than 0.05.
```