```STT200
Chapter 12
KM
Vocabulary:
Already known: Population, Sample, Parameter, Statistic, Surveys
Notation:
We typically use Greek letters to denote parameters and Latin letters to
denote statistics.
Sampling Frame - a list of individuals from which a sample is selected
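Sampling Variability - the natural variation of a statistic from one random sample to the next
Sampling Design - a method of taking a sample
Sample Size - the number of elements in a sample
10% condition: when sampling without replacement, the sample should be no larger
than 10% of the population, so that the draws can be treated as approximately
independent.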
If the goal is to estimate population parameters or a distribution, the sample should
represent the whole population. To achieve this, individuals should be
selected at random, so that each set of the same size has the same chance of
being selected. A sample drawn in this way is called a simple random sample
(SRS).
Sampling Designs (to produce representative samples)
All statistical sampling designs have in common the idea that chance, rather than
human choice, is used to select the sample.
• SRS – Simple Random Sample - each set of size n has the same chance of
being selected (best!)
• Stratified Random Sample - a sampling design in which the population is
divided into groups (strata) and random samples are then drawn from each
stratum
• Cluster Sample - a sampling design in which we first select some parts of the
population (clusters) at random, and then perform a census within each of
them.
• Systematic Sample - there is a pattern in selecting the sample, but the starting
point is random
• Multistage Sample - a design that combines several methods
Bias – a failure of a sample to represent a population
Sources of bias
• Voluntary Response Sample - a large group is invited to respond, but
only those who respond are counted. This sample is not representative.
• Convenience Sample - the sample includes only individuals who are at
hand. This sample is not representative.
• Undercoverage - some portion of the population is not sampled
• Nonresponse bias - a large fraction of the sampled individuals fail to
respond
• Response bias - the wording, order, etc. of the questions suggests
a certain response
Data collected in an inappropriate way is useless.
Example (Gallup Poll).
GEORGE GALLUP (1901-1984) was the father of opinion surveys. He studied
journalism and psychology and worked in market research and the analysis of
advertising and marketing. He founded the Gallup Institute in 1935. His first
big triumph came in 1936, when F.D. Roosevelt ran for his second term in the
White House against the Kansas governor Alfred Landon.
LITERARY DIGEST based its prediction of the election on an enormous
number of responses obtained after sending 10 million (!!!) postcards to
American households.
Literary Digest prediction:
F.D. ROOSEVELT   44%
A. LANDON        56%
Literary Digest had an excellent reputation after correctly predicting the 1932
presidential election.
G. GALLUP sent only a few thousand postcards and predicted a convincing
victory for F.D. Roosevelt.
The real result of the vote:
F.D. ROOSEVELT   22 809 638   63%
A. LANDON        15 758 901   37%
On the one hand, common sense suggests that the more people are surveyed, the more
accurate the result.
On the other hand... Gallup chose his respondents at random in such a way that
every potential voter had the same chance of being selected.
Literary Digest used addresses from telephone books and car registration lists.
Clearly, at that time car and telephone owners did not constitute even half of the
population.
The GALLUP case showed that the way a sample is selected matters more than
the sample size.
Exercises:
4. Drug tests. Major League Baseball tests players to see whether they are using
performance-enhancing drugs. Officials select a team at random, and a drug-testing
crew shows up unannounced to test all 40 players on the team. Each testing day can be
considered a study of drug use in Major League Baseball.
a) What kind of sample is this?
b) Is that choice appropriate?
10. The Gallup Poll interviewed 1007 randomly selected U.S. adults aged 18 and older,
March 23–25, 2007. Gallup reports that when asked when (if ever) the effects of global
warming will begin to happen, 60% of respondents said the effects had already begun.
Only 11% thought that they would never happen.
For the following reports about statistical studies, identify the following items (if
possible). If you can't tell, then say so—this often happens when we read about a
survey.
a) The population
b) The population parameter of interest
c) The sampling frame
d) The sample
e) The sampling method, including whether or not randomization was employed
f) Any potential sources of bias you can detect and any problems you see in
generalizing to the population of interest
```
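The sampling designs listed in the notes above (SRS, stratified, cluster, systematic) can be sketched in a few lines of Python. This is only an illustration, not part of the course material: the population of 1200 numbered individuals, the 30 clusters of 40, the two strata, and all function names are hypothetical choices made for the example.

```python
import random

def simple_random_sample(population, n, rng):
    """SRS: every set of n individuals has the same chance of being chosen."""
    return rng.sample(population, n)

def stratified_sample(strata, n_per_stratum, rng):
    """Draw a separate SRS of size n_per_stratum from every stratum."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Choose whole clusters at random, then take everyone in them (a census)."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}

def systematic_sample(population, step, rng):
    """Random starting point, then every step-th individual after it."""
    start = rng.randrange(step)
    return population[start::step]

# Hypothetical population: 1200 individuals labelled 1..1200.
rng = random.Random(12)  # fixed seed so the sketch is reproducible
population = list(range(1, 1201))

# Hypothetical grouping: 30 clusters ("teams") of 40 individuals each.
clusters = {f"team_{i:02d}": population[i * 40:(i + 1) * 40] for i in range(30)}

# Hypothetical strata: two halves of the population.
strata = {"first_half": population[:600], "second_half": population[600:]}

srs = simple_random_sample(population, 100, rng)   # 100 is under 10% of 1200
stratified = stratified_sample(strata, 20, rng)
one_cluster = cluster_sample(clusters, 1, rng)
systematic = systematic_sample(population, 12, rng)

print(len(srs), {k: len(v) for k, v in stratified.items()})
print({k: len(v) for k, v in one_cluster.items()}, len(systematic))
```

In each function the selection is made by the random number generator, never by the person drawing the sample, which is the common idea behind all of the designs. The cluster example mirrors the structure of Exercise 4: one group is picked at random and everyone in it is included.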
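The lesson of the Gallup example in the notes above, that the way a sample is selected matters more than its size, can be checked with a small simulation. Everything in it is a made-up assumption for illustration (an electorate of 100,000, 30% of whom own a phone or car, with 40% support for candidate A among owners and 70% among everyone else); none of these numbers are historical data.

```python
import random

rng = random.Random(1936)  # fixed seed so the sketch is reproducible

# Hypothetical electorate. Each voter is a pair (owns_phone_or_car, votes_for_A).
N_OWNERS, N_OTHERS = 30_000, 70_000
owners = [(True, i < 12_000) for i in range(N_OWNERS)]    # 40% of owners vote A
others = [(False, i < 49_000) for i in range(N_OTHERS)]   # 70% of others vote A
electorate = owners + others

def share_for_a(voters):
    """Fraction of the given voters who vote for candidate A."""
    return sum(vote for _, vote in voters) / len(voters)

true_share = share_for_a(electorate)

# Large sample with undercoverage: drawn only from the list of owners.
big_biased_sample = rng.sample(owners, 10_000)

# Small simple random sample drawn from the whole electorate.
small_srs = rng.sample(electorate, 1_000)

biased_estimate = share_for_a(big_biased_sample)
srs_estimate = share_for_a(small_srs)

print(f"true share:               {true_share:.3f}")
print(f"10,000 from owners only:  {biased_estimate:.3f}")
print(f"SRS of 1,000:             {srs_estimate:.3f}")
```

Under these assumptions the sample of 10,000 is ten times larger, yet it estimates the support among owners rather than among all voters, so increasing its size does not remove the error. The much smaller SRS lands close to the true share.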