# Sampling: Surveys and How to Ask Questions

```
Chapter 4
Sampling: Surveys and Questions

4.1 The Beauty of Sampling
Sample Survey: a subgroup of a large population questioned on a set of topics.
Special type of observational study.
Less costly and less time-consuming than a census.
With proper methods, a sample of 1500 can almost certainly gauge the
percentage in the entire population who have a certain trait or opinion
to within 3%.

The Margin of Error
The sample proportion and the population proportion with a certain trait
or opinion differ by less than the margin of error in at least 95% of
all random samples.
Conservative margin of error:
For proportions: 1/√n
For percents: 100% × 1/√n
Add and subtract margin of error to create an approximate 95%
confidence interval.

Example 4.1 The Importance of Religion
Poll of n = 1003 adult Americans: “How important would you say religion
is in your own life?”
Very important       65%
Fairly important     23%
Not very important   12%
No opinion            0%
Conservative margin of error is 3%: 1/√1003 ≈ 0.03
Approx. 95% confidence interval for the percent of all adult Americans
who say religion is very important: 65% ± 3% or 62% to 68%

Interpreting Confidence Interval
The interval 62% to 68% may or may not capture the percent of adult
Americans who considered religion to be very important in their lives.
But, in the long run this procedure will produce intervals that capture
the unknown population values about 95% of the time
=> called the 95% confidence level.

Advantages of a Sample Survey over a Census
Sometimes a Census Isn’t Possible: when measurements destroy units
Speed: especially if population is large
Accuracy: devote resources to getting accurate sample results

Bias: How Surveys Can Go Wrong
Results based on a survey are biased if the method used to obtain those
results would consistently produce values that are either too high or
too low.
Selection bias occurs if the method for selecting participants produces
a sample that does not represent the population of interest.
Nonresponse bias occurs when a representative sample is chosen but a
subset cannot be contacted or doesn’t respond.
Response bias occurs when participants respond differently from how
they truly feel.

4.2 Simple Random Sampling and Randomization
Probability Sampling Plan: everyone in the population has a specified
chance of making it into the sample.
Simple Random Sample: every conceivable group of units of the required
size has the same chance of being the selected sample.

Choosing a Simple Random Sample
You Need:
1.  List of the units in the population.
2.  Source of random numbers: Minitab or random number tables (book).

Simple Random Sample of Students
Class of 270 students.
Want a simple random sample of 10 students.
1.  Number the units: Students numbered 001 to 270, placed in one column.
2.  Generate random numbers: in an adjacent column, generate uniform
    random numbers (Calc – Random Data – Uniform …).
3.  Sort: both columns by random numbers.
4.  Choose: the top ten students are those selected in the sample.

Using a Table of Random Digits in a Randomized Experiment
Randomization plays a key role in designing experiments to compare
treatments.
Completely randomized design = all units are randomly assigned to
treatment conditions.
Matched-pairs / Randomized Block design = randomize the order in which
treatments are assigned within each pair/block.

Example 4.5 Assigning Children to Lift Weights
In Case Study 3.2, 43 children were randomly assigned: the first 15 to
Group 1, the next 16 to Group 2, and the remaining 12 to Group 3.
Using random numbers to assign children to groups:
-  assign labels to each child
-  assign a random number to each child
-  sort by random numbers
-  first 15 go to group 1, next 16 to group 2, last 12 to group 3

4.3 Other Sampling Methods
Not always practical to take a simple random sample; it can be
difficult to get a numbered list of all units.
Example: College administration would like to survey a sample of
students living in dormitories.
[Figure: a simple random sample of 30 rooms]

Stratified Random Sampling
Divide population of units into groups (called strata) and take a
simple random sample from each of the strata.
Take a simple random sample of 15 rooms from each of the strata for a
total of 30 rooms.
Ideal: stratify so there is little variability in responses within each
of the strata.

Cluster Sampling
Divide population of units into groups (called clusters), take a random
sample of clusters and measure only those items in these clusters.
College survey: Each floor of each dorm is a cluster. Take a random
sample of 5 floors and all rooms on those floors are surveyed.
Advantage: need only a list of the clusters instead of a list of all
individuals.

Systematic Sampling
Order the population of units in some way, select one of the first k
units at random and then every kth unit thereafter.
College survey: Order list of rooms starting at top floor of 1st
undergrad dorm. Pick one of the first 11 rooms at random => room 3,
then pick every 11th room after that.
Note: often a good alternative to random sampling but can lead to a
biased sample.

Random-Digit Dialing
Method approximates a simple random sample of all households in the
United States that have telephones.
1.  List all possible exchanges (= area code + next 3 digits).
2.  Take a sample of exchanges (chance of being sampled based on white
    pages proportion of households with a specific exchange).
3.  Take a random sample of banks (= next 2 digits) within each sampled
    exchange.
4.  Randomly generate the last two digits from 00 to 99.
Once a phone number is determined, make multiple attempts to reach
someone at that household.

Multistage Sampling
Using a combination of the sampling methods, at various stages.
Example:
•  Stratify the population by region of the country.
•  For each region, stratify by urban, suburban, and rural and take a
   random sample of communities within those strata.
•  Divide the selected communities into city blocks as clusters, and
   sample some blocks.
•  Everyone on the block or within the fixed area may then be sampled.

Example 4.7 The Nationwide Personal Transportation Survey
Nationwide Personal Transportation Survey: taken every 5 years by the
U.S. Department of Transportation.
1995 Survey = 21,000 households. Interviews conducted by telephone
using a computer-assisted telephone interviewing (CATI) system.
Multistage Sample:
•  U.S. households were stratified by region of country, size of
   metropolitan area, and whether there is a subway system.
•  Households were then selected by random-digit dialing.
•  Everyone in a selected household was included => each household was
   a cluster.

Example 4.8 A Los Angeles Times National Poll
“… half of Americans polled said they view Jan. 1, 2000, as ‘just
another New Year’s Day’ … About one in 10 report that they are
stockpiling goods.” Los Angeles Times
Times Poll:
•  1,249 adults nationwide by telephone.
•  Over a two-day period in February 1999.
•  Telephone numbers chosen from all exchanges in nation.
•  Random-digit dialing techniques used so listed and nonlisted numbers
   could be contacted.

4.4 Difficulties and Disasters in Sampling
Some problems occur even when a sampling plan has been well designed.
•  Using wrong sampling frame
•  Not reaching individuals selected
•  Self-selected sample
•  Convenience/Haphazard sample

Using the Wrong Sampling Frame
The sampling frame is the list of units from which the sample is
selected. This list may or may not be the same as the list of all units
in the desired “target” population.
Example: using telephone directory to survey general population
excludes those who move often, those with unlisted home numbers, and
those who cannot afford a telephone. Solution: use random-digit dialing.

Not Reaching the Individuals Selected
Failing to contact or measure the individuals who were selected in the
sampling plan leads to nonresponse bias.
•  Telephone surveys tend to reach more women.
•  Some people are rarely home.
•  Others screen calls or may refuse to answer.
•  Quickie polls: almost impossible to get a random sample in one night.

Nonresponse or Volunteer Response
“In 1993 the GSS (General Social Survey) achieved its highest response
rate ever, 82.4%. This is five percentage points higher than our
average over the last four years.” GSS News, Sept 1993
•  The lower the response rate, the less the results can be generalized
   to the population as a whole.
•  Response to survey is voluntary. Those who respond are likely to
   have stronger opinions than those who don’t.
•  Surveys often use reminders and follow-up calls to decrease the
   nonresponse rate.

Example 4.9 Which Scientists Trashed the Public?
“82% (of scientists) trashed the media, agreeing with the statement
‘The media do not understand statistics well enough to explain new
findings.’ ” Science (Mervis, 1998)
Science Poll:
•  1400 professionals (in science and in journalism).
•  Only 34% response rate among scientists.
•  Typical respondent was white, male physical scientist over age of 50
   doing basic research.
•  Respondents represent a narrow subset of scientists
   => inappropriate to generalize to all scientists.

Disasters in Sampling
Responses from a self-selected group, convenience sample or haphazard
sample are rarely representative of any larger group.

Example 4.10 A Meaningless Poll
“Do you support the President’s economic plan?”
[Table: results from TV quickie poll and proper study]
Those dissatisfied were more likely to respond to the TV poll, and it
did not give the “not sure” option.

Case Study 4.1 The Infamous Literary Digest Poll of 1936
Election of 1936: Democratic incumbent Franklin D. Roosevelt and
Republican Alf Landon
Literary Digest Poll:
•  Sent questionnaires to 10 million people from magazine subscriber
   lists, phone directories, car owners, who were more likely wealthy
   and unhappy with Roosevelt.
•  Only 2.3 million responses for 23% response rate. Those with strong
   feelings, the Landon supporters wanting a change, were more likely
   to respond.
•  (Incorrectly) Predicted a 3-to-2 victory for Landon.
Gallup Poll:
•  George Gallup had just founded the American Institute of Public
   Opinion in 1935.
•  Surveyed a random sample of 50,000 people from list of registered
   voters. Also took a random sample of 3000 people from the Digest
   lists.
•  (Correctly) Predicted Roosevelt the winner. Also predicted the
   (wrong) results of the Literary Digest poll within 1%.

Survey Questions
Possible Sources of Response Bias in Surveys
•  Deliberate bias: The wording of a question can deliberately bias the
   responses toward a desired answer.
•  Unintentional bias: Questions can be worded such that the meaning is
   misinterpreted by a large percentage of the respondents.
•  Desire to please: People tend to answer in a way that pleases the
   person who is asking the question. Tend to understate response to an
   undesirable social habit/opinion.

Possible Sources of Response Bias in Surveys (cont)
•  Asking the Uninformed: People do not like to admit that they don’t
   know what you are talking about when you ask them a question.
•  Unnecessary Complexity: If questions are to be understood, they must
   be kept simple. Some questions ask more than one question at once.
•  Ordering of Questions: If one question requires respondents to think
   about something that they may not have otherwise considered, then
   the order in which questions are presented can change the results.

Possible Sources of Response Bias in Surveys (cont)
•  Confidentiality and Anonymity: People will often answer questions
   differently based on the degree to which they believe they are
   anonymous.
   Easier to ensure confidentiality (promise not to release identifying
   information) than anonymity (researcher does not know the identity
   of the respondents).

•  Be Sure You Understand What Was Measured:
   Words can have different meanings. Important to get a precise
   definition of what was actually asked or measured.
   E.g. Who is really unemployed?
•  Some Concepts Are Hard to Precisely Define:
   E.g. How to measure intelligence?
•  Measuring Attitudes and Emotions:
   E.g. How to measure self-esteem and happiness?

•  Open or Closed Questions: Should Choices Be Given?
   Open question = respondents allowed to answer in own words.
   Closed question = given list of alternatives, usually offer choice
   of “other” and can fill in blank.
   If closed are preferred, they should first be presented as open
   questions (in a pilot survey) for establishing list of choices.
   Results can be difficult to summarize with open questions.

Case Study 4.2 No Opinion of Your Own? Let Politics Decide
1978 Poll, Cincinnati, Ohio: people asked whether they “favored or
opposed repealing the 1975 Public Affairs Act.”
No such act exists, yet about one-third expressed an opinion.
1995 Washington Post Poll: 1000 randomly selected people asked “Some
people say the 1975 Public Affairs Act should be repealed. Do you agree
or disagree that it should be repealed?”
43% expressed opinion, 24% agreeing should be repealed.

Case Study 4.2 No Opinion of Your Own?
Let Politics Decide (cont)
Second 1995 Washington Post Poll: polled two
separate groups of 500 randomly selected adults.
Group 1: “President Clinton [a Democrat] said that the 1975
Public Affairs Act should be repealed. Do you agree or
disagree?” Of those expressing an opinion:
36% of the Democrats agreed should be repealed,
16% of the Republicans agreed should be repealed.
Group 2: “The Republicans in Congress said that the 1975
Public Affairs Act should be repealed. Do you agree or
disagree?” Of those expressing an opinion:
36% of the Republicans agreed should be repealed,
19% of the Democrats agreed should be repealed.
```
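
As a rough numerical check of Example 4.1, the sketch below computes a conservative margin of error and the matching approximate 95% interval. It assumes the conservative formula 1/√n for a sample proportion; the function names are illustrative and do not come from the slides.

```python
import math
from typing import Tuple


def conservative_margin_of_error(n: int) -> float:
    """Conservative 95% margin of error for a sample proportion: 1 / sqrt(n)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return 1.0 / math.sqrt(n)


def approx_confidence_interval(p_hat: float, n: int) -> Tuple[float, float]:
    """Sample proportion plus and minus the conservative margin, clipped to [0, 1]."""
    moe = conservative_margin_of_error(n)
    return max(0.0, p_hat - moe), min(1.0, p_hat + moe)


# Example 4.1: n = 1003 adults, 65% answered "very important".
moe = conservative_margin_of_error(1003)
low, high = approx_confidence_interval(0.65, 1003)
print(f"margin of error: {moe:.1%}")
print(f"approximate 95% interval: {low:.1%} to {high:.1%}")
```

The unrounded margin is slightly above 3%, which is why the slide's 62% to 68% is described as approximate.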
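
The "95% confidence level" idea from section 4.1 can be illustrated by simulation. This is a minimal sketch, not a method from the slides: it assumes a hypothetical true population proportion of 0.65, draws many independent polls of n = 1003, and counts how often the sample proportion lands within the conservative margin 1/√n of the true value. The seed and the number of simulated polls are arbitrary choices.

```python
import random


def coverage_rate(true_p: float, n: int, n_polls: int, rng: random.Random) -> float:
    """Share of simulated polls whose interval p_hat +/- 1/sqrt(n) captures true_p."""
    moe = 1.0 / n ** 0.5
    hits = 0
    for _ in range(n_polls):
        # Each respondent independently has the trait with probability true_p.
        yes = sum(1 for _ in range(n) if rng.random() < true_p)
        p_hat = yes / n
        if abs(p_hat - true_p) < moe:
            hits += 1
    return hits / n_polls


rng = random.Random(4)  # arbitrary seed so the run is repeatable
rate = coverage_rate(true_p=0.65, n=1003, n_polls=2000, rng=rng)
print(f"share of intervals capturing the true value: {rate:.1%}")
```

Any single interval either captures the true value or it does not; the 95% figure describes the long-run share across repeated samples.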
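
The spreadsheet recipe in section 4.2 (number the units, attach a random number to each, sort, take the top rows) can be sketched in a few lines. The helper names and the seed are illustrative assumptions; the same sort-by-random-number idea is reused for the 15/16/12 split of 43 children in Example 4.5.

```python
import random


def simple_random_sample(units, k, rng):
    """Attach a uniform random number to each unit, sort by it, keep the first k."""
    keyed = [(rng.random(), unit) for unit in units]
    keyed.sort(key=lambda pair: pair[0])
    return [unit for _, unit in keyed[:k]]


def assign_to_groups(units, group_sizes, rng):
    """Randomly order all units, then cut the ordering into consecutive groups."""
    if sum(group_sizes) != len(units):
        raise ValueError("group sizes must add up to the number of units")
    shuffled = simple_random_sample(units, len(units), rng)
    groups, start = [], 0
    for size in group_sizes:
        groups.append(shuffled[start:start + size])
        start += size
    return groups


rng = random.Random(2024)  # arbitrary seed so the run is repeatable

# Class of 270 students labelled 001 to 270; sample 10 of them.
students = [f"{i:03d}" for i in range(1, 271)]
sample = simple_random_sample(students, 10, rng)
print("sampled students:", sample)

# 43 children split into groups of 15, 16 and 12.
children = list(range(1, 44))
groups = assign_to_groups(children, [15, 16, 12], rng)
print("group sizes:", [len(g) for g in groups])
```

Sorting by an independent random key gives every group of 10 students the same chance of being chosen, which is the defining property of a simple random sample.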
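
The stratified, cluster, and systematic plans from section 4.3 differ mainly in what gets randomized. The sketch below applies each to a made-up dormitory layout: the three dorms, ten floors per dorm, and eleven rooms per floor are assumptions chosen so that the slide's numbers (15 rooms per stratum, 5 floors, every 11th room) work out, and they are not taken from the slides.

```python
import random
from collections import defaultdict

# Hypothetical layout: 3 dorms x 10 floors x 11 rooms = 330 rooms,
# listed from the top floor of the first undergrad dorm downward.
DORMS = {"U1": "undergrad", "U2": "undergrad", "G1": "graduate"}
rooms = [
    {"dorm": dorm, "stratum": kind, "floor": floor, "room": number}
    for dorm, kind in DORMS.items()
    for floor in range(10, 0, -1)
    for number in range(1, 12)
]


def stratified_sample(units, stratum_of, per_stratum, rng):
    """Take a simple random sample of fixed size from every stratum."""
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, per_stratum))
    return sample


def cluster_sample(units, cluster_of, n_clusters, rng):
    """Randomly pick whole clusters and keep every unit inside them."""
    clusters = defaultdict(list)
    for unit in units:
        clusters[cluster_of(unit)].append(unit)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for key in chosen for unit in clusters[key]]


def systematic_sample(units, k, rng):
    """Pick one of the first k units at random, then every kth unit after it."""
    start = rng.randrange(k)
    return units[start::k]


rng = random.Random(7)  # arbitrary seed so the run is repeatable
stratified = stratified_sample(rooms, lambda r: r["stratum"], 15, rng)
clustered = cluster_sample(rooms, lambda r: (r["dorm"], r["floor"]), 5, rng)
systematic = systematic_sample(rooms, 11, rng)
print(len(stratified), len(clustered), len(systematic))  # 30 55 30
```

Only the cluster plan can run without a full list of rooms: it needs the list of floors, then surveys every room on the chosen floors.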