Honors Introduction to Statistics
Practice Questions for Exam 2
1. (12 points) The
Admissions Office has developed a new 10 minute video to send to prospective
students to extol the virtues of attending
(a) Carefully describe an example of a statistical experiment that could be applied to this situation. Give explicit instructions on what the 12 students should do and be sure to indicate how randomization is used as part of your experiment.
Randomly assign the 12
participants to two groups of 6 each.
Have one group watch the old video and then the new one, and have the
second group watch the videos in the reverse order. To maintain blindness, don’t tell
participants which video is which. After
both videos, ask participants which they believe provides a more convincing argument for attending
(b) What specific question would you ask to measure a response variable in this experiment?
(c) Would you classify your response variable as categorical or quantitative?
(d) Would you classify the experiment you have described as a randomized comparative experiment, a matched pairs design, or something else? Explain briefly.
It’s a matched pairs
design, where each participant receives both “treatments.”
2. The age distribution of students at City U is modeled by the
distribution shown to the right.
(a) Approximate the median student age on the graph based on the
distribution. Explain how you made your approximation.
About 27 years
old. About half the area under the
density curve is to the left of age 27, about half is to the right.
(b) Do you expect the mean student age to be higher or lower than the median? Explain briefly. Approximate the mean student age, based on the distribution.
The mean will be
greater than the median because the distribution is skewed to the right (high outliers
will pull the mean up).
(c) If we took random samples of size 5 from the student population, computed the average age within the sample, and looked at the distribution of these averages, would you expect the mean for the new distribution to be larger than, smaller than, or the same as, the mean you estimated in Part (b)? Explain briefly.
The same. The central limit theorem says the mean of
sample averages is the same as the population mean.
(d) If we took random samples of size 5 from the student population, computed the average age within the sample, and looked at the distribution of these averages, would you expect the standard deviation for the new distribution to be larger than, smaller than, or the same as, the standard deviation of the original distribution shown above? Explain briefly.
Smaller. The standard deviation of the sample averages is the population
standard deviation divided by the square root of the sample size, so with
samples of size 5 it shrinks by a factor of sqrt(5).
3. Despite the difficulties, it is sometimes possible to build a strong case for causation in the absence of experiments. The evidence that smoking causes lung cancer is about as strong as nonexperimental evidence can be. What criteria are necessary to suggest causation when we cannot do an experiment?
Strong association,
association is consistent across many studies of different sorts of people in
different environments, higher doses are associated with stronger responses
(more smoking is associated with higher cancer rates), smoking precedes the cancer
in time, cause is probable.
4. You are interested in determining the level of student support for student government activities. Create a question that is clearly biased, and one that is (to the extent possible) not biased. Briefly explain how you expect responses to the two questions to differ.
Biased toward
Biased against
Unbiased: Do you generally support or not support
student government activities?
5. A study of education followed a large group of fifth-grade children to see how many years of school they eventually completed. Let X be the highest year of school that a randomly chosen fifth grader completes. (Students who go on to college are included in the outcome X = 12.) The study found the following probability distribution for X.
| Years | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
| Probability | 0.010 | 0.007 | 0.007 | 0.013 | 0.032 | 0.068 | 0.070 | 0.041 | 0.752 |
(a) Carefully explain how you know this is a legitimate probability distribution.
All of the individual probabilities are between 0 and 1, and the sum of
the probabilities is exactly 1.
(b) What percent of fifth graders
eventually finished 12th grade?
75.2%
(c) Explain what P(X = 4) = 0.010 means in terms of children completing school.
The probability that a randomly chosen 5th grade student
will not complete the 5th grade (completes only grade 4 and drops
out before the end of 5th grade) is approximately 0.010.
(d) Find P(X ≥ 6).
1 - (0.010 + 0.007) = 0.983
(e) Find the probability that a randomly chosen 5th grader finishes 12th grade, given that the student finished
9th grade.
0.752/(0.068+0.070+0.041+0.752) = 0.752/0.931 = about 0.81
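The arithmetic in Question 5 can be checked with a short script. This is only a sketch: the dictionary `dist` and the helper `prob` are names introduced here for illustration, and part (d) is read as P(X ≥ 6), which is an assumption about the symbol missing from the question.

```python
# Probability distribution for X, copied from the table in Question 5.
dist = {4: 0.010, 5: 0.007, 6: 0.007, 7: 0.013, 8: 0.032,
        9: 0.068, 10: 0.070, 11: 0.041, 12: 0.752}

def prob(event):
    """Sum P(X = x) over every value x for which event(x) is true."""
    return sum(p for x, p in dist.items() if event(x))

total = prob(lambda x: True)            # part (a): should equal 1
p_at_least_6 = prob(lambda x: x >= 6)   # part (d), assuming the symbol is ">="
# part (e): P(X = 12 | X >= 9) = P(X = 12) / P(X >= 9)
p_12_given_9 = prob(lambda x: x == 12) / prob(lambda x: x >= 9)

print(round(total, 3), round(p_at_least_6, 3), round(p_12_given_9, 2))
```

The conditional probability in part (e) divides by P(X ≥ 9) because "finished 9th grade" restricts attention to the outcomes 9, 10, 11 and 12.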
6. Generate two random numbers between 0 and 1 and take Y to be their sum. The sum Y can take any value between 0 and 2. The density curve looks like a triangle with base from 0 to 2 and height 1.
(a) Sketch a graph of the density curve.

(b) Verify by geometry that the area under the curve is 1.
Area of a triangle = 1/2 * base * height = 1/2 * 2 * 1 = 1, which matches the
requirement that the total area under a density curve is 1.
(c) What is the probability that Y is less than 1? (Shade the area that represents the probability on your density curve, then find that area.)

(d) What is the probability that Y is less than 0.5? (Again, shade the corresponding area on your density curve.)

7. Tetrahedral dice are shaped like pyramids, with 4 triangular faces, each of which is an equilateral triangle (all sides have the same length). Assume each die has sides labeled 1, 2, 3 and 4. When you roll a tetrahedral die, you “roll” the number on the down face.
(a) Give a probability model for rolling two such dice.
The possible pairs of rolls are given below (think of the first die as
red, the second as blue). All 16
outcomes are equally likely, each with probability 1/16.
| 1,1 | 1,2 | 1,3 | 1,4 |
| 2,1 | 2,2 | 2,3 | 2,4 |
| 3,1 | 3,2 | 3,3 | 3,4 |
| 4,1 | 4,2 | 4,3 | 4,4 |
(b) What is the probability the sum of the down-faces is 5?
4/16
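As a check on Question 7, the 16 equally likely outcomes can be enumerated directly. The sketch below assumes fair dice; the variable names are illustrative only.

```python
from fractions import Fraction
from itertools import product

faces = [1, 2, 3, 4]
# All ordered (red, blue) pairs of down-faces: 4 * 4 = 16 outcomes.
outcomes = list(product(faces, repeat=2))
p_each = Fraction(1, len(outcomes))      # each outcome has probability 1/16

# Outcomes whose down-faces add to 5.
favorable = [pair for pair in outcomes if sum(pair) == 5]
p_sum_5 = p_each * len(favorable)

print(favorable)   # the pairs (1,4), (2,3), (3,2), (4,1)
print(p_sum_5)     # Fraction reduces 4/16 to 1/4
```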
8. A bottling company
uses a filling machine to fill glass bottles with beer. The bottles are supposed to contain 300
ml. In fact, contents vary according to
a normal distribution with mean μ = 298 ml and standard
deviation σ = 3 ml.
a. What is the probability that an
individual bottle contains less than 295 ml?
P(X < 295) = P(Z < (295-298)/3) = P(Z
< -1) = 0.1587
b. What is the probability that the mean
contents of the bottles in a six-pack is less than 295 ml?
P(mean < 295) = P[Z <
(295-298)/(3/sqrt(6))] = 0.0072
c. What important result guarantees the
difference between the previous two probabilities?
Central Limit Theorem
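The two probabilities in Question 8 can be reproduced without a normal table. The sketch below uses Python's standard-library `statistics.NormalDist`, with the mean of 298 ml and standard deviation of 3 ml taken from the worked answers above; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 298, 3, 6     # ml, ml, bottles per six-pack

# Part (a): one bottle, X ~ N(298, 3).
single_bottle = NormalDist(mu, sigma)
p_single = single_bottle.cdf(295)

# Part (b): mean of six bottles, x-bar ~ N(298, 3/sqrt(6)).
six_pack_mean = NormalDist(mu, sigma / sqrt(n))
p_pack = six_pack_mean.cdf(295)

print(round(p_single, 4), round(p_pack, 4))
```

The second probability is much smaller because averaging six bottles shrinks the standard deviation from 3 ml to 3/sqrt(6) ml, about 1.22 ml.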
9. The carapace lengths (in mm) of 15 mature gopher tortoises randomly selected from the preserve in Abacoa are shown below.
320 295 284 303 315 308 303 305
272 315 291 294 276 318 278
a. Examine these data for shape, center, spread,
and outliers.

The distribution is roughly uniform, with center around 300 mm and spread from 272
to 320 mm. There are no serious
outliers.
b. We are making three assumptions in
our use of inference right now. List
those three assumptions and discuss the degree to which each is or is not met
in this situation.
· The sample is an SRS from the population:
we are told the tortoises were randomly selected, so yes, this assumption is
met.
· The distribution of all carapace lengths of mature
tortoises in the population is normal with mean μ and standard
deviation σ:
the sample distribution could have come from a normal population, so this
assumption is probably okay.
· The mean μ is unknown
and the standard deviation σ is known:
this assumption is unreasonable; we
don’t know σ, so we will have to make an assumption about it.
c. Assuming that the standard deviation of
carapace lengths of all mature gopher tortoises in the preserve is σ = 16 mm, give a 95% confidence interval for the mean carapace length
of all mature gopher tortoises in the preserve.
Write a complete sentence interpreting the meaning of your interval.
(Your sentence should say something about tortoises!)
95% CI is x̄ ± z* σ/sqrt(n),
which in our case is 298.47 ± 1.96(16/sqrt(15)) = 298.47 ± 8.10,
or (290.4, 306.6).
We are 95% confident that the mean carapace length of all mature gopher tortoises
in the preserve is between 290 and 307 mm.
d. Estimate the sample size you would need to compute a 95% confidence interval with a margin of error less than 3 mm.
We need 1.96(16/sqrt(n)) < 3.
Solving for n gives n > (1.96 * 16/3)^2 = about 109.3, so we
need at least 110 gopher tortoises in the sample.
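The interval and sample-size calculations in Question 9 can be verified from the raw data. The sketch below assumes σ = 16 mm as stated in part (c) and takes z* from the standard normal distribution instead of rounding it to 1.96; all variable names are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist, mean

# Carapace lengths (mm) of the 15 sampled tortoises.
lengths = [320, 295, 284, 303, 315, 308, 303, 305,
           272, 315, 291, 294, 276, 318, 278]
sigma = 16                               # assumed known population SD (mm)

z_star = NormalDist().inv_cdf(0.975)     # critical value for 95%, about 1.96
x_bar = mean(lengths)
margin = z_star * sigma / sqrt(len(lengths))
ci = (x_bar - margin, x_bar + margin)    # roughly 290 to 307 mm

# Part (d): smallest n with z* * sigma / sqrt(n) below 3 mm.
n_needed = ceil((z_star * sigma / 3) ** 2)

print(round(x_bar, 2), round(margin, 2), n_needed)
```

Rounding n up is deliberate: rounding down would leave the margin of error slightly above 3 mm.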
10. A social psychologist reports: “In our sample, ethnocentrism was significantly higher (P < 0.05) among church attenders than among non-attenders.” Explain what this means in language understandable to someone who knows no statistics. Do not use the word “significance” in your answer.
The difference in
ethnocentrism that was observed between church attenders and non-attenders was
unlikely to occur by chance if the two groups in fact had the same
ethnocentrism.
11. A random number generator is supposed to produce random numbers that are
uniformly distributed on the interval from 0 to 1. If this is true, the numbers generated come from
a population with mean μ = 0.5
and standard deviation σ = 0.2887. Unfortunately,
producing a good random number generator is quite difficult, and it is well
known that many such generators are not particularly random. You decide to test Excel’s random number
generator by generating 100 random numbers between 0 and 1. You want to perform a hypothesis test to
decide if Excel generates truly random numbers by looking at the mean from your
sample.
a. State your hypotheses.
H0: μ = 0.5 (The
mean of all numbers generated by Excel’s random number generator is 0.5)
Ha: μ ≠ 0.5 (The mean of all numbers
generated by Excel’s random number generator is not 0.5)
b. Suppose the mean of the 100 numbers
generated by Excel is x̄ = 0.4365. Calculate the value
of the test statistic. Find the p-value
for the test.
z = (0.4365 - 0.5)/(0.2887/sqrt(100)) = -2.20
p = 2 * P(Z < -2.20) = 2 * 0.0139 = 0.0278 (two-sided, because Ha is μ ≠ 0.5)
c. Is the result significant at the 5%
level? At the 1% level?
Yes;
no.
d. What can you conclude (or not
conclude) based on your test? (Your
answer should say something about random numbers!)
The
data provide good evidence that the mean of all numbers generated by Excel’s
random number generator is different from 0.5.
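The test in Question 11 can be carried out in a few lines. This sketch assumes the population standard deviation 0.2887 (that is, 1/sqrt(12) for a uniform distribution on 0 to 1) is treated as known, and it doubles the lower-tail area because the alternative hypothesis is two-sided. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 0.5          # hypothesized mean under H0
sigma = 0.2887     # SD of a uniform(0, 1) population, 1/sqrt(12)
n = 100            # sample size
x_bar = 0.4365     # observed sample mean

# One-sample z statistic.
z = (x_bar - mu0) / (sigma / sqrt(n))

# Two-sided p-value: area in both tails beyond |z|.
p_two_sided = 2 * NormalDist().cdf(-abs(z))

print(round(z, 2), round(p_two_sided, 4))
print("significant at 5%:", p_two_sided < 0.05)
print("significant at 1%:", p_two_sided < 0.01)
```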
12. True or False
________ The probability of an event can be described as the proportion of times the event occurs in many repeated trials of a random phenomenon.
________ Two events are independent when they cannot occur together.
________ If we
compute two confidence intervals, an 80% confidence interval and a 90%
confidence interval, based on the same sample, the 80% confidence interval will
be narrower.
________ The most important assumption in using
techniques of inference is that our samples are SRSs.
________ Significance tests can tell us if the observed effect was likely due to chance.
True, False, True, True, True
13. Your mail-order company advertises that it ships 90% of its orders
within three working days. You select an
a. Explain why we expect the number of on-time
shipments in an
Each trial results in a success or failure, there are a fixed number of
trials (100), the trials are independent, and the probability of success in
each individual trial is the same (0.9).
These are the four conditions for the binomial setting. The parameters are n = 100 and p = 0.9.
b. If the company really ships 90% of its orders
on time, what is the probability that 86 or fewer in an
P(X <= 86) = P(Z <= (86-90)/sqrt(100*0.9*0.1)) = P(Z <= -1.33) = 0.0918
c. A critic says, “You claim 90%,
but in your sample the on-time percentage is only 86%. So the 90% claim is wrong.” Explain in simple language why your
probability calculation in (b) shows that the result of the sample does not
refute the 90% claim.
If the company really does ship 90% of its orders within three days,
then the probability that 86 or fewer orders from a sample of 100 arrive
within three days is 0.0918. In other
words, about 9% of the time, 86 or fewer of 100 orders would arrive “on-time”
even if the true on-time rate is 90%.
(It can happen by chance alone, and it wouldn’t be terribly unusual,
that only 86 of 100 arrive in three days.)
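The normal approximation used in Question 13 can be compared with the exact binomial probability. The sketch below assumes n = 100 and p = 0.9 as in part (a); the exact sum is an addition for comparison and is not part of the original answer.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 0.9
mean = n * p                        # 90 on-time orders expected
sd = sqrt(n * p * (1 - p))          # 3

# Normal approximation to P(X <= 86), no continuity correction.
p_normal = NormalDist(mean, sd).cdf(86)

# Exact binomial probability P(X <= 86), summing k = 0, 1, ..., 86.
p_exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(87))

print(round(p_normal, 4), round(p_exact, 4))
```

Both values are well above the usual 5% cutoff, which supports the point of part (c): 86 on-time orders out of 100 is not strong evidence against the 90% claim.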
14. Compute the following probabilities, based on a standard deck of 52
cards (no jokers).
a. Draw one card. What is the probability you draw a spade?
1/4
b. Draw one card. What is the probability you draw a jack given
that you draw a face card?
1/3
c. Draw two cards. What is the probability that your second card
is a spade, given that the first card you drew was a spade?
12/51
d. Draw two cards. What is the
probability that your second card is a spade, given that the first card you
drew was a heart?
13/51
e. Draw two cards. What is the
probability you draw two cards in the same suit?
12/51
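The card probabilities in Question 14 can be confirmed by brute-force enumeration. The sketch below builds a 52-card deck and counts outcomes; the helper `cond_prob` and the other names are introduced here for illustration, and jack, queen and king are taken to be the face cards.

```python
from fractions import Fraction
from itertools import permutations

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["spades", "hearts", "diamonds", "clubs"]
deck = [(rank, suit) for rank in ranks for suit in suits]

def cond_prob(outcomes, event, given=lambda o: True):
    """P(event | given) by counting equally likely outcomes."""
    kept = [o for o in outcomes if given(o)]
    return Fraction(sum(1 for o in kept if event(o)), len(kept))

# Ordered two-card draws without replacement: 52 * 51 of them.
two_cards = list(permutations(deck, 2))

p_a = cond_prob(deck, lambda card: card[1] == "spades")
p_b = cond_prob(deck, lambda card: card[0] == "J",
                given=lambda card: card[0] in ("J", "Q", "K"))
p_c = cond_prob(two_cards, lambda draw: draw[1][1] == "spades",
                given=lambda draw: draw[0][1] == "spades")
p_d = cond_prob(two_cards, lambda draw: draw[1][1] == "spades",
                given=lambda draw: draw[0][1] == "hearts")
p_e = cond_prob(two_cards, lambda draw: draw[0][1] == draw[1][1])

print(p_a, p_b, p_c, p_d, p_e)
```

`Fraction` prints results in lowest terms, so 12/51 appears as 4/17.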