Honors Introduction to Statistics
Practice Problems for Test 3
Show your work for full credit. Feel free to check your answers with your calculator, but answers without supporting work will receive little or no credit. Always interpret results in the context of the situation.
1.
Harley-Davidson motorcycles make up 14% of all
motorcycles registered in the
a. If Harleys make up 14% of motorcycles stolen, what would be the sampling distribution of
the proportion of Harleys in a sample of 9224 stolen motorcycles?
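The sample proportion p̂ is approximately N(0.14, 0.003613), since its standard deviation is sqrt(0.14*0.86/9224) = 0.003613. (The normal approximation is reasonable because 9224*0.14 ≈ 1291 and 9224*0.86 ≈ 7933 are both large.)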
b. Is the proportion of Harleys among
stolen bikes significantly higher than their share of all motorcycles? You could use a hypothesis test to answer
this question, but computations are really not necessary. Explain why not.
p̂ = 2490/9224 = 0.2699, which is (0.2699 - 0.14)/0.003613 ≈ 36 standard deviations above the mean of 0.14, so yes, the proportion
of Harleys among stolen bikes is much higher than the proportion of Harleys among
all registered bikes. If we did the test,
the P-value would be TINY.
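The arithmetic behind parts (a) and (b) can be checked with a short sketch. Python is assumed here only for illustration, and the variable names are hypothetical; the counts are the ones used in the answers above.

```python
import math

p = 0.14          # assumed share of Harleys among stolen motorcycles (part a)
n = 9224          # stolen motorcycles in the sample
harleys = 2490    # Harleys among them (part b)

# Standard deviation of the sampling distribution of p-hat
sd = math.sqrt(p * (1 - p) / n)

# How far the observed proportion lies above 0.14, in standard deviations
p_hat = harleys / n
z = (p_hat - p) / sd

print(f"sd = {sd:.6f}, p_hat = {p_hat:.4f}, z = {z:.1f}")
```

A z-score of roughly 36 is far beyond anything a normal table lists, which is why no formal test is needed.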
2.
A college president says, “99% of the alumni support my
firing of Coach Boggs.”
a. Describe the
population and explain in words what the parameter p is.
The population is all alumni of the college.
The parameter p is the
proportion of all alumni of the college who support the firing of Coach Boggs.
b. You contact
an
that estimates p.
p̂ = 152/200 = 0.76
c. Based on the responses of the alumni
you contacted, construct a 99% confidence interval for p. (Your work should exhibit a correct critical
value. Feel free to check your result
using your calculator, but an interval with no work will receive no credit.)
0.76 ± 2.576*sqrt(0.76*0.24/200) = 0.76 ± 0.078, or (0.682, 0.838)
d. Explain the meaning of your
computation in Part (c). How does your
response relate to the president’s assertion?
We are 99% confident that between 68.2% and 83.8% of all alumni support the firing of Coach
Boggs. Since 99% lies well above this entire interval, there is very good reason to
believe that the president seriously overestimated the level of support he has
from alumni.
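A minimal Python sketch of the interval in part (c). It assumes the standard-library `statistics.NormalDist` to recover the 99% critical value instead of reading 2.576 from a table; the variable names are illustrative.

```python
import math
from statistics import NormalDist

n = 200           # alumni contacted
supporters = 152  # alumni who support the firing

p_hat = supporters / n
z_star = NormalDist().inv_cdf(0.995)         # 99% critical value, about 2.576
se = math.sqrt(p_hat * (1 - p_hat) / n)      # standard error of p-hat
low, high = p_hat - z_star * se, p_hat + z_star * se

print(f"z* = {z_star:.3f}, 99% CI = ({low:.3f}, {high:.3f})")
```

The upper endpoint comes out near 0.838, well below the claimed 0.99.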
3.
When we toss a coin and call heads or tails to make a
decision, we are generally assuming that coins are “fair,” that is, that there
are equal chances that a flipped coin will turn up heads or tails. What if, instead of flipping pennies, we tip
them? (Carefully set a penny on its edge
on a table or other sturdy but movable surface, then jar the table to make it
fall over.) Your friend claims that
pennies are more likely to turn up heads than tails when they are tipped, and
you decide to test her claim by performing a hypothesis test.
a. State your
hypotheses, both in symbols and in words.
H0: p = 0.5 The
proportion of all tipped pennies that come up heads is 0.5.
Ha: p > 0.5 The
proportion of all tipped pennies that come up heads is greater than 0.5.
b. Suppose you randomly choose 50
pennies, set them on their edges, and tip them.
Of the 50 pennies, 32 come up heads.
Decide if this is reason to believe your friend’s claim. (Compute the test statistic and the P-value,
and then clearly state your conclusion.)
p̂ = 32/50 = 0.64
Test statistic: z = (0.64 - 0.50)/sqrt(0.5*0.5/50) = 1.98
P-value: P(Z > 1.98) = 0.0239
There is good evidence in support of my friend’s claim. The data suggest that tipped pennies are more
likely to turn up heads than tails.
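The one-sided z test in part (b) can be sketched in Python as follows (an illustration only; it assumes `statistics.NormalDist` for the normal tail area in place of a table).

```python
import math
from statistics import NormalDist

n, heads = 50, 32
p0 = 0.5                      # value of p under H0

p_hat = heads / n
# The test uses p0 (not p_hat) in the standard deviation, since H0 is assumed true
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
p_value = 1 - NormalDist().cdf(z)   # one-sided, because Ha is p > 0.5

print(f"z = {z:.2f}, P-value = {p_value:.4f}")
```

Using p0 rather than p̂ inside the square root is what distinguishes the test from the confidence interval.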
4. The carapace lengths (in mm) of 15 mature gopher tortoises randomly selected from the preserve in Abacoa are shown below.
320 295 284 303 315 308 303 305
272 315 291 294 276 318 278

a. Examine these data
for shape, center, spread, and outliers.
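The shape is roughly uniform, with a center (median) of
303 mm, and the lengths range from 272 to 320 mm (a range of 48 mm).
With Q1 = 284 and Q3 = 315, the 1.5 × IQR fences are 237.5 and 361.5 mm, so there are no
outliers.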
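The five-number summary and the 1.5 × IQR outlier check can be sketched in Python. This assumes the "median of each half" quartile method (median excluded when n is odd) that many introductory texts and graphing calculators use; other software may report slightly different quartiles.

```python
from statistics import median

lengths = [320, 295, 284, 303, 315, 308, 303, 305,
           272, 315, 291, 294, 276, 318, 278]

data = sorted(lengths)
n = len(data)
lower = data[: n // 2]          # values below the median (median excluded for odd n)
upper = data[(n + 1) // 2 :]    # values above the median
q1, med, q3 = median(lower), median(data), median(upper)

iqr = q3 - q1
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in data if x < low_fence or x > high_fence]

print(min(data), q1, med, q3, max(data), outliers)
```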
b. Do you
believe the use of our inference techniques is justified in this
situation? Explain your answer.
Yes, the sample was an
c. Give a 95%
confidence interval for the mean carapace length of all mature gopher tortoises
in the preserve. Write a complete
sentence interpreting the meaning of your interval. (Your sentence should say
something about tortoises!).
x̄ = 298.467, s = 15.7837, t* = 2.145 (df = 14): 298.467 ± 2.145*15.7837/sqrt(15) = 298.467 ± 8.74, or (289.73, 307.21)
We are 95% confident that the mean carapace length of all mature gopher tortoises in
the preserve is between 289.7 and 307.2 mm.
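A Python sketch of the t interval in part (c). The critical value t* = 2.145 (df = 14) is taken from a t table as in the answer above; only the standard library is assumed.

```python
from math import sqrt
from statistics import mean, stdev

lengths = [320, 295, 284, 303, 315, 308, 303, 305,
           272, 315, 291, 294, 276, 318, 278]

t_star = 2.145                 # 95% confidence, df = 14, from a t table
x_bar = mean(lengths)          # sample mean
s = stdev(lengths)             # sample standard deviation (divides by n - 1)
margin = t_star * s / sqrt(len(lengths))
low, high = x_bar - margin, x_bar + margin

print(f"x-bar = {x_bar:.3f}, s = {s:.4f}, CI = ({low:.2f}, {high:.2f})")
```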
d. Estimate the
sample size you would need to compute a 95% confidence interval with a
margin of error less than 3 mm. Why
can’t you give an exact answer?
Need 2.145*15.7837/sqrt(n) = 3. Solving gives n = (2.145*15.7837/3)^2 ≈ 127.4; rounding up, n should
be at least 128 tortoises. We can’t get
an exact answer because we don’t know what the standard deviation of the new sample will be, so we have to
estimate it with the one we have (s = 15.7837). We
also don’t know the correct t* critical value, because it depends on the new sample size; we have used the one for df = 14.
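The sample-size calculation can be sketched in Python. As in the answer, it assumes the future standard deviation will be close to s = 15.7837. The second bound, which swaps in the normal critical value 1.96, is an added illustration: because every t* exceeds 1.96, the required n cannot be smaller than that figure, so the true requirement lies between the two numbers printed.

```python
import math

s = 15.7837          # SD of the 15 tortoises, standing in for the unknown future SD
t_star = 2.145       # t* for df = 14; the t* for a larger sample would be smaller
z_star = 1.96        # normal critical value, smaller than any t*
target = 3.0         # desired margin of error in mm

# Solve t* * s / sqrt(n) = target for n and round up (conservative estimate)
n_conservative = math.ceil((t_star * s / target) ** 2)
# Using z* gives a floor: with this s, no smaller n can reach the target margin
n_floor = math.ceil((z_star * s / target) ** 2)

print(n_floor, n_conservative)
```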
5. A study of computer-assisted learning examined the learning of “Blissymbols” by children. The researcher designed two computer lessons that taught the same content: one in which students interacted with the material, and one in which students controlled the pace of the lesson but otherwise did not interact with the program. After the lesson, the computer presented a quiz that asked the children to identify 56 Blissymbols. Here are the numbers of correct identifications by the 24 children in the Active group:
29 28 24 31 15 24 27 23 20 22 23 21
24 35 21 24 44 28 17 21 21 20 28 16
And here are the counts for the 24 children in the Passive group:
16 14 17 15 26 17 12 25 21 20 18 21
20 16 18 15 26 15 13 17 21 19 15 12
a. Is there good
evidence that active learning is superior to passive learning? State your hypotheses, give a test statistic
and P-value, and clearly state your conclusion in the context of student
learning.
H0: μa = μp   The mean number of correct identifications is the same for active and passive learners.
Ha: μa > μp   The mean number of correct identifications is greater for active learners than for passive learners.
t = (24.41667 - 17.875)/sqrt(6.31022^2/24 + 4.0252^2/24) = 4.28, df = 23 (conservative: the smaller sample size minus 1)
P-value < 0.0005
There is very strong evidence that the average score for all active learners is
greater than the average score for all passive learners.
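A Python sketch of the two-sample t statistic from the raw scores (standard library only). Note that each sample standard deviation is squared before dividing by its sample size, and that the degrees of freedom use the conservative rule from the answer rather than the software (Welch) approximation.

```python
from math import sqrt
from statistics import mean, stdev

active = [29, 28, 24, 31, 15, 24, 27, 23, 20, 22, 23, 21,
          24, 35, 21, 24, 44, 28, 17, 21, 21, 20, 28, 16]
passive = [16, 14, 17, 15, 26, 17, 12, 25, 21, 20, 18, 21,
           20, 16, 18, 15, 26, 15, 13, 17, 21, 19, 15, 12]

# Unpooled standard error of the difference in sample means
se = sqrt(stdev(active) ** 2 / len(active) + stdev(passive) ** 2 / len(passive))
t = (mean(active) - mean(passive)) / se

# Conservative degrees of freedom: smaller sample size minus 1
df = min(len(active), len(passive)) - 1

print(f"t = {t:.2f}, df = {df}")
```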
b. Give a 90%
confidence interval for the difference in the mean number of Blissymbols identified correctly by the active learning
group and the passive learning group.
Interpret your result.
With t* = 1.714 (90% confidence, df = 23):
24.41667 - 17.875 ± 1.714*sqrt(6.31022^2/24 + 4.0252^2/24), or
(3.92, 9.16)
We are 90% confident that the average score for all active learners is 3.9 to
9.2 points higher than the average score for all passive learners.
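The 90% interval in part (b) can be checked from the summary statistics alone. This Python sketch reuses the means, standard deviations, and t* = 1.714 (df = 23) quoted in the answer.

```python
from math import sqrt

mean_a, sd_a, n_a = 24.41667, 6.31022, 24   # active group
mean_p, sd_p, n_p = 17.875, 4.0252, 24      # passive group
t_star = 1.714                              # 90% confidence, df = 23, from a t table

diff = mean_a - mean_p
se = sqrt(sd_a ** 2 / n_a + sd_p ** 2 / n_p)
low, high = diff - t_star * se, diff + t_star * se

print(f"90% CI for mu_a - mu_p: ({low:.2f}, {high:.2f})")
```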
c. What assumptions
do your procedures from (a) and (b) require?
Do the data meet these assumptions?
Justify your answer.
We need SRSs from two populations and, if the distributions are skewed, a combined sample size n1 + n2 of at least 40. Although we don’t have SRSs,
we have random assignment into experimental groups, which is a reasonable substitute. Each group has 24 children, so n1 + n2 = 48 ≥ 40 and we’re fine. There is some skewness
in the distributions (a
back-to-back stemplot with split stems is handy here), but it’s not bad. So, yes, we should be okay to use the
t procedures.
6. Twelve
runners are asked to run a 10-kilometer race on each of two consecutive
weeks. In one of the races the runners
wear one brand of shoe and in the other a second brand. The brand they wear in each race is
determined at random. All runners are
timed and are asked to run their best in each race. The results (in minutes) are given below.
Runner | Brand 1 | Brand 2 | Difference (Brand 1 - Brand 2)
1 | 31.23 | 32.02 | -0.79
2 | 29.33 | 28.98 | 0.35
3 | 30.50 | 30.63 | -0.13
4 | 32.20 | 32.67 | -0.47
5 | 33.08 | 32.95 | 0.13
6 | 31.52 | 31.53 | -0.01
7 | 30.68 | 30.83 | -0.15
8 | 31.05 | 31.10 | -0.05
9 | 33.00 | 33.12 | -0.12
10 | 29.67 | 29.50 | 0.17
11 | 30.55 | 30.57 | -0.02
12 | 32.12 | 32.20 | -0.08
Use the appropriate procedure to determine if there is evidence that the brand of the shoe affects runners’ times. State your hypotheses, compute the test statistic, give the P-value (or an estimate of it), and interpret your result.
This is a matched pairs design and the differences in times are computed
above. We apply the one sample t-test to
the differences.
H0: μ = 0   The mean difference in times (brand 1 minus brand 2) is 0.
Ha: μ ≠ 0   The mean difference in times is not 0.
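Test statistic: t = -0.0975/(0.2958/sqrt(12)) = -1.1418, df = 11
Since 1.088 < |t| < 1.363 (the df = 11 critical values for one-tail areas 0.15 and 0.10), the two-sided P-value satisfies 0.20 < P < 0.30.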
There is no real evidence that shoe brand matters. The mean difference in times (brand 1 minus
brand 2) could plausibly be 0.
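The matched-pairs computation can be sketched in Python: form the differences, then run a one-sample t statistic on them. The bracketing values 1.088 and 1.363 are the df = 11 entries of a standard t table, quoted here as an assumption rather than computed.

```python
from math import sqrt
from statistics import mean, stdev

brand1 = [31.23, 29.33, 30.50, 32.20, 33.08, 31.52,
          30.68, 31.05, 33.00, 29.67, 30.55, 32.12]
brand2 = [32.02, 28.98, 30.63, 32.67, 32.95, 31.53,
          30.83, 31.10, 33.12, 29.50, 30.57, 32.20]

# Matched pairs: analyze the per-runner differences as one sample
diffs = [b1 - b2 for b1, b2 in zip(brand1, brand2)]
d_bar = mean(diffs)
s_d = stdev(diffs)
t = d_bar / (s_d / sqrt(len(diffs)))
df = len(diffs) - 1

# df = 11 table values for one-tail areas 0.15 and 0.10
between = 1.088 < abs(t) < 1.363   # so the two-sided P-value is between 0.20 and 0.30
print(f"d-bar = {d_bar:.4f}, s_d = {s_d:.4f}, t = {t:.4f}, df = {df}, bracketed: {between}")
```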
7.
The Physicians’ Health Study examined the effects of
taking an aspirin every other day.
Earlier studies suggested that aspirin might reduce the risk of heart
attacks. The subjects were 22,071 healthy
male physicians at least 40 years old.
The study assigned 11,037 of the subjects at random to take
aspirin. The others took a placebo. The study was double-blind. The researchers found that 119 participants
in the Aspirin group had strokes, while 98 of those in the Placebo group had
strokes. Is this difference significant? Conduct the appropriate test, be sure the
technical conditions for the test have been satisfied, and state your
conclusion.
H0: pa = pp
The
proportion of aspirin takers who have strokes is equal to the proportion of
placebo takers
who have strokes
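Ha: pa ≠ pp   The proportion of aspirin takers who have strokes differs from the proportion of placebo takers who have strokes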
p̂a = 119/11037 ≈ 0.01078
p̂p = 98/11034 ≈ 0.00888
p̂ (pooled) = 217/22071 ≈ 0.00983
Since the subjects were randomly assigned, we are willing to treat the two groups as independent SRSs. The sample
sizes are plenty big to apply the two-sample procedures for proportions: the smallest count of successes or failures in either group is 98.
test statistic:
z = (119/11037 - 98/11034) / sqrt(217/22071*(1-217/22071)*(1/11037+1/11034))
= 1.43
P-value (two-sided) ≈ 0.153
No. The data do not provide significant evidence that taking aspirin affects
the incidence of strokes.
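A Python sketch of the pooled two-proportion z test above, assuming `statistics.NormalDist` for the two-sided tail area. Computed at full precision, the P-value comes out near 0.153.

```python
from math import sqrt
from statistics import NormalDist

x_a, n_a = 119, 11037   # strokes, aspirin group
x_p, n_p = 98, 11034    # strokes, placebo group

p_a, p_p = x_a / n_a, x_p / n_p
p_pool = (x_a + x_p) / (n_a + n_p)     # pooled proportion, since H0 says pa = pp

se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_p))
z = (p_a - p_p) / se
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided, because Ha is pa != pp

print(f"z = {z:.2f}, P-value = {p_value:.4f}")
```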
8.
How do we estimate the standard deviation of the
sampling distribution when computing confidence intervals for the difference in
proportions? When conducting
significance tests to compare proportions from two populations? Explain why we use different things, and how
the two are related.
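For CIs, we use the standard error formula: SE = sqrt( p̂1(1 - p̂1)/n1 + p̂2(1 - p̂2)/n2 )
For HTs, we use the standard error formula: SE = sqrt( p̂(1 - p̂)(1/n1 + 1/n2) ), where p̂ is the pooled sample proportion.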
For CIs, we make no assumptions about the relationship
between the two population proportions, so the standard error estimate just
replaces the unknown population proportions that appear in the standard
deviation formula (for the sample proportion) with their corresponding sample
proportions. For HTs,
we are assuming the two population proportions are the same, so we replace both
population proportions by the pooled sample proportion (total number of
successes divided by the total number of observations). The formulas are the same except for this
substitution, though they look slightly different because a common term has
been factored out in the HT version.
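The relationship between the two formulas can be seen in a small Python sketch. The function names `se_ci` and `se_test` are hypothetical, the first call reuses the aspirin counts from Problem 7, and the second pair of counts (30 of 100, 60 of 200) is made up purely to show that the two formulas agree exactly when the sample proportions are equal.

```python
from math import sqrt

def se_ci(x1, n1, x2, n2):
    """Unpooled SE used for a confidence interval for p1 - p2."""
    p1, p2 = x1 / n1, x2 / n2
    return sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

def se_test(x1, n1, x2, n2):
    """Pooled SE used for testing H0: p1 = p2."""
    p = (x1 + x2) / (n1 + n2)          # total successes / total observations
    return sqrt(p * (1 - p) * (1 / n1 + 1 / n2))

# Aspirin study counts: the two SEs differ only slightly
print(se_ci(119, 11037, 98, 11034), se_test(119, 11037, 98, 11034))

# Hypothetical counts with equal sample proportions (0.3 and 0.3): pooling changes nothing
print(se_ci(30, 100, 60, 200), se_test(30, 100, 60, 200))
```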
1. We might be interested in the number of final exams that are canceled (including ones given as a take-home or other alternate form). Is the frequency of departures from an "in-class" final related to the subject area? Suppose that 45 courses are randomly selected and the type of final exam in each is classified to give the two-way table below.
Subject area | In-class | Other
Humanities | 6 | 11
Social Sciences | 9 | 6
Natural Sci/Math | 12 | 1
a. What sort of test
would you perform to answer the question "Is the frequency of departures
from an 'in-class' final related to the subject area?" State your hypotheses.
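A chi-square test for the two-way table (here, testing whether the proportion of “other” finals is the same in all three subject areas).
H0: ph = ps = pn   The proportion of “other” finals is the same for all three subject areas
Ha: Not all three proportions are the same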
b. Given that the value of the test statistic is 9.98, test to determine if there is any relationship between the subject area of the course and the type of final given. Estimate a P-value and state your conclusions in a complete sentence (say something about finals and subject areas).
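df = (3 - 1)(2 - 1) = 2. The test statistic 9.98 falls between the df = 2 critical values 9.21 (upper-tail area 0.01) and 10.60 (area 0.005), so 0.005 < P-value < 0.010.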
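There is strong evidence that the type of final given is related to subject area; for example, 12 of 13 natural science/math courses gave in-class finals, compared with only 6 of 17 humanities courses.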
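The chi-square statistic of 9.98 can be reproduced from the table with a short Python sketch. One assumption worth flagging: the P-value line relies on the fact that a chi-square distribution with 2 degrees of freedom is exponential with mean 2, so its upper-tail area is exactly e^(-x/2). That shortcut holds for df = 2 only.

```python
from math import exp

observed = {
    "Humanities": (6, 11),         # (in-class, other)
    "Social Sciences": (9, 6),
    "Natural Sci/Math": (12, 1),
}
rows = list(observed.values())
row_totals = [sum(r) for r in rows]
col_totals = [sum(col) for col in zip(*rows)]
total = sum(row_totals)

chi2 = 0.0
min_expected = float("inf")
for r, row in enumerate(rows):
    for c, obs in enumerate(row):
        expected = row_totals[r] * col_totals[c] / total   # (row total)(column total)/n
        min_expected = min(min_expected, expected)
        chi2 += (obs - expected) ** 2 / expected

df = (len(rows) - 1) * (len(col_totals) - 1)
p_value = exp(-chi2 / 2)   # exact upper-tail area, valid only because df = 2

print(f"chi2 = {chi2:.2f}, df = {df}, P = {p_value:.4f}, smallest expected = {min_expected}")
```

The smallest expected count is 5.2, so every expected count is at least 5 and the chi-square approximation is reasonable.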