Honors Introduction to Statistics

Practice Questions for Exam 2

 

1.  (12 points) The Admissions Office has developed a new 10-minute video to send to prospective students to extol the virtues of attending City U.  Before mass-producing the video, they would like to test whether it is more effective than the current video.  Suppose that we have 12 high school student volunteers who have agreed to take part in an experiment.  The factor to be studied is the video, with two levels, OLD and NEW.

 

(a) Carefully describe an example of a statistical experiment that could be applied to this situation.  Give explicit instructions on what the 12 students should do and be sure to indicate how randomization is used as part of your experiment.

 

Randomly assign the 12 participants to two groups of 6 each.  Have one group watch the old video and then the new one, and have the second group watch the videos in the reverse order.  To keep participants blind, don’t tell them which video is which.  After both videos, ask participants which one they believe provides a more convincing argument for attending City U.
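The random assignment described above can be sketched in a few lines of Python.  This is only an illustration: the student labels, the group names, and the seed are hypothetical placeholders, not part of the original answer.

```python
import random

def assign_orders(students, seed=None):
    """Randomly split the volunteers into two equal-sized viewing-order groups."""
    rng = random.Random(seed)      # seed is optional; it only makes the shuffle repeatable
    shuffled = list(students)      # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        "old_then_new": shuffled[:half],   # watch OLD first, then NEW
        "new_then_old": shuffled[half:],   # watch NEW first, then OLD
    }

# Hypothetical labels for the 12 volunteers.
students = [f"student_{i}" for i in range(1, 13)]
groups = assign_orders(students, seed=2024)
for order, members in groups.items():
    print(order, members)
```

Shuffling the whole list and cutting it in half gives every student the same chance of landing in either viewing order, which is the point of the randomization.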

 

(b) What specific question would you ask to measure a response variable in this experiment?

 

Which video provides a more convincing argument for attending City U, the first one you saw, or the second?

 

(c) Would you classify your response variable as categorical or quantitative?

 

Categorical

 

(d) Would you classify the experiment you have described as a randomized comparative experiment, a matched pairs design, or something else?  Explain briefly.

 

It’s a matched pairs design, where each participant receives both “treatments.”

 

2.  The age distribution of students at City U is modeled by the distribution shown to the right.

 

(a)  Approximate the median student age on the graph based on the distribution.  Explain how you made your approximation.

 

About 27 years old.  About half the area under the density curve is to the left of age 27, about half is to the right.

 

(b)  Do you expect the mean student age to be higher or lower than the median?   Explain briefly.  Approximate the mean student age, based on the distribution.

 

The mean will be greater than the median because the distribution is skewed to the right (high outliers will pull the mean up).

 

(c)  If we took random samples of size 5 from the student population, computed the average age within the sample, and looked at the distribution of these averages, would you expect the mean for the new distribution to be larger than, smaller than, or the same as, the mean you estimated in Part (b)?  Explain briefly.

 

The same.  The mean of the sampling distribution of sample averages equals the population mean, whatever the sample size.

 

(d)  If we took random samples of size 5 from the student population, computed the average age within the sample, and looked at the distribution of these averages, would you expect the standard deviation for the new distribution to be larger than, smaller than, or the same as, the standard deviation of the original distribution shown above?  Explain briefly.

 

Smaller.  The standard deviation of sample averages is the population standard deviation divided by the square root of the sample size.
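For reference, the two facts used in parts (c) and (d) can be written compactly.  The symbols μ and σ for the population mean and standard deviation are notation assumed here, since the answers above give no numerical value for either; n = 5 is the sample size from the question.

```latex
\mu_{\bar{x}} = \mu,
\qquad
\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{\sigma}{\sqrt{5}} \approx 0.447\,\sigma
```

So averages of 5 students vary a bit less than half as much as individual ages do.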

 

3.  Despite the difficulties, it is sometimes possible to build a strong case for causation in the absence of experiments.  The evidence that smoking causes lung cancer is about as strong as nonexperimental evidence can be.  What criteria are necessary to suggest causation when we cannot do an experiment? 

 

The association is strong; the association is consistent across many studies of different sorts of people in different environments; higher doses are associated with stronger responses (more smoking is associated with higher cancer rates); the alleged cause precedes the effect in time (smoking precedes the cancer); and the alleged cause is plausible.

 

4.  You are interested in determining the level of student support for student government activities.  Create a question that is clearly biased, and one that is (to the extent possible) not biased.  Briefly explain how you expect responses to the two questions to differ.

 

Biased toward SGA:  Our student government sponsors many great events on campus, and all students are invited to those events.  Given the importance of community on our campus, do you support student government activities?

 

Biased against SGA:  Our student government gets a lot of money from student fees, and squanders that money buying food for campus events, and paying large sums for bands and carnival-type entertainment.  Given the state budget crisis, do you support student government activities?

 

Unbiased:  Do you generally support or not support student government activities?

 

5. A study of education followed a large group of fifth-grade children to see how many years of school they eventually completed.  Let X be the highest year of school that a randomly chosen fifth grader completes.  (Students who go on to college are included in the outcome X = 12.)  The study found the following probability distribution for X.

Years         4      5      6      7      8      9      10     11     12
Probability   0.010  0.007  0.007  0.013  0.032  0.068  0.070  0.041  0.752

 

(a)  Carefully explain how you know this is a legitimate probability distribution.

 

All of the individual probabilities are between 0 and 1, and the sum of the probabilities is exactly 1.


(b)  What percent of fifth graders eventually finished 12th grade?

 

75.2%

 

(c)  Explain what P(X = 4) = 0.010 means in terms of children completing school.

 

The probability that a randomly chosen 5th grade student will not complete the 5th grade (completes only grade 4 and drops out before the end of 5th grade) is approximately 0.010.

 

(d)  Find P(X ≥ 6).

 

P(X ≥ 6) = 1 − (0.010 + 0.007) = 0.983

 

(e)  Find the probability that a randomly chosen 5th grader finishes 12th grade, given that the student finished 9th grade.

 

0.752/(0.068+0.070+0.041+0.752) = 0.752/0.931 = about 0.81
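The calculations in Problem 5 can be checked with a short Python sketch.  The dictionary name `pmf` and the helper `prob` are illustrative choices, and part (d) is read here as "at least 6 years"; the probabilities themselves come from the table in the problem.

```python
# Probability distribution for X, the highest year of school completed.
pmf = {4: 0.010, 5: 0.007, 6: 0.007, 7: 0.013, 8: 0.032,
       9: 0.068, 10: 0.070, 11: 0.041, 12: 0.752}

def prob(condition):
    """Add up the probabilities of all outcomes x that satisfy the condition."""
    return sum(p for x, p in pmf.items() if condition(x))

# (a) Legitimacy: every probability lies in [0, 1] and they sum to 1.
all_valid = all(0 <= p <= 1 for p in pmf.values())
total = prob(lambda x: True)

# (d) P(X >= 6), reading the question as "at least 6 years".
p_at_least_6 = prob(lambda x: x >= 6)

# (e) Conditional probability P(X = 12 | X >= 9) = P(X = 12) / P(X >= 9).
p_12_given_9 = pmf[12] / prob(lambda x: x >= 9)

print(all_valid, round(total, 3), round(p_at_least_6, 3), round(p_12_given_9, 2))
```

Part (e) works because everyone who finishes 12th grade has also finished 9th grade, so the joint event is simply X = 12.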

 

6.  Generate two random numbers between 0 and 1 and take Y to be their sum.  The sum Y can take any value between 0 and 2.  The density curve looks like a triangle with base from 0 to 2 and height 1.

(a) Sketch a graph of the density curve.

 


(b) Verify by geometry that the area under the curve is 1.

 

Area of a triangle = ½ × base × height = ½ × 2 × 1 = 1, which is the total area required of any density curve.

 

(c) What is the probability that Y is less than 1?  (Shade the area that represents the probability on your density curve, then find that area.)

 

(d) What is the probability that Y is less than 0.5?  (Again, shade the corresponding area on your density curve.)
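Parts (c) and (d) ask for areas under the triangular density, which can be found by geometry and cross-checked by simulation.  The sketch below is one way to do that; the function names `triangle_cdf` and `simulate`, the trial count, and the seed are all illustrative assumptions.

```python
import random

def triangle_cdf(y):
    """Area under the triangular density (base 0 to 2, peak height 1 at y = 1) left of y."""
    if y <= 0:
        return 0.0
    if y <= 1:
        # Left of the peak the region is a right triangle with base y and height y.
        return 0.5 * y * y
    if y < 2:
        # Right of the peak: total area 1 minus the small triangle remaining on the right.
        return 1.0 - 0.5 * (2 - y) ** 2
    return 1.0

def simulate(threshold, trials=100_000, seed=0):
    """Estimate P(Y < threshold) by adding two random numbers between 0 and 1 many times."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if rng.random() + rng.random() < threshold)
    return hits / trials

print(triangle_cdf(1.0), simulate(1.0))   # geometry vs. simulation for part (c)
print(triangle_cdf(0.5), simulate(0.5))   # geometry vs. simulation for part (d)
```

By geometry, the shaded region in (c) is half the triangle (area 0.5), and in (d) it is a small triangle with base 0.5 and height 0.5 (area 0.125); the simulated proportions should land close to those values.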

 

 

7.  Tetrahedral dice are shaped like pyramids, with 4 triangular faces, each of which is an equilateral triangle (all sides have the same length).  Assume each die has sides labeled 1, 2, 3 and 4.  When you roll a tetrahedral die, you “roll” the number on the down face.

(a) Give a probability model for rolling two such dice.

 

The possible pairs of rolls are given below (think of the first die as red, the second as blue).  All 16 outcomes are equally likely, each with probability 1/16.

 

1,1   1,2   1,3   1,4
2,1   2,2   2,3   2,4
3,1   3,2   3,3   3,4
4,1   4,2   4,3   4,4

 

(b) What is the probability the sum of the down-faces is 5?

 

4/16
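The count behind this answer can be confirmed by listing all 16 equally likely outcomes.  The following is a minimal sketch; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All ordered (red, blue) pairs for two tetrahedral dice with faces 1 to 4.
outcomes = list(product(range(1, 5), repeat=2))

# Outcomes whose down-faces add up to 5.
sum_is_5 = [pair for pair in outcomes if sum(pair) == 5]

# Each outcome has probability 1/16, so the probability is (favorable count) / 16.
p_sum_5 = Fraction(len(sum_is_5), len(outcomes))

print(sum_is_5)   # the favorable pairs
print(p_sum_5)    # Fraction reduces 4/16 to lowest terms
```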

8.  A bottling company uses a filling machine to fill glass bottles with beer.  The bottles are supposed to contain 300 ml.  In fact, contents vary according to a normal distribution with mean μ = 298 ml and standard deviation σ = 3 ml.

a.  What is the probability that an individual bottle contains less than 295 ml?

 

            P(X < 295) = P(Z < (295-298)/3) = P(Z < -1) = 0.1587


b.  What is the probability that the mean contents of the bottles in a six-pack is less than 295 ml?

 

            P(mean < 295) = P[Z < (295-298)/(3/sqrt(6))] = 0.0072
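Both probabilities can be reproduced without a normal table using Python's standard library.  The sketch below takes the mean of 298 ml and standard deviation of 3 ml from the worked answers above; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 298, 3   # values used in the worked answers above
n = 6                # bottles in a six-pack

# (a) One bottle: X is normal with mean mu and standard deviation sigma.
p_single = NormalDist(mu, sigma).cdf(295)

# (b) Mean of six bottles: same mean, but standard deviation sigma / sqrt(n).
p_six_pack = NormalDist(mu, sigma / sqrt(n)).cdf(295)

print(round(p_single, 4), round(p_six_pack, 4))
```

Because the z-score is not rounded to two decimals first, the results can differ from a table lookup in the last digit.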


c.  What important result guarantees the difference between the previous two probabilities?

 

            Central Limit Theorem

9.  The carapace lengths (in mm) of 15 mature gopher tortoises randomly selected from the preserve in Abacoa are shown below.

 

320      295      284      303      315      308      303      305     

272      315      291      294      276      318      278

         

a.  Examine these data for shape, center, spread, and outliers. 

 

The distribution is somewhat uniform, with center around 300 mm and spread from 272 to 320 mm.  There are no serious outliers.


b.  We are making three assumptions in our use of inference right now.  List those three assumptions and discuss the degree to which each is or is not met in this situation.

 

·        SRS 
we are told the tortoises were randomly selected, so yes, this assumption is met

·        The distribution of all carapace lengths of mature tortoises in the population is normal with mean μ and standard deviation σ
the sample distribution could have come from a normal population, so this assumption is probably okay

·        the mean μ is unknown and the standard deviation σ is known
this assumption is unreasonable: we don’t know σ, so we will have to assume a value for it

 

c.  Assuming that the standard deviation of carapace lengths of all mature gopher tortoises in the preserve is σ = 16 mm, give a 95% confidence interval for the mean carapace length of all mature gopher tortoises in the preserve.  Write a complete sentence interpreting the meaning of your interval. (Your sentence should say something about tortoises!)

 

95% CI is x̄ ± z*·σ/√n, which in our case is 298.47 ± 1.96(16/√15) = 298.47 ± 8.10, or (290.4, 306.6).
We are 95% confident that the mean carapace length of all mature gopher tortoises in the preserve is between 290 and 307 mm.

 

d.  Estimate the sample size you would need to compute a 95% confidence interval with a margin of error less than 3 mm. 

 

We need 1.96(16/√n) ≤ 3.  Solving for n gives n ≥ (1.96 × 16/3)² ≈ 109.3, so we need at least 110 gopher tortoises in the sample.
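The interval in part c and the sample size in part d can be recomputed from the 15 measurements listed in the problem.  This is a sketch under the same assumption the problem makes (a known population standard deviation of 16 mm); the variable names are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist, mean

# Carapace lengths (mm) of the 15 sampled tortoises, copied from the problem.
lengths = [320, 295, 284, 303, 315, 308, 303, 305,
           272, 315, 291, 294, 276, 318, 278]
sigma = 16                               # assumed known, as stated in part c

z_star = NormalDist().inv_cdf(0.975)     # critical value for 95% confidence, about 1.96
x_bar = mean(lengths)
margin = z_star * sigma / sqrt(len(lengths))
ci = (x_bar - margin, x_bar + margin)

# Part d: smallest n whose margin of error z* sigma / sqrt(n) is at most 3 mm.
target_margin = 3
n_needed = ceil((z_star * sigma / target_margin) ** 2)

print(round(x_bar, 2), tuple(round(v, 1) for v in ci), n_needed)
```

Rounding n up rather than to the nearest integer matters here: a sample of 109 would leave the margin of error slightly above 3 mm.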

 

10.  A social psychologist reports:  “In our sample, ethnocentrism was significantly higher (P < 0.05) among church attenders than among non-attenders.”  Explain what this means in language understandable to someone who knows no statistics.  Do not use the word “significance” in your answer.

 

The difference in ethnocentrism that was observed between church attenders and non-attenders was unlikely to occur by chance if the two groups in fact had the same ethnocentrism.

 

11.  A random number generator is supposed to produce random numbers that are uniformly distributed on the interval from 0 to 1.  If this is true, the numbers generated come from a population with mean μ = 0.5 and standard deviation σ = 0.2887.  Unfortunately, producing a good random number generator is quite difficult, and it is well known that many such generators are not particularly random.  You decide to test Excel’s random number generator by generating 100 random numbers between 0 and 1.  You want to perform a hypothesis test to decide if Excel generates truly random numbers by looking at the mean from your sample.

a.  State your hypotheses.

 

H0:  μ = 0.5     (The mean of all numbers generated by Excel’s random number generator is 0.5)

Ha: μ ≠ 0.5      (The mean of all numbers generated by Excel’s random number generator is not 0.5)


b.  Suppose the mean of the 100 numbers generated by Excel is x̄ = 0.4365.  Calculate the value of the test statistic.  Find the p-value for the test.

z = (0.4365 – 0.5)/(0.2887/sqrt(100)) = -2.20

p = 2 P(Z ≤ -2.20) = 2(0.0139) = 0.0278  (doubled because the alternative is two-sided)
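The test statistic and a two-sided p-value (matching the "not equal to 0.5" alternative) can be computed directly.  The sketch below uses the sample mean 0.4365 and standard deviation 0.2887 that appear in the calculation above; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

x_bar = 0.4365    # sample mean of the 100 generated numbers
mu_0 = 0.5        # mean under the null hypothesis
sigma = 0.2887    # standard deviation of a uniform(0, 1) population, 1/sqrt(12)
n = 100

# One-sample z statistic: how many standard errors x_bar lies from mu_0.
z = (x_bar - mu_0) / (sigma / sqrt(n))

# Two-sided p-value: probability of a z at least this extreme in either direction.
p_two_sided = 2 * NormalDist().cdf(-abs(z))

print(round(z, 2), round(p_two_sided, 4))
print("significant at 5%:", p_two_sided < 0.05)
print("significant at 1%:", p_two_sided < 0.01)
```

Doubling the one-tail area is what distinguishes a two-sided test from a one-sided one; here the conclusion at the 5% and 1% levels is the same either way.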


c.  Is the result significant at the 5% level?  At the 1% level?

 

Yes; no.


d.  What can you conclude (or not conclude) based on your test?  (Your answer should say something about random numbers!)

 

The data provide good evidence that the mean of all numbers generated by Excel’s random number generator is different from 0.5.

12.  True or False

 

________  The probability of an event can be described as the proportion of times the event occurs in many repeated trials of a random phenomenon.

________  Two events are independent when they cannot occur together. 

________  If we compute two confidence intervals, an 80% confidence interval and a 90% confidence interval, based on the same sample, the 80% confidence interval will be narrower.

________  The most important assumption in using techniques of inference is that our samples are SRSs.

________  Significance tests can tell us if the observed effect was likely due to chance.

 

True, False, True, True, True

 

13.  Your mail-order company advertises that it ships 90% of its orders within three working days.  You select an SRS of 100 of the 5000 orders received in the past week for an audit.  The audit reveals that 86 of these orders were shipped on time. 

 

a.  Explain why we expect the number of on-time shipments in an SRS of size 100 to obey a binomial distribution.  What are the relevant parameters?

 

Each trial results in a success or failure, there are a fixed number of trials (100), the trials are independent, and the probability of success in each individual trial is the same (0.9).  These are the four conditions for the binomial setting.  The parameters are n = 100 and p = 0.9.

 

b.  If the company really ships 90% of its orders on time, what is the probability that 86 or fewer in an SRS of 100 orders are shipped on time?  (Use the normal approximation to the binomial distribution.)

 

P(X <= 86) = P(Z <= (86-90)/sqrt(100*0.9*0.1) ) = P(Z <= -1.33) = 0.0918


c.  A critic says, “You claim 90%, but in your sample the on-time percentage is only 86%.  So the 90% claim is wrong.”  Explain in simple language why your probability calculation in (b) shows that the result of the sample does not refute the 90% claim.

 

If the company really does ship 90% of its orders within three days, then the probability that 86 or fewer orders from a sample of 100 are shipped within three days is 0.0918.  In other words, about 9% of the time, 86 or fewer of 100 orders would be shipped “on time” even if the true on-time rate is 90%.  (It can happen by chance alone, and it wouldn’t be terribly unusual, that only 86 of 100 are shipped within three days.)
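The normal approximation in part b can be compared with the exact binomial probability.  The sketch below is illustrative only: the exact sum is an added cross-check that the problem does not ask for, and the variable names are assumptions.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 0.9                  # binomial parameters from part a
mean = n * p                     # expected number of on-time orders
sd = sqrt(n * p * (1 - p))       # binomial standard deviation

# Normal approximation to P(X <= 86), as in part b (no continuity correction).
normal_approx = NormalDist(mean, sd).cdf(86)

# Exact binomial probability: add up P(X = k) for k = 0, 1, ..., 86.
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(87))

print(round(normal_approx, 4), round(exact, 4))
```

Either way the probability is roughly one in ten, which supports the point in part c: a sample result of 86 out of 100 is not surprising enough to refute the 90% claim.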

 

14.  Compute the following probabilities, based on a standard deck of 52 cards (no jokers).

a.  Draw one card.  What is the probability you draw a spade?

 

1/4


b.  Draw one card.  What is the probability you draw a jack given that you draw a face card?

 

1/3


c.  Draw two cards.  What is the probability that your second card is a spade, given that the first card you drew was a spade?

12/51

 

d.  Draw two cards.  What is the probability that your second card is a spade, given that the first card you drew was a heart?

13/51

 

e.  Draw two cards.  What is the probability you draw two cards in the same suit?

12/51
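All five card probabilities can be verified by brute-force counting over a 52-card deck.  The following is a minimal sketch; the helper names `prob` and `cond_prob` and the rank and suit labels are illustrative choices.

```python
from fractions import Fraction
from itertools import permutations

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["spades", "hearts", "diamonds", "clubs"]
deck = [(rank, suit) for rank in ranks for suit in suits]   # 52 cards, no jokers

def prob(event, outcomes):
    """Fraction of equally likely outcomes for which the event holds."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def cond_prob(event, given, outcomes):
    """P(event | given): restrict to outcomes where 'given' holds, then count."""
    return prob(event, [o for o in outcomes if given(o)])

# One-card draws (parts a and b).
p_spade = prob(lambda c: c[1] == "spades", deck)
p_jack_given_face = cond_prob(lambda c: c[0] == "J",
                              lambda c: c[0] in {"J", "Q", "K"}, deck)

# Two-card draws without replacement: every ordered pair of distinct cards (parts c to e).
pairs = list(permutations(deck, 2))
p_spade_given_spade = cond_prob(lambda d: d[1][1] == "spades",
                                lambda d: d[0][1] == "spades", pairs)
p_spade_given_heart = cond_prob(lambda d: d[1][1] == "spades",
                                lambda d: d[0][1] == "hearts", pairs)
p_same_suit = prob(lambda d: d[0][1] == d[1][1], pairs)

print(p_spade, p_jack_given_face, p_spade_given_spade, p_spade_given_heart, p_same_suit)
```

Note that `Fraction` prints results in lowest terms, so 12/51 appears as 4/17.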