These examples are taken from the book CliffsNotes Statistics Quick Review (Cliffsquickreview). The book is an overview of a standard introductory statistics class with typical examples, and it solves all of them with standard statistical formulas and tables. I have taken each of the book's worked-out examples and shown here how to solve it using Resampling Stats without any formulas. Most of the problems turn out to have quite simple Resampling Stats solutions.
You can copy any example, paste it into Statistics101's editor window, and run it.
'From CliffsQuickReview Statistics, p. 38, example 1
'What is the probability of simultaneously
'flipping 3 coins and having them all land heads?
COPY (0 1) coin
COPY 10000 rptCount
REPEAT rptCount
    SAMPLE 3 coin flip
    COUNT flip =1 heads
    SCORE heads result
END
COUNT result =3 successes
DIVIDE successes rptCount probability
PRINT probability
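As a cross-check on the simulation, the exact answer can be found by listing every equally likely outcome. The Python sketch below is not part of the original examples, and the helper name `prob_all_heads` is only an illustrative choice.

```python
from fractions import Fraction
from itertools import product

def prob_all_heads(n_coins):
    """Exact probability that n fair coins all land heads, by enumeration."""
    outcomes = list(product((0, 1), repeat=n_coins))  # 1 = heads, 0 = tails
    favorable = sum(1 for o in outcomes if sum(o) == n_coins)
    return Fraction(favorable, len(outcomes))

p_three_heads = prob_all_heads(3)
print(p_three_heads, float(p_three_heads))
```

The simulated probability should settle near this value as rptCount grows.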
'From CliffsQuickReview Statistics, p. 39, example 2
'What is the probability of randomly drawing an ace
'from a deck of cards (without replacement) and then
'drawing an ace again from the same deck on the next
'draw? Calculated answer = (4/52)*(3/51) = 0.00452.
COPY 1,13 1,13 1,13 1,13 deck
COPY 100000 rptCount
REPEAT rptCount
    SHUFFLE deck deck
    TAKE deck 1 card1
    IF card1 =1
        TAKE deck 2 card2
        IF card2 =1
            SCORE 1 successes
        END
    END
END
COUNT successes =1 successCount
DIVIDE successCount rptCount probability
PRINT probability
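The exact value for two aces in a row can be checked with a short calculation. This is a minimal Python sketch, not part of the original page; `prob_two_aces` is a hypothetical helper name.

```python
from fractions import Fraction

def prob_two_aces(aces=4, deck_size=52):
    """Exact probability of drawing two aces in a row without replacement."""
    first = Fraction(aces, deck_size)
    # After the first ace is drawn, one ace and one card are gone.
    second = Fraction(aces - 1, deck_size - 1)
    return first * second

p_two_aces = prob_two_aces()
print(p_two_aces, round(float(p_two_aces), 5))
```

Because the event is rare, the simulation needs a large rptCount (100000 above) before its estimate becomes stable.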
'From CliffsQuickReview Statistics, p. 41, example 3
'What is the probability of at least one spade or one
'club being randomly chosen in one draw from a deck
'of cards? Calculated result: 13/52 + 13/52 = 0.5.
COPY 13#1 13#2 13#3 13#4 deck
'(1=spade, 2=club, 3=heart, 4=diamond)
COPY 1000 rptCount
REPEAT rptCount
    SAMPLE 1 deck suit
    IF suit =1
        SCORE 1 successes
    END
    IF suit =2
        SCORE 1 successes
    END
END
COUNT successes =1 successCount
DIVIDE successCount rptCount probability
PRINT probability
'From CliffsQuickReview Statistics, p. 42, example 4
'What is the probability of at least one head in
'two coin flips? Calculated result: 0.75
COPY 0,1 coin
COPY 1000 rptCount
REPEAT rptCount
    SAMPLE 2 coin flips
    SUM flips heads
    IF heads >=1
        SCORE 1 successes
    END
END
COUNT successes =1 successCount
DIVIDE successCount rptCount probability
PRINT probability
Drawing either a spade or an ace from a deck of cards
'From CliffsQuickReview Statistics, p. 43, example 5
'What is the probability of drawing either a spade
'or an ace from a deck of cards?
'Calculated result: 16/52 = 0.308
'Simple method:
'4 aces + 13 spades - 1 ace of spades = 16
COPY 16#1 36#2 deck
COPY 10000 rptCount
REPEAT rptCount
    SAMPLE 1 deck card
    SCORE card result
END
COUNT result =1 successes
DIVIDE successes rptCount probability
PRINT probability
' More general alternative way
COPY 1,13 value
COPY 1,4 suit
COPY 10000 rptCount
REPEAT rptCount
    SAMPLE 1 value cardValue
    IF cardValue = 1
        SCORE 1 successes
    END
    IF cardValue <> 1
        SAMPLE 1 suit cardSuit
        IF cardSuit = 1
            SCORE 1 successes
        END
    END
END
COUNT successes =1 successCount
DIVIDE successCount rptCount probability
PRINT probability
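The count of 16 favorable cards can be confirmed by enumerating a full deck. The Python sketch below is an illustration only; it reuses the coding above (value 1 = ace, suit 1 = spade), and `prob_spade_or_ace` is a hypothetical helper name.

```python
from fractions import Fraction
from itertools import product

# Every (value, suit) pair: values 1..13 (1 = ace), suits 1..4 (1 = spade).
deck = list(product(range(1, 14), range(1, 5)))

def prob_spade_or_ace(cards):
    """Exact probability that a single card is a spade, an ace, or both."""
    hits = [c for c in cards if c[0] == 1 or c[1] == 1]
    return Fraction(len(hits), len(cards))

p_spade_or_ace = prob_spade_or_ace(deck)
print(p_spade_or_ace, round(float(p_spade_or_ace), 3))
```

Enumerating the cards counts the ace of spades once, which is the same correction the "simple method" makes by subtracting 1.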
'From CliffsQuickReview Statistics, p. 47, example 6
'If you flip a coin 10 times what is the
'probability of getting exactly 5 heads?
'Calculated result using binomial formula: 0.246
COPY 0,1 coin 'heads = 1
COPY 10000 rptCount
REPEAT rptCount
    SAMPLE 10 coin flips
    SUM flips heads 'count heads
    IF heads = 5
        SCORE 1 result
    END
END
COUNT result =1 successes
DIVIDE successes rptCount probability
PRINT probability
Mean and standard deviation for 10 flips of a fair coin
'From CliffsQuickReview Statistics, p. 47, example 7
'What is the mean and standard deviation for a
'binomial probability distribution for 10 flips
'of a fair coin?
'Calculated result using binomial formula:
'mean = 5, standard deviation = 1.58
COPY 0,1 coin 'heads = 1
COPY 10000 rptCount
REPEAT rptCount
    SAMPLE 10 coin flips
    SUM flips heads 'count heads
    SCORE heads result 'save the count in a list
END
MEAN result mean
STDEV result stdDev
PRINT mean stdDev
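The binomial figures quoted in the comments for 10 coin flips (0.246, mean 5, standard deviation 1.58) can be reproduced directly. This Python sketch is an illustration only; `binomial_pmf` is a hypothetical helper name.

```python
from math import comb, sqrt

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.5
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

p_five_heads = pmf[5]
mean = sum(k * pk for k, pk in enumerate(pmf))
std_dev = sqrt(sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf)))
print(round(p_five_heads, 3), round(mean, 2), round(std_dev, 2))
```

The mean and standard deviation are computed from the full distribution rather than from the shortcut formulas, so the sketch mirrors what the simulation estimates.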
'From CliffsQuickReview Statistics, p. 54, example 1
'If the population mean of number of fish caught
'per trip to a particular fishing hole is 3.2
'and the population standard deviation is 1.8,
'what are the population mean and the standard
'error of the mean of 40 trips?
'NOTE: you can plug in different numbers for
'popStdDev, popMean, and sampleSize to compute
'any standard error of the mean.
COPY 1.8 popStdDev
COPY 3.2 popMean
COPY 40 sampleSize
REPEAT 1000
    NORMAL sampleSize popMean popStdDev sample
    MEAN sample sampleMean
    SCORE sampleMean means
END
MEAN means popMean
STDEV means stdError
PRINT popMean stdError
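For comparison, the textbook standard error of the mean is the population standard deviation divided by the square root of the sample size. The Python sketch below is an illustration only; `standard_error` is a hypothetical helper name.

```python
from math import sqrt

def standard_error(pop_std_dev, sample_size):
    """Standard error of the mean for a known population standard deviation."""
    return pop_std_dev / sqrt(sample_size)

se_fish = standard_error(1.8, 40)
print(round(se_fish, 4))
```

The stdError printed by the simulation should land close to this value, with some run-to-run scatter because only 1000 trials are used.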
'From CliffsQuickReview Statistics, p. 56, example 2
'A normal distribution of retail store purchases has
'a mean of $14.31 and a standard deviation of 6.40.
'What percentage of purchases were under $10?
COPY 100000 size
NORMAL size 14.31 6.4 population
COUNT population <=10.0 purchasesBelowTen
DIVIDE purchasesBelowTen size percentageBelowTen
PRINT percentageBelowTen
'From CliffsQuickReview Statistics, p. 58, example 3
'A normal distribution of retail store purchases
'has a mean of $14.31 and a standard deviation of
'6.40. What purchase amount marks the lower 10%
'of the distribution?
COPY 100000 size
NORMAL size 14.31 6.4 population
PERCENTILE population (10) pcval
PRINT pcval
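Both retail-purchase questions can also be answered from the normal distribution itself. The sketch below is an illustration only and uses Python's standard `statistics.NormalDist`; the variable names are hypothetical.

```python
from statistics import NormalDist

# Purchases modeled as normal with mean $14.31 and standard deviation $6.40.
purchases = NormalDist(mu=14.31, sigma=6.40)

share_under_10 = purchases.cdf(10.0)         # fraction of purchases under $10
tenth_percentile = purchases.inv_cdf(0.10)   # amount marking the lower 10%
print(round(share_under_10, 3), round(tenth_percentile, 2))
```

Note that the simulated value is a fraction between 0 and 1, so multiply it by 100 to read it as a percentage.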
'From CliffsQuickReview Statistics, p. 60, example 4
'Assuming an equal chance of a new baby being a
'boy or a girl (that is pi=0.5), what is the
'likelihood that 60 or more out of the next 100
'births at a local hospital will be boys?
'The answer computed from the cumulative
'binomial distribution is 0.02844. The book's answer,
'0.0228, is based on the normal approximation to the
'binomial, and is therefore somewhat in error.
COPY (0 1) birth '0 = girl, 1 = boy
COPY 10000 rptCount
REPEAT rptCount
    SAMPLE 100 birth births
    COUNT births =1 boys
    SCORE boys results
END
COUNT results >=60 successes
DIVIDE successes rptCount probability
PRINT probability
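The two figures quoted in the comments above (0.02844 exact, 0.0228 from the normal approximation) can be reproduced side by side. This Python sketch is an illustration only; the variable names are hypothetical, and the approximation is taken without a continuity correction, which is what yields 0.0228.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 0.5

# Exact binomial tail: P(60 or more boys in 100 births).
exact_tail = sum(comb(n, k) for k in range(60, n + 1)) / 2**n

# Normal approximation with the same mean and standard deviation.
mu, sigma = n * p, sqrt(n * p * (1 - p))
normal_tail = 1 - NormalDist(mu, sigma).cdf(60)

print(round(exact_tail, 5), round(normal_tail, 4))
```

The simulation estimates the exact binomial tail, so its output should sit nearer 0.028 than 0.023.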
'From CliffsQuickReview Statistics, p. 71
'avg wt of 10 player sample is 198 lbs
'population std dev is 11.5 lbs.
'What is the 90% confidence interval for the
'population weight if you assume the players'
'weights are normally distributed?
REPEAT 1000
    NORMAL 10 198 11.5 weights
    MEAN weights avg
    SCORE avg averages
END
PRINT averages
'histogram averages
PERCENTILE averages (5 95) confidenceInterval
PRINT confidenceInterval
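When the population standard deviation is known, the formula-based interval is the sample mean plus or minus z times sigma over the square root of n. The Python sketch below is an illustration only; `z_interval` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(sample_mean, pop_std_dev, n, confidence):
    """Two-sided confidence interval for a mean with known sigma."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * pop_std_dev / sqrt(n)
    return sample_mean - half_width, sample_mean + half_width

low, high = z_interval(198, 11.5, 10, 0.90)
print(round(low, 2), round(high, 2))
```

The 5th and 95th percentiles printed by the simulation should fall near these limits.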
'From CliffsQuickReview Statistics, p. 75
'avg age of 50 viewer sample is 19 yrs
'population std dev is 1.7 yrs.
'What is the 90% confidence interval for the
'viewer age if you assume the viewers' ages
'are normally distributed?
REPEAT 1000
    NORMAL 50 19 1.7 ages
    MEAN ages avg
    SCORE avg averages
END
PRINT averages
HISTOGRAM averages
PERCENTILE averages (5 95) confidenceInterval
PRINT confidenceInterval
'From CliffsQuickReview Statistics, p. 77, example 1
'A herd of 1500 steers was fed a special high-protein
'diet for a month. A random sample of 29 were
'weighed and had gained an average of 6.7 pounds.
'If the standard deviation of weight gain for the
'entire herd is 7.1, what is the likelihood that the
'average weight gain per steer for the
'month was at least 5 pounds?
'Null hypothesis: avg gain was < 5.
'Reject null hypothesis if probability < 0.05.
COPY 10000 numTrials
REPEAT numTrials
    NORMAL 29 6.7 7.1 sample
    MEAN sample avgGain
    IF avgGain < 5
        SCORE 1 successes 'score gains < 5 for null hypothesis
    END
END
COUNT successes = 1 successCount
DIVIDE successCount numTrials probability
PRINT probability
IF probability < 0.05
    OUTPUT "Null hypothesis is rejected.\n"
END
IF probability >= 0.05
    OUTPUT "Null hypothesis is NOT rejected.\n"
END
'From CliffsQuickReview Statistics, p. 77, example 2
'In national use, a vocabulary test is known to
'have a mean score of 68 and a standard deviation
'of 13. A class of 19 students takes the test and
'has a mean score of 65. Is the class typical of
'others who have taken the test?
'Assume a significance level of p<0.05.
'Null hypothesis: the class mean equals the national mean (68).
'Reject null hypothesis if the class mean (65) falls outside
'the central 95% of the simulated sample means.
REPEAT 1000
    NORMAL 19 68 13 sample
    MEAN sample sampleMean
    SCORE sampleMean means
END
'This is a two tail problem, so divide the 0.05 in half
'to set the lower and upper limits.
PERCENTILE means (2.5 97.5) limits 'Confidence interval
PRINT limits
TAKE limits 1 lowLimit
TAKE limits 2 highLimit
'Output the conclusion:
IF 65 between lowLimit highLimit
    OUTPUT "Null hypothesis can NOT be rejected.\n"
END
'From CliffsQuickReview Statistics, p. 78
'A sample of 12 machine pins has a mean diameter
'of 1.15 inches, and the population standard
'deviation is known to be 0.04. What is a 99
'percent confidence interval of diameter width
'for the population?
'Note that the 99 percent interval is from 0.5% to 99.5%.
COPY 1000 numTrials
REPEAT numTrials
    NORMAL 12 1.15 0.04 sample
    MEAN sample mean
    SCORE mean means
END
PERCENTILE means (0.5 99.5) confidenceInterval
PRINT confidenceInterval
Hypothesis test (SD unknown. t distribution one tail)
'From CliffsQuickReview Statistics, p. 80, example 5
'A professor wants to know if her introductory
'statistics class has a good grasp of basic math.
'Six students are chosen at random from the class
'and given a math proficiency test. The professor
'wants the class to be able to score at least 70
'on the test. The six students get scores of
'62 92 75 68 83 95. Can the professor be at least
'90 percent certain that the mean score for the class
'on the test would be at least 70?
'Null hypothesis: mean score < 70.
COPY (62 92 75 68 83 95) scores
MEAN scores actualScoresMean 'Computed for reference only
STDEV scores actualScoresStdDev 'Computed for ref. only
COPY 1000 numTrials
REPEAT numTrials
    SAMPLE 6 scores sample
    MEAN sample sampleMean
    IF sampleMean < 70
        SCORE 1 successes
    END
END
COUNT successes = 1 result
DIVIDE result numTrials probability
PRINT actualScoresMean actualScoresStdDev probability
Hypothesis test (SD unknown. t distribution two tail)
'From CliffsQuickReview Statistics, Example 6, Page 81:
'A Little League baseball coach wants to know if
'his team is representative of other teams in scoring
'runs. Nationally, the average number of runs scored
'by a Little League team in a game is 5.7. He
'chooses five games at random in which his team
'scored 5, 9, 4, 11, and 8 runs. Is it likely that
'his team's scores could have come from the
'national distribution?
'Assume an alpha level of 0.05.
'Null hypothesis: Team's mean equals the national
'mean (5.7).
COPY (5 9 4 11 8) gameScores
MEAN gameScores mean
PRINT mean
COPY 1000 numTrials
REPEAT numTrials
    SAMPLE 5 gameScores newSample
    MEAN newSample newSampleMean
    SCORE newSampleMean means
END
'This is a two-tail problem, so the 0.05, or
'5 percent should be split between the high
'and low end of the range.
PERCENTILE means (2.5 97.5) meansRange
PRINT meansRange
'Print conclusion:
TAKE meansRange 1 lowLim
TAKE meansRange 2 highLim
IF 5.7 between lowLim highLim
    OUTPUT "Null hypothesis can not be rejected\n"
END
Confidence interval for population mean using t
'From CliffsQuickReview Statistics, Example 7, Page 82:
'Using the Little League baseball data from the previous
'example, what is a 95 percent confidence interval for
'runs scored per team per game?
'Repeating the previous example's info: Nationally,
'the average number of runs scored by a Little League team in a
'game is 5.7. He chooses five games at random in which his team
'scored 5, 9, 4, 11, and 8 runs.
'ANS: In resampling terms, this is really the same problem as
'the previous one. The only difference is that here we're not
'deciding whether to reject a Null hypothesis.
COPY (5 9 4 11 8) gameScores
MEAN gameScores mean
PRINT mean
COPY 1000 numTrials
REPEAT numTrials
    SAMPLE 5 gameScores newSample
    MEAN newSample newSampleMean
    SCORE newSampleMean means
END
PERCENTILE means (2.5 97.5) confidenceInterval
PRINT confidenceInterval
Two-sample z-test for comparing two means
'From CliffsQuickReview Statistics, Example 8, Page 83:
'The amount of a certain trace element in blood is
'known to vary with a standard deviation of 14.1 ppm
'(parts per million) for male blood donors and 9.5 ppm
'for female donors. Random samples of 75 male and 50
'female donors yield concentration means of 28 and
'33 ppm, respectively. What is the likelihood that the
'population means of concentrations of the element are
'the same for men and women?
'Null hypothesis: the means are the same (their difference
'is zero).
'Alternate hypothesis: the means are different.
'SOLUTION: create male and female samples with the given
'sample sizes and standard deviations, but with the same
'means. For the common mean you can use either the male
'mean (28), the female mean (33), or the mean of those
'two means (30.5). In this program the significanceLevel,
'the commonMean, and the rptCount have been made variables
'at the top of the program so you can easily change them.
COPY 0.05 significanceLevel
COPY 28 commonMean 'assume same mean for both
COPY 1000 rptCount
REPEAT rptCount
    NORMAL 75 commonMean 14.1 maleSample
    NORMAL 50 commonMean 9.5 femaleSample 'assume same mean for both
    MEAN maleSample maleSampleMean
    MEAN femaleSample femaleSampleMean
    SUBTRACT maleSampleMean femaleSampleMean difference
    SCORE difference differences
END
ABS differences differences 'make differences positive
PRINT differences
COUNT differences >5 outliers
DIVIDE outliers rptCount probability
PRINT probability
'Print out the conclusion:
IF probability < significanceLevel
    OUTPUT "Null hypothesis is rejected at a significance level of %10.4F.\n" significanceLevel
END
IF probability >= significanceLevel
    OUTPUT "Null hypothesis is NOT rejected at a significance level of %10.4F.\n" significanceLevel
END
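For reference, the conventional two-sample z-test on the same trace-element numbers gives a two-sided probability that the simulated value can be compared against. The Python sketch below is an illustration only; `two_sample_z` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z(mean1, sd1, n1, mean2, sd2, n2):
    """z statistic and two-sided p-value for a difference of two means."""
    std_error = sqrt(sd1**2 / n1 + sd2**2 / n2)
    z = (mean1 - mean2) / std_error
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided

# Males: mean 28, sd 14.1, n 75. Females: mean 33, sd 9.5, n 50.
z, p_value = two_sample_z(28, 14.1, 75, 33, 9.5, 50)
print(round(z, 2), round(p_value, 4))
```

With only 1000 repetitions the simulated probability will scatter noticeably around this figure; raising rptCount tightens it.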
Two-sample t-test for comparing two means (hypothesis test)
'From CliffsQuickReview Statistics, Example 9, Page 84:
'An experiment is conducted to determine whether
'intensive tutoring is more effective than paced tutoring.
'Two randomly chosen groups are tutored separately and
'then administered proficiency tests. Use a significance
'level of alpha < 0.05.
'DATA:
'Group  Method     n   sampleMean  sampleStdDev
'  1    intensive  12  46.31       6.44
'  2    paced      10  42.79       7.52
'
'Null hypothesis: mean of intensive tutoring is <= that
'of paced tutoring.
'
'SOLUTION: In the problem, the authors give the summary
'statistics. These statistics came from sampled data.
'It would be better, using the Resampling method, to work
'with the actual data rather than the summary statistics.
'But since that data is unavailable, we'll use the summary
'statistics to generate our own samples.
COPY 1000 rptCount
REPEAT rptCount
    NORMAL 12 46.31 6.44 intensiveSample
    NORMAL 10 42.79 7.52 pacedSample
    MEAN intensiveSample intensiveMean
    MEAN pacedSample pacedMean
    IF intensiveMean <= pacedMean
        SCORE 1 successes
    END
END
COUNT successes = 1 successCount
DIVIDE successCount rptCount probability
PRINT probability
IF probability >= 0.05
    OUTPUT "Null hypothesis is accepted.\n"
END
IF probability < 0.05
    OUTPUT "Null hypothesis is rejected.\n"
END
Two-sample t-test for comparing two means (confidence interval)
'From CliffsQuickReview Statistics, Example 10, Page 85:
'Estimate a 90 percent confidence interval for the difference
'between the number of raisins per box in two brands of
'breakfast cereal.
'
'DATA:
'Brand  sampleSize  sampleMean  sampleStdDev
'  A    6           102.1       12.3
'  B    9           93.6        7.52
'
'SOLUTION: In the problem, the authors give the summary
'statistics. These statistics came from sampled data.
'It would be better, using the Resampling method, to work
'with the actual data rather than the summary statistics.
'But since that data is unavailable, we'll use the summary
'statistics to generate our own samples.
COPY 1000 rptCount
REPEAT rptCount
    NORMAL 6 102.1 12.3 brandASample
    NORMAL 9 93.6 7.52 brandBSample
    MEAN brandASample brandAMean
    MEAN brandBSample brandBMean
    SUBTRACT brandAMean brandBMean diff
    SCORE diff differences
END
PERCENTILE differences (5 95) confidenceInterval
PRINT confidenceInterval
'From CliffsQuickReview Statistics, Example 11, Page 87:
'Does right- or left-handedness affect how fast people type?
'Random samples of students from a typing class are given
'a typing speed test (words per minute) and the results
'are compared. Significance level for the test: 0.10.
'Because you are looking for a difference between the
'groups in either direction, this is a two-tailed test.
'Null hypothesis: Means are equal.
'
'DATA:
'Group  sampleSize  sampleMean  sampleStdDev
'right  16          55.8        5.7
'left   9           59.3        4.3
'
'SOLUTION: In the problem, the authors give the summary
'statistics. These statistics came from sampled data.
'It would be simpler, using the Resampling method, to work
'with the actual data rather than the summary statistics.
'But since that data is unavailable, we'll use the summary
'statistics to generate our own samples.
'Since the authors are using "variance pooling",
'which assumes that the (unknown) standard deviations are equal,
'we will simulate that process to choose a pooled standard
'deviation and, while we are at it, a pooled mean.
'Compute the "pooled" statistics:
REPEAT 1000
    NORMAL 16 55.8 5.7 rightSample
    NORMAL 9 59.3 4.3 leftSample
    COPY rightSample leftSample pooledSample
    STDEV pooledSample pooledSampleStdDev
    MEAN pooledSample pooledSampleMean
    SCORE pooledSampleStdDev stdDevs
    SCORE pooledSampleMean means
END
MEAN stdDevs pooledStdDev
MEAN means pooledMean
PRINT pooledMean pooledStdDev
COPY 1000 rptCount
REPEAT rptCount
    NORMAL 16 pooledMean pooledStdDev rightSample
    NORMAL 9 pooledMean pooledStdDev leftSample
    MEAN rightSample rightMean
    MEAN leftSample leftMean
    SUBTRACT rightMean leftMean diff
    SCORE diff differences
END
PERCENTILE differences (5 95) acceptanceRegion
PRINT acceptanceRegion
TAKE acceptanceRegion 1 lowLimit
TAKE acceptanceRegion 2 highLimit
OUTPUT "Conclusion: The Null hypothesis is "
'(3.5 is the size of the difference between the original sample
'means; the simulated differences are centered on zero, so the
'sign does not matter much here)
IF 3.5 between lowLimit highLimit
    OUTPUT "NOT "
END
OUTPUT "rejected.\n"
'From CliffsQuickReview Statistics, Example 12, Page 88:
'A farmer decides to try out a new fertilizer on a test plot
'containing 10 stalks of corn. Before applying the fertilizer,
'he measures the height of each stalk. Two weeks later, he
'measures the stalks again, being careful to match each
'stalk's new height to its previous one. The stalks would
'have grown an average of six inches during that time even
'without the fertilizer. Did the fertilizer help? Use a
'significance level of 0.05.
'Null hypothesis: Fertilizer had no effect, i.e., height
'change <= 6.
COPY 0.05 significanceLevel
COPY 1000 rptCount
COPY (35.5 31.7 31.2 36.3 22.8 28.0 24.6 26.1 34.5 27.7) beforeHeights
COPY (45.3 36.0 38.6 44.7 31.4 33.5 28.8 35.8 42.9 35.0) afterHeights
SUBTRACT afterHeights beforeHeights changes
REPEAT rptCount
    SAMPLE 10 changes bootstrapSample
    MEAN bootstrapSample sampleMean
    SCORE sampleMean means
END
COUNT means <=6 successes
DIVIDE successes rptCount probability
PRINT probability
OUTPUT "Conclusion: null hypothesis is "
IF probability <= significanceLevel
    OUTPUT "NOT "
END
OUTPUT "accepted at the %10.4F significance level\n" significanceLevel
Test for a single population proportion (hypothesis test)
'From CliffsQuickReview Statistics, Example 13, Page 89:
'The sponsors of a city marathon have been trying to encourage
'more women to participate in the event. A sample of 70 runners
'is taken, of which 32 are women. The sponsors would like to
'be 90 percent certain that at least 40 percent of the participants
'are women. Were their recruitment efforts successful?
'Null hypothesis: sample proportion < 0.4
'Alternate hypothesis: sample proportion >= 0.4
COPY 1000 rptCount
COPY 0.1 significanceLevel ' 100% - 90% as a decimal fraction
COPY 38#0 32#1 runners '0=men, 1=women
REPEAT rptCount
    SAMPLE 70 runners newSample
    COUNT newSample =1 women
    DIVIDE women 70.0 proportion
    SCORE proportion results
END
COUNT results < 0.4 successes
DIVIDE successes rptCount probability
PRINT probability
OUTPUT "Conclusion: null hypothesis is "
IF probability < significanceLevel
    OUTPUT "NOT "
END
OUTPUT "accepted at the %10.4F significance level.\n" significanceLevel
Test for a single population proportion (confidence interval)
'From CliffsQuickReview Statistics, Example 14, Page 90:
'A sample of 100 voters selected at random in a congressional district
'prefer Candidate Smith to Candidate Jones by a ratio of 3 to 2.
'What is a 95 percent confidence interval of the percentage of
'voters in the district who prefer Smith?
COPY 1000 rptCount
COPY 100 sampleSize
COPY 3#1 2#2 voters '1=Smith 2=Jones
REPEAT rptCount
    SAMPLE sampleSize voters sample
    COUNT sample =1 smithVoters
    SCORE smithVoters results
END
DIVIDE results sampleSize results
PERCENTILE results (2.5 97.5) confidenceInterval
PRINT confidenceInterval
Choosing a sample size for a given confidence interval
'From CliffsQuickReview Statistics, Example 15, Page 91:
'How large a sample is needed to estimate the preference of
'voters for Candidate Smith with a margin of error of
'+ or - 4 percent at a 95 percent confidence level?
'To be conservative, assume voters are split 50/50.
'This one requires a little trial and error on your part.
'You choose a sample size, run the program and see if you
'get a confidence interval of around (0.46 0.54). If not,
'choose another sample size and try again. After a few
'tries you'll settle on 600 as the right choice.
COPY 600 sampleSize
COPY 1000 rptCount
COPY (1 2) voters '1=Smith 2=Jones. Assume voters 50% split
REPEAT rptCount
    SAMPLE sampleSize voters sample
    COUNT sample =1 smithVoters
    SCORE smithVoters results
END
DIVIDE results sampleSize results
PERCENTILE results (2.5 97.5) confidenceInterval
PRINT confidenceInterval
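The sample size found by trial and error can be compared with the usual formula for a proportion, n = z^2 * p * (1 - p) / E^2. The Python sketch below is an illustration only; `sample_size_for_proportion` is a hypothetical helper name.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_proportion(margin, confidence, p=0.5):
    """Unrounded sample size for a given margin of error on a proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z**2 * p * (1 - p) / margin**2

n_exact = sample_size_for_proportion(0.04, 0.95)
n_needed = ceil(n_exact)   # round up so the margin is not exceeded
print(round(n_exact, 1), n_needed)
```

The formula lands just above 600, which agrees with the value the simulation settles on.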
Comparing two proportions (hypothesis test)
'From CliffsQuickReview Statistics, Example 16, Page 92:
'A swimming school wants to determine whether a recently
'hired instructor is working out. Sixteen out of 25 of
'Instructor A's students passed the lifeguard certification
'test on the first try. In comparison, 57 out of 72 of more
'experienced Instructor B's students passed the test on the
'first try. Is Instructor A's success rate worse than
'Instructor B's? Use alpha = 0.10.
'Null hypothesis: A's rate is >= B's rate
'Alternate hypothesis: A's rate is < B's rate
'This is a one-tailed test.
COPY 1000 rptCount
COPY 0.10 significanceLevel
COPY 16#1 9#0 studentsOfA '1=passed, 0=failed
COPY 57#1 15#0 studentsOfB
REPEAT rptCount
    SAMPLE 25 studentsOfA sampleA
    SAMPLE 72 studentsOfB sampleB
    COUNT sampleA =1 passedA
    COUNT sampleB =1 passedB
    DIVIDE passedA 25 passedARate
    DIVIDE passedB 72 passedBRate
    IF passedARate >= passedBRate
        SCORE 1 successes
    END
END
COUNT successes =1 successesA
DIVIDE successesA rptCount probability
PRINT probability
OUTPUT "Conclusion: null hypothesis is "
IF probability < significanceLevel
    OUTPUT "NOT "
END
OUTPUT "accepted at a %10.4F significance level." significanceLevel
Comparing two proportions (confidence interval)
'From CliffsQuickReview Statistics, Example 17, Page 93:
'A public health researcher wants to know how two high
'schools, one in the inner city and one in the suburbs,
'differ in the percentage of students who smoke. A
'random survey of students gives the following results:
'
'Population  sampleSize  Smokers
'inner-city  125         47
'suburban    153         52
'
'What is a 90 percent confidence interval for the
'difference between the two schools?
COPY 1000 rptCount
COPY 47#1 78#0 innerCity
COPY 52#1 101#0 suburban
REPEAT rptCount
    SAMPLE 125 innerCity innerCitySample
    SAMPLE 153 suburban suburbanSample
    COUNT innerCitySample =1 innerCitySmokers
    COUNT suburbanSample =1 suburbanSmokers
    DIVIDE innerCitySmokers 125 innerCityPercentage
    DIVIDE suburbanSmokers 153 suburbanPercentage
    SUBTRACT innerCityPercentage suburbanPercentage difference
    SCORE difference differences
END
PERCENTILE differences (5 95) confidenceInterval
PRINT confidenceInterval
'From CliffsQuickReview Statistics, Example 1, Page 99:
'Compute the correlation coefficient for the relationship
'between months of exercise-machine ownership and hours
'of exercise per week. (The data is given in the program
'below.)
'
'NOTE: this is not a resampling or Monte Carlo simulation.
'It is simply a use of the Statistics101 built-in CORR
'command, which computes the Pearson's product moment
'correlation coefficient.
DATA (5 10 4 8 2 7 9 6 1 12) monthsOwned
DATA (5 2 8 3 8 5 5 7 10 3) hoursExercised
CORR monthsOwned hoursExercised correlationCoefficient
PRINT correlationCoefficient
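To see what a Pearson product-moment coefficient involves, it can be computed from its definition on the same data. The Python sketch below is an illustration only and is not a description of how CORR is implemented; `pearson_r` is a hypothetical helper name.

```python
from math import sqrt

months_owned = [5, 10, 4, 8, 2, 7, 9, 6, 1, 12]
hours_exercised = [5, 2, 8, 3, 8, 5, 5, 7, 10, 3]

def pearson_r(xs, ys):
    """Pearson correlation: co-deviation scaled by both spreads."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

r = pearson_r(months_owned, hours_exercised)
print(round(r, 3))
```

The coefficient is strongly negative: the longer the machine has been owned, the fewer hours it is used per week.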
Finding significance of the Correlation Coefficient
'From CliffsQuickReview Statistics, Example 1 part B, Page 100:
'Compute the significance level for the correlation
'coefficient for the relationship between months of exercise-machine
'ownership and hours of exercise per week. (The data is given in
'the program below.)
'
'The null hypothesis is that the data are not correlated, i.e.,
'that the population correlation coefficient = 0.
'Therefore, we can bootstrap the two data items separately.
'That means we choose pairs of elements independently.
'Then we see how often the original sample's correlation
'coefficient, r, (independent of its sign)
'shows up based on the assumption that they are uncorrelated.
DATA (5 10 4 8 2 7 9 6 1 12) monthsOwned
DATA (5 2 8 3 8 5 5 7 10 3) hoursExercised
CORR monthsOwned hoursExercised r
PRINT "Sample correlation coefficient: " r
COPY 10000 rptCount
REPEAT rptCount
    SAMPLE 10 monthsOwned monthsOwnedBootstrap
    SAMPLE 10 hoursExercised hoursExercisedBootstrap
    CORR monthsOwnedBootstrap hoursExercisedBootstrap bootstrapCorrelationCoefficient
    SCORE bootstrapCorrelationCoefficient correlationCoefficients
END
HISTOGRAM percent binsize 0.1 correlationCoefficients
'Compute 2-sided probability:
rPlus = ABS(r)
rMinus = -ABS(r)
COUNT correlationCoefficients <= rMinus coeffCountMinus
COUNT correlationCoefficients >= rPlus coeffCountPlus
significanceLevel = (coeffCountMinus + coeffCountPlus) / rptCount
PRINT significanceLevel
Confidence interval for the Correlation Coefficient
'This problem is not in the CliffsQuickReview book. I've just
'added it to demonstrate the technique.
'
'Compute the 95 percent confidence interval for the correlation
'coefficient for the relationship between months of exercise-machine
'ownership and hours of exercise per week. (The data is given in
'the program below.)
'
'Since the data pairs are correlated, we must sample them
'in pairs, always taking for any random position in one,
'the corresponding element in the other. We do that using
'a "chooser" variable and the TAKE command as you see below.
'
DATA (5 10 4 8 2 7 9 6 1 12) monthsOwned
DATA (5 2 8 3 8 5 5 7 10 3) hoursExercised
COPY 1000 rptCount
REPEAT rptCount
    SAMPLE 10 1,10 chooser
    TAKE monthsOwned chooser monthsOwnedBootstrap
    TAKE hoursExercised chooser hoursExercisedBootstrap
    CORR monthsOwnedBootstrap hoursExercisedBootstrap correlationCoefficient
    SCORE correlationCoefficient coefficients
END
PERCENTILE coefficients (2.5 97.5) confidenceInterval
PRINT confidenceInterval
'Here is a solution to the problem using the "jackknife" method
'instead of the "bootstrap" used above. Thanks to Gaj Vidmar for
'this solution.
DATA (5 10 4 8 2 7 9 6 1 12) monthsOwned
DATA (5 2 8 3 8 5 5 7 10 3) hoursExercised
COPY 1,10 is
FOREACH i is
    WEED is =i j
    TAKE monthsOwned j monthsOwnedJackknife
    TAKE hoursExercised j hoursExercisedJackknife
    CORR monthsOwnedJackknife hoursExercisedJackknife correlationCoefficientJackknife
    SCORE correlationCoefficientJackknife coefficientsJackknife
END
PERCENTILE coefficientsJackknife (2.5 97.5) confidenceIntervalJackknife
PRINT confidenceIntervalJackknife
'From CliffsQuickReview Statistics, Page 102:
'Compute the linear regression coefficients for
'the relationship between months of exercise-machine
'ownership and hours of exercise per week. (The data
'is given in the program below.)
'
'NOTE: this is not a resampling or Monte Carlo simulation.
'It is simply a use of the Statistics101 built-in REGRESS
'command, which computes the linear regression coefficients.
'
DATA (5 10 4 8 2 7 9 6 1 12) monthsOwned
DATA (5 2 8 3 8 5 5 7 10 3) hoursExercised
REGRESS hoursExercised monthsOwned coefficients
PRINT coefficients
TAKE coefficients 1 slope
TAKE coefficients 2 yIntercept
PRINT slope yIntercept
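The slope and intercept of a least-squares line can also be worked out by hand from the same data. The Python sketch below is an illustration only and is not a description of how REGRESS is implemented; `least_squares` is a hypothetical helper name.

```python
months_owned = [5, 10, 4, 8, 2, 7, 9, 6, 1, 12]
hours_exercised = [5, 2, 8, 3, 8, 5, 5, 7, 10, 3]

def least_squares(xs, ys):
    """Slope and y-intercept of the ordinary least-squares line of y on x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, y_intercept = least_squares(months_owned, hours_exercised)
hours_at_11_months = slope * 11 + y_intercept   # point prediction for x = 11
print(round(slope, 3), round(y_intercept, 3), round(hours_at_11_months, 2))
```

The fitted line drops by about two-thirds of an hour of weekly exercise for each additional month of ownership.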
Confidence interval for the linear regression slope
'From CliffsQuickReview Statistics, Example 2, Page 105:
'Compute the 95% confidence interval for the slope of the
'regression line for the relationship between months of
'exercise-machine ownership and hours of exercise per week.
'(The data is given in the program below.)
'
'Since the data pairs are correlated, we must sample them
'in pairs, always taking for any random position in one,
'the corresponding element in the other. We do that using
'a "chooser" variable and the TAKE command as you see below.
'
DATA (5 10 4 8 2 7 9 6 1 12) monthsOwned
DATA (5 2 8 3 8 5 5 7 10 3) hoursExercised
COPY 10000 rptCount
REPEAT rptCount
    SAMPLE 10 1,10 chooser
    TAKE monthsOwned chooser monthsOwnedBootstrap
    TAKE hoursExercised chooser hoursExercisedBootstrap
    REGRESS hoursExercisedBootstrap monthsOwnedBootstrap linearCoefficients
    TAKE linearCoefficients 1 slope
    SCORE slope slopes
END
PERCENTILE slopes (2.5 97.5) confidenceInterval
PRINT confidenceInterval
'From CliffsQuickReview Statistics, Example 3, Page 107:
'What is a 90% confidence interval for the number of
'hours spent exercising per week if the exercise machine
'is owned 11 months?
'(The data is given in the program below.)
'
'Since the data pairs are correlated, we must sample them
'in pairs, always taking for any random position in one,
'the corresponding element in the other. We do that using
'a "chooser" variable and the TAKE command as you see below.
'
DATA (5 10 4 8 2 7 9 6 1 12) monthsOwned
DATA (5 2 8 3 8 5 5 7 10 3) hoursExercised
COPY 100000 rptCount
REPEAT rptCount
    SAMPLE 10 1,10 chooser
    TAKE monthsOwned chooser monthsOwnedBootstrap
    TAKE hoursExercised chooser hoursExercisedBootstrap
    REGRESS hoursExercisedBootstrap monthsOwnedBootstrap linearCoefficients
    TAKE linearCoefficients 1 slope
    TAKE linearCoefficients 2 yIntercept
    'compute y value for x = 11 months:
    MULTIPLY 11 slope term1
    ADD term1 yIntercept yValue
    SCORE yValue yValues
END
PERCENTILE yValues (5 95) confidenceInterval
PRINT confidenceInterval
'From CliffsQuickReview Statistics, Page 110:
'Suppose 125 children are shown three TV commercials
'A, B, and C, for breakfast cereal and are asked to
'pick which they liked best. The results are:
'
'          A    B    C   Totals
'Boys     30   29   16     75
'Girls    12   33    5     50
'Totals   42   62   21    125
'
'Is the choice of favorite commercial related to
'whether the child is a boy or a girl?
'Null hypothesis: the commercial choice is not
'related to the sex of the child. This can be
'restated as: How often (or what is the
'probability that) the contents of the six inner
'cells would be as far or farther than they
'currently are from their expected values?
'Compare results for alpha = 0.05 vs. alpha = 0.01.
'
'Set up vectors to hold the expected values and
'the observed values of the table.
COPY (25.2 37.2 12.6 16.8 24.8 8.4) expectedValues
COPY (30 29 16 12 33 5) observedData
CHISQUARE observedData expectedValues chiSquare
PRINT chiSquare
'Compute and record (SCORE) chi-square values for
'many simulated table cell entries.
COPY 42#1 62#2 21#3 ads
COPY 5000 rptCount
REPEAT rptCount
    SHUFFLE ads ads
    TAKE ads 1,75 boys
    TAKE ads 76,125 girls
    COUNT boys =1 boysAdA
    COUNT boys =2 boysAdB
    COUNT boys =3 boysAdC
    COUNT girls =1 girlsAdA
    COUNT girls =2 girlsAdB
    COUNT girls =3 girlsAdC
    'Rebuild a new table with the simulated data
    COPY boysAdA boysAdB boysAdC girlsAdA girlsAdB girlsAdC observedData$
    CHISQUARE observedData$ expectedValues chiSquare$
    SCORE chiSquare$ chiSquareScores
END
COUNT chiSquareScores >= chiSquare chiCount
DIVIDE chiCount rptCount significanceLevel
PRINT significanceLevel
OUTPUT "Conclusions:\n"
OUTPUT "The null hypothesis is "
IF significanceLevel >= 0.05
    OUTPUT "NOT "
END
OUTPUT "rejected at the 0.05 significance level.\n"
OUTPUT "The null hypothesis is "
IF significanceLevel >= 0.01
    OUTPUT "NOT "
END
OUTPUT "rejected at the 0.01 significance level.\n"
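The expected values hard-coded in the program come from the table's row and column totals, and the chi-square statistic follows from them. The Python sketch below is an illustration only; the helper names `expected_counts` and `chi_square` are hypothetical.

```python
observed = [[30, 29, 16],   # boys:  commercials A, B, C
            [12, 33, 5]]    # girls: commercials A, B, C

def expected_counts(table):
    """Expected cell counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi_square(table, expected):
    """Sum of (observed - expected)^2 / expected over all cells."""
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))

expected = expected_counts(observed)
chi2 = chi_square(observed, expected)
print([[round(e, 1) for e in row] for row in expected], round(chi2, 3))
```

The simulation then asks how often a shuffled table produces a chi-square value at least this large.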