Resampling Statistics Examples

These examples are taken from the book CliffsNotes Statistics Quick Review (CliffsQuickReview). That book is an overview of a standard introductory statistics class with typical examples. The book solves all its examples with standard statistical formulas and tables. For each of the book's worked examples, I show here how to solve it using Resampling Stats, without any formulas. Most of the problems turn out to have quite simple Resampling Stats solutions.

You can copy any example, paste it into Statistics101's editor window, and run it.

1 Flipping three coins

2 Drawing two aces at random

3 One spade or one club

4 At least one head in two coin flips

5 Drawing either a spade or an ace from a deck of cards

6 Exactly five heads out of ten

7 Mean and standard deviation for 10 flips of a fair coin

8 Standard error of the mean

9 Area under normal curve

10 Percentile

11 Sixty boys out of next 100 births

12 Confidence interval (SD known)

13 Confidence interval (SD known)

14 Hypothesis test 1 (SD known)

15 Hypothesis test 2 (SD known)

16 Confidence interval (SD known)

17 Hypothesis test (SD unknown. t distribution one tail)

18 Hypothesis test (SD unknown. t distribution two tail)

19 Confidence interval for population mean using t

20 Two-sample z-test for comparing two means

21 Two-sample t-test for comparing two means (hypothesis test)

22 Two-sample t-test for comparing two means (confidence interval)

23 Pooled Variance method

24 Paired difference t-test

25 Test for a single population proportion (hypothesis test)

26 Test for a single population proportion (confidence interval)

27 Choosing a sample size for a given confidence interval

28 Comparing two proportions (hypothesis test)

29 Comparing two proportions (confidence interval)

30 Correlation Coefficient

31 Finding significance of the Correlation Coefficient

32 Confidence interval for the Correlation Coefficient

33 Simple Linear Regression

34 Confidence interval for the linear regression slope

35 Confidence interval for the prediction

36 Chi-square test

1 Flipping three coins

'From CliffsQuickReview Statistics, p. 38, example 1

'What is the probability of simultaneously

'flipping 3 coins and having them all land heads?

COPY (0 1) coin

COPY 10000 rptCount

REPEAT rptCount

SAMPLE 3 coin flip

COUNT flip =1 heads

SCORE heads result

END

COUNT result =3 successes

DIVIDE successes rptCount probability

PRINT probability

2 Drawing two aces at random

'From CliffsQuickReview Statistics, p. 39, example 2

'What is the probability of randomly drawing an ace

'from a deck of cards (without replacement) and then

'drawing an ace again from the same deck on the next

'draw? Calculated answer = (4/52)*(3/51) = 0.004525.

COPY 1,13 1,13 1,13 1,13 deck

COPY 100000 rptCount

REPEAT rptCount

SHUFFLE deck deck

TAKE deck 1 card1

IF card1 =1

TAKE deck 2 card2

IF card2 =1

SCORE 1 successes

END

END

END

COUNT successes =1 successCount

DIVIDE successCount rptCount probability

PRINT probability
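
As a cross-check on the calculated answer, the exact probability can be worked out with fractions. This is a short Python sketch, not Statistics101 code, and the variable names are illustrative only.

```python
from fractions import Fraction

# Chance that the first card drawn is an ace: 4 aces among 52 cards.
p_first_ace = Fraction(4, 52)

# Chance that the second card is also an ace, given that one ace
# has already been removed: 3 aces among the remaining 51 cards.
p_second_ace = Fraction(3, 51)

p_two_aces = p_first_ace * p_second_ace

print(p_two_aces, float(p_two_aces))
```

The simulated probability should settle near this value as rptCount grows.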

3 One spade or one club

'From CliffsQuickReview Statistics, p. 41, example 3

'What is the probability of at least one spade or one

'club being randomly chosen in one draw from a deck

'of cards? Calculated result: 13/52 + 13/52 = 0.5.

COPY 13#1 13#2 13#3 13#4 deck

'(1=spade, 2=club, 3=heart, 4=diamond)

COPY 1000 rptCount

REPEAT rptCount

SAMPLE 1 deck suit

IF suit =1

SCORE 1 successes

END

IF suit =2

SCORE 1 successes

END

END

COUNT successes =1 successCount

DIVIDE successCount rptCount probability

PRINT probability

4 At least one head in two coin flips

'From CliffsQuickReview Statistics, p. 42, example 4

'What is the probability of at least one head in

'two coin flips? Calculated result: 0.75

COPY 0,1 coin

COPY 1000 rptCount

REPEAT rptCount

SAMPLE 2 coin flips

SUM flips heads

IF heads >=1

SCORE 1 successes

END

END

COUNT successes =1 successCount

DIVIDE successCount rptCount probability

PRINT probability

5 Drawing either a spade or an ace from a deck of cards

'From CliffsQuickReview Statistics, p. 43, example 5

'What is the probability of drawing either a spade

'or an ace from a deck of cards?

'Calculated result: 16/52 = 0.308

'Simple method:

'4 aces + 13 spades - 1 ace of spades = 16

COPY 16#1 36#2 deck

COPY 10000 rptCount

REPEAT rptCount

SAMPLE 1 deck card

SCORE card result

END

COUNT result =1 successes

DIVIDE successes rptCount probability

PRINT probability

' More general alternative way

COPY 1,13 value

COPY 1,4 suit

COPY 10000 rptCount

REPEAT rptCount

SAMPLE 1 value cardValue

IF cardValue = 1

SCORE 1 successes

END

IF cardValue <> 1

SAMPLE 1 suit cardSuit

IF cardSuit = 1

SCORE 1 successes

END

END

END

COUNT successes =1 successCount

DIVIDE successCount rptCount probability

PRINT probability
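
The count of 16 favorable cards can also be confirmed by listing the whole deck and counting. The following is a Python sketch for comparison only (it is not Statistics101 code); the rank and suit labels are assumptions chosen for illustration.

```python
from itertools import product

ranks = range(1, 14)                           # 1 stands for the ace
suits = ("spade", "club", "heart", "diamond")

# Build all 52 (rank, suit) combinations.
deck = list(product(ranks, suits))

# A card is favorable if it is an ace or a spade; the ace of spades
# is naturally counted only once.
favorable = [card for card in deck if card[0] == 1 or card[1] == "spade"]

p_spade_or_ace = len(favorable) / len(deck)
print(len(favorable), len(deck), p_spade_or_ace)
```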

6 Exactly five heads out of ten

'From CliffsQuickReview Statistics, p. 47, example 6

'If you flip a coin 10 times what is the

'probability of getting exactly 5 heads?

'Calculated result using binomial formula: 0.246

COPY 0,1 coin 'heads = 1

COPY 10000 rptCount

REPEAT rptCount

SAMPLE 10 coin flips

SUM flips heads 'count heads

IF heads = 5

SCORE 1 result

END

END

COUNT result =1 successes

DIVIDE successes rptCount probability

PRINT probability
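
For reference, the binomial formula that the book uses can be evaluated directly. This is a minimal Python sketch (not Statistics101 code); the function name binomial_pmf is hypothetical.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Exactly 5 heads in 10 flips of a fair coin.
p_five_heads = binomial_pmf(5, 10, 0.5)
print(round(p_five_heads, 3))
```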

7 Mean and standard deviation for 10 flips of a fair coin

'From CliffsQuickReview Statistics, p. 47, example 7

'What is the mean and standard deviation for a

'binomial probability distribution for 10 flips

'of a fair coin?

'Calculated result using binomial formula:

'mean = 5, standard deviation = 1.58

COPY 0,1 coin 'heads = 1

COPY 10000 rptCount

REPEAT rptCount

SAMPLE 10 coin flips

SUM flips heads 'count heads

SCORE heads result 'save the count in a list

END

MEAN result mean

STDEV result stdDev

PRINT mean stdDev
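
The calculated mean and standard deviation can be reproduced by summing over the full binomial distribution. This Python sketch is only a cross-check and is not Statistics101 code; the variable names are illustrative.

```python
from math import comb, sqrt

n, p = 10, 0.5

# Probability of each possible head count 0..10.
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

# Mean and variance taken directly from their definitions.
mean = sum(k * pk for k, pk in enumerate(pmf))
variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))
std_dev = sqrt(variance)

print(mean, round(std_dev, 2))
```

The results agree with the shortcut formulas n*p for the mean and sqrt(n*p*(1-p)) for the standard deviation.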

8 Standard error of the mean

'from CliffsQuickReview Statistics p. 54 Example 1

'If the population mean of number of fish caught

'per trip to a particular fishing hole is 3.2

'and the population standard deviation is 1.8,

'what are the population mean and the standard

'error of the mean of 40 trips?

'NOTE: you can plug in different numbers for

'popStdDev, popMean, and sampleSize to compute

'any standard error of the mean.

COPY 1.8 popStdDev

COPY 3.2 popMean

COPY 40 sampleSize

REPEAT 1000

NORMAL sampleSize popMean popStdDev sample

MEAN sample sampleMean

SCORE sampleMean means

END

MEAN means popMean

STDEV means stdError

PRINT popMean stdError
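
The simulated standard error can be compared with the textbook formula, the population standard deviation divided by the square root of the sample size. A short Python sketch (not Statistics101 code), with illustrative variable names:

```python
from math import sqrt

pop_std_dev = 1.8
sample_size = 40

# Standard error of the mean: sigma / sqrt(n).
std_error = pop_std_dev / sqrt(sample_size)
print(round(std_error, 4))
```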

9 Area under normal curve

' From CliffsQuickReview Statistics, p. 56, example 2

' A normal distribution of retail store purchases has

' a mean of $14.31 and a standard deviation of 6.40.

' What percentage of purchases were under $10?

COPY 100000 size

NORMAL size 14.31 6.4 population

COUNT population <=10.0 purchasesBelowTen

DIVIDE purchasesBelowTen size percentageBelowTen

PRINT percentageBelowTen

10 Percentile

'From CliffsQuickReview Statistics, p. 58, example 3

'A normal distribution of retail store purchases

'has a mean of $14.31 and a standard deviation of

'6.40. What purchase amount marks the lower 10%

'of the distribution?

COPY 100000 size

NORMAL size 14.31 6.4 population

PERCENTILE population (10) pcval

PRINT pcval
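
Both this percentile and the area found in the previous example can be checked against the normal distribution itself. The sketch below uses NormalDist from Python's standard statistics module (Python 3.8 or later). It is not Statistics101 code, and the variable names are illustrative.

```python
from statistics import NormalDist

# Retail purchases: mean $14.31, standard deviation $6.40.
purchases = NormalDist(mu=14.31, sigma=6.40)

# Example 9: fraction of purchases under $10.
fraction_below_ten = purchases.cdf(10.0)

# Example 10: purchase amount marking the lower 10 percent.
tenth_percentile = purchases.inv_cdf(0.10)

print(round(fraction_below_ten, 4), round(tenth_percentile, 2))
```

The simulated results from the two programs should land close to these values, with small differences from run to run.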

11 Sixty boys out of next 100 births

'From CliffsQuickReview Statistics, p. 60, example 4

'Assuming an equal chance of a new baby being a

'boy or a girl (that is pi=0.5), what is the

'likelihood that 60 or more out of the next 100

'births at a local hospital will be boys?

'The answer computed from the cumulative

'binomial distribution is 0.02844. The book's answer,

'0.0228, is based on the normal approximation to the

'binomial, and is therefore somewhat in error.

COPY (0 1) birth '0 = girl, 1 = boy

COPY 10000 rptCount

REPEAT rptCount

SAMPLE 100 birth births

COUNT births =1 boys

SCORE boys results

END

COUNT results >=60 successes

DIVIDE successes rptCount probability

PRINT probability
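
The two figures quoted in the comments above can be reproduced side by side. This Python sketch (not Statistics101 code) sums the exact binomial tail and then evaluates the normal approximation without a continuity correction, which is assumed here to be how the book arrives at its answer.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 0.5

# Exact probability of 60 or more boys: sum of the binomial tail.
exact_tail = sum(comb(n, k) for k in range(60, n + 1)) / 2**n

# Normal approximation with the same mean and standard deviation.
mean = n * p
std_dev = sqrt(n * p * (1 - p))
normal_approx = 1 - NormalDist(mean, std_dev).cdf(60)

print(round(exact_tail, 5), round(normal_approx, 4))
```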

12 Confidence interval (SD known)

'From Cliffs QuickReview: Statistics pg 71

'avg wt of 10 player sample is 198 lbs

'population std dev is 11.5 lbs.

'What is the 90% confidence interval for the

'population mean weight if you assume the players'

'weights are normally distributed?

REPEAT 1000

NORMAL 10 198 11.5 weights

MEAN weights avg

SCORE avg averages

END

PRINT averages

'histogram averages

PERCENTILE averages (5 95) confidenceInterval

PRINT confidenceInterval
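
The formula-based interval that the simulation approximates is the sample mean plus or minus z times the standard error. A Python sketch for comparison (not Statistics101 code), with illustrative variable names:

```python
from math import sqrt
from statistics import NormalDist

sample_mean = 198.0
pop_std_dev = 11.5
sample_size = 10
confidence = 0.90

# Two-sided critical value: leaves 5 percent in each tail.
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

margin = z * pop_std_dev / sqrt(sample_size)
interval = (sample_mean - margin, sample_mean + margin)

print(round(interval[0], 2), round(interval[1], 2))
```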

13 Confidence interval (SD known)

'From Cliffs QuickReview: Statistics pg 75

'avg age of 50 viewer sample is 19 yrs

'population std dev is 1.7 yrs.

'What is the 90% confidence interval for the

'mean viewer age if you assume the viewers' ages

'are normally distributed?

REPEAT 1000

NORMAL 50 19 1.7 ages

MEAN ages avg

SCORE avg averages

END

PRINT averages

histogram averages

PERCENTILE averages (5 95) confidenceInterval

PRINT confidenceInterval

14 Hypothesis test 1 (SD known)

'From: CliffsQuickReview Statistics, p 77, Example 1.

'A herd of 1500 steers was fed a special high-protein

'diet for a month. A random sample of 29 were

'weighed and had gained an average of 6.7 pounds.

'If the standard deviation of weight gain for the

'entire herd is 7.1, what is the likelihood that the

'average weight gain per steer for the

'month was at least 5 pounds?

'Null hypothesis: avg gain was < 5.

'Reject null hypothesis if probability < 0.05.

COPY 10000 numTrials

REPEAT numTrials

NORMAL 29 6.7 7.1 sample

MEAN sample avgGain

IF avgGain < 5

SCORE 1 successes 'score gains < 5 for null hypothesis

END

END

COUNT successes = 1 successCount

DIVIDE successCount numTrials probability

PRINT probability

IF probability < 0.05

OUTPUT "Null hypothesis is rejected.\n"

END

IF probability >= 0.05

OUTPUT "Null hypothesis is NOT rejected.\n"

END

15 Hypothesis test 2 (SD known)

'From: CliffsQuickReview Statistics, p 77, Example 2.

'In national use, a vocabulary test is known to

'have a mean score of 68 and a standard deviation

'of 13. A class of 19 students takes the test and

'has a mean score of 65. Is the class typical of

'others who have taken the test?

'Assume a significance level of p<0.05.

'Null hypothesis: the class mean equals the national mean (68).

'Reject null hypothesis if probability < 0.05.

REPEAT 1000

NORMAL 19 68 13 sample

MEAN sample sampleMean

SCORE sampleMean means

END

'This is a two tail problem, so divide the 0.05 in half

'to set the lower and upper limits.

PERCENTILE means (2.5 97.5) limits 'Confidence interval

PRINT limits

TAKE limits 1 lowLimit

TAKE limits 2 highLimit

'Output the conclusion:

IF 65 between lowLimit highLimit

OUTPUT "Null hypothesis can NOT be rejected.\n"

END

16 Confidence interval (SD known)

'From: CliffsQuickReview Statistics, p 78.

'A sample of 12 machine pins has a mean diameter

'of 1.15 inches, and the population standard

'deviation is known to be 0.04. What is a 99

'percent confidence interval of diameter width

'for the population?

'Note that the 99 percent interval is from 0.5% to 99.5%.

COPY 1000 numTrials

REPEAT numTrials

NORMAL 12 1.15 0.04 sample

MEAN sample mean

SCORE mean means

END

PERCENTILE means (0.5 99.5) confidenceInterval

PRINT confidenceInterval

17 Hypothesis test (SD unknown. t distribution one tail)

'From CliffsQuickReview Statistics p. 80, example 5

'A professor wants to know if her introductory

'statistics class has a good grasp of basic math.

'Six students are chosen at random from the class

'and given a math proficiency test. The professor

'wants the class to be able to score at least 70

'on the test. The six students get scores of

'62 92 75 68 83 95. Can the professor be at least

'90 percent certain that the mean score for the class

'on the test would be at least 70?

'Null hypothesis: mean score < 70.

COPY (62 92 75 68 83 95) scores

MEAN scores actualScoresMean 'Computed for reference only

STDEV scores actualScoresStdDev 'Computed for ref. only

COPY 1000 numTrials

REPEAT numTrials

SAMPLE 6 scores sample

MEAN sample sampleMean

IF sampleMean < 70

SCORE 1 successes

END

END

COUNT successes = 1 result

DIVIDE result numTrials probability

PRINT actualScoresMean actualScoresStdDev probability

18 Hypothesis test (SD unknown. t distribution two tail)

'From CliffsQuickReview Statistics, Example 6, Page 81:

'A Little League baseball coach wants to know if

'his team is representative of other teams in scoring

'runs. Nationally, the average number of runs scored

'by a Little League team in a game is 5.7. He

'chooses five games at random in which his team

'scored 5, 9, 4, 11, and 8 runs. Is it likely that

'his team's scores could have come from the

'national distribution?

'Assume an alpha level of 0.05.

'Null hypothesis: Team's mean equals the national

'mean (5.7).

COPY (5 9 4 11 8) gameScores

MEAN gameScores mean

PRINT mean

COPY 1000 numTrials

REPEAT numTrials

SAMPLE 5 gameScores newSample

MEAN newSample newSampleMean

SCORE newSampleMean means

END

'This is a two-tail problem, so the 0.05, or

'5 percent should be split between the high

'and low end of the range.

PERCENTILE means (2.5 97.5) meansRange

PRINT meansRange

'Print conclusion:

TAKE meansRange 1 lowLim

TAKE meansRange 2 highLim

IF 5.7 between lowLim highLim

OUTPUT "Null hypothesis can not be rejected\n"

END

19 Confidence interval for population mean using t

'From CliffsQuickReview Statistics, Example 7, Page 82:

'Using the Little League baseball data from the previous

'example, what is a 95 percent confidence interval for

'runs scored per team per game?

'Repeating the previous example's info: Nationally,

'the average number of runs scored by a Little League team in a

'game is 5.7. He chooses five games at random in which his team

'scored 5, 9, 4, 11, and 8 runs.

'ANS: In resampling terms, this is really the same problem as

'the previous one. The only difference is that here we're not

'deciding whether to reject a Null hypothesis.

COPY (5 9 4 11 8) gameScores

MEAN gameScores mean

PRINT mean

COPY 1000 numTrials

REPEAT numTrials

SAMPLE 5 gameScores newSample

MEAN newSample newSampleMean

SCORE newSampleMean means

END

PERCENTILE means (2.5 97.5) confidenceInterval

PRINT confidenceInterval

20 Two-sample z-test for comparing two means

'From CliffsQuickReview Statistics, Example 8, Page 83:

'The amount of a certain trace element in blood is

'known to vary with a standard deviation of 14.1 ppm

'(parts per million) for male blood donors and 9.5 ppm

'for female donors. Random samples of 75 male and 50

'female donors yield concentration means of 28 and

'33 ppm, respectively. What is the likelihood that the

'population means of concentrations of the element are

'the same for men and women?

'Null hypothesis: the means are the same (their difference

'is zero).

'Alternate hypothesis: the means are different.

'SOLUTION: create male and female samples with the given

'sample sizes and standard deviations, but with the same

'means. For the common mean you can use either the male

'mean (28), the female mean (33), or the mean of those

'two means (30.5). In this program the significanceLevel,

'the commonMean, and the rptCount have been made variables

'at the top of the program so you can easily change them.

COPY 0.05 significanceLevel

COPY 28 commonMean 'assume same mean for both

COPY 1000 rptCount

REPEAT rptCount

NORMAL 75 commonMean 14.1 maleSample

NORMAL 50 commonMean 9.5 femaleSample 'assume same mean for both

MEAN maleSample maleSampleMean

MEAN femaleSample femaleSampleMean

SUBTRACT maleSampleMean femaleSampleMean difference

SCORE difference differences

END

ABS differences differences 'make differences positive

PRINT differences

COUNT differences >5 outliers

DIVIDE outliers rptCount probability

PRINT probability

'Print out the conclusion:

IF probability < significanceLevel

OUTPUT "Null hypothesis is rejected at a significance level of %10.4F.\n" significanceLevel

END

IF probability >= significanceLevel

OUTPUT "Null hypothesis is NOT rejected at a significance level of %10.4F.\n" significanceLevel

END

21 Two-sample t-test for comparing two means (hypothesis test)

'From CliffsQuickReview Statistics, Example 9, Page 84:

'An experiment is conducted to determine whether

'intensive tutoring is more effective than paced tutoring.

'Two randomly chosen groups are tutored separately and

'then administered proficiency tests. Use a significance

'level of alpha < 0.05.

'DATA:

'Group Method n sampleMean sampleStdDev

' 1 intensive 12 46.31 6.44

' 2 paced 10 42.79 7.52

'Null hypothesis: mean of intensive tutoring is <= that

'of paced tutoring.

'SOLUTION: In the problem, the authors give the summary

'statistics. These statistics came from sampled data.

'It would be better, using the Resampling method, to work

'with the actual data rather than the summary statistics.

'But since that data is unavailable, we'll use the summary

'statistics to generate our own samples.

COPY 1000 rptCount

REPEAT rptCount

NORMAL 12 46.31 6.44 intensiveSample

NORMAL 10 42.79 7.52 pacedSample

MEAN intensiveSample intensiveMean

MEAN pacedSample pacedMean

IF intensiveMean <= pacedMean

SCORE 1 successes

END

END

COUNT successes = 1 successCount

DIVIDE successCount rptCount probability

PRINT probability

IF probability >= 0.05

OUTPUT "Null hypothesis is accepted.\n"

END

IF probability < 0.05

OUTPUT "Null hypothesis is rejected.\n"

END

22 Two-sample t-test for comparing two means (confidence interval)

'From CliffsQuickReview Statistics, Example 10, Page 85:

'Estimate a 90 percent confidence interval for the difference

'between the number of raisins per box in two brands of

'breakfast cereal.

'DATA:

'Brand sampleSize sampleMean sampleStdDev

' A 6 102.1 12.3

' B 9 93.6 7.52

'SOLUTION: In the problem, the authors give the summary

'statistics. These statistics came from sampled data.

'It would be better, using the Resampling method, to work

'with the actual data rather than the summary statistics.

'But since that data is unavailable, we'll use the summary

'statistics to generate our own samples.

COPY 1000 rptCount

REPEAT rptCount

NORMAL 6 102.1 12.3 brandASample

NORMAL 9 93.6 7.52 brandBSample

MEAN brandASample brandAMean

MEAN brandBSample brandBMean

SUBTRACT brandAMean brandBMean diff

SCORE diff differences

END

PERCENTILE differences (5 95) confidenceInterval

PRINT confidenceInterval

23 Pooled Variance method

'From CliffsQuickReview Statistics, Example 11, Page 87:

'Does right- or left-handedness affect how fast people type?

'Random samples of students from a typing class are given

'a typing speed test (words per minute) and the results

'are compared. Significance level for the test: 0.10.

'Because you are looking for a difference between the

'groups in either direction, this is a two-tailed test.

'Null hypothesis: Means are equal.

'DATA:

'Group sampleSize sampleMean sampleStdDev

'right 16 55.8 5.7

'left 9 59.3 4.3

'SOLUTION: In the problem, the authors give the summary

'statistics. These statistics came from sampled data.

'It would be simpler, using the Resampling method, to work

'with the actual data rather than the summary statistics.

'But since that data is unavailable, we'll use the summary

'statistics to generate our own samples.

'Since the authors are using "variance pooling",

'which assumes that the (unknown) standard deviations are equal,

'we will simulate that process to choose a pooled standard

'deviation and while we are at it, a pooled mean.

'Compute the "pooled" statistics:

REPEAT 1000

NORMAL 16 55.8 5.7 rightSample

NORMAL 9 59.3 4.3 leftSample

COPY rightSample leftSample pooledSample

STDEV pooledSample pooledSampleStdDev

MEAN pooledSample pooledSampleMean

SCORE pooledSampleStdDev stdDevs

SCORE pooledSampleMean means

END

MEAN stdDevs pooledStdDev

MEAN means pooledMean

PRINT pooledMean pooledStdDev

COPY 1000 rptCount

REPEAT rptCount

NORMAL 16 pooledMean pooledStdDev rightSample

NORMAL 9 pooledMean pooledStdDev leftSample

MEAN rightSample rightMean

MEAN leftSample leftMean

SUBTRACT rightMean leftMean diff

SCORE diff differences

END

PERCENTILE differences (5 95) acceptanceRegion

PRINT acceptanceRegion

TAKE acceptanceRegion 1 lowLimit

TAKE acceptanceRegion 2 highLimit

OUTPUT "Conclusion: The Null hypothesis is "

'(3.5 is the difference between the original sample means)

IF 3.5 between lowLimit highLimit

OUTPUT "NOT "

END

OUTPUT "rejected.\n"

24 Paired difference t-test

'From CliffsQuickReview Statistics, Example 12, Page 88:

'A farmer decides to try out a new fertilizer on a test plot

'containing 10 stalks of corn. Before applying the fertilizer,

'he measures the height of each stalk. Two weeks later, he

'measures the stalks again, being careful to match each

'stalk's new height to its previous one. The stalks would

'have grown an average of six inches during that time even

'without the fertilizer. Did the fertilizer help? Use a

'significance level of 0.05.

'Null hypothesis: Fertilizer had no effect, i.e., height

'change <= 6.

copy 0.05 significanceLevel

COPY 1000 rptCount

COPY (35.5 31.7 31.2 36.3 22.8 28.0 24.6 26.1 34.5 27.7) beforeHeights

COPY (45.3 36.0 38.6 44.7 31.4 33.5 28.8 35.8 42.9 35.0) afterHeights

SUBTRACT afterHeights beforeHeights changes

REPEAT rptCount

SAMPLE 10 changes bootstrapSample

MEAN bootstrapSample sampleMean

SCORE sampleMean means

END

COUNT means <=6 successes

DIVIDE successes rptCount probability

PRINT probability

OUTPUT "Conclusion: null hypothesis is "

IF probability <= significanceLevel

OUTPUT "NOT "

END

OUTPUT "accepted at the %10.4F significance level\n" significanceLevel

25 Test for a single population proportion (hypothesis test)

'From CliffsQuickReview Statistics, Example 13, Page 89:

'The sponsors of a city marathon have been trying to encourage

'more women to participate in the event. A sample of 70 runners

'is taken, of which 32 are women. The sponsors would like to

'be 90 percent certain that at least 40 percent of the participants

'are women. Were their recruitment efforts successful?

'Null hypothesis: population proportion < 0.4

'Alternate hypothesis: population proportion >= 0.4

COPY 1000 rptCount

COPY 0.1 significanceLevel ' 100% - 90% as a decimal fraction

COPY 38#0 32#1 runners '0=men, 1=women

REPEAT rptCount

SAMPLE 70 runners newSample

COUNT newSample =1 women

DIVIDE women 70.0 proportion

SCORE proportion results

END

COUNT results < 0.4 successes

DIVIDE successes rptCount probability

PRINT probability

OUTPUT "Conclusion: null hypothesis is "

IF probability < significanceLevel

OUTPUT "NOT "

END

OUTPUT "accepted at the %10.4F significance level.\n" significanceLevel

26 Test for a single population proportion (confidence interval)

'From CliffsQuickReview Statistics, Example 14, Page 90:

'A sample of 100 voters selected at random in a congressional district

'prefer Candidate Smith to Candidate Jones by a ratio of 3 to 2.

'What is a 95 percent confidence interval of the percentage of

'voters in the district who prefer Smith?

COPY 1000 rptCount

COPY 100 sampleSize

COPY 3#1 2#2 voters '1=Smith 2=Jones

REPEAT rptCount

SAMPLE sampleSize voters sample

COUNT sample =1 smithVoters

SCORE smithVoters results

END

DIVIDE results sampleSize results

PERCENTILE results (2.5 97.5) confidenceInterval

PRINT confidenceInterval
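
For comparison, the usual normal-approximation interval for a proportion can be computed directly. This is a Python sketch (not Statistics101 code); the 3-to-2 ratio is taken to mean a sample proportion of 0.6, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

sample_size = 100
p_hat = 3 / (3 + 2)        # 3-to-2 preference for Smith
confidence = 0.95

# Two-sided critical value for a 95 percent interval.
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

margin = z * sqrt(p_hat * (1 - p_hat) / sample_size)
interval = (p_hat - margin, p_hat + margin)

print(round(interval[0], 3), round(interval[1], 3))
```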

27 Choosing a sample size for a given confidence interval

'From CliffsQuickReview Statistics, Example 15, Page 91:

'How large a sample is needed to estimate the preference of

'voters for Candidate Smith with a margin of error of

'+ or - 4 percent at a 95 percent confidence level?

'To be conservative, assume voters are split 50/50.

'This one requires a little trial and error on your part.

'You choose a sample size, run the program and see if you

'get a confidence interval of around (0.46 0.54). If not,

'choose another sample size and try again. After a few

'tries you'll settle on 600 as the right choice.

COPY 600 sampleSize

COPY 1000 rptCount

COPY (1 2) voters '1=Smith 2=Jones. Assume voters 50% split

REPEAT rptCount

SAMPLE sampleSize voters sample

COUNT sample =1 smithVoters

SCORE smithVoters results

END

DIVIDE results sampleSize results

PERCENTILE results (2.5 97.5) confidenceInterval

PRINT confidenceInterval
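
The trial-and-error result can be compared with the standard sample-size formula for a proportion, n = z^2 * p * (1 - p) / E^2. The following Python sketch is not Statistics101 code, and its variable names are illustrative.

```python
from math import ceil
from statistics import NormalDist

margin_of_error = 0.04
assumed_proportion = 0.5      # conservative 50/50 split
confidence = 0.95

# Two-sided critical value for a 95 percent confidence level.
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

raw_size = z**2 * assumed_proportion * (1 - assumed_proportion) / margin_of_error**2
required_size = ceil(raw_size)    # round up to a whole number of voters

print(round(raw_size, 1), required_size)
```

The formula gives a value just over 600, which is consistent with the sample size found by trial and error.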

28 Comparing two proportions (hypothesis test)

'From CliffsQuickReview Statistics, Example 16, Page 92:

'A swimming school wants to determine whether a recently

'hired instructor is working out. Sixteen out of 25 of

'Instructor A's students passed the lifeguard certification

'test on the first try. In comparison, 57 out of 72 of more

'experienced Instructor B's students passed the test on the

'first try. Is Instructor A's success rate worse than

'Instructor B's? Use alpha = 0.10.

'Null hypothesis: A's rate is >= B's rate

'Alternate hypothesis: A's rate is < B's rate

'This is a one-tailed test.

COPY 1000 rptCount

COPY 0.10 significanceLevel

COPY 16#1 9#0 studentsOfA '1=passed, 0=failed

COPY 57#1 15#0 studentsOfB

REPEAT rptCount

SAMPLE 25 studentsOfA sampleA

SAMPLE 72 studentsOfB sampleB

COUNT sampleA =1 passedA

COUNT sampleB =1 passedB

DIVIDE passedA 25 passedARate

DIVIDE passedB 72 passedBRate

IF passedARate >= passedBRate

SCORE 1 successes

END

END

COUNT successes =1 successesA

DIVIDE successesA rptCount probability

PRINT probability

OUTPUT "Conclusion: null hypothesis is "

IF probability < significanceLevel

OUTPUT "NOT "

END

OUTPUT "accepted at a %10.4F significance level." significanceLevel

29 Comparing two proportions (confidence interval)

'From CliffsQuickReview Statistics, Example 17, Page 93:

'A public health researcher wants to know how two high

'schools, one in the inner city and one in the suburbs,

'differ in the percentage of students who smoke. A

'random survey of students gives the following results:

'Population sampleSize Smokers

'inner-city 125 47

'suburban 153 52

'What is a 90 percent confidence interval for the

'difference between the two schools?

COPY 1000 rptCount

COPY 47#1 78#0 innerCity

COPY 52#1 101#0 suburban

REPEAT rptCount

SAMPLE 125 innerCity innerCitySample

SAMPLE 153 suburban suburbanSample

COUNT innerCitySample =1 innerCitySmokers

COUNT suburbanSample =1 suburbanSmokers

DIVIDE innerCitySmokers 125 innerCityPercentage

DIVIDE suburbanSmokers 153 suburbanPercentage

SUBTRACT innerCityPercentage suburbanPercentage difference

SCORE difference differences

END

PERCENTILE differences (5 95) confidenceInterval

PRINT confidenceInterval

30 Correlation Coefficient

'From CliffsQuickReview Statistics, Example 1, Page 99:

'Compute the correlation coefficient for the relationship

'between months of exercise-machine ownership and hours

'of exercise per week. (The data is given in the program

'below.)

'NOTE: this is not a resampling or Monte Carlo simulation.

'It is simply a use of the Statistics101 built-in CORR

'command, which computes the Pearson's product moment

'correlation coefficient.

DATA (5 10 4 8 2 7 9 6 1 12) monthsOwned

DATA (5 2 8 3 8 5 5 7 10 3) hoursExercised

CORR monthsOwned hoursExercised correlationCoefficient

PRINT correlationCoefficient
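
To show what the CORR command is computing, here is Pearson's correlation coefficient written out from its definition. This is a Python sketch for illustration only (not Statistics101 code); the function name pearson_r is hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross products and squared deviations about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

months_owned = [5, 10, 4, 8, 2, 7, 9, 6, 1, 12]
hours_exercised = [5, 2, 8, 3, 8, 5, 5, 7, 10, 3]

r = pearson_r(months_owned, hours_exercised)
print(round(r, 3))
```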

31 Finding significance of the Correlation Coefficient

'From CliffsQuickReview Statistics, Example 1 partB, Page 100:

'Compute the significance level for the correlation

'coefficient for the relationship between months of exercise-machine

'ownership and hours of exercise per week. (The data is given in

'the program below.)

'The null hypothesis is that the data are not correlated, i.e.,

'that the population correlation coefficient = 0.

'Therefore, we can bootstrap the two data items separately.

'That means we choose pairs of elements independently.

'Then we see how often the original sample's correlation

'coefficient, r (independent of its sign),

'shows up based on the assumption that they are uncorrelated.

DATA (5 10 4 8 2 7 9 6 1 12) monthsOwned

DATA (5 2 8 3 8 5 5 7 10 3) hoursExercised

CORR monthsOwned hoursExercised r

PRINT "Sample correlation coefficient: " r

COPY 10000 rptCount

REPEAT rptCount

SAMPLE 10 monthsOwned monthsOwnedBootstrap

SAMPLE 10 hoursExercised hoursExercisedBootstrap

CORR monthsOwnedBootstrap hoursExercisedBootstrap bootstrapCorrelationCoefficient

SCORE bootstrapCorrelationCoefficient correlationCoefficients

END

HISTOGRAM percent binsize 0.1 correlationCoefficients

'Compute 2-sided probability:

rPlus = ABS(r)

rMinus = -ABS(r)

COUNT correlationCoefficients <= rMinus coeffCountMinus

COUNT correlationCoefficients >= rPlus coeffCountPlus

significanceLevel = (coeffCountMinus + coeffCountPlus) / rptCount

PRINT significanceLevel
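
A related way to test the same null hypothesis is a permutation test, which shuffles one list against the other instead of bootstrapping each list independently. Note that this is a different resampling scheme from the program above, shown only as a point of comparison. The Python sketch below is not Statistics101 code; the fixed seed and the function name pearson_r are assumptions made for illustration.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

months_owned = [5, 10, 4, 8, 2, 7, 9, 6, 1, 12]
hours_exercised = [5, 2, 8, 3, 8, 5, 5, 7, 10, 3]
observed_r = pearson_r(months_owned, hours_exercised)

rng = random.Random(0)          # fixed seed so the run is repeatable
num_trials = 10000
shuffled = hours_exercised[:]   # copy, so the original order is kept
extreme = 0
for _ in range(num_trials):
    rng.shuffle(shuffled)       # break any pairing between the lists
    if abs(pearson_r(months_owned, shuffled)) >= abs(observed_r):
        extreme += 1

# Two-sided p-value: share of shuffles at least as extreme as observed.
p_value = extreme / num_trials
print(round(observed_r, 3), p_value)
```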

32 Confidence interval for the Correlation Coefficient

'This problem is not in the CliffsQuickReview book. I've just

'added it to demonstrate the technique.

'Compute the 95 percent confidence interval for the correlation

'coefficient for the relationship between months of exercise-machine

'ownership and hours of exercise per week. (The data is given in

'the program below.)

'Since the data pairs are correlated, we must sample them

'in pairs, always taking for any random position in one,

'the corresponding element in the other. We do that using

'a "chooser" variable and the TAKE command as you see below.

DATA (5 10 4 8 2 7 9 6 1 12) monthsOwned

DATA (5 2 8 3 8 5 5 7 10 3) hoursExercised

COPY 1000 rptCount

REPEAT rptCount

SAMPLE 10 1,10 chooser

TAKE monthsOwned chooser monthsOwnedBootstrap

TAKE hoursExercised chooser hoursExercisedBootstrap

CORR monthsOwnedBootstrap hoursExercisedBootstrap correlationCoefficient

SCORE correlationCoefficient coefficients

END

percentile coefficients (2.5 97.5) confidenceInterval

PRINT confidenceInterval

'Here is a solution to the problem using the "jackknife" method

'instead of the "bootstrap" used above. Thanks to Gaj Vidmar for

'this solution.

DATA (5 10 4 8 2 7 9 6 1 12) monthsOwned

DATA (5 2 8 3 8 5 5 7 10 3) hoursExercised

COPY 1,10 is

FOREACH i is

WEED is =i j

TAKE monthsOwned j monthsOwnedJackknife

TAKE hoursExercised j hoursExercisedJackknife

CORR monthsOwnedJackknife hoursExercisedJackknife correlationCoefficientJackknife

SCORE correlationCoefficientJackknife coefficientsJackknife

END

PERCENTILE coefficientsJackknife (2.5 97.5) confidenceIntervalJackknife

PRINT confidenceIntervalJackknife

33 Simple Linear Regression

'From CliffsQuickReview Statistics, Page 102:

'Compute the linear regression coefficients for

'the relationship between months of exercise-machine

'ownership and hours of exercise per week. (The data

'is given in the program below.)

'NOTE: this is not a resampling or Monte Carlo simulation.

'It is simply a use of the Statistics101 built-in REGRESS

'command, which computes the linear regression coefficients.

DATA (5 10 4 8 2 7 9 6 1 12) monthsOwned

DATA (5 2 8 3 8 5 5 7 10 3) hoursExercised

REGRESS hoursExercised monthsOwned coefficients

PRINT coefficients

TAKE coefficients 1 slope

TAKE coefficients 2 yIntercept

PRINT slope yIntercept
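
The coefficients can be checked against the ordinary least-squares formulas, slope = Sxy / Sxx and intercept = mean(y) - slope * mean(x). This Python sketch is a cross-check only (not Statistics101 code), and the variable names are illustrative.

```python
months_owned = [5, 10, 4, 8, 2, 7, 9, 6, 1, 12]
hours_exercised = [5, 2, 8, 3, 8, 5, 5, 7, 10, 3]

n = len(months_owned)
mean_x = sum(months_owned) / n
mean_y = sum(hours_exercised) / n

# Sum of cross products and sum of squared x deviations.
sxy = sum((x - mean_x) * (y - mean_y)
          for x, y in zip(months_owned, hours_exercised))
sxx = sum((x - mean_x) ** 2 for x in months_owned)

slope = sxy / sxx
y_intercept = mean_y - slope * mean_x

print(round(slope, 3), round(y_intercept, 3))
```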

34 Confidence interval for the linear regression slope

'From CliffsQuickReview Statistics, Example 2 Page 105:

'Compute the 95% confidence interval for the slope of the

'regression line for the relationship between months of

'exercise-machine ownership and hours of exercise per week.

'(The data is given in the program below.)

'Since the data pairs are correlated, we must sample them

'in pairs: for each random position drawn from one vector,

'we take the corresponding element from the other. We do that

'using a "chooser" variable and the TAKE command, as you see below.

DATA (5 10 4 8 2 7 9 6 1 12) monthsOwned

DATA (5 2 8 3 8 5 5 7 10 3) hoursExercised

COPY 10000 rptCount

REPEAT rptCount

SAMPLE 10 1,10 chooser

TAKE monthsOwned chooser monthsOwnedBootstrap

TAKE hoursExercised chooser hoursExercisedBootstrap

REGRESS hoursExercisedBootstrap monthsOwnedBootstrap linearCoefficients

TAKE linearCoefficients 1 slope

SCORE slope slopes

END

PERCENTILE slopes (2.5 97.5) confidenceInterval

PRINT confidenceInterval

35 Confidence interval for the prediction

'From CliffsQuickReview Statistics, Example 3 Page 107:

'What is a 90% confidence interval for the number of

'hours spent exercising per week if the exercise machine

'is owned 11 months?

'(The data is given in the program below.)

'Since the data pairs are correlated, we must sample them

'in pairs: for each random position drawn from one vector,

'we take the corresponding element from the other. We do that

'using a "chooser" variable and the TAKE command, as you see below.

DATA (5 10 4 8 2 7 9 6 1 12) monthsOwned

DATA (5 2 8 3 8 5 5 7 10 3) hoursExercised

COPY 100000 rptCount

REPEAT rptCount

SAMPLE 10 1,10 chooser

TAKE monthsOwned chooser monthsOwnedBootstrap

TAKE hoursExercised chooser hoursExercisedBootstrap

REGRESS hoursExercisedBootstrap monthsOwnedBootstrap linearCoefficients

TAKE linearCoefficients 1 slope

TAKE linearCoefficients 2 yIntercept

'compute y value for x = 11 months:

MULTIPLY 11 slope term1

ADD term1 yIntercept yValue

SCORE yValue yValues

END

PERCENTILE yValues (5 95) confidenceInterval

PRINT confidenceInterval
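
The paired "chooser" resampling used in the slope and prediction programs can be sketched in Python as a cross-check. This is a rough illustration under my own assumptions, not a translation of Statistics101's internals: fit_line, percentile and paired_bootstrap are hypothetical names, the percentile uses simple linear interpolation (the PERCENTILE command may interpolate differently), the repeat count is reduced to 2000 to keep it quick, and resamples in which every x value is identical are skipped because no line can be fitted to them.

```python
# Hypothetical Python sketch of the paired ("chooser") bootstrap.
import random

months_owned = [5, 10, 4, 8, 2, 7, 9, 6, 1, 12]
hours_exercised = [5, 2, 8, 3, 8, 5, 5, 7, 10, 3]


def fit_line(xs, ys):
    """Return (slope, intercept), or None if every x is the same."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return None
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def percentile(values, pct):
    """Percentile by linear interpolation between sorted values."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * pct / 100
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


def paired_bootstrap(xs, ys, x_new, repeats, seed):
    """Resample (x, y) pairs together; collect slopes and predictions."""
    rng = random.Random(seed)
    n = len(xs)
    slopes, predictions = [], []
    while len(slopes) < repeats:
        # One list of random positions picks from BOTH lists,
        # so each x stays attached to its own y.
        chooser = [rng.randrange(n) for _ in range(n)]
        fit = fit_line([xs[i] for i in chooser], [ys[i] for i in chooser])
        if fit is None:
            continue  # degenerate resample, draw again
        slope, intercept = fit
        slopes.append(slope)
        predictions.append(slope * x_new + intercept)
    return slopes, predictions


slopes, predictions = paired_bootstrap(
    months_owned, hours_exercised, x_new=11, repeats=2000, seed=1
)
slope_ci = (percentile(slopes, 2.5), percentile(slopes, 97.5))
prediction_ci = (percentile(predictions, 5), percentile(predictions, 95))
print("95% interval for the slope:", slope_ci)
print("90% interval for hours at 11 months:", prediction_ci)
```

The seed is fixed only so that repeated runs give the same numbers. The intervals will differ somewhat from the Statistics101 output because the random draws differ.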

36 Chi-square test

'From CliffsQuickReview Statistics, Page 110:

'Suppose 125 children are shown three TV commercials

'A, B, and C, for breakfast cereal and are asked to

'pick which they liked best. The results are:

'          A    B    C   Totals

'Boys     30   29   16     75

'Girls    12   33    5     50

'Totals   42   62   21    125

'Is the choice of favorite commercial related to

'whether the child is a boy or a girl?

'Null hypothesis: the commercial choice is not

'related to the sex of the child. This can be

'tested by asking: if that were true, how often

'(that is, with what probability) would the

'contents of the six inner cells be at least as

'far from their expected values as they

'currently are?

'Compare results for alpha = 0.05 vs. alpha = 0.01.

'Set up vectors to hold the expected values and

'the observed values of the table.

COPY (25.2 37.2 12.6 16.8 24.8 8.4) expectedValues

COPY (30 29 16 12 33 5) observedData

CHISQUARE observedData expectedValues chiSquare

PRINT chiSquare

'Compute and record (SCORE) chi-square values for

'many simulated table cell entries.

COPY 42#1 62#2 21#3 ads

COPY 5000 rptCount

REPEAT rptCount

SHUFFLE ads ads

TAKE ads 1,75 boys

TAKE ads 76,125 girls

COUNT boys =1 boysAdA

COUNT boys =2 boysAdB

COUNT boys =3 boysAdC

COUNT girls =1 girlsAdA

COUNT girls =2 girlsAdB

COUNT girls =3 girlsAdC

'Rebuild a new table with the simulated data

COPY boysAdA boysAdB boysAdC girlsAdA girlsAdB girlsAdC observedData$

CHISQUARE observedData$ expectedValues chiSquare$

SCORE chiSquare$ chiSquareScores

END

COUNT chiSquareScores >= chiSquare chiCount

DIVIDE chiCount rptCount significanceLevel

PRINT significanceLevel

OUTPUT "Conclusions:\n"

OUTPUT "The null hypothesis is "

IF significanceLevel >= 0.05

OUTPUT "NOT "

END

OUTPUT "rejected at the 0.05 significance level.\n"

OUTPUT "The null hypothesis is "

IF significanceLevel >= 0.01

OUTPUT "NOT "

END

OUTPUT "rejected at the 0.01 significance level.\n"
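
The expected values typed into the program above come from the table margins: each cell's expected count is (row total x column total) / grand total, for example 75 x 42 / 125 = 25.2 for boys who prefer commercial A. The Python sketch below recomputes them and repeats the shuffle test as an outside cross-check. The function names are hypothetical, and the result will not match the Statistics101 run exactly because the random shuffles differ.

```python
# Hypothetical Python cross-check of the chi-square shuffle test.
import random

# Rows: boys, girls. Columns: commercials A, B, C.
observed = [[30, 29, 16], [12, 33, 5]]


def expected_counts(table):
    """Expected cell counts if row and column were unrelated."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square(table, expected):
    """Sum of (observed - expected)^2 / expected over all cells."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


def shuffle_p_value(table, repeats, seed):
    """Share of shuffled tables with chi-square at least the observed one."""
    rng = random.Random(seed)
    expected = expected_counts(table)
    actual = chi_square(table, expected)
    col_totals = [sum(col) for col in zip(*table)]
    n_boys = sum(table[0])
    # One entry per child, labeled by the commercial chosen (0, 1 or 2).
    ads = [ad for ad, count in enumerate(col_totals) for _ in range(count)]
    hits = 0
    for _ in range(repeats):
        rng.shuffle(ads)
        boys, girls = ads[:n_boys], ads[n_boys:]
        simulated = [
            [group.count(ad) for ad in range(len(col_totals))]
            for group in (boys, girls)
        ]
        if chi_square(simulated, expected) >= actual:
            hits += 1
    return hits / repeats


expected = expected_counts(observed)
chi_sq = chi_square(observed, expected)
p_value = shuffle_p_value(observed, repeats=5000, seed=1)
print("expected counts:", expected)
print("chi-square:", round(chi_sq, 3))
print("estimated p-value:", p_value)
```

Shuffling the 125 choices and dealing the first 75 to the boys keeps every row and column total fixed, which is exactly what the null hypothesis of "no relationship" assumes.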

Statistics101 Tutorials

Tutorials (700kb) Shows the main features of the Statistics101 program.

Command Summary Gives a one-line description of each of the Resampling Stats commands.

Intro to Statistics101 programming (PDF or Kindle) This doc gets you started with Statistics101 and the Resampling Stats language.

User's Guide to Statistics101 (HTML) Describes the Statistics101 program and all the Resampling Stats commands in full detail. (Also included with the program download.)

Resampling: The New Statistics Julian Simon's online textbook with full explanation and examples of applied resampling.

More Examples: Originally from Peter Bruce's website showing a variety of statistical problems solved using Resampling Stats. These will all work in the Statistics101 program. Used with permission.

Contact me if you have any questions or comments about Statistics101, or if you find a bug.