Copyright © October 2004 Ted Nissen
2 Review and More Introduction
3 Central Values and the Organization of Data
5 Correlation and Regression
6 SCORE TRANSFORMATIONS
7 LINEAR TRANSFORMATIONS
8 Theoretical Distributions
Including the Normal Distribution
9 Samples and Sampling Distributions
10 Differences between Means
11 The t Distribution and the
12 Analysis of Variance: One-Way
13 Analysis of Variance: Factorial
14 The Chi Square Distribution
15 Nonparametric Statistics
16 Vista Formulas and Analysis
1.2 Statistics Definition
1.2.1 Algebra and Statistics
Algebra is a generalization of arithmetic in which letters representing numbers are
combined according to the rules of arithmetic.
product of an algebraic expression, which combines several scores, is a
1.2.2 Descriptive Statistic
1.2.3 Inferential Statistics
1.3 Purpose of Statistics
1.4.2 Populations, Samples, and Subsamples
A population consists of all members of some specified group. Actually, in
statistics, a population consists of the measurements on the members and not
the members themselves. A sample is a subset of a population. A subsample is a
subset of a sample. A population is arbitrarily defined by the investigator and
includes all relevant cases.
Investigators are always interested in some population. Populations are often so large that
not all the members can be measured. The investigator must often resort to
measuring a sample that is small enough to be manageable but still
representative of the population.
Samples are often divided into subsamples and relationships among the subsamples
determined. The investigator would then look for similarities or differences
among the subsamples.
Resorting to the use of samples and subsamples introduces some uncertainty into the
conclusions because different samples from the same population nearly always
differ from one another in some respects. Inferential statistics are used to
determine whether or not such differences should be attributed to chance.
1.4.3 Parameters and Statistics
A parameter is some numerical characteristic of a population. A statistic is some
numerical characteristic of a sample or subsample. A parameter is constant; it
does not change unless the population itself changes. There is only one number
that is the mean of the population; however, it often cannot be computed,
because the population is too large to be measured. Statistics are used as
estimates of parameters, although, as we suggested above, a statistic tends to
differ from one sample to another. If you have five samples from the same
population, you will probably have five different sample means. Remember that
parameters are constant; statistics are variable.
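The contrast between a constant parameter and a variable statistic can be sketched in a few lines of Python. This is only an illustration: the population of 100 scores, the sample size, and the helper name `sample_means` are assumptions made for the example, not data from the text.

```python
import random
import statistics

def sample_means(population, sample_size, n_samples, seed=0):
    """Draw n_samples random samples and return the mean of each one."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(n_samples)
    ]

# Hypothetical population: the scores 1 through 100.
population = list(range(1, 101))

# The parameter: one fixed number for this population.
mu = statistics.mean(population)

# The statistics: one mean per sample, which tends to differ between samples.
means = sample_means(population, sample_size=10, n_samples=5)

print("population mean:", mu)
print("sample means:", means)
```

Each sample mean is an estimate of the population mean; running the sketch with a different seed gives a different set of estimates, while the population mean stays the same.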
A variable is something that exists in more than one amount or in more than one
form. Memory is a variable. The Wechsler Memory Scale is used to measure
people’s memory ability, and variation is found among the memory scores of any
group of people. The essence of measurement is the assignment of numbers on the
basis of variation.
Many variables can be classified as quantitative variables. When a quantitative
variable is measured, the scores tell you something about the amount or degree
of the variable. At the very least, a larger score indicates more of the
variable than a smaller score does.
Each score has a range defined by a lower limit and an upper limit.
For example, a score of 103 covers the range 102.5-103.5; the numbers 102.5 and 103.5 are called
the lower limit and the upper limit of the score. The idea is that a score can
take any fractional value between 102.5 and 103.5, but all scores in that range
are rounded off to 103.
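As a small sketch of the idea, the limits of a score can be computed by going half a measurement unit below and above it. The function name `score_limits` and the `unit` parameter are hypothetical, introduced only for this example.

```python
def score_limits(score, unit=1.0):
    """Return (lower limit, upper limit) for a score measured to the nearest unit."""
    half = unit / 2
    return score - half, score + half

lower, upper = score_limits(103)
print(lower, upper)  # 102.5 103.5
```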
Other variables are qualitative variables. With such variables, the scores (numbers)
are simply used as names; they do not have quantitative meaning. For example,
political affiliation is a qualitative variable.
1.5 Scales of Measurement
Numbers mean different things in different situations. Numbers are assigned to objects
according to rules. You need to distinguish clearly between the thing you are
interested in and the number that symbolizes or stands for the thing. For
example, you have had lots of experience with the numbers 2 and 4. You can
state immediately that 4 is twice as much as 2. That statement is correct if
you are dealing with numbers themselves, but it may or may not be true when
those numbers are symbols for things. The statement is true if the numbers
refer to apples; four apples are twice as many as two apples. The statement is
not true if the numbers refer to the order that runners finish in a race.
Fourth place is not twice anything in relation to second place-not twice as
slow or twice as far behind the first-place runner. The point is that the
numbers 2 and 4 are used to refer to both apples and finish places in a race,
but the numbers mean different things in those two situations.
S. S. Stevens (1946) identified
four different measurement scales that help distinguish different kinds of
situations in which numbers are assigned to objects. The four scales are
nominal, ordinal, interval, and ratio.
1.5.2 Nominal Scale
In the nominal scale, numbers are used simply as names and have no real quantitative value. It is the scale
used for qualitative variables. Numerals on sports uniforms are an example;
here, 45 is different from 32, but that is about all we can say. The person
represented by 45 is not “more than” the person represented by 32, and
certainly it would be meaningless to try to add 45 and 32. Designating
different colors, different sexes, or different political parties by numbers
will produce nominal scales. With a nominal scale, you can even reassign the
numbers and still maintain the original meaning, which is only that the
numbered things differ. All things that are alike must have the same number.
1.5.3 Ordinal Scale
An ordinal scale has the characteristic of the nominal scale (different numbers
mean different things) plus the characteristic of indicating “greater than” or
“less than”. In the ordinal scale, the object with the number 3 has less or
more of something than the object with the number 5. Finish places in a race
are an example of an ordinal scale. The runners finish in rank order, with “1”
assigned to the winner, “2” to the runner-up, and so on. Here, 1 means less
time than 2. Other examples of ordinal scales are house number, Government
Service ranks like GS-5 and GS-7, and statements like “She is a better
mathematician than he is.”
1.5.4 Interval Scale
The interval scale has properties of both the ordinal and nominal scales, plus the
additional property that intervals between the numbers are equal. “Equal
interval” means that the distance between the things represented by “2” and “3” is the same as the distance
between the things represented by “3” and “4”. The centigrade thermometer is
based on an interval scale. The difference in temperature between 10° and 20°
is the same as the difference between 40° and 50°. The centigrade thermometer,
like all interval scales, has an arbitrary zero point. On the centigrade, this
zero point is the freezing point of water at sea level. Zero degrees on this
scale does not mean the complete absence of heat; it is simply a convenient
starting point. With interval data, we have one restriction; we may not make
simple ratio statements. We may not say that 100° is twice as hot as 50° or
that a person with an IQ of 60 is half as intelligent as a person with an IQ of 120.
1.5.5 Ratio Scale
The fourth kind of scale, the ratio scale, has all the characteristics of the
nominal, ordinal, and interval scales, plus one: it has a true zero point, which
indicates a complete absence of the thing measured. On a ratio scale, zero
means “none”. Height, weight, and time are measured with ratio scales. Zero
height, zero weight, and zero time mean that no amount of these variables is
present. With a true zero point, you can make ratio statements like “16
kilograms is four times as heavy as 4 kilograms.”
Although we have illustrated with examples the distinctions among these four scales, it is
sometimes difficult to classify the variables used in the social and behavioural
sciences. Very often they appear to fall between the ordinal and interval
scales. It may happen that a score provides more information than simply rank,
but equal intervals cannot be proved. Intelligence test scores are an example.
In such cases, researchers generally treat the data as if they were based on an
interval scale.
The main reason why this section on scales of measurement is important is that the
kind of descriptive statistics you can compute on your numbers depends to some
extent upon the kind of scale of measurement the numbers represent. For
example, it is not meaningful to compute a mean on nominal data such as the
numbers on football players’ jerseys. If the quarterback’s number is 12 and a
running back’s number is 23, the mean of the two numbers (17.5) has no meaning.
1.6 Statistics and Experimental Design
Statistics involves the manipulation of numbers and the conclusions based on those
manipulations. Experimental design deals with how to get the numbers in the first
place.
1.6.2 Independent and Dependent Variables
In the design of a typical simple experiment, the experimenter is interested in
the effect that one variable (called the independent variable) has on some
other variable (called the dependent variable). Much research is designed to
discover cause-and-effect relationships. In such research, differences in the
independent variable are the presumed cause for differences in the dependent
variable. The experimenter chooses values for the independent variable, administers
a different value of the independent variable to each group of subjects, and
then measures the dependent variable for each subject. If the scores on the
dependent variable differ as a result of differences in the independent
variable, the experimenter may be able to conclude that there is a
cause-and-effect relationship.
1.6.3 Extraneous (Confounding) Variables
One of the problems with drawing cause-and-effect conclusions is that you must be
sure that changes in the scores on the dependent variable are the result of
changes in the independent variable and not the result of changes in some other
variables. Variables other than the independent variable that can cause changes
in the dependent variable are called extraneous variables.
It is important, then, that experimenters be aware of and control extraneous
variables that might influence their results. The simplest way to control an
extraneous variable is to be sure all subjects are equal on that variable.
Independent variables are often referred to as treatments because the experimenter
frequently asks “If I treat this group of subjects this way and treat another
group another way, will there be a difference in their behaviour?” The ways
that the subjects are treated constitute the levels of the independent variable
being studied, and experiments typically have two or more levels.
1.7 Brief History of Statistics
2.1 Review of Fundamentals
2.1.1 This section is designed to
provide you with a quick review of the rules of arithmetic and simple algebra.
We recommend that you work the problems as you come to them, keeping the
answers covered while you work. We assume that you once knew all these rules
and procedures but that you need to refresh your memory. Thus, we do not
include much explanation. For a textbook that does include basic explanations,
see Helen M. Walker.
The answer to an addition problem is called a sum. In Chapter
12, you will calculate a sum
of squares, a quantity that is obtained by adding together some squared numbers.
The answer to a subtraction problem is called a difference.
Much of what you will learn in statistics deals with differences and the extent
to which they are significant. In Chapter 10, you will encounter a
statistic called the standard error of a
difference. Obviously, this statistic involves subtraction.
The answer to a multiplication problem is called a product. Chapter
7 is about the product-moment
correlation coefficient, which requires multiplication. Multiplication
problems are indicated either by an x or by parentheses. Thus, 6 x 4 and (6)(4)
call for the same operation.
The answer to a division problem is called a quotient. The IQ or intelligence quotient is based on the
division of two numbers. The two ways to indicate a division problem are ÷ and the fraction bar. Thus, 9 ÷ 4 and 9/4 call for
the same operation. It is a good idea to
think of any common fraction as a division problem. The numerator is to be
divided by the denominator.
Addition and Subtraction of Decimals
There is only one rule about the addition and subtraction of
numbers that have decimals: keep the
decimal points in a vertical line. The decimal point in the answer goes
directly below those in the problem. This rule is illustrated in the five
Example #1
Multiplication of Decimals
The basic rule for multiplying decimals is that the number of
decimal places in the answer is found by adding up the number of decimal places
in the two numbers that are being multiplied. To place the decimal point in the
product, count from the right.
Example #2
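The decimal-place rule for multiplication can be checked with Python's `decimal` module, which keeps trailing zeros and so mirrors paper-and-pencil work. This is a sketch only; the helper names `decimal_places` and `multiply_decimals` are assumptions made for illustration.

```python
from decimal import Decimal

def decimal_places(text):
    """Count the digits to the right of the decimal point in a numeral such as '4.12'."""
    return max(0, -Decimal(text).as_tuple().exponent)

def multiply_decimals(a, b):
    """Return the product and the number of decimal places the rule predicts for it."""
    product = Decimal(a) * Decimal(b)
    predicted_places = decimal_places(a) + decimal_places(b)
    return product, predicted_places

product, places = multiply_decimals("1.3", "4.12")
print(product, places)  # 5.356 has 1 + 2 = 3 decimal places

product, places = multiply_decimals("0.47", "0.20")
print(product, places)  # 0.0940 has 2 + 2 = 4 decimal places
```

Numbers are passed as strings so that a trailing zero, as in .20, still counts as a decimal place.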
Division of Decimals
Two methods have been used to teach division of decimals. The
older method required the student to move the decimal in the divisor (the
number you are dividing by) enough places to the right to make the divisor a
whole number. The decimal in the dividend was then moved to the right the same
number of places, and division was carried out in the usual way. The new
decimal places were identified with carets, and the decimal place in the
quotient was just above the caret in the dividend. For example,
Example #3a and #3b
The newer method of teaching the division of
decimals is to multiply both the divisor and the dividend by the number that
will make both of them whole numbers. (Actually, this is the way the caret
method works also.) For example:
Example #4
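The newer method, multiplying divisor and dividend by the same power of ten until both are whole numbers, can be sketched as follows. The function name `divide_decimals` and the sample numbers are assumptions chosen for illustration.

```python
from decimal import Decimal

def divide_decimals(dividend, divisor):
    """Scale both numbers by the same power of ten so both are whole, then divide."""
    a, b = Decimal(dividend), Decimal(divisor)
    # Number of decimal places needed to clear the decimals from both numbers.
    places = max(-a.as_tuple().exponent, -b.as_tuple().exponent, 0)
    scale = 10 ** places
    whole_dividend = int(a * scale)
    whole_divisor = int(b * scale)
    return whole_dividend, whole_divisor, whole_dividend / whole_divisor

# 1.5 divided by 0.25 becomes 150 divided by 25.
print(divide_decimals("1.5", "0.25"))  # (150, 25, 6.0)
```

Because both numbers are multiplied by the same amount, the quotient is unchanged.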
Both of these methods work. Use the one you are more familiar with.
In general, there are two ways to deal with fractions:
Convert the fraction to a decimal and
perform the operations on the decimals.
Work directly with
the fractions, using a set of rules for each operation. The rule for addition
and subtraction is: convert the fractions to ones with common denominators, add
or subtract the numerators, and place the result over the common denominator.
The rule for multiplication is: multiply the numerators together to get the
numerator of the answer, and multiply the denominators together for the
denominator of the answer. The rule for division is: invert the divisor and multiply.
For statistics problems, it is usually easier to convert the
fractions to decimals and then work with the decimals. Therefore, this is the
method that we will illustrate. However, if you are a whiz at working directly
with fractions, by all means continue with your method. To convert a fraction
to a decimal, divide the lower number into the upper one. Thus, 3/4 = .75, and
13/17 = .765
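Both approaches to fractions can be tried in Python. The sketch below converts fractions to decimals with ordinary division and, for comparison, applies the direct rules through the standard `fractions` module. The helper name `to_decimal` and the fractions used in the second half are assumptions for illustration.

```python
from fractions import Fraction

def to_decimal(numerator, denominator, places=3):
    """Divide the lower number into the upper one and round the result."""
    return round(numerator / denominator, places)

print(to_decimal(3, 4))    # 0.75
print(to_decimal(13, 17))  # 0.765

# Working directly with fractions (hypothetical examples):
print(Fraction(1, 2) + Fraction(1, 3))  # common denominator 6, giving 5/6
print(Fraction(2, 3) * Fraction(4, 5))  # numerators and denominators multiplied: 8/15
print(Fraction(2, 3) / Fraction(4, 5))  # invert the divisor and multiply: 5/6
```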
Examples: Fractions
2.1.5 Negative Numbers
Addition of Negative Numbers
Any number without a sign is understood to be positive.
To add a series of negative numbers, add
the numbers in the usual way, and attach a negative sign to the total.
Example #1
To add two numbers, one positive and one negative,
subtract the smaller number from the larger and attach the sign of the larger
to the result.
Example #2
To add a series of numbers, of which some
are positive and some negative, add all the positive numbers together, all the
negative numbers together (see above), and then combine the two sums (see above).
Example #3
Subtraction of Negative Numbers
To subtract a negative number, change it to
positive and add it. Thus:
Example #4
Multiplication of Negative Numbers
When the two numbers to be multiplied are
both negative, the product is positive.
(-3)(-3) = 9    (-6)(-8) = 48
When one of the numbers is negative and the
other is positive, the product is negative.
(-8)(3) = -24    14 x (-2) = -28
Division of Negative Numbers
The rule in division is the same as the
rule in multiplication. If the two numbers are both negative, the quotient is
positive.
(-10) ÷ (-2) = 5    (-4) ÷ (-20) = .20
If one number is negative and the other
positive, the quotient is negative.
(-10) ÷ 2 = -5    6 ÷ (-18) = -.33
14 ÷ (-7) = -2    (-12) ÷ 3 = -4
2.1.6 Proportions and Percents
A proportion is a part of a whole and
can be expressed as a fraction or as a decimal. Usually, proportions are
expressed as decimals. If eight students in a class of 44 received A's, we may
express 8 as a proportion of the whole (44). Thus, 8/44, or .18. The proportion
that received A's is .18.
To convert a proportion to a percent
(per one hundred), multiply by 100. Thus: .18 x 100 = 18; 18 percent of the
students received A's. As you can see, proportions and percents are two ways to
express the same idea.
If you know a proportion (or percent) and the size of the original whole, you can
find the number that the proportion represents. If .28 of the students were
absent due to illness,
and there are 50 students in all, then .28 of the 50 were absent: (.28)(50) = 14 students who were absent.
Here are some examples of proportions and percents.
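The conversions described in this section can be sketched in Python using the numbers from the text. The function names are assumptions made for illustration; results are rounded because floating-point arithmetic can leave tiny errors (for instance, `0.28 * 50` is not exactly 14 in binary floating point).

```python
def proportion(part, whole, places=2):
    """Express a part as a proportion of the whole, rounded to two decimal places."""
    return round(part / whole, places)

def percent(prop):
    """Convert a proportion to a percent (per one hundred)."""
    return round(prop * 100, 2)

def count_from_proportion(prop, whole):
    """Recover the whole-number count that a proportion of a whole represents."""
    return round(prop * whole)

p = proportion(8, 44)                   # 8 of 44 students received A's
print(p)                                # 0.18
print(percent(p))                       # 18.0
print(count_from_proportion(0.28, 50))  # 14 students absent
```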
2.1.7 Absolute Value
The absolute value of a number ignores the sign of the number. Thus, the absolute value of -6 is 6. This is expressed
with symbols as |-6| = 6. It is expressed
verbally as "the absolute value of negative six is
six." In a similar way, the absolute value of 4 - 7 is 3. That is, |4 - 7| =
|-3| = 3.
A ± sign ("plus or minus"
sign) means to both add and subtract. A ± problem always has two answers.
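Absolute value and the plus-or-minus operation are easy to try in Python. `abs` is a built-in function; `plus_minus` is a hypothetical helper written for this sketch, and the numbers passed to it are arbitrary.

```python
def plus_minus(center, offset):
    """Return the two answers of center ± offset as (center + offset, center - offset)."""
    return center + offset, center - offset

print(abs(-6))            # 6
print(abs(4 - 7))         # 3
print(plus_minus(10, 2))  # (12, 8)
```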
In the expression 5², 2 is
the exponent. The 2 means that 5 is to be multiplied by itself. Thus, 5²
= 5 x 5 = 25.
In elementary statistics, the only exponent used is 2, but it will be used
frequently. When a number has an exponent of 2, the number is said to be
squared. The expression 4² (pronounced "four squared")
means 4 x 4, and the product is 16. The squares of whole numbers between 1 and
1000 can be found in tables in the appendix of most statistics textbooks.
Two rules will suffice for the kinds of complex expressions encountered in
statistics.
Perform the operations within the parentheses first. If there are brackets in the
expression, perform the operations within the parentheses and then the
operations within the brackets.
Perform the operations in the numerator and those in the denominator separately, and
finally, carry out the division.
To solve a simple algebra problem, isolate the unknown (x) on one side of the
equal sign and combine the numbers on the other side. To do this, remember that
you can multiply or divide both sides of the equation by the same number without
affecting the value of the unknown. For example,
Example #6a and #6b
In a similar way, the same number can be added to or subtracted from both sides of
the equation without affecting the value of the unknown.
We will combine some of these steps in the problems we will work for you. Be sure
you see what operation is being performed on both sides in each step.
2.2 Rules, Symbols, and Shortcuts
2.2.1 Rounding Numbers
There are two parts to the rule for rounding a number. If the digit that is to be
dropped is less than 5, simply drop it. If the digit to be dropped is 5 or
greater, increase the number to the left of it by one. These are the rules
built into most electronic calculators. These two rules are illustrated below.
Example #9a and #9b
A reasonable question is "How
many decimal places should an answer in statistics have?" A good rule of
thumb in statistics is to carry all operations to three decimal places and
then, for the final answer, round back to two decimal places.
Sometimes this rule of thumb could get
you into trouble, though. For example, if halfway through some work you had a
division problem of .0016 ÷ .0074, and if you
dutifully rounded those four decimals to three (.002 ÷ .007), you would get
an answer of .2857, which becomes .29. However, division without rounding gives
you an answer of .2162 or .22. The difference between .22 and .29 may be quite
substantial. We will often give you cues if more than two decimal places are
necessary but you will always need to be alert to the problems of rounding.
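The rounding rule and the danger of rounding too early can both be reproduced in Python. One caution: Python's built-in `round` rounds exact halves to the nearest even digit, so this sketch uses the `decimal` module with `ROUND_HALF_UP` to match the "5 or greater rounds up" rule in the text. The helper name `round_half_up` is an assumption made for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP

def round_half_up(value, places):
    """Round so that a dropped digit of 5 or greater increases the digit to its left."""
    quantum = Decimal(10) ** -places  # e.g. places=2 gives 0.01
    return Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_UP)

# Division without early rounding, final answer to two places.
exact = round_half_up(Decimal("0.0016") / Decimal("0.0074"), 2)

# Division after rounding each number to three places first.
early = round_half_up(
    round_half_up("0.0016", 3) / round_half_up("0.0074", 3), 2
)

print(exact)  # 0.22
print(early)  # 0.29

# The built-in round() treats exact halves differently from the rule in the text.
print(round(2.5), round_half_up("2.5", 0))  # 2 3
```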
2.2.2 Square Roots
Statistics problems often require that
a square root be found. Three possible solutions to this problem are
A calculator with a square-root key
The paper-and-pencil method
A table in the back of a statistics book
Of the three, a calculator provides the quickest and simplest way to find a square
root. If you have a calculator, you're set. The paper-and-pencil method is
tedious and error prone, so we will not discuss it. We'll describe
the use of tables, and we recommend that you use them if you don't have access to
a calculator.
If you need the square root of a three-digit number (000 to
999), a table will give it to you directly. Simply look in the left-hand column
for the number and read the square root in the third column, under √N. For example,
the square root of 225 is 15.00, and √70 = 8.37. Square roots
are usually carried (or rounded) to two decimal places.
Numbers between 0 and 10
For numbers between 0 and 10 that have two decimal places
(.01 to 9.99), the tables will give you the square root. Find your number in the left-hand column by
thinking of its decimal point as two places to the right. Find the square root
in the √N column by moving the
decimal point one place to the left. For example, √2.25 = 1.50. Be sure you understand how these square roots were
found: = 2.52, = .66, and = .28.
Numbers between 10 and 1000 That Have Decimals
For numbers between 10 and 1000 with decimals, interpolation is necessary. To
interpolate a value for √22.5, find a value that is halfway (.5 of the distance) between √22 and √23. Thus, the square root of 22.5 will be
(approximately) halfway between 4.69 and 4.80, which is 4.74. For a second
example, we will find √84.35. √84 = 9.17, and √85 = 9.22. √84.35 will be .35 into the interval between √84 and
√85. That interval is .05 (9.22 - 9.17). Thus, (.35)(.05) = .02, and √84.35 = 9.17 + .02 = 9.19. Interpolation is also necessary with numbers between 100 and 1000 that have decimals;
these can usually be estimated rather quickly because the difference between
the square roots of the whole numbers is so small. Look at the difference
between and , for example.
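The interpolation procedure can be sketched in Python. Here the table lookup is simulated by taking the square roots of the two neighboring whole numbers and rounding them to two decimal places; the function name `interpolate_sqrt` is an assumption made for illustration.

```python
import math

def interpolate_sqrt(x):
    """Estimate sqrt(x) by linear interpolation between neighboring whole numbers,
    using roots rounded to two places as a printed table would list them."""
    lower = math.floor(x)
    root_low = round(math.sqrt(lower), 2)       # table value for the lower whole number
    root_high = round(math.sqrt(lower + 1), 2)  # table value for the next whole number
    fraction = x - lower                        # how far x lies into the interval
    return round(root_low + fraction * (root_high - root_low), 2)

print(interpolate_sqrt(84.35))         # 9.19, as in the worked example
print(round(math.sqrt(84.35), 2))      # direct calculation, for comparison
print(interpolate_sqrt(22.5))          # close to the 4.74 found above
```

The interpolated value is an approximation built from rounded table entries, so it can differ from the directly calculated root in the last decimal place.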
Numbers Larger Than 1000
For numbers larger than 1000, the square root can be estimated fairly closely by using the
second column in Table A (N²). Find the large number under N²,
and read the square root from the N column. For example, √15,129 = 123, and √1156 = 34. Most large
numbers you encounter will not be found in the N² column, and you
will just have to estimate the square root as closely as possible.
This section is about a professional
shortcut. This shortcut is efficient if multiplication is easier for you than
division. If you prefer to divide rather than multiply, skip this section.
The reciprocal of a number (N) is 1/N. Multiplying a number by 1/N
is equivalent to dividing it by N. For example, 8 ÷ 2 = 8 x (1/2) = 8 x .5 = 4.0; 25 ÷ 5 = 25 x (1/5) = 25 x .20 = 5.0. These examples are
easy, but we can also illustrate with more difficult
problems: 541 ÷ 98 =
541 x (1/98) = 541 x
.0102 = 5.52. So far, this should be clear, but there
should be one nagging question. How did we know that 1/98 = .0102? The answer
is the versatile Table A. Table A contains a column 1/N, and, by looking
up 98, you will find that 1/98 = .0102.
If you must do many division problems on paper, we recommend reciprocals to you.
If you have access to an electronic calculator, on the other hand, you won't
need the reciprocals in Table A.
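The reciprocal shortcut can be sketched in Python. The reciprocal is rounded to four decimal places to imitate a printed 1/N column; that precision and the function names are assumptions made for illustration.

```python
def reciprocal(n, places=4):
    """Return 1/n rounded as a printed table of reciprocals might list it."""
    return round(1 / n, places)

def divide_by_reciprocal(number, divisor):
    """Divide by multiplying by the tabled reciprocal, rounding the answer to two places."""
    return round(number * reciprocal(divisor), 2)

print(reciprocal(98))                 # 0.0102
print(divide_by_reciprocal(541, 98))  # 5.52
print(divide_by_reciprocal(8, 2))     # 4.0
print(divide_by_reciprocal(25, 5))    # 5.0
```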
2.2.4 Estimating Answers
Looking at a problem and making an estimate of the answer before you do any
calculating is a very good idea. This is referred to as eyeballing the data, and
Edward Minium (1978) has captured its importance with Minium's First Law of
Statistics: "The eyeball is the statistician's most powerful instrument."
Estimating answers should keep you
from making gross errors, such as misplacing a decimal point. For example,
31.5/5 can be estimated as a little more than 6. If
you make this estimate before you divide, you are likely to recognize that an
answer of 63 or .63 is incorrect.
An estimated answer to the problem (21)(108) is 2000, since (20)(100) = 2000.
The problem (.47)(.20) suggests an estimated answer of .10, since (1/2)(.20) = .10.
With .10 in mind, you are not likely to write .94 for the answer, which is .094.
Estimating answers is also important if you are finding a square root. You can
estimate that is about
10, since = 10; is about 1.
Before you calculate a mean, eyeball the numbers and estimate the mean. If you estimate a
mean of 30 for a group of numbers that are primarily in the 20s, 30s, and 40s,
a calculated mean of 60 should arouse your suspicion that you have made an
error.
2.2.5 Statistical Symbols
Although, as far as we know,
there has never been a clinical case of neoiconophobia (an extreme and unreasonable fear of
new symbols), some
students show a mild form of this behavior. Symbols like , ,
and may cause a
grimace, a frown, or a droopy eyelid. In more severe cases, the behavior
involves avoiding a statistics course entirely. We're rather sure that you
don't have such a severe case, since you have read this far. Even so, if you
are a typical beginning student in statistics, symbols like (, ,
and are not very meaningful to you, and they may
even elicit feelings of uneasiness. We also know from our teaching experience
that, by the end of the course, you will know what these symbols mean and be
able to approach them with an unruffled psyche-and perhaps even approach them
joyously. This section should help you over that initial, mild neoiconophobia,
if you suffer from it at all.
Below are definitions and
pronunciations of the symbols used in the next two chapters. Additional symbols will be defined
as they occur. Study this list until you know it.
Pay careful attention to symbols. They
serve as shorthand notations for the ideas and concepts you are learning. So,
each time a new symbol is introduced, concentrate on it-learn it-memorize its
definition and pronunciation. The more meaning a symbol has for you, the
better you understand the concepts it represents and, of course, the easier the
course will be.
we will need to distinguish between two different ('s or two X's. We will use subscripts, and the
results will look like 1 and 2, or X1 and X2. Later, we will use subscripts other than
numbers to identify a symbol. You will see x and erg. The point to learn here is
that subscripts are for identification purposes only; they never indicate
not mean ()().
additional comments-to encourage and to caution you. We encourage you to do
more in this course than just read the text, work the problems, and pass the
tests, however exciting that may be. We encourage you to occasionally get
beyond this elementary text and read journal articles or short portions of
other statistics textbooks. We will indicate our recommendations with footnotes
at appropriate places. The word of caution that goes with this encouragement is
that reading statistics texts is like reading a Russian novel-the same
characters have different names in different places. For example, the mean of a
sample in some texts is symbolized M rather than , and, in some texts, S.D., and are used as
symbols for the standard deviation. If you expect such differences, it will be
less difficult for you to make the necessary translations.
3.1.1 A typical or representative
score from the sample population is a measure of central tendency.
3.1.2 Mode (Mo)
The mode is the most frequently occurring score in the distribution.
Extreme scores in the distribution do not affect the mode.
3.1.3 Median (Md)
The median is the middle score, which cuts the distribution of scores in half. That is, half the scores in the
distribution fall above the middle score and half fall below the middle
score. The steps involved in computing
the median are:
Rank the scores from lowest to highest.
In the case of an odd number of scores, pick
the middle score, which divides the scores so that an equal number of scores are
above it and below it. Example:
2 5 7 8 12 14 18: 8 would be the median in
the aforementioned distribution of scores.
In the case of an even number of scores,
pick the two middle scores, which divide the scores so that an equal number of
scores are above and below them.
Then add the two middle scores and divide the sum by two. Example:
2 5 7 8 12 14 18 20: (8 + 12)/2 = 20/2 = 10 would be
the median in the aforementioned distribution.
Extreme scores in the distribution do not affect the median.
3.1.4 Mean (Average)
The mean is the sum of scores divided by the number of scores.
Mean = ΣX / N
In the above formula, ΣX = the sum of the
scores and N = the number of scores.
Extreme scores in the distribution will affect the mean.
The term average is often used to describe the mean and is usually accurate.
Sometimes, however, the word average is used to describe other measures of
central tendency such as mode and median.
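The three measures of central tendency can be sketched in Python, following the steps given above. The odd and even score lists come from the median examples; the list with a repeated score and the extreme score of 180 are hypothetical values added to illustrate the mode and the effect of extreme scores.

```python
from collections import Counter

def mode(scores):
    """Most frequently occurring score (if scores tie, the first one seen is returned)."""
    return Counter(scores).most_common(1)[0][0]

def median(scores):
    """Rank the scores, then take the middle score or the mean of the two middle scores."""
    ranked = sorted(scores)
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2

def mean(scores):
    """Sum of the scores divided by the number of scores."""
    return sum(scores) / len(scores)

odd = [2, 5, 7, 8, 12, 14, 18]
even = [2, 5, 7, 8, 12, 14, 18, 20]
print(median(odd))           # 8
print(median(even))          # 10.0
print(round(mean(odd), 2))   # 9.43
print(mode([2, 5, 5, 7, 8])) # 5

# Replace the highest score with an extreme one: the median stays put, the mean moves.
extreme = [2, 5, 7, 8, 12, 14, 180]
print(median(extreme))           # 8
print(round(mean(extreme), 2))   # 32.57
```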
Now that the preliminaries are out of
the way, you are ready to start on the basics of descriptive statistics. The
starting point is an unorganized group of scores or measures, all obtained from
the same test or procedure. In an experiment, the scores are measurements on
the dependent variable. Measures of central value (often called measures of
central tendency) give you one score or measure that represents, or is typical
of, the entire group. You will recall that in Chapter 1 we discussed the mean
(arithmetic average). This is one of the three central value statistics.
Recall from Chapter 1 that for every statistic there is also a parameter.
Statistics are characteristics of samples and parameters are characteristics of
populations. Fortunately, in the case of the mean, the calculation of the parameter is identical to
the calculation of the statistic. This is not true for the standard deviation (Chapter 4). Throughout this
book, we will refer to the sample mean (a statistic) with the symbol X̄, pronounced
"ex-bar", and to the population mean (a parameter) with the symbol μ.
A mean based on a population is interpreted differently from a mean based on a sample. For a
population, there is only
one mean, μ. Any sample, however, is only
one of many possible samples, and X̄ will vary from sample to
sample. A population mean is obviously better than a sample mean, but often it
is impossible to measure the entire population. Most of the time, then, we must resort to a sample and use X̄ as an estimate of μ.
In this chapter you will learn to:
Organize data gathered on a dependent variable,
Calculate central values from the
organized data and determine whether they are statistics or parameters, and
Present the data graphically.
3.3 Finding the mean of Unorganized Data
Table 3.1 presents the scores of 100
fourth-grade students on an arithmetic achievement test. These scores were
taken from an alphabetical list of the students' names; therefore, the scores
themselves are in no meaningful order. You probably already know how to compute
the mean of this set of scores. To find the mean, add the scores and divide
that sum by the number of scores.
3.3.2 Formula Mean
3.3.3 Table 3.1
If these 100 scores are a population,
then 39.43 would be μ, but if the 100 scores are a sample from some larger
population, 39.43 would be the sample mean, X̄.
This mean provides a valuable bit of
information. Since a score of 40 on this test is considered average (according
to the test manual that accompanies it), this group of youngsters, whose mean
score is 39.43, is about average in arithmetic achievement.
3.4 Arranging Scores in Descending
Order and Finding the Median
Look again at Table 3.1. If you knew
that a score of 40 were considered average, could you tell just by looking that
this group is about average? Probably not. Often, in research, so many
measurements are made on so many subjects that just looking at all those
numbers is a mind-boggling experience. Although you can do many computations
using unorganized data, it is often very helpful to organize the numbers in
some way. Meaningful organization will permit you to get some general
impressions about characteristics of the scores by simply "eyeballing"
the data (looking at it carefully). In addition, organization is almost a
necessity for finding a second central value, the median.
One way of making some order out of
the chaos in Table 3.1 is to rearrange the numbers into a list, from
highest to lowest. Table 3.2 presents this rearrangement of the arithmetic
achievement scores. (It is usual in statistical tables to put the high numbers
at the top and the low numbers at the bottom.) Compare the unorganized data of
Table 3.1 with the rearranged data of Table 3.2. The ordering from high to low
permits you to quickly gain some insights that would have been very difficult
to glean from the unorganized data. For example, by simply looking at
the center of the table, you get an idea of what the central value is. The
highest and lowest scores are readily apparent and you get the
impression that there are large differences in the achievement levels of these
children. You can see that some scores (such as 44) were achieved by several
people and that some (such as 33) were not achieved by anyone. All this
information is gleaned simply by quickly eyeballing the rearranged data.
3.4.3 Table 3.2
3.4.4 Error Detection
Eyeballing data is a valuable means of
avoiding large errors. If the answers you calculate differ from what you expect
on the basis of eyeballing, wisdom dictates that you try to reconcile the
difference. You have either overlooked something when eyeballing or made a
mistake in your computations.
This simple rearrangement of data also
permits you to find easily another central value statistic, which can be found
only with extreme difficulty from Table 3.1. This statistic is called the
median. The median is defined as the point on the scale of scores above which
half the scores fall and below which half the scores fall. (Note that the
median, like the mean, is a point and not necessarily an actual score.) That
is, half of the scores are larger than the median, and half are smaller. Like
the mean, the sample median is calculated exactly the same as the population
median. Only the interpretations differ.
In Table 3.2 there are 100 scores;
therefore, the median will be a point above which there are 50 scores and below
which there are 50 scores. This point is somewhere among the scores of 39. Remember
from Chapter 1, that any number actually stands for a range of numbers that has
a lower and upper limit. This number, 39, has a lower limit of 38.5 and an
upper limit of 39.5. To find the exact median somewhere within the range of
38.5-39.5 you use a procedure called interpolation. We will give you the procedure
and the reasoning that goes with it at the same time. Study it until you
understand it. It will come up again.
There are 42 scores below 39. You will
need eight more (50 - 42 = 8) scores to reach the median. Since there are ten
scores of 39, you need 8/10 of them to reach the median. Assume that those ten
scores of 39 are distributed evenly throughout the interval from 38.5 to 39.5 and that,
therefore, the median is 8/10 of the way through the
interval. Adding .8 to the lower limit of the interval, 38.5, gives you 39.3,
which is the median for these scores.
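The interpolation just described can be written as a short Python sketch. The function name `interpolated_median` and its parameter names are assumptions chosen for this illustration; the numbers passed to it (42 scores below, ten scores of 39, N = 100, lower limit 38.5) are the ones given in the text.

```python
def interpolated_median(lower_limit, n_below, freq_in_interval, n_total, width=1.0):
    """Interpolate the median inside the score (or interval) that contains it.

    lower_limit      -- lower limit of the score containing the median (38.5 for 39)
    n_below          -- number of scores below that lower limit
    freq_in_interval -- number of scores in the interval
    n_total          -- N, the total number of scores
    width            -- size of the interval (1 for a single whole-number score)
    """
    if freq_in_interval <= 0:
        raise ValueError("the interval must contain at least one score")
    needed = n_total / 2 - n_below          # scores still needed to reach N/2
    if not 0 <= needed <= freq_in_interval:
        raise ValueError("the median does not fall in this interval")
    # Assume the scores are spread evenly through the interval.
    return lower_limit + (needed / freq_in_interval) * width


median_39 = interpolated_median(lower_limit=38.5, n_below=42,
                                freq_in_interval=10, n_total=100)
print(round(median_39, 2))
```

The `width` argument is 1 here because a single score such as 39 covers one score point; the same function can be reused later for wider class intervals.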
There are occasions when you will need
the median of a small number of scores. In such cases, the method we have just
given you will work, but it usually is not necessary to go through that whole
procedure. For example, if N is an odd number and the middle score has a
frequency of 1, then it is the median. In the five scores 2, 3, 4, 12, 15, the
median is 4. If there had been more than one 4, interpolation would have to be used.
When N is an even number, as in
the six scores 2, 3, 4, 5, 12, 15, the point dividing the scores into two equal
halves will lie halfway between 4 and 5. The median, then, is 4.5. If there had
been more than one 4 or 5, interpolation would have to be used. Sometimes the
distance between the two middle numbers will be larger, as in the scores 2, 3,
7, 11. The same principle holds: the median is halfway between 3 and 7. One way
of finding that point is to take the mean of the two numbers: (3 + 7) / 2 = 5, which
is the median.
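For small sets like these, the shortcut can be sketched as follows. The function name `simple_median` is an assumption for this illustration, and the sketch deliberately covers only the shortcut cases described above: it does not interpolate when the middle score is tied with other scores.

```python
def simple_median(scores):
    """Median of a small set of scores when the middle scores are not tied.

    Odd N: the middle score. Even N: the mean of the two middle scores.
    """
    ordered = sorted(scores)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty set of scores is undefined")
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


print(simple_median([2, 3, 4, 12, 15]))     # odd N
print(simple_median([2, 3, 4, 5, 12, 15]))  # even N
print(simple_median([2, 3, 7, 11]))         # even N, wider gap in the middle
```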
There is no accepted symbol to
differentiate the median of a population from the median of a sample. When we
need to make this distinction, we do it with words.
3.5 The Simple Frequency Distribution
A more common (and often more useful)
method of organizing data is to construct a simple frequency distribution. Table
3.3 is a simple frequency distribution for the arithmetic achievement data.
The most efficient way to reduce
unorganized data like Table 3.1 into a simple frequency distribution like Table
3.3 is to follow these steps:
1. Find the highest and lowest scores. In
Table 3.1, the highest score is 65 and the lowest score is 23.
2. In column form, write down in
descending order all possible scores between the highest score (65) and the
lowest score (23). Head this column with the letter X.
3. Start with the number in the upper-left-hand corner of the unorganized scores (a score
of 40 in Table 3.1),
draw a line through it, and place a tally
mark beside 40 in your frequency distribution.
4. Continue this process through all the scores.
5. Count the number of tallies by each score and place that number beside the tallies in
the column headed ƒ. Add up the
numbers in the ƒ column to be sure they equal N. You
have now constructed a simple frequency distribution.
Often, when simple frequency
distributions are presented formally, the tally marks and all scores with a
frequency of zero are deleted.
Don't worry about the ƒX column in Table 3.3 yet. It is not part of a simple
frequency distribution, and we will discuss it in the next section.
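The tallying procedure for a simple frequency distribution can be sketched in Python, with `collections.Counter` doing the tally marks. The function name and the six scores are assumptions made for illustration; they are not the data of Table 3.1, and whole-number scores are assumed.

```python
from collections import Counter


def simple_frequency_distribution(scores):
    """Return (X, f) rows from the highest score down to the lowest.

    Scores that nobody made are kept with a frequency of zero, as in the
    working (not the formal) version of the distribution.
    """
    if not scores:
        raise ValueError("no scores to organize")
    tallies = Counter(scores)                    # one tally per score
    highest, lowest = max(scores), min(scores)   # find the extremes
    rows = [(x, tallies[x]) for x in range(highest, lowest - 1, -1)]
    # Check: the f column must add up to N.
    if sum(f for _, f in rows) != len(scores):
        raise ValueError("frequencies do not sum to N")
    return rows


# Hypothetical scores (NOT from Table 3.1).
example_rows = simple_frequency_distribution([7, 5, 7, 4, 7, 5])
for x, f in example_rows:
    print(x, f)
```

Dropping the rows whose frequency is zero would give the formal presentation mentioned above.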
3.5.4 Table 3.3
3.6 Finding Central Values of a
Simple Frequency Distribution
Computation of the mean from a simple
frequency distribution is illustrated in Table 3.3. Remember that the numbers
in the ƒ column represent the number of people
making each of the scores. To get N, you must add the numbers in the ƒ
column because that's where the people are represented. If you are a
devotee of shortcut arithmetic, you may already have discovered or may already
know the basic idea behind the procedure: multiplication is shortcut addition.
In Table 3.3, the column headed ƒX means what it says algebraically: multiply ƒ (the number of
people making a score) times X (the score they made) for each of the scores.
The reason this is done is that everyone who made a particular score must be
taken into account in the computation of the mean. Since only one person made a
score of 65, multiply 1 x 65, and put a 65 in the ƒX
column. No one made a score of 64, and 0 x 64 = 0;
put a zero in the ƒX column. Since
four people had scores of 55, multiply 4 x 55 to get 220. After ƒX is computed for all scores, obtain ΣƒX by adding up the ƒX
column. Notice that ΣƒX in the simple frequency distribution is
exactly the same as ΣX in Table 3.1. To compute the mean from a simple frequency
distribution, use the formula
$$\bar{X} = \frac{\sum fX}{N}$$
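As a rough sketch of this computation, the Python below builds the ƒX products, sums them, and divides by N (the sum of the ƒ column). The function name and the small distribution are assumptions for illustration only; the rows are hypothetical, not those of Table 3.3.

```python
def mean_from_frequencies(rows):
    """Mean of a simple frequency distribution given (X, f) rows."""
    n = sum(f for _, f in rows)            # N comes from the f column
    if n == 0:
        raise ValueError("the distribution contains no scores")
    sum_fx = sum(f * x for x, f in rows)   # the fX column, added up
    return sum_fx / n


# Hypothetical distribution: three 7s, no 6s, two 5s, one 4.
example_rows = [(7, 3), (6, 0), (5, 2), (4, 1)]
raw_scores = [7, 5, 7, 4, 7, 5]            # the same scores, unorganized

print(mean_from_frequencies(example_rows))
print(sum(raw_scores) / len(raw_scores))   # same value from the raw scores
```

The two printed values agree because multiplying each score by its frequency is just a shortcut for adding that score once per person.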
The procedure for finding the median
of scores arranged in a simple frequency distribution is the same as that for
scores arranged in descending order, except that you must now use the frequency
column to find the number of people making each score.
The median is still the point with
half the scores above and half below it, and is the same point whether you
start from the bottom of the distribution or from the top. If you start from
the top of Table 3.3, you find that 48 people have scores of 40 or above. Two
more are needed to get to 50, the halfway point in the distribution. There are
ten scores of 39, and you need two of them. Thus 2/10 should be subtracted from
39.5 (the lower limit of the score of 40); 39.5 - .2 = 39.3.
Calculating the median by starting from the
top of the distribution will produce the same answer as calculating it by
starting from the bottom.
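To see that the two directions agree, here is a minimal Python sketch that interpolates once from the bottom and once from the top. The function names and the small distribution are assumptions for illustration (the rows are hypothetical, not Table 3.3), and the sketch assumes whole-number scores with no empty scores at the halfway point.

```python
def median_from_bottom(rows):
    """Interpolated median of (X, f) rows, counting up from the lowest score."""
    half = sum(f for _, f in rows) / 2
    cumulative = 0
    for x, f in sorted(rows):
        if f > 0 and cumulative + f >= half:
            # Add the needed fraction to the lower limit of this score.
            return (x - 0.5) + (half - cumulative) / f
        cumulative += f
    raise ValueError("the distribution contains no scores")


def median_from_top(rows):
    """Interpolated median of (X, f) rows, counting down from the highest score."""
    half = sum(f for _, f in rows) / 2
    cumulative = 0
    for x, f in sorted(rows, reverse=True):
        if f > 0 and cumulative + f >= half:
            # Subtract the needed fraction from the upper limit of this score.
            return (x + 0.5) - (half - cumulative) / f
        cumulative += f
    raise ValueError("the distribution contains no scores")


# Hypothetical distribution with N = 10.
example_rows = [(5, 2), (4, 4), (3, 3), (2, 1)]
print(median_from_bottom(example_rows), median_from_top(example_rows))
```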
You may also find the third
central-value statistic from the simple frequency distribution. This statistic is
called the mode.
The mode is the score made by the greatest number
of people; that is, the score with the greatest frequency.
A distribution may have more than one
mode. A bimodal distribution is one with two high-frequency scores separated by
one or more low-frequency scores. However, although a distribution may have
more than one mode, it can have only one mean and one median.
A sample mode and a population mode
are determined in the same way.
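A small Python sketch of finding the mode from (X, ƒ) rows follows. The function name `modes` and both distributions are hypothetical. Note one simplification: the sketch reports only scores tied for the single greatest frequency, whereas the description of a bimodal distribution above also allows two high-frequency scores that are not exactly equal.

```python
def modes(rows):
    """Return every score whose frequency equals the greatest frequency."""
    if not rows:
        raise ValueError("the distribution contains no scores")
    greatest = max(f for _, f in rows)
    return sorted(x for x, f in rows if f == greatest)


# Hypothetical distributions, given as (X, f) rows.
one_mode = [(7, 3), (6, 0), (5, 2), (4, 1)]
two_modes = [(9, 4), (8, 1), (7, 4), (6, 2)]

print(modes(one_mode))
print(modes(two_modes))
```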
In Table 3.3, more people had a score
of 39 than any other score, so 39 is the mode. You will note, however, that it
was close. Ten people scored 39, but nine scored 34 and eight scored 41. A few
lucky guesses by children taking the achievement test could have caused
significant changes in the mode. This instability of the mode limits its usefulness.
3.7 The Grouped Frequency Distribution
There is a way of condensing the data
of Table 3.1 even further. The result of such a condensation is called a grouped frequency
distribution, and Table 3.4 is an example of such a
distribution, again using the arithmetic achievement-test scores. (A formal grouped frequency
distribution does not include the tally marks or the X and ƒX columns.)
The grouping of data began as a way of
simplifying computations in the days before the invention of
marvelous computational aids such as computers and calculators. Today, most
researchers group their data only when they want to construct a graph or when N's
are very large. These two occasions happen often enough to make it
important for you to learn about grouping.
In the grouped frequency distribution, X values are grouped into
ranges called class
intervals. In Table 3.4, the entire range of
scores, from 65 to 23, has been reduced to 15 class intervals. Each interval
covers three scores, and the size of the interval (the number of scores
covered) is indicated by i. For Table 3.4, i = 3. The midpoint of each interval
represents all scores in that interval. For example, there were nine children who
had scores of 33, 34, or 35. The midpoint of the class interval 33-35 is 34. All
nine children are represented by 34. Obviously, this procedure may introduce
some inaccuracy into computations; however, the amount of error introduced is
usually very slight. For example, the mean computed from Table 3.4 is 39.40. The
mean computed from
ungrouped data is 39.43.
Class intervals have upper and lower
limits, much like simple scores obtained by measuring a quantitative
variable. A class interval of 33-35 has a lower limit of 32.5 and an upper
limit of 35.5. Similarly, a class interval of 40-49 has a lower limit of 39.5
and an upper limit of 49.5.
3.7.6 Table 3.4
3.7.7 Establishing Class Intervals
There are three conventions that are
usually followed in establishing class intervals. We call them
conventions because they are customs rather than hard-and-fast rules. There are
two justifications for these conventions. First, they allow you to get
maximum information from your data with minimum effort. Second, they
provide some standardization of procedures, which aids in communication among
scientists. These conventions are:
Data should be grouped into not fewer than 10 and not more than 20 class intervals.
The primary purpose of grouping data is to provide a clearer
picture of trends in the data and to make computations easier. For
example, Table 3.4 shows that there are many frequencies near the center of
the distribution, with fewer and fewer as the upper and lower ends of the
distribution are approached. If the data are grouped into fewer than 10
intervals, such trends are not as apparent. In Table 3.5, the same scores are
grouped into only five class intervals. The concentration of frequencies in the
center of the distribution is not nearly so apparent.
Another reason for using at least 10 class intervals is that,
as you reduce the number of class intervals, the errors caused by grouping
increase. With fewer than 10 class intervals, the errors may no longer be
minor. For example, the mean computed from Table 3.4 was 39.40, only .03 points
away from the exact mean of 39.43 computed from ungrouped data. The mean
computed from Table 3.5, however, is 39.00, an error of .43 points.
On the other hand, the use of
more than 20 class intervals may tend to exaggerate fluctuations in the data that are really due to chance occurrences. You
also sacrifice much of the ease of computation, with little gain in
control over errors. So, the convention
is: use 10 to 20 class intervals.
The size of the class intervals (i)
should be an odd number or 10 or a multiple of 10. (Some writers include i = 2 as acceptable. Some
also object to the use of i = 7 or 9. In actual practice, the most frequently
seen i's are 3, 5, 10, and multiples of 10.)
The reason for this is simply computational ease. The
midpoint of the interval is used as representative of all scores in the
interval; and if i is an odd number,
the midpoint will be a whole number. If i is an even number, the midpoint will
be a decimal number. In the interval 12-14 (i = 3), the midpoint is the whole
number 13. In an interval 12 to 15 (i = 4), the midpoint is the decimal number
13.5. However, if the range of scores is so great that you cannot include all
of them in 20 groups with i = 9 or less, it is conventional to
place 10 scores or a multiple of 10 in each class interval.
Begin each class interval with
a multiple of i.
For example, if the lowest score is 44 and i = 5, the first
class interval should be 40-44 because 40 is a multiple of 5. This convention
is violated fairly often. However, the practice is followed more often than not.
A violation that seems to be justified occurs when i = 5. When the interval
size is 5, it may be more convenient to begin the interval such that multiples
of 5 will fall at the midpoint, since multiples of 5 are easier to manipulate.
For example, an interval 23-27 has 25 as its midpoint, while an interval 25-29
has 27 as its midpoint. Multiplying by 25 is easier than multiplying by 27.
In addition to these three conventions, remember that the
highest scores go at the top and the lowest scores at the bottom.
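The arithmetic behind the second and third conventions can be checked with a short Python sketch. The function names `midpoint` and `first_interval` are assumptions for this illustration; the intervals passed to them are the ones used as examples in the text, and non-negative whole-number scores are assumed.

```python
def midpoint(low, high):
    """Midpoint of a class interval written as low-high (for example 12-14)."""
    return (low + high) / 2


def first_interval(lowest_score, i):
    """Lowest class interval: begins with a multiple of i at or below the lowest score."""
    start = (lowest_score // i) * i
    return start, start + i - 1


print(midpoint(12, 14))       # i = 3, whole-number midpoint
print(midpoint(12, 15))       # i = 4, decimal midpoint
print(midpoint(23, 27))       # i = 5, midpoint is a multiple of 5
print(first_interval(44, 5))  # lowest score 44, i = 5
```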
3.7.8 Converting Unorganized Data into
a Grouped Frequency Distribution
Now that you know the conventions for
establishing class intervals, we will go through the steps for converting
a mass of data like that in Table 3.1 into a grouped frequency distribution
like Table 3.4:
1. Find the highest and lowest scores. In
Table 3.1, the highest score is 65, and the lowest score is 23.
2. Find the range of scores by
subtracting the lowest score from the highest and adding 1: 65 - 23 + 1 =
43. The 1 is added so that the upper limit of the highest score and the lower
limit of the lowest score will be included.
3. Determine i by a trial-and-error
procedure. Remember that there are to be 10 to 20 class intervals and that the
interval size should be odd, 10, or a multiple of 10. Dividing the range by a
potential i value tells the number of class intervals that will result. For
example, dividing the range of 43 by 5 provides a quotient of 8.60. Thus, i =
5 produces 8.6 or 9 class intervals. That does not satisfy the rule calling for
at least 10 intervals, but it is close and might be acceptable. In most such
cases, however, it is better to use a smaller i and get a larger number of
intervals. Dividing the range by 3 (43/3) gives you 14.33 or 15 class
intervals. It sometimes happens that this process results in an extra class
interval. This occurs when the lowest score is such that extra scores must be
added to the bottom of the distribution to start the interval with a multiple
of i. For the data in Table 3.1, the most appropriate interval size is 3,
resulting in 15 class intervals.
4. Begin the bottom interval with the
lowest score if it is a multiple of i. If the lowest score is not a multiple
of i, begin the interval with the next lower number that is a multiple of i. In
the data of Table 3.1, the lowest score, 23, is not a multiple of i. Begin the
interval with 21, since it is a multiple of 3. The lowest class interval, then,
is 21-23. From there on, it's easy. Simply begin the next interval with the
next number and end it such that it includes three numbers (24-26). Look at the
class intervals in Table 3.4. Notice that each interval begins with a number
evenly divisible by 3.
5. The rest of the process is the same as for a simple frequency distribution. For
each score in the unorganized data, put a tally mark beside its class interval
and cross out the score. Count the tally marks and put the number into the
frequency column. Add the frequency column to be sure that Σƒ = N.
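The steps above can be sketched in Python. The function names are assumptions for this illustration, and the six scores are hypothetical (they merely share the highest and lowest scores, 65 and 23, quoted for Table 3.1). Non-negative whole-number scores are assumed.

```python
import math
from collections import Counter


def estimated_interval_count(lowest, highest, i):
    """Range (highest - lowest + 1) divided by i, rounded up to whole intervals."""
    score_range = highest - lowest + 1
    return math.ceil(score_range / i)


def grouped_frequency_distribution(scores, i):
    """Return (low, high, f) rows, highest class interval first."""
    lowest, highest = min(scores), max(scores)
    # Tally each score under the multiple of i that begins its interval.
    tallies = Counter((x // i) * i for x in scores)
    rows = []
    low = (lowest // i) * i            # bottom interval begins with a multiple of i
    while low <= highest:
        rows.append((low, low + i - 1, tallies[low]))
        low += i
    rows.reverse()                     # highest scores go at the top
    if sum(f for _, _, f in rows) != len(scores):
        raise ValueError("frequencies do not sum to N")
    return rows


print(estimated_interval_count(23, 65, 5))  # trial with i = 5
print(estimated_interval_count(23, 65, 3))  # trial with i = 3

# Hypothetical scores (NOT the 100 scores of Table 3.1).
example_rows = grouped_frequency_distribution([23, 65, 40, 39, 41, 24], 3)
print(len(example_rows), example_rows[0], example_rows[-1])
```

Because the bottom interval is pushed down to a multiple of i, the actual number of intervals can be one more than the estimate, which is the extra class interval mentioned in step 3.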
to the Future
The distributions that you have been
constructing are empirical distributions based on scores actually gathered in
experiments. This chapter and the next two are about these empirical frequency
distributions. Starting with Chapter 8, and throughout the rest of the book,
you will also make use of theoretical distributions, that is, distributions based on
mathematical formulas and logic rather than on actual observations.
3.8 Finding Central Values of a
Grouped Frequency Distribution
The procedure for finding the mean of
a grouped frequency distribution is similar to that for the simple frequency
distribution. In the grouped distribution, however, the midpoint of each
interval represents all the scores in the interval. Look again at Table 3.4.
Notice the column headed with the letter X. The numbers in that column are the
midpoints of the intervals. Assume that the scores in the interval are evenly
distributed throughout the interval. Thus, X is the mean for all scores within
the interval. After the X column is filled, multiply each X by its ƒ value in order to include all frequencies in that interval. Place
the product in the ƒX column.
Summing the ƒX column
provides ΣƒX, which, when
divided by N, yields the mean. In terms of a formula,
$$\bar{X} = \frac{\sum fX}{N}$$
where X is now the midpoint of each class interval.
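A minimal Python sketch of the grouped-data mean follows, with each interval represented by its midpoint. The function name and the three-interval distribution are assumptions for illustration; the rows are hypothetical, not those of Table 3.4.

```python
def grouped_mean(rows):
    """Mean of a grouped frequency distribution given (low, high, f) rows."""
    n = sum(f for _, _, f in rows)
    if n == 0:
        raise ValueError("the distribution contains no scores")
    # X is the midpoint of each interval; fX is frequency times midpoint.
    sum_fx = sum(f * (low + high) / 2 for low, high, f in rows)
    return sum_fx / n


# Hypothetical grouped distribution with i = 3 and N = 10.
example_rows = [(16, 18, 2), (13, 15, 5), (10, 12, 3)]
print(grouped_mean(example_rows))  # (2*17 + 5*14 + 3*11) / 10
```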
Finding the median of a grouped
distribution requires interpolation within the interval containing the median.
We will use the data in Table 3.4 to illustrate the procedure. Remember that
the median is the point in the distribution that has half the frequencies
above it and half the frequencies below it. Since N= 100, the median
will have 50 frequencies above it and 50 below it. Adding frequencies from the
bottom of the distribution, you find that there are 42 who scored below the
interval 39-41. You need
8 more frequencies (50 - 42 = 8) to find the median. Since 23 people scored
in the interval 39-41, you need 8 of these 23
frequencies, or 8/23. Again, you assume that the 23 people in the
interval are evenly distributed through the interval. Thus, you need the
same proportion of score points in the interval as you have frequencies; that
is, 8/23, or .35, of the 3 score points in the interval. Since .35 x 3 = 1.05,
you must go 1.05 score points into the interval to reach the median. Since the
lower limit of the interval is 38.5, add 1.05 to find the median, which is
39.55. Figure 3.1 illustrates this procedure.
In summary, the steps for finding the
median in a grouped frequency distribution are as follows.
1. Divide N by 2.
2. Starting at the bottom of the
distribution, add the frequencies until you find the interval containing the median.
3. Subtract from N/2 the total frequencies of all intervals below the interval
containing the median.
4. Divide the difference found in step 3 by the number of frequencies in the interval
containing the median.
5. Multiply the proportion found in step 4 by i.
6. Add the product found in step 5 to the lower limit of the interval containing the
median. That sum is the median.
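The six steps can be followed one by one in a short Python sketch. The function name and the five-interval distribution are assumptions for illustration (the rows are hypothetical, not Table 3.4). The last lines redo the arithmetic of the worked example in the text without rounding 8/23 to .35 first, which gives a value about .01 lower than 39.55.

```python
def grouped_median(rows):
    """Median of a grouped frequency distribution given (low, high, f) rows."""
    n = sum(f for _, _, f in rows)
    half = n / 2                                   # step 1
    cumulative = 0
    for low, high, f in sorted(rows):              # step 2: start at the bottom
        if f > 0 and cumulative + f >= half:
            i = high - low + 1                     # size of the class interval
            needed = half - cumulative             # step 3
            proportion = needed / f                # step 4
            distance = proportion * i              # step 5
            return (low - 0.5) + distance          # step 6
        cumulative += f
    raise ValueError("the distribution contains no scores")


# Hypothetical grouped distribution with i = 3 and N = 20.
example_rows = [(22, 24, 1), (19, 21, 4), (16, 18, 8), (13, 15, 5), (10, 12, 2)]
print(grouped_median(example_rows))

# The numbers from the text: N = 100, 42 below, 23 in the interval 39-41.
text_example = 38.5 + (50 - 42) / 23 * 3
print(round(text_example, 2))
```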
The third central value, the mode, is
the midpoint of the interval having the greatest number of frequencies. In Table 3.4, the interval 39-41 has the greatest number of frequencies, 23. The midpoint of
that interval, 40, is the mode.
3.9 Graphic Presentation of Data
In order to better communicate your
findings to colleagues (and to understand them better yourself), you will often
find it useful to present the results in the form of a graph. It has been said,
with considerable truth, that one picture is worth a thousand words; and a
graph is a type of picture. Almost any data can be presented graphically. The
major purpose of a graph is to get a clear, overall picture of the data.
Graphs are composed of a
horizontal axis (variously called the baseline, X axis or abscissa) and
a vertical axis called the Y-axis or ordinate. We will take
what seems to be the simplest course and use the terms X and Y.
We will describe two kinds of
graphs. The first kind is used to present frequency distributions like those
you have been constructing. Frequency polygons, histograms, and bar graphs are
examples of this first kind of graph. The second kind we will describe is the
line graph, which is used to present the relationship
between two different variables.
3.9.4 Illustration XY Axis
3.9.5 Presenting Frequency Distributions
Whether you use a frequency polygon, a
histogram, or a bar graph to present a frequency distribution depends on the
kind of variable you have measured. A frequency polygon or histogram is used
for quantitative data, and the bar graph is used for qualitative data. It is
not wrong to use a bar graph for quantitative data; but most researchers follow
the rule given above. Qualitative data, however, should not be presented
with a frequency polygon or a histogram. The arithmetic achievement scores
(Table 3.1) are an example of quantitative data.
Figure 3.2 shows a frequency polygon based on the frequency distribution in Table 3.4.
We will use it to demonstrate the characteristics of all frequency polygons. On
the X-axis we placed the midpoints of the class intervals. Notice that the
midpoints are spaced at equal intervals, with the smallest midpoint at the left
and the largest midpoint at the right. The Y-axis
is labeled "Frequencies" and is also marked off into equal intervals.
Graphs are designed to "look right." They look right
if the height of the figure is 60 percent
to 75 percent of its length. Since
the midpoints must be plotted along the X axis, you must divide the Y axis into units that will satisfy
this rule. Usually, this requires a little juggling on your part. Darrell Huff
(1954) offers an excellent demonstration of the misleading effects that occur
when this convention is violated.
The intersection of the X and Y axes is
considered the zero point for both variables. For the Y-axis in Figure 3.2, this is indeed the case. The distance on the Y-axis is the same from zero to two as
from two to four, and so on. On the X-axis, however, that is not the case.
Here, the scale jumps from zero to 19 and then is divided into equal units of
three. It is conventional to indicate a break in the measuring scale by
breaking the axis with slash marks between zero and the lowest score used, as
we did in Figure 3.2. It is also conventional to close a polygon at both ends by
connecting the curve to the X-axis.
Each point of the frequency polygon represents two numbers: the class midpoint
directly below it on the X-axis and
the frequency of that class directly across from it on the Y-axis. By looking at the points in Figure 3.2, you can readily see
that three people are represented by the midpoint 22, nine people by each of
the midpoints 31, 34, and 37, 23 people by the midpoint 40, and so on.
The major purpose of the frequency polygon is to gain an overall view of the
distribution of scores. Figure 3.2 makes it clear, for example, that the
frequencies are greater for the lower scores than for the higher ones. It also
illustrates rather dramatically that the greatest number of children scored in
the center of the distribution.
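The points of a frequency polygon can be listed with a short Python sketch before any drawing is done. The function name and the three midpoints are hypothetical. Closing the polygon at the midpoints of the empty intervals just beyond each end is an assumption of this sketch; it is consistent with the X-axis of Figure 3.2 beginning at 19, one interval below the midpoint 22.

```python
def polygon_points(rows, i):
    """Return (midpoint, frequency) points, left to right, closed on the X-axis.

    rows -- (midpoint, f) pairs for each class interval
    i    -- size of the class interval
    """
    if not rows:
        raise ValueError("no class intervals to plot")
    ordered = sorted(rows)                 # smallest midpoint at the left
    first_mid = ordered[0][0]
    last_mid = ordered[-1][0]
    # Add a zero-frequency point one interval beyond each end.
    return [(first_mid - i, 0)] + ordered + [(last_mid + i, 0)]


# Hypothetical grouped distribution with i = 3.
example_points = polygon_points([(14, 5), (11, 3), (17, 2)], 3)
print(example_points)
```

The resulting list can be handed to any plotting tool; each pair is one point of the polygon, read as class midpoint on the X-axis and frequency on the Y-axis.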
Figure 3.3 is a histogram constructed from the same data that
were used for the frequency polygon of Figure 3.2. Researchers may choose
either of these methods for a given distribution of quantitative data, but the
frequency polygon is usually preferred for several reasons: it is easier to
construct, gives a generally clearer picture of trends in the data, and can be
used to compare different distributions on the same graph. However, frequencies
are easier to read from a histogram.
Actually, the two figures are very
similar. They differ only in that the histogram is made by raising bars from
the X-axis to the appropriate frequencies instead of plotting points above the
midpoints. The width of a bar is from the lower to the upper limit of its class
interval. Notice that there is no space between the bars.
The third type of graph that presents frequency distributions
is the bar graph. A bar graph presents frequencies of the categories of a
qualitative variable. An example of a qualitative variable is laundry
detergent; there are many different brands (types of the variable), but the
brands don't tell you the order they go in.
With quantitative variables, the measurements
of the variable impose an order on themselves. Arithmetic achievement scores
of 43 and 51 tell you the order they belong in. "Tide" and
"Lux" do not signify any order.
Figure 3.4 is an example of a bar graph. Notice that each bar
is separated by a small space. This bar graph was constructed by a grocery store manager who had a
practical problem to solve. One side of an aisle in his store was stocked with
laundry detergent, and he had no more space for this kind of product. How much
of the available space should he allot for each brand? For one week, he kept a
record of the number of boxes of each brand sold. From this frequency
distribution of scores on a qualitative variable, he constructed the bar graph
in Figure 3.4. (He, of course, used the names of the brands. We wouldn't dare!)
E, H, and K are obviously the big sellers and should get the greatest amount of
space. Brands A and D need very little space. The other brands fall between
these. The grocer, of course, would probably consider the relative profits from
the sale of the different brands in order to determine just how much space to
allot to each. Our purpose here is only to illustrate the use of the bar graph
to present qualitative data.
3.9.6 The Line Graph
Perhaps the most frequently used graph
in scientific books and journal articles is the line graph. A line graph is
used to present the relationship between two variables.
A point on a line graph represents
the two scores made by one person on each of the two variables. Often, the
mean of a group is used rather than one person, but the idea is the same: a
group with a mean score of X on one variable had a mean score of Y on
the other variable. The point on the graph represents the means of that group
on both variables.
Figure 3.5 is an example of a line
graph of the relationship between subjects' scores on an anxiety test and their
scores on a difficult problem-solving task. Many studies have discovered this
general relationship. Notice that performance on the task is better and better
for subjects with higher and higher anxiety scores up to the middle range of
anxiety. But as anxiety scores continue to increase, performance scores decrease.
Chapter 5, "Correlation and Regression," will make extensive use of a
version of this type of line graph.
A variation of the line graph places
performance scores on the Y-axis and some condition of training on the
X-axis. Examples of such training conditions are number of trials, hours of food
deprivation, year in school, and amount of reinforcement. The "score" on
the training condition is assigned by the experimenter.
Figure 3.6 is a generalized learning
curve with a performance measure (scores) on the Y-axis and number of
reinforced trials on the X-axis. Early in training (after only one or two
trials), performance is poor. As trials continue, performance improves rapidly
at first and then more and more slowly. Finally, at the extreme right-hand
portion of the graph, performance has levelled off; continued trials do not
produce further changes in the scores.
A line graph, then, presents a picture
of the relationship between two variables. By looking at the line, you can tell
what changes take place in the Y variable as the value of the X variable changes.
3.10 Skewed Distributions
Look back at Table 3.4, graphed as Figure 3.2. Notice that the largest frequencies
are found in the middle of the distribution. The same thing is true in Problem
3 of this chapter. These distributions are not badly skewed; they are
reasonably symmetrical. In some data, however, the largest frequencies are found at one end of
the distribution rather than in the middle. Such distributions are said to be skewed.
The word skew is similar to the
word skewer, the name of the cooking implement used in making shish
kebab. A skewer is long and pointed and is thicker at one end than the other
(not symmetrical). Although skewed distributions do not function like skewers
(you would have a terrible time poking one through a chunk of lamb), the name
does help you remember that a skewed distribution has a thin point on one side.
Figures 3.7 and 3.8 are illustrations
of skewed distributions. Figure 3.7 is positively skewed; the thin point is toward the high scores, and
the most frequent scores are low ones. Figure 3.8 is negatively skewed; the
thin point or skinny end is toward the low scores, and the most frequent scores are
high ones.
There is a mathematical way of measuring the degree of skewness that is more precise
than eyeballing, but it is beyond the scope of this book. However, figuring the
relationship of the mean to the median is an objective way to determine the direction
of the skew. When the mean is numerically smaller than the median, there is some amount of negative
skew. When the mean is larger than the
median, there is positive skew. The reason for this is that the mean is affected by the size of the numbers
and is pulled in the direction
of the extreme scores. The median is not influenced by the size of the
scores. The relationship between the mean and the
median is illustrated by Figure 3.9. The size of the difference between the
mean and the median gives you an indication of how much the distribution is skewed.
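The mean-versus-median comparison can be sketched in Python with the standard `statistics` module. The function name `skew_direction` and the two small data sets are hypothetical. One simplification to note: `statistics.median` takes the middle score (or the mean of the two middle scores) and does not perform the interpolation described earlier in this chapter.

```python
from statistics import mean, median


def skew_direction(scores):
    """Compare the mean with the median to judge the direction of skew."""
    m, md = mean(scores), median(scores)
    if m > md:
        return "positive"       # mean pulled toward extreme high scores
    if m < md:
        return "negative"       # mean pulled toward extreme low scores
    return "none indicated"


# Hypothetical data sets.
print(skew_direction([1, 2, 2, 3, 10]))   # one extreme high score
print(skew_direction([1, 8, 9, 9, 10]))   # one extreme low score
```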
3.11 The Mean, Median, and Mode Compared
A common question is "Which measure of
central value should I use?" The general answer is "Given a choice,
use the mean." Sometimes, however, the data give you no choice. For example, if the frequency
distribution is for a nominal variable, the mode is the
only appropriate measure of central value.
It is meaningless to find a median or
to add up the scores and divide to find a mean for data based on a nominal
scale. For the data from the voting-behavior experiment the mode is the only
measure of central value that is meaningful. For a frequency distribution of an
ordinal variable, the median or the mode is appropriate. For data based on
an interval or ratio scale, the mean, median, or mode may be used; you have a choice.
Even if you have interval or ratio
data, there are two situations in which the mean is inappropriate because it
gives an erroneous impression of the distribution. The first situation is the
case of a severely skewed distribution. The following story demonstrates why
the mean is inappropriate for severely skewed distributions.
The developer of Swampy Acres
Retirement Home sites is attempting, with a computer-selected mailing list, to
sell the lots in his southern paradise to northern buyers. The marks express
concern that flooding might occur. The developer reassures them by explaining
that the average elevation of his lots is 78.5 feet and that the water has
never exceeded 25 feet in that area. On the average, he has told the truth; but
this average truth is misleading. Look at the actual lay of the land in Figure
3.10 and examine the frequency distribution in Table 3.6, which summarizes the
The mean elevation, as the developer
said, is 78.5 feet; however, only 20 lots, all on a cliff, are out of the flood
zone. The other 80 lots are, on the average, under water. The mean, in this
case, is misleading. In this instance, the central value that describes the
typical case is the median because it is unaffected by the size of the few
extreme lots on the cliff. The median elevation is 12.5 feet, well below the
Darrell Huff's delightful and
informative book, How to Lie with Statistics (1954) gives a
number of such examples. We heartily recommend this book to you. It provides many
cautions concerning misinformation conveyed through the use of the inappropriate
statistic. A more recent and equally delightful book is Flaws and Fallacies
in Statistical Thinking by Stephen Campbell (1974).
There is another instance that
requires a median, even though you have a symmetrical distribution. This is
when the class interval with the largest (or smallest) scores is not limited.
In such a case, you do not have a midpoint and, therefore, cannot compute
a mean. For example, age data are sometimes reported with the highest category
as "75 and over." The mean cannot be computed. Thus, when one or
both of the extreme class intervals is not limited, the median is the
appropriate measure of central value. To reiterate: given a choice, use the mean.
3.12 The Mean of a Set of Means
Occasions arise in which means are
available from several samples taken from the same population. If these means
are combined, the mean of the set of means will give you the best estimate of
the population parameter, μ. If every sample has the same N, you can
compute the average mean simply by adding the means and dividing by the number
of means. If, however, the means to be averaged have varying N's, it is
essential that you take into account the various sample sizes by multiplying
each mean by its own N
before summing, and then dividing the total by the sum of the N's. Table 3.7 illustrates this procedure. Four means are
presented, along with two hypothetical sample sizes for each mean. In the
left-hand table, the four sample sizes are equal. In the right-hand table, the
four sample sizes are not equal. Notice that 18.50 is the mean of the means
when the separate means are simply added and the sum divided by the number of
means. This gives the correct answer when the sample sizes are equal. However,
when sample sizes differ, 18.50 is wrong. Each mean must be multiplied by its
respective N, the products summed, and the sum divided by the total N; the mean of the means is then 17.60. When N's are
unequal, averaging the means without accounting for sample sizes generally
gives the wrong answer.
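The weighting procedure described above can be sketched in a few lines of Python. This is only an illustration: the four means and the sample sizes below are hypothetical values chosen so that the results reproduce the 18.50 and 17.60 quoted in the text; they are not the actual entries of Table 3.7, and the function name is an assumption.

```python
def mean_of_means(means, sizes):
    """Weighted mean of sample means: each mean is weighted by its own N."""
    if len(means) != len(sizes):
        raise ValueError("means and sizes must have the same length")
    weighted_sum = sum(m * n for m, n in zip(means, sizes))
    return weighted_sum / sum(sizes)

# Hypothetical sample means and sample sizes (NOT the values in Table 3.7).
means = [12.0, 18.0, 20.0, 24.0]
equal_sizes = [5, 5, 5, 5]
unequal_sizes = [2, 6, 1, 1]

simple_average = sum(means) / len(means)            # ignores the N's
equal_result = mean_of_means(means, equal_sizes)    # same as the simple average
unequal_result = mean_of_means(means, unequal_sizes)

print(simple_average, equal_result, unequal_result)
```

With equal N's the weighted result matches the simple average; with unequal N's the two differ, which is the point the paragraph makes.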
to The Future
In Chapter 9 you will learn
a most important concept, called a sampling distribution of the mean.
The mean of a set of means is an inherent part of that concept.
3.13 Skewed Distributions and
Measures of Central Tendency
Distributions, when the mean, median, and mode are represented graphically, may show
varying degrees of skewness, which refers to the degree of asymmetry of the
distribution.
In a symmetrical distribution with a single mode, the mean, median, and mode all fall at the same
point.
If there are two modes (bimodal), the mean and median still fall at the same
point, but the two modes mark the highest points of the distribution. This
is considered a bimodal symmetrical distribution.
In a symmetrical distribution the largest frequencies are found in the middle,
whereas in a skewed distribution the largest frequencies are found at one end
of the distribution rather than in the middle.
The word skew
is similar to the word skewer, which is long and pointed and is thicker at one
end than the other (not symmetrical). A skewed distribution has a thin point on
one side. In a positively skewed distribution, the thin point is toward the high scores, and the most
frequent scores are low ones. In a negatively skewed distribution, the thin point or
skinny end is toward the low scores, and the most frequent scores are high
ones. There are mathematical ways of measuring the degree of skewness that are
more precise than eyeballing, but comparing the mean
to the median provides an objective way to determine the direction of
the skew. When the mean is numerically smaller than the median, there is some
amount of negative skew. When the mean is larger than the median, there is
positive skew. The reason for this is that the mean is affected by the size of
the numbers and is pulled in the direction of extreme scores. The median is not
influenced by the size of the scores. The relationship between the mean and the
median is illustrated in the picture below. The size of the difference between
the mean and the median gives you an indication of how much the distribution is
skewed.
The positively skewed distribution below demonstrates an asymmetrical pattern. In
this case the mode is smaller than the median, which is smaller than the mean.
This relationship exists between the mode, median, and mean because each statistic describes
the distribution differently.
The mode represents the most frequently occurring score and thus is the score on
the X axis under the highest point of a frequency distribution. The median cuts the
distribution in half so that 50% of the scores are on either side.
The mean, unlike the median and mode, is affected by larger scores since it is the
sum of the score values divided by their number. The mean
represents the balance point in the distribution. Because of this it is drawn
toward the tail of the skew; in a positively skewed distribution, toward the larger values.
A negatively skewed distribution is also asymmetrical but with the opposite order of the mean,
median, and mode. The mean is smaller than the median, which is smaller than
the mode.
Here the mode has the highest value of the three, and the tail of the distribution points in a
negative direction.
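The rule of thumb above (compare the mean with the median to find the direction of skew) can be sketched as follows. The function name and the score lists are hypothetical, and the check only indicates direction; it is not one of the more precise mathematical measures of skewness mentioned in the text.

```python
from statistics import mean, median

def skew_direction(scores):
    """Indicate the direction of skew by comparing the mean with the median."""
    m, md = mean(scores), median(scores)
    if m > md:
        return "positive"       # mean pulled toward a few high scores
    if m < md:
        return "negative"       # mean pulled toward a few low scores
    return "none indicated"     # mean equals median

# Hypothetical score sets for illustration only.
high_tail = [1, 2, 2, 3, 10]    # mean 3.6, median 2
low_tail = [1, 8, 9, 9, 10]     # mean 7.4, median 9
balanced = [1, 2, 3]            # mean 2, median 2

print(skew_direction(high_tail), skew_direction(low_tail), skew_direction(balanced))
```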
3.14 The Mean of a Set of Means ()
5.1 The spread or dispersion of
scores is known as variability. If the distribution of scores falls within a
narrow range, there is little variability. Conversely, scores that vary widely
connote a distribution that is highly variable.
5.2.1 The range is the difference
between the largest score and the smallest score.
5.3 Standard Deviation
5.5 Standard Deviation (s) as an
Estimate of Population Variability
5.5.2 Deviation Scores
5.5.4 Deviation-Score Method of
Computing s from Ungrouped Data
5.5.6 Deviation-Score Method of
Computing s from Grouped Data
5.5.8 The Raw-Score Method of
Computing s from Ungrouped Data
Method of Computing s from Grouped Data
5.6 The Other Two Standard
Deviations, σ and S
5.10 z Scores
We have used measures of central value and measures of variability to describe a
distribution of scores. The next statistic, z, is used to describe a single
score.
A z score is a mathematical way to change a raw score so that it reflects its
relationship to the mean and standard deviation of its fellow scores.
Any distribution of raw scores can be converted to a distribution of z scores; for
each raw score, there is a z score. Raw scores above the mean will have
positive z scores; those below the mean will have negative z scores.
A z score is also called a standard score because it is a deviation score
expressed in standard deviation units. It is the number of standard deviations
a score is above or below the mean. A z score tells you the relative position
of a raw score in the distribution. (z) scores are also used for inferential
purposes. Much larger z scores may occur then.
Formula
z = (X - X(mean)) / S
Variables Defined
S = standard deviation of a sample of scores
X(mean) = mean of a sample
X = raw score
Procedure
1. Find the difference between the raw score and the mean
2. Divide that difference by the standard deviation of the sample
of z Scores
z scores are used to compare two scores in the same distribution. They are also
used to compare two scores from different distributions, even when the
distributions are measuring different things.
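As a minimal sketch of that comparison, the snippet below converts two raw scores from different distributions to z scores. The function name, the two tests, and their means and standard deviations are hypothetical values chosen for illustration.

```python
def z_score(raw, mean, sd):
    """Number of standard deviations a raw score lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (raw - mean) / sd

# Hypothetical example: one person's scores on two differently scaled tests.
z_test_a = z_score(85, mean=70, sd=10)   # 15 points above a mean of 70
z_test_b = z_score(60, mean=50, sd=4)    # 10 points above a mean of 50

print(z_test_a, z_test_b)
```

Although the raw score on the first test is higher, the second score lies further above its own mean in standard deviation units, so it is the relatively better performance.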
5.11 Variance and Standard Deviation
s2 is the symbol for variance and is a measure of variability from the mean
of the distribution of scores.
Procedure
1. Find the mean of the scores.
2. Subtract the mean from every score.
3. Square the results of step two.
4. Sum the results of step three.
5. Divide the results of step four by N (the number of scores) - 1.
Example
1. Find the mean of the scores. Mean = 50 / 5 = 10
2. Subtract the mean from every score. The second column above
3. Square the results of step two. The third column above
4. Sum the results of step three. 22
5. Divide the results of step four by N (# of scores) - 1. s2 = 22 / (5-1) = 5.5
Note that the
sum of column 2 is zero. This must be the case if the calculations are
performed correctly up to that point.
s is the symbol for standard deviation and it is the square root of the variance.
The standard deviation is the preferred measure of variability.
Find the square root of the variance above. Square root of 5.5 = 2.35
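The five steps can be traced in Python as a check on the arithmetic. The five scores below are hypothetical: they were chosen only so that they sum to 50 and give a sum of squared deviations of 22, matching the totals quoted in the example; the original table of scores is not reproduced here.

```python
from math import sqrt

# Hypothetical scores chosen to match the totals in the example (sum = 50, N = 5).
scores = [7, 8, 11, 12, 12]

n = len(scores)
mean_score = sum(scores) / n                        # step 1
deviations = [x - mean_score for x in scores]       # step 2
squared = [d ** 2 for d in deviations]              # step 3
sum_of_squares = sum(squared)                       # step 4
variance = sum_of_squares / (n - 1)                 # step 5: s2
std_dev = sqrt(variance)                            # s is the square root of s2

print(mean_score, sum(deviations), sum_of_squares, variance, round(std_dev, 2))
```

The sum of the deviations comes out to zero, which is the check mentioned in the note above.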
Correlation and Regression
6.1.1 Sir Francis Galton (1822-1911)
in England conducted some of the earliest investigations making use of
statistical analysis. Galton was concerned with the general question of whether
people of the same family were more alike than people of different families.
Galton needed a method that would describe the degree to which, for example,
heights of fathers and their sons were alike. The method he invented for this
purpose is called correlation (co-relation). With it, Galton could also measure
the degree to which the heights of unrelated men were alike. He could then
compare these two results and thus answer his question.
6.1.2 Galton’s student Karl Pearson
(1857-1936), with Galton’s aid, later developed a formula that yielded a
statistic known as a correlation coefficient. Pearson’s product-moment
coefficient, and other correlation coefficients based on Pearson’s work, have
been widely used in statistical studies in psychology, education, sociology,
medicine, and many other areas.
6.2 Concept of Correlation
6.2.1 In order to compute a
correlation, you must have two variables, with values of one variable (X)
paired in some logical way with values of the second variable (Y). Such an
organization of data is referred to as a bivariate (two-variable) distribution.
A group of people may take two tests, and the score results of both tests can be
paired. Family relationships may also be organized as a bivariate distribution; for example, height of
fathers is one variable, X, and height of sons is another variable, Y.
6.3 Positive Correlation
6.3.1 In the case of a positive
correlation between two variables, high measurements on one variable tend to be
associated with high measurements on the other and low measurements on one with
low measurements on the other. In other words, the two variables vary together
in the same direction. A perfect positive correlation is 1.00. A scatterplot is
used to visualize this relationship with each point in the scatterplot
representing a pair of scores represented on the X and Y axis of the chart. The
line that runs through the points is called a regression line or “line of best
fit.” When there is perfect correlation (±1.00), all points fall exactly on
the line. When the points are scattered away from the line, correlation is less
than perfect and the correlation coefficient falls between .00 (No correlation)
and 1.00 (Perfect correlation). It was when Galton cast his data in the form of
a scatterplot that he conceived the idea of a correlationship between the
variables. It is from the term regression that we get the symbol r for
correlation. Galton chose the term regression because it was descriptive of a
phenomenon that he discovered in his data on inheritance. He found, for
example, that tall fathers had sons somewhat shorter than themselves and that
short fathers had sons somewhat taller than themselves. From such data, he
conceived his “law of universal regression,” which states that there exists a
tendency for each generation to regress, or move toward, the mean of the general
population.
6.3.2 Today, the term regression also
has a second meaning. It refers to a statistical method that is used to fit a
straight line to bivariate data and to predict scores on one variable from
scores on a second variable.
6.3.3 It is not necessary that the numbers
on the two variables be exactly the same in order to have perfect correlation.
It is enough, for example, that the differences between pairs of scores all be the
same. More generally, the relationship must be such that all points in a scatterplot lie
on the regression line. If this requirement is met, correlation will be
perfect, and an exact prediction can be made.
6.3.4 Nature, of course, is not so
accommodating as to permit such perfect prediction, at least at science’s
present state of knowledge. People cannot predict their sons’ heights
precisely. The points do not all fall on the regression line; some miss it
badly. However, as Galton found, there is some positive relationship; the
correlation coefficient between father and son height is r=.50. The correlation
between math and reading skills is r=.54. Predictions made from these
correlations although far from perfect would be far better than a random guess.
6.4 Negative Correlation
6.4.1 Negative correlation occurs
where high scores of one variable are associated with low scores of the other.
The two variables thus tend to vary together but in opposite directions. The regression line runs from the upper
left of the graph to the lower right. Negative correlation could be changed to
positive by changing the type of score plotted on one of the variables.
6.4.2 Perfect negative correlation
exists, as does perfect positive correlation, when all points are on the
regression line. The correlation coefficient in such a case is –1.00. For
example, there is a perfect negative relationship between the amount of money
in your checking account and the amount of money you have written checks for (if
you ignore service charges and deposits). As the amount of money you write
checks for increases, your balance decreases by exactly the same amount.
6.4.3 Other examples of negative
correlation (less than perfect) are:
Temperature and inches of snow at the top of a mountain, measured at noon each day in May
Hours of sunshine and inches of rainfall per day at Miami, Florida
Number of pounds lost and number of calories consumed per day by a person on a strict diet
6.4.4 Negative correlation permits
prediction in the same way that positive correlation does. With correlation,
positive is not better than negative. In both cases, the size of the
correlation coefficient indicates the strength of the relationship: the larger
the absolute value of the number, the stronger the relationship. The algebraic
sign (+ or -) indicates the direction of the relationship.
6.5 Zero Correlation
6.5.1 A zero correlation means that
there is no relationship between the two variables. High and low scores on the
two variables are not associated in any predictable manner. In the case of zero
correlation, the best prediction from any X score is the mean of the Y scores.
The regression line, then, runs parallel to the X axis at the height of Y(mean) on
the Y axis.
6.6 Computation of the correlation
6.7 Computational Formulas
6.7.2 Blanched Formula
This procedure requires you to find the means and standard deviations of both X and
Y before computing r.
Formula
r = ((ΣXY / N) - (X(mean) * Y(mean))) / (Sx * Sy)
Variables Defined
XY = Product of each X value multiplied by its paired Y value
X(mean) = Mean of variable X
Y(mean) = Mean of variable Y
Sx = Standard deviation of variable X
Sy = Standard deviation of variable Y
N = Number of pairs of observations
Procedure
1. Multiply each paired X and Y score
2. Sum the products of X*Y
3. Divide the summed products of X*Y by the number of paired scores (N)
4. Multiply the mean of the X scores X(mean) by the mean of the Y scores Y(mean)
5. Subtract the product of X(mean)*Y(mean) from the quotient of step 3
6. Multiply the standard deviation of X scores Sx by the standard deviation of Y scores Sy
7. Divide the result of step 5 by the product of step 6
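The seven steps can be sketched in Python as shown below. Two things here are assumptions rather than statements from the text: the paired scores are hypothetical, and Sx and Sy are computed with N (not N - 1) in the denominator, which is the choice that makes this formula agree with the raw-score formula.

```python
from statistics import mean, pstdev

def r_blanched(xs, ys):
    """Pearson r from means and standard deviations (S computed with N in the denominator)."""
    if len(xs) != len(ys):
        raise ValueError("X and Y must be paired")
    n = len(xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))     # steps 1-2
    mean_xy = sum_xy / n                            # step 3
    product_of_means = mean(xs) * mean(ys)          # step 4
    numerator = mean_xy - product_of_means          # step 5
    denominator = pstdev(xs) * pstdev(ys)           # step 6 (assumed N-denominator S)
    return numerator / denominator                  # step 7

# Hypothetical paired scores for illustration only.
x_scores = [1, 2, 3, 4, 5]
y_scores = [2, 4, 5, 4, 5]

r_value = r_blanched(x_scores, y_scores)
print(round(r_value, 4))
```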
6.7.3 Raw Score Formula
With this formula, you start with the raw scores and obtain r without having to
compute means and standard deviations.
Formula
r = (N*ΣXY - (ΣX)*(ΣY)) / SQUARE ROOT([N*ΣX^2 - (ΣX)^2] * [N*ΣY^2 - (ΣY)^2])
Variables Defined
XY = Product of each X value multiplied by its paired Y value
ΣX = Sum of the X scores
ΣY = Sum of the Y scores
N = Number of pairs of observations
Procedure
1. Multiply each paired X and Y score
2. Sum the products of X*Y
3. Multiply the summed products by the number of paired observations.
4. Sum the X scores
5. Sum the Y scores
6. Multiply the summed X scores by the summed Y scores
7. Subtract the product of step 6 (summed X scores*summed Y scores) from the product of step 3 (summed products*N)
8. Square each X score (X^2) and sum the products
9. Multiply the product of step 8 (ΣX^2) by the number of paired scores.
10. Sum the X scores and square the sum (ΣX*ΣX) or (ΣX)^2.
11. Subtract the product of step 10 ((ΣX)^2) from the product of step 9 (N*ΣX^2)
12. Square each Y score (Y^2) and sum the products
13. Multiply the product of step 12 (ΣY^2) by the number of paired scores.
14. Sum the Y scores and square the sum (ΣY*ΣY) or (ΣY)^2.
15. Subtract the product of step 14 ((ΣY)^2) from the product of step 13 (N*ΣY^2)
16. Multiply the product of step 15 [N*ΣY^2 - (ΣY)^2] by the product of step 11 [N*ΣX^2 - (ΣX)^2]
17. Take the square root of step 16, SQUARE ROOT([N*ΣX^2 - (ΣX)^2] * [N*ΣY^2 - (ΣY)^2])
18. Divide the product of step 7 [(N*ΣXY) - ((ΣX)*(ΣY))] by the product of step 17
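The eighteen steps reduce to a handful of sums, as the following Python sketch shows. The function name and the paired scores are hypothetical; the comments point back to the step numbers in the procedure above.

```python
from math import sqrt

def r_raw_score(xs, ys):
    """Pearson r computed directly from raw scores, without means or standard deviations."""
    if len(xs) != len(ys):
        raise ValueError("X and Y must be paired")
    n = len(xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))     # steps 1-2
    sum_x, sum_y = sum(xs), sum(ys)                 # steps 4-5
    sum_x2 = sum(x * x for x in xs)                 # step 8
    sum_y2 = sum(y * y for y in ys)                 # step 12
    numerator = n * sum_xy - sum_x * sum_y          # steps 3, 6, 7
    x_term = n * sum_x2 - sum_x ** 2                # steps 9-11
    y_term = n * sum_y2 - sum_y ** 2                # steps 13-15
    return numerator / sqrt(x_term * y_term)        # steps 16-18

# Hypothetical paired scores for illustration only.
x_scores = [1, 2, 3, 4, 5]
y_scores = [2, 4, 5, 4, 5]

r_value = r_raw_score(x_scores, y_scores)
print(round(r_value, 4))
```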
6.8 The Meaning Of r
6.8.1 r is a descriptive statistic
or summary index number, like the mean and standard deviation and is used to
describe a set of data.
6.8.2 A correlation coefficient is a
measure of the relationship between two variables. It describes the tendency
of two variables to vary together (covary); that is, it describes the tendency
of high or low values of one variable to be regularly associated with either
high or low values of the other variable. The absolute size of the coefficient
(from 0 to 1.00) indicates the strength of that tendency to covary.
6.8.4 The above scatterplot shows the
correlational relationships of r=.20, .40, .60, and .80. Notice that as the
size of the correlation coefficient gets larger, the points cluster more and
more closely to the regression line; that is, the envelope containing the
points becomes thinner and thinner. This means that a stronger and stronger tendency
to covary exists as r becomes larger and larger. It also means that predictions
made about values of the Y variable from values of the X variable will be more
accurate when r is larger.
6.8.5 The algebraic sign tells the
direction of the covariation. When the sign is positive, high values of X are
associated with high values of Y, and low values of X are associated with low
values of Y. When the sign is negative, high values of X are associated with
low values of Y, and low values of X are associated with high values of Y.
Knowledge of the size and direction of r, then, permits some prediction of the
value of one variable if the value of the other variable is known.
6.8.6 Correlation vs. Causation
A correlation coefficient does not tell you whether or not one of the variables
is causing the variation in the other. Quite possibly some third variable is
responsible for the variation in both.
A correlation coefficient alone cannot establish a causal relationship.
6.8.7 Coefficient of Determination
The coefficient of determination is an overall index that specifies the proportion of variance that two
variables have in common.
Formula
COD = r^2
Variables Defined
COD = Coefficient of Determination
r = Pearson product-moment correlation coefficient
Procedure
1. Multiply r * r (r^2)
It could be argued that the proportion of variance the two variables have in
common can be attributed to the same cause, or that this is the proportion of
variance which adheres most closely to the regression line.
Note what happens to a fairly strong correlation of .70 when it is interpreted in
terms of variance. Only 49% of the variance is held in common.
The coefficient is useful in comparing correlation coefficients. When one compares
an r of .80 with an r of .40, the tendency is to think of the .80 as being
twice as high as .40, but that is not the case. Correlation coefficients are
compared in terms of the amount of common variance: .80^2 = .64, .40^2 = .16, and
.64/.16 = 4. Thus, two variables that are correlated with r=.80 have four times as
much common variance as two variables correlated with r=.40.
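The arithmetic in this comparison is easy to verify. The short Python sketch below uses the three r values quoted in the text; the function name is an assumption.

```python
def coefficient_of_determination(r):
    """Proportion of variance two variables have in common (r squared)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1.00 and +1.00")
    return r * r

cod_70 = coefficient_of_determination(0.70)
cod_80 = coefficient_of_determination(0.80)
cod_40 = coefficient_of_determination(0.40)
ratio = cod_80 / cod_40   # how many times more common variance r=.80 has than r=.40

print(round(cod_70, 2), round(cod_80, 2), round(cod_40, 2), round(ratio, 2))
```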
6.8.8 Practical Significance of r
How high must a correlation coefficient be before it is of use? How low must it be
before we conclude it is useless? Correlation is useful if it improves
prediction over guessing. In this sense, any reliable correlation other than
zero, whether positive or negative, is of some value because it will reduce to
some extent the incorrect predictions that might otherwise be made. Very low
correlations allow little improvement over guessing in prediction. Such poor
prediction usually is not worth the costs involved in practical situations.
Generally, researchers are satisfied with lower correlations in theoretical
work but require higher ones in practical situations.
6.9 Correlation and Linearity
6.9.1 For r to be a meaningful
statistic, the best fitting line through the scatterplot of points must be a
straight line. If a curved regression line fits the data better than a straight
line, r will be low, not reflecting the true relationship between the two
variables. The product-moment correlation coefficient is not appropriate as a
measure of curved relationships. Special non-linear correlation techniques for
such relationships do exist and are described elsewhere.
6.10 Other Kinds of Correlation Coefficients
Correlations may be computed on data for which one or both of the variables are dichotomous
(having only two possible values). An example is the correlation of the
dichotomous variable sex and the quantitative variable grade-point average.
Several variables can be combined, and the resulting combination can be correlated with
one variable. With this technique, called multiple correlation, a more precise
prediction can be made. Performance in school or on the job can usually be
predicted better by using several measures of a person rather than just one.
A technique called partial correlation allows you to separate or partial out the
effects of one variable from the correlation of two other variables. For example,
if we want to know the true correlation between achievement-test scores in two
school subjects, it will probably be necessary to partial out the effects of
intelligence, since IQ and achievement are correlated.
for Ranked data
A rank-order correlation is used when the data are ranks rather than raw scores.
When the relationship between two variables is curved rather than linear, the
correlation ratio, eta, gives the degree of association.
statistic text books
The above correlation techniques are covered in intermediate-level textbooks.
6.11 Correlation and Regression
6.12 Regression Equation
Formula
Y’ = a + bX
Variables Defined
Y’ = the Y value predicted from a particular X value (Y’ is pronounced “Y prime”)
a = point at which the regression line intersects the Y axis
b = slope of the regression line; that is, the amount Y is increasing for each
increase of one unit in X
X = X value used to predict Y’
The symbols X
and Y can be assigned arbitrarily in correlation, but, in a regression
equation, Y is assigned to the variable you wish to predict. To make
predictions of Y using the regression equation, you need to calculate the
values of the constants a and b, which are called regression coefficients.
Formula for b
b = r * (Sy / Sx)
r = correlation coefficient for X and Y
Sy = the standard deviation of the Y variable
Sx = the standard deviation of the X variable
For positive correlation b will be a positive number. For negative correlation
b will be negative.
Formula for a
a = Y(mean) - (b * X(mean))
Y(mean) = mean of the Y scores
b = regression coefficient computed above
X(mean) = mean of the X scores
Procedure for b
1. Divide Sy (standard deviation of Y) by Sx (standard deviation of X)
2. Multiply the quotient of step 1 (Sy/Sx) by r (correlation coefficient
for X and Y)
Procedure for a
1. Multiply X(mean) (mean of the X scores) by b (regression coefficient computed above)
2. Subtract the product
of the previous step from Y(mean)
(mean of the Y scores)
Procedure for the predicted Y score
1. Multiply X (value used to predict Y’) by b (calculated above)
2. Add the product of the previous step to a (calculated above)
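The procedures for b, a, and the predicted Y score can be put together in one short Python sketch. The paired scores and function names are hypothetical, and the standard deviations are assumed to be computed with N in the denominator (the slope b is unaffected as long as Sx and Sy use the same denominator).

```python
from statistics import mean, pstdev

def regression_coefficients(xs, ys):
    """Return (a, b) for the regression line Y' = a + bX."""
    n = len(xs)
    sx, sy = pstdev(xs), pstdev(ys)                     # assumed N-denominator S
    mean_xy = sum(x * y for x, y in zip(xs, ys)) / n
    r = (mean_xy - mean(xs) * mean(ys)) / (sx * sy)     # correlation coefficient
    b = r * (sy / sx)                                   # slope
    a = mean(ys) - b * mean(xs)                         # Y intercept
    return a, b

def predict_y(x, a, b):
    """Predicted Y score for a given X score."""
    return a + b * x

# Hypothetical paired scores for illustration only.
x_scores = [1, 2, 3, 4, 5]
y_scores = [2, 4, 5, 4, 5]

a, b = regression_coefficients(x_scores, y_scores)
y_prime = predict_y(6, a, b)
print(round(a, 2), round(b, 2), round(y_prime, 2))
```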
a Regression Line
a Y Score
6.13 Rank Order Correlation
6.14 r Distribution Tables
7.1 A raw score does not reveal its
relationship to other scores and must be transformed into a score that reveals
these relationships. There are two types of score transformations; percentile
ranks and linear transformations.
7.2.1 A relationship between scores is
revealed, increasing the amount of information for analytical interpretation.
7.2.2 Allows two scores to be
7.3 Percentile Ranks Based On The
7.3.1 The percentile rank is the
percentage of scores that fall below a given score plus one-half the percentage of scores that fall at that score.
1. Rank the scores from lowest to highest and determine the total number of scores.
Unranked: 33 28 29 37 31 33 25 33 29 32 35
Ranked: 25 28 29 29 31 32 33 33 33 35 37; total
number of scores=11
2. Count the number of scores falling below the selected score.
Example Number=31
Number of scores below=4
3. Calculate the percentage of scores which fall below the selected score by dividing the
number of scores below by the total number of scores and multiplying by 100. (4/11)*100=36.36%
4. Calculate the percentage of scores which fall at the selected score by dividing that
number by the total number of scores and multiplying by 100. (1/11)*100=9.09%
5. Divide the percentage of scores at the selected score by 2 and add the result to the
percentage of scores below the selected score.
This would mean that the percentile rank of the score of 31 is 40.91%
(4 of the 11 scores fall below 31 and 1 of the 11 falls at 31).
Summary of Process
1. Rank the scores from lowest to highest
2. Add the percentage of scores that fall
below the score to one-half the percentage of scores that fall at the score.
3. The result is the percentile rank of that score.
For the score of 33, this would mean a percentile rank of 68.18%
(6 of the 11 scores fall below 33 and 3 of the 11 fall at 33).
of the algebraic procedure applied to the selected numbers of 31 and 33.
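The procedure can be checked with a short Python sketch applied to the eleven scores listed above and the selected scores of 31 and 33. The function name is an assumption; the data are the scores given in the text.

```python
def percentile_rank(scores, selected):
    """Percentage of scores below the selected score plus half the percentage at it."""
    n = len(scores)
    below = sum(1 for s in scores if s < selected)
    at = sum(1 for s in scores if s == selected)
    pct_below = below / n * 100
    pct_at = at / n * 100
    return pct_below + pct_at / 2

# The eleven scores from the example above (ranking is not needed for counting).
scores = [33, 28, 29, 37, 31, 33, 25, 33, 29, 32, 35]

rank_31 = percentile_rank(scores, 31)
rank_33 = percentile_rank(scores, 33)
print(round(rank_31, 2), round(rank_33, 2))
```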
7.4 Percentile Ranks Based On The
8.2 ADDITIVE COMPONENT
8.3 MULTIPLICATIVE COMPONENT
8.4 LINEAR TRANSFORMATIONS - EFFECT
ON MEAN AND STANDARD DEVIATION
8.5 LINEAR TRANSFORMATIONS - FINDING
a AND b GIVEN X (sample mean AND sX'
8.6 STANDARD SCORES OR Z-SCORES
(Mean (Post Score) – Mean (Pre Score))/(Standard Deviation (Pre Score)/(SQRT Count))
8.7 CONCLUSION AND SUMMARY
9.1 Definition of Inferential
9.1.1 Inferential statistics are
concerned with decision-making. Usually, the decision is whether the difference
between two samples is probably due to chance or probably due to some other
factor. Inferential statistics help you make a decision by giving you the
probability that the difference is due to chance. If the probability is very
high a decision that the difference is due to chance is supported. If the
probability is very low, a decision that the difference is due to some other
factor is supported. Descriptive statistics are also used in these
9.2.1 Distributions from observed
scores are called empirical distributions
9.2.2 Theoretical distributions are
based on mathematical formulas and logic rather than on empirical observations.
The probability that the event was due to chance is found by using a theoretical distribution.
9.2.3 Probability of the occurrence of
any event ranges from .00 (there is no possibility that the event will occur)
to 1.00 (the event is certain to happen). Theoretical distributions are used to
find the probability of an event or a group of events.
9.3 Rectangular Distribution
9.3.1 The Histogram below is a
theoretical frequency distribution that shows the types and number of cards in
an ordinary deck of playing cards. Since there are 13 kinds of cards, and the
frequency of each card is four, the theoretical curve is rectangular in shape.
(The line that encloses a frequency polygon is called a curve, even if it is
straight.) The number in the area above each card is the probability of
obtaining that card in a chance draw from the deck. That probability (.077) was
obtained by dividing the number of cards that represent the event (4) by the
total number of cards (52)
9.3.2 Illustration Theoretical Card
9.3.3 Probabilities are often stated
as “chances in a hundred.” The expression p=.077 means that there are 7.7
chances in 100 of the event in question occurring. Thus from the illustration above
you can tell at a glance that there are 7.7 chances in 100 of drawing an ace
from a deck of cards.
9.3.4 With this theoretical
distribution, you can determine other probabilities. Suppose you wanted to know
your chances of drawing a face card or a 10. These are the darkened areas
above. Simply add the probabilities associated with a 10, jack, queen, and
king. Thus, .077 + .077 + .077 + .077 = .308, which means you have 30.8 chances in
100 of drawing one of these face cards or a 10.
9.3.5 One property of the distribution
above is true for all theoretical distributions in that the total area under
the curve is 1.00. In the above illustration there are 13 kinds of events, each
with a probability of .077. Thus, (13)(.077)=1.00 after rounding. With this arrangement, any
statement about area is also a statement about probability. Of the total area
under the curve, the proportion that signifies “ace” is .077, and that is also
the probability of drawing an ace from the deck.
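The card probabilities above can be reproduced with exact fractions, which also shows why the total area is exactly 1.00 even though 13 times the rounded value .077 is slightly more. This is a minimal sketch; the variable names are assumptions.

```python
from fractions import Fraction

CARDS_PER_KIND = 4
TOTAL_CARDS = 52
KINDS = 13

# Probability of drawing any one kind of card (for example, an ace).
p_one_kind = Fraction(CARDS_PER_KIND, TOTAL_CARDS)

# Probability of drawing a 10, jack, queen, or king: add the four probabilities.
p_ten_or_face = 4 * p_one_kind

# Total area under the rectangular distribution.
total_area = KINDS * p_one_kind

print(round(float(p_one_kind), 3), round(float(p_ten_or_face), 3), total_area)
```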
9.4 Binomial Distribution
9.4.1 The Binomial (two names) is
another example of a theoretical distribution.
9.5 Comparison of Theoretical and
9.5.1 A theoretical curve represents
the “best estimate” of how the events would actually occur. As with all
estimates, the theoretical curve is somewhat inaccurate; but in the world of
real events it is better than any other estimate. A theoretical distribution is
one based on logic and mathematics rather than on observations. It shows you
the probability of each event that is part of the distribution. When it is
similar to an empirical distribution, the probability figures obtained from the
theoretical distribution are accurate predictors of actual events.
9.5.2 There are a number of
theoretical distributions that applied statisticians have found useful. (normal
distribution, t distribution, F distribution, chi square distribution, and U
9.6 The Normal Distribution
9.6.1 Early statisticians, who found
that frequency distributions of data gathered from a wide variety of fields
were similar, established the name normal distribution.
9.6.2 The normal distribution is
sometimes called the Gaussian distribution after Carl Friedrich Gauss
(1777-1855), who developed the curve (about 1800) as a way to represent the
random error in astronomy observations. Because this curve was such an accurate
picture of the effects of random variation, early writers referred to the curve
as the law of error.
9.6.3 Description of the Normal
normal distribution is a bell-shaped, symmetrical distribution, a theoretical
distribution based on a mathematical formula rather than on any empirical
observations although empirical curves often look similar to this theoretical
distribution. Empirical distributions usually start to look like the normal
distribution after 100 or more observations. When the theoretical curve is
drawn, the Y-axis is usually omitted. On the X-axis, z scores are used as the
unit of measurement for the standardized norm curve with the following formula.
The mean, median, and mode are the same score: the score on the X-axis at which the
curve is at its peak. If a line were drawn from the peak to the mean score on
the X-axis, the area under the curve to the left of the line would be half the
total area (50%), leaving half the area to the right of the line. The tails of the
curve are asymptotic to the X-axis; that is, they never actually cross the
axis but continue in both directions indefinitely, with the distance between the
curve and the X-axis getting less and less. Although theoretically the curve
never ends, it is convenient to think of (and to draw) the curve as extending
from -3 to +3 on the z-score scale.
The two inflection points in the curve are at exactly z = -1 and z = +1. An
inflection point is where a curve changes from bowed down to bowed up, or vice
versa.
Distributions that are not normal distributions are definitely not abnormal but simply
reflect how data is distributed. The use of the word normal is meant to imply
9.6.4 Use of the Normal Distribution
The theoretical normal distribution is used to determine the probability of an
event; the figure below illustrates the probabilities associated with
certain areas under the curve. The web link below can calculate the area between the mean
and a z score: enter a mean of 0 in the box to the left of the
first applet and the z score in the box to the right, then click "between" for the area
between the mean and the z score. These
probabilities are also found in tables in the back of most statistics textbooks.
Normal Distribution Link
of Normal Distribution
Any normally distributed empirical distribution can be made to correspond to the
standardized normal distribution (a theoretical distribution) by using z
scores. Converting the raw scores of any empirical normal distribution to z
scores will give the distribution a mean equal to zero and a standard deviation
equal to 1.00 and that is exactly the scale used in the theoretical normal
distribution. With this correspondence established, the theoretical normal
distribution can be used to determine the probabilities of empirical events,
whether they are IQ scores, tree diameters, or hourly wages.
9.6.5 Finding What Proportion of a
Population has Scores of a Particular Size or Greater
Step 1: Convert Raw Scores to z Scores
Formula
$z = (x - \mu) / \sigma$
Variables Defined
(z)=z score
($\sigma$)=standard deviation of scores
($\mu$)=mean of scores
(x)=individual raw score
Procedure
Find the difference between the raw score and the mean
Divide that difference by the standard deviation
Step 2: Find the proportion of the distribution between the mean and the z score. (This
gives you the proportion from the mean.)
You can look this up in the back of a statistics textbook in the table for areas
under the normal curve between the mean and z.
Web Reference: The web link below can calculate the area between the mean and the
z score. Enter a mean of 0 in the box to the left of the first applet and the z score
in the box to the right, then click "between".
Step 3: Subtract the proportion between the mean and your z score from .5000
.5000 or 50% of the curve lies to the right of the mean, and the proportion you
found in step #2 above is the proportion between the mean and the z score.
The difference is the proportion above your z score, that is, the proportion of
scores expected to be found above your raw score.
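The three steps above can be sketched in a few lines of Python. This is only an illustration, not part of the original notes: the function name `proportion_above` and the IQ-style mean of 100 and standard deviation of 15 are assumptions chosen for the example, and `statistics.NormalDist` from the standard library stands in for the textbook table or the web applet.

```python
from statistics import NormalDist

def proportion_above(x, mean, sd):
    """Proportion of a normal population expected to score at or above x."""
    z = (x - mean) / sd                        # Step 1: raw score -> z score
    between = abs(NormalDist().cdf(z) - 0.5)   # Step 2: area between the mean and z
    # Step 3: subtract from .5000 when the score is above the mean;
    # when the score is below the mean, the area is added to .5000 instead.
    return 0.5 - between if z >= 0 else 0.5 + between

# Hypothetical example: scores with a mean of 100 and a standard deviation of 15
print(round(proportion_above(130, 100, 15), 4))
```

A score two standard deviations above the mean leaves a little over 2% of the population above it, which matches the tabled value for z = 2.00.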
9.6.6 Finding the Score that Separates
the Population into Two Proportions
Instead of starting with a score and calculating proportions, you can also work
backward and answer questions about scores if you are given proportions. If, for
example, you want to find the score required to be in the top 10% of the
population, follow the procedure below.
Formula
$x = \mu + z\sigma$
Variables Defined
(z)=z score
($\sigma$)=standard deviation of scores
($\mu$)=mean of scores
(x)=individual raw score
Procedure
Step 1: Find the difference between the chosen proportion and .5000. For example,
.5000-.1000=.4000 (if you wanted to find the z score that separates the upper 10%
of the distribution from the rest).
Step 2: Use the result of step #1 to find the z score for the above equation. In a
statistics textbook, use the table for areas under the normal curve between the
mean and z: look up the difference from the previous step, or its closest
approximation, and read off the associated z score. You can also use the web
reference below to find the z score.
Web Reference: The 2nd applet at the web reference below gives you the z score to be
used in the equation above. Plug in a mean of 0 and an SD (standard deviation) of 1,
enter the proportion as a decimal (e.g., .10=10%, .20=20%) in the shaded area box, and
click the "above" button to obtain the z score.
Step 3: Plug the z score found in step #2 above into the equation above to find the
raw score that separates the two proportions.
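As a rough sketch of this backward procedure, the Python below finds the raw score that cuts off a chosen top proportion. The function name `cutoff_score` and the example mean of 100 and standard deviation of 15 are assumptions made for illustration; `NormalDist().inv_cdf` plays the role of looking up the z score in the table.

```python
from statistics import NormalDist

def cutoff_score(top_proportion, mean, sd):
    """Raw score that separates the top `top_proportion` from the rest."""
    between = 0.5 - top_proportion             # Step 1: e.g. .5000 - .1000 = .4000
    z = NormalDist().inv_cdf(0.5 + between)    # Step 2: z with that area between mean and z
    return mean + z * sd                       # Step 3: x = mean + z * sd

# Hypothetical example: score needed to be in the top 10%
print(round(cutoff_score(0.10, 100, 15), 2))
```

For the top 10% the z score is about 1.28, so the cutoff lies a little more than one and a quarter standard deviations above the mean.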
9.6.7 Finding the Proportion of the Population between Two Scores
Step 1: Convert Raw Scores to z Scores
Formula
$z = (x - \mu) / \sigma$
Variables Defined
(z)=z score
($\sigma$)=standard deviation of scores
($\mu$)=mean of scores
(x)=individual raw score
Procedure (repeat for each of the two raw scores)
Find the difference between the raw score and the mean
Divide that difference by the standard deviation
Step 2: Find the proportion of the distribution between the mean and the z score. (This
gives you the proportion from the mean) for each of the z scores above.
You can look this up in the back of a statistics textbook in the table for areas
under the normal curve between the mean and z.
Web Reference: The web link below can calculate the area between the mean and the
z score. Enter a mean of 0 in the box to the left of the first applet and the z score
in the box to the right, then click "between".
Step 3: Combine the proportions to find the Proportion of the Population between Two Scores.
Add the two proportions when the scores lie on opposite sides of the mean; subtract
the smaller from the larger when both scores lie on the same side of the mean.
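The same result can be sketched in Python by taking the difference of two cumulative areas, which is equivalent to adding or subtracting the mean-to-z areas described above. The function name `proportion_between` and the example values are assumptions for illustration only.

```python
from statistics import NormalDist

def proportion_between(x_low, x_high, mean, sd):
    """Proportion of a normal population with scores between x_low and x_high."""
    std_normal = NormalDist()            # mean 0, standard deviation 1
    z_low = (x_low - mean) / sd          # convert each raw score to a z score
    z_high = (x_high - mean) / sd
    # Area below the higher score minus area below the lower score
    return std_normal.cdf(z_high) - std_normal.cdf(z_low)

# Hypothetical example: scores within one standard deviation of the mean
print(round(proportion_between(85, 115, 100, 15), 4))
```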
9.6.8 Finding the Extreme Scores in a Population
This section outlines how to find the extreme scores that cut off a given
percentage of the population, split equally between the two tails of the distribution.
Formula
$x = \mu \pm z\sigma$
Variables Defined
(z)=z score
($\sigma$)=standard deviation of scores
($\mu$)=mean of scores
(x)=individual raw score
Procedure
Step 1: Divide the percentage by 2
Step 2: Find the difference between .5000 and the result of step #1
Step 3: Find the z score associated with the proportion from the previous step
Step 4: Plug the z score into the above equation
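A minimal Python sketch of these four steps follows. The function name `extreme_scores` and the example mean and standard deviation are assumptions made for illustration, and the table lookup is replaced by `NormalDist().inv_cdf`.

```python
from statistics import NormalDist

def extreme_scores(total_proportion, mean, sd):
    """Lower and upper raw scores that cut off total_proportion, half in each tail."""
    tail = total_proportion / 2                 # Step 1: split between the two tails
    between = 0.5 - tail                        # Step 2: area between the mean and z
    z = NormalDist().inv_cdf(0.5 + between)     # Step 3: z score for that area
    return mean - z * sd, mean + z * sd         # Step 4: x = mean -/+ z * sd

# Hypothetical example: the most extreme 5% (2.5% in each tail)
low, high = extreme_scores(0.05, 100, 15)
print(round(low, 2), round(high, 2))
```

With 2.5% in each tail the z score is about 1.96, the same value used later for 95% confidence intervals.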
9.7 Comparison of Theoretical and
9.7.1 The accuracy of predictions
based on a theoretical normal distribution will depend on how representative
the empirical sample is, as discussed in the next section.
An understanding of sampling distributions requires an understanding of samples. A
sample, of course, is some part of the whole thing; in statistics the “whole
thing” is a population. The population is always the thing of interest; a
sample is used only to estimate what the population is like. One obvious
problem is to get samples that are representative of the population.
Samples that are random have the best chance of being representative, and a sampling
distribution can tell you how much faith (probability-wise) you can put in
results based on a random sample.
A population means all the members of a specified group. Sometimes the population is one
that could actually be measured, given plenty of time and money. Sometimes,
however, such measurements are logically impossible. Inferential statistics are
used when it is not possible or practical to measure an entire population.
By using samples and the methods of inferential statistics, you can make decisions about
immeasurable populations. Unfortunately, there is some peril in this. Samples
are variable, changeable things. Each one produces a different statistic. How
can you be sure that the sample you draw will produce a statistic that will
lead to a correct decision about the population? Unfortunately, you cannot be
absolutely sure. To draw a sample is to agree to accept some uncertainty about
the results. However it is possible to measure this uncertainty. If a great
deal of uncertainty exists, the sensible thing to do is suspend judgment. On
the other hand, if there is very little uncertainty, the sensible thing to do
is reach a conclusion, even though there is a small risk of being wrong.
Restated, you must introduce a
hypothesis about a population and then, based on the results of a sample,
decide that the hypothesis is reasonable or that it should be rejected.
10.2 Representative and Nonrepresentative Samples
If you want to know about an unmeasurable population, you have to draw a
representative sample by using a method of obtaining samples that is more
likely to produce a representative sample than any other method. How well a
particular method works can be assessed either mathematically or empirically.
For an empirical assessment, start with a population of numbers, the parameter
of which can be easily calculated. The particular method of sampling is
repeatedly used, and the corresponding statistic calculated for each sample.
The mean of these sample statistics can then be compared with the parameter.
We will name two methods of sampling that are most likely to produce a
representative sample, discuss one of them in detail, and then discuss some
ways in which nonrepresentative samples are obtained when the sampling method
is biased.
A method called random sampling is commonly used to obtain a sample that is most
likely to be representative of the population. Random has a technical meaning
in statistics and does not mean haphazard or unplanned. A random sample in most
research situations is one in which every potential sample of size N has an
equal probability of being selected. To obtain a random sample, you must
1. Define the population of scores
2. Identify every member of the population
3. Select scores in such a way that every sample has an equal probability of being chosen
One method is to assign each score a number and use the random number generator
below to pick your sample of numbers.
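For readers without the random number generator at hand, the idea can be sketched with Python's standard `random` module. The 24 scores below are hypothetical placeholder values, not the original data, and the seed is fixed only so the run is repeatable. `random.sample` draws without replacement, so every possible set of seven identifying numbers is equally likely.

```python
import random

# Hypothetical population of 24 scores (placeholder values, not the original data)
population = [28, 31, 24, 39, 42, 36, 32, 40, 27, 24, 38, 31,
              35, 44, 33, 29, 37, 41, 35, 30, 46, 34, 26, 43]

rng = random.Random(2004)            # fixed seed: an assumption for repeatability

# Identify every member with a number from 1 to 24, then draw seven
# identifying numbers without replacement.
ids = rng.sample(range(1, len(population) + 1), k=7)
sample = [population[i - 1] for i in ids]

print(sorted(ids))
print(sample)
```

Drawing identifying numbers rather than the scores themselves mirrors the table-of-random-numbers approach: a score that occurs twice in the population still has two separate chances of being chosen.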
We will go through these steps with a set of real data: the self-esteem scores of 24
fifth-grade children. We define these 24 scores as our population. From these
we will pick a random sample of seven scores.
One method of picking a random sample is to write each self-esteem score on a
slip of paper, put the 24 slips in a box, jumble them around, and draw out
seven. The scores on the chosen slips become a random sample. This method works
fine if the slips are all the same size and there are only a few members of the
population. If there are many members, this method is tedious.
Another (easier) method of getting a random sample is to use a table of random numbers,
such as Table B in the Appendix. To use the table, you must first assign an
identifying number to each of the 24 self-esteem scores, thus:
Each score has been identified with a two-digit number. Now turn to Table B and
pick a row and a column in which to start. Any haphazard method will work;
close your eyes and stab a place with your finger. Suppose you started at row
35, columns 70-74. Reading horizontally, the digits are 21105. Since you need
only two digits to identify any member of our population, use the first two
digits, 21. That identifies one score for the sample-a score of 46. From this
point, you can read two-digit numbers in any direction-up, down, or
sideways-but the decision should have been made before you looked at the
numbers. If you had decided to go down, the next number is 33. No self-esteem
score has an identifying number of 33, so skip it and go to 59, which gives you
the same problem as 33. In fact, the next five numbers are too large. The
sixth number is 07, which identifies the score of 32 for the random sample. The
next usable number is 13, a score of 35. Continue in this way until you arrive
at the bottom. At this point, you can go in any direction. We will skip over
two columns to columns 72 and 73 (you were in columns 70 and 71) and start up.
The first number is 12, which identifies a score of 31. The next usable numbers
are 19, 05, and 10, giving scores of 35, 42, and 24. Thus, the random sample of
seven consists of the following scores: 46, 32, 35, 31, 35, 42, and 24. If
Table B had produced the same identifying number twice, you would have ignored
it the second time.
What is this table of random numbers? In Table B (and in any table of random
numbers), the probability of occurrence of any digit from 0 to 9 at any place
in the table is the same: .10. Thus, you are just as likely to find 000 as 123
or 381. Incidentally, you cannot generate random numbers out of your head.
Certain sequences begin to recur, and (unless warned) you will not include
enough repetitions like 666 and 000.
Here are some hints for using a table of random numbers.
1. Make a check beside the identifying number of a score when it
is chosen for the sample. This will help prevent duplications.
2. If the population is large (over 100), it is more efficient to
get all the identifying numbers from the table first. As you select them, put
them in some rough order. This will help prevent duplications. After you have
all the identifying numbers, go to the population to select the sample.
3. If the population has exactly 100 members, let 00 be the
identifying number for 100. In this way, you can use two-digit identifying numbers,
each one of which matches a population score. This same technique can be
applied to populations of 10 or 1000 members.
A method called stratified sampling is another way to produce a sample that is
very likely to mirror the population. It can be used when an investigator knows
the numerical value of some important characteristic of the population. A
stratified sample is controlled so that it reflects exactly some known
characteristic of the population. Thus, in a stratified sample, not everything
is left to chance.
For example, in a public opinion poll on a sensitive political issue, it is
important that the sample reflect the proportions of the population who
consider themselves Democrat, Republican, and Independent. The investigator
draws the sample so it will reflect the proportions found in the population.
The same may be done for variables such as sex, age, and socio-economic status.
After stratification of the samples has been determined, sampling within each
stratum is usually random.
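A minimal sketch of proportional stratified sampling, assuming the stratum sizes are known in advance. The function name `stratified_sample`, the party counts, and the member labels are all hypothetical. Each stratum contributes in proportion to its share of the population, and selection within a stratum is random.

```python
import random

def stratified_sample(members_by_stratum, sample_size, rng):
    """Draw a sample whose strata mirror the population proportions.

    Stratum quotas are rounded, so the total can differ slightly from
    sample_size when the proportions do not divide evenly.
    """
    total = sum(len(members) for members in members_by_stratum.values())
    sample = {}
    for name, members in members_by_stratum.items():
        quota = round(sample_size * len(members) / total)
        sample[name] = rng.sample(members, quota)   # random within the stratum
    return sample

# Hypothetical population of 1000 voters: 50% / 30% / 20%
voters = {
    "Democrat": [f"D{i}" for i in range(500)],
    "Republican": [f"R{i}" for i in range(300)],
    "Independent": [f"I{i}" for i in range(200)],
}
poll = stratified_sample(voters, 50, random.Random(1))
print({party: len(chosen) for party, chosen in poll.items()})
```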
To justify a stratified sample, the investigator must know what variables will
affect the results and what the population characteristics are for those
variables. Sometimes the investigator has this information (as from census
data), but many times such information is just not available (as in most
A biased sample is one that is drawn using a method that systematically
underselects or overselects from certain groups within the population. Thus, in
a biased sampling technique, not every sample of a given size has an
equal opportunity of being selected. With biased sampling techniques, you are
much more likely to get a nonrepresentative sample than you are with random or
stratified sampling techniques.
For example, it is reasonable to conclude that some results based on mailed
questionnaires are not valid because the samples are biased: not all of
the recipients will respond, and those that do may be different from those that
do not. The probability of bias is particularly
high if the questionnaire elicits feelings of pride or despair or disgust or
apathy in some of the recipients.
With a nice random sample, you can predict fairly accurately your chance of being
wrong. If it is higher than you would like, you can reduce it by increasing
sample size. With a biased sample, however, you do not have any basis for
assessing your margin of error and you don’t know how much confidence to put in
your predictions. You may be right or you may be very wrong. You may get
generalizable results from such samples, but you cannot be sure. The search for
biased samples in someone else’s research is a popular (and serious) game among
10.3 Sampling Distributions
The two categories of sampling distributions are: sampling distributions in general
and sampling distributions of the mean.
A sampling distribution is a frequency distribution of sample statistics. A sampling
distribution could be obtained by drawing many random samples from a population
and calculating a statistic on each sample. These statistics would be arranged
into a frequency distribution. From such a distribution you could find the
probability of obtaining any particular value of the statistic.
Every sampling distribution is for a particular statistic (such as the mean,
variance, correlation coefficient, and so forth). In this section you will learn
only about the sampling distribution of the mean. It will serve as an
introduction to sampling distributions in general, some others of which you
will find out about in later sections.
10.4 The Sampling Distribution of the Mean
Sampling Distribution of the Mean
A sampling distribution of the mean is a frequency distribution of sample means
that meets the following conditions:
Every sample is drawn randomly from the same population
The sample size (N) is the same for all samples
The number of samples is very large
The illustration below shows 200 separate random samples, each with N=10, drawn from a
population of 24 self-esteem scores. The mean of each group of 10 was calculated,
and the 200 sample means ($\bar{X}$) were arranged into
a frequency polygon. The mean (a parameter) of the 24 self-esteem scores is
35.375. In the illustration below most of the statistics (sample means) are
fairly good estimates of that parameter. Some of the $\bar{X}$'s, of
course, miss the mark widely; but most are pretty close. The illustration below
is an empirical sampling distribution of the mean. Thus, a sampling
distribution of the mean is a frequency distribution of sample means.
Empirical Sampling Distribution of the Means (Frequency Distribution of Sample Means)
You will never use an empirical sampling distribution of the mean in any of your
calculations; you will always use theoretical ones that come from mathematical
formulas. An empirical sampling distribution of the mean is easier to
understand for illustration purposes.
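An empirical sampling distribution of the mean like the one described above can be simulated in a few lines. The sketch below assumes a made-up population of 24 scores (the original self-esteem data are not reproduced here) and draws 200 random samples of N=10 without replacement, recording the mean of each.

```python
import random
from statistics import mean

rng = random.Random(0)   # fixed seed: an assumption for repeatability

# Hypothetical population of 24 scores (not the original data)
population = [rng.randint(20, 50) for _ in range(24)]

# Draw 200 random samples of N=10 and keep each sample mean
sample_means = [mean(rng.sample(population, 10)) for _ in range(200)]

print("population mean:", round(mean(population), 3))
print("mean of the 200 sample means:", round(mean(sample_means), 3))
print("smallest and largest sample mean:", min(sample_means), max(sample_means))
```

Most of the 200 sample means should land close to the population mean, with a few missing more widely, which is the pattern the frequency polygon is meant to show.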
Central Limit Theorem
For any population of scores, regardless of form, the sampling distribution of the mean
will approach a normal distribution as N (sample size) gets larger. Furthermore,
the sampling distribution of the mean will have a mean equal to $\mu$ and a standard
deviation equal to $\sigma / \sqrt{N}$.
Now you know
not only that sampling distributions of the mean are normal curves but also
that, if you know the population parameters $\mu$ and $\sigma$, you
can determine the parameters of the sampling distribution.
One qualification is that the sample size (N) be large. How many does it take to
make a large sample? The traditional answer is 30 or more, although, if the
population itself is symmetrical, a sampling distribution of the mean will be
normal with sample sizes much smaller than 30. If the population is severely
skewed, samples of 30 (or more) may be required.
The mean of
the sampling distribution of means will be the same as the population mean, $\mu$. The standard deviation of the sampling distribution will be
the standard deviation of the population ($\sigma$)
divided by the square root of the sample size.
The Central Limit Theorem works regardless of the form of the original population. Thus,
the sampling distribution of the mean of scores coming from a rectangular or
bimodal population approaches normal if N is large.
The standard deviation of any sampling distribution is called the standard error, and the
mean is called the expected value. In this context, and in several others in
statistics, the term error means deviations or random variation. Sometimes,
error refers to a mistake, but most often it is used to indicate deviations or
random variation.
In the case
of the sampling distribution of the mean, we are dealing with a standard error
of the mean (symbolized $\sigma_{\bar{X}}$) and the expected value of the mean.
Although the expected value is
rarely encountered, the standard error is commonly used. Be sure that you
understand that it is the standard deviation of the sampling distribution of
some statistic. In this section, it is the standard deviation of a sampling
distribution of the mean.
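As a small worked illustration, the standard error of the mean is the population standard deviation divided by the square root of N. The sketch below uses the population standard deviation of 6.304 quoted in these notes for the self-esteem scores; the function name and the list of sample sizes are assumptions chosen for the example.

```python
import math

def standard_error_of_mean(sigma, n):
    """Standard deviation of the sampling distribution of the mean."""
    return sigma / math.sqrt(n)

sigma = 6.304   # population standard deviation quoted for the self-esteem scores
for n in (5, 10, 20, 40):
    print(n, round(standard_error_of_mean(sigma, n), 3))
```

For N=10 this gives 1.993, and each fourfold increase in N halves the standard error.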
Figure: Theoretical Sampling Distribution of the Mean for N=10 (Standard Deviation = 6.304/$\sqrt{10}$ = 1.993)
Use of a Sampling Distribution of the Mean
Since the sampling distribution of the mean is a normal curve, you can apply what you
learned in the last chapter about normally distributed scores to questions
about sample means. In the above illustration, notice that the question mark points
to the area below a sample mean of 32 and asks what proportion of
sample means would fall below that score. First you would find the
standard error of the mean, then the z score, which allows you to determine the
proportion.
Standard error of the mean Formula
$\sigma_{\bar{X}} = \sigma / \sqrt{N}$
Divide the standard deviation of the population by the square root of the number of
scores in the sample.
z score Formula
$z = (\bar{X} - \mu) / \sigma_{\bar{X}}$
Variables Defined
($\sigma_{\bar{X}}$)=Standard error of the mean
($\bar{X}$)=Mean of the sample
($\mu$)=Mean of the population
Procedure
Find the difference between the population mean and the sample mean
Divide the difference found in the previous step by the standard error of the mean to
determine the z score.
Find the proportion associated with the z score of the previous step with the web link
below.
Using the web
reference below, click the "below" button and type in your z score to find the
proportion of scores that fall below that score. Likewise, knowing the z score,
you could find the proportion between the z score and the mean, or any other combination,
by clicking the appropriate button and inserting your z score.
In the illustration Theoretical Sampling Distribution of the Mean (above), with a
standard error of 1.993 the z score is (32 - 35.375)/1.993 = -1.69, so you would
expect a proportion of .0455
of the means to be less than 32. We can check this prediction by determining
the proportion of those 200 random samples that had means of 32 or less. By
checking the frequency distribution from which the theoretical sampling
distribution was drawn (Empirical Sampling Distribution of the Means (Frequency Distribution of
Sample Means), see above), we found the empirical proportion to be .0400.
Missing by about ½ of 1 percent isn’t bad, and once again, you find that a
theoretical normal distribution predicts an actual empirical proportion quite
well.
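The arithmetic behind this prediction can be checked with a short Python sketch. It uses the figures quoted in the notes (population mean 35.375, population standard deviation 6.304, N=10, sample mean 32); the variable names are assumptions for the example, and `statistics.NormalDist` replaces the table or web applet, so the last decimal place can differ slightly from the tabled value of .0455.

```python
import math
from statistics import NormalDist

mu, sigma, n = 35.375, 6.304, 10        # values quoted in the notes
sample_mean = 32

standard_error = sigma / math.sqrt(n)   # standard error of the mean
z = (sample_mean - mu) / standard_error # z score for a sample mean of 32
p_below = NormalDist().cdf(z)           # proportion of sample means below 32

print(round(standard_error, 3), round(z, 2), round(p_below, 4))
```

The z score is negative because 32 lies below the population mean, and the resulting proportion is about 4.5%, close to the 4% found in the 200 empirical samples.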
What effect does sample size have on a sampling distribution? When the sample size
(N) becomes larger, $\sigma_{\bar{X}}$ will become smaller. See the equation above
and illustration below. This illustration shows some sampling distributions of
the mean based on the population of 24 self-esteem scores. The sample sizes are
3, 5, 20, 20. A sample mean of 39 is included in all four figures as a
reference point. Notice that, as $\sigma_{\bar{X}}$ becomes smaller, a sample mean of 39 becomes
a rarer and rarer event. The good investigator, with an experiment to do, will
keep in mind what we have just demonstrated about the effect of sample size on
the sampling distribution and will use reasonably large samples.
Figure: Sampling distributions of the mean for four different sample sizes. All samples
are drawn from the same population. Note how a sample mean of 39 becomes rarer
and rarer as $\sigma_{\bar{X}}$ becomes smaller.
10.5 Calculating a Sampling Distribution when
Parameters are not Available
All of the foregoing information is based on the assumption that you have the
population parameters, and, as you know, that is seldom the case. Fortunately,
with a little modification of the formula and no modification of logic, the
random sample you learned to draw can be used for estimating the population
parameters.
When you have only a sample standard deviation with which to estimate the standard
error of the mean, the formula is the following:
$s_{\bar{X}} = s / \sqrt{N}$
The statistic $s_{\bar{X}}$ is an
estimate of $\sigma_{\bar{X}}$, and is required for use of the normal curve. The
larger the sample size, the more reliable $s_{\bar{X}}$ is. As
a practical matter, $s_{\bar{X}}$ is
considered reliable enough if N is 30 or more. As a technical
matter, the normal curve is only appropriate when you know $\mu$ and $\sigma$.
Standard Error of the Mean Estimated from a Sample
Variables Defined
($s_{\bar{X}}$)=Standard error of the mean estimated from a sample
(s)=Standard deviation of a sample
(N)=Sample size
Procedure
Divide s by
the square root of N to find $s_{\bar{X}}$.
Statisticians identify two different types of decision-making processes as statistical
inference. The first process is called hypothesis testing, and the second is
called estimation. Hypothesis testing means to hypothesize a value for a
parameter, compare (or test) the parameter with an empirical statistic, and
decide whether the parameter is reasonable. Hypothesis testing is just what you
have been doing so far in this chapter. Hypothesis testing is the more popular
technique of statistical inference.
The other kind of inferential statistics, estimation, can take two forms: parameter
estimation and confidence intervals. Parameter estimation means that one
particular point is estimated to be the parameter of the population. A
confidence interval is a range of values bounded by a lower and an upper limit.
The interval is expected, with a certain degree of confidence, to contain the
parameter. These confidence intervals are based on sampling distributions.
The Concept of a Confidence Interval
A confidence interval is simply a range of values with a lower and an upper
limit. With a certain degree of confidence (usually 95% or 99%), you can state
that the two limits contain the parameter. The following example shows how the
size of the interval and the degree of confidence are directly related (that
is, as one increases the other increases also).
A sampling distribution can be used to establish both confidence and the
interval. The result is a lower and an upper limit for the unknown population
parameter.
Here is the rationale for confidence intervals. Suppose you define a population of
scores. A random sample is drawn and the mean ($\bar{X}$)
calculated. Using this mean (and the techniques described in the next section),
a statistic called a confidence interval is calculated. (We will use a 95%
confidence interval in this explanation.) Now, suppose that from this
population many more random samples are drawn and a 95% confidence interval
calculated for each. For most of the samples, $\bar{X}$ will be close to $\mu$, and $\mu$ will fall within the
confidence interval. Occasionally, of course, a sample will produce an $\bar{X}$ far from $\mu$, and the confidence interval about $\bar{X}$ will not contain $\mu$. The method is such, however, that the probability of these
rare events can be measured and held to an acceptable minimum like 5%. The
result of all this is a method that produces confidence intervals, 95% of which
contain $\mu$.
In a real-life situation, you draw one sample and calculate one interval. You do not know
whether or not $\mu$ lies between the two
limits, but the method you have used makes you 95% confident that it does.
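Calculating the Limits of a Confidence Interval
Having drawn a random sample and calculated the mean and standard error, you can
calculate the lower and upper limits of the confidence interval.
The term confidence level is used for problems of estimation, such as confidence
intervals, and the term significance level is used for problems of hypothesis
testing.
Formulas
$LL = \bar{X} - z(s_{\bar{X}})$
$UL = \bar{X} + z(s_{\bar{X}})$
Variables Defined
(LL)=Lower limit of the confidence interval
(UL)=Upper limit of the confidence interval
($s_{\bar{X}}$)=Standard error of the mean estimated from a sample
(s)=Standard deviation of a sample
($\bar{X}$)=Mean of the sample
(z)=z score for the chosen confidence level (1.96=95% 2.58=99%)
Procedure
Find the standard error of the mean estimated from a sample, $s_{\bar{X}}$:
divide s by
the square root of N to find $s_{\bar{X}}$.
Multiply the z score (based on the confidence interval you want (1.96=95% 2.58=99%)) by $s_{\bar{X}}$
Subtract the product of the previous step from $\bar{X}$ to determine the lower limit score
Find the sum
of $\bar{X}$ and the product of the previous step to
determine the upper limit score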
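The procedure can be sketched in Python as follows. The function name `confidence_interval` and the 36 example scores are hypothetical. `statistics.stdev` is assumed here as the sample standard deviation s (it uses the N-1 denominator), and the z score is looked up with `NormalDist().inv_cdf` rather than typed in as 1.96 or 2.58.

```python
import math
from statistics import NormalDist, mean, stdev

def confidence_interval(scores, confidence=0.95):
    """Lower and upper limits of a confidence interval about the sample mean."""
    n = len(scores)
    x_bar = mean(scores)
    s_x_bar = stdev(scores) / math.sqrt(n)          # estimated standard error of the mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%, 2.58 for 99%
    margin = z * s_x_bar
    return x_bar - margin, x_bar + margin

# Hypothetical sample of 36 scores (N of 30 or more, as the notes recommend)
scores = list(range(1, 37))
print(confidence_interval(scores, 0.95))
print(confidence_interval(scores, 0.99))
```

The 99% interval comes out wider than the 95% interval, which illustrates the direct relation between interval size and degree of confidence.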
10.7 Other Sampling Distributions
So far, you have been introduced to the sampling distribution of the mean. The mean is
clearly the most popular statistic among researchers. There are times, however,
when the statistic necessary to answer a researcher’s question is not the mean.
For example, to find the degree of relationship between two variables, you need
a correlation coefficient. To determine whether a treatment causes more
variable responses, you need a standard deviation. Proportions are commonly
used statistics. In each of these cases (and indeed, for any statistic), the
basic hypothesis testing procedure you have just learned is often used by
researchers:
Hypothesize a population parameter
Draw a random
sample and calculate a statistic
Compare the statistic with a sampling distribution of that statistic and decide whether
such a sample statistic is likely if the hypothesized population parameter is
true
There are sampling distributions for statistics other than the mean, such as the t
distribution. In addition, some statistics have sampling distributions that are
normal, thus allowing you to use the familiar normal curve.
Along with every sampling distribution comes a standard error. Just as every
statistic has its sampling distribution, every statistic has its standard
error. For example, the standard error of the median is the standard deviation
of the sampling distribution of the median. The standard error of the variance
is the standard deviation of the sampling distribution of the variance. Worst
of all, the standard error of the standard deviation is the standard deviation
of the sampling distribution of the standard deviation. If you followed that
sentence, you probably understand the concept of standard error quite well.
The main points we want to emphasize are that statistics are variable things, that
a picture of that variety is a sampling distribution, and that a sampling
distribution can be used to obtain probability figures.
10.8 A Taste of Reality
The techniques of inferential statistics that you are learning in this book are
based on the assumption that a random sample has been drawn. But how often do
you find random samples in actual data analysis? Seldom. However, there are two
justifications for the continued use of non-random samples.
In the first place, every experiment is an exercise in practicality. Any
investigator has a limited amount of time, money, equipment, and personnel to
draw upon. Usually, a truly random sample of a large population is just not
practical, so the experimenter tries to obtain a representative sample, being
careful to balance or eliminate as many sources of bias as possible.
In the second place, the only real test of generalizability is empirical, that is,
finding out whether the results based on a sample are also true for other
samples. This kind of check-up is practiced continually. Usually, the results
based on samples that are unsystematic (but not random) are true for other
samples from the same population.
Both of these justifications develop a very hollow ring, however, if someone
demonstrates that one of your samples is biased and that a representative
sample proves your conclusions false.
Differences between Means
One of the best things about statistics is that it helps you to understand
experiments and the experimental method. The experimental method is probably
the most powerful method we have of finding out about natural phenomena. Few
ifs, ands, or buts or other qualifiers need to be attached to conclusions based
on results from a sound experiment.
The sections below will discuss the simplest kind of experiment and then show how
the statistical techniques you have learned about sampling distributions can be
expanded to answer research questions.
11.2 A Short Lesson on How to Design an Experiment
The basic ideas underlying a simple two-group experiment are not very complicated.
The logic of an experiment:
Begin with two equivalent groups and treat them exactly alike except for one thing. Then measure
both groups and attribute any difference between the two to the one way in
which they were treated differently.
The above summary of an experiment is described more fully in the table below.
Summary of simple Experiment Table 8-1
The fundamental question of the experiment outlined above is “What is the effect of
Treatment A on a person’s ability to perform Task Q?” In more formal terms, the
question is “For Task Q scores, is the mean of the population of those who have
had Treatment A different from the mean of the population of those who have not
had Treatment A?” This experiment has an independent variable with two levels
(Treatment A or no Treatment A) and a dependent variable (scores on Task Q). A
population of subjects is defined and two random samples are drawn.
An equivalent statement is that there are two populations to begin with and that
the two population means are equal. One random sample is then drawn from each
population. Actually, when two samples are drawn from one population, the
correct procedure is to randomly assign each subject to a group immediately
after it is drawn from the population. This procedure continues until both
groups are filled.
The random samples are both representative of the population and (approximately)
equivalent to each other. Treatment A is then administered to one group
(commonly called the experimental group) but not to the other group (commonly
called the control group). Except for Treatment A, both groups are treated
exactly the same way. That is, extraneous variables are held constant or
balanced out for the two groups. Both groups perform Task Q and the mean score
for each group is calculated. The two sample means almost surely will differ.
The question now is whether this observed difference is due to sampling
variation (a chance difference) or to Treatment A. You can answer this question
by using the techniques of inferential statistics. (See the illustration above.) In
the above example, the word treatment refers to different levels of the independent
variable. The illustrated experiment had two treatments.
In some experimental designs, subjects are assigned to treatments by the
experimenter; in others, the experimenter uses a group of subjects who have
already been “treated” (for example, being males or being children of
authoritarian parents). In either of these designs, the methods of inferential
statistics are the same, although the interpretation of the first kind of
experiment is usually less open to attack.
This issue is discussed more fully in Research Design and Methodology textbooks.
Inferential statistics are used to help you decide whether or not a difference between
sample means should be attributed to chance.
Logic of Inferential Statistics (The rationale for using the null hypothesis)
A decision must be made about the population of those given Treatment A, but it
must be made on the basis of sample data. Accept from the start that because of
your decision to use samples, you can never know for sure whether or not
Treatment A has an effect. Nothing is ever proved through the use of
inferential statistics. You can only state probabilities, which are never
exactly one or zero. The decision-making goes like this. In a well-designed
two-group experiment, all the imaginable results can be reduced to two possible
outcomes: either Treatment A has an effect or it does not. Make a tentative
assumption that Treatment A does not have an effect and then, using the results
of the experiment for guidance, find out how probable it is that the assumption
is correct. If it is not very probable, rule it out and say that Treatment A
has an effect. If the assumption is probable, you are back where you began: you
have the same two possibilities you started with. (Negative inference)
Put this into the language of an experiment.
1. Begin with two logical possibilities, a and b.
a. Treatment A did not have an effect. That is, the mean of the population of scores of
those who received Treatment A is equal to the mean of the population of scores
of those who did not receive Treatment A, and thus the difference between
population means is zero. This possibility is symbolized H0
(pronounced “H sub oh”).
b. Treatment A did have an effect. That is, the mean of the population of scores of those
who received Treatment A is not equal to the mean of the population of scores
of those who did not receive Treatment A. This possibility is symbolized H1
(pronounced “H sub one”).
2. Tentatively assume that Treatment A had no effect (that is, assume H0). If H0
is true, the two random samples should be alike except for the usual variations
in samples. Thus, the difference in the sample means is tentatively assumed to
be due to chance.
3. Determine the sampling distribution for these differences in sample means. This sampling
distribution gives you an idea of the differences you can expect if only chance
is at work.
4. By subtraction, obtain the actual difference between the experimental group mean
and the control group mean.
5. Compare the difference obtained to the differences expected (from Step 3) and conclude
that the difference obtained was:
a. Expected. Differences of this size are very probable just by chance, and the most
reasonable conclusion is that the difference between the experimental group and
the control group may be attributed to chance. Thus, retain both possibilities
in Step 1.
b. Unexpected. Differences of this size are highly improbable, and the most reasonable
conclusion is that the difference between the experimental group and the
control group is due to something besides chance. Thus, reject H0
(possibility a in Step 1) and accept H1 (possibility b); that is,
conclude that Treatment A had an effect.
The basic idea is to assume that there is no difference between the two population
means and then let the data tell you whether the assumption is reasonable. If
the assumption is not reasonable, you are left with only one alternative: the
populations have different means.
The assumption of no difference is so common in statistics that it has a name: the
null hypothesis, symbolized, as you have already learned, H0. The
null hypothesis is often stated in formal terms:
H0: μ1 − μ2 = 0
That is, the null hypothesis states that the mean of one population is equal to the
mean of a second population.
The concept of the null hypothesis is broader than simply the assumption of no
difference, although that is the only version used in this section. Under some circumstances,
a difference other than zero might be the hypothesis tested.
11.3.10 H1 is referred to as
an alternative hypothesis. Actually, there are an infinite number of
alternative hypotheses, that is, the existence of any difference other than
zero. In practice, however, it is usual to choose one of three possible
alternative hypotheses before the data are gathered:
H1: μ1 ≠ μ2. In the example of the simple experiment, this hypothesis states that Treatment A had
an effect, without stating whether the treatment improves or disrupts
performance on Task Q. Most of the problems in this section use this H1
as the alternative to H0. If you reject H0 and accept
this H1, you must examine the means and decide whether Treatment A
facilitated or disrupted performance on Task Q.
A second, directional hypothesis states that Treatment A improves performance on Task Q.
A third, directional hypothesis states that Treatment A disrupts performance on Task Q.
11.3.11 The null hypothesis is proposed
and this proposal may meet with one of two fates at the hands of the data. The
null hypothesis may be rejected, which allows you to accept an alternative
hypothesis. Or it may be retained. If it is retained, it is not proved as true;
it is simply retained as one among many possibilities.
11.3.12 Perhaps an analogy will help
with this distinction about terminology. Suppose a masked man has burglarised a
house and stolen all the silver. There are two suspects, H1 and H0. The lawyer
for H0 tries to establish beyond reasonable doubt that her client was out of
state during the time of the robbery. If she can do this, it will exonerate H0
(H0 will be rejected, leaving only H1 as a suspect). However, if she cannot
establish this, the situation will revert to its original state: H1 or H0 could
have stolen the silver away, and both are retained as suspects. So the null
hypothesis can be rejected or retained but it can never be proved with
certainty to be true or false by using the methods of inferential statistics.
Statisticians are usually very careful with words. That is probably because
they are used to mathematical symbols, which are very precise. Regardless of
the reason, this distinction between retained and proved, although subtle, is important.
11.4 Sampling Distribution of a Difference Between Means
A difference is simply the answer in a subtraction problem. As explained in the
section on the logic of inferential statistics, the difference that is of
interest is the difference between two means. You evaluate the obtained
difference by comparing it with a sampling distribution of differences between
means (often called a sampling distribution of mean differences).
Recall that a sampling distribution is a frequency distribution of sample statistics,
all calculated from samples of the same size drawn from the same population;
the standard deviation of that frequency distribution is called a standard
error. Precisely the same logic holds for a sampling distribution of
differences between means.
We can best explain a sampling distribution of differences between means by
describing the procedure for generating an empirical sampling distribution of
mean differences. Define a population of scores. Randomly draw two samples,
calculate the mean of each, and subtract the second mean from the first. Do
this many times and then arrange all the differences into a frequency
distribution. Such a distribution will consist of a number of scores, each of
which is a difference between two sample means. Think carefully about the mean
of the sampling distribution of mean differences. Stop reading and decide what
the numerical value of this mean will be. The mean of a sampling distribution
of mean differences is zero because, on the average, the sample means will be
close to μ, and the differences will be close to zero. These small
positive and negative differences will then cancel each other out.
The sampling distribution of mean differences has a standard deviation called the
standard error of a difference between means.
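The empirical procedure just described can be tried on a computer. The following is only a sketch under stated assumptions: the population (10,000 scores with mean about 50 and standard deviation about 10), the sample size of 30, and the function name `empirical_mean_differences` are all hypothetical choices made for illustration, not values from the text.

```python
import random
import statistics

def empirical_mean_differences(population, n, draws, seed=0):
    """Repeatedly draw two random samples of size n and record
    (first sample mean) - (second sample mean)."""
    rng = random.Random(seed)
    differences = []
    for _ in range(draws):
        # Sampling with replacement; a reasonable shortcut for a large population.
        first = rng.choices(population, k=n)
        second = rng.choices(population, k=n)
        differences.append(statistics.fmean(first) - statistics.fmean(second))
    return differences

# Hypothetical population of scores (assumed values, for illustration only).
population_rng = random.Random(42)
population = [population_rng.gauss(50, 10) for _ in range(10_000)]

differences = empirical_mean_differences(population, n=30, draws=5_000)

# The mean of the differences should be close to zero.
mean_of_differences = statistics.fmean(differences)
# The standard deviation of the differences is the (empirical) standard error
# of a difference between means.
standard_error_of_difference = statistics.pstdev(differences)

print(f"mean of the differences:      {mean_of_differences:.3f}")
print(f"standard error of difference: {standard_error_of_difference:.3f}")
```

With these assumed values the standard error should land near the square root of (100/30 + 100/30), roughly 2.6, which is what the theoretical formula given later in this section would predict.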
In many experiments, it is obvious that there are two populations to begin with.
The question, however, is whether they are equal on the dependent variable. To
generate a sampling distribution of differences between means in this case,
assume that, on the dependent variable, the two populations have the same mean,
standard deviation, and form (shape of the distribution). Then draw one sample
from each population, calculate the means, and subtract one from the other.
Continue this many times. Arrange the differences between sample means into a
frequency distribution.
The sampling distributions of differences between means that you will use will be
theoretical distributions, not the empirical ones we described in the last two
paragraphs. However, a description of the procedures for an empirical
distribution, which is what we’ve just given, is usually easier to understand.
Two things about a sampling distribution of mean differences are constant: the mean
and the form. The mean is zero, and the form is normal if the sample means are
based on large samples. Again the traditional answer to the question “What is a
large sample?” is “30 or more.”
The question of this experiment was “Are the racial attitudes of 9th
graders different from those of 12th graders?” The null hypothesis
was that the population means were equal (H0: μ1 = μ2). The alternative hypothesis was that they were
not equal (H1: μ1 ≠ μ2).
The subjects in this experiment were 9th and 12th grade
black and white students who expressed their attitudes about persons of their
own sex but different race. Higher scores represent more positive attitudes.
The table below shows the summary data. As you can quickly calculate from the
first table below, the obtained mean difference between samples of 9th
and 12th graders is 4.10. Now a decision must be made. Should this
difference in samples be ascribed to chance (retain H0; there is no
difference between the population means)? Or should we say that such a difference
is so unlikely that it is due not to chance but to the different
characteristics of 9th and 12th grade students (reject H0
and accept H1; there is a difference between the populations)? Using
a sampling distribution of mean differences (see the 2nd illustration
below), a decision can be made.
Table: Data from an experiment that compared the racial attitudes of 9th and 12th graders.
Illustration: Sampling distribution from the racial attitudes study. It is based on chance and shows z
scores, probabilities of those z scores, and differences between sample
means. (Sampling Distribution Of Differences Between Means)
The second illustration above shows a sampling distribution of differences between
means that is based on the assumption that there are no population differences
between 9th and 12th graders, that is, that the true
difference between the population means is zero. The figure is a normal curve
that shows you z scores, possible differences between sample means in the
racial attitudes study, and probabilities associated with those z scores and
difference scores. Our obtained difference, 4.10, is not even shown on the
distribution. Such events are very rare if only chance is at work. From the
figure you can see that a difference of 3.96 or more would be expected five
times in 10,000 (.0005). Since a difference of –3.96 or beyond also has a
probability of .0005, we can add the two probabilities together to get .001.
Since our difference was 4.10 (less probable than 3.96), we can conclude that
the probability of a difference this large arising by chance is less than
.001. This probability is very small, indeed, and it seems reasonable to rule
out chance; that is, to reject H0 and, thus, accept H1.
By examining the means of the two groups in table two above we can write a
conclusion using the terms in the experiment. “Twelfth graders have more
positive attitudes toward people of their own sex, but different race, than do
ninth graders.”
11.5 A Problem and Its Accepted Solution
The probability that populations of 9th and 12th grade
attitude scores are the same was so small (p < .001) that it was easy to rule
out chance as an explanation for the difference. But what if that probability
had been .01, or .05, or .25, or .50? How to divide this continuum into a group
of events that is “due to chance” and another that is “not due to chance”: that
is the problem.
It is probably clear to you that whatever solution is adopted will appear to be an
arbitrary one. Breaking any continuum into two parts will leave you
uncomfortable about the events close to either side of the break. Nevertheless,
a solution does exist.
The generally accepted solution is to say that the .05 level of probability is the cut-off
between “due to chance” and “not due to chance.” The name of the cut-off
point that separates “due to chance” and “not due to chance” is the level of
significance. If an event has a probability of .05 or less (for example, p=.03,
p=.01, or p=.001), H0 is rejected, and the event is considered
significant (not due to chance). If an event has a probability greater than .05
(for example, p=.06, p=.50, or p=.99), H0 is retained, and
the event is considered not significant (may be due to chance). Here, the word
significant is not synonymous with “important.” A significant event in
statistics is one that is not ascribed to chance.
The area of the sampling distribution that
covers the events that are “not due to chance” is called the critical region. If
an event falls in the critical region, H0 is rejected. The figure above
identifies the critical region for the .05 level of significance. As you can
see, the difference in means between 9th and 12th grade
racial attitudes (4.10) falls in the critical region, so H0 should
be rejected.
Although widely adopted, the .05 level of significance is not universal. Some
investigators use the .01 level in their research. When the .01 level is used
and H1: μ1 ≠ μ2,
the critical region consists of .005 in each tail of the sampling distribution.
In the figure above, differences beyond –3.10 or 3.10 are required in
order to reject H0 at the .01 level.
In textbooks, a lot of lip service is paid to the .05 level of significance as the
cutoff point for decision making. In actual research, the practice is to run
the experiment and report any significant differences at the smallest correct
probability value. Thus, in the same report, some differences may be reported
as significant at the .001 level, some at the .01 level, and some at the .05
level. At present, it is uncommon to report probabilities greater than .05 as
significant, although some researchers argue that the .10 or even the .20 level may
be justified in certain situations.
11.6 How to Construct A Sampling Distribution of
Differences Between Means
You already know two important characteristics of a sampling distribution of
differences between means. The mean is 0, and the form is normal. When we
constructed the illustration above of the sampling distribution of differences
between the racial attitudes of 9th and 12th graders, we
used the normal curve table and a form of the familiar z score.
The formula in the text is the “working model” of the more general formula. Since
our null hypothesis is that μ1 − μ2 = 0, the term in parentheses on the
right is 0, leaving you with the “working model.” This more general formula is of
a form you have seen before and will see again: the difference between a
statistic (X̄1 − X̄2) and a parameter (μ1 − μ2) divided by the standard error of the statistic.
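Working (Model) Formula (z Score For Observed Mean Difference)
$$z = \frac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}}$$
General Formula
$$z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{s_{\bar{X}_1 - \bar{X}_2}}$$
Standard Error of Mean
$$s_{\bar{X}} = \frac{s}{\sqrt{N}}$$
Standard Error of Difference
$$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{s_{\bar{X}_1}^2 + s_{\bar{X}_2}^2}$$
Difference between Sample Means Associated with each z Score
$$\bar{X}_1 - \bar{X}_2 = z \cdot s_{\bar{X}_1 - \bar{X}_2}$$
Where
$z$ = z score for the observed mean difference
$\bar{X}_1$ = mean of one sample
$\bar{X}_2$ = mean of a second sample
$s_{\bar{X}_1 - \bar{X}_2}$ = standard error of a difference
$s_{\bar{X}_1}$ = Standard Error of the mean of Sample 1
$s_{\bar{X}_2}$ = Standard Error of the mean of Sample 2
$\bar{X}_1 - \bar{X}_2$ = Difference between Sample Means
Procedure
Standard Error of the mean estimated from a sample
Divide s by the square root of N to find $s_{\bar{X}}$.
Standard Error of Difference
Square the Standard Error of the mean of Sample 1 and add it to the square of the Standard
Error of the mean of Sample 2.
Find the square root of the result of the previous step to find the Standard Error of
Difference.
z Score For Observed Mean Difference
Find the difference between the mean of sample 1 and the mean of sample 2.
Divide the difference found in the previous step by the standard error of difference found
in the previous section.
Difference between Sample Means Associated with each z Score
Multiply the z score found in a stats textbook table or with the Web reference below by the
standard error of a difference to determine the difference between Sample Means.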
Standard Error of Difference
In a Sampling Distribution Of Differences Between Means, as in the illustration
above (see Sampling distribution from the racial attitudes study), the tick
marks at the baseline of the illustration (like the standard deviation)
represent increments of the standard error of difference.
Probability of a Difference This Large or
Larger Occurring as a Result of Chance
The probabilities are: .25 .125 .025 .005 .0005
These probabilities are displayed in the illustration above (see Sampling
distribution from the racial attitudes study) at the bottom of the chart.
Finding the z score associated with the probabilities
There are at least two ways of determining the z score associated with the above
probabilities.
Using a table
Look up the z score in a table in the back of a stats textbook. To do this you will need to
subtract the probabilities above from .5000, which
will give you the proportions
.25 .375 .475
Look these up in a table in the back of a stats textbook to find the z scores listed below.
Using the web
Plug the following probabilities, .25 .125 .025 .005 .0005, into the shaded
area of the 3rd applet and click the above or below button.
The following z scores are associated with these probabilities:
.67 1.15 1.96
Differences between Sample Means
These are placed between the z scores and probabilities (see illustration above) (see
Sampling distribution from the racial attitudes study).
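As a third option alongside the table and the web applet, the z scores can be computed directly. This is a minimal sketch using Python's standard-library `statistics.NormalDist`; the standard error of 1.5 used to label the baseline is a hypothetical value chosen for illustration, not the one from the racial attitudes study.

```python
from statistics import NormalDist

# Proportions of the curve ABOVE each z score (one tail), as listed in the text.
tail_probabilities = [.25, .125, .025, .005, .0005]

standard_normal = NormalDist()  # mean 0, standard deviation 1

# inv_cdf expects the proportion BELOW z, so convert each tail probability.
z_scores = [standard_normal.inv_cdf(1 - p) for p in tail_probabilities]

# Hypothetical standard error of a difference (an assumption for illustration).
standard_error_of_difference = 1.5

for p, z in zip(tail_probabilities, z_scores):
    # Difference between sample means associated with this z score.
    difference = z * standard_error_of_difference
    print(f"p = {p:<6}  z = {z:.2f}  difference = {difference:.2f}")
```

Printed tables can differ from these computed values in the last decimal place because of rounding.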
z Score For
Observed Mean Difference
This score is
compared in the chart with the statistics in the Sampling Distribution Of
Differences Between Means to determine whether the difference is significant.
The z score method (z Score For Observed Mean Difference) (No Charting)
After you determine the z Score For Observed Mean Difference, go to the 2nd
applet from the web reference below and plug in the z score to find the
proportion above the z score, that is, the proportion occurring by chance.
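The no-charting method can also be sketched in a few lines of code. The example below follows the procedure given above (standard error of each mean, standard error of a difference, then z) and uses the normal curve in place of the applet. The summary data (means of 54 and 50, s = 10, N = 50 in each group) and the function name are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_for_mean_difference(mean1, s1, n1, mean2, s2, n2):
    """Return (z, standard error of a difference) for two sample means."""
    se_mean1 = s1 / sqrt(n1)                             # standard error of the mean, Sample 1
    se_mean2 = s2 / sqrt(n2)                             # standard error of the mean, Sample 2
    se_difference = sqrt(se_mean1 ** 2 + se_mean2 ** 2)  # standard error of a difference
    z = (mean1 - mean2) / se_difference                  # working formula (H0: mu1 - mu2 = 0)
    return z, se_difference

# Hypothetical summary data (assumed values, for illustration only).
z, se_difference = z_for_mean_difference(mean1=54.0, s1=10.0, n1=50,
                                         mean2=50.0, s2=10.0, n2=50)

# Proportion of the normal curve above |z|, doubled for both tails.
proportion_above = 1 - NormalDist().cdf(abs(z))
two_tailed_p = 2 * proportion_above

print(f"standard error of difference = {se_difference:.2f}")
print(f"z = {z:.2f}, two-tailed p = {two_tailed_p:.4f}")
```

With these assumed numbers the two-tailed probability falls just under .05, so H0 would be rejected at the .05 level but retained at the .01 level.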
11.7 An Analysis of Potential Mistakes
The significance level is the probability that the null hypothesis will be rejected
in error when it is true (a decision known as a Type I error). The significance
of a result is also called its p-value; the smaller the p-value, the more
significant the result is said to be.
At first glance, the idea of adopting a significance level of 5% seems
preposterous to some who argue for greater certainty. How about using a level of significance of one in a million,
which reduces uncertainty to almost nothing? It is true that adopting the .05
level of significance leaves some room for mistaking a chance difference for a
real difference. Lowering the level of significance will reduce the probability
of this kind of mistake, but it increases the probability of another kind.
Uncertainty about the conclusion will remain. In this section, we will discuss
the two kinds of mistakes that are possible. You will be able to pick up some
hints on reducing uncertainty, but if you agree to draw a sample, you agree to
accept some uncertainty about the results.
Type I error: rejecting the null hypothesis when it is true. The probability of a Type I error is
symbolized by α (alpha).
Type II error: retaining the null hypothesis when it is false. The probability of a Type II error is
symbolized by β (beta).
You are already somewhat familiar with α from your study of level of significance. When the .05 level of
significance is adopted, the experimenter concludes that an event with p <
.05 is not due to chance. The experimenter could be wrong; if so, a Type I
error has been made. The probability of a Type I error, α, is controlled by the level of significance you adopt.
A proper way to think of α and a Type I error is
in terms of “the long run.” The illustration above (see Sampling distribution from the racial
attitudes study) is a theoretical sampling distribution of mean differences. It
is a picture of repeated sampling (that is, the long run). All those
differences came from sample means that were drawn from the same population,
but some differences were so large they could be expected to occur only 5
percent of the time. In an experiment, however, you have only one difference,
which is based on your two sample means. If this difference is so large that
you conclude that there are two populations whose means are not equal, you may
have made a Type I error. However, the probability of such an error is not more
than .05.
The calculation of β is a more complicated matter. For one thing, a Type II error can
be committed only when the two populations have different means. Naturally, the
farther apart the means are, the more likely you are to detect the difference, and thus the
lower β is. We will discuss
other factors that affect β in the last section,
“How to Reject the Null Hypothesis.”
The general relationship between α and β is an inverse one. As
α goes down, β goes up. That is, if
you insist on a larger difference between means before you call the difference
nonchance, you are less likely to detect a real nonchance difference if it is
small. The illustration below demonstrates this relationship.
Illustration Frequency distribution of
raw scores when H0 is false
The illustration above is a picture of two populations. Since these are
populations, the “truth” is that the mean of the experimental group is four
points higher than that of the control group. Such “truth” is available only in
hypothetical examples in textbooks. In the real world of experimentation you do
not know population parameters. This example, however, should help you
understand the relation of α to β. If a sample is drawn from each population, there is only
one correct decision: reject H0. However, will the investigator make
the correct decision? Would a difference of four be expected between sample
means from Populations A and B (14 − 10 = 4)? To evaluate the probability of a
difference of four, see if it falls in the critical region of the sampling
distribution of mean differences, shown in the illustration below. (We
arbitrarily picked this sampling distribution so we could illustrate the points.)
Illustration Sampling distribution of
differences between means from Populations A and B if H0 were true
As you can see in the illustration above, a difference of 4 score points or more (in either direction) would be expected 4.56 percent of the time. If
α had been set at .05,
you would correctly reject H0, since the probability of the obtained difference
(.0456) is less than .05. However, if α had been set at .01,
you would not reject H0, since the obtained probability
(.0456) is not less than .01. Failure to reject H0 in this case is a Type II
error.
At this point, we can return to our discussion of setting the significance level.
The suggestion was “Why not reduce the significance level to one in a million?”
From the analysis of the potential mistakes, you can answer that when you
decrease α, you increase β. So protection from one error is
traded for liability to another kind of error.
Most persons who use statistics as a
tool set α (usually at .05) and let β fall where it may. The actual calculation of β, although important, is beyond the scope of this discussion.
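Although the text sets the calculation of β aside, a rough normal-curve sketch can make the trade between α and β concrete. The code below assumes a true difference of 4 points between population means, as in the example, and a standard error of a difference of 2. That standard error is an inference, not a figure from the text: it is the value for which a difference of 4 has a two-tailed probability of about .0456. The function name is also hypothetical.

```python
from statistics import NormalDist

def beta_two_tailed(true_difference, standard_error, alpha):
    """Probability of retaining H0 (a Type II error) in a two-tailed z test
    when the population means really differ by true_difference."""
    normal = NormalDist()
    z_critical = normal.inv_cdf(1 - alpha / 2)
    # Where the sampling distribution of z is centered when H0 is false.
    shift = true_difference / standard_error
    # H0 is retained whenever the obtained z falls between -z_critical and +z_critical.
    return normal.cdf(z_critical - shift) - normal.cdf(-z_critical - shift)

# Assumed values: true difference 4 (from the example), standard error 2 (inferred).
beta_at_05 = beta_two_tailed(true_difference=4, standard_error=2, alpha=.05)
beta_at_01 = beta_two_tailed(true_difference=4, standard_error=2, alpha=.01)

print(f"alpha = .05  ->  beta = {beta_at_05:.2f}")
print(f"alpha = .01  ->  beta = {beta_at_01:.2f}")
```

Under these assumptions, lowering α from .05 to .01 raises β noticeably, which is the inverse relationship described above.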
11.8 One-Tailed and Two-Tailed Tests
Earlier we discussed the fact that in practice it is usual to choose one of three
possible alternative hypotheses before the data are gathered.
H1: μ1 ≠ μ2. This hypothesis simply says that the
population means differ but makes no statement about the direction of the
difference.
H1: μ1 > μ2. Here, the hypothesis is made that the mean of the first population is greater
than the mean of the second population.
H1: μ1 < μ2. Here, the hypothesis is made that the
mean of the first population is smaller than the mean of the second population.
So far in this section, you have been working with the first H1. You
have tested the null hypothesis, μ1 = μ2, against the alternative hypothesis μ1 ≠ μ2. The null hypothesis was rejected when you found large positive
(X̄1 > X̄2) or large negative (X̄1 < X̄2) deviations.
When α was set at .05, the
.05 was divided into .025 in each tail of the sampling distribution, as seen in the illustration below.
In a similar way, you found the probability of a difference by multiplying by 2
the probability obtained from the z score. With such a test, you can reject H0
and accept either of the possible alternative hypotheses, μ1 > μ2 or μ1 < μ2.
This is called a two-tailed test of significance, for reasons that should be
obvious from the illustration above.
Sometimes, however, an investigator is concerned only with deviations in one direction;
that is, the alternative hypothesis of interest is either μ1 > μ2 or μ1 < μ2. In
either case, a one-tailed test is appropriate. The illustration below is a
picture of the sampling distribution for a one-tailed test, for μ1 > μ2.
In a one-tailed test, the critical region is all in one end of the sampling
distribution. The only outcome that allows you to reject H0 is one
in which X̄1 is so much larger than X̄2 that the z score is 1.65 or more. Notice in the above
illustration that if you are running a one-tailed test there is no way to
conclude that μ1 is less than μ2, even if X̄2
is many times the size of X̄1.
In a one-tailed test, you are interested in only one kind of difference.
One-tailed tests are usually used when an investigator knows a great deal about
the particular research area or when practical reasons dictate an interest in
establishing μ1 > μ2 but
not μ1 < μ2.
There is some controversy about the use of one- or two-tailed tests. When in doubt, use
a two-tailed test. The decision to use a one-tailed or a two-tailed test should
be made before the data are gathered.
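The difference between the two kinds of test can be sketched as a pair of decision rules. This is an illustration only: the function names and the two obtained z scores (1.80 and -3.00) are hypothetical, and the one-tailed rule assumes the alternative hypothesis μ1 > μ2.

```python
from statistics import NormalDist

normal = NormalDist()  # standard normal curve

def critical_z(alpha, tails):
    """Smallest positive z that falls in the critical region."""
    return normal.inv_cdf(1 - alpha / tails)

def rejects_h0(z, alpha=.05, tails=2):
    """Two-tailed: a large z in either direction rejects H0.
    One-tailed (assumed H1: mu1 > mu2): only a large positive z rejects H0."""
    if tails == 2:
        return abs(z) >= critical_z(alpha, 2)
    return z >= critical_z(alpha, 1)

two_tailed_cutoff = critical_z(.05, 2)  # about 1.96
one_tailed_cutoff = critical_z(.05, 1)  # about 1.645, which tables round to 1.65

for z in (1.80, -3.00):  # hypothetical obtained z scores
    print(f"z = {z:5.2f}  two-tailed reject: {rejects_h0(z, tails=2)}  "
          f"one-tailed reject: {rejects_h0(z, tails=1)}")
```

The second hypothetical z score shows the cost of a one-tailed test: a large difference in the unexpected direction cannot lead to rejection of H0.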
11.9 Significant Results and Important Results
The word “significant” has a precise technical meaning in statistics and other
meanings in other contexts.
A study that has statistically significant results may or may not have important
results. You have to decide about the importance without the help of
inferential statistics.
11.10 How To Reject the Null Hypothesis
11.10.1 To reject H0 is to be left with
only one alternative, H1, from which a conclusion can be drawn. To retain H0 is
to be left up in the air. You don’t know whether the null hypothesis is really
true or whether it is false and you just failed to detect it. So, if you are
going to design and run an experiment, you should maximise your chances of
rejecting H0. There are three factors to consider: actual difference, standard
error, and α.
11.10.2 In order to get this discussion
out of the realm of the hypothetical and into the realm of the practical,
consider the following problem. Supposing you want to select a research project
which seeks to reject H0. You decide to try to show that widgets are different
from controls. Accept for a moment the idea that widgets are different, that is, that H0
should be rejected. What are the factors that determine whether you will
conclude from your experiment that widgets are different?
Actual difference. The larger the actual difference between widgets and controls, the more likely you are to
reject H0. There is a practical limit, though. If the difference is too large,
other people will call your experiment trivial, saying that it demonstrates the
obvious and that anyone can see that widgets are different. On the other hand,
small differences can be difficult to detect. Pre-experiment estimations of
actual differences are usually based on your own experience.
Standard Error of a Difference
You can see
that as the standard error of a difference gets smaller, z gets larger, and you are more
likely to reject H0. This is true, of course, only if widgets are really
different from controls. Here are two ways you can reduce the size of the standard error of a difference.
Sample size. The larger the sample, the smaller the standard error of the
difference. (See illustration.) This illustration shows that the larger the sample size,
the smaller the standard error of the mean. The same relationship is true for
the standard error of a difference.
Some texts
show you how to calculate the sample size required to reject H0. In order to do
this calculation, you must make assumptions about the size of the actual
difference. Many times, the size of the sample is dictated by practical
considerations: time, money, or the availability of widgets.
Sample variability. Reducing the variability in the sample will produce a smaller standard error of a difference. You
can reduce variability by using reliable measuring instruments, recording data
correctly, and, in short, reducing the “noise,” or random error.
α. The larger α is, the more likely
you are to reject H0. The limit to this factor is your colleagues’ sneer when
you report that widgets are “significantly different at the .40 level.”
Everyone believes that such differences should be attributed to chance.
Sometimes practical considerations may permit the use of α = .10. If widgets and controls both could be used to treat a
deadly illness and both have the same side effects, but “widgets are
significantly better at the .10 level,” then widgets will be used. (Also, more
data will then be gathered [sample size increased] to see whether the
difference between widgets and controls is reliable.)
11.10.3 We will close this section on
how to reject the null hypothesis by telling you that these three factors are
discussed in intermediate-level texts under the topic power. The power of a
statistical test is defined as 1 − β. The more powerful the test, the more likely it is to detect
any actual difference between widgets and controls.
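How the three factors combine can be sketched with the normal curve. The code below is a rough illustration rather than a full power analysis: it treats s as if it were a reliable estimate of the population standard deviation, and the true difference of 4, the standard deviation of 10, the sample sizes, and the function name are all hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def power_two_tailed(true_difference, s, n_per_group, alpha=.05):
    """Approximate power (1 - beta) of a two-tailed z test on two equal-sized groups."""
    normal = NormalDist()
    # Standard error of a difference: smaller with larger N or smaller s.
    se_difference = sqrt(s ** 2 / n_per_group + s ** 2 / n_per_group)
    z_critical = normal.inv_cdf(1 - alpha / 2)
    shift = true_difference / se_difference
    beta = normal.cdf(z_critical - shift) - normal.cdf(-z_critical - shift)
    return 1 - beta

# Hypothetical widgets experiment: true difference 4, standard deviation 10.
sample_sizes = (25, 50, 100, 200)
powers = [power_two_tailed(4, 10, n) for n in sample_sizes]
for n, power in zip(sample_sizes, powers):
    print(f"N per group = {n:3d}  power = {power:.2f}")

# A larger alpha also raises power, at the price of more Type I errors.
power_at_10 = power_two_tailed(4, 10, 50, alpha=.10)
print(f"N per group =  50, alpha = .10  power = {power_at_10:.2f}")
```

Under these assumptions, power climbs steadily as the sample size grows, and moving α from .05 to .10 raises it further for a fixed sample size.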
The techniques you have learned so far
require the use of the normal distribution to assess probabilities. These
probabilities will be accurate if you have used σ in your calculations or if N is so large that
s is a reliable estimate of σ. In
this section, you will learn about a distribution that will give you accurate
probabilities when you do not know σ and N is not large. The logic you have used,
however, will be used again. That is, you assume the null hypothesis, draw
random samples, introduce the independent variable, and calculate a mean
difference on the dependent variable. If these differences cannot be attributed
to chance, reject the null hypothesis and interpret the results.
At this point you may suspect that the normal curve is an indispensable part of
modern statistical living. Up until now, in this tract, it has been. However,
in the next sections you will encounter several sampling distributions, none of
which is normal, but all of which can be used to determine the probability that
a particular event occurred by chance. Deciding which distribution to use is
not a difficult task but it does require some practise. Remember that a
theoretical distribution is accurate if the assumptions on which it is based
are true for the data from the experiment. By knowing the assumptions a
distribution requires and the nature of your data, you can pick an appropriate
distribution.
This section is about a theoretical
distribution called the t distribution. The t is a lowercase one; capital T has
entirely different meanings. The t distribution is used to find answers to the
four kinds of problems listed below. The t distribution is used when σ is not known and
sample sizes are too small to ensure that s is a reliable estimate of σ. Problems 1, 2, and 4 are problems of hypothesis testing.
Problem 3 requires the establishment of a confidence interval.
1. Did a sample with a mean X̄ come from a population with a mean μ?
2. Did two samples, with means X̄1 and X̄2,
come from the same population?
3. What is the confidence interval about the difference between two sample means?
4. Did a Pearson product-moment correlation coefficient, based on sample data, come
from a population with a true correlation of .00 for the two variables?
W. S. Gosset (1876-1937) invented the
t distribution in 1908. He had been hired in 1899 by Arthur Guinness, Son &
Company, a brewery in Dublin, Ireland, to determine whether a new strain of barley,
developed by botanical scientists, had a greater yield than the old barley.
For more information, see "Gosset, W. S.," in Dictionary of National
Biography, 1931-40, London: Oxford University Press, 1949, or L. McMullen
& E. S. Pearson, "William Sealy Gosset, 1876-1937," Biometrika, 1939, 205-253.
Prohibited by the company from publishing under his own name, Gosset published his
new mathematical statistics in Biometrika, a journal co-founded in 1901 by Francis Galton,
under the pseudonym "Student," and the distribution became known as
"Student's t." (No one seems to know why the letter t was chosen. E. S.
Pearson surmises that t was simply a "free letter"; that is, no
one had yet used t to designate a statistic.) Since he worked for the
Guinness Company all his life, Gosset continued to use the pseudonym
"Student" for his publications in mathematical statistics. Gosset was
very devoted to his company, working hard and rising through the ranks. He was
appointed head brewer a few months before his death in 1937.
Gosset was confronted with the problem
of gathering, in a limited amount of time, data about the brewing process. He
recognized that the sample sizes were so small that s was not an accurate
estimate of $\sigma$ and thus the normal-curve
model was not appropriate. After working out the mathematics of distributions
based on s, which is a statistic and, therefore, variable, rather than on $\sigma$,
which is a parameter and, therefore, constant, Gosset found that
the theoretical distribution depended upon sample size, a different
distribution for each N. These distributions make up a family of curves
that have come to be called the t distribution.
In Gosset's work, you again see how a
practical question forced the development of a statistical tool. (Remember that
Francis Galton invented the concept of the correlation coefficient in order to
assess the degree to which characteristics of fathers are found in their
sons.) In Gosset's case, an example of a practical question was "Will this
new strain of barley, developed by the botanical scientists, have a greater
yield than our old standard?" Such questions were answered with data from
experiments carried out on the ten farms maintained by the Guinness Company in
the principal barley-growing
regions of Ireland. A typical experiment might involve two one-acre plots (one
planted with the old barley, one with the new) on each of the ten farms. Gosset
then was confronted with ten one-acre yields for the old barley and ten for the
new. Was the difference in yields due to sampling fluctuation, or was it a
reliable difference between the two strains? He made the decision using his
newly derived t distribution.
We will describe some characteristics of the t distribution and then
compare t with the normal distribution. The following two sections are
on hypothesis testing: one section on samples that are independent of each
other and one on samples that are correlated. Next, you will use the t distribution
to establish confidence intervals about a mean difference. Then you will learn
the assumptions that are required if you choose to use a t test to
analyse your data. Finally, you will learn how to determine whether a
correlation coefficient is statistically significant. Problems 1-4, mentioned
above, will be dealt with in order.
The t Distribution
Rather than just one t distribution,
there are many t distributions. In fact, there is a t distribution
for each sample size from 1 to $\infty$. These different t distributions are described as having
different degrees of freedom, and there is a different t distribution
for each degree of freedom. Degrees of freedom is abbreviated df (which
is a simple symbol; do not multiply d times f). We'll start with
a definition of degrees of freedom as sample size minus 1. Thus, df =
N - 1. If the sample consists of 12 members, df = 11.
Figure 9.1 is a picture of four of
these t distributions, each based on a different number of degrees of
freedom. You can see that, as the degrees of freedom become fewer, a larger
proportion of the curve is contained in the tails.
You know from your work with the
normal curve that a theoretical distribution is used to determine a probability
and that, on the basis of the probability, the null hypothesis is retained or
rejected. You will be glad to learn that the logic of using the t distribution
to make a decision is just like the logic of using the normal distribution.
Recall that z is normally distributed. You
probably also recall that, if |z| is 1.96 or larger, the chances are only 5 in 100 that the sample mean $\bar{X}$ came from the population with mean $\mu$.
In a similar way, if the samples are
small, you can calculate a t value from the formula

$$t = \frac{\bar{X} - \mu}{s_{\bar{X}}}, \qquad s_{\bar{X}} = \frac{s}{\sqrt{N}}$$
The number of degrees of freedom (df)
determines which t distribution is appropriate, and from it you can
find a t value that would be expected to occur by chance 5 times in 100.
Figure 9.2 separates the t distributions of Figure 9.1. The t values
in Figure 9.2 are those associated with the interval that contains 95 percent
of the cases, leaving 2.5 percent in each tail. Look at each of the four
curves.
If you looked at Figure 9.2 carefully,
you may have been suspicious that the t distribution for df = $\infty$ is a normal curve. It
is. As df approaches $\infty$, the t
distribution approaches the normal distribution. When df = 30, the t distribution
is almost normal. Now you understand why we repeatedly cautioned, in chapters
that used the normal curve, that N must be at least 30 (unless you know $\sigma$ or that the
distribution of the population is symmetrical). Even when N = 30, the t
distribution is more accurate than the normal distribution for assessing
probabilities and so, in most research studies (that use samples), t is
used rather than z.
A reasonable question now is
"Where did those t values of 4.30, 2.26,
2.06, and 1.96 come from?" The answer is Table D. Table D is
really a condensed version of 34 t distributions. Look at Table D and
note that there are 34 different degrees of freedom in the left-hand column.
12.2.12 Table D
Across the top under "$\alpha$ Levels for Two-Tailed Test" you will see six
selected probability figures, .20, .10, .05, .02, .01, and .001.
Follow the .05 column down to df = 2, 9, 25, and $\infty$, and you will find t values of 4.30, 2.26, 2.06, and 1.96.
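If Table D is not at hand, the tabled values can be approximated numerically. The sketch below is an illustration only, not the method used to build Table D: it integrates the t density with Simpson's rule and searches by bisection for the t value that leaves a chosen two-tailed probability in the tails. The function names and the search ceiling of 100 are assumptions made for this example; the ceiling is adequate for the df and $\alpha$ values quoted in this section.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=1000):
    """Area in both tails beyond |t|, using Simpson's rule on [0, |t|]."""
    t = abs(t)
    if t == 0:
        return 1.0
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * total * h / 3  # area between -t and +t
    return 1 - central_area

def critical_t(df, alpha=0.05):
    """Bisection search for the t whose two-tailed probability is alpha."""
    lo, hi = 0.0, 100.0  # assumed search range for this illustration
    for _ in range(40):
        mid = (lo + hi) / 2
        if two_tailed_p(mid, df) > alpha:
            lo = mid  # too much area in the tails: move outward
        else:
            hi = mid
    return (lo + hi) / 2

for df in (2, 9, 25):
    print(df, round(critical_t(df), 2))
```

Run for df = 2, 9, and 25 at the .05 level, the search should land on the 4.30, 2.26, and 2.06 entries quoted from Table D.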
Table D differs in several ways from the normal-curve table. In the normal-curve
table, the z scores are on the margin of the table and the probability figures
are in the body of the table.
12.2.16 Illustration of Normal
In the t-distribution table, the
opposite is true; the t
values are in the body of the table and the
probability figures are on the top and bottom margins. Also, in the
normal-curve table, you can find the exact probability of any z score;
in Table D, the exact probability is given for only six t values. These six are
commonly chosen as $\alpha$ levels by experimenters. Finally, if you
wish to conduct a one-tailed test, use the probability figures shown under that
heading at the bottom of Table D. Note that the probability figures are one-half
those for a two-tailed test. You might draw a t distribution, put in
values for a two-tailed test, and see for yourself that reducing the
probability figure by
one-half is appropriate for a one-tailed test.
As a general rule, researchers run
two-tailed tests. If a one-tailed test is used, a justification is usually
given. In this text we will routinely use two-tailed tests.
We can use Student's t distribution to decide whether a particular sample mean
came from a particular population.
A Belgian, Adolphe Quetelet ('Ka-tle)
(1796-1874), is regarded as the first person to recognize that social and
biological measurements may be distributed according to the "normal law of
error" (the normal distribution). Quetelet made this discovery while
developing actuarial (life expectancy) tables for a Brussels life insurance
company. Later, he began making anthropometric (body) measurements and, in
1836, he developed Quetelet's Index (QI), a ratio in which weight in grams was
divided by height in centimetres. This index was supposed to permit evaluation
of a person's nutritional status: very large numbers indicated obesity and
very small numbers indicated starvation.
Suppose a present-day anthropologist
read that Quetelet had found a mean QI value of 375 on the entire population of
French army conscripts. No standard deviation was given because it had not yet
been invented. Our anthropologist, wondering if there has been a change during
the last hundred years, obtains a random sample of 20 present-day Frenchmen who
have just been inducted into the Army. She finds a mean of 400 and a standard
deviation of 60. One now familiar question remains, "Should this mean
increase of 25 QI points be attributed to chance or not?" To answer this
question, we will perform a t test. As usual, we will require p $\leq$ .05 to reject
chance as an explanation.
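t Formula Logic

$$t = \frac{\bar{X} - \mu}{s_{\bar{X}}} = \frac{\bar{X} - \mu}{s/\sqrt{N}} = \frac{400 - 375}{60/\sqrt{20}} = \frac{25}{13.42} = 1.86, \qquad df = N - 1 = 19$$

Upon looking in Table D under the
column for a two-tailed test with $\alpha$ = .05 at the
row for 19 df, you'll find a t value of 2.09. Our
anthropologist's t of 1.86 is less than 2.09, so the null hypothesis should be
retained and the difference between present-day soldiers and those of old
should be attributed to chance.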
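The anthropologist's test can be checked with a few lines of code. This is a minimal sketch using only the figures given in the example (sample mean 400, population mean 375, s = 60, N = 20) and the tabled value 2.09; the function name `one_sample_t` is invented for this illustration.

```python
import math

def one_sample_t(sample_mean, pop_mean, s, n):
    """t for a single sample mean; returns (t, degrees of freedom)."""
    standard_error = s / math.sqrt(n)  # standard error of the mean
    t = (sample_mean - pop_mean) / standard_error
    return t, n - 1

t, df = one_sample_t(sample_mean=400, pop_mean=375, s=60, n=20)
critical = 2.09  # Table D, two-tailed, alpha = .05, 19 df

decision = "reject" if abs(t) >= critical else "retain"
print(f"t = {t:.2f}, df = {df}: {decision} the null hypothesis")
```

The comparison uses the absolute value of t because the test is two-tailed.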
Quetelet's Index is not currently used by anthropologists. There were several later
attempts to develop a more reliable index of nutrition and most of those
attempts were successful. Some of Quetelet's ideas are
still around, though. For example, it was from Quetelet, it seems, that Francis
Galton got the idea that the phenomenon of genius could be treated
mathematically, an idea that led to correlation. (Galton seems to turn up in
many stories about important concepts.)
Degrees of Freedom
The number of degrees of freedom is
always equal to the number of observations minus the number of necessary
relations obtaining among these observations. Equivalently, the number of degrees of
freedom is equal to the number of original observations minus the number of
parameters estimated from the observations.
You have been determining "degrees of
freedom" by a rule-of-thumb technique: N - 1. Now it is time for us
to explain the concept more thoroughly, in order to prepare you for
statistical techniques in which df $\neq$ N - 1.
It is somewhat difficult to obtain an
intuitive understanding of the concept of degrees of freedom without the use
of mathematics. If the following explanation leaves you scratching your head,
you might read Helen Walker's
excellent article in the Journal of Educational Psychology (Walker, 1940).
The freedom in degrees of
freedom refers to freedom of a number to have any possible value. If you
were asked to pick two numbers, and there were no restrictions, both numbers
would be free to vary (take any value) and you would have two degrees of
freedom. If, however, a restriction is imposed (namely, that $\sum X = 20$), one degree of freedom is lost because of that
restriction. That is, when you now pick the two numbers, only one of them is
free to vary. As an example, if you choose 3 for the first number, the second
number must be 17. The second number is not free to vary, because of the
restriction that $\sum X = 20$.
In a similar way, if you were to pick
five numbers, with a restriction that $\sum X = 20$, you would have four degrees of freedom. Once four numbers
are chosen (say, -5, 3, 16, and 8), the last number (-2) is determined.
The restriction that $\sum X = 20$ may seem to you to be an "out-of-the-blue"
example and unrelated
to your earlier work in statistics; in a way it is, but some of the statistics
you have calculated have had a similar restriction built in. For example, when
you found s, as required in the formula for t,
you used some algebraic version of

$$s = \sqrt{\frac{\sum (X - \bar{X})^2}{N - 1}}$$

Standard Error of Mean for a Sample

$$s_{\bar{X}} = \frac{s}{\sqrt{N}}$$
The restriction that is built in is
that $\sum (X - \bar{X})$ is always zero and, in order to meet that requirement, one of the X's
is determined. All X's are free to vary except one, and the degrees of freedom
for s is N - 1. Thus, for the
problem of using the t distribution to determine whether a sample came from a
population with a mean $\mu$, df =
N - 1. Walker (1940) summarizes the reasoning
above by stating: "A universal rule holds: The number of degrees of
freedom is always equal to the number of observations minus the number of
necessary relations obtaining among these observations." A necessary
relationship for s is that $\sum (X - \bar{X}) = 0$. Another
way of stating this rule is that the number of degrees of freedom is equal to
the number of original observations minus the number of parameters estimated
from the observations. In the case of s, one degree of freedom is subtracted because $\bar{X}$ is used as an estimate of $\mu$.
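The idea that a restriction "uses up" one degree of freedom can be seen directly in a few lines of code. The sketch below reuses the numbers from the text (a required sum of 20 and the choices 3, or -5, 3, 16, and 8); the helper name `last_value` is made up for this illustration.

```python
def last_value(free_choices, required_sum):
    """Given the freely chosen numbers, return the one value still allowed."""
    return required_sum - sum(free_choices)

# Two numbers that must sum to 20: one choice is free, the other is forced.
print(last_value([3], 20))             # 17

# Five numbers that must sum to 20: four are free, the fifth is forced.
print(last_value([-5, 3, 16, 8], 20))  # -2

# The same kind of restriction is built into s: deviations from the
# sample mean always sum to zero, so only N - 1 of them are free.
scores = [-5, 3, 16, 8, -2]
mean = sum(scores) / len(scores)
deviations = [x - mean for x in scores]
print(sum(deviations))                 # 0.0
```

Whatever the first N - 1 deviations are, the last one must be whatever value brings the total back to zero, which is why s is based on N - 1 degrees of freedom.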
Now we switch from the question of whether a sample came from a population
with a mean, $\mu$, to the more common question of whether two samples
came from populations with identical means. That is, the mean of one group is
compared with the mean of another group, and the difference is attributed to
chance (null hypothesis retained) or to a treatment (null hypothesis
rejected).
There are two kinds of two-group designs. With an independent-samples design, the subjects serve in only
one of the two groups, and there is no reason to believe that there is any correlation
between the scores of the two groups. With a correlated-samples design, there is a correlational relationship between
the scores of the two
groups. The difference between these designs is important because the calculation
of the t value for independent samples is different from the calculation
for correlated samples. You may not be able to tell which design has been used
just by looking at the numbers; instead, you must be able to identify the
design from the description of the procedures in the experiment. The design
dictates which formula for t to use. The purpose of both designs,
however, is to determine the probability that the two samples have a common population mean.
Clue to the Future
Most of the rest of this chapter is
organized around independent-samples and correlated-samples designs.
Three-fourths of Chapter 15 (Nonparametric Statistics) is also organized around
these two designs. In Chapters 12 (Analysis of Variance: One-Way
Classification) and 13 (Analysis of Variance: Factorial Design), though, the procedures you will learn are appropriate only for
independent samples.
Correlated-samples experiments are
designed so that there are pairs of scores. One member of the pair is in one
group, and the other member is in the second group. For example, you might ask
whether fathers are shorter than their sons (or more religious, or more
racially prejudiced, or whatever).
The null hypothesis is $\mu_{fathers} = \mu_{sons}$. In this design, there is a logical pairing of father and son
scores, as seen in Table 9.1. Sometimes the researcher pairs up two subjects on
some objective basis. Subjects with similar grade-point averages may be paired,
and then one assigned to the experimental group and one to the control group.
A third example of a correlated-samples design is a before-and-after
experiment, with the dependent variable measured before and after the same
treatment. Again, pairing is appropriate: the "before" score is paired
with the "after" score for each individual.
Did you notice that Table 9.1 is the
same as Table 5.1, which outlined the basic requirement for the calculation of
a correlation coefficient? As you will soon see, that correlation coefficient
is a part of determining whether $\mu_{fathers} = \mu_{sons}$.
In the independent-samples design,
the subjects are often assigned randomly to one of the two groups, and there is
no logical reason to pair a score in one group with a score in the other group.
The independent-samples design corresponds to the experimental design outlined
in Table 8.1.
An example of an independent-samples design is shown in Table 9.2. The null
hypothesis to be tested is $\mu_{experimental} = \mu_{control}$.
Both of these designs utilize random
sampling, but, with an independent-samples design, the subjects are randomly
selected from a population of individuals. In a correlated-samples
design, pairs are randomly selected from a population of pairs.
Using the t Distribution for Independent Samples
The experiments in this section are similar to those in Chapter 10, except that now
you are confronted with data for which the normal curve is not appropriate
because N is too small. As before, the two samples are independent of
each other. "Independent" means that there is no relationship between
the groups before the independent variable is introduced. Independence is often
achieved by random assignment of subjects to one or the other of the groups.
Some textbooks express this lack of relationship by calling this design a
"noncorrelated design" or an "uncorrelated design. "
Using the t distribution to
test a hypothesis is very similar to using the normal distribution. The null
hypothesis is that the two populations have the same mean, and thus any
difference between the two sample means is due to chance. The t distribution
tells you the probability of obtaining a difference as large as the one you observed if
the null hypothesis is true. You simply establish an $\alpha$ level, and if your
observed difference is less probable than $\alpha$, reject the null hypothesis and conclude that the two samples
came from populations with different means. If your observed difference is more
probable than $\alpha$, retain the null hypothesis. Does this sound familiar? We
hope so.
The way to find the probability of the
observed difference is to use a t test. The probability of the resulting
t value can be found in Table D. For an independent-samples design, the
formula for the t test is
Independent-samples t Test

$$t = \frac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}}$$

The t test, like many other statistical tests, is a ratio of
a statistic over a measure of variability. $\bar{X}_1 - \bar{X}_2$ is a statistic and, of course,
$s_{\bar{X}_1 - \bar{X}_2}$ is a measure of variability. You have seen this basic form before and you will see it again.
Table 9.3 shows several formulas for calculating $s_{\bar{X}_1 - \bar{X}_2}$. Use formulas in the top half of the table
when the two samples have an unequal number of scores. In the special situation where N1 = N2, the formulas simplify into those shown in
the bottom half of Table 9.3. The deviation-score
formulas are included in case you have to solve a problem without a calculator.
If you have a calculator, you can work the problems more quickly by using the
raw-score formulas.
The formula for degrees of freedom for independent samples is df = N1 + N2 - 2. The reasoning is as follows. For each
sample, the number of degrees of freedom is N - 1, since,
for each sample, $\sum (X - \bar{X}) = 0$. Thus, the total degrees of
freedom is (N1 - 1) + (N2 - 1) = N1 + N2 - 2.
Here is an example of an experiment in
which the results were analysed with an independent-samples t test. Thirteen
monkeys were randomly assigned to either an experimental group (drug) or a
control group (placebo). (Monkey research is very expensive, so experiments are carried out with
small N's. Thus, small-sample statistical techniques are a must.) The experimental group (N = 7)
was given the drug for eight days, while the control group (N = 6) was given a placebo
(an inert substance). After eight days of injections, training began
on a complex problem-solving task. Training and shots were continued for six days,
after which the number of errors was tabulated. The number of errors each
animal made and the t test are presented in Table 9.4.
12.5.11 Figure 9-3
The null hypothesis is that the drug made no difference; that is, that the
difference obtained was due just to chance. Since the N's are unequal
for the two samples, the longer formula for the standard error must be used.
Consulting Table D
for 11 df, you'll find that a t = 2.20 is required in order to
reject the null hypothesis with $\alpha$ = .05. Since the obtained t =
-2.99, reject the null hypothesis. The final (and perhaps most important) step
is to interpret the results. Since the experimental group, on the average, made
fewer errors (39.71 vs. 57.33), we may conclude that the drug treatment
facilitated learning. (We will often express tabled t values as t.05
(11 df) = 2.20. This gives you the critical value of t (2.20) for a
particular df (11) and level of significance ($\alpha$ = .05).)
Note that the absolute value of the obtained t (|t| = |-2.99| =
2.99) is larger than the
tabled t (2.20). In order to reject the null hypothesis, the absolute
value of the obtained t must be as great as, or greater than, the tabled
t. The larger the obtained |t|, the smaller the probability
that the difference between means occurred by chance. Figure 9.3 should help
you see why this is so. Notice in Figure 9.3 that, as the values of |t|
become larger, less and less of the area of the curve remains in the tails of
the distribution. Remember that the area under the curve is a probability.
Recall that we have been conducting a
two-tailed test. That is, the probability figure for a particular t value
is the probability of + t or larger plus the probability of - t or
smaller. In Figure 9.3, t.05 (11 df) = 2.201. This means that,
if the null hypothesis is true, a t value of +2.201 or larger would occur 2 1/2
percent of the time and a t value of -2.201 or smaller would occur 2 1/2 percent of
the time.
If you are working these problems with
paper and pencil, Table A, "Squares, Square Roots, and Reciprocals,"
will be an aid to you. For example, 1/7 + 1/6 is easily converted into .143 +
.167 with the reciprocals column, 1/N.
Adding decimals is easier than adding fractions.
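Formulas and Procedure
Table 9.3 Formulas for the standard error of the difference between means, $s_{\bar{X}_1 - \bar{X}_2}$ (lowercase x is a deviation score, $x = X - \bar{X}$)

When $N_1 \neq N_2$

Raw Score Formula
$$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\left(\frac{\sum X_1^2 - \frac{(\sum X_1)^2}{N_1} + \sum X_2^2 - \frac{(\sum X_2)^2}{N_2}}{N_1 + N_2 - 2}\right)\left(\frac{1}{N_1} + \frac{1}{N_2}\right)}$$

Deviation Score Formula
$$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\left(\frac{\sum x_1^2 + \sum x_2^2}{N_1 + N_2 - 2}\right)\left(\frac{1}{N_1} + \frac{1}{N_2}\right)}$$

When $N_1 = N_2 = N$

Raw Score Formula
$$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sum X_1^2 - \frac{(\sum X_1)^2}{N} + \sum X_2^2 - \frac{(\sum X_2)^2}{N}}{N(N - 1)}}$$

Deviation Score Formula
$$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sum x_1^2 + \sum x_2^2}{N(N - 1)}} = \sqrt{s_{\bar{X}_1}^2 + s_{\bar{X}_2}^2}$$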
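The independent-samples procedure can be sketched in code as follows. The function applies the raw-score formula for the standard error with unequal N's and df = N1 + N2 - 2. The error counts below are invented for illustration (the data of Table 9.4 are not reproduced here), and the names `independent_t`, `drug`, and `placebo` are assumptions of this sketch.

```python
import math

def independent_t(x1, x2):
    """Independent-samples t test using the raw-score standard error.

    Returns (t, df, standard error of the difference between means).
    """
    n1, n2 = len(x1), len(x2)
    mean1, mean2 = sum(x1) / n1, sum(x2) / n2
    # Sum of squared deviations via raw scores: sum(X^2) - (sum X)^2 / N
    ss1 = sum(x * x for x in x1) - sum(x1) ** 2 / n1
    ss2 = sum(x * x for x in x2) - sum(x2) ** 2 / n2
    df = n1 + n2 - 2
    se_diff = math.sqrt((ss1 + ss2) / df * (1 / n1 + 1 / n2))
    t = (mean1 - mean2) / se_diff
    return t, df, se_diff

# Hypothetical error counts, NOT the values from Table 9.4.
drug = [28, 34, 40, 45, 38, 47, 46]    # N = 7
placebo = [50, 62, 55, 66, 49, 62]     # N = 6

t, df, se_diff = independent_t(drug, placebo)
critical = 2.20  # Table D, two-tailed, alpha = .05, 11 df
print(f"t = {t:.2f}, df = {df}")
print("reject" if abs(t) >= critical else "retain", "the null hypothesis")
```

The sign of t depends only on which group is listed first; the decision rests on the absolute value, as in the monkey example.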
Using the t Distribution for Correlated Samples (Some texts use the term dependent samples instead of correlated samples.)
A correlated-samples design may
come about in a number of ways. Fortunately, the actual arithmetic in
calculating a t value is the same for any of the three
correlated-samples designs. The three types of designs are natural pairs, matched pairs, and repeated measures.
In a natural-pairs investigation, the experimenter does not assign the subjects to
one group or the other; the pairing occurs prior to the investigation. Table 9.1 identifies one way in which natural pairs may
occur: father and son. Problems 8 and 13 describe experiments utilizing natural pairs.
In some situations, the experimenter has control over the ways pairs are
formed. Matched pairs can be formed in several ways. One way is for two
subjects to be paired on the basis of similar scores on a pretest that is related to the dependent
variable. For example, a hypnotic susceptibility test
might be given to a group of subjects. Two examples of hypnotic suggestibility
pre-tests are ;
 Subjects with similar scores could be paired
and then one member of each pair randomly assigned to either the experimental
or control group. The result is two groups equivalent in hypnotizability.
Another variation of matched pairs is
the split-litter technique used with nonhuman animals. Half of a litter is
assigned randomly to each group. In this way, the genetics of one group
is matched with that of the other. The same technique has been used in human
experiments with twins or siblings. Student's barley experiments and the
experiment described in Problem 9 are examples of starting with two similar subjects
and assigning them at random to one of two treatments.
Still another example of the
matched-pairs technique is the treatment of each member of the control group
according to what happens to its paired member in the experimental group.
Because of the forced correspondence, this is called a yoked-control design.
Problem 11 describes a yoked-control design.
The difference between the
matched-pairs design and a natural-pairs design is that, with the matched
pairs, the investigator can randomly assign one member of the pair to a
treatment. In the natural-pairs design, the investigator has no control over
assignment. Although the statistics are the same, the natural-pairs design is
usually open to more interpretations than the matched-pairs design.
A third kind of correlated-samples
design is called a repeated-measures design because more than one measure is
taken on each subject. This design often takes the form of a before-and-after
experiment. A pretest is given, some treatment is administered, and a
post-test is given. The mean of the scores on the post-test is compared with
the mean of the scores on the pretest to determine the effectiveness of the
treatment. Clearly, there are two scores that should be paired: the pretest and
the post-test scores of each subject. In such an experiment, each person is
said to serve as his or her own control.
All three of these methods of forming groups have one thing in common: a meaningful
correlation may be calculated for the data. The name correlated samples comes
from this fact. With a correlated-samples design, one variable is designated X,
the other Y.
Calculating a t Value for Correlated Samples
The formula for t when the data come from correlated samples
has a familiar theme: a difference between means divided by the standard error
of the difference. The standard error of the difference between means of
correlated samples is symbolized $s_{\bar{D}}$. One formula for a
t test between correlated samples is

$$t = \frac{\bar{X} - \bar{Y}}{s_{\bar{D}}} = \frac{\bar{X} - \bar{Y}}{\sqrt{s_{\bar{X}}^2 + s_{\bar{Y}}^2 - 2 r_{XY}\, s_{\bar{X}}\, s_{\bar{Y}}}}$$

df = N - 1, where N = the number of pairs
The number of degrees of freedom in a correlated-samples case is the number of
pairs minus one. Although each pair has two values, once one value is
determined, the other is restricted to a similar
value. (After all, they are called correlated samples.) In addition,
another degree of freedom is subtracted when $s_{\bar{D}}$ is calculated. This loss is similar to
the loss of 1 df when s is calculated.
As you can see by comparing the
denominator of the correlated-samples t test with that of the
t test for independent samples (when N1 = N2),
the difference lies in the term $2 r_{XY}(s_{\bar{X}})(s_{\bar{Y}})$. Of
course, when $r_{XY} = 0$, this term drops out of the
formula, and the standard error is the same as for independent samples.
Also notice what happens to the
standard-error term in the correlated-samples case where r > 0: the
standard error is reduced. Such a reduction will increase the size of t. Whether
this reduction will increase the likelihood of rejecting the null
hypothesis depends on how much t is increased, since the degrees of freedom
in a correlated-samples design are fewer than in the independent-samples
design.
The formula $s_{\bar{D}} = \sqrt{s_{\bar{X}}^2 + s_{\bar{Y}}^2 - 2 r_{XY}\, s_{\bar{X}}\, s_{\bar{Y}}}$ is
used only for illustration purposes. There is an algebraically equivalent but
arithmetically easier calculation called the direct-difference method,
which does not require you to calculate r. To find $s_{\bar{D}}$ by the direct-difference method, find the
difference between each pair of scores, calculate the standard deviation
of these difference scores, and divide the standard deviation by the square
root of the number of pairs.
To find a t value using the direct-difference method, use the following formula.
t value using Direct Difference Method

$$t = \frac{\bar{X} - \bar{Y}}{s_{\bar{D}}}, \qquad s_{\bar{D}} = \frac{s_D}{\sqrt{N}}, \qquad s_D = \sqrt{\frac{\sum D^2 - \frac{(\sum D)^2}{N}}{N - 1}}$$
Here is an example of a
correlated-samples design and a t-test analysis. Suppose you were
interested in the effects of interracial contact on racial attitudes. You have
a fairly reliable test of racial attitudes, in which high scores
indicate more positive attitudes. You administer the test one Monday morning
to a biracial group of fourteen 12-year-old boys who do not know each
other but who have signed up for a week-long community day camp. The campers
then spend the next week taking nature walks, playing ball, eating lunch,
swimming, and doing the kinds of things that camp directors dream up to keep
12-year-old boys busy. On Saturday morning, the boys are again given the
racial-attitude test. Thus, the data consist of 14 pairs of before-and-after
scores. The null hypothesis is that the mean of the population of
"before" scores is equal to the mean of the population of
"after" scores or, in terms of the specific experiment, that a week
of interracial contact has no effect on racial attitudes.
Suppose the data in Table 9.5 were
obtained. We will set $\alpha$ = .01 and perform the analysis. Using the sum of the D
and $D^2$ columns in Table 9.5, we can find $s_{\bar{D}}$.
Since t.01 (13 df) = 3.01, this
difference is significant beyond the .01 level. That is, p < .01. The
"after" mean was larger than the "before" mean; therefore,
we may conclude that, after the week of camp, racial attitudes were
significantly more positive than before.
You might note that $\bar{X} - \bar{Y} = \bar{D}$, the
mean of the difference scores. In the problem above, $\sum D = -81$ and N = 14, so $\bar{D} = \sum D / N = -81/14 = -5.79$.
Gosset preferred the correlated-samples design. In his agricultural experiments, he
new barley grown on adjacent plots. This correlation reduced the standard-error
term in the denominator of the t test, making the correlated-samples
design more sensitive than the independent-samples design for detecting a
difference between means.
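Formula (Illustration formula)

$$s_{\bar{D}} = \sqrt{s_{\bar{X}}^2 + s_{\bar{Y}}^2 - 2 r_{XY}\, s_{\bar{X}}\, s_{\bar{Y}}}$$

$s_{\bar{D}}$ = standard error of the difference between correlated means
N = number of pairs
$s_{\bar{X}}$, $s_{\bar{Y}}$ = Standard Error of Mean of X and of Y (see formula below)
$r_{XY}$ = correlation between X & Y

Standard Error of Mean
Defined Standard Error of Mean

$$s_{\bar{X}} = \frac{s_X}{\sqrt{N}}, \qquad s_{\bar{Y}} = \frac{s_Y}{\sqrt{N}}$$

$s_{\bar{X}}$ or $s_{\bar{Y}}$ = standard error of the mean of X or Y
s = standard deviation of a sample

Procedure: Standard Error of Mean of X
1. Determine the standard deviation of the X scores.
2. Determine the square root of the total number of X scores.
3. Divide the result of step 1 (standard deviation of X scores)
by the result of step 2 (square root of the number of X scores).

Procedure: Standard Error of Mean of Y
1. Determine the standard deviation of the Y scores.
2. Determine the square root of the total number of Y scores.
3. Divide the result of step 1 (standard deviation of Y scores)
by the result of step 2 (square root of the number of Y scores).

Procedure: Standard error of the difference
1. Square $s_{\bar{X}}$ (multiply it by itself).
2. Square $s_{\bar{Y}}$ (multiply it by itself).
3. Add the squared $s_{\bar{X}}$ to the squared $s_{\bar{Y}}$.
4. Determine $r_{XY}$ (the correlation between X & Y).
5. Multiply $r_{XY}$ by 2.
6. Multiply $s_{\bar{X}}$ by $s_{\bar{Y}}$.
7. Multiply the result of step 6 ($s_{\bar{X}} \times s_{\bar{Y}}$) by the
result of step 5 ($r_{XY} \times 2$).
8. Subtract the result of step 7 from
the result of step 3 (squared $s_{\bar{X}}$ + squared $s_{\bar{Y}}$).
9. Obtain the square root of the result of step 8 to obtain $s_{\bar{D}}$.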
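The nine steps above can be sketched in code. The block below is an illustration under stated assumptions: the helper names (`mean`, `std_dev`, `pearson_r`, `correlated_t_via_r`) and the paired scores are invented for the example, and the standard deviation uses N - 1 in the denominator, matching the definition of s used for t.

```python
import math

def mean(values):
    return sum(values) / len(values)

def std_dev(values):
    """Sample standard deviation s, with N - 1 in the denominator."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def pearson_r(x, y):
    """Pearson product-moment correlation between paired scores."""
    mx, my = mean(x), mean(y)
    numerator = sum((a - mx) * (b - my) for a, b in zip(x, y))
    denominator = math.sqrt(sum((a - mx) ** 2 for a in x) *
                            sum((b - my) ** 2 for b in y))
    return numerator / denominator

def correlated_t_via_r(x, y):
    """Correlated-samples t using the illustration formula for s_D-bar."""
    n = len(x)                           # number of pairs
    se_x = std_dev(x) / math.sqrt(n)     # standard error of the mean of X
    se_y = std_dev(y) / math.sqrt(n)     # standard error of the mean of Y
    r = pearson_r(x, y)
    se_diff = math.sqrt(se_x ** 2 + se_y ** 2 - 2 * r * se_x * se_y)
    t = (mean(x) - mean(y)) / se_diff
    return t, n - 1, se_diff

# Hypothetical before/after scores for five subjects.
before = [1, 2, 3, 4, 5]
after = [2, 4, 5, 4, 6]
t, df, se_diff = correlated_t_via_r(before, after)
print(f"t = {t:.2f}, df = {df}")
```

Note how a positive r shrinks the standard error: with r = 0 the last term vanishes and the denominator equals the equal-N independent-samples standard error.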
Computation Formula (Direct-Difference Method)

$$s_{\bar{D}} = \frac{s_D}{\sqrt{N}}, \qquad s_D = \sqrt{\frac{\sum D^2 - \frac{(\sum D)^2}{N}}{N - 1}}$$

$s_{\bar{D}}$ = standard error of the difference between correlated means (direct-difference
method)
$s_D$ = standard deviation of the distribution of differences between correlated scores
N = number of pairs of scores

Procedure: Standard deviation of the distribution of differences between correlated scores
1. Create a column of difference scores, D. That is,
find the difference between each pretest and posttest score (subtract the posttest from the pretest) and put that number
in a column.
2. Create a column with the squared differences. That is,
multiply each difference score by itself.
3. Sum the column of squared differences (the column created in
step 2).
4. Sum the column of differences (step 1) and square the sum
(multiply it by itself). Then divide this result by the number of score pairs.
5. Subtract the result of the previous step (step 4) from the sum
of the squared differences (step 3).
6. Take the number of score pairs and subtract 1 from that number.
7. Divide the result of step 5 by the result of step 6 and take the square root to
determine $s_D$.

Procedure: t value
1. Find the difference between $\bar{X}$ and $\bar{Y}$.
2. Obtain the square root of the number of score pairs.
3. Divide $s_D$ by the result of step 2 to obtain $s_{\bar{D}}$.
4. Divide the result of step 1 by $s_{\bar{D}}$ to obtain the t value.
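The direct-difference steps translate almost line for line into code. This is a minimal sketch; the function name `direct_difference_t` and the paired scores are hypothetical and are not the data of Table 9.5.

```python
import math

def direct_difference_t(x, y):
    """Correlated-samples t by the direct-difference method.

    Returns (t, df, s_D-bar), where df is the number of pairs minus one.
    """
    n = len(x)                                    # number of pairs
    d = [a - b for a, b in zip(x, y)]             # difference scores D
    sum_d = sum(d)
    sum_d_squared = sum(v * v for v in d)         # sum of D^2
    s_d = math.sqrt((sum_d_squared - sum_d ** 2 / n) / (n - 1))
    se_diff = s_d / math.sqrt(n)                  # s_D-bar
    mean_diff = sum(x) / n - sum(y) / n           # X-bar minus Y-bar (= D-bar)
    t = mean_diff / se_diff
    return t, n - 1, se_diff

# Hypothetical before/after scores for five subjects.
before = [1, 2, 3, 4, 5]
after = [2, 4, 5, 4, 6]
t, df, se_diff = direct_difference_t(before, after)
print(f"t = {t:.2f}, df = {df}")
```

Because the method works only with the column of differences, no correlation coefficient has to be computed, yet the result is algebraically the same as the illustration formula.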
Using the t Distribution to Establish
a Confidence Interval about a Mean Difference
Problem 3 involves using the t distribution to establish a confidence interval about a
mean difference: an upper and a lower limit for the difference between the means, usually with a 95%
degree of confidence. When the interval does not include zero, the null
hypothesis of no difference can still be rejected.
As you probably recall from Chapter 9
(Samples and Sampling Distributions), a confidence interval is a range of
values within which a parameter is expected to be. A confidence interval is established
for a specified degree of confidence, usually 95 percent or 99 percent.
In this section, you will learn how to
establish a confidence interval about a mean difference. The problems here are
similar to those dealt with in Chapter 9, except that
1. Probabilities will be established with the t distribution rather than with the normal distribution.
2. The parameter of interest is a difference between two population means rather than a population mean.
The first point can be dispensed
with rather quickly. You have already practiced using the t distribution
to establish probabilities; you will use Table D in this section, too.
The second point will require a little more explanation. The questions you have
been answering so far in this chapter have been hypothesis-testing questions,
of the form "Does $\mu_1 - \mu_2 = 0$?" You answered each question by drawing two samples, calculating the means, and finding
the difference. If the probability of the difference was very small, the hypothesis $H_0: \mu_1 - \mu_2 = 0$ was rejected. Suppose you have rejected the null hypothesis but someone
wants more information than that and asks, "What is the real difference between
$\mu_1$ and $\mu_2$?" The person recognizes that the real difference is not zero but
wonders what it is. You are being asked to make an estimate of $\mu_1 - \mu_2$. If you establish
a confidence interval about the difference between $\bar{X}_1$ and $\bar{X}_2$, you can state with a specified degree of
confidence that $\mu_1 - \mu_2$ falls within the interval.
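Confidence Intervals for Independent Samples
The sampling distribution of $\bar{X}_1 - \bar{X}_2$ is a t distribution with $N_1 + N_2 - 2$ degrees of freedom. The lower and upper limits
of the confidence interval about a mean difference are found with the following formulas.
Confidence Interval Upper and Lower Limits
$LL = (\bar{X}_1 - \bar{X}_2) - t(s_{\bar{X}_1 - \bar{X}_2})$
$UL = (\bar{X}_1 - \bar{X}_2) + t(s_{\bar{X}_1 - \bar{X}_2})$
For a 95 percent confidence interval, use the t value in Table D associated with $\alpha = .05$.
For 99 percent confidence, change $\alpha$ to .01.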
As an example, we will use the calculations you worked up in Problem 16 on the
time required to do problems on the two different brands of desk
calculators. We will establish a 95 percent confidence interval about the difference.
As your calculations revealed,
Thus, .65 and 2.35 are the lower and
upper limits of a 95 percent confidence interval for the mean difference
between the two kinds of calculators.
One of the benefits of establishing a
confidence interval about a mean difference is that you also test the null hypothesis, $\mu_1 - \mu_2 = 0$, in the process (see
Natrella, 1960). If 0 were
outside the confidence interval, then the null hypothesis would be rejected
using hypothesis-testing procedures. In the example we just worked, the
confidence interval was .65 to 2.35 minutes; a value of 0 falls outside this
interval. Thus, we can reject $H_0: \mu_1 - \mu_2 = 0$ at the .05 level.
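The limits for independent samples, and the null-hypothesis check that comes with them, can be sketched as follows. This is an illustration under stated assumptions: the function name mean_difference_limits is hypothetical, the means and standard error are invented numbers (they are not the Problem 16 values), and 2.101 is the tabled two-tailed t for 18 df at the .05 level.

```python
def mean_difference_limits(mean_1, mean_2, se_diff, t_table):
    """Lower and upper confidence limits about a difference between two means.

    se_diff is the standard error of the difference between the means;
    t_table is the tabled t for N1 + N2 - 2 df, not a t computed from the data.
    """
    diff = mean_1 - mean_2
    margin = t_table * se_diff
    return diff - margin, diff + margin

# Hypothetical summary values, invented for illustration only
lower, upper = mean_difference_limits(mean_1=12.0, mean_2=10.0,
                                      se_diff=0.5, t_table=2.101)

# The interval doubles as a test of H0: if 0 lies outside it, reject H0
rejects_null = not (lower <= 0 <= upper)
print(round(lower, 4), round(upper, 4), rejects_null)
```

With these invented numbers, 0 lies below the lower limit, so the null hypothesis would be rejected at the .05 level.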
Sometimes, hypothesis testing is not
sufficient and the extra information of confidence intervals is desirable. Here
is one example of how this "extra information" on confidence intervals
might be put to work in this calculator-purchasing problem. Suppose that the
new brand is faster, but it is also more expensive. Is it still a better buy?
Through cost-analysis procedures, the purchasing agent can show that, given a machine life of five years, a
reduction of time per problem of 1.7 minutes justifies the increased cost. If
she has the confidence interval you just worked out, she can see immediately
that such a difference in machines (1.7 minutes) is within the confidence
interval. The new machines are the better buy.
Confidence Intervals for Correlated Samples
The sampling distribution of $\bar{X} - \bar{Y}$ is also a t distribution. The
number of degrees of freedom is N - 1. As in the section on hypothesis
testing of correlated samples, N is the number of pairs of
scores. The lower and upper limits of the confidence interval about a mean
difference between correlated samples are
Confidence Interval Correlated Samples
$LL = (\bar{X} - \bar{Y}) - t(s_{\bar{D}})$
$UL = (\bar{X} - \bar{Y}) + t(s_{\bar{D}})$
A word of caution is appropriate here.
For confidence intervals for either independent or correlated samples, use a t
value from Table D, not one calculated from the data.
The interpretation of a confidence
interval about a difference between means is very similar to the interpretation
you made of confidence intervals about a sample mean. Again, the method is such
that repeated sampling from two populations will produce a series of
confidence intervals, 95 (or 99) percent of which will contain the true difference
between the population means. You have sampled only once so the proper
interpretation is that you are 95 (or 99) percent confident that the true
difference falls between your lower and upper limits. It would probably be
helpful to you to reread the material on interpreting a confidence interval
about a mean, (Confidence Intervals).
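The repeated-sampling interpretation can be checked with a small simulation. This is only a sketch under assumptions that are not in the original notes: the differences are drawn from a normal population with a known true mean difference, the function name coverage_rate and all parameter values are hypothetical, and 2.262 is the tabled two-tailed t for 9 df at the .05 level.

```python
import math
import random
import statistics

def coverage_rate(true_diff, sd, n_pairs, t_table, trials, seed=0):
    """Fraction of simulated confidence intervals that contain true_diff."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # One simulated experiment: n_pairs difference scores
        d = [rng.gauss(true_diff, sd) for _ in range(n_pairs)]
        mean_d = statistics.mean(d)
        se = statistics.stdev(d) / math.sqrt(n_pairs)   # standard error of the mean difference
        if mean_d - t_table * se <= true_diff <= mean_d + t_table * se:
            hits += 1
    return hits / trials

# Hypothetical population values, chosen only for illustration
rate = coverage_rate(true_diff=1.5, sd=2.0, n_pairs=10, t_table=2.262, trials=4000)
print(rate)
```

Across many simulated samples, the printed fraction should land close to .95, which is what "95 percent confident" refers to.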
Use the t score from the table at alpha .05.
$UL = (\bar{X} - \bar{Y}) + t(s_{\bar{D}})$
$LL = (\bar{X} - \bar{Y}) - t(s_{\bar{D}})$
$s_{\bar{D}}$ = standard error of the difference between correlated means (direct-difference method)
$\bar{X}$ = mean of X scores
$\bar{Y}$ = mean of Y scores
(t) = the t value from the back of a statistics textbook
(t distribution table) or from a t value calculator on the Web
N = number of pairs of scores
df = the degrees of freedom for this equation is N - 1
Upper Limit Confidence Interval Calculation
1. Subtract the mean of Y scores from the mean of X scores.
2. Multiply $s_{\bar{D}}$ by the t score found in the table. Look across from the degrees of freedom (N - 1) and under the chosen alpha level (.05, .02, and so on).
3. Add the result of step #2 to the result of step #1 for the upper limit of the confidence interval.
Lower Limit Confidence Interval Calculation
1. Subtract the mean of Y scores from the mean of X scores.
2. Multiply $s_{\bar{D}}$ by the t score found in the table. Look across from the degrees of freedom (N - 1) and under the chosen alpha level (.05, .02, and so on).
3. Subtract the result of step #2 from the result of step #1 for the lower limit of the confidence interval.
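The upper and lower limit calculations for correlated samples can be sketched together. This is a minimal example with assumptions labeled: the function name correlated_limits and the paired scores are hypothetical, and 2.776 is the tabled two-tailed t for 4 df at the .05 level (five pairs, df = N - 1).

```python
import math
import statistics

def correlated_limits(x_scores, y_scores, t_table):
    """Lower and upper confidence limits about a mean difference, correlated samples."""
    if len(x_scores) != len(y_scores):
        raise ValueError("scores must come in pairs")
    n = len(x_scores)                                   # number of pairs of scores
    diffs = [x - y for x, y in zip(x_scores, y_scores)]
    mean_diff = statistics.mean(x_scores) - statistics.mean(y_scores)
    # Standard error of the difference between correlated means (direct-difference method);
    # statistics.stdev divides by N - 1
    s_dbar = statistics.stdev(diffs) / math.sqrt(n)
    margin = t_table * s_dbar                           # tabled t times the standard error
    return mean_diff - margin, mean_diff + margin

# Hypothetical paired scores, invented for illustration only
x = [10, 12, 9, 14, 11]
y = [8, 11, 9, 10, 9]
lower, upper = correlated_limits(x, y, t_table=2.776)
print(round(lower, 3), round(upper, 3))
```

For these invented scores the interval includes 0, so the null hypothesis of no difference would be retained at the .05 level.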
Assumptions for Using the t Distribution
You can perform a t test on the difference between means on any two-group
data you have or any that you can beg, borrow, buy, or steal. No doubt about
it, you can easily come up with a t value using
Independent-samples t Test
$t = \dfrac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}}$
You can then attach a probability
figure to your t value by deciding that the t distribution is an
appropriate model of your empirical situation.
In a similar way, you can calculate a
confidence interval about the difference between means in any two-group
experiment. By deciding that the t distribution is an accurate model,
you can claim you are "99 percent confident that the true difference between
the population means is between thus and so."
But should you decide to use the t distribution?
When is it an accurate reflection of the empirical probabilities?
The t distribution will give
correct results when the assumptions it is based on are true for the
populations being analysed. The t distribution, like the normal curve,
is a theoretical distribution. In deriving the t distribution,
mathematical statisticians make three assumptions.
1. The dependent-variable scores for both populations are normally distributed.
2. The variances of the dependent-variable scores for the two populations are equal.
3. The scores on the dependent variable are random samples from the population.
Assumption 3 requires three
explanations. First, in a correlated-samples design, the pairs of scores
should be random samples from the population you are interested in.
Second, Assumption 3 ensures that any
sampling errors will fall equally into both groups and that you may generalize
from sample to population. Many times it is a physical impossibility to sample
randomly from the population. In these cases, you should randomly assign the
subjects available to one of the two groups. This will randomise errors, but
your generalization to the population will be on less secure grounds than if
you had obtained a truly random sample.
Third, Assumption 3 ensures the independence
of the scores. That is, knowing one score within a group does not help you
predict other scores in that same group. Either random sampling from the
population or random assignment of subjects to groups will serve to achieve independence.
Now we can return to the major
question of this section: "When will the t distribution produce
accurate probabilities?" The answer is "When random samples are obtained
from populations that are normally distributed and have equal variances."
This may appear to be a tall order. It
is, and in practice no one is able to demonstrate these characteristics
exactly. The next question becomes "Suppose I am not sure my data have these
characteristics. Am I likely to reach the wrong conclusion if I use Table
D?" The answer to this question, fortunately, is "No."
The t test is a
"robust" test, which means that the t distribution leads to
fairly accurate probabilities, even when the data do not meet Assumptions 1 and
2. Boneau (1960) used a
computer to generate distributions when these two assumptions were violated.
For the most part, he found that, even if the populations violate the assumptions,
the t distribution reflects the actual probabilities. Boneau's most
serious warning is
that, when sample sizes are different (for example, N1 = 5 and N2 = 15), then a large violation
of Assumption 2 (for example, one variance being four times the size of the
other) produces a t value for which the tabled t distribution is
a poor model. Under such
circumstances, you may reject H0 when you should not. Chapter
15 will give you other statistics with other distributions that you
may use to test the difference between two samples when the first two
assumptions of the t test are not valid.
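The kind of check described above can be imitated with a short simulation. This sketch is not Boneau's procedure; it rests on several assumptions of its own: both populations are normal with the same mean (so H0 is true), the t statistic uses the usual pooled-variance standard error, the larger variance is placed in the smaller sample, and 2.101 is the tabled two-tailed t for 18 df at the .05 level. The function names are hypothetical.

```python
import math
import random

def pooled_t(a, b):
    """Independent-samples t using a pooled variance estimate (an assumed formula)."""
    n1, n2 = len(a), len(b)
    m1, m2 = sum(a) / n1, sum(b) / n2
    ss1 = sum((v - m1) ** 2 for v in a)
    ss2 = sum((v - m2) ** 2 for v in b)
    pooled_var = (ss1 + ss2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))

def rejection_rate(n1, sd1, n2, sd2, t_table, trials, seed=0):
    """Proportion of true-H0 experiments in which |t| exceeds the tabled t."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        a = [rng.gauss(0.0, sd1) for _ in range(n1)]
        b = [rng.gauss(0.0, sd2) for _ in range(n2)]
        if abs(pooled_t(a, b)) > t_table:
            rejections += 1
    return rejections / trials

# N1 = 5 and N2 = 15; sd of 2.0 versus 1.0 gives a variance ratio of 4 to 1
unequal = rejection_rate(5, 2.0, 15, 1.0, t_table=2.101, trials=5000)
equal = rejection_rate(5, 1.0, 15, 1.0, t_table=2.101, trials=5000)
print(equal, unequal)
```

When the variances are equal, the proportion of rejections should stay near the nominal .05; with unequal sample sizes and a fourfold variance difference it should come out noticeably higher, which is the sense in which the tabled t distribution becomes a poor model.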
Using the t Distribution to Test the
Significance of a Correlation Coefficient
In Chapter 5, you learned to
calculate Pearson product-moment correlation coefficients. This section is on
testing the statistical significance of these coefficients. The question is
whether an obtained r, based on a sample, could have come from a
population of pairs of scores for which the parameter correlation is .00. The
answer to this question is based on the size of a t value that is
calculated from the correlation coefficient. The t value is found using
(t) Value Using Correlation
$t = r\sqrt{\dfrac{N - 2}{1 - r^2}}$, with df = N - 2, where N is the number of pairs of scores
The null hypothesis is that the
population correlation is .00. Samples are drawn, and an r is
calculated. The t distribution is then used to determine whether the
obtained r is significantly different from .00.
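As an example, suppose you had obtained an r = .40 with 22 pairs of scores.
Does such a correlation indicate a
significant relationship between the two variables, or should it be attributed
to chance?
(t) Value Example
$t = .40\sqrt{\dfrac{22 - 2}{1 - (.40)^2}} = .40\sqrt{\dfrac{20}{.84}} = 1.95$, with df = 20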
Table D shows that, for 20 df, a
t value of 2.09 is required to reject the null hypothesis. The obtained t
for r = .40, where N = 22, is less than the tabled t, so
the null hypothesis is retained. That is, a coefficient of .40 would be
expected by chance alone more than 5 times in 100.
In fact, for N = 22, an r =
.43 is required for significance at the .05 level and an r = .54 for the
.01 level. As you can see, even medium-sized correlations can be expected by
chance alone for samples as small as 22. Most researchers strive for N's of
30 or more for correlation problems.
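The significance test for r can be sketched in Python. The function name t_from_r is hypothetical, and the sketch assumes the usual formula t = r * sqrt((N - 2) / (1 - r^2)) with N - 2 degrees of freedom; the tabled value 2.09 is the one quoted above for 20 df.

```python
import math

def t_from_r(r, n_pairs):
    """Return (t, df) for testing whether a Pearson r differs from .00."""
    df = n_pairs - 2
    t = r * math.sqrt(df / (1 - r ** 2))
    return t, df

# The example from the text: r = .40 with 22 pairs of scores
t, df = t_from_r(0.40, 22)
significant = abs(t) >= 2.09   # tabled t for 20 df at the .05 level
print(round(t, 2), df, significant)
```

The obtained t is smaller than the tabled value, so the null hypothesis is retained, in agreement with the worked example.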
Sometimes you may wish to determine whether the difference between two correlations
is statistically significant. Several texts discuss this test (Ferguson, 1976,
p. 184, and Guilford & Fruchter, 1978, p. 163).
The t test assesses whether the means
of two groups are statistically different from one another. The t test could be
used to assess the effectiveness of a treatment by comparing the means of the
treatment and control groups, or alternatively by comparing the means of the same
group before and after treatment. In any
case, this test is indicated when you want to compare the means of two groups,
especially in the analysis of the posttest-only two-group
randomized experimental design.
T-Test for the Significance of the
Difference between the Means of Two Correlated Samples
You could substitute the control group
mean with pre treatment group mean and the treatment group with the post
treatment group me