Statistics Discussion

Copyright © October 2004 Ted Nissen



1       Introduction

2       Review and More Introduction

3       Central Values and the Organization of Data

4       Variability

5       Correlation and Regression



8       Theoretical Distributions Including the Normal Distribution

9       Samples and Sampling Distributions

10         Differences between Means

11         The t Distribution and the t-Test

12         Analysis of Variance: One-Way Classification

13         Analysis of Variance: Factorial Design

14         The Chi Square Distribution

15         Nonparametric Statistics

16         Vista Formulas and Analysis

17         Hypothesis

18         Summary

19         Glossary



1      Introduction


1.2   Statistics Definition

1.2.1   Algebra and Statistics                 Algebra is a generalization of arithmetic in which letters representing numbers are combined according to the rules of arithmetic.                 The result of an algebraic expression that combines several scores is a statistic.[1]

1.2.2   Descriptive Statistic                  

1.2.3   Inferential Statistics                  

1.3   Purpose of Statistics


1.4   Terminology


1.4.2   Populations, Samples, and Subsamples                 A population consists of all members of some specified group. Actually, in statistics, a population consists of the measurements on the members and not the members themselves. A sample is a subset of a population. A subsample is a subset of a sample. A population is arbitrarily defined by the investigator and includes all relevant cases.                 Investigators are always interested in some population. Populations are often so large that not all the members can be measured. The investigator must often resort to measuring a sample that is small enough to be manageable but still representative of the population.                 Samples are often divided into subsamples and relationships among the subsamples determined. The investigator would then look for similarities or differences among the subsamples.                 Resorting to the use of samples and subsamples introduces some uncertainty into the conclusions because different samples from the same population nearly always differ from one another in some respects. Inferential statistics are used to determine whether or not such differences should be attributed to chance.

1.4.3   Parameters and Statistics                 A parameter is some numerical characteristic of a population. A statistic is some numerical characteristic of a sample or subsample. A parameter is constant; it does not change unless the population itself changes. There is only one number that is the mean of the population; however, it often cannot be computed, because the population is too large to be measured. Statistics are used as estimates of parameters, although, as we suggested above, a statistic tends to differ from one sample to another. If you have five samples from the same population, you will probably have five different sample means. Remember that parameters are constant; statistics are variable.
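The claim that parameters are constant while statistics vary can be checked with a small simulation. The sketch below is only an illustration: the population of 10,000 scores, its mean of 100 and spread of 15, and the sample size of 25 are all invented for the purpose, and the names `population`, `mu`, and `sample_means` are our own.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population of 10,000 scores (invented for illustration).
population = [random.gauss(100, 15) for _ in range(10_000)]

# The parameter: a single fixed number for this population.
mu = statistics.mean(population)

# Five samples of 25 scores each yield five statistics.
sample_means = [
    statistics.mean(random.sample(population, 25)) for _ in range(5)
]

print(f"population mean (parameter): {mu:.2f}")
for i, m in enumerate(sample_means, start=1):
    print(f"sample {i} mean (statistic): {m:.2f}")
```

Each run of the loop draws a different sample, so the five printed sample means differ from one another while the population mean stays put.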

1.4.4   Variables                 A variable is something that exists in more than one amount or in more than one form. Memory is a variable. The Wechsler Memory Scale is used to measure people’s memory ability, and variation is found among the memory scores of any group of people. The essence of measurement is the assignment of numbers on the basis of variation.                 Most variables can be classified as quantitative variables. When a quantitative variable is measured, the scores tell you something about the amount or degree of the variable. At the very least, a larger score indicates more of the variable than a smaller score does.                 A score on a quantitative variable stands for a range of values bounded by a lower limit and an upper limit. For example, a score of 103 covers the range 102.5 to 103.5; the numbers 102.5 and 103.5 are called the lower limit and the upper limit of the score. The idea is that a measurement can take any fractional value between 102.5 and 103.5, but all values in that range are rounded off to 103.                 Some variables are qualitative variables. With such variables, the scores (numbers) are simply used as names; they do not have quantitative meaning. For example, political affiliation is a qualitative variable.

1.5   Scales of Measurement

1.5.1   Introduction                 Numbers mean different things in different situations. Numbers are assigned to objects according to rules. You need to distinguish clearly between the thing you are interested in and the number that symbolizes or stands for the thing. For example, you have had lots of experience with the numbers 2 and 4. You can state immediately that 4 is twice as much as 2. That statement is correct if you are dealing with numbers themselves, but it may or may not be true when those numbers are symbols for things. The statement is true if the numbers refer to apples; four apples are twice as many as two apples. The statement is not true if the numbers refer to the order that runners finish in a race. Fourth place is not twice anything in relation to second place: not twice as slow or twice as far behind the first-place runner. The point is that the numbers 2 and 4 are used to refer to both apples and finish places in a race, but the numbers mean different things in those two situations.                 S. S. Stevens (1946)[2] identified four different measurement scales that help distinguish the different kinds of situations in which numbers are assigned to objects. The four scales are nominal, ordinal, interval, and ratio.

1.5.2   Nominal Scale                 Numbers are used simply as names and have no real quantitative value. It is the scale used for qualitative variables. Numerals on sports uniforms are an example; here, 45 is different from 32, but that is about all we can say. The person represented by 45 is not “more than” the person represented by 32, and certainly it would be meaningless to try to add 45 and 32. Designating different colors, different sexes, or different political parties by numbers will produce nominal scales. With a nominal scale, you can even reassign the numbers and still maintain the original meaning, which is only that the numbered things differ. All things that are alike must have the same number.

1.5.3   Ordinal Scale                 An ordinal scale has the characteristic of the nominal scale (different numbers mean different things) plus the characteristic of indicating “greater than” or “less than”. In the ordinal scale, the object with the number 3 has less or more of something than the object with the number 5. Finish places in a race are an example of an ordinal scale. The runners finish in rank order, with “1” assigned to the winner, “2” to the runner-up, and so on. Here, 1 means less time than 2. Other examples of ordinal scales are house numbers, Government Service ranks like GS-5 and GS-7, and statements like “She is a better mathematician than he is.”

1.5.4   Interval Scale                 The interval scale has properties of both the ordinal and nominal scales, plus the additional property that intervals between the numbers are equal. “Equal interval” means that the distance between the things represented by “2” and “3” is the same as the distance between the things represented by “3” and “4”. The centigrade thermometer is based on an interval scale. The difference in temperature between 10° and 20° is the same as the difference between 40° and 50°. The centigrade thermometer, like all interval scales, has an arbitrary zero point. On the centigrade scale, this zero point is the freezing point of water at sea level. Zero degrees on this scale does not mean the complete absence of heat; it is simply a convenient starting point. With interval data, we have one restriction: we may not make simple ratio statements. We may not say that 100° is twice as hot as 50° or that a person with an IQ of 60 is half as intelligent as a person with an IQ of 120.

1.5.5   Ratio Scale                 The fourth kind of scale, the ratio scale, has all the characteristics of the nominal, ordinal, and interval scales, plus one: it has a true zero point, which indicates a complete absence of the thing measured. On a ratio scale, zero means “none”. Height, weight, and time are measured with ratio scales. Zero height, zero weight, and zero time mean that no amount of these variables is present. With a true zero point, you can make ratio statements like “16 kilograms is four times as heavy as 4 kilograms.”

1.5.6   Conclusion                 Even with these examples of the distinctions among the four scales, it is sometimes difficult to classify the variables used in the social and behavioural sciences. Very often they appear to fall between the ordinal and interval scales. It may happen that a score provides more information than simply rank, but equal intervals cannot be proved. Intelligence test scores are an example. In such cases, researchers generally treat the data as if they were based on an interval scale.                 This section on scales of measurement is important mainly because the kind of descriptive statistics you can compute on your numbers depends to some extent upon the scale of measurement the numbers represent. For example, it is not meaningful to compute a mean on nominal data such as the numbers on football players’ jerseys. If the quarterback’s number is 12 and a running back’s number is 23, the mean of the two numbers (17.5) has no meaning at all.

1.6   Statistics and Experimental Design

1.6.1   Introduction                 Statistics involves the manipulation of numbers and the conclusions based on those manipulations. Experimental design deals with how to get the numbers in the first place.

1.6.2   Independent and Dependent Variables                 In the design of a typical simple experiment, the experimenter is interested in the effect that one variable (called the independent variable) has on some other variable (called the dependent variable). Much research is designed to discover cause-and-effect relationships. In such research, differences in the independent variable are the presumed cause for differences in the dependent variable. The experimenter chooses values for the independent variable, administers a different value of the independent variable to each group of subjects, and then measures the dependent variable for each subject. If the scores on the dependent variable differ as a result of differences in the independent variable, the experimenter may be able to conclude that there is a cause-and-effect relationship.

1.6.3   Extraneous (Confounding) Variables                 One of the problems with drawing cause-and-effect conclusions is that you must be sure that changes in the scores on the dependent variable are the result of changes in the independent variable and not the result of changes in some other variables. Variables other than the independent variable that can cause changes in the dependent variable are called extraneous variables.                 It is important, then, that experimenters be aware of and control extraneous variables that might influence their results. The simplest way to control an extraneous variable is to be sure all subjects are equal on that variable.                 Independent variables are often referred to as treatments because the experimenter frequently asks “If I treat this group of subjects this way and treat another group another way, will there be a difference in their behaviour?” The ways that the subjects are treated constitute the levels of the independent variable being studied, and experiments typically have two or more levels.

1.7   Brief History of Statistics


2      Review and More Introduction

2.1   Review of Fundamentals

2.1.1   This section is designed to provide you with a quick review of the rules of arithmetic and simple algebra. We recommend that you work the problems as you come to them, keeping the answers covered while you work. We assume that you once knew all these rules and procedures but that you need to refresh your memory. Thus, we do not include much explanation. For a textbook that does include basic explanations, see Helen M. Walker.[3]

2.1.2   Definitions                 Sum     The answer to an addition problem is called a sum. In Chapter 12, you will calculate a sum of squares, a quantity that is obtained by adding together some squared numbers.                 Difference    The answer to a subtraction problem is called a difference. Much of what you will learn in statistics deals with differences and the extent to which they are significant. In Chapter 10, you will encounter a statistic called the standard error of a difference. Obviously, this statistic involves subtraction.                 Product     The answer to a multiplication problem is called a product. Chapter 7 is about the product-moment correlation coefficient, which requires multiplication. Multiplication problems are indicated either by an x or by parentheses. Thus, 6 x 4 and (6)(4) call for the same operation.                 Quotient    The answer to a division problem is called a quotient. The IQ or intelligence quotient is based on the division of two numbers. The two ways to indicate a division problem are ÷ and /. Thus, 9 ÷ 4 and 9/4 call for the same operation. It is a good idea to think of any common fraction as a division problem. The numerator is to be divided by the denominator.

2.1.3   Decimals                 Addition and Subtraction of Decimals    There is only one rule about the addition and subtraction of numbers that have decimals: keep the decimal points in a vertical line. The decimal point in the answer goes directly below those in the problem. This rule is illustrated in the five problems below.    Example #1                 Multiplication of Decimals    The basic rule for multiplying decimals is that the number of decimal places in the answer is found by adding up the number of decimal places in the two numbers that are being multiplied. To place the decimal point in the product, count from the right.    Example #2                 Division of Decimals    Two methods have been used to teach division of decimals. The older method required the student to move the decimal in the divisor (the number you are dividing by) enough places to the right to make the divisor a whole number. The decimal in the dividend was then moved to the right the same number of places, and division was carried out in the usual way. The new decimal places were identified with carets, and the decimal place in the quotient was just above the caret in the dividend. For example,    Example #3 a & b    The newer method of teaching the division of decimals is to multiply both the divisor and the dividend by the number that will make both of them whole numbers. (Actually, this is the way the caret method works also.) For example:    Example #4    Both of these methods work. Use the one you are more familiar with.

2.1.4   Fractions                 In general, there are two ways to deal with fractions:    Convert the fraction to a decimal and perform the operations on the decimals.    Work directly with the fractions, using a set of rules for each operation. The rule for addition and subtraction is: convert the fractions to ones with common denominators, add or subtract the numerators, and place the result over the common denominator. The rule for multiplication is: multiply the numerators together to get the numerator of the answer, and multiply the denominators together for the denominator of the answer. The rule for division is: invert the divisor and multiply the fractions.    For statistics problems, it is usually easier to convert the fractions to decimals and then work with the decimals. Therefore, this is the method that we will illustrate. However, if you are a whiz at working directly with fractions, by all means continue with your method. To convert a fraction to a decimal, divide the lower number into the upper one. Thus, 3/4 = .75, and 13/17 = .765.    Examples Fractions
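Both approaches to fractions can be tried out in a few lines. The sketch below uses Python's standard `fractions` module; the two conversions (3/4 and 13/17) come from the text, while the fractions used to illustrate the addition, multiplication, and division rules (1/2, 1/3, 2/3, 3/5) are our own made-up examples.

```python
from fractions import Fraction

# Converting a fraction to a decimal: divide the lower number into the upper.
three_quarters = Fraction(3, 4)
thirteen_seventeenths = Fraction(13, 17)
print(round(float(three_quarters), 3))         # 0.75
print(round(float(thirteen_seventeenths), 3))  # 0.765

# Working directly with fractions (illustrative values, not from the text).
total = Fraction(1, 2) + Fraction(1, 3)     # common denominator 6: 3/6 + 2/6
product = Fraction(2, 3) * Fraction(3, 5)   # numerators and denominators multiplied
quotient = Fraction(1, 2) / Fraction(3, 4)  # invert the divisor and multiply
print(total, product, quotient)             # 5/6 2/5 2/3
```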

2.1.5   Negative Numbers                 Addition of Negative Numbers    Any number without a sign is understood to be positive.    To add a series of negative numbers, add the numbers in the usual way, and attach a negative sign to the total.    Example #1    To add two numbers, one positive and one negative, ignore the signs, subtract the smaller number from the larger, and attach the sign of the larger to the result.    Example #2    To add a series of numbers, of which some are positive and some negative, add all the positive numbers together, add all the negative numbers together (see above), and then combine the two sums (see above).    Example #3                 Subtraction of Negative Numbers    To subtract a negative number, change it to positive and add it. Thus    Example #4                 Multiplication of Negative Numbers    When the two numbers to be multiplied are both negative, the product is positive.    (-3)(-3) = 9    (-6)(-8) = 48    When one of the numbers is negative and the other is positive, the product is negative.    (-8)(3) = -24    14 x (-2) = -28                 Division of Negative Numbers    The rule in division is the same as the rule in multiplication. If the two numbers are both negative, the quotient is positive.    (-10) ÷ (-2) = 5    (-4) ÷ (-20) = .20    If one number is negative and the other positive, the quotient is negative.    (-10) ÷ 2 = -5    6 ÷ (-18) = -.33    14 ÷ (-7) = -2    (-12) ÷ 3 = -4

2.1.6   Proportions and Percents                 A proportion is a part of a whole and can be expressed as a fraction or as a decimal. Usually, proportions are expressed as decimals. If eight students in a class of 44 received A's, we may express 8 as a proportion of the whole (44). Thus, 8/44, or .18. The proportion that received A's is .18.                 To convert a proportion to a percent (per one hundred), multiply by 100. Thus: .18 x 100 = 18; 18 percent of the students received A's. As you can see, proportions and percents are two ways to express the same idea.                 If you know a proportion (or percent) and the size of the original whole, you can find the number that the proportion represents. If .28 of the students were absent due to illness, and there are 50 students in all, then .28 of the 50 were absent. (.28)(50) = 14 students who were absent. Here are some more examples.                 Example Proportions and Percents
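The two worked cases above (8 A's in a class of 44, and .28 of 50 students absent) can be reproduced as a quick check. This is a minimal sketch; the variable names are our own.

```python
# Proportion and percent: 8 students out of 44 received A's.
a_students, class_size = 8, 44
proportion = a_students / class_size
percent = proportion * 100
print(round(proportion, 2))  # 0.18
print(round(percent))        # 18

# Going the other way: .28 of 50 students were absent.
absent = 0.28 * 50
print(round(absent))         # 14
```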

2.1.7   Absolute Value                 The absolute value of a number ignores the sign of the number. Thus, the absolute value of -6 is 6. This is expressed with symbols as |-6| = 6. It is expressed verbally as "the absolute value of negative six is six." In a similar way, the absolute value of 4 - 7 is 3. That is, |4 - 7| = |-3| = 3.

2.1.8   ± Problems                 A ± sign ("plus or minus" sign) means to both add and subtract. A ± problem always has two answers.                 Example Plus-Minus Problems

2.1.9   Exponents                 In the expression 5², 2 is the exponent. The 2 means that 5 is to be multiplied by itself. Thus, 5² = 5 x 5 = 25.                 In elementary statistics, the only exponent used is 2, but it will be used frequently. When a number has an exponent of 2, the number is said to be squared. The expression 4² (pronounced "four squared") means 4 x 4, and the product is 16. The squares of whole numbers between 1 and 1000 can be found in tables in the appendix of most statistics textbooks.                 Example Exponents

2.1.10            Complex Expressions             Two rules will suffice for the kinds of complex expressions encountered in statistics.                  Perform the operations within the parentheses first. If there are brackets in the expression, perform the operations within the parentheses and then the operations within the brackets.                  Perform the operations in the numerator and those in the denominator separately, and finally, carry out the division.                  Example 5               

2.1.11            Simple Algebra             To solve a simple algebra problem, isolate the unknown (x) on one side of the equal sign and combine the numbers on the other side. To do this, remember that you can multiply or divide both sides of the equation by the same number without affecting the value of the unknown. For example,             Example 6 a & b             In a similar way, the same number can be added to or subtracted from both sides of the equation without affecting the value of the unknown.             Example 7             We will combine some of these steps in the problems we will work for you. Be sure you see what operation is being performed on both sides in each step.             Example 8

2.2   Rules, Symbols, and Shortcuts

2.2.1   Rounding Numbers                 There are two parts to the rule for rounding a number. If the digit that is to be dropped is less than 5, simply drop it. If the digit to be dropped is 5 or greater, increase the number to the left of it by one. These are the rules built into most electronic calculators. These two rules are illustrated below.                 Example 9 a & b                 A reasonable question is "How many decimal places should an answer in statistics have?" A good rule of thumb in statistics is to carry all operations to three decimal places and then, for the final answer, round back to two decimal places.                 Sometimes this rule of thumb could get you into trouble, though. For example, if half way through some work you had a division problem of .0016 ÷ .0074, and if you dutifully rounded those four decimals to three (.002 ÷ .007), you would get an answer of .2857, which becomes .29. However, division without rounding gives you an answer of .2162, or .22. The difference between .22 and .29 may be quite substantial. We will often give you cues if more than two decimal places are necessary, but you will always need to be alert to the problems of rounding.
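The rounding trap described above can be reproduced directly. The sketch below uses Python's `decimal` module with `ROUND_HALF_UP`, which matches the "5 or greater rounds up" rule in the text; Python's built-in `round` follows a different tie-breaking rule (round half to even), so it is not used here. The helper name `round_half_up` is our own.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_half_up(value, places):
    """Round so that a dropped digit of 5 or greater rounds up."""
    quantum = Decimal(10) ** -places  # places=2 gives Decimal('0.01')
    return Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP)


# Rounding too early: both numbers are cut to three places before dividing.
early = round_half_up("0.0016", 3) / round_half_up("0.0074", 3)
early_answer = round_half_up(early, 2)

# Rounding only at the end.
late = Decimal("0.0016") / Decimal("0.0074")
late_answer = round_half_up(late, 2)

print(early_answer)  # 0.29
print(late_answer)   # 0.22
```

The gap between the two printed answers is the whole point: the rule of thumb about three decimal places is safe only when the numbers are not themselves tiny.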

2.2.2   Square Roots                 Statistics problems often require that a square root be found. Three possible solutions to this problem are    A calculator with a square-root key    The paper-and-pencil method    A table in the back of a statistics book                 Of the three, a calculator provides the quickest and simplest way to find a square root. If you have a calculator, you're set. The paper-and-pencil method is tedious and error prone, so we will not discuss it. We'll describe the use of tables, and we recommend that you use one if you don't have access to a calculator.                 Three-Digit Numbers    If you need the square root of a three-digit number (000 to 999), a table will give it to you directly. Simply look in the left-hand column for the number and read the square root in the third column, under √N. For example, the square root of 225 is 15.00, and √70 = 8.37. Square roots are usually carried (or rounded) to two decimal places.                 Numbers between 0 and 10    For numbers between 0 and 10 that have two decimal places (.01 to 9.99), the table will give you the square root. Find your number in the left-hand column by thinking of its decimal point as two places to the right. Find the square root in the √N column by moving the decimal point one place to the left. For example, √2.25 = 1.50. Be sure you understand how square roots such as 2.52, .66, and .28 are read from the table in this way.                 Numbers between 10 and 1000 That Have Decimals    For numbers between 10 and 1000 with decimals, interpolation is necessary. To interpolate a value for √22.5, find a value that is half way (.5 of the distance) between √22 and √23. Thus, the square root of 22.5 will be (approximately) half way between 4.69 and 4.80, which is 4.74. For a second example, we will find √84.35. √84 = 9.17, and √85 = 9.22. √84.35 will be .35 into the interval between √84 and √85. That interval is .05 (9.22 - 9.17). Thus, (.35)(.05) = .02, and √84.35 = 9.17 + .02 = 9.19.
Interpolation is also necessary with numbers between 100 and 1000 that have decimals; these can usually be estimated rather quickly because the difference between the square roots of the whole numbers is so small. Look at the difference between the square roots of two adjacent whole numbers in that range, for example.                 Numbers Larger Than 1000    For numbers larger than 1000, the square root can be estimated fairly closely by using the second column in Table A (N²). Find the large number under N², and read the square root from the N column. For example, √15129 = 123, and √1156 = 34. Most large numbers you encounter will not be found in the N² column, and you will just have to estimate the square root as closely as possible.
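The interpolation procedure can be sketched as a short function. This is an illustration under stated assumptions: the table entries are imitated by rounding `math.sqrt` to two decimals, the function name `interpolate_sqrt` is our own, and 84.35 is taken as the number in the second example because its neighbouring roots, 9.17 and 9.22, are those of 84 and 85.

```python
import math


def interpolate_sqrt(x):
    """Estimate sqrt(x) from the two-decimal roots of the whole numbers around it."""
    lower = math.floor(x)
    root_lo = round(math.sqrt(lower), 2)      # table entry for the lower whole number
    root_hi = round(math.sqrt(lower + 1), 2)  # table entry for the next whole number
    fraction = x - lower                      # how far into the interval x lies
    return root_lo + fraction * (root_hi - root_lo)


estimate = interpolate_sqrt(84.35)
exact = math.sqrt(84.35)
print(round(estimate, 2))  # 9.19, as in the text
print(round(exact, 2))     # 9.18
```

The interpolated value and the directly computed root differ by less than .01, which is why interpolation is good enough for hand calculation.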

2.2.3   Reciprocals                 This section is about a professional shortcut. This shortcut is efficient if multiplication is easier for you than division. If you prefer to divide rather than multiply, skip this section.                 A reciprocal of a number (N) is 1/N. Multiplying a number by 1/N is equivalent to dividing it by N. For example, 8 ÷ 2 = 8 x (1/2) = 8 x .5 = 4.0; 25 ÷ 5 = 25 x (1/5) = 25 x .20 = 5.0. These examples are easy, but we can also illustrate with more difficult problems: 541 ÷ 98 = 541 x (1/98) = 541 x .0102 = 5.52. So far, this should be clear, but there should be one nagging question. How did we know that 1/98 = .0102? The answer is the versatile Table A. Table A contains a column 1/N, and, by looking up 98, you will find that 1/98 = .0102.                 If you must do many division problems on paper, we recommend reciprocals to you. If you have access to an electronic calculator, on the other hand, you won't need the reciprocals in Table A.
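The 541 divided by 98 example can be checked in a few lines. In this sketch the table lookup is imitated by rounding 1/98 to four decimal places, which is an assumption about how the 1/N column was prepared; the variable names are our own.

```python
# Imitate the 1/N column of the table: the reciprocal rounded to four places.
reciprocal_98 = round(1 / 98, 4)
print(reciprocal_98)                  # 0.0102

# Multiplying by the reciprocal versus dividing directly.
by_reciprocal = round(541 * reciprocal_98, 2)
by_division = round(541 / 98, 2)
print(by_reciprocal, by_division)     # 5.52 5.52
```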

2.2.4   Estimating Answers                 Just looking at a problem and making an estimate of the answer before you do any calculating is a very good idea. This is referred to as eyeballing the data, and Edward Minium (1978) has captured its importance with Minium's First Law of Statistics: "The eyeball is the statistician's most powerful instrument."                 Estimating answers should keep you from making gross errors, such as misplacing a decimal point. For example, 31.5/5 can be estimated as a little more than 6. If you make this estimate before you divide, you are likely to recognize that an answer of 63 or .63 is incorrect.                 The estimated answer to the problem (21)(108) is 2000, since (20)(100) = 2000.                 The problem (.47)(.20) suggests an estimated answer of .10, since (1/2)(.20) = .10. With .10 in mind, you are not likely to write .94 for the answer, which is .094. Estimating answers is also important if you are finding a square root. You can estimate that the square root of a number near 100 is about 10, since √100 = 10, and that the square root of a number near 1 is about 1.                 To calculate a mean, eyeball the numbers and estimate the mean. If you estimate a mean of 30 for a group of numbers that are primarily in the 20s, 30s, and 40s, a calculated mean of 60 should arouse your suspicion that you have made an error.

2.2.5   Statistical Symbols                 Although, as far as we know, there has never been a clinical case of neoiconophobia (an extreme and unreasonable fear of new symbols), some students show a mild form of this behavior. New symbols may cause a grimace, a frown, or a droopy eyelid. In more severe cases, the behavior involves avoiding a statistics course entirely. We're rather sure that you don't have such a severe case, since you have read this far. Even so, if you are a typical beginning student in statistics, the symbols of statistics are not very meaningful to you, and they may even elicit feelings of uneasiness. We also know from our teaching experience that, by the end of the course, you will know what these symbols mean and be able to approach them with an unruffled psyche, and perhaps even approach them joyously. This section should help you over that initial, mild neoiconophobia, if you suffer from it at all.                 Below are definitions and pronunciations of the symbols used in the next two chapters. Additional symbols will be defined as they occur. Study this list until you know it.                 Symbols                 Pay careful attention to symbols. They serve as shorthand notations for the ideas and concepts you are learning. So, each time a new symbol is introduced, concentrate on it, learn it, and memorize its definition and pronunciation. The more meaning a symbol has for you, the better you understand the concepts it represents and, of course, the easier the course will be.                 Sometimes we will need to distinguish between two different instances of the same symbol, such as two X's. We will use subscripts, and the results will look like X₁ and X₂. Later, we will use subscripts other than numbers to identify a symbol. The point to learn here is that subscripts are for identification purposes only; they never indicate multiplication. X₁ does not mean (X)(1).
Two additional comments, one to encourage and one to caution you. We encourage you to do more in this course than just read the text, work the problems, and pass the tests, however exciting that may be. We encourage you to occasionally get beyond this elementary text and read journal articles or short portions of other statistics textbooks. We will indicate our recommendations with footnotes at appropriate places. The word of caution that goes with this encouragement is that reading statistics texts is like reading a Russian novel: the same characters have different names in different places. For example, the mean of a sample in some texts is symbolized M rather than X̄, and, in some texts, S.D. and other symbols are used for the standard deviation. If you expect such differences, it will be less difficult for you to make the necessary translations.

3      Central Values and the Organization of Data

3.1   Summary

3.1.1   A measure of central tendency is a typical or representative score for a sample or a population.

3.1.2   Mode (Mo)                 The most frequently occurring score in the distribution.                 Extreme scores in the distribution do not affect the mode.

3.1.3   Median (Md)                 This score cuts the distribution of scores in half. That is, half the scores in the distribution fall above the middle score and half fall below the middle score.  The steps involved in computing the median are    Rank the scores from lowest to highest.    In the case of an odd number of scores, pick the middle score, which divides the scores so that an equal number of scores are above it and below it. Example: 2 5 7 8 12 14 18; the median of this distribution is 8.    In the case of an even number of scores, pick the two middle scores, which divide the scores so that an equal number of scores are above them and below them. Then add the two middle scores and divide the sum by two. Example: 2 5 7 8 12 14 18 20; the two middle scores are 8 and 12, so the median of this distribution is (8 + 12)/2 = 20/2 = 10.                 Extreme scores in the distribution do not affect the median.
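The steps for the median translate almost line for line into code. The sketch below is a minimal illustration using the two sets of scores from the examples; the function name `median_of` is our own, and the last call, which replaces 20 with an extreme score of 200, is an added illustration of the final point about extreme scores.

```python
def median_of(scores):
    """Median by the steps above: rank the scores, then take the middle."""
    ordered = sorted(scores)  # rank from lowest to highest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]   # odd count: the single middle score
    # even count: add the two middle scores and divide the sum by two
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_of([2, 5, 7, 8, 12, 14, 18]))       # 8
print(median_of([2, 5, 7, 8, 12, 14, 18, 20]))   # 10.0
print(median_of([2, 5, 7, 8, 12, 14, 18, 200]))  # 10.0, unchanged by the extreme score
```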

3.1.4   Mean (Average)                 The mean is the sum of the scores divided by the number of scores.    Formula: X̄ = ΣX / N    In the above formula, ΣX = the sum of the scores and N = the number of scores.                 Extreme scores in the distribution will affect the mean.                 The term average is often used to describe the mean and is usually accurate. Sometimes, however, the word average is used to describe other measures of central tendency, such as the mode and the median.
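A short sketch shows the mean as the sum of the scores divided by their number, and how an extreme score pulls it. The scores are borrowed from the median example in 3.1.3; the version with 180 in place of 18 is invented here to illustrate the effect of an extreme score, and the function name `mean_of` is our own.

```python
import statistics


def mean_of(scores):
    """Mean: the sum of the scores divided by the number of scores."""
    return sum(scores) / len(scores)


scores = [2, 5, 7, 8, 12, 14, 18]
with_extreme = [2, 5, 7, 8, 12, 14, 180]  # hypothetical extreme score

print(round(mean_of(scores), 2))        # 9.43
print(round(mean_of(with_extreme), 2))  # 32.57, pulled up by the extreme score
print(statistics.median(scores), statistics.median(with_extreme))  # 8 8
```

The mean more than triples while the median stays at 8, which is the contrast drawn between 3.1.3 and 3.1.4.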

3.2   Introduction

3.2.1   Now that the preliminaries are out of the way, you are ready to start on the basics of descriptive statistics. The starting point is an unorganized group of scores or measures, all obtained from the same test or procedure. In an experiment, the scores are measurements on the dependent variable. Measures of central value (often called measures of central tendency) give you one score or measure that represents, or is typical of, the entire group. You will recall that in Chapter 1 we discussed the mean (arithmetic average). This is one of the three central value statistics. Recall from Chapter 1 that for every statistic there is also a parameter. Statistics are characteristics of samples, and parameters are characteristics of populations. Fortunately, in the case of the mean, the calculation of the parameter is identical to the calculation of the statistic. This is not true for the standard deviation (Chapter 4). Throughout this book, we will refer to the sample mean (a statistic) with the symbol X̄, pronounced "ex-bar," and to the population mean (a parameter) with the symbol μ, pronounced "mew."

3.2.2   However, a mean based on a population is interpreted differently from a mean based on a sample. For a population, there is only one mean, $\mu$. Any sample, however, is only one of many possible samples, and $\bar{X}$ will vary from sample to sample. A population mean is obviously better than a sample mean, but often it is impossible to measure the entire population.  Most of the time, then, we must resort to a sample and use $\bar{X}$ as an estimate of $\mu$.

3.2.3   In this chapter you will learn to                 Organize data gathered on a dependent measure,                 Calculate central values from the organized data and determine whether they are statistics or parameters, and                 Present the data graphically.

3.3    Finding the mean of Unorganized Data

3.3.1   Table 3.1 presents the scores of 100 fourth-grade students on an arithmetic achievement test. These scores were taken from an alphabetical list of the students' names; therefore, the scores themselves are in no meaningful order. You probably already know how to compute the mean of this set of scores. To find the mean, add the scores and divide that sum by the number of scores.

3.3.2   Formula Mean                $\bar{X} = \frac{\sum X}{N}$ (or $\mu = \frac{\sum X}{N}$ when the scores are a population)

3.3.3   Table 3.1                

3.3.4   If these 100 scores are a population, then 39.43 would be the population mean, $\mu$, but if the 100 scores are a sample from some larger population, 39.43 would be the sample mean, $\bar{X}$.

3.3.5   This mean provides a valuable bit of information. Since a score of 40 on this test is considered average (according to the test manual that accompanies it), this group of youngsters, whose mean score is 39.43, is about average in arithmetic achievement.

3.4   Arranging Scores in Descending Order and Finding the Median

3.4.1   Look again at Table 3.1. If you knew that a score of 40 were considered average, could you tell just by looking that this group is about average? Probably not. Often, in research, so many measurements are made on so many subjects that just looking at all those numbers is a mind-boggling experience. Although you can do many computations using unorganized data, it is often very helpful to organize the numbers in some way. Meaningful organization will permit you to get some general impressions about characteristics of the scores by simply "eyeballing" the data (looking at it carefully). In addition, organization is almost a necessity for finding a second central value: the median.

3.4.2   One way of making some order out of the chaos in Table 3.1 is to rearrange the numbers into a list, from highest to lowest. Table 3.2 presents this rearrangement of the arithmetic achievement scores. (It is usual in statistical tables to put the high numbers at the top and the low numbers at the bottom.) Compare the unorganized data of Table 3.1 with the rearranged data of Table 3.2. The ordering from high to low permits you to quickly gain some insights that would have been very difficult to glean from the unorganized data. For example, by simply looking at the center of the table, you get an idea of what the central value is. The highest and lowest scores are readily apparent, and you get the impression that there are large differences in the achievement levels of these children. You can see that some scores (such as 44) were achieved by several people and that some (such as 33) were not achieved by anyone. All this information is gleaned simply by quickly eyeballing the rearranged data.

3.4.3   Table 3.2                

3.4.4   Error Detection                 Eyeballing data is a valuable means of avoiding large errors. If the answers you calculate differ from what you expect on the basis of eyeballing, wisdom dictates that you try to reconcile the difference. You have either overlooked something when eyeballing or made a mistake in your computations.

3.4.5   This simple rearrangement of data also permits you to find easily another central value statistic, which can be found only with extreme difficulty from Table 3.1. This statistic is called the median. The median is defined as the point on the scale of scores above which half the scores fall and below which half the scores fall. (Note that the median, like the mean, is a point and not necessarily an actual score.) That is, half of the scores are larger than the median, and half are smaller. Like the mean, the sample median is calculated exactly the same as the population median. Only the interpretations differ.

3.4.6   In Table 3.2 there are 100 scores; therefore, the median will be a point above which there are 50 scores and below which there are 50 scores. This point is somewhere among the scores of 39. Remember from Chapter 1 that any number actually stands for a range of numbers that has a lower and upper limit. This number, 39, has a lower limit of 38.5 and an upper limit of 39.5. To find the exact median somewhere within the range of 38.5-39.5, you use a procedure called interpolation. We will give you the procedure and the reasoning that goes with it at the same time. Study it until you understand it. It will come up again.

3.4.7   There are 42 scores below 39. You will need eight more (50 - 42 = 8) scores to reach the median. Since there are ten scores of 39, you need 8/10 of them to reach the median. Assume that those ten scores of 39 are distributed evenly throughout the interval of 38.5 to 39.5 and that, therefore, the median is 8/10 of the way through the interval. Adding .8 to the lower limit of the interval, 38.5, gives you 39.3, which is the median for these scores.
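
The interpolation reasoning can be sketched in Python. This is an illustrative sketch under stated assumptions, not the text's own procedure verbatim: the function name interpolated_median is made up, scores are assumed to be whole numbers (so a score X spans X - 0.5 to X + 0.5), and the 100-score list is a hypothetical stand-in that only reproduces the counts quoted above (42 scores below 39, ten scores of 39, 48 above), since Table 3.2 is not reproduced here.

```python
from collections import Counter


def interpolated_median(scores):
    """Median by interpolation within the limits of the middle score."""
    if not scores:
        raise ValueError("need at least one score")
    freq = Counter(scores)
    values = sorted(freq)
    half = len(scores) / 2
    below = 0                                  # scores counted so far
    for idx, x in enumerate(values):
        f = freq[x]
        if below + f == half:
            # exactly half the scores lie at or below x:
            # the median is halfway between x and the next score
            return (x + values[idx + 1]) / 2
        if below + f > half:
            # go (half - below) / f of the way into the interval for x
            return (x - 0.5) + (half - below) / f
        below += f


# Hypothetical stand-in for Table 3.2 (only the counts match the text).
stand_in = [30] * 42 + [39] * 10 + [45] * 48
print(round(interpolated_median(stand_in), 2))   # 39.3
```

The same function returns 4 for the five scores 2, 3, 4, 12, 15 and 5 for the four scores 2, 3, 7, 11, which agrees with the small-N shortcuts described in 3.4.8 and 3.4.9.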

3.4.8   There are occasions when you will need the median of a small number of scores. In such cases, the method we have just given you will work, but it usually is not necessary to go through that whole procedure. For example, if N is an odd number and the middle score has a frequency of 1, then it is the median. In the five scores 2, 3, 4, 12, 15, the median is 4. If there had been more than one 4, interpolation would have to be used.

3.4.9   When N is an even number, as in the six scores 2, 3, 4, 5, 12, 15, the point dividing the scores into two equal halves will lie halfway between 4 and 5. The median, then, is 4.5. If there had been more than one 4 or 5, interpolation would have to be used. Sometimes the distance between the two middle numbers will be larger, as in the scores 2, 3, 7, 11. The same principle holds: the median is halfway between 3 and 7. One way of finding that point is to take the mean of the two numbers: (3 + 7) / 2 = 5, which is the median.

3.4.10            There is no accepted symbol to differentiate the median of a population from the median of a sample. When we need to make this distinction, we do it with words.

3.5   The Simple Frequency Distribution

3.5.1   A more common (and often more useful) method of organizing data is to construct a simple frequency distribution. Table 3.3 is a simple frequency distribution for the arithmetic achievement data in Table 3.1.

3.5.2   The most efficient way to reduce unorganized data like Table 3.1 into a simple frequency distribution like Table 3.3 is to follow these steps:                 Find the highest and lowest scores. In Table 3.1, the highest score is 65 and the lowest score is 23.                 In column form, write down in descending order all possible scores between the highest score (65) and the lowest score (23). Head this column with the letter X.                 Start with the number in the upper-left-hand corner of the unorganized scores (a score of 40 in Table 3.1), draw a line through it, and place a tally mark beside 40 in your frequency distribution.                 Continue this process through all the scores.                 Count the number of tallies by each score and place that number beside the tallies in the column headed ƒ. Add up the numbers in the ƒ column to be sure they equal N. You have now constructed a simple frequency distribution.                 Often, when simple frequency distributions are presented formally, the tally marks and all scores with a frequency of zero are deleted.
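
The tallying steps above can be sketched with Python's collections.Counter. The function name simple_frequency_distribution, the drop_zeros flag, and the six sample scores are assumptions made up for illustration; whole-number scores are assumed.

```python
from collections import Counter


def simple_frequency_distribution(scores, drop_zeros=False):
    """Return (X, f) rows with the highest score first."""
    counts = Counter(scores)                       # the tally step
    # every possible score from the highest down to the lowest
    rows = [(x, counts[x]) for x in range(max(scores), min(scores) - 1, -1)]
    if drop_zeros:                                 # formal presentation
        rows = [(x, f) for x, f in rows if f > 0]
    # check: the f column must add up to N
    assert sum(f for _, f in rows) == len(scores)
    return rows


sample = [40, 38, 40, 42, 38, 40]                  # made-up scores
for x, f in simple_frequency_distribution(sample):
    print(x, f)
```

Passing drop_zeros=True removes the rows for 41 and 39, which mirrors the formal presentation described in the last step.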

3.5.3   Don't worry about the ƒX column in Table 3.3 yet. It is not part of a simple frequency distribution, and we will discuss it in the next section.

3.5.4   Table 3.3                

3.6   Finding Central Values of a Simple Frequency Distribution

3.6.1   Mean                 Computation of the mean from a simple frequency distribution is illustrated in Table 3.3. Remember that the numbers in the ƒ column represent the number of people making each of the scores. To get N, you must add the numbers in the ƒ column because that's where the people are represented. If you are a devotee of shortcut arithmetic, you may already have discovered or may already know the basic idea behind the procedure: multiplication is shortcut addition. In Table 3.3, the column headed ƒX means what it says algebraically: multiply ƒ (the number of people making a score) times X (the score they made) for each of the scores. The reason this is done is that everyone who made a particular score must be taken into account in the computation of the mean. Since only one person made a score of 65, multiply 1 × 65, and put a 65 in the ƒX column. No one made a score of 64, and 0 × 64 = 0; put a zero in the ƒX column. Since four people had scores of 55, multiply 4 × 55 to get 220. After ƒX is computed for all scores, obtain $\sum fX$ by adding up the ƒX column. Notice that $\sum fX$ in the simple frequency distribution is exactly the same as $\sum X$ in Table 3.1. To compute the mean from a simple frequency distribution, use the formula                 Mean Frequency distribution   $\bar{X} = \frac{\sum fX}{N}$
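
A small Python sketch of this computation follows. The function name mean_from_frequencies and the rows are hypothetical (Table 3.3 is not reproduced here); the point is only that summing ƒ times X over the rows gives the same total as adding the raw scores.

```python
def mean_from_frequencies(rows):
    """rows: (X, f) pairs from a simple frequency distribution."""
    n = sum(f for _, f in rows)              # N is the sum of the f column
    sum_fx = sum(f * x for x, f in rows)     # sum of the fX column
    return sum_fx / n


# Made-up distribution and the raw scores it summarizes.
rows = [(42, 1), (41, 0), (40, 3), (39, 0), (38, 2)]
raw = [40, 38, 40, 42, 38, 40]

print(mean_from_frequencies(rows))
print(sum(raw) / len(raw))
```

Both lines print the same value, because 1 × 42 + 3 × 40 + 2 × 38 equals the sum of the six raw scores.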

3.6.2   Median                 The procedure for finding the median of scores arranged in a simple frequency distribution is the same as that for scores arranged in descending order, except that you must now use the frequency column to find the number of people making each score.                 The median is still the point with half the scores above and half below it, and is the same point whether you start from the bottom of the distribution or from the top. If you start from the top of Table 3.3, you find that 48 people have scores of 40 or above. Two more are needed to get to 50, the halfway point in the distribution. There are ten scores of 39, and you need two of them. Thus 2/10 should be subtracted from 39.5 (the lower limit of the score of 40, which is also the upper limit of the score of 39); 39.5 - .2 = 39.3.                 Error Detection    Calculating the median by starting from the top of the distribution will produce the same answer as calculating it by starting from the bottom.
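
The error-detection check (top and bottom give the same median) can be sketched as follows. The function name median_by_interpolation and its from_top flag are assumptions, and the three rows are a hypothetical condensation that keeps only the counts quoted in the text (42 below 39, ten scores of 39, 48 at 40 or above). The sketch assumes whole-number scores with no gap in the scores at the median.

```python
def median_by_interpolation(rows, from_top=False):
    """rows: (X, f) pairs; count in from the bottom or from the top."""
    half = sum(f for _, f in rows) / 2
    counted = 0                                   # scores passed so far
    for x, f in sorted(rows, reverse=from_top):
        if f > 0 and counted + f >= half:
            fraction = (half - counted) / f       # share of this score's interval needed
            if from_top:
                return (x + 0.5) - fraction       # move down from the upper limit
            return (x - 0.5) + fraction           # move up from the lower limit
        counted += f
    raise ValueError("no scores")


rows = [(45, 48), (39, 10), (30, 42)]             # hypothetical stand-in rows
print(round(median_by_interpolation(rows), 2))                 # 39.3
print(round(median_by_interpolation(rows, from_top=True), 2))  # 39.3
```

From the bottom the function adds 8/10 to 38.5; from the top it subtracts 2/10 from 39.5. Both routes land on 39.3.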

3.6.3   Mode                 You may also find the third central-value statistic from the simple frequency distribution. This statistic is called the mode. The mode is the score made by the greatest number of people, that is, the score with the greatest frequency.                 A distribution may have more than one mode. A bimodal distribution is one with two high-frequency scores separated by one or more low-frequency scores. However, although a distribution may have more than one mode, it can have only one mean and one median.                 A sample mode and a population mode are determined in the same way.                 In Table 3.3, more people had a score of 39 than any other score, so 39 is the mode. You will note, however, that it was close. Ten people scored 39, but nine scored 34 and eight scored 41. A few lucky guesses by children taking the achievement test could have caused significant changes in the mode. This instability of the mode limits its usefulness.
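
The instability of the mode can be illustrated with a short Python sketch. The function name modes is an assumption, and the score lists are hypothetical fragments that use only the three frequencies quoted above (ten 39s, nine 34s, eight 41s). Note that the function simply returns every score tied for the greatest frequency; deciding whether a distribution is bimodal in the sense described above still takes judgment.

```python
from collections import Counter


def modes(scores):
    """Return every score that has the greatest frequency."""
    counts = Counter(scores)
    top = max(counts.values())
    return sorted(x for x, f in counts.items() if f == top)


fragment = [39] * 10 + [34] * 9 + [41] * 8
print(modes(fragment))            # [39]

# Hypothetical: one child who scored 39 scores 34 instead.
shifted = [39] * 9 + [34] * 10 + [41] * 8
print(modes(shifted))             # [34]
```

Changing a single score moves the mode from 39 to 34, a shift of five points.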

3.7   The Grouped Frequency Distribution

3.7.1   There is a way of condensing the data of Table 3.1 even further. The result of such a condensation is called a grouped frequency distribution, and Table 3.4 is an example of such a distribution, again using the arithmetic achievement-test scores.[5]

3.7.2   A formal grouped frequency distribution does not include the tally marks or the X and ƒX columns.

3.7.3   The grouping of data began as a way of simplifying computations in the days before the invention of marvellous computational aids such as computers and calculators. Today, most researchers group their data only when they want to construct a graph or when N's are very large. These two occasions happen often enough to make it important for you to learn about grouping.

3.7.4   In the grouped frequency distribution, X values are grouped into ranges called class intervals. In Table 3.4, the entire range of scores, from 65 to 23, has been reduced to 15 class intervals. Each interval covers three scores, and the size of the interval (the number of scores covered) is indicated by i. For Table 3.4, i = 3. The midpoint of each interval represents all scores in that interval. For example, there were nine children who had scores of 33, 34, or 35. The midpoint of the class interval 33-35 is 34. All nine children are represented by 34. Obviously, this procedure may introduce some inaccuracy into computations; however, the amount of error introduced is usually very slight. For example, the mean computed from Table 3.4 is 39.40. The mean computed from ungrouped data is 39.43.

3.7.5   Class intervals have upper and lower limits, much like simple scores obtained by measuring a quantitative variable. A class interval of 33-35 has a lower limit of 32.5 and an upper limit of 35.5. Similarly, a class interval of 40-49 has a lower limit of 39.5 and an upper limit of 49.5.

3.7.6   Table 3.4                

3.7.7   Establishing Class Intervals                 There are three conventions that are usually followed in establishing class intervals. We call them conventions because they are customs rather than hard-and-fast rules. There are two justifications for these conventions. First, they allow you to get maximum information from your data with minimum effort. Second, they provide some standardization of procedures, which aids in communication among scientists. These conventions are                 Data should be grouped into not fewer than 10 and not more than 20 class intervals.    The primary purpose of grouping data is to provide a clearer picture of trends in the data and to make computations easier. For example, Table 3.4 shows that there are many frequencies near the center of the distribution, with fewer and fewer as the upper and lower ends of the distribution are approached. If the data are grouped into fewer than 10 intervals, such trends are not as apparent. In Table 3.5, the same scores are grouped into only five class intervals. The concentration of frequencies in the center of the distribution is not nearly so apparent.    Another reason for using at least 10 class intervals is that, as you reduce the number of class intervals, the errors caused by grouping increase. With fewer than 10 class intervals, the errors may no longer be minor. For example, the mean computed from Table 3.4 was 39.40, only .03 points away from the exact mean of 39.43 computed from ungrouped data. The mean computed from Table 3.5, however, is 39.00, an error of .43 points.    On the other hand, the use of more than 20 class intervals may tend to exaggerate fluctuations in the data that are really due to chance occurrences. You also sacrifice much of the ease of computation, with little gain in control over errors. So, the convention is: use 10 to 20 class intervals.                 The size of the class intervals (i) should be an odd number or 10 or a multiple of 10. 
(Some writers include i = 2 as acceptable. Some also object to the use of i = 7 or 9. In actual practice, the most frequently seen i's are 3, 5, 10, and multiples of 10.)    The reason for this is simply computational ease. The midpoint of the interval is used as representative of all scores in the interval; and if i is an odd number, the midpoint will be a whole number. If i is an even number, the midpoint will be a decimal number. In the interval 12-14 (i = 3), the midpoint is the whole number 13. In an interval 12 to 15 (i = 4), the midpoint is the decimal number 13.5. However, if the range of scores is so great that you cannot include all of them in 20 groups with i = 9 or less, it is conventional to place 10 scores or a multiple of 10 in each class interval.                 Begin each class interval with a multiple of i.    For example, if the lowest score is 44 and i = 5, the first class interval should be 40-44 because 40 is a multiple of 5. This convention is violated fairly often. However, the practice is followed more often than not. A violation that seems to be justified occurs when i = 5. When the interval size is 5, it may be more convenient to begin the interval such that multiples of 5 will fall at the midpoint, since multiples of 5 are easier to manipulate. For example, an interval 23-27 has 25 as its midpoint, while an interval 25-29 has 27 as its midpoint. Multiplying by 25 is easier than multiplying by 27.    In addition to these three conventions, remember that the highest scores go at the top and the lowest scores at the bottom.

3.7.8   Converting Unorganized Data into a Grouped Frequency Distribution                 Now that you know the conventions for establishing class intervals, we will go through the steps for converting a mass of data like that in Table 3.1 into a grouped frequency distribution like Table 3.4:                 Find the highest and lowest scores. In Table 3.1, the highest score is 65, and the lowest score is 23.                 Find the range of scores by subtracting the lowest score from the highest and adding 1: 65 - 23 + 1 = 43. The 1 is added so that the upper limit of the highest score and the lower limit of the lowest score will be included.                 Determine i by a trial-and-error procedure. Remember that there are to be 10 to 20 class intervals and that the interval size should be odd, 10, or a multiple of 10. Dividing the range by a potential i value tells the number of class intervals that will result. For example, dividing the range of 43 by 5 provides a quotient of 8.60. Thus, i = 5 produces 8.6 or 9 class intervals. That does not satisfy the rule calling for at least 10 intervals, but it is close and might be acceptable. In most such cases, however, it is better to use a smaller i and get a larger number of intervals. Dividing the range by 3 (43/3) gives you 14.33 or 15 class intervals. It sometimes happens that this process results in an extra class interval. This occurs when the lowest score is such that extra scores must be added to the bottom of the distribution to start the interval with a multiple of i. For the data in Table 3.1, the most appropriate interval size is 3, resulting in 15 class intervals.                 Begin the bottom interval with the lowest score if it is a multiple of i. If the lowest score is not a multiple of i, begin the interval with the next lower number that is a multiple of i. In the data of Table 3.1, the lowest score, 23, is not a multiple of i. Begin the interval with 21, since it is a multiple of 3. 
The lowest class interval, then, is 21-23. From there on, it's easy. Simply begin the next interval with the next number and end it such that it includes three numbers (24-26). Look at the class intervals in Table 3.4. Notice that each interval begins with a number evenly divisible by 3.                           Table 3-5                 The rest of the process is the same as for a simple frequency distribution. For each score in the unorganized data, put a tally mark beside its class interval and cross out the score. Count the tally marks and put the number into the frequency column. Add the frequency column to be sure that $\sum f = N$.                 Clue to the Future    The distributions that you have been constructing are empirical distributions based on scores actually gathered in experiments. This chapter and the next two are about these empirical frequency distributions. Starting with Chapter 8, and throughout the rest of the book, you will also make use of theoretical distributions, that is, distributions based on mathematical formulas and logic rather than on actual observations.
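
The conversion steps can be sketched in Python. This is a minimal sketch under stated assumptions: the function names class_intervals and grouped_frequency_distribution are made up, scores are assumed to be non-negative whole numbers, i is chosen by the reader (the trial-and-error step is not automated), and the "midpoint on a multiple of 5" exception is not handled.

```python
from collections import Counter


def class_intervals(lowest, highest, i):
    """(low, high) intervals of size i, highest first, each starting on a multiple of i."""
    low = (lowest // i) * i              # lowest itself, or the next lower multiple of i
    intervals = []
    while low <= highest:
        intervals.append((low, low + i - 1))
        low += i
    return intervals[::-1]               # highest scores go at the top


def grouped_frequency_distribution(scores, i):
    """Return (low, high, f) rows for the given interval size."""
    counts = Counter((x // i) * i for x in scores)   # tally by interval start
    rows = [(low, high, counts[low])
            for low, high in class_intervals(min(scores), max(scores), i)]
    assert sum(f for _, _, f in rows) == len(scores)  # check: sum of f equals N
    return rows


score_range = 65 - 23 + 1                # 43, as in the text
print(score_range / 5, score_range / 3)  # candidate numbers of intervals
intervals = class_intervals(23, 65, 3)
print(len(intervals), intervals[-1], intervals[0])   # 15 (21, 23) (63, 65)
```

With the highest and lowest scores from Table 3.1 and i = 3, the sketch yields 15 intervals running from 21-23 up to 63-65, matching the description above.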

3.8   Finding Central Values of a Grouped Frequency Distribution

3.8.1   Mean                 The procedure for finding the mean of a grouped frequency distribution is similar to that for the simple frequency distribution. In the grouped distribution, however, the midpoint of each interval represents all the scores in the interval. Look again at Table 3.4. Notice the column headed with the letter X. The numbers in that column are the midpoints of the intervals. Assume that the scores in the interval are evenly distributed throughout the interval. Thus, X is the mean for all scores within the interval. After the X column is filled, multiply each X by its ƒ value in order to include all frequencies in that interval. Place the product in the ƒX column. Summing the ƒX column provides $\sum fX$, which, when divided by N, yields the mean. In terms of a formula,                 Formula   $\bar{X} = \frac{\sum fX}{N}$
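
A brief Python sketch of the grouped mean follows. The function name grouped_mean is an assumption, and the four rows are only a fragment built from frequencies quoted elsewhere in this chapter (23 scores in 39-41 and nine each at midpoints 37, 34, and 31), so the result is not the mean of the full Table 3.4.

```python
def grouped_mean(rows):
    """rows: (low, high, f) class intervals; the midpoint stands for every score."""
    n = sum(f for _, _, f in rows)
    sum_fx = sum(f * (low + high) / 2 for low, high, f in rows)   # f times midpoint X
    return sum_fx / n


fragment = [(39, 41, 23), (36, 38, 9), (33, 35, 9), (30, 32, 9)]
print(grouped_mean(fragment))     # mean of this fragment only
```

For this fragment the ƒX column sums to 920 + 333 + 306 + 279 = 1838, and N is 50.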

3.8.2   Median                 Finding the median of a grouped distribution requires interpolation within the interval containing the median. We will use the data in Table 3.4 to illustrate the procedure. Remember that the median is the point in the distribution that has half the frequencies above it and half the frequencies below it. Since N = 100, the median will have 50 frequencies above it and 50 below it. Adding frequencies from the bottom of the distribution, you find that there are 42 who scored below the interval 39-41. You need 8 more frequencies (50 - 42 = 8) to find the median. Since 23 people scored in the interval 39-41, you need 8 of these 23 frequencies, or 8/23. Again, you assume that the 23 people in the interval are evenly distributed through the interval. Thus, you need the same proportion of score points in the interval as you have frequencies; that is, 8/23, or .35, of the 3 score points in the interval. Since .35 × 3 = 1.05, you must go 1.05 score points into the interval to reach the median. Since the lower limit of the interval is 38.5, add 1.05 to find the median, which is 39.55. Figure 3.1 illustrates this procedure.                 In summary, the steps for finding the median in a grouped frequency distribution are as follows.                 1. Divide N by 2.                 2. Starting at the bottom of the distribution, add the frequencies until you find the interval containing the median.                 3. Subtract from N/2 the total frequencies of all intervals below the interval containing the median.                 4. Divide the difference found in step 3 by the number of frequencies in the interval containing the median.                 5. Multiply the proportion found in step 4 by i.                 6. Add the product found in step 5 to the lower limit of the interval containing the median.      That sum is the median.                 Figure 3.1   
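
The six summary steps reduce to one line of arithmetic, sketched here in Python. The function name grouped_median and its parameter names are assumptions; the numbers are the ones quoted above for Table 3.4.

```python
def grouped_median(n, below, f, lower_limit, i):
    """Lower limit of the median interval plus the interpolated distance into it.

    n: total frequencies; below: frequencies under the median interval;
    f: frequencies in the median interval; i: interval size.
    """
    proportion = (n / 2 - below) / f      # steps 1, 3, and 4
    return lower_limit + proportion * i   # steps 5 and 6


m = grouped_median(n=100, below=42, f=23, lower_limit=38.5, i=3)
print(round(m, 2))                        # 39.54
```

Carrying 8/23 at full precision gives about 39.54; the worked example above rounds 8/23 to .35 before multiplying, which is why it arrives at 39.55.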

3.8.3   Mode                 The third central value, the mode, is the midpoint of the interval having the greatest number of frequencies. In Table 3.4, the interval 39-41 has the greatest number of frequencies: 23. The midpoint of that interval, 40, is the mode.

3.9   Graphic Presentation of Data

3.9.1   In order to better communicate your findings to colleagues (and to understand them better yourself), you will often find it useful to present the results in the form of a graph. It has been said, with considerable truth, that one picture is worth a thousand words; and a graph is a type of picture. Almost any data can be presented graphically. The major purpose of a graph is to get a clear, overall picture of the data.

3.9.2   Graphs are composed of a horizontal axis (variously called the baseline, X axis or abscissa) and a vertical axis called the Y-axis or ordinate. We will take what seems to be the simplest course and use the terms X and Y.

3.9.3   We will describe two kinds of graphs. The first kind is used to present frequency distributions like those you have been constructing. Frequency polygons, histograms, and bar graphs are examples of this first kind of graph. The second kind we will describe is the line graph, which is used to present the relationship between two different variables.

3.9.4   Illustration XY Axis                

3.9.5   Presenting Frequency Distributions                 Whether you use a frequency polygon, a histogram, or a bar graph to present a frequency distribution depends on the kind of variable you have measured. A frequency polygon or histogram is used for quantitative data, and the bar graph is used for qualitative data. It is not wrong to use a bar graph for quantitative data, but most researchers follow the rule given above. Qualitative data, however, should not be presented with a frequency polygon or a histogram. The arithmetic achievement scores (Table 3.1) are an example of quantitative data.                 Frequency Polygon    Figure 3.2 shows a frequency polygon based on the frequency distribution in Table 3.4. We will use it to demonstrate the characteristics of all frequency polygons. On the X-axis we placed the midpoints of the class intervals. Notice that the midpoints are spaced at equal intervals, with the smallest midpoint at the left and the largest midpoint at the right. The Y-axis is labeled "Frequencies" and is also marked off into equal intervals.    Graphs are designed to "look right." They look right if the height of the figure is 60 percent to 75 percent of its length. Since the midpoints must be plotted along the X-axis, you must divide the Y-axis into units that will satisfy this rule. Usually, this requires a little juggling on your part. Darrell Huff (1954) offers an excellent demonstration of the misleading effects that occur when this convention is violated.    The intersection of the X and Y axes is considered the zero point for both variables. For the Y-axis in Figure 3.2, this is indeed the case. The distance on the Y-axis is the same from zero to two as from two to four, and so on. On the X-axis, however, that is not the case. Here, the scale jumps from zero to 19 and then is divided into equal units of three. 
It is conventional to indicate a break in the measuring scale by breaking the axis with slash marks between zero and the lowest score used, as we did in Figure 3.2. It is also conventional to close a polygon at both ends by connecting the curve to the X-axis.              Each point of the frequency polygon represents two numbers: the class midpoint directly below it on the X-axis and the frequency of that class directly across from it on the Y-axis. By looking at the points in Figure 3.2, you can readily see that three people are represented by the midpoint 22, nine people by each of the midpoints 31, 34, and 37, 23 people by the midpoint 40, and so on.    The major purpose of the frequency polygon is to gain an overall view of the distribution of scores. Figure 3.2 makes it clear, for example, that the frequencies are greater for the lower scores than for the higher ones. It also illustrates rather dramatically that the greatest number of children scored in the center of the distribution.                 Figure 3-2                 Histogram    Figure 3.3 is a histogram constructed from the same data that were used for the frequency polygon of Figure 3.2. Researchers may choose either of these methods for a given distribution of quantitative data, but the frequency polygon is usually preferred for several reasons: it is easier to construct, gives a generally clearer picture of trends in the data, and can be used to compare different distributions on the same graph. However, frequencies are easier to read from a histogram.    Figure 3-3              Actually, the two figures are very similar. They differ only in that the histogram is made by raising bars from the X-axis to the appropriate frequencies instead of plotting points above the midpoints. The width of a bar is from the lower to the upper limit of its class interval. Notice that there is no space between the bars.                 
Bar Graph    The third type of graph that presents frequency distributions is the bar graph. A bar graph presents frequencies of the categories of a qualitative variable. An example of a qualitative variable is laundry detergent; there are many different brands (types of the variable), but the brands don't tell you the order they go in.    With quantitative variables, the measurements of the variable impose an order on themselves. For example, arithmetic achievement scores of 43 and 51 tell you the order they belong in. "Tide" and "Lux" do not signify any order.    Figure 3.4 is an example of a bar graph. Notice that each bar is separated by a small space. This bar graph was constructed by a grocery store manager who had a practical problem to solve. One side of an aisle in his store was stocked with laundry detergent, and he had no more space for this kind of product. How much of the available space should he allot for each brand? For one week, he kept a record of the number of boxes of each brand sold. From this frequency distribution of scores on a qualitative variable, he constructed the bar graph in Figure 3.4. (He, of course, used the names of the brands. We wouldn't dare!)    Brands E, H, and K are obviously the big sellers and should get the greatest amount of space. Brands A and D need very little space. The other brands fall between these. The grocer, of course, would probably consider the relative profits from the sale of the different brands in order to determine just how much space to allot to each. Our purpose here is only to illustrate the use of the bar graph to present qualitative data.

3.9.6   The Line Graph                 Perhaps the most frequently used graph in scientific books and journal articles is the line graph. A line graph is used to present the relationship between two variables.                 A point on a line graph represents the two scores made by one person on each of the two variables. Often, the mean of a group is used rather than one person, but the idea is the same: a group with a mean score of X on one variable had a mean score of Y on the other variable. The point on the graph represents the means of that group on both variables.                 Figure 3-4 & 3-5                 Figure 3-6                 Figure 3-7                 Figure 3-8                 Figure 3.5 is an example of a line graph of the relationship between subjects' scores on an anxiety test and their scores on a difficult problem-solving task. Many studies have discovered this general relationship. Notice that performance on the task is better and better for subjects with higher and higher anxiety scores up to the middle range of anxiety. But as anxiety scores continue to increase, performance scores decrease. Chapter 5, "Correlation and Regression," will make extensive use of a version of this type of line graph.                 A variation of the line graph places performance scores on the Y-axis and some condition of training on the X-axis. Examples of such training conditions are number of trials, hours of food deprivation, year in school, and amount of reinforcement. The "score" on the training condition is assigned by the experimenter.                 Figure 3.6 is a generalized learning curve with a performance measure (scores) on the Y-axis and number of reinforced trials on the X-axis. Early in training (after only one or two trials), performance is poor. As trials continue, performance improves rapidly at first and then more and more slowly. 
Finally, at the extreme right-hand portion of the graph, performance has levelled off; continued trials do not produce further changes in the scores.             A line graph, then, presents a picture of the relationship between two variables. By looking at the line, you can tell what changes take place in the Y variable as the value of the X variable changes.

3.10 Skewed Distributions

3.10.1            Look back at Table 3.4 graphed as Figure 3.2. Notice that the largest frequencies are found in the middle of the distribution. The same thing is true in Problem 3 of this chapter. These distributions are not badly skewed; they are reasonably symmetrical.         In some data, however, the largest frequencies are found at one end of the distribution rather than in the middle. Such distributions are said to be skewed.

3.10.2            The word skew is similar to the word skewer, the name of the cooking implement used in making shish kebab. A skewer is long and pointed and is thicker at one end than the other (not symmetrical). Although skewed distributions do not function like skewers (you would have a terrible time poking one through a chunk of lamb), the name does help you remember that a skewed distribution has a thin point on one side.

3.10.3            Figures 3.7 and 3.8 are illustrations of skewed distributions. Figure 3.7 is positively skewed; the thin point is toward the high scores, and the most frequent scores are low ones. Figure 3.8 is negatively skewed; the thin point or skinny end is toward the low scores, and the most frequent scores are high ones.

3.10.4            There is a mathematical way of measuring the degree of skewness that is more precise than eyeballing, but it is beyond the scope of this book. However, figuring the relationship of the mean to the median is an objective way to determine the direction of the skew. When the mean is numerically smaller than the median, there is some amount of negative skew.

3.10.5            Figure 3-9            

3.10.6            When the mean is larger than the median there is positive skew. The reason for this is that the mean is affected by the size of the numbers and is pulled in the direction of the extreme scores. The median is not influenced by the size of the scores. The relationship between the mean and the median is illustrated by Figure 3.9. The size of the difference between the mean and the median gives you an indication of how much the distribution is skewed.
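The mean-versus-median rule described above can be sketched in a few lines of Python. This is only an illustration: the function name skew_direction and the three small data sets are hypothetical and do not come from the figures or tables of this chapter.

```python
from statistics import mean, median

def skew_direction(scores):
    """Report the direction of skew suggested by comparing mean and median.

    Hypothetical helper: it applies only the rule of thumb from the text
    (mean > median suggests positive skew, mean < median suggests negative
    skew) and does not compute a formal skewness coefficient.
    """
    m = mean(scores)
    md = median(scores)
    if m > md:
        return "positive"
    if m < md:
        return "negative"
    return "none indicated"

# Made-up scores: a few extreme high scores pull the mean above the median.
print(skew_direction([1, 2, 2, 3, 3, 3, 4, 20]))         # positive
# Made-up scores: one extreme low score pulls the mean below the median.
print(skew_direction([1, 17, 18, 18, 19, 19, 19, 20]))   # negative
# Symmetrical scores: mean and median coincide.
print(skew_direction([1, 2, 3, 4, 5]))                   # none indicated
```

The size of the gap between the two statistics gives only a rough indication of how strongly the distribution is skewed.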

3.11 The Mean, Median, and Mode Compared

3.11.1            A common question is "Which measure of central value should I use?" The general answer is "Given a choice, use the mean." Sometimes, however, the data give you no choice. For example, if the frequency distribution is for a nominal variable, the mode is the only appropriate measure of central value.

3.11.2            Figure 3-10            

3.11.3            It is meaningless to find a median or to add up the scores and divide to find a mean for data based on a nominal scale. For the data from the voting-behavior experiment the mode is the only measure of central value that is meaningful. For a frequency distribution of an ordinal variable, the median or the mode is appropriate. For data based on an interval or ratio scale, the mean, median or mode may be used; you have a choice.

3.11.4            Even if you have interval or ratio data, there are two situations in which the mean is inappropriate because it gives an erroneous impression of the distribution. The first situation is the case of a severely skewed distribution. The following story demonstrates why the mean is inappropriate for severely skewed distributions.

3.11.5            The developer of Swampy Acres Retirement Home sites is attempting, with a computer-selected mailing list, to sell the lots in his southern paradise to northern buyers. The marks express concern that flooding might occur. The developer reassures them by explaining that the average elevation of his lots is 78.5 feet and that the water has never exceeded 25 feet in that area. On the average, he has told the truth; but this average truth is misleading. Look at the actual lay of the land in Figure 3.10 and examine the frequency distribution in Table 3.6, which summarizes the picture.

3.11.6            The mean elevation, as the developer said, is 78.5 feet; however, only 20 lots, all on a cliff, are out of the flood zone. The other 80 lots are, on the average, under water. The mean, in this case, is misleading. In this instance, the central value that describes the typical case is the median because it is unaffected by the size of the few extreme lots on the cliff. The median elevation is 12.5 feet, well below the high-water mark.

3.11.7            Darrell Huff's delightful and informative book, How to Lie with Statistics (1954) gives a number of such examples. We heartily recommend this book to you. It provides many cautions concerning misinformation conveyed through the use of the inappropriate statistic. A more recent and equally delightful book is Flaws and Fallacies in Statistical Thinking by Stephen Campbell (1974).

3.11.8            There is another instance that requires a median, even though you have a symmetrical distribution. This is when the class interval with the largest (or smallest) scores is not limited. In such a case, you do not have a midpoint and, therefore, cannot compute a mean. For example, age data are sometimes reported with the highest category as "75 and over." The mean cannot be computed. Thus, when one or both of the extreme class intervals is not limited, the median is the appropriate measure of central value. To reiterate: given a choice, use the mean.

3.11.9            Table 3-6            

3.12 The Mean of a Set of Means

3.12.1            Occasions arise in which means are available from several samples taken from the same population. If these means are combined, the mean of the set of means will give you the best estimate of the population parameter, μ. If every sample has the same N, you can compute the average mean simply by adding the means and dividing by the number of means. If, however, the means to be averaged have varying N's, it is essential that you take into account the various sample sizes by multiplying each mean by its own N before summing, and then dividing that sum by the total N. Table 3.7 illustrates this procedure. Four means are presented, along with two hypothetical sample sizes for each mean. In the left-hand table, the four sample sizes are equal. In the right-hand table, the four sample sizes are not equal. Notice that 18.50 is the mean of the means when the separate means are simply added and the sum divided by the number of means. This gives the correct answer when the sample sizes are equal. However, when sample sizes differ, 18.50 is wrong. Each mean must be multiplied by its respective N, and the mean of the means is 17.60. When N's are unequal, averaging the means without accounting for sample sizes generally causes an error.

3.12.2            Table 3-7            
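The weighting procedure described in 3.12.1 can be sketched as follows. The function name mean_of_means and the sample means and sizes are hypothetical; they are not the values from Table 3.7, which is not reproduced here.

```python
def mean_of_means(means, sizes):
    """Combine several sample means into one mean, weighting each by its N.

    Each mean is multiplied by its own sample size, the products are summed,
    and the sum is divided by the total number of scores.
    """
    if len(means) != len(sizes):
        raise ValueError("each mean needs exactly one sample size")
    total_n = sum(sizes)
    weighted_sum = sum(m * n for m, n in zip(means, sizes))
    return weighted_sum / total_n

# Hypothetical sample means.
means = [10, 20, 30]

# Equal sample sizes: the weighted result matches the simple average (20.0).
print(mean_of_means(means, [4, 4, 4]))   # 20.0

# Unequal sample sizes: the simple average (20.0) would be wrong.
print(mean_of_means(means, [5, 3, 2]))   # 17.0
```

With equal N's the weighting makes no difference, which is why simply averaging the means is safe only in that case.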

3.12.3            Clue to The Future             In Chapter 9 you will learn a most important concept, called a sampling distribution of the mean. The mean of a set of means is an inherent part of that concept.

3.13 Skewed Distributions and Measures of Central Tendency

3.13.1            Introduction             When distributions are represented graphically along with their mean, median, and mode, they may show varying degrees of skewness, which refers to the degree of asymmetry of the curve.

3.13.2            Symmetrical Distribution             In a symmetrical distribution the mean, median and mode all fall at the same point.            

3.13.3            Bimodal Symmetrical Distribution             If there are two modes (bimodal), the mean and median still fall at the same point, but the two modes represent the highest points of the distribution. This is considered a bimodal symmetrical distribution.            

3.13.4            Skewed Distributions             Introduction                  In a symmetrical distribution the largest frequencies are found in the middle, whereas in a skewed distribution the largest frequencies are found at one end of the distribution rather than in the middle.                  The word skew is similar to the word skewer, which is long and pointed and is thicker at one end than the other (not symmetrical). A skewed distribution has a thin point on one side.                  In a positively skewed distribution, the thin point is toward the high scores, and the most frequent scores are low ones. In a negatively skewed distribution, the thin point or skinny end is toward the low scores, and the most frequent scores are high ones. There are mathematical ways of measuring the degree of skewness that are more precise than eyeballing, but the relationship of the mean to the median provides an objective way to determine the direction of the skew. When the mean is numerically smaller than the median, there is some amount of negative skew. When the mean is larger than the median, there is positive skew. The reason for this is that the mean is affected by the size of the numbers and is pulled in the direction of extreme scores. The median is not influenced by the size of the scores. The relationship between the mean and the median is illustrated in the picture below. The size of the difference between the mean and the median gives you an indication of how much the distribution is skewed.             Illustration                 

3.13.5            Positively Skewed Distributions             The positively skewed distribution below demonstrates an asymmetrical pattern. In this case the mode is smaller than the median, which is smaller than the mean.                 This relationship exists between the mode, median and mean because each statistic describes the distribution differently.             The mode represents the most frequently occurring score and thus lies on the X axis directly beneath the highest point of the frequency distribution. The median cuts the distribution in half so that 50% of the scores are on either side.                 The mean, unlike the median and mode, is affected by larger scores since it is the sum of the score values divided by their number. The mean represents the balance point in the distribution. Because of this, it is drawn in the direction of the skew; in a positively skewed distribution, it is drawn toward the larger values.            

3.13.6            Negatively Skewed Distribution             This distribution is also asymmetrical but with the opposite order of the mean, median, and mode. The mean is smaller than the median, which is smaller than the mode.                 The mode, the most frequent score, lies toward the high scores, and the thin tail points in the negative direction, toward the low scores.            

3.14 The Mean of a Set of Means ()


4      START


5.1   The spread or dispersion of scores is known as variability. If the scores in a distribution fall within a narrow range, there is little variability. Conversely, scores that vary widely indicate a distribution that is highly variable.

5.2   Range

5.2.1   The range is the difference between the largest score and the smallest score.

5.3   Standard Deviation


5.5   Standard Deviation (s) as an Estimate of Population Variability


5.5.2   Deviation Scores


5.5.4   Deviation-Score Method of Computing s from Ungrouped Data


5.5.6   Deviation-Score Method of Computing s from Grouped Data


5.5.8   The Raw-Score Method of Computing s from Ungrouped Data


5.5.10            Raw-Score Method of Computing s from Grouped Data


5.6   The Other Two Standard Deviations, σ and S


5.8   Variance


5.10 z Scores

5.10.1            Introduction             You have used measures of central value and measures of variability to describe a distribution of scores. The next statistic, z, is used to describe a single score.             A z score is a mathematical way to change a raw score so that it reflects its relationship to the mean and standard deviation of its fellow scores.             Any distribution of raw scores can be converted to a distribution of z scores; for each raw score, there is a z score. Raw scores above the mean will have positive z scores; those below the mean will have negative z scores.             A z score is also called a standard score because it is a deviation score expressed in standard deviation units. It is the number of standard deviations a score is above or below the mean. A z score tells you the relative position of a raw score in the distribution. (z) scores are also used for inferential purposes. Much larger z scores may occur then.

5.10.2            Formula and Procedure             Formula                z=(X-X(mean))/S                      Variables Defined                z=z score                S=standard deviation of a sample of scores                X=individual raw score                X(mean)=mean of a sample                        Procedure                  Find the difference between the raw score and the mean                  Divide that difference by the standard deviation of the sample

5.10.3            Use of z Scores             (z) scores are used to compare two scores in the same distribution. They are also used to compare two scores from different distributions, even when the distributions are measuring different things.
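The z-score procedure in 5.10.2 and the comparison described in 5.10.3 can be sketched in Python as follows. The function name z_score and the two tests with their means and standard deviations are hypothetical values chosen only to illustrate the arithmetic.

```python
def z_score(raw_score, mean, std_dev):
    """Return how many standard deviations raw_score lies from the mean."""
    if std_dev <= 0:
        raise ValueError("the standard deviation must be positive")
    # Step 1: find the difference between the raw score and the mean.
    deviation = raw_score - mean
    # Step 2: divide that difference by the standard deviation.
    return deviation / std_dev

# Hypothetical example: one person's scores on two different tests.
z_test_a = z_score(85, mean=75, std_dev=10)   # 1.0
z_test_b = z_score(60, mean=50, std_dev=5)    # 2.0

print(z_test_a, z_test_b)   # 1.0 2.0
# The raw score on test A is higher, but relative to its own distribution
# the score on test B is the better one.
print(z_test_b > z_test_a)  # True
```

Because both scores are expressed in standard deviation units, they can be compared even though the two tests use different scales.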

5.11 Variance and Standard Deviation

5.11.1            Variance             s^2 is the symbol for variance, a measure of how much the scores in a distribution vary around the mean.                 1. Find the mean of the scores.             2. Subtract the mean from every score.             3. Square the results of step two.             4. Sum the results of step three.             5. Divide the result of step four by N-1, where N is the number of scores.             Example                1. Find the mean of the scores. Mean=50/5=10             2. Subtract the mean from every score. The second column above             3. Square the results of step two. The third column above             4. Sum the results of step three. 22             5. Divide the result of step four by N-1. s^2=22/(5-1)=22/4=5.5             Note that the sum of the second column (the deviation scores) is zero. This must be the case if the calculations are performed correctly up to that point.

5.11.2            Standard Deviation             s is the symbol for standard deviation, and it is the square root of the variance.             The standard deviation is the preferred measure of variability.             Formula                s=Square Root(s^2)                 Example             Take the square root of the variance above. Square root of 5.5=2.35
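The five variance steps and the square-root step can be sketched in Python. The table of scores for the example is not reproduced in these notes, so the five scores below are hypothetical: they were chosen only because they give the same totals as the example (a sum of 50, a mean of 10, and a sum of squared deviations of 22).

```python
import math

def sample_variance(scores):
    """Variance computed with N - 1 in the denominator, following the steps above."""
    n = len(scores)
    mean = sum(scores) / n                       # step 1
    deviations = [x - mean for x in scores]      # step 2 (these sum to zero)
    squared = [d ** 2 for d in deviations]       # step 3
    sum_of_squares = sum(squared)                # step 4
    return sum_of_squares / (n - 1)              # step 5

def sample_std(scores):
    """Standard deviation: the square root of the variance."""
    return math.sqrt(sample_variance(scores))

# Hypothetical scores (not the original table): sum = 50, mean = 10.
scores = [8, 8, 9, 12, 13]

print(sample_variance(scores))        # 5.5
print(round(sample_std(scores), 2))   # 2.35
```

Python's standard library offers statistics.variance and statistics.stdev, which also divide by N - 1.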

6      Correlation and Regression

6.1   Introduction

6.1.1   Sir Francis Galton (1822-1911) in England conducted some of the earliest investigations making use of statistical analysis. Galton was concerned with the general question of whether people of the same family were more alike than people of different families. Galton needed a method that would describe the degree to which, for example, heights of fathers and their sons were alike. The method he invented for this purpose is called correlation (co-relation). With it, Galton could also measure the degree to which the heights of unrelated men were alike. He could then compare these two results and thus answer his question.

6.1.2   Galton’s student Karl Pearson (1857-1936), with Galton’s aid, later developed a formula that yielded a statistic known as a correlation coefficient. Pearson’s product-moment coefficient, and other correlation coefficients based on Pearson’s work, have been widely used in statistical studies in psychology, education, sociology, medicine, and many other areas.

6.2   Concept of Correlation

6.2.1   In order to compute a correlation, you must have two variables, with values of one variable (X) paired in some logical way with values of the second variable (Y). Such an organization of data is referred to as a bivariate (two-variable) distribution.

6.2.2   Examples                 The same group of people may take two tests, and each person's scores on the two tests form a pair.                 Family relationships may be organized as a bivariate distribution: for example, the height of fathers is one variable, X, and the height of their sons is the other variable, Y.

6.3   Positive Correlation

6.3.1   In the case of a positive correlation between two variables, high measurements on one variable tend to be associated with high measurements on the other and low measurements on one with low measurements on the other. In other words, the two variables vary together in the same direction. A perfect positive correlation is 1.00. A scatterplot is used to visualize this relationship, with each point in the scatterplot representing a pair of scores plotted against the X and Y axes of the chart. The line that runs through the points is called a regression line or “line of best fit”. When there is perfect correlation (+1.00 or -1.00), all points fall exactly on the line. When the points are scattered away from the line, correlation is less than perfect and, for a positive relationship, the correlation coefficient falls between .00 (no correlation) and 1.00 (perfect correlation). It was when Galton cast his data in the form of a scatterplot that he conceived the idea of a co-relationship between the variables. It is from the term regression that we get the symbol r for correlation. Galton chose the term regression because it was descriptive of a phenomenon that he discovered in his data on inheritance. He found, for example, that tall fathers had sons somewhat shorter than themselves and that short fathers had sons somewhat taller than themselves. From such data, he conceived his “law of universal regression,” which states that there exists a tendency for each generation to regress, or move toward, the mean of the general population.

6.3.2   Today, the term regression also has a second meaning. It refers to a statistical method that is used to fit a straight line to bivariate data and to predict scores on one variable from scores on a second variable.

6.3.3   It is not necessary that the numbers on the two variables be exactly the same in order to have perfect correlation. The only requirement is that the relationship be such that all points in a scatterplot lie on the regression line, as happens, for example, when the differences between pairs of scores are all the same. If this requirement is met, correlation will be perfect, and an exact prediction can be made.

6.3.4   Nature, of course, is not so accommodating as to permit such perfect prediction, at least at science’s present state of knowledge. People cannot predict their sons’ heights precisely. The points do not all fall on the regression line; some miss it badly. However, as Galton found, there is some positive relationship; the correlation coefficient between father and son height is r=.50. The correlation between math and reading skills is r=.54. Predictions made from these correlations, although far from perfect, would be far better than a random guess.

6.4   Negative Correlation

6.4.1   Negative correlation occurs where high scores of one variable are associated with low scores of the other. The two variables thus tend to vary together but in opposite directions. The regression line runs from the upper left of the graph to the lower right. Negative correlation could be changed to positive by changing the type of score plotted on one of the variables.

6.4.2   Perfect negative correlation exists, as does perfect positive correlation, when all points are on the regression line. The correlation coefficient in such a case is –1.00. For example, there is a perfect negative relationship between the amount of money in your checking account and the amount of money you have written checks for (if you ignore service charges and deposits). As the amount of money you write checks for increases, your balance decreases by exactly the same amount.

6.4.3   Other examples of negative correlation (less than perfect) are:                 Temperature and inches of snow at the top of a mountain, measured at noon each day in May                 Hours of sunshine and inches of rainfall per day at Miami, Florida                 Number of pounds lost and number of calories consumed per day by a person on a strict diet

6.4.4   Negative correlation permits prediction in the same way that positive correlation does. With correlation, positive is not better than negative. In both cases, the size of the correlation coefficient indicates the strength of the relationship: the larger the absolute value of the number, the stronger the relationship. The algebraic sign (+ or -) indicates the direction of the relationship.

6.5   Zero Correlation

6.5.1   A zero correlation means that there is no relationship between the two variables. High and low scores on the two variables are not associated in any predictable manner. In the case of zero correlation, the best prediction from any X score is the mean of the Y scores. The regression line, then, runs parallel to the X axis at the height of Y(mean) on the Y axis.

6.6   Computation of the Correlation Coefficient


6.7   Computational Formulas


6.7.2   Blanched Formula                 This procedure requires you to find the means and standard deviations of both X and Y before computing r.                 Formula    r=((ΣXY/N)-(X(mean)*Y(mean)))/(Sx*Sy)                 Variables Defined    Σ=Sum    XY=Product of each X value multiplied by its paired Y value    X(mean)=Mean of variable X    Y(mean)=Mean of variable Y    Sx=Standard deviation of variable X    Sy=Standard deviation of variable Y    N=Number of pairs of observations                 Procedure    1. Multiply each paired X and Y score    2. Sum the products of X*Y    3. Divide the summed products of X*Y by the number of paired scores (N)    4. Multiply the mean of the X scores X(mean) by the mean of the Y scores Y(mean)    5. Subtract the product of step 4 (X(mean)*Y(mean)) from the quotient of step 3    6. Multiply the standard deviation of the X scores Sx by the standard deviation of the Y scores Sy    7. Divide the result of step 5 by the product of step 6
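A minimal Python sketch of the blanched formula follows. It rests on one assumption that the text does not spell out at this point: Sx and Sy are computed with N (not N-1) in the denominator, which is what makes the formula yield r. The function name pearson_r_blanched and the paired scores are hypothetical.

```python
import math

def pearson_r_blanched(xs, ys):
    """Pearson r from means and standard deviations (blanched formula).

    Assumption: the standard deviations use N in the denominator.
    """
    if len(xs) != len(ys):
        raise ValueError("X and Y must contain the same number of scores")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Standard deviations with N in the denominator (assumed).
    s_x = math.sqrt(sum(x * x for x in xs) / n - mean_x ** 2)
    s_y = math.sqrt(sum(y * y for y in ys) / n - mean_y ** 2)
    # Sum the products of the paired scores, then divide by N.
    mean_xy = sum(x * y for x, y in zip(xs, ys)) / n
    # Subtract the product of the means, then divide by Sx * Sy.
    return (mean_xy - mean_x * mean_y) / (s_x * s_y)

# Hypothetical paired scores.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(round(pearson_r_blanched(xs, ys), 4))  # 0.7746
```

If either variable has no variability, its standard deviation is zero and r is undefined; the sketch would raise a ZeroDivisionError in that case.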

6.7.3   Raw Score Formula                 With this formula, you start with the raw scores and obtain r without having to compute means and standard deviations.                 Formula    r=((N*ΣXY)-(ΣX*ΣY))/Square Root([(N*ΣX^2)-(ΣX)^2]*[(N*ΣY^2)-(ΣY)^2])                 Variables Defined    Σ=Sum    XY=Product of each X value multiplied by its paired Y value    ΣX^2=Sum of the squared X scores    (ΣX)^2=Square of the sum of the X scores    ΣY^2=Sum of the squared Y scores    (ΣY)^2=Square of the sum of the Y scores    N=Number of pairs of observations                 Procedure    1. Multiply each paired X and Y score    2. Sum the products of X*Y    3. Multiply the summed products by the number of paired observations (N)    4. Sum the X scores    5. Sum the Y scores    6. Multiply the summed X scores by the summed Y scores    7. Subtract the product of step 6 (ΣX*ΣY) from the product of step 3 (N*ΣXY)    8. Square each X score (X^2) and sum the squares    9. Multiply the sum from step 8 (ΣX^2) by the number of paired scores    10. Sum the X scores and square the sum, (ΣX)^2    11. Subtract the result of step 10, (ΣX)^2, from the product of step 9, N*ΣX^2    12. Square each Y score (Y^2) and sum the squares    13. Multiply the sum from step 12 (ΣY^2) by the number of paired scores    14. Sum the Y scores and square the sum, (ΣY)^2    15. Subtract the result of step 14, (ΣY)^2, from the product of step 13, N*ΣY^2    16. Multiply the result of step 15 [(N*ΣY^2)-(ΣY)^2] by the result of step 11 [(N*ΣX^2)-(ΣX)^2]    17. Obtain the square root of the product of step 16    18. Divide the result of step 7 [(N*ΣXY)-(ΣX*ΣY)] by the result of step 17
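The raw-score procedure can be sketched in Python, with the sums named after the quantities in the formula. The function name pearson_r_raw and the paired scores are hypothetical, chosen only to show the arithmetic.

```python
import math

def pearson_r_raw(xs, ys):
    """Pearson r computed directly from raw scores (raw-score formula)."""
    if len(xs) != len(ys):
        raise ValueError("X and Y must contain the same number of scores")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    # Numerator: N * sum(XY) - sum(X) * sum(Y)
    numerator = n * sum_xy - sum_x * sum_y
    # Denominator: square root of
    # [N * sum(X^2) - (sum X)^2] * [N * sum(Y^2) - (sum Y)^2]
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator

# Hypothetical paired scores.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(round(pearson_r_raw(xs, ys), 4))  # 0.7746
```

For these scores the intermediate sums are ΣX=15, ΣY=20, ΣXY=66, ΣX^2=55 and ΣY^2=86, so r=30/Square Root(50*30), which is about .77.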

6.8   The Meaning Of r

6.8.1   r is a descriptive statistic or summary index number, like the mean and standard deviation, and is used to describe a set of data.

6.8.2   A correlation coefficient is a measure of the relationship between two variables. It describes the tendency of two variables to vary together (covary); that is, it describes the tendency of high or low values of one variable to be regularly associated with either high or low values of the other variable. The absolute size of the coefficient (from 0 to 1.00) indicates the strength of that tendency to covary.

6.8.3   Illustration                

6.8.4   The above scatterplot shows the correlational relationships of r=.20, .40, .60, and .80. Notice that as the size of the correlation coefficient gets larger, the points cluster more and more closely to the regression line; that is, the envelope containing the points becomes thinner and thinner. This means that a stronger and stronger tendency to covary exists as r becomes larger and larger. It also means that predictions made about values of the Y variable from values of the X variable will be more accurate when r is larger.

6.8.5   The algebraic sign tells the direction of the covariation. When the sign is positive, high values of X are associated with high values of Y, and low values of X are associated with low values of Y. When the sign is negative, high values of X are associated with low values of Y, and low values of X are associated with high values of Y. Knowledge of the size and direction of r, then, permits some prediction of the value of one variable if the value of the other variable is known.

6.8.6   Correlation vs. Causation                 A correlation coefficient does not tell you whether or not one of the variables is causing the variation in the other. Quite possibly some third variable is responsible for the variation in both.                 A correlation coefficient alone cannot establish a causal relationship.

6.8.7   Coefficient of Determination                 This is an overall index that specifies the proportion of variance that two variables have in common.                 Formula    COD=r^2                 Variables Defined    COD=Coefficient of Determination    r=Pearson product-moment correlation coefficient                 Procedure    Multiply r by itself (r^2)                 It could be argued that the proportion of variance the two variables have in common can be attributed to the same cause. Or that this is the percentage of variance which adheres most closely to the regression line.                 Note what happens to a fairly strong correlation of .70 when it is interpreted in terms of variance. Only 49% of the variance is held in common.                 The coefficient is useful in comparing correlation coefficients. When one compares an r of .80 with an r of .40, the tendency is to think of the .80 as being twice as high as .40, but that is not the case. Correlation coefficients are compared in terms of the amount of common variance: .80^2=.64, .40^2=.16, and .64/.16=4. Thus, two variables that are correlated with r=.80 have four times as much common variance as two variables correlated with r=.40.
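The arithmetic in this section can be checked with a short Python sketch. The function name coefficient_of_determination is a hypothetical label for squaring r; results are rounded because floating-point squares are not exact.

```python
def coefficient_of_determination(r):
    """Proportion of variance two variables have in common: r squared."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1.00 and 1.00")
    return r * r

# A fairly strong correlation of .70 shares only 49% of the variance.
print(round(coefficient_of_determination(0.70), 2))   # 0.49

# Comparing r = .80 with r = .40 in terms of common variance.
cod_80 = coefficient_of_determination(0.80)
cod_40 = coefficient_of_determination(0.40)
print(round(cod_80, 2), round(cod_40, 2))              # 0.64 0.16
print(round(cod_80 / cod_40, 2))                       # 4.0
```

The sign of r drops out when it is squared, so r=-.80 and r=.80 correspond to the same proportion of common variance.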

6.8.8   Practical Significance of r                 How high must a correlation coefficient be before it is of use? How low must it be before we conclude it is useless? Correlation is useful if it improves prediction over guessing. In this sense, any reliable correlation other than zero, whether positive or negative, is of some value because it will reduce to some extent the incorrect predictions that might otherwise be made. Very low correlations allow little improvement over guessing in prediction. Such poor prediction usually is not worth the costs involved in practical situations. Generally, researchers are satisfied with lower correlations in theoretical work but require higher ones in practical situations.

6.9   Correlation and Linearity

6.9.1   For r to be a meaningful statistic, the best fitting line through the scatterplot of points must be a straight line. If a curved regression line fits the data better than a straight line, r will be low, not reflecting the true relationship between the two variables. The product-moment correlation coefficient is not appropriate as a measure of curved relationships. Special non-linear correlation techniques for such relationships do exist and are described elsewhere.[4] [5]

6.10 Other Kinds of Correlation Coefficients

6.10.1            Dichotomous Variables             Correlations may be computed on data for which one or both of the variables are dichotomous (having only two possible values). An example is the correlation of the dichotomous variable sex and the quantitative variable grade-point average.

6.10.2            Multiple correlation             Several variables can be combined, and the resulting combination can be correlated with one variable. With this technique, called multiple correlation, a more precise prediction can be made. Performance in school or on the job can usually be predicted better by using several measures of a person rather than just one.

6.10.3            Partial correlation             A technique called partial correlation allows you to separate or partial out the effects of one variable from the correlation of two other variables. For example, if we want to know the true correlation between achievement-test scores in two school subjects it will probably be necessary to partial out the effects of intelligence since IQ and achievement are correlated.

6.10.4            Rho for Ranked data             Rho is used when the data are ranks rather than raw scores.

6.10.5            Non-linear correlation             If the relationship between two variables is curved rather than linear, the correlation ratio, eta, gives the degree of association.

6.10.6            Intermediate-level statistic text books             The above correlation techniques are covered in intermediate level text books. [6] [7]

6.11 Correlation and Regression


6.12 Regression Equation


6.12.2            Formula             Y'=a+bX

6.12.3            Variables Defined             Y'=the Y value predicted from a particular X value (Y' is pronounced “y prime”).             a=the point at which the regression line intersects the Y axis             b=the slope of the regression line, that is, the amount Y' increases for each increase of one unit in X             X=the X value used to predict Y'.             Regression Coefficients                  The symbols X and Y can be assigned arbitrarily in correlation, but, in a regression equation, Y is assigned to the variable you wish to predict. To make predictions of Y using the regression equation, you need to calculate the values of the constants a and b, which are called regression coefficients.                  Formula                b=r*(Sy/Sx)                  Variables Defined                r=correlation coefficient for X and Y                Sy=the standard deviation of the Y variable                Sx=the standard deviation of the X variable                Notice that for positive correlation b will be a positive number. For negative correlation b will be negative.                  Formula                a=Y(mean)-b*X(mean)                  Variables Defined                Y(mean)=mean of the Y scores                b=regression coefficient computed above                X(mean)=mean of the X scores

6.12.4            Procedure             Calculate b                  Divide Sy (standard deviation of Y) by Sx (standard deviation of X)                  Multiply that quotient (Sy/Sx) by r (correlation coefficient for X and Y)             Calculate a                  Multiply X(mean) (mean of the X scores) by b (regression coefficient calculated above)                  Subtract that product from Y(mean) (mean of the Y scores)             Calculate the predicted Y' score                  Multiply X (value used to predict Y') by b                  Add that product to a (calculated above)
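The three parts of the procedure can be sketched in Python. The function names regression_coefficients and predict_y, and the summary statistics fed to them, are hypothetical; the sketch assumes r, the two standard deviations and the two means have already been computed.

```python
def regression_coefficients(r, s_x, s_y, mean_x, mean_y):
    """Return (a, b) for the regression equation Y' = a + bX."""
    if s_x <= 0:
        raise ValueError("the standard deviation of X must be positive")
    b = r * (s_y / s_x)        # slope: b = r * (Sy / Sx)
    a = mean_y - b * mean_x    # intercept: a = Y(mean) - b * X(mean)
    return a, b

def predict_y(x, a, b):
    """Predicted Y score (Y') for a given X score."""
    return a + b * x

# Hypothetical summary statistics.
a, b = regression_coefficients(r=0.5, s_x=2, s_y=4, mean_x=10, mean_y=20)
print(a, b)                  # 10.0 1.0
print(predict_y(12, a, b))   # 22.0
```

Note that an X score equal to X(mean) always yields a prediction of Y(mean), because the regression line passes through the point formed by the two means.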

6.12.5            Drawing a Regression Line              

6.12.6            Predicting a Y Score              

6.13 Rank Order Correlation


6.13.2            Web   


6.14 r Distribution Tables

6.14.1            Web              


7.1   A raw score does not reveal its relationship to other scores and must be transformed into a score that reveals these relationships. There are two types of score transformations: percentile ranks and linear transformations.

7.2   Purpose

7.2.1   Transforming scores reveals the relationships among them, which increases the amount of information available for analytical interpretation.

7.2.2   Allows two scores to be compared.

7.3   Percentile Ranks Based On The Sample

7.3.1   The percentile rank is the percentage of scores that fall below a given score.

7.3.2   Procedure                 Rank the scores from lowest to highest and determine the total number of scores.   Example  Raw scores: 33 28 29 37 31 33 25 33 29 32 35   Ranked: 25 28 29 29 31 32 33 33 33 35 37   Total number of scores=11                 Determine the number of scores falling below the selected score.   Example  Selected score=31   Number of scores below=4                 Determine the percentage of scores that fall below the selected score by dividing the number of scores below by the total number of scores and multiplying by 100.   Example=4/11=.3636*100=36.36%                 Determine the percentage of scores that fall at the selected score by dividing the number of scores at that score by the total number of scores and multiplying by 100.   Example=1/11=.0909*100=9.09%                 Divide the percentage of scores at the selected score by 2 and add the result to the percentage of scores below the selected score.   Example=9.09/2=4.55+36.36=40.91%   The percentile rank of the score of 31 is therefore 40.91%.                 Brief Summary of Process   Rank the scores from lowest to highest.   Add the percentage of scores that fall below the score to one-half the percentage of scores that fall at the score.   The result is the percentile rank of that score.                 Another example: Selected score=33   ((6/11)+((3/11)/2))*100=68.18%   The percentile rank of the score of 33 is therefore 68.18%.                 Example formula   Percentile rank=((number of scores below+(number of scores at/2))/total number of scores)*100                 Example of the algebraic procedure applied to the selected scores of 31 and 33.   31: ((4+(1/2))/11)*100=40.91%   33: ((6+(3/2))/11)*100=68.18%
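As a rough check on the arithmetic, the percentile rank procedure can be sketched in Python. The function name is hypothetical; the scores are the eleven scores from the example.

```python
def percentile_rank(scores, selected):
    """Percentage of scores below `selected` plus half the percentage at it."""
    below = sum(1 for s in scores if s < selected)
    at = sum(1 for s in scores if s == selected)
    return (below + at / 2) / len(scores) * 100

scores = [33, 28, 29, 37, 31, 33, 25, 33, 29, 32, 35]
pr_31 = percentile_rank(scores, 31)
pr_33 = percentile_rank(scores, 33)
print(round(pr_31, 2), round(pr_33, 2))
```

The scores do not need to be sorted first, because counting the scores below and at the selected score gives the same result as ranking them.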

7.4   Percentile Ranks Based On The Normal Curve













8.6.1   Formula                 (Mean (Post Score) – Mean (Pre Score))/(Standard Deviation (Pre Score)/( SQRT Count))



9      Theoretical Distributions Including the Normal Distribution

9.1   Definition of Inferential Statistics

9.1.1   Inferential statistics are concerned with decision-making. Usually, the decision is whether the difference between two samples is probably due to chance or probably due to some other factor. Inferential statistics help you make a decision by giving you the probability that the difference is due to chance. If the probability is very high, a decision that the difference is due to chance is supported. If the probability is very low, a decision that the difference is due to some other factor is supported. Descriptive statistics are also used in these decision-making processes.

9.2   Introduction

9.2.1   Distributions from observed scores are called empirical distributions

9.2.2   Theoretical distributions are based on mathematical formulas and logic rather than on empirical observations. The probability that the event was due to chance is found by using a theoretical distribution.

9.2.3   Probability of the occurrence of any event ranges from .00 (there is no possibility that the event will occur) to 1.00 (the event is certain to happen). Theoretical distributions are used to find the probability of an event or a group of events.

9.3   Rectangular Distribution

9.3.1   The histogram below is a theoretical frequency distribution that shows the types and number of cards in an ordinary deck of playing cards. Since there are 13 kinds of cards, and the frequency of each card is four, the theoretical curve is rectangular in shape. (The line that encloses a frequency polygon is called a curve, even if it is straight.) The number in the area above each card is the probability of obtaining that card in a chance draw from the deck. That probability (.077) was obtained by dividing the number of cards that represent the event (4) by the total number of cards (52).

9.3.2   Illustration Theoretical Card Draws                

9.3.3   Probabilities are often stated as “chances in a hundred.” The expression p=.077 means that there are 7.7 chances in 100 of the event in question occurring. Thus from the illustration above you can tell at a glance that there are 7.7 chances in 100 of drawing an ace from a deck of cards.

9.3.4   With this theoretical distribution, you can determine other probabilities. Suppose you wanted to know your chances of drawing a face card or a 10. These are the darkened areas above. Simply add the probabilities associated with a 10, jack, queen, and king. Thus, .077 + .077 + .077 + .077=.308, which means you have 30.8 chances in 100 of drawing a face card or a 10.

9.3.5   One property of the distribution above is true for all theoretical distributions: the total area under the curve is 1.00. In the above illustration there are 13 kinds of events, each with a probability of .077. Thus, (13)(.077)=1.00 (after rounding). With this arrangement, any statement about area is also a statement about probability. Of the total area under the curve, the proportion that signifies “ace” is .077, and that is also the probability of drawing an ace from the deck.
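The card probabilities can be verified with a short Python sketch. It uses exact fractions so that the total area comes out to exactly 1; the variable names are illustrative only.

```python
from fractions import Fraction

KINDS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
CARDS_PER_KIND = 4
DECK_SIZE = CARDS_PER_KIND * len(KINDS)  # 52 cards

# Rectangular distribution: every kind of card has the same probability
probability = {kind: Fraction(CARDS_PER_KIND, DECK_SIZE) for kind in KINDS}

p_ace = probability["A"]
# Probability of a 10 or a face card: add the four separate probabilities
p_ten_or_face = sum(probability[k] for k in ("10", "J", "Q", "K"))
total_area = sum(probability.values())

print(round(float(p_ace), 3), round(float(p_ten_or_face), 3), total_area)
```

Working with fractions shows that the .077 in the text is a rounded value of 4/52, which is why thirteen of them add to 1.00 only after rounding.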

9.4   Binomial Distribution

9.4.1   The Binomial (two names) is another example of a theoretical distribution.

9.5   Comparison of Theoretical and Empirical Distributions

9.5.1   A theoretical curve represents the “best estimate” of how the events would actually occur. As with all estimates, the theoretical curve is somewhat inaccurate; but in the world of real events it is better than any other estimate. A theoretical distribution is one based on logic and mathematics rather than on observations. It shows you the probability of each event that is part of the distribution. When it is similar to an empirical distribution, the probability figures obtained from the theoretical distribution are accurate predictors of actual events.

9.5.2   Applied statisticians have found a number of theoretical distributions useful, including the normal distribution, t distribution, F distribution, chi square distribution, and U distribution.

9.6   The Normal Distribution

9.6.1   Early statisticians, who found that frequency distributions of data gathered from a wide variety of fields were similar, established the name normal distribution.

9.6.2   The normal distribution is sometimes called the Gaussian distribution after Carl Friedrich Gauss (1777-1855), who developed the curve (about 1800) as a way to represent the random error in astronomy observations. Because this curve was such an accurate picture of the effects of random variation, early writers referred to the curve as the law of error.

9.6.3   Description of the Normal Distribution                 The normal distribution is a bell-shaped, symmetrical distribution. It is a theoretical distribution based on a mathematical formula rather than on any empirical observations, although empirical curves often look similar to this theoretical distribution. Empirical distributions usually start to look like the normal distribution after 100 or more observations. When the theoretical curve is drawn, the Y-axis is usually omitted. On the X-axis, z scores are used as the unit of measurement for the standardized normal curve, using the following formula.                 Formula   z=(X-μ)/σ                 The mean, median, and mode are the same score: the score on the X-axis at which the curve is at its peak. If a line were drawn from the peak to the mean score on the X-axis, the area under the curve to the left of the line would be half the total area (50%), leaving half the area to the right of the line. The tails of the curve are asymptotic to the X axis; that is, they never actually cross the axis but continue in both directions indefinitely, with the distance between the curve and the X axis getting less and less. Although theoretically the curve never ends, it is convenient to think of (and to draw) the curve as extending from -3 to +3.                 The two inflection points in the curve are at exactly -1 and +1. An inflection point is where a curve changes from bowed down to bowed up, or vice versa.                 Curves that are not normal distributions are definitely not abnormal but simply reflect how data is distributed. The use of the word normal is meant to imply frequently found.

9.6.4   Use of the Normal Distribution                 The theoretical normal distribution is used to determine the probability of an event, as the figure below illustrates by showing the probabilities associated with certain areas. The web link below can calculate the area between the mean and a z score: in the first applet, enter the mean of 0 in the left box and the z score in the right box, then click “between” for the area between the mean and the z score, as the illustration below demonstrates. These probabilities are also found in tables in the back of most statistics textbooks.                 Web Normal Distribution Link                 Illustration of Normal Distribution                 Any normally distributed empirical distribution can be made to correspond to the standardized normal distribution (a theoretical distribution) by using z scores. Converting the raw scores of any empirical normal distribution to z scores will give the distribution a mean equal to zero and a standard deviation equal to 1.00, and that is exactly the scale used in the theoretical normal distribution. With this correspondence established, the theoretical normal distribution can be used to determine the probabilities of empirical events, whether they are IQ scores, tree diameters, or hourly wages.
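The claim that z scores always have a mean of zero and a standard deviation of 1.00 can be illustrated with a small Python sketch. The raw scores are hypothetical, and the sketch assumes the standard deviation is computed with N in the denominator.

```python
from statistics import mean, pstdev

def to_z_scores(raw_scores):
    """Convert raw scores to z scores: (score - mean) / standard deviation."""
    m = mean(raw_scores)
    sd = pstdev(raw_scores)  # assumption: standard deviation uses N
    return [(x - m) / sd for x in raw_scores]

raw = [70, 85, 100, 115, 130]  # hypothetical raw scores
z = to_z_scores(raw)
print(z, mean(z), pstdev(z))
```

Whatever the original units are, the converted scores sit on the same scale as the theoretical normal distribution.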

9.6.5   Finding What Proportion of a Population has Scores of a Particular Size or Greater                 Convert Raw Scores to z Scores    Formula  z=(X-μ)/σ    Variables Defined  z=z score  σ=standard deviation of scores  X=individual raw score  μ=mean    Procedure  Find the difference between the raw score and the mean  Divide that difference by the standard deviation                 Find the proportion of the distribution between the mean and the z score.    You can look this up in the back of a statistics textbook in the table for areas under the normal curve between the mean and z    Web Reference  The web link below can calculate the area between the mean and a z score: in the first applet, enter the mean of 0 in the left box and the z score in the right box, then click “between”.                 Subtract the proportion between the mean and your z score from .5000    .5000 (50%) of the curve lies to the right of the mean, and the proportion found in the previous step is the proportion between the mean and the z score    The difference is the proportion above your z score, that is, the proportion of scores expected to be above your raw score. (This applies to a score above the mean; for a score below the mean, add the proportion to .5000 instead.)
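As an alternative to the table lookup, the same proportion can be sketched with Python's standard library, which provides the normal curve as `statistics.NormalDist`. The function name and the example values (mean 100, standard deviation 15, raw score 115) are assumptions chosen for illustration.

```python
from statistics import NormalDist

def proportion_above(x, mu, sigma):
    """Proportion of a normal population with scores above x."""
    z = (x - mu) / sigma               # convert the raw score to a z score
    below = NormalDist().cdf(z)        # area below z on the standard normal curve
    return 1.0 - below                 # remaining area lies above z

# Hypothetical example: mean 100, standard deviation 15, raw score 115 (z = 1.00)
p = proportion_above(115, 100, 15)
print(round(p, 4))
```

The cumulative area already includes the half of the curve below the mean, so this works for scores on either side of the mean without a separate rule.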

9.6.6   Finding the Score that Separates the Population into Two Proportions                 Instead of starting with a score and calculating proportions, you can also work backward and answer questions about scores if you are given proportions. If, for example, you want to find the score that is required to be in the top 10% of the population, follow the procedure below.                 Formula    X=μ+(z)(σ)                 Variables Defined    z=z score    σ=standard deviation of scores    X=individual raw score    μ=mean                 Procedure    Find the difference between .5000 and the chosen proportion. For example, .5000-.1000=.4000 (if you wanted to find the z score that separates the upper 10% of the distribution from the rest).    The difference from step 1 is used to find the z score for the above equation. To find the z score, use the table in a statistics textbook for areas under the normal curve between the mean and z. Look up the difference from the previous step, or its closest approximation, and find the associated z score. You can also use the web reference below to find the z score   Web Reference     The second applet in the web reference below gives you the z score to be used in the equation above. Plug in a mean of 0 and an SD (standard deviation) of 1, enter the proportion as a decimal (e.g., .10=10%, .20=20%) in the shaded area box, and click the “above” button to obtain the z score.    Plug the z score found in step 2 into the equation above to find the raw score that separates the two proportions.
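Working backward from a proportion to a score can also be sketched in Python. `NormalDist.inv_cdf` returns the z score with a given area below it; the function name and the example population (mean 100, standard deviation 15) are assumptions for illustration.

```python
from statistics import NormalDist

def cutoff_for_top_proportion(top_proportion, mu, sigma):
    """Raw score that separates the top `top_proportion` from the rest."""
    # The z score with (1 - top_proportion) of the area below it
    z = NormalDist().inv_cdf(1.0 - top_proportion)
    return z, mu + z * sigma           # raw score = mean + z * standard deviation

# Hypothetical example: score needed to be in the top 10%
z, cutoff = cutoff_for_top_proportion(0.10, 100, 15)
print(round(z, 2), round(cutoff, 1))
```

The z score found this way is the one a table of areas would give for a proportion of .4000 between the mean and z.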

9.6.7   Finding the Proportion of the Population between Two Scores                 Convert Scores to z Scores    Formula  z=(X-μ)/σ    Variables Defined  z=z score  σ=standard deviation of scores  X=individual raw score  μ=mean    Procedure  Find the difference between the raw score and the mean  Divide that difference by the standard deviation                 For each of the two z scores, find the proportion of the distribution between the mean and the z score.    You can look this up in the back of a statistics textbook in the table for areas under the normal curve between the mean and z    Web Reference  The web link below can calculate the area between the mean and a z score: in the first applet, enter the mean of 0 in the left box and the z score in the right box, then click “between”.                 Add the two proportions to find the proportion of the population between the two scores. (Adding applies when the scores lie on opposite sides of the mean; if both scores lie on the same side, subtract the smaller proportion from the larger.)
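A minimal Python sketch of the same calculation follows. It takes the difference of two cumulative areas rather than adding two table values, which gives the same answer; the function name and the example values are assumptions for illustration.

```python
from statistics import NormalDist

def proportion_between(low, high, mu, sigma):
    """Proportion of a normal population with scores between low and high."""
    z_low = (low - mu) / sigma
    z_high = (high - mu) / sigma
    standard_normal = NormalDist()
    # Area below the higher z score minus area below the lower z score
    return standard_normal.cdf(z_high) - standard_normal.cdf(z_low)

# Hypothetical example: mean 100, standard deviation 15, scores 85 and 115
p_between = proportion_between(85, 115, 100, 15)
print(round(p_between, 4))
```

Subtracting cumulative areas handles both cases in the procedure, whether the two scores straddle the mean or fall on the same side of it.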

9.6.8   Finding the Extreme Scores in a Population                 This section outlines how to find the extreme scores that cut off a given percentage of the population, divided equally between the two tails of the distribution.                 Formula    X=μ±(z)(σ)                 Variables Defined    z=z score    σ=standard deviation of scores    X=individual raw score    μ=mean                 Procedure    Divide the percentage by 2    Find the difference between .5000 and the halved percentage (expressed as a proportion)    Find the z score associated with the difference from the previous step    Plug the z score, with both a positive and a negative sign, into the above equation to find the upper and lower extreme scores
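The two-tail procedure can be sketched in Python as follows. The function name and the example (the most extreme 5% of a population with mean 100 and standard deviation 15) are assumptions for illustration.

```python
from statistics import NormalDist

def extreme_scores(total_proportion, mu, sigma):
    """Lower and upper scores that cut off total_proportion, split over both tails."""
    tail = total_proportion / 2              # step 1: halve the proportion
    z = NormalDist().inv_cdf(1.0 - tail)     # z score with `tail` of the area above it
    return z, mu - z * sigma, mu + z * sigma

# Hypothetical example: the most extreme 5% of the population
z, lower, upper = extreme_scores(0.05, 100, 15)
print(round(z, 2), round(lower, 1), round(upper, 1))
```

Because the curve is symmetrical, one z score serves for both tails; only its sign changes.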

9.7   Comparison of Theoretical and Empirical Answers

9.7.1   The accuracy of predictions based on a theoretical normal distribution depends on how representative the empirical sample is, as discussed in the next section.

10      Samples and Sampling Distributions

10.1 Introduction

10.1.1            An understanding of sampling distributions requires an understanding of samples. A sample, of course, is some part of the whole thing; in statistics the “whole thing” is a population. The population is always the thing of interest; a sample is used only to estimate what the population is like. One obvious problem is to get samples that are representative of the population.

10.1.2            Samples that are random have the best chance of being representative and a sampling distribution can tell you how much faith (probability-wise) you can put in results based on a random sample.

10.1.3            Population             Population means all the members of a specified group. Sometimes the population is one that could actually be measured, given plenty of time and money. Sometimes, however, such measurements are logically impossible. Inferential statistics are used when it is not possible or practical to measure an entire population.             So, using samples and the methods of inferential statistics, you can make decisions about unmeasurable populations. Unfortunately, there is some peril in this. Samples are variable, changeable things. Each one produces a different statistic. How can you be sure that the sample you draw will produce a statistic that will lead to a correct decision about the population? Unfortunately, you cannot be absolutely sure. To draw a sample is to agree to accept some uncertainty about the results. However, it is possible to measure this uncertainty. If a great deal of uncertainty exists, the sensible thing to do is suspend judgment. On the other hand, if there is very little uncertainty, the sensible thing to do is reach a conclusion, even though there is a small risk of being wrong. Restated, you must introduce a hypothesis about a population and then, based on the results of a sample, decide whether the hypothesis is reasonable or should be rejected.

10.2 Representative and Nonrepresentative Samples

10.2.1            Introduction             If you want to know about an unmeasurable population, you have to draw a representative sample by using a method of obtaining samples that is more likely to produce a representative sample than any other method. How well a particular method works can be assessed either mathematically or empirically. For an empirical assessment, start with a population of numbers, the parameter of which can be easily calculated. The particular method of sampling is used repeatedly, and the corresponding statistic is calculated for each sample. The mean of these sample statistics can then be compared with the parameter.             We will name two methods of sampling that are most likely to produce a representative sample, discuss one of them in detail, and then discuss some ways in which nonrepresentative samples are obtained when the sampling method is biased.
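The empirical assessment described above can be sketched as a small simulation. Everything here is an assumption made for illustration: the population is simply the numbers 1 to 100, the sampling method is simple random sampling without replacement, and the statistic is the mean.

```python
import random
from statistics import mean

def assess_sampling_method(population, sample_size, n_samples, seed=0):
    """Draw many samples, compute each sample mean, and compare with the parameter."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    sample_means = [
        mean(rng.sample(population, sample_size))  # one random sample, one statistic
        for _ in range(n_samples)
    ]
    return mean(sample_means), mean(population)

population = list(range(1, 101))  # hypothetical population; its mean is 50.5
mean_of_statistics, parameter = assess_sampling_method(population, 10, 2000)
print(mean_of_statistics, parameter)
```

If the sampling method is sound, the mean of the sample statistics should land close to the parameter; a method that consistently misses in one direction would be suspect.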

10.2.2            Random Samples             A method called random sampling is commonly used to obtain a sample that is most likely to be representative of the population. Random has a technical meaning in statistics and does not mean haphazard or unplanned. A random sample in most research situations is one in which every potential sample of size N has an equal probability of being selected. To obtain a random sample, you must                  Define the population of scores                  Identify every member of the population                  Select scores in such a way that every sample has an equal probability of being chosen             Another method is to assign each score a number and use the random number generator below to pick your sample of numbers.                  Random Number Generator             We'll go through these steps with a set of real data: the self-esteem scores of 24 fifth-grade children.[2] We define these 24 scores as our population. From these we will pick a random sample of seven scores.             Self Esteem Scores                      One method of picking a random sample is to write each self-esteem score on a slip of paper, put the 24 slips in a box, jumble them around, and draw out seven. The scores on the chosen slips become a random sample. This method works fine if the slips are all the same size and there are only a few members of the population. If there are many members, this method is tedious.             Another (easier) method of getting a random sample is to use a table of random numbers, such as Table B in the Appendix. To use the table, you must first assign an identifying number to each of the 24 self-esteem scores, thus:             Random Number Assignment                              Each score has been identified with a two-digit number. Now turn to Table B and pick a row and a column in which to start. Any haphazard method will work; close your eyes and stab a place with your finger. 
Suppose you started at row 35, columns 70-74. Reading horizontally, the digits are 21105. Since you need only two digits to identify any member of our population, use the first two digits, 21. That identifies one score for the sample: a score of 46. From this point, you can read two-digit numbers in any direction (up, down, or sideways), but the decision should have been made before you looked at the numbers. If you had decided to go down, the next number is 33. No self-esteem score has an identifying number of 33, so skip it and go to 59, which gives you the same problem as 33. In fact, the next five numbers are too large. The sixth number is 07, which identifies the score of 32 for the random sample. The next usable number is 13, a score of 35. Continue in this way until you arrive at the bottom. At this point, you can go in any direction. We will skip over two columns to columns 72 and 73 (you were in columns 70 and 71) and start up. The first number is 12, which identifies a score of 31. The next usable numbers are 19, 05, and 10, giving scores of 35, 42, and 24. Thus, the random sample of seven consists of the following scores: 46, 32, 35, 31, 35, 42, and 24. If Table B had produced the same identifying number twice, you would have ignored it the second time.             What is this table of random numbers? In Table B (and in any table of random numbers), the probability of occurrence of any digit from 0 to 9 at any place in the table is the same: .10. Thus, you are just as likely to find 000 as 123 or 381. Incidentally, you cannot generate random numbers out of your head. Certain sequences begin to recur, and (unless warned) you will not include enough repetitions like 666 and 000.        Here are some hints for using a table of random numbers.              Make a check beside the identifying number of a score when it is chosen for the sample. This will help prevent duplications.              
If the population is large (over 100), it is more efficient to get all the identifying numbers from the table first. As you select them, put them in some rough order. This will help prevent duplications. After you have all the identifying numbers, go to the population to select the sample.              If the population has exactly 100 members, let 00 be the identifying number for 100. In this way, you can use two-digit identifying numbers, each one of which matches a population score. This same technique can be applied to populations of 10 or 1000 members.
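The table-of-random-numbers procedure can be imitated with a random number generator, as in the sketch below. The 24 scores listed are hypothetical placeholders, not the actual self-esteem scores (which are not reproduced in this text), and the function name is an assumption.

```python
import random

def draw_random_sample(population, sample_size, seed=None):
    """Assign identifying numbers 1..N, then pick sample_size of them at random."""
    rng = random.Random(seed)
    identifying_numbers = list(range(1, len(population) + 1))
    # rng.sample draws without replacement, so no identifying number repeats
    # (the same effect as ignoring a number the second time it appears)
    chosen = rng.sample(identifying_numbers, sample_size)
    return [(number, population[number - 1]) for number in chosen]

# Hypothetical population of 24 scores, for illustration only
population = [28, 31, 33, 35, 42, 36, 32, 40, 29, 24, 38, 31,
              35, 44, 30, 37, 41, 27, 35, 39, 46, 34, 26, 43]
sample = draw_random_sample(population, 7, seed=35)
for number, score in sample:
    print(f"{number:02d} -> {score}")
```

Each run with a different seed plays the role of starting at a different row and column of the table.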

10.2.3            Stratified Samples             A method called stratified sampling is another way to produce a sample that is very likely to mirror the population. It can be used when an investigator knows the numerical value of some important characteristic of the population. A stratified sample is controlled so that it reflects exactly some known characteristic of the population. Thus, in a stratified sample, not everything is left to chance.             For example, in a public opinion poll on a sensitive political issue, it is important that the sample reflect the proportions of the population who consider themselves Democrat, Republican, and Independent. The investigator draws the sample so it will reflect the proportions found in the population. The same may be done for variables such as sex, age, and socio-economic status. After stratification of the samples has been determined, sampling within each stratum is usually random.             To justify a stratified sample, the investigator must know what variables will affect the results and what the population characteristics are for those variables. Sometimes the investigator has this information (as from census data), but many times such information is just not available (as in most research situations).
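One way to picture stratified sampling is the sketch below, which allocates the sample to each stratum in proportion to its share of the population and then samples randomly within each stratum. The party sizes, the member labels, and the function name are all hypothetical.

```python
import random

def stratified_sample(strata, total_size, seed=None):
    """Proportional allocation across strata, random sampling within each stratum."""
    rng = random.Random(seed)
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Stratum's share of the sample matches its share of the population.
        # Rounding can make the total differ slightly from total_size in general.
        k = round(total_size * len(members) / population_size)
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical population of 1000 voters in three strata
strata = {
    "Democrat": [f"D{i}" for i in range(400)],
    "Republican": [f"R{i}" for i in range(350)],
    "Independent": [f"I{i}" for i in range(250)],
}
sample = stratified_sample(strata, 100, seed=1)
print({name: len(chosen) for name, chosen in sample.items()})
```

The proportions are fixed in advance, so only the choice of individuals within each stratum is left to chance.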

10.2.4            Biased Samples             A biased sample is one that is drawn using a method that systematically underselects or overselects from certain groups within the population. Thus, in a biased sampling technique, every sample of a given size does not have an equal opportunity of being selected. With biased sampling techniques, you are much more likely to get a nonrepresentative sample than you are with random or stratified sampling techniques.             For example, it is reasonable to conclude that some results based on mailed questionnaires are not valid: not all of the recipients will respond, and those who do may differ from those who do not. Therefore, the sample is biased. The probability of bias is particularly high if the questionnaire elicits feelings of pride or despair or disgust or apathy in some of the recipients.             With a nice random sample you can predict fairly accurately your chance of being wrong. If it is higher than you would like, you can reduce it by increasing sample size. With a biased sample, however, you do not have any basis for assessing your margin of error and you don’t know how much confidence to put in your predictions. You may be right or you may be very wrong. You may get generalizable results from such samples, but you cannot be sure. The search for biased samples in someone else’s research is a popular (and serious) game among researchers.

10.3 Sampling Distributions

10.3.1            Introduction             The two categories of sampling distributions are: sampling distributions in general and sampling distributions of the mean.             A sampling distribution is a frequency distribution of sample statistics. A sampling distribution could be obtained by drawing many random samples from a population and calculating a statistic on each sample. These statistics would be arranged into a frequency distribution. From such a distribution you could find the probability of obtaining any particular value of the statistic.             Every sampling distribution is for a particular statistic (such as the mean, variance, correlation coefficient, and so forth). In this section you will learn only about the sampling distribution of the mean. It will serve as an introduction to sampling distributions in general, some others of which you will find out about in later sections.

10.4 The Sampling Distribution of the Mean

10.4.1            Introduction             Empirical Sampling Distribution of the Mean                  An empirical sampling distribution of the mean is a frequency distribution of sample means                Every sample is drawn randomly from the same population                The sample size (N) is the same for all samples                The number of samples is very large                  Illustration                The following illustration is based on 200 separate random samples, each with N=10, from a population of 24 self-esteem scores. The mean of each group of 10 was calculated, and the 200 sample means (X̄) were arranged into a frequency polygon. The mean (parameter) of the 24 self-esteem scores is 35.375. In the illustration below most of the statistics (sample means) are fairly good estimates of that parameter. Some of the X̄'s, of course, miss the mark widely; but most are pretty close. The illustration below is an empirical sampling distribution of the mean. Thus, a sampling distribution of the mean is a frequency distribution of sample means.                Empirical sampling Distribution of the Means (Frequency Distribution of sample means)                        You will never use an empirical sampling distribution of the mean in any of your calculations; you will always use theoretical ones that come from mathematical formulas. An empirical sampling distribution of the mean is easier to understand for illustration purposes.                  Central Limit Theorem                For any population of scores, regardless of form, the sampling distribution of the mean will approach a normal distribution as N (sample size) gets larger. Furthermore, the sampling distribution of the mean will have a mean equal to μ and a standard deviation equal to σ/√N.                  
Now you know not only that sampling distributions of the mean are normal curves but also that, if you know the population parameters μ and σ, you can determine the parameters of the sampling distribution.                  One qualification is that the sample size (N) be large. How many does it take to make a large sample? The traditional answer is 30 or more, although, if the population itself is symmetrical, a sampling distribution of the mean will be normal with sample sizes much smaller than 30. If the population is severely skewed, samples of 30 (or more) may be required.                  The mean of the sampling distribution of means will be the same as the population mean, μ. The standard deviation of the sampling distribution will be the standard deviation of the population (σ) divided by the square root of the sample size.                  The Central Limit Theorem works regardless of the form of the original population. Thus, the sampling distribution of the mean of scores coming from a rectangular or bimodal population approaches normal if N is large.                  The standard deviation of any sampling distribution is called the standard error, and the mean is called the expected value. In this context, and in several others in statistics, the term error means deviations or random variation. Sometimes, error refers to a mistake, but most often it is used to indicate deviations or random variation.              In the case of the sampling distribution of the mean, we are dealing with a standard error of the mean (symbolized σ_X̄) and the expected value of the mean (symbolized E(X̄)). Although E(X̄) is rarely encountered, the standard error is commonly used. Be sure that you understand that it is the standard deviation of the sampling distribution of some statistic. In this section, it is the standard deviation of a sampling distribution of the mean.              
Illustration Theoretical Sampling Distribution of the Mean, N=10            Population          Mean=35.375          Standard Deviation=6.304            Sampling Distribution of the Mean for N=10          Mean=35.375          Standard Deviation=6.304/√10=1.993            Illustration         
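The Central Limit Theorem can be illustrated with a rough simulation. The sketch below is built on assumptions: it uses a rectangular population (the values 1 to 13, four of each, like the card deck earlier), samples with replacement so that the σ/√N formula applies directly, and uses hypothetical function names.

```python
import random
from math import sqrt
from statistics import mean, pstdev

def standard_error(sigma, n):
    """Standard deviation of the sampling distribution of the mean: sigma / sqrt(N)."""
    return sigma / sqrt(n)

def simulate_sample_means(population, n, n_samples, seed=0):
    """Empirical sampling distribution: the means of many random samples of size n."""
    rng = random.Random(seed)
    # Sampling with replacement (an assumption of this sketch)
    return [mean(rng.choices(population, k=n)) for _ in range(n_samples)]

population = list(range(1, 14)) * 4      # rectangular, not normal
mu, sigma = mean(population), pstdev(population)

means = simulate_sample_means(population, n=30, n_samples=5000)
print("mean of sample means:", mean(means), "population mean:", mu)
print("empirical standard error:", pstdev(means))
print("theoretical standard error:", standard_error(sigma, 30))

# The figures from the illustration above: 6.304 / sqrt(10)
print(round(standard_error(6.304, 10), 3))
```

Even though the population is flat, the sample means cluster around μ with a spread close to σ/√N, which is what the theorem predicts.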

10.4.2            Use of a Sampling Distribution of the Mean             Since the sampling distribution of the mean is a normal curve, you can apply what you learned in the last chapter about normally distributed scores to questions about sample means. In the above illustration, notice that the question mark points to the area below a sample mean of 32 and asks what proportion of sample means would fall below that value. First you would find the standard error of the mean, then the z score, which allows you to determine the proportion.             Standard error of the mean Formula   σ_X̄=σ/√N                   Procedure                  Divide the standard deviation of the population by the square root of the sample size.             (z) score Formula   z=(X̄-μ)/σ_X̄                   Variables Defined                  σ_X̄=Standard error of the mean                  X̄=Mean of the sample                  μ=Mean of the population                  z=z score             Procedure                  Find the difference between the sample mean and the population mean                  Divide the difference found in the previous step by the standard error of the mean to determine the z score.                  Find the proportion associated with the z score of the previous step with the Web link below                  Web Reference                Using the web reference below, click the “below” button and type in your z score to find the proportion of scores that fall below that score. Likewise, knowing the z score, you could find the proportion of scores between the z score and the mean, or any other combination, by clicking the appropriate button and inserting your z score.             Using the Illustration Theoretical Sampling Distribution of the Mean (above), with a standard error of 1.993 the z score for a sample mean of 32 is (32-35.375)/1.993=-1.69, and you would expect a proportion of .0455 of the means to be less than 32. We can check this prediction by determining the proportion of those 200 random samples that had means of 32 or less. 
By checking the frequency distribution from which the theoretical sampling distribution was drawn (Empirical sampling Distribution of the Means (Frequency Distribution of sample means)) (see above) we found the empirical proportion to be .0400. Missing by ½ of 1 percent isn’t bad, and once again, you find that a theoretical normal distribution predicts an actual empirical proportion quite nicely.             What effect does sample size have on a sampling distribution? When the sample size (N) becomes larger  will become smaller. See the equation above and illustration below. This illustration shows some sampling distributions of the mean based on the population of 24 self-esteem scores. The sample sizes are 3, 5, 20, 20. A sample mean of 39 is included in all four figures as a reference point. Notice that, as  becomes smaller, a sample mean of 39 becomes a rarer and rarer event. The good investigator, with an experiment to do, will keep in mind what we have just demonstrated about the effect of sample size on the sampling distribution and will use reasonably large samples.                  Illustration Sampling distributions of the mean for four different sample sizes. All samples are drawn from the same population. Note how a sample mean of 39 becomes rarer and rarer as  becomes smaller.               
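The two-step procedure above (standard error of the mean, then z score, then proportion) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are made up for this example, the normal-curve lookup uses the standard library's `NormalDist` in place of the web reference, and the sample sizes in the loop are illustrative values rather than the ones in the figure. It also assumes the population of self-esteem scores is the one shown in the illustration (mean 35.375, standard deviation 6.304).

```python
from math import sqrt
from statistics import NormalDist


def standard_error_of_mean(sigma, n):
    """Population standard deviation divided by the square root of N."""
    return sigma / sqrt(n)


def proportion_below(sample_mean, mu, sigma, n):
    """Return (z, proportion of sample means expected below sample_mean)."""
    z = (sample_mean - mu) / standard_error_of_mean(sigma, n)
    return z, NormalDist().cdf(z)


# Values from the illustration: mu = 35.375, sigma = 6.304, N = 10.
se = standard_error_of_mean(6.304, 10)
z, p = proportion_below(32, 35.375, 6.304, 10)
print(f"standard error = {se:.3f}, z = {z:.2f}, proportion below 32 = {p:.4f}")

# Effect of sample size on how rare a sample mean of 39 is.
# These N values are illustrative assumptions, not the figure's.
for n in (3, 6, 12, 24):
    _, below = proportion_below(39, 35.375, 6.304, n)
    print(f"N = {n:2d}: proportion of sample means at or above 39 = {1 - below:.4f}")
```

Because the code does not round z before the lookup, its proportion is slightly smaller than the table value of .0455, which comes from looking up z = -1.69.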

10.5 Calculating a Sampling Distribution when Parameters are not Available

10.5.1            Introduction             All of the foregoing information is based on the assumption that you have the population parameters, and, as you know, that is seldom the case. Fortunately, with a little modification of the formula and no modification of logic, the random sample you learned to draw can be used for estimating the population parameters.             When you have only a sample standard deviation with which to estimate the standard error of the mean, the formula is $s_{\bar{X}} = s/\sqrt{N}$.             The statistic s is an estimate of $\sigma$, and $\sigma$ is required for use of the normal curve. The larger the sample size, the more reliable s is. As a practical matter, s is considered reliable enough if $N \geq 30$. As a technical matter, the normal curve is only appropriate when you know $\mu$ and $\sigma$.

10.5.2            Standard Error of the Mean Estimated from a Sample
            Formula
                 $s_{\bar{X}} = s/\sqrt{N}$
            Variables Defined
                 $s_{\bar{X}}$=standard error of the mean estimated from a sample
                 s=standard deviation of a sample
                 N=sample size
            Procedure
                 Divide s by the square root of N to find $s_{\bar{X}}$.
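As a small worked example of this procedure, the sketch below estimates the standard error of the mean from raw scores. The scores are hypothetical, the function name is made up for this example, and the sketch assumes that s is the sample standard deviation computed with N-1 in the denominator (which is what `statistics.stdev` returns).

```python
from math import sqrt
from statistics import stdev


def estimated_standard_error(scores):
    """Sample standard deviation (N-1 denominator) divided by sqrt(N)."""
    s = stdev(scores)
    return s / sqrt(len(scores))


# Hypothetical sample of eight scores, used only for illustration.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
se_estimate = estimated_standard_error(scores)
print(f"s = {stdev(scores):.3f}, estimated standard error = {se_estimate:.3f}")
```

A sample this small is used only to keep the arithmetic easy to follow; as noted above, s is usually considered a reliable estimate only when N is 30 or more.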

10.6 Confidence Intervals

10.6.1            Introduction             Mathematical statisticians identify two different types of decision-making processes as statistical inference. The first process is called hypothesis testing, and the second is called estimation. Hypothesis testing means to hypothesize a value for a parameter, compare (or test) the parameter with an empirical statistic, and decide whether the parameter is reasonable. Hypothesis testing is just what you have been doing so far in this chapter. Hypothesis testing is the more popular technique of statistical inference.             The other kind of inferential statistics, estimation, can take two forms: parameter estimation and confidence intervals. Parameter estimation means that one particular point is estimated to be the parameter of the population. A confidence interval is a range of values bounded by a lower and an upper limit. The interval is expected, with a certain degree of confidence, to contain the parameter. These confidence intervals are based on sampling distributions.

10.6.2            The Concept of a Confidence Interval             A confidence interval is simply a range of values with a lower and an upper limit. With a certain degree of confidence (usually 95% or 99%), you can state that the two limits contain the parameter. The following example shows how the size of the interval and the degree of confidence are directly related (that is, as one increases the other increases also).             A sampling distribution can be used to establish both confidence and the interval. The result is a lower and an upper limit for the unknown population parameter.             Here is the rationale for confidence intervals. Suppose you define a population of scores. A random sample is drawn and the mean ($\bar{X}$) calculated. Using this mean (and the techniques described in the next section), a statistic called a confidence interval is calculated. (We will use a 95% confidence interval in this explanation.) Now, suppose that from this population many more random samples are drawn and a 95% confidence interval calculated for each. For most of the samples, $\bar{X}$ will be close to $\mu$ and $\mu$ will fall within the confidence interval. Occasionally, of course, a sample will produce an $\bar{X}$ far from $\mu$ and the confidence interval about $\bar{X}$ will not contain $\mu$. The method is such, however, that the probability of these rare events can be measured and held to an acceptable minimum like 5%. The result of all this is a method that produces confidence intervals, 95% of which contain $\mu$.             In a real-life situation, you draw one sample and calculate one interval. You do not know whether or not $\mu$ lies between the two limits, but the method you have used makes you 95% confident that it does.
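The rationale above (draw many samples, build a 95% interval from each, and count how many contain the population mean) can be checked with a short simulation. This is a sketch under stated assumptions: the population is taken to be normal with the mean and standard deviation borrowed from the earlier illustration, the sample size of 50 and the 2,000 repetitions are arbitrary choices, and the function name is made up for this example.

```python
import random
from math import sqrt
from statistics import mean, stdev


def coverage(mu, sigma, n, trials, z=1.96, seed=0):
    """Fraction of simulated confidence intervals that contain mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Draw one random sample from an (assumed) normal population.
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        sample_mean = mean(sample)
        se = stdev(sample) / sqrt(n)  # standard error estimated from the sample
        lower, upper = sample_mean - z * se, sample_mean + z * se
        if lower <= mu <= upper:
            hits += 1
    return hits / trials


rate = coverage(mu=35.375, sigma=6.304, n=50, trials=2000)
print(f"proportion of 95% intervals containing mu: {rate:.3f}")
```

The printed proportion should land close to .95, though it will not equal it exactly, because each run is itself a sample of intervals.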

10.6.3            Calculating the Limits of a Confidence Interval
            Introduction
                 Having drawn a random sample and calculated the mean and standard error, you can calculate the lower and upper limits of the confidence interval.
                 The term confidence level is used for problems of estimation, such as confidence intervals, and the term significance level is used for problems of hypothesis testing.
            Formulas
                 $s_{\bar{X}} = s/\sqrt{N}$
                 $LL = \bar{X} - z(s_{\bar{X}})$
                 $UL = \bar{X} + z(s_{\bar{X}})$
            Variables Defined
                 $s_{\bar{X}}$=standard error of the mean estimated from a sample
                 s=standard deviation of a sample
                 N=sample size
                 $\bar{X}$=Mean of the sample
                 z=z score (1.96 for 95%, 2.58 for 99%)
                 LL=lower limit of the confidence interval
                 UL=upper limit of the confidence interval
            Procedure
                 Standard error of the mean estimated from a sample, $s_{\bar{X}}$
                      Divide s by the square root of N to find $s_{\bar{X}}$.
                 Lower Limit
                      Multiply the z score for the confidence level you want (1.96 for 95%, 2.58 for 99%) by $s_{\bar{X}}$.
                      Subtract the product of the previous step from $\bar{X}$ to determine the lower limit.
                 Upper Limit
                      Multiply the z score for the confidence level you want (1.96 for 95%, 2.58 for 99%) by $s_{\bar{X}}$.
                      Add the product of the previous step to $\bar{X}$ to determine the upper limit.
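The procedure for the two limits translates directly into code. In the sketch below the sample values (mean 50, s = 10, N = 100) are hypothetical and were chosen so that the standard error is exactly 1; the function and dictionary names are made up for this example.

```python
from math import sqrt

# z scores for the two confidence levels used in this section.
Z_FOR_LEVEL = {0.95: 1.96, 0.99: 2.58}


def confidence_interval(sample_mean, s, n, level=0.95):
    """Return (lower limit, upper limit) for the population mean."""
    z = Z_FOR_LEVEL[level]
    se = s / sqrt(n)  # standard error of the mean estimated from the sample
    return sample_mean - z * se, sample_mean + z * se


# Hypothetical sample: mean 50, s = 10, N = 100, so the standard error is 1.
ll95, ul95 = confidence_interval(50.0, 10.0, 100, level=0.95)
ll99, ul99 = confidence_interval(50.0, 10.0, 100, level=0.99)
print(f"95% interval: {ll95:.2f} to {ul95:.2f}")
print(f"99% interval: {ll99:.2f} to {ul99:.2f}")
```

Comparing the two results shows the direct relationship described earlier: asking for more confidence (99% rather than 95%) produces a wider interval.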

10.7 Other Sampling Distributions

10.7.1            Introduction             Now you have been introduced to the sampling distribution of the mean. The mean is clearly the most popular statistic among researchers. There are times, however, when the statistic necessary to answer a researcher’s question is not the mean. For example, to find the degree of relationship between two variables, you need a correlation coefficient. To determine whether a treatment causes more variable responses, you need a standard deviation. Proportions are commonly used statistics. In each of these cases (and indeed, for any statistic), the basic hypothesis testing procedure you have just learned is often used by researchers.             Procedure                  Hypothesize a population parameter                  Draw a random sample and calculate a statistic                  Compare the statistic with a sampling distribution of that statistic and decide whether such a sample statistic is likely if the hypothesized population parameter is true             There are sampling distributions for statistics other than the mean such as the t distribution. In addition, some statistics have sampling distributions that are normal, thus allowing you to use the familiar normal curve.             Along with every sampling distribution comes a standard error. Just as every statistic has its sampling distribution, every statistic has its standard error. For example, the standard error of the median is the standard deviation of the sampling distribution of the median. The standard error of the variance is the standard deviation of the sampling distribution of the variance. Worst of all, the standard error of the standard deviation is the standard deviation of the sampling distribution of the standard deviation. If you followed that sentence, you probably understand the concept of standard error quite well.             
The main points we want to emphasize are that statistics are variable things, that a picture of that variety is a sampling distribution, and that a sampling distribution can be used to obtain probability figures.
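To make the idea of "a standard error for every statistic" concrete, the sketch below builds an empirical sampling distribution for any statistic and reports its standard deviation. Everything specific here is an assumption for illustration: the population is 10,000 simulated scores from a normal curve with mean 50 and standard deviation 10, the sample size is 25, and the function name is made up.

```python
import random
from statistics import mean, median, stdev


def empirical_standard_error(statistic, population, n, draws=2000, seed=1):
    """Standard deviation of a statistic computed on many random samples."""
    rng = random.Random(seed)
    values = [statistic(rng.sample(population, n)) for _ in range(draws)]
    return stdev(values)


# Hypothetical population of 10,000 scores (mean about 50, sd about 10).
pop_rng = random.Random(7)
population = [pop_rng.gauss(50, 10) for _ in range(10_000)]

se_mean = empirical_standard_error(mean, population, n=25)
se_median = empirical_standard_error(median, population, n=25)
print(f"standard error of the mean   = {se_mean:.2f}")
print(f"standard error of the median = {se_median:.2f}")
```

For this population the standard error of the mean should come out near 10/√25 = 2, and the standard error of the median somewhat larger, which illustrates that different statistics vary from sample to sample by different amounts.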

10.8 A Taste of Reality

10.8.1            Introduction             The techniques of inferential statistics that you are learning in this book are based on the assumption that a random sample has been drawn. But how often do you find random samples in actual data analysis? Seldom. However, there are two justifications for the continued use of non-random samples.             In the first place, every experiment is an exercise in practicality. Any investigator has a limited amount of time, money, equipment, and personnel to draw upon. Usually, a truly random sample of a large population is just not practical, so the experimenter tries to obtain a representative sample, being careful to balance or eliminate as many sources of bias as possible.             In the second place, the only real test of generalizability is empirical; that is, finding out whether the results based on a sample are also true for other samples. This kind of check-up is practiced continually. Usually, the results based on samples that are unsystematic (but not random) are true for other samples from the same population.             Both of these justifications develop a very hollow ring, however, if someone demonstrates that one of your samples is biased and that a representative sample proves your conclusions false.

11      Differences between Means

11.1 Introduction

11.1.1            One of the best things about statistics is that it helps you to understand experiments and the experimental method. The experimental method is probably the most powerful method we have of finding out about natural phenomena. Few ifs, ands, or buts or other qualifiers need to be attached to conclusions based on results from a sound experiment.

11.1.2            The sections below will discuss the simplest kind of experiment and then show how the statistical techniques you have learned about sampling distributions can be expanded to answer research questions.

11.2 A Short Lesson on How to Design An Experiment

11.2.1            The basic ideas underlying a simple two-group experiment are not very complicated.             The logic of an experiment                  Start with two equivalent groups and treat them exactly alike except for one thing. Then measure both groups and attribute any difference between the two to the one way in which they were treated differently.             The above summary of an experiment is described more fully in the table below.

11.2.2            Illustration Summary of simple Experiment Table 8-1            

11.2.3            The fundamental question of the experiment outlined above is “What is the effect of Treatment A on a person’s ability to perform Task Q?” In more formal terms, the question is “For Task Q scores, is the mean of the population of those who have had Treatment A different from the mean of the population of those who have not had Treatment A?” This experiment has an independent variable with two levels (Treatment A or no Treatment A) and a dependent variable (scores on Task Q). A population of subjects is defined and two random samples are drawn.             An equivalent statement is that there are two populations to begin with and that the two population means are equal. One random sample is then drawn from each population. Actually, when two samples are drawn from one population, the correct procedure is to randomly assign each subject to a group immediately after it is drawn from the population. This procedure continues until both groups are filled.

11.2.4            These random samples are both representative of the population and (approximately) equivalent to each other. Treatment A is then administered to one group (commonly called the experimental group) but not to the other group (commonly called the control group). Except for Treatment A, both groups are treated exactly the same way. That is, extraneous variables are held constant or balanced out for the two groups. Both groups perform Task Q and the mean score for each group is calculated. The two sample means almost surely will differ. The question now is whether this observed difference is due to sampling variation (a chance difference) or to Treatment A. You can answer this question by using the techniques of inferential statistics (see the illustration above). In the above example, the word treatment refers to different levels of the independent variable. The experiment in the illustration had two treatments.
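Random assignment to the experimental and control groups can be sketched as a shuffle followed by a split, which has the same effect as assigning each subject at random until both groups are filled. The subject IDs and Task Q scores below are hypothetical, and the function name is made up for this example. No treatment is applied in the sketch, so any difference between the two group means is due to sampling variation alone.

```python
import random
from statistics import mean


def randomly_assign(subjects, seed=None):
    """Shuffle the subjects and split them into two equal-sized groups."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical subjects (IDs 1-20) with hypothetical Task Q scores.
score_rng = random.Random(3)
task_q = {subject: round(score_rng.gauss(50, 10)) for subject in range(1, 21)}

experimental, control = randomly_assign(task_q, seed=11)
mean_experimental = mean(task_q[s] for s in experimental)
mean_control = mean(task_q[s] for s in control)
print(f"experimental group mean = {mean_experimental:.1f}")
print(f"control group mean      = {mean_control:.1f}")
```

Running the sketch with different seeds shows that the two means almost never match exactly, even though both groups came from the same population and were treated alike.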

11.2.5            In some experimental designs, subjects are assigned to treatments by the experimenter; in others, the experimenter uses a group of subjects who have already been “treated” (for example, being males or being children of authoritarian parents). In either of these designs, the methods of inferential statistics are the same, although the interpretation of the first kind of experiment is usually less open to attack.             This issue is discussed more fully in Research Design and Methodology textbooks.

11.2.6            Inferential statistics are used to help you decide whether or not a difference between sample means should be attributed to chance.

11.3 The Logic of Inferential Statistics (The rationale for using the null hypothesis)

11.3.1            A decision must be made about the population of those given Treatment A, but it must be made on the basis of sample data. Accept from the start that because of your decision to use samples, you can never know for sure whether or not Treatment A has an effect. Nothing is ever proved through the use of inferential statistics. You can only state probabilities, which are never exactly one or zero. The decision-making goes like this. In a well-designed two-group experiment, all the imaginable results can be reduced to two possible outcomes: either Treatment A has an effect or it does not. Make a tentative assumption that Treatment A does not have an effect and then, using the results of the experiment for guidance, find out how probable it is that the assumption is correct. If it is not very probable, rule it out and say that Treatment A has an effect. If the assumption is probable, you are back where you began: you have the same two possibilities you started with. (Negative inference)

11.3.2            Putting this into the language of an experiment, begin with two logical possibilities, a and b:
            a. Treatment A did not have an effect. That is, the mean of the population of scores of those who received Treatment A is equal to the mean of the population of scores of those who did not receive Treatment A, and thus the difference between population means is zero. This possibility is symbolized H0 (pronounced “H sub oh”).
            b. Treatment A did have an effect. That is, the mean of the population of scores of those who received Treatment A is not equal to the mean of the population of scores of those who did not receive Treatment A. This possibility is symbolized H1 (pronounced “H sub one”).

11.3.3            Tentatively assume that Treatment A had no effect (that is, assume H0). If H0 is true, the two random samples should be alike except for the usual variations in samples. Thus, the difference in the sample means is tentatively assumed to be due to chance.

11.3.4            Determine the sampling distribution for these differences in sample means. This sampling distribution gives you an idea of the differences you can expect if only chance is at work.

11.3.5            By subtraction, obtain the actual difference between the experimental group mean and the control group mean.

11.3.6            Compare the difference obtained to the differences expected (from Step 3) and conclude that the difference obtained was:             Expected. Differences of this size are very probable just by chance, and the most reasonable conclusion is that the difference between the experimental group and the control group may be attributed to chance. Thus, retain both possibilities in Step 1.             Unexpected. Differences of this size are highly improbable, and the most reasonable conclusion is that the difference between the experimental group and the control group is due to something besides chance. Thus, reject H0 (possibility a in Step 1) and accept H1 (possibility b); that is, conclude that Treatment A had an effect.

11.3.7            The basic idea is to assume that there is no difference between the two population means and then let the data tell you whether the assumption is reasonable. If the assumption is not reasonable, you are left with only one alternative: the populations have different means.

11.3.8            The assumption of no difference is so common in statistics that it has a name: the null hypothesis, symbolized, as you have already learned, H0. The null hypothesis is often stated in formal terms:
            H0: $\mu_1 - \mu_2 = 0$
            H0: $\mu_1 = \mu_2$

11.3.9            That is, the null hypothesis states that the mean of one population is equal to the mean of a second population.             Actually, the concept of the null hypothesis is broader than simply the assumption of no difference, although that is the only version used in this section. Under some circumstances, a difference other than zero might be the hypothesis tested.

11.3.10      H1 is referred to as an alternative hypothesis. Actually, there are an infinite number of alternative hypotheses; that is, the existence of any difference other than zero. In practice, however, it is usual to choose one of three possible alternative hypotheses before the data are gathered:
        H1: $\mu_1 \neq \mu_2$
              In the example of the simple experiment, this hypothesis states that Treatment A had an effect, without stating whether the treatment improves or disrupts performance on Task Q. Most of the problems in this section use this H1 as the alternative to H0. If you reject H0 and accept this H1, you must examine the means and decide whether Treatment A facilitated or disrupted performance on Task Q.
        H1: $\mu_1 > \mu_2$
              The hypothesis states that Treatment A improves performance on Task Q.
        H1: $\mu_1 < \mu_2$
              The hypothesis states that Treatment A disrupts performance on Task Q.

11.3.11      The null hypothesis is proposed and this proposal may meet with one of two fates at the hands of the data. The null hypothesis may be rejected, which allows you to accept an alternative hypothesis. Or it may be retained. If it is retained, it is not proved as true; it is simply retained as one among many possibilities.

11.3.12      Perhaps an analogy will help with this distinction about terminology. Suppose a masked man has burglarised a house and stolen all the silver. There are two suspects, H1 and H0. The lawyer for H0 tries to establish beyond reasonable doubt that her client was out of state during the time of the robbery. If she can do this, it will exonerate H0 (H0 will be rejected, leaving only H1 as a suspect). However, if she cannot establish this, the situation will revert to its original state: H1 or H0 could have stolen the silver away, and both are retained as suspects. So the null hypothesis can be rejected or retained, but it can never be proved with certainty to be true or false by using the methods of inferential statistics. Statisticians are usually very careful with words. That is probably because they are used to mathematical symbols, which are very precise. Regardless of the reason, this distinction between retained and proved, although subtle, is important.

11.4 Sampling Distribution of a Difference Between Means

11.4.1            A difference is simply the answer in a subtraction problem. As explained in the section on the logic of inferential statistics, the difference that is of interest is the difference between two means. You evaluate the obtained difference by comparing it with a sampling distribution of differences between means (often called a sampling distribution of mean differences).

11.4.2            Recall that a sampling distribution is a frequency distribution of sample statistics, all calculated from samples of the same size drawn from the same population; the standard deviation of that frequency distribution is called a standard error. Precisely the same logic holds for a sampling distribution of differences between means.

11.4.3            We can best explain a sampling distribution of differences between means by describing the procedure for generating an empirical sampling distribution of mean differences. Define a population of scores. Randomly draw two samples, calculate the mean of each, and subtract the second mean from the first. Do this many times and then arrange all the differences into a frequency distribution. Such a distribution will consist of a number of scores, each of which is a difference between two sample means. Think carefully about the mean of the sampling distribution of mean differences. Stop reading and decide what the numerical value of this mean will be. The mean of a sampling distribution of mean differences is zero because, on the average, the sample means will be close to $\mu$, and the differences will be close to zero. These small positive and negative differences will then cancel each other out.

11.4.4            This sampling distribution of mean differences has a standard deviation called the standard error of a difference between means.
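The procedure for generating an empirical sampling distribution of mean differences can be imitated with a short simulation. The population below is hypothetical (10,000 simulated scores with mean about 50 and standard deviation about 10), the sample size of 30 and the 2,000 repetitions are arbitrary choices, and the function name is made up for this example.

```python
import random
from statistics import mean, stdev


def mean_differences(population, n, draws, seed=0):
    """Differences between the means of pairs of random samples."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(draws):
        first = rng.sample(population, n)
        second = rng.sample(population, n)
        # Subtract the second mean from the first, as in the text.
        diffs.append(mean(first) - mean(second))
    return diffs


# Hypothetical population of scores.
pop_rng = random.Random(5)
population = [pop_rng.gauss(50, 10) for _ in range(10_000)]

diffs = mean_differences(population, n=30, draws=2000)
print(f"mean of the differences          = {mean(diffs):.3f}")
print(f"standard error of the difference = {stdev(diffs):.3f}")
```

The mean of the simulated differences should fall very close to zero, and their standard deviation is an empirical estimate of the standard error of a difference between means.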

11.4.5            In many experiments, it is obvious that there are two populations to begin with. The question, however, is whether they are equal on the dependent variable. To generate a sampling distribution of differences between means in this case, assume that, on the dependent variable, the two populations have the same mean, standard deviation, and form (shape of the distribution). Then draw one sample from each population, calculate the means, and subtract one from the other. Continue this many times. Arrange the differences between sample means into a frequency distribution.

11.4.6            The sampling distributions of differences between means that you will use will be theoretical distributions, not the empirical ones we described in the last two paragraphs. However, a description of the procedures for an empirical distribution, which is what we’ve just given, is usually easier to understand in the beginning.

11.4.7            Two things about a sampling distribution of mean differences are constant: the mean and the form. The mean is zero, and the form is normal if the sample means are based on large samples. Again the traditional answer to the question “What is a large sample?” is “30 or more.”

11.4.8            Example Experiment
            The question of this experiment was “Are the racial attitudes of 9th graders different from those of 12th graders?” The null hypothesis was that the population means were equal (H0: $\mu_1 = \mu_2$). The alternative hypothesis was that they were not equal (H1: $\mu_1 \neq \mu_2$). The subjects in this experiment were 9th and 12th grade black and white students who expressed their attitudes about persons of their own sex but different race. Higher scores represent more positive attitudes. The table below shows the summary data. As you can quickly calculate from the first table below, the obtained mean difference between samples of 9th and 12th graders is 4.10. Now a decision must be made. Should this difference in samples be ascribed to chance (retain H0; there is no difference between the population means)? Or should we say that such a difference is so unlikely that it is due not to chance but to the different characteristics of 9th and 12th grade students (reject H0 and accept H1; there is a difference between the populations)? Using a sampling distribution of mean differences (see the 2nd illustration below), a decision can be made.
            Data from an experiment that compared the racial attitudes of 9th and 12th grade students
            Sampling distribution from the racial attitudes study. It is based on chance and shows z scores, probabilities of those z scores, and differences between sample means. (Sampling Distribution Of Differences Between Means)
            The second illustration above shows a sampling distribution of differences between means that is based on the assumption that there are no population differences between 9th and 12th graders; that is, that the true difference between the population means is zero. The figure is a normal curve that shows you z scores, possible differences between sample means in the racial attitudes study, and probabilities associated with those z scores and difference scores. Our obtained difference, 4.10, is not even shown on the distribution. Such events are very rare if only chance is at work. From the figure you can see that a difference of 3.96 or more would be expected five times in 10,000 (.0005). Since a difference of -3.96 or beyond in the negative direction also has a probability of .0005, we can add the two probabilities together to get .001. Since our difference was 4.10 (less probable than 3.96), we can conclude that the probability of a difference of 4.10 being due to chance is less than .001. This probability is very small, indeed, and it seems reasonable to rule out chance; that is, to reject H0 and, thus, accept H1. By examining the means of the two groups in the summary data table above, we can write a conclusion using the terms in the experiment: “Twelfth graders have more positive attitudes toward people of their own sex but different race than do ninth graders.”
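The probability statement for the obtained difference of 4.10 can be checked numerically. The sketch below rests on one loudly stated assumption: the standard error of the difference is not given in the surviving text, so it is inferred here from the figure's own pairing of a difference of 3.96 with a z score of 3.30, which gives about 1.20. The function name is made up for this example.

```python
from statistics import NormalDist

# Assumption: standard error of the difference inferred from the figure,
# where a difference of 3.96 corresponds to z = 3.30.
SE_DIFF = 3.96 / 3.30


def two_tailed_probability(difference, se_diff):
    """Return (z, probability of a difference this extreme in either direction)."""
    z = difference / se_diff
    one_tail = 1 - NormalDist().cdf(abs(z))
    return z, 2 * one_tail


z_obs, p_obs = two_tailed_probability(4.10, SE_DIFF)
print(f"z = {z_obs:.2f}, two-tailed probability = {p_obs:.5f}")
```

Under that assumption the two-tailed probability comes out below .001, in agreement with the conclusion reached from the figure.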

11.5 A Problem and Its Accepted Solution

11.5.1            The probability that populations of 9th and 12th grade attitude scores are the same was so small (p < .001) that it was easy to rule out chance as an explanation for the difference. But what if that probability had been .01, or .05, or .25, or .50? The problem is how to divide this continuum into a group of events that is “due to chance” and another that is “not due to chance.”

11.5.2            It is probably clear to you that whatever solution is adopted will appear to be an arbitrary one. Breaking any continuum into two parts will leave you uncomfortable about the events close to either side of the break. Nevertheless, a solution does exist.

11.5.3            The generally accepted solution is to say that the .05 level of probability is the cut-off between “due to chance” and “not due to chance.” The name of the cut-off point that separates “due to chance” and “not due to chance” is the level of significance. If an event has a probability of .05 or less (for example, p=.03, p=.01, or p=.001), H0 is rejected, and the event is considered significant (not due to chance). If an event has a probability greater than .05 (for example, p=.06, p=.50, or p=.99), H0 is retained, and the event is considered not significant (may be due to chance). Here, the word significant is not synonymous with “important.” A significant event in statistics is one that is not ascribed to chance.

11.5.4            The area of the sampling distribution that covers the events that are “not due to chance” is called the critical region. If an event falls in the critical region, H0 is rejected. The figure above identifies the critical region for the .05 level of significance. As you can see, the difference in means between 9th and 12th grade racial attitudes (4.10) falls in the critical region, so H0 should be rejected.

11.5.5            Although widely adopted, the .05 level of significance is not universal. Some investigators use the .01 level in their research. When the .01 level is used and H1: $\mu_1 \neq \mu_2$, the critical region consists of .005 in each tail of the sampling distribution. In the figure above, differences beyond -3.10 or 3.10 (that is, less than -3.10 or greater than 3.10) are required in order to reject H0 at the .01 level.
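The boundaries of the critical region follow from the level of significance and the standard error of the difference. The sketch below assumes a two-tailed test and a standard error of the difference of 1.20, a value inferred from the figures quoted in the text (3.10 at z = 2.58) rather than stated in it; the function names are made up for this example.

```python
from statistics import NormalDist

SE_DIFF = 1.20  # assumption: inferred from the quoted critical difference of 3.10


def critical_difference(se_diff, alpha):
    """Smallest absolute difference that falls in a two-tailed critical region."""
    z_critical = NormalDist().inv_cdf(1 - alpha / 2)
    return z_critical * se_diff


def in_critical_region(difference, se_diff, alpha=0.05):
    """True if H0 would be rejected at the given level of significance."""
    return abs(difference) >= critical_difference(se_diff, alpha)


for alpha in (0.05, 0.01):
    print(f"alpha = {alpha}: reject H0 if |difference| >= "
          f"{critical_difference(SE_DIFF, alpha):.2f}")
print("4.10 in critical region at .01:", in_critical_region(4.10, SE_DIFF, 0.01))
```

The .01 boundary computed this way is about 3.09 rather than 3.10 because the code uses the unrounded z score (2.576) instead of the table value 2.58.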

11.5.6            In textbooks, a lot of lip service is paid to the .05 level of significance as the cutoff point for decision making. In actual research, the practice is to run the experiment and report any significant differences at the smallest correct probability value. Thus, in the same report, some differences may be reported as significant at the .001 level, some at the .01 level, and some at the .05 level. At present, it is uncommon to report probabilities greater than .05 as significant, although some researchers argue that the .10 or even the .20 level may be justified in certain situations.

11.6 How to Construct A Sampling Distribution of Differences Between Means

11.6.1            You already know two important characteristics of a sampling distribution of differences between means. The mean is 0, and the form is normal. When we constructed the illustration above of the sampling distribution of differences between the racial attitudes of 9th and 12th graders, we used the normal curve table and a form of the familiar z score.

11.6.2            General Formula
             The formula in the text is the “working model” of the more general formula. Since our null hypothesis is that μ1 - μ2 = 0, the term in parentheses on the right is 0, leaving you with the “working model.” This more general formula is of a form you have seen before and will see again: the difference between a statistic (X̄1 - X̄2) and a parameter (μ1 - μ2), divided by the standard error of the statistic.
             General Formula
                  $z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{s_{\bar{X}_1 - \bar{X}_2}}$
             Working (Model) Formula (z Score For Observed Mean Difference)
                  $z = \frac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}}$
             Formula Standard Error of Mean
                  $s_{\bar{X}} = \frac{s}{\sqrt{N}}$
             Formula Standard Error of Difference
                  $s_{\bar{X}_1 - \bar{X}_2} = \sqrt{s_{\bar{X}_1}^2 + s_{\bar{X}_2}^2}$
             Formula Difference between Sample Means Associated with each z Score
                  $\bar{X}_1 - \bar{X}_2 = z \cdot s_{\bar{X}_1 - \bar{X}_2}$
             Variables Defined
                  z = z score for the observed mean difference
                  X̄1 = mean of one sample
                  X̄2 = mean of a second sample
                  s(X̄1 - X̄2) = standard error of a difference
                  s(X̄1) = standard error of the mean of Sample 1
                  s(X̄2) = standard error of the mean of Sample 2
                  (X̄1 - X̄2) = difference between sample means
             Procedure
                  Standard error of the mean estimated from a sample
                       Divide s by the square root of N to find the standard error of the mean.
                  Standard Error of Difference
                       Square the standard error of the mean of Sample 1 and add it to the square of the standard error of the mean of Sample 2.
                       Find the square root of the result of the previous step to find the standard error of difference.
                  z Score For Observed Mean Difference
                       Find the difference between the mean of Sample 1 and the mean of Sample 2.
                       Divide the difference found in the previous step by the standard error of difference found in the previous section.
                  Difference between Sample Means Associated with each z Score
                       Multiply the z score, found in a stats textbook table or with the Web reference below, by the standard error of a difference to determine the difference between sample means.
             Discussion
                  Standard Error of Difference
                       When creating a Sampling Distribution Of Differences Between Means as in the illustration above (see Sampling distribution from the racial attitudes study), the tick marks at the baseline of the illustration represent increments of the standard error of difference, just as they represent increments of the standard deviation in a distribution of raw scores.
                  Probability of a Difference this Large or Larger Occurring as a Result of Chance
                       The probabilities are .25, .125, .025, .005, and .0005.
                       These probabilities are displayed at the bottom of the chart in the illustration above (see Sampling distribution from the racial attitudes study).
                  Finding the z score associated with the probabilities
                       There are at least two ways of determining the z score associated with the above probabilities.
                       Look up the z score in a table in the back of a stats textbook.
                            To do this you will need to subtract the probabilities above from .5000, which gives the proportions .25, .375, .475, .495, and .4995.
                            Look these proportions up in the body of the table to find the z scores listed below.
                       Use the Web reference below.
                            Plug the probabilities .25, .125, .025, .005, and .0005 into the shaded area of the 3rd applet and click the above or below button.
                            Web Reference
                       The following z scores are associated with these probabilities:
                            .67 1.15 1.96 2.58 3.30
                  Difference between Sample Means
                       These scores are placed between the z scores and the probabilities in the illustration above (see Sampling distribution from the racial attitudes study).
                  z Score For Observed Mean Difference
                       This score is compared in the chart with the statistics in the Sampling Distribution Of Differences Between Means to determine whether the difference is significant.
        Simple z score method (z Score For Observed Mean Difference) (No Charting)
             Procedure
                  Determine the z Score For Observed Mean Difference, go to the 2nd applet from the Web reference below, and plug in the z score. The proportion above the z score is the proportion of differences this large or larger that would occur by chance.
             Web Reference
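The procedure above can be sketched in a few lines of Python. This is only an illustration: the function names and the sample figures (standard deviations of 10 and 12, sample sizes of 100 and 144, means of 53 and 50) are hypothetical and are not taken from the racial attitudes study. The standard library's `statistics.NormalDist` stands in for the textbook table and the Web applets.

```python
from math import sqrt
from statistics import NormalDist


def standard_error_of_mean(s, n):
    """Divide the sample standard deviation s by the square root of N."""
    return s / sqrt(n)


def standard_error_of_difference(se1, se2):
    """Square each standard error of the mean, add them, take the square root."""
    return sqrt(se1 ** 2 + se2 ** 2)


def z_for_mean_difference(mean1, mean2, se_diff):
    """Working formula: the null hypothesis sets (mu1 - mu2) to 0."""
    return (mean1 - mean2) / se_diff


def z_for_upper_tail(p):
    """z score that leaves proportion p of the normal curve above it."""
    return NormalDist().inv_cdf(1 - p)


def upper_tail_for_z(z):
    """Proportion of the normal curve above a given z score."""
    return 1 - NormalDist().cdf(z)


# Hypothetical sample values, chosen only to make the arithmetic easy to follow.
se1 = standard_error_of_mean(s=10.0, n=100)
se2 = standard_error_of_mean(s=12.0, n=144)
se_diff = standard_error_of_difference(se1, se2)
z = z_for_mean_difference(53.0, 50.0, se_diff)

# z scores for the tail probabilities used in the illustration.
tail_probs = [.25, .125, .025, .005, .0005]
z_scores = [round(z_for_upper_tail(p), 2) for p in tail_probs]

# Mean differences associated with each z score: z times the standard error.
differences = [zs * se_diff for zs in z_scores]

print(z, upper_tail_for_z(z))
print(z_scores)
```

The computed z scores agree with the tabled values to two decimals except possibly the last one; an exact calculation for .0005 gives a value slightly below the 3.30 that a rounded table yields.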

11.7 An Analysis of Potential Mistakes

11.7.1            Introduction
             The significance level is the probability that the null hypothesis will be rejected in error when it is true (a decision known as a Type I error). The significance of a result is also called its p-value; the smaller the p-value, the more significant the result is said to be.
             At first glance, the idea of adopting a significance level of 5% seems preposterous to some, who argue for greater certainty. Why not use a level of significance of one in a million, which reduces uncertainty to almost nothing? It is true that adopting the .05 level of significance leaves some room for mistaking a chance difference for a real difference. Lowering the level of significance will reduce the probability of this kind of mistake, but it increases the probability of another kind. Uncertainty about the conclusion will remain. In this section, we will discuss the two kinds of mistakes that are possible. You will be able to pick up some hints on reducing uncertainty, but if you agree to draw a sample, you agree to accept some uncertainty about the results.
             Type I Error
                  Rejecting the null hypothesis when it is true. The probability of a Type I error is symbolized by α (alpha).
             Type II Error
                  Accepting (failing to reject) the null hypothesis when it is false. The probability of a Type II error is symbolized by β (beta).
             You are already somewhat familiar with α from your study of level of significance. When the .05 level of significance is adopted, the experimenter concludes that an event with p < .05 is not due to chance. The experimenter could be wrong; if so, a Type I error has been made. The probability of a Type I error, α, is controlled by the level of significance you adopt.
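The claim that the level of significance controls the probability of a Type I error can be checked with a small simulation. The sketch below is an illustration under stated assumptions, not part of the original discussion: both samples are drawn from the same normal population with a known standard deviation of 1, so the null hypothesis is true by construction, and the sample size, number of trials, and function name are arbitrary choices.

```python
import random
from math import sqrt


def type_i_error_rate(trials=20000, n=10, critical_z=1.96, seed=1):
    """Proportion of experiments that reject H0 even though H0 is true."""
    rng = random.Random(seed)
    # Standard error of a difference when sigma = 1 is known for both samples.
    se_diff = sqrt(1.0 / n + 1.0 / n)
    rejections = 0
    for _ in range(trials):
        mean1 = sum(rng.gauss(0.0, 1.0) for _ in range(n)) / n
        mean2 = sum(rng.gauss(0.0, 1.0) for _ in range(n)) / n
        z = (mean1 - mean2) / se_diff
        # Two-tailed test at the .05 level: reject when |z| is 1.96 or more.
        if abs(z) >= critical_z:
            rejections += 1
    return rejections / trials


rate = type_i_error_rate()
print(rate)  # should land close to .05
```

Every rejection counted here is a Type I error, and over many repetitions the proportion settles near the .05 that was adopted as the level of significance.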
             A proper way to think of α and a Type I error is in terms of “the long run.” The illustration above (see Sampling distribution from the racial attitudes study) is a theoretical sampling distribution of mean differences. It is a picture of repeated sampling (that is, the long run). All those differences came from sample means that were drawn from the same population, but some differences were so large they could be expected to occur only 5 percent of the time. In an experiment, however, you have only one difference, which is based on your two sample means. If this difference is so large that you conclude that there are two populations whose means are not equal, you may have made a Type I error. However, the probability of such an error is not more than .05.
             The calculation of β is a more complicated matter. For one thing, a Type II error can be committed only when the two populations have different means. Naturally, the farther apart the means are, the more likely you are to detect the difference, and thus the lower β is. We will discuss other factors that affect β in the last section, “How To Reject the Null Hypothesis.”
             The general relationship between α and β is an inverse one. As α goes down, β goes up. That is, if you insist on a larger difference between means before you call the difference nonchance, you are less likely to detect a real nonchance difference if it is small. The illustration below demonstrates this relationship.
             Illustration Frequency distribution of raw scores when H0 is false
             The illustration above is a picture of two populations. Since these are populations, the “truth” is that the mean of the experimental group is four points higher than that of the control group. Such “truth” is available only in hypothetical examples in textbooks. In the real world of experimentation you do not know population parameters. This example, however, should help you understand the relation of α to β.
             If a sample is drawn from each population, there is only one correct decision: reject H0. However, will the investigator make the correct decision? Would a difference of four be expected between sample means from Populations A and B (14 - 10 = 4)? To evaluate the probability of a difference of four, see if it falls in the critical region of the sampling distribution of mean differences, shown in the illustration below. (We arbitrarily picked this sampling distribution so we could illustrate the points below.)
             Illustration Sampling distribution of differences between means from Populations A and B if H0 were true
             As you can see in the illustration above, a difference of 4 score points would be expected 4.56 percent of the time. If α had been set at .05, you would correctly reject H0, since the probability of the obtained difference (.0456) is less than .05. However, if α had been set at .01, you would not reject H0, since the obtained probability (.0456) is not less than .01. Failure to reject H0 in this case is a Type II error.
             At this point, we can return to our discussion of setting the significance level. The suggestion was “Why not reduce the significance level to one in a million?” From the analysis of the potential mistakes, you can answer that when you decrease α, you increase β. So protection from one error is traded for liability to another kind of error.
             Most persons who use statistics as a tool set α (usually at .05) and let β fall where it may. The actual calculation of β, although important, is beyond the scope of this discussion.
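Although the calculation of β is outside the scope of the discussion, a rough sketch for the hypothetical Populations A and B may make the trade-off concrete. It rests on assumptions that are not stated in the text: the sampling distribution of differences is treated as normal, and its standard error is taken to be 2 score points, a value inferred from the fact that a difference of 4 has a two-tailed probability of .0456 (z = 2.00). The function name is illustrative.

```python
from statistics import NormalDist


def beta_two_tailed(true_diff, se_diff, alpha):
    """Long-run probability of failing to reject H0 when the population
    means really differ by true_diff (two-tailed z test)."""
    std = NormalDist()
    z_crit = std.inv_cdf(1 - alpha / 2)
    # When H0 is false, observed z scores center on true_diff / se_diff.
    shift = true_diff / se_diff
    # Beta is the chance that the observed z falls between the critical values.
    return std.cdf(z_crit - shift) - std.cdf(-z_crit - shift)


# Assumed values: true difference of 4 points, standard error of 2 points.
beta_05 = beta_two_tailed(true_diff=4.0, se_diff=2.0, alpha=0.05)
beta_01 = beta_two_tailed(true_diff=4.0, se_diff=2.0, alpha=0.01)

print(round(beta_05, 3), round(beta_01, 3))
```

Under these assumptions β rises from roughly .48 to roughly .72 when α is lowered from .05 to .01, which is the inverse relationship described above.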

11.8 One-Tailed and Two-Tailed Tests

11.8.1            Introduction
             Earlier, we discussed the fact that in practice it is usual to choose one of three possible alternative hypotheses before the data are gathered.
                  H1: μ1 ≠ μ2. This hypothesis simply says that the population means differ but makes no statement about the direction of the difference.
                  H2: μ1 > μ2. Here, the hypothesis is made that the mean of the first population is greater than the mean of the second population.
                  H3: μ1 < μ2. The mean of the first population is smaller than the mean of the second population.
             So far in this section, you have been working with the first alternative, H1. You have tested the null hypothesis, μ1 = μ2, against the alternative hypothesis μ1 ≠ μ2. The null hypothesis was rejected when you found large positive deviations (X̄1 > X̄2) or large negative deviations (X̄1 < X̄2). When α was set at .05, the .05 was divided into .025 in each tail of the sampling distribution, as seen in the illustration below.
                  Illustration
             In a similar way, you found the probability of a difference by multiplying by 2 the probability obtained from the z score. With such a test, you can reject H0 and accept either of the possible alternative hypotheses, μ1 > μ2 or μ1 < μ2. This is called a two-tailed test of significance, for reasons that should be obvious from the illustration above.
             Sometimes, however, an investigator is concerned only with deviations in one direction; that is, the alternative hypothesis of interest is either μ1 > μ2 or μ1 < μ2. In either case, a one-tailed test is appropriate. The illustration below is a picture of the sampling distribution for a one-tailed test, for μ1 > μ2.
                  Illustration
             For a one-tailed test, the critical region is all in one end of the sampling distribution. The only outcome that allows you to reject H0 is one in which X̄1 is so much larger than X̄2 that the z score is 1.65 or more.
             Notice in the above illustration that if you are running a one-tailed test there is no way to conclude that μ1 is less than μ2, even if X̄2 is many times the size of X̄1. In a one-tailed test, you are interested in only one kind of difference. One-tailed tests are usually used when an investigator knows a great deal about the particular research area or when practical reasons dictate an interest in establishing μ1 > μ2 but not μ1 < μ2.
             There is some controversy about the use of one-tailed or two-tailed tests. When in doubt, use a two-tailed test. The decision to use a one-tailed or a two-tailed test should be made before the data are gathered.
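The difference between the two kinds of test can be sketched numerically. The example below is illustrative only: the observed z score of 1.80 is a made-up value, the function names are hypothetical, and `statistics.NormalDist` is used in place of a normal-curve table.

```python
from statistics import NormalDist


def p_two_tailed(z):
    """Probability of a difference this large or larger in either direction."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


def p_one_tailed_greater(z):
    """Probability of a difference this large or larger when the
    alternative hypothesis is mu1 > mu2 (upper tail only)."""
    return 1 - NormalDist().cdf(z)


# Critical z scores when alpha = .05.
crit_two = NormalDist().inv_cdf(1 - 0.05 / 2)  # .025 in each tail
crit_one = NormalDist().inv_cdf(1 - 0.05)      # all .05 in one tail

z = 1.80  # hypothetical observed z score
print(round(crit_two, 3), round(crit_one, 3))
print(round(p_two_tailed(z), 4), round(p_one_tailed_greater(z), 4))

# A large difference in the unpredicted direction can never reach significance.
print(p_one_tailed_greater(-5.0))
```

With these figures the same z score is significant at the .05 level in a one-tailed test but not in a two-tailed test, which is why the choice has to be made before the data are gathered. The exact one-tailed critical value is about 1.645; tables usually round it to 1.65.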

11.9 Significant Results and Important Results

11.9.1            The word “significant” has a precise technical meaning in statistics and other meanings in other contexts.

11.9.2            A study that has statistically significant results may or may not have important results. You have to decide about the importance without the help of inferential statistics.

11.10                   How To Reject the Null Hypothesis

11.10.1      To reject H0 is to be left with only one alternative, H1, from which a conclusion can be drawn. To retain H0 is to be left up in the air. You don’t know whether the null hypothesis is really true or whether it is false and you just failed to detect it. So, if you are going to design and run an experiment, you should maximise your chances of rejecting H0. There are three factors to consider: the actual difference, the standard error, and α.

11.10.2      In order to get this discussion out of the realm of the hypothetical and into the realm of the practical, consider the following problem. Suppose you want to select a research project which seeks to reject H0. You decide to try to show that widgets are different from controls. Accept for a moment the idea that widgets are different, that is, that H0 should be rejected. What are the factors that determine whether you will conclude from your experiment that widgets are different?
        Actual Difference
             The larger the actual difference between widgets and controls, the more likely you are to reject H0. There is a practical limit, though. If the difference is too large, other people will call your experiment trivial, saying that it demonstrates the obvious and that anyone can see that widgets are different. On the other hand, small differences can be difficult to detect. Pre-experiment estimations of actual differences are usually based on your own experience.
        The Standard Error of a Difference
             Review the formula below.
                  $z = \frac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}}$
             You can see that as the standard error of a difference, s(X̄1 - X̄2), gets smaller, z gets larger, and you are more likely to reject H0. This is true, of course, only if widgets are really different from controls. Here are two ways you can reduce the size of the standard error of a difference.
             Sample Size
                  The larger the sample, the smaller the standard error of the difference. (See Illustration) This illustration shows that the larger the sample size, the smaller the standard error of the mean. The same relationship is true for the standard error of a difference.
                  Some Texts [8] show you how to calculate the sample size required to reject H0. In order to do this calculation, you must make assumptions about the size of the actual difference. Many times, the size of the sample is dictated by practical considerations: time, money, or the availability of widgets.
             Sample Variability
                  Reducing the variability in the sample will produce a smaller standard error of a difference. You can reduce variability by using reliable measuring instruments, recording data correctly, and, in short, reducing the “noise” or random error in your experiment.
        Alpha
             The larger α is, the more likely you are to reject H0. The limit to this factor is your colleagues’ sneer when you report that widgets are “significantly different at the .40 level.” Everyone believes that such differences should be attributed to chance. Sometimes practical considerations may permit the use of α = .10. If widgets and controls both could be used to treat a deadly illness and both have the same side effects, but “widgets are significantly better at the .10 level,” then widgets will be used. (Also, more data will then be gathered [sample size increased] to see whether the difference between widgets and controls is reliable.)

11.10.3      We will close this section on how to reject the null hypothesis by telling you that these three factors are discussed in intermediate-level texts under the topic power. The power of a statistical test is defined as 1 - β. The more powerful the test, the more likely it is to detect any actual difference between widgets and controls.
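A small sketch can show how each of the three factors moves power, 1 - β. It is an approximation under assumptions that go beyond the text: a two-tailed z test, two independent samples of equal size, and a common standard deviation in both groups. The function name and all of the widget figures are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_two_tailed(actual_diff, s, n, alpha):
    """Approximate power (1 - beta) of a two-tailed z test for two
    independent samples, each of size n with standard deviation s."""
    std = NormalDist()
    # Standard error of a difference: sqrt(s^2/n + s^2/n).
    se_diff = sqrt(s ** 2 / n + s ** 2 / n)
    z_crit = std.inv_cdf(1 - alpha / 2)
    shift = actual_diff / se_diff
    beta = std.cdf(z_crit - shift) - std.cdf(-z_crit - shift)
    return 1 - beta


# Hypothetical baseline: widgets differ from controls by 5 points.
baseline = power_two_tailed(actual_diff=5, s=10, n=30, alpha=0.05)

# Change one factor at a time.
bigger_difference = power_two_tailed(actual_diff=8, s=10, n=30, alpha=0.05)
bigger_sample = power_two_tailed(actual_diff=5, s=10, n=60, alpha=0.05)
less_variability = power_two_tailed(actual_diff=5, s=7, n=30, alpha=0.05)
larger_alpha = power_two_tailed(actual_diff=5, s=10, n=30, alpha=0.10)

for label, value in [("baseline", baseline),
                     ("bigger difference", bigger_difference),
                     ("bigger sample", bigger_sample),
                     ("less variability", less_variability),
                     ("larger alpha", larger_alpha)]:
    print(f"{label}: {value:.3f}")
```

Each change raises power above the baseline: a larger actual difference, a smaller standard error (through a larger sample or less variability), or a larger α.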

12      The t Distribution and the t-Test [9]

12.1 Introduction

12.1.1            The techniques you have learned so far require the use of the normal distribution to assess probabilities. These probabilities will be accurate if you have used σ in your calculations or if N is so large that s is a reliable estimate of σ. In this section, you will learn about a distribution that will give you accurate probabilities when you do not know σ and N is not large. The logic you have used, however, will be used again. That is, you assume the null hypothesis, draw random samples, introduce the independent variable, and calculate a mean difference on the dependent variable. If these differences cannot be attributed to chance, reject the null hypothesis and interpret the results.

12.1.2            At this point you may suspect that the normal curve is an indispensable part of modern statistical living. Up until now, in this tract, it has been. However, in the next sections you will encounter several sampling distributions, none of which is normal, but all of which can be used to determine the probability that a particular event occurred by chance. Deciding which distribution to use is not a difficult task but it does require some practise. Remember that a theoretical distribution is accurate if the assumptions on which it is based are true for the data from the experiment. By knowing the assumptions a distribution requires and the nature of your data, you can pick an appropriate distribution.

12.1.3            This section is about a theoretical distribution called the t distribution. The t is a lowercase one; capital T has entirely different meanings. The t distribution is used when σ is not known and sample sizes are too small to ensure that s is a reliable estimate of σ. It is used to find answers to the four kinds of problems listed below. Problems 1, 2, and 4 are problems of hypothesis testing. Problem 3 requires the establishment of a confidence interval.
             1. Did a sample with a mean X̄ come from a population with a mean μ?
             2. Did two samples, with means X̄1 and X̄2, come from the same population?
             3. What is the confidence interval about the difference between two sample means?
             4. Did a Pearson product-moment correlation coefficient, based on sample data, come from a population with a true correlation of .00 for the two variables?

12.1.4            W. S. Gosset (1876-1937) was hired in 1899 by Arthur Guinness, Son & Company, a brewery in Dublin, Ireland. He invented the t distribution in 1908 while working on practical questions such as whether a new strain of barley, developed by botanical scientists, had a greater yield than the old barley standard.

12.1.5            For more information, see "Gosset, W. S.," in Dictionary of National Biography, 1931-40, London: Oxford University Press, 1949, or L. McMullen & E. S. Pearson, "William Sealy Gosset, 1876-1937," Biometrika, 1939, 205-253.

12.1.6            Prohibited by the company from publishing under his own name, Gosset published his new mathematical statistics in Biometrika, a journal founded in 1901 by Francis Galton, under the pseudonym “Student,” and his statistic became known as “Student’s t.” (No one seems to know why the letter t was chosen. E. S. Pearson surmises that t was simply a "free letter"; that is, no one had yet used t to designate a statistic.) Since he worked for the Guinness Company all his life, Gosset continued to use the pseudonym "Student" for his publications in mathematical statistics. Gosset was very devoted to his company, working hard and rising through the ranks. He was appointed head brewer a few months before his death in 1937.

12.1.7            Gosset was confronted with the problem of gathering, in a limited amount of time, data about the brewing process. He recognized that the sample sizes were so small that s was not an accurate estimate of σ and thus the normal-curve model was not appropriate. After working out the mathematics of distributions based on s, which is a statistic and, therefore, variable, rather than on σ, which is a parameter and, therefore, constant, Gosset found that the theoretical distribution depended upon sample size, with a different distribution for each N. These distributions make up a family of curves that have come to be called the t distribution.

12.1.8            In Gosset's work, you again see how a practical question forced the development of a statistical tool. (Remember that Francis Galton invented the concept of the correlation coefficient in order to assess the degree to which characteristics of fathers are found in their sons.) In Gosset's case, an example of a practical question was "Will this new strain of barley, developed by the botanical scientists, have a greater yield than our old standard?" Such questions were answered with data from experiments carried out on the ten farms maintained by the Guinness Company in the principal barley-growing regions of Ireland. A typical experiment might involve two one-acre plots (one planted with the old barley, one with the new) on each of the ten farms. Gosset then was confronted with ten one-acre yields for the old barley and ten for the new. Was the difference in yields due to sampling fluctuation, or was it a reliable difference between the two strains? He made the decision using his newly derived t distribution.

12.1.9            We will describe some characteristics of the t distribution and then compare t with the normal distribution. The following two sections are on hypothesis testing: one section on samples that are independent of each other and one on samples that are correlated. Next, you will use the t distribution to establish confidence intervals about a mean difference. Then you will learn the assumptions that are required if you choose to use a t test to analyse your data. Finally, you will learn how to determine whether a correlation coefficient is statistically significant. Problems 1-4, mentioned above, will be dealt with in order.

12.2 The t Distribution

12.2.1            Rather than just one t distribution, there are many t distributions. In fact, there is a t distribution for each sample size from 1 to ∞. These different t distributions are described as having different degrees of freedom, and there is a different t distribution for each degree of freedom. Degrees of freedom is abbreviated df (which is a simple symbol; do not multiply d times f). We'll start with a definition of degrees of freedom as sample size minus 1. Thus, df = N - 1. If the sample consists of 12 members, df = 11.

12.2.2            Figure 9.1 is a picture of four of these t distributions, each based on a different number of degrees of freedom. You can see that, as the degrees of freedom become fewer, a larger proportion of the curve is contained in the tails.

12.2.3            You know from your work with the normal curve that a theoretical distribution is used to determine a probability and that, on the basis of the probability, the null hypothesis is retained or rejected. You will be glad to learn that the logic of using the t distribution to make a decision is just like the logic of using the normal distribution.

12.2.4            Z Formula            $z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}}$

12.2.5            Recall that z is normally distributed. You probably also recall that, if z = 1.96, the chances are only 5 in 100 that the mean X̄ came from the population with mean μ.

12.2.6            Figure 9-1            

12.2.7            In a similar way, if the samples are small, you can calculate a t value from the formula

12.2.8            t Formula            $t = \frac{\bar{X} - \mu}{s_{\bar{X}}}$

12.2.9            The number of degrees of freedom (df) determines which t distribution is appropriate, and from it you can find a t value that would be expected to occur by chance 5 times in 100. Figure 9.2 separates the t distributions of Figure 9.1. The t values in Figure 9.2 are those associated with the interval that contains 95 percent of the cases, leaving 2.5 percent in each tail. Look at each of the four curves.

12.2.10      If you looked at Figure 9.2 carefully, you may have been suspicious that the t distribution for df = ∞ is a normal curve. It is. As df approaches ∞, the t distribution approaches the normal distribution. When df = 30, the t distribution is almost normal. Now you understand why we repeatedly cautioned, in chapters that used the normal curve, that N must be at least 30 (unless you know σ or that the distribution of the population is symmetrical). Even when N = 30, the t distribution is more accurate than the normal distribution for assessing probabilities and so, in most research studies (that use samples), t is used rather than z.

12.2.11      A reasonable question now is "Where did those t values of 4.30, 2.26, 2.06, and 1.96 come from?" The answer is Table D. Table D is really a condensed version of 34 t distributions. Look at Table D and note that there are 34 different degrees of freedom in the left-hand column.

12.2.12      Table D       

12.2.13      Across the top, under “α Levels for Two-Tailed Test,” you will see six selected probability figures: .20, .10, .05, .02, .01, and .001.

12.2.14      Follow the .05 column down to df = 2, 9, 25, and ∞, and you will find t values of 4.30, 2.26, 2.06, and 1.96.
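Where Table D is not at hand, the same critical values can be approximated numerically. The sketch below is not how the table was produced; it is one possible approach that uses only Python's standard library, integrating the t density with Simpson's rule and searching for the critical value by bisection. The function names, the number of integration steps, and the search limit of 100 are assumptions made for this illustration.

```python
from math import exp, lgamma, pi, sqrt


def t_pdf(t, df):
    """Height of the t distribution with df degrees of freedom at t."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * (log_df_pi(df))
    return exp(log_c) * (1 + t * t / df) ** (-(df + 1) / 2)


def log_df_pi(df):
    """Natural log of (df * pi), kept separate for readability."""
    from math import log
    return log(df * pi)


def t_central_area(t, df, steps=4000):
    """Area under the curve between -t and +t (Simpson's rule, even steps)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 2 * total * h / 3


def t_critical(df, alpha):
    """Two-tailed critical t: the value leaving alpha/2 in each tail."""
    lo, hi = 0.0, 100.0  # assumed search range; adequate for df >= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if 1 - t_central_area(mid, df) > alpha:
            lo = mid  # tails still too large, so move outward
        else:
            hi = mid
    return (lo + hi) / 2


for df in (2, 9, 25, 1000):
    print(df, round(t_critical(df, 0.05), 2))
```

The values for df = 2, 9, and 25 should agree with the .05 column of Table D to two decimals, and a very large df (1000 is used here as a stand-in for ∞) should give a value close to the normal-curve figure of 1.96.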

12.2.15      Table D differs in several ways from the normal-curve table. In the normal-curve table, the z scores are on the margin of the table and the probability figures are in the body of the table.

12.2.16      Illustration of Normal Distribution       

12.2.17      Figure 9-2       

12.2.18      In the t-distribution table, the opposite is true; the t values are in the body of the table and the probability figures are on the top and bottom margins. Also, in the normal-curve table, you can find the exact probability of any z score; in Table D, the exact probability is given for only six t values. These six are commonly chosen as α levels by experimenters. Finally, if you wish to conduct a one-tailed test, use the probability figures shown under that heading at the bottom of Table D. Note that the probability figures are one-half those for a two-tailed test. You might draw a t distribution, put in values for a two-tailed test, and see for yourself that reducing the probability figure by one-half is appropriate for a one-tailed test.

12.2.19      As a general rule, researchers run two-tailed tests. If a one-tailed test is used, a justification is usually given. In this text we will routinely use two-tailed tests.

12.2.20      We'll use Student's t distribution to decide whether a particular sample mean came from a particular population.

12.2.21      A Belgian, Adolphe Quetelet ('Ka-tle) (1796-1874), is regarded as the first person to recognize that social and biological measurements may be distributed according to the "normal law of error" (the normal distribution). Quetelet made this discovery while developing actuarial (life expectancy) tables for a Brussels life insurance company. Later, he began making anthropometric (body) measurements and, in 1836, he developed Quetelet's Index (QI), a ratio in which weight in grams was divided by height in centimetres. This index was supposed to permit evaluation of a person's nutritional status: very large numbers indicated obesity and very small numbers indicated starvation.

12.2.22      Suppose a present-day anthropologist read that Quetelet had found a mean QI value of 375 on the entire population of French army conscripts. No standard deviation was given because it had not yet been invented. Our anthropologist, wondering if there has been a change during the last hundred years, obtains a random sample of 20 present-day Frenchmen who have just been inducted into the Army. She finds a mean of 400 and a standard deviation of 60. One now familiar question remains: "Should this mean increase of 25 QI points be attributed to chance or not?" To answer this question, we will perform a t test. As usual, we will require p ≤ .05 to reject chance as an explanation.

12.2.23      t Formula Logic       $t = \frac{\bar{X} - \mu}{s_{\bar{X}}} = \frac{400 - 375}{60/\sqrt{20}} = \frac{25}{13.42} = 1.86$, with df = N - 1 = 19

12.2.24      Upon looking in Table D under the column for a two-tailed test with α = .05 at the row for 19 df, you'll find a t value of 2.09. Our anthropologist's t is less than 2.09, so the null hypothesis should be retained and the difference between present-day soldiers and those of old should be attributed to chance.
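The anthropologist's test can be reproduced with a few lines of Python. The figures (population mean 375, sample mean 400, standard deviation 60, N = 20, critical t of 2.09 for 19 df) come from the example; the function name is only illustrative.

```python
from math import sqrt


def one_sample_t(sample_mean, population_mean, s, n):
    """t = (sample mean - population mean) / standard error of the mean."""
    standard_error = s / sqrt(n)
    return (sample_mean - population_mean) / standard_error


t = one_sample_t(sample_mean=400, population_mean=375, s=60, n=20)
df = 20 - 1

critical_t = 2.09  # Table D, two-tailed test, alpha = .05, df = 19
reject_null = abs(t) >= critical_t

print(round(t, 2), df, reject_null)
```

The standard error of the mean is 60 divided by the square root of 20, about 13.42, so t is about 1.86. That falls short of 2.09, and the null hypothesis is retained.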

12.2.25      Quetelet's Index is not currently used by anthropologists. There were several later attempts to develop a more reliable index of nutrition and most of those attempts were successful. Some of Quetelet's ideas are still around, though. For example, it was from Quetelet, it seems, that Francis Galton got the idea that the phenomenon of genius could be treated mathematically, an idea that led to correlation. (Galton seems to turn up in many stories about important concepts.)

12.3 Degrees of Freedom

12.3.1            Summary             The number of degrees of freedom is always equal to the number of observations minus the number of necessary relations obtaining among these observations OR The number of degrees of freedom is equal to the number of original observations minus the number of parameters estimated from the observations

12.3.2            You have been determining “degrees of freedom" by a rule-of-thumb technique: N - 1. Now it is time for us to explain the concept more thoroughly, in order to prepare you for statistical techniques in which df ≠ N - 1.

12.3.3            It is somewhat difficult to obtain an intuitive understanding of the concept of degrees of freedom without the use of mathematics. If the following explanation leaves you scratching your head, you might read Helen Walker's [10] excellent article in the Journal of Educational Psychology (Walker, 1940).

12.3.4            The freedom in degrees of freedom refers to freedom of a number to have any possible value. If you were asked to pick two numbers, and there were no restrictions, both numbers would be free to vary (take any value) and you would have two degrees of freedom. If, however, a restriction is imposed, namely, that ΣX = 20, one degree of freedom is lost because of that restriction. That is, when you now pick the two numbers, only one of them is free to vary. As an example, if you choose 3 for the first number, the second number must be 17. The second number is not free to vary, because of the restriction that ΣX = 20.

12.3.5            In a similar way, if you were to pick five numbers, with a restriction that ΣX = 20, you would have four degrees of freedom. Once four numbers are chosen (say, -5, 3, 16, and 8), the last number (-2) is determined.

12.3.6            The restriction that ΣX = 20 may seem to you to be an "out-of-the-blue" example and unrelated to your earlier work in statistics; in a way it is, but some of the statistics you have calculated have had a similar restriction built in. For example, when you found s, as required in the formula for t, you used some algebraic version of

12.3.7            Formula Standard Error of Mean for a Sample            $s = \sqrt{\frac{\sum (X - \bar{X})^2}{N - 1}}$, from which $s_{\bar{X}} = \frac{s}{\sqrt{N}}$

12.3.8            The restriction that is built in is that Σ(X - X̄) is always zero and, in order to meet that requirement, one of the X's is determined. All X's are free to vary except one, and the degrees of freedom for s is N - 1. Thus, for the problem of using the t distribution to determine whether a sample came from a population with a mean μ, df = N - 1. Walker (1940) summarizes the reasoning above by stating: "A universal rule holds: The number of degrees of freedom is always equal to the number of observations minus the number of necessary relations obtaining among these observations." A necessary relationship for s is that Σ(X - X̄) = 0. Another way of stating this rule is that the number of degrees of freedom is equal to the number of original observations minus the number of parameters estimated from the observations. In the case of s, one degree of freedom is subtracted because X̄ is used as an estimate of μ.
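The two restrictions discussed in this section can be checked directly. The sketch below uses the five numbers from the example; the helper name is hypothetical, and `statistics.stdev` is used only to confirm that dividing by N - 1 gives the same s as the library's sample standard deviation.

```python
from math import sqrt
from statistics import mean, stdev


def last_value(chosen, required_sum):
    """Once all but one number are chosen, the restriction fixes the last."""
    return required_sum - sum(chosen)


# Four numbers are free to vary; the restriction (sum of X = 20) fixes the fifth.
free_choices = [-5, 3, 16, 8]
fifth = last_value(free_choices, required_sum=20)
scores = free_choices + [fifth]

# The restriction built into s: deviations from the sample mean sum to zero.
x_bar = mean(scores)
deviations = [x - x_bar for x in scores]

# s with N - 1 in the denominator, one degree of freedom having been lost.
df = len(scores) - 1
s_manual = sqrt(sum(d * d for d in deviations) / df)

print(fifth, sum(deviations), df)
print(round(s_manual, 3), round(stdev(scores), 3))
```

Only four of the five deviations are free to vary here, because the fifth must bring the total to zero, which is why df = N - 1 = 4.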

12.4 Independent-Samples and Correlated-Samples Designs

12.4.1            Now we switch from the question of whether a sample came from a population with a mean, μ, to the more common question of whether two samples came from populations with identical means. That is, the mean of one group is compared with the mean of another group, and the difference is attributed to chance (null hypothesis retained) or to a treatment (null hypothesis rejected).

12.4.2            However, there are two kinds of two-groups designs. With an independent-samples design, the subjects serve in only one of the two groups, and there is no reason to believe that there is any correlation between the scores of the two groups. With a correlated-samples design, there is a correlational relationship between the scores of the two groups. The difference between these designs is important because the calculation of the t value for independent samples is different from the calculation for correlated samples. You may not be able to tell which design has been used just by looking at the numbers; instead, you must be able to identify the design from the description of the procedures in the experiment. The design dictates which formula for t to use. The purpose of both designs, however, is to determine the probability that the two samples have a common population mean.

12.4.3            Clue to the Future             Most of the rest of this chapter is organized around independent-samples and correlated-samples designs. Three-fourths of Chapter 15 (Nonparametric Statistics) is also organized around these two designs. In Chapters 12 (Analysis of Variance: One-Way Classification) and 13 (Analysis of Variance: Factorial Design), though, the procedures you will learn are appropriate only for independent samples.

12.4.4            Correlated-samples experiments are designed so that there are pairs of scores. One member of the pair is in one group, and the other member is in the second group. For example, you might ask whether fathers are shorter than their sons (or more religious, or more racially prejudiced, or whatever).

12.4.5            Table 9-1            

12.4.6            Table 9-2            

12.4.7            The null hypothesis is fathers = sons. In this design, there is a logical pairing of father and son scores, as seen in Table 9.1. Sometimes the researcher pairs up two subjects on some objective basis. Subjects with similar grade-point averages may be paired, and then one assigned to the experi­mental group and one to the control group. A third example of a correlated-samples design is a before-and-after experiment, with the dependent variable measured before and after the same treatment. Again, pairing is appropriate: the' 'before" score is paired with the "after" score for each individual.

12.4.8            Did you notice that Table 9.1 is the same as Table 5.1, which outlined the basic requirement for the calculation of a correlation coefficient? As you will soon see, that correlation coefficient is a part of determining whether $\mu_{\text{fathers}} = \mu_{\text{sons}}$.

12.4.9            In the independent-samples design, the subjects are often assigned randomly to one of the two groups, and there is no logical reason to pair a score in one group with a score in the other group. The independent-samples design corresponds to the experimental design outlined in Table 8.1. An example of an independent-samples design is shown in Table 9.2. The null hypothesis to be tested is $\mu_{\text{experimental}} = \mu_{\text{control}}$.

12.4.10      Both of these designs utilize random sampling. With an independent-samples design, the subjects are randomly selected from a population of individuals; in a correlated-samples design, pairs are randomly selected from a population of pairs.

12.5 Using the t Distribution for Independent Samples

12.5.1            The experiments in this section are similar to those in Chapter 10, except that now you are confronted with data for which the normal curve is not appropriate because N is too small. As before, the two samples are independent of each other. "Independent" means that there is no relationship between the groups before the independent variable is introduced. Independence is often achieved by random assignment of subjects to one or the other of the groups. Some textbooks express this lack of relationship by calling this design a "noncorrelated design" or an "uncorrelated design."

12.5.2            Using the t distribution to test a hypothesis is very similar to using the normal distribution. The null hypothesis is that the two populations have the same mean, and thus any difference between the two sample means is due to chance. The t distribution tells you the probability of obtaining a difference as large as the one you observe if the null hypothesis is true. You simply establish an $\alpha$ level, and if your observed difference is less probable than $\alpha$, reject the null hypothesis and conclude that the two samples came from populations with different means. If your observed difference is more probable than $\alpha$, retain the null hypothesis. Does this sound familiar? We hope so.

12.5.3            The way to find the probability of the observed difference is to use a t test. The probability of the resulting t value can be found in Table D. For an independent-samples design, the formula for the t test is

12.5.4            Independent-samples t Test            $t = \frac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}}$

12.5.5       The t test, like many other statistical tests, is a ratio of a statistic over a measure of variability. $\bar{X}_1 - \bar{X}_2$ is a statistic and, of course, $s_{\bar{X}_1 - \bar{X}_2}$ is a measure of variability. You have seen this basic form before and you will see it again.

12.5.6            Table 9.3 shows several formulas for calculating $s_{\bar{X}_1 - \bar{X}_2}$. Use formulas in the top half of the table when the two samples have an unequal number of scores. In the special situation where N1 = N2, the formulas simplify into those shown in the bottom half of Table 9.3. The deviation-score formulas are included in case you have to solve a problem without a calculator. If you have a calculator, you can work the problems more quickly by using the raw-score formulas.

12.5.7            The formula for degrees of freedom for independent samples is df = N1 + N2 - 2. The reasoning is as follows. For each sample, the number of degrees of freedom is N - 1, since, for each sample, $\sum (X - \bar{X}) = 0$. Thus, the total degrees of freedom is (N1 - 1) + (N2 - 1) = N1 + N2 - 2.

12.5.8            Table 9-3            

12.5.9            Table 9-4            

12.5.10      Here is an example of an experiment in which the results were analysed with an independent-samples t test. Thirteen monkeys were randomly assigned to either an experimental group (drug) or a control group (placebo). (Monkey research is very expensive, so experiments are carried out with small N's. Thus, small-sample statistical techniques are a must.) The experimental group (N = 7) was given the drug for eight days, while the control group (N = 6) was given a placebo (an inert substance). After eight days of injections, training began on a complex problem-solving task. Training and shots were continued for six days, after which the number of errors was tabulated. The number of errors each animal made and the t test are presented in Table 9.4.

12.5.11      Figure 9-3      

12.5.12      The null hypothesis is that the drug made no difference, that is, that the difference obtained was due just to chance. Since the N's are unequal for the two samples, the longer formula for the standard error must be used. Consulting Table D for 11 df, you'll find that a t = 2.20 is required in order to reject the null hypothesis with $\alpha$ = .05. Since the obtained t = -2.99, reject the null hypothesis. The final (and perhaps most important) step is to interpret the results. Since the experimental group, on the average, made fewer errors (39.71 vs. 57.33), we may conclude that the drug treatment facilitated learning. (We will often express tabled t values as t.05 (11 df) = 2.20. This gives you the critical value of t (2.20) for a particular df (11) and level of significance ($\alpha$ = .05).)

12.5.13      Notice that the absolute value of the obtained t (|t| = |-2.99| = 2.99) is larger than the tabled t (2.20). In order to reject the null hypothesis, the absolute value of the obtained t must be as great as, or greater than, the tabled t. The larger the obtained |t|, the smaller the probability that the difference between means occurred by chance. Figure 9.3 should help you see why this is so. Notice in Figure 9.3 that, as the values of |t| become larger, less and less of the area of the curve remains in the tails of the distribution. Remember that the area under the curve is a probability.
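The decision rule just described can be written as a short Python sketch. The function name reject_null is an illustrative assumption; the numbers in the first call are the ones quoted in the monkey example, and the value 1.50 in the second call is made up to show the opposite outcome.

```python
def reject_null(t_obtained: float, t_critical: float) -> bool:
    """Two-tailed rule: reject H0 when |obtained t| >= tabled t."""
    return abs(t_obtained) >= t_critical


# Obtained t = -2.99 against the tabled t.05 (11 df) = 2.201.
print(reject_null(-2.99, 2.201))

# A hypothetical obtained t that does not reach the tabled value.
print(reject_null(1.50, 2.201))
```

Because the rule compares absolute values, the sign of the obtained t affects only the interpretation (which mean is larger), not the decision.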

12.5.14      Recall that we have been conducting a two-tailed test. That is, the probability figure for a particular t value is the probability of +t or larger plus the probability of -t or smaller. In Figure 9.3, t.05 (11 df) = 2.201. This means that, if the null hypothesis is true, a t value of +2.201 or larger would occur 2 1/2 percent of the time and a t value of -2.201 or smaller would occur 2 1/2 percent of the time.

12.5.15      If you are working these problems with paper and pencil, Table A, "Squares, Square Roots, and Reciprocals," will be an aid to you. For example, 1/7 + 1/6 is easily converted into .143 + .167 with the reciprocals column, 1/N. Adding decimals is easier than adding fractions.

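12.5.16      Formulas and Procedure

Standard error of the difference between means

N1 ≠ N2
     Raw Score Formula
          $s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\left(\frac{\sum X_1^2 - \frac{(\sum X_1)^2}{N_1} + \sum X_2^2 - \frac{(\sum X_2)^2}{N_2}}{N_1 + N_2 - 2}\right)\left(\frac{1}{N_1} + \frac{1}{N_2}\right)}$
     Deviation Score Formula
          $s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\left(\frac{\sum x_1^2 + \sum x_2^2}{N_1 + N_2 - 2}\right)\left(\frac{1}{N_1} + \frac{1}{N_2}\right)}$

N1 = N2
     Formula
          $s_{\bar{X}_1 - \bar{X}_2} = \sqrt{s_{\bar{X}_1}^2 + s_{\bar{X}_2}^2}$
     Raw Score Formula
          $s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sum X_1^2 - \frac{(\sum X_1)^2}{N_1} + \sum X_2^2 - \frac{(\sum X_2)^2}{N_2}}{N_1 (N_2 - 1)}}$
     Deviation Score Formula
          $s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sum x_1^2 + \sum x_2^2}{N_1 (N_2 - 1)}}$

Variables Defined
     $s_{\bar{X}_1 - \bar{X}_2}$ = Standard error of the difference between means
     $N_1$, $N_2$ = number of scores in each sample
     $X_1$, $X_2$ = raw scores in each sample
     $x_1$, $x_2$ = deviation scores ($X - \bar{X}$) in each sample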
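As a rough illustration of the independent-samples procedure, the Python sketch below computes t and df from two lists of scores using the raw-score form of the standard error for unequal N's. The function name independent_t and the scores are assumptions made for illustration; they are not the data of Table 9.4.

```python
from math import sqrt


def independent_t(group1, group2):
    """Return (t, df) for two independent samples (pooled standard error)."""
    n1, n2 = len(group1), len(group2)
    mean1 = sum(group1) / n1
    mean2 = sum(group2) / n2
    # Raw-score sums of squares: sum(X^2) - (sum X)^2 / N
    ss1 = sum(x * x for x in group1) - sum(group1) ** 2 / n1
    ss2 = sum(x * x for x in group2) - sum(group2) ** 2 / n2
    df = n1 + n2 - 2
    # Standard error of the difference between means (works for N1 != N2)
    se_diff = sqrt((ss1 + ss2) / df * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se_diff, df


# Hypothetical scores, chosen only to keep the arithmetic easy to follow.
drug = [1, 2, 3]
placebo = [2, 4, 6, 8]
t, df = independent_t(drug, placebo)
print(round(t, 3), df)
```

The obtained t would then be compared with the tabled t for N1 + N2 - 2 degrees of freedom. When N1 = N2 the same function applies, since the equal-N formulas are a simplification of the one used here.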

12.6 Using the t Distribution for Correlated Samples (Some texts use the term dependent samples instead of correlated samples)

12.6.1            A correlated-samples design may come about in a number of ways. Fortunately, the actual arithmetic in calculating a t value is the same for any of the three correlated-samples designs. The three types of designs are natural pairs, matched pairs, and repeated measures.

12.6.2            Natural Pairs             In a natural-pairs investigation, the experimenter does not assign the subjects to one group or the other; the pairing occurs prior to the investigation. Table 9.1 identifies one way in which natural pairs may occur: father and son. Problems 8 and 13 describe experiments utilizing natural pairs.

12.6.3            Matched Pairs

In some situations, the experimenter has control over the ways pairs are formed. Matched pairs can be formed in several ways. One way is for two subjects to be paired on the basis of similar scores on a pretest that is related to the dependent variable. For example, a hypnotic susceptibility test might be given to a group of subjects. Two examples of hypnotic suggestibility pretests are cited in [11] and [12]. Subjects with similar scores could be paired and then one member of each pair randomly assigned to either the experimental or control group. The result is two groups equivalent in hypnotizability.

Another variation of matched pairs is the split-litter technique used with nonhuman animals. Half of a litter is assigned randomly to each group. In this way, the genetics of one group is matched with that of the other. The same technique has been used in human experiments with twins or siblings. Student's barley experiments and the experiment described in Problem 9 are examples of starting with two similar subjects and assigning them at random to one of two treatments.

Still another example of the matched-pairs technique is the treatment of each member of the control group according to what happens to its paired member in the experimental group. Because of the forced correspondence, this is called a yoked-control design. Problem 11 describes a yoked-control design.

The difference between the matched-pairs design and a natural-pairs design is that, with the matched pairs, the investigator can randomly assign one member of the pair to a treatment. In the natural-pairs design, the investigator has no control over assignment. Although the statistics are the same, the natural-pairs design is usually open to more interpretations than the matched-pairs design.
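One way to picture the pretest-matching procedure is the Python sketch below. It is only an illustration under stated assumptions: the function name matched_pairs_assignment, the subject labels, and the pretest scores are all hypothetical, and pairing adjacent subjects after sorting is just one reasonable way of pairing "similar scores."

```python
import random


def matched_pairs_assignment(pretest_scores, seed=None):
    """Pair subjects with adjacent pretest scores, then split each pair at random.

    pretest_scores maps a subject label to a pretest score. With an odd
    number of subjects, the highest-scoring subject is left unassigned.
    Returns (experimental, control) lists of subject labels.
    """
    rng = random.Random(seed)
    ranked = sorted(pretest_scores, key=pretest_scores.get)
    experimental, control = [], []
    for i in range(0, len(ranked) - 1, 2):
        pair = [ranked[i], ranked[i + 1]]
        rng.shuffle(pair)  # random assignment within the pair
        experimental.append(pair[0])
        control.append(pair[1])
    return experimental, control


# Hypothetical pretest scores for six subjects.
scores = {"A": 12, "B": 30, "C": 14, "D": 29, "E": 21, "F": 20}
exp_group, ctl_group = matched_pairs_assignment(scores, seed=1)
print(exp_group, ctl_group)
```

The two lists stay aligned by position, so the i-th experimental subject and the i-th control subject form one pair when the scores are later analysed.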

12.6.4            Repeated Measures

A third kind of correlated-samples design is called a repeated-measures design because more than one measure is taken on each subject. This design often takes the form of a before-and-after experiment. A pretest is given, some treatment is administered, and a post-test is given. The mean of the scores on the post-test is compared with the mean of the scores on the pretest to determine the effectiveness of the treatment. Clearly, there are two scores that should be paired: the pretest and the post-test scores of each subject. In such an experiment, each person is said to serve as his or her own control.

All three of these methods of forming groups have one thing in common: a meaningful correlation may be calculated for the data. The name correlated samples comes from this fact. With a correlated-samples design, one variable is designated X, the other Y.

12.6.5            Calculating a t Value for Correlated Samples

The formula for t when the data come from correlated samples has a familiar theme: a difference between means divided by the standard error of the difference. The standard error of the difference between means of correlated samples is symbolized $s_{\bar{D}}$. One formula for a t test between correlated samples is

     $t = \frac{\bar{X} - \bar{Y}}{s_{\bar{D}}}$

where
     $s_{\bar{D}} = \sqrt{s_{\bar{X}}^2 + s_{\bar{Y}}^2 - 2 r_{XY} (s_{\bar{X}})(s_{\bar{Y}})}$
     df = N - 1, where N = the number of pairs

The number of degrees of freedom in a correlated-samples case is the number of pairs minus one. Although each pair has two values, once one value is determined, the other is restricted to a similar value. (After all, they are called correlated samples.) In addition, another degree of freedom is subtracted when $s_{\bar{D}}$ is calculated. This loss is similar to the loss of 1 df when s is calculated.

As you can see by comparing the denominator of the correlated-samples t test with that of the t test for independent samples (when N1 = N2), the difference lies in the term $2 r_{XY} (s_{\bar{X}})(s_{\bar{Y}})$. Of course, when $r_{XY} = 0$, this term drops out of the formula, and the standard error is the same as for independent samples.

Also notice what happens to the standard-error term in the correlated-samples case where r > 0: the standard error is reduced. Such a reduction will increase the size of t. Whether this reduction will increase the likelihood of rejecting the null hypothesis depends on how much t is increased, since the degrees of freedom in a correlated-samples design are fewer than in the independent-samples design.

The formula $s_{\bar{D}} = \sqrt{s_{\bar{X}}^2 + s_{\bar{Y}}^2 - 2 r_{XY} (s_{\bar{X}})(s_{\bar{Y}})}$ is used only for illustration purposes. There is an algebraically equivalent but arithmetically easier calculation called the direct-difference method, which does not require you to calculate r.

To find $s_{\bar{D}}$ by the direct-difference method, find the difference between each pair of scores, calculate the standard deviation of these difference scores, and divide the standard deviation by the square root of the number of pairs. To find a t value using the direct-difference method,

T value using Direct Difference Method
     $t = \frac{\bar{X} - \bar{Y}}{s_{\bar{D}}}$, where $s_{\bar{D}} = \frac{s_D}{\sqrt{N}}$ and $s_D = \sqrt{\frac{\sum D^2 - \frac{(\sum D)^2}{N}}{N - 1}}$

Here is an example of a correlated-samples design and a t-test analysis. Suppose you were interested in the effects of interracial contact on racial attitudes. You have a fairly reliable test of racial attitudes, in which high scores indicate more positive attitudes. You administer the test one Monday morning to a biracial group of fourteen 12-year-old boys who do not know each other but who have signed up for a weeklong community day camp. The campers then spend the next week taking nature walks, playing ball, eating lunch, swimming, and doing the kinds of things that camp directors dream up to keep 12-year-old boys busy. On Saturday morning, the boys are again given the racial-attitude test. Thus, the data consist of 14 pairs of before-and-after scores. The null hypothesis is that the mean of the population of "before" scores is equal to the mean of the population of "after" scores or, in terms of the specific experiment, that a week of interracial contact has no effect on racial attitudes.

Suppose the data in Table 9.5 were obtained. We will set $\alpha$ = .01 and perform the analysis. Using the sums of the $D$ and $D^2$ columns in Table 9.5, we can find $s_{\bar{D}}$.

Table 9-5

Since t.01 (13 df) = 3.01 and the obtained t exceeds this value in absolute size, this difference is significant beyond the .01 level. That is, p < .01. The "after" mean was larger than the "before" mean; therefore, we may conclude that, after the week of camp, racial attitudes were significantly more positive than before.

You might note that $\bar{X} - \bar{Y} = \bar{D}$, the mean of the difference scores. In the problem above, $\sum D = -81$ and N = 14, so $\bar{D} = \sum D / N = -81/14 = -5.79$.

Gosset preferred the correlated-samples design. In his agriculture experiments, he found a significant correlation between the yields of the old barley and the new barley grown on adjacent plots. This correlation reduced the standard-error term in the denominator of the t test, making the correlated-samples design more sensitive than the independent-samples design for detecting a difference between means.

Illustration Formulas

     Formula (Illustration formula)
          $s_{\bar{D}} = \sqrt{s_{\bar{X}}^2 + s_{\bar{Y}}^2 - 2 r_{XY} (s_{\bar{X}})(s_{\bar{Y}})}$

     Variables Defined
          $s_{\bar{D}}$ = standard error of the difference between correlated means
          df = N - 1
          N = number of pairs
          $s_{\bar{X}}$ or $s_{\bar{Y}}$ = Standard Error of Mean (see formula below)
          $r_{XY}$ = Correlation between X & Y

     Formula Standard Error of Mean
          $s_{\bar{X}} = s / \sqrt{N}$

     Variables Defined Standard Error of Mean
          $s_{\bar{X}}$ or $s_{\bar{Y}}$ = standard error of the mean of X or Y scores
          s = standard deviation of a sample
          N = sample size

     Procedure
          $s_{\bar{X}}$
               1. Determine the standard deviation of X scores
               2. Determine the square root of the total number of X scores
               3. Divide the result of step #1 (standard deviation of X scores) by the result of step #2 (square root of the number of X scores)
          $s_{\bar{Y}}$
               1. Determine the standard deviation of Y scores
               2. Determine the square root of the total number of Y scores
               3. Divide the result of step #1 (standard deviation of Y scores) by the result of step #2 (square root of the number of Y scores)
          $s_{\bar{D}}$
               1. Square $s_{\bar{X}}$ (multiply it by itself)
               2. Square $s_{\bar{Y}}$ (multiply it by itself)
               3. Add squared $s_{\bar{X}}$ to squared $s_{\bar{Y}}$
               4. Determine $r_{XY}$ (Correlation between X & Y)
               5. Multiply $r_{XY}$ by 2
               6. Multiply $s_{\bar{X}}$ by $s_{\bar{Y}}$
               7. Multiply the result of step #6 ($s_{\bar{X}} \times s_{\bar{Y}}$) by the result of step #5 ($r_{XY} \times 2$)
               8. Subtract the result of step #7 from the result of step #3 (squared $s_{\bar{X}}$ + squared $s_{\bar{Y}}$)
               9. Obtain the square root of step #8 to obtain $s_{\bar{D}}$
          (t) value
               Divide $\bar{X} - \bar{Y}$ by $s_{\bar{D}}$

Computation Formula (Direct-Difference Method)

     Formula
          $t = \frac{\bar{X} - \bar{Y}}{s_{\bar{D}}}$

     Variables Defined
          $s_{\bar{D}}$ = standard error of the difference between correlated means (direct-difference method)
          $s_{\bar{D}} = s_D / \sqrt{N}$
          $s_D$ = Standard deviation of the distribution of differences between correlated scores (direct-difference method)
          D = X - Y
          N = Number of pairs of scores

     Procedure
          ($s_D$) Standard deviation of the distribution of differences between correlated scores (direct-difference method)
               1. Create a column of difference scores. That is, find the difference between each pretest and posttest score (subtract the posttest score from the pretest score) and put that number in a column
               2. Create a column of squared differences. That is, multiply each difference score by itself
               3. Sum the column of squared differences (the column created in step 2)
               4. Sum the column of differences (step 1) and square the sum (multiply it by itself). Then divide this result by the number of score pairs
               5. Subtract the result of step 4 from the sum of the squared differences (step 3)
               6. Take the number of score pairs and subtract 1 from that number
               7. Divide the result of step 5 by the result of step 6 and take the square root to determine $s_D$
          (t) Score
               1. Find the difference between $\bar{X}$ and $\bar{Y}$
               2. Obtain the square root of the number of score pairs
               3. Divide $s_D$ by the result of step 2 to obtain $s_{\bar{D}}$
               4. Divide the result of step 1 by $s_{\bar{D}}$ to obtain the t score
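The two routes to the correlated-samples standard error can be compared with a small Python sketch. The function names direct_difference_t and illustration_se, and the four pairs of before/after scores, are assumptions made for illustration; they are not the Table 9.5 data.

```python
from math import sqrt


def direct_difference_t(x, y):
    """Return (t, df, s_dbar) using the direct-difference method."""
    n = len(x)
    d = [a - b for a, b in zip(x, y)]          # D = X - Y for each pair
    sum_d = sum(d)
    sum_d2 = sum(v * v for v in d)
    s_d = sqrt((sum_d2 - sum_d ** 2 / n) / (n - 1))
    s_dbar = s_d / sqrt(n)                     # standard error of the difference
    return (sum_d / n) / s_dbar, n - 1, s_dbar


def illustration_se(x, y):
    """Standard error of the difference from the formula that uses r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    r = sp / sqrt(ssx * ssy)                   # Pearson correlation of X and Y
    se_x = sqrt(ssx / (n - 1)) / sqrt(n)       # standard error of the mean of X
    se_y = sqrt(ssy / (n - 1)) / sqrt(n)       # standard error of the mean of Y
    return sqrt(se_x ** 2 + se_y ** 2 - 2 * r * se_x * se_y)


# Hypothetical before (x) and after (y) scores for four subjects.
before = [1, 2, 3, 4]
after = [2, 4, 5, 7]
t, df, s_dbar = direct_difference_t(before, after)
print(round(t, 3), df, round(s_dbar, 4), round(illustration_se(before, after), 4))
```

Both functions should give the same standard error apart from rounding, which is the sense in which the direct-difference method is algebraically equivalent to the illustration formula. The negative t reflects the D = X - Y convention: the "after" scores are larger.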

12.7 Using the t Distribution to Establish a Confidence Interval about a Mean Difference

12.7.1            Introduction

A confidence interval about a mean difference establishes a lower and an upper limit for the difference between the population means, usually with 95 percent confidence. The same interval can also be used to test the null hypothesis.

As you probably recall from Chapter 9 (Samples and Sampling Distributions), a confidence interval is a range of values within which a parameter is expected to be. A confidence interval is established for a specified degree of confidence, usually 95 percent or 99 percent.

In this section, you will learn how to establish a confidence interval about a mean difference. The problems here are similar to those dealt with in Chapter 9, except that

     1. Probabilities will be established with the t distribution rather than with the normal distribution.
     2. The parameter of interest is a difference between two population means rather than a population mean.

The first point can be dispensed with rather quickly. You have already practiced using the t distribution to establish probabilities; you will use Table D in this section, too.

The second point will require a little more explanation. The questions you have been answering so far in this chapter have been hypothesis-testing questions, of the form "Does $\mu_1 - \mu_2 = 0$?" You answered each question by drawing two samples, calculating the means, and finding the difference. If the probability of the difference was very small, the hypothesis $H_0$: $\mu_1 - \mu_2 = 0$ was rejected. Suppose you have rejected the null hypothesis but someone wants more information than that and asks, "What is the real difference between $\mu_1$ and $\mu_2$?" The person recognizes that the real difference is not zero but wonders what it is. You are being asked to make an estimate of $\mu_1 - \mu_2$.

When you establish a confidence interval about the difference between $\bar{X}_1$ and $\bar{X}_2$ (or $\bar{X}$ and $\bar{Y}$), you can state with a specified degree of confidence that $\mu_1 - \mu_2$ falls within the interval.

12.7.2            Confidence Intervals for Independent Samples

The sampling distribution of $\bar{X}_1 - \bar{X}_2$ is a t distribution with N1 + N2 - 2 degrees of freedom. The lower and upper limits of the confidence interval about a mean difference are found with the following formulas:

Confidence Interval Upper and Lower Limits
     LL = $(\bar{X}_1 - \bar{X}_2) - t(s_{\bar{X}_1 - \bar{X}_2})$
     UL = $(\bar{X}_1 - \bar{X}_2) + t(s_{\bar{X}_1 - \bar{X}_2})$

For a 95 percent confidence interval, use the t value in Table D associated with $\alpha$ = .05. For 99 percent confidence, change $\alpha$ to .01.

For an example, we will use the calculations you worked up in Problem 16 on the time required to do problems on the two different brands of desk calculators. We will establish a 95 percent confidence interval about the difference found. Substituting the results of your calculations into these formulas (Confidence Interval Calculation) gives the limits: .65 and 2.35 are the lower and upper limits of a 95 percent confidence interval for the mean difference between the two kinds of calculators.

One of the benefits of establishing a confidence interval about a mean difference is that you also test the null hypothesis, $\mu_1 - \mu_2 = 0$, in the process (see Natrella, 1960)[13]. If 0 were outside the confidence interval, then the null hypothesis would be rejected using hypothesis-testing procedures. In the example we just worked, the confidence interval was .65 to 2.35 minutes; a value of 0 falls outside this interval. Thus, we can reject $H_0$: $\mu_1 - \mu_2 = 0$ at the .05 level.

Sometimes, hypothesis testing is not sufficient and the extra information of confidence intervals is desirable. Here is one example of how this "extra information" on confidence intervals might be put to work in this calculator-purchasing problem. Suppose that the new brand is faster, but it is also more expensive. Is it still a better buy?

Through cost-benefit-analysis procedures, the purchasing agent can show that, given a machine life of five years, a reduction of time per problem of 1.7 minutes justifies the increased cost. If she has the confidence interval you just worked out, she can see immediately that such a difference in machines (1.7 minutes) is within the confidence interval. The new machines are the better buy.
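The limits of a confidence interval about a mean difference, and the null-hypothesis check that comes with them, might be sketched in Python as follows. The function names and the numbers are assumptions for illustration only: the mean difference of 3.0 and standard error of 1.0 are made up (they are not the Problem 16 results), and 2.101 is the standard tabled t for $\alpha$ = .05 with 18 df.

```python
def mean_difference_ci(mean_diff, se_diff, t_critical):
    """Return (lower, upper) limits: mean_diff -/+ t * standard error."""
    margin = t_critical * se_diff
    return mean_diff - margin, mean_diff + margin


def excludes_zero(lower, upper):
    """True when 0 lies outside the interval, i.e. H0: mu1 - mu2 = 0 is rejected."""
    return lower > 0 or upper < 0


# Hypothetical result: difference between means = 3.0, standard error = 1.0,
# tabled t.05 (18 df) = 2.101.
lower, upper = mean_difference_ci(3.0, 1.0, 2.101)
print(round(lower, 3), round(upper, 3), excludes_zero(lower, upper))
```

Note that the t value passed in is the tabled value for the chosen confidence level, not a t calculated from the data. The same two functions would serve for correlated samples if the standard error of the difference between correlated means and the tabled t for N - 1 df were supplied instead.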

12.7.3            Confidence Intervals for Correlated Samples

The sampling distribution of $\bar{X} - \bar{Y}$ is also a t distribution. The number of degrees of freedom is N - 1. As in the section on hypothesis testing of correlated samples, N is the number of pairs of scores. The lower and upper limits of the confidence interval about a mean difference between correlated samples are

Confidence Interval Correlated Samples
     LL = $(\bar{X} - \bar{Y}) - t(s_{\bar{D}})$
     UL = $(\bar{X} - \bar{Y}) + t(s_{\bar{D}})$

A word of caution is appropriate here. For confidence intervals for either independent or correlated samples, use a t value from Table D, not one calculated from the data.

The interpretation of a confidence interval about a difference between means is very similar to the interpretation you made of confidence intervals about a mean. Again, the method is such that repeated sampling from two populations will produce a series of confidence intervals, 95 (or 99) percent of which will contain the true difference between the population means. You have sampled only once, so the proper interpretation is that you are 95 (or 99) percent confident that the true difference falls between your lower and upper limits. It would probably be helpful to you to reread the material on interpreting a confidence interval about a mean (Confidence Intervals).

Degrees of Freedom
     N - 1

t score
     Use the t score from the table at alpha .05

Formulas
     Upper Limit (UL)
          $(\bar{X} - \bar{Y}) + t(s_{\bar{D}})$
     Lower Limit (LL)
          $(\bar{X} - \bar{Y}) - t(s_{\bar{D}})$

Variables Defined
     $s_{\bar{D}}$ = standard error of the difference between correlated means (direct-difference method)
     $s_{\bar{D}} = s_D / \sqrt{N}$
     $\bar{X}$ = Mean of X scores
     $\bar{Y}$ = Mean of Y scores
     t = the t value from the back of a statistics textbook (t distribution table) or from a t value calculator on the Web
     N = number of pairs of scores
     df = the degrees of freedom for this equation is N - 1

Procedure
     Upper Confidence Interval Calculation
          1. Subtract the Mean of Y scores from the Mean of X scores
          2. Multiply $s_{\bar{D}}$ by the t score found in the table. Look across from the degrees of freedom (N - 1) and under the alpha level (.05, .02, .001, etc.)
          3. Add the result of step #1 to the result of step #2 for the upper limit of the confidence interval
     Lower Confidence Interval Calculation
          1. Subtract the Mean of Y scores from the Mean of X scores
          2. Multiply $s_{\bar{D}}$ by the t score found in the table. Look across from the degrees of freedom (N - 1) and under the alpha level (.05, .02, .001, etc.)
          3. Subtract the result of step #2 from the result of step #1 for the lower limit of the confidence interval

12.8 Assumptions for Using the t Distribution

12.8.1            You can perform a t test on the difference between means on any two-group data you have or any that you can beg, borrow, buy, or steal. No doubt about it, you can easily come up with a t value using

12.8.2            Independent-samples t Test            $t = \frac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}}$

12.8.3            You can then attach a probability figure to your t value by deciding that the t distribution is an appropriate model of your empirical situation.

12.8.4            In a similar way, you can calculate a confidence interval about the difference between means in any two-group experiment. By deciding that the t distribution is an accurate model, you can claim you are "99 percent confident that the true difference between the population means is between thus and so."

12.8.5            But should you decide to use the t distribution? When is it an accurate reflection of the empirical probabilities?

12.8.6            The t distribution will give correct results when the assumptions it is based on are true for the populations being analysed. The t distribution, like the normal curve, is a theoretical distribution. In deriving the t distribution, mathematical statisticians make three assumptions.
     1. The dependent-variable scores for both populations are normally distributed.
     2. The variances of the dependent-variable scores for the two populations are equal.
     3. The scores on the dependent variable are random samples from the population.

12.8.7            Assumption 3 requires three explanations. First, in a correlated-samples design, the pairs of scores should be random samples from the population you are interested in.

12.8.8            Second, Assumption 3 ensures that any sampling errors will fall equally into both groups and that you may generalize from sample to population. Many times it is a physical impossibility to sample randomly from the population. In these cases, you should randomly assign the subjects available to one of the two groups. This will randomise errors, but your generalization to the population will be on less secure grounds than if you had obtained a truly random sample.

12.8.9            Third, Assumption 3 ensures the independence of the scores. That is, knowing one score within a group does not help you predict other scores in that same group. Either random sampling from the population or random assignment of subjects to groups will serve to achieve this independence.

12.8.10      Now we can return to the major question of this section: "When will the t distribution produce accurate probabilities?" The answer is "When random samples are obtained from populations that are normally distributed and have equal variances."

12.8.11      This may appear to be a tall order. It is, and in practice no one is able to demonstrate these characteristics exactly. The next question becomes “Suppose I am not sure my data have these characteristics. Am I likely to reach the wrong conclusion if I use Table D?" The answer to this question, fortunately, is "No."

12.8.12      The t test is a "robust" test, which means that the t distribution leads to fairly accurate probabilities, even when the data do not meet Assumptions 1 and 2. Boneau (1960)[14] used a computer to generate distributions when these two assumptions were violated. For the most part, he found that, even if the populations violate the assumptions, the t distribution reflects the actual probabilities. Boneau's most serious warning is that, when sample sizes are different (for example, N1 = 5 and N2 = 15), then a large violation of Assumption 2 (for example, one variance being four times the size of the other) produces a t value for which the tabled t distribution is a poor model. Under such circumstances, you may reject H0 when you should not.
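The kind of computer check described above can be imitated with a small simulation. The Python sketch below is not Boneau's procedure; it is a minimal illustration under stated assumptions: both populations are normal with the same mean (so the null hypothesis is true), N1 = 5 and N2 = 15, the smaller sample comes from the population whose variance is four times as large, and 2.101 is the standard tabled t for $\alpha$ = .05 with 18 df. The function names and the number of trials are arbitrary choices.

```python
import random
from math import sqrt


def pooled_t(g1, g2):
    """Independent-samples t using the pooled standard error."""
    n1, n2 = len(g1), len(g2)
    m1, m2 = sum(g1) / n1, sum(g2) / n2
    ss1 = sum((x - m1) ** 2 for x in g1)
    ss2 = sum((x - m2) ** 2 for x in g2)
    se = sqrt((ss1 + ss2) / (n1 + n2 - 2) * (1 / n1 + 1 / n2))
    return (m1 - m2) / se


def false_rejection_rate(n1, sd1, n2, sd2, t_critical, trials=20000, seed=0):
    """Proportion of true-H0 experiments in which |t| reaches the tabled t."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        g1 = [rng.gauss(0.0, sd1) for _ in range(n1)]
        g2 = [rng.gauss(0.0, sd2) for _ in range(n2)]
        if abs(pooled_t(g1, g2)) >= t_critical:
            rejections += 1
    return rejections / trials


T_05_18DF = 2.101  # tabled t for alpha = .05, df = 5 + 15 - 2 = 18

# Equal variances: the rate should be close to the nominal .05.
print(false_rejection_rate(5, 1.0, 15, 1.0, T_05_18DF))

# Smaller sample drawn from a population with four times the variance (sd 2 vs 1).
print(false_rejection_rate(5, 2.0, 15, 1.0, T_05_18DF))
```

If the second rate comes out well above .05, that is the situation the warning refers to: the null hypothesis is rejected more often than the tabled probability claims.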

12.8.13      Chapter 15 will give you other statistics with other distributions that you may use to test the difference between two samples when the first two assumptions of the t test are not valid.

12.9 Using the t Distribution to Test the Significance of a Correlation Coefficient

12.9.1            In Chapter 5, you learned to calculate Pearson product-moment correlation coefficients. This section is on testing the statistical significance of these coefficients. The question is whether an obtained r, based on a sample, could have come from a population of pairs of scores for which the parameter correlation is .00. The answer to this question is based on the size of a t value that is calculated from the correlation coefficient. The t value is found using the formula

12.9.2            (t) Value Using Correlation Coefficient                $t = r \sqrt{\frac{N - 2}{1 - r^2}}$, with df = N - 2, where N = the number of pairs of scores

12.9.3            The null hypothesis is that the population correlation is .00. Samples are drawn, and an r is calculated. The t distribution is then used to determine whether the obtained r is significantly different from .00.

12.9.4                      As an example, suppose you had obtained an r = .40 with 22 pairs of scores.

12.9.5            Does such a correlation indicate a significant relationship between the two variables, or should it be attributed to chance?

12.9.6            (t) Value Example            $t = .40 \sqrt{\frac{22 - 2}{1 - (.40)^2}} = .40 \sqrt{23.81} = 1.95$, with df = 22 - 2 = 20

12.9.7            Table D shows that, for 20 df, a t value of 2.09 is required to reject the null hypothesis. The obtained t for r = .40, where N = 22, is less than the tabled t, so the null hypothesis is retained. That is, a coefficient of .40 would be expected by chance alone more than 5 times in 100.

12.9.8            In fact, for N = 22, an r = .43 is required for significance at the .05 level and an r = .54 for the .01 level. As you can see, even medium-sized correlations can be expected by chance alone for samples as small as 22. Most researchers strive for N's of 30 or more for correlation problems.
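The significance test for a correlation coefficient can be sketched in a few lines of Python. The function names t_from_r and critical_r are illustrative assumptions; the tabled t values 2.086 and 2.845 are the standard two-tailed values for 20 df at the .05 and .01 levels (the text rounds the first to 2.09).

```python
from math import sqrt


def t_from_r(r, n_pairs):
    """Return (t, df) for testing whether a sample r differs from .00."""
    df = n_pairs - 2
    return r * sqrt(df / (1 - r ** 2)), df


def critical_r(t_critical, df):
    """Smallest |r| that reaches significance, found by solving the t formula for r."""
    return t_critical / sqrt(t_critical ** 2 + df)


# The example from the text: r = .40 based on 22 pairs of scores.
t, df = t_from_r(0.40, 22)
print(round(t, 2), df)

# Approximate r needed with 20 df at the .05 and .01 levels.
print(round(critical_r(2.086, 20), 3), round(critical_r(2.845, 20), 3))
```

The obtained t falls short of the tabled value, so the null hypothesis is retained, and the computed minimum r values come out close to the .43 and .54 quoted for N = 22.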

12.10                   START

12.10.1      Sometimes you may wish to determine whether the difference between two correlations is statistically significant. Several texts discuss this test (Ferguson, 1976, p. 184 [15]; Guilford & Fruchter, 1978, p. 163 [16]).

12.11                   Purpose

12.11.1      This test assesses whether the means of two groups are statistically different from one another. The t-test can be used to assess the effectiveness of a treatment by comparing the means of the treatment and control groups or, alternatively, by comparing the means of the same group before and after treatment. In either case, this test is indicated when you want to compare the means of two groups, especially in the analysis of the posttest-only two-group randomized experimental design.

12.11.2      T-Test for the Significance of the Difference between the Means of Two Correlated Samples         

12.11.3      Example        You could substitute the control group mean with pre treatment group mean and the treatment group with the post treatment group me