ED 602

Statistical Research for Behavioral Sciences

Brian G. Smith, Ph.D.

Lesson 8

You may also check your understanding of the material on the Wadsworth web site. Click on the Publisher Help Site button.

 

Homework - Lesson 8

Any student may do the assignments from any area. You may run through this work an unlimited number of times. If you make errors, you will be referred to the appropriate area of the book for re-study.

 


 

Assessment - Lesson 8

You will have two attempts at the quiz. If you do not achieve 100% on the quiz, you will not be able to advance to the next lesson. If you do not pass on the second attempt, please email the instructor at ed602@mnstate.edu so remedial action can be taken.

Homework and Quizzes are on Desire 2 Learn. Click on the Desire 2 Learn link, log in, select the Homework/Quizzes icon and choose the appropriate homework or quiz.

Assignments and Information

 
Reading: Chapter 16
  Definition Page: Contains definitions arranged alphabetically.

 

Notes
So far in this course we have been looking at describing data from experiments. What change do we see in our dependent variable when we manipulate the independent variable? Correlation asks a very different question. Correlation looks at the statistical relationship between two variables.
Histograms were a graphic representation of frequency distributions. They let us see the basic shape of our distribution, so we would know if it is normal or skewed, unimodal or multimodal, tightly compressed or spread out. For correlation, we use scatterplots to visually inspect our data.

Using Scatterplots

  • plot 2 variables, one on the X axis, the other on the Y axis
  • look for relationships
  • positive relationship - points on the graph go from the lower left corner to the upper right. In other words, as X increases so does Y.
  • negative relationship - points on the graph go from the upper left corner to the lower right. In other words, as X increases, Y decreases.
  • near-zero relationship - points on the graph are all over. In other words, there is no obvious relationship between X and Y.
  • non-linear relationship (also called U-shaped or curvilinear) - the points on the graph move in one direction for a while and then turn and go in the other direction. For example, as X increases, Y increases for a while and then decreases. (Think of weight and health: if X starts low with malnourished subjects, increasing weight improves health, but once you pass the normal weight range and approach obese weights, health starts dropping again.)
  • strength of relationship - the stronger the relationship the more accurately you can predict the Y score for a known X score. Strong relationships look like a line, weak relationships look more like a football.

Pearson Correlation Coefficient (also called the Pearson r and the Pearson product-moment Correlation Coefficient) - a statistic that indicates the degree of linear relationship between two variables measured at the interval or ratio level.

Using the Pearson r

  • The value of r gives you information about the strength and direction of the relationship
  • The closer the r is to ±1.00, the stronger the relationship is. The closer the r is to 0.00, the weaker the relationship is.
  • A positive r means a positive relationship.
  • A negative r means a negative relationship.
  • r has a range of -1.00 to +1.00, so if you get an r above +1.00 or below -1.00, that means you goofed something up in the math.
  • for most educational research you need your r to be significant at an alpha level of 0.05 or lower.
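To make the points above concrete, here is a minimal Python sketch of the deviation-score form of the Pearson r. The function name `pearson_r` and the small toy data sets are illustrative assumptions, not part of the course materials; the lesson itself relies on SPSS for the real calculations.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r from deviation scores: SP_xy / sqrt(SS_x * SS_y)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of the paired deviations from each mean
    sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when a variable has no variability")
    return sp_xy / sqrt(ss_x * ss_y)

# Made-up toy data, for illustration only
x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))  # 1.0  (perfect positive relationship)
print(pearson_r(x, [10, 8, 6, 4, 2]))  # -1.0 (perfect negative relationship)
print(pearson_r(x, [2, 1, 4, 3, 5]))   # 0.8  (positive, but weaker)
```

The sign of the result gives the direction of the relationship and its distance from 0.00 gives the strength. A result outside the range -1.00 to +1.00 would signal an arithmetic error.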

Before you can use Pearson r for your data, you must make sure these assumptions have been met.

  • You are using interval or ratio scale data.
  • There is a linear relationship between the variables. You cannot use the Pearson r on data whose scatterplot is non-linear.
  • You used random sampling for your subjects.
  • Your data is fairly normally distributed.
  • For each possible X on the scale, there “should be” a Y score. A restricted range of scores could hide the true relationship between variables.

Correlation and Causality

Correlation can show a relationship between two variables, but you can almost never claim a cause and effect relationship. You don’t want to be the researcher who says that there should be a ban on ice cream sales because you found that there is a strong positive relationship between ice cream sales and boating accidents. While it is true that ice cream sales and boating accidents both increase dramatically in the summer, you can’t really claim that eating ice cream causes boating accidents. There is a possible outside influence (summer) that could play a part in that high correlation.

Coefficient of Determination (r²)

  • The value of Pearson r squared.
  • Shows you how much shared information there is between two variables, that is, the proportion of variance the two variables have in common. For example, an r of 0.855 gives an r² of about 0.73, so roughly 73% of the variance is shared.

Spearman Rank-order Correlation Coefficient (rs)

  • Used in place of the Pearson r when data on both variables is in rank order (on the ordinal scale)
  • A positive rs means a positive relationship.
  • A negative rs means a negative relationship.
  • rs has a range of -1.00 to +1.00, so if you get an rs above +1.00 or below -1.00, that means you goofed something up in the math.
  • for most educational research you need your rs to be significant at an alpha level of 0.05 or lower.
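As a rough illustration of how a rank-order correlation can be computed, the sketch below converts each variable to ranks (tied scores share the average of their ranks) and then applies the Pearson formula to those ranks. This is one common way to obtain the Spearman coefficient; the function names and the two sets of judge rankings are hypothetical and are not taken from the text.

```python
from math import sqrt

def average_ranks(values):
    """Rank from 1 (smallest) upward; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover every value tied with the one at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman_rs(xs, ys):
    """Spearman coefficient: the Pearson formula applied to the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sp = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    ss_x = sum((a - mx) ** 2 for a in rx)
    ss_y = sum((b - my) ** 2 for b in ry)
    return sp / sqrt(ss_x * ss_y)

# Hypothetical example: two judges rank the same five contestants
judge_a = [1, 2, 3, 4, 5]
judge_b = [2, 1, 4, 3, 5]
print(spearman_rs(judge_a, judge_b))      # 0.8
print(average_ranks([10, 20, 20, 30]))    # [1.0, 2.5, 2.5, 4.0]
```

Because only the ordering matters, any pair of variables that rise together without exception gives a coefficient of 1.00, even when the relationship is not a straight line.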

Now let's try some calculations based on our BASC study.

Let's assume that while we are gathering data on the BASC to look for cultural bias, we become curious about how these scores can be used. Is there a relationship between BASC scores and the number of detentions a student will get in a semester? Is there a relationship between BASC scores and scores on a test of sociability that is much easier to administer? Is there a relationship between BASC scores and IQ? We gather the following data on the 25 Hispanic youth who are participating in our original study.

Student #   BASC   Detentions   Sociability   IQ

    1        25        0            73        101
    2        32        0            78         87
    3        37        0            70         96
    4        40        0            69        105
    5        41        1            74         89
    6        44        2            73        115
    7        45        2            67         94
    8        48        0            71         91
    9        49        1            66        109
   10        50        0            69        102
   11        51        1            68         99
   12        51        1            62         98
   13        52        2            59        103
   14        52        3            46         85
   15        52        2            55        107
   16        55        4            58        103
   17        55        6            39         83
   18        57        5            46        112
   19        59        5            41         97
   20        60        6            38        101
   21        62        9            40         86
   22        63        7            36        104
   23        65        6            28        108
   24        67       10            29         95
   25        73       12            14        100

Using our SPSS program, we enter this data into the spreadsheet in columns 1-4. Make sure to save the data, since we will be using it again in Lesson 9. Then go to the toolbar at the top and click on “graphs”. Slide down the list to “Scatter...” and click again. Keep it on “simple” and click “define”. I put the BASC data on the X axis and placed the other variables on the Y axis, one at a time. Here are the graphs I got; see if they match yours.

The information typed underneath each graph also came from SPSS. Go up to the tool bar and click “Analyze”, “correlate”, “bivariate” and select the two variables you are interested in.

           
a. Correlations

                                  BASC    DETENTION
BASC        Pearson Correlation   1       .855
            Sig. (2-tailed)       .       .000
            N                     25      25
DETENTION   Pearson Correlation   .855    1
            Sig. (2-tailed)       .000    .
            N                     25      25

** Correlation is significant at the 0.01 level (2-tailed).

This, in English, states that there is a Pearson r of 0.855, which is significant at an alpha of less than 0.01.

N = 25 means that all 25 subjects have a score on each variable.

           
           
b. Correlations

                                    BASC    SOCIABILITY
BASC          Pearson Correlation   1       -.892
              Sig. (2-tailed)       .       .000
              N                     25      25
SOCIABILITY   Pearson Correlation   -.892   1
              Sig. (2-tailed)       .000    .
              N                     25      25

** Correlation is significant at the 0.01 level (2-tailed).

This, in English, states that there is a Pearson r of -0.892, which is significant at an alpha of less than 0.01.

N = 25 means that all 25 subjects have a score on each variable.

           
           
c. Correlations

                             BASC    IQ
BASC   Pearson Correlation   1       .089
       Sig. (2-tailed)       .       .672
       N                     25      25
IQ     Pearson Correlation   .089    1
       Sig. (2-tailed)       .672    .
       N                     25      25

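This, in English, states that there is a Pearson r of 0.089, which is not significant (the two-tailed significance value is .672, far above 0.05). This is also evident from the way the points scatter across the graph.

N = 25 means that all 25 subjects have a score on each variable.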
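If you do not have SPSS at hand, the three coefficients can be cross-checked with a short script. The sketch below is only an illustration, not the SPSS procedure: the scores are copied from the data table in this lesson, while the variable names and the helper `pearson_r` are assumptions made for the example.

```python
from math import sqrt

# Scores for students 1-25, copied from the data table in this lesson
basc = [25, 32, 37, 40, 41, 44, 45, 48, 49, 50, 51, 51, 52,
        52, 52, 55, 55, 57, 59, 60, 62, 63, 65, 67, 73]
detentions = [0, 0, 0, 0, 1, 2, 2, 0, 1, 0, 1, 1, 2,
              3, 2, 4, 6, 5, 5, 6, 9, 7, 6, 10, 12]
sociability = [73, 78, 70, 69, 74, 73, 67, 71, 66, 69, 68, 62, 59,
               46, 55, 58, 39, 46, 41, 38, 40, 36, 28, 29, 14]
iq = [101, 87, 96, 105, 89, 115, 94, 91, 109, 102, 99, 98, 103,
      85, 107, 103, 83, 112, 97, 101, 86, 104, 108, 95, 100]

def pearson_r(xs, ys):
    """Pearson r computed from deviations around each mean."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / sqrt(ss_x * ss_y)

for name, scores in [("Detentions", detentions),
                     ("Sociability", sociability),
                     ("IQ", iq)]:
    print(f"BASC vs {name}: r = {pearson_r(basc, scores):.3f}, N = {len(scores)}")
# BASC vs Detentions: r = 0.855, N = 25
# BASC vs Sociability: r = -0.892, N = 25
# BASC vs IQ: r = 0.089, N = 25
```

The rounded values agree with the Pearson Correlation entries in the three SPSS tables. The script does not compute the significance values.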

 

           

 

SPSS Tips:

 

 

We can see from the graphs and the Pearson r scores that there is a positive relationship between our BASC scores and the number of detentions a student serves. The higher the BASC score, the more detentions a student is likely to serve. We can also see that we have an r of 0.855, which is statistically significant at an alpha of 0.01. (On page 709 in the text you will find the table of critical values of r. We have 25 pairs of scores, so our degrees of freedom are 2 less, which is 23. We would need an r of at least 0.396 to have significance at the alpha of 0.05, or an r of at least 0.505 for significance at the 0.01 alpha level. We are well above both, and would report our significance at the 0.01 alpha level.)

We can see from the graphs and the Pearson r scores that there is a negative relationship between BASC scores and scores on the sociability scale. (This is because BASC scores get worse as they get higher, and sociability scores get better as they get higher.) The higher a student’s score on the BASC the lower their score is likely to be on the sociability measure. We can also see that we have an r of -0.892, which again is significant at the 0.01 alpha level.

Finally, there seems to be no relationship between BASC scores and IQ scores. The points are all over the graph. The Pearson r for this correlation is 0.089, which is not significant at either the 0.05 or the 0.01 alpha level.
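The table lookup described above can be written out as a small sketch. It uses only numbers already given in this lesson: the three r values from SPSS, N = 25, and the critical values 0.396 and 0.505 quoted from the text's table for 23 degrees of freedom. The names `reported_r`, `critical_r`, and `significance_label` are hypothetical, chosen for the example. It also prints r², the coefficient of determination, for each pair.

```python
# r values reported by SPSS in this lesson (25 pairs of scores each)
reported_r = {"Detentions": 0.855, "Sociability": -0.892, "IQ": 0.089}
n_pairs = 25
df = n_pairs - 2  # degrees of freedom = number of pairs minus 2

# Critical values of r for df = 23, as quoted from the table in the text
critical_r = {0.05: 0.396, 0.01: 0.505}

def significance_label(r, critical):
    """Return the strictest alpha level whose critical value |r| reaches."""
    for alpha in sorted(critical):  # checks 0.01 first, then 0.05
        if abs(r) >= critical[alpha]:
            return f"significant at alpha = {alpha}"
    return "not significant"

print(f"df = {df}")
for name, r in reported_r.items():
    label = significance_label(r, critical_r)
    print(f"BASC vs {name}: r = {r:+.3f}, r^2 = {r * r:.3f}, {label}")
# df = 23
# BASC vs Detentions: r = +0.855, r^2 = 0.731, significant at alpha = 0.01
# BASC vs Sociability: r = -0.892, r^2 = 0.796, significant at alpha = 0.01
# BASC vs IQ: r = +0.089, r^2 = 0.008, not significant
```

The comparison uses the absolute value of r, because a negative r is judged against the same critical value as a positive one.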

For those of you who want to try the math by hand, to get a better feel for how the formula works, let's double-check the computer's r for BASC scores and Sociability. I have made a table of the scores, their squares, and the products of the BASC scores and the sociability scores.

We will use the computational formula, since it works directly with our raw scores.

     
 
 
 
a) We fill in the data from the table.

b) We do our multiplying and exponents.

c) We do the subtracting within the parentheses.

d) We multiply again.

e) We find the square root.

f) And finally, we divide.

 

As you can see, we end up with the same r, -0.892, that the SPSS program gave us.
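The worked equations for steps a) through f) are not reproduced in this copy of the lesson. What follows is a reconstruction, assuming the standard raw-score computational formula for r. The sums were recomputed from the data table above, with X = BASC, Y = Sociability, and N = 25, so the intermediate figures may be rounded differently from the original.

```latex
% Sums from the data table: \sum X = 1285, \sum Y = 1369,
% \sum X^2 = 69015, \sum Y^2 = 82459, \sum XY = 66163, N = 25
\begin{align*}
r &= \frac{N\sum XY - \sum X \sum Y}
          {\sqrt{\left[N\sum X^2 - \left(\sum X\right)^2\right]
                 \left[N\sum Y^2 - \left(\sum Y\right)^2\right]}} \\[4pt]
\text{a)}\quad r &= \frac{25(66163) - (1285)(1369)}
          {\sqrt{\left[25(69015) - (1285)^2\right]
                 \left[25(82459) - (1369)^2\right]}} \\[4pt]
\text{b)}\quad r &= \frac{1654075 - 1759165}
          {\sqrt{\left[1725375 - 1651225\right]\left[2061475 - 1874161\right]}} \\[4pt]
\text{c)}\quad r &= \frac{-105090}{\sqrt{(74150)(187314)}} \\[4pt]
\text{d)}\quad r &= \frac{-105090}{\sqrt{13889333100}} \\[4pt]
\text{e)}\quad r &= \frac{-105090}{117853.02} \\[4pt]
\text{f)}\quad r &\approx -0.892
\end{align*}
```

The negative numerator in step c) is what makes r negative: the products of paired scores add up to less than would be expected if BASC and sociability rose together.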

Vocabulary

Bivariate distribution – A distribution in which two scores are obtained from each subject.

Scatterplot – A graph of a bivariate distribution in which the X variable is plotted on the horizontal axis and the Y variable is plotted on the vertical axis.

Correlational studies – Studies in which two or more variables are measured to find the direction and degree to which they covary.

Covary – Two variables covary when a change in one variable is related to a change in the other variable.

Linear relationship – A relationship between two variables that can be described by a straight line.

Positive relationship – A linear relationship between two variables in which as the value of the first variable increases, the value of the second variable tends to increase as well.

Negative relationship – A linear relationship between two variables in which as the value of one variable increases, the value of the other variable tends to decrease.

Curvilinear relationship – A relationship between two variables that is not linear. It starts with a positive or negative trend, but at some midpoint changes direction forming a U-shaped curve.

Near-zero relationship – A bivariate distribution that has no obvious relationship between variables.

Correlation coefficient – A descriptive statistic that expresses the degree of relationship between two variables.

Pearson correlation coefficient (r) – A statistic that indicates the degree of linear relationship between two variables that have been measured at either the interval or ratio level.

Sum of products of X and Y (SP_XY) – The value of $SP_{XY} = \sum (X - \bar{X})(Y - \bar{Y})$ for variables X and Y, where $\bar{X}$ and $\bar{Y}$ are the means of X and Y.

Coefficient of determination (r²) – The value of r squared, indicating the proportion of variance that variables X and Y have in common.

Spearman rank-order correlation coefficient (rs) – A correlation coefficient used with ordinal measurements.