Bivariate Correlation & Regression

 Correlation measures:

        1.  Direction of relationship (+ or -)

        2.  Form of relationship (linear is most common)

        3.  Degree of relationship (measured by the numerical value of the correlation, ignoring its sign.  A value of 1.00 indicates a perfect relationship and a value of zero indicates no relationship)

Examples of different values for linear correlations: (a) shows a strong positive relationship, approximately +0.90; (b) shows a relatively weak negative correlation, approximately -0.40; (c) shows a perfect negative correlation, -1.00; (d) shows no linear trend, 0.00.
To compute a correlation you need two scores, X and Y, for each individual in the sample. 
The Pearson correlation requires that the scores be numerical values from an interval or ratio scale of measurement. 
Other correlational methods exist for other scales of measurement.
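
The computation can be sketched in a few lines of Python. This is a minimal illustration, not part of the original notes: the function name pearson_r and the five pairs of scores are made up, and the formula used is the standard sum-of-products form, r = SP / sqrt(SSx * SSy).

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation for paired scores (interval or ratio data)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (X, Y) pairs of equal length")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # SP: sum of products of deviations (how X and Y vary together)
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # SS: sums of squared deviations (how X and Y vary separately)
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

# Hypothetical scores: one X and one Y for each of five individuals
x_scores = [1, 2, 3, 4, 5]
y_scores = [2, 4, 5, 4, 5]
r = pearson_r(x_scores, y_scores)
print(round(r, 3))  # 0.775
```

The sign of the result gives the direction of the relationship and its size gives the degree. Note that the function fails with a division by zero if all X scores (or all Y scores) are identical, because the correlation is undefined in that case.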

Conceptually, the Pearson correlation coefficient computes:

        r = (degree to which X and Y vary together) / (degree to which X and Y vary separately)

Regression--the statistical technique used to find the best-fitting straight line (i.e., the regression line) for a set of data.

The total error between the data points and the line is obtained by squaring each distance and then summing the squared values:   Total squared error = Σ(Y - Ŷ)²
The regression equation is designed to produce the minimum sum of squared errors.
The best-fitting line has the smallest total squared error. This line is called the least-squared-error solution and is expressed as: Ŷ = bX + a

where the value of "b" is the slope and the value of "a" is the y-intercept
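
As a rough sketch of how the slope and intercept can be found, the Python below uses the standard least-squares formulas b = SP / SSx and a = (mean of Y) - b * (mean of X). These formulas are not given in the notes above, and the function names and data are made up for illustration.

```python
def fit_line(xs, ys):
    """Return (b, a) for the least-squares line Y-hat = b*X + a."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    b = sp / ss_x              # slope
    a = mean_y - b * mean_x    # Y-intercept
    return b, a

def total_squared_error(xs, ys, b, a):
    """Sum of squared distances between actual Y and predicted Y-hat."""
    return sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))

# Hypothetical scores for five individuals
x_scores = [1, 2, 3, 4, 5]
y_scores = [2, 4, 5, 4, 5]

b, a = fit_line(x_scores, y_scores)
print(round(b, 2), round(a, 2))  # 0.6 2.2

# The fitted line has a smaller total squared error than a nearby line
print(round(total_squared_error(x_scores, y_scores, b, a), 2))      # 2.4
print(round(total_squared_error(x_scores, y_scores, 0.7, 2.2), 2))  # 2.95
```

For these made-up scores the regression equation is Ŷ = 0.6X + 2.2. Changing the slope or the intercept in either direction increases the total squared error.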



Caution when interpreting predicted values from regression equations:

1) The predicted value is not perfect. There will be some error between the predicted Y values and the actual data. As the absolute value of the correlation coefficient gets closer to zero, the magnitude of the error increases.

The unpredicted variability can be used to compute the standard error of estimate, which is a measure of the average distance between the actual Y values and the predicted Y values.

The standard error of estimate (SEE) provides a measure of how accurately the regression equation predicts the Y values.  For example, an SEE of 2.16 would tell us that the standard or average distance between the actual data points and the regression line is 2.16 units.  There will be, on average, 2.16 units of discrepancy between the predicted values obtained from the regression equation and the actual values in the data.
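
A minimal sketch of the SEE calculation follows. It assumes the common textbook definition, SEE = sqrt(SSresidual / (n - 2)), which the notes above do not spell out. The data are made up, and the slope 0.6 and intercept 2.2 are the least-squares values for those made-up scores.

```python
import math

def standard_error_of_estimate(xs, ys, b, a):
    """SEE for the line Y-hat = b*X + a, using df = n - 2."""
    n = len(xs)
    if n <= 2:
        raise ValueError("need more than two data points (df = n - 2)")
    # Unpredicted (residual) variability: sum of squared prediction errors
    ss_residual = sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(ss_residual / (n - 2))

# Hypothetical scores and their least-squares line Y-hat = 0.6X + 2.2
x_scores = [1, 2, 3, 4, 5]
y_scores = [2, 4, 5, 4, 5]
see = standard_error_of_estimate(x_scores, y_scores, b=0.6, a=2.2)
print(round(see, 3))  # 0.894
```

Here the actual Y values fall, in the standard-distance sense, about 0.89 units away from the regression line.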


 2) The regression equation should not be used to make predictions for X values that fall outside of the range of values covered by the sample data.
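
One way to enforce the second caution in code is to refuse predictions for X values outside the sampled range. The helper below is a hypothetical sketch. Its name, the made-up data, and the decision to raise an error rather than issue a warning are all assumptions.

```python
def predict_within_range(x_new, xs, b, a):
    """Predict Y-hat = b*x_new + a only if x_new lies inside the sampled X range."""
    lowest, highest = min(xs), max(xs)
    if not lowest <= x_new <= highest:
        raise ValueError(
            f"X = {x_new} is outside the sample range [{lowest}, {highest}]"
        )
    return b * x_new + a

# Hypothetical sample X values and their least-squares line Y-hat = 0.6X + 2.2
x_scores = [1, 2, 3, 4, 5]

print(round(predict_within_range(3, x_scores, b=0.6, a=2.2), 2))  # 4.0

try:
    predict_within_range(10, x_scores, b=0.6, a=2.2)
except ValueError as err:
    print("refused:", err)
```

The sample gives no information about how X and Y are related beyond X = 5, so the call with X = 10 is rejected instead of returning an extrapolated value.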


Coefficient of determination (r²)

Indicates the size or strength of the relationship between X and Y.  Measures the proportion of Y variability that is associated with the X variable.

Categories for r² (same as for the t-test or ANOVA):

.01 for small

.09 for medium

.25 or larger for a large correlation

In regression, r² also provides a measure of the accuracy of the prediction using the regression equation. Because r² measures the predicted portion of variability in the Y scores, we can use the expression (1 - r²) to measure the unpredicted portion or residual variability.
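
The split into predicted and unpredicted variability can be checked numerically. The sketch below uses made-up scores and a hypothetical function name. It relies on the standard identities SSregression = r² * SSy and SSresidual = (1 - r²) * SSy, where SSy is the total variability of the Y scores.

```python
def variability_partition(xs, ys):
    """Return (r_squared, predicted_ss, unpredicted_ss) for paired scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    r_squared = sp ** 2 / (ss_x * ss_y)
    predicted_ss = r_squared * ss_y          # variability predicted by the line
    unpredicted_ss = (1 - r_squared) * ss_y  # residual variability
    return r_squared, predicted_ss, unpredicted_ss

# Hypothetical scores for five individuals
x_scores = [1, 2, 3, 4, 5]
y_scores = [2, 4, 5, 4, 5]

r2, predicted, unpredicted = variability_partition(x_scores, y_scores)
print(round(r2, 2), round(predicted, 2), round(unpredicted, 2))  # 0.6 3.6 2.4
```

For these scores, 60% of the Y variability is predicted by the regression equation and 40% is left over as residual variability. The unpredicted amount (2.4) equals the total squared error, Σ(Y - Ŷ)², of the least-squares line fitted to the same data.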