Outline

Introduction to Regression

Discussion of the equation for a straight line

Discussion of the regression equation
 

CORRELATION: Used to measure relationships.

 

REGRESSION: Used to make predictions on the basis of a relationship between variables, so that the error in the prediction is reduced.  For example, see Figure 16-16.

Figure 16-16 (p. 553)
Hypothetical data showing the relationship between SAT scores and GPA with a regression line drawn through the data points. The regression line defines a precise, one-to-one relationship between each X value (SAT score) and its corresponding Y value (GPA).

 

 

The line serves the following purposes:

1. It makes the relationship between X (SAT scores) and Y (GPAs) easier to see.

2. It identifies the center (central tendency) of the relationship.

3. It can be used for prediction.  It establishes a precise relationship between each X value and a corresponding Y value.

 

Our goal is to develop a procedure that identifies and defines the straight line that provides the best fit for any specific set of data. 

 

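What does "best fit" mean?  For each data point, the error is the distance between the actual Y value and the Y value predicted by the line.  We want this error to be as small as possible.

The best-fitting line is the one with the smallest total squared error, so that the variability of the Y scores about the regression line is as small as possible.  The errors are squared before they are summed so that positive and negative errors do not cancel each other.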
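The idea of total error can be sketched in a few lines of Python. This is only an illustration: the data values are hypothetical (made up for this example), and the function name total_squared_error is an assumption, not something from the text. The sketch squares each error (actual Y minus predicted Y) and adds the squares, then compares two candidate lines.

```python
def total_squared_error(xs, ys, b, a):
    """Sum of squared differences between each actual Y and the predicted Y = b*X + a."""
    return sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, made up for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Two candidate lines through the same data.
error_line_1 = total_squared_error(xs, ys, b=0.6, a=2.2)
error_line_2 = total_squared_error(xs, ys, b=1.0, a=1.0)

print(error_line_1, error_line_2)
```

For these made-up points, the first line has the smaller total squared error, so it is the better fit of the two candidates.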

 

This line can be represented by a simple equation, so we need to find the equation of the line that best describes the relationship.
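As a minimal sketch of how such an equation might be found, the Python below uses the standard least-squares solution: slope b = SP / SSX and intercept a = (mean of Y) - b * (mean of X), where SP is the sum of products of deviations and SSX is the sum of squared deviations of X. These formulas are not stated in this section and are supplied here as an assumption about the method; the function name fit_line and the data values are hypothetical.

```python
def fit_line(xs, ys):
    """Return (b, a) for the least-squares line Y = b*X + a."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # SP: sum of products of the X and Y deviations from their means.
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # SSX: sum of squared X deviations from the mean of X.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    b = sp / ss_x
    a = mean_y - b * mean_x
    return b, a


# Hypothetical data, made up for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b, a = fit_line(xs, ys)
print(f"Y = {b:.2f}X + {a:.2f}")
```

The function assumes the X values are not all identical; otherwise SSX is zero and the slope is undefined.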

 

Examples will be discussed in class.

 

 

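In general, a straight line is described by the equation

Y = bX + a

where b is the slope (the change in Y for each 1-unit increase in X) and a is the Y-intercept (the value of Y when X = 0).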

Figure 16-17  (p. 555)
Relationship between total cost and number of hours playing tennis. The tennis club charges a $25 membership fee plus $5 per hour. The relationship is described by a linear equation:  Total cost = $5 × (number of hours) + $25, which has the form Y = bX + a with slope b = 5 and Y-intercept a = 25.
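The tennis club example can be written out as a short Python sketch. The function and parameter names are hypothetical; the $5 hourly rate and the $25 membership fee come from the figure caption, and the hour values are chosen only for illustration.

```python
def total_cost(hours, rate_per_hour=5, membership_fee=25):
    """Total cost in dollars: Y = bX + a with b = rate_per_hour and a = membership_fee."""
    return rate_per_hour * hours + membership_fee


# Cost for a few illustrative numbers of hours.
costs = {hours: total_cost(hours) for hours in (0, 10, 30)}

for hours, cost in costs.items():
    print(f"{hours} hours -> ${cost}")
```

At 0 hours the cost equals the membership fee (the Y-intercept), and each additional hour adds $5 (the slope).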