Reliability of Measures

 

Reliability refers to the consistency or stability of a measure.  A good measure of a variable should be reliable.

 

Any observed score on a measure of a variable reflects a true score on the variable plus measurement error.
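In classical test theory this idea is usually written as follows. $X$ is the observed score, $T$ the true score, and $E$ random measurement error, assumed to be uncorrelated with $T$. Reliability is then the proportion of observed-score variance that is true-score variance. The notation is the conventional one, not taken from these notes.

```latex
X = T + E, \qquad
\sigma_X^2 = \sigma_T^2 + \sigma_E^2 \quad (\text{if } \operatorname{Cov}(T, E) = 0), \qquad
\rho_{XX'} = \frac{\sigma_T^2}{\sigma_X^2}
```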

 

A reliable measure has little measurement error, so the observed score on the measure closely reflects the true score on the variable.

 

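An unreliable measure of a variable cannot be used to study that variable meaningfully, because differences in observed scores may reflect measurement error rather than differences in the variable itself.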

 

Reliability is often assessed by measuring the same people more than once.  A reliable measure will yield the same or almost the same score each time.
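A small simulation can make the true-score-plus-error idea concrete. This is a hypothetical illustration, not a procedure from these notes. It assumes normally distributed true scores (mean 100) and errors with made-up standard deviations, measures the same simulated people twice, and correlates the two sets of scores. With little error the two administrations agree closely. With error as large as the true-score spread, the correlation falls to about 0.5, which matches the classical prediction of true-score variance divided by observed-score variance.

```python
import random


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate_retest(n_people, true_sd, error_sd, seed=0):
    """Measure the same people twice; each observed score = true score + fresh random error."""
    rng = random.Random(seed)
    true_scores = [rng.gauss(100, true_sd) for _ in range(n_people)]
    time1 = [t + rng.gauss(0, error_sd) for t in true_scores]
    time2 = [t + rng.gauss(0, error_sd) for t in true_scores]
    return pearson(time1, time2)


# Hypothetical settings: true-score SD of 15, with small vs. large error SD.
# Theory predicts r = 225 / (225 + 25) = 0.90 and r = 225 / (225 + 225) = 0.50.
r_low_error = simulate_retest(20000, true_sd=15, error_sd=5)
r_high_error = simulate_retest(20000, true_sd=15, error_sd=15)
print(f"little error: r = {r_low_error:.2f}")
print(f"much error:   r = {r_high_error:.2f}")
```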

 

Reliability can sometimes be established through replication.

Assessing Reliability

 

Test-retest reliability: Established when two successive measurements of the same people (at Time 1 and Time 2) are closely related. The relation is assessed by calculating the correlation between the two sets of scores.
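One common way to compute that correlation is Pearson's r, sketched below with made-up scores for five people measured at Time 1 and Time 2. The function name `pearson` and the data are hypothetical. The same calculation applies to inter-rater reliability if the two lists hold the two observers' scores instead.

```python
def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical scores for the same five people, tested twice.
time1 = [12, 15, 9, 20, 17]
time2 = [13, 14, 10, 19, 18]

r = pearson(time1, time2)
print(f"test-retest r = {r:.2f}")  # prints "test-retest r = 0.97"
```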

 

Inter-rater reliability: Established when two observers independently and simultaneously measure the DV and their measures are closely related.

 

Split-half reliability: Established when scores from two halves of a test are closely related. This is an internal consistency measure, based on the idea that a person should respond consistently to all the questions, so scores on the two halves are expected to be related.

 

Odd-even reliability: Established when scores from the odd-numbered items are closely related to scores from the even-numbered items. This is a form of split-half reliability.
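The two ways of splitting a test described above can be compared in a short sketch. The response matrix is invented (five people, six items scored 1 to 5), and `half_scores` is a hypothetical helper that totals each person's answers on two sets of items; the two sets of totals are then correlated. The split-half version compares the first three items with the last three, and the odd-even version compares items 1, 3, 5 with items 2, 4, 6.

```python
def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def half_scores(responses, first, second):
    """Total each person's responses over two sets of item indices."""
    return ([sum(person[i] for i in first) for person in responses],
            [sum(person[i] for i in second) for person in responses])


# Hypothetical data: rows are people, columns are items 1-6 (scored 1-5).
responses = [
    [4, 5, 4, 4, 5, 4],
    [2, 1, 2, 2, 1, 2],
    [3, 4, 3, 3, 2, 3],
    [5, 4, 5, 5, 4, 5],
    [1, 2, 1, 2, 2, 1],
]
n_items = len(responses[0])

# Split-half: first three items vs. last three items.
first_half, second_half = half_scores(
    responses, range(0, n_items // 2), range(n_items // 2, n_items))
# Odd-even: items 1, 3, 5 vs. items 2, 4, 6 (zero-based indices 0, 2, 4 and 1, 3, 5).
odd_items, even_items = half_scores(
    responses, range(0, n_items, 2), range(1, n_items, 2))

r_split = pearson(first_half, second_half)
r_odd_even = pearson(odd_items, even_items)
print(f"split-half r = {r_split:.2f}, odd-even r = {r_odd_even:.2f}")
```

Odd-even splitting is often preferred when later items might be affected by fatigue or increasing difficulty, since each half then samples items from across the whole test.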

 

Item-total reliability: Established when the scores on each item of a measure are closely related to the total score on the measure. It is often used to eliminate items that have low correlations with the total score, and is another index of internal consistency.
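A minimal sketch of item-total analysis, assuming a small invented data set in which the fourth item does not track the others. It correlates each item with the total score and flags items that fall below a cutoff. The helper `item_total_correlations` and the cutoff of 0.3 are hypothetical choices for illustration, not rules from these notes.

```python
def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def item_total_correlations(responses):
    """Correlate each item's scores with every person's total score."""
    totals = [sum(person) for person in responses]
    n_items = len(responses[0])
    return [pearson([person[i] for person in responses], totals)
            for i in range(n_items)]


# Hypothetical data: rows are people, columns are items 1-4.
responses = [
    [5, 4, 5, 1],
    [1, 2, 1, 4],
    [4, 4, 3, 2],
    [2, 1, 2, 5],
    [3, 3, 4, 3],
]
THRESHOLD = 0.3  # hypothetical cutoff for a "low" item-total correlation

corrs = item_total_correlations(responses)
weak_items = [i + 1 for i, r in enumerate(corrs) if r < THRESHOLD]  # 1-based item numbers
for i, r in enumerate(corrs, start=1):
    print(f"item {i}: r = {r:.2f}")
print("candidates for removal:", weak_items)
```

Because each item contributes to the total, these correlations are somewhat inflated. A common variant, the corrected item-total correlation, correlates each item with the total of the remaining items instead.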

 

Validity of Measures

 

Validity is the extent to which the measure of a variable actually measures what it claims to measure. More generally, validity is the extent to which an operational definition of a variable does in fact reflect the underlying conceptual variable.

 

Validity is often evaluated by examining the construct validity of the measure of the DV.

 

Construct validity

 

Refers to the extent to which an abstract construct (e.g. intelligence) can be inferred from the operational definition of that construct.

 

Do the experimental operations represent the construct of interest to the experimenter?

 

Construct validity is established when the manipulation or measure of a variable relates to other manipulations and measures in theoretically meaningful ways.

 

Construct validity can be achieved by:

·        Providing a clear definition of the abstract construct (pre-testing may be necessary)

·        Obtaining data to show that the empirical representation of the IV does not vary with measures of related but different conceptual variables.

 

Validity can also be established by obtaining convergent and divergent data.

 

Convergent data: data indicating agreement in several measures of the same construct.  (e.g. correlating scores on an IQ test with scores on a different IQ test)

 

Divergent data: data indicating a lack of relationship between measures of different constructs (e.g. IQ scores should not be related to anxiety or depression scores).
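Convergent and divergent data are both usually summarized as correlations. The sketch below uses invented scores for six people on two IQ tests and an anxiety measure; the variable names and numbers are hypothetical. In this made-up data the two IQ tests correlate strongly (convergent), while IQ and anxiety are uncorrelated (divergent).

```python
def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical scores for the same six people.
iq_test_a = [85, 95, 105, 115, 100, 100]
iq_test_b = [88, 97, 103, 118, 99, 101]  # a different test of the same construct
anxiety = [30, 20, 20, 30, 28, 22]       # a measure of a different construct

convergent_r = pearson(iq_test_a, iq_test_b)  # expected to be high
divergent_r = pearson(iq_test_a, anxiety)     # expected to be near zero
print(f"IQ test A vs. IQ test B: r = {convergent_r:.2f}")
print(f"IQ test A vs. anxiety:   r = {divergent_r:.2f}")
```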

 

Other types of validity include:

 

Face validity: A general estimate of whether a measure appears to measure the variable being studied.

 

Predictive or criterion validity: Established when the measure of a predictor variable relates to scores on a criterion variable.

 

Concurrent validity: Established when scores from a new measure are closely related to scores obtained from a more established measure of the same variable.