Reliability refers to the consistency or stability of a measure. A good measure of a variable should be reliable.
Any observed score on a measure of a variable reflects a true score on the variable plus measurement error.
A reliable measure has little measurement error. Thus, the observed score on the measure closely reflects the true score on the variable.
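In the notation of classical test theory (the symbols X, T and E are chosen here for illustration), the two statements above can be written as follows. The variance split assumes the error is random and uncorrelated with the true score, and the ratio is the usual definition of a reliability coefficient:

```latex
X = T + E, \qquad
\text{reliability} = \frac{\sigma_T^2}{\sigma_X^2} = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2}
```

As the error variance shrinks toward zero the ratio approaches 1, which is what "little measurement error" means in practice.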
A measure of a variable that is unreliable cannot be used to study the variable.
Reliability is often assessed by measuring the same people more than once. A reliable measure will yield the same or almost the same score each time.
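A small simulation makes the link between measurement error and repeated measurement concrete. It assumes the observed = true + error model above; every setting (true scores drawn from a normal distribution with mean 100 and SD 15, error SDs of 3 and 15, 2,000 simulated people, seed 42) is an arbitrary choice for illustration, and `simulate_two_sessions` is a hypothetical helper.

```python
import math
import random
import statistics


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_two_sessions(n_people, error_sd, seed=42):
    """Measure the same simulated people twice: observed = true + random error."""
    rng = random.Random(seed)
    true_scores = [rng.gauss(100, 15) for _ in range(n_people)]
    session1 = [t + rng.gauss(0, error_sd) for t in true_scores]
    session2 = [t + rng.gauss(0, error_sd) for t in true_scores]
    return pearson_r(session1, session2)


r_precise = simulate_two_sessions(2000, error_sd=3)  # little measurement error
r_noisy = simulate_two_sessions(2000, error_sd=15)   # a lot of measurement error
print(f"little error: r = {r_precise:.2f}; large error: r = {r_noisy:.2f}")
```

Under the variance ratio above, these settings should give correlations near 225/234 ≈ 0.96 and 225/450 = 0.5; the exact values shift slightly with the seed.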
Reliability can sometimes be established through replication.
Assessing Reliability
Test-retest reliability: Established when two successive measurements of the same people (at Time 1 and Time 2) are closely related. The relation is established by calculating the correlation between the two sets of scores.
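A minimal sketch of the test-retest calculation, using hypothetical scores for six people and a hand-written Pearson correlation (on Python 3.10 and later, `statistics.correlation` computes the same coefficient):

```python
import math
import statistics


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores: the same six people tested at Time 1 and Time 2.
time1 = [12, 15, 9, 20, 17, 11]
time2 = [13, 14, 10, 19, 18, 10]

r_test_retest = pearson_r(time1, time2)
print(f"test-retest r = {r_test_retest:.2f}")  # prints "test-retest r = 0.96"
```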
Inter-rater reliability: Established when two observers simultaneously and independently measure the dependent variable (DV) and their measures are closely related.
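For numeric ratings, the correlation used for test-retest reliability applies directly. When observers assign categories instead (for example, coding whether a child is on task in each interval), one widely used index is Cohen's kappa, which is not mentioned in these notes and is swapped in here as an illustration. It corrects raw agreement for the agreement expected by chance. A minimal sketch with hypothetical codes:

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters' category codes."""
    n = len(rater_a)
    # Proportion of observations on which the two raters agree.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal proportions.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    p_chance = sum((counts_a[c] / n) * (counts_b[c] / n) for c in categories)
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical codes from two observers watching the same 8 intervals.
observer_1 = ["on", "on", "off", "on", "off", "on", "off", "on"]
observer_2 = ["on", "off", "off", "on", "off", "on", "off", "on"]

kappa = cohens_kappa(observer_1, observer_2)
print(f"kappa = {kappa:.2f}")  # prints "kappa = 0.75"
```

Here the observers agree on 7 of 8 intervals (0.875), chance agreement is 0.5, so kappa = (0.875 - 0.5) / (1 - 0.5) = 0.75.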
Split-half reliability: Established when scores from two halves of a test are closely related. This is an internal consistency measure, based on the idea that a respondent should answer all the questions consistently; hence the expectation that scores on the two halves should be related.
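A sketch of a first-half versus second-half split on hypothetical 1-5 ratings. Because each half is only half as long as the full test, the half-test correlation is commonly stepped up with the Spearman-Brown formula, r_full = 2r / (1 + r); that adjustment is not part of these notes and is added here as standard practice.

```python
import math
import statistics


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def split_half_reliability(responses):
    """Correlate first-half and second-half totals, then apply Spearman-Brown."""
    half = len(responses[0]) // 2
    first = [sum(row[:half]) for row in responses]
    second = [sum(row[half:]) for row in responses]
    r_halves = pearson_r(first, second)
    # Spearman-Brown: estimated reliability of the full-length test.
    r_full = 2 * r_halves / (1 + r_halves)
    return r_halves, r_full


# Hypothetical data: 5 respondents x 6 items, each rated 1-5.
responses = [
    [3, 4, 3, 3, 3, 3],
    [2, 2, 2, 3, 2, 3],
    [4, 4, 4, 4, 3, 3],
    [3, 2, 3, 2, 2, 2],
    [1, 2, 1, 2, 3, 2],
]

r_halves, r_full = split_half_reliability(responses)
print(f"halves r = {r_halves:.2f}, Spearman-Brown = {r_full:.2f}")
# prints "halves r = 0.70, Spearman-Brown = 0.82"
```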
Odd-even reliability: A form of split-half reliability, established when scores from the odd-numbered items are closely related to the scores from the even-numbered items.
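The odd-even split only changes how the halves are formed: items 1, 3, 5, ... go into one half and items 2, 4, 6, ... into the other. A sketch on hypothetical 1-5 ratings, where Python's zero-based slices `[0::2]` and `[1::2]` pick out the odd- and even-numbered items:

```python
import math
import statistics


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: 5 respondents x 6 items, each rated 1-5.
responses = [
    [4, 5, 4, 5, 4, 5],
    [2, 1, 2, 2, 2, 1],
    [3, 3, 4, 3, 4, 3],
    [5, 4, 5, 4, 5, 5],
    [1, 2, 1, 1, 2, 2],
]

odd_totals = [sum(row[0::2]) for row in responses]   # items 1, 3, 5
even_totals = [sum(row[1::2]) for row in responses]  # items 2, 4, 6

r_odd_even = pearson_r(odd_totals, even_totals)
print(f"odd-even r = {r_odd_even:.2f}")  # prints "odd-even r = 0.88"
```

Alternating items spreads effects tied to item position (such as fatigue near the end of a test) across both halves, which a first-half/second-half split does not.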
Item-total reliability: Established when the scores on each item on a measure are closely related to the total score on the measure. This is another index of internal consistency, and it is often used to eliminate items that have low correlations with the total score.
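A sketch of using item-total correlations to spot weak items. It uses the common "corrected" variant, which correlates each item with the total of the other items so that an item does not inflate its own correlation; the notes describe correlating with the full total. The 0.3 cut-off and the responses are hypothetical.

```python
import math
import statistics


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def corrected_item_total(responses):
    """Correlate each item with the total of the remaining items."""
    correlations = []
    for j in range(len(responses[0])):
        item = [row[j] for row in responses]
        rest = [sum(row) - row[j] for row in responses]
        correlations.append(pearson_r(item, rest))
    return correlations


# Hypothetical data: 5 respondents x 4 items; item 4 does not fit the others.
responses = [
    [5, 4, 5, 1],
    [1, 2, 1, 3],
    [3, 3, 4, 5],
    [4, 5, 4, 2],
    [2, 1, 2, 4],
]
THRESHOLD = 0.3  # hypothetical cut-off for a "low" correlation

correlations = corrected_item_total(responses)
flagged = [j + 1 for j, r in enumerate(correlations) if r < THRESHOLD]
print("item-rest correlations:", [round(r, 2) for r in correlations])
print("items to consider dropping:", flagged)  # prints "... [4]"
```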
Validity refers to the extent to which the measure of a variable actually measures what it claims to measure. More generally, validity is the extent to which an operational definition of a variable does in fact reflect the underlying conceptual variable.
Validity is often evaluated by assessing the construct validity of the DV.
Construct validity
Construct validity refers to the extent to which an abstract construct (e.g., intelligence) can be inferred from the operational definition of that construct.
Do the experimental operations represent the construct of interest to the experimenter?
Construct validity is established when the manipulation of a variable relates to other manipulations in theoretically meaningful ways.
Construct validity can be achieved by:
· Providing a clear definition of the abstract construct (pre-testing may be necessary)
· Obtaining data to show that the empirical representation of the independent variable (IV) does not vary with measures of related but different conceptual variables.
Validity can also be established by obtaining convergent and divergent data.
Convergent data: data indicating agreement among several measures of the same construct (e.g., correlating scores on an IQ test with scores on a different IQ test).
Divergent data: data indicating a lack of relationship between measures of different constructs (e.g., IQ scores should not be related to anxiety or depression scores).
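A sketch of checking convergent and divergent evidence with correlations. The scores for six people on two IQ tests and an anxiety scale are hypothetical, constructed so that the pattern is easy to see.

```python
import math
import statistics


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores for the same six people.
iq_test_a = [90, 100, 110, 120, 95, 115]
iq_test_b = [92, 103, 108, 118, 97, 112]  # a different IQ test
anxiety = [24, 16, 18, 22, 19, 21]        # a different construct

r_convergent = pearson_r(iq_test_a, iq_test_b)  # expected: high
r_divergent = pearson_r(iq_test_a, anxiety)     # expected: near zero
print(f"convergent r = {r_convergent:.2f}")  # prints "convergent r = 0.99"
print(f"divergent r = {r_divergent:.2f}")    # prints "divergent r = 0.00"
```

Both results are needed: a high correlation between the two IQ tests alone would not rule out that both also track something else, such as anxiety.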
Other types of validity include:
Face validity: A general, largely subjective estimate of whether a measure appears to measure the variable being studied.
Predictive or criterion validity: Established when the measure of a predictor variable relates to scores on a criterion variable.
Concurrent validity: Established when scores from a new measure are closely related to scores obtained from a more established measure of the same variable.
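Both predictive and concurrent validity are typically summarized by a correlation (a validity coefficient) between the measure and its criterion; for concurrent validity the criterion is an established measure collected at the same time. The sketch below uses hypothetical entrance-test scores and grade point averages a year later, and adds an ordinary least-squares line (not part of these notes) to show how the predictor would be used to forecast the criterion. `least_squares_line` is a hypothetical helper.

```python
import math
import statistics


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def least_squares_line(xs, ys):
    """Slope and intercept of the least-squares line predicting ys from xs."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical data: predictor measured now, criterion measured a year later.
entrance_test = [50, 60, 70, 80, 90]
later_gpa = [2.5, 2.9, 3.0, 3.4, 3.7]

validity_coefficient = pearson_r(entrance_test, later_gpa)
slope, intercept = least_squares_line(entrance_test, later_gpa)
predicted_gpa = intercept + slope * 75  # forecast for a new applicant scoring 75
print(f"validity r = {validity_coefficient:.2f}")  # prints "validity r = 0.99"
print(f"predicted GPA for a score of 75: {predicted_gpa:.2f}")  # prints "... 3.25"
```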