Psy 232 Class notes GW 9 & 10: t-test
What statistic is used when the z-score cannot be used? The t-test
How does it differ from the z-score?
It uses the estimated standard error of the mean, computed from the sample standard deviation (s), instead of the standard error of the mean computed from the population standard deviation (σ).
The formulae for (1) a z-score and (2) a t-score:
z = (M - μ)/σM
t = (M - μ)/sM
where sM = s/√n -or- √(s²/n), and df = n - 1.
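As a rough illustration of the two formulae, the sketch below computes a z-score and a single-sample t for one sample. The scores, μ = 50, and σ = 10 are invented for illustration, and the function names are hypothetical, not from the course materials.

```python
import math
import statistics

def z_score(sample, mu, sigma):
    """z = (M - mu) / sigma_M, where sigma_M = sigma / sqrt(n)."""
    n = len(sample)
    m = statistics.mean(sample)
    sigma_m = sigma / math.sqrt(n)  # uses the KNOWN population sigma
    return (m - mu) / sigma_m

def t_score(sample, mu):
    """t = (M - mu) / s_M, where s_M = s / sqrt(n). Also returns df = n - 1."""
    n = len(sample)
    m = statistics.mean(sample)
    s = statistics.stdev(sample)    # sample SD (divides by n - 1)
    s_m = s / math.sqrt(n)          # ESTIMATED standard error
    return (m - mu) / s_m, n - 1

scores = [52, 58, 49, 61, 55, 57, 50, 54, 60]  # invented data
z = z_score(scores, mu=50, sigma=10)
t, df = t_score(scores, mu=50)
print(f"z = {z:.2f}")
print(f"t({df}) = {t:.2f}")
```

The only difference between the two functions is the denominator: the z version needs σ handed to it, while the t version estimates the standard error from the sample itself.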
The logic of these formulae:
the denominator = difference expected just by “chance.”
the numerator = difference between means based on hypothesis.
If the numerator is no larger than the denominator, then the numerator’s difference
could easily have occurred just by chance.
[i.e., z or t is close to 0, roughly between -1 and +1]
If the numerator is much larger than the denominator, then the numerator’s difference
probably did NOT occur just by chance.
[i.e., z or t is far from 0, beyond the critical value]
Questions to consider regarding the logic of the two formulae:
What does the numerator measure?
a tx – i.e., the effect of an IV
or a pre-existing difference, i.e., a sample that started out diff from the pop.
What does the denominator measure?
chance, which is made up of
- the effects of individual differences within the groups that are represented by the means
- experimental error (things that could not be controlled by the experimenter)
Where would the effects of one's experimental "treatment" show up?
Where would the effects of individual diffs and other "chance" factors appear?
Where would the effects of increasing sample size (n) appear?
Where would efforts to reduce unwanted variability appear?
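One way to explore the sample-size question is to hold everything else constant and vary n. The sketch below does that with invented values (a mean difference of 5 and s = 15); it is only an illustration, and the helper name is hypothetical.

```python
import math

def standard_error(s, n):
    """Estimated standard error of the mean: s_M = s / sqrt(n)."""
    return s / math.sqrt(n)

mean_diff = 5.0   # invented numerator (M - mu), held constant
s = 15.0          # invented sample standard deviation, held constant

results = []
for n in (9, 25, 100):
    s_m = standard_error(s, n)   # n only enters through the denominator
    t = mean_diff / s_m
    results.append((n, s_m, t))
    print(f"n = {n:3d}  s_M = {s_m:.2f}  t = {t:.2f}")
```

The numerator never changes here; only the denominator shrinks as n grows, which is what makes t larger.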
Other issues/questions:
What do the terms "significant" and "not significant" mean in scientific reports?
The parametric assumptions for use of the t statistic:
1 Normality = The DV is normally distributed in the population
2 Equal variances = The populations from which the samples are drawn have equal variances
3 Independence = Scores are independent of one another
4 Random sampling = Participants are selected randomly from the population
In science, “assumption” does not mean “maybe” or “a guess.” It’s a requirement for a testing procedure.
If you cannot meet the assumption/requirement, then the testing procedure is likely to produce an error.
Sometimes it’s a fairly small error, sometimes large.
But the problem is that we don’t have any way of knowing how large the error is.
Naturally, the more assumptions we have to meet, the more chances we have to make errors, and more serious ones, depending on how many violations occur & how serious those violations are.
Scary, isn’t it!
How can one ensure that the foregoing assumptions are met?
1 Normality =
2 Equal variances =
3 Independence =
4 Random sampling =
Independent measures 2-sample t-test
This test is used to compare two samples, presumably from two different populations.
E.g., a sample of males versus a sample of females, or a tx group versus a control group.
Notation to reflect two groups:
µ1 and µ2 = the means of the two populations
M1 and M2 = the means of the two samples
n1 and n2 = the number of scores in the two samples
The hypotheses for “non-directional” tests:
H0: µ1 = µ2. [Note: GW prefers µ1 - µ2 = 0.]
H1: µ1 ≠ µ2. [Note: GW prefers µ1 - µ2 ≠ 0.]
The formula:
Since we have two sample means, the numerator for the t-formula becomes
M1 - M2
And since we have two sample standard errors, the denominator becomes
s(M1–M2)
In the foregoing, we are assuming that the two variances are equal (ensured via relatively large and equal n’s), and thus can simply be added together.
[Note: When the variances are not equal, the t-test formula is adjusted to account for the inequality,
but otherwise the logic of the t-test remains the same.]
Thus for the independent samples t-test, where the n’s are large and equal,
t = (M1 - M2)/s(M1–M2)
That is, the difference between the two sample means divided by the standard error of that difference.
Degrees of freedom: since there are two groups & hence two means, we have to account for the loss of one degree of freedom from each group: thus, df = (n1 - 1) + (n2 - 1).
In effect, this simply means the two groups’ total n minus 2.
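A minimal sketch of the independent-samples calculation follows. It assumes equal n's and takes "added together" to mean s(M1–M2) = √(s1²/n1 + s2²/n2); that expanded formula is an assumption here, since these notes do not spell it out. The data and the function name are invented.

```python
import math
import statistics

def independent_t(group1, group2):
    """t = (M1 - M2) / s(M1-M2), with df = (n1 - 1) + (n2 - 1).

    Assumption: equal n's, so the two variance terms are simply added:
    s(M1-M2) = sqrt(s1^2/n1 + s2^2/n2).
    """
    n1, n2 = len(group1), len(group2)
    if n1 != n2:
        raise ValueError("this sketch only covers equal n's")
    m1, m2 = statistics.mean(group1), statistics.mean(group2)
    var1 = statistics.variance(group1)  # sample variance, divides by n - 1
    var2 = statistics.variance(group2)
    se_diff = math.sqrt(var1 / n1 + var2 / n2)
    df = (n1 - 1) + (n2 - 1)
    return (m1 - m2) / se_diff, df

treatment = [14, 17, 15, 19, 16, 18]  # invented data
control = [12, 13, 15, 11, 14, 13]    # invented data
t, df = independent_t(treatment, control)
print(f"t({df}) = {t:.2f}")
```

With six scores per group, df comes out as 6 + 6 - 2 = 10, matching the "total n minus 2" rule.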
The logic of the two-sample t-formula is the same as that for the single-sample formula.
That is, the comparison of numerator to denominator, and the factors affecting each, stays the same.
In words, the logic goes something like this:
t = (effect of IV on the means) / (effect of chance factors, i.e., individual diffs among participants & experimental error)
If the numerator ≤ the denominator, then there is no evidence that the IV had an effect; the diff can be attributed to chance.
If the numerator is significantly > the denominator, then the IV probably did have an effect; the diff is probably not due to chance.
NOTE: this logic will be true, with only minor adjustments, for all the parametric tests that we study.
How do we decide whether a diff is due to chance or to the IV? (We can never know for sure.)
We look at the p-value for the t-test: the probability of getting a diff at least this large if chance alone were at work.
If p > .05, then chance is a plausible cause of the diff, and we go with that (we fail to reject H0).
However, if p < .05, then we say that chance probably is not the cause of the diff, so the effect of the IV is the probable cause (we reject H0).
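To make the p-value decision concrete, here is a hedged sketch that approximates a two-tailed p for a given t and df by numerically integrating the t distribution (Simpson's rule), using only the standard library. In practice a statistics package would report p directly; the function names, the t value of 3.66, and df = 10 are illustrative assumptions.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """Two-tailed p: area beyond +/-|t|, via Simpson's rule on [0, |t|]."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps                     # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * (h / 3) * total    # area between -|t| and +|t|
    return 1 - central_area

def decide(t, df, alpha=0.05):
    """Return (p, verdict) using alpha as the criterion."""
    p = two_tailed_p(t, df)
    verdict = "reject H0" if p < alpha else "fail to reject H0"
    return p, verdict

p, verdict = decide(t=3.66, df=10)        # invented t and df
print(f"p = {p:.4f} -> {verdict}")
```

A quick sanity check is to feed in a tabled critical value: with df = 10, t = 2.228 should give p very close to .05.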
Power and versatility
The 2-sample t-test is less powerful than the z-test or the single-sample t-test:
It uses NO information about the populations; thus we must estimate everything (i.e., the means & standard errors).
Because the estimates aren’t exact, we don’t have the certainty about them that we have when we actually know the population mean & std. error.
However, it's still a very powerful test, as are all parametric tests, so the foregoing difference is fairly minor.
But the two-sample t-test is more versatile, since it can be used to compare TWO samples, presumably from two different populations.
The z-test and single-sample t-test are limited to one sample compared to one population.
Thus the two-sample t-test’s versatility is a major advantage over the other two tests.
Effect size
The foregoing formula is used to determine whether there is a “statistically significant” difference between two sample means, with α as the criterion for “probably different” versus “probably not different.”
But what about the size of the difference?
The formula for effect size, in terms of percentage of variance among individual scores that is accounted for by knowing the difference between the means, is the same as that for the single-sample t-test:
r² = t²/(t² + df)
Alternatively, Cohen’s d could be calculated for the same purpose.
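Both effect-size measures can be sketched in a few lines. The r² function follows the formula given in these notes; the Cohen's d function uses the common pooled-variance form, d = (M1 - M2)/√(pooled variance), which is an assumption here because the notes do not give a formula for d. All input values are invented.

```python
import math

def r_squared(t, df):
    """r^2 = t^2 / (t^2 + df): proportion of variance accounted for."""
    return t ** 2 / (t ** 2 + df)

def cohens_d(m1, m2, var1, var2, n1, n2):
    """Cohen's d = (M1 - M2) / sqrt(pooled variance). Assumed form, see lead-in."""
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_var)

r2 = r_squared(t=3.66, df=10)                                   # invented t and df
d = cohens_d(m1=16.5, m2=13.0, var1=3.5, var2=2.0, n1=6, n2=6)  # invented summary stats
print(f"r2 = {r2:.3f}, d = {d:.2f}")
```

Note that r² is bounded between 0 and 1, whereas d is expressed in standard-deviation units and has no upper limit.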
In a research report “results” section:
Group 1 differed significantly from Group 2, M = xxx versus xxx, respectively, t(df) = xxx, p = .xxx, r² = .xxx.
– or –
As Figure 1 shows, Group 1 differed significantly from Group 2, t(df) = xxx, p = .xxx, r² = .xxx.