Psy 232 Class notes GW 9 & 10: t-test

 

What statistic is used when the z-score cannot be used?   The t-test

How does it differ from the z-score?

      It uses the estimated standard error of the mean (sM), computed from the sample
      standard deviation (s), instead of the true standard error (σM), which requires
      knowing the population standard deviation (σ).

 

The formulae for (1) a z-score and (2) a t-score.

 z = (M – μ)/σM               

 t = (M – μ)/sM,     where sM = s/√n  -or-  √(s²/n), and df = n – 1.
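The two formulae above can be sketched in a few lines of Python. This is only an illustration: the function names and the nine scores are made up for the example, and the comparison value μ = 8 is assumed.

```python
import math
import statistics

def z_score(sample_mean, mu, sigma, n):
    """z = (M - mu) / sigma_M, where sigma_M = sigma / sqrt(n).
    Usable only when the population SD (sigma) is known."""
    return (sample_mean - mu) / (sigma / math.sqrt(n))

def one_sample_t(scores, mu):
    """t = (M - mu) / s_M, where s_M = s / sqrt(n). Returns (t, df)."""
    n = len(scores)
    m = statistics.mean(scores)
    s = statistics.stdev(scores)   # sample SD: divides SS by n - 1
    s_m = s / math.sqrt(n)         # estimated standard error of the mean
    return (m - mu) / s_m, n - 1

# Hypothetical scores: M = 10, s = 3, n = 9, so s_M = 3 / sqrt(9) = 1
scores = [6, 6, 8, 10, 10, 10, 12, 14, 14]
t, df = one_sample_t(scores, mu=8)
print(f"t({df}) = {t:.2f}")
```

The only difference between the two functions is where the variability comes from: z takes σ as a given, whereas t has to estimate it from the sample, which is why t carries degrees of freedom.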

 

The logic of these formulae:

                the denominator = difference occurring just by “chance.”

                the numerator = difference between means based on hypothesis.

                If numerator ≈ denominator, then the numerator’s difference

                                                is about what chance alone would produce.

                                    [i.e., z or t is close to 1; if the numerator = 0, then z = 0 or t = 0]

                If numerator is much larger than the denominator, then the numerator’s difference

                                                probably did NOT occur just by chance.

                                    [i.e., z or t is far from 0, in either direction]

 

Questions to consider regarding the logic of the two formulae:

            What does the numerator measure? 

                        a tx – i.e., the effect of an IV

                        or a pre-existing difference, i.e., a sample that started out diff from the pop.

            What does the denominator measure?

                         chance, which is comprised of

                        — the effects of individual differences within the groups that are
                             represented by the means

                        — experimental error (things that could not be controlled by the
                             experimenter)

           Where would the effects of one's experimental "treatment" show up? 

           Where would the effects of individual diffs and other "chance" factors appear?         

            Where would the effects of increasing sample size (n) appear?

             Where would efforts to reduce unwanted variability appear?
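As a hint for the last two questions, here is a small sketch of how the denominator sM = s/√n behaves. The values (s = 10, a fixed mean difference of 2) are invented purely for illustration.

```python
import math

def standard_error(s, n):
    """Estimated standard error of the mean: s_M = s / sqrt(n)."""
    return s / math.sqrt(n)

s = 10.0          # hypothetical sample SD
mean_diff = 2.0   # hypothetical numerator, M - mu, held constant

for n in (4, 25, 100):
    s_m = standard_error(s, n)
    print(f"n={n:3d}  s_M={s_m:.2f}  t={mean_diff / s_m:.2f}")
# n=  4  s_M=5.00  t=0.40
# n= 25  s_M=2.00  t=1.00
# n=100  s_M=1.00  t=2.00
```

The numerator never changes here; only the denominator shrinks as n grows. Reducing s (less unwanted variability) would shrink the denominator in the same way.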

 

 Other issues/questions:

What do the terms "significant" and "not significant" mean in scientific reports?

 

The parametric assumptions for use of the t statistic:

            1 Normality = The DV is normally distributed in the population

            2 Equal variances = The populations from which the samples are drawn have equal variances

            3 Independence = Scores are independent of one another

            4 Random sampling = Participants are selected randomly from the population

 

In science, “assumption” does not mean “maybe,” or “a guess,” or etc.

It’s a requirement for a testing procedure.  If you cannot meet the assumption/requirement,
then the testing procedure is likely to produce an error.

Sometimes it’s a fairly small error, sometimes large.  But the problem is that we don’t have
any way of knowing how large the error is.

Naturally, the more assumptions we have to meet, the more chance we have to make errors –
and more serious ones – depending on how many violations occur & how serious those violations are. 
Scary, isn’t it!

 

How can one ensure that the foregoing assumptions are met?

            1 Normality =

            2 Equal variances =

            3 Independence =

            4 Random sampling =

 

Independent measures 2-sample t-test

 

This test is used to compare two samples, presumably from two different populations.   
      E.g., a sample of males versus a sample of females, or a tx group versus a control group.

 

Notation to reflect two groups:

                µ1 and µ2 = the means of the two populations

                M1 and M2 = the means of the two samples

                n1 and n2 = the number of scores in the two samples

 

The hypotheses for “non-directional” tests:

                H0: µ1 = µ2.  [Note:  GW prefers µ1 - µ2 = 0.]

                H1: µ1 ≠ µ2. [Note:  GW prefers µ1 - µ2 ≠ 0.] 

 

The formula:

Since we have two sample means, the numerator for the t-formula becomes

                M1 - M2

And since we have two sample standard errors, the denominator becomes

                s(M1–M2)

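In the foregoing, we are assuming that the two population variances are equal (a violation matters
less when the n’s are relatively large and equal), so each sample’s variance divided by its n can
simply be added together:  s(M1–M2) = √(s1²/n1 + s2²/n2).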

[Note:  When the variances are not equal, the t-test formula is adjusted to account for the inequality,

but otherwise the logic of the t-test remains the same.]

 

Thus for the independent samples t-test, where the n’s are large and equal,

                t = (M1 - M2)/s(M1–M2)

That is, the difference between the two sample means divided by the standard error of that difference.

 

Degrees of freedom:  since there are two groups & hence two means, we have to account for
the loss of one degree of freedom from each group: thus, df = (n1 - 1 + n2 - 1). 

In effect, this simply means the two groups’ total n minus 2.
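The two-sample formula and its degrees of freedom might be sketched as follows. The standard error of the difference is taken here as √(s1²/n1 + s2²/n2), which matches the "add the variances" approach for equal n's; the function name and both data sets are invented for the example.

```python
import math
import statistics

def independent_t(group1, group2):
    """t = (M1 - M2) / s(M1-M2) for two independent samples.
    Assumes equal variances and equal (or nearly equal) n's. Returns (t, df)."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = statistics.mean(group1), statistics.mean(group2)
    var1 = statistics.variance(group1)   # sample variance, s1 squared
    var2 = statistics.variance(group2)   # sample variance, s2 squared
    se_diff = math.sqrt(var1 / n1 + var2 / n2)   # s(M1-M2)
    df = (n1 - 1) + (n2 - 1)                     # one df lost per group
    return (m1 - m2) / se_diff, df

# Hypothetical data: M1 = 10, M2 = 7, both variances = 9, n = 9 per group
tx_group      = [6, 6, 8, 10, 10, 10, 12, 14, 14]
control_group = [3, 3, 5, 7, 7, 7, 9, 11, 11]
t, df = independent_t(tx_group, control_group)
print(f"t({df}) = {t:.2f}")
```

With unequal n's or clearly unequal variances, a pooled-variance or adjusted formula would be needed instead; this sketch does not cover those cases.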

 

The logic of the two-sample t-formula is the same as that for the single-sample formula.

That is, the comparison of numerator to denominator, and the factors affecting each, stays the same. 

In words, the logic goes something like this:

 

t = (effect of IV on the means) / (effect of chance factors – i.e., individual diffs         
                                                    among participants & experimental error)

 

If the numerator ≤ the denominator, then we have no evidence that the IV had an effect; the diff
can be attributed to chance.

If the numerator is significantly > the denominator, then the IV probably did have an effect;
the diff is probably not due to chance.

NOTE:  this logic will be true, with only minor adjustments, for all the parametric tests that we study.

How do we decide whether a diff is due to chance or to the IV?  (We can never know for sure.)
We look at the p-value for the t-test: the probability of getting a diff at least this large if
chance alone were operating (i.e., if H0 were true).  If p > .05, chance remains a plausible cause
of the diff, so we retain H0.  However, if p < .05, then we say that chance probably is not the
cause of the diff, so the effect of the IV is the probable cause.
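In practice the p-value comes from a t table or statistical software. As a rough sketch of what is being computed, the two-tailed p can be approximated by integrating the t distribution numerically. The function names, the Simpson's-rule approach, and the step count are choices made for this example, not part of the course materials.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """Probability of a |t| at least this large if H0 is true.
    Integrates the density from 0 to |t| with Simpson's rule (steps must be even)."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3          # area under the curve between 0 and |t|
    return 1 - 2 * area           # what remains in the two tails

def decide(t, df, alpha=0.05):
    """Return (p, decision) using alpha as the criterion."""
    p = two_tailed_p(t, df)
    return p, ("reject H0" if p < alpha else "retain H0")

p, decision = decide(3.0, 16)
print(f"p = {p:.3f}: {decision}")
```

Note that the decision is a comparison of p against α, not a proof: "reject H0" means chance is an unlikely explanation, not an impossible one.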

 

Power and versatility

The 2-sample t-test is less powerful than the z-test or the single-sample t-test:

    It uses NO information about the populations; thus we must estimate everything
    (i.e., the means & standard errors).

   Because the estimates aren’t exact, we don’t have the certainty about them that we have when
   we actually know the population mean & std. error.

 However, it's still a very powerful test, as are all parametric tests, so the foregoing difference is
   fairly minor.

 

But the two-sample t-test is more versatile, since it can be used to compare TWO samples,
presumably from two different populations.

The z-test and single-sample t-test are limited to one sample compared to one population.

Thus the two-sample t-test’s versatility is a major advantage over the other two tests.

 

Effect size

The foregoing formula is used to determine whether there is a “statistically significant” difference
between two sample means, with α as the criterion for “probably different” versus “probably not different.”

 

But what about the size of the difference? 

The formula for effect size, in terms of percentage of variance among individual scores that is
accounted for by knowing the difference between the means, is the same as that for the single-sample t-test:

 

                r² = t²/(t² + df)

 

Alternatively, Cohen’s d could be calculated for the same purpose.
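Both effect-size measures are quick to compute once t and df are in hand. In the sketch below, r² follows the formula above; the Cohen's d version uses the pooled standard deviation, which is a common choice for two independent samples but is an assumption here, since these notes do not give a formula for d. All input values are hypothetical.

```python
import math

def r_squared(t, df):
    """Proportion of variance accounted for: r^2 = t^2 / (t^2 + df)."""
    return t ** 2 / (t ** 2 + df)

def cohens_d(m1, m2, var1, var2, n1, n2):
    """Cohen's d = (M1 - M2) / pooled SD (assumed formula, see lead-in)."""
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_var)

# Hypothetical values: t(16) = 2.0; M1 = 10, M2 = 7, both variances = 9, n = 9 each
print(f"r2 = {r_squared(2.0, 16):.2f}")               # 4 / (4 + 16) = 0.20
print(f"d  = {cohens_d(10, 7, 9, 9, 9, 9):.2f}")      # 3 / 3 = 1.00
```

The two measures answer slightly different questions: r² expresses the effect as a share of the variability in scores, while d expresses the mean difference in standard-deviation units.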

 

In a research report “results” section:

 

Group 1 differed significantly from Group 2, M = xxx versus xxx, respectively, t(df) = xxx, p = .xxx, r² = .xxx.

        or –

 

As Figure 1 shows, Group 1 differed significantly from Group 2, t(df) = xxx, p = .xxx, r² = .xxx.