Psy 232 Class notes GW13: Introduction to Analysis of variance (a.k.a., ANOVA, or AOV, or the F-test)
                                      
With a brief aside re: related-samples t-test

Aside re: related samples t-test
Thus far:
Correlation & regression – looking for rel's between variables & predicting new scores based on a regression equation
z-test – comparing a sample-mean to a population-mean, given that we know both μ & σ
Single-sample t-test – same as z-test, except that we don’t know σ, and thus must estimate it using s.
Independent t-test – two sample means are compared (e.g., the mean of a tx group versus the mean of a control group)

All are very powerful statistical procedures for helping us to understand and predict behaviors.

But what do we do when we have one sample that is tested twice?
For example, perhaps we test a group of people on a topic,
then provide them with instruction about the topic,
then re-test the same people to see if the instruction was effective in improving their knowledge.

This would be a repeated-measures design:  Pre-test → intervention → post-test

It’s commonly used in education & in some clinical settings, especially when the entire procedure can be completed in a fairly short time (if a long time passes between the two tests, changes arising from the passage of time itself become a possible confounding variable). Although we do have two sets of scores, the independent t-test would not be appropriate, since the two sets are not independent.

Related samples t-test:  Sometimes “matched pairs” are used,
but most researchers rely on
repeated measures – the same participants are tested twice.

Conceptually, this test is the same as the independent t-test and the z-test:

t = (difference between means) / (chance differences)

Remember that chance diffs = individual diffs among the participants + experimental error

Technically it’s different, though, in that we are now concerned with the difference between two sets of scores from the same people, rather than two separate means from two separate groups of people.

To compute this t, we compute the differences between each person’s pairs of scores, and then compute the mean of those differences.  Then we determine the standard error of the mean differences, and compare the mean difference to that standard error.

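Thus,   t = M_D / s_MD     [GW has (M_D - μ_D) / s_MD, but since μ_D = 0 according to the null hypothesis, the numerator reduces to M_D.]

        where M_D = the mean of the difference scores, and s_MD = the standard error of the mean difference.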
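
The computation just described can be sketched in a few lines of Python. This is only an illustration: the function name related_samples_t and the pre/post scores are made up for the example, and it assumes every participant has exactly one score on each test.

```python
import math
import statistics

def related_samples_t(pre, post):
    """Return (t, df) for paired scores; assumes equal-length lists."""
    diffs = [b - a for a, b in zip(pre, post)]   # each person's difference score
    n = len(diffs)
    mean_diff = statistics.mean(diffs)           # M_D, the mean of the differences
    sd_diff = statistics.stdev(diffs)            # s of the differences (uses n - 1)
    se_mean_diff = sd_diff / math.sqrt(n)        # s_MD, standard error of the mean difference
    return mean_diff / se_mean_diff, n - 1

# Hypothetical pre-test and post-test scores for five participants
pre = [10, 12, 9, 14, 11]
post = [13, 14, 10, 17, 13]
t, df = related_samples_t(pre, post)
print(round(t, 2), df)
```

For these made-up scores the differences are 3, 2, 1, 3, 2, so M_D = 2.2 and t comes out near 5.88 with df = 4.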

Its major differences from the independent t-test:

-- tests each participant twice, hence only needs half the # of participants for the same # of scores

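-- individual differences are largely removed from the comparison: each person serves as his or her own baseline, so the stable individual diffs present in the first test are also present in the second test, and they cancel out when the difference scores are computed.  Thus the “chance” factor in the denominator of the formula is usually greatly reduced. 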

-- provides a considerable gain in power:  if there is a difference between the two tests, the repeated measures design is more likely to find it than is the independent groups design.

 

Now to the main event:  ANOVA

Heretofore we have considered situations where we compared only two situations or  two means at a time

            using the z-test or one of the t-tests: 

            one sample, represented by its mean versus the population mean

            two samples, represented by two means, each presumably representing the population mean

            one sample, two scores per person, with the mean difference between the scores presumably

                        representing the mean difference that one would expect to occur in the population

 

But what if there are more than two groups?  More than two situations?  More than two treatments?

We can't compare three or more means with a single subtraction: one difference score can only describe two means at a time. 

So the t-test procedures can't work for comparisons of three or more means.

 

Solution:

            Analysis of variance, also known as the F-test.

            In the t-test we simply compare the difference between two means

                        to the difference we would expect to occur just by chance

                        as estimated by the standard error of the mean.

            In the F-test, we compare the average differences among a set of means, as measured by their

                        variance, to the average differences that we would expect to occur among the individual

                        scores, as measured by their variance.

            So, the total of the differences can be divided into two components, a process called
            partitioning the variance:

                        the variance of the group means and the variance of the individual scores.

                        Either of the foregoing could be used to estimate the population variance.

 

            If the means are all from the same population, then the two variance estimates should both estimate the

            same population variance, and so should be about equal:

                         

                        σ² ≈ s² (estimated from the means) ≈ s² (estimated from the individual scores)

           

            Recall that all stats tests have a similar format:

                        Test Statistic = (differences among groups) / (differences expected by chance)

 

            For the F-test, the foregoing format, using symbols, would be

                        F =  (variance of the means) / (variance of the scores)

 

Recall that ordinary variance for a single set of scores is  s² = Σ(x - M)²/(n-1), where M is the mean of the scores and n-1 = df,

            and Σ(x - M)² represents the sum of squares, also known as SS.
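
As a quick refresher, here is a minimal Python sketch of SS and variance for one set of scores. The helper name sum_of_squares and the four scores are invented for the illustration.

```python
import statistics

def sum_of_squares(scores):
    """SS: the sum of squared deviations of the scores from their own mean."""
    m = statistics.mean(scores)
    return sum((x - m) ** 2 for x in scores)

scores = [2, 4, 6, 8]                 # hypothetical scores, mean = 5
ss = sum_of_squares(scores)           # 9 + 1 + 1 + 9 = 20
variance = ss / (len(scores) - 1)     # SS / df = 20 / 3
print(ss, variance)
```

Dividing SS by df is the same thing statistics.variance does, so the two should agree.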

But now we have two different variances, and more than one set of scores is involved.

So we need to figure out a way to compute the two variance estimates for the F-test.

 

Vocabulary:  the analysis of variance uses a special vocabulary, so it's imperative to learn it.

            Factor  =          The independent variable

            SSb      =          The sum of the squared deviations among the means of the groups or conditions

            SSw     =          The sum of the squared deviations within the groups or conditions

            MSb    =          Mean Square between groups or conditions; this refers to the average variance

                                    among the means [i.e., differences due to "treatment effects"]

            MSw   =          Mean Square within groups or conditions; this refers to the average variance

                                    among the scores within each group or condition [i.e., "chance" differences]

            k          =          The number of groups or conditions

            n          =          The number of observations, participants, or scores within each group

            N         =          The total number of observations, participants, or scores for all groups combined

            df-total =          The total degrees of freedom for the entire set of scores = N-1

            df-between =   The number of degrees of freedom among the set of means = k-1

            df-within =       The number of degrees of freedom within the groups, summed over all the groups = N-k

           

To get to the two variance estimates, using the foregoing vocabulary, we need

            F = MSb / MSw,

where   MSb = SSb / df-between = SSb/(k-1)

and       MSw = SSw / df-within = SSw/(N-k)

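    Note that squared #'s are never negative, so F can never be negative; and if MSb = MSw, then MSb/MSw = 1.

    Thus, according to the null hypothesis, F should be about 1 [compare to the null for t, where t = 0].

    If F is enough larger than 1 to exceed the critical value of F for df-between & df-within at the chosen alpha level, the null hypothesis is rejected, and we say that the groups' means are probably different.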
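
To make the vocabulary concrete, the whole computation can be sketched in Python. This is a bare-bones illustration, not a full statistics package: the function name one_way_anova and the three small groups of scores are hypothetical, and the critical value still has to be looked up in an F table.

```python
import statistics

def one_way_anova(groups):
    """Return SSb, SSw, the dfs, MSb, MSw and F for a list of groups of scores."""
    all_scores = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_scores)
    means = [statistics.mean(g) for g in groups]
    k = len(groups)          # number of groups
    N = len(all_scores)      # total number of scores
    # SSb: each group mean's squared deviation from the grand mean, weighted by group size
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # SSw: each score's squared deviation from its own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = k - 1
    df_within = N - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "SSb": ss_between, "SSw": ss_within,
        "df_between": df_between, "df_within": df_within,
        "MSb": ms_between, "MSw": ms_within,
        "F": ms_between / ms_within,
    }

# Hypothetical scores for k = 3 groups, n = 3 per group
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
result = one_way_anova(groups)
print(result)
```

For these made-up scores SSb = 26 and SSw = 6, so MSb = 13, MSw = 1, and F = 13 on (2, 6) df. An F table gives a critical value of about 5.14 at alpha = .05 for those df, so here the null would be rejected.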

 

Post-hoc tests

When an F-test reveals a "statistically significant" effect of the IV, all it really tells us is that there is

        a significant difference somewhere among the means.  But it doesn't tell us exactly where.

        So we use "post-hoc" tests to determine where the difference is.

        Two criteria for post-hoc tests:

            -- Three or more groups.

            -- A statistically significant effect, according to the F-test.  If the F-test did not find a

                    general difference, then a post-hoc test won't find a specific one!

        Several post-hoc tests are available.

            Two commonly used ones are Scheffe's and Tukey's HSD.

            The Scheffe test is more conservative:  it is reluctant to identify a small difference as significant;

                    that is, it is less willing to risk a Type I error.

            Tukey's HSD is more liberal:  it is more willing to identify a small difference as significant; that is,

                    it is more willing to risk a Type I error.
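
As an illustration of how a Scheffé-style comparison can be set up, the sketch below follows one common textbook formulation, which is an assumption here rather than something spelled out in these notes: SSb is computed from only the two groups being compared, but it is divided by the full df-between (k-1), and the result is tested against MSw from the overall ANOVA. The function name scheffe_pairwise_f and the scores are hypothetical.

```python
import statistics
from itertools import combinations

def scheffe_pairwise_f(groups):
    """Return {(i, j): F} for every pair of groups, Scheffe-style."""
    k = len(groups)
    N = sum(len(g) for g in groups)
    means = [statistics.mean(g) for g in groups]
    # MSw comes from the overall ANOVA, using all k groups
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_within = ss_within / (N - k)
    ratios = {}
    for i, j in combinations(range(k), 2):
        a, b = groups[i], groups[j]
        pair_mean = statistics.mean(a + b)
        # SSb computed from only the two groups being compared
        ss_pair = (len(a) * (means[i] - pair_mean) ** 2
                   + len(b) * (means[j] - pair_mean) ** 2)
        # Dividing by the full k - 1 (not 1) is what makes the test conservative
        ratios[(i, j)] = (ss_pair / (k - 1)) / ms_within
    return ratios

# Hypothetical scores for three groups
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
ratios = scheffe_pairwise_f(groups)
print(ratios)
```

Each pairwise F is then compared with the same critical F used for the overall test. With these made-up scores the first two groups give F = 0.75, while each of them compared with the third group gives a much larger F (12 and 6.75).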

 

Although Student's t-test preceded the F-test by roughly two decades, we now know that the two tests are

        closely related, and that the t is actually just a special case of the F -- when the IV has only 2 groups.

        This close relation can be expressed in an equation:

                F = t²

        The logic of the two tests, in terms of the structures of their formulae, is also similar:

                t = (difference between sample means)/(differences expected by chance)

                F = (variance among sample means)/(variance expected by chance)

        That is, in both cases the effect of an IV, seen in the numerator, is compared to the effects of chance,

                seen in the denominator.
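
The F = t² relation can be checked numerically. The sketch below computes an independent-groups t (with the pooled variance) and a one-way F on the same two groups. The function names and the scores are made up for the demonstration.

```python
import math
import statistics

def independent_t(a, b):
    """Independent-groups t using the pooled variance estimate."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    se_diff = math.sqrt(pooled_var / na + pooled_var / nb)
    return (statistics.mean(a) - statistics.mean(b)) / se_diff

def f_ratio(groups):
    """One-way ANOVA F = MSb / MSw."""
    all_scores = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_scores)
    means = [statistics.mean(g) for g in groups]
    k, N = len(groups), len(all_scores)
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (N - k))

# Hypothetical scores for an IV with only two groups
group_a = [1, 2, 3]
group_b = [5, 6, 7]
t = independent_t(group_a, group_b)
F = f_ratio([group_a, group_b])
print(t, t ** 2, F)
```

Here t is about -4.90, and both t² and F come out to 24. Note that the sign of t is lost when it is squared, which is one reason F cannot be negative.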

 

Assumptions for the F-test.

        Independence of scores within each sample.

        Normal distribution of the DV in the populations from which the samples are drawn.

        Homogeneity of variance ("equal variances").

        DV has an interval or ratio number scale.

 

Effect size:  η² = SSbetween/SStotal, where SStotal = SSb + SSw

Note:  η² ("eta-squared") is the formal symbol in AOV for effect size. 

It is equivalent to r² as a measure of effect size in other contexts.
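
A minimal Python sketch of the effect-size computation follows. The function name eta_squared and the scores are hypothetical. It computes SStotal directly from all the scores, which should equal SSb + SSw.

```python
import statistics

def eta_squared(groups):
    """Eta-squared = SSbetween / SStotal for a list of groups of scores."""
    all_scores = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_scores)
    # SSbetween: group means around the grand mean, weighted by group size
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # SStotal: every score around the grand mean
    ss_total = sum((x - grand_mean) ** 2 for x in all_scores)
    return ss_between / ss_total

# Hypothetical scores for three groups
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
eta2 = eta_squared(groups)
print(round(eta2, 4))
```

For these made-up scores SSbetween = 26 and SStotal = 32, so η² = 0.8125; that is, about 81% of the variability in the scores is associated with group membership.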