Hypothesis Testing, Effect Size, & Power

I. Logic of Hypothesis Testing

A. Hypothesis Testing

- an inferential procedure that uses sample data to evaluate a hypothesis about a population

- general scheme

1. State a hypothesis about the population

2. Obtain a random sample

3. Compare the sample mean with the hypothesized population mean: $(M - \mu)$, or $(X - \mu)$ if n = 1

- assumption: if the treatment has an effect, it adds a constant amount to every score (so the mean shifts but the standard deviation does not change)
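A quick sketch of what this assumption implies. The scores and the 10-point effect are hypothetical: adding a constant to every score moves the mean by exactly that constant and leaves the standard deviation unchanged.

```python
from statistics import fmean, stdev

# Hypothetical pre-treatment scores (illustration only).
before = [42, 55, 48, 61, 50, 44]
treatment_effect = 10                      # assumed constant added by the treatment
after = [x + treatment_effect for x in before]

print(fmean(before), fmean(after))         # mean shifts by exactly 10
print(stdev(before), stdev(after))         # standard deviation is unchanged
```

This is also why the assumptions for parametric tests include that the treatment does not change σ.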

 

First suppose that n = 1.

B. Procedure

1. State Hypothesis

Ho = null hypothesis: the treatment has no effect

H1 = alternative (experimental) hypothesis: the treatment has an effect

2. Set criteria for decision

- there is always some discrepancy between sample statistics and population parameters

- this natural discrepancy is called sampling error

 

-- t distribution

- not exactly normal: flatter and more spread out (heavier tails) than the normal distribution

- approximates the normal distribution in the same way that t approximates z

- its exact shape is determined by df

Degrees of Freedom

- df = n - 1

- the larger n is, the more closely s represents σ, and therefore the better t approximates z
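To see how df controls the shape, the sketch below evaluates the t density from its textbook formula and integrates it numerically with Simpson's rule. It is a minimal illustration using only the Python standard library; the helper names `t_pdf` and `t_two_tail_prob` are hypothetical. As df grows, the peak rises toward the normal peak and the probability beyond ±1.96 shrinks toward .05.

```python
import math
from statistics import NormalDist

def t_pdf(t: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    # log-gamma keeps large df from overflowing math.gamma
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef) * (1 + t * t / df) ** (-(df + 1) / 2)

def t_two_tail_prob(cutoff: float, df: float, steps: int = 2000) -> float:
    """P(|t| > cutoff): Simpson's rule on [0, cutoff], then symmetry."""
    h = cutoff / steps                       # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(cutoff, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_pdf(i * h, df)
    central_half = total * h / 3             # area between 0 and cutoff
    return 1 - 2 * central_half

z = NormalDist()
for df in (2, 5, 30, 1000):
    print(f"df={df:5d}  peak={t_pdf(0, df):.4f}  P(|t|>1.96)={t_two_tail_prob(1.96, df):.4f}")
print(f"normal      peak={z.pdf(0):.4f}  P(|z|>1.96)={2 * (1 - z.cdf(1.96)):.4f}")
```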

 

3. Collect sample data

- calculate the test statistic: single-sample t-test

Formula:

$t = \dfrac{M - \mu}{s_M}$, where $s_M = \dfrac{s}{\sqrt{n}}$ is the estimated standard error of M
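The formula translates directly into code. This is a minimal sketch using only Python's `statistics` module. The scores and µ = 50 are hypothetical, and `single_sample_t` is an illustrative helper name.

```python
import math
from statistics import fmean, stdev

def single_sample_t(scores, mu):
    """Return (t, df) for a single-sample t-test of Ho: population mean == mu."""
    n = len(scores)
    m = fmean(scores)                  # sample mean M
    s = stdev(scores)                  # sample SD, computed with n - 1
    s_m = s / math.sqrt(n)             # estimated standard error s_M
    return (m - mu) / s_m, n - 1

scores = [52, 55, 49, 58, 61, 50, 57, 54]   # hypothetical sample
t, df = single_sample_t(scores, mu=50)
print(f"t({df}) = {t:.2f}")                 # t(7) = 3.10
```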

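4. Evaluate the null hypothesis

- Reject Ho (the test statistic falls in the critical region)

- Retain Ho, i.e., fail to reject Ho (the test statistic does not fall in the critical region)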
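Step 4 reduces to comparing the obtained statistic with a critical value. This minimal sketch assumes a two-tailed test. The critical value 2.365 is the tabled t for df = 7 at α = .05, and the obtained t = 3.10 is a hypothetical result.

```python
def decide(t_obs, t_crit):
    """Two-tailed decision: reject Ho if |t| is at or beyond the critical value."""
    if abs(t_obs) >= t_crit:
        return "Reject Ho"
    return "Fail to reject Ho"

# Hypothetical t(7) = 3.10; critical t for df = 7, two-tailed alpha = .05, is 2.365.
print(decide(3.10, 2.365))    # Reject Ho
print(decide(1.50, 2.365))    # Fail to reject Ho
```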

 

II. Evaluating Hypotheses

A. Alpha Level (α)

- sets the risk of a Type I error; a small α keeps that risk low

1. Determine what sample data are expected if Ho is true

2. Determine what sample data are unlikely if Ho is true

3. Use the distribution of sample means, separated into two parts:

- sample means (X̄ or M) that are expected (high probability) if Ho is true

- sample means that are unlikely (low probability) if Ho is true

4. The alpha level defines which sample means are very unlikely to be obtained by chance (e.g., the extreme 5% of the distribution)

- M in the middle of the distribution: compatible with Ho

- M in the extreme tails (the critical region): incompatible with Ho

5. When the sample mean falls in the tails (the critical region), we reject Ho

- such a sample would be very unlikely if the treatment had no effect
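This minimal sketch locates a sample mean relative to the critical region. It assumes σ is known, so the z-score for a sample mean can be used. All numbers (µ = 80, σ = 20, n = 25, M = 88) are hypothetical.

```python
from statistics import NormalDist

def critical_z(alpha):
    """Boundary of the two-tailed critical region for a given alpha."""
    return NormalDist().inv_cdf(1 - alpha / 2)

def z_for_mean(m, mu, sigma, n):
    """Position of a sample mean M in the distribution of sample means under Ho."""
    return (m - mu) / (sigma / n ** 0.5)

z = z_for_mean(m=88, mu=80, sigma=20, n=25)      # hypothetical numbers: z = 2.0
for alpha in (0.05, 0.01):
    boundary = critical_z(alpha)
    verdict = "reject Ho" if abs(z) >= boundary else "retain Ho"
    print(f"alpha = {alpha}: critical region |z| >= {boundary:.2f} -> {verdict}")
```

The same sample mean lands in the critical region at α = .05 but not at α = .01.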

 

B. Assumptions for Parametric Tests

1. Normality: the DV must be normally distributed in the population

2. Independent observations

3. Homogeneity of variance: σ is not changed by the treatment

4. Interval or ratio scale of measurement for the DV

 

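C. One-tailed test: critical region in only one tail

- Ho can be rejected with a smaller difference between M and µ (in the predicted direction)

- more "sensitive"

- criticized because a relatively small difference is enough to reject Ho, which critics see as inviting false alarms (Type I errors); if the direction is specified before the data are collected, the Type I error rate still equals α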
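This minimal sketch compares the two kinds of test at α = .05 for a hypothetical result of z = 1.80 in the predicted (positive) direction. `rejects_ho` is an illustrative helper name.

```python
from statistics import NormalDist

def rejects_ho(z_obs, alpha, tails):
    """Decide whether z_obs lands in the critical region.

    tails=1: all of alpha sits in the upper tail (predicted increase).
    tails=2: alpha is split evenly between the two tails.
    """
    boundary = NormalDist().inv_cdf(1 - alpha / tails)
    if tails == 1:
        return z_obs >= boundary
    return abs(z_obs) >= boundary

z_obs = 1.80    # hypothetical result in the predicted direction
print("one-tailed:", rejects_ho(z_obs, 0.05, tails=1))   # True  (boundary about 1.645)
print("two-tailed:", rejects_ho(z_obs, 0.05, tails=2))   # False (boundary about 1.960)
```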

 

III. Errors in Hypothesis Testing

A. Type I error - reject Ho when it is actually true

B. Type II error - fail to reject Ho when it is actually false

C. Power

- the probability of detecting a treatment effect when one is indeed present

- when a treatment effect really exists in the population, the test either detects it (power) or misses it (Type II error), so power is the complement of the Type II error rate

- power = 1 − β, where β is the probability of a Type II error

- as the probability of a Type II error decreases, power increases

- reducing the risk of a Type I error (moving α from .05 to .01) increases the risk of a Type II error and thereby decreases power
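Type I error and power can also be estimated by simulation. The sketch below draws many samples, runs a two-tailed z-test on each, and counts rejections. It uses standard-library Python, and the population values (Ho: µ = 100, σ = 40, n = 25) and the 20-point true effect are hypothetical. When Ho is true, the rejection rate estimates the Type I error rate (close to α). When the treatment really shifts the mean, it estimates power, and β = 1 − power.

```python
import random
from statistics import NormalDist, fmean

def rejection_rate(true_mu, null_mu, sigma, n, alpha, reps=5_000, seed=1):
    """Fraction of simulated samples for which a two-tailed z-test rejects Ho."""
    rng = random.Random(seed)                    # fixed seed for repeatable output
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sigma / n ** 0.5                        # standard error of M
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mu, sigma) for _ in range(n)]
        z = (fmean(sample) - null_mu) / se
        if abs(z) >= z_crit:
            rejections += 1
    return rejections / reps

# Hypothetical population: Ho says mu = 100; sigma = 40; n = 25.
for alpha in (0.05, 0.01):
    type_1 = rejection_rate(true_mu=100, null_mu=100, sigma=40, n=25, alpha=alpha)  # Ho true
    power = rejection_rate(true_mu=120, null_mu=100, sigma=40, n=25, alpha=alpha)   # 20-point effect
    print(f"alpha = {alpha}: Type I rate ~ {type_1:.3f}, power ~ {power:.3f}, beta ~ {1 - power:.3f}")
```

Moving α from .05 to .01 lowers the simulated Type I rate but also lowers the estimated power.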

 

The Relationship between Power and Sample Size

Suppose σ = 40
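This minimal sketch shows the relationship for a two-tailed one-sample z-test with σ = 40 and α = .05. The 20-point treatment effect is an assumed value chosen only for illustration, and `z_test_power` is a hypothetical helper name.

```python
from statistics import NormalDist

def z_test_power(effect, sigma, n, alpha=0.05):
    """Power of a two-tailed one-sample z-test when the true mean differs from mu by `effect`."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = effect / (sigma / n ** 0.5)    # treatment effect measured in standard errors
    # Reject when the observed z lands beyond +z_crit or below -z_crit.
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)

for n in (4, 16, 25, 36, 64):
    print(f"n = {n:3d}: power = {z_test_power(effect=20, sigma=40, n=n):.3f}")
```

Larger samples shrink the standard error, so the same 20-point effect sits farther from the critical boundary and power rises.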

 

 

D. Effect Size

Important limitation of the hypothesis testing procedure:

 It makes a relative comparison: the size of the treatment effect relative to the difference expected by chance. If the standard error is very small, then the treatment effect can also be very small and still be bigger than chance. 

Therefore, a significant effect does not necessarily mean a big effect.

 Also, if the sample size is large enough, any treatment effect, no matter how small, can be enough for us to reject the null hypothesis.
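To see the large-sample problem concretely, the sketch below holds a tiny effect fixed and computes the two-tailed p-value of a one-sample z-test as n grows. The effect is a hypothetical mean difference of 0.05 standard deviations, and the p-value uses z = d·√n. The effect never changes, yet it eventually becomes "significant".

```python
from statistics import NormalDist

def two_tailed_p(d, n):
    """p-value of a one-sample z-test when the mean difference is d standard deviations."""
    z = d * n ** 0.5      # (M - mu) / (sigma / sqrt(n)) = d * sqrt(n)
    return 2 * (1 - NormalDist().cdf(abs(z)))

for n in (100, 1_000, 10_000):
    p = two_tailed_p(0.05, n)
    print(f"n = {n:6d}  d = 0.05  p = {p:.4g}  significant at .05: {p < 0.05}")
```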

 

Figure 8-11  (p. 262)
The appearance of a 15-point treatment effect in two different situations. In part (a), the standard deviation is σ = 100 and the 15-point effect is relatively small. In part (b), the standard deviation is σ = 15 and the 15-point effect is relatively large. Cohen’s d uses the standard deviation to help measure effect size.

Calculating effect size: 

Cohen's d = mean difference / standard deviation; estimated from sample data as $d = \dfrac{M - \mu}{s}$

d        Evaluation
0.2      Small effect
0.5      Medium effect
0.8      Large effect
1.10     Very large effect
1.40     Extremely large effect
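This minimal sketch estimates d for a single sample as (M − µ)/s. The scores and µ = 100 are hypothetical. Reading the table's values as lower bounds for each label is an assumption, since they are benchmarks rather than strict cutoffs.

```python
from statistics import fmean, stdev

# Benchmarks from the table, read as lower bounds (an assumption).
BENCHMARKS = [(1.40, "Extremely large effect"), (1.10, "Very large effect"),
              (0.8, "Large effect"), (0.5, "Medium effect"), (0.2, "Small effect")]

def estimated_cohens_d(scores, mu):
    """Estimated Cohen's d = (M - mu) / s, with s computed using n - 1."""
    return (fmean(scores) - mu) / stdev(scores)

def describe_d(d):
    """Label the size of d (direction ignored) using the benchmarks."""
    size = abs(d)
    for cutoff, label in BENCHMARKS:
        if size >= cutoff:
            return label
    return "Below the small-effect benchmark"

d = estimated_cohens_d([104, 110, 98, 107, 101, 112], mu=100)   # hypothetical data
print(f"d = {d:.2f} ({describe_d(d)})")                          # d = 1.00 (Large effect)
```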

***************************************************************

Alternative effect size for t-tests: $r^2 = \dfrac{t^2}{t^2 + df}$

Advantage: r² is a statistic people are already familiar with.

Values range from 0.00 to 1.00.

It answers the question: what proportion of the total variability in the scores is accounted for by the treatment?

Magnitude of r²        Evaluation
.09 and below          Small effect
between .09 and .25    Medium effect
over .25               Large effect
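This minimal sketch computes r² and applies the evaluation table to a hypothetical result, t(7) = 3.10. Scoring exactly .25 as "Medium" is an assumption, since the table leaves that boundary open.

```python
def r_squared(t, df):
    """Percentage of variance accounted for: r^2 = t^2 / (t^2 + df)."""
    return t * t / (t * t + df)

def describe_r2(r2):
    """Label r^2 using the cutoffs in the evaluation table."""
    if r2 <= 0.09:
        return "Small effect"
    if r2 <= 0.25:
        return "Medium effect"
    return "Large effect"

r2 = r_squared(3.10, 7)      # hypothetical result: t(7) = 3.10
print(f"r^2 = {r2:.3f} ({describe_r2(r2)})")
```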

The Relationship between Power and Effect Size
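This minimal sketch expresses the effect in Cohen's d units and holds n fixed at a hypothetical 25, again assuming a two-tailed one-sample z-test at α = .05. `power_from_d` is an illustrative helper name. It uses the fact that the effect in standard-error units is d·√n.

```python
from statistics import NormalDist

def power_from_d(d, n, alpha=0.05):
    """Power of a two-tailed one-sample z-test for a true effect of d standard deviations."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = d * n ** 0.5      # (mu1 - mu0) / (sigma / sqrt(n)) = d * sqrt(n)
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)

for d in (0.2, 0.5, 0.8):     # Cohen's small, medium, large benchmarks
    print(f"d = {d}: power with n = 25 is {power_from_d(d, n=25):.3f}")
```

Bigger effects sit farther from the critical boundary, so power rises with effect size even when n does not change.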

 

Think about the following:
Suppose that a researcher normally uses an alpha level of .05 for hypothesis tests, but this time uses an alpha level of .01.

a) What does this change in alpha do to the risk of a Type I error?

b) What does this change in alpha level do to the amount of power?