Psy 342 Learning & Memory

Chapter 4

Instrumental/Operant Conditioning

I. Early Experiments

            A. Thorndike

The graph below is called a ______________ and summarizes how quickly cats learned to escape from a puzzle box.

Thorndike's measure of conditioning was _____?

Trial and error or insight? 

 

For Thorndike’s cats, the behaviors that opened the door were followed by certain consequences: __________ and ____________. 

As a result of these consequences, the cat became _____ likely to repeat the effective escape behaviors, which ultimately decreased the animal’s escape latency. 

“Of several responses made to the same situation, those which are accompanied or closely followed by satisfaction to the animal will, other things being equal, be more firmly connected to the situation, so that, when it (the situation) recurs, they will be more likely to recur.”

instrumental conditioning—an organism’s behavior changes because of the consequences that follow the behavior.

 

            B. Skinner

Responsible for an enormous increase in interest in instrumental/operant conditioning in the 1940s and 1950s.

 How did Skinner’s methods differ from Thorndike’s?

     Discrete-trial procedure vs. free operant procedures

       Response latency vs. response rate

 

II. Behavior, Consequences, & Contingencies

Instrumental/operant contingency:

Behavior → Consequence (further, this relationship is contingent)

 

           A.  Reinforcement & Punishment

reinforcement—the contingency that results when the consequence of a behavior causes the future probability of that behavior to INCREASE

Then what is a reinforcer?

 

punishment—the contingency that results when the consequence of a behavior causes the future rate of the behavior to decrease

Then what is a punisher?

 

positive reinforcement—when reinforcement involves the presentation of a stimulus

positive punishment—when punishment involves the presentation of a stimulus

In this instance, positive does NOT mean __________.

 

negative reinforcement—when the consequence in a reinforcement contingency is the removal of a stimulus

negative punishment—when the consequence in a punishment situation is the removal of a stimulus

In this instance, negative does NOT mean ___________.

 

4 key questions to help you determine the type of contingency involved.

 

  1. What is the behavior?
  2. What is the consequence of that behavior?
  3. Does the behavior produce (positive contingency, +) or prevent/remove an event (negative contingency, -)?
  4. Does the rate or probability of the behavior increase (↑, reinforcement contingency) or decrease (↓, punishment contingency) in the future?
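
A compact way to see how questions 3 and 4 combine to name the contingency is sketched below (a minimal Python illustration; the function name and string labels are my own, not standard terminology):

def classify_contingency(consequence, rate_change):
    """Name the contingency from questions 3 and 4.

    consequence: "presented" if the behavior produces a stimulus (+),
                 "removed" if it prevents or removes one (-).
    rate_change: "increase" or "decrease" in the behavior's future rate.
    """
    sign = "positive" if consequence == "presented" else "negative"
    kind = "reinforcement" if rate_change == "increase" else "punishment"
    return sign + " " + kind

# A lever press produces food, and pressing increases in the future:
print(classify_contingency("presented", "increase"))   # positive reinforcement
# A response removes shock, and responding increases in the future:
print(classify_contingency("removed", "increase"))     # negative reinforcement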

B. How might reinforcers and punishers serve an adaptive role in behavior?  

 

            C. Application

Operant conditioning in the treatment of alcohol abuse

One effective treatment for alcohol abuse is the community reinforcement approach (CRA) (Azrin, Sisson, Meyers, & Godley, 1982; Hunt & Azrin, 1973).

 

 

 Explain the meaning and importance of the following statements:

(1) We define consequences and contingencies by their effects on behavior, not by what we EXPECT their effects to be.

 

(2) If behavior is followed by a reinforcer, it is the behavior that is reinforced, not the organism.

 

(3) The contingent relationship between behavior and outcomes can lead to behaviors that do not appear to make sense.

 

             

III. Instrumental Conditioning Paradigms

            Instrumental vs. Operant Distinction

            A. Runways/Mazes

            B. Escape & Avoidance Paradigms

            C. Operant Procedures

                        Bar-Press

                        Key-Peck

                        Human Operant Responses

                        Measuring Operant Responses—the rate at which the organism emits the response is usually the dependent variable in operant conditioning studies

Cumulative recorder creates a record of __________________ as a function of time.
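
As a rough illustration of what the recorder produces, the Python sketch below (response timestamps invented) builds a cumulative record from a list of response times; the steeper the record at any point, the higher the response rate:

# Hypothetical response timestamps (seconds) from one session.
response_times = [2.1, 3.5, 4.0, 9.8, 10.2, 10.9, 11.5, 30.0]

# A cumulative record pairs each response with the total emitted so far;
# its slope at any moment is the momentary response rate.
for count, t in enumerate(response_times, start=1):
    print(f"t = {t:5.1f} s   cumulative responses = {count}")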

 

                        Shaping procedure

Reinforcement of behaviors that are closer and closer approximations of the target response
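
A toy illustration of successive approximations (Python; every number here is invented for the example): reinforce only responses that beat a criterion set just above the organism's current typical response, and keep shifting the criterion toward the target.

import random

# Toy shaping loop (illustrative only). Responses vary around the current
# typical magnitude; only responses beating the criterion are reinforced,
# and each reinforcement shifts typical responding toward the target.
target, typical = 100.0, 8.0

for trial in range(2000):
    criterion = typical + 2.0                  # demand a closer approximation
    response = random.gauss(typical, 3.0)      # trial-to-trial variability
    if response >= criterion:                  # approximation met: reinforce
        typical += 0.5 * (response - typical)  # reinforced behavior recurs
    if typical >= target:
        print(f"target response reached on trial {trial}")
        break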

Application: The Lovaas treatment program (established in the 1960s) uses operant conditioning principles to decrease problem behaviors of autistic children and to increase the frequency of appropriate behaviors (e.g., language and social skills).

IV. Positive Reinforcement Situations

Presentation of an event that increases the probability of future behavior.

Responding is therefore influenced mainly by the characteristics of the reinforcer.

 

            A. Amount/Magnitude

Generally, responding occurs faster and becomes more accurate as we increase the amount of a reinforcer delivered after each response.

 

E.g., Crespi (1942)—trained 5 groups of rats to run down a straight runway for food

IV (independent variable): # of food pellets in the goal box (1, 4, 16, 64, or 256)

**Results: After 20 trials, the running speeds of the groups corresponded directly to the number of pellets received.

 

Complicating factor—Some researchers have disagreed as to the definition of reinforcer magnitude.

 

Several studies have shown that if an experimenter gives two groups the same amount of food for each response, but for one group the food is simply broken into more pieces, the group receiving more pieces will respond faster.

 

**Quality of reinforcement also matters.

 

We can define quality by assessing how vigorously an organism consumes a reward when it is presented.

 

High-quality reinforcers are consumed quickly by organisms, whereas low-quality reinforcers are consumed less quickly.

 

            B. Delay of Reinforcement

Generally, the longer that reinforcement is delayed after a response, the poorer the performance of that response.

 

E.J. Capaldi (1978)—Two groups of rats received the same amount and quality of food reinforcement on each trial, but for one group reinforcement was immediate, while the other group received it only after a 10-second delay.  

 

  ***One reason for this difference might be ____________________? (Why would a 10 second delay matter? What could possibly go on during the 10 s delay?)

  

            C. Contrast Effects—Effects of quantity, quality, and delay of reinforcement on instrumental responding can vary depending on an organism’s past experiences with a particular reinforcer.

 

Crespi (1942)—Group 1 trained to traverse a runway for a large amount of food, Group 2 for a small amount of food

 After several trials, the rats receiving the large reinforcer were running consistently faster than the small-reinforcement group.

 Then half of the large-reinforcer animals were switched to the small reinforcer being given to the other rats (large-small).

 And half of the small-reinforcer rats were switched to the large reinforcer (small-large).

  

  

            D. Intermittent Reinforcement—Often, we can't reinforce each time an appropriate response is emitted. 

What is the effect of such reinforcement inconsistency on behavior?

It is clear that instrumental learning is not dependent on continuous reinforcement. In most species, instrumental responding develops quite efficiently even when reinforcement occurs only intermittently.

 

            Schedules of Reinforcement

                        

Ratio schedules—deliver a reinforcer only after a certain number of responses (see the sketch after the interval schedules below)

                        Fixed Ratio (FR)

                        Variable Ratio (VR)

  Interval schedules—deliver a reinforcer only for the first response that occurs after a period of time has elapsed.  

                        Fixed interval (FI)  

                        Variable interval (VI)  
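
A minimal sketch of how two of these rules decide whether a given response earns a reinforcer (Python; the schedule values FR 5 and VI 30 s are arbitrary examples):

import random

# FR 5 (fixed ratio): every 5th response produces the reinforcer.
count = 0
for response in range(1, 21):
    count += 1
    if count == 5:
        print(f"FR 5: response {response} reinforced")
        count = 0

# VI 30 s (variable interval): the first response after an interval
# averaging 30 s is reinforced; responses made sooner earn nothing.
last_reinf, interval = 0, random.expovariate(1 / 30)
for t in range(0, 120, 5):            # one simulated response every 5 s
    if t - last_reinf >= interval:    # the interval has elapsed
        print(f"VI 30: response at t = {t} s reinforced")
        last_reinf, interval = t, random.expovariate(1 / 30)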

                        

                   Concurrent schedules—when more than one reinforcement schedule is operating at the same time

When we are faced with choices between different schedules of reinforcement, how do we respond?

   

Herrnstein (1961) measured the rate at which pigeons pecked keys that were associated with different reinforcement schedules.

 

He found that pigeons did NOT simply choose the key having the most favorable schedule of reinforcement. That is, they did not make the vast majority of their responses to the key that provided the most reinforcement in the shortest period of time.

 

Instead, pigeons divided their responses between the keys, and the rate of responding on each key was directly related to the rate of reinforcement on that key.

 

As a result, Herrnstein proposed the matching law.

The formula for the matching law is:
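
In its standard two-alternative form (with B1 and B2 the response rates on the two keys, and R1 and R2 the reinforcement rates obtained from them), the matching law states:

\[
\frac{B_1}{B_1 + B_2} = \frac{R_1}{R_1 + R_2}
\]

That is, the proportion of responses allocated to a key matches the proportion of reinforcers earned from that key.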

 

V. Negative Reinforcement Situations

    A. Escape Learning

          Amount of Reinforcement—In escape learning, the amount of reinforcement corresponds to the degree to which the aversive stimulation is reduced after a successful response.

Campbell & Kraeling (1953)—exposed all rats to the same shock intensity in the runway and reduced this shock by varying degrees in the safe box.

 That is, all animals received shock reduction after responding, but most animals still received SOME shock in the safe box.

 The speed of the escape response was a direct function of the degree to which shock was reduced.

 **Escape learning depends more on the amount of negative reinforcement (degree to which aversive stimulation is reduced) than on the intensity of the aversive stimulus per se.

 

Delay of Reinforcement—time between the escape response and the reduction of the aversive stimulation.

 Fowler & Trapold (1962)—exposed rats to shock in an alleyway and required the animals to run to a safe box to escape the shock.

The delay in shock offset varied between 0 and 16 seconds.

 

  Results: escape speeds decreased as the delay of shock offset increased. 

          B. Avoidance Learning—A stimulus signals the presentation of the aversive event. The organism must respond in the presence of the signal in order to avoid the aversive event.

 

Two stimuli—the signal and the aversive stimulus

The characteristics of both of these stimuli are important determinants of avoidance learning.

 

                   Intensity of Aversive Stimulus

 

                   Signal-Aversive Stimulus Interval--Avoidance learning appears to occur most efficiently when the signal and the aversive stimulus overlap, as is the case in a delayed-conditioning procedure.  

 

                    Duration of the signal before onset of aversive event--In general, signals of longer duration tend to facilitate the learning of the avoidance response (Low & Low, 1962).

                     Termination of the Signal--Even when the organism’s response results in avoidance of the aversive stimulus, the rate of learning depends on whether the response also leads to the termination of the signal.

Kamin (1957) conducted a two-way avoidance experiment using rats. All rats were able to avoid shock by moving to the safe chamber during the signal-shock interval. For one group, the signal terminated as soon as the avoidance response occurred (0-second delay). In the other three groups, the signal ended either 2.5, 5, or 10 seconds after the avoidance response.

 

          Punishment Situations (Read pp. 118-122 for Assignment 2)

                   Intensity & Duration of Punishment

                   Delay & Noncontingent Delivery of Punishment

                   Responses Produced by the Punishing Stimulus

 

            VI. The Discriminative Stimulus

A stimulus that is reliably present when a particular behavior is reinforced.  

After instrumental conditioning in the presence of a discriminative stimulus, the presentation of that discriminative stimulus will increase the probability of behaviors that have been reinforced in its presence….

Discriminative stimuli make a behavior more probable, but ultimately they DO NOT CAUSE the behavior to occur. It is the reinforcement contingency that controls the rate of behavior.

  Due to its association with a reward, the discriminative stimulus can itself be used to reward behavior; it is then called a secondary reinforcer.

 

            VII. Extinction

          Variables Affecting Extinction of Positively Reinforced Responses

                   Conditions Present During Learning

                   Conditions Present During Extinction

          Variables Affecting Extinction of Negatively Reinforced Responses