Terry Chapters 4 & 5
Instrumental/Operant Conditioning

I. Early Experiments

            A. Thorndike

The graph below is called a ______________ and summarizes how quickly cats learned to escape from a puzzle box.

Thorndike's measure of conditioning was _____?

 

For Thorndike’s cats, the behaviors that opened the door were followed by certain consequences: __________ and ____________.

As a result of these consequences, the cat became _____ likely to repeat the effective escape behaviors, which ultimately decreased the animal’s escape latency. 

“Of several responses made to the same situation, those which are accompanied or closely followed by satisfaction to the animal will, other things being equal, be more firmly connected to the situation, so that, when it (the situation) recurs, they will be more likely to recur.” Thorndike called this the Law ___ ________.
The environmental stimuli present when the response occurs come to control the escape response. These stimuli are called _______________ and signal when or where reinforcement is available.

Trial and error or insight? 

According to Thorndike, learning involves establishing a connection between the discriminative stimulus and the instrumental response (___-___ connection).

instrumental learning—learning the connection between a behavior and its consequence

            B. Skinner

Responsible for an enormous increase in interest in instrumental/operant conditioning in the 1940s and 1950s.

 How did Skinner’s methods differ from Thorndike’s?

     Discrete-trial procedures vs. free-operant procedures

       Response latency vs. response rate

 

II. Behavior, Consequences, & Contingencies

Instrumental/operant contingency:

Behavior → Consequence (further, this relationship is contingent)

           A.  Reinforcement & Punishment

reinforcement—the contingency that results when the consequence of a behavior causes the future probability of that behavior to INCREASE

Then what is a reinforcer?

 

punishment—the contingency that results when the consequence of a behavior causes the future rate of that behavior to DECREASE

Then what is a punisher?

 

positive reinforcement—when reinforcement involves the presentation of a stimulus

positive punishment—when punishment involves the presentation of a stimulus

In this instance, positive does NOT mean __________.

 

negative reinforcement—when the consequence in a reinforcement contingency is the removal of a stimulus

negative punishment—when the consequence in a punishment situation is the removal of a stimulus

In this instance, negative does NOT mean ___________.

 

4 key questions to help you determine the type of contingency involved (a short decision sketch follows the list):

 

  1. What is the behavior?
  2. What is the consequence of that behavior?
  3. Does the behavior produce (positive contingency, +) or prevent/remove an event (negative contingency, -)?
  4. Does the rate or probability of the behavior increase (↑, reinforcement contingency) or decrease (↓, punishment contingency) in the future?
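These four questions amount to a simple decision procedure. Here is a minimal Python sketch (the function name and the worked examples are my own illustration, not from the text) showing how the answers to questions 3 and 4 jointly determine the contingency:

```python
def classify_contingency(stimulus_presented: bool, behavior_increases: bool) -> str:
    """Map the answers to questions 3 and 4 onto the four contingency types.

    stimulus_presented: True if the behavior produces/presents a stimulus (+),
                        False if it removes/prevents one (-).
    behavior_increases: True if the behavior becomes more probable in the
                        future (reinforcement), False if less (punishment).
    """
    sign = "positive" if stimulus_presented else "negative"
    effect = "reinforcement" if behavior_increases else "punishment"
    return f"{sign} {effect}"

# Lever press produces food, and pressing increases:
print(classify_contingency(True, True))    # positive reinforcement
# Taking aspirin removes a headache, and aspirin-taking increases:
print(classify_contingency(False, True))   # negative reinforcement
# Touching a hot stove produces pain, and touching decreases:
print(classify_contingency(True, False))   # positive punishment
# Fighting removes TV privileges, and fighting decreases:
print(classify_contingency(False, False))  # negative punishment
```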

B. How might reinforcers and punishers serve an adaptive role in behavior?  

 

            C. Application

Operant conditioning in the treatment of alcohol abuse

One effective treatment for alcohol abuse is the community reinforcement approach (CRA) (Azrin, Sisson, Meyers, & Godley, 1982; Hunt & Azrin, 1973).

The Lovaas treatment program (established in the 1960s) uses operant conditioning principles to decrease problem behaviors in autistic children and to increase the frequency of appropriate behaviors (e.g., language and social skills).

 

 

 Explain the meaning and importance of the following statements:

(1) We define consequences and contingencies by their effects on behavior, not by what we EXPECT their effects to be.

 

(2) If behavior is followed by a reinforcer, it is the behavior that is reinforced, not the organism.

 

(3) The apparent contingent relationship between behavior and outcomes can lead to behaviors that do not appear to make sense. See Skinner's classic article, "'Superstition' in the Pigeon" (1948).

III. Instrumental Conditioning Paradigms

            Instrumental vs. Operant Distinction

            A. Runways/Mazes

            B. Escape & Avoidance Paradigms

            C. Operant Procedures

                Bar-Press, Key-Peck, Human Operant Responses

                        Measuring Operant Responses—the rate at which organisms emit the response is usually the dependent variable in operant conditioning studies

Cumulative recorder creates a record of __________________ as a function of time.

 

Shaping procedure

Reinforcement of behaviors that are closer and closer approximations of the target response (demonstration: shaping in the Skinner box)
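As a rough illustration of successive approximations, here is a toy Python simulation (entirely my own construction, not from the text or from Skinner): responding varies around a current typical value, only responses that come closer to the target than the current typical response are "reinforced," and each reinforced response pulls typical responding toward it.

```python
import random

def shape(target: float, trials: int = 500, seed: int = 1) -> float:
    """Toy model of shaping: reinforce only responses that are closer
    approximations of the target than the organism's current typical
    response; reinforcement shifts typical responding toward the
    reinforced value."""
    rng = random.Random(seed)
    typical = 0.0                                  # current center of responding
    for _ in range(trials):
        response = rng.gauss(typical, 1.0)         # behavior is variable
        if abs(response - target) < abs(typical - target):
            typical += 0.5 * (response - typical)  # reinforced approximation
    return typical

# Typical responding starts far from the target and is gradually shaped toward it:
print(round(shape(target=10.0), 2))   # ends up near 10
```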

IV. Positive Reinforcement Situations

Presentation of an event that increases the probability of future behavior.

In these situations, responding is influenced mainly by characteristics of the reinforcer.

            A. Amount/Magnitude

Generally, responding occurs faster and becomes more accurate as we increase the amount of a reinforcer delivered after each response.

 

E.g., Crespi (1942)—trained 5 groups of rats to run down a straight runway for food

IV (independent variable): # of food pellets in the goal box (1, 4, 16, 64, or 256)

**Results: After 20 trials, the running speeds of the groups corresponded directly to the number of pellets received.

 

Complicating factor—Some researchers have disagreed as to the definition of reinforcer magnitude.

Several studies have shown that if an experimenter gives two groups the same amount of food for each response, but for one group the food is simply broken into more pieces, the group receiving more pieces will respond faster.

 

**Quality of reinforcement also matters.

 

We can define quality by assessing how vigorously an organism consumes a reward when it is presented.

 

High-quality reinforcers are consumed quickly by organisms, whereas low-quality reinforcers are consumed less quickly.

 

            B. Delay of Reinforcement

Generally, the longer that reinforcement is delayed after a response, the poorer the performance of that response.

  ***One reason for this difference might be ____________________? (Why would a 10-second delay matter? What could possibly go on during the 10 s delay?)

  

            C. Contrast Effects—Effects of quantity, quality, and delay of reinforcement on instrumental responding can vary depending on an organism’s past experiences with a particular reinforcer.

Crespi (1942)—Group 1 was trained to traverse a runway for a large amount of food, Group 2 for a small amount of food

 After several trials, the rats receiving the large reinforcer were running consistently faster than the small-reinforcement group.

 Crespi then switched half of the large-reinforcer animals to the small reinforcer given to the other rats (large-small).

 And he switched half of the small-reinforcer rats to the large-reinforcer magnitude (small-large).

  

           D. Drive
Motivational need or desire for a given reinforcer...How might we manipulate the hunger and thirst drives of rats? What impact would we expect this manipulation to have on behavior (e.g., maze running)?

Hull (1949)--Instrumental performance is determined by level of conditioning AND motivational factors...Instrumental conditioning was represented by Habit Strength (H), which was influenced by # of reinforced training trials and delay of reinforcement. Motivation was represented as Drive (D) (e.g., # hours of food deprivation) and Incentive (K) (e.g., reinforcer amount and quality).
Response strength = H × D × K
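Because the combination is multiplicative, Hull's formula predicts that if any factor is zero, response strength is zero: a well-trained but fully satiated rat (D = 0) should not run. A minimal sketch of the multiplicative rule (the numeric values are hypothetical, chosen only to illustrate the multiplication):

```python
def response_strength(H: float, D: float, K: float) -> float:
    """Hull: response strength = Habit strength (H) x Drive (D) x Incentive (K)."""
    return H * D * K

# Well-trained rat (high H), food-deprived (high D), large/quality reward (high K):
print(response_strength(H=10, D=5, K=2))   # 100 -> vigorous responding
# Identical training and reward, but the rat was just fed (D = 0):
print(response_strength(H=10, D=0, K=2))   # 0 -> no responding, however well trained
```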

          E. Schedules of Reinforcement
Intermittent Reinforcement—Often, we can't reinforce each time an appropriate response is emitted. 

What is the effect of such reinforcement inconsistency on behavior?

It is clear that instrumental learning is not dependent on continuous reinforcement. In most species, instrumental responding develops quite efficiently even when reinforcement occurs only intermittently.

                        

Ratio schedules—deliver a reinforcer only after a certain number of responses (a brief code sketch of all four schedules follows this list)

                        Fixed Ratio (FR)

                        Variable Ratio (VR)

  Interval schedules—deliver a reinforcer only for the first response that occurs after a period of time has elapsed.  

                        Fixed interval (FI)  

                        Variable interval (VI)  
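The four schedules differ only in the rule that decides whether a given response earns the reinforcer. Here is a minimal Python sketch of those rules (class names, parameter values, and the session loop are my own illustration, not from the text):

```python
import random

class FixedRatio:
    """FR n: reinforce every nth response."""
    def __init__(self, n):
        self.n, self.count = n, 0
    def respond(self, t):
        self.count += 1
        if self.count == self.n:
            self.count = 0
            return True
        return False

class VariableRatio:
    """VR n: reinforce after a varying number of responses (mean = n)."""
    def __init__(self, n, rng):
        self.n, self.rng, self.count = n, rng, 0
        self.required = rng.randint(1, 2 * n - 1)
    def respond(self, t):
        self.count += 1
        if self.count >= self.required:
            self.count = 0
            self.required = self.rng.randint(1, 2 * self.n - 1)
            return True
        return False

class FixedInterval:
    """FI t: reinforce the first response after t seconds have elapsed."""
    def __init__(self, interval):
        self.interval, self.available_at = interval, interval
    def respond(self, t):
        if t >= self.available_at:
            self.available_at = t + self.interval
            return True
        return False

class VariableInterval:
    """VI t: like FI, but the required interval varies (mean = t)."""
    def __init__(self, interval, rng):
        self.interval, self.rng = interval, rng
        self.available_at = rng.uniform(0, 2 * interval)
    def respond(self, t):
        if t >= self.available_at:
            self.available_at = t + self.rng.uniform(0, 2 * self.interval)
            return True
        return False

# One response per second for 60 seconds on FR 10 vs. FI 10 s:
rng = random.Random(0)
fr, fi = FixedRatio(10), FixedInterval(10)
print(sum(fr.respond(t) for t in range(60)))  # 6 reinforcers (every 10th response)
print(sum(fi.respond(t) for t in range(60)))  # 5 reinforcers (first response after each 10 s)
```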

 Self-Control--capacity to inhibit immediate gratification in exchange for a larger reward in the long run

Types of Reinforcement

Primary Reinforcers—events that are capable of producing behavior changes naturally, without the benefit of any prior learning.

Secondary Reinforcers—events that function as reinforcers because of their consistent association with one or more primary reinforcers.

Theories of Reinforcement

Generally, theories explaining why reinforcers work fall into one of the following two categories:

1) Stimulus-based theories--something about the particular stimulus makes it reinforcing

--Hull's Drive-Reduction Hypothesis--stimuli that reduce drives are capable of reinforcing emitted behavior

Learning is adaptive & biological in nature
An internal deficiency (need) causes an energized motivational state (a drive, e.g., hunger or thirst). The drive then activates a series of actions (responses) to attain a goal (such as food or water). When the drive is satisfied, the organism should be relatively inactive until another drive is activated by a need.

One problem---Organisms tend to SEEK stimulation (Kish, 1966)

Conclusion--Events that reduce drives will function as reinforcers, but an event need not reduce a drive in order to function as a reinforcer.

--Incentive motivation--reinforcers increase drive; they pull the organism toward them and elicit certain behaviors

2) Response-Based Theories of Reinforcement --reinforcement depends on the response made possible by some reinforcer 

Premack Principle (1965, 1971)--Any kind of emitted response can serve to reinforce operant behavior. Specifically, any response that is preferred by an organism can serve to reinforce the performance of a less preferred response....Reinforcement is relative...

F. Constraints on Response Learning
So as long as we immediately reinforce desired behavior, we should be able to condition any organism to do any behavior, right?

Breland & Breland (1961)--"The misbehavior of organisms"

V. Punishment
Read pp. 134-139 for Assignment 2

VI.  Negative Reinforcement Situations

    A. Escape Learning

          Amount of Reinforcement—In escape learning, the amount of reinforcement corresponds to the degree to which the aversive stimulation is reduced after a successful response.

Campbell & Kraeling (1953)—exposed all rats to the same shock intensity in the runway and reduced this shock by varying degrees in the safe box.

 That is, all animals received shock reduction after responding, but most animals still received SOME shock in the safe box.

 The speed of the escape response was a direct function of the degree to which shock was reduced.

 **Escape learning depends more on the amount of negative reinforcement (degree to which aversive stimulation is reduced) than on the intensity of the aversive stimulus per se.

 

Delay of Reinforcement—time between the escape response and the reduction of the aversive stimulation.

 Fowler & Trapold (1962)—exposed rats to shock in an alleyway and required the animals to run to a safe box to escape the shock.

The delay in shock offset varied between 0 and 16 seconds.

 

  Results--escape speeds decreased as the delay of shock offset increased.

          B. Avoidance Learning—A stimulus signals the presentation of the aversive event. The organism must respond in the presence of the signal in order to avoid the aversive event.

 

Two stimuli—the signal and the aversive stimulus

The characteristics of both of these stimuli are important determinants of avoidance learning.

 

                   Intensity of Aversive Stimulus--If the intensity of the aversive stimulus (e.g., shock) is too strong, avoidance conditioning is actually slowed.

 

                   Signal-Aversive Stimulus Interval--Avoidance learning appears to occur most efficiently when the signal and the aversive stimulus overlap, as is the case in a delayed-conditioning procedure.  

 

                    Duration of the signal before onset of aversive event--In general, signals of longer duration tend to facilitate the learning of the avoidance response (Low & Low, 1962).

                     Termination of the Signal--Even when the organism’s response results in avoidance of the aversive stimulus, the rate of learning depends on whether the response also leads to the termination of the signal.

Kamin (1957) conducted a two-way avoidance experiment using rats. All rats were able to avoid shock by moving to the safe chamber during the signal-shock interval. For one group, the signal terminated as soon as the avoidance response occurred (0-second delay). In the other three groups, the signal ended either 2.5, 5, or 10 seconds after the avoidance response.

 

 Classical Conditioning and Instrumental Conditioning in Avoidance Learning

           A. Watson-Mowrer Two-Process Theory

Mowrer (1947) proposed that avoidance learning involved two processes--(1) classical conditioning and (2) instrumental conditioning.

(Part 1) Dangerous, painful, aversive stimuli (US) cause an innate fear response (UR). Other stimuli present at the time get associated with fear through classical conditioning. When these other stimuli (CSs) are encountered again, they evoke a fear response (CR).

(Part 2) The presence of fear and all of its visceral effects is aversive.  Any response that removes these fear-evoking stimuli will be negatively reinforced. The avoidance response, therefore, is reinforced through instrumental conditioning.

The avoidance paradigm—Solomon and Wynne (1953)

 

 

What was a typical trial like?

What constituted a session? 

Describe the behavior of the dogs.

 

What reinforces avoidance behavior?

It was easy to understand how the escape behavior persisted. The termination of the aversive stimulus would be rewarding. Escape behavior was maintained through negative reinforcement.

But how was the avoidance response maintained? The animal continues to respond even though it no longer receives aversive stimulation, but why?

 

Since the electric shock (US) was painful and produced an innate fear response (UR) (increased heart rate, sweating, etc.), the darkened compartment became a feared stimulus (CS) that elicited fear (CR) (increased heart rate, sweating, etc.). These visceral responses are unpleasant, so any response that caused their reduction or elimination (i.e., escape or avoidance) would be reinforced. Mowrer argued that the termination or reduction of the fear-evoking stimuli negatively reinforced the avoidance response.

B. Support for Two-Process Theory

Two-process theory predicts that avoidance responding will be learned only to the extent that the warning signal terminates when a response is made.

 

Kamin (1957)--trained four groups of rats in a two-chamber avoidance apparatus    

 

The figure shows that a significant amount of avoidance responding occurred in the first group only (response terminates signal and enables animal to avoid shock).

As predicted by two-factor theory, avoidance responding was poor in the group that was able to avoid shock but could not terminate the signal.  

We know that delaying the onset of reinforcement reduces the effectiveness of reward. So it should be possible to reduce the level of reinforcement by introducing a delay between the avoidance response and termination of the feared stimulus. 

4 conditions--After the avoidance response, the CS was terminated

(1) immediately

(2) 2.5 seconds after the response

(3) 5 seconds after the response

(4) or 10 seconds after the response

See Kamin (1957) results graphed above 

 

As predicted, the animals in the zero-delay condition successfully avoided shock on over 80% of the trials. Animals in the 10-second delay condition avoided shock on fewer than 10% of the trials. 

C.  Learned Helplessness
Read pp. 146-151 for Assignment 2