The Experimental Analysis of Behavior
By B. F. Skinner
The 1957 American Scientist article, reproduced in full
DOI: 10.1511/2012.94.54
Not so long ago the expression “a science of behavior” would have been regarded as a contradiction in terms. Living organisms were distinguished by the fact that they were spontaneous and unpredictable. If you saw something move without being obviously pushed or pulled, you could be pretty sure it was alive.
This was so much the case that mechanical imitations of living things—singing birds which flapped their wings, figures on a clock tolling a bell—had an awful fascination which, in the age of electronic brains and automation, we cannot recapture or fully understand. One hundred and fifty years of science and invention have robbed living creatures of this high distinction.
Science has not done this by creating truly spontaneous or capricious systems. It has simply discovered and used subtle forces which, acting upon a mechanism, give it the direction and apparent spontaneity which make it seem alive. Similar forces were meanwhile being discovered in the case of the living organism itself. By the middle of the seventeenth century it was known that muscle, excised from a living organism and out of reach of any “will,” would contract if pinched or pricked or otherwise stimulated, and during the nineteenth century larger segments of the organism were submitted to a similar analysis. The discovery of the reflex, apart from its neurological implications, was essentially the discovery of stimuli—of forces acting upon an organism which accounted for part of its behavior.
For a long time the analysis of behavior took the form of the discovery and collection of reflex mechanisms. Early in the present century, the Dutch physiologist Rudolph Magnus [1], after an exhaustive study of the reflexes involved in the maintenance of posture, put the matter this way: when a cat hears a mouse, turns toward the source of the sound, sees the mouse, runs toward it, and pounces, its posture at every stage, even to the selection of the foot which is to take the first step, is determined by reflexes which can be demonstrated one by one under experimental conditions. All the cat has to do is to decide whether or not to pursue the mouse; everything else is prepared for it by its postural and locomotor reflexes.
To pursue or not to pursue is a question, however, which has never been fully answered on the model of the reflex, even with the help of Pavlov’s principle of conditioning. Reflexes—conditioned or otherwise—are primarily concerned with the internal economy of the organism and with maintaining various sorts of equilibrium. The behavior through which the individual deals with the surrounding environment and gets from it the things it needs for its existence and for the propagation of the species cannot be forced into the simple all-or-nothing formula of stimulus and response. Some well-defined patterns of behavior, especially in birds, fish, and invertebrates are controlled by “releasers” which suggest reflex stimuli [2], but even here the probability of occurrence of such behavior varies over a much wider range, and the conditions of which that probability is a function are much more complex and subtle. And when we come to that vast repertoire of “operant” behavior which is shaped up by the environment in the lifetime of the individual, the reflex pattern will not suffice at all.
In studying such behavior we must make certain preliminary decisions. We begin by choosing an organism—one which we hope will be representative but which is first merely convenient. We must also choose a bit of behavior—not for any intrinsic or dramatic interest it may have, but because it is easily observed, affects the environment in such a way that it can be easily recorded, and for reasons to be noted subsequently, may be repeated many times without fatigue. Thirdly, we must select or construct an experimental space which can be well controlled.
These requirements are satisfied by the situation shown in Figure 1. A partially sound-shielded aluminum box is divided into two compartments. In the near compartment a pigeon, standing on a screen floor, is seen in the act of pecking a translucent plastic plate behind a circular opening in the partition. The plate is part of a delicate electric key; when it is pecked, a circuit is closed to operate recording and controlling equipment. Colored lights can be projected on the back of the disk as stimuli. The box is ventilated, and illuminated by a dim ceiling light.
We are interested in the probability that in such a controlled space the organism we select will engage in the behavior we thus record. At first blush, such an interest may seem trivial. We shall see, however, that the conditions which alter the probability, and the processes which unfold as that probability changes, are quite complex. Moreover, they have an immediate, important bearing on the behavior of other organisms under other circumstances, including the organism called man in the everyday world of human affairs.
Probability of responding is a difficult datum. We may avoid controversial issues by turning at once to a practical measure, the frequency with which a response is emitted. The experimental situation shown in Figure 1 was designed to permit this frequency to vary over a wide range. In the experiments to be described here, stable rates are recorded which differ by a factor of about 600. In other experiments, rates have differed by as much as 2000:1. Rate of responding is most conveniently recorded in a cumulative curve. A pen moves across a paper tape, stepping a short uniform distance with each response. Appropriate paper speeds and unit steps are chosen so that the rates to be studied give convenient slopes.
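In present-day terms, the geometry of the cumulative record can be sketched in a few lines of code. The fragment below is purely illustrative and is not part of the original apparatus; the response times are hypothetical and the sampling interval is an arbitrary choice. It shows how stepping a count upward at each response, while "the paper" advances with time, makes rate of responding appear as the local slope of the curve.

```python
# Illustrative sketch of a cumulative record: the count steps up one unit
# per response while time advances, so rate of responding is the slope.

def cumulative_record(response_times, session_length, dt=1.0):
    """Sample the cumulative response count every dt seconds."""
    points, count, next_idx, t = [], 0, 0, 0.0
    while t <= session_length:
        while next_idx < len(response_times) and response_times[next_idx] <= t:
            count += 1                     # the "pen" steps once per response
            next_idx += 1
        points.append((t, count))
        t += dt
    return points

if __name__ == "__main__":
    # Hypothetical response times (seconds): responding accelerates over time,
    # so the later samples rise more steeply.
    times = [5, 12, 20, 26, 30, 33, 35, 37, 38, 39]
    for t, n in cumulative_record(times, 40, dt=5.0):
        print(f"t={t:4.0f} s   cumulative responses = {n}")
```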
Among the conditions which alter rate of responding are some of the consequences of behavior. Operant behavior usually affects the environment and generates stimuli which “feed back” to the organism. Some feedback may have the effects identified by the layman as reward and punishment. Any consequence of behavior which is rewarding or, more technically, reinforcing, increases the probability of further responding. Unfortunately, a consequence which is punishing has a much more complex result [3]. Pecking the key in our experimental space has certain natural consequences. It stimulates the bird tactually and auditorily, and such stimulation may be slightly reinforcing. We study the effect more expediently, however, by arranging an arbitrary consequence which is clearly so. For example, food is reinforcing to a hungry pigeon (for our present purposes we need not inquire why this is so), and we therefore arrange to present food with a special magazine. When a solenoid is energized, a tray containing a mixture of grains is brought into position in the square opening below the key in Figure 1, where the pigeon has access to the grain for, say, four seconds.
We can demonstrate the effect of operant reinforcement simply by connecting the key which the pigeon pecks to the solenoid which operates the food tray. A single presentation of food, following immediately upon a response, increases the rate with which responses to the key are subsequently emitted so long as the pigeon remains hungry. By reinforcing several responses, we may create a high probability of responding. If the magazine is now disconnected, the rate declines to, and may even go below, its original level. These changes are the processes of operant conditioning and extinction, respectively. More interesting phenomena are generated when responses are merely intermittently reinforced. It is characteristic of everyday life that few of the things we do always “pay off.” The dynamic characteristics of our behavior depend upon the actual schedules of reinforcement.
The effects of intermittent reinforcement have been extensively studied in the laboratory [3] [4]. A common sort of intermittency is based on time. Reinforced responses can be spaced, say, ten minutes apart. When one reinforcement is received, a timer is started which opens the reinforcing circuit for ten minutes; the first response after the circuit is closed is reinforced. When an organism is exposed to this schedule of reinforcement for many hours, it develops a characteristic performance which is related in a rather complex way to the schedule. A short sample of such a performance is shown in Figure 2, obtained with a cumulative recorder. The scales and a few representative speeds are shown in the lower right-hand corner. The experimental session begins at a. The first reinforcement will not occur until ten minutes later, and the bird begins at a very low rate of responding. As the 10-minute interval passes, the rate increases, accelerating fairly smoothly to a terminal rate at reinforcement at b. The rate then drops to zero. Except for a slight abortive start at c, it again accelerates to a high terminal value by the end of the second 10-minute interval. A third fairly smooth acceleration is shown at d. (At e the pen instantly resets to the starting position on the paper.) The over-all pattern of performance on a “fixed interval” schedule is a fairly smoothly accelerating scallop in each interval, the acceleration being more rapid the longer the initial pause. Local effects due to separate reinforcements are evident, however, which cannot be discussed here for lack of space [4]. If the intervals between reinforcements are not fixed, the performance shown in Figure 2 cannot develop.
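The fixed-interval contingency just described lends itself to a direct restatement as a short program. The sketch below is a schematic simulation, not the controlling circuit actually used: the responding process is a random placeholder, and the function name and rate parameter are assumptions made only for illustration.

```python
import random

def fixed_interval_session(interval_s=600.0, session_s=3600.0, resp_rate_hz=0.5):
    """Schematic fixed-interval schedule: a timer starts at each reinforcement,
    and the first response emitted after `interval_s` seconds is reinforced."""
    t, last_reinforcement, reinforcement_times = 0.0, 0.0, []
    while t < session_s:
        t += random.expovariate(resp_rate_hz)        # placeholder responding
        if t - last_reinforcement >= interval_s:     # circuit has closed again
            reinforcement_times.append(t)
            last_reinforcement = t                   # the timer restarts
    return reinforcement_times

if __name__ == "__main__":
    random.seed(1)
    print(fixed_interval_session(interval_s=600.0))  # roughly one per 10 minutes
```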
If the length of interval is varied essentially at random, responding occurs at a single rate represented by a constant slope in the cumulative record. Two examples are shown in Figure 3. In the upper curve, a hungry pigeon is reinforced with grain on a variable-interval schedule, where the mean interval between reinforcements is 3 minutes. Reinforcements occur where marked by pips. In the lower curve a hungry chimpanzee, operating a toggle switch, is reinforced on the same schedule with laboratory food. The over-all rate under variable-interval reinforcement is a function of the mean interval, of the level of food-deprivation, and of many other variables. It tends to increase slowly under prolonged exposure to any one set of conditions. The constant rate itself eventually becomes an important condition of the experiment and resists any change to other values. For this reason the straight lines of Figure 3 are not as suitable for baselines as might be supposed.
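A variable-interval schedule differs only in that a new waiting time is drawn at random after each reinforcement. The sketch below assumes, for illustration, an exponential distribution of intervals about the 3-minute mean; the article does not specify the distribution actually programmed.

```python
import random

def variable_interval_session(mean_interval_s=180.0, session_s=3600.0,
                              resp_rate_hz=0.5):
    """Schematic variable-interval schedule: after each reinforcement a fresh
    interval is drawn at random; the first response after it elapses is
    reinforced. Responding is again a random placeholder."""
    t, reinforcement_times = 0.0, []
    armed_at = random.expovariate(1.0 / mean_interval_s)
    while t < session_s:
        t += random.expovariate(resp_rate_hz)
        if t >= armed_at:
            reinforcement_times.append(t)
            armed_at = t + random.expovariate(1.0 / mean_interval_s)
    return reinforcement_times

if __name__ == "__main__":
    random.seed(2)
    print(len(variable_interval_session()), "reinforcements in one hour")
```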
Reinforcements may be scheduled with a counter instead of a timer. For example, we may maintain a fixed ratio between responses and reinforcements. In industry this schedule is referred to as piecework or piece-rate pay. Anyone who has seen workers paid on such a schedule is familiar with some features of the performance generated: a high rate is sustained for long periods of time. For this reason, the schedule is attractive to employers, but it is generally recognized that the level of activity generated is potentially dangerous and justified only in seasonal or other periodic employment.
Performances of a pigeon under fixed-ratio reinforcement are shown in Figure 4. In the left-hand record reinforcements occur every 210 responses (at a, b, c, and elsewhere). The over-all rate is high. Most of the pauses occur immediately after reinforcement. At the right is the performance generated when the pigeon pecks the key 900 times for each reinforcement. This unusually high ratio was reached in some experiments in the Harvard Psychological Laboratories by W. H. Morse and R. J. Herrnstein. A short pause after reinforcement is the rule.
A variable-ratio schedule programmed by a counter corresponds to the variable-interval schedule programmed by a timer. Reinforcement is contingent on a given average number of responses but the numbers are allowed to vary roughly at random. We are all familiar with this schedule because it is the heart of all gambling devices and systems. The confirmed or pathological gambler exemplifies the result: a very high rate of activity is generated by a relatively slight net reinforcement. Where the “cost” of a response can be estimated (in terms, say, of the food required to supply the energy needed, or of the money required to play the gambling device), it may be demonstrated that organisms will operate at a net loss.
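Ratio schedules replace the timer with a counter, and both the fixed and the variable case reduce to the same bit of bookkeeping. In the sketch below the requirement for the variable ratio is drawn uniformly about its mean, an assumption made only to complete the example.

```python
import random

def ratio_schedule(next_requirement):
    """Generator for a ratio contingency: each call to next() represents one
    response, and the generator yields True when the counter reaches the
    current requirement, then resets."""
    count, need = 0, next_requirement()
    while True:
        count += 1
        reinforced = count >= need
        if reinforced:
            count, need = 0, next_requirement()
        yield reinforced

# Fixed ratio: exactly 210 responses per reinforcement.
fixed_ratio_210 = ratio_schedule(lambda: 210)

# Variable ratio: a mean of 210, with an illustrative uniform spread.
variable_ratio_210 = ratio_schedule(lambda: random.randint(105, 315))

if __name__ == "__main__":
    first = next(i for i, r in enumerate(fixed_ratio_210, start=1) if r)
    print("first reinforcement after", first, "responses")   # prints 210
```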
When the food magazine is disconnected after intermittent reinforcement, many more responses continue to occur than after continuous reinforcement. After certain schedules, the rate may decline in a smoothly accelerated extinction curve. After other schedules, when the rate itself enters prominently into the experimental conditions, it may oscillate widely. The potential responding built up by reinforcement may last a long time. The writer has obtained extinction curves six years after prolonged reinforcement on a variable-ratio schedule [5]. Ratio schedules characteristically produce large numbers of responses in extinction. After prolonged exposure to a ratio of 900:1 (Figure 4) the bird was put in the apparatus with the magazine disconnected. During the first 4-1/2 hours, it emitted 73,000 responses.
Interval and ratio schedules have different effects for several reasons. When a reinforcement is scheduled by a timer, the probability of reinforcement increases during any pause, and first responses after pauses are especially likely to be reinforced. On ratio schedules responses which are part of short runs are likely to be reinforced. Moreover, when a given schedule of reinforcement has had a first effect, the performance which develops becomes itself an important part of the experimental situation. This performance, in combination with the schedule, arranges certain probable conditions at the moment of reinforcement. Sometimes a schedule produces a performance which maintains just those conditions which perpetuate the performance. Some schedules generate a progressive change. Under still other schedules the combination of schedule and performance yields conditions at reinforcement which generate a different performance, which in turn produces conditions at reinforcement which restore the earlier performance.
Charles B. Ferster and the writer have checked this explanation of the effect of schedules by controlling conditions more precisely at the moment of reinforcement [4]. For example, we guaranteed that all reinforced responses would be preceded by pauses instead of making this condition merely probable under an interval schedule. In a variable interval performance, such as that shown in Figure 3, it is not difficult to find responses which are preceded by, say, 3-second pauses. We can arrange that only such responses will be reinforced without greatly disturbing our schedule. When this is done, the slope of the record immediately drops. On the other hand, we may choose to reinforce responses which occur during short rapid bursts of responding, and we then note an immediate increase in rate.
If we insist upon a very long pause, we may be able to reinforce every response satisfying these conditions and still maintain a very low rate. The differential reinforcement of low rates was first studied by Douglas Anger in the Harvard Laboratories. Wilson and Keller at Columbia have reported an independent investigation [6]. Recently W. H. Morse and the writer have studied the effect of relatively long enforced pauses. Figure 5 shows the performance obtained in one such experiment. Any response which followed a pause at least 3 minutes in duration was reinforced. Whenever a response was made before 3 minutes had elapsed, the timer was reset and another 3-minute pause required. Under these conditions a very low stable rate of responding obtains. The figure shows a continuous performance (cut into segments for easier reproduction) in a single experimental session of 143 hours, during which time the pigeon received approximately 250 reinforcements. At no time did it pause for more than 15 minutes, and it seldom paused for more than 5 minutes.
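The contingency used in this experiment can be stated almost word for word as code: a response is reinforced only if at least three minutes have passed since the previous response, and any premature response restarts the clock. The sketch below is schematic, with hypothetical response times standing in for the pigeon's behavior.

```python
def drl_contingency(response_times, required_pause_s=180.0):
    """Return the response times that would be reinforced under differential
    reinforcement of low rate: only a response following a pause of at least
    `required_pause_s` is reinforced; an early response resets the timer."""
    reinforced, last_response = [], None
    for t in response_times:
        if last_response is not None and t - last_response >= required_pause_s:
            reinforced.append(t)
        last_response = t      # every response, early or not, restarts the clock
    return reinforced

if __name__ == "__main__":
    # Hypothetical response times in seconds.
    times = [10, 100, 290, 300, 500, 900]
    print(drl_contingency(times))   # [290, 500, 900]
```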
The situation under this schedule is inherently unstable. Rate of responding increases with the severity of food deprivation and decreases as the bird becomes satiated. Let us assume that at some time during the experiment, say, at a in Figure 5, reinforcements are occurring too infrequently to maintain the bird’s body weight. The bird is operating, so to speak, at a loss. The increasing deprivation then increases the rate of responding and makes it even less likely that the pigeon will wait 3 minutes in order to respond successfully for reinforcement. Nothing but starvation lies ahead in that direction. If, on the other hand, the bird is receiving slightly more reinforcements than necessary to maintain body weight, the level of deprivation will be decreased. This will produce a lower rate of responding, which in turn means that the 3-minute pause is more frequently satisfied and reinforcements still more frequently received. In such a case the result is a fully satiated bird, and the experiment must be brought to a close. This actually happened at b in Figure 5 where reinforcements had become so frequent that the bird was rapidly gaining weight. This inherent instability can be corrected by changing the required pause in terms of the organism’s performance. If the over-all rate of reinforcement begins to drift in either direction, the required pause may be appropriately changed. Thus the experiment in Figure 5 could have been continued if at point c, say, the required interval had been increased to 4 minutes. By an appropriate adjustment of the interval, we have been able to keep a pigeon responding continuously for 1500 hours—that is, 24 hours a day, 7 days a week, for approximately 2 months. Pigeon breeders have said that pigeons never sleep (roosting is merely a precautionary device against blind flying), and the statement seems to be confirmed by experiments of the present sort.
By differentially reinforcing high rates of responding, pigeons have been made to respond as rapidly as 10 or 15 responses per second. Here technical problems become crucial. It is not difficult to construct a key which will follow rapid responding, but the topography of the behavior itself changes. The excursions of head and beak become very small, and it is doubtful whether any single “response” can be properly compared with a response at a lower rate.
In their study of different kinds of schedules of reinforcement, Ferster and the writer found that it was possible to set up several performances in a single pigeon by bringing each one of them under stimulus control. Several different colored lights were projected on the translucent key and responses were reinforced on several corresponding schedules. Figure 6 shows a typical performance under such a multiple schedule of reinforcement. When the key was red, the pigeon was reinforced on a 6-minute fixed-interval schedule. The usual interval scallops are seen, as at a and b. When the key was green, the pigeon was reinforced upon completing 60 responses (a fixed ratio of 60:1). The usual ratio high rate is shown as at c and d. When the key was yellow, reinforcements followed a variable-interval schedule where a pause of 6 seconds was required. The resulting low steady performance is shown at e, f, and elsewhere. In one experiment we were able to show nine different performances under the control of nine different patterns on the key.
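A multiple schedule is, in effect, a mapping from the stimulus in force to the contingency applied at that moment. The sketch below restates the red and green components described above in simplified form; the class names are inventions for the example, and the yellow component is omitted for brevity.

```python
# Schematic multiple schedule: the color on the key selects which simplified
# contingency decides whether the current peck is reinforced.

class FixedInterval:
    def __init__(self, interval_s):
        self.interval_s, self.last = interval_s, 0.0
    def peck(self, t):
        if t - self.last >= self.interval_s:
            self.last = t
            return True
        return False

class FixedRatio:
    def __init__(self, ratio):
        self.ratio, self.count = ratio, 0
    def peck(self, t):
        self.count += 1
        if self.count >= self.ratio:
            self.count = 0
            return True
        return False

multiple_schedule = {
    "red":   FixedInterval(360.0),   # 6-minute fixed interval
    "green": FixedRatio(60),         # fixed ratio 60:1
}

def respond(key_color, t):
    """Evaluate one peck at time t under the schedule now in force."""
    return multiple_schedule[key_color].peck(t)
```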
The experiment may be complicated still further by introducing more than one key and by reinforcing on two or more schedules concurrently. An example of the resulting performances is shown in Figure 7, from some research by Ferster, at the Yerkes Laboratories for Primate Biology at Orange Park, Florida. In Ferster’s experiment, a chimpanzee operates two toggle switches, one with each hand. Responses with the right hand are reinforced on a fixed ratio of approximately 210:1, and the performance recorded from the right toggle switch is shown in the upper part of Figure 7. As usual in many ratio performances, pauses occur after reinforcements. Responses with the left hand are at the same time being reinforced on a variable-interval schedule with a mean interval of 5 minutes, and the performance is shown in the lower part of the figure. There is some interaction between the performances, for reinforcements in the variable-interval record usually correspond to slight pauses in the ratio performance. In general, however, the experiment shows a remarkable independence of two response systems in a single organism.
In speaking about colors projected on the key or the fact that a key is on the right or left, we are, of course, talking about stimuli. Moreover, they are stimuli which act prior to the appearance of a response and thus occur in the temporal order characteristic of the reflex. But they are not eliciting stimuli; they merely modify the probability that a response will occur, and they do this over a very wide range. The general rule seems to be that the stimuli present at the moment of reinforcement produce a maximal probability that the response will be repeated. Any change in the stimulating situation reduces the probability. This relationship is beautifully illustrated in some experiments by Norman Guttman [7] and his colleagues at Duke University on the so-called stimulus generalization gradient. Guttman makes use of the fact that, after a brief exposure to a variable-interval schedule, a large number of responses will be emitted by the organism without further reinforcement (the usual extinction curve) and that, while these are being emitted, it is possible to manipulate the stimuli present and to determine their relative control over the response without confusing the issue by further reinforcement. In a typical experiment, for example, a monochromatic light with a wavelength of 550 millimicrons was projected on the key during variable-interval reinforcement. During extinction, monochromatic lights from other parts of the visible spectrum were projected on the key for short periods of time, each wavelength appearing many times and each being present for the same total time. Simply by counting the number of responses made in the presence of each wavelength, Guttman and his colleagues have obtained stimulus generalization gradients similar to those shown in Figure 8. The two curves represent separate experiments. Each is an average of measurements made on six pigeons. It will be seen that during extinction responding was most rapid at the original wavelength of 550 millimicrons. A color differing by only 10 millimicrons controls a considerably lower rate of responding. The curves are not symmetrical. Colors toward the red end of the spectrum control higher rates than those equally distant toward the violet end. With this technique Guttman and his colleagues have studied gradients resulting from reinforcement at two points in the spectrum, gradients surviving after a discrimination has been set up by reinforcing one wavelength and extinguishing another, and so on.
The control of behavior achieved with methods based upon rate of responding has given rise to a new psychophysics of lower organisms. It appears to be possible to learn as much about the sensory processes of the pigeon as from the older “introspective” methods with human subjects. An important new technique of this sort is due to D. S. Blough [8]. His ingenious procedure utilizes the apparatus shown in Figure 9. A pigeon, behaving most of the time in total darkness, thrusts its head through an opening in a partition at a, which provides useful tactual orientation. Through the small opening b, the pigeon can sometimes see a faint patch of light indicated by the word Stimulus. (How this appears to the pigeon is shown at the right.) The pigeon can reach and peck two keys just below the opening b, and it is sometimes reinforced by a food magazine which rises within reach at c. Through suitable reinforcing contingencies Blough conditions the pigeon to peck Key B whenever it can see the light and Key A whenever it cannot. The pigeon is occasionally reinforced for pecking Key A by the presentation of food (in darkness). Blough guarantees that the pigeon cannot see the spot of light at the time this response is made because no light at all is then on the key. By a well established principle of “chaining,” the pigeon is reinforced for pecking Key B by the disappearance of the spot of light. This suffices to keep responses to both keys in strength.
A further fact about the apparatus is that Key B automatically reduces the intensity of the spot of light, while Key A increases it. Suppose, now, that a pigeon is placed in a brightly lighted space for a given interval of time and then put immediately into the apparatus. The spot of light is at an intensity in the neighborhood of the bright adapted threshold. If the pigeon can see the spot, it pecks Key B until it disappears. If it cannot see the spot, it pecks Key A until it appears. In each case it then shifts to the other key. During an experimental session of one hour or more, it holds the spot of light very close to its threshold value, occasionally being reinforced with food. The intensity of the light is recorded automatically. The result is the “dark-adaptation curve” for the pigeon’s eye. Typical curves show a break as the dark adaptation process shifts from the cone elements in the retina to the rods.
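Blough's procedure is essentially a tracking (or "staircase") method: the bird's own responses step the stimulus down when it is visible and up when it is not, so the intensity hovers about the threshold. The simulation below is a simplified stand-in, with an ideal, noisy observer in place of the pigeon; the threshold value, step size, and noise level are arbitrary assumptions.

```python
import random

def track_threshold(true_threshold=0.5, start_intensity=1.0, step=0.02,
                    trials=500, noise=0.05):
    """Schematic version of the tracking method: when the observer 'sees' the
    spot it responds on key B, which lowers the intensity; otherwise it
    responds on key A, which raises it. The trace oscillates about threshold."""
    intensity, trace = start_intensity, []
    for _ in range(trials):
        seen = intensity + random.gauss(0.0, noise) > true_threshold
        intensity += -step if seen else step     # key B lowers, key A raises
        trace.append(intensity)
    return trace

if __name__ == "__main__":
    random.seed(3)
    trace = track_threshold()
    print("threshold estimate:", round(sum(trace[-100:]) / 100.0, 3))
```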
By repeating the experiment with a series of monochromatic lights, Blough has been able to construct spectral sensitivity curves for the pigeon which are as precise as those obtained with the best human observers. An example is shown in Figure 10, where data for the pigeon are compared with data for an aphakic human [9]—one who has had a crystalline lens removed for medical reasons. Such a person sees violet light more sensitively than normal subjects because the light is not absorbed by the lens. Even with this advantage the human observer is no more sensitive to light at the violet end of the spectrum than the pigeon. The discontinuities in the photopic curves (the lower set of open circles) of the pigeon appear to be real. The surprising correspondence in the scotopic curves (after dark adaptation, and presumably mediated by the rods) is remarkable when we recall that the avian and mammalian eye parted company in the evolutionary scale of things many millions of years ago.
So far our data have been taken from the pleasanter side of life—from behavior which produces positive consequences. There are important consequences of another sort. Much of what we do during the day is done not because of the positive reinforcements we receive but because of aversive consequences we avoid. The whole field of escape, avoidance, and punishment is an extensive one, but order is slowly being brought into it. An important contribution has been the research of Murray Sidman [10] on avoidance behavior. In the Sidman technique, a rat is placed in a box the floor of which is an electric grid through which the rat can be shocked. The pattern of polarity of the bars of the grid is changed several times per second so that the rat cannot find bars of the same sign to avoid the shock. In a typical experiment a shock occurs every 20 seconds unless the rat presses the lever, but such a response postpones the shock for a full 20 seconds. These circumstances induce a rat to respond steadily to the lever, the only reinforcement being the postponement of shock. The rat must occasionally receive a shock—that is, it must allow 20 seconds to pass without a response—if the behavior is to remain in strength. By varying the intervals between shocks, the time of postponement, and various kinds of warning stimuli, Sidman has revealed some of the important properties of this all-too-common form of behavior.
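The Sidman contingency also translates directly into code: a shock falls whenever the shock-shock interval elapses without a response, and every response postpones the next shock by the response-shock interval. The sketch below is schematic, with the rat's responding again represented by a random placeholder.

```python
import random

def sidman_avoidance(shock_shock_s=20.0, response_shock_s=20.0,
                     session_s=3600.0, resp_rate_hz=0.1):
    """Schematic Sidman avoidance schedule: if no response occurs before the
    current deadline a shock is delivered and rescheduled every
    `shock_shock_s` seconds; each response postpones the next shock by
    `response_shock_s` seconds."""
    t, next_shock, shock_times = 0.0, shock_shock_s, []
    while t < session_s:
        next_response = t + random.expovariate(resp_rate_hz)   # placeholder
        while next_shock < next_response and next_shock < session_s:
            shock_times.append(next_shock)          # deadline passed: shock
            next_shock += shock_shock_s
        t = next_response
        if t < session_s:
            next_shock = t + response_shock_s       # the response postpones it
    return shock_times

if __name__ == "__main__":
    random.seed(4)
    print(len(sidman_avoidance()), "shocks in a one-hour session")
```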
A sample of behavior which W. H. Morse and the writer obtained with the Sidman procedure is shown in Figure 11. Here both the interval between shocks and the postponement time were 8 seconds. (White space has been cut out of the record and the separate segments brought together to facilitate reproduction.) The records report a 7-hour experimental session during which about 14,000 responses were emitted. Occasional shocks are indicated by the downward movements of the pen (not to be confused with the fragments of the reset line). A significant feature of the performance is the warm-up at a. When first put into the apparatus the rat “takes” a number of shocks before entering upon the typical avoidance pattern. This occurs whenever a new session is begun. It may indicate that an emotional condition is required for successful avoidance behavior. The condition disappears between sessions and must be reinstated. The figure shows considerable variation in over-all rate and many local irregularities. At times small groups of shocks are taken, suggesting a return to the warm-up condition.
The consequences of behavior, whether positive or negative, and the control acquired by various stimuli related to them do not exhaust the variables of which behavior is a function. Others lie in the field commonly called motivation. Food is a reinforcement only to a hungry organism. In practice this means an organism whose body weight has been reduced substantially below its value under free feeding. Reinforcing stimuli are found in other motivational areas. Responding to a key can be reinforced with water when the organism is deprived of water, with sexual contact when the organism has been sexually deprived, etc. The level of deprivation is in each case an important condition to be investigated. How does food deprivation increase the rate of eating or of engaging in behavior reinforced with food? How does satiation have the opposite effect? The first step toward answering such questions is an empirical study of rate of responding as a function of deprivation. An analysis of the internal mechanisms responsible for the relations thus discovered may require techniques more appropriately employed in other scientific disciplines.
An example of how the present method may be applied to a problem in motivation is an experiment by Anliker and Mayer [11] on the familiar and important problem of obesity. Obese animals eat more than normal, but just how is their ingestive behavior disrupted? Anliker and Mayer have studied several types of normal and obese mice. There are strains of mice in which the abnormality is hereditary: some members of a litter simply grow fat. A normal mouse may be made obese by poisoning it with goldthioglucose or by damaging the hypothalamus. The food getting behavior of all these types of obese mice can be observed in the apparatus shown in Figure 12. A fat mouse is shown depressing a horizontal lever which projects from the partition in the box. On a fixed-ratio schedule, every 25th response produces a small pellet of food, delivered by the dispenser seen behind the partition. A supply of water is available in a bottle.
Each mouse was studied continuously for several days. The resulting cumulative curves (Figure 13) show striking differences among the patterns of ingestion. Curve C shows normal cyclic changes in rate. The nonobese mouse eats a substantial part of its daily ration in a single period (as at a and b), and for the rest of each day responds only at a low over-all rate. The result is a wave-like cumulative curve with 24-hour cycles. A mouse of the same strain made obese by goldthioglucose poisoning does not show this daily rhythm but continues to respond at a fairly steady rate (Curve A). The slope is no higher than parts of Curve C, but the mechanism which turns off ingestive behavior in a normal mouse appears to be inoperative. Curve B is a fairly similar record produced by a mouse of the same strain made obese by a hypothalamic lesion. Curves D and E are for litter mates from a strain containing a hereditary-obese factor. E is the performance of the normal member. Curve D, showing the performance of the obese member, differs markedly from Curves A and B. The hereditary obese mouse eats at a very high rate for brief periods, which are separated by pauses of the order of one or two hours. A different kind of disturbance in the physiological mechanism seems to be indicated.
Williams and Teitelbaum [12] have recently produced a fourth kind of obese animal, with an apparatus in which a rat must eat a small amount of liquid food to avoid a shock. The avoidance contingencies specified by Sidman and illustrated in Figure 11 are used to induce the rat to ingest unusually large amounts of even unpalatable food. A condition which may be called “behavioral obesity” quickly develops.
Other powerful variables which affect operant behavior are found in the field of pharmacology. Some drugs which affect behavior—alcohol, caffeine, nicotine, and so on—were discovered by accident and have had a long history. Others have been produced explicitly to yield such effects. The field is an active one (partly because of the importance of pharmacotherapy in mental illness) and available compounds are multiplying rapidly. Most of the behavioral drugs now available have effects which would be classified in the fields of motivation and emotion. There is no reason, however, why the effects of various contingencies of reinforcement could not be simulated by direct chemical action—why “intelligence” could not be facilitated or confusion or mental fatigue reduced. In any case, the behavior generated by various contingencies of reinforcement (including the control of that behavior via stimuli) provides the base lines against which motivational and emotional effects are felt. The present technique for the study of operant behavior offers a quantitative, continuous record of the behavior of an individual organism, which is already being widely used—in industry as well as the research laboratory—in screening psychopharmacological compounds and investigating the nature of pharmacological effects.
An example is some research by Peter B. Dews [13], of the Department of Pharmacology of the Harvard Medical School. Dews has studied the effect of certain sedatives on the pigeon’s performance under a multiple fixed-interval fixed-ratio schedule. A standard base line obtained in a short daily experimental session is shown in the upper half of Figure 14. The pigeon is reinforced on a fixed-interval schedule when the key is red and on a fixed-ratio schedule when the key is green, the two schedules being presented in the order: one interval, one ratio, two intervals, ten ratios, two intervals, four ratios. In addition to the usual characteristics of the multiple performance, this brief program shows local effects which add to its usefulness as a base line. For example, the period of slow responding after reinforcement is greater when the preceding reinforcement has been on a ratio schedule—that is, the scallops at a and b are shallower than those at c and d. The effect of moderate doses of barbiturates, bromides, and other sedatives under a multiple fixed-interval fixed-ratio schedule is to destroy the interval performance while leaving the ratio performance essentially untouched. The lower half of Figure 14 was recorded on the day following the upper half. Three milligrams of chlorpromazine had been injected 2.5 hours prior to the experiment. The tranquilizing effect of chlorpromazine develops only with repeated doses; what is shown here is the immediate effect of a dose of this magnitude, which is similar to that of a sedative. It will be seen that the ratios survive (at e, f, and g) but that the interval performances are greatly disturbed. There is responding where none is expected, as at h, but not enough where a rapid rate usually obtains. This fact provides a useful screening test, but it also throws important light on the actual nature of sedation. The difference between intervals and ratios may explain some instances in which sedatives appear to have inconsistent effects on human subjects.
The interval performance is also damaged by chlorpromazine in a different type of compound schedule. Ferster and the writer have studied the effect of concurrent schedules in which two or more controlling circuits set up reinforcements independently. In one experiment a rat was reinforced with food at fixed intervals of 10 minutes and also by the avoidance of shock, where a shock occurred every 20 seconds unless postponed for 20 seconds by a response to a lever. The normal result of this concurrent schedule is shown in the upper part of Figure 15. When the rat is “working for food and to avoid a shock,” its performance suggests the usual interval scallop tilted upward so that instead of pausing after reinforcement, the rat responds at a rate sufficient to avoid most shocks. A one-milligram dose of chlorpromazine immediately before the experiment has the effect shown in the lower part of the figure. The interval performance is eliminated, leaving the slow steady responding characteristic of avoidance conditioning.
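In code, a concurrent schedule of this kind is simply two controlling circuits evaluated independently against the same lever. The sketch below is a simplified restatement of the contingencies named in the text; the class and function names are illustrative, and the delivery of shocks and food is left implicit.

```python
class FixedIntervalFood:
    """Food circuit: the first response after a 10-minute interval earns food."""
    def __init__(self, interval_s=600.0):
        self.interval_s, self.last = interval_s, 0.0
    def response(self, t):
        if t - self.last >= self.interval_s:
            self.last = t
            return True
        return False

class ShockPostponement:
    """Avoidance circuit: each response postpones the next shock by 20 seconds."""
    def __init__(self, postpone_s=20.0):
        self.postpone_s, self.next_shock = postpone_s, postpone_s
    def response(self, t):
        self.next_shock = t + self.postpone_s

food_circuit, shock_circuit = FixedIntervalFood(), ShockPostponement()

def lever_press(t):
    """A single press is evaluated by both circuits at once."""
    shock_circuit.response(t)          # postpones the next scheduled shock
    return food_circuit.response(t)    # True if this press also earns food
```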
Drugs which alter emotional conditions may be studied by examining the effect of the emotional variable upon operant behavior. An example is the condition usually called “anxiety.” Many years ago Estes and the writer [14] showed that the normal performance under fixed-interval reinforcement was suppressed by a stimulus which characteristically preceded a shock. In our experiment, a rat was reinforced on a fixed-interval schedule until a stable base line developed. A stimulus was then introduced for 3 minutes and followed by a shock to the feet of the rat. In later presentations the stimulus began to depress the rate of responding—an effect comparable to the way in which “anxiety” interferes with the daily behavior of a man. Hunt and Brady [15] have shown that some of the “treatments” for human anxiety (for example, electroconvulsive shock) temporarily eliminate the conditioned suppression in such experiments. Brady has recently applied this technique to the study of tranquilizing drugs. In his experiment, a rat is reinforced on a variable-interval schedule until responding stabilizes at a constant intermediate rate. Stimuli are then presented every 10 minutes. Each stimulus lasts for 3 minutes and is followed by a shock. Conditioned suppression soon appears. In Figure 16 each simple arrow shows the onset of the stimulus. In order to isolate the performance in the presence of the stimulus, the record is displaced downward. In the saline control, shortly after the onset of the stimulus, the rate falls to zero, as shown by the horizontal portions of the displaced segments. As soon as the shock is received (at the broken arrows, where the pen returns to its normal position), responding begins almost immediately at the normal rate. The base line between stimuli is not smooth because a certain amount of chronic anxiety develops under these circumstances. A suitable dose of a stimulant such as amphetamine has the effect of increasing the over-all rate, as seen in the middle part of Figure 16. The suppressing stimulus is, if anything, more effective. A course of treatment with reserpine, a tranquilizer, has the effect of slightly depressing the over-all rate but restoring responding during the formerly suppressing stimulus. Thus, in the lower part of Figure 16, the slopes of the displaced segments of the record are of the same order as that of the over-all record itself. The reserpine has eliminated an effect which, from a similarity of inciting causes, we may perhaps call anxiety.
Another field in which important variables affecting behavior are studied is neurology. Performances under various schedules of reinforcement supply base lines which are as useful here as in the field of psychopharmacology. The classical pattern of research is to establish a performance containing features of interest, then to remove or damage part of the nervous system, and later to have another look at the behavior. The damaged performance shows the effect of the lesion and helps in inferring the contribution of the area to normal behavior.
The procedure is, of course, negative. Another possibility is that neurological conditions may be arranged which will have a positive effect. A step in this direction has been taken by James Olds [17] with his discovery that weak electrical stimulation of certain parts of the brain, through permanently implanted electrodes, has an effect similar to that of positive reinforcement. In one of Olds’ experiments, a rat presses a lever to give itself mild electrical stimulation in the anterior hypothalamus. When every response is so “reinforced,” behavior is sustained in strength for long periods of time. One of Olds’ results is shown in Figure 17. The electrical “reinforcement” was begun shortly after noon. The rat responded at approximately 2000 responses per hour throughout the day and night until the following noon. There are only three or four brief pauses during this period. When the experiment was continued the following day, however, the rat fell asleep and slept for 20 hours. Then it awoke and began again at approximately the same rate. Although there remain some puzzling differences between behavior so reinforced and behavior reinforced with food, Olds’ discovery is an important step toward our understanding of the physiological mechanisms involved in the operation of the environmental variable. A similar reinforcing effect of brain stimulation has been found in cats by Sidman, Brady, Boren, and Conrad [18] and in monkeys by Lilly, of the National Institutes of Health, and Brady, in the laboratories of the Walter Reed Army Institute of Research.
What about man? Is rate of responding still an orderly and meaningful datum here, or is human behavior the exception in which spontaneity and caprice still reign? In watching experiments of the sort described above, most people feel that they could “figure out” a schedule of reinforcement and adjust to it more efficiently than the experimental organism. In saying this, they are probably overlooking the clocks and calendars, the counters and the behavior of counting, with which man has solved the problem of intermittency in his environment. But if a pigeon is given a clock or a counter, it works more efficiently [19], and without these aids man shows little if any superiority.
Parallels have already been suggested between human and infrahuman behavior in noting the similarity of fixed-ratio schedules to piece-rate pay and of variable ratios to the schedules in gambling devices. These are more than mere analogies. Comparable effects of schedules of reinforcement in man and the other animals are gradually being established by direct experimentation. An example is some work by James Holland [20] at the Naval Research Laboratories on the behavior of observing. We often forget that looking at a visual pattern or listening to a sound is itself behavior, because we are likely to be impressed by the more important behavior which the pattern or sound controls. But any act which brings an organism into contact with a discriminative stimulus, or clarifies or intensifies its effect, is reinforced by this result and must be explained in such terms. Unfortunately mere “attending” (as in reading a book or listening to a concert) has dimensions which are difficult to study. But behavior with comparable effects is sometimes accessible, such as turning the eyes toward a page, tilting a page to bring it into better light, or turning up the volume of a phonograph. Moreover, under experimental conditions, a specific response can be reinforced by the production or clarification of a stimulus which controls other behavior. The matter is of considerable practical importance. How, for example, can a radar operator or other “lookout” be kept alert? The answer is: by reinforcing his looking behavior.
Holland has studied such reinforcement in the following way. His human subject is seated in a small room before a dial. The pointer on the dial occasionally deviates from zero, and the subject’s task is to restore it by pressing a button. The room is dark, and the subject can see the dial only by pressing another button which flashes a light for a fraction of a second. Pressing the second button is, then, an act which presents to the subject a stimulus which is important because it controls the behavior of restoring the pointer to zero.
Holland has only to schedule the deviations of the pointer to produce changes in the rate of flashing the light comparable to the performances of lower organisms under comparable schedules. In Figure 18, for example, the upper curve shows a pigeon’s performance on a fairly short fixed-interval. Each interval shows a rather irregular curvature as the rate passes from a low value after reinforcement to a high, fairly constant, terminal rate. In the lower part of the figure is one of Holland’s curves obtained when the pointer deflected from zero every three minutes. After a few hours of exposure to these conditions, the subject flashed the light (“looked at the pointer”) only infrequently just after a deflection, but as the interval passed, his rate accelerated, sometimes smoothly, sometimes abruptly, to a fairly constant terminal rate. (An interesting feature of this curve is the tendency to “run through” the reinforcement and to continue at a high rate for a few seconds after reinforcement before dropping to the low rate from which the terminal rate then emerges. Examples of this are seen at a, b, and c. Examples in the case of the pigeon are also seen at d and e. In their study of schedules, Ferster and the writer had investigated this effect in detail long before the human curves were obtained.)
Other experiments on human subjects have been conducted in the field of psychotic behavior. In a project at the Behavior Research Laboratories of the Metropolitan State Hospital, in Waltham, Massachusetts, a psychotic subject spends one or more hours each day in a small room containing a chair and an instrument panel as seen in Figure 19. At the right of the instrument board is a small compartment (a) into which reinforcers (candy, cigarettes, coins) are dropped by an appropriate magazine. The board contains a plunger (b), similar to that of a vending machine. The controlling equipment behind a series of such rooms is shown in Figure 20. Along the wall at the left, as at a, are seen four magazines, which can be loaded with various objects. Also seen are periscopes (as at b) through which the rooms can be observed through one-way lenses. At the right are cumulative recorders (as at c) and behind them panels bearing the controlling equipment which arranges schedules.
It has been found that even deteriorated psychotics of long standing can, through proper reinforcement, be induced to pull a plunger for a variety of reinforcers during substantial daily experimental sessions and for long periods of time. Schedules of reinforcement have the expected effects, but the fact that these organisms are sick is also apparent. In Figure 21, for example, the record at A shows a “normal” human performance on a variable-interval schedule where the subject (a hospital attendant) is reinforced with nickels on an average of once per minute. A straight line, similar to the records of the pigeon and chimpanzee in Figure 3, is obtained. Records B, C, and D are the performances of three psychotics on the same schedule working for the same reinforcers. Behavior is sustained during the session (as it is during many sessions for long periods of time), but there are marked deviations from straight lines. Periods of exceptionally rapid responding alternate with pauses or periods at a very low rate.
That a schedule is nevertheless effective in producing a characteristic performance is shown by Figure 22. A fixed-ratio performance given by a pigeon under conditions in which there is substantial pausing after reinforcement is shown at A. In spite of the pauses, the general rule holds: as soon as responding begins, the whole ratio is quickly run off. Fixed-ratio curves for two psychotic subjects, both severely ill, are shown at B and C. Only small ratios can be sustained (40 and 20, respectively), and pauses follow all reinforcements. Nevertheless, the performance is clearly the result of a ratio schedule: once responding begins, the complete ratio is run off.
It is unfortunate that a presentation of this sort must be confined to mere examples. Little more can be done than to suggest the range of application of the method and the uniformity of results over a fairly wide range of species. The extent to which we are moving toward a unified formulation of this difficult material cannot be properly set forth. Perhaps enough has been said, however, to make one point—that in turning to probability of response or, more immediately, to frequency of responding we find a datum which behaves in an orderly fashion under a great variety of conditions. Such a datum yields the kind of rigorous analysis which deserves a place in the natural sciences. Several features should not be overlooked. Most of the records reproduced here report the behavior of single individuals; they are not the statistical product of an “average organism.” Changes in behavior are followed continuously during substantial experimental sessions. They often reveal changes occurring within a few seconds which would be missed by any procedure which merely samples behavior from time to time. The properties of the changes seen in the cumulative curves cannot be fully appreciated in the non-instrumental observation of behavior. The reproducibility from species to species is a product of the method. In choosing stimuli, responses, and reinforcers appropriate to the species being studied, we eliminate the sources of many species differences. What emerges are dynamic properties of behavior, often associated with the central nervous system.
Have we been guilty of an undue simplification of conditions in order to obtain this level of rigor? Have we really “proved” that there is comparable order outside the laboratory? It is difficult to be sure of the answers to such questions. Suppose we are observing the rate at which a man sips his breakfast coffee. We have a switch concealed in our hand, which operates a cumulative recorder in another room. Each time our subject sips, we close the switch. It is unlikely that we shall record a smooth curve. At first the coffee is too hot and sipping is followed by aversive consequences. As it cools, positive reinforcers emerge, but satiation sets in. Other events at the breakfast table intervene. Sipping eventually ceases not because the cup is empty but because the last few drops are cold.
But although our behavioral curve will not be pretty, neither will the cooling curve for the coffee in the cup. In extrapolating our results to the world at large, we can do no more than the physical and biological sciences in general. Because of experiments performed under laboratory conditions, no one doubts that the cooling of the coffee in the cup is an orderly process, even though the actual curve would be very difficult to explain. Similarly, when we have investigated behavior under the advantageous conditions of the laboratory, we can accept its basic orderliness in the world at large even though we cannot there wholly demonstrate law.
In turning from an analysis of this sort to the world at large, many familiar aspects of human affairs take on new significance [21]. Moreover, as we might expect, scientific analysis gives birth to technology. The insight into human behavior gained from research of this sort has already proved effective in many areas. The application to personnel problems in industry, to psychotherapy, to “human relations” in general, is clear. The most exciting technological extension at the moment appears to be in the field of education. The principles emerging from this analysis, and from a study of verbal behavior based upon it, are already being applied in the design of mechanical devices to facilitate instruction in reading, spelling, and arithmetic in young children, and in routine teaching at the college level.
In the long run one may envisage a fundamental change in government itself, taking that term in the broadest possible sense. For a long time men of good will have tried to improve the cultural patterns in which they live. It is possible that a scientific analysis of behavior will provide us at last with the techniques we need for this task—with the wisdom we need to build a better world and through it better men.