Neurobiological Basis of Drug Reward and Reinforcement

Introduction

Drug use disorders involve a number of factors including genetic and environmentally influenced predispositions, the actions of the drugs themselves, the immediate environment, and the neurobiological mechanisms that promote and support drug actions and addiction. This chapter deals mostly with the last of these aspects of drug use, abuse, and addiction, as we explore the ways in which the brain is built to adapt to environmental circumstances, and how these aspects of neural function can promote the continued use and abuse of certain drugs and ultimately promote disorders related to these drugs. We then consider the mechanisms through which drugs of abuse interact with the brain systems that promote maladaptive drug use and addiction.

Drug use disorders have been defined in several different ways, most of which stress the habitual or compulsive nature of addictive behavior; the physical, psychological, and social damage produced by the behavior; and the trauma associated with cessation of the behavior. Drug use disorders also share many features with disorders related to natural biological drives (e.g., sex and food consumption), physical activities (e.g., excessive exercise), and relatively benign drugs (e.g., caffeine) (discussed in Brunton et al. and Koob and Le Moal). Drugs of abuse with harmful effects are the main focus of the present discussion. In addition, excessive use of drugs may lead to health and psychological problems even in the absence of an agreed-upon definition of use disorders. Thus, it is important to understand the neural mechanisms that contribute to prolonged and maladaptive drug use.

The Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5), covers substance-related and addictive disorders. It should be noted that addiction per se is not a diagnosis recommended in the manual, as the term substance use disorder is preferred. Within this classification, the manual defines substance use disorder as a “…cluster of cognitive, behavioral, and physiological symptoms indicating that the individual continues using the substance despite significant substance-related problems.” Common features of these disorders are cognitive, affective, sleep, and behavioral changes that center on use or cessation of use of the addictive substance or action (sometimes referred to as a habit). Central features of the use disorder are risky drug use, craving, social impairment, continued use despite negative consequences, and the possible negative consequences of cessation of drug use. All of these aspects of substance use and addictive disorders can be seen to relate to innate brain mechanisms underlying processes referred to as reinforcement and/or reward. In this context, it is important to discuss current ideas about the neural mechanisms of reinforcement and reward before discussing the impact of drugs of abuse on these processes.

Experimental Psychology Concepts of Reward and Reinforcement

An important aspect of the use and abuse of a wide range of drugs is their reinforcing properties. A reinforcer is defined in experimental psychology as a substance or stimulus presented following a behavior that increases the incidence of the behavior above baseline levels. As Skinner wrote:

The operation of reinforcement is defined as the presentation of a certain kind of stimulus in a temporal relation with either a stimulus or a response. A reinforcing stimulus is defined as such by its power to produce the resulting change [in the response]. There is no circularity about this; some stimuli are found to produce the change, others not, and they are classified as reinforcing and non-reinforcing accordingly.

It is worth highlighting the dual use of the term stimulus by Skinner to refer to both the result of the action (the reinforcer), and a stimulus within the environment that can become associated with the response and the reinforcer. One definition of reinforcement, although not Skinner’s position, is that it involves a strengthening of the ability of stimuli to elicit responses (the so-called stimulus-response model), while a somewhat looser definition is a strengthening of the ability of the environment in general, including some neural activity within the animal itself and the animal’s past history in that environmental context, to elicit the response.

The concept of reinforcement is best known from the work of Konorski and Skinner on what is now called operant or instrumental conditioning. However, the term reinforcement has also been used in the context of Pavlovian, or classical, conditioning. One use of this term is that presentation of an unconditioned stimulus subsequent to the conditioned stimulus reinforces the ability of the conditioned stimulus to elicit a conditioned response. This term has also been used to refer to the effects of stimuli that predict the value of a rewarding stimulus presented prior to presentation of food or another naturally desirable outcome. For example, in the paradigm used by Schultz, responding for food (licking) as well as neuronal activity related to stimulus presentation and responding can be measured. Dayan and Balleine provided a nice discussion of the distinctions between reinforcement in the context of Pavlovian and instrumental conditioning.

Two forms of reinforcement, termed positive and negative, have also been postulated. Positive reinforcement refers to the process in which delivery of a desirable consequence increases the incidence of the behavior. This is easily understood in the context of the instrumental or operant conditioning paradigm in which delivery of palatable food will increase bar pressing by a rodent or key pecking by a pigeon. Negative reinforcement occurs when the performance of an action results in omission or avoidance of an undesirable stimulus (e.g., foot shock), and the incidence of the behavior increases as a result of this learning process. The initial phases of learning some skills, such as swimming, involve what might be termed negative reinforcement, as the skill helps to reduce the undesirable effects of the environment. Many investigators do not subscribe to the idea that positive and negative reinforcement are distinct processes, as both types of reinforcement basically refer to something that increases the incidence of a given behavior. However, negative reinforcement is a useful concept when measuring stimulus-behavior relationships, as it describes a condition in which increasing behavior leads to omission/avoidance of a stimulus. Learning in everyday life will often involve both positive and negative reinforcement.

Two other terms that have come to be used in the context of instrumental learning and addiction are punishment and reward. Consideration of the conditions that promote cessation of behavior led to the definition of the undesirable outcome as punishment, although the role of punishment has been hotly debated. The term reward was not so readily accepted by early behaviorists but has come into common use as a reference to the desirable outcome in an instrumental learning paradigm. The terms reward and positive reinforcement are often used interchangeably, but as we will discuss, these terms can be used to refer to different processes that control instrumental learning of actions, including drug self-administration.

Studies conducted over the last few decades have led to the refinement of the concepts of instrumental conditioning, reward, and reinforcement based on the role of the outcome produced by a particular behavior in conditioning paradigms. Dickinson, Balleine, and others have shown that responses developed under certain types of conditioning schedules will rapidly diminish if the value of the outcome is decreased or if receipt of the outcome is no longer contingent on making the response. This learning of action-outcome contingencies is best achieved with training schedules where the outcome is easily predictable and the probability of obtaining the outcome is enhanced with increased rates of responding (e.g., fixed or random ratio schedules). In this case, the outcome has been termed to have a rewarding action, based on its intrinsic value to the organism at the time of testing and association with the instrumental action itself.

In contrast, training with schedules where predictability is poorer and increasing rates do not increase probability of successful outcomes (e.g., random interval schedules) produces responding that is insensitive to outcome devaluation or noncontingent presentation of the outcome. This stimulus-response type of conditioning can also occur with extensive training using schedules with higher predictability (discussed in Yin and Knowlton). In the case of stimulus-response learning, the association is made between antecedent environmental stimuli and the subsequent response, with the outcome serving as a reinforcer regardless of the immediate value of this outcome to the animal. This is closer to the classical definitions of reinforcement favored by stimulus-response theorists, Donahoe, and perhaps Skinner. It should also be noted that White drew a similar distinction between reward and reinforcement, albeit with a more traditional behaviorist emphasis on the definitions of these terms. Other investigators have defined reward in terms of positive reinforcement in combination with positive hedonic value, an idea that suggests more overlap between the two processes. Although the separate definitions of reward and reinforcement in this context may be debated, there is strong evidence for the two instrumental conditioning processes themselves. Thus the differentiation of the roles of stimuli/environment and outcome in the two different learning processes is important, and separate discussion of reward and reinforcement in these contexts is useful.
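The difference between ratio and interval schedules in how response rate relates to outcome rate can be illustrated with a toy simulation. The sketch below is not taken from the studies discussed here; the function names, the assumption of one response opportunity per time step, and all parameter values are illustrative assumptions. Under a random ratio schedule each response has a fixed chance of being rewarded, so faster responding earns proportionally more outcomes. Under a random interval schedule an outcome becomes available after a random delay and is collected by the next response, so faster responding adds little once responses are frequent relative to the mean interval.

```python
import random


def simulate_random_ratio(response_prob, ratio, steps, rng):
    """Count rewards on a random ratio schedule.

    Assumption: in each time step the animal responds with probability
    `response_prob`, and every response is rewarded with probability 1/ratio.
    """
    rewards = 0
    for _ in range(steps):
        if rng.random() < response_prob and rng.random() < 1.0 / ratio:
            rewards += 1
    return rewards


def simulate_random_interval(response_prob, mean_interval, steps, rng):
    """Count rewards on a random interval schedule.

    Assumption: a reward becomes available ("armed") with probability
    1/mean_interval per time step and stays available until the next
    response collects it; only then can the next reward be armed.
    """
    rewards = 0
    armed = False
    for _ in range(steps):
        if not armed and rng.random() < 1.0 / mean_interval:
            armed = True
        if armed and rng.random() < response_prob:
            rewards += 1
            armed = False
    return rewards


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so repeated runs give the same counts
    steps = 20000
    for label, prob in (("slow", 0.1), ("fast", 0.8)):
        rr = simulate_random_ratio(prob, ratio=10, steps=steps, rng=rng)
        ri = simulate_random_interval(prob, mean_interval=30, steps=steps, rng=rng)
        print(f"{label} responding: random ratio={rr}, random interval={ri}")
```

In this toy setting an eightfold increase in response rate yields roughly eight times as many outcomes on the ratio schedule but only a modest gain on the interval schedule, which is the weak response-outcome correlation thought to favor stimulus-response learning.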

Before we consider how reward and reinforcement contribute to addiction it is worth discussing the adaptive purpose of these neural systems. Behaviors that lead to enhanced survival and/or reproduction are necessary for propagation of genes and species. Innate feeding, reproductive, and harm-avoidance behaviors exist in all animals, but learning about features of the environment is necessary to obtain the opportunity to express these innate behaviors. Pavlovian conditioning is one such learning process whereby performance of something approximating an innate or reflexive behavior can come to be elicited by stimuli that were originally neutral with respect to predicting a particular outcome (e.g., obtaining food or avoiding harm). Instrumental conditioning adds another layer of sophistication to this process. Animals with this capacity can learn to perform new actions and new sets of actions to obtain a positive consequence or avoid punishment. Both types of learning have obvious adaptive utility, as the animal can now integrate complex features of the world and new behavioral strategies into maintaining safety, as well as the quest for food and mating partners. The power of the neural mechanisms involved in reward and reinforcement likely derives from this relationship to survival and reproductive success.

However, there is the possibility that reward and reinforcement mechanisms will not always be used for adaptive purposes. One such example is the phenomenon of self-starvation. Animals that are trained to perform an intracranial self-stimulation task, described later, will perform this task at the expense of sufficient eating if access to food is time-restricted. Similar self-starvation is observed if animals are given the opportunity to run on a wheel when on a limited food access schedule. This particular form of self-starvation has been considered as a model of human anorexia nervosa, which itself is clearly an example of maladaptive behavior involving the brain systems we will consider. Stimuli that originally signal a positive outcome can change their predictive value (a certain location may contain food at one time and a predator at another). Furthermore, stimuli or substances that interact with the neural mechanisms involved in reinforcement may come to have reinforcing value even when they are not coupled to a favorable outcome, or even when they are associated with harmful results. Most drugs of abuse can act in this manner, and can lead to reinforcement of what we might call maladaptive behaviors. In the remainder of this chapter we will consider the brain circuitry and cellular and molecular mechanisms involved in reinforcement. Consideration of this topic will also entail some discussion of the experimental techniques used to uncover these mechanisms.

Neurotransmitters and Neural Circuitry: Involvement in Different Aspects of Reward, Reinforcement, and Addiction

Although the concept of instrumental learning had begun to crystallize by the early 1930s mainly due to the work of Konorski and Skinner, little was known about the neural circuits involved in this behavior. Konorski and Divac both obtained evidence from studies involving lesions of the caudate nucleus implicating this part of the striatum in instrumental conditioning. The discovery by Olds and Milner of intracranial self-stimulation provided an important clue as to the importance of at least one pathway within the basal ganglia. In the original intracranial self-stimulation paradigm, the animal was implanted with an electrode that could stimulate fibers in the medial forebrain bundle. Investigator-initiated stimulation at this site led the animal to repeat the behaviors that were ongoing at the time when the stimulus was delivered. Thus activation of this neural pathway was in and of itself rewarding or reinforcing. It was later discovered that a key set of axons within the medial forebrain bundle supplied dopaminergic afferents to the forebrain regions known collectively as the striatum. This finding stimulated work on the role of dopamine in brain mechanisms of reward and addiction that has continued to this day.

The dopaminergic pathways in the brain are now well known. The somata of the majority of neurons that use dopamine as a neurotransmitter are concentrated in contiguous ventral midbrain structures called the substantia nigra pars compacta (SNc), or A-9 nucleus, and the ventral tegmental area, or A-10 nucleus. The neurons in these two regions project to different striatal subregions and other forebrain targets. Neurons from the substantia nigra pars compacta primarily innervate the dorsal striatum (the caudate and putamen nuclei in primates). In contrast, dopaminergic neurons from the ventral tegmental area (VTA) project strongly to the ventral portion of the striatum, particularly a striatal subregion called the nucleus accumbens that sits in the ventromedial region of the striatum. Neurons within the VTA also send dopaminergic afferents to the prefrontal, orbitofrontal (insular), and cingulate cortices, with more minor projections to other cortical regions such as the limbic cortical subregions.

The initial data suggesting that dopaminergic neuronal activity is crucial for intracranial self-stimulation were later supplemented by the finding that intracranial self-stimulation could be produced by stimulation within the ventral midbrain regions where the dopaminergic neurons reside. Intracranial self-stimulation is supported by stimulation in the VTA, as well as at subregions of the SNc. These studies did not rule out the possibility that stimulation of fibers that originated elsewhere and passed through the ventral midbrain contributed to intracranial self-stimulation. Nonetheless, the combination of these findings with evidence that dopaminergic manipulations alter intracranial self-stimulation strongly implicated dopamine coming from ventral midbrain neurons in the mechanisms that underlie reward and reinforcement during this process.

The focus on dopamine in the context of reward and reinforcement often overshadows the role of other neurotransmitters. Indeed, dopamine is a modulatory neurotransmitter that in and of itself is not capable of strong excitation or inhibition of neurons within this circuitry. Furthermore, there is evidence indicating that dopaminergic transmission is not required for certain aspects of behavior that are thought to involve reward or reinforcement. For example, gene-targeted mice that lack dopamine are still able to learn the location of food, but appear to require dopamine to express the learned behavior. This finding and similar data from other studies seem to indicate that dopamine is necessary for the motivational aspects of reward seeking or incentive salience. In addition, there is evidence that neurochemical lesions of the dopaminergic system do not eliminate self-administration of drugs of abuse such as heroin and ethanol, suggesting that dopaminergic transmission may not be necessary for all of the rewarding or reinforcing effects of drugs of abuse. Thus, we need to consider the role of other neurotransmitters in reward, reinforcement, and addiction. An exhaustive description of all the neurotransmitters involved in these processes is beyond the scope of the present chapter. Instead, particular neurotransmitters with intriguing roles in the brain reward/reinforcement circuitry are discussed.

Within the central nervous system, the neurotransmitters glutamate and γ-aminobutyric acid (GABA) are responsible for the majority of fast synaptic transmission. Glutamate directly excites neurons via the activation of ligand-gated cation channel-type receptors, whereas GABA activates anion-preferring channels and generally has an inhibitory action. Both of these neurotransmitters have been implicated in brain mechanisms of reward, reinforcement, and drug actions.

One approach that has been used to examine the role of glutamate and GABA in reward, reinforcement, and addiction-related behaviors is blockade of receptors with specific antagonists, usually injected into a specific brain area. These approaches have proven to be effective in altering behavior and have implicated certain subtypes of ionotropic glutamate receptors in reward and addiction-related behaviors. However, it is sometimes difficult to discern the specific behavioral role of glutamate and its receptors using antagonist blockade, as antagonists of ionotropic glutamate receptors will almost certainly decrease neuronal activity and disrupt circuit activity. Thus, the antagonist effect may not necessarily reflect a need for activation of the receptor so much as the necessity of activity of a particular set of neurons. The opposite case often exists for GABAergic activity, as blockade of GABA receptors, GABA_A receptors in particular, tends to increase neuronal activity and may stimulate circuitry. For these reasons, much recent research on the roles of GABA and glutamate in reinforcement, reward, and addiction has focused on the role of particular glutamate receptor and GABA_A receptor subunit proteins.

The ligand-gated ion channels that mediate fast excitatory and inhibitory synaptic transmission are multimeric proteins that can be formed by numerous subunits and subunit combinations. Chronic exposure to addictive drugs alters the expression of particular ionotropic GABA and glutamate receptor subunits. Manipulating subunit expression can subtly alter receptor function without eliminating receptor activity. This has allowed investigators to explore the roles of these receptors in drug- and addiction-related behaviors without major disruption of the activity of neurons within the reward/reinforcement circuitry. This line of research has been boosted immensely by techniques for transgenic receptor expression and gene-targeted receptor modification and disruption (i.e., so-called knock-in and knockout techniques). Transgenic and gene-targeted mice that express higher or lower amounts of a desired receptor subunit are quite useful, as are mice that express a slightly mutated version of a receptor. Viral-based gene overexpression, often involving microinjection of constructs into specific brain regions, is also being widely used to enhance protein function in neurons within reward/reinforcement circuitry. The development of new methods to alter receptor expression or structure, such as TALEN and clustered regularly interspaced short palindromic repeats (CRISPR), has increased the rate at which gene-targeted mice and viruses can be developed and employed. Altering expression of the GluR1 alpha-amino-3-hydroxy-5-methylisoxazole-4-propionic acid (AMPA) receptor subunit in gene-targeted mice alters instrumental learning and reduces morphine dependence and sensitization. Altered acute responses to drugs of abuse, as well as changes in ethanol tolerance and dependence, have been observed in mice in which GABA_A receptors have been altered by gene targeting of alpha2, alpha5, and delta subunits. However, one caveat that must be added to this discussion is that neuronal activity has not been measured in vivo in the animals used in these studies, and thus, the extent to which subunit loss alters circuit activity has yet to be determined in any of the aforementioned experimental models.

The neuromodulatory transmitter serotonin (or 5-hydroxytryptamine) can influence the brain reward and reinforcement circuitry, in part through actions on dopaminergic neurons. Serotonin can also influence goal-directed and habitual behavior through its role in control of affect and in impulsivity. The likelihood of choosing new actions without strong outcome control has been linked to disorders of brain serotonergic systems. Impulsivity is responsive to some treatments aimed at the serotonergic system. Because of the disregard for outcomes, impulsive responding may be a first step in the process leading to stimulus control of behavior and maladaptive habits. Measures of impulsivity in animal models have been suggested to predict a pattern of addiction-like drug taking in rodents. Serotonin levels in several brain regions are elevated during administration of psychostimulant drugs such as amphetamine and cocaine, and there is evidence that excessive serotonergic transmission contributes to the addictive effects of these drugs, in addition to the well-characterized role of dopamine in these processes.

Opioid neuropeptides are widely distributed in the brain, including in the action/reinforcement/reward circuitry discussed at present. These peptides, enkephalins and endorphins in particular, are perhaps best known for their roles in analgesia. However, it is now well established that opioid peptide production and release are increased in response to stressful stimuli and other environmental challenges. In addition, brain opioid systems have been implicated in mechanisms of reward, particularly in relation to food and drugs of abuse. Studies of food-related behaviors generally indicate that opioid peptides signal something about the hedonic value or desirability of the food, sometimes called “liking.” Opiate drugs that act as agonists at mu- and delta-type opiate receptors are self-administered in instrumental paradigms (reviewed in Gratton), and opiate agonists produce decreases in the threshold for intracranial self-stimulation. Opiate antagonists can also influence intracranial self-stimulation. These findings indicate that activation of the brain opioid system has rewarding effects.

The brain endocannabinoid system has also begun to receive a great deal of attention as a mediator of instrumental learning and addiction. Endocannabinoids are lipid metabolites that act on the cannabinoid receptors, the receptors originally discovered as mediators of the psychoactive effects of drugs such as marijuana and hashish. In the brain, endocannabinoid agonists act mainly through the cannabinoid-1 (CB1) receptor to produce short- and long-lasting synaptic plasticity. The role of the brain endocannabinoid systems in responses to a variety of drugs of abuse is a fascinating topic that has received a great deal of attention in recent years, and this subject is discussed later in this chapter. Recent studies using instrumental conditioning techniques indicate that CB1 receptors play a role in the transition from action-outcome to stimulus-response (habit) learning. Thus the endocannabinoid system may play an important role in reinforcement-based instrumental learning. It is not yet clear if alterations in dopaminergic transmission or effects on other neurotransmitter systems are involved in this habit-promoting effect of endocannabinoids. The CB1 receptor is highly expressed throughout the brain circuitry thought to mediate instrumental conditioning. Within these circuits, CB1 receptors are expressed on axon terminals of glutamatergic and GABAergic neurons (reviewed in Lovinger), and may well regulate release of other neurotransmitters including catecholamines. Thus there are many possible sites where endocannabinoid-dependent synaptic plasticity may play a role in this type of learning and in addiction.

The foregoing discussion should make it clear that to better understand the neuronal mechanisms contributing to reward, reinforcement, and addiction we need to understand more fully the brain circuits involved in the control of actions and the instrumental learning of actions and association of actions with stimuli. We must also gain a better understanding of the roles of particular neurotransmitters and receptors in different parts of these circuits. The forebrain, in conjunction with the ventral midbrain, can be conceptualized as a series of parallel cortico-basal-ganglia-cortex circuits that can also be serially interconnected (see Yin and Knowlton for review). The ultimate function of these circuits is to modify cortical and brainstem output to control the selection, initiation, and timing of actions to produce effective integrated behaviors. Neurons and synapses within these circuits can undergo plastic changes that are thought to contribute to learning of new actions and association of actions with conditioned stimuli.

In an admittedly simplistic scheme, this circuitry can be separated into at least three parallel circuits (Fig. 16.1). (More circuits have been suggested, and undoubtedly further subdivisions will emerge based on the complex afferent and efferent connectivity of the striatum.) Each of the circuits consists of a cortical component, a striatal component, downstream basal ganglia components, and a thalamic component. The sensorimotor circuit comprises the primary and secondary sensory and motor cortices and the SNc, which project to the putamen (the dorsolateral striatum in rodents), which then projects to the motor regions of the globus pallidus, ultimately influencing the ventral thalamus and closing the loop back at the sensory and motor cortices (see Fig. 16.1A). The associative circuitry involves similar connections between associative areas of the cortex (including the prefrontal and parietal regions), the SNc, the caudate nucleus (the dorsomedial striatum in rodents), associative regions of the pallidum, and the mediodorsal and ventral thalamus (see Fig. 16.1B). The limbic circuitry involves the limbic cortices (including not only neocortical prefrontal and temporal areas, but also archicortical regions such as the hippocampus and basolateral amygdala), the VTA, the ventral striatum/accumbens, the ventral pallidum, and the mediodorsal thalamus (see Fig. 16.1C). One can even consider connections within the amygdala to have a similar organization, with the cortical component being the basolateral amygdala, the VTA projections providing the dopaminergic modulatory input, the striatal components being the central amygdala and bed nucleus of the stria terminalis, and downstream targets leading ultimately to cortical outputs (see Fig. 16.1D). There is also evidence for interconnections among the circuits at the level of striatonigral-striatal projections, which can help to coordinate the different systems. To fully understand reward and reinforcement-dependent learning and resultant behavioral output in the mammalian brain, it is necessary to consider all of the components in this circuitry.
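As a compact summary, the parallel loops described above can be written as a small lookup structure. This is only an organizational sketch: the class name, field names, and helper function are hypothetical, the entries simply restate the components named in the text, and components the text does not specify are left as None.

```python
from typing import NamedTuple, Optional


class Loop(NamedTuple):
    """One cortico-basal ganglia loop, restating the components named in the text."""
    cortex: str
    dopamine_source: str
    striatum: str
    pallidum: Optional[str]   # None where the text names no specific region
    thalamus: Optional[str]   # None where the text names no specific region


CIRCUITS = {
    "sensorimotor": Loop(
        cortex="primary and secondary sensory and motor cortices",
        dopamine_source="SNc",
        striatum="putamen (dorsolateral striatum in rodents)",
        pallidum="motor regions of the globus pallidus",
        thalamus="ventral thalamus",
    ),
    "associative": Loop(
        cortex="associative cortex (prefrontal and parietal regions)",
        dopamine_source="SNc",
        striatum="caudate nucleus (dorsomedial striatum in rodents)",
        pallidum="associative regions of the pallidum",
        thalamus="mediodorsal and ventral thalamus",
    ),
    "limbic": Loop(
        cortex="limbic cortices (including hippocampus and basolateral amygdala)",
        dopamine_source="VTA",
        striatum="ventral striatum/nucleus accumbens",
        pallidum="ventral pallidum",
        thalamus="mediodorsal thalamus",
    ),
    "amygdala": Loop(
        cortex="basolateral amygdala",
        dopamine_source="VTA",
        striatum="central amygdala and bed nucleus of the stria terminalis",
        pallidum=None,
        thalamus=None,
    ),
}


def circuits_with_dopamine_from(source):
    """Return the names of circuits whose dopaminergic input comes from `source`."""
    return sorted(name for name, loop in CIRCUITS.items()
                  if loop.dopamine_source == source)


if __name__ == "__main__":
    for region in ("SNc", "VTA"):
        print(region, "->", circuits_with_dopamine_from(region))
```

Grouping the loops by dopaminergic source makes the division explicit: the sensorimotor and associative circuits receive SNc input, whereas the limbic and amygdala circuits receive VTA input.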

Recent studies have begun to shed light on the role of these different forebrain circuits in instrumental and Pavlovian conditioning, and the ideas generated from these studies are now being applied to examination of drug actions. Based on excitotoxic lesioning and local pharmacological manipulations, evidence has accumulated that the associative circuit involving the dorsomedial striatum and associated circuitry, including the basolateral amygdala, has key roles in action-outcome learning. Afferent inputs from the prefrontal cortex to neurons in the dorsomedial striatum provide one source of input containing information relevant to action selection, and the cingulate cortex may provide input about discriminative stimuli. The dopaminergic input from the substantia nigra may provide information about reward value. The contribution of action-outcome learning to drug taking is easy to conceptualize. Intrinsically rewarding effects of drugs likely control behavior even in recreational or social users seeking the euphoric effects of cocaine and amphetamine or the anxiety-reducing effects of alcohol. Indeed, studies in rats indicate that cocaine-seeking behavior, measured as an instrumental response normally associated with drug availability, is rapidly lost with devaluation under certain conditioning regimens. It is not yet clear if action-outcome contingencies continue to drive drug seeking and self-administration after long-term drug use and in addicted individuals. It is tempting to speculate that addiction involves a shift in behavioral control from action-outcome/reward to stimulus-controlled/reinforcement mechanisms such as those described in the next few paragraphs. Of interest, Pelloux et al. have shown that rats given limited experience with cocaine seeking and taking will readily suppress seeking responses when intermittent punishment is given, while prolonged exposure to this paradigm reveals a subgroup of rats that will not show this punishment-suppression effect. Furthermore, rats allowed to orally self-administer cocaine continued to show instrumental responses associated with the drug even after cocaine devaluation. Thus evidence is developing that prolonged exposure to psychostimulants can lead to a shift from action-outcome to stimulus-driven behavior.

The sensorimotor circuit involving the dorsolateral striatum appears to play a prominent role in stimulus-response or habit learning. In this circuit the neocortical components and the dorsolateral striatum process information about the relationship between stimulus presentation and response performance, with the dopaminergic inputs from the substantia nigra (and the ventral tegmental area to some extent) providing a reinforcing signal to promote the stimulus-response association. It has been suggested that dopamine is required for the initial stages of this association, but that behaviors become ingrained and resistant to dopaminergic manipulations once the stimulus-response association is formed and habitual behavior is in place. Ultimately, output from the motor cortex and thalamus is important for behavioral performance, and thus, this circuit can produce relatively straightforward throughput from sensory input to motor output. Habitual responding has been postulated to contribute to drug-taking behavior, such that when an individual is in the proper environment with the drug available, the actions involved in drug administration will be automatized and will often continue regardless of the specific outcome of drug usage. There is emerging evidence that this sort of responding may contribute to cocaine- and alcohol-related behaviors in rodents (see also Sampson et al.). However, the role of stimulus-response associations in drug administration and relapse has not yet been thoroughly examined and fully dissociated from the stimulus-dependent forms of learning thought to be mediated by the limbic circuit (described in subsequent text).

Among the roles of the limbic circuit is the integration of information for Pavlovian and instrumental conditioning in a type of learning called Pavlovian-instrumental transfer. In this circuit, limbic neocortical areas such as the ventral prefrontal cortex provide information relevant to task outcomes to the nucleus accumbens. The basolateral amygdala provides input on reward and appetitive incentive value to the accumbens, where it is combined with the other cortical information. Dopaminergic inputs to the basolateral amygdala, limbic neocortex, and ventral tegmental area also provide information about reward value, while the orbitofrontal cortex may provide important information about the relationship of particular stimuli to task outcomes. The role of the hippocampus and other limbic cortical regions that project to the nucleus accumbens is less clear. The net result is development of associations between environmental stimuli and task outcome (sometimes called stimulus-outcome learning), through which discrete stimuli gain control over particular instrumental responses. In this way the Pavlovian association of the stimulus transfers to the performance of the instrumental response. A role for this type of learning within the context of addiction is easy to postulate. It has long been thought that stimuli that are associated with, and predictive of, drug administration (e.g., needles, liquor bottles) can stimulate drug seeking and taking. Indeed, there is experimental evidence that this sort of cue-induced relapse and drug craving can be induced in both humans and experimental animals.

The characterization of the limbic circuit as the mediator of reward and/or reinforcement is an idea that has captured the imagination of neurobiologists and addiction researchers. However, it is now becoming clear that the circuitry that includes the dorsal striatum has an equally important role in these processes (see Yin et al. for review). In addition to the studies mentioned that implicated the associative and sensorimotor circuits in action-outcome and stimulus-response learning, there is also evidence that dopaminergic innervation of the dorsal striatum plays important roles in instrumental learning. Stimulation of dopaminergic neurons in the substantia nigra pars compacta supports intracranial self-stimulation, as mentioned earlier. Furthermore, activation of substantia nigra neurons with intracranial self-stimulation–inducing patterns enhances learning and striatal synaptic plasticity. An elegant series of studies by the Palmiter laboratory indicates a key role for dorsal striatal dopamine in instrumental learning and performance. By restoring dopamine in the dorsal striatum of mice that have been engineered to lack the neurotransmitter, these investigators have shown that food seeking and instrumental learning/performance were rescued. Thus restoring dopamine within the dorsal striatum alone appears to be sufficient for proper motivational signaling and instrumental performance. This is not to say that the limbic circuitry does not have reward-related functions, but rather that an intact limbic circuit may not be necessary for proper learning and performance of a purely instrumental task.

In recent years, researchers have also focused on the circuitry involved in generating undesirable effects that contribute to drug taking and relapse, and the effects of the drugs themselves on this circuitry (reviewed in Koob and Koob and Le Moal ). There is evidence for reduction in the positive hedonic effects of drugs after sustained self-administration, and negative consequences of drug use and withdrawal increase with repeated use and withdrawal. ^a

a References 56, 97, 100, 102, 108, 118.

The amygdala and associated structures appear to have prominent roles in this scenario. The amygdala has generally been thought of as a brain region involved in the processing of information related to emotion, and the role of the amygdala in anxiety and responses to stress is widely known. However, one can also view the role of the amygdala as providing a neural index of the incentive value of a particular stimulus or event. In this context, the amygdala plays roles in both reward and reinforcement processes as defined earlier. Furthermore, it is now clear that the structure we call the amygdala can be subdivided based on cytoarchitecture and afferent/efferent connections. Two well-characterized amygdalar subregions are the basolateral and central nuclei. The basolateral amygdala is an archicortical structure containing mainly glutamatergic projection neurons and a small number of GABAergic interneurons. The basolateral amygdala innervates other structures within the amygdala, but also has connections with parts of the prefrontal cortex, and the dorsomedial and ventral regions of the striatum. Input to the basolateral amygdala from areas such as the ventral tegmental area and the locus coeruleus provides information about arousal and motivational state; thus, one possible role for this brain region is to integrate information necessary for a reward signal and relay that information to the associative circuit involved in action-outcome learning. The central amygdala is similar in cytoarchitecture to the striatum, having a large proportion of GABAergic projection neurons. This structure receives excitatory input from the basolateral amygdala and from neocortical and paleocortical regions, as well as information about motivational state via neuromodulatory regions such as the hypothalamus.
Output from the central amygdala is sent to the bed nucleus of the stria terminalis, hypothalamus, and other subcortical regions, as well as to the substantia nigra and ventral tegmental area, where it can influence the circuitry involved in stimulus-response and stimulus-outcome learning. In addition, the amygdala has emerged as a brain region with important roles in conditioned responses related to the rewarding effects of drugs studied using the conditioned place preference task described in the following section. The amygdala interconnections with the bed nucleus of the stria terminalis, a subcortical nucleus with a striatal-like organization (i.e., populated predominantly by GABAergic projection neurons), have generated a great deal of interest, as the bed nucleus of the stria terminalis has been implicated in the actions of drugs of abuse as well as in drug self-administration and relapse.

Clearly, a better understanding of the brain regions involved in learning and control of behavior involving reward and reinforcement is emerging. In addition, methodology is emerging that will help define the roles of brain circuitry and circuit physiology in behavior, and refine our behavioral models based on neuroscientific findings. One of the challenges in addiction research in the coming years will be to determine how the function of these brain circuits contributes to responses to drugs of abuse, maladaptive use of the drugs, and addiction.

Models of Drug Use and Drug Addiction

Examination of the neural basis of drug actions, drug use, and addiction has relied to a great extent on development of laboratory animal models. A great deal of progress has been made with this approach. However, it has proven difficult to model all aspects of drug actions and addiction. For example, how does one assess euphoria or craving in an animal that is incapable of verbal self-report? Progress was slow at times for development of reasonable models of self-administration for drugs such as alcohol, cannabinoids, and nicotine. Agreeing on a universal definition of addiction and developing an animal model thereof has also proven to be difficult. Nonetheless, several decades of research have led to the development of a variety of behavioral tests that assay various aspects of drug action, drug use, and addiction (Table 16.1). These techniques continue to be refined and combined with new techniques for neuroscientific investigation to provide more complete information about relevant neural mechanisms. The following discussion will describe some of these animal models, with an emphasis on models of drug reward, reinforcement, and addiction.

Table 16.1

Models of Drug Reward and Reinforcement: Relation to Phenotypes of Human Drug Use, Dependence, and Addiction.

Model	Human Drug Use Phenotype
Simple operant self-administration	Hedonic value, liking/wanting
Devaluation	Goal-directed versus habitual responding
Intracranial self-stimulation threshold changes	Hedonic value, anhedonia
Conditioned place preference/aversion	Reinforcement, incentive sensitization, resistance to negative outcome
Progressive ratio breakpoint	Hedonic value, compulsivity
Behavioral cost	Hedonic value, compulsivity
Response persistence without drug	Compulsivity
Punished responding/pairing with undesirable tastant	Compulsivity, resistance to negative outcome
Cue-induced reinstatement	Craving, incentive sensitization
Secondary reinforcement	Craving, habitual responding
Psychomotor stimulation/sensitization	Incentive sensitization
Incubation	Craving, incentive sensitization

A seemingly direct way to measure the reinforcing effects of a substance is to determine whether delivery of the drug itself will support learning or continued performance of a particular action or set of actions. This so-called self-administration paradigm has been used to examine the reinforcing actions of many drugs of abuse in a variety of animal models, and in general all of these drugs have been found to support self-administration under at least one schedule of drug administration. Comparisons of self-administration in humans and laboratory animals have indicated similarities that augur well for the experimental use of these procedures. The general procedure is to train the animal to press a lever or nose-poke an object in order to receive the drug by oral, intravenous, or intracranial routes of administration. Using the basic instrumental training schedule, animals can also be tested in a final short extinction session in which no drug is available to see if they perform the operant behavior. This helps to assess the drug-seeking behavior without any interference from neural actions of the drug itself (e.g., depressant effects that reduce rate of responding). Variations of this basic procedure include the use of secondary reinforcers (e.g., stimuli paired with the opportunity for drug self-administration that come to elicit behavior themselves), and use of a progressive ratio schedule in which animals must increase their responses exponentially with each trial in order to continue drug delivery. In this latter procedure, the investigator assesses the breakpoint, which is the response requirement beyond which the subject will no longer work for the drug. This procedure can be used to determine the relative reinforcing efficacy of a particular drug. This approach has the advantage of direct measurement of the animal’s willingness to use the drug. However, there are some drawbacks to self-administration techniques.
For example, self-administration can lead to high levels of drug in the brain that impair subsequent performance of the actions needed for further drug taking (reviewed in Hemby et al.). Oral self-administration of drugs such as ethanol brings into play factors such as taste that affect the willingness of certain animals to ingest the desired drug. Use of instrumental self-administration procedures also necessitates consideration of separate neural control of drug seeking and drug taking. Nonetheless, self-administration procedures are arguably the most direct measure of use and abuse, particularly given the variety of procedures that have been developed using instrumental paradigms. Self-administration procedures also allow investigators to examine the effects of treatments on drug use in preclinical assays.
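The progressive ratio procedure and its breakpoint measure can be illustrated with a short sketch. The exponential rule used here, round(5 × e^(0.2 × n)) − 5 responses for the nth drug delivery, is one form that appears in the self-administration literature, but it is not specified in this chapter and should be read as an assumption. The function names and the simulated subject, which simply refuses any requirement above a fixed effort ceiling, are hypothetical.

```python
import math

def pr_requirement(n):
    """Responses required to earn the n-th drug delivery (n starts at 1).

    Assumed exponential rule: round(5 * e^(0.2 * n)) - 5.
    """
    return round(5 * math.exp(0.2 * n)) - 5

def find_breakpoint(max_effort, max_deliveries=50):
    """Return (breakpoint, deliveries_earned) for a hypothetical subject.

    The subject completes a ratio only if it requires no more than
    max_effort responses. The breakpoint is the last requirement completed.
    """
    last_completed = 0
    earned = 0
    for n in range(1, max_deliveries + 1):
        required = pr_requirement(n)
        if required > max_effort:
            break  # the subject stops working for the drug here
        last_completed = required
        earned = n
    return last_completed, earned

# Response requirements for the first ten deliveries.
schedule = [pr_requirement(n) for n in range(1, 11)]
print(schedule)

# A subject willing to emit at most 30 responses for one delivery.
print(find_breakpoint(30))
```

Some laboratories report the breakpoint as the last completed response requirement and others as the number of deliveries earned, so the sketch returns both.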

In light of the previous discussion of reinforcement and reward, or action-outcome and habit learning, it seems important to evaluate which of these modes of behavior actually drives drug self-administration. As mentioned in the preceding text, investigators have found evidence that drugs of abuse promote habit learning (see also Samson). Another common variant of the drug self-administration procedure is cue-induced drug-related responding and reinstatement of this responding and/or drug taking. It has clearly been demonstrated that cues signaling the opportunity to respond instrumentally and obtain a drug can come to elicit responding in the absence of the drug, and especially robustly when the drug has been omitted for long periods. This procedure involves a component of stimulus-outcome learning or Pavlovian-instrumental transfer with the cue serving as the Pavlovian conditioned stimulus. Indeed, this type of conditioning has become effectively standard in instrumental self-administration procedures, as some explicitly paired cue, most often a light, is included in most such studies. This may be one reason for the large number of studies implicating the aforementioned limbic circuitry in drug-seeking behavior, as this circuitry appears to have important roles in stimulus-outcome learning. There is certainly some heuristic value to such studies in the context of human addiction, as it is easy to imagine how environmental stimuli that signal drug availability might trigger drug seeking and relapse.

Other surrogate measures of the rewarding effects of drugs of abuse have been developed. Drawing on the intracranial self-stimulation paradigm discussed earlier, investigators have examined the ability of drugs of abuse to shift the threshold stimulus intensities needed to support self-stimulation. Several abused drugs produce a leftward shift in the stimulus-response curve or increase rates of responding, indicating that they enhance the reinforcing properties of intracranial self-stimulation (see Wise for review). Drugs with this sort of action include those that are strongly self-administered such as cocaine and amphetamine, as well as other drugs of abuse, although studies of ethanol have yielded mixed results that might be explained by variables such as route of drug administration. This technique can reveal indirectly the rewarding or reinforcing effects of drugs, but thus far the emphasis has mainly been on the effects of investigator-administered drugs and involvement of the limbic circuitry. It would be interesting to see this line of research extended to include more self-administration/intracranial self-stimulation studies and examination of different circuitry and component brain regions.
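The idea of a leftward shift in the stimulus-response curve can be made concrete with a small worked sketch. It assumes, purely for illustration, that the threshold is defined as the stimulus intensity at which response rate first reaches half of its maximum, estimated by linear interpolation. The chapter does not specify a threshold definition, and the intensities and response rates below are invented numbers, not data.

```python
def half_max_threshold(intensities, rates):
    """Estimate the intensity at which response rate first reaches half-maximum.

    Assumes intensities are in ascending order and that the curve rises
    through the half-maximal rate somewhere between two adjacent points.
    """
    target = max(rates) / 2
    points = list(zip(intensities, rates))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 < target <= y1:
            # Linear interpolation between the two bracketing points.
            return x0 + (target - y0) * (x1 - x0) / (y1 - y0)
    raise ValueError("response rate never rises through half-maximum")

# Hypothetical stimulus intensities (arbitrary units) and response rates.
intensities = [50, 100, 150, 200, 250]
baseline_rates = [0, 10, 40, 80, 80]
drug_rates = [0, 40, 80, 80, 80]

baseline = half_max_threshold(intensities, baseline_rates)
after_drug = half_max_threshold(intensities, drug_rates)

# A negative shift means less stimulation is needed after the drug,
# that is, a leftward shift of the curve.
print(baseline, after_drug, after_drug - baseline)
```

Under this convention a lowered threshold after drug administration would be read as enhanced reinforcement by the stimulation, and a raised threshold as the opposite.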

Recent studies have focused on identifying behaviors that might be indicative of an addictive phenotype in experimental animals. One approach has been to develop a battery of tests designed to measure continued drug seeking and taking under conditions where these behaviors become increasingly difficult and costly. Deroche-Gamonet et al. have developed a three-test battery consisting of: (1) measuring the progressive ratio breakpoint mentioned earlier; (2) measuring the persistence of instrumental responding on a previously cocaine-associated manipulandum even when a signal indicates no drug availability; and (3) determining whether cocaine self-administration will continue even when associated with electric foot shock (a paradigm also used in Pelloux et al. and Vanderschuren and Everitt ). Of interest, Wolffgramm and Heyne and Petry and Heyman have used a conceptually similar approach with alcohol. Wolffgramm and Heyne provided the alcohol in a solution with a normally aversive tastant, and they found that this procedure decreased drinking in animals that had short-term alcohol drinking experience, while drinking was maintained at much higher levels in animals that had been drinking alcohol for a long period (at least 9 months). Petry and Heyman steadily increased the behavioral cost necessary to obtain an alcohol-containing solution and found that rats with experience drinking alcohol maintained their drinking despite the increasing cost, while similar effects were not observed with palatable nutrient-containing solutions. These sorts of techniques are now being used to examine factors that predispose animals to uncontrolled/compulsive drug self-administration. Everitt and coworkers have determined that what appears to be impulsive responding in a five-choice serial reaction time test is predictive of later abusive drug use in this paradigm. 
This paradigm has been used by investigators to identify subgroups of rats that are especially vulnerable to what might be termed addiction. It is interesting to note that only a relatively modest subgroup of rats given extensive self-administration experience show maintained responding in the second and third tests and also show high breakpoints in test 1. It is hoped that this approach will provide a powerful tool for identifying genetic, neuronal, and circuit differences that contribute to enhanced susceptibility to drug addiction.
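The logic of scoring animals across the three tests can be sketched as follows. The sketch assumes that an animal meets a criterion when its score falls in the top third of its cohort on that test. That cutoff, the field names, and the scores are all illustrative assumptions, not values taken from the studies cited.

```python
def criteria_met(cohort, top_fraction=1 / 3):
    """Count, for each animal, the tests on which it ranks in the cohort's top fraction.

    cohort maps an animal ID to its scores on the three tests. The top-third
    cutoff is an assumed convention; ties are broken by input order.
    """
    tests = ("breakpoint", "persistence", "punished")
    n_top = max(1, round(len(cohort) * top_fraction))
    counts = {animal: 0 for animal in cohort}
    for test in tests:
        ranked = sorted(cohort, key=lambda a: cohort[a][test], reverse=True)
        for animal in ranked[:n_top]:
            counts[animal] += 1
    return counts

# Hypothetical scores for a cohort of six rats.
cohort = {
    "r1": {"breakpoint": 100, "persistence": 50, "punished": 30},
    "r2": {"breakpoint": 90, "persistence": 10, "punished": 5},
    "r3": {"breakpoint": 20, "persistence": 45, "punished": 25},
    "r4": {"breakpoint": 10, "persistence": 5, "punished": 2},
    "r5": {"breakpoint": 15, "persistence": 8, "punished": 3},
    "r6": {"breakpoint": 12, "persistence": 6, "punished": 1},
}

print(criteria_met(cohort))
```

Because the ranking is computed within the cohort, an animal's classification depends on which other animals it is tested alongside, which is worth keeping in mind when comparing results across cohorts or strains.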

Although this approach has some face validity, it is not clear that a model of all aspects of addiction can be developed in rodents. Most rodent drug self-administration paradigms use operant responding for drug delivery, and often with intravenous drug administration directly contingent upon the operant action. The measurements in such experiments are generally number of lever presses and number of drug infusions. However, because infusion will occur following the prescribed number of lever presses there is no way to separate drug seeking (i.e., operant responding) from drug taking (infusion). Thus it is unclear if the different manipulations are altering the operant responses or the drug control of these responses. Investigators have tried to separate the seeking and taking aspects in operant self-administration (SA) procedures using second-order schedules and oral drug taking (especially for ethanol). The two aspects can be controlled separately, indicating a confound in interpreting effects of experimental manipulations in operant SA studies. Another problem is the method of scoring in such studies. The protocol is designed to identify the top scorers in a particular test within a given cohort of a given rodent strain. This system generally ignores important genetic and cohort effects that influence behavior toward drugs of abuse. Indeed, C57BL/6J mice will more readily self-administer several drugs of abuse in comparison to other mouse strains, and the genes that regulate these differences are being characterized. Thus, even the top scorers from these other strains may not reach the mean level achieved by mice from the more self-administration-prone strain. It is difficult to see how one can label mice as “addiction-prone” when there are large numbers of mice from another strain that show more severe drug-related behaviors. There are additional problems in the case of alcohol, where self-administration often involves oral intake.
In general, it has been difficult to induce rodents to drink to the same blood alcohol levels and levels of intoxication achieved by humans. New techniques have been developed for increasing rodent alcohol intake, and older techniques are being revisited. However, it is important to consider other animals that show excessive alcohol intake, such as nonhuman primates. Overall, development of a single rodent model that captures all aspects of addiction is an overly ambitious undertaking.

An alternative experimental approach is to examine key phenotypic behaviors associated with drug use disorders and attempt to determine the molecules, cells, and circuits that control these behaviors. For example, phenotypes such as excessive drug intake following abstinence, cue-induced reinstatement of drug self-administration, incubation of increased lever-pressing, or outcome-resistant drug intake can all be coupled with neurophysiological measurements and manipulation of particular brain circuits to better understand the neural underpinnings of different facets of the response to drugs of abuse and drug-seeking/taking behaviors (see Table 16.1). The ultimate goal of this research is to develop therapies aimed at these brain components to reduce harmful aspects of drug use disorders.

One important phenotype is the rewarding effects of the drug itself. Investigators have long used Pavlovian conditioning to examine whether drug administration can be used to produce a conditioned place preference in an animal. In this paradigm, the animal is given the drug paired with one of two or three chambers in an apparatus, and then tested later for location preference. In a final drug-free test, the animal is then free to choose a location in which to spend the trial. If more time is spent in a particular location, this is thought to indicate that the drug paired with this location has a preferred or rewarding effect. This technique has the advantage that the animals are not subjected to drug intoxication at the time of testing, so there is little chance of impairment of behavior by the drug itself. However, it must be stressed that the location is a conditioned stimulus and not a primary reward or reinforcer of any kind in this paradigm. Thus the technique does not measure these functions per se, that is, the animal does not have to repeat an action to obtain an outcome, and thus, it is at best a surrogate measure of the underlying construct and one that may be subject to influence by properties of the environment or drug that are not directly related to its reinforcing effects. Nonetheless, strong progress has been made in identifying the neural mechanisms underlying conditioned place preference, and there is considerable overlap with mechanisms implicated in reward circuitry.
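The time-based measure used in conditioned place preference can be sketched in a few lines. Scoring conventions differ between laboratories, so the two scores below (a difference between chambers at test, and a shift from a pre-conditioning baseline) are assumptions chosen for illustration. The pre-conditioning baseline session is itself an assumption, since the paradigm as described here mentions only the final drug-free test, and the times are invented.

```python
def preference_difference(paired_s, unpaired_s):
    """Seconds spent in the drug-paired chamber minus the unpaired chamber at test."""
    return paired_s - unpaired_s

def preference_shift(pre_paired_s, post_paired_s):
    """Change in seconds spent in the drug-paired chamber from pretest to test."""
    return post_paired_s - pre_paired_s

# Hypothetical 15-minute (900 s) drug-free test for one animal.
print(preference_difference(540, 360))  # positive: more time in the paired chamber
print(preference_shift(430, 540))       # positive: time in the paired chamber rose
```

A positive score on either measure would be read as a place preference and a negative score as a place aversion. Either score reflects a conditioned response to the location rather than a direct measure of reinforcement.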

The psychomotor stimulant effects of drugs have also been proposed to provide a measure of drug reinforcement, reward, and addiction. Administration of many drugs will produce forward locomotion and it has been theorized that this represents an operant approach response indicative of positive reinforcement by the drug. That forward locomotion is elicited by stimulation of the medial forebrain bundle, the site where stimulation yields intracranial self-stimulation, was also advanced as evidence that the mechanisms underlying this locomotion are linked to positive reinforcement. However, it is possible that locomotor activation is merely an adjunct consequence of drug exposure and medial forebrain bundle stimulation. The circuitry that controls performance of voluntary actions overlaps extensively with that involved in reinforcement, reward, habit formation, and addiction. Thus, it is possible that drug actions produce separate effects that both influence locomotion and drug reward or drug seeking, but that these effects are separable. Indeed, elegant studies showed just such a separation for regions of the ventral tegmental area and nucleus accumbens implicated in cocaine- and opiate-induced locomotor stimulation and conditioned place preference or self-administration. In addition, mice that lack dopamine show a nearly complete loss of morphine-stimulated locomotion but continue to show morphine-induced conditioned place preference. Locomotor stimulation and reward/reinforcement can also be separated pharmacologically. In the case of alcohol, stimulation of forward locomotion is inconsistent in rats, and locomotor depressant effects are most often observed, but rats clearly show other signs of ethanol reward and reinforcement (see Koob for examples). Furthermore, Risinger et al. and Sanchez et al. found differences in genetic factors underlying ethanol-induced locomotor stimulation and conditioned place preference for ethanol. 
Thus, it is not clear that forward locomotion is a good proxy for the actual reinforcing effects of the drug.

The idea of sensitization, an increase in frequency and intensity of a behavior elicited by a stimulus or treatment, has also figured prominently in models of drug abuse and addiction. Repeated administration of certain drugs of abuse, psychostimulants in particular, elicits successively larger increases in locomotor activity in rodents. It has been speculated that this locomotor sensitization is a result of the underlying neuroadaptive processes that contribute to addiction following repeated drug exposure. However, it is still not clear that the locomotor-stimulating effects are related to reward or reinforcement per se, for the same reasons discussed in the preceding paragraphs. In one sense, however, drug seeking and self-administration must involve some form of sensitization, as these behaviors involve increases in responding elicited by the drug or drug-related environments or cues. One theory advanced to account for this aspect of drug-related behavior is the incentive-sensitization model. This theory provides a reasonable explanation for the willingness of addicts to expend a great deal of energy and engage in new behaviors to obtain drugs, and also can explain the greater motivation of animals to work for previously used drugs in tasks such as the progressive ratio/breakpoint paradigm mentioned in the preceding text. Other behavioral measures in laboratory animals provide evidence for enhanced incentive to seek and use drugs. For example, conditioned place preference and cue-induced reinstatement of drug seeking indicate that the motivational value of previously neutral stimuli is enhanced when these stimuli are associated with drugs of abuse (reviewed in Robinson and Berridge ). Thus although simple locomotor sensitization may provide only limited information about drug effects on the brain reward/reinforcement system, the concept of sensitization is important within this context.

A role for negative reinforcement in addiction is also easily conceptualized, and experimental models based on this idea have been developed to provide information on important drug abuse-related phenotypes and neural mechanisms. Drugs such as benzodiazepines have known anxiolytic properties and thus reduce an aversive state. The psychostimulants produce acute mood elevation that may provide temporary relief from negative affect (although these drugs are by no means effective antidepressants). Thus negative reinforcement may be a strong driving force for acute drug use.

Relief of the negative symptoms encountered during drug withdrawal can also be characterized as a negative reinforcing component of addiction. Withdrawal following chronic use of different drugs of abuse produces symptoms ranging from heightened anxiety and irritability (benzodiazepines, alcohol) and dysphoria and depression (psychostimulants) to severe physiological symptoms such as abdominal cramps (heroin). Withdrawal from drugs is associated with higher thresholds for intracranial self-stimulation in experimental animals, indicating dysphoria associated with this state. Relief from these symptoms has been postulated to drive relapse to drug use. Indeed, animals made dependent on drugs will increase self-administration and drug-related instrumental responding following drug withdrawal (reviewed in Koob and Le Moal). This sort of reinstatement responding has also been observed in alcohol-dependent animals and is referred to as the “alcohol deprivation effect” (reviewed in Spanagel). Animals that have undergone conditioned aversion in which withdrawal is rapidly induced and paired with previously neutral stimuli show reinstatement of heroin self-administration and elevation of intracranial self-stimulation thresholds, as if the aversive effects of withdrawal were reducing rewarding drug effects while driving relapse. It is easy to imagine how this withdrawal-relief model can explain relapse after full-blown symptoms have begun. However, it is not so clear that this model can explain continuous drug use in the absence of withdrawal sufficient to produce symptoms. The ability of this model to explain relapse long after the cessation of withdrawal symptoms is also not as clear. Processes such as incubation, a lasting neuroadaptation that leads to greater drug seeking after prolonged abstinence, may more readily account for this type of relapse. Several brain regions within the associative and limbic circuitry have been implicated in the incubation process.
In addition, lasting recruitment of drug effects on brain systems involved in stress responding has been suggested to underlie the long-term susceptibility to relapse.

The concept of negative reinforcement is also built into addiction theories based on the Opponent-Process idea. These theories essentially propose that net emotional state is the result of competition between emotions (e.g., elation vs. fear), and that changes in the competitive balance over time can lead to changes in net emotion and behavior. Within the context of addiction one can easily envision that the euphoric high achieved just after administration of a drug like cocaine can dissipate and be replaced by depression as the neurochemical effects of the drug wear off. With repeated drug use, the euphoria becomes less pronounced as the depression is enhanced, and the user ends up taking the drug to relieve the depression, which could be termed a negative reinforcement model. This process has been modeled with cocaine and heroin self-administration, and it was found that both intracranial self-stimulation thresholds and cocaine or heroin self-administration escalated after several cycles of self-administration and withdrawal. Koob, Le Moal, and coworkers have extended these ideas to include the concept of allostasis, in which the emotional set-point resulting from the new balance of opponent processes is altered toward a more depressed level with repeated drug use. The addict ends up using the drug to maintain this new set-point, often relieving more adverse emotional symptoms. Experimental models such as withdrawal-induced excessive self-administration have been used in conjunction with neurochemical approaches to implicate the extended amygdala and associated brain regions in these allostatic changes and the accompanying negative reinforcement driving drug taking and relapse (reviewed in Koob and Koob and Le Moal). Brain systems for responding to environmental stress and internal anxiety may provide the aversive effects that interact with brain reward/reinforcement circuitry to drive drug use in these models.


Tags: Addiction Medicine: Science and Practice

Jan 19, 2020 | Posted by drzezo in PATHOLOGY & LABORATORY MEDICINE | Comments Off
