Reinforcement Learning

Why do dopaminergic medications sometimes enhance and sometimes impair cognitive and behavioral function across species and disorders? Motivated by earlier work linking dopaminergic signals to “reward prediction errors”, we developed an explicit computational model of corticostriatal circuitry and its modulation by these dopaminergic signals in distinct populations of striatal neurons containing D1 and D2 dopamine receptors. The model tied together multiple anatomical, electrophysiological and pharmacological findings across species, and made a core novel prediction: that prospective positive and negative outcomes of alternative actions are represented separately and that dopamine levels shift the balance to differentiate between reward values or cost values. This basic prediction has been supported by experimental tasks in our lab and several others using pharmacological manipulations of dopamine (in healthy participants and in various patients), PET imaging of D1 and D2 binding, fMRI, and tyrosine depletion. Moreover, although the models were originally inspired based on data from animal models, they have reciprocally inspired rodent researchers to test and validate key model predictions using genetic engineering methods, showing that the two striatal pathways are separately needed (both necessary and sufficient) to induce learning from positive and negative outcomes respectively. Via collaborative work we have also shown in rodents that, as predicted by the model, dopamine depletion progressively induces avoidance, providing a synthesis between the learning and motivational theories of dopamine.

More recent work in the lab explores how such reinforcement learning processes can scale up to more sophisticated hierarchical learning problems, where the “actions” that are used to guide behavior and  learning are more abstract, based on nested frontostriatal circuits.    The lab further studies the role of information seeking in the explore/exploit dilemma that pervades all RL problems and how it is solved by humans via targeted task designs and models thereof.