Learning requires the ability to constantly update our expectations of future rewards, so that we can predict those rewards accurately in a changing environment. Although exactly how the brain orchestrates this process remains unclear, a new study by researchers at the California Institute of Technology (Caltech) suggests that a combination of two distinct learning strategies guides our behavior.
One accepted learning strategy, called model-free learning, relies on trial-and-error comparisons between the reward we expect in a given situation and the reward we actually get. This comparison generates a "reward prediction error": the difference between the two. For example, a reward prediction error might correspond to the difference between the projected monetary return on a financial investment and our real earnings.
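The model-free idea can be sketched in a few lines of code. This is a minimal, illustrative temporal-difference-style update, not the study's actual model; the state names, learning rate, and reward values are invented for the example.

```python
# Minimal model-free learning sketch (illustrative, not the study's model).
# V maps each state to its current expected reward; alpha is the learning rate.

def model_free_update(V, state, reward, alpha=0.1):
    """Update the value estimate for `state` from one observed reward."""
    rpe = reward - V[state]   # reward prediction error: actual minus expected
    V[state] += alpha * rpe   # nudge the expectation toward what was observed
    return rpe

# Hypothetical example: we expected a return of 5.0 but received 8.0.
V = {"invest": 5.0}
rpe = model_free_update(V, "invest", 8.0)
# rpe = 3.0, and V["invest"] moves from 5.0 to 5.3
```

A positive error means the outcome was better than expected, so the expectation is revised upward; a negative error revises it downward.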
In the second mechanism, called model-based learning, the brain generates a cognitive map of the environment that describes the relationship between different situations. “Model-based learning is associated with the generation of a ‘state prediction error,’ which represents the brain’s level of surprise in a new situation given its current estimate of the environment,” says Jan Gläscher, a postdoctoral scholar at Caltech and the lead author of the study.
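A state prediction error of this kind can also be sketched in code. In this hedged, simplified version (the variable names, probabilities, and learning rate are invented for illustration), the agent keeps a table of transition probabilities between situations, and its "surprise" is one minus the probability it assigned to the situation that actually occurred.

```python
# Minimal model-based learning sketch (illustrative, not the study's model).
# T maps (state, action) pairs to a dict of successor-state probabilities.

def model_based_update(T, state, action, next_state, eta=0.2):
    """Update the transition model after observing one state transition."""
    probs = T[(state, action)]
    spe = 1.0 - probs[next_state]  # state prediction error: the surprise
    # Shift probability mass toward the successor state actually observed.
    for s in probs:
        if s == next_state:
            probs[s] += eta * (1.0 - probs[s])
        else:
            probs[s] -= eta * probs[s]
    return spe

# Hypothetical example: turning left is believed to lead to either
# destination with equal probability.
T = {("home", "turn_left"): {"main_road": 0.5, "side_street": 0.5}}
spe = model_based_update(T, "home", "turn_left", "side_street")
# spe = 0.5; the model now expects "side_street" with probability 0.6
```

The more confidently the model predicted a different outcome, the larger the surprise, and the more the cognitive map is revised.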
Eighteen participants were scanned using functional magnetic resonance imaging as they learned a decision-making task. The brain scans showed the distinctive, previously characterized neural signature of a reward prediction error — generated during model-free learning — in an area in the middle of the brain called the ventral striatum. During model-based learning, however, the neural signature of a state prediction error appeared in two different areas on the surface of the brain in the cerebral cortex: the intraparietal sulcus and the lateral prefrontal cortex.
These observations suggest that two distinct types of error signals are computed in the human brain, in different brain regions, and may represent separate computational strategies for guiding behavior. “A model-free system operates very effectively in situations that are highly automated and repetitive — for example, if I regularly take the same route home from work,” Gläscher says, “whereas a model-based system, although requiring much greater brain-processing power, is able to adapt flexibly to novel situations, such as needing to find a new route following a roadblock.”
For those interested, the actual paper is:
“States versus Rewards: Dissociable Neural Prediction Error Signals Underlying Model-Based and Model-Free Reinforcement Learning,” by Jan P. Gläscher, Nathaniel Daw, Peter Dayan, and John P. O’Doherty.