COGS 100 Lecture Notes - Lecture 1: Reinforcement Learning, Temporal Difference Learning, Dopaminergic
Document Summary
Learning & surprise: our brain's job is to make decisions that maximize reinforcement per unit time. The Rescorla-Wagner (RW) model works if life is divisible into discrete trials, in each of which there is always a reward (+ or -). Credit assignment problem (rat maze): the rat chooses left or right turns through the maze (2 decisions, 3 decisions); it gets the reward r faster by taking a shorter course through the maze, but there is no r value for the first decision. If the rat goes to the end, how does it know which turns were good or bad? It only knows that the last turn was good. Estimated value at time t: compare with the RW equation. Learning comes not just from rewards, but from expectations. An odor is presented to rats that is somewhat predictive of a future r, and different odors are associated with r at different delays: as reward delivery comes later and later, the amount of activation of dopamine neurons is measured.
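The contrast between the trial-based RW update and a value estimated at each time t can be sketched in a few lines of Python. This is a minimal illustration and not the lecture's own formulation: it assumes the standard textbook forms of the RW rule and of TD(0), and the function names, the learning rate `alpha`, the discount factor `gamma`, and the three-turn corridor are all hypothetical choices made for the example.

```python
def rescorla_wagner(rewards, alpha=0.1, v0=0.0):
    """Trial-level update (assumed standard form): V <- V + alpha * (r - V)."""
    v = v0
    history = []
    for r in rewards:
        delta = r - v            # prediction error ("surprise") on this trial
        v += alpha * delta
        history.append(v)
    return history


def td0_maze(n_steps=3, n_episodes=200, alpha=0.1, gamma=0.9, reward=1.0):
    """TD(0) on a corridor of n_steps decisions.

    Reward arrives only after the last decision, so earlier decisions
    must learn from the estimated value of the next state.
    """
    # values[n_steps] is the terminal state and stays at 0.
    values = [0.0] * (n_steps + 1)
    for _ in range(n_episodes):
        for s in range(n_steps):
            r = reward if s == n_steps - 1 else 0.0
            # TD error: reward plus discounted next estimate, minus current estimate.
            delta = r + gamma * values[s + 1] - values[s]
            values[s] += alpha * delta
    return values[:n_steps]


if __name__ == "__main__":
    print(rescorla_wagner([1.0] * 5))
    print(td0_maze())
```

Under these assumptions the corridor values settle near `gamma**2`, `gamma`, and `1` (about 0.81, 0.9, and 1.0): the first turn acquires value even though no reward is ever delivered there, because its update uses the expectation at the next turn. The discounting also means that turns further from the reward are worth less.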