COGS 100 Lecture Notes - Lecture 1: Reinforcement Learning, Temporal Difference Learning, Dopaminergic


Document Summary

Learning & surprise: the brain's job is to make decisions that maximize reinforcement per unit time. The Rescorla-Wagner (RW) model works if life is divisible into discrete trials, each of which ends in a reward r (positive or negative). Credit assignment problem (rat maze): the rat chooses a left or right turn at each junction of the maze (2 decisions, 3 decisions), and it gets r sooner on a shorter course through the maze, but there is no r value for the first decision. If the rat reaches the end, how does it know which turns were good or bad? It only knows that the last turn was good. Estimated value at time t: compare with the RW equation. Learning comes not just from rewards but from expectations. An odor that is somewhat predictive of a future r is presented to rats, and different odors are associated with r at different delays: reward delivery comes later and later, and the quantity measured is the amount of activation of dopamine neurons.
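The credit assignment problem in the rat maze can be made concrete with a small simulation. The block below is a minimal sketch, not the lecture's own formulation: it assumes a maze that is a straight sequence of decision points with a single reward after the last one, and it uses the standard TD(0) update with a learning rate `alpha` and a discount factor `gamma`. Neither parameter, nor any function name here, appears in the notes. The `rescorla_wagner` function is included only so the two update rules can be compared side by side.

```python
# Sketch: TD(0) value learning on a corridor of decision points.
# All names and parameter values are illustrative assumptions.

def rescorla_wagner(v, r, alpha=0.1):
    """One RW update: move the prediction v toward the received reward r."""
    return v + alpha * (r - v)


def td0_episode(values, reward=1.0, alpha=0.1, gamma=0.9):
    """Run one pass through the maze, updating values in place.

    values[t] is the estimated value at decision point t. The reward
    arrives only after the last decision; earlier steps have r = 0.
    """
    n = len(values)
    for t in range(n):
        last = (t == n - 1)
        r = reward if last else 0.0
        next_value = 0.0 if last else values[t + 1]
        # Prediction error: reward plus discounted next estimate,
        # minus the current estimate.
        delta = r + gamma * next_value - values[t]
        values[t] += alpha * delta
    return values


def train(n_decisions=3, episodes=500):
    """Repeat the maze run and return the learned value estimates."""
    values = [0.0] * n_decisions
    for _ in range(episodes):
        td0_episode(values)
    return values


if __name__ == "__main__":
    print(train())
```

After a single run only the last decision has a nonzero value, which mirrors the rat knowing only that the last turn was good. Over repeated runs the estimate at each decision point is pulled toward the discounted estimate of the next one, so credit propagates back to the first decision even though it never receives r directly.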
