COGSCI 200 Lecture Notes - Lecture 15: Temporal Difference Learning, Reinforcement Learning, B. F. Skinner
Reinforcement Learning
Reinforcement
I. B. F. Skinner: instrumental conditioning → all of psychology is based on rewards for actions
A. But, should a rat be rewarded for accidentally finding the cheese?
II. We need a way to “intelligently” learn sequences of actions
A. Temporal difference (TD) learning accomplishes this!
The Calendar Problem
Goal: For each month, predict the cumulative expected reward you will get in all subsequent months of the year (ending in Dec)
These predictions are called “state values”, abbreviated “V”
The “true” state values for this version of the problem were shown in lecture.
Your goal is to use experience to learn the true state values (or get reasonably close to them)
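What “true state values” means can be sketched directly from the definition: V for a month is the sum of rewards over all subsequent months. The reward schedule below is a hypothetical placeholder (a reward of 1 for entering each month); the lecture's actual rewards are not in these notes.

```python
# Hypothetical calendar problem: assume a reward of 1 for entering each month.
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
R = {m: 1.0 for m in months}  # assumed rewards (not from the lecture)

# True value of each month = total reward over all subsequent months.
true_V = {}
for i, m in enumerate(months):
    true_V[m] = sum(R[later] for later in months[i + 1:])

print(true_V["Jan"])  # 11.0: rewards for Feb through Dec
print(true_V["Nov"])  # 1.0: only Dec's reward remains
print(true_V["Dec"])  # 0.0: no subsequent months
```

In practice you do not know these values in advance; TD learning, described next, estimates them from experience.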
Temporal Difference Learning
1. Initialize all state values to 0
2. Do the following each time you transition from state s to state s’
a. Calculate the prediction error: [R(s’) + V(s’)] - V(s)
b. Update the value of V(s): V(s) ← V(s) + (α × prediction error)
*If you follow this rule repeatedly, the state values converge toward the true state values
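The two steps above can be sketched as a short TD(0) loop on the calendar problem. The reward schedule is an assumption (reward of 1 for entering each month), as the lecture's actual rewards are not in these notes.

```python
# A minimal TD(0) sketch of the calendar problem (assumed rewards).
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
R = {m: 1.0 for m in months}   # assumed reward for entering month m
V = {m: 0.0 for m in months}   # step 1: initialize all state values to 0
alpha = 0.1                    # learning rate

for episode in range(2000):    # live through many years of experience
    # step 2: on each transition from state s to state s' ...
    for s, s_next in zip(months, months[1:]):
        # step 2a: prediction error = [R(s') + V(s')] - V(s)
        delta = R[s_next] + V[s_next] - V[s]
        # step 2b: nudge V(s) toward the better prediction
        V[s] += alpha * delta

# V("Nov") approaches 1.0 (only Dec's reward remains);
# V("Jan") approaches 11.0 (rewards for Feb through Dec);
# V("Dec") stays 0.0 (no subsequent months, so it is never updated).
```

Each update moves a state's value a small step toward the one-step-lookahead target, which is why the estimates settle on the true values after enough experience.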
REVIEW
Review Temporal Difference Learning
I. Components of Reinforcement Learning Problem:
States, rewards, actions, Q values
- Q values are predictions of the future cumulative reward you will get if you start in state s, take action a, and behave “optimally” thereafter
II. Calculate Prediction Error: prediction error = [R(s’) + Q(s’, a’)] - Q(s,a)
Update the value of Q(s,a): Q(s,a) ← Q(s,a) + (α × prediction error)
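The Q-value version of the update can be written as one small function. The two-state, two-action environment below ("jan"/"feb", "work"/"rest") and the observed reward are hypothetical, invented only to show the arithmetic of a single update.

```python
# Minimal sketch of the Q-value update (on-policy, SARSA-style).
def update_q(Q, s, a, r_next, s_next, a_next, alpha=0.1):
    """Apply: Q(s,a) <- Q(s,a) + alpha * ([R(s') + Q(s',a')] - Q(s,a))."""
    prediction_error = (r_next + Q[(s_next, a_next)]) - Q[(s, a)]
    Q[(s, a)] += alpha * prediction_error
    return prediction_error

# Hypothetical Q table: two states, two actions, all values initialized to 0.
Q = {("jan", "work"): 0.0, ("jan", "rest"): 0.0,
     ("feb", "work"): 0.0, ("feb", "rest"): 0.0}

# One observed transition: took "work" in jan, received reward 2 on entering
# feb, then chose "work" in feb.
delta = update_q(Q, "jan", "work", 2.0, "feb", "work")
print(delta)               # 2.0: predicted 0, actually got 2
print(Q[("jan", "work")])  # 0.2 = 0 + 0.1 * 2.0
```

Note that the learning rate α and the action a are different symbols: α scales how far each prediction error moves the estimate.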
III. Directions: