Template:MDP and Q-Learning
From CS Wiki
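| Item | MDP | Q-learning |
|---|---|---|
| Decision process | Compute the transition probability T(s', a, s) | Compute the future value Q |
| Policy | π(s) = argmax_a T(s', a, s) | π(s) = argmax_a Q(s, a) |
| Optimal value | Iterate V(s) until convergence | Update the Q table |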
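
The comparison above can be made concrete with a small sketch. Everything in it is an assumption made for illustration and does not come from the template: the three-state chain, the rewards, the discount factor `GAMMA`, and the function names are all hypothetical. The MDP column is read here as standard value iteration, where the policy takes the argmax over actions of the expected return under T rather than of T alone. The Q-learning column is read as tabular Q-learning with epsilon-greedy exploration, which only samples transitions and never reads T directly.

```python
import random

# Hypothetical 3-state chain, for illustration only.
# T[s][a] lists (probability, next_state, reward) triples.
STATES = [0, 1, 2]
ACTIONS = ["left", "right"]
GAMMA = 0.9  # assumed discount factor

T = {
    0: {"left": [(1.0, 0, 0.0)], "right": [(0.8, 1, 0.0), (0.2, 0, 0.0)]},
    1: {"left": [(1.0, 0, 0.0)], "right": [(0.8, 2, 1.0), (0.2, 1, 0.0)]},
    2: {"left": [(1.0, 2, 0.0)], "right": [(1.0, 2, 0.0)]},  # absorbing goal
}


def expected_return(V, s, a):
    """Expected one-step return of action a in state s under the known model T."""
    return sum(p * (r + GAMMA * V[s2]) for p, s2, r in T[s][a])


def value_iteration(tol=1e-8):
    """MDP column: sweep V(s) until it stops changing, then act greedily."""
    V = {s: 0.0 for s in STATES}
    while True:
        new_V = {s: max(expected_return(V, s, a) for a in ACTIONS) for s in STATES}
        delta = max(abs(new_V[s] - V[s]) for s in STATES)
        V = new_V
        if delta < tol:
            break
    policy = {s: max(ACTIONS, key=lambda a: expected_return(V, s, a)) for s in STATES}
    return V, policy


def sample_step(s, a, rng):
    """Draw one (next_state, reward) sample; the learner never inspects T itself."""
    x = rng.random()
    cumulative = 0.0
    for p, s2, r in T[s][a]:
        cumulative += p
        if x < cumulative:
            return s2, r
    return s2, r  # guard against floating-point round-off


def q_learning(episodes=5000, alpha=0.1, epsilon=0.2, seed=0):
    """Q-learning column: update a Q table from sampled transitions."""
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(episodes):
        s = 0
        for _ in range(50):  # cap the episode length
            if s == 2:  # goal reached, episode ends
                break
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)  # explore
            else:
                a = max(ACTIONS, key=lambda b: Q[(s, b)])  # exploit
            s2, r = sample_step(s, a, rng)
            target = r + GAMMA * max(Q[(s2, b)] for b in ACTIONS)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2
    policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in STATES}
    return Q, policy


if __name__ == "__main__":
    V, planned = value_iteration()
    Q, learned = q_learning()
    print("value iteration:", V, planned)
    print("q-learning policy:", learned)
```

On this toy chain both approaches should settle on moving `right` from states 0 and 1. The difference is in what each one needs: `value_iteration` requires the full transition table, while `q_learning` needs only the ability to sample steps from the environment.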