Actor-Critic Learning Based on iLSTD(λ)
First published: 2009-09-07
Abstract: Policy gradient reinforcement learning algorithms have good convergence properties, but the large variance of the policy gradient estimate degrades their performance. To improve the convergence speed of policy gradient algorithms and the precision of the gradient estimate, a new AC algorithm based on iLSTD(λ) is proposed, combining the characteristics of the AC framework, function approximators, eligibility traces, and the iLSTD algorithm. Simulation results on a 10×10 grid world verify the feasibility and validity of the proposed AC learning algorithm.
Keywords: policy gradient; function approximator; Actor-Critic learning; policy evaluation
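The abstract names iLSTD(λ) as the critic's policy-evaluation method. As a rough illustration only (not a reproduction of the paper's algorithm), the sketch below shows an iLSTD(λ) critic update with linear function approximation: the LSTD statistics A and b are accumulated incrementally via the eligibility trace, and at each step only the coordinate of θ with the largest residual μ = b − Aθ is relaxed. The three-state chain example, the normalized (Gauss-Southwell) coordinate step, and all parameter values are illustrative assumptions; the published iLSTD uses a small decaying scalar step size instead.

```python
import numpy as np

def ilstd_lambda_critic(episodes, n_features, gamma=0.9, lam=0.5, m=1):
    """Policy evaluation with an iLSTD(lambda)-style critic.

    Incrementally accumulates the LSTD statistics A and b, and relaxes
    only the m coordinates of theta with the largest residual
    mu = b - A @ theta at each time step.
    """
    A = np.zeros((n_features, n_features))
    b = np.zeros(n_features)
    theta = np.zeros(n_features)
    mu = np.zeros(n_features)            # running residual b - A @ theta
    for episode in episodes:
        z = np.zeros(n_features)         # eligibility trace
        for phi, r, phi_next in episode:
            z = gamma * lam * z + phi
            dA = np.outer(z, phi - gamma * phi_next)
            db = r * z
            A += dA
            b += db
            mu += db - dA @ theta        # keep the residual consistent
            for _ in range(m):           # descend along the largest residuals
                j = int(np.argmax(np.abs(mu)))
                if abs(mu[j]) < 1e-12 or abs(A[j, j]) < 1e-12:
                    break
                # Normalized (Gauss-Southwell) step, an assumption made here
                # for stability; the original iLSTD uses a small decaying
                # scalar step size alpha instead.
                step = mu[j] / A[j, j]
                theta[j] += step
                mu -= step * A[:, j]
    return theta

# Illustrative example: a deterministic 3-state chain with one-hot features,
# reward 1 on reaching the terminal state, gamma = 0.9. The true values are
# V = [0.81, 0.9, 1.0], which the critic recovers.
phi = np.eye(3)
terminal = np.zeros(3)
episode = [(phi[0], 0.0, phi[1]),
           (phi[1], 0.0, phi[2]),
           (phi[2], 1.0, terminal)]
theta = ilstd_lambda_critic([episode] * 5, n_features=3)
```

In a full AC algorithm this critic's value estimate would feed the actor's policy-gradient update; only the policy-evaluation half is sketched here.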