Incremental Natural-Gradient Actor-Critic Learning Algorithms
First published: 2009-09-07
Abstract: Although policy gradient reinforcement learning algorithms have good convergence properties, the large variance of their gradient estimates degrades performance. To improve the convergence speed of policy gradient algorithms and the precision of gradient estimation, four improved fully incremental natural-gradient Actor-Critic (AC) algorithms are proposed by extending the existing fully incremental natural-gradient AC algorithms with eligibility traces and the discounted-reward model. Simulation results on a 10×10 grid-world problem verify the feasibility and validity of the proposed Actor-Critic learning algorithms.
Keywords: policy gradient; function approximator; Actor-Critic learning; natural gradient; policy evaluation
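To make the abstract's setting concrete, below is a minimal sketch of one fully incremental natural-gradient Actor-Critic update with eligibility traces under the discounted-reward model, run on a 10×10 grid world. It assumes a tabular softmax policy, a linear critic, and compatible features; the step sizes, trace parameter, and grid dynamics are illustrative assumptions, not a reproduction of the paper's four algorithms.

# A minimal sketch of an incremental natural-gradient Actor-Critic update
# with eligibility traces under the discounted-reward model. The grid-world
# dynamics and all hyperparameters below are illustrative assumptions.
import numpy as np

N = 10                      # 10x10 grid world
N_STATES, N_ACTIONS = N * N, 4
GAMMA, LAMBDA = 0.95, 0.7   # discount factor and trace-decay parameter
ALPHA_V, ALPHA_W, ALPHA_THETA = 0.1, 0.05, 0.01

rng = np.random.default_rng(0)
theta = np.zeros((N_STATES, N_ACTIONS))  # actor (policy) parameters
v = np.zeros(N_STATES)                   # critic (state-value) weights
w = np.zeros((N_STATES, N_ACTIONS))      # natural-gradient (advantage) weights

def policy(s):
    """Softmax policy over the four grid actions."""
    p = np.exp(theta[s] - theta[s].max())
    return p / p.sum()

def step(s, a):
    """Illustrative deterministic grid dynamics; goal in the far corner."""
    r, c = divmod(s, N)
    dr, dc = [(-1, 0), (1, 0), (0, -1), (0, 1)][a]
    r, c = min(max(r + dr, 0), N - 1), min(max(c + dc, 0), N - 1)
    s2 = r * N + c
    done = s2 == N * N - 1
    return s2, (0.0 if done else -1.0), done

for episode in range(500):
    s = 0
    e_v = np.zeros_like(v)   # critic eligibility trace
    e_w = np.zeros_like(w)   # trace over compatible features
    for t in range(400):
        p = policy(s)
        a = rng.choice(N_ACTIONS, p=p)
        s2, r, done = step(s, a)

        # TD error under the discounted-reward model
        delta = r + (0.0 if done else GAMMA * v[s2]) - v[s]

        # Critic update with an accumulating eligibility trace
        e_v *= GAMMA * LAMBDA
        e_v[s] += 1.0
        v += ALPHA_V * delta * e_v

        # Compatible features: psi = grad_theta log pi(a|s)
        psi = np.zeros_like(theta)
        psi[s] = -p
        psi[s, a] += 1.0

        # Fit the TD error with the compatible features: delta ~ <psi, w>
        e_w = GAMMA * LAMBDA * e_w + psi
        w += ALPHA_W * (delta * e_w - psi * np.sum(psi * w))

        # Actor steps along the estimated natural gradient
        theta += ALPHA_THETA * w

        if done:
            break
        s = s2

The design point the sketch illustrates: when the advantage is fitted with the compatible features of the policy, the fitted weight vector w itself approximates the natural policy gradient (cf. Peters and Schaal's natural Actor-Critic), so the actor can step directly along w without forming or inverting the Fisher information matrix.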