Incremental Natural-Gradient Actor-Critic Learning Algorithms
First published: 2009-09-07
Abstract: Although policy gradient reinforcement learning algorithms have good convergence properties, the large variance of their gradient estimates degrades performance. To improve the convergence speed of policy gradient algorithms and the precision of gradient estimation, four improved fully incremental natural-gradient Actor-Critic (AC) algorithms are proposed by extending the existing fully incremental natural-gradient AC algorithms with eligibility traces and the discounted-reward model. Simulation results on a 10×10 grid-world problem verify the feasibility and validity of the proposed Actor-Critic learning algorithms.
Keywords: policy gradient; function approximator; Actor-Critic learning; natural gradient; policy evaluation
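To make the abstract's setting concrete, below is a minimal sketch of one fully incremental natural-gradient Actor-Critic update with eligibility traces under the discounted-reward model, run on a 10×10 grid world. It assumes a tabular softmax policy, a linear critic, and compatible features; the step sizes, trace parameter, and grid dynamics are illustrative assumptions, not a reproduction of the paper's four algorithms.

# A minimal sketch of an incremental natural-gradient Actor-Critic update
# with eligibility traces under the discounted-reward model. The grid-world
# dynamics and all hyperparameters below are illustrative assumptions.
import numpy as np

N = 10                      # 10x10 grid world
N_STATES, N_ACTIONS = N * N, 4
GAMMA, LAMBDA = 0.95, 0.7   # discount factor and trace-decay parameter
ALPHA_V, ALPHA_W, ALPHA_THETA = 0.1, 0.05, 0.01

rng = np.random.default_rng(0)
theta = np.zeros((N_STATES, N_ACTIONS))  # actor (policy) parameters
v = np.zeros(N_STATES)                   # critic (state-value) weights
w = np.zeros((N_STATES, N_ACTIONS))      # natural-gradient (advantage) weights

def policy(s):
    """Softmax policy over the four grid actions."""
    p = np.exp(theta[s] - theta[s].max())
    return p / p.sum()

def step(s, a):
    """Illustrative deterministic grid dynamics; goal in the far corner."""
    r, c = divmod(s, N)
    dr, dc = [(-1, 0), (1, 0), (0, -1), (0, 1)][a]
    r, c = min(max(r + dr, 0), N - 1), min(max(c + dc, 0), N - 1)
    s2 = r * N + c
    done = s2 == N * N - 1
    return s2, (0.0 if done else -1.0), done

for episode in range(500):
    s = 0
    e_v = np.zeros_like(v)   # critic eligibility trace
    e_w = np.zeros_like(w)   # trace over compatible features
    for t in range(400):
        p = policy(s)
        a = rng.choice(N_ACTIONS, p=p)
        s2, r, done = step(s, a)

        # TD error under the discounted-reward model
        delta = r + (0.0 if done else GAMMA * v[s2]) - v[s]

        # Critic update with an accumulating eligibility trace
        e_v *= GAMMA * LAMBDA
        e_v[s] += 1.0
        v += ALPHA_V * delta * e_v

        # Compatible features: psi = grad_theta log pi(a|s)
        psi = np.zeros_like(theta)
        psi[s] = -p
        psi[s, a] += 1.0

        # Fit the TD error with the compatible features: delta ~ <psi, w>
        e_w = GAMMA * LAMBDA * e_w + psi
        w += ALPHA_W * (delta * e_w - psi * np.sum(psi * w))

        # Actor steps along the estimated natural gradient
        theta += ALPHA_THETA * w

        if done:
            break
        s = s2

The design point the sketch illustrates: when the advantage is fitted with the compatible features of the policy, the fitted weight vector w itself approximates the natural policy gradient (cf. Peters and Schaal's natural Actor-Critic), so the actor can step directly along w without forming or inverting the Fisher information matrix.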