Causal-Graph-Based Dynamic Optimization of Task Hierarchies for Factored MDPs (FMDPs)
First published: 2012-03-31
Abstract: Hierarchical reinforcement learning (HRL) is an important approach to mitigating the curse of dimensionality in reinforcement learning. A central difficulty in HRL is that the task hierarchy must be specified in advance by a designer using expert knowledge. The main existing method for automatically inducing MAXQ hierarchies is HI-MAT, but the task graph HI-MAT produces depends on a single observed successful trajectory: it can only recover task graphs in the hierarchy space that are consistent with that trajectory, and is therefore prone to local optima. This paper proposes an evolutionary method for MAXQ task graphs based on the causal graph (CG) of the target environment. The causal graph is used to steer the search through the space of task hierarchies, accelerating the search and yielding better results. The method employs genetic programming (GP); its genetic operators (chiefly crossover and mutation) are constrained so that the state variables appearing in the termination predicates of the adjusted nodes preserve their causal dependencies in the causal graph, which both speeds up learning and improves the fitness of the resulting task graphs. Experimental results demonstrate the superiority of the evolved task hierarchies.
Keywords: complex systems; hierarchical reinforcement learning; genetic programming; causal graph; task hierarchy
Causal graph based dynamic optimization of hierarchies for factored MDPs
Abstract: Hierarchical reinforcement learning (HRL) is a well-established method for alleviating the complexity of the search space in reinforcement learning. The key to HRL is building a task hierarchy such that learning can be conducted locally within each level. In practice, we cannot always rely on humans to discover task hierarchies, whereas the task hierarchies induced by computers are sometimes ineffective. In this paper, we present a causal-graph-based approach named AEHM (Auto-adjustment and Evolution of Hierarchy for MAXQ) to improve task hierarchies generated automatically by computers. Experimental results show that the resulting task hierarchies are more effective for reinforcement learning.
Keywords: Complex Systems; Hierarchical Reinforcement Learning; Genetic Programming; Causal Graph; Task Hierarchy
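To make the constraint described in the abstract concrete, the following is a minimal sketch in Python, not the paper's AEHM implementation: a mutation operator over MAXQ task graphs that accepts a change only if the termination-predicate variables of the adjusted nodes remain causally consistent with the environment's causal graph. All names here (CausalGraph, TaskNode, causally_consistent, mutate) and the Taxi-style variables in the example are hypothetical illustrations; a crossover operator would filter candidate subtree swaps with the same test.

import random

class CausalGraph:
    """DAG over state variables: deps[v] = set of variables v depends on."""
    def __init__(self, deps):
        self.deps = deps

    def depends_on(self, v, w):
        """True if variable v transitively depends on variable w."""
        seen, stack = set(), [v]
        while stack:
            u = stack.pop()
            if u == w:
                return True
            if u not in seen:
                seen.add(u)
                stack.extend(self.deps.get(u, ()))
        return False

class TaskNode:
    """A MAXQ subtask: the state variables appearing in its termination
    predicate, plus its child subtasks."""
    def __init__(self, name, term_vars, children=()):
        self.name = name
        self.term_vars = set(term_vars)
        self.children = list(children)

def causally_consistent(parent, child, cg):
    """Keep a parent/child pairing only if some termination variable of the
    parent causally depends on a termination variable of the child, i.e.
    achieving the child can matter for terminating the parent."""
    return any(cg.depends_on(pv, cv)
               for pv in parent.term_vars
               for cv in child.term_vars)

def iter_nodes(node):
    yield node
    for c in node.children:
        yield from iter_nodes(c)

def mutate(root, cg, subtask_pool, rng=random):
    """Mutation operator: swap one child of a random internal node for a
    subtask from the pool, rejecting causally inconsistent candidates."""
    internal = [n for n in iter_nodes(root) if n.children]
    if not internal:
        return root
    node = rng.choice(internal)
    candidates = [t for t in subtask_pool
                  if causally_consistent(node, t, cg)]
    if candidates:
        node.children[rng.randrange(len(node.children))] = rng.choice(candidates)
    return root

# Example with Taxi-like variables: passenger location depends on taxi position.
cg = CausalGraph({"pass_loc": {"taxi_x", "taxi_y"},
                  "taxi_x": set(), "taxi_y": set()})
root = TaskNode("Root", {"pass_loc"},
                [TaskNode("Navigate", {"taxi_x", "taxi_y"})])
pool = [TaskNode("Pickup", {"pass_loc"}), TaskNode("GotoX", {"taxi_x"})]
mutate(root, cg, pool)

The design point this sketch isolates is rejection sampling over genetic operators: rather than repairing an arbitrary mutation afterwards, candidate subtasks that would break the causal dependencies among termination-predicate variables are simply filtered out before the swap, which keeps the population inside the causally consistent region of the hierarchy space.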