A unified approach to Markov decision problems and performance sensitivity analysis with discounted and average criteria: multichain cases, Publication Details - 中国科技论文在线

郭先平


Journal article

A unified approach to Markov decision problems and performance sensitivity analysis with discounted and average criteria: multichain cases

Xi-Ren Cao, Xianping Guo (郭先平)

Automatica 40 (2004) 1749-1759


Abstract / Description

We propose a unified framework to Markov decision problems and performance sensitivity analysis for multichain Markov processes with both discounted and average-cost performance criteria. With the fundamental concept of performance potentials, we derive both performance-gradient and performance-difference formulas, which play the central role in performance optimization. The standard policy iteration algorithms for both discounted- and average-reward MDPs can be established using the performance-difference formulas in a simple and intuitive way; and the performance-gradient formulas together with stochastic approximation may lead to new optimization schemes. This sensitivity based point of view of performance optimization provides some insights that link perturbation analysis, Markov decision processes, and reinforcement learning together. The research is an extension of the previous work on ergodic Markov chains (Cao, Automatica 36 (2000) 771).
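As a rough illustration of how a performance-difference formula can drive policy iteration, the sketch below treats only the finite-state discounted-reward case, using the textbook identity v' - v = (I - βP')^{-1}[(f' - f) + β(P' - P)v]. It is not taken from the paper: the two-state MDP, the discount factor `BETA`, and every function name are hypothetical, and the multichain average-reward case that the paper addresses is not covered.

```python
# Hypothetical 2-state, 2-action discounted MDP; all numbers are illustrative.
BETA = 0.9  # discount factor (assumed)
STATES = [0, 1]
ACTIONS = [0, 1]

# P[a][i][j]: probability of moving from state i to state j under action a
P = {
    0: [[0.9, 0.1], [0.2, 0.8]],
    1: [[0.5, 0.5], [0.6, 0.4]],
}
# F[a][i]: one-step reward in state i under action a
F = {
    0: [1.0, 0.0],
    1: [0.5, 2.0],
}


def q_value(i, a, v):
    """f(i, a) + beta * sum_j p(j | i, a) * v(j)."""
    return F[a][i] + BETA * sum(P[a][i][j] * v[j] for j in STATES)


def solve_fixed_point(policy, d, tol=1e-10):
    """Approximate x = d + beta * P_policy x by successive approximation."""
    x = [0.0 for _ in STATES]
    while True:
        new_x = [
            d[i] + BETA * sum(P[policy[i]][i][j] * x[j] for j in STATES)
            for i in STATES
        ]
        if max(abs(a - b) for a, b in zip(new_x, x)) < tol:
            return new_x
        x = new_x


def evaluate(policy):
    """Discounted value of a fixed policy: v = f + beta * P v."""
    return solve_fixed_point(policy, [F[policy[i]][i] for i in STATES])


def difference_rhs(old_policy, new_policy):
    """(I - beta P')^{-1} [(f' - f) + beta (P' - P) v], with v from old_policy."""
    v = evaluate(old_policy)
    d = [
        q_value(i, new_policy[i], v) - q_value(i, old_policy[i], v)
        for i in STATES
    ]
    return solve_fixed_point(new_policy, d)


def improve(policy, v):
    """Switch to an action only if it makes the difference term strictly positive."""
    new_policy = []
    for i in STATES:
        best = policy[i]
        for a in ACTIONS:
            if q_value(i, a, v) > q_value(i, best, v) + 1e-9:
                best = a
        new_policy.append(best)
    return new_policy


def policy_iteration(policy):
    """Alternate evaluation and improvement until the policy stops changing."""
    while True:
        v = evaluate(policy)
        new_policy = improve(policy, v)
        if new_policy == policy:
            return policy, v
        policy = new_policy


if __name__ == "__main__":
    best_policy, best_value = policy_iteration([0, 0])
    print("policy:", best_policy, "value:", best_value)
```

Because (I - βP')^{-1} has non-negative entries, a componentwise non-negative bracketed term implies v' ≥ v, which is why the improvement step only needs to compare the one-step quantities computed by `q_value`.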

Keywords: Policy iteration, Potentials, Perturbation analysis, Performance sensitivity, Reinforcement learning


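[Disclaimer] All of the content on this page was uploaded by 郭先平 (Xianping Guo) on October 12, 2006, at 02:16:46; copyright belongs to the original author. This article represents only the author's own views and is unrelated to this website. This website remains neutral toward the statements and opinions in the article and provides no express or implied guarantee of the accuracy, reliability, or completeness of its content. Readers should treat it as reference only and assume full responsibility themselves.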
