Guo Xianping (郭先平)
Research interests: Markov decision processes (MDPs) and stochastic dynamic games

- Name: Guo Xianping (郭先平)
- Academic titles: doctoral supervisor; Outstanding Teacher / Outstanding Educator
- Discipline: mathematical statistics
- Research interests: Markov decision processes (MDPs) and stochastic dynamic games
Guo Xianping, professor and doctoral supervisor, works mainly on Markov decision processes (MDPs) and stochastic dynamic games. He has made a series of important advances in the theory and applications of optimality conditions, computational methods, and characterizations of optimality for discrete-time MDPs, continuous-time MDPs, and continuous-time Markov stochastic games. His monograph Markov Decision Processes (《马尔可夫决策过程》, with Professor Hou Zhenting) "filled a gap in this field in China", according to a book review in Chinese Science Bulletin (《科学通报》, 1999, No. 1). In recent years he has published more than 40 academic papers, over 30 of them as first author and over 20 indexed by SCI. These papers have been cited many times and appear in well-known journals including Ann. Appl. Probab., IEEE Trans. Autom. Control, SIAM J. Optim., SIAM J. Control Optim., Math. Oper. Res., J. Appl. Probab., Automatica, Bernoulli, Acta Appl. Math., Stoch. Anal. Appl., Math. Meth. Oper. Res., and Chinese Science Bulletin. His main research contributions are: (1) establishing the optimality equations for the average-criterion model of discrete-time nonstationary MDPs, refuting a related claim of well-known scholars; (2) substantially advancing the theory and applications of continuous-time MDPs by new methods, answering questions raised by well-known scholars; and (3) establishing, for the first time, the probabilistic foundations and new optimality conditions for continuous-time Markov stochastic games, enriching the study of stochastic games. He has also led more than 10 projects funded by the National Natural Science Foundation of China and by provincial and ministerial foundations; he was selected for the Ministry of Education's Excellent Young Teachers Program in 2003 and for its New Century Excellent Talents Program in 2004, and has repeatedly taken part in collaborative research and academic exchanges at home and abroad.
[Journal Article] AVERAGE OPTIMALITY FOR MARKOV DECISION PROCESSES IN BOREL SPACES: A NEW CONDITION AND APPROACH
XIANPING GUO (Zhongshan University) and QUANXIN ZHU (South China Normal University)
J. Appl. Prob. 43, 318-334 (2006)
In this paper we study discrete-time Markov decision processes with Borel state and action spaces. The criterion is to minimize average expected costs, and the costs may have neither upper nor lower bounds. We first provide two average optimality inequalities of opposing directions and give conditions for the existence of solutions to them. Then, using the two inequalities, we ensure the existence of an average optimal (deterministic) stationary policy under additional continuity-compactness assumptions. Our conditions are slightly weaker than those in the previous literature. Also, some new sufficient conditions for the existence of an average optimal stationary policy are imposed on the primitive data of the model. Moreover, our approach is slightly different from the well-known 'optimality inequality approach' widely used in Markov decision processes. Finally, we illustrate our results in two examples.
Discrete-time Markov decision process, average expected criterion, average optimality inequality, optimal stationary policy
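In a drastically simplified setting (a hypothetical two-state finite MDP rather than the paper's Borel-space model with unbounded costs), the average-optimality machinery reduces to relative value iteration; everything below, including the costs and transition laws, is invented for illustration.

```python
# Relative value iteration for the average-cost criterion on a tiny
# finite MDP -- a simplified stand-in for the paper's Borel-space model.
# The two-state costs and transition laws below are hypothetical.

cost = [[1.0, 2.0], [3.0, 0.5]]        # cost[s][a]
P = [                                   # P[a][s][t] = transition probability
    [[0.9, 0.1], [0.2, 0.8]],           # action 0
    [[0.5, 0.5], [0.6, 0.4]],           # action 1
]

def relative_value_iteration(n_iter=500):
    h = [0.0, 0.0]                      # relative value function
    for _ in range(n_iter):
        # dynamic-programming backup: (Th)(s) = min_a c(s, a) + E[h(next)]
        Th = [min(cost[s][a] + sum(P[a][s][t] * h[t] for t in range(2))
                  for a in range(2)) for s in range(2)]
        g = Th[0]                       # gain estimate at reference state 0
        h = [Th[s] - g for s in range(2)]
    # greedy deterministic stationary policy w.r.t. the final h
    policy = [min(range(2), key=lambda a: cost[s][a]
                  + sum(P[a][s][t] * h[t] for t in range(2)))
              for s in range(2)]
    return g, policy

g, policy = relative_value_iteration()
```

On this toy model the iteration converges to the optimal average cost g = 13/14 with the stationary policy that uses action 0 in state 0 and action 1 in state 1.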
[Journal Article] NONZERO-SUM GAMES FOR CONTINUOUS-TIME MARKOV CHAINS WITH UNBOUNDED DISCOUNTED PAYOFFS
XIANPING GUO (CINVESTAV and Zhongshan University) and ONÉSIMO HERNÁNDEZ-LERMA
J. Appl. Prob. 42, 303-320 (2005)
In this paper, we study two-person nonzero-sum games for continuous-time Markov chains with discounted payoff criteria and Borel action spaces. The transition rates are possibly unbounded, and the payoff functions might have neither upper nor lower bounds. We give conditions that ensure the existence of Nash equilibria in stationary strategies. For the zero-sum case, we prove the existence of the value of the game, and also provide a recursive way to compute it, or at least to approximate it. Our results are applied to a controlled queueing system. We also show that if the transition rates are uniformly bounded, then a continuous-time game is equivalent, in a suitable sense, to a discrete-time Markov game.
Nonzero-sum game, discounted payoff criterion, Nash equilibrium, controlled Q-process
Xi-Ren Cao and Xianping Guo
Automatica 40 (2004) 1749-1759
We propose a unified framework to Markov decision problems and performance sensitivity analysis for multichain Markov processes with both discounted and average-cost performance criteria. With the fundamental concept of performance potentials, we derive both performance-gradient and performance-difference formulas, which play the central role in performance optimization. The standard policy iteration algorithms for both discounted- and average-reward MDPs can be established using the performance-difference formulas in a simple and intuitive way; and the performance-gradient formulas together with stochastic approximation may lead to new optimization schemes. This sensitivity-based point of view of performance optimization provides some insights that link perturbation analysis, Markov decision processes, and reinforcement learning together. The research is an extension of the previous work on ergodic Markov chains (Cao, Automatica 36 (2000) 771).
Policy iteration, Potentials, Perturbation analysis, Performance sensitivity, Reinforcement learning
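Potential-based policy iteration can be sketched on a hypothetical two-state unichain model; the paper's multichain analysis and gradient formulas go well beyond this. Potentials come from the Poisson equation for the fixed policy, and the improvement step is the one a performance-difference formula justifies. All numbers are invented.

```python
# Policy iteration for average reward via performance potentials, on a
# hypothetical two-state *unichain* model (the paper treats multichain
# processes; this sketch does not).

reward = [[1.0, 2.0], [3.0, 0.5]]      # reward[s][a]
P = [
    [[0.9, 0.1], [0.2, 0.8]],           # action 0
    [[0.5, 0.5], [0.6, 0.4]],           # action 1
]
n = 2

def solve(A, b):
    # Gauss-Jordan elimination with partial pivoting (A is small, dense)
    m = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(m):
        p = max(range(c, m), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(m):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][m] / M[i][i] for i in range(m)]

def potentials(policy):
    # Poisson equation: eta + g(s) = r(s, a) + sum_t P(t|s, a) g(t), g(0) = 0.
    # Unknowns: (eta, g(1), ..., g(n-1)).
    A, b = [], []
    for s in range(n):
        a = policy[s]
        A.append([1.0] + [(1.0 if t == s else 0.0) - P[a][s][t]
                          for t in range(1, n)])
        b.append(reward[s][a])
    x = solve(A, b)
    return x[0], [0.0] + x[1:]

def policy_iteration():
    policy = [0] * n
    while True:
        eta, g = potentials(policy)
        # improvement step justified by the performance-difference formula
        improved = [max(range(2), key=lambda a: reward[s][a]
                        + sum(P[a][s][t] * g[t] for t in range(n)))
                    for s in range(n)]
        if improved == policy:
            return eta, policy
        policy = improved

eta, policy = policy_iteration()
```

On this toy model the iteration stops after two improvement steps at average reward eta = 19/7.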
[Journal Article] Continuous-Time Controlled Markov Chains with Discounted Rewards
XIANPING GUO and ONÉSIMO HERNÁNDEZ-LERMA
Acta Applicandae Mathematicae 79: 195-216, 2003
This paper studies denumerable-state continuous-time controlled Markov chains with the discounted reward criterion and a Borel action space. The reward and transition rates are unbounded, and the reward rates are allowed to take positive or negative values. First, we present new conditions for a nonhomogeneous Q(t)-process to be regular. Then, using these conditions, we give a new set of mild hypotheses that ensure the existence of ε-optimal (ε ≥ 0) stationary policies. We also present a 'martingale characterization' of an optimal stationary policy. Our results are illustrated with controlled birth-and-death processes.
continuous-time controlled Markov chains, unbounded reward and transition rates, discounted criterion, optimal stationary policies, martingale characterization
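When the transition rates are bounded, a discounted continuous-time chain can be reduced to a discrete-time MDP by uniformization; the paper's interest is precisely the unbounded-rate case this reduction does not cover. Below is a hedged sketch on a hypothetical truncated birth-death (single-server queue) model, with all rates and costs invented.

```python
# Discounted continuous-time control via uniformization, on a hypothetical
# truncated birth-death model with *bounded* rates (the paper allows
# unbounded rates, which this reduction does not handle).

LAM_ARR = 1.0                      # birth (arrival) rate
SERVICE = (0.0, 2.0, 3.0)          # action a -> death (service) rate mu_a
ALPHA = 0.1                        # continuous-time discount rate
N = 20                             # truncation level of the state space

LAM = LAM_ARR + max(SERVICE)       # uniformization constant >= all exit rates
BETA = LAM / (LAM + ALPHA)         # discount factor of the equivalent DTMDP

def cost_rate(s, a):
    return s + 0.5 * SERVICE[a]    # holding cost plus service effort

def qval(s, a, V):
    # one uniformized Bellman backup for state s and action a
    up = LAM_ARR / LAM
    down = (SERVICE[a] if s > 0 else 0.0) / LAM
    nxt = (up * V[min(s + 1, N)]
           + down * V[s - 1 if s > 0 else 0]
           + (1.0 - up - down) * V[s])
    return cost_rate(s, a) / (LAM + ALPHA) + BETA * nxt

def value_iteration(n_iter=3000):
    V = [0.0] * (N + 1)
    for _ in range(n_iter):
        V = [min(qval(s, a, V) for a in range(len(SERVICE)))
             for s in range(N + 1)]
    policy = [min(range(len(SERVICE)), key=lambda a: qval(s, a, V))
              for s in range(N + 1)]
    return V, policy

V, policy = value_iteration()
```

Structurally one expects the value function to be nondecreasing in the queue length, and the empty queue to use the free action (no service effort).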
[Journal Article] CONTINUOUS-TIME CONTROLLED MARKOV CHAINS
XIANPING GUO AND ONÉSIMO HERNÁNDEZ-LERMA
The Annals of Applied Probability 2003, Vol. 13, No. 1, 363-388
This paper studies continuous-time controlled Markov chains, that is, continuous-time Markov decision processes with a denumerable state space, with respect to the discounted cost criterion. The cost and transition rates are allowed to be unbounded and the action set is a Borel space. We first study control problems in the class of deterministic stationary policies and give very weak conditions under which the existence of ε-optimal (ε ≥ 0) policies is proved using the construction of a minimum Q-process. Then we further consider control problems in the class of randomized Markov policies for (1) regular and (2) nonregular Q-processes. To study case (1), we first present a new necessary and sufficient condition for a nonhomogeneous Q-process to be regular. This regularity condition, together with the extended generator of a nonhomogeneous Markov process, is used to prove the existence of ε-optimal stationary policies. Our results for case (1) are illustrated by a Schlögl model with a controlled diffusion. For case (2), we obtain a similar result using Kolmogorov's forward equation for the minimum Q-process, and we also present an example in which our assumptions are satisfied but those used in the previous literature fail to hold.
Nonhomogeneous continuous-time Markov chains, controlled Q-processes, unbounded cost and transition rates, discounted criterion, optimal stationary policies
XIANPING GUO and ONÉSIMO HERNÁNDEZ-LERMA
J. Appl. Prob. 40, 327-345 (2003)
This paper is a first study of two-person zero-sum games for denumerable continuous-time Markov chains determined by given transition rates, with an average payoff criterion. The transition rates are allowed to be unbounded, and the payoff rates may have neither upper nor lower bounds. In the spirit of the 'drift and monotonicity' conditions for continuous-time Markov processes, we give conditions on the controlled system's primitive data under which the existence of the value of the game and a pair of strong optimal stationary strategies is ensured by using the Shapley equations. Also, we present a 'martingale characterization' of a pair of strong optimal stationary strategies. Our results are illustrated with a birth-and-death game.
Zero-sum game, controlled Q-process, average payoff criterion, pairs of optimal stationary strategies, martingale characterization
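The Shapley equations can be solved by value iteration when the stage games are small. The sketch below treats a hypothetical *discounted, discrete-time* zero-sum Markov game with two states and two actions per player (the paper's continuous-time, average-payoff, unbounded-rate setting is substantially harder); each 2x2 stage game is solved exactly, and all payoffs and transition laws are invented.

```python
# Shapley value iteration for a hypothetical discounted discrete-time
# zero-sum Markov game -- a simplified cousin of the paper's setting.

BETA = 0.9                         # discount factor

# r[s][a][b]: payoff to player 1; P[s][a][b][t]: transition probability
r = [[[2.0, 0.0], [1.0, 3.0]],
     [[0.0, 1.0], [2.0, 0.5]]]
P = [[[[0.7, 0.3], [0.4, 0.6]],
      [[0.2, 0.8], [0.5, 0.5]]],
     [[[0.6, 0.4], [0.3, 0.7]],
      [[0.9, 0.1], [0.1, 0.9]]]]

def matrix_game_value(A):
    # exact value of a 2x2 zero-sum matrix game (row player maximizes)
    maximin = max(min(row) for row in A)
    minimax = min(max(A[0][j], A[1][j]) for j in range(2))
    if maximin == minimax:                  # pure saddle point
        return maximin
    denom = A[0][0] - A[0][1] - A[1][0] + A[1][1]
    p = (A[1][1] - A[1][0]) / denom         # mixed-equilibrium row weight
    return p * A[0][0] + (1 - p) * A[1][0]

def shapley_iteration(n_iter=500):
    V = [0.0, 0.0]
    for _ in range(n_iter):
        # Shapley operator: value of the stage game built from r + BETA * P V
        V = [matrix_game_value(
                [[r[s][a][b] + BETA * sum(P[s][a][b][t] * V[t] for t in (0, 1))
                  for b in (0, 1)] for a in (0, 1)])
             for s in (0, 1)]
    return V

V = shapley_iteration()
```

Since the Shapley operator is a BETA-contraction, the iterates converge to the value of the game; with payoffs in [0, 3] the value lies in [0, 30].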
[Journal Article] LIMITING AVERAGE CRITERIA FOR NONSTATIONARY MARKOV DECISION PROCESSES
XIANPING GUO
SIAM J. OPTIM., Vol. 11, No. 4, pp. 1037-1053
This paper deals with the so-called limiting average criteria for nonstationary Markov decision processes with (possibly unbounded) rewards and Borel state space. A new set of conditions is provided, under which the existence of both a solution to the optimality equations and the limiting average
nonstationary Markov decision processes, limiting average criteria, optimality equations, limiting average
[Journal Article] Minimax control for discrete-time time-varying stochastic systems
Xianping Guo, Wen Yu, and Xiaoou Li
Automatica 38 (2002) 1991-1998
This paper gives a self-contained presentation of minimax control for discrete-time time-varying stochastic systems under finite- and infinite-horizon expected total cost performance criteria. Suitable conditions for the existence of minimax strategies are proposed. Also, we prove that the values of the finite-horizon problem converge to the values of the infinite-horizon problems. Moreover, for finite-horizon problems an algorithm for the calculation of minimax strategies is developed and tested using time-varying stochastic systems.
Minimax techniques, Time-varying system, Stochastic system
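A finite-horizon minimax control problem of this kind is solved by a backward recursion in which the controller minimizes and nature (the disturbance) maximizes at every stage. A minimal sketch on a hypothetical clipped scalar system, with every set and cost invented for illustration:

```python
# Backward minimax dynamic programming on a hypothetical scalar system
# s' = clip(s + a + w): the controller picks a, nature picks the
# worst-case disturbance w.  All data below are illustrative only.

STATES = range(-3, 4)
ACTIONS = (-1, 0, 1)
DISTURB = (-1, 1)

def clip(x):
    return max(-3, min(3, x))

def cost(s, a):
    return s * s + 0.5 * abs(a)     # state penalty plus control effort

def minimax_dp(horizon):
    V = {s: 0.0 for s in STATES}    # terminal cost is zero
    policy = []
    for _ in range(horizon):        # backward recursion over stages
        newV, stage = {}, {}
        for s in STATES:
            best_a, best = None, float("inf")
            for a in ACTIONS:
                # nature responds with the worst-case disturbance
                worst = max(cost(s, a) + V[clip(s + a + w)] for w in DISTURB)
                if worst < best:
                    best, best_a = worst, a
            newV[s], stage[s] = best, best_a
        V = newV
        policy.append(stage)
    return V, policy

V, policy = minimax_dp(5)
```

By the symmetry of the cost, dynamics, and disturbance set, the worst-case value is even in the state, and at the origin doing nothing is optimal at every stage.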
Xianping Guo and Ke Liu
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 46, NO. 12, DECEMBER 2001
This note deals with continuous-time Markov decision processes with a denumerable state space and the average cost criterion. The transition rates are allowed to be unbounded, and the action set is a Borel space. We give a new set of conditions under which the existence of optimal stationary policies is ensured by using the optimality inequality. Our results are illustrated with a controlled queueing model. Moreover, we use an example to show that our conditions do not imply the existence of a solution to the optimality equations in the previous literature.
Average cost criterion, continuous-time Markov decision processes (MDPs), optimal stationary policies, optimality inequality
Xianping Guo and Onésimo Hernández-Lerma
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 48, NO. 2, FEBRUARY 2003
In this paper, we give conditions for the existence of average optimal policies for continuous-time controlled Markov chains with a denumerable state space and Borel action sets. The transition rates are allowed to be unbounded, and the reward/cost rates may have neither upper nor lower bounds. In the spirit of the "drift and monotonicity" conditions for continuous-time Markov processes, we propose a new set of conditions on the controlled process's primitive data under which the existence of optimal (deterministic) stationary policies in the class of randomized Markov policies is proved using the extended generator approach instead of Kolmogorov's forward equation used in the previous literature, and under which the convergence of a policy iteration method is also shown. Moreover, we use a controlled queueing system to show that all of our conditions are satisfied, whereas those in the previous literature fail to hold.
Average (or ergodic) reward/cost criterion, continuous-time controlled Markov chains (or continuous-time Markov decision processes), drift and monotonicity conditions, optimal stationary policy, unbounded transition and reward/cost rates
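A "drift" condition of the kind invoked here can be checked concretely on a toy model. For a controlled birth-death chain with hypothetical rates (not taken from the paper), the linear Lyapunov function w(s) = s + 1 satisfies a drift inequality of the usual form even though the death rate grows without bound in the state:

```python
# Numerical check of a drift inequality  sum_t q(t|s, a) w(t) <= C1 - C2 w(s)
# for a hypothetical controlled birth-death chain with unbounded rates.

LAM = 2.0                      # birth (arrival) rate
MUS = (1.0, 1.5, 3.0)          # admissible death (service) intensities
C2 = min(MUS)                  # drift constants chosen so the
C1 = LAM + C2                  # inequality holds for every action

def w(s):
    return s + 1.0             # Lyapunov (weight) function

def drift(s, mu):
    # generator applied to w: rates q(s, s+1) = LAM, q(s, s-1) = mu * s,
    # so the death rate is unbounded in the state s
    up = LAM * (w(s + 1) - w(s))
    down = (mu * s if s > 0 else 0.0) * (w(s - 1) - w(s))
    return up + down

ok = all(drift(s, mu) <= C1 - C2 * w(s) + 1e-9
         for s in range(2000) for mu in MUS)
```

Here drift(s, mu) = LAM - mu * s, and since mu >= C2 the bound LAM - C2 * s = C1 - C2 * w(s) holds for every state and action, which is what the check confirms.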