Guo Xianping (郭先平)
Research interests: Markov decision processes (MDPs) and stochastic dynamic games

- Name: Guo Xianping (郭先平)
- Academic titles: doctoral supervisor; Outstanding Teacher / Outstanding Educator
- Discipline: mathematical statistics
- Research interests: Markov decision processes (MDPs) and stochastic dynamic games
Guo Xianping, professor and doctoral supervisor, works mainly on Markov decision processes (MDPs) and stochastic dynamic games. He has made a series of important advances in the theory and applications of optimality conditions, computational methods, and characterizations of optimality for discrete-time MDPs, continuous-time MDPs, and continuous-time Markov stochastic games. His monograph Markov Decision Processes (《马尔可夫决策过程》, with Professor Hou Zhenting) "filled a gap in this field in China", according to a book review in Chinese Science Bulletin (《科学通报》, 1999, No. 1). In recent years he has published more than 40 academic papers, over 30 of them as first author and over 20 indexed by SCI. These papers have been cited many times and appear in well-known journals including Ann. Appl. Probab., IEEE Trans. Autom. Control, SIAM J. Optim., SIAM J. Control Optim., Math. Oper. Res., J. Appl. Probab., Automatica, Bernoulli, Acta Appl. Math., Stoch. Anal. Appl., Math. Meth. Oper. Res., and Chinese Science Bulletin. His main research contributions are: (1) establishing the optimality equations for the average-criterion model of discrete-time nonstationary MDPs, refuting a related claim of well-known scholars; (2) substantially advancing the theory and applications of continuous-time MDPs by new methods, answering questions raised by well-known scholars; and (3) establishing, for the first time, the probabilistic foundations and new optimality conditions for continuous-time Markov stochastic games, enriching the study of stochastic games. He has also led more than 10 projects funded by the National Natural Science Foundation of China and by provincial and ministerial foundations; he was selected for the Ministry of Education's Excellent Young Teachers Program in 2003 and for its New Century Excellent Talents Program in 2004, and has repeatedly taken part in collaborative research and academic exchanges at home and abroad.
[Journal Article] AVERAGE OPTIMALITY FOR MARKOV DECISION PROCESSES IN BOREL SPACES: A NEW CONDITION AND APPROACH
XIANPING GUO (Zhongshan University) and QUANXIN ZHU (South China Normal University)
J. Appl. Prob. 43, 318-334 (2006)
In this paper we study discrete-time Markov decision processes with Borel state and action spaces. The criterion is to minimize average expected costs, and the costs may have neither upper nor lower bounds. We first provide two average optimality inequalities of opposing directions and give conditions for the existence of solutions to them. Then, using the two inequalities, we ensure the existence of an average optimal (deterministic) stationary policy under additional continuity-compactness assumptions. Our conditions are slightly weaker than those in the previous literature. Also, some new sufficient conditions for the existence of an average optimal stationary policy are imposed on the primitive data of the model. Moreover, our approach is slightly different from the well-known 'optimality inequality approach' widely used in Markov decision processes. Finally, we illustrate our results in two examples.
Discrete-time Markov decision process, average expected criterion, average optimality inequality, optimal stationary policy
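In a drastically simplified setting (a hypothetical two-state finite MDP rather than the paper's Borel-space model with unbounded costs), the average-optimality machinery reduces to relative value iteration; everything below, including the costs and transition laws, is invented for illustration.

```python
# Relative value iteration for the average-cost criterion on a tiny
# finite MDP -- a simplified stand-in for the paper's Borel-space model.
# The two-state costs and transition laws below are hypothetical.

cost = [[1.0, 2.0], [3.0, 0.5]]        # cost[s][a]
P = [                                   # P[a][s][t] = transition probability
    [[0.9, 0.1], [0.2, 0.8]],           # action 0
    [[0.5, 0.5], [0.6, 0.4]],           # action 1
]

def relative_value_iteration(n_iter=500):
    h = [0.0, 0.0]                      # relative value function
    for _ in range(n_iter):
        # dynamic-programming backup: (Th)(s) = min_a c(s, a) + E[h(next)]
        Th = [min(cost[s][a] + sum(P[a][s][t] * h[t] for t in range(2))
                  for a in range(2)) for s in range(2)]
        g = Th[0]                       # gain estimate at reference state 0
        h = [Th[s] - g for s in range(2)]
    # greedy deterministic stationary policy w.r.t. the final h
    policy = [min(range(2), key=lambda a: cost[s][a]
                  + sum(P[a][s][t] * h[t] for t in range(2)))
              for s in range(2)]
    return g, policy

g, policy = relative_value_iteration()
```

On this toy model the iteration converges to the optimal average cost g = 13/14 with the stationary policy that uses action 0 in state 0 and action 1 in state 1.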
[Journal Article] NONZERO-SUM GAMES FOR CONTINUOUS-TIME MARKOV CHAINS WITH UNBOUNDED DISCOUNTED PAYOFFS
XIANPING GUO (CINVESTAV and Zhongshan University) and ONÉSIMO HERNÁNDEZ-LERMA
J. Appl. Prob. 42, 303-320 (2005)
In this paper, we study two-person nonzero-sum games for continuous-time Markov chains with discounted payoff criteria and Borel action spaces. The transition rates are possibly unbounded, and the payoff functions might have neither upper nor lower bounds. We give conditions that ensure the existence of Nash equilibria in stationary strategies. For the zero-sum case, we prove the existence of the value of the game, and also provide a recursive way to compute it, or at least to approximate it. Our results are applied to a controlled queueing system. We also show that if the transition rates are uniformly bounded, then a continuous-time game is equivalent, in a suitable sense, to a discrete-time Markov game.
Nonzero-sum game, discounted payoff criterion, Nash equilibrium, controlled Q-process
Xi-Ren Cao and Xianping Guo
Automatica 40 (2004) 1749-1759
We propose a unified framework to Markov decision problems and performance sensitivity analysis for multichain Markov processes with both discounted and average-cost performance criteria. With the fundamental concept of performance potentials, we derive both performance-gradient and performance-difference formulas, which play the central role in performance optimization. The standard policy iteration algorithms for both discounted- and average-reward MDPs can be established using the performance-difference formulas in a simple and intuitive way; and the performance-gradient formulas together with stochastic approximation may lead to new optimization schemes. This sensitivity-based point of view of performance optimization provides some insights that link perturbation analysis, Markov decision processes, and reinforcement learning together. The research is an extension of the previous work on ergodic Markov chains (Cao, Automatica 36 (2000) 771).
Policy iteration, Potentials, Perturbation analysis, Performance sensitivity, Reinforcement learning
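Potential-based policy iteration can be sketched on a hypothetical two-state unichain model; the paper's multichain analysis and gradient formulas go well beyond this. Potentials come from the Poisson equation for the fixed policy, and the improvement step is the one a performance-difference formula justifies. All numbers are invented.

```python
# Policy iteration for average reward via performance potentials, on a
# hypothetical two-state *unichain* model (the paper treats multichain
# processes; this sketch does not).

reward = [[1.0, 2.0], [3.0, 0.5]]      # reward[s][a]
P = [
    [[0.9, 0.1], [0.2, 0.8]],           # action 0
    [[0.5, 0.5], [0.6, 0.4]],           # action 1
]
n = 2

def solve(A, b):
    # Gauss-Jordan elimination with partial pivoting (A is small, dense)
    m = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(m):
        p = max(range(c, m), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(m):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][m] / M[i][i] for i in range(m)]

def potentials(policy):
    # Poisson equation: eta + g(s) = r(s, a) + sum_t P(t|s, a) g(t), g(0) = 0.
    # Unknowns: (eta, g(1), ..., g(n-1)).
    A, b = [], []
    for s in range(n):
        a = policy[s]
        A.append([1.0] + [(1.0 if t == s else 0.0) - P[a][s][t]
                          for t in range(1, n)])
        b.append(reward[s][a])
    x = solve(A, b)
    return x[0], [0.0] + x[1:]

def policy_iteration():
    policy = [0] * n
    while True:
        eta, g = potentials(policy)
        # improvement step justified by the performance-difference formula
        improved = [max(range(2), key=lambda a: reward[s][a]
                        + sum(P[a][s][t] * g[t] for t in range(n)))
                    for s in range(n)]
        if improved == policy:
            return eta, policy
        policy = improved

eta, policy = policy_iteration()
```

On this toy model the iteration stops after two improvement steps at average reward eta = 19/7.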
[Journal Article] Continuous-Time Controlled Markov Chains with Discounted Rewards
XIANPING GUO and ONÉSIMO HERNÁNDEZ-LERMA
Acta Applicandae Mathematicae 79: 195-216, 2003
This paper studies denumerable-state continuous-time controlled Markov chains with the discounted reward criterion and a Borel action space. The reward and transition rates are unbounded, and the reward rates are allowed to take positive or negative values. First, we present new conditions for a nonhomogeneous Q(t)-process to be regular. Then, using these conditions, we give a new set of mild hypotheses that ensure the existence of ε-optimal (ε ≥ 0) stationary policies. We also present a 'martingale characterization' of an optimal stationary policy. Our results are illustrated with controlled birth-and-death processes.
continuous-time controlled Markov chains, unbounded reward and transition rates, discounted criterion, optimal stationary policies, martingale characterization
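When the transition rates are bounded, a discounted continuous-time chain can be reduced to a discrete-time MDP by uniformization; the paper's interest is precisely the unbounded-rate case this reduction does not cover. Below is a hedged sketch on a hypothetical truncated birth-death (single-server queue) model, with all rates and costs invented.

```python
# Discounted continuous-time control via uniformization, on a hypothetical
# truncated birth-death model with *bounded* rates (the paper allows
# unbounded rates, which this reduction does not handle).

LAM_ARR = 1.0                      # birth (arrival) rate
SERVICE = (0.0, 2.0, 3.0)          # action a -> death (service) rate mu_a
ALPHA = 0.1                        # continuous-time discount rate
N = 20                             # truncation level of the state space

LAM = LAM_ARR + max(SERVICE)       # uniformization constant >= all exit rates
BETA = LAM / (LAM + ALPHA)         # discount factor of the equivalent DTMDP

def cost_rate(s, a):
    return s + 0.5 * SERVICE[a]    # holding cost plus service effort

def qval(s, a, V):
    # one uniformized Bellman backup for state s and action a
    up = LAM_ARR / LAM
    down = (SERVICE[a] if s > 0 else 0.0) / LAM
    nxt = (up * V[min(s + 1, N)]
           + down * V[s - 1 if s > 0 else 0]
           + (1.0 - up - down) * V[s])
    return cost_rate(s, a) / (LAM + ALPHA) + BETA * nxt

def value_iteration(n_iter=3000):
    V = [0.0] * (N + 1)
    for _ in range(n_iter):
        V = [min(qval(s, a, V) for a in range(len(SERVICE)))
             for s in range(N + 1)]
    policy = [min(range(len(SERVICE)), key=lambda a: qval(s, a, V))
              for s in range(N + 1)]
    return V, policy

V, policy = value_iteration()
```

Structurally one expects the value function to be nondecreasing in the queue length, and the empty queue to use the free action (no service effort).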
[Journal Article] CONTINUOUS-TIME CONTROLLED MARKOV CHAINS
XIANPING GUO AND ONÉSIMO HERNÁNDEZ-LERMA
The Annals of Applied Probability 2003, Vol. 13, No. 1, 363-388
This paper studies continuous-time controlled Markov chains, that is, continuous-time Markov decision processes with a denumerable state space, with respect to the discounted cost criterion. The cost and transition rates are allowed to be unbounded and the action set is a Borel space. We first study control problems in the class of deterministic stationary policies and give very weak conditions under which the existence of ε-optimal (ε ≥ 0) policies is proved using the construction of a minimum Q-process. Then we further consider control problems in the class of randomized Markov policies for (1) regular and (2) nonregular Q-processes. To study case (1), we first present a new necessary and sufficient condition for a nonhomogeneous Q-process to be regular. This regularity condition, together with the extended generator of a nonhomogeneous Markov process, is used to prove the existence of ε-optimal stationary policies. Our results for case (1) are illustrated by a Schlögl model with a controlled diffusion. For case (2), we obtain a similar result using Kolmogorov's forward equation for the minimum Q-process, and we also present an example in which our assumptions are satisfied but those used in the previous literature fail to hold.
Nonhomogeneous continuous-time Markov chains, controlled Q-processes, unbounded cost and transition rates, discounted criterion, optimal stationary policies
XIANPING GUO and ONÉSIMO HERNÁNDEZ-LERMA
J. Appl. Prob. 40, 327-345 (2003)
This paper is a first study of two-person zero-sum games for denumerable continuous-time Markov chains determined by given transition rates, with an average payoff criterion. The transition rates are allowed to be unbounded, and the payoff rates may have neither upper nor lower bounds. In the spirit of the 'drift and monotonicity' conditions for continuous-time Markov processes, we give conditions on the controlled system's primitive data under which the existence of the value of the game and a pair of strong optimal stationary strategies is ensured by using the Shapley equations. Also, we present a 'martingale characterization' of a pair of strong optimal stationary strategies. Our results are illustrated with a birth-and-death game.
Zero-sum game, controlled Q-process, average payoff criterion, pairs of optimal stationary strategies, martingale characterization
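The Shapley equations can be solved by value iteration when the stage games are small. The sketch below treats a hypothetical *discounted, discrete-time* zero-sum Markov game with two states and two actions per player (the paper's continuous-time, average-payoff, unbounded-rate setting is substantially harder); each 2x2 stage game is solved exactly, and all payoffs and transition laws are invented.

```python
# Shapley value iteration for a hypothetical discounted discrete-time
# zero-sum Markov game -- a simplified cousin of the paper's setting.

BETA = 0.9                         # discount factor

# r[s][a][b]: payoff to player 1; P[s][a][b][t]: transition probability
r = [[[2.0, 0.0], [1.0, 3.0]],
     [[0.0, 1.0], [2.0, 0.5]]]
P = [[[[0.7, 0.3], [0.4, 0.6]],
      [[0.2, 0.8], [0.5, 0.5]]],
     [[[0.6, 0.4], [0.3, 0.7]],
      [[0.9, 0.1], [0.1, 0.9]]]]

def matrix_game_value(A):
    # exact value of a 2x2 zero-sum matrix game (row player maximizes)
    maximin = max(min(row) for row in A)
    minimax = min(max(A[0][j], A[1][j]) for j in range(2))
    if maximin == minimax:                  # pure saddle point
        return maximin
    denom = A[0][0] - A[0][1] - A[1][0] + A[1][1]
    p = (A[1][1] - A[1][0]) / denom         # mixed-equilibrium row weight
    return p * A[0][0] + (1 - p) * A[1][0]

def shapley_iteration(n_iter=500):
    V = [0.0, 0.0]
    for _ in range(n_iter):
        # Shapley operator: value of the stage game built from r + BETA * P V
        V = [matrix_game_value(
                [[r[s][a][b] + BETA * sum(P[s][a][b][t] * V[t] for t in (0, 1))
                  for b in (0, 1)] for a in (0, 1)])
             for s in (0, 1)]
    return V

V = shapley_iteration()
```

Since the Shapley operator is a BETA-contraction, the iterates converge to the value of the game; with payoffs in [0, 3] the value lies in [0, 30].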
[Journal Article] LIMITING AVERAGE CRITERIA FOR NONSTATIONARY MARKOV DECISION PROCESSES
XIANPING GUO
SIAM J. OPTIM., Vol. 11, No. 4, pp. 1037-1053
This paper deals with the so-called limiting average criteria for nonstationary Markov decision processes with (possibly unbounded) rewards and Borel state space. A new set of conditions is provided, under which the existence of both a solution to the optimality equations and the limiting average
nonstationary Markov decision processes, limiting average criteria, optimality equations, limiting average
[Journal Article] Minimax control for discrete-time time-varying stochastic systems
Xianping Guo, Wen Yu, and Xiaoou Li
Automatica 38 (2002) 1991-1998
This paper gives a self-contained presentation of minimax control for discrete-time time-varying stochastic systems under finite- and infinite-horizon expected total cost performance criteria. Suitable conditions for the existence of minimax strategies are proposed. Also, we prove that the values of the finite-horizon problem converge to the values of the infinite-horizon problems. Moreover, for finite-horizon problems an algorithm for the calculation of minimax strategies is developed and tested using time-varying stochastic systems.
Minimax techniques, Time-varying system, Stochastic system
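A finite-horizon minimax control problem of this kind is solved by a backward recursion in which the controller minimizes and nature (the disturbance) maximizes at every stage. A minimal sketch on a hypothetical clipped scalar system, with every set and cost invented for illustration:

```python
# Backward minimax dynamic programming on a hypothetical scalar system
# s' = clip(s + a + w): the controller picks a, nature picks the
# worst-case disturbance w.  All data below are illustrative only.

STATES = range(-3, 4)
ACTIONS = (-1, 0, 1)
DISTURB = (-1, 1)

def clip(x):
    return max(-3, min(3, x))

def cost(s, a):
    return s * s + 0.5 * abs(a)     # state penalty plus control effort

def minimax_dp(horizon):
    V = {s: 0.0 for s in STATES}    # terminal cost is zero
    policy = []
    for _ in range(horizon):        # backward recursion over stages
        newV, stage = {}, {}
        for s in STATES:
            best_a, best = None, float("inf")
            for a in ACTIONS:
                # nature responds with the worst-case disturbance
                worst = max(cost(s, a) + V[clip(s + a + w)] for w in DISTURB)
                if worst < best:
                    best, best_a = worst, a
            newV[s], stage[s] = best, best_a
        V = newV
        policy.append(stage)
    return V, policy

V, policy = minimax_dp(5)
```

By the symmetry of the cost, dynamics, and disturbance set, the worst-case value is even in the state, and at the origin doing nothing is optimal at every stage.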
Xianping Guo and Ke Liu
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 46, NO. 12, DECEMBER 2001
This note deals with continuous-time Markov decision processes with a denumerable state space and the average cost criterion. The transition rates are allowed to be unbounded, and the action set is a Borel space. We give a new set of conditions under which the existence of optimal stationary policies is ensured by using the optimality inequality. Our results are illustrated with a controlled queueing model. Moreover, we use an example to show that our conditions do not imply the existence of a solution to the optimality equations in the previous literature.
Average cost criterion, continuous-time Markov decision processes (MDPs), optimal stationary policies, optimality inequality
Xianping Guo and Onésimo Hernández-Lerma
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 48, NO. 2, FEBRUARY 2003
In this paper, we give conditions for the existence of average optimal policies for continuous-time controlled Markov chains with a denumerable state space and Borel action sets. The transition rates are allowed to be unbounded, and the reward/cost rates may have neither upper nor lower bounds. In the spirit of the "drift and monotonicity" conditions for continuous-time Markov processes, we propose a new set of conditions on the controlled process's primitive data under which the existence of optimal (deterministic) stationary policies in the class of randomized Markov policies is proved using the extended generator approach instead of Kolmogorov's forward equation used in the previous literature, and under which the convergence of a policy iteration method is also shown. Moreover, we use a controlled queueing system to show that all of our conditions are satisfied, whereas those in the previous literature fail to hold.
Average (or ergodic) reward/cost criterion, continuous-time controlled Markov chains (or continuous-time Markov decision processes), drift and monotonicity conditions, optimal stationary policy, unbounded transition and reward/cost rates
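A "drift" condition of the kind invoked here can be checked concretely on a toy model. For a controlled birth-death chain with hypothetical rates (not taken from the paper), the linear Lyapunov function w(s) = s + 1 satisfies a drift inequality of the usual form even though the death rate grows without bound in the state:

```python
# Numerical check of a drift inequality  sum_t q(t|s, a) w(t) <= C1 - C2 w(s)
# for a hypothetical controlled birth-death chain with unbounded rates.

LAM = 2.0                      # birth (arrival) rate
MUS = (1.0, 1.5, 3.0)          # admissible death (service) intensities
C2 = min(MUS)                  # drift constants chosen so the
C1 = LAM + C2                  # inequality holds for every action

def w(s):
    return s + 1.0             # Lyapunov (weight) function

def drift(s, mu):
    # generator applied to w: rates q(s, s+1) = LAM, q(s, s-1) = mu * s,
    # so the death rate is unbounded in the state s
    up = LAM * (w(s + 1) - w(s))
    down = (mu * s if s > 0 else 0.0) * (w(s - 1) - w(s))
    return up + down

ok = all(drift(s, mu) <= C1 - C2 * w(s) + 1e-9
         for s in range(2000) for mu in MUS)
```

Here drift(s, mu) = LAM - mu * s, and since mu >= C2 the bound LAM - C2 * s = C1 - C2 * w(s) holds for every state and action, which is what the check confirms.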