[Journal Article] AVERAGE OPTIMALITY FOR MARKOV DECISION PROCESSES IN BOREL SPACES: A NEW CONDITION AND APPROACH
XIANPING GUO (郭先平), Zhongshan University; QUANXIN ZHU, South China Normal University
J. Appl. Prob. 43, 318-334 (2006)
In this paper we study discrete-time Markov decision processes with Borel state and action spaces. The criterion is to minimize average expected costs, and the costs may have neither upper nor lower bounds. We first provide two average optimality inequalities of opposing directions and give conditions for the existence of solutions to them. Then, using the two inequalities, we ensure the existence of an average optimal (deterministic) stationary policy under additional continuity-compactness assumptions. Our conditions are slightly weaker than those in the previous literature. Also, some new sufficient conditions for the existence of an average optimal stationary policy are imposed on the primitive data of the model. Moreover, our approach is slightly different from the well-known 'optimality inequality approach' widely used in Markov decision processes. Finally, we illustrate our results in two examples.
Discrete-time Markov decision process, average expected criterion, average optimality inequality, optimal stationary policy
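The paper works in Borel state and action spaces with possibly unbounded costs; on a tiny finite MDP the average-cost criterion it treats can be illustrated by relative value iteration, which converges to a solution of the average optimality equation. This is a minimal sketch with invented costs and transition matrices, not the paper's model or conditions.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP (costs and transitions invented).
costs = np.array([[1.0, 3.0],   # costs[s, a] = c(s, a)
                  [2.0, 0.5]])
P = np.array([                  # P[a][s, s'] = transition matrix under action a
    [[0.9, 0.1], [0.4, 0.6]],
    [[0.2, 0.8], [0.7, 0.3]],
])

def relative_value_iteration(costs, P, n_iter=500):
    """Relative value iteration for the average-cost criterion.

    Iterates h <- min_a [c(s, a) + sum_s' P(s'|s, a) h(s')] and subtracts
    the value at a reference state; the subtracted constant converges to
    the optimal average cost g and h to the relative value function."""
    n_states, n_actions = costs.shape
    h = np.zeros(n_states)
    for _ in range(n_iter):
        Q = costs + np.stack([P[a] @ h for a in range(n_actions)], axis=1)
        Th = Q.min(axis=1)
        g = Th[0]          # value at reference state 0 estimates the average cost
        h = Th - g         # relative values
    policy = Q.argmin(axis=1)   # greedy (deterministic stationary) policy
    return g, h, policy

g, h, policy = relative_value_iteration(costs, P)
```

At convergence the pair (g, h) satisfies the average optimality equation g + h(s) = min_a [c(s, a) + Σ P(s'|s, a) h(s')], the equality version of the two opposing inequalities used in the paper.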
[Journal Article] NONZERO-SUM GAMES FOR CONTINUOUS-TIME MARKOV CHAINS WITH UNBOUNDED DISCOUNTED PAYOFFS
XIANPING GUO (郭先平), CINVESTAV and Zhongshan University; ONESIMO HERNANDEZ-LERMA
J. Appl. Prob. 42, 303-320 (2005)
In this paper, we study two-person nonzero-sum games for continuous-time Markov chains with discounted payoff criteria and Borel action spaces. The transition rates are possibly unbounded, and the payoff functions might have neither upper nor lower bounds. We give conditions that ensure the existence of Nash equilibria in stationary strategies. For the zero-sum case, we prove the existence of the value of the game, and also provide a recursive way to compute it, or at least to approximate it. Our results are applied to a controlled queueing system. We also show that if the transition rates are uniformly bounded, then a continuous-time game is equivalent, in a suitable sense, to a discrete-time Markov game.
Nonzero-sum game, discounted payoff criterion, Nash equilibrium, controlled Q-process
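The abstract's final claim — that uniformly bounded transition rates make a continuous-time game equivalent to a discrete-time Markov game — rests on uniformization. A minimal sketch of the transform, with an invented 3-state generator and discount rate (not the paper's construction in detail): P = I + Q/λ is a stochastic matrix, and the α-discounted continuous-time problem maps to a discrete-time one with discount factor β = λ/(α + λ).

```python
import numpy as np

# Hypothetical 3-state conservative generator with bounded transition rates.
Q = np.array([[-2.0, 1.5, 0.5],
              [ 1.0, -3.0, 2.0],
              [ 0.5, 0.5, -1.0]])
alpha = 0.1                       # continuous-time discount rate (assumed)
lam = max(-Q.diagonal())          # uniformization constant: bound on exit rates

# Uniformization: embed the chain at the jump times of a Poisson(lam) clock.
P = np.eye(3) + Q / lam           # stochastic transition matrix of the embedded chain
beta = lam / (alpha + lam)        # equivalent discrete-time discount factor
```

With payoff rates rescaled by 1/(α + λ), discounted values and equilibria of the discrete-time game with (P, β) coincide with those of the continuous-time game — which is why the unbounded-rate case treated in the paper needs a different argument.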
Xi-Ren Cao, Xianping Guo (郭先平)
Automatica 40 (2004), 1749-1759
We propose a unified framework for Markov decision problems and performance sensitivity analysis for multichain Markov processes with both discounted and average-cost performance criteria. With the fundamental concept of performance potentials, we derive both performance-gradient and performance-difference formulas, which play the central role in performance optimization. The standard policy iteration algorithms for both discounted- and average-reward MDPs can be established using the performance-difference formulas in a simple and intuitive way; and the performance-gradient formulas together with stochastic approximation may lead to new optimization schemes. This sensitivity-based point of view of performance optimization provides some insights that link perturbation analysis, Markov decision processes, and reinforcement learning together. The research is an extension of the previous work on ergodic Markov chains (Cao, Automatica 36 (2000) 771).
Policy iteration, Potentials, Perturbation analysis, Performance sensitivity, Reinforcement learning
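For the ergodic (unichain) special case, policy iteration driven by performance potentials can be sketched concretely: evaluate a policy by solving the Poisson equation via the potential vector g, then improve greedily with r + P g. All numbers below are invented for illustration; the paper's multichain treatment is more general.

```python
import numpy as np

# Hypothetical 2-state, 2-action average-reward MDP (numbers invented).
r = np.array([[1.0, 0.0],
              [0.0, 2.0]])            # r[s, a] = reward rate
P = np.array([
    [[0.8, 0.2], [0.3, 0.7]],         # P[a][s, s']
    [[0.5, 0.5], [0.1, 0.9]],
])

def stationary_dist(Pd):
    """Stationary distribution of an ergodic transition matrix (least squares)."""
    n = Pd.shape[0]
    A = np.vstack([Pd.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1); b[-1] = 1.0
    return np.linalg.lstsq(A, b, rcond=None)[0]

def policy_iteration_potentials(r, P, n_iter=50):
    """Policy iteration for average reward, driven by performance potentials.

    Evaluation: solve (I - P_d + 1 pi) g = r_d, so eta = pi @ r_d is the
    average reward and g the potential vector (Poisson equation
    g + eta = r_d + P_d g); improvement maximizes r(s, a) + P_a g pointwise."""
    n_states, n_actions = r.shape
    policy = np.zeros(n_states, dtype=int)
    for _ in range(n_iter):
        Pd = np.array([P[policy[s], s] for s in range(n_states)])
        rd = r[np.arange(n_states), policy]
        pi = stationary_dist(Pd)
        g = np.linalg.solve(np.eye(n_states) - Pd + np.outer(np.ones(n_states), pi), rd)
        Qv = r + np.stack([P[a] @ g for a in range(n_actions)], axis=1)
        new_policy = Qv.argmax(axis=1)
        if np.array_equal(new_policy, policy):
            break
        policy = new_policy
    eta = pi @ rd
    return eta, g, policy

eta, g, policy = policy_iteration_potentials(r, P)
```

The improvement step is exactly the performance-difference logic: the gap between two policies' average rewards is an expectation of the difference of their r + P g terms, so pointwise maximization cannot decrease performance.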
[Journal Article] Continuous-Time Controlled Markov Chains with Discounted Rewards
XIANPING GUO (郭先平) and ONESIMO HERNANDEZ-LERMA
Acta Applicandae Mathematicae 79: 195-216, 2003
This paper studies denumerable-state continuous-time controlled Markov chains with the discounted reward criterion and a Borel action space. The reward and transition rates are unbounded, and the reward rates are allowed to take positive or negative values. First, we present new conditions for a nonhomogeneous Q(t)-process to be regular. Then, using these conditions, we give a new set of mild hypotheses that ensure the existence of ε-optimal (ε ≥ 0) stationary policies. We also present a 'martingale characterization' of an optimal stationary policy. Our results are illustrated with controlled birth and death processes.
Continuous-time controlled Markov chains, unbounded reward and transition rates, discounted criterion, optimal stationary policies, martingale characterization
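The paper's illustrating example, a controlled birth-death process, can be sketched numerically once the rates are bounded and the state space truncated (both assumptions are mine, purely for illustration — the paper handles unbounded rates on a denumerable space). Uniformization plus value iteration then computes the discounted value: V(s) = max_a [r(s, a)/(α + λ) + β Σ P(s'|s, a) V(s')] with β = λ/(α + λ).

```python
import numpy as np

# Hypothetical controlled birth-death chain truncated to states 0..N.
N = 20
actions = [0.5, 1.5]                  # controllable death (service) rates
birth = 1.0                           # constant birth rate
alpha = 0.2                           # continuous-time discount rate
reward = lambda s, a: -s - 0.1 * a    # holding cost plus control cost (assumed)

lam = birth + max(actions)            # uniformization constant
beta = lam / (alpha + lam)            # equivalent discrete-time discount factor

def value_iteration(n_iter=2000):
    V = np.zeros(N + 1)
    for _ in range(n_iter):
        newV = np.empty_like(V)
        for s in range(N + 1):
            vals = []
            for a in actions:
                up = birth if s < N else 0.0     # truncation: no births at N
                down = a if s > 0 else 0.0
                stay = lam - up - down           # self-loop mass from uniformization
                ev = (up * V[min(s + 1, N)]
                      + down * V[max(s - 1, 0)]
                      + stay * V[s]) / lam
                vals.append(reward(s, a) / (alpha + lam) + beta * ev)
            newV[s] = max(vals)
        V = newV
    return V

V = value_iteration()
```

Because β < 1 the iteration is a contraction, so V converges geometrically to the unique discounted value on the truncated chain; a stationary policy that attains the maximum in each state is then optimal (ε-optimal once truncation error is accounted for).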
[Journal Article] CONTINUOUS-TIME CONTROLLED MARKOV CHAINS
XIANPING GUO (郭先平) and ONESIMO HERNANDEZ-LERMA
The Annals of Applied Probability 2003, Vol. 13, No. 1, 363-388
This paper studies continuous-time controlled Markov chains, that is, continuous-time Markov decision processes with a denumerable state space, with respect to the discounted cost criterion. The cost and transition rates are allowed to be unbounded and the action set is a Borel space. We first study control problems in the class of deterministic stationary policies and give very weak conditions under which the existence of ε-optimal (ε ≥ 0) policies is proved using the construction of a minimum Q-process. Then we further consider control problems in the class of randomized Markov policies for (1) regular and (2) nonregular Q-processes. To study case (1), we first present a new necessary and sufficient condition for a nonhomogeneous Q-process to be regular. This regularity condition, together with the extended generator of a nonhomogeneous Markov process, is used to prove the existence of ε-optimal stationary policies. Our results for case (1) are illustrated by a Schlögl model with a controlled diffusion. For case (2), we obtain a similar result using Kolmogorov's forward equation for the minimum Q-process, and we also present an example in which our assumptions are satisfied but those used in the previous literature fail to hold.
Nonhomogeneous continuous-time Markov chains, controlled Q-processes, unbounded cost and transition rates, discounted criterion, optimal stationary policies
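The regularity question the paper addresses only bites when rates are unbounded; the classical sufficient condition — a conservative generator (nonnegative off-diagonal entries, zero row sums) with uniformly bounded exit rates — guarantees regularity and is easy to check mechanically. A small sketch of that check, for finite matrices with invented entries; the paper's necessary-and-sufficient condition for nonhomogeneous, unbounded-rate Q-processes goes well beyond it.

```python
import numpy as np

def is_conservative_generator(Q, tol=1e-9):
    """Check the classical sufficient condition for regularity:
    nonnegative off-diagonal entries, rows summing to zero, finite
    (hence, for a fixed matrix, bounded) exit rates -q_ii."""
    Q = np.asarray(Q, dtype=float)
    off = Q - np.diag(np.diag(Q))
    if (off < -tol).any():                         # off-diagonal rates must be >= 0
        return False
    if not np.allclose(Q.sum(axis=1), 0.0, atol=tol):  # conservativeness
        return False
    return bool(np.isfinite(Q.diagonal()).all())

# A valid generator and one whose rows fail to sum to zero.
Q_good = np.array([[-2.0, 2.0], [1.0, -1.0]])
Q_bad = np.array([[-2.0, 1.0], [1.0, -1.0]])
```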