[Journal Article] AVERAGE OPTIMALITY FOR MARKOV DECISION PROCESSES IN BOREL SPACES: A NEW CONDITION AND APPROACH
XIANPING GUO (郭先平), Zhongshan University; QUANXIN ZHU, South China Normal University
J. Appl. Prob. 43, 318-334 (2006)
In this paper we study discrete-time Markov decision processes with Borel state and action spaces. The criterion is to minimize average expected costs, and the costs may have neither upper nor lower bounds. We first provide two average optimality inequalities of opposing directions and give conditions for the existence of solutions to them. Then, using the two inequalities, we ensure the existence of an average optimal (deterministic) stationary policy under additional continuity-compactness assumptions. Our conditions are slightly weaker than those in the previous literature. Also, some new sufficient conditions for the existence of an average optimal stationary policy are imposed on the primitive data of the model. Moreover, our approach is slightly different from the well-known 'optimality inequality approach' widely used in Markov decision processes. Finally, we illustrate our results in two examples.
Discrete-time Markov decision process, average expected criterion, average optimality inequality, optimal stationary policy
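The paper works in Borel state and action spaces with possibly unbounded costs; on a tiny finite MDP the average-cost criterion it treats can be illustrated by relative value iteration, which converges to a solution of the average optimality equation. This is a minimal sketch with invented costs and transition matrices, not the paper's model or conditions.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP (costs and transitions invented).
costs = np.array([[1.0, 3.0],   # costs[s, a] = c(s, a)
                  [2.0, 0.5]])
P = np.array([                  # P[a][s, s'] = transition matrix under action a
    [[0.9, 0.1], [0.4, 0.6]],
    [[0.2, 0.8], [0.7, 0.3]],
])

def relative_value_iteration(costs, P, n_iter=500):
    """Relative value iteration for the average-cost criterion.

    Iterates h <- min_a [c(s, a) + sum_s' P(s'|s, a) h(s')] and subtracts
    the value at a reference state; the subtracted constant converges to
    the optimal average cost g and h to the relative value function."""
    n_states, n_actions = costs.shape
    h = np.zeros(n_states)
    for _ in range(n_iter):
        Q = costs + np.stack([P[a] @ h for a in range(n_actions)], axis=1)
        Th = Q.min(axis=1)
        g = Th[0]          # value at reference state 0 estimates the average cost
        h = Th - g         # relative values
    policy = Q.argmin(axis=1)   # greedy (deterministic stationary) policy
    return g, h, policy

g, h, policy = relative_value_iteration(costs, P)
```

At convergence the pair (g, h) satisfies the average optimality equation g + h(s) = min_a [c(s, a) + Σ P(s'|s, a) h(s')], the equality version of the two opposing inequalities used in the paper.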
[Journal Article] NONZERO-SUM GAMES FOR CONTINUOUS-TIME MARKOV CHAINS WITH UNBOUNDED DISCOUNTED PAYOFFS
XIANPING GUO (郭先平), CINVESTAV and Zhongshan University; ONESIMO HERNANDEZ-LERMA
J. Appl. Prob. 42, 303-320 (2005)
In this paper, we study two-person nonzero-sum games for continuous-time Markov chains with discounted payoff criteria and Borel action spaces. The transition rates are possibly unbounded, and the payoff functions might have neither upper nor lower bounds. We give conditions that ensure the existence of Nash equilibria in stationary strategies. For the zero-sum case, we prove the existence of the value of the game, and also provide a recursive way to compute it, or at least to approximate it. Our results are applied to a controlled queueing system. We also show that if the transition rates are uniformly bounded, then a continuous-time game is equivalent, in a suitable sense, to a discrete-time Markov game.
Nonzero-sum game, discounted payoff criterion, Nash equilibrium, controlled Q-process
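The abstract's final claim — that uniformly bounded transition rates make a continuous-time game equivalent to a discrete-time Markov game — rests on uniformization. A minimal sketch of the transform, with an invented 3-state generator and discount rate (not the paper's construction in detail): P = I + Q/λ is a stochastic matrix, and the α-discounted continuous-time problem maps to a discrete-time one with discount factor β = λ/(α + λ).

```python
import numpy as np

# Hypothetical 3-state conservative generator with bounded transition rates.
Q = np.array([[-2.0, 1.5, 0.5],
              [ 1.0, -3.0, 2.0],
              [ 0.5, 0.5, -1.0]])
alpha = 0.1                       # continuous-time discount rate (assumed)
lam = max(-Q.diagonal())          # uniformization constant: bound on exit rates

# Uniformization: embed the chain at the jump times of a Poisson(lam) clock.
P = np.eye(3) + Q / lam           # stochastic transition matrix of the embedded chain
beta = lam / (alpha + lam)        # equivalent discrete-time discount factor
```

With payoff rates rescaled by 1/(α + λ), discounted values and equilibria of the discrete-time game with (P, β) coincide with those of the continuous-time game — which is why the unbounded-rate case treated in the paper needs a different argument.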
Xi-Ren Cao, Xianping Guo (郭先平)
Automatica 40 (2004), 1749-1759
We propose a unified framework for Markov decision problems and performance sensitivity analysis for multichain Markov processes with both discounted and average-cost performance criteria. With the fundamental concept of performance potentials, we derive both performance-gradient and performance-difference formulas, which play the central role in performance optimization. The standard policy iteration algorithms for both discounted- and average-reward MDPs can be established using the performance-difference formulas in a simple and intuitive way; and the performance-gradient formulas together with stochastic approximation may lead to new optimization schemes. This sensitivity-based point of view of performance optimization provides some insights that link perturbation analysis, Markov decision processes, and reinforcement learning together. The research is an extension of the previous work on ergodic Markov chains (Cao, Automatica 36 (2000) 771).
Policy iteration, Potentials, Perturbation analysis, Performance sensitivity, Reinforcement learning
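For the ergodic (unichain) special case, policy iteration driven by performance potentials can be sketched concretely: evaluate a policy by solving the Poisson equation via the potential vector g, then improve greedily with r + P g. All numbers below are invented for illustration; the paper's multichain treatment is more general.

```python
import numpy as np

# Hypothetical 2-state, 2-action average-reward MDP (numbers invented).
r = np.array([[1.0, 0.0],
              [0.0, 2.0]])            # r[s, a] = reward rate
P = np.array([
    [[0.8, 0.2], [0.3, 0.7]],         # P[a][s, s']
    [[0.5, 0.5], [0.1, 0.9]],
])

def stationary_dist(Pd):
    """Stationary distribution of an ergodic transition matrix (least squares)."""
    n = Pd.shape[0]
    A = np.vstack([Pd.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1); b[-1] = 1.0
    return np.linalg.lstsq(A, b, rcond=None)[0]

def policy_iteration_potentials(r, P, n_iter=50):
    """Policy iteration for average reward, driven by performance potentials.

    Evaluation: solve (I - P_d + 1 pi) g = r_d, so eta = pi @ r_d is the
    average reward and g the potential vector (Poisson equation
    g + eta = r_d + P_d g); improvement maximizes r(s, a) + P_a g pointwise."""
    n_states, n_actions = r.shape
    policy = np.zeros(n_states, dtype=int)
    for _ in range(n_iter):
        Pd = np.array([P[policy[s], s] for s in range(n_states)])
        rd = r[np.arange(n_states), policy]
        pi = stationary_dist(Pd)
        g = np.linalg.solve(np.eye(n_states) - Pd + np.outer(np.ones(n_states), pi), rd)
        Qv = r + np.stack([P[a] @ g for a in range(n_actions)], axis=1)
        new_policy = Qv.argmax(axis=1)
        if np.array_equal(new_policy, policy):
            break
        policy = new_policy
    eta = pi @ rd
    return eta, g, policy

eta, g, policy = policy_iteration_potentials(r, P)
```

The improvement step is exactly the performance-difference logic: the gap between two policies' average rewards is an expectation of the difference of their r + P g terms, so pointwise maximization cannot decrease performance.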
[Journal Article] Continuous-Time Controlled Markov Chains with Discounted Rewards
XIANPING GUO (郭先平) and ONESIMO HERNANDEZ-LERMA
Acta Applicandae Mathematicae 79: 195-216, 2003
This paper studies denumerable-state continuous-time controlled Markov chains with the discounted reward criterion and a Borel action space. The reward and transition rates are unbounded, and the reward rates are allowed to take positive or negative values. First, we present new conditions for a nonhomogeneous Q(t)-process to be regular. Then, using these conditions, we give a new set of mild hypotheses that ensure the existence of ε-optimal (ε ≥ 0) stationary policies. We also present a 'martingale characterization' of an optimal stationary policy. Our results are illustrated with controlled birth and death processes.
Continuous-time controlled Markov chains, unbounded reward and transition rates, discounted criterion, optimal stationary policies, martingale characterization
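The paper's illustrating example, a controlled birth-death process, can be sketched numerically once the rates are bounded and the state space truncated (both assumptions are mine, purely for illustration — the paper handles unbounded rates on a denumerable space). Uniformization plus value iteration then computes the discounted value: V(s) = max_a [r(s, a)/(α + λ) + β Σ P(s'|s, a) V(s')] with β = λ/(α + λ).

```python
import numpy as np

# Hypothetical controlled birth-death chain truncated to states 0..N.
N = 20
actions = [0.5, 1.5]                  # controllable death (service) rates
birth = 1.0                           # constant birth rate
alpha = 0.2                           # continuous-time discount rate
reward = lambda s, a: -s - 0.1 * a    # holding cost plus control cost (assumed)

lam = birth + max(actions)            # uniformization constant
beta = lam / (alpha + lam)            # equivalent discrete-time discount factor

def value_iteration(n_iter=2000):
    V = np.zeros(N + 1)
    for _ in range(n_iter):
        newV = np.empty_like(V)
        for s in range(N + 1):
            vals = []
            for a in actions:
                up = birth if s < N else 0.0     # truncation: no births at N
                down = a if s > 0 else 0.0
                stay = lam - up - down           # self-loop mass from uniformization
                ev = (up * V[min(s + 1, N)]
                      + down * V[max(s - 1, 0)]
                      + stay * V[s]) / lam
                vals.append(reward(s, a) / (alpha + lam) + beta * ev)
            newV[s] = max(vals)
        V = newV
    return V

V = value_iteration()
```

Because β < 1 the iteration is a contraction, so V converges geometrically to the unique discounted value on the truncated chain; a stationary policy that attains the maximum in each state is then optimal (ε-optimal once truncation error is accounted for).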
[Journal Article] CONTINUOUS-TIME CONTROLLED MARKOV CHAINS
XIANPING GUO (郭先平) and ONESIMO HERNANDEZ-LERMA
The Annals of Applied Probability 2003, Vol. 13, No. 1, 363-388
This paper studies continuous-time controlled Markov chains, that is, continuous-time Markov decision processes with a denumerable state space, with respect to the discounted cost criterion. The cost and transition rates are allowed to be unbounded and the action set is a Borel space. We first study control problems in the class of deterministic stationary policies and give very weak conditions under which the existence of ε-optimal (ε ≥ 0) policies is proved using the construction of a minimum Q-process. Then we further consider control problems in the class of randomized Markov policies for (1) regular and (2) nonregular Q-processes. To study case (1), we first present a new necessary and sufficient condition for a nonhomogeneous Q-process to be regular. This regularity condition, together with the extended generator of a nonhomogeneous Markov process, is used to prove the existence of ε-optimal stationary policies. Our results for case (1) are illustrated by a Schlögl model with a controlled diffusion. For case (2), we obtain a similar result using Kolmogorov's forward equation for the minimum Q-process, and we also present an example in which our assumptions are satisfied but those used in the previous literature fail to hold.
Nonhomogeneous continuous-time Markov chains, controlled Q-processes, unbounded cost and transition rates, discounted criterion, optimal stationary policies
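The regularity question the paper addresses only bites when rates are unbounded; the classical sufficient condition — a conservative generator (nonnegative off-diagonal entries, zero row sums) with uniformly bounded exit rates — guarantees regularity and is easy to check mechanically. A small sketch of that check, for finite matrices with invented entries; the paper's necessary-and-sufficient condition for nonhomogeneous, unbounded-rate Q-processes goes well beyond it.

```python
import numpy as np

def is_conservative_generator(Q, tol=1e-9):
    """Check the classical sufficient condition for regularity:
    nonnegative off-diagonal entries, rows summing to zero, finite
    (hence, for a fixed matrix, bounded) exit rates -q_ii."""
    Q = np.asarray(Q, dtype=float)
    off = Q - np.diag(np.diag(Q))
    if (off < -tol).any():                         # off-diagonal rates must be >= 0
        return False
    if not np.allclose(Q.sum(axis=1), 0.0, atol=tol):  # conservativeness
        return False
    return bool(np.isfinite(Q.diagonal()).all())

# A valid generator and one whose rows fail to sum to zero.
Q_good = np.array([[-2.0, 2.0], [1.0, -1.0]])
Q_bad = np.array([[-2.0, 1.0], [1.0, -1.0]])
```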