Number of the records: 1
1. Risk-Sensitive Optimality in Markov Games
SYSNO: ASEP 0480036
Document Type: C - Proceedings Paper (int. conf.)
R&D Document Type: Conference Paper
Title: Risk-Sensitive Optimality in Markov Games
Author(s): Sladký, Karel (UTIA-B); Martínez Cortés, V. M. (MX)
Number of authors: 2
Source Title: Proceedings of the 35th International Conference Mathematical Methods in Economics (MME 2017). Hradec Králové: University of Hradec Králové, 2017. ISBN 978-80-7435-678-0
Pages: 684-689
Number of pages: 6
Publication form: Online - E
Action: MME 2017. International Conference Mathematical Methods in Economics /35./
Event date: 13.09.2017 - 15.09.2017
Event location: Hradec Králové
Country: CZ - Czech Republic
Event type: EUR
Language: eng - English
Country: CZ - Czech Republic
Keywords: two-person Markov games; communicating Markov chains; risk-sensitive optimality; dynamic programming
Subject RIV: AH - Economics
OECD category: Applied Economics, Econometrics
R&D Projects: GA13-14445S GA ČR - Czech Science Foundation (CSF)
Institutional support: UTIA-B - RVO:67985556
UT WOS: 000427151400117
Annotation: The article is devoted to risk-sensitive optimality in Markov games. Attention is focused on Markov games evolving on communicating Markov chains with two players with opposite aims. Considering risk-sensitive optimality criteria means that the total reward generated by the game is evaluated by an exponential utility function with a given risk-sensitivity coefficient. In particular, the first player (resp. the second player) tries to maximize (resp. minimize) the long-run risk-sensitive average reward. Observe that if the second player is a dummy, the problem reduces to finding an optimal policy of a Markov decision chain under risk-sensitive optimality. Recall that for a risk-sensitivity coefficient equal to zero we arrive at the traditional optimality criteria. In this article, connections between risk-sensitive and risk-neutral Markov decision chains and Markov game models are studied using discrepancy functions. Explicit formulae for bounds on the risk-sensitive long-run average reward are reported. A policy iteration algorithm for finding suboptimal policies of both players is suggested. The obtained results are illustrated on a numerical example.
Workplace: Institute of Information Theory and Automation
Contact: Markéta Votavová, votavova@utia.cas.cz, Tel.: 266 052 201
Year of Publishing: 2018
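The annotation's single-controller special case (second player dummy) can be illustrated numerically. The following is a minimal sketch, not taken from the paper: it uses the classical Howard-Matheson fact that for an irreducible Markov reward chain with transition matrix P and one-step rewards R, the long-run risk-sensitive average reward for risk coefficient lam is (1/lam) * log of the spectral radius of the matrix with entries p_ij * exp(lam * r_ij), which tends to the risk-neutral average reward as lam tends to 0. The two-state data (P, R) are hypothetical, chosen only for illustration.

```python
import numpy as np

def risk_sensitive_average_reward(P, R, lam):
    """Long-run risk-sensitive average reward for risk coefficient lam != 0.

    Uses the spectral radius of Q(lam), where Q(lam)_ij = p_ij * exp(lam * r_ij).
    """
    Q = P * np.exp(lam * R)                 # elementwise weighting of transitions
    rho = max(abs(np.linalg.eigvals(Q)))    # Perron (spectral) radius
    return float(np.log(rho) / lam)

def risk_neutral_average_reward(P, R):
    """The lam -> 0 limit: stationary distribution times expected one-step reward."""
    n = P.shape[0]
    # solve pi P = pi together with sum(pi) = 1
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi = np.linalg.lstsq(A, b, rcond=None)[0]
    return float(pi @ (P * R).sum(axis=1))

# hypothetical two-state example
P = np.array([[0.6, 0.4],
              [0.3, 0.7]])
R = np.array([[1.0, 2.0],
              [0.0, 3.0]])

g0 = risk_neutral_average_reward(P, R)
for lam in (1e-6, 0.5, -0.5):
    print(f"lam = {lam:+.2g}: {risk_sensitive_average_reward(P, R, lam):.4f}")
print(f"risk-neutral: {g0:.4f}")
```

By Jensen's inequality, the risk-sensitive value is above the risk-neutral one for positive lam (risk-seeking) and below it for negative lam (risk-averse), and the lam near zero case recovers the traditional average-reward criterion mentioned in the annotation.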
