Number of records: 1
Risk-Sensitive Optimality in Markov Games
SYSNO ASEP: 0480036
ASEP document type: C - Conference paper (international conference)
RIV category: D - Article in proceedings
Title: Risk-Sensitive Optimality in Markov Games
Author(s): Sladký, Karel (UTIA-B); Martínez Cortés, V. M. (MX)
Total number of authors: 2
Source document: Proceedings of the 35th International Conference Mathematical Methods in Economics (MME 2017). - Hradec Králové : University of Hradec Králové, 2017 - ISBN 978-80-7435-678-0
Page range: pp. 684-689
Number of pages: 6
Publication form: Online - E
Event: MME 2017. International Conference Mathematical Methods in Economics /35./
Event date: 13.09.2017 - 15.09.2017
Event location: Hradec Králové
Country: CZ - Czech Republic
Event type: EUR
Document language: eng - English
Country of publication: CZ - Czech Republic
Keywords: two-person Markov games ; communicating Markov chains ; risk-sensitive optimality ; dynamic programming
RIV subject field: AH - Economics
OECD field: Applied Economics, Econometrics
CEP: GA13-14445S, GA ČR - Grantová agentura ČR (Czech Science Foundation)
Institutional support: UTIA-B - RVO:67985556
UT WOS: 000427151400117
Abstract: The article is devoted to risk-sensitive optimality in Markov games. Attention is focused on Markov games evolving on communicating Markov chains with two players with opposite aims. Considering risk-sensitive optimality criteria means that the total reward generated by the game is evaluated by an exponential utility function with a given risk-sensitivity coefficient. In particular, the first player (resp. the second player) tries to maximize (resp. minimize) the long-run risk-sensitive average reward. Observe that if the second player is a dummy, the problem reduces to finding an optimal policy of a Markov decision chain under risk-sensitive optimality. Recall that for the risk-sensitivity coefficient equal to zero we arrive at the traditional optimality criteria. In this article, connections between risk-sensitive and risk-neutral Markov decision chains and Markov game models are studied using discrepancy functions. Explicit formulae for bounds on the risk-sensitive long-run average reward are reported. A policy iteration algorithm for finding suboptimal policies of both players is suggested. The obtained results are illustrated on a numerical example.
Institute: Ústav teorie informace a automatizace (Institute of Information Theory and Automation)
Contact: Markéta Votavová, votavova@utia.cas.cz, Tel.: 266 052 201.
Year of collection: 2018