Number of the records: 1
Balancing Exploitation and Exploration via Fully Probabilistic Design of Decision Policies
SYSNO ASEP: 0503817
Document Type: C - Proceedings Paper (int. conf.)
R&D Document Type: Conference Paper
Title: Balancing Exploitation and Exploration via Fully Probabilistic Design of Decision Policies
Author(s): Kárný, Miroslav (UTIA-B); Hůla, František (UTIA-B)
Number of authors: 2
Source Title: Proceedings of the 11th International Conference on Agents and Artificial Intelligence, 2. - Setúbal : SciTePress, 2019 / Rocha A. ; Steels L. ; van den Herik J.
ISBN: 978-989-758-350-6
Pages: pp. 857-864
Number of pages: 8
Publication form: Print - P
Event: International Conference on Agents and Artificial Intelligence
Event date: 19.02.2019 - 21.02.2019
Event location: Praha
Country (event): CZ - Czech Republic
Event type: WRD
Language: eng - English
Country (publication): PT - Portugal
Keywords: exploitation ; exploration ; adaptive systems ; Bayesian estimation ; fully probabilistic design ; Markov decision process
Subject RIV: BC - Control Systems Theory
OECD category: Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
R&D Projects: GA16-09848S GA ČR - Czech Science Foundation (CSF); GA18-15970S GA ČR - Czech Science Foundation (CSF)
Institutional support: UTIA-B - RVO:67985556
EID SCOPUS: 85064837601
DOI: 10.5220/0007587208570864
Annotation: Adaptive decision making learns an environment model that serves the design of a decision policy. The actions generated by the policy influence both the acquired reward and future knowledge. The optimal policy properly balances exploitation with exploration. However, the curse of dimensionality inherent in decision making under incomplete knowledge prevents the optimal design from being realised. This has stimulated repeated attempts to reach this balance at least approximately. Usually, either (a) the exploitative reward is enriched by a term reflecting the quality of exploration and a feasible, approximate certainty-equivalent design is made, or (b) explorative random noise is added to purely exploitative actions. This paper avoids the inauspicious option (a) and improves (b) by employing the non-standard fully probabilistic design (FPD) of decision policies, which naturally generates random actions. Monte-Carlo experiments confirm the quality it achieves. This quality stems from methodological contributions, which include (i) an improved relation between FPD and standard Markov decision processes and (ii) a design of adaptive tuning of an FPD parameter. The latter is also suitable for tuning the temperature in both simulated annealing and the Boltzmann machine.
Workplace: Institute of Information Theory and Automation
Contact: Markéta Votavová, votavova@utia.cas.cz, Tel.: 266 052 201
Year of Publishing: 2020
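The annotation contrasts option (b), random noise added to exploitative actions, with a policy that generates random actions by itself and is governed by a tunable parameter akin to a temperature. The minimal sketch below illustrates that contrast under stated assumptions. It uses ε-greedy selection as a generic instance of (b) and a Boltzmann (softmax) policy as a generic temperature-controlled randomized policy. It is NOT the FPD algorithm of the paper, and it does not implement the paper's adaptive tuning. The function names, the action-value list and the numbers are hypothetical.

```python
import math
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Generic option (b): take the best action, but with probability
    epsilon replace it by a uniformly random action (explorative noise)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def boltzmann_probs(q_values, temperature):
    """Generic randomized policy: p(a) proportional to exp(q(a) / temperature).
    Low temperature -> nearly greedy (exploitation);
    high temperature -> nearly uniform (exploration)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    m = max(q_values)  # subtract the maximum to avoid overflow in exp
    weights = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def boltzmann_action(q_values, temperature, rng):
    """Sample one action index from the Boltzmann probabilities."""
    probs = boltzmann_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    q = [1.0, 2.0, 0.5]  # hypothetical estimated rewards of three actions
    for t in (0.1, 1.0, 10.0):
        print(t, [round(p, 3) for p in boltzmann_probs(q, t)])
    print("epsilon-greedy pick:", epsilon_greedy(q, 0.1, rng))
    print("Boltzmann pick:", boltzmann_action(q, 1.0, rng))
```

With ε-greedy, exploration is switched on or off independently of how good the alternative actions look. The softmax policy randomizes in proportion to the estimated quality of each action, and a single parameter moves it between exploitation and exploration. That single parameter is why its tuning matters, and the annotation's contribution (ii) concerns the tuning of an analogous FPD parameter.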