Number of the records: 1  

Balancing Exploitation and Exploration via Fully Probabilistic Design of Decision Policies

    SYSNO ASEP: 0503817
    Document Type: C - Proceedings Paper (int. conf.)
    R&D Document Type: Conference Paper
    Title: Balancing Exploitation and Exploration via Fully Probabilistic Design of Decision Policies
    Author(s): Kárný, Miroslav (UTIA-B) ; Hůla, František (UTIA-B)
    Number of authors: 2
    Source Title: Proceedings of the 11th International Conference on Agents and Artificial Intelligence, 2. - Setúbal : SciTePress, 2019 / Rocha A. ; Steels L. ; van den Herik J. - ISBN 978-989-758-350-6
    Pages: pp. 857-864
    Number of pages: 8
    Publication form: Print - P
    Action: International Conference on Agents and Artificial Intelligence
    Event date: 19.02.2019 - 21.02.2019
    Event location: Praha
    Country: CZ - Czech Republic
    Event type: WRD
    Language: eng - English
    Country: PT - Portugal
    Keywords: exploitation ; exploration ; adaptive systems ; Bayesian estimation ; fully probabilistic design ; Markov decision process
    Subject RIV: BC - Control Systems Theory
    OECD category: Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
    R&D Projects: GA16-09848S GA ČR - Czech Science Foundation (CSF) ; GA18-15970S GA ČR - Czech Science Foundation (CSF)
    Institutional support: UTIA-B - RVO:67985556
    EID SCOPUS: 85064837601
    DOI: 10.5220/0007587208570864
    Annotation: Adaptive decision making learns an environment model that serves the design of a decision policy. The policy-generated actions influence both the acquired reward and the future knowledge. The optimal policy properly balances exploitation with exploration. The inherent curse of dimensionality of decision making under incomplete knowledge prevents the realisation of the optimal design. This has stimulated repeated attempts to reach this balance at least approximately. Usually, either (a) the exploitative reward is enriched by a part reflecting the exploration quality and a feasible approximate certainty-equivalent design is made, or (b) an explorative random noise is added to the purely exploitative actions. This paper avoids the inauspicious (a) and improves (b) by employing the non-standard fully probabilistic design (FPD) of decision policies, which naturally generates random actions. Monte Carlo experiments confirm its achieved quality. The quality stems from methodological contributions, which include (i) an improvement of the relation between FPD and standard Markov decision processes and (ii) a design of an adaptive tuning of an FPD parameter. The latter is also suitable for tuning the temperature in both simulated annealing and the Boltzmann machine.
    Workplace: Institute of Information Theory and Automation
    Contact: Markéta Votavová, votavova@utia.cas.cz, Tel.: 266 052 201.
    Year of Publishing: 2020
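
The annotation mentions randomised actions and a temperature parameter as used in simulated annealing and the Boltzmann machine. As a generic illustration only, and not the FPD algorithm of the paper, the following minimal Python sketch shows how a temperature can trade off exploitation against exploration in a randomised (Boltzmann, or softmax) action choice. The function names `boltzmann_policy` and `sample_action` and the example action values are hypothetical, introduced here for illustration.

```python
import math
import random


def boltzmann_policy(action_values, temperature):
    """Return action probabilities proportional to exp(value / temperature).

    Generic softmax rule (an illustrative assumption, not the paper's FPD
    policy): a low temperature favours the best-valued action (exploitation),
    a high temperature approaches a uniform choice (exploration).
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(action_values)
    weights = [math.exp((v - top) / temperature) for v in action_values]
    total = sum(weights)
    return [w / total for w in weights]


def sample_action(action_values, temperature, rng=random):
    """Draw one action index at random according to the Boltzmann policy."""
    probs = boltzmann_policy(action_values, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    values = [1.0, 2.0, 0.5]  # hypothetical estimated rewards of three actions
    for temperature in (0.1, 1.0, 10.0):
        print(temperature, boltzmann_policy(values, temperature))
```

In this sketch the temperature is fixed by hand. The annotation states that the paper instead designs an adaptive tuning of the corresponding FPD parameter, which is not reproduced here.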
