License: Creative Commons Attribution 4.0 International license (CC BY 4.0)
When quoting this document, please refer to the following
DOI: 10.4230/DagSemProc.06051.8
URN: urn:nbn:de:0030-drops-6372
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2006/637/
Ryabko, Daniil; Hutter, Marcus
Learning in Reactive Environments with Arbitrary Dependence
Abstract
In reinforcement learning, the task for an agent is to attain the best possible
asymptotic reward when the true generating environment is unknown but belongs to a
known countable family of environments.
This task generalises the sequence prediction problem, in which
the environment does not react to the behaviour of the agent.
Solomonoff induction solves the sequence prediction problem
for any countable class of measures; however, it is easy to see
that such a result is impossible for reinforcement learning: not every
countable class of environments can be learnt.
We find sufficient conditions
on the class of environments under
which there exists an agent that attains the best asymptotic reward
for any environment in the class. We analyse how tight these conditions are and how they
relate to different probabilistic assumptions known in
reinforcement learning and related fields, such as Markov
decision processes and mixing conditions.
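As a hedged sketch of the central notions named in the keywords (the notation below is illustrative and may differ from the paper's), the asymptotic average value of a policy and the self-optimizing property can be written as follows, where r_i denotes the reward received at step i:

% Illustrative definitions; notation is a sketch, not the paper's exact formalism.
% Asymptotic average value of policy \pi in environment \mu:
\[
  \bar{V}(\mu,\pi) \;:=\; \liminf_{n\to\infty} \frac{1}{n}\,
  \mathbf{E}^{\pi}_{\mu}\!\left[\sum_{i=1}^{n} r_i\right].
\]
% A policy \pi is self-optimizing for a class \mathcal{C} of environments if
% it attains the best achievable asymptotic average value in every member:
\[
  \bar{V}(\mu,\pi) \;=\; \sup_{\pi'} \bar{V}(\mu,\pi')
  \quad \text{for all } \mu \in \mathcal{C}.
\]

In these terms, the paper asks for conditions on \mathcal{C} under which a single self-optimizing policy exists; the contrast with sequence prediction is that Solomonoff induction provides such a universal solution for any countable class of measures, while no policy can be self-optimizing for every countable class of reactive environments.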
BibTeX - Entry
@InProceedings{ryabko_et_al:DagSemProc.06051.8,
author = {Ryabko, Daniil and Hutter, Marcus},
title = {{Learning in Reactive Environments with Arbitrary Dependence}},
booktitle = {Kolmogorov Complexity and Applications},
pages = {1--15},
series = {Dagstuhl Seminar Proceedings (DagSemProc)},
ISSN = {1862-4405},
year = {2006},
volume = {6051},
editor = {Marcus Hutter and Wolfgang Merkle and Paul M.B. Vitanyi},
publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/opus/volltexte/2006/637},
URN = {urn:nbn:de:0030-drops-6372},
doi = {10.4230/DagSemProc.06051.8},
annote = {Keywords: Reinforcement learning, asymptotic average value, self-optimizing policies, (non) Markov decision processes}
}
Keywords: Reinforcement learning, asymptotic average value, self-optimizing policies, (non) Markov decision processes
Collection: 06051 - Kolmogorov Complexity and Applications
Issue Date: 2006
Date of publication: 31.07.2006