Reinforcement Learning in the Wild with Maximum Likelihood-based Model Transfer

Hannes Eriksson; Tommy Tram; Debabrota Basu; Mina Alibeigi; Christos Dimitrakakis

doi:10.5555/3635637.3662902

Conference paper, Year: 2024

Hannes Eriksson

Role: Author

Chalmers University of Technology [Gothenburg, Sweden]

Zenseact AB

Tommy Tram

Role: Author

Zenseact AB

Chalmers University of Technology [Göteborg]

Debabrota Basu

Role: Author
PersonId : 742129
IdHAL : debabrota-basu

Scool

Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189

Université de Lille

Mina Alibeigi

Role: Author

Zenseact AB

Christos Dimitrakakis

Role: Author

Université de Neuchâtel = University of Neuchatel

University of Oslo

Abstract

In this paper, we study the problem of transferring available Markov Decision Process (MDP) models to learn and plan efficiently in an unknown but similar MDP. We refer to this as the Model Transfer Reinforcement Learning (MTRL) problem. First, we formulate MTRL for discrete MDPs and for Linear Quadratic Regulators (LQRs) with continuous states and actions. Then, we propose a generic two-stage algorithm, MLEMTRL, to address the MTRL problem in both discrete and continuous settings. In the first stage, MLEMTRL uses a constrained Maximum Likelihood Estimation (MLE)-based approach to estimate the target MDP model from a set of known MDP models. In the second stage, MLEMTRL uses the estimated target MDP model to run a model-based planning algorithm appropriate for the MDP class. Theoretically, we prove worst-case regret bounds for MLEMTRL in both realisable and non-realisable settings. Empirically, we demonstrate that MLEMTRL learns faster in new MDPs than learning from scratch, and that it achieves near-optimal performance depending on how similar the available MDPs are to the target MDP.
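The two-stage idea in the abstract can be illustrated with a minimal sketch for the discrete case. This is not the authors' MLEMTRL implementation: the abstract does not state the form of the constraint, so the sketch assumes the target transition model is a convex combination of the known source models, fits the mixing weights with a plain EM loop (a swapped-in technique), and plans with value iteration. All function names, the cyclic toy models, and the reward are hypothetical.

```python
import numpy as np


def make_cyclic_model(n_states, n_actions, shift, p=0.8):
    """Hypothetical toy MDP: from state s, move to (s + shift) % n_states w.p. p."""
    P = np.full((n_states, n_actions, n_states), (1.0 - p) / (n_states - 1))
    for s in range(n_states):
        P[s, :, (s + shift) % n_states] = p
    return P


def fit_mixture_weights(sources, transitions, n_iters=200):
    """Stage 1 (assumed form): MLE of simplex weights over fixed source models.

    sources: array of shape (K, S, A, S); transitions: list of (s, a, s_next).
    """
    s, a, s_next = np.asarray(transitions).T
    # Likelihood of each observed transition under each source model, shape (T, K).
    lik = sources[:, s, a, s_next].T + 1e-12
    w = np.full(sources.shape[0], 1.0 / sources.shape[0])
    for _ in range(n_iters):
        resp = lik * w                          # E-step: unnormalised responsibilities
        resp /= resp.sum(axis=1, keepdims=True)
        w = resp.mean(axis=0)                   # M-step: weights stay on the simplex
    return w


def value_iteration(P, R, gamma=0.95, n_iters=500):
    """Stage 2: plan in the estimated model. P: (S, A, S), R: (S, A)."""
    V = np.zeros(P.shape[0])
    for _ in range(n_iters):
        Q = R + gamma * (P @ V)                 # Bellman backup, shape (S, A)
        V = Q.max(axis=1)
    return V, Q.argmax(axis=1)


# Toy demo: two known source models; the unknown target equals the first one.
rng = np.random.default_rng(0)
S, A = 3, 2
sources = np.stack([make_cyclic_model(S, A, 1), make_cyclic_model(S, A, 2)])
target = sources[0]

# Collect transitions from the target under uniformly random states and actions.
transitions = []
for _ in range(500):
    s, a = rng.integers(S), rng.integers(A)
    transitions.append((s, a, rng.choice(S, p=target[s, a])))

weights = fit_mixture_weights(sources, transitions)
P_hat = np.tensordot(weights, sources, axes=1)  # estimated target model, (S, A, S)

R = np.zeros((S, A))
R[S - 1, :] = 1.0                               # hypothetical reward in the last state
values, policy = value_iteration(P_hat, R)
```

In this toy setting the target lies inside the assumed model class, which loosely corresponds to the realisable case mentioned in the abstract; when it does not, the fitted mixture is only the closest model in the convex hull in the likelihood sense.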

Keywords

Reinforcement Learning (RL); Markov decision process (MDP); Maximum likelihood estimation; Transfer learning (TL); Realizability; Linear quadratic regulator (LQR)

Domains

Machine Learning [cs.LG]; Artificial Intelligence [cs.AI]; Operations Research [math.OC]; Systems and Control [cs.SY]

Contributor: Debabrota Basu

https://hal.science/hal-04260795

Submitted on: Thursday, October 26, 2023, 16:24:22

Last modified on: Wednesday, November 20, 2024, 11:52:18

Dates and versions

hal-04260795, version 1 (26-10-2023)

License

Attribution - NonCommercial

Identifiers

HAL Id: hal-04260795, version 1
arXiv: 2302.09273
DOI: 10.5555/3635637.3662902

Cite

Hannes Eriksson, Tommy Tram, Debabrota Basu, Mina Alibeigi, Christos Dimitrakakis. Reinforcement Learning in the Wild with Maximum Likelihood-based Model Transfer. 23rd International Conference on Autonomous Agents and Multiagent Systems (AAMAS), May 2024, Auckland, New Zealand. pp.516-524, ⟨10.5555/3635637.3662902⟩. ⟨hal-04260795⟩

Collections

CNRS INRIA CRISTAL INRIA2 TDS-MACS UNIV-LILLE CRISTAL-SCOOL ANR PEPR_IA
