Dynamical-VAE-based Hindsight to Learn the Causal Dynamics of Factored-POMDPs

Chao Han; Debabrota Basu; Michael Mangan; Eleni Vasilaki; Aditya Gilra

Preprint, working paper. Year: 2024


Authors and affiliations

Chao Han (1): University Hospital LMU Munich

Debabrota Basu (2, 3, 4, 5, 6; HAL PersonId: 742129; IdHAL: debabrota-basu): Scool; Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189; Inria Lille - Nord Europe; Université de Lille; Centrale Lille

Michael Mangan (7): University of Sheffield

Eleni Vasilaki (7): University of Sheffield

Aditya Gilra (8): Centrum Wiskunde & Informatica

Abstract

Learning representations of underlying environmental dynamics from partial observations is a critical challenge in machine learning. In the context of Partially Observable Markov Decision Processes (POMDPs), state representations are often inferred from the history of past observations and actions. We demonstrate that incorporating future information is essential to accurately capture causal dynamics and enhance state representations. To address this, we introduce a Dynamical Variational Auto-Encoder (DVAE) designed to learn causal Markovian dynamics from offline trajectories in a POMDP. Our method employs an extended hindsight framework that integrates past, current, and multi-step future information within a factored-POMDP setting. Empirical results reveal that this approach uncovers the causal graph governing hidden state transitions more effectively than history-based and typical hindsight-based models.
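The abstract contrasts three kinds of input for inferring the hidden state at step t: history only, typical hindsight, and an extended hindsight window that also covers several future steps. The sketch below shows only how such a context window could be sliced from one offline trajectory. It is not the authors' DVAE. Everything in it is an illustrative assumption: the helper `build_context`, the `Context` container, and the convention that action a_t is taken after observing o_t. It also assumes that `n_future=0` corresponds to history-based inference, `n_future=1` to one-step hindsight, and `n_future=k > 1` to multi-step hindsight.

```python
from dataclasses import dataclass
from typing import Any, List, Sequence


@dataclass
class Context:
    """Inputs a (hypothetical) inference network would receive at step t."""
    past_obs: List[Any]
    past_acts: List[Any]
    current_obs: Any
    future_obs: List[Any]
    future_acts: List[Any]


def build_context(obs: Sequence[Any], acts: Sequence[Any],
                  t: int, n_past: int, n_future: int) -> Context:
    """Slice a past/current/future window around step t.

    Assumption: acts[i] is the action taken after observing obs[i], so
    acts[i] leads to obs[i + 1]. Windows are truncated at trajectory edges.
    """
    if len(obs) != len(acts):
        raise ValueError("expected one action per observation")
    if not 0 <= t < len(obs):
        raise IndexError("t is outside the trajectory")
    start = max(0, t - n_past)                  # first past index kept
    stop = min(len(obs), t + 1 + n_future)      # exclusive end of future obs
    return Context(
        past_obs=list(obs[start:t]),            # o_start .. o_{t-1}
        past_acts=list(acts[start:t]),          # a_start .. a_{t-1}
        current_obs=obs[t],                     # o_t
        future_obs=list(obs[t + 1:stop]),       # o_{t+1} .. o_{stop-1}
        future_acts=list(acts[t:stop - 1]),     # a_t .. a_{stop-2}, leading to future_obs
    )


if __name__ == "__main__":
    obs = ["o0", "o1", "o2", "o3", "o4"]
    acts = ["a0", "a1", "a2", "a3", "a4"]
    for k in (0, 1, 2):  # history-based, one-step, multi-step hindsight
        print(k, build_context(obs, acts, t=2, n_past=2, n_future=k))
```

A history-based filter sees only `past_obs`, `past_acts` and `current_obs`. Raising `n_future` adds observations from after step t, together with the actions that produced them. This is the extra information the abstract argues is needed to recover the causal transition structure.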

Keywords

Reinforcement Learning (RL); Partially Observable Markov Decision Process (POMDP); Factored Partially Observable Markov Decision Process (FPOMDP); Causal Inference; Variational Autoencoder; Causal Structure Learning; Dynamical System

Domains

Artificial Intelligence [cs.AI]; Machine Learning [cs.LG]; Systems and Control [cs.SY]; Dynamical Systems [math.DS]

Contributor: Debabrota Basu

https://hal.science/hal-04785076

Submitted on: Friday, 15 November 2024, 11:36:10

Last modified on: Saturday, 16 November 2024, 03:30:12

Dates and versions

hal-04785076, version 1 (15-11-2024)

License

Attribution

Identifiers

HAL Id: hal-04785076, version 1
arXiv: 2411.07832

Cite

Chao Han, Debabrota Basu, Michael Mangan, Eleni Vasilaki, Aditya Gilra. Dynamical-VAE-based Hindsight to Learn the Causal Dynamics of Factored-POMDPs. 2024. ⟨hal-04785076⟩


Collections

CNRS INRIA CRISTAL INRIA2 TDS-MACS UNIV-LILLE CRISTAL-SCOOL ANR PEPR_IA

