Multimodal Inverse Cloze Task for Knowledge-based Visual Question Answering

We present a new pre-training method, Multimodal Inverse Cloze Task, for Knowledge-based Visual Question Answering about named Entities (KVQAE). KVQAE is a recently introduced task that consists of answering questions about named entities grounded in a visual context using a Knowledge Base. The interaction between the modalities is therefore paramount for retrieving information and must be captured with complex fusion models. Because these models require a lot of training data, we derive this pre-training task from existing work in textual Question Answering. It treats a sentence as a pseudo-question and its context as a pseudo-relevant passage, and we extend it by considering the images located near the text in multimodal documents. Our method is applicable to different neural network architectures and leads to a 9% relative-MRR and 15% relative-F1 gain for retrieval and reading comprehension, respectively, over a no-pre-training baseline.
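The pairing described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `IctPair` and `make_ict_pair`, the image identifiers, and the choice to always remove the selected sentence from its context are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IctPair:
    pseudo_question: str            # one sentence taken from the passage
    pseudo_passage: str             # the remaining sentences (its context)
    question_image: Optional[str]   # hypothetical ID of an image near the sentence
    passage_image: Optional[str]    # hypothetical ID of an image near the context


def make_ict_pair(
    sentences: List[str],
    index: int,
    question_image: Optional[str] = None,
    passage_image: Optional[str] = None,
) -> IctPair:
    """Build one pseudo-question / pseudo-relevant-passage pair.

    Assumption: the selected sentence is always removed from the context.
    """
    if not 0 <= index < len(sentences):
        raise IndexError("index out of range")
    # The context is every sentence except the one used as pseudo-question.
    context = sentences[:index] + sentences[index + 1:]
    return IctPair(
        pseudo_question=sentences[index],
        pseudo_passage=" ".join(context),
        question_image=question_image,
        passage_image=passage_image,
    )


# Toy document: three sentences and two hypothetical image identifiers.
sentences = [
    "The tower was completed in 1889.",
    "It stands on the Champ de Mars.",
    "Millions of people visit it every year.",
]
pair = make_ict_pair(sentences, 1, "img_a.jpg", "img_b.jpg")
print(pair.pseudo_question)
print(pair.pseudo_passage)
```

Pairs of this kind would then serve as training examples for a retriever or reader, with the images attached to each side giving the fusion model a multimodal signal; how the images are actually selected and encoded is described in the paper, not here.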

Keywords

Visual Question Answering; Pre-training; Multimodal Fusion

Domains

Information Retrieval [cs.IR]; Multimedia [cs.MM]; Machine Learning [stat.ML]

Main file

ecir-2023-vf-authors.pdf (3.67 MB)

Origin	Files produced by the author(s)
License	Attribution

Contributor: Paul Lerner

https://hal.science/hal-03933089

Submitted on: Friday, December 15, 2023, 18:05:29

Last modified on: Tuesday, September 3, 2024, 11:16:05

Dates and versions

hal-03933089, version 1 (10-01-2023)

hal-03933089, version 2 (15-12-2023)

License

Attribution

Identifiers

HAL Id: hal-03933089, version 2

Cite

Paul Lerner, Olivier Ferret, Camille Guinaudeau. Multimodal Inverse Cloze Task for Knowledge-based Visual Question Answering. European Conference on Information Retrieval (ECIR 2023), Apr 2023, Dublin, Ireland. ⟨hal-03933089v2⟩

Collections

CEA CNRS INRIA CENTRALESUPELEC DRT GENCI CEA-UPSAY UNIV-PARIS-SACLAY LIST ANR LISN GS-COMPUTER-SCIENCE GS-SPORT-HUMAN-MOVEMENT LISN-TLP
