Conference paper, 2020

What is best for Spoken Language Understanding: Small but Task-dependant Embeddings or Huge but Out-of-domain Embeddings?

Abstract

Word embeddings have been shown to be a great asset for several Natural Language and Speech Processing tasks. While they have already been evaluated on various NLP tasks, their evaluation on spoken or natural language understanding (SLU) is less studied. The goal of this study is two-fold: first, it focuses on the semantic evaluation of common word embedding approaches for the SLU task; second, it investigates the use of two different data sets to train the embeddings: a small, task-dependent corpus or a huge, out-of-domain corpus. Experiments are carried out on 5 benchmark corpora (ATIS, SNIPS, SNIPS70, M2M, MEDIA), for which a relevance ranking has been proposed in the literature. Interestingly, the performance of the embeddings is independent of the difficulty of the corpora. Moreover, the embeddings trained on the huge, out-of-domain corpus yield better results than the ones trained on the small, task-dependent corpus.
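The page does not detail the embedding training pipeline; as a minimal, hypothetical sketch (not the authors' actual setup), the two settings contrasted in the study could look as follows in Python with gensim, where the toy `utterances` list stands in for a small in-domain SLU corpus and the pretrained GloVe vectors stand in for huge out-of-domain embeddings:

```python
from gensim.models import Word2Vec
import gensim.downloader as api

# Small, task-dependent embeddings: train word2vec on in-domain SLU utterances.
# `utterances` is a hypothetical stand-in for tokenized sentences from, e.g., ATIS.
utterances = [["show", "me", "flights", "from", "boston", "to", "denver"]]
small_model = Word2Vec(sentences=utterances, vector_size=300,
                       window=5, min_count=1, epochs=20)

# Huge, out-of-domain embeddings: reuse vectors pretrained on general-domain text.
big_vectors = api.load("glove-wiki-gigaword-300")  # ~400k general-domain vocabulary

# Both yield 300-dimensional vectors that can feed the same downstream SLU model.
print(small_model.wv["flights"].shape)  # (300,)
print(big_vectors["flights"].shape)     # (300,)
```

In the study, such embeddings are then evaluated through the SLU task itself (slot filling on the five benchmark corpora), not shown here.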
Main file: isSLUEmb-6.pdf (427.86 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-02503694, version 1 (10-03-2020)

Identifiers

Cite

Sahar Ghannay, Antoine Neuraz, Sophie Rosset. What is best for Spoken Language Understanding: Small but Task-dependant Embeddings or Huge but Out-of-domain Embeddings?. 45th International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2020), May 2020, Barcelona, Spain. pp.8114-8118, ⟨10.1109/ICASSP40776.2020.9053278⟩. ⟨hal-02503694⟩