Graph-Based Multilingual Label Propagation for Low-Resource Part-of-Speech Tagging - Traitement du Langage Parlé
Conference paper. Year: 2022

Abstract

Part-of-Speech (POS) tagging is an important component of the NLP pipeline, but many low-resource languages lack labeled data for training. An established method for training a POS tagger in such a scenario is to create a labeled training set by transferring labels from high-resource languages. In this paper, we propose a novel method for transferring labels from multiple high-resource source languages to low-resource target languages. We formalize POS tag projection as graph-based label propagation. Given translations of a sentence in multiple languages, we align words for all language pairs and create a graph with words as nodes and alignment links as edges. We then propagate node labels from source to target using a Graph Neural Network augmented with transformer layers. We show that our propagation creates training sets that allow us to train POS taggers for a diverse set of languages. When combined with enhanced contextualized embeddings, our method achieves a new state-of-the-art for unsupervised POS tagging of low-resource languages.
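
The graph construction described in the abstract can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the representation of nodes as (language, word index) pairs, and the toy alignments are all hypothetical, and the iterative majority vote is a deliberately simple stand-in for the Graph Neural Network with transformer layers used in the paper.

```python
from collections import Counter, defaultdict


def build_alignment_graph(alignments):
    """Build an undirected graph from word-alignment links.

    Each link is a pair of nodes; a node is a hypothetical
    (language, word_index) tuple identifying one word of one translation.
    """
    graph = defaultdict(set)
    for u, v in alignments:
        graph[u].add(v)
        graph[v].add(u)
    return dict(graph)


def propagate_labels(graph, source_tags, max_iters=10):
    """Propagate POS tags from labeled source nodes to unlabeled nodes.

    Assumption: a plain majority vote over already-labeled neighbors,
    used here only as a stand-in for the paper's GNN-based propagation.
    Ties are broken alphabetically so the result is deterministic.
    """
    tags = dict(source_tags)
    for _ in range(max_iters):
        updates = {}
        for node, neighbors in graph.items():
            if node in tags:
                continue
            votes = Counter(tags[n] for n in neighbors if n in tags)
            if votes:
                updates[node] = max(sorted(votes), key=votes.get)
        if not updates:
            break
        tags.update(updates)
    return tags


# Toy example: one sentence in two source languages and one target language.
alignments = [
    (("en", 0), ("de", 0)),
    (("en", 1), ("de", 1)),
    (("en", 2), ("de", 2)),
    (("en", 1), ("tgt", 0)),
    (("de", 1), ("tgt", 0)),
    (("en", 2), ("tgt", 1)),
    (("de", 2), ("tgt", 1)),
    (("de", 1), ("tgt", 1)),  # a noisy alignment link, outvoted below
]
source_tags = {
    ("en", 0): "DET", ("en", 1): "NOUN", ("en", 2): "VERB",
    ("de", 0): "DET", ("de", 1): "NOUN", ("de", 2): "VERB",
}

graph = build_alignment_graph(alignments)
tags = propagate_labels(graph, source_tags)
print(tags[("tgt", 0)], tags[("tgt", 1)])
```

In this toy setting, the second target word receives two votes for one tag and one vote from the noisy link, which shows in miniature why combining several source languages can make projected labels more robust than a single alignment.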
Main file
2210.09840.pdf (527.74 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03832874, version 1 (28-10-2022)

Identifiers

  • HAL Id: hal-03832874, version 1

Cite

Ayyoob Imani, Silvia Severini, Masoud Jalili Sabet, François Yvon, Hinrich Schütze. Graph-Based Multilingual Label Propagation for Low-Resource Part-of-Speech Tagging. Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Dec 2022, Abu Dhabi, United Arab Emirates. ⟨hal-03832874⟩
