Towards Multilingual Interlinear Morphological Glossing - Traitement du Langage Parlé
Conference paper, Year: 2023

Shu Okabe
François Yvon

Abstract

Interlinear Morphological Glosses are annotations produced in the context of language documentation. Their goal is to identify the morphs occurring in an L1 sentence and to make their function and meaning explicit, with the further support of an associated translation in L2. We study here the task of automatic glossing, aiming to provide linguists with adequate tools to facilitate this process. Our formalisation of glossing uses a latent variable Conditional Random Field (CRF), which labels the L1 morphs while simultaneously aligning them to L2 words. In experiments with several under-resourced languages, we show that this approach is both effective and data-efficient and mitigates the problem of annotating unknown morphs. We also discuss various design choices regarding the alignment process and the selection of features. We finally demonstrate that it can benefit from multilingual (pre-)training, achieving results that outperform very strong baselines.
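
To make the data described in the abstract concrete, here is a minimal sketch of one glossed sentence: L1 morphs, one gloss per morph, and an L2 translation. The class name `GlossedSentence`, the function `align_glosses`, and the French example are illustrative assumptions and do not come from the paper. The alignment is plain string matching, shown only to illustrate the idea that lexical glosses tend to correspond to L2 words while grammatical glosses do not; it is not the latent variable CRF studied by the authors.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GlossedSentence:
    """One interlinear glossed sentence (hypothetical structure)."""
    morphs: list[str]       # L1 sentence, segmented into morphs
    glosses: list[str]      # one gloss per morph
    translation: list[str]  # L2 translation, tokenized into words


def align_glosses(sentence: GlossedSentence) -> list[Optional[int]]:
    """Return, for each morph, the index of the L2 word identical to its
    gloss, or None when the gloss has no counterpart in the translation
    (typically a grammatical label such as 1PL)."""
    alignment: list[Optional[int]] = []
    for gloss in sentence.glosses:
        if gloss in sentence.translation:
            alignment.append(sentence.translation.index(gloss))
        else:
            alignment.append(None)
    return alignment


# Invented illustrative example (French as L1, English as L2).
example = GlossedSentence(
    morphs=["nous", "chant", "ons"],
    glosses=["1PL", "sing", "1PL"],
    translation=["we", "sing"],
)
alignment = align_glosses(example)
```

In this toy case only the stem `chant` is aligned to an L2 word (`sing`); the two grammatical glosses stay unaligned. Treating the alignment as a latent variable, as the abstract describes, avoids committing to such a hard matching rule.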
Main file
2023.findings-emnlp.396.pdf (354.39 KB)
Origin: Publisher files allowed on an open archive

Dates and versions

hal-04357157, version 1 (21-12-2023)

Cite

Shu Okabe, François Yvon. Towards Multilingual Interlinear Morphological Glossing. 2023 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Dec 2023, Singapore, Singapore. pp.5958-5971, ⟨10.18653/v1/2023.findings-emnlp.396⟩. ⟨hal-04357157⟩