Pull your treebank up by its own bootstraps

Ziqian Peng; Kim Gerdes; Kirian Guiller

Conference paper, Year: 2022

Ziqian Peng

Role: Author
PersonId: 1298309
IdHAL: ziqian-peng

Laboratoire Interdisciplinaire des Sciences du Numérique

Kim Gerdes

Role: Author
PersonId: 1184591

Laboratoire Interdisciplinaire des Sciences du Numérique

Kirian Guiller

Role: Author

Modèles, Dynamiques, Corpus

Abstract

We analyze the performance of recent neural syntactic parsers in the task of bootstrapping a treebank, i.e. training and analyzing iteratively in order to enhance speed and quality of the human syntactic analysis. By conducting an extensive and heuristically guided search in the vast grid of options (parser, embedding, configuration, epochs, batch size, size of training set, annotation scheme, language, evaluation method…), we determine the best performing parser configurations: UDify and Trankit share the podium depending on the size of the training set. We also show how these results are integrated into the annotation tool ArboratorGrew, and we propose some preliminary measures that allow predicting the quality of the parse for a new language.

Keywords

treebanks; annotation; syntactic parsers; neural networks; bootstrapping; underresourced languages

Domains

Computation and Language [cs.CL]

Main file

504.pdf (2.47 MB)

Origin: Publisher files allowed on an open archive

Contributor: Yannick Parmentier

https://hal.science/hal-03846834

Submitted on: Monday, 14 November 2022, 18:56:42

Last modified on: Friday, 22 November 2024, 10:12:12

Dates and versions

hal-03846834, version 1 (14-11-2022)

Identifiers

HAL Id: hal-03846834, version 1

Cite

Ziqian Peng, Kim Gerdes, Kirian Guiller. Pull your treebank up by its own bootstraps. Journées Jointes des Groupements de Recherche Linguistique Informatique, Formelle et de Terrain (LIFT) et Traitement Automatique des Langues (TAL), Nov 2022, Marseille, France. pp.139-153. ⟨hal-03846834⟩

Collections

CNRS INRIA MODYCO CENTRALESUPELEC UNIV-PARIS-SACLAY UNIV-PARIS-LUMIERES LISN UNIV-PARIS-NANTERRE GS-COMPUTER-SCIENCE LISN-TLP

249 views

115 downloads
