
    Preparing, restructuring, and augmenting a French treebank: lexicalised parsers or coherent treebanks?

    We present the Modified French Treebank (MFT), a completely revamped French treebank derived from the Paris 7 Treebank (P7T), which is cleaner, more coherent, has several transformed structures, and introduces new linguistic analyses. To determine the effect of these changes, we investigate how the MFT fares in statistical parsing. Probabilistic parsers trained on the MFT training set (currently 3,800 trees) already perform better than their counterparts trained on five times as much P7T data (18,548 trees), providing an extreme example of the importance of data quality over quantity in statistical parsing. Moreover, regression analysis on the learning curve of parsers trained on the MFT leads to the prediction that parsers trained on the full projected 18,548-tree MFT training set will far outscore their counterparts trained on the full P7T. These analyses also show how problematic data can lead to problematic conclusions: in particular, we find that lexicalisation in the probabilistic parsing of French is probably not as crucial as was once thought (Arun and Keller, 2005).
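The learning-curve regression described in this abstract can be sketched as an ordinary least-squares fit of a logarithmic model, score = a + b·log(n), extrapolated to the full treebank size. The data points below are illustrative placeholders, not the paper's actual parser scores.

```python
import math

def fit_log_curve(sizes, scores):
    """Least-squares fit of score = a + b * log(n), a common
    learning-curve model for parser accuracy vs. training-set size."""
    xs = [math.log(n) for n in sizes]
    k = len(xs)
    mean_x = sum(xs) / k
    mean_y = sum(scores) / k
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores)) \
        / sum((x - mean_x) ** 2 for x in xs)
    a = mean_y - b * mean_x
    return a, b

def predict(a, b, n):
    """Predicted score at training-set size n under the fitted model."""
    return a + b * math.log(n)

# Illustrative (made-up) points: (number of training trees, F-score)
sizes = [500, 1000, 2000, 3800]
scores = [72.0, 75.5, 79.0, 82.0]
a, b = fit_log_curve(sizes, scores)
projected = predict(a, b, 18548)  # extrapolate to the full treebank size
```

With any roughly logarithmic curve of this shape, the projection at 18,548 trees lies well above the score observed at 3,800 trees, which is the form of argument the abstract makes.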

    Improving the translation environment for professional translators

    When using computer-aided translation systems in a typical professional translation workflow, there are several stages at which there is room for improvement. The SCATE (Smart Computer-Aided Translation Environment) project investigated several of these aspects, both from a human-computer interaction point of view and from a purely technological one. This paper describes the SCATE research with respect to improved fuzzy matching, parallel treebanks, the integration of translation memories with machine translation, quality estimation, terminology extraction from comparable texts, the use of speech recognition in the translation process, and human-computer interaction and interface design for the professional translation environment. For each of these topics, we describe the experiments we performed and the conclusions drawn, providing an overview of the highlights of the entire SCATE project.
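The fuzzy-matching stage mentioned above can be illustrated with a minimal translation-memory lookup. The similarity measure (difflib's `SequenceMatcher` ratio), the threshold, and the example memory entries are assumptions for this sketch, not SCATE's actual implementation.

```python
import difflib

def fuzzy_match(query, memory, threshold=0.6):
    """Return TM entries whose source segment is similar enough to the
    query, best match first. memory is a list of (source, target) pairs."""
    scored = [
        (difflib.SequenceMatcher(None, query.lower(), src.lower()).ratio(),
         src, tgt)
        for src, tgt in memory
    ]
    return sorted((s for s in scored if s[0] >= threshold), reverse=True)

memory = [
    ("Click the Save button.", "Cliquez sur le bouton Enregistrer."),
    ("Open the File menu.", "Ouvrez le menu Fichier."),
]
matches = fuzzy_match("Click the Save icon.", memory)
```

A real system would replace the character-based ratio with linguistically informed matching (one of the SCATE topics), but the retrieve-score-threshold shape stays the same.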

    Expanding Chinese sentiment dictionaries from large scale unlabeled corpus


    Summarizing Product Reviews Using Dynamic Relation Extraction

    The accumulated review data for a single product on Amazon.com could potentially take several weeks to examine manually. Computationally extracting the essence of a document is a substantial task, which has been explored previously through many different approaches. We explore how statistical prediction can be used to perform dynamic relation extraction. Using patterns in the syntactic structure of a sentence, each word is classified as either product feature or descriptor, and the two are then linked together by association. The classifiers are trained on a manually annotated training set, using features from dependency parse trees produced by the Stanford CoreNLP library. In this thesis we compare the most widely used machine learning algorithms to find the one most suitable for our scenario. We ultimately found that the classification step was most successful with SVM, reaching an F-score of 80 percent for the relation extraction classification step. The results of the predictions are presented in a graphical interface displaying the relations. An end-to-end evaluation was also conducted, in which our system achieved a relaxed recall of 53.35%.
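The classify-then-link pipeline the abstract describes can be sketched as follows. Here a tiny hand-written lexicon stands in for the trained SVM classifier, and "nearest feature wins" stands in for the association step; the word lists and the example sentence are invented for illustration.

```python
# Hypothetical stand-ins for the trained classifier's two output classes.
FEATURES = {"battery", "screen", "camera"}
DESCRIPTORS = {"great", "dim", "blurry", "long-lasting"}

def classify(word):
    """Label a token as FEATURE, DESCRIPTOR, or OTHER (toy classifier)."""
    w = word.lower().strip(".,!?")
    if w in FEATURES:
        return "FEATURE"
    if w in DESCRIPTORS:
        return "DESCRIPTOR"
    return "OTHER"

def extract_relations(sentence):
    """Link each descriptor to the nearest product feature in the sentence."""
    tokens = sentence.split()
    labels = [classify(t) for t in tokens]
    feature_idx = [i for i, l in enumerate(labels) if l == "FEATURE"]
    relations = []
    for i, label in enumerate(labels):
        if label == "DESCRIPTOR" and feature_idx:
            j = min(feature_idx, key=lambda f: abs(f - i))
            relations.append((tokens[j].strip(".,"), tokens[i].strip(".,")))
    return relations

relations = extract_relations("The battery is long-lasting but the screen is dim.")
```

In the thesis the classification decision comes from an SVM over dependency-parse features and the linking from learned associations, but the two-stage structure is the same.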

    Proceedings of the Seventh International Conference Formal Approaches to South Slavic and Balkan Languages

    The Proceedings of the Seventh International Conference Formal Approaches to South Slavic and Balkan Languages comprise 17 papers presented at the conference, organised in Dubrovnik, Croatia, 4-6 October 2010.

    All that glitters...: Interannotator agreement in natural language processing

    Evaluation has emerged as a central concern in natural language processing (NLP) over the last few decades. Evaluation is done against a gold standard, a manually linguistically annotated dataset, which is assumed to provide the ground truth against which the accuracy of the NLP system can be assessed automatically. In this article, some methodological questions in connection with the creation of gold standard datasets are discussed, in particular the (non-)expectation of linguistic expertise in annotators, and the interannotator agreement measure that is standardly, but often without reflection, used as a kind of quality index of NLP gold standards.
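The standard interannotator agreement measure alluded to here is typically Cohen's kappa, which corrects raw agreement for the agreement two annotators would reach by chance. A minimal sketch with toy part-of-speech labels (the label sequences are invented for illustration):

```python
from collections import Counter

def cohens_kappa(ann1, ann2):
    """Cohen's kappa for two annotators over the same items:
    (observed agreement - chance agreement) / (1 - chance agreement)."""
    assert len(ann1) == len(ann2)
    n = len(ann1)
    observed = sum(a == b for a, b in zip(ann1, ann2)) / n
    c1, c2 = Counter(ann1), Counter(ann2)
    # Chance agreement: product of each annotator's marginal label rates.
    expected = sum(c1[label] * c2[label] for label in c1) / (n * n)
    return (observed - expected) / (1 - expected)

annotator_a = ["N", "V", "N", "N", "V", "N"]
annotator_b = ["N", "V", "N", "V", "V", "N"]
kappa = cohens_kappa(annotator_a, annotator_b)
```

Kappa is 1.0 for perfect agreement and 0.0 when agreement is no better than chance; the article's point is that a single such number is routinely read as a quality index of the gold standard without much reflection on what it actually measures.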