
Named Entity Recognition on Turkish Tweets

Author: JACQUET GUILLAUME
KUCUK DILEK
STEINBERGER Ralf
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 04/10/2013

Various recent studies show that the performance of named entity recognition (NER) systems developed for well-formed text types drops significantly when they are applied to tweets. The only existing study for Turkish, a highly inflected agglutinative language, reports a drop in F-measure from 91% to 19% when a system is ported from news articles to tweets. In this study, we present a new named entity-annotated tweet corpus and a detailed analysis of various tweet-specific linguistic phenomena. We perform comparative NER experiments on three corpora (a news corpus, our new tweet corpus, and another tweet corpus) with a rule-based multilingual NER system adapted to Turkish. Based on the analysis and the experimental results, we suggest system features required to improve NER results for social media such as Twitter.
JRC.G.2-Global security and crisis management

JRC Publications Repository
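
The abstract does not list the tweet-specific phenomena it analyses. As an illustration only, here is a minimal Python sketch of gazetteer lookup for Turkish tokens that assumes two phenomena plausible in informal Turkish text: suffixes attached to proper nouns without the apostrophe that formal orthography requires ("istanbula" instead of "İstanbul'a"), and missing Turkish-specific letters or capitalization ("ISTANBUL", "istanbul"). It is not the rule-based multilingual system used in the study; the toy gazetteer, the suffix list and the functions `normalize` and `tag_tokens` are all hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Turkish casing: dotless "I" lowercases to "ı" and dotted "İ" to "i".
# These must be mapped before str.lower(), which would turn "I" into "i".
TURKISH_CASE = str.maketrans({"I": "ı", "İ": "i"})
# Assumption: tweets often drop Turkish letters, so compare ASCII-folded forms.
ASCII_FOLD = str.maketrans("çğıöşü", "cgiosu")
# Hypothetical, deliberately tiny list of case suffixes (already ASCII-folded).
SUFFIXES = {"a", "e", "ya", "ye", "da", "de", "ta", "te",
            "dan", "den", "tan", "ten", "in", "nin", "un", "nun"}


def normalize(token: str) -> str:
    """Turkish-aware lowercasing followed by ASCII folding."""
    return token.translate(TURKISH_CASE).lower().translate(ASCII_FOLD)


def tag_tokens(text: str, gazetteer: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (token, label) pairs for tokens that match a gazetteer entry."""
    entries = {normalize(name): label for name, label in gazetteer.items()}
    tagged = []
    for token in re.findall(r"[\w'’]+", text):
        # Formal Turkish separates suffixes on proper nouns with an apostrophe
        # ("Ankara'da"), so keep only the part before it.
        key = normalize(re.split(r"['’]", token)[0])
        label = entries.get(key)
        if label is None:
            # Apostrophe omitted ("istanbula"): accept a known name
            # followed directly by one of the listed suffixes.
            for name, candidate in entries.items():
                if key.startswith(name) and key[len(name):] in SUFFIXES:
                    label = candidate
                    break
        if label is not None:
            tagged.append((token, label))
    return tagged


gazetteer = {"Ankara": "LOC", "İstanbul": "LOC"}  # hypothetical toy gazetteer
print(tag_tokens("Ankara'da toplantı var, yarın istanbula gidiyorum", gazetteer))
# → [("Ankara'da", 'LOC'), ('istanbula', 'LOC')]
```

The example sentence means roughly "There is a meeting in Ankara; tomorrow I am going to Istanbul". Folding everything to ASCII trades precision for recall: distinct words that differ only in Turkish letters collapse to the same key, so a real system would need more careful matching.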

JRC-Names: Multilingual Entity Name variants and titles as Linked Data

Author: EHRMANN Maud
JACQUET GUILLAUME
STEINBERGER Ralf
Publication venue: IOS Press
Publication date: 30/04/2015

Since 2004 the European Commission’s Joint Research Centre (JRC) has been analysing the online version of printed media in over twenty languages and has automatically recognised and compiled large amounts of named entities (persons and organisations) and their many name variants. The collected variants include not only standard spellings in various countries, languages and scripts, but also frequently found spelling mistakes or lesser-used name forms, all occurring in real-life text (e.g. Benjamin/Binyamin/Bibi/Benyamín/Biniamin/Беньямин/بنیامین Netanyahu/Netanjahu/Nétanyahou/Netahnyahu/Нетаньяху/نتنیاهو). This entity name variant data, known as JRC-Names, has been available for public download since 2011. In this article, we report on our efforts to render JRC-Names as Linked Data (LD), using the lexicon model for ontologies lemon. Besides adhering to Semantic Web standards, this new release goes beyond the initial one in that it includes titles found next to the names, as well as the date ranges in which the titles and name variants were found. It also establishes links to existing datasets, such as DBpedia and Talk-Of-Europe. As a multilingual linguistic linked dataset, JRC-Names can help bridge the gap between structured data and natural languages, thus supporting large-scale data integration (e.g. cross-lingual mapping) and web-based content processing (e.g. entity linking). JRC-Names is publicly available through the dataset catalogue of the European Union’s Open Data Portal.
JRC.G.2-Global security and crisis management

JRC Publications Repository
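
As an illustration of what rendering name variants as Linked Data can look like, the following minimal Python sketch serializes one entity and some of its variants as Turtle using the W3C OntoLex-lemon core vocabulary (`ontolex:LexicalEntry`, `ontolex:canonicalForm`, `ontolex:otherForm`, `ontolex:writtenRep`). This is an assumption, not the published JRC-Names schema: the `example.org` URIs, the one-entry-per-entity layout, the functions `turtle_literal` and `entity_to_turtle`, and the use of the `und` (undetermined) language tag for names whose language is unknown are all hypothetical. Titles and date ranges are left out.

```python
from typing import List, Tuple

# Real W3C OntoLex-lemon core namespace; everything under BASE is hypothetical.
ONTOLEX = "http://www.w3.org/ns/lemon/ontolex#"
BASE = "http://example.org/jrc-names/"

Variant = Tuple[str, str]  # (written form, BCP 47 language tag; "und" if unknown)


def turtle_literal(text: str, lang: str) -> str:
    """Quote a string as a language-tagged Turtle literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def entity_to_turtle(entity_id: int, canonical: Variant, variants: List[Variant]) -> str:
    """Serialize one entity as a lexical entry with a canonical form and other forms."""
    forms = [canonical] + variants
    form_uris = [f"<{BASE}form/{entity_id}/{i}>" for i in range(len(forms))]
    lines = [
        f"@prefix ontolex: <{ONTOLEX}> .",
        "",
        f"<{BASE}entry/{entity_id}> a ontolex:LexicalEntry ;",
    ]
    if len(forms) > 1:
        lines.append(f"    ontolex:canonicalForm {form_uris[0]} ;")
        lines.append(f"    ontolex:otherForm {', '.join(form_uris[1:])} .")
    else:
        lines.append(f"    ontolex:canonicalForm {form_uris[0]} .")
    # Each form node carries one written representation.
    for uri, (text, lang) in zip(form_uris, forms):
        lines.append(f"{uri} a ontolex:Form ;")
        lines.append(f"    ontolex:writtenRep {turtle_literal(text, lang)} .")
    return "\n".join(lines)


ttl = entity_to_turtle(
    1,
    ("Benjamin Netanyahu", "und"),
    [("Binyamin Netanyahu", "und"), ("Беньямин Нетаньяху", "ru")],
)
print(ttl)
```

Tagging every name keeps each written representation language-tagged, which OntoLex-lemon expects; a real conversion would more likely record the language in which each variant was actually observed.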

JRC-Names: Multilingual Entity Name variants and titles as Linked Data

Author: Ehrmann Maud
Jacquet Guillaume
Steinberger Ralf
Publication venue: IOS Press
Publication date: 20/05/2016

Since 2004 the European Commission's Joint Research Centre (JRC) has been analysing the online version of printed media in over twenty languages and has automatically recognised and compiled large amounts of named entities (persons and organisations) and their many name variants. The collected variants include not only standard spellings in various countries, languages and scripts, but also frequently found spelling mistakes or lesser-used name forms, all occurring in real-life text (e.g. Benjamin/Binyamin/Bibi/Benyamín/Biniamin/Беньямин/بنيامين Netanyahu/Netanjahu/Nétanyahou/Netahnyahu/Нетаньяху/نتنياهو). This entity name variant data, known as JRC-Names, has been available for public download since 2011. In this article, we report on our efforts to render JRC-Names as Linked Data (LD), using the lexicon model for ontologies lemon. Besides adhering to Semantic Web standards, this new release goes beyond the initial one in that it includes titles found next to the names, as well as the date ranges in which the titles and name variants were found. It also establishes links to existing datasets, such as DBpedia and Talk-Of-Europe. As a multilingual linguistic linked dataset, JRC-Names can help bridge the gap between structured data and natural languages, thus supporting large-scale data integration (e.g. cross-lingual mapping) and web-based content processing (e.g. entity linking). JRC-Names is publicly available through the dataset catalogue of the European Union's Open Data Portal.

Infoscience - École polytechnique fédérale de Lausanne

Cross-lingual Linking of Multi-word Entities and their corresponding Acronyms

Author: Ehrmann Maud
Jacquet Guillaume
Steinberger Ralf
Väyrynen Jaakko
Publication venue: European Language Resources Association (ELRA)
Publication date: 20/05/2016

This paper reports on an approach and experiments to automatically build a cross-lingual multi-word entity resource. Starting from a collection of millions of acronym/expansion pairs for 22 languages, in which expansion variants were grouped into monolingual clusters, we experiment with several aggregation strategies to link these clusters across languages. The aggregation strategies use string similarity distances and translation probabilities, and are based on vector space and graph representations. The accuracy of the approach is evaluated against Wikipedia's redirection and cross-lingual linking tables. The resulting multi-word entity resource contains 64,000 multi-word entities with unique identifiers and their 600,000 multilingual lexical variants. We intend to make this new resource publicly available.

Infoscience - École polytechnique fédérale de Lausanne
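
The aggregation step can be illustrated with a minimal Python sketch of one graph-based strategy that uses string similarity only; translation probabilities and vector-space representations are omitted. It assumes hypothetical cluster ids of the form `lang:n`, scores a pair of clusters by the similarity of their best-matching variants (difflib's `SequenceMatcher.ratio`, a stand-in for whatever distance the authors used), draws an edge between clusters from different languages when the score reaches a threshold, and treats each connected component as one multi-word entity. The threshold of 0.8, the toy clusters and the function names are illustrative.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import Dict, List, Set


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (difflib's ratio)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def cluster_similarity(c1: List[str], c2: List[str]) -> float:
    """Score two clusters by their best-matching pair of variants."""
    return max(similarity(a, b) for a in c1 for b in c2)


def link_clusters(clusters: Dict[str, List[str]], threshold: float = 0.8) -> List[Set[str]]:
    """Group monolingual clusters into cross-lingual connected components."""
    parent = {cid: cid for cid in clusters}

    def find(x: str) -> str:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(clusters, 2):
        if a.split(":")[0] == b.split(":")[0]:
            continue  # only draw edges across languages
        if cluster_similarity(clusters[a], clusters[b]) >= threshold:
            parent[find(a)] = find(b)

    groups: Dict[str, Set[str]] = {}
    for cid in clusters:
        groups.setdefault(find(cid), set()).add(cid)
    return list(groups.values())


clusters = {  # toy clusters, not data from the paper
    "en:1": ["North Atlantic Treaty Organization"],
    "de:1": ["Organisation des Nordatlantikvertrags", "North Atlantic Treaty Organisation"],
    "en:2": ["European Central Bank"],
}
groups = link_clusters(clusters)
print(groups)
```

Because linking is transitive, two clusters from the same language can still end up in one component through a third cluster; a real system would have to decide whether to allow this.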

Modeling of high-feed milling and surface quality applied to Inconel 718

Author: FROMENTIN Guillaume
JACQUET Thomas
PRAT David
VIPREY Fabien
Publication date: 17/05/2024

In modern manufacturing, machining remains a vital process for producing complex mechanical components. In particular, the aerospace industry extensively employs high-feed milling to machine complex geometries from nickel-based superalloys. This study focuses on the analysis and modeling of high-feed milling of Inconel 718 in 2.5-axis machining. Its objective is to develop a generalized model of high-feed milling that predicts surface topography. The proposed model integrates the key geometric parameters of the tool and its exact kinematics within the machine, along with the tool and machine deflections caused by cutting forces. A key novelty of this research is its ability to determine surface topography and surface quality from a generalized model, which represents significant progress in the field of high-feed milling. To validate the model, experimental measurements are carried out to characterize the cutting forces and system deflections during machining. The developed approach is shown to model surface topography and predict surface roughness. The results also highlight the influence of tool and machine deflection on surface quality. This research contributes to the application of high-feed milling in aerospace manufacturing by enhancing machining capabilities and improving part quality.

SAM : Science Arts et Métiers
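
The study's model is not reproduced here, but the link between tool geometry and roughness can be sketched for the simplest case. The minimal Python sketch below assumes an ideal, deflection-free tool whose cutting edge has radius R, making parallel passes spaced a step-over s apart; the machined profile is then the lower envelope of circular arcs, whose theoretical cusp height is h = R - sqrt(R^2 - (s/2)^2). It computes the arithmetic mean roughness Ra and the peak-to-valley height Rt of the sampled profile. The functions `cusp_profile` and `roughness`, the millimetre units and the numbers used are hypothetical, and the sketch ignores the tool and machine deflections that the study shows to affect surface quality.

```python
import math
from typing import List, Tuple


def cusp_profile(radius: float, step_over: float, length: float,
                 n_points: int = 2001) -> Tuple[List[float], List[float]]:
    """Idealized transverse profile left by parallel passes of a tool with
    edge radius `radius`, spaced `step_over` apart (no deflection, runout or wear).
    Heights are measured upward from the bottom of the cuts."""
    if not 0 < step_over < 2 * radius:
        raise ValueError("step_over must be in (0, 2 * radius)")
    n_passes = int(length // step_over) + 2  # enough passes to cover [0, length]
    xs = [length * i / (n_points - 1) for i in range(n_points)]
    zs = []
    for x in xs:
        # The machined surface is the lower envelope of the arcs cut by each pass.
        z = min(radius - math.sqrt(radius ** 2 - (x - k * step_over) ** 2)
                for k in range(n_passes)
                if abs(x - k * step_over) <= radius)
        zs.append(z)
    return xs, zs


def roughness(zs: List[float]) -> Tuple[float, float]:
    """Return (Ra, Rt): mean absolute deviation from the mean line, and peak-to-valley height."""
    mean = sum(zs) / len(zs)
    ra = sum(abs(z - mean) for z in zs) / len(zs)
    return ra, max(zs) - min(zs)


R, S = 5.0, 2.0  # hypothetical edge radius and step-over, in mm
xs, zs = cusp_profile(R, S, length=10.0)
ra, rt = roughness(zs)
h = R - math.sqrt(R ** 2 - (S / 2) ** 2)
print(f"Ra = {ra:.4f} mm, Rt = {rt:.4f} mm, cusp height h = {h:.4f} mm")
```

With these values Rt equals the theoretical cusp height h, because the sampled profile includes the midpoints between passes, where the cusps peak.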

Resource Creation and Evaluation for Multilingual Sentiment Analysis in Social Media Texts

Author: BALAHUR DOBRESCU ALEXANDRA
EL GHALI ADIL
JACQUET GUILLAUME
KUCUK DILEK
PEREA ORTEGA JOSE MANUEL
STEINBERGER Ralf
TURCHI Marco
ZAVARELLA Vanni
Publication venue: European Language Resources Association
Publication date: 22/10/2013

Sentiment analysis (SA) concerns the classification of texts according to the polarity of the opinions they express. SA systems are highly relevant to many real-world applications (e.g. marketing, eGovernance, business intelligence, behavioral sciences) and to many Natural Language Processing (NLP) tasks, such as information extraction, question answering and textual entailment. The importance of this field is shown by the large number of approaches proposed in research, by the interest it has raised in other disciplines, and by the applications built on its technology. In our case, the primary focus is to use sentiment analysis in the context of media monitoring, to enable tracking of global reactions to events. The main challenge we face is that tweets are written in different languages, and an unbiased system should be able to deal with all of them in order to process all available data. Unfortunately, although many linguistic resources exist for processing texts written in English, data and tools are scarce for many other languages. Following our initial efforts described in (Balahur and Turchi, 2013), in this article we extend our study of the feasibility of a multilingual system that is able to: a) classify the sentiment expressed in tweets in various languages using training data obtained through machine translation; b) verify the extent to which translation quality influences sentiment classification performance, in this case on highly informal texts; and c) improve multilingual sentiment classification using small amounts of data annotated in the target language. To this end, varying amounts of target-language data are tested. The languages we explore are Arabic, Turkish, Russian, Italian, Spanish, German and French.
JRC.G.2-Global security and crisis management

JRC Publications Repository

Representation and parsing of multiword expressions: Current trends

This book consists of contributions related to the definition, representation and parsing of multiword expressions (MWEs). These reflect current trends in the representation and processing of MWEs. They cover various categories of MWEs such as verbal, adverbial and nominal MWEs, various linguistic frameworks (e.g. tree-based and unification-based grammars), various languages (including English, French, Modern Greek, Hebrew and Norwegian), and various applications (namely MWE detection, parsing and automatic translation) using both symbolic and statistical approaches.

Language Science Press
