5 research outputs found
Introducción a la tarea compartida Tweet-Norm 2013: Normalización léxica de tuits en español
En este artÃculo se presenta una introducción a la tarea Tweet-Norm 2013 : descripción, corpora, anotación, preproceso, sistemas presentados y resultados
obtenidos.Postprint (published version
TweetNorm: a benchmark for lexical normalization of spanish tweets
The language used in social media is often characterized by the abundance of informal and non-standard writing. The normalization of this non-standard language can be crucial to facilitate the subsequent textual processing and to consequently help boost the performance of natural language processing tools applied to social media text. In this paper we present a benchmark for lexical normalization of social media posts, specifically for tweets in Spanish language. We describe the tweet normalization challenge we organized recently, analyze the performance achieved by the different systems submitted to the challenge, and delve into the characteristics of systems to identify the features that were useful. The organization of this challenge has led to the production of a benchmark for lexical normalization of social media, including an evaluation framework, as well as an annotated corpus of Spanish tweets-TweetNorm_es-, which we make publicly available. The creation of this benchmark and the evaluation has brought to light the types of words that submitted systems did best with, and posits the main shortcomings to be addressed in future work.Postprint (published version
EVALITA Evaluation of NLP and Speech Tools for Italian Proceedings of the Final Workshop
Editor of the proceedings of EVALITA 2016
Multilingual sentiment analysis in social media.
252 p.This thesis addresses the task of analysing sentiment in messages coming from social media. The ultimate goal was to develop a Sentiment Analysis system for Basque. However, because of the socio-linguistic reality of the Basque language a tool providing only analysis for Basque would not be enough for a real world application. Thus, we set out to develop a multilingual system, including Basque, English, French and Spanish.The thesis addresses the following challenges to build such a system:- Analysing methods for creating Sentiment lexicons, suitable for less resourced languages.- Analysis of social media (specifically Twitter): Tweets pose several challenges in order to understand and extract opinions from such messages. Language identification and microtext normalization are addressed.- Research the state of the art in polarity classification, and develop a supervised classifier that is tested against well known social media benchmarks.- Develop a social media monitor capable of analysing sentiment with respect to specific events, products or organizations