    A novel concept-level approach for ultra-concise opinion summarization

    Web 2.0 has changed how users consume and interact with information, and has introduced a wide range of new textual genres, such as reviews and microblogs, through which users communicate, exchange, and share opinions. Exploiting this user-generated content is of great value to both users and companies, as it can assist them in their decision-making processes. In this context, automatic methods that help manage online information more quickly are needed. This article therefore proposes and evaluates a novel concept-level approach for ultra-concise abstractive opinion summarization. Our approach integrates syntactic sentence simplification, sentence regeneration, and internal concept representation into the summarization process, which allows it to generate abstractive summaries, one of the most challenging issues for this task. To analyze different settings for our approach, the sentence regeneration module was made optional, leading to two versions of the system (one with sentence regeneration and one without). To test them, a corpus of 400 English texts, gathered from reviews and tweets belonging to two different domains, was used. 
Although both versions were shown to be reliable methods for generating this type of summary, the results indicate that the version without sentence regeneration yielded better results, improving on a number of state-of-the-art systems by 9%, whereas the version with sentence regeneration proved more robust to noisy data. This research work has been partially funded by the University of Alicante, Generalitat Valenciana, Spanish Government and the European Commission through the projects “Tratamiento inteligente de la información para la ayuda a la toma de decisiones” (GRE12-44), “Explotación y tratamiento de la información disponible en Internet para la anotación y generación de textos adaptados al usuario” (GRE13-15), DIIM2.0 (PROMETEOII/2014/001), ATTOS (TIN2012-38536-C03-03), LEGOLANGUAGE (TIN2012-31224), SAM (FP7-611312), and FIRST (FP7-287607).

    Webometrics benefitting from web mining? An investigation of methods and applications of two research fields

    Webometrics and web mining are two fields where research is focused on quantitative analyses of the web. This literature review outlines definitions of the fields, and then focuses on their methods and applications. It also discusses the potential of closer contact and collaboration between them. A key difference between the fields is that webometrics has focused on exploratory studies, whereas web mining has been dominated by studies focusing on the development of methods and algorithms. The fields also differ in the type of data they use: webometrics is more focused on analyses of the structure of the web and web mining on web content and usage, even though both fields have been embracing the possibilities of user-generated content. It is concluded that research problems where big data is needed can benefit from collaboration between webometricians, with their tradition of exploratory studies, and web miners, with their tradition of developing methods and algorithms.

    Sentiment analysis in arabic: opinion polarity detection

    With International Doctorate Mention. Sentiment analysis is becoming increasingly important due to the growing popularity of Web 2.0. This thesis studies several aspects of sentiment analysis. The first objective is to analyze opinions written in Arabic and predict their polarity. To that end, two corpora were generated: OCA, an opinion corpus of Arabic movie reviews, and EVOCA, a parallel corpus containing the English translation of the OCA reviews. A second objective is sentiment analysis adapted to different domains; for this purpose the SINAI-SA corpus was created and used together with other corpora, applying several machine learning techniques. The SINAI corpus was also used to study how short comments behave as textual information for predicting customer ratings. Finally, the thesis addresses how to treat neutral reviews. Two main approaches were investigated, one based on semantic orientation and the other on machine learning algorithms such as SVM or NB. Thesis, Univ. Jaén, Departamento de Informática, defended on 7 October 201

    Data analytics 2016: proceedings of the fifth international conference on data analytics

    Implementation of a facebook crawler for opinion monitoring and trend analysis purposes: a case study of government service delivery in Dwesa

    The Internet has shifted from the Web 1.0 era to the Web 2.0 era. In the Web 2.0 era, the Internet is used to build and reflect social relationships among people who share similar interests and activities, through services such as Social Networking Sites (Facebook, Twitter, etc.) and blogs. Usage of Social Networking Sites (SNSs) and blogs, where people share their views, opinions, and thoughts, is currently very high, and the people posting such content produce a large amount of data. As a result, SNSs and blogs are ideal platforms for opinion monitoring and trend analysis. Service providers could use them to track what the public thinks or requires, since such knowledge can help in decision making and future planning. If service providers can keep track of views, opinions, or thoughts regarding the services they provide, they can better understand the needs of the public or their clients and improve the provision of relevant services. This research project presents a system prototype for performing opinion monitoring and trend analysis on Facebook. The proposed system crawls Facebook, indexes the data, and provides a user interface (UI) where end users can search for and view trends on topics of their choice. The prototype can also be used to check trending topics without searching. The main objective of this research project was to develop a framework that contributes to improving the way government officials, companies, or other service providers communicate with ordinary citizens about the services they provide. The project is premised on the idea that if government officials, companies, or other service providers can keep track of citizens' opinions, views, and thoughts regarding the services they provide, the delivery of those services can improve. 
This research and the implementation of the trend analysis tool were undertaken in the context of the Siyakhula Living Lab (SLL), an Information and Communication Technologies for Development (ICTD) intervention for the marginalized community of Dwesa
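
The index, search, and trending steps described in this abstract can be illustrated with a minimal sketch. It is not the prototype's implementation: the crawling step is omitted, and the sample posts, the stopword list, and the function names (`build_index`, `search`, `trending`) are all hypothetical. It only shows one simple way an inverted index and a per-day term count could support keyword search and trending topics.

```python
import re
from collections import Counter, defaultdict

# Hypothetical sample posts standing in for crawled content: (day, text).
POSTS = [
    ("2014-03-01", "Water supply in the village is still broken"),
    ("2014-03-01", "Clinic queue was long today"),
    ("2014-03-02", "Still no water supply, when will water return?"),
    ("2014-03-02", "Road repairs started near the clinic"),
]

# Assumed, deliberately tiny stopword list for the example.
STOPWORDS = {"the", "in", "is", "was", "no", "when", "will", "near", "still", "today"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def build_index(posts):
    """Build an inverted index mapping each term to the set of post ids containing it."""
    index = defaultdict(set)
    for post_id, (_, text) in enumerate(posts):
        for term in tokenize(text):
            index[term].add(post_id)
    return index


def search(index, query):
    """Return the sorted ids of posts that contain every term of the query."""
    terms = tokenize(query)
    if not terms:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(hits)


def trending(posts, day, top_n=3):
    """Return the most frequent terms among the posts published on the given day."""
    counts = Counter()
    for post_day, text in posts:
        if post_day == day:
            counts.update(tokenize(text))
    return counts.most_common(top_n)


index = build_index(POSTS)
print(search(index, "water supply"))
print(trending(POSTS, "2014-03-02", top_n=1))
```

A real deployment would need persistent storage and a proper text-analysis pipeline; the in-memory dictionary here is chosen only to keep the idea visible.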

    Towards the automatic analysis of sentiments in Basque: the creation of basic resources and the identification of valence shifters in different language levels

    243 p. (Basque), 139 p. (English). In this thesis, from the perspective of applied linguistics, the first steps have been taken in sentiment analysis for Basque. The thesis project has had two main goals. On the one hand, we have created basic resources for sentiment analysis in Basque. Specifically, we have developed the Basque Opinion Corpus (Euskarazko Iritzi Corpusa), a Basque sentiment lexicon called Sentitegi, and a document-level sentiment classifier. The corpus contains 240 opinion texts from six domains. Its discourse information is annotated following the RST approach, and the semantic orientation of the opinion texts is annotated as well. The sentiment lexicon consists of 1,237 words, and its entries have a sentiment valence between -5 and +5. We followed a specific translation methodology to create the lexicon. Finally, we developed a document-level sentiment classifier. The tool is based on the aforementioned sentiment lexicon and also includes a number of additional rules. On the other hand, we have also carried out a theoretical study of sentiment analysis. For lexicon-based sentiment classification, knowing the sentiment valence of words is not enough, because texts contain phenomena that affect the valence of those words. These are called contextual valence shifters, and we have studied how they appear in Basque. We examined one type of valence shifter at each level of grammar: in phonology, expressive palatalization; in morphology, morphemes; in syntax, negation markers; and, finally, in discourse, discourse relations and the central unit. The results show that valence shifters strengthen or weaken the sentiment valence of words or phrases. Depending on the intensity of the weakening, the sign of the sentiment valence may change, turning positive into negative or vice versa. Finally, in some cases the valence shifter has no effect
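
As an illustration of lexicon-based classification with contextual valence shifters, the sketch below scores a text with a toy lexicon whose valences lie between -5 and +5. It is not the thesis's classifier and does not use Sentitegi entries: the English words, the multiplier for intensifiers, and the fixed negation shift of 4 are all assumptions made for the example. Negation is modeled as a shift toward the opposite polarity, so a strong word is only weakened while a mild word changes sign.

```python
import re

# Hypothetical toy lexicon with valences in [-5, +5]; these are NOT Sentitegi entries.
LEXICON = {"excellent": 5, "good": 2, "bad": -2, "terrible": -5}
NEGATORS = {"not", "never"}        # assumed negation markers
INTENSIFIERS = {"very": 1.5}       # assumed strengthening multiplier
NEGATION_SHIFT = 4                 # assumed size of the shift caused by negation


def shift_toward_opposite(valence, amount):
    """Weaken a valence by moving it toward the opposite polarity (may flip its sign)."""
    return valence - amount if valence > 0 else valence + amount


def score(text):
    """Sum the shifted valences of all lexicon words found in the text."""
    total = 0.0
    pending = []  # shifters seen immediately before the current word
    for token in re.findall(r"[a-z]+", text.lower()):
        if token in NEGATORS or token in INTENSIFIERS:
            pending.append(token)
            continue
        valence = float(LEXICON.get(token, 0))
        if valence != 0:
            # Intensifiers strengthen the valence, capped at the lexicon's range.
            for shifter in pending:
                if shifter in INTENSIFIERS:
                    valence *= INTENSIFIERS[shifter]
            valence = max(-5.0, min(5.0, valence))
            # Negation weakens the valence and can change its sign.
            if any(shifter in NEGATORS for shifter in pending):
                valence = shift_toward_opposite(valence, NEGATION_SHIFT)
            total += valence
        pending = []  # shifters only affect the word that directly follows them
    return total


def classify(text):
    """Map the total score to a document-level polarity label."""
    total = score(text)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


print(score("not excellent"), classify("not excellent"))
print(score("not good"), classify("not good"))
```

With these assumed values, negating a strong word leaves it weakly positive, while negating a mild word turns it negative, which mirrors the two outcomes the abstract reports for valence shifters.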

    Cartoons as interdiscourse : a quali-quantitative analysis of social representations based on collective imagination in cartoons produced after the Charlie Hebdo attack

    The attacks against Charlie Hebdo in Paris at the beginning of 2015 prompted many cartoonists (mostly professionals, but some laymen as well) to create cartoons in reaction to the tragedy. The main goal of this article is to show how traumatic events like this one can converge in a rather limited set of metaphors, ranging from easily recognizable topoi to rather vague interdiscourses that circulate in contemporary societies. To do so, we analyzed 450 cartoons produced in reaction to the Charlie Hebdo attacks, taking a quali-quantitative approach that draws on both discourse analysis and semiotics. We identified eight main themes and analyzed the five that are anchored in collective imagination (the pen against the sword, the journalist as a modern hero, etc.). We then studied the cartoons at the figurative, narrative, and thematic levels using Greimas’ model of the semiotic square. This paper shows how these cartoons build upon a memory-based network of events from the recent past (particularly 9/11), and more generally on a collective imagination that can be linked to Western values.

    Essays on information arrival and asset prices

    This thesis consists of three empirical papers focusing on the impact of different types of information on investor behaviour and the consequent influence on stock market performance. The first chapter explores the effect of actions taken by the regulator in relation to firms’ law violations on a firm’s stock performance. The results of this study suggest that the announcements made by the Capital Markets Authority (“CMA”) toward firms violating the law have negative and significant effects on the firms’ stock performances. In particular, firms announced to be under investigation experience a more severe impact than those that are the subject of sanction announcements. The second chapter explores the influence of newspaper article sentiment on investors’ trading behaviours. The main results show that financial news articles and their sentiments have significant effects on stock performance indicators. Particularly, Polarity score has significant positive effects on stock returns and a significant negative impact on stock volatility. While the Difficulty and Subjectivity scores have positive and negative impacts on stock returns, respectively, both have a limited impact on stock volatility. Finally, the third chapter reveals the impact of sports events on the nation’s stock market indices. I find that the results of football rivalry matches have a significant impact on the stock market indices of participating countries. Specifically, the result of a national football match positively (negatively) affects the performance of the winning (losing) country’s stock market index. Furthermore, the magnitude of the impact also depends on the characteristics of the game. The results of this investigation show that the victories in rival matches have a greater positive impact on stock returns than non-rival matches. 
Similarly, the stock market of the country which suffers a loss in a rival match often experiences a larger negative effect from the match, compared to that of a country that loses in a non-rival match

    Evaluating the robustness of EmotiBlog for sentiment analysis and opinion mining

    Preliminary research demonstrated the relevance of the EmotiBlog annotated corpus as a Machine Learning resource for detecting subjective data. In this paper we compare EmotiBlog with the JRC Quotes corpus in order to check the robustness of its annotation. We concentrate on its coarse-grained labels and carry out in-depth Machine Learning experiments, including some that incorporate lexical resources. The results are similar to those obtained with the JRC Quotes corpus, demonstrating the validity of EmotiBlog as a resource for the SA task