
    New features for sentiment analysis: Do sentences matter?

    1st International Workshop on Sentiment Discovery from Affective Data (SDAD 2012), in conjunction with ECML-PKDD 2012; Bristol, United Kingdom; 28 September 2012.
    In this work, we propose and evaluate new features for a word-polarity-based approach to sentiment classification. In particular, we analyze sentences as a first step before estimating the overall review polarity. We consider several aspects of sentences, such as length, purity, irrealis content, subjectivity, and position within the opinionated text. This analysis is then used to find sentences that may convey better information about the overall review polarity. The TripAdvisor dataset is used to evaluate the effect of sentence-level features on polarity classification. Our initial results indicate a small improvement in classification accuracy when using the newly proposed features. However, the benefit of these features is not limited to improving sentiment classification accuracy, since sentence-level features can also be used for other important tasks such as review summarization.
    Funded by the European Commission, FP7, under the UBIPOL (Ubiquitous Participation Platform for Policy Making) Project.
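
    As a concrete illustration, the sketch below computes sentence-level features of the kind listed above (length, purity, irrealis, subjectivity, position) over a review, using a toy polarity lexicon; the exact feature definitions and the lexicon are assumptions for illustration, not the paper's.

```python
# Minimal sketch of sentence-level feature extraction, assuming a simple
# word-polarity lexicon; the paper's exact feature definitions may differ.
import re

LEXICON = {"great": 1, "clean": 1, "terrible": -1, "dirty": -1}  # toy stand-in
IRREALIS = {"would", "could", "should", "if", "might"}

def sentence_features(review: str):
    sentences = re.split(r"(?<=[.!?])\s+", review.strip())
    n = len(sentences)
    feats = []
    for i, sent in enumerate(sentences):
        words = sent.lower().split()
        polarities = [LEXICON[w] for w in words if w in LEXICON]
        pos = sum(p > 0 for p in polarities)
        neg = sum(p < 0 for p in polarities)
        feats.append({
            "length": len(words),
            # purity: how one-sided the sentence's polar words are, in [0, 1]
            "purity": abs(pos - neg) / (pos + neg) if pos + neg else 0.0,
            "irrealis": any(w in IRREALIS for w in words),
            "subjective": bool(polarities),  # crude proxy: has polar words
            "position": i / max(n - 1, 1),   # 0 = first sentence, 1 = last
        })
    return feats
```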

    A Fine-grained Multilingual Analysis Based on the Appraisal Theory: Application to Arabic and English Videos

    The objective of this paper is to compare the opinions expressed in two videos in two different languages. To do so, a fine-grained approach inspired by appraisal theory is used to analyze the content of videos on the same topic. In general, methods for sentiment analysis study the polarity of a text or an utterance. The appraisal approach goes further than basic polarity and considers more detailed sentiments by covering additional attributes of opinion such as Attitude, Graduation, and Engagement. To make such a comparison possible, within AMIS (a Chist-Era project) we collected a corpus of 1503 Arabic and 1874 English videos. These videos need to be aligned in order to compare their contents, so we propose several methods to make them comparable; the best one is then selected to align them and to build the dataset needed for the fine-grained sentiment analysis.
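
    For readers unfamiliar with appraisal-style annotation, a minimal data structure along these lines might look as follows; the attribute values shown are common in the appraisal literature and are not taken from the paper's schema.

```python
# Illustrative container for appraisal-style annotations; field values are
# typical of the appraisal literature, not the paper's actual schema.
from dataclasses import dataclass

@dataclass
class AppraisalAnnotation:
    segment: str      # opinionated span from the video transcript
    polarity: str     # "positive" / "negative" / "neutral"
    attitude: str     # e.g. "affect", "judgment", "appreciation"
    graduation: str   # e.g. "force-raise", "force-lower", "focus"
    engagement: str   # e.g. "monogloss", "heterogloss"

ann = AppraisalAnnotation(
    segment="this decision is clearly unacceptable",
    polarity="negative",
    attitude="judgment",
    graduation="force-raise",  # "clearly" intensifies the evaluation
    engagement="monogloss",    # single-voiced assertion
)
```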

    Amnestic Forgery: an Ontology of Conceptual Metaphors

    This paper presents Amnestic Forgery, an ontology for metaphor semantics based on MetaNet, which is in turn inspired by Conceptual Metaphor Theory. Amnestic Forgery reuses and extends the Framester schema as an ideal ontology design framework for dealing with both the semiotic and referential aspects of frames, roles, mappings, and eventually blending. The description of the resource is complemented by a discussion of its applications, with examples taken from metaphor generation and from the referential problems of metaphoric mappings. Both schema and data are available from the Framester SPARQL endpoint.
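
    As a hedged sketch of how such a resource can be consulted, the snippet below queries a SPARQL endpoint with the SPARQLWrapper library; the endpoint URL is a placeholder (check the Framester project page for the real address), and the query merely lists classes rather than using the Amnestic Forgery vocabulary, which is not shown here.

```python
# Minimal sketch of querying a SPARQL endpoint with SPARQLWrapper; the
# endpoint URL below is a placeholder, not the actual Framester address.
from SPARQLWrapper import SPARQLWrapper, JSON

endpoint = SPARQLWrapper("https://example.org/framester/sparql")  # placeholder
endpoint.setQuery("SELECT DISTINCT ?class WHERE { ?s a ?class } LIMIT 20")
endpoint.setReturnFormat(JSON)

# Print the classes found in the endpoint's default graph.
for row in endpoint.query().convert()["results"]["bindings"]:
    print(row["class"]["value"])
```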

    A knowledge regularized hierarchical approach for emotion cause analysis

    Emotion cause analysis, which aims to identify the reasons behind emotions, is a key topic in sentiment analysis. A variety of neural network models have been proposed recently; however, these models mostly focus on learning architectures over local textual information, ignoring discourse structure and prior knowledge, both of which play crucial roles in human text comprehension. In this paper, we propose a new method for emotion cause extraction that combines a hierarchical neural model with knowledge-based regularization, incorporating discourse context and constraining the parameters with a sentiment lexicon and common knowledge. The experimental results demonstrate that our method achieves state-of-the-art performance on two public datasets in different languages (Chinese and English), outperforming a number of competitive baselines by at least 2.08% in F-measure.
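
    A schematic of the knowledge-regularization idea, not the paper's model: a task loss is augmented with a penalty that steers attention toward words found in a sentiment lexicon. The tensor names and the penalty form here are assumptions made for illustration.

```python
# Schematic lexicon-based regularizer: `alpha` is an attention distribution
# over words and `lexicon_mask` marks sentiment-lexicon hits with 1s. This
# illustrates the idea only; it is not the paper's architecture.
import torch
import torch.nn.functional as F

def regularized_loss(logits, labels, alpha, lexicon_mask, lam=0.1):
    task_loss = F.cross_entropy(logits, labels)
    # Attention mass that falls on sentiment-lexicon words, per example.
    lexicon_attn = (alpha * lexicon_mask).sum(dim=-1)
    # Penalize examples whose attention ignores the lexicon words.
    knowledge_penalty = (1.0 - lexicon_attn).clamp(min=0.0).mean()
    return task_loss + lam * knowledge_penalty
```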

    How to match bilingual tweets?

    In this paper, we propose a method for aligning comparable bilingual tweets that not only takes into account the specificities of tweets but also handles proper names, dates, and numbers in the two languages, which makes it possible to retrieve more relevant target tweets. Matching proper names between Arabic and English is difficult because the two languages use different scripts. For that, we used an approach that projects the sounds of an English proper name into Arabic and aligns it with the most appropriate proper name. We evaluated the method with a classical measure and compared it to the one we developed. The experiments were carried out on two parallel corpora and show that our measure outperforms the baseline by 5.6% in R@1 recall.
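
    A toy sketch of the sound-projection idea, under strong simplifying assumptions: an Arabic name is projected into Latin characters with a small hand-made consonant map and matched against English candidates by string similarity. The paper's projection and alignment are considerably richer than this.

```python
# Toy sound-based proper-name matching across scripts; the character map is
# a tiny hand-made stand-in, not the paper's phonetic projection.
from difflib import SequenceMatcher

AR2LAT = {"م": "m", "ح": "h", "د": "d", "ب": "b", "ر": "r", "ا": "a",
          "ك": "k", "س": "s", "ل": "l", "ن": "n", "و": "w", "ي": "y"}

def romanize(arabic_name: str) -> str:
    # Drop characters missing from the map (diacritics, etc.).
    return "".join(AR2LAT.get(ch, "") for ch in arabic_name)

def best_match(arabic_name: str, english_candidates: list[str]) -> str:
    rom = romanize(arabic_name)
    return max(english_candidates,
               key=lambda c: SequenceMatcher(None, rom, c.lower()).ratio())

print(best_match("محمد", ["Muhammad", "London", "Sarah"]))  # -> Muhammad
```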

    Measuring the comparability of multilingual corpora extracted from Twitter and others

    Multilingual corpora are widely exploited in several natural language processing tasks. Such corpora are principally of two sorts: comparable and parallel. Comparable corpora gather texts in several languages dealing with analogous subjects, but the texts are not translations of each other as they are in parallel corpora. In this paper, a comparative study of two stemming techniques is conducted in order to improve a comparability measure based on a bilingual dictionary. These techniques are the Buckwalter Arabic Morphological Analyzer (BAMA) and a proposed approach based on Light Stemming (LS) adapted specifically to Twitter; we also combined the two. We evaluated and compared these techniques on three different English-Arabic corpora: a corpus extracted from the social network Twitter, a corpus from Euronews, and a parallel corpus extracted from newspapers (ANN). The experimental results show that the best comparability is achieved by combining BAMA with LS, which leads to a similarity of 61% for Twitter, 52% for Euronews, and 65% for ANN. At a confidence of 40%, we aligned 73.8% of the Arabic and English tweets.
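
    One common formulation of a dictionary-based comparability measure, given here as an assumed sketch rather than the paper's exact definition: the share of words on each side that have at least one translation on the other side, averaged over both directions. Stemming (BAMA or light stemming) would be applied to the tokens before the dictionary lookup.

```python
# Sketch of a dictionary-based comparability measure between two token
# lists; dict_ab maps a word in language A to a set of translations in B.
def coverage(tokens_a, tokens_b, dict_ab):
    """Share of distinct words in A with at least one translation in B."""
    a, b = set(tokens_a), set(tokens_b)
    hits = sum(1 for w in a if dict_ab.get(w, set()) & b)
    return hits / len(a) if a else 0.0

def comparability(ar_tokens, en_tokens, dict_ar_en, dict_en_ar):
    # Average the two directions so the measure is symmetric.
    return 0.5 * (coverage(ar_tokens, en_tokens, dict_ar_en)
                  + coverage(en_tokens, ar_tokens, dict_en_ar))
```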

    Deep Learning for Text Style Transfer: A Survey

    Text style transfer is an important task in natural language generation that aims to control certain attributes of the generated text, such as politeness, emotion, and humor. It has a long history in natural language processing and has recently regained significant attention thanks to the promising performance of deep neural models. In this paper, we present a systematic survey of the research on neural text style transfer, spanning over 100 representative articles since the first neural text style transfer work in 2017. We discuss the task formulation, existing datasets and subtasks, evaluation, and the rich methodologies developed for both parallel and non-parallel data. We also discuss a variety of important topics regarding the future development of this task. Our curated paper list is at https://github.com/zhijing-jin/Text_Style_Transfer_Survey
    Comment: Computational Linguistics Journal, 2022

    Closed Captions: an offline automatic subtitle generator using a speech-to-text (STT) engine

    Watching films in the language one is studying is very beneficial for the learner, as it helps establish certain linguistic foundations in a realistic context, with language appropriate to the situation depicted in the film. However, to help the viewer follow the plot, it is important to have textual reinforcement of the spoken language used in the film, i.e., subtitles. Although it is easy to find subtitled films nowadays, using subtitles in one's mother tongue often means paying more attention to reading than to listening; therefore, if the learner's level allows it, reading the subtitles in the language being learnt is a very interesting step forward in language practice. The problem lies in the fact that subtitles often do not represent the audio exactly, but rather convey a similar meaning expressed in a different way. To solve this problem, this work develops an application that, by means of a Speech-To-Text engine, processes and transcribes audio files into closed captions with a certain degree of confidence. The text is written to a SubRip (.srt) file with the corresponding time stamps, which is recognised directly by any player that supports subtitles, such as VLC media player.
    Aibar Armero, J. (2021). Closed Captions: generador de subtítulos automáticos offline empleando un motor de conversión de voz a texto (STT). TFG, Universitat Politècnica de València. http://hdl.handle.net/10251/174254
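
    A minimal sketch of the output stage described above: turning STT segments, assumed here to arrive as (start_seconds, end_seconds, text) tuples, into a SubRip file with the HH:MM:SS,mmm timestamps that players such as VLC expect.

```python
# Write SubRip (.srt) captions from STT segments given as
# (start_seconds, end_seconds, text) tuples; any STT engine that reports
# segment timings could feed this.
def srt_timestamp(seconds: float) -> str:
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"  # SRT uses a comma before ms

def write_srt(segments, path="captions.srt"):
    with open(path, "w", encoding="utf-8") as f:
        for i, (start, end, text) in enumerate(segments, 1):
            f.write(f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n")
            f.write(f"{text}\n\n")

write_srt([(0.0, 2.4, "Hello, world."), (2.6, 5.1, "Welcome to the film.")])
```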

    Probabilistic Graphical Models for Credibility Analysis in Evolving Online Communities

    One of the major hurdles preventing the full exploitation of information from online communities is the widespread concern regarding the quality and credibility of user-contributed content. Prior works in this domain operate on a static snapshot of the community, make strong assumptions about the structure of the data (e.g., relational tables), or consider only shallow features for text classification. To address these limitations, we propose probabilistic graphical models that can leverage the joint interplay between multiple factors in online communities, such as user interactions, community dynamics, and textual content, to automatically assess the credibility of user-contributed content as well as the expertise of users and its evolution, with user-interpretable explanations. To this end, we devise new models based on Conditional Random Fields for different settings, such as incorporating partial expert knowledge for semi-supervised learning, and handling discrete labels as well as numeric ratings for fine-grained analysis. This enables applications such as extracting reliable side effects of drugs from user-contributed posts in health forums and identifying credible content in news communities. Online communities are dynamic: users join and leave, adapt to evolving trends, and mature over time. To capture these dynamics, we propose generative models based on Hidden Markov Models, Latent Dirichlet Allocation, and Brownian motion to trace the continuous evolution of user expertise and language models over time. This allows us to identify expert users and credible content jointly over time, improving state-of-the-art recommender systems by explicitly considering the maturity of users. It also enables applications such as identifying helpful product reviews and detecting fake and anomalous reviews with limited information.
    Comment: PhD thesis, Mar 2017
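
    To make the HMM component concrete, here is a toy forward-algorithm sketch for a discrete HMM over latent expertise levels; the states and all probabilities are invented for the example and are not the thesis's parameters.

```python
# Toy discrete-HMM forward pass: posterior over a user's latent expertise
# after a sequence of observed post-quality levels (0 = low, 2 = high).
# All numbers are illustrative, not learned from data.
import numpy as np

pi = np.array([0.7, 0.2, 0.1])         # initial: novice, intermediate, expert
A = np.array([[0.80, 0.15, 0.05],      # expertise tends to grow slowly
              [0.05, 0.80, 0.15],
              [0.02, 0.08, 0.90]])
B = np.array([[0.6, 0.3, 0.1],         # P(observed quality | latent state)
              [0.3, 0.4, 0.3],
              [0.1, 0.3, 0.6]])

def forward(obs):
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return alpha / alpha.sum()          # normalized posterior over final state

print(forward([0, 1, 2, 2]))            # post quality improving over time
```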