5,197 research outputs found

    Weakly-Supervised Joint Sentiment-Topic Detection from Text

    Sentiment analysis, or opinion mining, aims to use automated tools to detect subjective information such as opinions, attitudes, and feelings expressed in text. This paper proposes a novel probabilistic modelling framework called the joint sentiment-topic (JST) model, based on latent Dirichlet allocation (LDA), which detects sentiment and topic simultaneously from text. A reparameterized version of the JST model called Reverse-JST, obtained by reversing the sequence of sentiment and topic generation in the modelling process, is also studied. Although JST is equivalent to Reverse-JST without a hierarchical prior, extensive experiments show that when sentiment priors are added, JST performs consistently better than Reverse-JST. Furthermore, unlike supervised approaches to sentiment classification, which often fail to produce satisfactory performance when shifted to other domains, the weakly-supervised nature of JST makes it highly portable to other domains. This is verified by experimental results on datasets from five different domains, where the JST model outperforms even existing semi-supervised approaches on some of the datasets despite using no labelled documents. Moreover, the topics and topic sentiment detected by JST are coherent and informative. We hypothesize that the JST model can readily meet the demand of large-scale sentiment analysis from the web in an open-ended fashion.
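    The abstract describes JST as an LDA-style generative model in which each word is generated conditioned on both a sentiment label and a topic, with Reverse-JST swapping the order in which sentiment and topic are drawn. The following is a minimal sketch of that generative story, not the authors' implementation; the vocabulary size, counts, and Dirichlet hyperparameters are illustrative assumptions.

```python
import numpy as np

# Illustrative sizes and hyperparameters, not taken from the paper
V, S, T, D, N = 1000, 3, 10, 5, 50   # vocabulary, sentiment labels, topics, documents, words per document
alpha, beta, gamma = 0.1, 0.01, 0.3  # symmetric Dirichlet hyperparameters

rng = np.random.default_rng(0)

# Word distributions conditioned on a (sentiment, topic) pair
phi = rng.dirichlet(np.full(V, beta), size=(S, T))      # phi[s, t] is a distribution over the vocabulary

def generate_document():
    pi = rng.dirichlet(np.full(S, gamma))               # per-document sentiment distribution
    theta = rng.dirichlet(np.full(T, alpha), size=S)    # per-sentiment topic distributions
    words = []
    for _ in range(N):
        s = rng.choice(S, p=pi)                         # draw a sentiment label
        t = rng.choice(T, p=theta[s])                   # draw a topic given the sentiment (Reverse-JST flips these two draws)
        w = rng.choice(V, p=phi[s, t])                  # draw a word given the (sentiment, topic) pair
        words.append((w, s, t))
    return words

corpus = [generate_document() for _ in range(D)]
```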

    A modular architecture for systematic text categorisation

    This work examines, and attempts to overcome, issues caused by the lack of formal standardisation when defining text categorisation techniques and detailing how they might be appropriately integrated with each other. Despite text categorisation's long history, the concept of automation is relatively new, coinciding with the evolution of computing technology and the subsequent increase in the quantity and availability of electronic textual data. Nevertheless, insufficient descriptions of the diverse algorithms discovered have led to an acknowledged ambiguity when trying to accurately replicate methods, which has made reliable comparative evaluations impossible. Existing interpretations of general data mining and text categorisation methodologies are analysed in the first half of the thesis, and common elements are extracted to create a distinct set of significant stages. Their possible interactions are logically determined, and a unique universal architecture is generated that encapsulates all complexities and highlights the critical components. A variety of text-related algorithms are also comprehensively surveyed and grouped according to the stage they belong to, in order to demonstrate how they can be mapped. The second part reviews several open-source data mining applications, placing an emphasis on their ability to handle the proposed architecture, their potential for expansion, and their text processing capabilities. Finding these inflexible and too elaborate to be readily adapted, designs for a novel framework are introduced that focus on rapid prototyping through lightweight customisations and reusable atomic components. As a consequence of the inadequacies of existing options, a rudimentary implementation is realised along with a selection of text categorisation modules. Finally, a series of experiments is conducted that validates the feasibility of the outlined methodology and the importance of its composition, whilst also establishing the practicality of the framework for research purposes. The simplicity of the experiments and the results gathered clearly indicate the potential benefits that can be gained when a formalised approach is utilised.
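    The thesis's central idea is a staged architecture in which text categorisation is decomposed into distinct stages built from lightweight, reusable atomic components. A minimal sketch of such a staged pipeline is shown below; the stage names and components are illustrative assumptions, not the architecture defined in the thesis.

```python
from typing import Callable, Iterable, List

# A stage is any callable mapping a list of documents to a transformed list
Stage = Callable[[List], List]

class Pipeline:
    """Chains independent stages so each one can be swapped or evaluated in isolation."""

    def __init__(self, stages: Iterable[Stage]):
        self.stages = list(stages)

    def run(self, docs: List) -> List:
        for stage in self.stages:
            docs = stage(docs)
        return docs

# Illustrative atomic components for typical preprocessing stages
def lowercase(docs):
    return [d.lower() for d in docs]

def tokenise(docs):
    return [d.split() for d in docs]

def drop_short_tokens(docs):
    return [[t for t in d if len(t) > 2] for d in docs]

pipeline = Pipeline([lowercase, tokenise, drop_short_tokens])
print(pipeline.run(["A Modular Architecture for Systematic Text Categorisation"]))
```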

    Investigating and extending the methods in automated opinion analysis through improvements in phrase based analysis

    Opinion analysis is an area of research which deals with the computational treatment of opinion statements and subjectivity in textual data. It has emerged over the past couple of decades as an active area of research, as it provides solutions to the issues raised by information overload. The problem of information overload has emerged with advancements in communication technologies, which gave rise to an exponential growth in the user-generated subjective data available online. Opinion analysis has a rich set of applications which enable opportunities for organisations, such as tracking user opinions about products, tracking social issues in communities, and encouraging engagement in political participation. The area has been highly active in recent years, and research at different levels of granularity has been, and is being, undertaken. However, there are limitations in the state of the art, especially as dealing with each level of granularity on its own does not solve current research issues. Therefore, a novel sentence-level opinion analysis approach utilising clause- and phrase-level analysis is proposed. This approach uses linguistic and syntactic analysis of sentences to understand the interdependence of words within sentences, and further uses rule-based analysis at the phrase level to calculate the opinion at each hierarchical level of a sentence's structure. The proposed opinion analysis approach requires lexical and contextual resources for implementation. In the context of this Thesis, the approach is further presented as part of an extended unifying framework for opinion analysis, resulting in the design and construction of a novel corpus. The above contributions to the field (approach, framework, and corpus) are evaluated within the Thesis and are found to improve on existing limitations in the field, particularly with regard to the automation of opinion analysis. Further work is required in integrating a mechanism for greater word sense disambiguation and in lexical resource development.
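    The proposed approach computes sentiment bottom-up, applying rule-based analysis at the phrase level and combining the results over the hierarchical structure of a sentence. A toy sketch of that idea follows, using an invented lexicon and a single negation rule; the thesis's actual rule set and lexical resources are not reproduced here.

```python
# Toy prior-polarity lexicon; a real system would use a full lexical resource
LEXICON = {"good": 1, "great": 2, "bad": -1, "terrible": -2}
NEGATORS = {"not", "never", "no"}

def phrase_score(tokens):
    """Sum word polarities within a phrase, flipping the sign after a negator."""
    score, negate = 0, False
    for tok in tokens:
        if tok in NEGATORS:
            negate = True
            continue
        polarity = LEXICON.get(tok, 0)
        score += -polarity if negate else polarity
    return score

def sentence_score(clauses):
    """Combine clause-level scores, each clause given as a list of phrases."""
    return sum(phrase_score(phrase) for clause in clauses for phrase in clause)

# "The plot is not good, but the acting is great", segmented into clauses and phrases
clauses = [[["the", "plot", "is"], ["not", "good"]],
           [["but", "the", "acting", "is"], ["great"]]]
print(sentence_score(clauses))  # -1 + 2 = 1, i.e. mildly positive overall
```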

    Chatbot de Suporte para Plataforma de Marketing Multicanal

    E-goi is an organization which provides automated multichannel marketing capabilities. The complexity of its system means the learning curve is not smooth, so customers sometimes encounter difficulties which direct them towards the appropriate Customer Support resources. With an increase in the number of users, these Customer Support requests have become frequent and demand greater availability on Customer Support channels, which become inundated with simple, easily resolvable requests. The organization envisioned automating a significant portion of customer-generated tickets, with the possibility of scaling to deal with other types of operations. This thesis aims to present a long-term solution to that need through the development of a chatbot system, fully integrated with the existing enterprise modules and data sources. To accomplish this, prototypes using several chatbot management and Natural Language Processing frameworks were developed; their advantages and disadvantages were weighed, followed by the implementation of the accompanying system and the testing of the developed software and of the Natural Language Processing results. Although the developed system achieved the functionalities it was designed for, the thesis could not offer a viable solution to the problem at hand, given that the available data could not produce an intent mining model usable in a real-world context.
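    At the core of the work is an intent mining model trained on customer support messages. The sketch below shows a common baseline for such a classifier (TF-IDF features with a linear model); the example intents and messages are invented for illustration and are not drawn from the E-goi data or from the frameworks evaluated in the thesis.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented training examples: (support message, intent label)
messages = [
    "How do I import my contact list?",
    "The import of contacts keeps failing",
    "How can I schedule an email campaign?",
    "My campaign was not sent at the scheduled time",
]
intents = ["import_contacts", "import_contacts", "schedule_campaign", "schedule_campaign"]

# TF-IDF features with a linear classifier: a common baseline for intent mining
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression(max_iter=1000))
model.fit(messages, intents)

print(model.predict(["contact import keeps failing"]))  # predicts 'import_contacts'
```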

    accuracy: Tools for Accurate and Reliable Statistical Computing

    Most empirical social scientists are surprised that low-level numerical issues in software can have deleterious effects on the estimation process. Statistical analyses that appear to be perfectly successful can be invalidated by concealed numerical problems. We have developed a set of tools, contained in accuracy, a package for R and S-PLUS, to diagnose problems stemming from numerical and measurement error and to improve the accuracy of inferences. The package includes a framework for gauging the computational stability of model results, tools for comparing model results, optimization diagnostics, and tools for collecting entropy for true random number generation.
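    One of the tools described is a framework for gauging the computational stability of model results, which is typically done by re-estimating a model under small perturbations of the data. The sketch below illustrates that general idea in Python; it is not the accuracy package's API, and the model and noise level are illustrative assumptions.

```python
import numpy as np

def perturbation_stability(fit, X, y, noise=1e-6, trials=100, seed=0):
    """Refit a model on slightly perturbed data and report the spread of the estimates."""
    rng = np.random.default_rng(seed)
    baseline = fit(X, y)
    estimates = []
    for _ in range(trials):
        X_noisy = X + rng.normal(scale=noise * np.abs(X).mean(), size=X.shape)
        estimates.append(fit(X_noisy, y))
    spread = np.ptp(estimates, axis=0)  # range of each estimate across the perturbed refits
    return baseline, spread

# Example model: ordinary least squares coefficients
def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
y = X @ np.array([2.0, -1.0]) + rng.normal(scale=0.1, size=200)

coef, spread = perturbation_stability(ols, X, y)
print(coef, spread)  # large spreads flag numerically unstable results
```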

    A sentiment analysis approach to increase authorship identification

    Writing style is considered the manner in which an author expresses their thoughts, influenced by language characteristics, period, school, or nation, and it can often identify the author. One of the most famous examples comes from 1914 in Portuguese literature: Fernando Pessoa and his heteronyms Alberto Caeiro, Álvaro de Campos, and Ricardo Reis had such different writing styles that people believed they were different individuals. Today, authorship identification is all the more relevant because of the considerable amount of fake news spread on social media, where it is hard to identify who authored a text and where even a simple quote can affect the public image of an author, especially when the texts or quotes are attributed to politicians. This paper presents a process to analyse the emotion contained in messages from social media such as Facebook, identify the author's emotional profile, and use it to improve the ability to predict the author of a message. Using preprocessing techniques, lexicon-based approaches, and machine learning, we achieved an authorship identification improvement of approximately 5% on the whole dataset, and of more than 50% for specific authors, when the emotional profile is considered alongside the writing style, thus increasing the ability to identify the author of a text from an emotional profile previously detected in earlier texts. This work has been supported by FCT (Fundação para a Ciência e Tecnologia) within the Project Scope UID/CEC/00319/2019.
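    The paper's approach augments standard authorship features with an emotional profile extracted from an author's texts using lexicon-based methods. A minimal sketch of combining TF-IDF text features with a lexicon-derived emotion profile follows; the lexicon, messages, and author labels are invented for illustration and are not the paper's resources.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy emotion lexicon mapping words to (joy, anger) weights; real work would use a full resource
EMOTION_LEXICON = {"happy": (1, 0), "love": (1, 0), "angry": (0, 1), "hate": (0, 1)}

def emotion_profile(text):
    """Average the emotion weights of the lexicon words present in the text."""
    weights = [EMOTION_LEXICON[w] for w in text.lower().split() if w in EMOTION_LEXICON]
    return np.mean(weights, axis=0) if weights else np.zeros(2)

# Invented messages and author labels
texts = ["I love this happy day", "I hate waiting and it makes me angry",
         "Such a happy surprise, love it", "Angry again about the delays I hate"]
authors = ["A", "B", "A", "B"]

tfidf = TfidfVectorizer()
X_text = tfidf.fit_transform(texts).toarray()
X_emotion = np.array([emotion_profile(t) for t in texts])
X = np.hstack([X_text, X_emotion])  # plain text features plus the emotional profile

clf = LogisticRegression(max_iter=1000).fit(X, authors)

probe = "so happy, I love it"
x = np.hstack([tfidf.transform([probe]).toarray()[0], emotion_profile(probe)])
print(clf.predict([x]))  # predicts author 'A'
```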