391 research outputs found

    Stock market sentiment lexicon acquisition using microblogging data and statistical measures

    Lexicon acquisition is a key issue for sentiment analysis. This paper presents a novel and fast approach for creating stock market lexicons. The approach is based on statistical measures applied over a vast set of labeled messages from StockTwits, which is a specialized stock market microblog. We compare three adaptations of statistical measures, such as pointwise mutual information (PMI), two new complementary statistics and the use of sentiment scores for affirmative and negated contexts. Using StockTwits, we show that the new lexicons are competitive for measuring investor sentiment when compared with six popular lexicons. We also applied a lexicon to easily produce Twitter investor sentiment indicators and analyzed their correlation with survey sentiment indexes. The new microblogging indicators have a moderate correlation with popular Investors Intelligence (II) and American Association of Individual Investors (AAII) indicators. Thus, the new microblogging approach can be used as an alternative to traditional survey indicators, with advantages (e.g., cheaper creation, higher frequencies). This work was supported by FCT - Fundação para a Ciência e Tecnologia within the Project Scope UID/CEC/00319/201

    An empirical analysis of phrase-based and neural machine translation

    Two popular types of machine translation (MT) are phrase-based and neural machine translation systems. Both of these types of systems are composed of multiple complex models or layers. Each of these models and layers learns different linguistic aspects of the source language. However, for some of these models and layers, it is not clear which linguistic phenomena are learned or how this information is learned. For phrase-based MT systems, it is often clear what information is learned by each model, and the question is rather how this information is learned, especially for its phrase reordering model. For neural machine translation systems, the situation is even more complex, since for many cases it is not exactly clear what information is learned and how it is learned. To shed light on what linguistic phenomena are captured by MT systems, we analyze the behavior of important models in both phrase-based and neural MT systems. We consider phrase reordering models from phrase-based MT systems to investigate which words from inside of a phrase have the biggest impact on defining the phrase reordering behavior. Additionally, to contribute to the interpretability of neural MT systems we study the behavior of the attention model, which is a key component in neural MT systems and the closest model in functionality to phrase reordering models in phrase-based systems. The attention model together with the encoder hidden state representations form the main components to encode source side linguistic information in neural MT. To this end, we also analyze the information captured in the encoder hidden state representations of a neural MT system. We investigate the extent to which syntactic and lexical-semantic information from the source side is captured by hidden state representations of different neural MT architectures. Comment: PhD thesis, University of Amsterdam, October 2020. https://pure.uva.nl/ws/files/51388868/Thesis.pd

    AN APPROACH TO SENTIMENT ANALYSIS – THE CASE OF AIRLINE QUALITY RATING

    Sentiment mining has been commonly associated with the analysis of a text string to determine whether a corpus is of a negative or positive opinion. Recently, sentiment mining has been extended to address problems such as distinguishing objective from subjective propositions, and determining the sources and topics of different opinions expressed in textual data sets such as web blogs, tweets, message board reviews, and news. Companies can leverage opinion polarity and sentiment topic recognition to gain a deeper understanding of the drivers and the overall scope of sentiments. These insights can advance competitive intelligence, improve customer service, build a better brand image, and enhance competitiveness. This research paper proposes a sentiment mining approach which detects sentiment polarity and sentiment topic from text. The approach includes a sentiment topic recognition model that is based on Correlated Topics Models (CTM) with the Variational Expectation-Maximization (VEM) algorithm. We validate the effectiveness and efficiency of this model using airline data from Twitter. We also examine the reputation of three major airlines by computing their Airline Quality Rating (AQR) based on the output from our approach.

    Semantically Tied Paired Cycle Consistency for Zero-Shot Sketch-based Image Retrieval

    This is the author accepted manuscript. The final version is available from IEEE via the DOI in this record. Zero-shot sketch-based image retrieval (SBIR) is an emerging task in computer vision that allows retrieval of natural images relevant to sketch queries that might not have been seen in the training phase. Existing works require either aligned sketch-image pairs or an inefficient memory fusion layer for mapping the visual information to a semantic space. In this work, we propose a semantically aligned paired cycle-consistent generative (SEM-PCYC) model for zero-shot SBIR, where each branch maps the visual information to a common semantic space via adversarial training. Each of these branches maintains a cycle consistency that only requires supervision at the category level, and avoids the need for costly aligned sketch-image pairs. A classification criterion on the generators' outputs ensures that the visual to semantic space mapping is discriminating. Furthermore, we propose to combine textual and hierarchical side information via a feature selection auto-encoder that selects discriminating side information within the same end-to-end model. Our results demonstrate a significant boost in zero-shot SBIR performance over the state-of-the-art on the challenging Sketchy and TU-Berlin datasets. European Union Horizon 202

    Investigating and extending the methods in automated opinion analysis through improvements in phrase based analysis

    Opinion analysis is an area of research which deals with the computational treatment of opinion statements and subjectivity in textual data. Opinion analysis has emerged over the past couple of decades as an active area of research, as it provides solutions to the issues raised by information overload. The problem of information overload has emerged with the advancements in communication technologies, which gave rise to an exponential growth in user-generated subjective data available online. Opinion analysis has a rich set of applications which create opportunities for organisations, ranging from tracking user opinions about products and social issues in communities through to engagement in political participation. The opinion analysis area has been highly active in recent years, and research at different levels of granularity has been, and is being, undertaken. However, it is observed that there are limitations in the state-of-the-art, especially as dealing with the levels of granularity on their own does not solve current research issues. Therefore, a novel sentence level opinion analysis approach utilising clause and phrase level analysis is proposed. This approach uses linguistic and syntactic analysis of sentences to understand the interdependence of words within sentences, and further uses rule based analysis for phrase level analysis to calculate the opinion at each hierarchical structure of a sentence. The proposed opinion analysis approach requires lexical and contextual resources for implementation. In the context of this Thesis the approach is further presented as part of an extended unifying framework for opinion analysis, resulting in the design and construction of a novel corpus. The above contributions to the field (approach, framework and corpus) are evaluated within the Thesis and are found to make improvements on existing limitations in the field, particularly with regard to opinion analysis automation. Further work is required in integrating a mechanism for greater word sense disambiguation and in lexical resource development.

    Extraction of opinionated profiles from comments on web news

    Integrated master's thesis. Informatics and Computing Engineering. Faculdade de Engenharia. Universidade do Porto. 201

    Visual Semantic Embedding Model based on DeViSE for medical imaging

    Master's dissertation in Informatics Engineering. During the last decades, artificial intelligence algorithms have evolved to the point that they can achieve some remarkable results, such as identifying and navigating roads, identifying fraudulent transactions, personalizing crops to individual conditions, discovering new consumer trends, predicting personalized health outcomes, optimizing merchandising strategies, predicting maintenance, optimizing pricing and scheduling in real time, and diagnosing diseases, among many others. However, although they can do all of that, they need all the data to be correctly labeled. In other words, an algorithm cannot, for example, diagnose a disease such as a stroke if it does not know what a stroke is. If the algorithm has never been trained to identify strokes, a new algorithm has to be created or the current one has to be retrained; similar issues arise in the other examples. This work focuses on this problem and tries to solve it by using a related high-dimensional vector space, called the semantic space, where the knowledge from known classes can be transferred to unknown classes.