5 research outputs found

    Impact of Social Media on Dubai Stock Market using Sentiment Analysis

    One of the main objectives of securities markets is to ensure fair trading. This study shows how sentiment analysis and text-mining techniques can help stock markets detect shifts in market participants' behaviors and how the market community can benefit from this. The ability to detect potential insiders and gauge investors' mood would help a stock market take the necessary actions to protect the trading environment and enhance investors' trust in the market. In this project, we will build a pilot proof of concept that applies sentiment analysis, in English, to Twitter, one of the most popular social media applications, and the Dubai Financial Market, one of the most active stock markets in the United Arab Emirates (UAE). The project can grow in sophistication and coverage in the future. R is the primary development tool, and the statistical and visual analysis will be carried out with its rich open-source community libraries.
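
    A lexicon-based pipeline of the kind described above can be sketched as follows. The study itself uses R; this sketch is in Python, and the tiny lexicon, the regex tokenizer and the names tweet_score and daily_mood are hypothetical illustrations, not the project's code. It scores each tweet by summing word polarities and averages the scores per day, giving a simple daily "mood" series that could then be compared with market activity.

```python
import re
from collections import defaultdict
from statistics import mean

# Hypothetical toy lexicon: word -> polarity weight. A real study would use a
# published or domain-specific lexicon; these entries are illustrative only.
LEXICON = {
    "gain": 1.0, "profit": 1.0, "bullish": 1.5, "strong": 0.5,
    "loss": -1.0, "crash": -2.0, "bearish": -1.5, "weak": -0.5,
}

# Simple lowercase word tokenizer (assumption: English text only).
TOKEN_RE = re.compile(r"[a-z']+")


def tweet_score(text: str) -> float:
    """Sum of lexicon weights of the tokens in a tweet (0.0 if none match)."""
    tokens = TOKEN_RE.findall(text.lower())
    return sum(LEXICON.get(tok, 0.0) for tok in tokens)


def daily_mood(tweets):
    """Average tweet score per day; `tweets` is an iterable of (date, text)."""
    by_day = defaultdict(list)
    for day, text in tweets:
        by_day[day].append(tweet_score(text))
    return {day: mean(scores) for day, scores in sorted(by_day.items())}


# Invented example tweets, for illustration only.
SAMPLE = [
    ("2020-01-05", "Strong profit announced, bullish on DFM today"),
    ("2020-01-05", "Weak volume, small loss"),
    ("2020-01-06", "Market crash fears, very bearish"),
]

if __name__ == "__main__":
    print(daily_mood(SAMPLE))
```

    Summing weights is the simplest aggregation; normalizing by tweet length or handling negation ("not bullish") are common refinements that this sketch deliberately omits.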

    Adaptive sentiment analysis

    Domain dependency is one of the most challenging problems in the field of sentiment analysis. Although most sentiment analysis methods perform decently when targeted at a specific domain and writing style, they usually do not work well on texts that originate outside their domain boundaries. Often, sentiment analysis must be performed in a domain where no labelled documents are available. To address this scenario, researchers have proposed many domain adaptation and unsupervised sentiment analysis methods. However, there is still much room for improvement, as those methods typically cannot match the performance of conventional supervised methods. In this thesis, we propose a novel aspect-level sentiment analysis method that seamlessly integrates lexicon- and learning-based methods. While its performance is comparable to that of existing approaches, it is less sensitive to domain boundaries and can be applied to cross-domain sentiment analysis when the target domain is similar to the source domain. It also offers more structured and readable results by detecting individual topic aspects and determining their sentiment strengths. Furthermore, we investigate a novel approach to automatically constructing domain-specific sentiment lexicons based on distributed word representations (also known as word embeddings). The induced lexicon is of comparable quality to a handcrafted one and could be used directly in a lexicon-based sentiment analysis algorithm, but we find that a two-stage bootstrapping strategy can further boost sentiment classification performance. Compared to existing methods, this end-to-end, nearly unsupervised approach to domain-specific sentiment analysis works out of the box for any target domain, requires no handcrafted lexicon or labelled corpus, and achieves sentiment classification accuracy comparable to that of fully supervised approaches. Overall, the contribution of this Ph.D. work to the research field of sentiment analysis is twofold. First, we develop a new sentiment analysis system that can, in a nearly unsupervised manner, adapt to the domain at hand and perform sentiment analysis with minimal loss of performance. Second, we showcase this system in several areas (including finance, politics, and e-business) and, in particular, investigate the temporal dynamics of sentiment in these contexts.
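
    One common way to induce a domain lexicon from word embeddings, sketched here as an assumption rather than the thesis's own algorithm, is to score each word by its similarity to small sets of positive and negative seed words. The seed words, the toy 3-dimensional vectors and the names cosine and induce_lexicon are hypothetical, and the two-stage bootstrapping step mentioned above is not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def induce_lexicon(embeddings, pos_seeds, neg_seeds):
    """Score every non-seed word as its mean cosine similarity to the positive
    seeds minus its mean cosine similarity to the negative seeds.
    Scores > 0 lean positive, scores < 0 lean negative."""
    seeds = set(pos_seeds) | set(neg_seeds)
    lexicon = {}
    for word, vec in embeddings.items():
        if word in seeds:
            continue  # seeds keep their given polarity; only new words are scored
        pos = sum(cosine(vec, embeddings[s]) for s in pos_seeds) / len(pos_seeds)
        neg = sum(cosine(vec, embeddings[s]) for s in neg_seeds) / len(neg_seeds)
        lexicon[word] = pos - neg
    return lexicon


# Hypothetical toy "embeddings"; real ones would be trained on an in-domain
# corpus (e.g. financial news) and have hundreds of dimensions.
TOY_EMBEDDINGS = {
    "good": [1.0, 0.1, 0.0],
    "bad": [-1.0, 0.1, 0.0],
    "rally": [0.9, 0.3, 0.1],
    "slump": [-0.8, 0.2, 0.1],
    "shares": [0.0, 1.0, 0.0],
}

if __name__ == "__main__":
    lex = induce_lexicon(TOY_EMBEDDINGS, ["good"], ["bad"])
    for word, score in sorted(lex.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{word:>8} {score:+.3f}")
```

    Because the scores come from the target domain's own vectors, a word like "rally" can acquire polarity that a general-purpose lexicon would miss, while a neutral word such as "shares" stays close to zero.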

    Statistical data mining for Sina Weibo, a Chinese micro-blog: sentiment modelling and randomness reduction for topic modelling

    Before the arrival of modern information and communication technology, it was not easy to capture people's thoughts and sentiments; however, the development of statistical data mining techniques and the prevalence of mass social media provide opportunities to capture those trends. Among all types of social media, micro-blogs impose a 140-character limit that forces users to get straight to the point, making posts brief but content-rich resources for investigation. The object of data mining in this thesis is Weibo, the most popular Chinese micro-blog. In the first part of the thesis, we perform various exploratory data-mining analyses on Weibo. After a literature review of micro-blogs, the initial steps of data collection and data pre-processing are introduced. This is followed by an analysis of posting times, an analysis of the relationship between posting intensity and share price, term frequency analysis and cluster analysis. Secondly, we conduct time series modelling on the sentiment of Weibo posts. Considering the properties of Weibo sentiment, we mainly adopt the framework of an ARMA mean with GARCH-type conditional variance to fit the patterns. Because negative sentiment is more complex, other models are also considered for it. Model selection and validation procedures are introduced to verify the fitted models. Thirdly, Latent Dirichlet Allocation (LDA) is explained in depth as a way to discover topics in large sets of textual data. The major contribution is a Randomness Reduction Algorithm that post-processes the output of topic models, filtering out insignificant topics and utilising topic distributions to identify the most persistent topics. At the end of this chapter, empirical evidence of the effectiveness of Randomness Reduction is presented. Topic classification and topic evolution are also presented.
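
    The ARMA-mean/GARCH-variance framework can be made concrete with a minimal sketch, assuming an ARMA(1,1) mean and a GARCH(1,1) variance (the orders the thesis actually selected are not stated here): y_t = c + phi*y_{t-1} + theta*eps_{t-1} + eps_t, with eps_t = sigma_t*z_t and sigma_t^2 = omega + alpha*eps_{t-1}^2 + beta*sigma_{t-1}^2. The function names and parameter values are hypothetical. The code simulates such a series and runs the filter that recovers residuals and conditional variances and returns the Gaussian log-likelihood an estimator would maximize; the maximization itself (done in practice with packages such as R's rugarch or Python's arch) is not shown.

```python
import math
import random


def simulate(n, c, phi, theta, omega, alpha, beta, seed=0):
    """Simulate n points of an ARMA(1,1) mean with GARCH(1,1) errors.
    Returns (y, eps, sig2). Requires alpha + beta < 1."""
    rng = random.Random(seed)
    sig2 = [omega / (1.0 - alpha - beta)]  # start at the unconditional variance
    eps = [0.0]
    y = [c / (1.0 - phi)]                  # start at the unconditional mean
    for _ in range(1, n):
        s2 = omega + alpha * eps[-1] ** 2 + beta * sig2[-1]
        e = math.sqrt(s2) * rng.gauss(0.0, 1.0)
        y.append(c + phi * y[-1] + theta * eps[-1] + e)
        sig2.append(s2)
        eps.append(e)
    return y, eps, sig2


def arma_garch_filter(y, c, phi, theta, omega, alpha, beta):
    """Recover residuals and conditional variances from y, and return the
    Gaussian log-likelihood conditional on the first observation."""
    eps = [0.0]
    sig2 = [omega / (1.0 - alpha - beta)]
    loglik = 0.0
    for t in range(1, len(y)):
        s2 = omega + alpha * eps[-1] ** 2 + beta * sig2[-1]
        e = y[t] - c - phi * y[t - 1] - theta * eps[-1]
        loglik += -0.5 * (math.log(2.0 * math.pi) + math.log(s2) + e * e / s2)
        eps.append(e)
        sig2.append(s2)
    return eps, sig2, loglik


# Hypothetical parameters (alpha + beta = 0.95 < 1, so the variance is stationary).
PARAMS = dict(c=0.01, phi=0.5, theta=0.2, omega=0.05, alpha=0.1, beta=0.85)

if __name__ == "__main__":
    y, _, _ = simulate(500, **PARAMS)
    _, sig2_hat, ll = arma_garch_filter(y, **PARAMS)
    print(f"log-likelihood at true parameters: {ll:.2f}")
    print(f"last conditional variance: {sig2_hat[-1]:.3f}")
```

    The same filter serves both model selection (comparing log-likelihood-based criteria across candidate orders) and validation (checking that eps_t / sigma_t behaves like white noise).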

    Economic language in times of the global crisis: a longitudinal sentiment analysis study (El lenguaje económico en los tiempos de la crisis global: un estudio longitudinal de análisis de sentimiento)

    Date of doctoral thesis defense: 20 September 2019. This thesis is a longitudinal study of how events influence the semantic orientation of economic terminology. The period studied is the Great Recession, a major event that generated a large amount of textual information, which has been exploited as a source of data suitable for automatic analysis. Sentiment analysis is a discipline of natural language processing concerned with the computational treatment of opinion and subjectivity in texts. Accordingly, the general objective of this thesis is to analyze fluctuations in the semantic orientation of a set of economic terms over the period 2007-2015 by characterizing the impact of the most significant events on the semantic variation of lexical units. Its specific objectives are: (1) to compile an English-language sentiment lexicon for the economic-financial domain from an ad hoc corpus of economic news, (2) to define a longitudinal dataset of sentences containing the terms under study, which will serve as the input to the sentiment analysis, (3) after analyzing a set of economic-financial terms, to identify the events that accompanied changes in their semantic orientation, and (4) to analyze possible variations in semantic prosody. To carry out the automatic analysis, LexiEcon was developed: a domain-specific plug-in lexicon for English, adapted for the Lingmotif suite. Given its breadth, its evaluation yielded very satisfactory coverage and recall results (F1 0.735). This figure is around 20% higher than the results Lingmotif achieves without a domain-specific lexicon when classifying texts from the economic-financial domain. The next step was data analysis, in which sentiment analysis is performed on the datasets. The analysis consists of three parts: (a) a table of longitudinal descriptive statistics for the sentiment scores, (b) an annual table of collocations, and (c) a discussion of the findings in the corpus based on the annual collocation rankings, with the aim of triangulating the data obtained. Two main facts emerge: (1) the terms become event words because key events of the crisis cause an enormous increase in their frequency of use; this phenomenon produces significant changes in usage (the semantic orientation of their collocations varies), and the terms often show a lower level of specialization; (2) the annual means of the semantic orientation of a contextualized term reveal major fluctuations in the sentiment embedded in the discourse. Triangulating the quantitative data with the terms' most significant collocations and the events related to the Great Recession leads to the conclusion that the semantic orientation of terms in the economic-financial domain is highly susceptible to change as the events of the financial crisis unfolded.
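
    The longitudinal step of averaging each term's semantic orientation per year, and then looking at year-over-year changes, can be sketched as below. This is an illustration under stated assumptions, not LexiEcon or Lingmotif code: the sentence scores are invented on an assumed [-1, 1] scale, and the term "bailout" and the names annual_means and yearly_shifts are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def annual_means(records):
    """records: iterable of (year, term, score) -> {term: {year: mean score}}."""
    grouped = defaultdict(lambda: defaultdict(list))
    for year, term, score in records:
        grouped[term][year].append(score)
    return {
        term: {year: mean(scores) for year, scores in sorted(years.items())}
        for term, years in grouped.items()
    }


def yearly_shifts(series):
    """Year-over-year change of one term's mean: {year: mean[year] - mean[prev]}."""
    years = sorted(series)
    return {cur: series[cur] - series[prev] for prev, cur in zip(years, years[1:])}


# Invented sentence-level scores for one term, for illustration only.
RECORDS = [
    (2007, "bailout", 0.25), (2007, "bailout", 0.75),
    (2008, "bailout", -0.5), (2008, "bailout", -1.0),
    (2009, "bailout", -0.25),
]

if __name__ == "__main__":
    means = annual_means(RECORDS)
    print(means["bailout"])
    print(yearly_shifts(means["bailout"]))
```

    Large shifts flag the years worth triangulating against collocation rankings and the crisis timeline; the sketch only computes the numbers and leaves that interpretation to the analyst.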