    A hierarchical topic modelling approach for tweet clustering

    While social media platforms such as Twitter can provide rich and up-to-date information for a wide range of applications, manually digesting such large volumes of data is difficult and costly. It is therefore important to automatically infer coherent and discriminative topics from tweets. Conventional topic models and document clustering approaches fail to achieve good results due to the noisy and sparse nature of tweets. In this paper, we explore various ways of tackling this challenge and propose a two-stage hierarchical topic modelling system that is efficient and effective in alleviating the data sparsity problem. We present an extensive evaluation on two datasets and report that the proposed system achieves the best results in both document clustering performance and topic coherence.

    Cluster Analysis of Twitter Data: A Review of Algorithms

    Twitter, a microblogging online social network (OSN), has quickly gained prominence as it provides people with the opportunity to communicate and share posts and topics. Tremendous value lies in automatically analysing and reasoning about such data in order to derive meaningful insights, which carries potential opportunities for businesses, users, and consumers. However, the sheer volume, noise, and dynamism of Twitter impose challenges that hinder the efficacy of observing clusters with high intra-cluster (i.e. minimum variance) and low inter-cluster similarities. This review focuses on research that has used various clustering algorithms to analyse Twitter data streams and identify hidden patterns in tweets, where text is highly unstructured. It performs a comparative analysis of unsupervised learning approaches in order to determine whether empirical findings support the enhancement of decision support and pattern recognition applications. A review of the literature identified 13 studies that implemented different clustering methods. The studies were compared by clustering method, algorithm, number of clusters, dataset size, distance measure, clustering features, evaluation method, and results. The review concludes that the use of unsupervised learning in mining social media data has several weaknesses. Success criteria and future directions for research and practice are discussed.
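
    The clustering goal named in the abstract above (high intra-cluster and low inter-cluster similarity) can be illustrated with a minimal sketch. It assumes bag-of-words vectors and cosine similarity, which are choices made here for illustration and are not taken from the review; the function names `cosine` and `cluster_similarities` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_similarities(texts, labels):
    """Return (mean intra-cluster, mean inter-cluster) pairwise similarity.

    Assumption: whitespace tokenisation and raw term counts; real tweet
    pipelines usually add normalisation and term weighting.
    """
    vectors = [Counter(text.lower().split()) for text in texts]
    intra, inter = [], []
    for i, j in combinations(range(len(texts)), 2):
        bucket = intra if labels[i] == labels[j] else inter
        bucket.append(cosine(vectors[i], vectors[j]))

    def mean(values):
        return sum(values) / len(values) if values else 0.0

    return mean(intra), mean(inter)


if __name__ == "__main__":
    tweets = [
        "bitcoin price rises",
        "bitcoin price falls",
        "election results tonight",
        "election results delayed",
    ]
    intra, inter = cluster_similarities(tweets, [0, 0, 1, 1])
    print(f"intra={intra:.3f} inter={inter:.3f}")
```

    A good clustering under this measure is one where the first value is clearly larger than the second.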

    Digital currency: a bibliometric exploration of the Bitcoin phenomenon

    In an increasingly globalized world, we have been witnessing the emergence of digital currency and its potential to increase the efficiency of existing payment systems. However, digital money can also hide serious risks that can turn into significant financial losses for its users. Against this background, central banks are concerned with maintaining the stability and efficiency of the financial system and with preserving confidence in their currencies, as innovations in payments can have important implications for the security of the banking system. In addition, there is great uncertainty about the economic benefit of digital currency and its effects on the effectiveness of monetary policy. The present study carries out a systematic review of the current state of the art of the scientific literature on digital currency, focused mainly on the specific case of bitcoin, in order to investigate how this phenomenon has been studied to date. Based on a critical synthesis of the results obtained, namely the locus and focus of the issues, theories, methods and findings addressed in the literature surveyed, it aims to contribute to a more integrated view of a phenomenon that is expanding. For this purpose, a quantitative methodological approach was used, which provides the reader with a more comprehensive view of the subject matter. A corpus of 140 articles published in sources indexed in Scopus was selected and used to build a database. This database was then used to perform a bibliometric analysis of how the state of the art on Bitcoin has evolved in the scientific literature.

    Event Detection in Twitter using Aggressive Filtering and Hierarchical Tweet Clustering

    Second Workshop on Social News on the Web (SNOW), Seoul, Korea, 8 April 2014. Twitter has become as much a news medium as a social network, and much research has turned to analysing its content for tracking real-world events, from politics to sports and natural disasters. This paper describes the techniques we employed for the SNOW Data Challenge 2014, described in [16]. We show that aggressive filtering of tweets based on length and structure, combined with hierarchical clustering of tweets and ranking of the resulting clusters, achieves encouraging results. We present empirical results and discussion for two different Twitter streams, one focusing on the US presidential elections in 2012 and the other on the recent events concerning Ukraine, Syria and Bitcoin in February 2014. Science Foundation Ireland.
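
    The pipeline summarised in the abstract above (filter tweets by length and structure, cluster hierarchically, rank clusters) can be sketched roughly as follows. This is not the authors' implementation: the filter thresholds (`min_words`, `max_tags`), the Jaccard distance, average linkage, the distance `threshold`, and ranking by cluster size are all assumptions chosen only to make the idea concrete.

```python
import re
from itertools import combinations


def tokens(tweet):
    """Lowercased tokens; hashtags and mentions keep their leading symbol."""
    return re.findall(r"[#@]?\w+", tweet.lower())


def keep(tweet, min_words=4, max_tags=2):
    """Hypothetical filter: enough plain words, few hashtags/mentions."""
    toks = tokens(tweet)
    words = [t for t in toks if t[0] not in "#@"]
    tags = len(toks) - len(words)
    return len(words) >= min_words and tags <= max_tags


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b| for two token sets."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def cluster(tweets, threshold=0.7):
    """Average-linkage agglomerative clustering over tweet indices.

    Merging stops once the closest pair of clusters is farther apart than
    `threshold`. Clusters are returned largest first (assumed ranking).
    """
    sets = [{t for t in tokens(tw) if t[0] not in "#@"} for tw in tweets]
    clusters = [[i] for i in range(len(tweets))]

    def dist(c1, c2):
        total = sum(jaccard_distance(sets[i], sets[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        d, a, b = min(
            (dist(clusters[a], clusters[b]), a, b)
            for a, b in combinations(range(len(clusters)), 2)
        )
        if d > threshold:
            break
        clusters[a] = clusters[a] + clusters[b]  # a < b, so index a stays valid
        del clusters[b]
    return sorted(clusters, key=len, reverse=True)


if __name__ == "__main__":
    stream = [
        "Obama wins the presidential election tonight",
        "Obama wins presidential election, crowds celebrate",
        "lol #ff @a @b @c",
        "Bitcoin price drops sharply after exchange halt",
        "ok thanks",
    ]
    kept = [t for t in stream if keep(t)]
    for members in cluster(kept):
        print([kept[i] for i in members])
```

    The pairwise search makes this sketch quadratic or worse in the number of tweets, which is one reason to filter aggressively before clustering.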

