    Finding the Most Interpretable Topic Modeling Approach

    Mutual-Excitation of Cryptocurrency Market Returns and Social Media Topics

    Cryptocurrencies have recently experienced a new wave of price volatility and interest, and activity within social media communities relating to cryptocurrencies has increased significantly. There is currently limited documented knowledge of factors which could indicate future price movements. This paper aims to decipher relationships between cryptocurrency price changes and topic discussion on social media to provide, among other things, an understanding of which topics are indicative of future price movements. To achieve this, a well-known dynamic topic modelling approach is applied to social media communication to retrieve information about the temporal occurrence of various topics. A Hawkes model is then applied to find interactions between topics and cryptocurrency prices. The results show that particular topics tend to precede certain types of price movements: for example, discussion of 'risk and investment vs trading' is indicative of price falls, discussion of 'substantial price movements' is indicative of volatility, and discussion of 'fundamental cryptocurrency value' by technical communities is indicative of price rises. The knowledge of topic relationships gained here could be built into a real-time system, providing trading or alerting signals. Comment: 3rd International Conference on Knowledge Engineering and Applications (ICKEA 2018), Moscow, Russia (June 25-27, 2018).
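    The abstract does not state the form of its Hawkes model. As a minimal sketch, assuming a multivariate Hawkes process with an exponential kernel, each event stream i (for example, one topic's posting activity, or one class of price move) has conditional intensity lambda_i(t) = mu_i + sum over j and over earlier events t_k in stream j of alpha_ij * beta * exp(-beta * (t - t_k)). A clearly positive alpha_ij means events in stream j raise the short-term rate of events in stream i. The function names, the shared decay rate beta, and the toy event times are illustrative assumptions, not the paper's fitted model.

```python
import math

# Minimal multivariate Hawkes sketch with an ASSUMED exponential kernel.
# events[j] holds the event times of stream j, all within [0, T].


def hawkes_intensity(t, mu, alpha, beta, events):
    """Return the conditional intensity of every stream at time t.

    mu[i]       baseline rate of stream i
    alpha[i][j] excitation of stream i by one event in stream j
    beta        shared exponential decay rate
    """
    lam = []
    for i in range(len(mu)):
        total = mu[i]
        for j, times in enumerate(events):
            for tk in times:
                if tk < t:  # only strictly earlier events excite
                    total += alpha[i][j] * beta * math.exp(-beta * (t - tk))
        lam.append(total)
    return lam


def hawkes_loglik(mu, alpha, beta, events, T):
    """Log-likelihood of the observed events on the window [0, T]."""
    ll = 0.0
    for i in range(len(mu)):
        # Sum of log-intensities at stream i's own event times.
        for t in events[i]:
            ll += math.log(hawkes_intensity(t, mu, alpha, beta, events)[i])
        # Subtract the integrated intensity (compensator) over [0, T].
        compensator = mu[i] * T
        for j, times in enumerate(events):
            for tk in times:
                compensator += alpha[i][j] * (1.0 - math.exp(-beta * (T - tk)))
        ll -= compensator
    return ll


# Hypothetical toy data: stream 0 = posts on one topic, stream 1 = price falls.
events = [[1.0, 2.0, 4.5], [2.5, 5.0]]
mu = [0.2, 0.1]
alpha = [[0.3, 0.0],   # topic <- topic, topic <- price fall
         [0.4, 0.1]]   # price fall <- topic, price fall <- price fall
print(hawkes_intensity(3.0, mu, alpha, 1.0, events))
print(hawkes_loglik(mu, alpha, 1.0, events, T=6.0))
```

    Fitting would maximize `hawkes_loglik` over mu, alpha and beta with a numerical optimizer. Under this sketch, a topic "precedes" a type of price move when the fitted alpha for (price move <- topic) is clearly above zero.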


    News portals provide highly diverse information, but a news headline cannot serve as the main reference for determining the overall topic of an article, because headlines tend to be hyperbolic in order to attract readers. This study therefore proposes a system for identifying the topics of news articles using topic modelling with the Latent Dirichlet Allocation (LDA) algorithm. The research begins by automatically collecting data from the detik.com and tempo.co websites through web scraping, after which the data are preprocessed. Preprocessing consists of four stages: tokenization, case folding, stopword removal, and stemming. The final stage is topic modelling with the LDA algorithm. Topic modelling is a statistical model for determining the core themes, or topics, of a collection of documents. Topic identification with LDA is based on the probability of word occurrence in the document collection. The study finds that the most frequently occurring topic on crime news portals is murder
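    The pipeline described above (four preprocessing stages followed by LDA) can be sketched with the Python standard library alone. This is a minimal sketch under stated assumptions: the stopword list and the suffix-stripping `stem` function are hypothetical placeholders (a real system would use a full Indonesian stopword list and a proper Indonesian stemmer), the LDA step is a textbook collapsed Gibbs sampler rather than the implementation used in the study, and the example sentences are invented.

```python
import random
import re

# Hypothetical, deliberately tiny Indonesian stopword list.
STOPWORDS = {"yang", "di", "dan", "ke", "dari", "itu", "ini", "dengan"}


def stem(token):
    # Hypothetical placeholder: strips a few Indonesian particle suffixes.
    # It is NOT a real Indonesian stemmer.
    for suffix in ("nya", "lah", "kah"):
        if token.endswith(suffix) and len(token) > len(suffix) + 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    tokens = re.findall(r"[^\W\d_]+", text)             # 1. tokenization (letter runs)
    tokens = [t.lower() for t in tokens]                # 2. case folding
    tokens = [t for t in tokens if t not in STOPWORDS]  # 3. stopword removal
    return [stem(t) for t in tokens]                    # 4. stemming


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns vocab and topic-word counts."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    ndk = [[0] * K for _ in docs]       # tokens in doc d assigned to topic k
    nkw = [[0] * V for _ in range(K)]   # times word w is assigned to topic k
    nk = [0] * K                        # total tokens assigned to topic k
    z = []                              # current topic of every token
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            ndk[d][k] += 1
            nkw[k][index[w]] += 1
            nk[k] += 1
        z.append(z_d)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi, k = index[w], z[d][i]
                # Remove this token from the counts, then resample its topic.
                ndk[d][k] -= 1
                nkw[k][wi] -= 1
                nk[k] -= 1
                weights = [(ndk[d][t] + alpha) * (nkw[t][wi] + beta) / (nk[t] + V * beta)
                           for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][wi] += 1
                nk[k] += 1
    return vocab, nkw


def top_words(vocab, nkw, n=3):
    """The n most frequently assigned words of each topic."""
    return [[vocab[w] for w in sorted(range(len(vocab)), key=lambda w: -row[w])[:n]]
            for row in nkw]


# Invented example headlines/snippets.
articles = [
    "Polisi menangkap pelaku pembunuhan di Jakarta",
    "Pelaku pembunuhan ditangkap polisi dan diperiksa",
    "Harga saham naik dan investor senang",
    "Investor membeli saham karena harga naik",
]
docs = [preprocess(a) for a in articles]
vocab, nkw = lda_gibbs(docs, num_topics=2)
print(top_words(vocab, nkw))
```

    The fixed `seed` makes runs repeatable; on a corpus this small the discovered topics are only indicative.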

    Social Media Mining in Drug Development Decision Making: Prioritizing Multiple Sclerosis Patients’ Unmet Medical Needs

    Pharmaceutical companies increasingly must consider patients’ needs in drug development. Since patients’ needs are often difficult to measure, especially in rare diseases, the information available for drug development decision-making is limited. In the proposed study, we employ the opportunity algorithm to identify and prioritize the unmet medical needs of multiple sclerosis patients as shared in social media posts. The features for the opportunity algorithm are generated using topic modeling and sentiment analysis. The results imply that sensory problems, pain, mental health problems, fatigue and sleep disturbances represent the highest unmet medical needs of the sampled population. The present study suggests that this method has promising potential to provide relevant insights into rare disease populations and to promote patient-centered drug development.
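    The abstract names the opportunity algorithm but not how the topic-modeling and sentiment features enter it. The widely cited form of the opportunity algorithm from outcome-driven innovation scores a need as importance + max(importance - satisfaction, 0), with both inputs on a 0 to 10 scale. The sketch below assumes, hypothetically, that importance is proportional to how much a topic is discussed and that satisfaction is a rescaled average sentiment in [-1, 1]; the `rank_needs` mapping, the topic names and all numbers are illustrative, not the study's.

```python
def opportunity_score(importance, satisfaction):
    # Opportunity algorithm: importance plus any unmet gap (never negative).
    return importance + max(importance - satisfaction, 0.0)


def rank_needs(topic_share, topic_sentiment):
    """Rank topics by opportunity score (hypothetical feature mapping).

    topic_share[t]     fraction of posts dominated by topic t
    topic_sentiment[t] mean sentiment of those posts, in [-1, 1]
    """
    max_share = max(topic_share.values())
    scores = {}
    for topic, share in topic_share.items():
        importance = 10.0 * (share / max_share)              # ASSUMED: most-discussed topic = 10
        satisfaction = 5.0 * (topic_sentiment[topic] + 1.0)  # ASSUMED: map [-1, 1] onto [0, 10]
        scores[topic] = opportunity_score(importance, satisfaction)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Made-up inputs for three anonymous topics.
shares = {"topic_a": 0.3, "topic_b": 0.2, "topic_c": 0.1}
sentiment = {"topic_a": -0.6, "topic_b": -0.2, "topic_c": 0.4}
for topic, score in rank_needs(shares, sentiment):
    print(topic, round(score, 2))
```

    The max(..., 0) term means a need that is already well satisfied is ranked by importance alone, while a frequently discussed, negatively received topic is boosted.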

    Knowledge Discovery from CVs: A Topic Modeling Procedure

    With a huge number of CVs available online, recruiting via the web has become an integral part of human resource management for companies. Automated text mining methods can be used to analyze large databases containing CVs. We present a topic modeling procedure consisting of five steps with the aim of identifying competences in CVs in an automated manner. Both the procedure and its exemplary application to CVs from IT experts are described in detail. The specific characteristics of CVs are considered in each step for optimal results. The exemplary application suggests that clearly interpretable topics describing fine-grained competences (e.g., Java programming, web design) can be discovered. This information can be used to rapidly assess the contents of a CV, categorize CVs and identify candidates for job offers. Furthermore, a topic-based search technique is evaluated to provide helpful decision support.

    A Social Citizen Dashboard for Participatory Urban Planning in Berlin: Prototype and Evaluation

    Participatory urban planning enables citizens to make their voices heard in the urban planning process, and the resulting measures are more likely to be accepted by the community. However, participation makes the planning process more effortful and time-consuming. New approaches using digital technologies, such as topic modeling based on social media, have been developed to facilitate citizen participation. Using Twitter data for the city of Berlin, we explore how social media and topic modeling can be used to classify and analyze citizen opinions. We develop a Social Citizen Dashboard that allows a better understanding of changes in citizens’ priorities and incorporates constant cycles of feedback throughout the planning phases. Evaluation interviews indicate the dashboard’s potential usefulness and implications, point to limitations in data quality, and suggest directions for further research.

    Utility of Large-scale Recipe Data in Food Computing

    This article looks at recipe data analysis from a critical perspective, drawing on the authors’ own experience of successes and failures in the research process. Recipe research to date has been limited by the availability of data, which in the case of recipes mostly consists of texts listing a variety of ingredients. This has contributed to a better understanding of flavour formation and the nutritional value of food but has not led to the establishment of a corpus of healthy and unhealthy foods. Time-related aspects of cooking have remained largely outside the scope of existing research because immediately analyzable data are difficult to obtain. The same applies to recipe-related research on food texture, colour and other aspects. In this research, topic modelling has been applied to recipes from North American and Mexican cuisines in order to highlight the core culinary themes of these two cuisines. The potential and limitations of the results for further analysis are also discussed. Topic models of aggregated data can be helpful in further multisensory research, as they provide some insight into the colour, the flavour and, potentially, the texture of certain groups of dishes. They can further be combined with social media sentiment analysis and other research methods to better grasp the human relationship with food. © 2021 Baltic Journal of Modern Computing. All rights reserved.

    Data Science as a Tool to Support Decision-Making: Descending Hierarchical Classification of Access to Information Requests in the Municipality of São Paulo

    This study sought to understand how data science and text mining and classification technologies can contribute to decision-making through a better aggregate understanding of access to information requests. The research used data on access to information requests submitted to the Municipality of São Paulo (PMSP) from 2012 to 2019, available on the municipality's Open Data Portal, with the aim of identifying and classifying the main issues raised. The 39,369 request texts submitted to the PMSP were assembled into a corpus and analyzed using Descending Hierarchical Classification (DHC). By proposing text classification as a methodology for analyzing textual data, the study reinforced the paradigm that textual data do not belong solely to the qualitative domain. In addition, considering only nouns (excluding verbs and adverbs) and using the most frequent adjectives as parts of expressions made it possible to optimize the context of the requests, so that the textual data could be classified more objectively and investigator bias mitigated. The article also presents other case studies relevant to the research, with references found in the analysis of access to information requests, contributing to an aggregated understanding of citizens' requests and giving decision-makers a better understanding of society's demands, which may result in more focused public policies. It concludes that analyzing the data through DHC yields information relevant to data- and evidence-based decision-making, and that the approach supports well-founded decisions that are closer to citizens' needs.

    Mapping Phenomena Relevant to Adolescent Emotion Regulation: A Text-Mining Systematic Review

    Adolescence is a developmentally sensitive period for emotion regulation with potentially lifelong implications for mental health and well-being. Although substantial empirical research has addressed this topic, the literature is fragmented across subdisciplines, and an overarching theoretical framework is lacking. The first step toward constructing a unifying framework is identifying relevant phenomena. This systematic review of 6,305 articles used text mining to identify phenomena relevant to adolescents’ emotion regulation. First, a baseline of relevant phenomena discussed in theory and recent narrative reviews was established. Then, article keywords and abstracts were analyzed using text mining, examining term frequency as an indicator of relevance and term co-occurrence as an indicator of association. The results reflected themes commonly featured in theory and narrative reviews, such as socialization and neurocognitive development, but also identified undertheorized themes, such as developmental disorders, physical health, external stressors, structural disadvantage, substance use, identity and moral development, and sexual development. The findings illustrate how text-mining systematic reviews, a novel approach, may complement narrative reviews. Future theoretical work might integrate these undertheorized themes into an overarching framework, and empirical research might consider them as promising areas for future research, or as potential confounders in research on adolescents’ emotion regulation.
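    The review's two measures, term frequency as relevance and term co-occurrence as association, can be sketched as follows. This minimal sketch assumes each article's keywords and abstract are already tokenized and that a term counts once per article in which it appears (document frequency). The Jaccard normalization is one common association score, added here as an assumption rather than as the review's stated measure; the toy articles are invented.

```python
from collections import Counter
from itertools import combinations


def term_stats(docs, vocabulary):
    """Document frequency per term and per unordered term pair.

    docs: list of token lists (one per article)
    vocabulary: set of terms of interest
    """
    freq = Counter()
    cooc = Counter()
    for tokens in docs:
        present = sorted(set(tokens) & vocabulary)  # count each term once per article
        freq.update(present)
        cooc.update(combinations(present, 2))       # pairs are stored in sorted order
    return freq, cooc


def jaccard(freq, cooc, a, b):
    """Share of articles mentioning a or b that mention both."""
    both = cooc[tuple(sorted((a, b)))]
    either = freq[a] + freq[b] - both
    return both / either if either else 0.0


# Invented toy corpus.
docs = [["stress", "peer", "sleep"], ["stress", "sleep"], ["identity", "peer"]]
vocab = {"stress", "sleep", "peer", "identity"}
freq, cooc = term_stats(docs, vocab)
print(freq.most_common())
print(jaccard(freq, cooc, "stress", "sleep"))
```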