2,588 research outputs found

    Social media mental health analysis framework through applied computational approaches

    Studies have shown that mental illness burdens not only public health and productivity but also established market economies throughout the world. However, mental disorders are difficult to diagnose and monitor through traditional methods, which rely heavily on interviews, questionnaires and surveys, resulting in high under-diagnosis and under-treatment rates. The use of online social media, such as Facebook and Twitter, is now a common part of people’s everyday life. The continuous, real-time user-generated content often reflects the feelings, opinions, social status and behaviours of individuals, creating an unprecedented wealth of person-specific information. With advances in data science, social media has been increasingly employed in population health monitoring and, more recently, in mental health applications to understand mental disorders and to develop online screening and intervention tools. However, existing research efforts are still in their infancy and are primarily aimed at highlighting the potential of employing social media in mental health research. The majority of work is developed on ad hoc datasets and lacks a systematic research pipeline. [Continues.]

    Active learning in annotating micro-blogs dealing with e-reputation

    Elections unleash strong political views on Twitter, but what do people really think about politics? Opinion and trend mining on micro-blogs dealing with politics has recently attracted researchers in several fields, including Information Retrieval and Machine Learning (ML). Since the performance of ML and Natural Language Processing (NLP) approaches is limited by the amount and quality of data available, one promising alternative for some tasks is the automatic propagation of expert annotations. This paper develops a so-called active learning process for automatically annotating French-language tweets that deal with the image (i.e., representation, web reputation) of politicians. Our main focus is on the methodology followed to build an original annotated dataset expressing opinions about two French politicians over time. We therefore review state-of-the-art NLP-based ML algorithms to automatically annotate tweets using a manual initiation step as bootstrap. This paper focuses on key issues in active learning while building a large annotated dataset from noisy data, where the noise is introduced by human annotators, the abundance of data, and the label distribution across data and entities. In turn, we show that Twitter characteristics such as the author's name or hashtags can serve as a basis not only for improving automatic systems for Opinion Mining (OM) and Topic Classification but also for reducing noise in human annotations. However, a later thorough analysis shows that reducing noise might induce the loss of crucial information. Comment: Journal of Interdisciplinary Methodologies and Issues in Science - Vol 3 - Contextualisation digitale - 201
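The abstract does not say which query strategy the active learning process uses. As a rough illustration only, the sketch below assumes entropy-based uncertainty sampling, a common choice: the tweets whose predicted class probabilities are closest to uniform are sent to human annotators first. The function names, tweet identifiers and probabilities are all hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy of a probability distribution (natural log)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_annotation(predictions, k):
    """Return the ids of the k tweets the model is least certain about.

    predictions maps a tweet id to the model's class probabilities,
    e.g. (positive, neutral, negative). This is an assumed data layout,
    not one taken from the paper.
    """
    ranked = sorted(
        predictions,
        key=lambda tweet_id: entropy(predictions[tweet_id]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical model outputs for three unlabeled tweets.
predictions = {
    "tweet_a": [0.90, 0.05, 0.05],  # confident prediction
    "tweet_b": [0.34, 0.33, 0.33],  # nearly uniform, most uncertain
    "tweet_c": [0.60, 0.30, 0.10],
}

# The two most uncertain tweets would be queued for manual labeling.
to_label = select_for_annotation(predictions, k=2)
print(to_label)
```

In a full loop, the newly labeled tweets would be added to the training set, the classifier retrained, and the selection repeated.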

    Geo Data Science for Tourism

    This reprint describes the recent challenges in tourism seen from the point of view of data science. Thanks to the use of the most popular Data Science concepts, you can easily recognise trends and patterns in tourism, detect the impact of tourism on the environment, and predict future trends in tourism. This reprint starts by describing how to analyse data related to the past, then it moves on to detecting behaviours in the present, and, finally, it describes some techniques to predict future trends. By the end of the reprint, you will be able to use data science to help tourism businesses make better use of data and improve their decision making and operations.

    Unmasking Bias and Inequities: A Systematic Review of Bias Detection and Mitigation in Healthcare Artificial Intelligence Using Electronic Health Records

    Objectives: Artificial intelligence (AI) applications utilizing electronic health records (EHRs) have gained popularity, but they also introduce various types of bias. This study aims to systematically review the literature that addresses bias in AI research utilizing EHR data. Methods: A systematic review was conducted following the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) guideline. We retrieved articles published between January 1, 2010, and October 31, 2022, from PubMed, Web of Science, and the Institute of Electrical and Electronics Engineers. We defined six major types of bias and summarized the existing approaches to bias handling. Results: Out of the 252 retrieved articles, 20 met the inclusion criteria for the final review. Five of the six bias types were covered in this review: eight studies analyzed selection bias; six, implicit bias; five, confounding bias; four, measurement bias; and two, algorithmic bias. For bias handling approaches, ten studies identified bias during model development, while seventeen presented methods to mitigate the bias. Discussion: Bias may infiltrate the AI application development process at various stages. Although this review discusses methods for addressing bias at different development stages, there is room for implementing additional effective approaches. Conclusion: Despite growing attention to bias in healthcare AI, research using EHR data on this topic is still limited. Detecting and mitigating AI bias with EHR data continues to pose challenges. Further research is needed to develop a standardized method that is generalizable and interpretable to detect, mitigate and evaluate bias in medical AI. Comment: 29 pages, 2 figures, 2 tables, 2 supplementary files, 66 reference

    Controversy trend detection in social media

    In this research, we focus on the early prediction of whether topics are likely to generate significant controversy (in the form of social media content such as comments, blogs, etc.). Controversy trend detection is important to companies, governments, national security agencies, and marketing groups because it can be used to identify which issues the public is having problems with and to develop strategies to remedy them. For example, companies can monitor their press releases to find out how the public is reacting and to decide whether any additional public relations action is required; social media moderators can moderate discussions if they start becoming abusive and getting out of control; and governmental agencies can monitor their public policies and adjust them to address any public concerns. An algorithm was developed to predict controversy trends by taking into account the sentiment expressed in comments, the burstiness of comments, and a controversy score. To train and test the algorithm, an annotated corpus was developed consisting of 728 news articles and over 500,000 comments on these articles made by viewers from CNN.com. This study achieved an average F-score of 71.3% across all time spans in the detection of controversial versus non-controversial topics. The results suggest that early prediction of controversy trends is possible by leveraging social media.
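The abstract reports an average F-score across time spans but gives neither the formula nor the per-span counts. The sketch below assumes the standard F-beta definition (with beta = 1, the harmonic mean of precision and recall) and a plain unweighted mean over spans. Every count in it is hypothetical and is not taken from the study.

```python
def f_score(tp, fp, fn, beta=1.0):
    """F-beta score from raw counts; returns 0.0 when it is undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def average_f_score(counts_per_span):
    """Unweighted mean of per-span F1 scores.

    counts_per_span is a list of (true positives, false positives,
    false negatives) tuples, one per time span (assumed layout).
    """
    scores = [f_score(tp, fp, fn) for tp, fp, fn in counts_per_span]
    return sum(scores) / len(scores)


# Hypothetical confusion counts for two time spans.
spans = [(8, 2, 2), (6, 4, 2)]
print(round(average_f_score(spans), 4))
```

Averaging per-span scores treats each time span equally; pooling the counts first would instead weight spans by their number of topics.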

    Data analysis and processing from remote sensors for detection of forest fires risk

    Double-degree master's programme with the Centro Federal de Educação Tecnológica Celso Suckow da Fonseca - CEFET/RJ. This work focuses on developing fire risk detection and prevention algorithms using data collected by sensors in the forest. The study involved a State of the Art review and a Theoretical Foundation, followed by Data Characterization and Data Analysis, which were divided into several sub-sections. The study developed regression models for different types of data and found that the random forest regression model performed best on the day-night transition data. The study compared different regression models, finding that the Support Vector Regression (SVR) model performed worse than the Gradient Boosting Regression (GBR) and Random Forest Regression (RFR) models. The study concluded that using algorithms to identify periods of the day was a useful strategy for avoiding false alerts and that training the models for each individual module was the best strategy. Furthermore, the RFR and GBR regression models were found to be the most effective for the data available in this study. However, improvements are necessary to reduce false positives and facilitate abnormality detection. Overall, this work provides insight into the most effective methods for analyzing and processing data collected by sensors in the forest for fire risk detection and prevention, with the potential to create alerts for those involved in fighting forest fires.

    Asset Clusters and Asset Networks in Financial Risk Management and Portfolio Optimization

    In this work we use exploratory statistical and data mining methods for financial applications such as risk management, portfolio optimization and market analysis. The outcomes are visualized and the relations are quantified by mathematical measures. Researchers, analysts and decision makers can visually explore the structures and carry out management initiatives based on automatic measures provided by the system. Example applications to equity and loan portfolios are included.