Early risk detection of self-harm and depression severity using BERT-based transformers: iLab at CLEF eRisk 2020
This paper briefly describes our research group's efforts in tackling Task 1 (Early Detection of Signs of Self-Harm) and Task 2 (Measuring the Severity of the Signs of Depression) of the CLEF eRisk Track. Core to our approach was the use of BERT-based classifiers trained specifically for each task. Our results on both tasks indicate that this approach performs well across a range of measures, particularly for Task 1, where our submissions obtained the best scores for precision, F1, latency-weighted F1, ERDE5 and ERDE50. This work suggests that BERT-based classifiers, when trained appropriately, can accurately identify which social media users are at risk of self-harm, with precision of up to 91.3% for Task 1. Given these promising results, future work could further refine the training regime, the classifier and the early detection scoring mechanism, and apply the same approach to other related tasks (e.g., anorexia, depression, suicide).
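The abstract reports ERDE at 5 and 50 and latency-weighted F1, the eRisk early-detection measures. As a minimal sketch, assuming the definitions as we understand them from the eRisk overviews (ERDE_o with c_fn = c_tp = 1, c_fp set to the proportion of positive users, and latency cost lc_o(k) = 1 - 1/(1 + e^(k - o)); latency-weighted F1 as F1 multiplied by 1 minus the median latency penalty over true positives, with p = 0.0078), both measures can be computed as below. The function names and the (predicted, gold, k) input format are our own choices, not an official evaluation script.

```python
import math
from statistics import median

# Each decision is (predicted_positive, truly_positive, k), where k is the
# number of writings the system had read when it emitted its decision.

def latency_cost(k: int, o: int) -> float:
    """lc_o(k) = 1 - 1 / (1 + e^(k - o)), written as 1 / (1 + e^(o - k)).

    The two forms are algebraically equal; the second avoids overflow for large k.
    """
    return 1.0 / (1.0 + math.exp(o - k))

def erde(decisions, o: int, c_fp: float) -> float:
    """Mean ERDE_o over users (lower is better); assumes c_fn = c_tp = 1."""
    total = 0.0
    for predicted, gold, k in decisions:
        if predicted and not gold:
            total += c_fp                 # false positive
        elif gold and not predicted:
            total += 1.0                  # false negative (c_fn = 1)
        elif predicted and gold:
            total += latency_cost(k, o)   # true positive, penalised by delay
        # true negatives add nothing
    return total / len(decisions)

def latency_weighted_f1(decisions, p: float = 0.0078) -> float:
    """F1 multiplied by speed = 1 - median latency penalty over true positives."""
    tp = sum(1 for pred, gold, _ in decisions if pred and gold)
    fp = sum(1 for pred, gold, _ in decisions if pred and not gold)
    fn = sum(1 for pred, gold, _ in decisions if gold and not pred)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    # penalty(k) = -1 + 2 / (1 + e^(-p (k - 1))): 0 for an alert after the first writing
    penalties = [-1 + 2 / (1 + math.exp(-p * (k - 1)))
                 for pred, gold, k in decisions if pred and gold]
    return f1 * (1 - median(penalties))
```

A true positive emitted exactly at k = o costs 0.5 under ERDE_o, and the cost approaches 1 (the same as a miss) as the alert is delayed further past o.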
Daily Stress Recognition from Mobile Phone Data, Weather Conditions and Individual Traits
Research has shown that stress reduces quality of life and causes many
diseases. For this reason, several researchers have devised stress detection
systems based on physiological parameters. However, these systems require the
user to carry obtrusive sensors continuously. In this paper, we propose an
alternative approach, providing evidence that daily stress can be reliably
recognized from behavioral metrics derived from the user's mobile phone
activity and from additional indicators, such as weather conditions (data
pertaining to transitory properties of the environment) and personality traits
(data concerning permanent dispositions of individuals). Our multifactorial
statistical model, which is person-independent, obtains an accuracy of 72.28%
on a 2-class daily stress recognition problem. The model is efficient to
implement for most multimedia applications thanks to its greatly reduced,
low-dimensional feature space (32 dimensions). Moreover, we identify and
discuss the indicators that have strong predictive power.
Comment: ACM Multimedia 2014, November 3-7, 2014, Orlando, Florida, US
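The abstract describes the model as person-independent. One common way to check that property, assumed here because the abstract does not spell out the evaluation protocol, is leave-one-subject-out evaluation: each split holds out every day of one person, so the model is always tested on someone it never saw. The nearest-centroid classifier and the 2-dimensional toy features below are hypothetical placeholders for the paper's statistical model and its 32-dimensional phone, weather and personality feature vector.

```python
from collections import defaultdict

# Each sample is (subject_id, feature_vector, label), one per person-day.

def leave_one_subject_out(samples):
    """Yield (held_out_subject, train, test) with no subject in both sets."""
    subjects = sorted({s for s, _, _ in samples})
    for held_out in subjects:
        train = [x for x in samples if x[0] != held_out]
        test = [x for x in samples if x[0] == held_out]
        yield held_out, train, test

def nearest_centroid_fit(train):
    """Placeholder classifier: compute one mean feature vector per class."""
    sums, counts = {}, defaultdict(int)
    for _, features, label in train:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        sums[label] = [a + b for a, b in zip(sums[label], features)]
        counts[label] += 1
    return {label: [v / counts[label] for v in vec] for label, vec in sums.items()}

def nearest_centroid_predict(centroids, features):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    def sq_dist(label):
        return sum((a - b) ** 2 for a, b in zip(centroids[label], features))
    return min(centroids, key=sq_dist)

def person_independent_accuracy(samples) -> float:
    """Accuracy pooled over all leave-one-subject-out splits."""
    correct = total = 0
    for _, train, test in leave_one_subject_out(samples):
        centroids = nearest_centroid_fit(train)
        for _, features, label in test:
            correct += nearest_centroid_predict(centroids, features) == label
            total += 1
    return correct / total
```

A random train/test split over person-days would let the same person's days appear on both sides, which tends to overstate how well the model generalizes to new users.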
Extracting health information from social networks (Extração de informação de saúde através das redes sociais)
Social media has proven to be an excellent resource for connecting people
and creating a parallel community, which makes it a suitable source for
extracting information about real-world events and about its users. All of
this information can be carefully reorganized for social monitoring purposes
and for the good of the community. To extract health evidence from social
media, we started by analyzing and identifying posts on Reddit by mothers at
risk of postpartum depression.
We participated in an online challenge, eRisk 2020, continuing the earlier
participation of the BioInfo@UAVR team, by predicting which users are at risk
of self-harm based on their Reddit posts. We built a Natural Language
Processing pipeline that pre-processes text data and vectorizes it. We use
linguistic features based on the frequency of specific sets of words, as well
as widely used models that represent whole documents as vectors, such as
Tf-Idf and Doc2Vec. The vectors and their corresponding labels are then used
to train a Machine Learning classifier, which, based on the patterns it has
found, predicts a label for unlabeled users. We compare multiple classifiers
to find the one that performs best on the data. To get the most out of the
model, an optimization step is performed in which stop words are removed and
the text vectorization algorithms and the classifier are configured to run in
parallel. A feature importance analysis is also integrated, and a validation
step is performed.
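The stop-word removal and Tf-Idf vectorization steps described above can be illustrated with a minimal, dependency-free sketch. It assumes whitespace tokenization, a tiny hypothetical stop-word list, tf as term count divided by document length, and idf = log(N / df). The thesis does not state which Tf-Idf variant it used; library implementations such as scikit-learn's TfidfVectorizer use a smoothed idf and L2 row normalization by default.

```python
import math
from collections import Counter

# Hypothetical placeholder stop-word list; real pipelines use a full list.
STOP_WORDS = {"i", "the", "a", "to", "and", "my"}

def tokenize(text: str) -> list[str]:
    """Lowercase, split on whitespace, and drop stop words."""
    return [t for t in text.lower().split() if t not in STOP_WORDS]

def tfidf(documents: list[str]) -> list[dict[str, float]]:
    """Return one sparse {term: weight} vector per document.

    tf  = term count / number of non-stop-word tokens in the document
    idf = log(N / df), where df is the number of documents containing the term
    """
    tokenized = [tokenize(d) for d in documents]
    n_docs = len(tokenized)
    # Document frequency: count each term at most once per document.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        length = len(tokens) or 1  # avoid division by zero for empty documents
        vectors.append({
            term: (count / length) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors

docs = ["I want to hurt myself", "the game was great", "I hurt my knee"]
vectors = tfidf(docs)
```

A term that occurs in every document gets idf = log(1) = 0 and therefore contributes nothing, while rarer terms such as "knee" here receive higher weights. These sparse vectors (or Doc2Vec embeddings) are what a downstream classifier would be trained on.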
The results are discussed and presented in various plots, and include a
comparison between different tuning strategies and of the relation between
the parameters and the score. We conclude that the choice of parameters is
essential for achieving a better score and that, to find them, there are
strategies more efficient than the widely used Grid Search, such as Random
Search and Bayesian Optimization. Finally, we compare several approaches for
building an incremental classification based on the users' post timelines,
and conclude that it is possible to obtain a chronological perception of
certain traits of Reddit users, specifically evaluating the risk of self-harm
with an F1 score of 0.73.
Master's in Computer and Telematics Engineering
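The claim that Grid Search is not the most efficient tuning strategy can be made concrete with a small sketch comparing it with Random Search under the same budget of nine evaluations. The objective is a hypothetical stand-in for a cross-validated F1 score in which one hyperparameter matters far more than the other: a 3 × 3 grid only ever tries three distinct values of the important hyperparameter, whereas nine random samples try nine. Bayesian Optimization, also mentioned in the thesis, is not sketched here.

```python
import itertools
import random

def objective(params: dict[str, float]) -> float:
    """Hypothetical validation score: 'important' dominates, 'minor' barely matters."""
    return -(params["important"] - 0.37) ** 2 + 0.001 * params["minor"]

def grid_search(score, grid: dict[str, list[float]]):
    """Evaluate every combination in the Cartesian product of the grid."""
    names = list(grid)
    best, best_score, n_evals = None, float("-inf"), 0
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        s = score(params)
        n_evals += 1
        if s > best_score:
            best, best_score = params, s
    return best, best_score, n_evals

def random_search(score, space: dict[str, tuple[float, float]], budget: int, seed: int = 0):
    """Evaluate `budget` configurations sampled uniformly from each range."""
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(budget):
        params = {name: rng.uniform(lo, hi) for name, (lo, hi) in space.items()}
        s = score(params)
        if s > best_score:
            best, best_score = params, s
    return best, best_score, budget

grid_result = grid_search(objective, {"important": [0.0, 0.5, 1.0], "minor": [0.0, 0.5, 1.0]})
random_result = random_search(objective, {"important": (0.0, 1.0), "minor": (0.0, 1.0)}, budget=9)
```

Whether Random Search wins on a particular run depends on the seed; the point of the sketch is its better coverage of the dimension that matters, not a guaranteed improvement.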
Automated depression detection in text data: leveraging lexical features, phonesthemes embedding, and RoBERTa transformer model
Depression is a prevalent mental disorder characterized by persistent sadness, lack of interest, and diminished pleasure. Detecting depression is crucial for timely intervention and support. In this paper, we address depression detection in text data, framed as both binary classification and regression. Our approach uses a dataset of labeled messages from Telegram groups related to mental disorders. We first review the existing literature on depression detection, highlighting the challenges faced and the methods employed. Our approach involves data pre-processing, lexical feature extraction, phonesthemes embedding, and the RoBERTa transformer model. Through extensive experimentation and model refinement, we achieved promising results in the training phase. However, evaluating our approach in the MentalRiskEs evaluation revealed challenges, and we identified areas for improvement, particularly the latency and speed of detection needed for real-time monitoring of depression-related risks. This research contributes to ongoing efforts to automate depression detection and provides insights into the potential of text analysis techniques for mental health assessment. We plan to further enhance our methodology to help improve the well-being of individuals affected by depression.
Universidad Tecnológica de Bolívar
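To illustrate what lexical feature extraction can look like for this task, here is a minimal sketch that turns one message into a few counts and ratios. The word lists are hypothetical placeholders written in Spanish, since MentalRiskEs is a Spanish-language task; the paper's actual lexicons, its phonesthemes embedding, and the way these features are combined with RoBERTa are not described in the abstract and are not reproduced here.

```python
import re

# Hypothetical placeholder lexicons; a real system would use curated resources.
FIRST_PERSON = {"yo", "me", "mi", "mí", "conmigo"}
NEGATIVE = {"triste", "solo", "sola", "vacío", "vacía", "cansado", "cansada", "llorar"}

def lexical_features(message: str) -> dict[str, float]:
    """Return simple lexical features for one message."""
    # \w is Unicode-aware for str patterns, so accented words stay whole
    # and punctuation such as "¿" is not counted as a token.
    tokens = re.findall(r"\w+", message.lower())
    n = len(tokens) or 1  # avoid division by zero on empty messages
    return {
        "n_tokens": float(len(tokens)),
        "first_person_ratio": sum(t in FIRST_PERSON for t in tokens) / n,
        "negative_ratio": sum(t in NEGATIVE for t in tokens) / n,
        "question_marks": float(message.count("?")),
    }

features = lexical_features("Yo me siento muy triste y sola")
```

For the example message, two of the seven tokens are first-person words and two are in the negative list. Hand-crafted features like these are cheap to compute per message, which matters for the latency concerns the abstract raises.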
Early Detection of Cyberbullying on Social Media Networks
Cyberbullying is an important issue for our society and has a major negative effect on victims, which can be highly damaging given the frequency and wide propagation enabled by information technologies. Therefore, the early detection of cyberbullying in social networks is crucial to mitigate its impact on victims. In this article, we explore different approaches that take time into account when detecting cyberbullying in social networks. We follow a supervised learning method with two specific early detection models, named threshold and dual. The former follows a simpler approach, while the latter requires two machine learning models. To the best of our knowledge, this is the first attempt to investigate the early detection of cyberbullying. We propose two groups of features and two early detection methods specifically designed for this problem. We conduct an extensive evaluation on a real-world dataset, using a time-aware evaluation that penalizes late detections. Our results show that we can improve on baseline detection models by up to 42%.
This research was supported by the Ministry of Economy and Competitiveness of Spain and FEDER funds of the European Union (Project PID2019-111388GB-I00) and by the Centro de Investigación de Galicia “CITIC”, funded by Xunta de Galicia (Galicia, Spain) and the European Union (European Regional Development Fund, Galicia 2014–2020 Program), by grant ED431G 2019/01.
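The abstract only names the two early detection models. Assuming the threshold model works like the usual early-risk setting, where a classifier rescores a user's growing post history and raises an alert once the score crosses a threshold, it can be sketched as follows. The scorer is a hypothetical stand-in for a trained model, and the dual model, which needs two machine learning models, is not reproduced. The returned k is the quantity a time-aware measure uses to penalize late detections.

```python
from typing import Callable, Sequence

def threshold_early_detection(
    posts: Sequence[str],
    score: Callable[[Sequence[str]], float],
    threshold: float,
) -> tuple[bool, int]:
    """Scan posts in chronological order and alert as soon as score >= threshold.

    Returns (alert, k), where k is the number of posts read when the decision
    was made; if the threshold is never reached, returns (False, len(posts)).
    """
    for k in range(1, len(posts) + 1):
        if score(posts[:k]) >= threshold:
            return True, k
    return False, len(posts)

# Hypothetical scorer: fraction of posts so far that contain a listed insult.
INSULTS = {"stupid", "loser"}

def toy_score(history: Sequence[str]) -> float:
    hits = sum(any(w in INSULTS for w in post.lower().split()) for post in history)
    return hits / len(history)

posts = ["hi there", "you are stupid", "stupid loser", "ok"]
```

Lowering the threshold makes alerts earlier but increases false positives; raising it does the opposite, which is exactly the trade-off a latency-penalizing evaluation is designed to expose.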