69 research outputs found
Social media data mining: tools for collecting Twitter data
[EN] Currently, social media data mining is often focused on collecting information from Twitter. One of the main problems in social network analysis in these cases is acquiring the data in the form of a graph. Doing this task "by hand" is unthinkable when dealing with networks of hundreds or thousands of nodes in constant communication with each other. Twitter is a very clear example of this problem: any trending topic can generate hundreds of tweets in an hour. Tools are needed to acquire all this data in an automated way, using the possibilities offered by the social network itself. One tool that offers data acquisition, with limitations, is Netlytic. It is not the only one, but it is simple to use. This document shows a use case of a project collected with Netlytic and plotted using Gephi.
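To make the "data in the form of a graph" step concrete, here is a minimal sketch that turns a handful of tweets into a weighted, directed mention network. It is not Netlytic's method or export format: the `(author, text)` input shape, the function names, and the sample tweets are all assumptions made for illustration. The Source/Target/Weight column layout is one that Gephi's spreadsheet import can read as an edge table.

```python
import csv
import io
import re
from collections import Counter

# Matches @username mentions inside a tweet's text.
MENTION = re.compile(r"@(\w+)")


def mention_edges(tweets):
    """Count directed author -> mentioned-user edges.

    `tweets` is an iterable of (author, text) pairs. This input shape is
    an assumption for the sketch, not a real export format.
    """
    edges = Counter()
    for author, text in tweets:
        for target in MENTION.findall(text):
            if target.lower() != author.lower():  # ignore self-mentions
                edges[(author.lower(), target.lower())] += 1
    return edges


def to_edge_csv(edges):
    """Serialise the edges as a Source,Target,Weight table."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (source, target), weight in sorted(edges.items()):
        writer.writerow([source, target, weight])
    return buffer.getvalue()


# Hypothetical sample data, invented for illustration only.
sample = [
    ("ana", "Great thread by @luis and @marta"),
    ("luis", "Thanks @ana!"),
    ("ana", "@luis agreed"),
]
edges = mention_edges(sample)
print(to_edge_csv(edges))
```

Repeated mentions between the same pair of users accumulate as edge weight, so the most active conversations stand out once the table is plotted.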
Implementing local-explainability in Gradient Boosting Trees: Feature Contribution
[EN] Gradient Boosting Decision Trees (GBDT) is a powerful additive model based on tree ensembles. Its nature makes GBDT a black-box model, even though there are multiple explainable artificial intelligence (XAI) models that obtain information by reinterpreting the model globally and locally. Each tree of the ensemble is a transparent model in itself, but the final outcome is the sum of these trees and is not easy to interpret. In this paper, a feature contribution method for GBDT is developed. The proposed method takes advantage of the GBDT architecture to calculate the contribution of each feature using the residue of each node. This algorithm makes it possible to calculate the sequence of node decisions behind a given prediction. Theoretical proofs and multiple experiments have been carried out to demonstrate the performance of our method, which is not only a local explainability model for the GBDT algorithm but also a unique option that reflects GBDT's internal behavior. The proposal addresses feature contributions, which have an impact on artificial intelligence problems such as the ethical analysis of Artificial Intelligence (AI), and supports compliance with recent European laws such as the General Data Protection Regulation (GDPR) regarding the right to explanation and nondiscrimination.
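The idea of attributing a prediction to features along the sequence of node decisions can be sketched as follows. This is a generic path-based attribution, not the algorithm proposed in the paper: the dictionary tree layout, the function names, and the assumption that every node stores a `value` (for instance, the mean residual of the training samples reaching it) are all hypothetical choices made for this example.

```python
def tree_contributions(node, x, contrib):
    """Walk one tree for sample x and return the leaf value.

    At every split, the change in node value between parent and chosen
    child is credited to the feature that was split on.
    """
    while "feature" in node:  # internal nodes carry a split; leaves do not
        feature = node["feature"]
        branch = "left" if x[feature] <= node["threshold"] else "right"
        child = node[branch]
        contrib[feature] = contrib.get(feature, 0.0) + child["value"] - node["value"]
        node = child
    return node["value"]


def explain(trees, x, init=0.0):
    """Return (prediction, bias, per-feature contributions) for an additive ensemble."""
    contrib = {}
    bias = init          # initial prediction plus the root value of each tree
    prediction = init
    for tree in trees:
        bias += tree["value"]
        prediction += tree_contributions(tree, x, contrib)
    return prediction, bias, contrib


# Two tiny hand-built trees with made-up values, for illustration only.
tree1 = {
    "feature": 0, "threshold": 5.0, "value": 1.0,
    "left": {"value": 0.0},
    "right": {
        "feature": 1, "threshold": 2.0, "value": 2.0,
        "left": {"value": 1.5},
        "right": {"value": 2.5},
    },
}
tree2 = {
    "feature": 1, "threshold": 3.0, "value": 0.25,
    "left": {"value": 0.0},
    "right": {"value": 0.5},
}

prediction, bias, contrib = explain([tree1, tree2], [6.0, 4.0])
print(prediction, bias, contrib)
```

By construction the bias plus the sum of the contributions equals the ensemble prediction, which is what makes this kind of decomposition a local explanation of a single output.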
An Ontology-Based multi-domain model in Social Network Analysis: Experimental validation and case study
Social network theory and methods of analysis have been applied to
different domains in recent years, including public health. The complete
procedure for carrying out a social network analysis (SNA) is a time-consuming
task that entails a series of steps in which the expert in social network
analysis could make mistakes. This research presents a multi-domain knowledge
model capable of automatically gathering data and carrying out different social
network analyses in different domains, without errors and obtaining the same
conclusions that an expert in SNA would obtain. The model is represented in an
ontology called OntoSNAQA, which is made up of classes, properties and rules
representing the domains of People, Questionnaires and Social Network Analysis.
Besides the ontology itself, different rules are represented by SWRL and SPARQL
queries. A Knowledge Based System was created using OntoSNAQA and applied to a
real case study in order to show the advantages of the approach. Finally, the
results of an SNA analysis obtained through the model were compared to those
obtained from some of the most widely used SNA applications: UCINET, Pajek,
Cytoscape and Gephi, to test and confirm the validity of the model.
The socialisation of the adolescent who carries out team sports: a transversal study of centrality with a social network analysis
[EN] Objectives To analyse the physical activity carried out
by the adolescents in the study, its relationship to being
overweight (overweight+obese) and to analyse the
structure of the social network of friendship established in
adolescents doing group sports, using different parameters
indicative of centrality.
Setting It was carried out in an educational environment,
in 11 classrooms belonging to 5 Schools in Ponferrada
(Spain).
Participants 235 adolescents were included in the study
(49.4% female), who were classified as normal weight or
overweight.
Primary and secondary outcome measures Physical
Activity Questionnaire for Adolescents (PAQ-A) was used
to study the level of physical activity. A social network
analysis was carried out to analyse structural variables of
centrality in different degrees of contact.
Results 30.2% of the participants in our study were
overweight. Relative to female participants in this study,
males obtained significantly higher scores in the PAQ-A
(OR: 2.11; 95% CI: 1.04 to 4.25; p value: 0.036) and were
more likely to participate in group sport (OR: 4.59; 95%
CI: 2.28 to 9.22; p value: 0.000). We found no significant
relationship between physical activity and the weight
status in the total sample, but among female participants,
those with overweight status had higher odds of reporting
high levels of physical exercise (OR: 4.50; 95% CI: 1.21 to
16.74; p value: 0.025). In terms of centrality, differentiating
by gender, women who participated in group sports
were more likely to be classified as having low values of
centrality, while the opposite effect occurred for men, more
likely to be classified as having high values of centrality.
Conclusions Our findings, with limitations, underline the
importance of two fundamental aspects to be taken into
account in the design of future strategies: gender and
the centrality within the social network depending on the
intensity of contact they have with their peers.
A Semantic Social Network Analysis Tool for Sensitivity Analysis and What-If Scenario Testing in Alcohol Consumption Studies
Social Network Analysis (SNA) is a set of techniques developed in the field
of social and behavioral sciences research, in order to characterize and study
the social relationships that are established among a set of individuals. When
building a social network for performing an SNA analysis, an initial process of
data gathering is carried out in order to extract the characteristics of the
individuals and their relationships. This is usually done by completing a
questionnaire containing different types of questions that will be later used
to obtain the SNA measures needed to perform the study. There are, then, a
great number of different possible network generating questions and also many
possibilities for mapping the responses to the corresponding characteristics
and relationships. Many variations may be introduced into these questions (the
way they are posed, the weights given to each of the responses, etc.) that may
have an effect on the resulting networks. Exploring all these variations
manually is difficult, because the process is time-consuming and
error-prone. The tool described in this paper uses semantic knowledge representation
techniques in order to facilitate this kind of sensitivity study. The base of
the tool is a conceptual structure, called an "ontology", that is able to represent
the different concepts and their definitions. The tool is compared to other
similar ones, and the advantages of the approach are highlighted, giving some
particular examples from an ongoing SNA study about alcohol consumption habits
in adolescents.
eHealth Intervention to Improve Health Habits in the Adolescent Population: Mixed Methods Study
[EN] Background:
Technology has provided a new way of life for the adolescent population. Indeed, strategies aimed at improving health-related behaviors through digital platforms can offer promising results. However, since it has been shown that peers are capable of modifying behaviors related to food and physical exercise, it is important to study whether digital interventions based on peer influence are capable of improving the weight status of adolescents.
Objective:
The purpose of this study was to assess the effectiveness of an eHealth app in an adolescent population in terms of improvements in their age- and sex-adjusted BMI percentiles. Other goals of the study were to examine the social relationships of adolescents pre- and postintervention, and to identify the group leaders and study their profiles, eating and physical activity habits, and use of the web app.
Methods:
The BMI percentiles were calculated in accordance with the reference guidelines of the World Health Organization. Participants’ diets and levels of physical activity were assessed using the Mediterranean Diet Quality Index (KIDMED) questionnaire and the Physical Activity Questionnaire for Adolescents (PAQ-A), respectively. The variables related to social networks were analyzed using the social network analysis (SNA) methodology. In this respect, peer relationships that were considered reciprocal friendships were used to compute the “degree” measure, which was used as an indicative parameter of centrality.
Results:
The sample population comprised 210 individuals in the intervention group (IG) and 91 individuals in the control group (CG). A participation rate of 60.1% (301/501) was obtained. After checking for homogeneity between the IG and the CG, it was found that adolescents in the IG at BMI percentiles both below and above the 50th percentile (P50) modified their BMI to approach this reference value (with a significance of P<.001 among individuals with an initial BMI below the P50 and P=.04 for those with an initial BMI above the P50). The diet was also improved in the IG compared with the CG (P<.001). After verifying that the social network had increased postintervention, it was seen that the group leaders (according to the degree SNA measure) were also leaders in physical activity performed (P=.002) and use of the app.
Conclusions:
The eHealth app was able to modify behaviors related to P50 compliance and exert a positive influence in relation to diet and physical exercise. Digital interventions in the adolescent population, based on the improvement in behaviors related to healthy habits and optimizing the social network, can offer promising results that help in the fight against obesity.
This research was funded by the Junta de Castilla y León, grant number LE014G.
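The "degree" measure described in the Methods of this abstract, computed over friendships that are reciprocal, can be illustrated with a short sketch. The data structure (a mapping from each person to the set of peers they name as friends), the function name, and the sample nominations are assumptions for this example; the study's actual data collection and software are not described here.

```python
def reciprocal_degree(nominations):
    """Count, for each person, the friendships that are named in both directions.

    `nominations` maps a person to the set of peers they name as friends.
    A tie counts only if the named peer names the person back.
    """
    degree = {person: 0 for person in nominations}
    for person, named in nominations.items():
        for friend in named:
            if friend != person and person in nominations.get(friend, set()):
                degree[person] += 1
    return degree


# Hypothetical friendship nominations, invented for illustration only.
nominations = {
    "A": {"B", "C"},
    "B": {"A"},
    "C": {"A", "D"},
    "D": {"B"},
}
degree = reciprocal_degree(nominations)

# Rank by degree (highest first); ties are broken alphabetically.
leaders = sorted(degree, key=lambda p: (-degree[p], p))
print(degree, leaders)
```

In this toy network, person D names a friend but is not named back, so D has degree 0, while A, with two reciprocated ties, would be the candidate group leader.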
Evaluation of Country Dietary Habits Using Machine Learning Techniques in Relation to Deaths from COVID-19
COVID-19 disease has affected almost every country in the world. The large
number of infected people and the different mortality rates between countries
have given rise to many hypotheses about the key points that make the virus so
lethal in some places. In this study, the eating habits of 170 countries were
evaluated in order to find correlations between these habits and mortality
rates caused by COVID-19 using machine learning techniques that group the
countries together according to the different distribution of fat, energy, and
protein across 23 different types of food, as well as the amount ingested in
kilograms. Results show that obesity and a high consumption of fats appear in
countries with the highest death rates, whereas countries with a lower rate
have a higher level of cereal consumption accompanied by a lower total average
intake of kilocalories.
Traditional Machine Learning Models and Bidirectional Encoder Representations From Transformer (BERT)–Based Automatic Classification of Tweets About Eating Disorders: Algorithm Development and Validation Study
[EN] Background: Eating disorders affect an increasing number of people. Social networks provide information that can help. Objective: We aimed to find machine learning models capable of efficiently categorizing tweets in the eating disorders domain. Methods: We collected tweets related to eating disorders for 3 consecutive months. After preprocessing, a subset of 2000 tweets was labeled along 4 categorizations: (1) messages written by people suffering from eating disorders or not, (2) messages promoting suffering from eating disorders or not, (3) informative messages or not, and (4) scientific or nonscientific messages. Traditional machine learning and deep learning models were used to classify tweets. We evaluated accuracy, F1 score, and computational time for each model. Results: A total of 1,058,957 tweets related to eating disorders were collected. In the 4 categorizations, the bidirectional encoder representations from transformer–based models had the best scores among the machine learning and deep learning techniques applied (F1 scores 71.1%-86.4%). Conclusions: Bidirectional encoder representations from transformer–based models perform better than traditional techniques in classifying eating disorder–related tweets, although their computational cost is significantly higher.
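Since the abstract reports accuracy and F1 score, a brief sketch of how both are computed for one yes/no categorization may help. The function name and the toy labels below are assumptions made for illustration; they are not the study's data or evaluation code.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall, f1) for one yes/no labeling task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1


# Made-up gold labels and predictions (1 = "informative", 0 = "not informative").
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
accuracy, precision, recall, f1 = binary_metrics(y_true, y_pred)
print(accuracy, precision, recall, f1)
```

Unlike accuracy, F1 ignores true negatives, which is why it is often preferred when the positive class is rare.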
A Web-Based Tool for Automatic Data Collection, Curation, and Visualization of Complex Healthcare Survey Studies including Social Network Analysis
There is great concern nowadays regarding alcohol consumption and drug
abuse, especially in young people. By analyzing the social environment in which these
adolescents are immersed, as well as a series of measures determining the
alcohol abuse risk or personal situation and perception using a number of
questionnaires like AUDIT, FAS, KIDSCREEN, and others, it is possible to gain
insight into the current situation of a given individual regarding his/her
consumption behavior. This analysis, however, requires the
use of tools that can ease the process of questionnaire creation, data
gathering, curation and representation, and later analysis and visualization for
the user. This research presents the design and construction of a web-based
platform able to facilitate each of the mentioned processes by integrating the
different phases into an intuitive system with a graphical user interface that
hides the complexity underlying each of the questionnaires and techniques used
and presents the results in a flexible and visual way, avoiding any manual
handling of data during the process. Advantages of this approach are shown and
compared to the previous situation, where some of the tasks were accomplished by
time-consuming and error-prone manipulations of data.