69 research outputs found

    Social media data mining: tools for collecting Twitter data

    [EN] Currently, social media data mining often focuses on collecting information from Twitter. One of the main problems in social network analysis in these cases is acquiring the data in the form of a graph. Doing this task "by hand" is unthinkable when dealing with networks of hundreds or thousands of nodes in constant communication with each other. Twitter is a very clear example of this problem: any trending topic can generate hundreds of tweets in an hour. Tools are needed to acquire all this data in an automated way using the possibilities offered by the social network itself. One tool that offers data acquisition, with limitations, is Netlytic. It is not the only one, but it is simple to use. This document shows a use case of a project collected with Netlytic and plotted using Gephi.

    Implementing local-explainability in Gradient Boosting Trees: Feature Contribution

    [EN] Gradient Boosting Decision Trees (GBDT) is a powerful additive model based on tree ensembles. Its nature makes GBDT a black-box model, even though there are multiple explainable artificial intelligence (XAI) models that obtain information by reinterpreting the model globally and locally. Each tree of the ensemble is itself a transparent model, but the final outcome is a sum of these trees and is not easy to interpret. In this paper, a feature contribution method for GBDT is developed. The proposed method takes advantage of the GBDT architecture to calculate the contribution of each feature using the residue of each node. This algorithm makes it possible to calculate the sequence of node decisions behind a given prediction. Theoretical proofs and multiple experiments have been carried out to demonstrate the performance of our method, which is not only a local explainability model for the GBDT algorithm but also a unique option that reflects GBDT's internal behavior. The proposal is aligned with feature-contribution approaches that bear on problems such as the ethical analysis of artificial intelligence (AI) and on compliance with recent European laws such as the General Data Protection Regulation (GDPR) regarding the right to explanation and nondiscrimination.
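
As a rough illustration of how a per-feature contribution can be read off an additive tree ensemble, the sketch below walks the decision path of each tree and credits the change in node value at every split to the feature that was tested there. This is a generic path-based attribution written under stated assumptions, not the method of the paper: the `Node` class, the `learning_rate` and `init` parameters, and all function names are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical tree node; `value` is the node's stored prediction."""
    value: float
    feature: Optional[int] = None   # index of the split feature, None for a leaf
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def tree_contributions(root: Node, x: List[float], n_features: int) -> Tuple[float, List[float]]:
    """Follow x down one tree; credit each value change to the split feature."""
    contrib = [0.0] * n_features
    node = root
    while node.feature is not None:
        child = node.left if x[node.feature] <= node.threshold else node.right
        contrib[node.feature] += child.value - node.value
        node = child
    # The root value acts as the tree's bias term.
    return root.value, contrib


def ensemble_contributions(trees: List[Node], x: List[float], n_features: int,
                           learning_rate: float = 1.0, init: float = 0.0) -> Tuple[float, List[float]]:
    """Sum the scaled per-tree biases and contributions over the ensemble."""
    bias = init
    total = [0.0] * n_features
    for root in trees:
        tree_bias, contrib = tree_contributions(root, x, n_features)
        bias += learning_rate * tree_bias
        for j in range(n_features):
            total[j] += learning_rate * contrib[j]
    return bias, total


def predict(trees: List[Node], x: List[float],
            learning_rate: float = 1.0, init: float = 0.0) -> float:
    """Ordinary additive prediction: init plus the scaled leaf values."""
    out = init
    for root in trees:
        node = root
        while node.feature is not None:
            node = node.left if x[node.feature] <= node.threshold else node.right
        out += learning_rate * node.value
    return out


# Toy ensemble of two hand-built trees (illustrative values only).
tree_a = Node(0.5, feature=0, threshold=1.0,
              left=Node(0.2),
              right=Node(0.8, feature=1, threshold=3.0,
                         left=Node(0.6), right=Node(1.0)))
tree_b = Node(0.0, feature=1, threshold=2.0, left=Node(-0.1), right=Node(0.1))

sample = [2.0, 2.5]
bias, contributions = ensemble_contributions([tree_a, tree_b], sample, 2,
                                             learning_rate=0.5, init=1.0)
print(bias, contributions)
```

By construction, the bias plus the sum of the contributions equals the ensemble prediction for the same sample, which is the property that makes this kind of decomposition usable as a local explanation.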

    An Ontology-Based multi-domain model in Social Network Analysis: Experimental validation and case study

    Social network theory and methods of analysis have been applied to different domains in recent years, including public health. The complete procedure for carrying out a social network analysis (SNA) is a time-consuming task that entails a series of steps in which the SNA expert could make mistakes. This research presents a multi-domain knowledge model capable of automatically gathering data and carrying out different social network analyses in different domains, without errors and reaching the same conclusions that an SNA expert would reach. The model is represented in an ontology called OntoSNAQA, which is made up of classes, properties and rules representing the domains of People, Questionnaires and Social Network Analysis. Besides the ontology itself, different rules are represented by SWRL and SPARQL queries. A Knowledge Based System was created using OntoSNAQA and applied to a real case study in order to show the advantages of the approach. Finally, the results of an SNA obtained through the model were compared to those obtained from some of the most widely used SNA applications (UCINET, Pajek, Cytoscape and Gephi) to test and confirm the validity of the model.

    The socialisation of the adolescent who carries out team sports: a transversal study of centrality with a social network analysis

    [EN] Objectives: To analyse the physical activity carried out by the adolescents in the study and its relationship to being overweight (overweight+obese), and to analyse the structure of the social network of friendship established among adolescents doing group sports, using different parameters indicative of centrality. Setting: The study was carried out in an educational environment, in 11 classrooms belonging to 5 schools in Ponferrada (Spain). Participants: 235 adolescents were included in the study (49.4% female), who were classified as normal weight or overweight. Primary and secondary outcome measures: The Physical Activity Questionnaire for Adolescents (PAQ-A) was used to study the level of physical activity. A social network analysis was carried out to analyse structural variables of centrality in different degrees of contact. Results: 30.2% of the participants in our study were overweight. Relative to female participants in this study, males obtained significantly higher scores in the PAQ-A (OR: 2.11; 95% CI: 1.04 to 4.25; p value: 0.036) and were more likely to participate in group sport (OR: 4.59; 95% CI: 2.28 to 9.22; p value: 0.000). We found no significant relationship between physical activity and weight status in the total sample, but among female participants, those with overweight status had higher odds of reporting high levels of physical exercise (OR: 4.50; 95% CI: 1.21 to 16.74; p value: 0.025). In terms of centrality, differentiating by gender, women who participated in group sports were more likely to be classified as having low values of centrality, while the opposite effect occurred for men, who were more likely to be classified as having high values of centrality. Conclusions: Our findings, with limitations, underline the importance of two fundamental aspects to be taken into account in the design of future strategies: gender and centrality within the social network, depending on the intensity of contact adolescents have with their peers.

    A Semantic Social Network Analysis Tool for Sensitivity Analysis and What-If Scenario Testing in Alcohol Consumption Studies

    Social Network Analysis (SNA) is a set of techniques developed in the field of social and behavioral sciences research to characterize and study the social relationships established among a set of individuals. When building a social network for an SNA study, an initial data-gathering process is carried out to extract the characteristics of the individuals and their relationships. This is usually done by completing a questionnaire containing different types of questions that are later used to obtain the SNA measures needed for the study. There are, then, a great number of possible network-generating questions and also many possibilities for mapping the responses to the corresponding characteristics and relationships. Many variations may be introduced into these questions (the way they are posed, the weights given to each of the responses, etc.) that may affect the resulting networks. Exploring all these variations manually is difficult, because the process is time-consuming and error-prone. The tool described in this paper uses semantic knowledge representation techniques to facilitate this kind of sensitivity study. The base of the tool is a conceptual structure, called an "ontology", that is able to represent the different concepts and their definitions. The tool is compared to other similar ones, and the advantages of the approach are highlighted with some particular examples from an ongoing SNA study about alcohol consumption habits in adolescents.

    eHealth Intervention to Improve Health Habits in the Adolescent Population: Mixed Methods Study

    [EN] Background: Technology has provided a new way of life for the adolescent population. Indeed, strategies aimed at improving health-related behaviors through digital platforms can offer promising results. However, since it has been shown that peers are capable of modifying behaviors related to food and physical exercise, it is important to study whether digital interventions based on peer influence are capable of improving the weight status of adolescents. Objective: The purpose of this study was to assess the effectiveness of an eHealth app in an adolescent population in terms of improvements in their age- and sex-adjusted BMI percentiles. Other goals of the study were to examine the social relationships of adolescents pre- and postintervention, and to identify the group leaders and study their profiles, eating and physical activity habits, and use of the web app. Methods: The BMI percentiles were calculated in accordance with the reference guidelines of the World Health Organization. Participants’ diets and levels of physical activity were assessed using the Mediterranean Diet Quality Index (KIDMED) questionnaire and the Physical Activity Questionnaire for Adolescents (PAQ-A), respectively. The variables related to social networks were analyzed using the social network analysis (SNA) methodology. In this respect, peer relationships that were considered reciprocal friendships were used to compute the “degree” measure, which was used as an indicative parameter of centrality. Results: The sample population comprised 210 individuals in the intervention group (IG) and 91 individuals in the control group (CG). A participation rate of 60.1% (301/501) was obtained. 
After checking for homogeneity between the IG and the CG, it was found that adolescents in the IG at BMI percentiles both below and above the 50th percentile (P50) modified their BMI to approach this reference value (with a significance of P<.001 among individuals with an initial BMI below the P50 and P=.04 for those with an initial BMI above the P50). The diet was also improved in the IG compared with the CG (P<.001). After verifying that the social network had increased postintervention, it was seen that the group leaders (according to the degree SNA measure) were also leaders in physical activity performed (P=.002) and use of the app. Conclusions: The eHealth app was able to modify behaviors related to P50 compliance and exert a positive influence in relation to diet and physical exercise. Digital interventions in the adolescent population, based on the improvement in behaviors related to healthy habits and optimizing the social network, can offer promising results that help in the fight against obesity. This research was funded by the Junta de Castilla y León, grant number LE014G.
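
The abstract above describes computing the "degree" measure from peer relationships that count as reciprocal friendships. A minimal sketch of that idea follows, assuming the raw data is a mapping from each respondent to the set of peers they named; the function name, the data layout and the example names are hypothetical, and peers who are named but did not respond are simply left out of the result.

```python
from typing import Dict, Set


def reciprocal_degree(nominations: Dict[str, Set[str]]) -> Dict[str, int]:
    """Count, for each respondent, the peers with whom the nomination is mutual."""
    degree = {}
    for person, named in nominations.items():
        mutual = {
            friend for friend in named
            if friend != person and person in nominations.get(friend, set())
        }
        degree[person] = len(mutual)
    return degree


# Hypothetical friendship nominations (who names whom as a friend).
nominations = {
    "ana": {"ben", "carla"},
    "ben": {"ana"},
    "carla": {"ana", "dani"},
    "dani": {"ben"},
}

degrees = reciprocal_degree(nominations)
top = max(degrees.values())
leaders = sorted(name for name, d in degrees.items() if d == top)
print(degrees, leaders)
```

Treating the highest-degree respondents as "leaders" is one simple reading of the abstract; a real study would likely apply its own threshold or normalisation.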

    Evaluation of Country Dietary Habits Using Machine Learning Techniques in Relation to Deaths from COVID-19

    COVID-19 has affected almost every country in the world. The large number of infected people and the different mortality rates between countries have given rise to many hypotheses about the key factors that make the virus so lethal in some places. In this study, the eating habits of 170 countries were evaluated in order to find correlations between these habits and COVID-19 mortality rates, using machine learning techniques that group the countries according to the distribution of fat, energy, and protein across 23 different types of food, as well as the amount ingested in kilograms. The results show that obesity and high fat consumption appear in the countries with the highest death rates, whereas countries with lower rates have a higher level of cereal consumption accompanied by a lower total average intake of kilocalories.

    Traditional Machine Learning Models and Bidirectional Encoder Representations From Transformer (BERT)–Based Automatic Classification of Tweets About Eating Disorders: Algorithm Development and Validation Study

    [EN] Background: Eating disorders affect an increasing number of people. Social networks provide information that can help. Objective: We aimed to find machine learning models capable of efficiently categorizing tweets in the eating disorders domain. Methods: We collected tweets related to eating disorders for 3 consecutive months. After preprocessing, a subset of 2000 tweets was labeled: (1) messages written by people suffering from eating disorders or not, (2) messages promoting suffering from eating disorders or not, (3) informative messages or not, and (4) scientific or nonscientific messages. Traditional machine learning and deep learning models were used to classify tweets. We evaluated accuracy, F1 score, and computational time for each model. Results: A total of 1,058,957 tweets related to eating disorders were collected. The bidirectional encoder representations from transformer–based models had the best score among the machine learning and deep learning techniques applied to the 4 categorization tasks (F1 scores 71.1%-86.4%). Conclusions: Bidirectional encoder representations from transformer–based models perform better in classifying eating disorder–related tweets, although their computational cost is significantly higher than that of traditional techniques.
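
Since the abstract above compares models by accuracy and F1 score on binary categorizations, a small worked sketch of both metrics may help. It assumes labels are encoded as 0/1 with 1 as the positive class; the function name and the toy label lists are hypothetical and are not data from the study.

```python
from typing import Dict, Sequence


def binary_scores(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for one binary categorization."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels for one categorization task (e.g. informative vs. not informative).
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 1, 1, 0, 0]
scores = binary_scores(y_true, y_pred)
print(scores)
```

F1 is often preferred over accuracy when the two classes are unbalanced, because it ignores true negatives and so cannot be inflated by always predicting the majority class.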

    A Web-Based Tool for Automatic Data Collection, Curation, and Visualization of Complex Healthcare Survey Studies including Social Network Analysis

    There is great concern nowadays regarding alcohol consumption and drug abuse, especially in young people. By analyzing the social environment in which these adolescents are immersed, together with a series of measures of alcohol abuse risk, personal situation and perception obtained from questionnaires such as AUDIT, FAS, KIDSCREEN, and others, it is possible to gain insight into the current situation of a given individual regarding his/her consumption behavior. However, this analysis requires tools that ease questionnaire creation, data gathering, curation and representation, and the later analysis and visualization presented to the user. This research presents the design and construction of a web-based platform that facilitates each of these processes by integrating the different phases into an intuitive system with a graphical user interface. The interface hides the complexity underlying each of the questionnaires and techniques used and presents the results in a flexible and visual way, avoiding any manual handling of data during the process. The advantages of this approach are shown and compared to the previous situation, in which some of the tasks were accomplished through time-consuming and error-prone manipulation of data.