3,510 research outputs found

    Self-disclosure model for classifying & predicting text-based online disclosure

    Social media and social networking sites have evolved into digital billboards for internet users due to their rapid expansion. As these sites encourage consumers to expose personal information via profiles and postings, increased use of social media has generated privacy concerns. Researchers have made notable efforts to detect self-disclosure using information extraction (IE) techniques. Recent research on machine learning and natural language processing methods shows that understanding the contextual meaning of words can yield better accuracy than traditional data extraction methods. Because users are often unaware of how much personal information they publish in online forums, there is a need to detect various disclosures in natural language and to give users the option of testing for possible disclosure before posting. For this purpose, this work proposes "SD_ELECTRA," a context-specific language model that detects Interest, Personal, Education and Work, Relationship, Personality, Residence, Travel plan, and Hospitality disclosures in social media data. The goal is to create a context-specific language model for a social media platform that performs better than general language models. Moreover, recent advances in transformer models have paved the way to training language models from scratch and achieving higher scores. Experimental results show that SD_ELECTRA outperformed the base model on all considered metrics for the standard text classification method. In addition, the results show that training a language model with a smaller, context-specific pre-training corpus on a single GPU can improve its performance. An illustrative web application allows users to test the disclosure possibilities in their social media posts. As a result, by utilizing the efficiency of the suggested model, users would be able to learn about self-disclosure in real time.

    Utilizing Consumer Health Posts for Pharmacovigilance: Identifying Underlying Factors Associated with Patients’ Attitudes Towards Antidepressants

    Non-adherence to antidepressants is a major obstacle to their therapeutic benefits, resulting in increased risk of relapse, emergency visits, and a significant burden on individuals and the healthcare system. Several studies have shown that non-adherence is weakly associated with personal and clinical variables, but strongly associated with patients’ beliefs and attitudes towards medications. The traditional methods for identifying the key dimensions of patients’ attitudes towards antidepressants have methodological limitations, such as concerns about the confidentiality of personal information. This study attempts to address these limitations by using patients’ self-reported experiences in online healthcare forums to identify underlying factors affecting patients’ attitudes towards antidepressants. The data source of the study was a healthcare forum called “askapatients.com”. A total of 892 patient reviews were randomly collected from the forum for the four most commonly prescribed antidepressants: Sertraline (Zoloft) and Escitalopram (Lexapro) from the SSRI class, and Venlafaxine (Effexor) and Duloxetine (Cymbalta) from the SNRI class. The methodology consists of two main phases: I) generating structured data from unstructured patient drug reviews and testing hypotheses concerning attitude, and II) identifying and normalizing Adverse Drug Reactions (ADRs), Withdrawal Symptoms (WDs), and Drug Indications (DIs) in the posts, and mapping them to both UMLS and SNOMED CT concepts. Phase II also includes testing the association between ADRs and attitude. The results of the first phase showed that “experience of adverse drug reactions”, “perceived distress received from ADRs”, “lack of knowledge about medication’s mechanism”, “withdrawal experience”, “duration of usage”, and “drug effectiveness” are strongly associated with patients’ attitudes. However, demographic variables including “age” and “gender” are not associated with attitude. Analysis of the data in the second phase showed that, of the 6,534 identified entities, 73% are ADRs, 12% are WDs, and 15% are drug indications. In addition, psychological and cognitive expressions have higher variability than physiological expressions. All three types of entities were mapped to 811 UMLS and SNOMED CT concepts. Testing the association between ADRs and attitude showed that, of the twenty-one physiological ADRs specified in the ASEC questionnaire, “dry mouth”, “increased appetite”, “disorientation”, “yawning”, “weight gain”, and “problem with sexual dysfunction” are associated with attitude. A set of psychological and cognitive ADRs, such as “emotional indifference” and “memory problem”, was also tested and showed a significant association with attitude. The findings of this study have important implications for designing clinical interventions that aim to improve patients' adherence to antidepressants. In addition, the dataset generated in this study has significant implications for improving the performance of text-mining algorithms that aim to identify health-related information in consumer health posts. Moreover, the dataset can be used for generating and testing hypotheses related to ADRs associated with psychiatric medications, and for identifying factors associated with discontinuation of antidepressants. The dataset and guidelines of this study are available at https://sites.google.com/view/pharmacovigilanceinpsychiatry/hom

    A Human-Centered Approach to Improving Adolescent Online Sexual Risk Detection Algorithms

    Computational risk detection has the potential to protect especially vulnerable populations from online victimization. A comprehensive literature review of computational approaches to online sexual risk detection revealed that the majority of this work has focused on identifying sexual predators after the fact. Also, many studies rely on public datasets and third-party annotators to establish ground truth and train their algorithms, which do not accurately represent young social media users or their perspectives on preventing victimization. To address these gaps, this dissertation integrated human-centered approaches into both creating representative datasets and developing sexual risk detection machine learning models, to ensure the broader societal impact of this important work. To understand what adolescents say about their online sexual interactions, and how they say it, and so inform the study designs, a thematic content analysis of posts by adolescents on an online peer support mental health platform was conducted. Then, a user study and web-based platform, Instagram Data Donation (IGDD), was designed to create an ecologically valid dataset. Youth could donate and annotate their Instagram data for online risks. After youth participated in the study, an interview study was conducted to understand how they felt about annotating data for online risks. Based on private conversations annotated by participants, sexual risk detection classifiers were created. The results indicated that Convolutional Neural Network (CNN) and Random Forest models outperformed other models in identifying sexual risks at the conversation level. Our experiments showed that classifiers trained on entire conversations performed better than message-level classifiers. We also trained classifiers to detect the severity risk level of a given message, with CNN outperforming other models. We found that contextual features (e.g., age, gender, and relationship type) and psycho-linguistic features contributed the most to accurately detecting sexual conversations. Our analysis provides insights into the important factors that enhance automated detection of sexual risks within youths' private conversations.

    Protectbot: A Chatbot to Protect Children on Gaming Platforms

    Online gaming no longer has limited access, as it has become available to a high percentage of children in recent years. Consequently, children are exposed to multifaceted threats, such as cyberbullying, grooming, and sexting. Although the online gaming industry is taking concerted measures to create a safe environment for children to play and interact in, such efforts remain inadequate and fragmented. Different approaches using machine learning (ML) techniques to detect child predatory behavior have been designed to provide potential detection and protection in this context. An analysis of the available AI tools and solutions showed that they are limited to identifying predatory behavior in chat logs, which is not enough to avert the multifaceted threats. In this thesis, we developed a chatbot, Protectbot, to interact with the suspect on the gaming platform. Protectbot leverages the dialogue generative pre-trained transformer (DialoGPT) model, which is based on Generative Pre-trained Transformer 2 (GPT-2). To analyze the suspect's behavior, we developed a text classifier based on natural language processing that classifies chats as predatory or non-predatory. The classifier is trained and tested on the PAN 12 dataset. To convert the text into numerical vectors, we used fastText. The best results are obtained by using a non-linear SVM on sentence vectors obtained from fastText. We obtained a recall of 0.99 and an F_0.5-score of 0.99, which is better than the state-of-the-art methods. We also built a new dataset containing 71 full predatory chats retrieved from Perverted Justice. Using sentence vectors generated by fastText and a KNN classifier, 66 of the 71 chats were correctly classified as predatory.
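The F_0.5-score mentioned in this abstract weights precision more heavily than recall. As a minimal sketch (not the thesis's code), the general F-beta formula, F_beta = (1 + beta^2) * P * R / (beta^2 * P + R), can be computed as below. The function name `f_beta` and the example precision and recall values are hypothetical and chosen only for illustration; the abstract does not report precision.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Return the F-beta score, a weighted harmonic mean of precision and recall.

    beta < 1 favors precision, beta > 1 favors recall, beta == 1 gives F1.
    """
    if precision == 0.0 and recall == 0.0:
        # Avoid division by zero; the score is conventionally 0 in this case.
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical inputs for illustration only (not values from the thesis).
example_score = f_beta(precision=0.8, recall=0.5, beta=0.5)
```

With beta = 0.5, a drop in precision lowers the score more than an equal drop in recall, which suits settings where false accusations are costly.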

    Towards sustainable e-learning platforms in the context of cybersecurity: A TAM-driven approach

    The rapid growth of electronic learning (e-learning) platforms has raised concerns about cybersecurity risks. The vulnerability of university students to cyberattacks and privacy concerns within e-learning platforms presents a pressing issue. Students’ frequent and intense internet presence, coupled with their extensive computer usage, puts them at higher risk of becoming victims of cyberattacks. This problem requires a deeper understanding in order to enhance cybersecurity measures and safeguard students’ privacy and intellectual property in educational environments. This dissertation addresses the following research questions: (a) To what extent do cybersecurity perspectives affect students’ intention to use e-learning platforms? (b) To what extent do students’ privacy concerns affect their intention to use e-learning platforms? (c) To what extent does students’ cybersecurity awareness affect their intention to use e-learning platforms? (d) To what extent do academic integrity concerns affect their intention to use e-learning platforms? and (e) To what extent does students’ computer self-efficacy affect their intention to use e-learning platforms? The study used an enhanced version of the technology acceptance model (TAM3) to examine the factors influencing students’ intention to use e-learning platforms. It involved undergraduate and graduate students at Eastern Michigan University, and data were collected through a web-based questionnaire. The questionnaire was developed using the Qualtrics tool and included validated measures and scales with closed-ended questions. The collected data were analyzed using SPSS 28, and the significance level for hypothesis testing was set at 0.05. Out of 6,800 distributed surveys, 590 responses were received, and after data cleaning, 582 responses were included in the final sample. The findings revealed that cybersecurity perspectives, cybersecurity awareness, academic integrity concerns, and computer self-efficacy significantly influenced students’ intention to use e-learning platforms. The study has implications for practitioners, educators, and researchers involved in designing secure e-learning platforms, emphasizing the importance of cybersecurity and recommending effective cybersecurity training programs to enhance user engagement. Overall, the study highlights the role of cybersecurity in promoting the adoption and usage of e-learning platforms, providing valuable insights for developers and educators who create secure e-learning environments and benefiting stakeholders in the e-learning industry.

    Graph-based, systems approach for detecting violent extremist radicalization trajectories and other latent behaviors, A

    2017 Summer. Includes bibliographical references. The number and lethality of violent extremist plots motivated by the Salafi-jihadist ideology have been growing for nearly the last decade in both the U.S. and Western Europe. While detecting the radicalization of violent extremists is a key component in preventing future terrorist attacks, it remains a significant challenge to law enforcement due to the issues of both scale and dynamics. Recent terrorist attack successes highlight the real possibility of missed signals from, or continued radicalization by, individuals whom the authorities had formerly investigated and even interviewed. Additionally, beyond considering just the behavioral dynamics of a person of interest is the need for investigators to consider the behaviors and activities of social ties vis-à-vis the person of interest. We undertake a fundamentally systems approach in addressing these challenges by investigating the need and feasibility of a radicalization detection system, a risk assessment assistance technology for law enforcement and intelligence agencies. The proposed system first mines public data and government databases for individuals who exhibit risk indicators for extremist violence, and then enables law enforcement to monitor those individuals at the scope and scale that is lawful, and account for the dynamic indicative behaviors of the individuals and their associates rigorously and automatically. In this thesis, we first identify the operational deficiencies of current law enforcement and intelligence agency efforts, investigate the environmental conditions and stakeholders most salient to the development and operation of the proposed system, and address both programmatic and technical risks with several initial mitigating strategies. We codify this large effort into a radicalization detection system framework. 
The main thrust of this effort is the investigation of the technological opportunities for the identification of individuals matching a radicalization pattern of behaviors in the proposed radicalization detection system. We frame our technical approach as a unique dynamic graph pattern matching problem, and develop a technology called INSiGHT (Investigative Search for Graph Trajectories) to help identify individuals or small groups with conforming subgraphs to a radicalization query pattern, and follow the match trajectories over time. INSiGHT is aimed at assisting law enforcement and intelligence agencies in monitoring and screening for those individuals whose behaviors indicate a significant risk for violence, and allow for the better prioritization of limited investigative resources. We demonstrated the performance of INSiGHT on a variety of datasets, to include small synthetic radicalization-specific data sets, a real behavioral dataset of time-stamped radicalization indicators of recent U.S. violent extremists, and a large, real-world BlogCatalog dataset serving as a proxy for the type of intelligence or law enforcement data networks that could be utilized to track the radicalization of violent extremists. We also extended INSiGHT by developing a non-combinatorial neighbor matching technique to enable analysts to maintain visibility of potential collective threats and conspiracies and account for the role close social ties have in an individual's radicalization. This enhancement was validated on small, synthetic radicalization-specific datasets as well as the large BlogCatalog dataset with real social network connections and tagging behaviors for over 80K accounts. The results showed that our algorithm returned whole and partial subgraph matches that enabled analysts to gain and maintain visibility on neighbors' activities. 
Overall, INSiGHT led to consistent, informed, and reliable assessments about those who pose a significant risk for some latent behavior in a variety of settings. Based upon these results, we maintain that INSiGHT is a feasible and useful supporting technology with the potential to optimize law enforcement investigative efforts and ultimately enable the prevention of individuals from carrying out extremist violence. Although the prime motivation of this research is the detection of violent extremist radicalization, we found that INSiGHT is applicable in detecting latent behaviors in other domains such as on-line student assessment and consumer analytics. This utility was demonstrated through experiments with real data. For on-line student assessment, we tested INSiGHT on a MOOC dataset of students and time-stamped on-line course activities to predict those students who persisted in the course. For consumer analytics, we tested the performance on a real, large proprietary consumer activities dataset from a home improvement retailer. Lastly, motivated by the desire to validate INSiGHT as a screening technology when ground truth is known, we developed a synthetic data generator of large population, time-stamped, individual-level consumer activities data consistent with an a priori project set designation (latent behavior). This contribution also sets the stage for future work in developing an analogous synthetic data generator for radicalization indicators to serve as a testbed for INSiGHT and other data mining algorithms

    Themes and Participants’ Role in Online Health Discussion: Evidence From Reddit

    Health-related topics are discussed widely on different social networking sites. These discussions and their related aspects can reveal significant insights and patterns that are worth studying and understanding. In this dissertation, we explore the patterns of online discussions about mandatory and voluntary vaccines, including the topics discussed, the words correlated with each topic, and the sentiment expressed. Moreover, we explore the role opinion leaders play in health discussions and their impact on participation in a particular discussion. Opinion leaders are identified, and their impact on discussion participation is differentiated based on characteristics such as their connections and locations in the social network, their content, and their sentiment. We apply social network analysis, topic modeling, sentiment analysis, machine learning, econometric analysis, and other techniques to analyze data collected from Reddit. The results of our analyses show that sentiment is an important factor in health discussions, and that it varies between different types of discussions. In addition, we identified the main topics discussed for each vaccine. Furthermore, our results show that global opinion leaders have more influence than local opinion leaders in elevating the health discussion. Our study has important theoretical and practical implications.

    Machine Intelligence in Africa: a survey

    In the last 5 years, the availability of large audio datasets in African countries has opened unlimited opportunities to build machine intelligence (MI) technologies that are closer to the people and that speak, learn, understand, and do business in local languages, including for those who cannot read and write. Unfortunately, these audio datasets are not fully exploited by current MI tools, leaving many Africans out of MI business opportunities. Additionally, many state-of-the-art MI models are not culture-aware, and the ethics of their adoption indexes are questionable. This lack of cultural awareness is a major drawback in many applications in Africa. This paper summarizes recent developments in machine intelligence in Africa from a multi-layer, multiscale, and culture-aware ethics perspective, showcasing MI use cases in 54 African countries through 400 articles on MI research, industry, and government actions, as well as uses in art, music, the informal economy, and small businesses in Africa. The survey also opens discussions on the reliability of MI rankings and indexes in the African continent as well as algorithmic definitions of unclear terms used in MI. Comment: Accepted and to be presented at DSAI 202

    Annual Report of Undergraduate Research Fellows, August 2012 to May 2013

    Annual Report of Undergraduate Research Fellows from August 2012 to May 2013