29 research outputs found

    Using Twitter to Detect Hate Crimes and Their Motivations: The HateMotiv Corpus

    No full text
    With the rapidly increasing use of social media platforms, much of our lives is now spent online. Despite the great advantages of social media, the spread of hate, cyberbullying, harassment, and trolling is unfortunately common online. Many extremists use social media platforms to communicate messages of hatred and promote violence, which may cause serious psychological harm and even contribute to real-world violence. The aim of this research was therefore to build the HateMotiv corpus, a freely available dataset annotated for types of hate crimes and the motivations behind committing them. The dataset was developed using Twitter as an example of a social media platform and provides the research community with a novel and reliable resource. The dataset is unique because of its topic-specific nature and its detailed annotation. The corpus was annotated by two expert annotators following unified guidelines, producing high-quality annotation with inter-annotator agreement F-scores of up to 0.66 for hate crime type labels and 0.71 for motivation labels.
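    Agreement F-scores like those quoted above are commonly computed by treating one annotator's labels as the reference and the other's as predictions, in which case precision and recall combine into F = 2|A∩B| / (|A| + |B|). The sketch below illustrates this under assumptions the abstract does not state: exact matching on (tweet ID, start offset, end offset, label) and made-up label names. The HateMotiv study may have used a different matching criterion (e.g., relaxed span overlap).

```python
from typing import Iterable, Tuple

# Hypothetical annotation record: (tweet_id, start_offset, end_offset, label).
Annotation = Tuple[str, int, int, str]


def agreement_f1(ann_a: Iterable[Annotation], ann_b: Iterable[Annotation]) -> float:
    """Pairwise inter-annotator agreement as an F-score, assuming exact matches.

    With annotator A as reference and B as 'system', precision = |A&B|/|B| and
    recall = |A&B|/|A|; their harmonic mean reduces to 2|A&B| / (|A| + |B|),
    so the score is the same whichever annotator is treated as the reference.
    """
    set_a, set_b = set(ann_a), set(ann_b)
    if not set_a and not set_b:
        return 1.0  # neither annotator labelled anything: treat as full agreement
    matched = len(set_a & set_b)
    return 2 * matched / (len(set_a) + len(set_b))


# Toy example with hypothetical labels (not HateMotiv's actual label set).
annotator_a = {("t1", 0, 5, "TYPE_X"), ("t1", 10, 15, "TYPE_Y"), ("t2", 3, 8, "TYPE_X")}
annotator_b = {("t1", 0, 5, "TYPE_X"), ("t2", 3, 8, "TYPE_Y")}
print(agreement_f1(annotator_a, annotator_b))  # 2 * 1 / (3 + 2) = 0.4
```

    Only the first annotation matches exactly, because the annotators disagree on the label of tweet t2. Computing the score separately over the type labels and over the motivation labels would give one figure per annotation layer, as in the abstract.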

    Building a semantically annotated corpus for chronic disease complications using two document types

    No full text
    Narrative information in electronic health records (EHRs) contains a wealth of information related to patient health conditions. In addition, people use Twitter to describe their experiences of personal health issues, such as medical complaints, symptoms, treatments, lifestyle, and other factors. Both genres of text include different types of health-related information concerning disease complications and risk factors. Detailed knowledge of how to control disease risk factors can greatly help to modify these risks and subsequently prevent disease complications. Text-mining tools provide efficient solutions for extracting and integrating vital information on disease complications that is hidden in large volumes of narrative text. However, the development of text-mining tools depends on the availability of annotated corpora. In response, we have developed the PrevComp corpus, which is annotated with information relevant to the identification of disease complications, underlying risk factors, and prevention measures, in the context of the interaction between hypertension and diabetes. The corpus is novel both in its highly specific topic within the biomedical domain and in its integration of information from EHRs and from tweets collected from Twitter. The annotation scheme was designed under the guidance of a domain expert, and two further domain experts performed the annotation, resulting in high-quality annotation with inter-annotator agreement F-scores of up to 0.60 for EHRs and 0.75 for tweets.

    Mapping Phenotypic Information in Heterogeneous Textual Sources to a Domain-Specific Terminological Resource

    No full text
    Biomedical literature articles and narrative content from Electronic Health Records (EHRs) both constitute rich sources of disease-phenotype information. Phenotype concepts may be mentioned in text in multiple ways, using phrases with a variety of structures. This variability stems partly from the different backgrounds of the authors, but also from the different writing styles typically used in each text type. Since EHR narrative reports and literature articles contain different but complementary types of valuable information, combining details from both text types can help to uncover new disease-phenotype associations. However, the alternative ways in which the same concept may be mentioned in each source constitute a barrier to the automatic integration of information. Accordingly, identifying the unique concepts represented by phrases in text can help to bridge the gap between text types. We describe our development of a novel method, PhenoNorm, which integrates a number of different similarity measures to automatically link phenotype concept mentions to known concepts in the UMLS Metathesaurus, a biomedical terminological resource. PhenoNorm was developed using the PhenoCHF corpus, a collection of literature articles and EHR narratives annotated for phenotypic information relating to congestive heart failure (CHF). We evaluate the performance of PhenoNorm in linking CHF-related phenotype mentions to Metathesaurus concepts, using a newly enriched version of PhenoCHF in which each phenotype mention has an expert-verified link to a concept in the UMLS Metathesaurus. We show that PhenoNorm outperforms a number of alternative methods applied to the same task. Furthermore, we demonstrate PhenoNorm's wider utility by evaluating its ability to link mentions of various other types of medically related information, occurring in texts covering wider subject areas, to concepts in different terminological resources. We show that PhenoNorm maintains its performance levels and that its accuracy compares favourably with that of other methods applied to these tasks.
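    The abstract does not specify which similarity measures PhenoNorm uses or how it combines them, so the following is only an illustrative sketch of the general idea: score a mention against each candidate concept's synonyms with several string similarity measures, and link it to the best-scoring concept. The three measures (token Jaccard, character-trigram Dice, and the `difflib.SequenceMatcher` ratio), the unweighted mean used to combine them, and the toy concept dictionary with hypothetical IDs are all assumptions; real UMLS Metathesaurus lookup is not shown.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Set, Tuple


def normalise(text: str) -> str:
    # Lower-case and collapse whitespace; a real system would also handle
    # punctuation, abbreviations, morphology, and word order.
    return " ".join(text.lower().split())


def token_jaccard(a: str, b: str) -> float:
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def char_trigrams(s: str) -> Set[str]:
    padded = f"  {s} "  # padding lets word boundaries contribute trigrams
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def trigram_dice(a: str, b: str) -> float:
    ga, gb = char_trigrams(a), char_trigrams(b)
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def edit_ratio(a: str, b: str) -> float:
    return SequenceMatcher(None, a, b).ratio()


# Assumed set of measures, combined by an unweighted mean.
MEASURES = (token_jaccard, trigram_dice, edit_ratio)


def combined_score(mention: str, synonym: str) -> float:
    m, s = normalise(mention), normalise(synonym)
    return sum(measure(m, s) for measure in MEASURES) / len(MEASURES)


def link(mention: str, concepts: Dict[str, List[str]]) -> Tuple[str, float]:
    """Return the (concept_id, score) whose best synonym scores highest."""
    best_id, best_score = "", -1.0
    for concept_id, synonyms in concepts.items():
        score = max(combined_score(mention, syn) for syn in synonyms)
        if score > best_score:
            best_id, best_score = concept_id, score
    return best_id, best_score


# Toy terminology with hypothetical IDs (not real UMLS CUIs).
concepts = {
    "HYP_0001": ["congestive heart failure", "CHF", "heart failure congestive"],
    "HYP_0002": ["shortness of breath", "dyspnea"],
}
print(link("short of breath", concepts)[0])  # HYP_0002
```

    Combining several measures is a way to tolerate different kinds of surface variation at once: token overlap is insensitive to word order, trigrams tolerate spelling and inflection differences, and the edit-based ratio rewards long shared substrings.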

    Artificial Neural Network-Based Mechanism to Detect Security Threats in Wireless Sensor Networks

    No full text
    Wireless sensor networks (WSNs) are essential in many areas, from healthcare to environmental monitoring. However, because of their inherent vulnerabilities, WSNs are susceptible to routing attacks that can jeopardize network performance and data integrity. This work proposes a method for enhancing WSN security by detecting routing threats with feed-forward artificial neural networks (ANNs). The proposed solution uses the learning capabilities of ANNs to model the network's dynamic behavior and recognize routing attacks such as black-hole, gray-hole, and wormhole attacks. The system was trained and tested on CICIDS2017, a heterogeneous dataset, to help ensure its robustness and adaptability. Its ability to recognize both known and novel attack patterns enhances its efficacy in real-world deployment. Experimental assessments using the NS2 simulator show that the proposed method improves routing protocol security. The system's performance was assessed using a confusion matrix, and the simulation and analysis showed that it outperforms existing methods for routing attack detection. With an average detection rate of 99.21% and an accuracy of 99.49%, the proposed system minimizes false positives. The study advances secure communication in WSNs and provides a reliable means of protecting sensitive data in resource-constrained settings.
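    The detection rate and accuracy quoted above are standard confusion-matrix metrics. As a minimal sketch, assuming binary labels (1 = attack, 0 = normal traffic) and assuming that the "average detection rate" is an unweighted mean of per-attack-type detection rates (the abstract does not say how it was averaged), they can be computed as follows. The class and function names are hypothetical, and the toy labels are invented rather than taken from the study.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class BinaryConfusion:
    tp: int  # attacks correctly flagged
    fn: int  # attacks missed
    fp: int  # normal traffic wrongly flagged (false alarms)
    tn: int  # normal traffic correctly passed

    @classmethod
    def from_labels(cls, y_true: Sequence[int], y_pred: Sequence[int]) -> "BinaryConfusion":
        # Assumed encoding: 1 = attack, 0 = normal.
        pairs = list(zip(y_true, y_pred))
        return cls(
            tp=sum(t == 1 and p == 1 for t, p in pairs),
            fn=sum(t == 1 and p == 0 for t, p in pairs),
            fp=sum(t == 0 and p == 1 for t, p in pairs),
            tn=sum(t == 0 and p == 0 for t, p in pairs),
        )

    def detection_rate(self) -> float:
        # Also called recall or true positive rate: share of attacks detected.
        positives = self.tp + self.fn
        return self.tp / positives if positives else 0.0

    def accuracy(self) -> float:
        total = self.tp + self.fn + self.fp + self.tn
        return (self.tp + self.tn) / total if total else 0.0

    def false_positive_rate(self) -> float:
        negatives = self.fp + self.tn
        return self.fp / negatives if negatives else 0.0


def average_detection_rate(per_attack: Dict[str, BinaryConfusion]) -> float:
    # Unweighted (macro) mean over attack types: an assumption, not the study's stated method.
    rates = [cm.detection_rate() for cm in per_attack.values()]
    return sum(rates) / len(rates) if rates else 0.0


# Toy labels (invented): 4 attack samples, 6 normal samples.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
cm = BinaryConfusion.from_labels(y_true, y_pred)
print(cm)  # BinaryConfusion(tp=3, fn=1, fp=1, tn=5)
print(cm.detection_rate(), cm.accuracy())  # 0.75 0.8
```

    A high detection rate on its own says nothing about false alarms, which is why the false positive rate is derived separately from the negative (normal-traffic) row of the confusion matrix.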

    Using Social Media to Detect Fake News Information Related to Product Marketing: The FakeAds Corpus

    No full text
    Nowadays, an increasing portion of our lives is spent interacting online through social media platforms, thanks to the widespread adoption of the latest technology and the proliferation of smartphones. Obtaining news from social media platforms is fast, easy, and less expensive than obtaining it from traditional media such as television and newspapers. As a result, social media is now being exploited to disseminate fake news and false information. This research aims to build the FakeAds corpus, which consists of tweets about product advertisements. The FakeAds corpus is intended to support the study of the impact of fake news and false information in advertising and marketing materials for specific products, and of which types of products (i.e., cosmetics, health, fashion, or electronics) are most targeted on Twitter to draw consumers' attention. The corpus is novel both in its very specific topic (i.e., the role of Twitter in disseminating fake news related to product promotion and advertisement) and in its fine-grained annotations. The annotation guidelines were designed under the guidance of a domain expert, and the annotation was performed by two domain experts, resulting in high-quality annotation with inter-annotator agreement F-scores of up to 0.815.

    PhenoNorm normalisation workflow.

    No full text

    Micro-averaged performance comparison of PhenoNorm against other normalisation approaches applied to the NCBI disease corpus.

    No full text