
    A News Verification Browser for the Detection of Clickbait, Satire, and Falsified News

    The LiT.RL News Verification Browser is a research tool for news readers, journalists, editors, and information professionals. The tool analyzes the language used in digital news web pages to determine whether they are clickbait, satirical news, or falsified news, and visualizes the results by highlighting content in color-coded categories. Although the clickbait, satire, and falsification detectors reach certain accuracy levels on test data, accuracy may vary during real-world internet use. The browser is not a replacement for digital literacy and is not always correct. All processing is completed on the local machine; results are not sent to or from a remote server. Results may be saved locally to a standard SQLite database for further analysis.
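    The local persistence the abstract describes can be sketched with Python's standard sqlite3 module. This is a minimal illustration only; the table and column names below are hypothetical and are not the browser's actual schema.

```python
import sqlite3

# Hedged sketch: storing per-page detection results in a local SQLite
# database, as the browser's description suggests. Schema is illustrative.
conn = sqlite3.connect(":memory:")  # a file path would persist results locally
conn.execute(
    """CREATE TABLE IF NOT EXISTS results (
           url TEXT,
           category TEXT CHECK (category IN
               ('clickbait', 'satire', 'falsified', 'legitimate')),
           confidence REAL
       )"""
)
# Parameterized insert of one hypothetical classification result.
conn.execute(
    "INSERT INTO results VALUES (?, ?, ?)",
    ("https://example.com/article", "clickbait", 0.87),
)
conn.commit()
rows = conn.execute("SELECT url, category FROM results").fetchall()
```

    Keeping everything in a local database, as the tool does, means the saved results can later be queried for analysis without any network round-trip.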

    Exploiting Semantic Similarity Between Citation Contexts For Direct Citation Weighting And Residual Citation

    This study used the semantic similarity between citation contexts to develop one scheme for weighting direct citations and another for allocating residual citations to a publication from its nth citation generation level publication. The relationship between the new direct citation weighting scheme and each of five existing schemes was investigated, while the new residual citation scheme was compared with the cascading citation scheme. Two datasets of biomedical publications were used, one each for the direct and residual citation weighting aspects of the study. The sample for the direct citation aspect contained 100 publications that received 7,317 citations, yielding 11,234 citation contexts and 9,795 citation context pairs. A sample of 981 citation context pairs was given to two human experts for annotation into “similar”, “somewhat similar”, and “not similar” classes. Semantic similarity scores between the 11,234 citation contexts were obtained using the BioSent2Vec embedding model for biomedical publications. The residual citation aspect sample included ten base articles and five generations of citations, from which 5,272 citation context pairs were obtained. Results of Spearman’s rank correlation test showed that the correlation coefficients between the proposed direct citation weighting scheme and the weighting schemes “number of positive sentiments,” “number of multiple citation mentions,” “sum of multiple citation mentions,” “number of citations,” and “number of citation mentions” were .83, .89, .89, .93, and .99 respectively. The average residual citations received from the 2nd, 3rd, 4th, and 5th citation generation level papers were 0.47, 0.43, 0.40, and 0.37 respectively. These averages differ significantly from the values of 0.5, 0.25, 0.125, and 0.0625 suggested by the cascading citation scheme. Although the proposed direct citation weighting scheme and residual citation scheme require more complex computations, they are recommended as credible alternatives to the “number of citation mentions” scheme and the cascading citation scheme, respectively.
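    The cascading citation scheme used as the baseline above assigns a residual weight that halves at each citation generation beyond the direct citation, which is exactly where the benchmark values 0.5, 0.25, 0.125, and 0.0625 come from. A minimal sketch of that baseline computation (the function name is illustrative):

```python
def cascading_residual_weight(generation: int) -> float:
    """Residual citation weight under the cascading scheme: the weight
    halves at each citation generation beyond the direct (1st) citation."""
    if generation < 2:
        raise ValueError("residual weights apply from the 2nd generation on")
    return 0.5 ** (generation - 1)

# Weights for the 2nd through 5th citation generations, the values the
# study compared its observed averages (0.47, 0.43, 0.40, 0.37) against.
expected = [cascading_residual_weight(g) for g in range(2, 6)]
```

    The study's observed averages decline far more slowly than this geometric halving, which is the basis for its finding that the cascading scheme's suggested values differ significantly from the data.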

    Applying insights from machine learning towards guidelines for the detection of text-based fake news

    Web-based technologies have fostered an online environment where information can be disseminated quickly and cost-effectively to large and diverse audiences. Unfortunately, the rise and evolution of web-based technologies have also created an environment where false information, commonly referred to as “fake news”, spreads rapidly. The effects of this spread can be catastrophic. Finding solutions to the problem of fake news is complicated for a myriad of reasons: how fake news is defined, the lack of quality datasets available to researchers, the topics covered in such data, and the fact that datasets exist in a variety of languages. The dissemination of false information can result in reputational damage, financial damage to affected brands, and ultimately, misinformed online news readers who make misinformed decisions. The objective of the study is to propose a set of guidelines that other system developers can use to implement misinformation detection tools and systems. The guidelines are constructed from findings of the experimentation phase of the project and from the literature review conducted as part of the study. A selection of machine learning and deep learning approaches is examined to test the applicability of cues that could separate fake online articles from real online news articles. Key performance metrics such as precision, recall, accuracy, F1-score, and ROC are used to measure the performance of the selected machine learning and deep learning models. To demonstrate the practicality of the guidelines and allow for reproducibility of the research, each guideline provides background information relating to the identified problem, a solution to the problem through pseudocode, code excerpts using the Python programming language, and points of consideration that may assist with the implementation. Thesis (MA) -- Faculty of Engineering, the Built Environment, and Technology, 202
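    The performance metrics the abstract names (precision, recall, accuracy, F1-score) are standard derivations from confusion-matrix counts. A self-contained Python sketch, in the spirit of the thesis's own Python code excerpts; the labels and toy data below are hypothetical, not taken from the study:

```python
def classification_metrics(y_true, y_pred, positive="fake"):
    """Precision, recall, accuracy, and F1 from true vs. predicted labels.
    'positive' marks the class of interest (here, fake articles)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(pairs)
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "f1": f1}

# Toy example with hypothetical labels:
truth = ["fake", "fake", "real", "real", "fake"]
preds = ["fake", "real", "real", "fake", "fake"]
m = classification_metrics(truth, preds)
```

    Reporting precision and recall alongside accuracy matters for fake-news detection because the classes are often imbalanced: a model that labels everything “real” can score high accuracy while catching no fake articles at all.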

    Recreational nastiness or playful mischief? Contrasting perspectives on internet trolling between news media and avid internet users

    The term “internet trolling” has come to encompass a wide range of disparate behaviours, ranging from abusive speech and computer hacking to sarcastic humour and friendly teasing. While some of these behaviours are clearly antisocial and, in extreme cases, criminal, others are harmless and can even be prosocial. Previous studies have shown that self-identified internet trollers tend to credit internet trolling’s poor reputation to misunderstanding and overreaction from people unfamiliar with internet culture and humour, whereas critics of trolling have argued that the term has been used to downplay and gloss over problematic transgressive behaviour. As the internet has come to dominate much of our everyday lives as a place of work, play, learning, and connection with other people, it is imperative that harmful trolling behaviours can be identified and managed in nuanced ways that do not unnecessarily suppress harmless activities. This thesis disambiguates some of the competing and contrary ideas about internet trolling by comparing perceptions of trolling drawn from two sources in two studies. Study 1 was a content analysis of 240 articles sampled from 11 years of English-language news articles mentioning internet trolling to establish a “mainstream” perspective. Study 2 was a series of in-depth, semi-structured interviews with 20 participants who self-identified as avid internet users familiar with internet trolling as part of their everyday internet use. Study 1 found that 97% of the news articles portrayed internet trolling in a negative light, with reporting about harassment and online hostility being the most common. By contrast, Study 2 found that 30% of the 20 participants held mostly positive views of trolling, 25% mostly negative, and 45% were ambivalent. Analysis of these two studies reveals four characteristics of internet trolling interactions that can serve as a framework for evaluating the potential risk of harm: 1) targetedness, 2) embodiedness, 3) ability to disengage, and 4) troller intent. This thesis argues that debate over the definition of “trolling” is not useful for the purposes of addressing online harm. Instead, the proposed framework can be used to identify harmful online behaviours, regardless of what they are called.