
    Towards the ontology-based approach for factual information matching

    Factual information is information based on or relating to facts. The reliability of automatically extracted facts is the main problem in processing factual information. Fact retrieval systems remain among the most effective tools for identifying information for decision-making. In this work, we explore how natural language processing methods and a problem-domain ontology can help to check facts automatically for contradictions and mismatches.
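
    As a rough illustration of such a consistency check (not the authors' method), the sketch below flags newly extracted facts whose value clashes with one already recorded in a toy ontology; the predicate names and facts are hypothetical.

```python
# A toy consistency check, not the authors' method: a predicate marked as
# "functional" may hold only one value per subject, so a different value in a
# newly extracted fact is reported as a contradiction.

Fact = tuple[str, str, str]  # (subject, predicate, object)

FUNCTIONAL_PREDICATES: set[str] = {"born_in", "capital_of"}  # assumed toy schema

def find_contradictions(known: set[Fact], extracted: set[Fact]) -> list[Fact]:
    """Return extracted facts whose functional predicate clashes with a known value."""
    index: dict[tuple[str, str], str] = {
        (s, p): o for s, p, o in known if p in FUNCTIONAL_PREDICATES
    }
    return [
        (s, p, o)
        for s, p, o in extracted
        if p in FUNCTIONAL_PREDICATES and index.get((s, p), o) != o
    ]

known_facts = {("Lisbon", "capital_of", "Portugal")}
new_facts = {("Lisbon", "capital_of", "Spain"), ("Porto", "located_in", "Portugal")}
print(find_contradictions(known_facts, new_facts))  # [('Lisbon', 'capital_of', 'Spain')]
```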

    Fake Content Detection in the Information Exponential Spreading Era

    Dissertation presented as a partial requirement for obtaining a Master's degree in Information Management, specialization in Information Systems and Technologies Management. Recent years have brought a democratization of information access, allowing people to reach a huge amount of information and to share it in a way that can easily reach millions of people in a very short time. These capabilities can be used both rightly and wrongly, and in some cases they are exploited to spread malicious content in pursuit of some goal. Several studies on text mining and sentiment analysis have aimed to spot fake information and prevent the spreading of misinformation. The trustworthiness and veracity of the information accessible to people is becoming increasingly important, in some cases critical, and can be seen as a huge challenge of the current digital era. This problem might be addressed with the help of science and technology. One question we can ask ourselves is: how do we guarantee that information is used correctly and that people can trust its veracity? Mathematics and statistics, combined with machine learning classification and predictive algorithms and the computational power of current information systems, can help minimize the problem, or at least flag potentially fake information. We propose a research work aimed at a model that predicts whether a given text is trustworthy. The results were promising, yielding a predictive model with good performance.
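
    A minimal sketch of the kind of classification pipeline such a predictive model could rely on, assuming TF-IDF features and a linear classifier; the tiny inline dataset is purely illustrative and not the dissertation's corpus.

```python
# A minimal text-classification sketch: TF-IDF features plus a linear
# classifier. The two-example "dataset" is purely illustrative.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "Official report confirms the figures released yesterday.",
    "Miracle cure hidden by doctors, share before it is deleted!",
]
labels = [1, 0]  # 1 = trustworthy, 0 = potentially fake (illustrative labels)

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(texts, labels)
print(model.predict(["Experts dispute the claim circulating on social media."]))
```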

    Data Science, Machine learning and big data in Digital Journalism: A survey of state-of-the-art, challenges and opportunities

    Digital journalism has undergone a dramatic change, and media companies are challenged to use data science algorithms to be more competitive in a Big Data era. While this is a relatively new area of study in the media landscape, the use of machine learning and artificial intelligence has increased substantially over the last few years. In particular, the adoption of data science models for personalization and recommendation has attracted the attention of several media publishers. Following this trend, this paper presents a research literature analysis on the role of Data Science (DS) in Digital Journalism (DJ). Specifically, the aim is to present a critical literature review, synthesizing the main application areas of DS in DJ and highlighting research gaps, challenges, and opportunities for future studies. Through a systematic literature review integrating bibliometric search, text mining, and qualitative discussion, the relevant literature was identified and extensively analyzed. The review reveals an increasing use of DS methods in DJ, with almost 47% of the research published in the last three years. A hierarchical clustering highlighted six main research domains focused on text mining, event extraction, online comment analysis, recommendation systems, automated journalism, and exploratory data analysis, along with some machine learning approaches. Future research directions comprise developing models to improve personalization and engagement features, exploring recommendation algorithms, testing new automated journalism solutions, and improving paywall mechanisms. Acknowledgements: This work was supported by the FCT - Fundação para a Ciência e Tecnologia, under the projects UIDB/04466/2020, UIDP/04466/2020, and UIDB/00319/2020.
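
    A minimal sketch of how such a hierarchical clustering of the surveyed literature could be set up, assuming TF-IDF representations of the abstracts; the example abstracts and cluster count are placeholders, not the paper's data.

```python
# Hierarchical (agglomerative) clustering over TF-IDF vectors of abstracts,
# the general technique named in the review; abstracts and cluster count here
# are placeholders.

from sklearn.cluster import AgglomerativeClustering
from sklearn.feature_extraction.text import TfidfVectorizer

abstracts = [
    "recommendation system for news personalization",
    "automated journalism and robot news writing",
    "online comment moderation and toxicity detection",
    "event extraction from breaking news streams",
]

X = TfidfVectorizer(stop_words="english").fit_transform(abstracts).toarray()
clusters = AgglomerativeClustering(n_clusters=2).fit_predict(X)
print(clusters)  # cluster label per abstract
```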

    Automated Fake News detection using computational Forensic Linguistics

    In our society, everyone has access to the internet and can post anything about any topic at any time. Despite its many advantages, this possibility has brought along a serious problem: Fake News. Fake News is news that is not real because it does not follow journalistic principles. Instead, it tries to mimic the look and feel of real news with the intent to disinform the reader. What makes Fake News a real problem, however, is the influence it can have on our society. Lay people are drawn to this kind of news and often pay more attention to it than to truthful accounts. Despite the development of systems to detect Fake News, most are based on fact-checking methods, which are ill-suited when the truth is distorted, exaggerated, or placed out of context. We aim to detect Portuguese Fake News using machine learning techniques with a Forensic Linguistics approach. Contrary to previous approaches, ours builds upon linguistic and stylistic analysis methods that have been tried and tested in Forensic Linguistic analysis. After collecting a corpus from multiple sources, we formulated the task as a text classification problem and demonstrated the proposed classifier's capability for detecting Fake News. The reported results are promising, achieving high accuracies on the test data.
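
    A minimal sketch of the stylistic-feature idea behind such a Forensic Linguistics approach, assuming a few hand-crafted style features and a standard classifier; the features, texts, and labels are illustrative and not the authors' feature set.

```python
# Hand-crafted stylistic features (assumed, not the authors' feature set) fed
# to a standard classifier; texts and labels are illustrative only.

import re

from sklearn.ensemble import RandomForestClassifier

def style_features(text: str) -> list[float]:
    words = re.findall(r"\w+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [
        len(words) / max(len(sentences), 1),                   # average sentence length
        len(set(words)) / max(len(words), 1),                  # type-token ratio
        sum(text.count(c) for c in "!?") / max(len(text), 1),  # emphatic punctuation rate
    ]

texts = [
    "URGENTE!!! Partilha já antes que apaguem!",
    "O ministério confirmou hoje os dados do relatório.",
]
labels = [1, 0]  # 1 = fake, 0 = real (illustrative)

clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit([style_features(t) for t in texts], labels)
print(clf.predict([style_features("Nova cura milagrosa escondida pelos médicos!!!")]))
```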

    A comparison of machine learning approaches for predicting ClaimReview markup attributes from fact-checking websites

    The spreading of fake news is a reality of modern times. In the daily fight against disinformation, however, fact-checking agencies are one of the strongest allies. Several techniques are in place to help in this battle, and one of them is the ClaimReview web markup, which was introduced to give search engines access to the meaning of fact-checking articles. Despite its importance in this context, barely half of the fact-checkers have adopted it. Therefore, in this work we provide a starting point for the automatic generation of ClaimReview markup, investigating means to predict ClaimReview's attributes using machine learning models. By experimenting with and comparing a baseline approach, a Support Vector Machine, against the state of the art (BERT), we have achieved noticeable results, creating a benchmark for future research in this domain.
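
    A minimal sketch of an SVM baseline in this spirit, assuming the task of predicting one ClaimReview attribute (the textual rating) from the fact-checking article text; the tiny dataset and label set are illustrative assumptions, not the paper's data.

```python
# An SVM baseline in this spirit: predict one ClaimReview attribute (the
# textual rating) from the fact-checking article text. Dataset and label set
# are illustrative assumptions, not the paper's data.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

articles = [
    "Our verification found the viral post to be completely fabricated.",
    "The statistic is broadly accurate but missing important context.",
]
ratings = ["False", "Mostly true"]  # e.g. values for the markup's textual rating

baseline = make_pipeline(TfidfVectorizer(), LinearSVC())
baseline.fit(articles, ratings)
print(baseline.predict(["Records confirm the event never happened as claimed."]))
```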

    MISNIS: an intelligent platform for Twitter topic mining

    Twitter has become a major tool for spreading news, for disseminating positions and ideas, and for commenting on and analyzing current world events. However, with more than 500 million tweets flowing per day, it is necessary to find efficient ways of collecting, storing, managing, mining and visualizing all this information. This is especially relevant if one considers that Twitter has no way of indexing tweet contents, and that the only available categorization “mechanism” is the #hashtag, which is entirely dependent on a user's willingness to use it. This paper presents an intelligent platform and framework, named MISNIS - Intelligent Mining of Public Social Networks’ Influence in Society - that addresses these issues and allows a non-technical user to easily mine a given topic from a very large corpus of tweets and obtain relevant contents and indicators such as user influence or sentiment analysis. When compared to other existing similar platforms, MISNIS is an expert system that includes specifically developed intelligent techniques that: (1) circumvent the Twitter API restrictions that limit access to 1% of all flowing tweets. The platform has been able to collect more than 80% of all Portuguese-language tweets flowing in Portugal while online; (2) intelligently retrieve most tweets related to a given topic even when the tweets do not contain the topic #hashtag or user-indicated keywords. A 40% increase in the number of retrieved relevant tweets has been reported in real-world case studies. The platform is currently focused on Portuguese-language tweets posted in Portugal. However, most of the developed technologies are language independent (e.g. intelligent retrieval, sentiment analysis, etc.), and technically MISNIS can easily be expanded to cover other languages and locations.
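
    A minimal, hypothetical sketch of hashtag-independent topic retrieval, ranking tweets by TF-IDF cosine similarity to a seed description of the topic; this is not the platform's actual retrieval technique, only an illustration of the idea.

```python
# Hypothetical hashtag-independent topic retrieval: rank tweets by TF-IDF
# cosine similarity to a seed description of the topic. Not MISNIS's actual
# retrieval technique, only an illustration of the idea.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

seed = "legislative elections vote parliament campaign"
tweets = [
    "Long queues at polling stations across Lisbon this morning",
    "Great weather for a beach day in the Algarve",
    "Party leaders hold final campaign rally before the vote",
]

X = TfidfVectorizer().fit_transform([seed] + tweets)
scores = cosine_similarity(X[0], X[1:]).ravel()
for score, tweet in sorted(zip(scores, tweets), reverse=True):
    print(f"{score:.2f}  {tweet}")
```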

    Veracity vs. Reliability: Changing the Approach of Our Annotation Guideline

    This paper presents the evolution of an annotation guideline designed for the disinformation detection task, an essential step of my doctoral thesis. The annotation proposal aims to label all the structural and content elements of a news item, as well as to classify them as Reliable or Unreliable. The initial objective was to annotate those elements as Fake or True, but that classification requires world knowledge. Our current goal is to annotate news on the basis of a purely textual, semantic and linguistic analysis, without using external knowledge, and, for that reason, the annotation was redirected towards a reliability rating rather than a veracity classification. This article justifies the change of perspective at this stage of the thesis, explains the difference between veracity and reliability, and shows the concrete changes that have been adopted in our annotation proposal with this new approach. This research work has been partially funded by the Spanish Government and Fondo Europeo de Desarrollo Regional (FEDER) through the project Modelang: Modeling the behavior of digital entities by Human Language Technologies (RTI2018-094653-B-C22), as well as supported by a grant from the Consellería de Innovación, Universidades, Ciencia y Sociedad Digital (ACIF/2020/177) from the Spanish Government.
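
    A hypothetical sketch of what one annotated item could look like under such a reliability-oriented guideline, with structural elements labelled Reliable or Unreliable; the field names and spans are assumptions, not the thesis's actual annotation schema.

```python
# A hypothetical annotation record under a reliability-oriented guideline:
# each structural element of a news item gets a Reliable/Unreliable label.
# Field names and spans are assumptions, not the thesis's actual schema.

from dataclasses import dataclass
from enum import Enum

class Reliability(str, Enum):
    RELIABLE = "Reliable"
    UNRELIABLE = "Unreliable"

@dataclass
class ElementAnnotation:
    element: str            # e.g. "headline", "quote", "source attribution"
    span: tuple[int, int]   # character offsets within the news text
    label: Reliability

annotations = [
    ElementAnnotation("headline", (0, 54), Reliability.UNRELIABLE),
    ElementAnnotation("quote", (120, 210), Reliability.RELIABLE),
]
print([f"{a.element}: {a.label.value}" for a in annotations])
```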