7 research outputs found

    Google Snippets and Twitter Posts: Examining Similarities to Identify Misinformation

    Despite numerous efforts to address the persistent issue of fake news, its proliferation continues due to the vast volume of information circulating on social media platforms, which poses a significant challenge to manual fact-checking. To explore a potential solution, this study investigates the applicability of Google search and its results as a practical tool for detecting fake news on platforms like Twitter. The research focuses specifically on comparing Google search result snippets with tweets to assess their similarity and to determine whether such similarity can serve as an indicator of misinformation. However, the study reveals that the observed similarity between tweets and snippets does not necessarily correlate with news credibility. Consequently, alternative techniques, such as retrieving complete news articles and assessing sources, may be necessary to effectively tackle fake news detection on social media. This research sheds light on the limitations of relying solely on snippet similarity and suggests that future work should consider comprehensive content analysis and source credibility to combat misinformation.
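    As a concrete illustration of the comparison step this abstract describes, the following minimal sketch scores a tweet against retrieved snippets with TF-IDF cosine similarity. The function name and the example tweet and snippets are invented for illustration and are not taken from the paper.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def snippet_similarity(tweet, snippets):
        """Return the highest cosine similarity between a tweet and a set
        of Google search result snippets, using a shared TF-IDF vocabulary."""
        vectorizer = TfidfVectorizer(stop_words="english")
        matrix = vectorizer.fit_transform([tweet] + snippets)
        # Row 0 is the tweet; the remaining rows are the snippets.
        scores = cosine_similarity(matrix[0], matrix[1:])
        return float(scores.max())

    # Hypothetical usage: a claim-bearing tweet and two retrieved snippets.
    tweet = "Breaking: city water supply contaminated, officials stay silent"
    snippets = [
        "Officials confirm routine testing found no contamination in city water.",
        "City water supply passes annual safety inspection, report says.",
    ]
    print(snippet_similarity(tweet, snippets))

    As the abstract notes, a high score here indicates only lexical overlap, not credibility, which is precisely the limitation the study reports.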

    Comparative study on sentiment analysis using machine learning techniques

    With the advancement of the Internet and the World Wide Web (WWW), data and information across the internet have grown exponentially, accompanied by enormous growth in digital and textual data generation, as users post comments on social media websites describing their experiences with an event or product. Furthermore, people want to know whether the majority of potential buyers had a positive or negative experience with the event or product. This kind of classification can generally be attained through sentiment analysis, which takes as input unstructured text comments about product reviews, events, etc., posted by users, and classifies them into categories: positive, negative, or neutral opinions. Sentiment analysis can be performed by different machine learning models such as CNN, Naive Bayes, Decision Tree, XGBoost, and Logistic Regression. The proposed work is compared with existing solutions in terms of different performance metrics, and XGBoost outperforms all other methods.
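    As a sketch of the kind of comparison this abstract describes, the snippet below cross-validates several of the named classifiers on a shared TF-IDF representation; the toy reviews and labels are invented for illustration. XGBoost (via the separate xgboost package) and a CNN would slot into the same evaluation loop but are omitted to keep the example dependent only on scikit-learn.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.model_selection import cross_val_score

    # Toy labeled reviews; a real comparison would use a product-review corpus.
    texts = ["great product, works perfectly", "terrible, broke after one day",
             "absolutely love it", "waste of money, very disappointed"] * 25
    labels = [1, 0, 1, 0] * 25  # 1 = positive, 0 = negative

    models = {
        "naive_bayes": MultinomialNB(),
        "decision_tree": DecisionTreeClassifier(),
        "logistic_regression": LogisticRegression(max_iter=1000),
    }
    for name, clf in models.items():
        pipe = make_pipeline(TfidfVectorizer(), clf)
        scores = cross_val_score(pipe, texts, labels, cv=5, scoring="accuracy")
        print(f"{name}: mean accuracy {scores.mean():.3f}")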

    Natural Language Processing Techniques for Detecting and Avoiding Fake News on Social Media

    The advancement of social networks has facilitated the sharing and spread of news among people all over the world. With the growth of these networks and of the volume of news shared daily, the phenomenon of fake news has become stronger and more widespread. Over the past few years, big social networks like Twitter have admitted that fake and duplicate accounts, fake news, and fake likes exist in their networks. This stems from the fact that social network account owners have the ability to distribute false information, to support or attack an idea or a product, to promote or demote an election candidate, and to influence real network users in their decision making. Misinformation detection is therefore of critical importance for enhancing public trust and societal stability, yet it remains a challenging problem for the Natural Language Processing community.
    In our work, we investigate the detection of fake tweets on Twitter using natural language processing (NLP) and supervised machine learning in Python. We studied a variety of approaches to the subject from various sources and authors, which inspired us to combine these approaches with the goal of finding out which combinations work best. To this end, we developed a software tool that checks the success rate of four (4) different fake news detection systems on four (4) different datasets, resulting in a total of sixteen (16) success rates, one per combination. To build this tool, we used the PHEME dataset [15], which comprises thousands of real, pre-processed, labeled tweets extracted via the Twitter API [16]. We wrote a Python program that parses this dataset and stores all of its tweets in .tsv files, producing four (4) datasets that differ along two dimensions: (1) whether duplicate tweets are accepted, since the same tweet may have been shared by several users or profiles; and (2) whether a third validity label is accepted in addition to "true" (true news) and "false" (fake news), namely "undefined" (news of undetermined validity). Once a .tsv file is selected, sentiment analysis is performed on each tweet using the Sentiment Intensity Analyzer [17]; the results are then processed to decide whether the tweet should be marked positive or negative. We then run a pipeline that executes the following steps in order: (1) tokenization and lemmatization into a bag-of-words representation using NLTK; (2) vectorization using the Count Vectorizer or the TF-IDF Vectorizer from Scikit-Learn; and (3) classification using a Support Vector Machine or the Multinomial Naive Bayes classifier from Scikit-Learn. The combination of linguistic and sentiment processing yields different results depending on the choice of file, vectorizer, and classifier: the rate of correctly labeled data ranges between 54.6% and 99.8%.
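    The four systems amount to the 2 x 2 grid of vectorizers and classifiers listed above. The sketch below enumerates that grid with scikit-learn; the toy tweets stand in for one of the .tsv dataset files, and the NLTK lemmatization and Sentiment Intensity Analyzer steps are omitted for brevity.

    from itertools import product
    from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.svm import LinearSVC
    from sklearn.pipeline import make_pipeline
    from sklearn.model_selection import cross_val_score

    # Toy stand-in for one .tsv split of the PHEME tweets (illustrative, not
    # real data); labels follow the thesis's "true"/"false" scheme.
    tweets = ["officials confirm the incident details",
              "shocking footage they don't want you to see",
              "police issue verified statement on the event",
              "share this before it gets deleted"] * 25
    labels = ["true", "false", "true", "false"] * 25

    # 2 vectorizers x 2 classifiers = the four systems; running each over the
    # four dataset files yields the sixteen success rates the abstract reports.
    vectorizers = {"count": CountVectorizer, "tfidf": TfidfVectorizer}
    classifiers = {"svm": LinearSVC, "multinomial_nb": MultinomialNB}

    for (vname, Vec), (cname, Clf) in product(vectorizers.items(), classifiers.items()):
        pipeline = make_pipeline(Vec(), Clf())
        score = cross_val_score(pipeline, tweets, labels, cv=5).mean()
        print(f"{vname} + {cname}: {score:.3f}")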

    Implementation of a Spanish-language news classifier for identifying fake news through the analysis, machine translation, and validation of an English-language dataset, using machine learning and natural language processing techniques

    This work presents the implementation of a news classifier that enables the identification of fake news in Spanish. The classifier is based on training supervised machine learning models using natural language processing techniques and tools. One of the greatest challenges of this work is the scarcity of Spanish-language datasets that can be used to train the machine learning models. In response to this challenge, the backtranslation methodology and the METEOR metric (Banerjee and Lavie 2005) are used to evaluate the machine translation of an English-language fake news dataset into Spanish. The translated dataset is then used as the training data for the machine learning model. Because no machine learning model can consume a raw text sample directly for training, a transformer is implemented to extract semantic, syntactic, and polarity features. Semantic features are extracted using a machine learning model that produces word vectors encoding the semantic and meaning relations between words. Syntactic features are expressed through part-of-speech tags and named entities. Polarity features are obtained with a Spanish-language sentiment lexicon. The model is also deployed in a web system for use by external users. Thesis
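    The backtranslation quality gate this abstract describes can be sketched with NLTK's METEOR implementation, scoring the English -> Spanish -> English round trip against the original. The example sentences and the acceptance threshold are invented for illustration, not taken from the thesis.

    import nltk
    from nltk.translate.meteor_score import meteor_score

    nltk.download("wordnet", quiet=True)   # METEOR matches synonyms via WordNet
    nltk.download("omw-1.4", quiet=True)

    def backtranslation_ok(original, backtranslated, threshold=0.6):
        """Keep a machine-translated example only if its back-translation
        stays close to the original English text, scored with METEOR.
        Recent NLTK versions expect pre-tokenized token lists."""
        score = meteor_score([original.split()], backtranslated.split())
        return score >= threshold

    # Hypothetical round trip: English -> Spanish -> English.
    original = "the senator denied the allegations in a press conference"
    backtranslated = "the senator denied the accusations at a press conference"
    print(backtranslation_ok(original, backtranslated))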

    Applying insights from machine learning towards guidelines for the detection of text-based fake news

    Web-based technologies have fostered an online environment where information can be disseminated in a fast and cost-effective manner whilst targeting large and diverse audiences. Unfortunately, the rise and evolution of web-based technologies have also created an environment where false information, commonly referred to as "fake news", spreads rapidly, and the effects of this spread can be catastrophic. Finding solutions to the problem of fake news is complicated for a myriad of reasons: disagreement over what is defined as fake news, the lack of quality datasets available to researchers, the topics covered in such data, and the fact that datasets exist in a variety of languages. The dissemination of false information can result in reputational damage, financial damage to affected brands, and ultimately, misinformed online news readers who can make misinformed decisions. The objective of the study is to propose a set of guidelines that other system developers can use to implement misinformation detection tools and systems. The guidelines are constructed from findings made during the experimentation phase of the project and information uncovered in the literature review conducted as part of the study. A selection of machine learning and deep learning approaches is examined to test the applicability of cues that could separate fake online articles from real online news articles. Key performance metrics such as precision, recall, accuracy, F1-score, and ROC are used to measure the performance of the selected machine learning and deep learning models. To demonstrate the practicality of the guidelines and allow for reproducibility of the research, each guideline provides background information relating to the identified problem, a solution to the problem through pseudocode, code excerpts in the Python programming language, and points of consideration that may assist with implementation.
    Thesis (MA) -- Faculty of Engineering, the Built Environment, and Technology, 202
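    To make the evaluation concrete, the sketch below computes the metrics the abstract names with scikit-learn; the ground-truth labels, predictions, and probability scores are placeholders (1 = fake, 0 = real), not results from the thesis.

    from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                                 f1_score, roc_auc_score)

    # Hypothetical output of a fake-news classifier on eight test articles.
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
    y_score = [0.9, 0.2, 0.8, 0.4, 0.1, 0.7, 0.95, 0.3]  # predicted P(fake)

    print("accuracy :", accuracy_score(y_true, y_pred))
    print("precision:", precision_score(y_true, y_pred))
    print("recall   :", recall_score(y_true, y_pred))
    print("f1-score :", f1_score(y_true, y_pred))
    print("roc auc  :", roc_auc_score(y_true, y_score))  # ROC uses scores, not labels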
