    Fact or Fiction

    Fake news is increasingly pervasive, and we address its problematic aspects to help people consume news intelligently. In this project, we research machine learning models that extract objective sentences, encouraging unbiased, fact-based discussions. The most accurate model, a convolutional neural network, achieves an accuracy of 85.69%. To make the model publicly accessible, the team implemented an end-to-end web system that highlights objective sentences in user input. The system also provides additional information about the input, such as links to related web pages. We evaluate the system both qualitatively, by interviewing users, and quantitatively, with surveys consisting of rating-scale questions. The positive feedback we received indicates the usability of our platform.
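    The highlighting step this abstract describes can be sketched in a few lines. Note this is a minimal illustration: the `is_objective` heuristic below is a placeholder standing in for the paper's CNN classifier, and all names are assumptions, not the authors' code.

    ```python
    import re

    def is_objective(sentence: str) -> bool:
        """Placeholder for the trained CNN: flags a sentence as
        non-objective if it contains a subjective cue word."""
        subjective_cues = {"think", "believe", "feel", "amazing", "terrible"}
        words = set(re.findall(r"[a-z']+", sentence.lower()))
        return not (words & subjective_cues)

    def highlight_objective(text: str) -> str:
        """Wrap objective sentences in ** markers, as a web UI might."""
        sentences = re.split(r"(?<=[.!?])\s+", text.strip())
        return " ".join(f"**{s}**" if is_objective(s) else s
                        for s in sentences)

    print(highlight_objective(
        "The report cites three sources. I think it is terrible."))
    # → **The report cites three sources.** I think it is terrible.
    ```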

    Performance Comparison of Turkish Web Pages Classification

    Nowadays, web page classification is essential for efficient and fast search engines. There is an ever-increasing need for automatic classification techniques with higher classification accuracy. In this article, a performance comparison of existing Turkish-language CNN models for web page classification is performed. In more detail, the content of web pages is extracted first; then preprocessing steps that detect the important parts and eliminate useless content are applied. Next, BERT word embeddings are integrated to represent the texts as efficient numerical vectors. Finally, three state-of-the-art CNN models that fully support the Turkish language are investigated to find the best classifier. Overall, the three studied models obtained acceptable performance when classifying the Turkish web pages; however, the third model achieved slightly better results than the other two. © 2021 IEEE
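    The first step of the pipeline above, extracting page content while discarding boilerplate markup, can be sketched with the standard library. This is an illustrative stand-in for the paper's extraction stage, not the authors' implementation; the tag skip-list is an assumption.

    ```python
    from html.parser import HTMLParser

    class TextExtractor(HTMLParser):
        """Collects visible text, skipping script/style/navigation
        blocks -- a stand-in for the content-extraction step."""
        SKIP = {"script", "style", "nav", "footer"}

        def __init__(self):
            super().__init__()
            self.skip_depth = 0   # >0 while inside a skipped element
            self.chunks = []

        def handle_starttag(self, tag, attrs):
            if tag in self.SKIP:
                self.skip_depth += 1

        def handle_endtag(self, tag):
            if tag in self.SKIP and self.skip_depth:
                self.skip_depth -= 1

        def handle_data(self, data):
            if not self.skip_depth and data.strip():
                self.chunks.append(data.strip())

    def extract_text(html: str) -> str:
        parser = TextExtractor()
        parser.feed(html)
        return " ".join(parser.chunks)

    page = "<body><nav>Menu</nav><p>Haber metni</p></body>"
    print(extract_text(page))  # → Haber metni
    ```

    The extracted text would then be tokenized and passed to the BERT embedding layer before classification.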

    Identifying Documents In-Scope of a Collection from Web Archives

    Web archive data usually contains high-quality documents that are very useful for creating specialized collections of documents, e.g., scientific digital libraries and repositories of technical reports. In doing so, there is a substantial need for automatic approaches that can distinguish the documents of interest for a collection out of the huge number of documents collected by web archiving institutions. In this paper, we explore different learning models and feature representations to determine the best performing ones for identifying the documents of interest from the web archived data. Specifically, we study both machine learning and deep learning models and "bag of words" (BoW) features extracted from the entire document or from specific portions of the document, as well as structural features that capture the structure of documents. We focus our evaluation on three datasets that we created from three different Web archives. Our experimental results show that the BoW classifiers that focus only on specific portions of the documents (rather than the full text) outperform all compared methods on all three datasets.
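    The winning representation, BoW over selected portions rather than the full text, can be sketched as follows. The field names and the choice of portions (title plus the first 50 body tokens) are illustrative assumptions, not the paper's exact configuration.

    ```python
    import re
    from collections import Counter

    def bow(text: str) -> Counter:
        """Simple bag-of-words: lowercase alphabetic tokens."""
        return Counter(re.findall(r"[a-z]+", text.lower()))

    def portion_features(doc: dict) -> Counter:
        """BoW over selected portions (title + opening of the body),
        the kind of representation the paper found to outperform
        full-text BoW. Portion choices here are assumptions."""
        title = doc.get("title", "")
        body_head = " ".join(doc.get("body", "").split()[:50])
        return bow(title) + bow(body_head)

    doc = {"title": "Technical Report", "body": "This report covers web archives"}
    print(portion_features(doc))
    ```

    The resulting counts would feed a conventional classifier (e.g., an SVM or logistic regression) in place of a full-text vector.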

    Sentiment Analysis of Persian Language: Review of Algorithms, Approaches and Datasets

    Sentiment analysis aims to extract people's emotions and opinions from their comments on the web. It is widely used in business to detect sentiment in social data, gauge brand reputation, and understand customers. Most articles in this area have concentrated on the English language, whereas resources for the Persian language are limited. In this review paper, articles on sentiment analysis in Persian published between 2018 and 2022 are collected, and their methods, approaches, and datasets are explained and analyzed. Almost all of the methods used to solve sentiment analysis are machine learning and deep learning. The purpose of this paper is to examine 40 different approaches to sentiment analysis in the Persian language, analyze the datasets along with the accuracy of the algorithms applied to them, and review the strengths and weaknesses of each. Among all the methods, transformers such as BERT and recurrent neural networks such as LSTM and Bi-LSTM have achieved the highest accuracy in sentiment analysis. In addition to the methods and approaches, the datasets reviewed between 2018 and 2022 are listed, and information about each dataset and its details is provided.

    Semantic Wide and Deep Learning for Detecting Crisis-Information Categories on Social Media

    When crises hit, many people flock to social media to share or consume information related to the event. Social media posts during crises tend to provide valuable reports on affected people, donation offers, help requests, advice provision, etc. Automatically identifying the category of information (e.g., reports on affected individuals, donations, and volunteers) contained in these posts is vital for their efficient handling and consumption by affected communities and concerned organisations. In this paper, we introduce Sem-CNN, a wide and deep Convolutional Neural Network (CNN) model designed for identifying the category of information contained in crisis-related social media content. Unlike previous models, which mainly rely on the lexical representations of words in the text, the proposed model integrates an additional layer of semantics, representing the named entities in the text, into a wide and deep CNN network. Results show that the Sem-CNN model consistently outperforms the baselines, which consist of statistical and non-semantic deep learning models.
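    The input construction for a wide-and-deep model of this kind can be sketched as two channels: a "deep" sequence of token ids for the CNN and a "wide" vector of entity-type indicators. The vocabulary, entity inventory, and function names below are toy assumptions, not Sem-CNN's actual configuration.

    ```python
    def wide_and_deep_features(tokens, entities):
        """Builds the two input channels: token ids for the deep CNN
        path and binary entity-type indicators for the wide path.
        Vocabulary and entity inventory are illustrative only."""
        vocab = {"help": 1, "needed": 2, "donation": 3}
        entity_types = ["Person", "Place", "Organisation"]
        deep = [vocab.get(t.lower(), 0) for t in tokens]      # 0 = out-of-vocab
        wide = [1.0 if e in entities else 0.0 for e in entity_types]
        return deep, wide

    deep, wide = wide_and_deep_features(
        ["Help", "needed", "in", "Nairobi"], {"Place"})
    print(deep, wide)  # → [1, 2, 0, 0] [0.0, 1.0, 0.0]
    ```

    In the full model, the deep channel would pass through embedding and convolution layers while the wide channel is concatenated just before the output layer.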