    Fact or Fiction

    Fake news is increasingly pervasive, and we address its problematic aspects to help people consume news intelligently. In this project, we research machine learning models that extract objective sentences, encouraging unbiased, fact-based discussions. The most accurate model, a convolutional neural network, achieves an accuracy of 85.69%. To make the model publicly accessible, the team implemented an end-to-end web system that highlights objective sentences in user input. The system also provides additional information about the input, such as links to related web pages. We evaluate the system both qualitatively, by interviewing users, and quantitatively, with surveys consisting of rating-scale questions. The positive feedback we received indicates the usability of our platform.
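    The highlighting step this abstract describes can be sketched in a few lines. Note this is a minimal illustration: the `is_objective` heuristic below is a placeholder standing in for the paper's CNN classifier, and all names are assumptions, not the authors' code.

    ```python
    import re

    def is_objective(sentence: str) -> bool:
        """Placeholder for the trained CNN: flags a sentence as
        non-objective if it contains a subjective cue word."""
        subjective_cues = {"think", "believe", "feel", "amazing", "terrible"}
        words = set(re.findall(r"[a-z']+", sentence.lower()))
        return not (words & subjective_cues)

    def highlight_objective(text: str) -> str:
        """Wrap objective sentences in ** markers, as a web UI might."""
        sentences = re.split(r"(?<=[.!?])\s+", text.strip())
        return " ".join(f"**{s}**" if is_objective(s) else s
                        for s in sentences)

    print(highlight_objective(
        "The report cites three sources. I think it is terrible."))
    # → **The report cites three sources.** I think it is terrible.
    ```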

    Performance Comparison of Turkish Web Pages Classification

    Nowadays, web page classification is essential for efficient and fast search engines. There is an ever-increasing need for automatic classification techniques with higher classification accuracy. In this article, a performance comparison of existing Turkish-language CNN models for web page classification is performed. In more detail, the content of web pages is extracted first; then preprocessing steps that detect the important parts and eliminate useless content are applied. Next, BERT word embeddings are integrated to represent the texts as efficient numerical vectors. Finally, three state-of-the-art CNN models that fully support the Turkish language are investigated to find the best classifier. Overall, the three studied models obtained acceptable performance when classifying the Turkish web pages; however, the third model achieved slightly better results than the other two. © 2021 IEEE
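    The first step of the pipeline above, extracting page content while discarding boilerplate markup, can be sketched with the standard library. This is an illustrative stand-in for the paper's extraction stage, not the authors' implementation; the tag skip-list is an assumption.

    ```python
    from html.parser import HTMLParser

    class TextExtractor(HTMLParser):
        """Collects visible text, skipping script/style/navigation
        blocks -- a stand-in for the content-extraction step."""
        SKIP = {"script", "style", "nav", "footer"}

        def __init__(self):
            super().__init__()
            self.skip_depth = 0   # >0 while inside a skipped element
            self.chunks = []

        def handle_starttag(self, tag, attrs):
            if tag in self.SKIP:
                self.skip_depth += 1

        def handle_endtag(self, tag):
            if tag in self.SKIP and self.skip_depth:
                self.skip_depth -= 1

        def handle_data(self, data):
            if not self.skip_depth and data.strip():
                self.chunks.append(data.strip())

    def extract_text(html: str) -> str:
        parser = TextExtractor()
        parser.feed(html)
        return " ".join(parser.chunks)

    page = "<body><nav>Menu</nav><p>Haber metni</p></body>"
    print(extract_text(page))  # → Haber metni
    ```

    The extracted text would then be tokenized and passed to the BERT embedding layer before classification.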

    Identifying Documents In-Scope of a Collection from Web Archives

    Web archive data usually contains high-quality documents that are very useful for creating specialized collections of documents, e.g., scientific digital libraries and repositories of technical reports. In doing so, there is a substantial need for automatic approaches that can distinguish the documents of interest for a collection out of the huge number of documents collected by web archiving institutions. In this paper, we explore different learning models and feature representations to determine the best performing ones for identifying the documents of interest from the web archived data. Specifically, we study both machine learning and deep learning models and "bag of words" (BoW) features extracted from the entire document or from specific portions of the document, as well as structural features that capture the structure of documents. We focus our evaluation on three datasets that we created from three different Web archives. Our experimental results show that the BoW classifiers that focus only on specific portions of the documents (rather than the full text) outperform all compared methods on all three datasets.
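    The winning representation, BoW over selected portions rather than the full text, can be sketched as follows. The field names and the choice of portions (title plus the first 50 body tokens) are illustrative assumptions, not the paper's exact configuration.

    ```python
    import re
    from collections import Counter

    def bow(text: str) -> Counter:
        """Simple bag-of-words: lowercase alphabetic tokens."""
        return Counter(re.findall(r"[a-z]+", text.lower()))

    def portion_features(doc: dict) -> Counter:
        """BoW over selected portions (title + opening of the body),
        the kind of representation the paper found to outperform
        full-text BoW. Portion choices here are assumptions."""
        title = doc.get("title", "")
        body_head = " ".join(doc.get("body", "").split()[:50])
        return bow(title) + bow(body_head)

    doc = {"title": "Technical Report", "body": "This report covers web archives"}
    print(portion_features(doc))
    ```

    The resulting counts would feed a conventional classifier (e.g., an SVM or logistic regression) in place of a full-text vector.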

    Sentiment Analysis of Persian Language: Review of Algorithms, Approaches and Datasets

    Sentiment analysis aims to extract people's emotions and opinions from their comments on the web. It is widely used in business to detect sentiment in social data, gauge brand reputation, and understand customers. Most articles in this area have concentrated on the English language, whereas resources for the Persian language are limited. In this review paper, articles on sentiment analysis in Persian published between 2018 and 2022 are collected, and their methods, approaches, and datasets are explained and analyzed. Almost all of the methods used to solve sentiment analysis are machine learning and deep learning. The purpose of this paper is to examine 40 different approaches to sentiment analysis in the Persian language, analyze the datasets along with the accuracy of the algorithms applied to them, and review the strengths and weaknesses of each. Among all the methods, transformers such as BERT and recurrent neural networks such as LSTM and Bi-LSTM have achieved the highest accuracy in sentiment analysis. In addition to the methods and approaches, the datasets reviewed between 2018 and 2022 are listed, and information about each dataset and its details is provided.

    Semantic Wide and Deep Learning for Detecting Crisis-Information Categories on Social Media

    When crises hit, many people flock to social media to share or consume information related to the event. Social media posts during crises tend to provide valuable reports on affected people, donation offers, help requests, advice provision, etc. Automatically identifying the category of information (e.g., reports on affected individuals, donations, and volunteers) contained in these posts is vital for their efficient handling and consumption by affected communities and concerned organisations. In this paper, we introduce Sem-CNN, a wide and deep Convolutional Neural Network (CNN) model designed for identifying the category of information contained in crisis-related social media content. Unlike previous models, which mainly rely on the lexical representations of words in the text, the proposed model integrates an additional layer of semantics, representing the named entities in the text, into a wide and deep CNN network. Results show that the Sem-CNN model consistently outperforms the baselines, which consist of statistical and non-semantic deep learning models.
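    The input construction for a wide-and-deep model of this kind can be sketched as two channels: a "deep" sequence of token ids for the CNN and a "wide" vector of entity-type indicators. The vocabulary, entity inventory, and function names below are toy assumptions, not Sem-CNN's actual configuration.

    ```python
    def wide_and_deep_features(tokens, entities):
        """Builds the two input channels: token ids for the deep CNN
        path and binary entity-type indicators for the wide path.
        Vocabulary and entity inventory are illustrative only."""
        vocab = {"help": 1, "needed": 2, "donation": 3}
        entity_types = ["Person", "Place", "Organisation"]
        deep = [vocab.get(t.lower(), 0) for t in tokens]      # 0 = out-of-vocab
        wide = [1.0 if e in entities else 0.0 for e in entity_types]
        return deep, wide

    deep, wide = wide_and_deep_features(
        ["Help", "needed", "in", "Nairobi"], {"Place"})
    print(deep, wide)  # → [1, 2, 0, 0] [0.0, 1.0, 0.0]
    ```

    In the full model, the deep channel would pass through embedding and convolution layers while the wide channel is concatenated just before the output layer.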