Crisis Event Extraction Service (CREES) - Automatic Detection and Classification of Crisis-related Content on Social Media
Social media posts tend to provide valuable reports during crises. However, this information can be hidden in large amounts of unrelated documents. Providing tools that automatically identify relevant posts, event types (e.g., hurricanes, floods) and information categories (e.g., reports on affected individuals, donations and volunteering) in social media posts is vital for their efficient handling and consumption. We introduce the Crisis Event Extraction Service (CREES), an open-source web API that automatically classifies posts during crisis situations. The API provides annotations for crisis-related documents, event types and information categories through an easily deployable and accessible web API that can be integrated into multiple platforms and tools. The annotation service is backed by Convolutional Neural Networks (CNNs) and validated against traditional machine learning models. Results show that the CNN-based API results can be relied upon when dealing with specific crises, with the benefits associated with the use of word embeddings.
A Novel Distributed Representation of News (DRNews) for Stock Market Predictions
In this study, a novel Distributed Representation of News (DRNews) model is
developed and applied in deep learning-based stock market predictions. With the
merit of integrating contextual information and cross-documental knowledge, the
DRNews model creates news vectors that describe both the semantic information
and potential linkages among news events through an attributed news network.
Two stock market prediction tasks, namely the short-term stock movement
prediction and stock crises early warning, are implemented in the framework of
the attention-based Long Short-Term Memory (LSTM) network. The results suggest that
DRNews substantially enhances the results of both tasks compared with five
baselines of news embedding models. Further, the attention mechanism suggests
that short-term stock trends and stock market crises are both influenced
by daily news, with the former responding more strongly to
information related to the stock market per se, whilst the latter is
more sensitive to the banking sector and economic policies. Comment: 25 pages
A Novel Kernel for Text Classification Based on Semantic and Statistical Information
In text categorization, a document is usually represented by a vector space model, which can accomplish the classification task but cannot handle synonymy and polysemy in Chinese. This paper presents a novel approach that takes into account both semantic and statistical information to improve the accuracy of text classification. The proposed approach computes semantic information based on HowNet and statistical information based on a kernel function with class-based weighting. According to our experimental results, the proposed approach achieves state-of-the-art or competitive results compared with traditional approaches such as k-Nearest Neighbor (KNN) and Naive Bayes, and with deep learning models such as convolutional networks.
Identifying Restaurants Proposing Novel Kinds of Cuisines: Using Yelp Reviews
These days, with TV shows and starred chefs, new kinds of cuisines appear in the market. The main cuisines like French, Italian, Japanese, Chinese and Indian are always appreciated, but they are no longer the most popular. The new trend is fusion cuisine, which is obtained by combining different main cuisines. The opening of a new restaurant proposing new kinds of cuisine produces a lot of excitement in people. They feel the need to try it and be part of this new culture. Yelp is a platform which publishes crowd-sourced reviews about different businesses, in particular restaurants. When the kind of cuisine is available for a restaurant on Yelp, there is usually a tag only for the main cuisines and no information about fusion cuisine. There is a need to develop a system which is able to identify restaurants proposing fusion cuisine (novel or unknown cuisines).
This proposal is to address the novelty detection task using Yelp reviews. The idea is that the semi-supervised Machine Learning models trained only on the reviews of restaurants proposing the main cuisine will be able to discriminate between restaurants providing the main cuisine and restaurants providing the novel ones.
We propose effective novelty detection approaches for the unknown cuisine type identification problem using Long Short-Term Memory (LSTM), autoencoders and Term Frequency-Inverse Document Frequency (TF-IDF). Our main idea is to obtain features from LSTM, autoencoder and TF-IDF models and use these features with standard semi-supervised novelty detection algorithms like Gaussian Mixture Models, Isolation Forest and One-class Support Vector Machines (SVM) to identify the unknown cuisines.
We conducted extensive experiments that demonstrate the effectiveness of our approaches. The score that we obtained has high discrimination power: the best AUROC value for the novelty detection problem is 0.85, obtained with LSTM features. LSTM outperforms our TF-IDF baseline, and the main reason is its ability to retain only the useful parts of a sentence.
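The setup described in this abstract (fit on reviews of known cuisines only, score unseen reviews, evaluate with AUROC) can be illustrated with a minimal sketch. It is not the authors' pipeline: it uses only the TF-IDF baseline features, and it swaps in a simple nearest-neighbour cosine distance as the novelty score instead of the Gaussian Mixture Model, Isolation Forest or One-class SVM named above. The reviews, the whitespace tokenizer and all function names are assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()


def fit_idf(docs):
    """Return smoothed IDF weights learned from the known-cuisine reviews only."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(tokenize(doc)))
    idf = {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}
    unseen_idf = math.log(1 + n) + 1.0  # weight for tokens never seen in training
    return idf, unseen_idf


def tfidf(doc, idf, unseen_idf):
    """L2-normalised sparse TF-IDF vector, stored as a dict."""
    tf = Counter(tokenize(doc))
    vec = {tok: cnt * idf.get(tok, unseen_idf) for tok, cnt in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {tok: v / norm for tok, v in vec.items()}


def cosine(a, b):
    """Dot product of two L2-normalised sparse vectors."""
    return sum(v * b.get(tok, 0.0) for tok, v in a.items())


def novelty_score(doc, train_vecs, idf, unseen_idf):
    """1 - similarity to the closest known review; higher means more novel."""
    vec = tfidf(doc, idf, unseen_idf)
    return 1.0 - max(cosine(vec, t) for t in train_vecs)


def auroc(scores, labels):
    """Probability that a random novel item (label 1) outscores a known one (label 0)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up reviews, for illustration only.
known = ["fresh pasta and tiramisu", "wood fired pizza", "sushi and miso soup"]
idf, unseen_idf = fit_idf(known)
train_vecs = [tfidf(d, idf, unseen_idf) for d in known]

test_docs = ["great pizza and pasta", "kimchi tacos with bulgogi"]
test_labels = [0, 1]  # 1 = novel (fusion) cuisine
scores = [novelty_score(d, train_vecs, idf, unseen_idf) for d in test_docs]
print(scores, auroc(scores, test_labels))
```

The AUROC helper uses the pairwise-ranking definition, so it needs no decision threshold: it only asks how often a novel review receives a higher score than a known one.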
Deep Learning for Opinion Mining and Topic Classification of Course Reviews
Student opinions for a course are important to educators and administrators,
regardless of the type of the course or the institution. Reading and manually
analyzing open-ended feedback becomes infeasible for massive volumes of
comments at institution level or online forums. In this paper, we collected and
pre-processed a large number of course reviews publicly available online. We
applied machine learning techniques with the goal to gain insight into student
sentiments and topics. Specifically, we utilized current Natural Language
Processing (NLP) techniques, such as word embeddings and deep neural networks,
and state-of-the-art BERT (Bidirectional Encoder Representations from
Transformers), RoBERTa (Robustly optimized BERT approach) and XLNet
(Generalized Autoregressive Pretraining). We performed extensive
experimentation to compare these techniques versus traditional approaches. This
comparative study demonstrates how to apply modern machine learning approaches
for sentiment polarity extraction and topic-based classification utilizing
course feedback. For sentiment polarity, the top model was RoBERTa with 95.5%
accuracy and 84.7% F1-macro, while for topic classification, an SVM (Support
Vector Machine) was the top classifier with 79.8% accuracy and 80.6% F1-macro.
We also provided an in-depth exploration of the effect of certain
hyperparameters on the model performance and discussed our observations. These
findings can be used by institutions and course providers as a guide for
analyzing their own course feedback using NLP models towards self-evaluation
and improvement. Comment: Accepted and published in Education and Information Technologies
(accepted March 2023)
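The course-review abstract reports both accuracy and F1-macro, and the two can diverge noticeably (95.5% versus 84.7% for sentiment polarity). One common cause is class imbalance, although the abstract does not confirm that this is the reason here. The sketch below computes both metrics on made-up labels to show how they can differ; the function names and data are illustrative assumptions.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    f1_scores = []
    for cls in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Made-up, imbalanced sentiment labels: 8 positive reviews, 2 negative.
y_true = ["pos"] * 8 + ["neg"] * 2
y_pred = ["pos"] * 8 + ["pos", "neg"]  # one negative review is misclassified

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
print(f"macro F1 = {macro_f1(y_true, y_pred):.3f}")
```

In this toy case a single error on the minority class barely moves accuracy but pulls the macro average down, because the minority-class F1 is weighted equally with the majority-class F1.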
An assessment of deep learning models and word embeddings for toxicity detection within online textual comments
Today, increasing numbers of people are interacting online, and a large volume of textual comments is being produced due to the explosion of online communication. However, a major drawback of online environments is that comments shared on digital platforms can hide hazards such as fake news, insults, harassment and, more generally, comments that may hurt someone’s feelings. In this scenario, the detection of this kind of toxicity plays an important role in moderating online communication. Deep learning technologies have recently delivered impressive performance in Natural Language Processing applications, including Sentiment Analysis and emotion detection, across numerous datasets. Such models do not need any pre-defined hand-picked features; instead, they learn sophisticated features from the input datasets by themselves. In this domain, word embeddings have been widely used as a way of representing words in Sentiment Analysis tasks, proving to be very effective. Therefore, in this paper, we investigated the use of deep learning and word embeddings to detect six different types of toxicity within online comments. In doing so, the most suitable deep learning layers and state-of-the-art word embeddings for identifying toxicity are evaluated. The results suggest that Long Short-Term Memory layers in combination with mimicked word embeddings are a good choice for this task.
Detecting and Monitoring Hate Speech in Twitter
Social Media are sensors in the real world that can be used to measure the pulse of societies.
However, the massive and unfiltered feed of messages posted in social media is a phenomenon that
nowadays raises social alarms, especially when these messages contain hate speech targeted to a
specific individual or group. In this context, governments and non-governmental organizations
(NGOs) are concerned about the possible negative impact that these messages can have on individuals
or on the society. In this paper, we present HaterNet, an intelligent system currently being used by
the Spanish National Office Against Hate Crimes of the Spanish State Secretariat for Security that
identifies and monitors the evolution of hate speech in Twitter. The contributions of this research
are many-fold: (1) It introduces the first intelligent system that monitors and visualizes, using social
network analysis techniques, hate speech in Social Media. (2) It introduces a novel public dataset on
hate speech in Spanish consisting of 6000 expert-labeled tweets. (3) It compares several classification
approaches based on different document representation strategies and text classification models. (4)
The best approach is a combined LSTM+MLP neural network that takes as input the
tweet’s word, emoji, and expression tokens’ embeddings enriched by the tf-idf, and obtains an area
under the curve (AUC) of 0.828 on our dataset, outperforming previous methods presented in the
literature.
The work by Quijano-Sanchez was supported by the Spanish Ministry of Science and Innovation grant FJCI-2016-28855. The research of Liberatore was supported by the Government of Spain, grant MTM2015-65803-R, and by the European Union’s Horizon 2020 Research and Innovation Programme, under the Marie Sklodowska-Curie grant agreement No. 691161 (GEOSAFE). All the financial support is gratefully acknowledged.
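The HaterNet abstract mentions token embeddings "enriched by the tf-idf" without saying how the two are combined. One plausible reading, shown here purely as an assumption and not as the authors' method, is a tf-idf-weighted average of token embeddings, so that rarer tokens contribute more to the tweet representation. The toy embeddings, tokens and function names below are hypothetical.

```python
import math
from collections import Counter


def idf_table(corpus):
    """Smoothed IDF per token, computed from a list of tokenised tweets."""
    n = len(corpus)
    df = Counter(tok for doc in corpus for tok in set(doc))
    return {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}


def weighted_doc_vector(tokens, embeddings, idf):
    """Average the token embeddings, weighting each token by tf * idf.

    Tokens without an embedding are skipped; tokens missing from the IDF
    table fall back to a weight of 1.0 per occurrence (an assumption).
    """
    tf = Counter(tok for tok in tokens if tok in embeddings)
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for tok, count in tf.items():
        weight = count * idf.get(tok, 1.0)
        weight_sum += weight
        for i, x in enumerate(embeddings[tok]):
            total[i] += weight * x
    return [x / weight_sum for x in total] if weight_sum else total


# Hypothetical 2-dimensional embeddings for words and an emoji token.
toy_embeddings = {"good": [1.0, 0.0], "bad": [0.0, 1.0], ":)": [0.8, 0.2]}
toy_corpus = [["good", ":)"], ["bad"], ["good", "good", "bad"]]
toy_idf = idf_table(toy_corpus)
print(weighted_doc_vector(["good", "bad", ":)"], toy_embeddings, toy_idf))
```

The resulting fixed-length vector could then be fed to a classifier such as an MLP; a sequence model like an LSTM would instead consume the per-token vectors one at a time.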