Role of sentiment classification in sentiment analysis: a survey
Through a survey of the literature, the role of sentiment classification in sentiment analysis has been reviewed. The review identifies the research challenges involved in tackling sentiment classification. A total of 68 articles published during 2015–2017 were reviewed along six dimensions, viz. sentiment classification, feature extraction, cross-lingual sentiment classification, cross-domain sentiment classification, lexica and corpora creation, and multi-label sentiment classification. This study discusses the prominence and effects of sentiment classification in sentiment evaluation and concludes that considerable further research is needed to obtain productive results.
Data stream treatment using sliding windows with MapReduce
Knowledge Discovery in Databases (KDD) techniques present limitations when the volume of data to process is very large. Any KDD algorithm needs to do several iterations on the complete set of data in order to carry out its work. For continuous data stream processing it is necessary to store part of it in a temporal window.
In this paper, we present a technique that adjusts the size of the temporal window dynamically, based on the frequency of data arrival and the response time of the KDD task. The results show that this technique reaches a large window size, in which each example of the stream is used in more than one iteration of the KDD task. Facultad de Informátic
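The abstract does not give the sizing rule, so the following is only a minimal sketch under an assumed one: the window holds the examples that arrive during `reuse` runs of the KDD task, so that each example stays in the window long enough to be seen by roughly `reuse` iterations. The names `target_window_size`, `DynamicWindow` and `reuse` are hypothetical and do not come from the paper.

```python
from collections import deque


def target_window_size(arrival_rate, response_time, reuse=2, min_size=1):
    """Assumed rule: examples arriving during `reuse` runs of the KDD task.

    arrival_rate  -- examples per second observed on the stream
    response_time -- seconds taken by one run of the KDD task
    """
    return max(min_size, round(arrival_rate * response_time * reuse))


class DynamicWindow:
    """Sliding window whose capacity can be re-estimated between KDD runs."""

    def __init__(self, size):
        self.size = size
        self.items = deque()

    def _evict(self):
        # Drop the oldest examples until the window fits its capacity.
        while len(self.items) > self.size:
            self.items.popleft()

    def push(self, example):
        self.items.append(example)
        self._evict()

    def resize(self, arrival_rate, response_time, reuse=2):
        self.size = target_window_size(arrival_rate, response_time, reuse)
        self._evict()


window = DynamicWindow(size=4)
for x in range(10):
    window.push(x)

# 5 examples/s, 0.4 s per KDD run, each example reused in about 2 runs.
window.resize(arrival_rate=5.0, response_time=0.4, reuse=2)
```

With a faster stream or a slower KDD task the same rule yields a larger window, which matches the qualitative behaviour the abstract describes.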
SOTXTSTREAM: Density-based self-organizing clustering of text streams
A streaming data clustering algorithm is presented, building upon the density-based self-organizing stream clustering algorithm SOSTREAM. Many density-based clustering algorithms are limited by their inability to identify clusters with heterogeneous density. SOSTREAM addresses this limitation through the use of local (nearest neighbor-based) density determinations. Additionally, many stream clustering algorithms use a two-phase clustering approach. In the first phase, a micro-clustering solution is maintained online, while in the second phase, the micro-clustering solution is clustered offline to produce a macro solution. By performing self-organization techniques on micro-clusters in the online phase, SOSTREAM is able to maintain a macro clustering solution in a single phase. Leveraging concepts from SOSTREAM, a new density-based self-organizing text stream clustering algorithm, SOTXTSTREAM, is presented that addresses several shortcomings of SOSTREAM. Gains in clustering performance of this new algorithm are demonstrated on several real-world text stream datasets.
Modeling the effect of blending multiple components on gasoline properties
Global CO2 emissions reached a new historical maximum in 2018, and the transportation
sector contributed one fourth of those emissions. The road transport industry has started
moving towards more sustainable solutions; however, market penetration of electric
vehicles (EVs) is still too slow, while regulation of biofuels has become stricter due to
the risk of inflated food prices and skepticism regarding their sustainability. In spite of
this, Europe has ambitious targets for the next 30 years and impending strict policies
resulting from these goals will definitely increase the pressure on the oil sector to move
towards cleaner practices and products.
Although the use of biodiesel is quite widespread and bioethanol is already used as
a gasoline component, there are no alternative drop-in fuels compatible with spark
ignition engines in the market yet. Alternative feedstock is widely available but its
characteristics differ from those of crude oil, and lack of homogeneity and substantially
lower availability complicate its integration in conventional refining processes. This
work explores the possibility of implementing Machine Learning to develop predictive
models for auto-ignition properties and to gain a better understanding of the blending
behavior of the different molecules that make up commercial gasoline. Additionally,
the methodology developed in this study aims to contribute to new characterization
methods for conventional and renewable gasoline streams in a simpler, faster and less
expensive way.
To build the models included in this thesis, a palette with seven different compounds was
chosen: n-heptane, iso-octane, 1-hexene, cyclopentane, toluene, ethanol and ETBE. A
data set containing 243 different combinations of the species in the palette was collected
from literature, together with their experimentally measured RON and/or MON. Linear
Regression based on Ordinary Least Squares was used as the baseline to compare the
performance of more complex algorithms, namely Nearest Neighbors, Support Vector
Machines, Decision Trees and Random Forest. The best predictions were obtained with
a Support Vector Regression algorithm using a non-linear kernel, able to reproduce
synergistic and antagonistic interactions between the seven molecules in the samples.
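To make "synergistic" and "antagonistic" concrete, here is a minimal sketch of a linear-by-volume blending rule and the deviation of a measured octane number from it. This is a simpler stand-in, not the thesis's OLS baseline or its Support Vector Regression model. Only the two primary reference fuels are included, because their RON values (0 and 100) are fixed by the definition of the octane scale; the function names are hypothetical.

```python
# RON of the primary reference fuels, fixed by the definition of the octane scale.
PURE_RON = {"n-heptane": 0.0, "iso-octane": 100.0}


def linear_blend(fractions, pure_values=PURE_RON):
    """Volume-weighted average of pure-component values (linear blending rule)."""
    total = sum(fractions.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("volume fractions must sum to 1")
    return sum(frac * pure_values[name] for name, frac in fractions.items())


def blending_deviation(measured, fractions, pure_values=PURE_RON):
    """Measured value minus the linear prediction.

    Positive deviation: the blend behaves synergistically.
    Negative deviation: the blend behaves antagonistically.
    """
    return measured - linear_blend(fractions, pure_values)


# A 90/10 iso-octane/n-heptane blend; the linear rule is exact here by definition.
prf90 = {"n-heptane": 0.10, "iso-octane": 0.90}
predicted = linear_blend(prf90)
```

For blends containing components such as toluene or ethanol, a non-zero `blending_deviation` is the kind of non-linear behaviour a kernel-based regressor can capture and a purely linear rule cannot.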
Polar Encoding: A Simple Baseline Approach for Classification with Missing Values
We propose polar encoding, a representation of categorical and numerical
[0,1]-valued attributes with missing values to be used in a classification
context. We argue that this is a good baseline approach, because it can be used
with any classification algorithm, preserves missingness information, is very
simple to apply and offers good performance. In particular, unlike the existing
missing-indicator approach, it does not require imputation, ensures that
missing values are equidistant from non-missing values, and lets decision tree
algorithms choose how to split missing values, thereby providing a practical
realisation of the "missingness incorporated in attributes" (MIA) proposal.
Furthermore, we show that categorical and [0,1]-valued attributes can be
viewed as special cases of a single attribute type, corresponding to the
classical concept of barycentric coordinates, and that this offers a natural
interpretation of polar encoding as a fuzzified form of one-hot encoding. With
an experiment based on twenty real-life datasets with missing values, we show
that, in terms of the resulting classification performance, polar encoding
performs better than the state-of-the-art strategies multiple imputation by
chained equations (MICE) and multiple imputation with denoising
autoencoders (MIDAS) and, depending on the classifier, about as well or
better than mean/mode imputation with missing-indicators.
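The abstract does not spell out the encoding, so the sketch below is one reading that is consistent with its claims, and the paper's exact definition may differ. It assumes a [0,1]-valued attribute x is represented by the pair (x, 1 - x), a categorical attribute by its one-hot vector, and a missing value (here `None`) by the all-zero vector. The function names are hypothetical.

```python
def polar_encode_numeric(x):
    """Encode a value in [0, 1] as (x, 1 - x); a missing value as (0, 0)."""
    if x is None:
        return (0.0, 0.0)
    if not 0.0 <= x <= 1.0:
        raise ValueError("expected a value rescaled to [0, 1]")
    return (float(x), 1.0 - float(x))


def polar_encode_categorical(value, categories):
    """One-hot encode a category; a missing value becomes the all-zero vector."""
    if value is not None and value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return tuple(1.0 if value == c else 0.0 for c in categories)


def manhattan(u, v):
    """Manhattan (L1) distance between two equally long vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


missing = polar_encode_numeric(None)
distances = [manhattan(missing, polar_encode_numeric(x)) for x in (0.0, 0.3, 1.0)]
colour = polar_encode_categorical("red", ["red", "green", "blue"])
```

Under this reading every non-missing encoding sums to 1, as barycentric coordinates do, so the missing vector lies at Manhattan distance 1 from all of them. A decision tree can also send missing values to either side of a split by thresholding the first or the second coordinate.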
Automatic Synchronization of Multi-User Photo Galleries
In this paper we address the issue of photo gallery synchronization, where
pictures related to the same event are collected by different users. Existing
solutions to the problem are usually based on unrealistic assumptions,
like time consistency across photo galleries, and often rely heavily on
heuristics, thereby limiting their applicability to real-world scenarios. We
propose a solution that achieves better generalization performance for the
synchronization task compared to the available literature. The method is
characterized by three stages: first, deep convolutional neural network
features are used to assess the visual similarity among the photos; then, pairs
of similar photos are detected across different galleries and used to construct
a graph; finally, a probabilistic graphical model is used to estimate the
temporal offset of each pair of galleries, by traversing the minimum spanning
tree extracted from this graph. The experimental evaluation is conducted on
four publicly available datasets covering different types of events,
demonstrating the strength of our proposed method. A thorough discussion of the
obtained results is provided for a critical assessment of the quality in
synchronization. Comment: accepted to IEEE Transactions on Multimedi
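The last stage, traversing a minimum spanning tree to assign an offset to every gallery, can be sketched as follows. This is a simplified illustration, not the paper's method: it replaces the probabilistic graphical model with plain propagation of pairwise offsets, and it assumes each graph edge already carries an estimated offset and a cost (lower means more reliable). The edge values and function names are hypothetical.

```python
from collections import defaultdict


def minimum_spanning_tree(n, edges):
    """Kruskal's algorithm on edges (cost, a, b, offset), offset = clock_b - clock_a."""
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    tree = []
    for cost, a, b, offset in sorted(edges):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
            tree.append((a, b, offset))
    return tree


def gallery_offsets(n, edges, reference=0):
    """Offset of each gallery clock relative to the reference gallery."""
    adjacency = defaultdict(list)
    for a, b, offset in minimum_spanning_tree(n, edges):
        adjacency[a].append((b, offset))
        adjacency[b].append((a, -offset))  # reversed direction flips the sign

    offsets = {reference: 0.0}
    stack = [reference]
    while stack:
        node = stack.pop()
        for neighbour, delta in adjacency[node]:
            if neighbour not in offsets:
                offsets[neighbour] = offsets[node] + delta
                stack.append(neighbour)
    return offsets


# Hypothetical pairwise estimates in seconds for three galleries.
edges = [
    (0.1, 0, 1, 120.0),   # reliable: gallery 1 is 120 s ahead of gallery 0
    (0.2, 1, 2, -30.0),   # reliable: gallery 2 is 30 s behind gallery 1
    (0.9, 0, 2, 200.0),   # unreliable direct estimate, left out of the tree
]
offsets = gallery_offsets(3, edges)
```

Restricting propagation to the tree means each gallery's offset depends only on the most reliable chain of pairwise estimates, so the inconsistent direct estimate between galleries 0 and 2 is ignored.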