1,196 research outputs found

    Role of sentiment classification in sentiment analysis: a survey

    This survey reviews the role of sentiment classification in sentiment analysis and identifies the research challenges involved in tackling sentiment classification. A total of 68 articles published between 2015 and 2017 are reviewed along six dimensions: sentiment classification, feature extraction, cross-lingual sentiment classification, cross-domain sentiment classification, lexica and corpora creation, and multi-label sentiment classification. The study discusses the prominence and effects of sentiment classification in sentiment evaluation and concludes that considerable further research is needed to obtain productive results.

    Data stream treatment using sliding windows with MapReduce

    Knowledge Discovery in Databases (KDD) techniques present limitations when the volume of data to process is very large. Any KDD algorithm needs to perform several iterations over the complete data set in order to carry out its work. For continuous data stream processing, it is necessary to store part of the stream in a temporal window. In this paper, we present a technique that sets the size of the temporal window dynamically, based on the frequency of data arrival and the response time of the KDD task. The results show that this technique reaches a large window size in which each example of the stream is used in more than one iteration of the KDD task. Facultad de Informátic
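The abstract does not state the sizing rule, so the following is only a minimal sketch under an assumed rule: the window holds the examples that arrive during a chosen number of runs of the KDD task, so each example stays available for roughly that many iterations. The names `window_size`, `DynamicWindow`, and the `passes` parameter are hypothetical and do not come from the paper.

```python
import math
from collections import deque


def window_size(arrival_rate, response_time, passes=2, min_size=1):
    """Assumed rule (not from the paper): keep the examples that arrive
    during `passes` consecutive runs of the KDD task.

    arrival_rate  -- examples per second observed on the stream
    response_time -- seconds one run of the KDD task takes
    """
    return max(min_size, math.ceil(arrival_rate * response_time * passes))


class DynamicWindow:
    """Sliding window whose capacity is recomputed on every insertion."""

    def __init__(self):
        self.items = deque()

    def push(self, example, arrival_rate, response_time, passes=2):
        # Append the new example, then evict the oldest ones until the
        # window fits the capacity implied by the current measurements.
        self.items.append(example)
        limit = window_size(arrival_rate, response_time, passes)
        while len(self.items) > limit:
            self.items.popleft()
        return limit
```

With this rule, a faster stream or a slower task both enlarge the window, which matches the dependence on arrival frequency and response time described above.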

    SOTXTSTREAM: Density-based self-organizing clustering of text streams

    A streaming data clustering algorithm is presented that builds upon the density-based self-organizing stream clustering algorithm SOSTREAM. Many density-based clustering algorithms are limited by their inability to identify clusters with heterogeneous density. SOSTREAM addresses this limitation through the use of local (nearest neighbor-based) density determinations. Additionally, many stream clustering algorithms use a two-phase clustering approach. In the first phase, a micro-clustering solution is maintained online, while in the second phase, the micro-clustering solution is clustered offline to produce a macro solution. By performing self-organization techniques on micro-clusters in the online phase, SOSTREAM is able to maintain a macro clustering solution in a single phase. Leveraging concepts from SOSTREAM, a new density-based self-organizing text stream clustering algorithm, SOTXTSTREAM, is presented that addresses several shortcomings of SOSTREAM. Gains in clustering performance of this new algorithm are demonstrated on several real-world text stream datasets.

    Modeling the effect of blending multiple components on gasoline properties

    Global CO2 emissions reached a new historical maximum in 2018, and the transportation sector contributed one fourth of those emissions. The road transport industry has started moving towards more sustainable solutions; however, market penetration of electric vehicles (EV) is still too slow, while regulation of biofuels has become stricter due to the risk of inflated food prices and skepticism regarding their sustainability. In spite of this, Europe has ambitious targets for the next 30 years, and the impending strict policies resulting from these goals will increase the pressure on the oil sector to move towards cleaner practices and products. Although the use of biodiesel is widespread and bioethanol is already used as a gasoline component, there are no alternative drop-in fuels compatible with spark ignition engines on the market yet. Alternative feedstock is widely available, but its characteristics differ from those of crude oil, and lack of homogeneity and substantially lower availability complicate its integration into conventional refining processes. This work explores the possibility of applying Machine Learning to develop predictive models for auto-ignition properties and to gain a better understanding of the blending behavior of the different molecules that make up commercial gasoline. Additionally, the methodology developed in this study aims to contribute to new characterization methods for conventional and renewable gasoline streams that are simpler, faster and less expensive. To build the models included in this thesis, a palette of seven different compounds was chosen: n-heptane, iso-octane, 1-hexene, cyclopentane, toluene, ethanol and ETBE. A data set containing 243 different combinations of the species in the palette was collected from the literature, together with their experimentally measured RON and/or MON. Linear Regression based on Ordinary Least Squares was used as the baseline against which to compare the performance of more complex algorithms, namely Nearest Neighbors, Support Vector Machines, Decision Trees and Random Forest. The best predictions were obtained with a Support Vector Regression algorithm using a non-linear kernel, which was able to reproduce synergistic and antagonistic interactions between the seven molecules in the samples.

    Polar Encoding: A Simple Baseline Approach for Classification with Missing Values

    We propose polar encoding, a representation of categorical and numerical [0,1]-valued attributes with missing values to be used in a classification context. We argue that this is a good baseline approach, because it can be used with any classification algorithm, preserves missingness information, is very simple to apply and offers good performance. In particular, unlike the existing missing-indicator approach, it does not require imputation, ensures that missing values are equidistant from non-missing values, and lets decision tree algorithms choose how to split missing values, thereby providing a practical realisation of the "missingness incorporated in attributes" (MIA) proposal. Furthermore, we show that categorical and [0,1]-valued attributes can be viewed as special cases of a single attribute type, corresponding to the classical concept of barycentric coordinates, and that this offers a natural interpretation of polar encoding as a fuzzified form of one-hot encoding. With an experiment based on twenty real-life datasets with missing values, we show that, in terms of the resulting classification performance, polar encoding performs better than the state-of-the-art strategies multiple imputation by chained equations (MICE) and multiple imputation with denoising autoencoders (MIDAS) and, depending on the classifier, about as well as or better than mean/mode imputation with missing-indicators.
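The abstract does not spell out the encoding itself. The sketch below assumes one reading that is consistent with the "fuzzified one-hot" and barycentric-coordinate description: a [0,1]-valued attribute x becomes the pair (x, 1 - x), a categorical value becomes its one-hot vector, and a missing value becomes the all-zero vector. The function names are hypothetical, and the choice of Manhattan distance for the equidistance check is likewise an assumption.

```python
def polar_encode_numeric(x):
    """Encode a [0,1]-valued attribute as (x, 1 - x); missing (None) -> (0, 0).

    Assumed form of the encoding, not quoted from the paper.
    """
    if x is None:
        return (0.0, 0.0)
    if not 0.0 <= x <= 1.0:
        raise ValueError("attribute must be rescaled to [0, 1] first")
    return (float(x), 1.0 - float(x))


def polar_encode_categorical(value, categories):
    """One-hot encode `value`; a missing value (None) gives all zeros."""
    return tuple(1.0 if value == c else 0.0 for c in categories)


def manhattan(a, b):
    """Manhattan (L1) distance between two encoded vectors."""
    return sum(abs(p - q) for p, q in zip(a, b))
```

Under this reading, every non-missing encoding has coordinates that sum to 1, so the all-zero vector lies at Manhattan distance 1 from each of them, and no imputed value is needed.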

    Automatic Synchronization of Multi-User Photo Galleries

    In this paper, we address the problem of synchronizing photo galleries in which pictures related to the same event are collected by different users. Existing solutions are usually based on unrealistic assumptions, such as time consistency across photo galleries, and often rely heavily on heuristics, thereby limiting their applicability to real-world scenarios. We propose a solution that achieves better generalization performance for the synchronization task than the available literature. The method consists of three stages: first, deep convolutional neural network features are used to assess the visual similarity among the photos; then, pairs of similar photos are detected across different galleries and used to construct a graph; finally, a probabilistic graphical model is used to estimate the temporal offset of each pair of galleries by traversing the minimum spanning tree extracted from this graph. The experimental evaluation is conducted on four publicly available datasets covering different types of events, demonstrating the strength of our proposed method. A thorough discussion of the obtained results is provided for a critical assessment of the quality of synchronization. Comment: Accepted to IEEE Transactions on Multimedi
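The last stage can be illustrated with a simplified sketch. It replaces the paper's probabilistic graphical model with plain additive propagation of pairwise offsets along a minimum spanning tree, so it shows only the tree-traversal idea. The edge format `(cost, i, j, delta)`, where `delta` approximates the clock offset of gallery `j` relative to gallery `i` and `cost` is a hypothetical unreliability score, is an assumption made for illustration.

```python
def spanning_tree_offsets(n, edges, root=0):
    """Estimate one clock offset per gallery, relative to `root`.

    n     -- number of galleries, labelled 0 .. n-1
    edges -- iterable of (cost, i, j, delta) with delta ~ offset[j] - offset[i]
    """
    parent = list(range(n))

    def find(a):
        # Union-find lookup with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    # Kruskal's algorithm: take edges by increasing cost, skipping cycles.
    tree = {i: [] for i in range(n)}
    for cost, i, j, delta in sorted(edges, key=lambda e: e[0]):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree[i].append((j, delta))
            tree[j].append((i, -delta))

    # Walk the tree from the root, accumulating offsets along each path.
    offsets = {root: 0.0}
    stack = [root]
    while stack:
        u = stack.pop()
        for v, d in tree[u]:
            if v not in offsets:
                offsets[v] = offsets[u] + d
                stack.append(v)
    return offsets
```

Galleries that are not connected to the root by any edge are simply absent from the returned dictionary.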