1,216 research outputs found

    Sentiment classification with concept drift and imbalanced class distributions

    Get PDF
    Document-level sentiment classification aims to automate the task of classifying a textual review, which is given on a single topic, as expressing a positive or negative sentiment. In general, people express their opinions towards an entity based on their characteristics which may change over time. Userā€˜s opinions are changed due to evolution of target entities over time. However, the existing sentiment classification approaches did not considered the evolution of Userā€˜s opinions. They assumed that instances are independent, identically distributed and generated from a stationary distribution, while generated from a stream distribution. They used the static classification model that builds a classifier using a training set without considering the time that reviews are posted. However, time may be very useful as an important feature for classification task. In this paper, a stream sentiment classification framework is proposed to deal with concept drift and imbalanced data distribution using ensemble learning and instance selection methods. The experimental results show the effectiveness of the proposed method in compared with static sentiment classification

    Predicting Multi-class Customer Profiles Based on Transactions: a Case Study in Food Sales

    Get PDF
    Predicting the class of a customer profile is a key task in marketing, which enables businesses to approach the right customer with the right product at the right time through the right channel to satisfy the customer's evolving needs. However, due to costs, privacy and/or data protection, only the business' owned transactional data is typically available for constructing customer profiles. Predicting the class of customer profiles based on such data is challenging, as the data tends to be very large, heavily sparse and highly skewed. We present a new approach that is designed to efficiently and accurately handle the multi-class classification of customer profiles built using sparse and skewed transactional data. Our approach first bins the customer profiles on the basis of the number of items transacted. The discovered bins are then partitioned and prototypes within each of the discovered bins selected to build the multi-class classifier models. The results obtained from using four multi-class classifiers on real-world transactional data from the food sales domain consistently show the critical numbers of items at which the predictive performance of customer profiles can be substantially improved

    New perspectives and methods for stream learning in the presence of concept drift.

    Get PDF
    153 p.Applications that generate data in the form of fast streams from non-stationary environments, that is,those where the underlying phenomena change over time, are becoming increasingly prevalent. In thiskind of environments the probability density function of the data-generating process may change overtime, producing a drift. This causes that predictive models trained over these stream data become obsoleteand do not adapt suitably to the new distribution. Specially in online learning scenarios, there is apressing need for new algorithms that adapt to this change as fast as possible, while maintaining goodperformance scores. Examples of these applications include making inferences or predictions based onfinancial data, energy demand and climate data analysis, web usage or sensor network monitoring, andmalware/spam detection, among many others.Online learning and concept drift are two of the most hot topics in the recent literature due to theirrelevance for the so-called Big Data paradigm, where nowadays we can find an increasing number ofapplications based on training data continuously available, named as data streams. Thus, learning in nonstationaryenvironments requires adaptive or evolving approaches that can monitor and track theunderlying changes, and adapt a model to accommodate those changes accordingly. In this effort, Iprovide in this thesis a comprehensive state-of-the-art approaches as well as I identify the most relevantopen challenges in the literature, while focusing on addressing three of them by providing innovativeperspectives and methods.This thesis provides with a complete overview of several related fields, and tackles several openchallenges that have been identified in the very recent state of the art. Concretely, it presents aninnovative way to generate artificial diversity in ensembles, a set of necessary adaptations andimprovements for spiking neural networks in order to be used in online learning scenarios, and finally, adrift detector based on this former algorithm. All of these approaches together constitute an innovativework aimed at presenting new perspectives and methods for the field

    A novel penalty-based wrapper objective function for feature selection in big data using cooperative co-evolution

    Get PDF
    The rapid progress of modern technologies generates a massive amount of high-throughput data, called Big Data, which provides opportunities to find new insights using machine learning (ML) algorithms. Big Data consist of many features (also called attributes); however, not all these are necessary or relevant, and they may degrade the performance of ML algorithms. Feature selection (FS) is an essential preprocessing step to reduce the dimensionality of a dataset. Evolutionary algorithms (EAs) are widely used search algorithms for FS. Using classification accuracy as the objective function for FS, EAs, such as the cooperative co-evolutionary algorithm (CCEA), achieve higher accuracy, even with a higher number of features. Feature selection has two purposes: reducing the number of features to decrease computations and improving classification accuracy, which are contradictory but can be achieved using a single objective function. For this very purpose, this paper proposes a penalty-based wrapper objective function. This function can be used to evaluate the FS process using CCEA, hence called Cooperative Co-Evolutionary Algorithm-Based Feature Selection (CCEAFS). An experiment was performed using six widely used classifiers on six different datasets from the UCI ML repository with FS and without FS. The experimental results indicate that the proposed objective function is efficient at reducing the number of features in the final feature subset without significantly reducing classification accuracy. Based on different performance measures, in most cases, naĆÆve Bayes outperforms other classifiers when using CCEAFS

    Evolving Spiking Neural Networks for online learning over drifting data streams

    Get PDF
    Nowadays huge volumes of data are produced in the form of fast streams, which are further affected by non-stationary phenomena. The resulting lack of stationarity in the distribution of the produced data calls for efficient and scalable algorithms for online analysis capable of adapting to such changes (concept drift). The online learning field has lately turned its focus on this challenging scenario, by designing incremental learning algorithms that avoid becoming obsolete after a concept drift occurs. Despite the noted activity in the literature, a need for new efficient and scalable algorithms that adapt to the drift still prevails as a research topic deserving further effort. Surprisingly, Spiking Neural Networks, one of the major exponents of the third generation of artificial neural networks, have not been thoroughly studied as an online learning approach, even though they are naturally suited to easily and quickly adapting to changing environments. This work covers this research gap by adapting Spiking Neural Networks to meet the processing requirements that online learning scenarios impose. In particular the work focuses on limiting the size of the neuron repository and making the most of this limited size by resorting to data reduction techniques. Experiments with synthetic and real data sets are discussed, leading to the empirically validated assertion that, by virtue of a tailored exploitation of the neuron repository, Spiking Neural Networks adapt better to drifts, obtaining higher accuracy scores than naive versions of Spiking Neural Networks for online learning environments.This work was supported by the EU project Pacific AtlanticNetwork for Technical Higher Education and Researchā€”PANTHER(grant number 2013-5659/004-001 EMA2)
    • ā€¦
    corecore