Time series transductive classification on imbalanced data sets: an experimental study
Graph-based semi-supervised learning (SSL) algorithms perform well in a variety of domains, such as digit recognition and text classification, when the data lie on a low-dimensional manifold. Surprisingly, however, these methods have not been effectively applied to time series classification tasks. In this paper, we provide a comprehensive empirical comparison of state-of-the-art graph-based SSL algorithms with respect to graph construction and parameter selection. Specifically, we focus on the problem of time series transductive classification on imbalanced data sets. Through a comprehensive analysis using recently proposed empirical evaluation models, we confirm some of the hypotheses raised in previous work and show that some of them may not hold in the time series domain. Based on our results, we suggest using the Gaussian Fields and Harmonic Functions algorithm with the mutual k-nearest neighbors graph weighted by the RBF kernel, setting k = 20, for general tasks of time series transductive classification on imbalanced data sets. São Paulo Research Foundation (FAPESP) (grants 2011/17698-5 and 2012/50714-7)
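The configuration suggested in the abstract above (a mutual k-nearest neighbors graph weighted by the RBF kernel, then Gaussian Fields and Harmonic Functions) can be sketched as follows. This is a minimal illustration and not the authors' code: the function names, the `sigma` bandwidth, the `eps` regulariser and the use of Euclidean distance are all assumptions made here, and a time series study would likely substitute a time series distance.

```python
import numpy as np

def mutual_knn_rbf_graph(X, k, sigma):
    """Weight matrix with an edge i-j only if i and j are each among
    the other's k nearest neighbours (Euclidean distance assumed)."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    # Pairwise squared distances, clipped at 0 against rounding error.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    np.fill_diagonal(d2, np.inf)           # a point is not its own neighbour
    nn = np.argsort(d2, axis=1)[:, :k]     # k nearest neighbours per row
    knn = np.zeros((n, n), dtype=bool)
    knn[np.repeat(np.arange(n), k), nn.ravel()] = True
    mutual = knn & knn.T                   # keep only reciprocated edges
    return np.where(mutual, np.exp(-d2 / (2.0 * sigma ** 2)), 0.0)

def gfhf(W, y, labeled, n_classes, eps=1e-9):
    """Harmonic solution F_u = (D_uu - W_uu)^{-1} W_ul Y_l.
    `eps` (an assumption of this sketch) keeps the system solvable when
    a graph component contains no labeled node; such nodes get class 0."""
    n = W.shape[0]
    unlabeled = np.setdiff1d(np.arange(n), labeled)
    L = np.diag(W.sum(axis=1)) - W         # graph Laplacian D - W
    Y_l = np.eye(n_classes)[y[labeled]]    # one-hot labels
    L_uu = L[np.ix_(unlabeled, unlabeled)] + eps * np.eye(len(unlabeled))
    W_ul = W[np.ix_(unlabeled, labeled)]
    F_u = np.linalg.solve(L_uu, W_ul @ Y_l)
    pred = y.copy()
    pred[unlabeled] = F_u.argmax(axis=1)
    return pred
```

The mutual variant tends to produce a sparser graph than the ordinary k-nearest neighbors graph, so isolated components without labels are possible; the small `eps` term is one simple way to handle that case in a sketch.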
Parallel Processing of Large Graphs
More and more large data collections are gathered worldwide in various IT
systems. Many of them are networked in nature and need to be processed and
analysed as graph structures. Due to their size, they very often require a
parallel paradigm for efficient computation. Three parallel techniques have
been compared in the paper: MapReduce, its map-side join extension and Bulk
Synchronous Parallel (BSP). They are implemented for two different graph
problems: calculation of single source shortest paths (SSSP) and collective
classification of graph nodes by means of relational influence propagation
(RIP). The methods and algorithms are applied to several network datasets
differing in size and structural profile, originating from three domains:
telecommunication, multimedia and microblog. The results revealed that
iterative graph processing with the BSP implementation always and
significantly outperforms MapReduce, by up to a factor of 10, especially for
algorithms with many iterations and sparse communication. The MapReduce
extension based on map-side join is also usually noticeably more efficient
than plain MapReduce, although not as much as BSP. Nevertheless, MapReduce
remains a good alternative for enormous networks whose data structures do not
fit in local memories.
Comment: Preprint submitted to Future Generation Computer Systems
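The BSP formulation of single source shortest paths mentioned in the abstract above can be illustrated with a small single-process simulation. This is only a sketch of the vertex-centric superstep idea, not the implementation evaluated in the paper: the function name `bsp_sssp`, the adjacency-dict input format and the assumption of non-negative edge weights are choices made here for illustration.

```python
import math
from collections import defaultdict

def bsp_sssp(edges, source):
    """Simulate BSP supersteps for SSSP.
    edges: dict mapping vertex -> list of (neighbour, weight) pairs,
    with non-negative weights assumed. Returns (distances, supersteps)."""
    dist = defaultdict(lambda: math.inf)
    inbox = {source: [0.0]}        # superstep 0: only the source is active
    supersteps = 0
    while inbox:                   # halt when no messages are in flight
        outbox = defaultdict(list)
        # Compute phase: each vertex that received messages works
        # independently, so this loop could run in parallel.
        for v, msgs in inbox.items():
            best = min(msgs)
            if best < dist[v]:
                dist[v] = best
                # Communication phase: propose new distances to neighbours.
                for u, w in edges.get(v, []):
                    outbox[u].append(best + w)
        # Barrier: messages become visible only in the next superstep.
        inbox = outbox
        supersteps += 1
    return dict(dist), supersteps
```

Because vertex state stays in memory between supersteps and only messages move, iterative algorithms of this kind avoid rereading the whole graph on every iteration, which is consistent with the advantage over MapReduce that the abstract reports.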
Binary Linear Classification and Feature Selection via Generalized Approximate Message Passing
For the problem of binary linear classification and feature selection, we
propose algorithmic approaches to classifier design based on the generalized
approximate message passing (GAMP) algorithm, recently proposed in the context
of compressive sensing. We are particularly motivated by problems where the
number of features greatly exceeds the number of training examples, but where
only a few features suffice for accurate classification. We show that
sum-product GAMP can be used to (approximately) minimize the classification
error rate and max-sum GAMP can be used to minimize a wide variety of
regularized loss functions. Furthermore, we describe an
expectation-maximization (EM)-based scheme to learn the associated model
parameters online, as an alternative to cross-validation, and we show that
GAMP's state-evolution framework can be used to accurately predict the
misclassification rate. Finally, we present a detailed numerical study to
confirm the accuracy, speed, and flexibility afforded by our GAMP-based
approaches to binary linear classification and feature selection.