A Latent Source Model for Nonparametric Time Series Classification
For classifying time series, a nearest-neighbor approach is widely used in
practice with performance often competitive with or better than more elaborate
methods such as neural networks, decision trees, and support vector machines.
We develop theoretical justification for the effectiveness of
nearest-neighbor-like classification of time series. Our guiding hypothesis is
that in many applications, such as forecasting which topics will become trends
on Twitter, there aren't actually that many prototypical time series to begin
with, relative to the number of time series we have access to, e.g., topics
become trends on Twitter only in a few distinct manners whereas we can collect
massive amounts of Twitter data. To operationalize this hypothesis, we propose
a latent source model for time series, which naturally leads to a "weighted
majority voting" classification rule that can be approximated by a
nearest-neighbor classifier. We establish nonasymptotic performance guarantees
of both weighted majority voting and nearest-neighbor classification under our
model accounting for how much of the time series we observe and the model
complexity. Experimental results on synthetic data show weighted majority
voting achieving the same misclassification rate as nearest-neighbor
classification while observing less of the time series. We then use weighted
majority to forecast which news topics on Twitter become trends, where we are
able to detect such "trending topics" in advance of Twitter 79% of the time,
with a mean early advantage of 1 hour and 26 minutes, a true positive rate of
95%, and a false positive rate of 4%.
Comment: Advances in Neural Information Processing Systems (NIPS 2013)
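The relationship between the two rules in the abstract above can be illustrated with a minimal sketch. The abstract does not state the voting weights, so the exponential weight `exp(-gamma * distance)`, the squared Euclidean distance, the `gamma` parameter, and all function names here are assumptions made for illustration. Under these assumptions, a large `gamma` lets the closest training series dominate the vote, which is one way to see why nearest-neighbor classification can approximate weighted majority voting.

```python
import math

def sq_dist(observed, series):
    """Squared Euclidean distance over the observed prefix only.

    zip() stops at the shorter sequence, so a partially observed
    series is compared against the matching prefix of a training series.
    """
    return sum((x - y) ** 2 for x, y in zip(observed, series))

def weighted_majority_vote(observed, training, gamma=1.0):
    """training is a list of (series, label) pairs with label in {+1, -1}.

    Each training series votes for its own label with weight
    exp(-gamma * distance). This weight is an assumed form.
    """
    score = {+1: 0.0, -1: 0.0}
    for series, label in training:
        score[label] += math.exp(-gamma * sq_dist(observed, series))
    return +1 if score[+1] >= score[-1] else -1

def nearest_neighbor(observed, training):
    """Return the label of the single closest training series."""
    _, label = min(training, key=lambda pair: sq_dist(observed, pair[0]))
    return label

# Toy data invented for this sketch (not from the paper).
training = [
    ([0.0, 0.0, 0.0, 0.0], -1),
    ([1.0, 1.0, 1.0, 1.0], +1),
    ([0.9, 1.1, 1.0, 1.0], +1),
]
observed = [0.45, 0.45]  # only the first two samples have been seen
```

On this toy data, `nearest_neighbor` picks the single closest series, while `weighted_majority_vote` with a small `gamma` lets the two slightly more distant series of the other class outvote it. Raising `gamma` makes the two rules agree.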
Multivariate texture discrimination based on geodesics to class centroids on a generalized Gaussian Manifold
A texture discrimination scheme is proposed wherein probability distributions are deployed on a probabilistic manifold for modeling the wavelet statistics of images. We consider the Rao geodesic distance (GD) to the class centroid for texture discrimination in various classification experiments. We compare the performance of GD to class centroid with the Euclidean distance in a similar context, both in terms of accuracy and computational complexity. We also compare our proposed classification scheme with the k-nearest neighbor algorithm. Univariate and multivariate Gaussian and Laplace distributions, as well as generalized Gaussian distributions with variable shape parameter, are each evaluated as a statistical model for the wavelet coefficients. The GD to the centroid outperforms the Euclidean distance and yields superior discrimination compared to the k-nearest neighbor approach.
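A minimal sketch of distance-to-centroid classification with the Rao geodesic distance, restricted to the simplest case the abstract mentions: univariate zero-mean Gaussians. It assumes each texture is summarized by one standard deviation per wavelet subband and that subbands are treated as independent. Both are simplifications introduced here, not the paper's full multivariate scheme, and the function names are hypothetical. For zero-mean Gaussians the Fisher-Rao distance between scales σ1 and σ2 is √2·|ln(σ1/σ2)|, and the centroid under that distance is the geometric mean.

```python
import math

def rao_gd_zero_mean(sigmas_a, sigmas_b):
    """Rao geodesic distance between two textures, each given as a list of
    per-subband standard deviations of zero-mean Gaussians.

    Per subband the squared distance is 2 * ln(sigma_a / sigma_b)^2.
    Squared distances are summed across subbands (independence assumed).
    """
    return math.sqrt(sum(2.0 * math.log(a / b) ** 2
                         for a, b in zip(sigmas_a, sigmas_b)))

def geodesic_centroid(samples):
    """Per-subband geometric mean, which minimizes the sum of squared
    geodesic distances in this zero-mean Gaussian setting."""
    n = len(samples)
    return [math.exp(sum(math.log(s[j]) for s in samples) / n)
            for j in range(len(samples[0]))]

def classify(test, class_samples):
    """Assign the test texture to the class with the nearest centroid."""
    centroids = {label: geodesic_centroid(rows)
                 for label, rows in class_samples.items()}
    return min(centroids,
               key=lambda label: rao_gd_zero_mean(test, centroids[label]))

# Toy subband scales invented for this sketch.
class_samples = {
    "smooth": [[1.0, 0.5], [4.0, 2.0]],
    "rough": [[8.0, 8.0], [32.0, 32.0]],
}
predicted = classify([5.0, 3.0], class_samples)
```

Compared with k-nearest neighbor, this rule needs only one distance evaluation per class at test time, which is the kind of computational saving the abstract alludes to.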
Multivariate texture discrimination using a principal geodesic classifier
A new texture discrimination method is presented for classification and retrieval of colored textures represented in the wavelet domain. The interband correlation structure is modeled by multivariate probability models which constitute a Riemannian manifold. The presented method considers the shape of the class on the manifold by determining the principal geodesic of each class. The method, which we call principal geodesic classification, then determines the shortest distance from a test texture to the principal geodesic of each class. We use the Rao geodesic distance (GD) for calculating distances on the manifold. We compare the performance of the proposed method with distance-to-centroid and k-nearest neighbor classifiers and of the GD with the Euclidean distance. The principal geodesic classifier coupled with the GD yields better results, indicating the usefulness of effectively and concisely quantifying the variability of the classes in the probabilistic feature space.
Study and Observation of the Variation of Accuracies of KNN, SVM, LMNN, ENN Algorithms on Eleven Different Datasets from UCI Machine Learning Repository
Machine learning enables computers to learn from data without being
explicitly programmed [1, 2]. Machine learning can be classified as supervised and
unsupervised learning. In supervised learning, computers learn a function
that maps an input to an output based on training input-output pairs [3].
Among the most efficient and widely used supervised learning algorithms are K-Nearest
Neighbors (KNN), Support Vector Machine (SVM), Large Margin Nearest Neighbor
(LMNN), and Extended Nearest Neighbor (ENN). The main contribution of this
paper is to implement these elegant learning algorithms on eleven different
datasets from the UCI machine learning repository to observe the variation of
accuracies for each of the algorithms on all datasets. Analyzing the accuracy
of the algorithms gives a rough idea of the relationship between the
machine learning algorithms and the data dimensionality. All the algorithms are
implemented in Matlab. Based on these accuracy observations, KNN, SVM, LMNN,
and ENN can be compared on their performance on each dataset.
Comment: To be published in the 4th IEEE International Conference on
Electrical Engineering and Information & Communication Technology (iCEEiCT
2018)
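The kind of accuracy measurement described in the abstract above can be sketched for the simplest of the four algorithms, KNN. The paper's implementation is in Matlab and uses UCI datasets. The Python functions and the toy points below are hypothetical stand-ins used only to show the procedure: predict each test point by majority vote among its `k` nearest training points, then report the fraction predicted correctly.

```python
from collections import Counter

def knn_predict(train, query, k=3):
    """train is a list of (features, label) pairs.

    Sort training points by squared Euclidean distance to the query,
    keep the k closest, and return the most common label among them.
    """
    neighbors = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

def accuracy(train, test, k=3):
    """Fraction of test points whose predicted label matches the true one."""
    correct = sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(test)

# Toy two-class data invented for this sketch (not a UCI dataset).
train = [
    ((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
    ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b"),
]
test = [((0.5, 0.5), "a"), ((5.5, 5.5), "b")]
score = accuracy(train, test, k=3)
```

Repeating the same `accuracy` call for each algorithm and each dataset would give the per-dataset comparison the abstract describes.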