Weighted k-Nearest-Neighbor Techniques and Ordinal Classification
In the field of statistical discrimination, k-nearest neighbor classification is a well-known, easy and successful method. In this paper we present an extended version of this technique, where the distances of the nearest neighbors can be taken into account. In this sense there is a close connection to LOESS, a local regression technique. In addition, we show possibilities to use nearest neighbor methods for classification in the case of an ordinal class structure. Empirical studies show the advantages of the new techniques.
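As a rough illustration of letting neighbor distances influence the vote, here is a minimal sketch. The abstract does not state the weighting scheme, so the inverse-distance weights, the function name `weighted_knn_predict`, and the `eps` parameter are all assumptions made for this example, not the paper's method.

```python
import math
from collections import defaultdict

def weighted_knn_predict(train_points, train_labels, query, k=3, eps=1e-9):
    """Classify `query` by a distance-weighted vote of its k nearest neighbors."""
    # Euclidean distance from the query to every training point.
    dists = [
        (math.dist(p, query), label)
        for p, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbors (sort on distance only).
    nearest = sorted(dists, key=lambda t: t[0])[:k]
    # Assumed scheme: each neighbor votes with weight 1 / (distance + eps).
    votes = defaultdict(float)
    for d, label in nearest:
        votes[label] += 1.0 / (d + eps)
    return max(votes, key=votes.get)

points = [(0, 0), (0, 1), (5, 5), (5, 6), (6, 5)]
labels = ["a", "a", "b", "b", "b"]
prediction = weighted_knn_predict(points, labels, (0.2, 0.2), k=5)
```

With k equal to the full training set, an unweighted majority vote would pick "b" (three votes against two), whereas the weighted vote favors the two much closer "a" points.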
Neural activity classification with machine learning models trained on interspike interval series data
The flow of information through the brain is reflected by the activity
patterns of neural cells. Indeed, these firing patterns are widely used as
input data to predictive models that relate stimuli and animal behavior to the
activity of a population of neurons. However, relatively little attention has
been paid to single neuron spike trains as predictors of cell or network properties
in the brain. In this work, we introduce an approach to neuronal spike train
data mining which enables effective classification and clustering of neuron
types and network activity states based on single-cell spiking patterns. This
approach is centered around applying state-of-the-art time series
classification/clustering methods to sequences of interspike intervals recorded
from single neurons. We demonstrate good performance of these methods in tasks
involving classification of neuron type (e.g. excitatory vs. inhibitory cells)
and/or neural circuit activity state (e.g. awake vs. REM sleep vs. nonREM sleep
states) on an open-access cortical spiking activity dataset.
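To make the input representation concrete, the sketch below turns a list of spike times into an interspike interval (ISI) series and computes two summary statistics. The function names and the choice of features (mean ISI and coefficient of variation) are assumptions for illustration only; the abstract refers to time series classification methods applied to the ISI sequences, which are not reproduced here.

```python
import statistics

def interspike_intervals(spike_times):
    """Return the differences between consecutive spike times."""
    ordered = sorted(spike_times)
    return [b - a for a, b in zip(ordered, ordered[1:])]

def isi_summary(isis):
    """Hypothetical feature set: mean ISI and coefficient of variation."""
    mean = statistics.fmean(isis)
    sd = statistics.pstdev(isis)
    return {"mean_isi": mean, "cv": sd / mean}

isis = interspike_intervals([0.0, 0.5, 1.5, 3.0])
features = isi_summary(isis)
```

A perfectly regular spike train gives a coefficient of variation of zero, while irregular firing gives larger values, so even such simple features carry some information about firing pattern.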
CUSBoost: Cluster-based Under-sampling with Boosting for Imbalanced Classification
Class imbalance classification is a challenging research problem in data
mining and machine learning, as most real-life datasets are
imbalanced in nature. Existing learning algorithms maximise the classification
accuracy by correctly classifying the majority class, but misclassify the
minority class. However, the minority class instances represent the
concept of greater interest than the majority class instances in real-life
applications. Recently, several techniques based on sampling methods
(under-sampling of the majority class and over-sampling the minority class),
cost-sensitive learning methods, and ensemble learning have been used in the
literature for classifying imbalanced datasets. In this paper, we introduce a
new clustering-based under-sampling approach with boosting (AdaBoost)
algorithm, called CUSBoost, for effective imbalanced classification. The
proposed algorithm provides an alternative to RUSBoost (random under-sampling
with AdaBoost) and SMOTEBoost (synthetic minority over-sampling with AdaBoost)
algorithms. We compared the performance of the CUSBoost algorithm with
state-of-the-art ensemble learning methods such as AdaBoost, RUSBoost, and
SMOTEBoost on 13 imbalanced binary and multi-class datasets with various
imbalance ratios. The experimental results show that CUSBoost is a
promising and effective approach for dealing with highly imbalanced datasets.
Comment: CSITSS-201
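The idea of clustering-based under-sampling can be sketched as follows: cluster the majority class, then keep a random fraction of each cluster so that the retained sample still covers every region of the majority class. This is only a sketch under stated assumptions. The tiny k-means with first-k initialization, the `keep_fraction` of 0.5, and all function names are hypothetical choices for this example, and the boosting (AdaBoost) step is omitted entirely.

```python
import math
import random

def kmeans(points, k, iters=20):
    """Very small k-means; assumes len(points) >= k."""
    # Deterministic initialization (assumption): first k points as centroids.
    centroids = [tuple(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: index of the nearest centroid for each point.
        assign = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return assign

def cluster_undersample(majority, k=2, keep_fraction=0.5, seed=0):
    """Keep a random fraction of the majority class from each cluster."""
    rng = random.Random(seed)
    assign = kmeans(majority, k)
    kept = []
    for c in range(k):
        members = [p for p, a in zip(majority, assign) if a == c]
        if not members:
            continue
        n_keep = max(1, round(len(members) * keep_fraction))
        kept.extend(rng.sample(members, n_keep))
    return kept

majority = [(0, 0), (0, 1), (1, 0), (1, 1),
            (10, 10), (10, 11), (11, 10), (11, 11)]
kept = cluster_undersample(majority, k=2, keep_fraction=0.5)
```

In a full pipeline the retained majority instances would be combined with all minority instances before training; how that interacts with each boosting round is not specified in the abstract.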
Fusing Vantage Point Trees and Linear Discriminants for Fast Feature Classification
This paper describes a classification strategy that can be regarded as a more general form of nearest-neighbor classification. It fuses the concepts of nearest neighbor, linear discriminant and Vantage-Point trees, yielding an efficient indexing data structure and classification algorithm. In the learning phase, we define a set of disjoint subspaces of reduced complexity that can be separated by linear discriminants, ending up with an ensemble of simple (weak) classifiers that work locally. In classification, the closest centroids to the query determine the set of classifiers considered, whose responses are weighted. The algorithm was experimentally validated in datasets widely used in the field, attaining error rates that are favorably comparable to the state-of-the-art classification techniques. Lastly, the proposed solution has a set of interesting properties for a broad range of applications: 1) it is deterministic; 2) it classifies in time approximately logarithmic with respect to the size of the learning set, being far more efficient than nearest neighbor classification in terms of computational cost; and 3) it keeps the generalization ability of simple models.
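For readers unfamiliar with Vantage-Point trees, the sketch below shows only the generic indexing idea: each node splits the remaining points by their distance to a vantage point, and a nearest-neighbor query prunes the far side when it cannot contain a closer point. The dictionary layout, the choice of the first point as vantage point, and the function names are assumptions for this example. The fusion with linear discriminants and centroids described in the abstract is not reproduced.

```python
import math
import statistics

def build_vp_tree(points):
    """Recursively split points by median distance to a vantage point."""
    if not points:
        return None
    vantage, rest = points[0], points[1:]  # assumed: first point is the vantage point
    if not rest:
        return {"vp": vantage, "mu": 0.0, "inside": None, "outside": None}
    dists = [math.dist(vantage, p) for p in rest]
    mu = statistics.median(dists)
    inside = [p for p, d in zip(rest, dists) if d <= mu]
    outside = [p for p, d in zip(rest, dists) if d > mu]
    return {"vp": vantage, "mu": mu,
            "inside": build_vp_tree(inside),
            "outside": build_vp_tree(outside)}

def nearest(node, query, best=None):
    """Return (distance, point) of the nearest stored point to `query`."""
    if node is None:
        return best
    d = math.dist(query, node["vp"])
    if best is None or d < best[0]:
        best = (d, node["vp"])
    # Search the side containing the query first.
    if d <= node["mu"]:
        near, far = node["inside"], node["outside"]
    else:
        near, far = node["outside"], node["inside"]
    best = nearest(near, query, best)
    # The far side matters only if the current best ball crosses the boundary.
    if abs(d - node["mu"]) <= best[0]:
        best = nearest(far, query, best)
    return best

points = [(x * 0.37 % 1.0, x * 0.73 % 1.0) for x in range(40)]
tree = build_vp_tree(points)
found = nearest(tree, (0.5, 0.5))
```

The pruning test follows from the triangle inequality, so the search returns the same nearest distance as a brute-force scan while typically visiting fewer points on balanced trees.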
- …