Supervised Intrusion Detection System Using KNN
This paper presents an implementation of an intrusion detection system (IDS) using the k-nearest neighbor (KNN) algorithm in the R language. The dataset used is KDD Cup 1999, a well-known benchmark for IDS. The KNN machine learning algorithm is used for the detection and classification of known attacks. The experimental results are obtained using the R programming language.
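A minimal sketch of this kind of pipeline, assuming an already-numeric KDD Cup 1999-style feature matrix; the paper itself works in R, so the scikit-learn classifier and the helper name knn_ids below are purely illustrative stand-ins:

```python
# Illustrative kNN-based intrusion detection sketch (not the paper's R code).
# Assumes X is a numeric feature matrix and y labels each connection record
# as an attack type or "normal", as in the KDD Cup 1999 benchmark.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report

def knn_ids(X, y, k=5):
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=0)
    scaler = StandardScaler().fit(X_train)      # kNN is distance-based, so scale features
    clf = KNeighborsClassifier(n_neighbors=k)
    clf.fit(scaler.transform(X_train), y_train)
    y_pred = clf.predict(scaler.transform(X_test))
    print(classification_report(y_test, y_pred))
    return clf

if __name__ == "__main__":
    # Tiny synthetic stand-in for KDD Cup 1999 features, just to show the call.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 10))
    y = (X[:, 0] > 0).astype(int)   # 1 = "attack", 0 = "normal" (toy labels)
    knn_ids(X, y, k=5)
```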
Epileptic Seizure Detection in EEGs by Using Random Tree Forest, Naïve Bayes and KNN Classification
Epilepsy is a disease that affects the nervous system. To detect epilepsy, it is necessary to analyze the results of an EEG test. In this study, we compared the Naïve Bayes, random tree forest, and k-nearest neighbour (KNN) classification algorithms for detecting epilepsy. The raw EEG data were pre-processed before feature extraction. We then trained the three algorithms: KNN classification, Naïve Bayes classification, and random tree forest. The last step was validation of the trained models. Comparing the three classifiers, we calculated accuracy, sensitivity, specificity, and precision. The best trained classifier is the KNN classifier (accuracy: 92.7%), ahead of the random tree forest (accuracy: 86.6%) and the Naïve Bayes classifier (accuracy: 55.6%). In terms of precision, KNN classification also performs best (82.5%), compared with Naïve Bayes classification (25.3%) and the random tree forest (68.2%). For sensitivity, however, Naïve Bayes classification is best with 80.3%, compared to 73.2% for KNN and 42.2% for the random tree forest. For specificity, KNN classification gives 96.7%, followed by the random tree forest at 95.9% and Naïve Bayes at 50.4%. The training time of Naïve Bayes was 0.166030 s, the training time of the random tree forest was 2.4094 s, and KNN was the slowest to train at 4.789 s. Overall, KNN classification gives better performance than Naïve Bayes and random tree forest classification.
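A rough sketch of this kind of three-way comparison, assuming features have already been extracted from the EEG signals (feature extraction is not reproduced here); the scikit-learn models, the helper name evaluate, and the toy data are stand-ins, not the paper's implementation:

```python
# Illustrative comparison of the three classifiers on already-extracted EEG
# features (assumption: X holds feature vectors, y holds 1 = seizure, 0 = normal).
import time
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix

def evaluate(model, X_train, X_test, y_train, y_test):
    t0 = time.time()
    model.fit(X_train, y_train)
    train_time = time.time() - t0
    tn, fp, fn, tp = confusion_matrix(y_test, model.predict(X_test)).ravel()
    return {
        "accuracy":     (tp + tn) / (tp + tn + fp + fn),
        "sensitivity":  tp / (tp + fn),   # recall on the seizure class
        "specificity":  tn / (tn + fp),
        "precision":    tp / (tp + fp),
        "train_time_s": train_time,
    }

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 16))                  # toy stand-in for EEG features
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)   # toy labels
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for name, model in [("Naive Bayes", GaussianNB()),
                    ("Random forest", RandomForestClassifier(random_state=0)),
                    ("KNN", KNeighborsClassifier(n_neighbors=5))]:
    print(name, evaluate(model, X_tr, X_te, y_tr, y_te))
```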
Secure k-ish Nearest Neighbors Classifier
In machine learning, classifiers are used to predict a class of a given query
based on an existing (classified) database. Given a database S of n
d-dimensional points and a d-dimensional query q, the k-nearest neighbors (kNN)
classifier assigns q the majority class of its k nearest neighbors in S.
In the secure version of kNN, S and q are owned by two different parties that
do not want to share their data. Unfortunately, all known solutions for secure
kNN either require a large communication complexity between the parties, or are
very inefficient to run.
In this work we present a classifier based on kNN, that can be implemented
efficiently with homomorphic encryption (HE). The efficiency of our classifier
comes from a relaxation we make on kNN, where we allow it to consider kappa
nearest neighbors for kappa ~ k with some probability. We therefore call our
classifier k-ish Nearest Neighbors (k-ish NN).
The success probability of our solution depends on the distribution of the
distances from q to S, and increases as the statistical distance of that
distribution from a Gaussian decreases.
To implement our classifier we introduce the concept of a doubly-blinded
coin-toss. In a doubly-blinded coin-toss, both the success probability and the
output of the toss are encrypted. We use this coin-toss to efficiently
approximate the average and variance of the distances from q to S. We believe
these two techniques may be of independent interest.
When implemented with HE, the k-ish NN has a circuit depth that is
independent of n, therefore making it scalable. We also implemented our
classifier in an open source library based on HELib and tested it on a breast
tumor database. The accuracy of our classifier (F_1 score) was 98%, and
classification took less than 3 hours, compared to (estimated) weeks in current
HE implementations.
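A plaintext illustration of the k-ish relaxation only, under the abstract's Gaussian assumption on the distances: pick a distance threshold so that roughly k points are expected to fall inside it, then take the majority class within that threshold. The actual classifier runs under homomorphic encryption and uses the doubly-blinded coin-toss to estimate the statistics; none of that machinery is reproduced, and the function name k_ish_nn_plaintext is hypothetical:

```python
# Plaintext sketch of the k-ish relaxation: instead of the exact k nearest
# neighbors, take a majority vote over the points whose distance to q falls
# below a threshold chosen (via mean/std and a Gaussian model of the distances)
# so that roughly k points are expected inside. Not the paper's HE construction.
import numpy as np
from collections import Counter
from scipy.stats import norm

def k_ish_nn_plaintext(S, labels, q, k):
    d = np.linalg.norm(S - q, axis=1)        # distances from q to all of S
    mu, sigma = d.mean(), d.std()
    # Under a Gaussian model of the distances, about k of the n points are
    # expected to lie within this threshold.
    t = mu + sigma * norm.ppf(k / len(S))
    inside = labels[d <= t]
    if inside.size == 0:                      # fall back to the single nearest point
        inside = labels[[d.argmin()]]
    return Counter(inside.tolist()).most_common(1)[0][0]
```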
AffinityNet: semi-supervised few-shot learning for disease type prediction
While deep learning has achieved great success in computer vision and many
other fields, currently it does not work very well on patient genomic data with
the "big p, small N" problem (i.e., a relatively small number of samples with
high-dimensional features). In order to make deep learning work with a small
amount of training data, we have to design new models that facilitate few-shot
learning. Here we present the Affinity Network Model (AffinityNet), a data
efficient deep learning model that can learn from a limited number of training
examples and generalize well. The backbone of the AffinityNet model consists of
stacked k-Nearest-Neighbor (kNN) attention pooling layers. The kNN attention
pooling layer is a generalization of the Graph Attention Model (GAM), and can
be applied to not only graphs but also any set of objects regardless of whether
a graph is given or not. As a new deep learning module, kNN attention pooling
layers can be plugged into any neural network model just like convolutional
layers. As a simple special case of kNN attention pooling layer, feature
attention layer can directly select important features that are useful for
classification tasks. Experiments on both synthetic data and cancer genomic
data from TCGA projects show that our AffinityNet model has better
generalization power than conventional neural network models with little
training data. The code is freely available at
https://github.com/BeautyOfWeb/AffinityNet.
Comment: 14 pages, 6 figures
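A rough numpy sketch of the idea behind kNN attention pooling as described above: each object's new representation is an attention-weighted average of its k nearest neighbors in feature space. The layer in AffinityNet is a learned neural-network module (see the repository linked above), so this is only an assumption-laden illustration, and the function name knn_attention_pooling is made up for it:

```python
# Sketch of kNN attention pooling: each object's pooled feature vector is an
# attention-weighted average of its k nearest neighbors (selected by feature
# similarity). Illustration of the idea only, not AffinityNet's learned module.
import numpy as np

def knn_attention_pooling(X, k=5):
    """X: (n, d) feature matrix. Returns an (n, d) matrix of pooled features."""
    sq = (X ** 2).sum(axis=1)
    dist = sq[:, None] + sq[None, :] - 2 * X @ X.T   # pairwise squared distances
    n = X.shape[0]
    pooled = np.empty_like(X)
    for i in range(n):
        neighbors = np.argsort(dist[i])[:k]     # k nearest (includes i itself)
        sims = -dist[i, neighbors]              # similarity = negative distance
        weights = np.exp(sims - sims.max())
        weights /= weights.sum()                # softmax attention weights
        pooled[i] = weights @ X[neighbors]      # attention-weighted average
    return pooled
```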
Exemplar-Centered Supervised Shallow Parametric Data Embedding
Metric learning methods for dimensionality reduction in combination with
k-Nearest Neighbors (kNN) have been extensively deployed in many
classification, data embedding, and information retrieval applications.
However, most of these approaches involve pairwise training data comparisons,
and thus have quadratic computational complexity with respect to the size of
the training set, preventing them from scaling to fairly big datasets. Moreover,
during testing, comparing test data against all the training data points is
also expensive in terms of both computational cost and resources required.
Furthermore, previous metrics are either too constrained or too expressive to
be well learned. To effectively solve these issues, we present an
exemplar-centered supervised shallow parametric data embedding model, using a
Maximally Collapsing Metric Learning (MCML) objective. Our strategy learns a
shallow high-order parametric embedding function and compares training/test
data only with learned or precomputed exemplars, resulting in a cost function
with linear computational complexity for both training and testing. We also
empirically demonstrate, using several benchmark datasets, that for
classification in two-dimensional embedding space, our approach not only speeds
up kNN by hundreds of times, but also outperforms state-of-the-art supervised
embedding approaches.
Comment: accepted to IJCAI201
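A toy sketch of the exemplar-centered idea: training points are compared only with one exemplar per class (here simply the class mean, projected by the same map) rather than with every other training point, so the per-epoch cost is linear in the number of samples. A single linear layer stands in for the paper's shallow high-order parametric embedding, and the name train_exemplar_embedding is hypothetical; this is not the authors' MCML implementation:

```python
# Exemplar-centered embedding sketch: an MCML-style objective where each point
# should "collapse" onto its class exemplar in the embedding space.
import torch

def train_exemplar_embedding(X, y, n_classes, dim=2, epochs=200, lr=0.05):
    X = torch.as_tensor(X, dtype=torch.float32)
    y = torch.as_tensor(y, dtype=torch.long)
    # Assumption: one exemplar per class, taken as the class mean in input space.
    exemplars_in = torch.stack([X[y == c].mean(dim=0) for c in range(n_classes)])
    W = torch.nn.Linear(X.shape[1], dim)        # shallow parametric embedding
    opt = torch.optim.Adam(W.parameters(), lr=lr)
    for _ in range(epochs):
        Z, E = W(X), W(exemplars_in)
        # Softmax over negative squared distances to the exemplars should pick
        # the right class (cost is O(n * n_classes), not O(n^2)).
        d2 = torch.cdist(Z, E) ** 2
        loss = torch.nn.functional.cross_entropy(-d2, y)
        opt.zero_grad(); loss.backward(); opt.step()
    return W, W(exemplars_in).detach()
```

A test point would then be embedded with the learned map and assigned the class of its nearest exemplar, so testing also avoids comparisons against the full training set.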
Adaptive Preferential Attached kNN Graph With Distribution-Awareness
Graph-based kNN algorithms have garnered widespread popularity for machine
learning tasks, due to their simplicity and effectiveness. However, the
conventional kNN graph's reliance on a fixed value of k can hinder its
performance, especially in scenarios involving complex data distributions.
Moreover, like other classification models, the presence of ambiguous samples
along decision boundaries often presents a challenge, as they are more prone to
incorrect classification. To address these issues, we propose the Preferential
Attached k-Nearest Neighbors Graph (paNNG), which combines adaptive kNN with
distribution-based graph construction. By incorporating distribution
information, paNNG can significantly improve performance for ambiguous samples
by "pulling" them towards their original classes and hence enable enhanced
overall accuracy and generalization capability. Through rigorous evaluations on
diverse benchmark datasets, paNNG outperforms state-of-the-art algorithms,
showcasing its adaptability and efficacy across various real-world scenarios.
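The abstract does not spell out the construction, so the following is only a speculative illustration of the general flavor of an adaptive kNN graph: each sample gets its own neighborhood size, scaled by how its local density compares to the average. The function name adaptive_knn_graph and the density heuristic are assumptions, not the paNNG algorithm:

```python
# Speculative sketch of an adaptive kNN graph: instead of one fixed k, each
# sample i gets its own k_i driven by local density (denser regions get more
# neighbors). Not the paNNG construction, which is not described in the abstract.
import numpy as np

def adaptive_knn_graph(X, base_k=5, k_min=2, k_max=15):
    n = X.shape[0]
    sq = (X ** 2).sum(axis=1)
    dist = sq[:, None] + sq[None, :] - 2 * X @ X.T   # pairwise squared distances
    np.fill_diagonal(dist, np.inf)
    order = np.argsort(dist, axis=1)
    # Local density proxy: inverse of the (squared) distance to the base_k-th neighbor.
    r_k = np.take_along_axis(dist, order[:, [base_k - 1]], axis=1).ravel()
    density = 1.0 / np.sqrt(r_k + 1e-12)
    scale = density / density.mean()
    adj = {}
    for i in range(n):
        k_i = int(np.clip(np.rint(base_k * scale[i]), k_min, k_max))
        adj[i] = order[i, :k_i].tolist()             # edges to the k_i nearest samples
    return adj
```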