Secure k-ish Nearest Neighbors Classifier
In machine learning, classifiers are used to predict the class of a given query
based on an existing (classified) database. Given a database S of n
d-dimensional points and a d-dimensional query q, the k-nearest neighbors (kNN)
classifier assigns to q the majority class of its k nearest neighbors in S.
In the secure version of kNN, S and q are owned by two different parties that
do not want to share their data. Unfortunately, all known solutions for secure
kNN either require high communication complexity between the parties or are
very inefficient to run.
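For reference, the plain (non-secure) kNN rule described above fits in a few lines of Python. This is a plaintext illustration only: Euclidean distance is an assumption, and the function name knn_classify is hypothetical.

```python
from collections import Counter
import math


def knn_classify(S, labels, q, k):
    """Return the majority class among the k points of S nearest to q.

    S      -- list of d-dimensional points (tuples of floats)
    labels -- labels[i] is the class of S[i]
    q      -- d-dimensional query point
    k      -- number of neighbors to consult
    """
    # Sort database indices by Euclidean distance to the query.
    order = sorted(range(len(S)), key=lambda i: math.dist(S[i], q))
    # Count the classes of the k closest points and return the most common one.
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]
```

For example, with three points of class "a" near the origin and three of class "b" near (5, 5), a query at (0.2, 0.3) with k = 3 is assigned "a". The secure setting is hard because both the sorting step and the vote depend on data that neither party may reveal.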
In this work we present a classifier based on kNN that can be implemented
efficiently with homomorphic encryption (HE). The efficiency of our classifier
comes from a relaxation of kNN in which we allow it to consider κ
nearest neighbors for κ ≈ k with some probability. We therefore call our
classifier k-ish Nearest Neighbors (k-ish NN).
The success probability of our solution depends on the distribution of the
distances from q to S and increases as the statistical distance between that
distribution and a Gaussian decreases.
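The role of the Gaussian assumption can be illustrated in plaintext. If the distances from q to S are roughly normal with mean μ and standard deviation σ, then about k of the n distances are expected to fall below the threshold t = μ + σ·Φ⁻¹(k/n), where Φ⁻¹ is the standard normal quantile function. The sketch below is our own illustration of this idea under that assumption, not the paper's HE implementation; the function names are hypothetical, and it requires 0 < k < n and non-constant distances.

```python
from statistics import NormalDist, fmean, pstdev


def kish_threshold(distances, k):
    """Estimate a distance threshold below which roughly k of the n distances fall,
    assuming the distances are approximately normally distributed."""
    n = len(distances)
    mu = fmean(distances)          # mean of the distances
    sigma = pstdev(distances)      # population standard deviation
    # The k/n quantile of N(mu, sigma^2); for truly Gaussian distances,
    # about k points are expected to lie below it.
    return NormalDist(mu, sigma).inv_cdf(k / n)


def kish_neighbors(distances, k):
    """Indices of the kappa points whose distance is below the estimated threshold.

    kappa is random but close to k when the Gaussian assumption holds."""
    t = kish_threshold(distances, k)
    return [i for i, dist in enumerate(distances) if dist < t]
```

Only the mean and variance of the distances are needed, which is why approximating these two statistics is central to the construction; no sorting of the n distances is required.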
To implement our classifier we introduce the concept of a doubly-blinded
coin-toss, in which both the success probability and the
output of the toss are encrypted. We use this coin-toss to efficiently
approximate the average and variance of the distances from q to S. We believe
these two techniques may be of independent interest.
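The statistical idea behind approximating an average with coin tosses can be shown without any encryption. A coin that lands heads with probability x/M has expectation x/M, so counting heads over all values estimates their mean without dividing by the values themselves; two independent tosses that both succeed estimate the second moment, and hence the variance. In the doubly-blinded setting both the probability and the outcome would be ciphertexts. The plaintext sketch below only illustrates the estimator; the upper bound M (max_value) and all names are assumptions.

```python
import random


def coin_toss(p, rng):
    """Return 1 with probability p, else 0 (a plaintext stand-in for the encrypted toss)."""
    return 1 if rng.random() < p else 0


def estimate_mean(values, max_value, tosses_per_value, rng):
    """Estimate the average of values in [0, max_value] by summing biased coin tosses.

    Each toss on x succeeds with probability x / max_value, so the expected number
    of heads per value is proportional to x."""
    heads = 0
    for x in values:
        p = x / max_value
        heads += sum(coin_toss(p, rng) for _ in range(tosses_per_value))
    return max_value * heads / (len(values) * tosses_per_value)


def estimate_variance(values, max_value, tosses_per_value, rng):
    """Estimate the population variance as E[x^2] - E[x]^2 using coin tosses."""
    hits = 0
    for x in values:
        p = x / max_value
        # Two independent tosses both succeed with probability (x / max_value) ** 2.
        hits += sum(coin_toss(p, rng) * coin_toss(p, rng) for _ in range(tosses_per_value))
    second_moment = max_value ** 2 * hits / (len(values) * tosses_per_value)
    mean = estimate_mean(values, max_value, tosses_per_value, rng)
    return second_moment - mean ** 2
```

More tosses per value reduce the variance of both estimates at the cost of more (encrypted) operations.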
When implemented with HE, the k-ish NN has a circuit depth that is
independent of n, which makes it scalable. We also implemented our
classifier in an open-source library based on HElib and tested it on a breast
tumor database. Our classifier achieved an F_1 score of 98%, and
classification took less than 3 hours, compared to (estimated) weeks with current
HE implementations.
SANNS: Scaling Up Secure Approximate k-Nearest Neighbors Search
The k-Nearest Neighbor Search (k-NNS) is the backbone of several
cloud-based services such as recommender systems, face recognition, and
database search on text and images. In these services, the client sends the
query to the cloud server and receives the response, so both the query and the
response are revealed to the service provider. Such data disclosures are
unacceptable in several scenarios due to the sensitivity of data and/or privacy
laws.
In this paper, we introduce SANNS, a system for secure k-NNS that keeps the
client's query and the search result confidential. SANNS comprises two
protocols: an optimized linear scan and a protocol based on a novel sublinear-time
clustering-based algorithm. We prove the security of both protocols in the
standard semi-honest model. The protocols are built upon several
state-of-the-art cryptographic primitives such as lattice-based additively
homomorphic encryption, distributed oblivious RAM, and garbled circuits. We
contribute improvements to each of these primitives that are applicable
to other secure computation tasks. Both of our protocols rely on a new circuit
for approximate top-k selection among a set of numbers, built from comparators.
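One way to make top-k selection cheaper in comparators is to give up exactness: randomly split the numbers into bins, keep only each bin's minimum, and select the k smallest minima. The plaintext sketch below illustrates this bin-based idea; it is an assumption for illustration, not the circuit described in the paper, and the function name and num_bins parameter are hypothetical.

```python
import random


def approx_top_k(values, k, num_bins, rng):
    """Approximate the k smallest values using only pairwise comparisons.

    The values are shuffled into num_bins bins; the minimum of each bin is kept,
    and the k smallest bin minima are returned. The result is exact whenever no
    two of the true k smallest values fall into the same bin, which becomes
    likely when num_bins is large compared with k**2.
    """
    items = list(values)
    rng.shuffle(items)
    bins = [items[i::num_bins] for i in range(num_bins)]
    # One linear pass of comparators per bin: roughly n comparisons in total.
    minima = [min(b) for b in bins if b]
    # Exact top-k on the much shorter list of bin minima.
    return sorted(minima)[:k]
```

The global minimum is always returned, since it is the minimum of its own bin; errors can only occur when two of the true top-k values collide in one bin.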
We have implemented the proposed system and performed extensive experiments
on four datasets in two different computation environments,
demonstrating faster response times than
optimally implemented protocols from prior work. Moreover, SANNS is the
first work that scales to a database of 10 million entries, pushing the limit
by more than two orders of magnitude.
Comment: 18 pages, to appear at USENIX Security Symposium 202
Application of the K-Nearest Neighbor Method to Classify the Spread of Dengue Hemorrhagic Fever (DBD) Cases in Maluku Tenggara Regency
Dengue Hemorrhagic Fever (Demam Berdarah Dengue, DBD) is a disease caused by the dengue virus, which is transmitted through the bite of the Aedes aegypti mosquito and enters the human bloodstream. It is a dangerous disease that often causes public concern because it progresses rapidly and can cause death within a short time. A study of its spread is therefore needed so that swift action can be taken to prevent cases. One method that can be used is the K-Nearest Neighbor (KNN) method. K-NN is a form of decision-support model that classifies data based on the nearest distance. The data used in this study were obtained from BPS (Statistics Indonesia) of Maluku Tenggara Regency for 2021. The study found two groups of DBD spread in Maluku Tenggara Regency. Districts with high DBD potential are Kei Besar, Kei Kecil and Kei Besar Selatan Barat. Districts with low DBD spread potential are Kei Besar Utara Barat, Kei Besar Selatan, Kei Besar Timur Selatan, Kei Besar Utara Timur, Hoat Sorbay, Kei Kecil Barat, Kei Kecil Timur and Manyeuw. Keywords: Dengue Hemorrhagic Fever, K-Nearest Neighbor, Maluku Tenggara
Comparative Study of Supervised Learning Methods for Malware Analysis, Journal of Telecommunications and Information Technology, 2014, nr 4
Malware is software designed to disrupt or even damage a computer system or to perform other unwanted actions. Nowadays, malware is a common threat on the World Wide Web. Anti-malware protection and intrusion detection can be significantly supported by a comprehensive and extensive analysis of data on the Web. The aim of such analysis is to classify the collected data into two sets, i.e., normal and malicious data. In this paper the authors investigate the use of three supervised learning methods for data mining to support malware detection. The results of applying Support Vector Machine, Naive Bayes and k-Nearest Neighbors techniques to classify data taken from devices located in many units, organizations and monitoring systems serviced by CERT Poland are described. The performance of all methods is compared and discussed. The results of the experiments show that supervised learning algorithms can be successfully used for computer data analysis and can support computer emergency response teams in threat detection.
Efficiency and Accuracy Enhancement of Intrusion Detection System Using Feature Selection and Cross-layer Mechanism
The dramatic increase in the number of connected devices and the significant growth of network traffic have led to many security vulnerabilities and cyber-attacks. Hence, developing new methods to secure the network infrastructure and protect data from malicious and unauthorized access has become a vital aspect of communication network design. Intrusion Detection Systems (IDSs), as widely used security techniques, are critical for detecting network attacks and unauthorized network access and thus minimizing further damage from cyber-attacks. However, a number of weaknesses must be addressed to build reliable IDSs for real-world applications. One of the fundamental challenges is the large amount of redundant and irrelevant data. Feature selection emerges as a necessary step in efficient IDS design to overcome the high-dimensionality problem and to enhance IDS performance by reducing its complexity and accelerating the detection process. Moreover, the detection algorithm has a significant impact on the performance of an IDS. Machine learning techniques are widely used in such systems and are studied in detail in this dissertation. One of the most destructive activities in wireless networks such as MANETs is packet dropping. However, intrusive attackers are not the only cause of packet loss: packets can also be dropped because of a faulty network. Hence, to detect packet dropping caused by the malicious activity of an attacker effectively, information from various layers of the protocol stack is needed. To this end, a novel cross-layer design for malicious packet loss detection in MANETs is proposed, using features from the physical, network and MAC layers to make better detection decisions. A trust-based mechanism is adopted in this design, and a packet-loss-free routing algorithm is presented accordingly.
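The dissertation's design is not detailed in this abstract. The sketch below only illustrates the general cross-layer idea it describes: a low forwarding ratio (a simple trust metric) is attributed to an attacker only when physical- and MAC-layer indicators suggest that the link itself is healthy. All field names and thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NeighborStats:
    """Per-neighbor observations gathered from different protocol layers (hypothetical)."""
    packets_sent_to: int        # network layer: packets handed to the neighbor for forwarding
    packets_forwarded: int      # network layer: forwards overheard from the neighbor
    mac_retransmissions: int    # MAC layer: retransmissions towards the neighbor
    link_quality: float         # physical layer: normalised signal quality in [0, 1]


def classify_packet_loss(s, trust_threshold=0.8, min_link_quality=0.5, max_retx=10):
    """Label a neighbor 'trusted', 'faulty_link' or 'suspected_malicious'."""
    if s.packets_sent_to == 0:
        return "trusted"  # no evidence either way
    trust = s.packets_forwarded / s.packets_sent_to
    if trust >= trust_threshold:
        return "trusted"
    # Low forwarding ratio: consult the lower layers before blaming the neighbor.
    if s.link_quality < min_link_quality or s.mac_retransmissions > max_retx:
        return "faulty_link"
    return "suspected_malicious"
```

A network-layer-only detector would label both low-trust cases as malicious; the lower-layer checks are what separate a faulty link from deliberate dropping.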
FAPRIL: Towards Faster Privacy-Preserving Fingerprint-Based Localization
Fingerprinting is a commonly used technique to provide accurate localization in indoor areas, where global navigation satellite systems, such as GPS and Galileo, cannot function or are not precise enough. Although fingerprint-based indoor localization has gained wide popularity, existing privacy-preserving solutions either rely on non-colluding servers or incur high communication costs, which hinders deployment.
In this work we present FAPRIL, a privacy-preserving indoor localization scheme that takes advantage of the latest improvements in secure two-party computation protocols. Our scheme is split into two parts: an input-independent setup phase and an online phase. We concentrate on optimizing the online phase for mobile clients on a mobile data plan, and observe that recurring operands allow us to reduce the total communication overhead even further. Our observation can be generalized, e.g., to improve the multiplication of arithmetically secret-shared matrices. We implement FAPRIL on mobile devices, and our benchmarks over a simulated LTE network show that the online phase of a private localization takes under 0.15 seconds with less than 0.20 megabytes of communication, even for large buildings. The setup phase, which can be precomputed, depends heavily on the setting but stays in the range of 0.28 to 4.14 seconds and 0.69 to 16.00 megabytes per localization query. The round complexity of FAPRIL is constant for both phases.
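The split into an input-independent setup phase and an online phase, and the saving from recurring operands, can be illustrated with generic Beaver-triple multiplication on additively secret-shared values. When one operand x appears in many products, triples sharing the same mask a let the masked value x − a be opened once instead of once per product. This is a standard construction shown under our own assumptions (two parties simulated in one process, a trusted dealer for setup, the ring of integers mod 2^32), not necessarily FAPRIL's exact protocol; all names are hypothetical.

```python
import secrets

MOD = 2 ** 32  # arithmetic sharing over the integers mod 2^32 (an assumption)


def share(x):
    """Split x into two additive shares that sum to x mod MOD."""
    r = secrets.randbelow(MOD)
    return (r, (x - r) % MOD)


def reconstruct(shares):
    """Recombine additive shares."""
    return sum(shares) % MOD


def setup_triples(m):
    """Input-independent setup: one mask a shared by m triples (a, b_j, a*b_j)."""
    a = secrets.randbelow(MOD)
    bs = [secrets.randbelow(MOD) for _ in range(m)]
    return share(a), [share(b) for b in bs], [share(a * b % MOD) for b in bs]


def multiply_reused(x_sh, y_shs, triples):
    """Online phase: shares of x * y_j for every j, opening x - a only once."""
    a_sh, b_shs, c_shs = triples
    # Opened once, however many products reuse x: this is where reuse saves communication.
    e = reconstruct([(x_sh[i] - a_sh[i]) % MOD for i in range(2)])
    results = []
    for y_sh, b_sh, c_sh in zip(y_shs, b_shs, c_shs):
        f = reconstruct([(y_sh[i] - b_sh[i]) % MOD for i in range(2)])
        # x*y = c + e*b + f*a + e*f; only party 0 adds the public term e*f.
        z_sh = tuple(
            (c_sh[i] + e * b_sh[i] + f * a_sh[i] + (e * f if i == 0 else 0)) % MOD
            for i in range(2)
        )
        results.append(z_sh)
    return results
```

In a real deployment, e and each f would be exchanged between the two parties over the network; precomputing the triples is what moves work out of the latency-critical online phase.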
Human sensing indoors in RF utilising unlabeled sensor streams
Indoor human sensing in radio frequencies is crucial for non-invasive, privacy-preserving digital healthcare, and machine learning is the backbone of such systems. Changes in the environment negatively affect the quality of learned mappings, which necessitates a semi-supervised approach that uses the unlabeled data stream to let the learner refine its hypothesis over time.
We first explore the ambulation classification problem with frequency-modulated continuous-wave (FMCW) radar, replacing manual feature engineering with inductive bias in the architectural choices of the neural network. We demonstrate that key ambulations (walk, bend, sit-to-stand and stand-to-sit) can be distinguished with high accuracy. We then apply variational autoencoders to explore unsupervised localisation in synthetic grayscale images, finding that the goal is achievable with an encoder that captures temporal structure.
Next, we evaluate temporal contrastive learning as a method of using unlabeled sensor streams in fingerprinting localisation, finding that it reliably defines a notion of pairwise distance on the data: it improves nearest-neighbour classification both by reducing the number of other-class items in same-class clusters and by increasing the pairwise distance contrast. Compared with the state of the art in indoor fingerprinting localisation, our contribution is that we successfully address the unsupervised domain adaptation problem.
Finally, we raise the hypothesis that some knowledge can be shared between learners in different houses in a privacy-preserving manner. We adapt federated learning (FL) to the multi-residence indoor localisation scenario, which has not been done before, and propose a local fine-tuning algorithm with acceptance based on improvement in local validation error. We find that with tuned FL each client obtains a better personalised model than with benchmark FL, while learning dynamics remain smooth for all clients.
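The acceptance rule described above can be read as a simple guard around local fine-tuning: a client keeps its fine-tuned model only if it lowers the error on the client's own validation data. The sketch below assumes a single round and treats parameters abstractly; finetune and val_error are hypothetical callables, not part of any specific FL framework.

```python
def local_finetune_with_acceptance(global_params, finetune, val_error):
    """Return the client's model for the next round.

    global_params -- parameters received from the federated server
    finetune      -- function mapping parameters to locally fine-tuned parameters
    val_error     -- function returning the error on the client's own validation set

    The fine-tuned model is accepted only if it lowers the local validation error;
    otherwise the client keeps the global model.
    """
    candidate = finetune(global_params)
    if val_error(candidate) < val_error(global_params):
        return candidate
    return global_params
```

For instance, with a one-parameter model whose local validation error is (w − 3)², a fine-tuning step from w = 0 to w = 1.5 is accepted, whereas a step to w = −1 is rejected and the client keeps w = 0. Rejecting harmful updates in this way is one plausible reason learning dynamics stay smooth for every client.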
A Web Based Solution to Track Trawl Vessel Activities Over Pipelines in Norwegian Continental Shelf
Master's thesis in Computer Science
Vessel activities such as trawling and anchoring pose a risk to offshore marine structures such as pipelines, subsea structures, cables and platforms. Third-party interference is a major contributor to the damage and failure statistics for subsea pipelines. Detecting such activity at an early stage increases the probability of introducing cost-efficient mitigation measures before costly repairs become necessary. The main goal of this study is to develop an interactive web-based solution to track and monitor trawl vessel activities on the Norwegian Continental Shelf that can be used to assess the integrity of pipelines. Vessels continuously broadcast their location and identity via the Universal Shipborne Automatic Identification System (AIS), with reports refreshed at varying time intervals. Hence, there are billions of data points and terabytes of data to feed into our computer systems. Making sense of them poses many challenges, the main one being to identify the type of fishing vessel. This problem is important because identifying the vessel type is a prerequisite for recognizing trawling activities. Trawl patterns have been shown to change over time, sometimes because a new pipeline has been installed. Detailed information about trawl activity is essential for an accurate assessment of where to inspect and where to implement corrective intervention, based on up-to-date trawling intensity and the equipment used. The main contribution of this thesis is a machine learning approach to identify the type of fishing vessels, together with a web-based solution for detailed analysis of trawl vessel activities over the pipelines in a chosen area of interest.
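A basic building block of analysing vessel activity over pipelines in a chosen area is deciding which AIS positions fall within a buffer around a pipeline route. The sketch below assumes that positions and pipeline vertices have already been projected to planar coordinates in metres (raw AIS latitude/longitude would first need projecting). The buffer width is a hypothetical parameter, and the code is not the thesis's implementation.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to segment ab (planar coordinates, e.g. metres)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Projection parameter of p onto the segment, clamped to [0, 1].
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def positions_near_pipeline(positions, pipeline, buffer_m):
    """Return the AIS positions lying within buffer_m of the pipeline polyline."""
    return [
        p for p in positions
        if any(point_segment_distance(p, a, b) <= buffer_m
               for a, b in zip(pipeline, pipeline[1:]))
    ]
```

Combined with a vessel-type classifier, filtering positions this way restricts the trawl-intensity analysis to fishing vessels that actually pass over the pipeline.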