256 research outputs found

    Secure k-ish Nearest Neighbors Classifier

    In machine learning, classifiers are used to predict the class of a given query based on an existing (classified) database. Given a database S of n d-dimensional points and a d-dimensional query q, the k-nearest neighbors (kNN) classifier assigns q the majority class of its k nearest neighbors in S. In the secure version of kNN, S and q are owned by two different parties that do not want to share their data. Unfortunately, all known solutions for secure kNN either require high communication complexity between the parties or are very inefficient to run. In this work we present a classifier based on kNN that can be implemented efficiently with homomorphic encryption (HE). The efficiency of our classifier comes from a relaxation we make on kNN, where we allow it to consider κ nearest neighbors for κ ≈ k with some probability. We therefore call our classifier k-ish Nearest Neighbors (k-ish NN). The success probability of our solution depends on the distribution of the distances from q to S and increases as its statistical distance to Gaussian decreases. To implement our classifier we introduce the concept of a doubly-blinded coin-toss. In a doubly-blinded coin-toss, the success probability as well as the output of the toss are encrypted. We use this coin-toss to efficiently approximate the average and variance of the distances from q to S. We believe these two techniques may be of independent interest. When implemented with HE, the k-ish NN has a circuit depth that is independent of n, therefore making it scalable. We also implemented our classifier in an open source library based on HELib and tested it on a breast tumor database. The accuracy of our classifier (F_1 score) was 98% and classification took less than 3 hours, compared to (estimated) weeks in current HE implementations.
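
For orientation, the plain (non-secure) kNN rule described in this abstract can be sketched as follows. This is only the plaintext baseline: it involves no homomorphic encryption and is not the k-ish NN protocol. The function name `knn_classify` and the toy data are hypothetical, chosen for illustration.

```python
from collections import Counter
import math


def knn_classify(points, labels, query, k):
    """Return the majority label among the k points closest to query.

    Plaintext illustration only; points and query are sequences of numbers
    of equal dimension, and labels[i] is the class of points[i].
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    # Pair each Euclidean distance with its label, then sort by distance.
    by_distance = sorted(
        (math.dist(p, query), label) for p, label in zip(points, labels)
    )
    # Count the labels of the k nearest points and return the most frequent.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Toy database S with two well-separated classes (hypothetical data).
S = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
classes = ["a", "a", "a", "b", "b", "b"]
print(knn_classify(S, classes, (0.5, 0.5), k=3))
```

The sort makes the cost depend on n, which is the part the abstract says the relaxed k-ish variant avoids in circuit depth when run under HE.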

    SANNS: Scaling Up Secure Approximate k-Nearest Neighbors Search

    The k-Nearest Neighbor Search (k-NNS) is the backbone of several cloud-based services such as recommender systems, face recognition, and database search on text and images. In these services, the client sends the query to the cloud server and receives the response, in which case the query and response are revealed to the service provider. Such data disclosures are unacceptable in several scenarios due to the sensitivity of data and/or privacy laws. In this paper, we introduce SANNS, a system for secure k-NNS that keeps the client's query and the search result confidential. SANNS comprises two protocols: an optimized linear scan and a protocol based on a novel sublinear time clustering-based algorithm. We prove the security of both protocols in the standard semi-honest model. The protocols are built upon several state-of-the-art cryptographic primitives such as lattice-based additively homomorphic encryption, distributed oblivious RAM, and garbled circuits. We provide several contributions to each of these primitives which are applicable to other secure computation tasks. Both of our protocols rely on a new circuit for the approximate top-k selection from n numbers that is built from O(n + k^2) comparators. We have implemented our proposed system and performed extensive experiments on four datasets in two different computation environments, demonstrating 18-31× faster response time compared to optimally implemented protocols from the prior work. Moreover, SANNS is the first work that scales to a database of 10 million entries, pushing the limit by more than two orders of magnitude. Comment: 18 pages, to appear at USENIX Security Symposium 202
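
To give a feel for what "approximate top-k selection" can mean, here is a plaintext sketch of one simple binning strategy: shuffle the n numbers, split them into bins, keep the minimum of each bin, then select the k smallest among the bin minima. This is an assumption made for illustration; the abstract does not describe the circuit, and nothing below is claimed to be the authors' construction. The name `approx_top_k_smallest` and the parameters `num_bins` and `seed` are hypothetical.

```python
import heapq
import random


def approx_top_k_smallest(values, k, num_bins, seed=0):
    """Approximate the k smallest values using random binning.

    Every returned value comes from the input, and the global minimum is
    always included, but two of the true k smallest values may land in the
    same bin, in which case one of them is missed.
    """
    if not 0 < k <= num_bins <= len(values):
        raise ValueError("need 0 < k <= num_bins <= len(values)")
    rng = random.Random(seed)
    shuffled = list(values)
    rng.shuffle(shuffled)
    # Bin i holds every num_bins-th element starting at index i;
    # all bins are non-empty because num_bins <= len(values).
    bin_minima = [min(shuffled[i::num_bins]) for i in range(num_bins)]
    # Exact selection of the k smallest among the bin minima, in ascending order.
    return heapq.nsmallest(k, bin_minima)


print(approx_top_k_smallest(list(range(100)), k=5, num_bins=20))
```

In this sketch the bin minima cost about n comparisons in total, and a naive exact selection over a number of bins proportional to k costs on the order of k^2 comparisons, which has the same shape as the O(n + k^2) figure quoted above. More bins lower the chance of a collision among the true top k at the price of a larger second stage.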

    Application of the K-Nearest Neighbor Method to Classify the Spread of Dengue Hemorrhagic Fever (DHF) Cases in Kabupaten Maluku Tenggara

    Dengue Hemorrhagic Fever (DHF) is a disease caused by the dengue virus, which is transmitted through the bite of the Aedes aegypti mosquito and enters the human bloodstream. It is a dangerous disease that often causes public concern because it progresses quickly and can cause death within a short time. It is therefore necessary to study its spread so that action can be taken quickly to prevent cases. One method that can be used is the K-Nearest Neighbor (KNN) method. KNN is a form of decision-support model that can classify data based on the nearest distance. The data used in this study come from BPS Kabupaten Maluku Tenggara for 2021. The study found two groups of DHF spread in Kabupaten Maluku Tenggara. The districts (kecamatan) with high DHF potential are Kei Besar, Kei Kecil, and Kei Besar Selatan Barat. The districts with low potential for DHF spread are Kei Besar Utara Barat, Kei Besar Selatan, Kei Besar Timur Selatan, Kei Besar Utara Timur, Hoat Sorbay, Kei Kecil Barat, Kei Kecil Timur, and Manyeuw. Keywords: Dengue Hemorrhagic Fever, K-Nearest Neighbor, Maluku Tenggara

    Comparative Study of Supervised Learning Methods for Malware Analysis, Journal of Telecommunications and Information Technology, 2014, nr 4

    Malware is software designed to disrupt or even damage a computer system or perform other unwanted actions. Nowadays, malware is a common threat on the World Wide Web. Anti-malware protection and intrusion detection can be significantly supported by a comprehensive and extensive analysis of data on the Web. The aim of such analysis is the classification of the collected data into two sets, i.e., normal and malicious data. In this paper the authors investigate the use of three supervised learning methods for data mining to support malware detection. The results of applying Support Vector Machine, Naive Bayes and k-Nearest Neighbors techniques to the classification of data taken from devices located in many units, organizations and monitoring systems serviced by CERT Poland are described. The performance of all methods is compared and discussed. The results of the experiments show that supervised learning algorithms can be successfully used for computer data analysis and can support computer emergency response teams in threat detection.

    Efficiency and Accuracy Enhancement of Intrusion Detection System Using Feature Selection and Cross-layer Mechanism

    The dramatic increase in the number of connected devices and the significant growth of network traffic data have led to many security vulnerabilities and cyber-attacks. Hence, developing new methods to secure the network infrastructure and protect data from malicious and unauthorized access has become a vital aspect of communication network design. Intrusion Detection Systems (IDSs), a widely used security technique, are critical for detecting network attacks and unauthorized network access and thus minimizing further cyber-attack damage. However, a number of weaknesses need to be addressed to make IDSs reliable for real-world applications. One of the fundamental challenges is the large amount of redundant and irrelevant data. Feature selection emerges as a necessary step in efficient IDS design to overcome the high-dimensionality problem and enhance the performance of the IDS by reducing its complexity and accelerating the detection process. Moreover, the detection algorithm has a significant impact on the performance of the IDS. Machine learning techniques are widely used in such systems and are studied in detail in this dissertation. One of the most destructive activities in wireless networks such as MANET is packet dropping. The presence of intrusive attackers in the network is not the only cause of packet loss; packet drops can also occur because of a faulty network. Hence, to detect packet dropping caused by the malicious activity of an attacker effectively, information from various layers of the protocol is needed. To this end, a novel cross-layer design for malicious packet loss detection in MANET is proposed, using features from the physical layer, network layer and MAC layer to make a better detection decision. A trust-based mechanism is adopted in this design and a packet-loss-free routing algorithm is presented accordingly.

    FAPRIL: Towards Faster Privacy-Preserving Fingerprint-Based Localization

    Fingerprinting is a commonly used technique to provide accurate localization for indoor areas, where global navigation satellite systems, such as GPS and Galileo, cannot function or are not precise enough. Although fingerprint-based indoor localization has gained wide popularity, existing solutions that preserve privacy either rely on non-colluding servers or have high communication costs, which hinders deployment. In this work we present FAPRIL, a privacy-preserving indoor localization scheme, which takes advantage of the latest secure two-party computation protocol improvements. We can split our scheme into two parts: an input-independent setup phase and an online phase. We concentrate on optimizing the online phase for mobile clients who run on a mobile data plan and observe that recurring operands allow the total communication overhead to be optimized even further. Our observation can be generalized, e.g., to improve multiplication of arithmetic secret-shared matrices. We implement FAPRIL on mobile devices and our benchmarks over a simulated LTE network show that the online phase of a private localization takes under 0.15 seconds with less than 0.20 megabytes of communication even for large buildings. The setup phase, which can be pre-computed, depends heavily on the setting but stays in the range 0.28 - 4.14 seconds and 0.69 - 16.00 megabytes per localization query. The round complexity of FAPRIL is constant for both phases.

    Human sensing indoors in RF utilising unlabeled sensor streams

    Indoor human sensing in radio frequencies is crucial for non-invasive, privacy-preserving digital healthcare, and machine learning is the backbone of such systems. Changes in the environment negatively affect the quality of learned mappings, which necessitates a semi-supervised approach that makes use of the unlabeled data stream to allow the learner to refine its hypothesis over time. We first explore the ambulation classification problem with frequency modulated continuous wave (FMCW) radar, replacing manual feature engineering with inductive bias in the architectural choices of the neural network. We demonstrate that key ambulations (walk, bend, sit to stand and stand to sit) can be distinguished with high accuracy. We then apply variational autoencoders to explore unsupervised localisation in synthetic grayscale images, finding that the goal is achievable with the choice of an encoder that encodes temporal structure. Next, we evaluate temporal contrastive learning as a method of using unlabeled sensor streams in fingerprinting localisation, finding that it is a reliable method of defining a notion of pairwise distance on the data, in that it improves classification with the nearest neighbour classifier by both reducing the number of other-class items in same-class clusters and increasing the pairwise distance contrast. Compared to the state of the art in indoor fingerprinting localisation, our contribution is that we successfully address the unsupervised domain adaptation problem. Finally, we raise the hypothesis that some knowledge can be shared between learners in different houses in a privacy-preserving manner. We adapt federated learning (FL) to the multi-residence indoor localisation scenario, which has not been done before, and propose a local fine-tuning algorithm with acceptance based on local validation error improvement. We find that with the tuned FL each client has a better personalised model compared to benchmark FL, while learning dynamics remain smooth for all clients.

    A Web Based Solution to Track Trawl Vessel Activities Over Pipelines in Norwegian Continental Shelf

    Master's thesis in Computer Science. Vessel activities such as trawling and anchoring represent a risk to offshore marine structures such as pipelines, subsea structures, cables and platforms. Third party interference is a major contributor to the damage and failure statistics for subsea pipelines. Detecting such activity at an early stage increases the probability of introducing cost-efficient mitigation measures before costly repairs are necessary. The main goal of this study is to develop an interactive web-based solution to track and monitor trawl vessel activities in the Norwegian Continental Shelf, which can be used for assessing the integrity of pipelines. Vessels share their location and identity via the Universal Shipborne Automatic Identification System (AIS) over a 24-hour period, refreshing at different time intervals. Hence, there are billions of data points and terabytes of data to feed into our computer systems. Making sense of them poses many challenges, of which the main one is to identify the type of the fishing vessel. This problem is important because identifying the vessel type is the preliminary step in recognizing trawling activities. Trawl patterns have been shown to change over time, and sometimes also because of a new pipeline being installed. Detailed information about the trawl activity is essential for an accurate assessment of where to inspect and where to implement corrective intervention, based on up-to-date trawling intensity and the equipment used. The main contribution of this thesis is to implement a machine learning approach to identify the type of fishing vessels and provide a web-based solution to perform detailed analysis of trawl vessel activities over the pipelines for a chosen area of interest.