
    The Variance of the Number of 2-Protected Nodes in a Trie

    We derive an asymptotic expression for the variance of the number of 2-protected nodes (neither leaves nor parents of leaves) in a binary trie. In an unbiased trie on n leaves we find, for example, that the variance is approximately 0.934n plus small fluctuations (also of order n); but our result covers the general (biased) case as well. Our proof relies on the asymptotic similarities between a trie and its Poissonized counterpart, whose behavior we glean via the Mellin transform and singularity analysis.
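
    As an illustration of the quantity being studied, the following minimal sketch builds a binary trie from random bit strings and counts its 2-protected nodes. It is not taken from the paper: the names `build_trie`, `count_two_protected`, and `random_keys` are hypothetical, and the simulation only produces a sample variance, not the asymptotic expression derived in the abstract.

```python
import random

LEAF = "leaf"  # marker for an external node holding exactly one key


def build_trie(keys, depth=0):
    """Build a binary trie over distinct, equal-length bit tuples.

    Returns None for an empty subtree, LEAF for a single key, and a
    (left, right) pair for an internal node.
    """
    if not keys:
        return None
    if len(keys) == 1:
        return LEAF
    left = [k for k in keys if k[depth] == 0]
    right = [k for k in keys if k[depth] == 1]
    return (build_trie(left, depth + 1), build_trie(right, depth + 1))


def count_two_protected(node):
    """Count nodes that are neither leaves nor parents of leaves."""
    if node is None or node is LEAF:
        return 0
    left, right = node
    own = 0 if (left is LEAF or right is LEAF) else 1
    return own + count_two_protected(left) + count_two_protected(right)


def random_keys(n, bits=64, p=0.5, rng=random):
    """Draw n distinct keys whose bits are 1 with probability p (assumed model)."""
    keys = set()
    while len(keys) < n:
        keys.add(tuple(int(rng.random() < p) for _ in range(bits)))
    return list(keys)


if __name__ == "__main__":
    rng = random.Random(0)
    n, trials = 200, 100
    counts = [
        count_two_protected(build_trie(random_keys(n, rng=rng)))
        for _ in range(trials)
    ]
    mean = sum(counts) / trials
    var = sum((c - mean) ** 2 for c in counts) / (trials - 1)
    print(f"n={n}: sample mean {mean:.1f}, sample variance {var:.1f}")
```

    Setting `p` away from 0.5 gives the biased case; the sample variance from a run like this is noisy and should be read only as a rough sanity check.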

    Fast Approximate Reconciliation of Set Differences

    We present new, simple, efficient data structures for approximate reconciliation of set differences, a useful standalone primitive for peer-to-peer networks and a natural subroutine in methods for exact reconciliation. In the approximate reconciliation problem, peers A and B respectively have subsets of elements S_A and S_B of a large universe U. Peer A wishes to send a short message M to peer B with the goal that B should use M to determine as many elements in the set S_B - S_A as possible. To avoid the expense of round-trip communication times, we focus on the situation where a single message M is sent. We motivate the performance tradeoffs between message size, accuracy, and computation time for this problem with a straightforward approach using Bloom filters. We then introduce approximation reconciliation trees, a more computationally efficient solution that combines techniques from Patricia tries, Merkle trees, and Bloom filters. We present an analysis of approximation reconciliation trees and provide experimental results comparing the various methods proposed for approximate reconciliation. Funding: National Science Foundation (ANI-0093296, ANI-9986397, CCR-0118701, CCR-0121154); Alfred P. Sloan Research Fellowship
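
    The straightforward Bloom filter approach mentioned above can be sketched as follows: peer A sends a Bloom filter of S_A as the message M, and peer B reports every element of S_B that the filter rejects. This is a minimal sketch under assumptions, not the authors' implementation: the `BloomFilter` class, the SHA-256 based hashing, and the parameter values are all illustrative choices. Because a Bloom filter has no false negatives, every reported element truly lies in S_B - S_A, while false positives cause some elements of the difference to be missed.

```python
import hashlib


class BloomFilter:
    """A small Bloom filter; hashing scheme and sizes are assumptions."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def build_message(s_a, num_bits=16_000, num_hashes=4):
    """Peer A: summarize S_A in a single message M."""
    message = BloomFilter(num_bits, num_hashes)
    for item in s_a:
        message.add(item)
    return message


def approximate_difference(message, s_b):
    """Peer B: elements of S_B that are definitely not in S_A."""
    return {item for item in s_b if item not in message}


if __name__ == "__main__":
    s_a = set(range(0, 1000))
    s_b = set(range(500, 1500))
    found = approximate_difference(build_message(s_a), s_b)
    print(f"found {len(found)} of {len(s_b - s_a)} differing elements")
```

    Growing `num_bits` lowers the false positive rate and so recovers more of the difference, at the cost of a larger message; this is the size versus accuracy tradeoff the abstract refers to.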

    Secure Hot Path Crowdsourcing with Local Differential Privacy under Fog Computing Architecture

    Crowdsourcing plays an essential role in the Internet of Things (IoT) for data collection, where a group of workers is equipped with Internet-connected geolocated devices to collect sensor data for marketing or research purposes. In this paper, we consider crowdsourcing these workers' hot travel paths. Each worker is required to report his real-time location information, which is sensitive and has to be protected. Encryption-based methods are the most direct way to protect the location, but they are not suitable for resource-limited devices. Local differential privacy, in turn, is a strong privacy concept and has been deployed in many software systems. However, local differential privacy needs a large number of participants to ensure the accuracy of the estimation, which is not always the case for crowdsourcing. To solve this problem, we propose a trie-based iterative statistic method, which combines additive secret sharing and local differential privacy technologies. The proposed method has excellent performance even with a limited number of participants and without the need for complex computation. Specifically, the proposed method contains three main components: iterative statistics, adaptive sampling, and secure reporting. We theoretically analyze the effectiveness of the proposed method and perform extensive experiments to show that the proposed method not only provides a strict privacy guarantee, but also significantly improves performance over existing solutions. Comment: This paper appears in IEEE Transactions on Services Computing. https://doi.org/10.1109/TSC.2020.303933
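
    Additive secret sharing, one of the two building blocks named above, can be sketched in a few lines. This is a generic illustration of the primitive, not the paper's protocol: the modulus, the two-server setup, and the function names `share`, `reconstruct`, and `aggregate` are assumptions, and the local differential privacy noise and the trie-based iteration are omitted.

```python
import secrets

MODULUS = 2**61 - 1  # assumed modulus; any value larger than the true sum works


def share(value, num_shares=2):
    """Split value into num_shares random shares that sum to it mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(num_shares - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def reconstruct(shares):
    """Recombine shares; any proper subset alone reveals nothing about the value."""
    return sum(shares) % MODULUS


def aggregate(all_shares):
    """Each server sums the shares it received; only the total is revealed."""
    per_server = [sum(column) % MODULUS for column in zip(*all_shares)]
    return reconstruct(per_server)


if __name__ == "__main__":
    # Hypothetical reports: 1 if a worker passed through a given path prefix.
    reports = [1, 0, 1, 1, 0]
    total = aggregate([share(r) for r in reports])
    print(total)
```

    The servers learn only the aggregate count, and each worker performs nothing more expensive than a few modular additions, which is in line with the abstract's emphasis on resource-limited devices.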

    Anomaly-Based Intrusion Detection by Modeling Probability Distributions of Flow Characteristics

    In recent years, with the increased use of network communication, the risk of information being compromised has grown immensely. Intrusions have evolved and become more sophisticated. Hence, classical detection systems show poor performance in detecting novel attacks. Although much research has been devoted to improving the performance of intrusion detection systems, few methods can achieve consistently efficient results with the constant changes in network communications. This thesis proposes an intrusion detection system based on modeling distributions of network flow statistics in order to achieve a high detection rate for known and stealthy attacks. The proposed model aggregates the traffic at the IP subnetwork level using a hierarchical heavy hitters algorithm. This aggregated traffic is used to build the distribution of network statistics for the most frequent IPv4 addresses encountered as destinations. The obtained probability density functions are learned by the Extreme Learning Machine method, which is a single-hidden-layer feedforward neural network. In this thesis, different sequential and batch learning strategies are proposed in order to analyze the efficiency of the proposed approach. The performance of the model is evaluated on the ISCX-IDS 2012 dataset, consisting of injection attacks, HTTP flooding, DDoS, and brute force intrusions. The experimental results of the thesis indicate that the presented method achieves an average detection rate of 91% while having a low misclassification rate of 9%, which is on par with the state-of-the-art approaches using this dataset. In addition, the proposed method can be utilized as a network behavior analysis tool specifically for DDoS mitigation, since it can isolate aggregated IPv4 addresses from the rest of the network traffic, thus supporting filtering out DDoS attacks.
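
    To illustrate subnet-level aggregation with hierarchical heavy hitters, the sketch below uses an exact, offline variant: it counts destination addresses at prefix lengths /32, /24, /16, and /8, reports a prefix once its residual count reaches a threshold, and then excludes that traffic from coarser levels. The thesis does not specify its algorithm here, so the prefix levels, the threshold, and the function name are assumptions; streaming variants used in practice approximate these counts.

```python
import ipaddress
from collections import Counter


def hierarchical_heavy_hitters(addresses, threshold, prefix_lengths=(32, 24, 16, 8)):
    """Return (prefix, count) pairs whose residual count is >= threshold.

    Prefix lengths are visited from most to least specific. Traffic already
    attributed to a reported prefix is not counted again at coarser levels.
    """
    residual = Counter(int(ipaddress.IPv4Address(a)) for a in addresses)
    result = []
    for plen in prefix_lengths:
        mask = (0xFFFFFFFF << (32 - plen)) & 0xFFFFFFFF
        grouped = Counter()
        for addr, count in residual.items():
            grouped[addr & mask] += count
        heavy = {net for net, count in grouped.items() if count >= threshold}
        for net in heavy:
            network = ipaddress.IPv4Network((net, plen))
            result.append((str(network), grouped[net]))
        # Drop the addresses covered by prefixes reported at this level.
        residual = Counter(
            {a: c for a, c in residual.items() if (a & mask) not in heavy}
        )
    return result


if __name__ == "__main__":
    destinations = (
        ["10.0.0.1"] * 5
        + ["10.0.1.1", "10.0.1.2", "10.0.1.3", "10.0.1.4"]
        + ["10.0.2.1", "10.0.2.2", "192.168.1.1"]
    )
    for prefix, count in hierarchical_heavy_hitters(destinations, threshold=4):
        print(prefix, count)
```

    In this toy input a single busy host is reported as a /32, while four lightly used hosts in the same /24 are reported together as one subnet, which is the kind of aggregate a per-destination distribution could then be built on.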

    Deployable filtering architectures against large denial-of-service attacks

    Denial-of-Service attacks continue to grow in size and frequency despite serious underreporting. While several research solutions have been proposed over the years, they have faced important deployment hurdles that have prevented them from seeing any significant level of deployment on the Internet. Commercial solutions exist, but they are costly and generally are not meant to scale to Internet-wide levels. In this thesis we present three filtering architectures against large Denial-of-Service attacks. Their emphasis is on providing an effective solution against such attacks while using simple mechanisms in order to overcome the deployment hurdles faced by other solutions. While these are well-suited to being implemented in fast routing hardware, in the early stages of deployment this is unlikely to be the case. Because of this, we implemented them on low-cost off-the-shelf hardware and evaluated their performance on a network testbed. The results are very encouraging: this setup allows us to forward traffic on a single PC at rates of millions of packets per second even for minimum-sized packets, while at the same time processing as many as one million filters; this gives us confidence that the architecture as a whole could combat even the large botnets currently being reported. Better yet, we show that this single-PC performance scales well with the number of CPU cores and network interfaces, which is promising for our solutions if we consider the current trend in processor design. In addition to using simple mechanisms, we discuss how the architectures provide clear incentives for ISPs that adopt them early, both at the destination and at the sources of attacks. The hope is that these will be sufficient to achieve some level of initial deployment. The larger goal is to have an architectural solution against large DoS attacks deployed before even more harmful attacks take place; this thesis is hopefully a step in that direction.

    Differentially Private Vertical Federated Clustering

    In many applications, multiple parties have private data regarding the same set of users but on disjoint sets of attributes, and a server wants to leverage the data to train a model. To enable model learning while protecting the privacy of the data subjects, we need vertical federated learning (VFL) techniques, where the data parties share only information for training the model, instead of the private data. However, it is challenging to ensure that the shared information maintains privacy while learning accurate models. To the best of our knowledge, the algorithm proposed in this paper is the first practical solution for differentially private vertical federated k-means clustering, where the server can obtain a set of global centers with a provable differential privacy guarantee. Our algorithm assumes an untrusted central server that aggregates differentially private local centers and membership encodings from local data parties. It builds a weighted grid as the synopsis of the global dataset based on the received information. Final centers are generated by running any k-means algorithm on the weighted grid. Our approach for grid weight estimation uses a novel, lightweight, and differentially private set intersection cardinality estimation algorithm based on the Flajolet-Martin sketch. To improve the estimation accuracy in the setting with more than two data parties, we further propose a refined version of the weight estimation algorithm and a parameter tuning strategy to bring the final k-means utility close to that in the central private setting. We provide theoretical utility analysis and experimental evaluation results for the cluster centers computed by our algorithm and show that our approach performs better, both theoretically and empirically, than the two baselines based on existing techniques.

    Exploring algorithms to recognize similar board states in Arimaa

    The game of Arimaa was invented as a challenge to the field of game-playing artificial intelligence, which had grown somewhat haughty after IBM's supercomputer Deep Blue trounced world champion Kasparov at chess. Although Arimaa is simple enough for a child to learn and can be played with an ordinary chess set, existing game-playing algorithms and techniques have had a difficult time rising up to the challenge of defeating the world's best human Arimaa players, mainly due to the game's impressive branching factor. This thesis introduces and analyzes new algorithms and techniques that attempt to recognize similar board states based on relative piece strength in a concentrated area of the board. Using this data, game-playing programs would be able to recognize patterns in order to discern tactics and moves that could lead to victory or defeat in similar situations based on prior experience.

    Normal Limiting Distribution of the Size of Binary Interval Trees

    The limiting distribution of the size of a binary interval tree is investigated. Our approach is based on the contraction method, and it is quite different from the case of the one-sided binary interval tree. First, we build a distributional recursive equation for the size. Then, we derive the expectation, the variance, and some higher-order moments. Finally, it is shown that the size (with suitable standardization) converges to a standard normal random variable in the Zolotarev metric.