50 research outputs found

    Efficient approximation of Earth Mover's Distance Based on Nearest Neighbor Search

    Earth Mover's Distance (EMD) is an important similarity measure between two distributions, used in computer vision and many other application domains. However, its exact calculation is computationally and memory intensive, which hinders its scalability and applicability for large-scale problems. Various approximate EMD algorithms have been proposed to reduce computational costs, but they suffer from lower accuracy and may require additional memory usage or manual parameter tuning. In this paper, we present a novel approach, NNS-EMD, to approximate EMD using Nearest Neighbor Search (NNS), in order to achieve high accuracy, low time complexity, and high memory efficiency. The NNS operation reduces the number of data points compared in each NNS iteration and offers opportunities for parallel processing. We further accelerate NNS-EMD via vectorization on GPU, which is especially beneficial for large datasets. We compare NNS-EMD with both the exact EMD and state-of-the-art approximate EMD algorithms on image classification and retrieval tasks. We also apply NNS-EMD to calculate transport mapping and realize color transfer between images. NNS-EMD can be 44x to 135x faster than the exact EMD implementation, and achieves superior accuracy, speedup, and memory efficiency over existing approximate EMD methods.

    Algorithmic Acceleration of B/FV-like Somewhat Homomorphic Encryption for Compute-Enabled RAM

    Somewhat Homomorphic Encryption (SHE) allows arbitrary computation with finite multiplicative depths to be performed on encrypted data, but its overhead is high due to memory transfer incurred by large ciphertexts. Recent research has recognized the shortcomings of general-purpose computing for high-performance SHE, and has begun to pioneer the use of hardware-based SHE acceleration with hardware including FPGAs, GPUs, and Compute-Enabled RAM (CE-RAM). CE-RAM is well-suited for SHE, as it is not limited by the separation between memory and processing that bottlenecks other hardware. Further, CE-RAM does not move data between different processing elements. Recent research has shown the high effectiveness of CE-RAM for SHE as compared to highly-optimized CPU and FPGA implementations. However, algorithmic optimization for the implementation on CE-RAM is underexplored. In this work, we examine the effect of existing algorithmic optimizations upon a CE-RAM implementation of the B/FV scheme, and further introduce novel optimization techniques for the Full RNS Variant of B/FV. Our experiments show speedups of up to 784x for homomorphic multiplication, 143x for decryption, and 330x for encryption against a CPU implementation. We also compare our approach to similar work in CE-RAM, FPGA, and GPU acceleration, and note general improvement over existing work. In particular, for homomorphic multiplication we see speedups of 506.5x against CE-RAM, 66.85x against FPGA, and 30.8x against GPU as compared to existing work in hardware acceleration of B/FV.

    Emerging Technology Based Design of Primitives for Hardware Security

    Hardware security concerns such as IP piracy and hardware Trojans have triggered research into circuit protection and malicious logic detection from various design perspectives. In this paper, emerging technologies are investigated by leveraging their unique properties for applications in the hardware security domain. Five example circuit structures, including camouflaging gates, polymorphic gates, current/voltage-based circuit protectors, and current-based XOR logic, are designed to prove the high efficiency of Silicon NanoWire FETs and Graphene SymFET in applications such as circuit protection and IP piracy prevention. Simulation results indicate that highly efficient and secure circuit structures can be achieved via the use of emerging technologies.

    Notes on Interconnection Networks for PIM 1

    “On-chip interconnection networks are becoming increasingly important for SoCs and CMPs. Power and wire constraints are forcing the adoption of new design methodologies for systems-on-chip (SOC) – namely those that incorporate explicit parallelism. To enable these MP-SoC platforms, researchers have recently pursued scaleable communication centric fabrics, (i.e. networks-on-chip – NOC), which possess many features that are particularl