Efficient approximation of Earth Mover's Distance Based on Nearest Neighbor Search
Earth Mover's Distance (EMD) is an important similarity measure between two
distributions, used in computer vision and many other application domains.
However, its exact calculation is computationally and memory intensive, which
hinders its scalability and applicability for large-scale problems. Various
approximate EMD algorithms have been proposed to reduce computational costs,
but they suffer from lower accuracy and may require additional memory usage or
manual parameter tuning. In this paper, we present a novel approach, NNS-EMD,
to approximate EMD using Nearest Neighbor Search (NNS), in order to achieve
high accuracy, low time complexity, and high memory efficiency. The NNS
operation reduces the number of data points compared in each NNS iteration and
offers opportunities for parallel processing. We further accelerate NNS-EMD via
vectorization on GPU, which is especially beneficial for large datasets. We
compare NNS-EMD with both the exact EMD and state-of-the-art approximate EMD
algorithms on image classification and retrieval tasks. We also apply NNS-EMD
to calculate transport mapping and realize color transfer between images.
NNS-EMD can be 44x to 135x faster than the exact EMD implementation, and
achieves superior accuracy, speedup, and memory efficiency over existing
approximate EMD methods.
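The idea of shrinking the set of compared points across nearest-neighbor rounds can be illustrated with a minimal sketch. This is not the NNS-EMD algorithm itself, whose details the abstract does not give; it is a simple greedy heuristic, and the function name `greedy_nn_emd`, its parameters, and the tolerance `eps` are assumptions made for illustration. Points are assumed to be rows of a 2-D array with non-negative weights.

```python
import numpy as np

def greedy_nn_emd(xs, wx, ys, wy, eps=1e-12):
    """Greedy nearest-neighbor estimate of EMD (illustrative, not NNS-EMD)."""
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    supply = np.array(wx, dtype=float)   # remaining mass at each source point
    demand = np.array(wy, dtype=float)   # remaining capacity at each target point
    # Pairwise Euclidean distances between all sources and targets.
    dist = np.linalg.norm(xs[:, None, :] - ys[None, :, :], axis=2)
    cost = 0.0
    total_flow = 0.0
    while True:
        src = np.flatnonzero(supply > eps)   # sources that still hold mass
        dst = np.flatnonzero(demand > eps)   # targets that can still accept mass
        if src.size == 0 or dst.size == 0:
            break
        sub = dist[np.ix_(src, dst)]
        # Nearest remaining target for every remaining source.
        nearest = dst[np.argmin(sub, axis=1)]
        # Serve the closest pairs first within this round.
        for k in np.argsort(sub.min(axis=1)):
            i, j = src[k], nearest[k]
            flow = min(supply[i], demand[j])
            supply[i] -= flow
            demand[j] -= flow
            cost += flow * dist[i, j]
            total_flow += flow
    # EMD is the transport cost normalized by the total mass moved.
    return cost / total_flow if total_flow > 0 else 0.0
```

Each round exhausts at least one point, so later rounds search over fewer candidates. Because the greedy flow is feasible but not necessarily optimal, the result is an upper bound on the exact EMD when both weight vectors have the same total.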
Algorithmic Acceleration of B/FV-like Somewhat Homomorphic Encryption for Compute-Enabled RAM
Somewhat Homomorphic Encryption (SHE) allows arbitrary computation with finite multiplicative depths to be performed on encrypted data, but its overhead is high due to memory transfer incurred by large ciphertexts. Recent research has recognized the shortcomings of general-purpose computing for high-performance SHE, and has begun to pioneer the use of hardware-based SHE acceleration with hardware including FPGAs, GPUs, and Compute-Enabled RAM (CE-RAM). CE-RAM is well-suited for SHE, as it is not limited by the separation between memory and processing that bottlenecks other hardware. Further, CE-RAM does not move data between different processing elements. Recent research has shown the high effectiveness of CE-RAM for SHE as compared to highly-optimized CPU and FPGA implementations. However, algorithmic optimization for the implementation on CE-RAM is underexplored. In this work, we examine the effect of existing algorithmic optimizations upon a CE-RAM implementation of the B/FV scheme, and further introduce novel optimization techniques for the Full RNS Variant of B/FV. Our experiments show speedups of up to 784x for homomorphic multiplication, 143x for decryption, and 330x for encryption against a CPU implementation. We also compare our approach to similar work in CE-RAM, FPGA, and GPU acceleration, and note general improvement over existing work. In particular, for homomorphic multiplication we see speedups of 506.5x against CE-RAM, 66.85x against FPGA, and 30.8x against GPU as compared to existing work in hardware acceleration of B/FV.
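For readers unfamiliar with the residue number system (RNS) that the Full RNS Variant builds on, the following is a minimal sketch of generic RNS arithmetic. It is not the optimization introduced in this work and not an implementation of B/FV; the moduli and the helper names `to_rns`, `rns_mul`, and `from_rns` are hypothetical and chosen only for illustration. The sketch shows how a large integer is split into small residues that are multiplied independently and recombined with the Chinese Remainder Theorem.

```python
from math import prod

# Hypothetical small pairwise-coprime moduli; real schemes use far larger ones.
MODULI = (13, 17, 19)

def to_rns(x, moduli=MODULI):
    """Represent x by its residue modulo each modulus."""
    return tuple(x % m for m in moduli)

def rns_mul(a, b, moduli=MODULI):
    """Multiply two RNS values channel by channel, with no carries between channels."""
    return tuple((ai * bi) % m for ai, bi, m in zip(a, b, moduli))

def from_rns(residues, moduli=MODULI):
    """Recombine residues into an integer modulo the product of the moduli (CRT)."""
    q = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        qi = q // m
        # pow(qi, -1, m) is the modular inverse of qi modulo m (Python 3.8+).
        total += r * qi * pow(qi, -1, m)
    return total % q

product = from_rns(rns_mul(to_rns(57), to_rns(61)))
```

Because every channel works on small numbers and needs no data from the other channels, this representation lends itself to parallel hardware.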
Emerging Technology Based Design of Primitives for Hardware Security
Hardware security concerns such as IP piracy and hardware Trojans have triggered research into circuit protection and malicious logic detection from various design perspectives. In this paper, emerging technologies are investigated by leveraging their unique properties for applications in the hardware security domain. Five example circuit structures including camouflaging gates, polymorphic gates, current/voltage based circuit protectors and current-based XOR logic are designed to prove the high efficiency of Silicon NanoWire FETs and Graphene SymFET in applications such as circuit protection and IP piracy prevention. Simulation results indicate that highly efficient and secure circuit structures can be achieved via the use of emerging technologies.
Notes on Interconnection Networks for PIM
“On-chip interconnection networks are becoming increasingly important for SoCs and CMPs. Power and wire constraints are forcing the adoption of new design methodologies for systems-on-chip (SOC) – namely those that incorporate explicit parallelism. To enable these MP-SoC platforms, researchers have recently pursued scaleable communication centric fabrics, (i.e. networks-on-chip – NOC), which possess many features that are particularl