Hardware-Aware Static Optimization of Hyperdimensional Computations
Binary spatter code (BSC)-based hyperdimensional computing (HDC) is a highly
error-resilient approximate computational paradigm suited for error-prone,
emerging hardware platforms. In BSC HDC, the basic datatype is a hypervector, a
typically large binary vector, where the size of the hypervector has a
significant impact on the fidelity and resource usage of the computation.
Typically, the hypervector size is dynamically tuned to deliver the desired
accuracy; this process is time-consuming and often produces hypervector sizes
that lack accuracy guarantees and produce poor results when reused for very
similar workloads. We present Heim, a hardware-aware static analysis and
optimization framework for BSC HD computations. Heim analytically derives the
minimum hypervector size that meets the target accuracy requirement, thereby
minimizing resource usage. Heim guarantees that the optimized computation
converges to the user-provided accuracy target in expectation, even in the
presence of hardware error. Heim deploys a novel static analysis procedure that unifies
theoretical results from the neuroscience community to systematically optimize
HD computations.
We evaluate Heim against dynamic tuning-based optimization on 25 benchmark
data structures. Given a 99% accuracy requirement, Heim-optimized computations
achieve a 99.2%-100.0% median accuracy, up to 49.5% higher than dynamic
tuning-based optimization, while achieving 1.15x-7.14x reductions in
hypervector size compared to HD computations that achieve comparable query
accuracy and finding parametrizations 30.0x-100167.4x faster than dynamic
tuning-based approaches. We also use Heim to systematically evaluate the
performance benefits of using analog CAMs and multiple-bit-per-cell ReRAM over
conventional hardware, while maintaining iso-accuracy -- for both emerging
technologies, we find usages where the emerging hardware imparts significant
benefits.
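The size-accuracy trade-off the abstract describes can be illustrated with a minimal binary spatter code sketch (an illustrative toy, not Heim's analysis): items are bundled with a bit-wise majority, membership is queried by Hamming distance, and a larger hypervector dimension makes the query more reliable.

```python
import numpy as np

rng = np.random.default_rng(0)

def bundle(items):
    # Bit-wise majority of binary hypervectors (odd counts avoid ties).
    return (2 * items.sum(axis=0) >= len(items)).astype(np.uint8)

def membership_accuracy(dim, n_items, trials=200):
    # How often a bundled item sits closer (in Hamming distance) to the
    # bundle than a random distractor does.
    hits = 0
    for _ in range(trials):
        items = rng.integers(0, 2, size=(n_items, dim), dtype=np.uint8)
        distractor = rng.integers(0, 2, size=dim, dtype=np.uint8)
        memory = bundle(items)
        d_member = np.count_nonzero(items[0] ^ memory)
        d_distractor = np.count_nonzero(distractor ^ memory)
        hits += d_member < d_distractor
    return hits / trials

low = membership_accuracy(dim=64, n_items=15)
high = membership_accuracy(dim=1024, n_items=15)
```

With 15 bundled items, accuracy at dimension 64 falls visibly short of accuracy at dimension 1024, which mirrors why hypervector size must be tuned (dynamically, or statically as Heim does) to hit an accuracy target.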
Large-scale memristive associative memories
Associative memories, in contrast to conventional address-based memories, are inherently fault-tolerant and allow retrieval of data based on partial search information. This paper considers the possibility of implementing large-scale associative memories through memristive devices jointly with CMOS circuitry. An advantage of a memristive associative memory is that the memory elements are located physically above the CMOS layer, which yields more die area for the processing elements realized in CMOS. This allows for high-capacity memories even while using an older CMOS technology, as the capacity of the memory depends more on the feature size of the memristive crossbar than on that of the CMOS components. In this paper, we propose memristive implementations and present simulations and error analysis of the autoassociative content-addressable memory, the Willshaw memory, and the sparse distributed memory. Furthermore, we present a CMOS cell that can be used to implement the proposed memory architectures.
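The Willshaw memory mentioned above can be sketched in a few lines (an algorithmic toy only; the paper's memristive/CMOS circuit realization is not modeled here): pairs of sparse binary patterns are stored by OR-ing in their outer product, and recall thresholds the matrix-vector product against the cue's activity.

```python
import numpy as np

class WillshawMemory:
    # Minimal Willshaw-style binary heteroassociative memory.
    def __init__(self, n_in, n_out):
        self.W = np.zeros((n_out, n_in), dtype=np.uint8)

    def store(self, x, y):
        # Clipped Hebbian learning: OR in the outer product of the pair.
        self.W |= np.outer(y, x)

    def recall(self, x):
        # An output unit fires if every active input of the cue connects to it.
        return (self.W @ x >= x.sum()).astype(np.uint8)

rng = np.random.default_rng(1)

def sparse(n, k):
    # Random binary pattern with exactly k active units.
    v = np.zeros(n, dtype=np.uint8)
    v[rng.choice(n, size=k, replace=False)] = 1
    return v

mem = WillshawMemory(256, 256)
pairs = [(sparse(256, 8), sparse(256, 8)) for _ in range(20)]
for x, y in pairs:
    mem.store(x, y)
```

At this low load (20 sparse pairs in a 256x256 matrix) recall is exact; as the matrix fills with ones, spurious output bits appear, which is the fault-tolerance-versus-capacity behavior the paper's error analysis quantifies.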
QubitHD: A Stochastic Acceleration Method for HD Computing-Based Machine Learning
Machine Learning algorithms based on Brain-inspired Hyperdimensional (HD)
computing imitate cognition by exploiting statistical properties of
high-dimensional vector spaces. HD computing is a promising approach for
achieving high energy-efficiency in machine learning tasks such as
classification, semi-supervised learning, and clustering. A weakness of
existing HD computing-based ML algorithms is that they must be binarized to
achieve very high energy-efficiency, yet binarized models reach lower
classification accuracies. To resolve this trade-off between
energy-efficiency and classification accuracy, we propose the QubitHD
algorithm. It stochastically binarizes HD-based algorithms, while maintaining
comparable classification accuracies to their non-binarized counterparts. The
FPGA implementation of QubitHD provides a 65% improvement in
energy-efficiency and a 95% improvement in training time compared with
state-of-the-art HD-based ML algorithms. It also outperforms
state-of-the-art low-cost classifiers (like Binarized Neural Networks) in terms
of speed and energy-efficiency by an order of magnitude during training and
inference.
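The core idea of stochastic binarization can be sketched as follows (a generic stochastic-rounding sketch, not the exact QubitHD quantization scheme): each real-valued component is mapped to +1 or -1 with a probability chosen so the binarized value equals the original in expectation.

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_binarize(v):
    # Map each component of v to {-1, +1} such that the expectation of the
    # output equals the (clipped) input. Illustrative sketch only.
    p = (np.clip(v, -1.0, 1.0) + 1.0) / 2.0   # P(output = +1)
    return np.where(rng.random(v.shape) < p, 1.0, -1.0)

v = rng.uniform(-1, 1, 10_000)
samples = np.stack([stochastic_binarize(v) for _ in range(500)])
# Averaging many stochastic binarizations recovers the original vector,
# which is why accuracy can track the non-binarized model.
mean_abs_err = np.abs(samples.mean(axis=0) - v).mean()
```

Deterministic binarization (taking the sign) discards all magnitude information; the stochastic variant preserves it in expectation, which is the intuition behind keeping classification accuracy close to the non-binarized model.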
Hardware-Based Authentication for the Internet of Things
Entity authentication is one of the most fundamental problems in computer security. Implementation of any authentication protocol requires the solution of several sub-problems regarding secret sharing, key generation, key storage, and key verification. With the advent of the Internet of Things (IoT), authentication becomes a pivotal concern in the security of IoT systems. Interconnected components of IoT devices normally contain sensors, actuators, relays, and processing and control equipment designed with limited budgets for power, cost, and area. As a result, incorporating security protocols in such resource-constrained IoT components can be rather challenging. To address this issue, in this dissertation we design and develop hardware-oriented lightweight protocols for the authentication of users, devices, and data. These protocols utilize physical properties of memory components, computing units, and hardware clocks on the IoT device.
Recent work on device authentication using physically unclonable functions has rendered the problem of entity authentication and verification based on hardware properties tractable. Our studies reveal that non-linear characteristics of resistive memories can be useful in solving several problems regarding authentication. Therefore, in this dissertation, we first explore the ideas of secret sharing using threshold circuits and non-volatile memory components. Inspired by the concepts of visual cryptography, we identify the promise of resistive-memory-based circuits in lightweight secret sharing and multi-user authentication. Furthermore, the additive and monotonic properties of non-volatile memory components can be useful in addressing the challenges of key storage. Overall, in the first part of this dissertation, we present our research on the design of low-cost, non-crypto-based user authentication schemes using the physical properties of a resistive-memory-based system.
In the second part of the dissertation, we demonstrate that in computational units, emerging voltage over-scaling (VOS)-based computing leaves a process-variation-dependent error signature in the approximate results. Current research on VOS focuses on reducing these errors to provide acceptable results from the computation point of view. Interestingly, with extreme VOS, these errors can also reveal significant information about the underlying physical system and the random variations therein. As a result, these errors can be methodically profiled to extract information about the process variation in a computational unit. Therefore, in this dissertation, we also employ error-profiling techniques along with basic key-based authentication schemes to create lightweight device authentication protocols.
Finally, intrinsic properties of hardware clocks can provide novel ways of device fingerprinting and authentication. The clock signatures can be used for real-time authentication of electromagnetic signals where some temporal properties of the signal are known. In the last part of this dissertation, we elaborate on our studies of data authentication using hardware clocks. As an example, we propose a GPS signature authentication and spoofing detection technique using physical properties such as the frequency skew and drift of hardware clocks in GPS receivers.
An efficient logic fault diagnosis framework based on effect-cause approach
Fault diagnosis plays an important role in improving the circuit design process and the
manufacturing yield. With the increasing number of gates in modern circuits, determining
the source of failure in a defective circuit is becoming more and more challenging.
In this research, we present an efficient effect-cause diagnosis framework for
combinational VLSI circuits. The framework consists of three stages to obtain an accurate
and reasonably precise diagnosis. First, an improved critical path tracing algorithm is
proposed to identify an initial suspect list by backtracing from faulty primary outputs
toward primary inputs. Compared to the traditional critical path tracing approach, our
algorithm is faster and exact. Second, a novel probabilistic ranking model is applied to
rank the suspects so that the most suspicious one will be ranked at or near the top. Several
fast filtering methods are used to prune unrelated suspects. Finally, to refine the diagnosis,
fault simulation is performed on the top suspect nets using several common fault models.
The difference between the observed faulty behavior and the simulated behavior is used to rank each suspect. Experimental results on ISCAS85 benchmark circuits show that this
diagnosis approach is efficient both in terms of memory space and CPU time and the
diagnosis results are accurate and reasonably precise.
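The effect-cause idea, comparing observed faulty behavior against simulated fault behavior and ranking suspects by how well they explain it, can be sketched on a tiny example (the two-gate circuit, net names, and single stuck-at fault model below are hypothetical, not taken from the dissertation):

```python
from itertools import product

def simulate(inputs, fault=None):
    # Evaluate a toy two-gate circuit, d = OR(AND(a, b), e), optionally
    # forcing one net to a stuck-at value (fault = (net_name, value)).
    nets = {}
    def drive(name, value):
        if fault is not None and fault[0] == name:
            value = fault[1]          # the stuck-at fault overrides the net
        nets[name] = value
    for name in ('a', 'b', 'e'):      # primary inputs
        drive(name, inputs[name])
    drive('c', nets['a'] & nets['b']) # internal net
    drive('d', nets['c'] | nets['e']) # primary output
    return nets['d']

# Observed responses of a "defective" chip: here, net 'c' is stuck-at-1.
patterns = [dict(zip('abe', bits)) for bits in product((0, 1), repeat=3)]
observed = [simulate(p, fault=('c', 1)) for p in patterns]

# Rank each candidate stuck-at fault by how many patterns it explains.
candidates = [(net, value) for net in 'abce' for value in (0, 1)]
scores = {f: sum(simulate(p, fault=f) == out
                 for p, out in zip(patterns, observed))
          for f in candidates}
```

The true fault ('c', 1) explains all eight patterns, but so does the equivalent fault ('e', 1); this kind of ambiguity among top-ranked suspects is precisely what the framework's later ranking and fault-simulation stages must refine.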
WHYPE: A Scale-Out Architecture with Wireless Over-the-Air Majority for Scalable In-memory Hyperdimensional Computing
Hyperdimensional computing (HDC) is an emerging computing paradigm that
represents, manipulates, and communicates data using long random vectors known
as hypervectors. Among different hardware platforms capable of executing HDC
algorithms, in-memory computing (IMC) has shown promise as it is very efficient
in performing matrix-vector multiplications, which are common in the HDC
algebra. Although HDC architectures based on IMC already exist, how to scale
them remains a key challenge due to the collective communication patterns that
these architectures require, which traditional chip-scale networks were not
designed for. To cope with this difficulty, we propose a scale-out HDC
architecture called WHYPE, which uses wireless in-package communication
technology to interconnect a large number of physically distributed IMC cores
that either encode hypervectors or perform multiple similarity searches in
parallel. In this context, the key enabler of WHYPE is the opportunistic use of
the wireless network as a medium for over-the-air computation. WHYPE implements
an optimized source coding that allows receivers to calculate the bit-wise
majority of multiple hypervectors (a useful operation in HDC) being transmitted
concurrently over the wireless channel. By doing so, we achieve a joint
broadcast distribution and computation with a performance and efficiency
unattainable with wired interconnects, which in turn enables massive
parallelization of the architecture. Through evaluations at the on-chip network
and complete architecture levels, we demonstrate that WHYPE can bundle and
distribute hypervectors faster and more efficiently than a hypothetical wired
implementation, and that it scales well to tens of receivers. We show that the
average error rate of the majority computation is low, such that it has
negligible impact on the accuracy of HDC classification tasks. (Accepted at
the IEEE Journal on Emerging and Selected Topics in Circuits and Systems, JETCAS.)
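The over-the-air majority can be sketched with an idealized channel model (illustrative only; WHYPE's source coding, modulation, and channel model are simplified away here): each transmitter sends its bit as a +1/-1 symbol, the wireless channel physically sums the symbols, and the receiver thresholds the noisy superposition at zero.

```python
import numpy as np

rng = np.random.default_rng(0)

def over_the_air_majority(hypervectors, noise_std=0.5):
    # Idealized over-the-air computation: the channel adds the transmitted
    # symbols, and thresholding the sum at zero yields the bit-wise majority.
    symbols = 2.0 * hypervectors - 1.0              # map {0,1} -> {-1,+1}
    superposed = symbols.sum(axis=0)                # the channel sums symbols
    received = superposed + rng.normal(0.0, noise_std, superposed.shape)
    return (received > 0).astype(np.uint8)

# Five transmitters broadcasting 10,000-bit hypervectors concurrently.
hvs = rng.integers(0, 2, size=(5, 10_000), dtype=np.uint8)
exact = (2 * hvs.sum(axis=0) > hvs.shape[0]).astype(np.uint8)
approx = over_the_air_majority(hvs)
error_rate = np.mean(exact != approx)
```

At moderate noise the majority is recovered with a small bit error rate, consistent with the abstract's observation that the residual error has negligible impact on HDC classification, since HDC is robust to bit flips by construction.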
Threat Analysis, Countermeasures and Design Strategies for Secure Computation in Nanometer CMOS Regime
Advancements in CMOS technologies have led to the era of the Internet of Things (IoT), where devices can communicate with each other in addition to performing computation. As more and more sensitive data is processed by embedded devices, the trend toward lightweight and efficient cryptographic primitives has gained significant momentum. Achieving perfect security in silicon is extremely difficult, as traditional cryptographic implementations are vulnerable to various active and passive attacks. There is also a threat in the form of hardware Trojans inserted into the supply chain by untrusted third-party manufacturers for economic incentives. Beyond these threats, some embedded security applications, such as random number generators (RNGs), suffer from the impacts of process variations and noise in nanometer CMOS. Despite these disadvantages, the random and unique nature of process variations can be exploited to generate unique identifiers and can be of tremendous use in embedded security.
In this dissertation, we explore techniques for precise fault-injection in cryptographic hardware based on voltage/temperature manipulation and hardware Trojan insertion. We demonstrate the effectiveness of these techniques by mounting fault attacks on state-of-the-art ciphers. Physically Unclonable Functions (PUFs) are novel cryptographic primitives for extracting secret keys from complex manufacturing variations in integrated circuits (ICs). We explore the vulnerabilities of some of the popular strong PUF architectures to modeling attacks using Machine Learning (ML) algorithms. The attacks use silicon data from a test chip manufactured in IBM 32nm silicon-on-insulator (SOI) technology. Attack results demonstrate that the majority of strong PUF architectures can be predicted to very high accuracies using limited training data. We also explore the techniques to exploit unreliable data from strong PUF architectures and effectively use them to improve the prediction accuracies of modeling attacks. Motivated by the vulnerabilities of existing PUF architectures, we present a novel modeling attack resistant PUF architecture based on non-linear computing elements. Post-silicon validation results are used to demonstrate the effectiveness of the non-linear PUF architecture against modeling and fault-injection attacks. Apart from the techniques to improve the security of PUF circuits, we also present novel solutions to improve the performance of PUF circuits from the perspectives of IC fabrication and system/protocol design. Finally, we present a statistical benchmark suite to evaluate PUFs in conceptualization phase and also to enable fine-grained security assessments for varying PUF parameters. Data compressibility analyses for validating the statistical benchmark suite are also presented
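The modeling-attack result can be illustrated with the textbook additive-delay abstraction of an arbiter PUF (the linear model, parameters, and the simple perceptron learner below are illustrative, not the dissertation's silicon setup): because the response is a linear threshold function of a parity-transformed challenge, a linear learner trained on observed challenge-response pairs predicts unseen responses accurately.

```python
import numpy as np

rng = np.random.default_rng(0)
n_stages, n_train, n_test = 32, 4000, 1000

# Idealized arbiter PUF: response = sign of a linear function of the
# challenge's parity feature vector (hypothetical delay weights).
w_true = rng.normal(size=n_stages + 1)

def features(challenges):
    # Standard parity transform: phi_i = prod_{j >= i} (1 - 2*c_j), plus bias.
    signs = 1 - 2 * challenges
    phi = np.cumprod(signs[:, ::-1], axis=1)[:, ::-1]
    return np.hstack([phi, np.ones((len(challenges), 1))])

def respond(challenges):
    return (features(challenges) @ w_true > 0).astype(int)

# "Attack": fit a linear model to observed challenge-response pairs.
C_train = rng.integers(0, 2, (n_train, n_stages))
r_train = respond(C_train)
X = features(C_train)
w = np.zeros(n_stages + 1)
for _ in range(20):                      # a few perceptron epochs
    for x, r in zip(X, r_train):
        pred = int(x @ w > 0)
        w += (r - pred) * x              # update only on mistakes

C_test = rng.integers(0, 2, (n_test, n_stages))
accuracy = np.mean((features(C_test) @ w > 0).astype(int) == respond(C_test))
```

A few thousand training pairs suffice to clone this idealized PUF, which is the vulnerability that motivates the dissertation's non-linear, modeling-attack-resistant PUF architecture.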
Memristive Circuits for LDPC Decoding
We present design principles for implementing decoders for low-density parity check codes in CMOL-type memristive circuits. The programmable nonvolatile connectivity enabled by the nanowire arrays in such circuits is used to map the parity check matrix of an LDPC code in the decoder, while decoding operations are realized by a cellular CMOS circuit structure. We perform detailed performance analysis and circuit simulations of example decoders, and estimate how CMOL and memristor characteristics such as the memristor OFF/ON resistance ratio, nanowire resistance, and the total capacitance of the nanowire array affect decoder specification and performance. We also analyze how variation in circuit characteristics and persistent device defects affect the decoders.
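As a reference point for what such a decoder computes, here is a minimal Gallager-style bit-flipping decoder in software (algorithm sketch only; the paper's CMOL/memristive circuit realization is not modeled, and a small Hamming parity-check matrix stands in for a real LDPC code):

```python
import numpy as np

def bit_flip_decode(H, received, max_iters=50):
    # Gallager bit-flipping decoding: repeatedly flip the bits that
    # participate in the most unsatisfied parity checks.
    word = received.copy()
    for _ in range(max_iters):
        syndrome = H @ word % 2
        if not syndrome.any():
            return word                     # all parity checks satisfied
        unsat = syndrome @ H                # unsatisfied-check count per bit
        worst = unsat.max()
        word = word ^ (unsat == worst)      # flip the most-suspect bits
    return word

# (7,4) Hamming parity-check matrix as a toy stand-in for an LDPC code.
H = np.array([[1, 1, 0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0, 1, 0],
              [0, 1, 1, 1, 0, 0, 1]], dtype=np.uint8)
codeword = np.zeros(7, dtype=np.uint8)      # the all-zero codeword
noisy = codeword.copy()
noisy[2] ^= 1                               # inject a single bit error
decoded = bit_flip_decode(H, noisy)
```

In a CMOL-style realization, the `H @ word` products correspond to currents summed through the programmed nanowire crossbar, while the thresholding and flip logic map onto the cellular CMOS layer.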