159 research outputs found
A Touch of Evil: High-Assurance Cryptographic Hardware from Untrusted Components
The semiconductor industry is fully globalized and integrated circuits (ICs)
are commonly defined, designed and fabricated in different premises across the
world. This reduces production costs, but also exposes ICs to supply chain
attacks, where insiders introduce malicious circuitry into the final products.
Additionally, despite extensive post-fabrication testing, it is not uncommon
for ICs with subtle fabrication errors to make it into production systems.
While many systems may be able to tolerate a few byzantine components, this is
not the case for cryptographic hardware, storing and computing on confidential
data. For this reason, many error and backdoor detection techniques have been
proposed over the years. So far all attempts have been either quickly
circumvented, or come with unrealistically high manufacturing costs and
complexity.
This paper proposes Myst, a practical high-assurance architecture, that uses
commercial off-the-shelf (COTS) hardware, and provides strong security
guarantees, even in the presence of multiple malicious or faulty components.
The key idea is to combine protective-redundancy with modern threshold
cryptographic techniques to build a system tolerant to hardware trojans and
errors. To evaluate our design, we build a Hardware Security Module that
provides the highest level of assurance possible with COTS components.
Specifically, we employ more than a hundred COTS secure crypto-coprocessors,
verified to FIPS140-2 Level 4 tamper-resistance standards, and use them to
realize high-confidentiality random number generation, key derivation, public
key decryption and signing. Our experiments show a reasonable computational
overhead (less than 1% for both Decryption and Signing) and an exponential
increase in backdoor-tolerance as more ICs are added
Manticore: Hardware-Accelerated RTL Simulation with Static Bulk-Synchronous Parallelism
The demise of Moore's Law and Dennard Scaling has revived interest in
specialized computer architectures and accelerators. Verification and testing
of this hardware heavily uses cycle-accurate simulation of
register-transfer-level (RTL) designs. The best software RTL simulators can
simulate designs at 1--1000~kHz, i.e., more than three orders of magnitude
slower than hardware. Faster simulation can increase productivity by speeding
design iterations and permitting more exhaustive exploration.
One possibility is to use parallelism as RTL exposes considerable fine-grain
concurrency. However, state-of-the-art RTL simulators generally perform best
when single-threaded since modern processors cannot effectively exploit
fine-grain parallelism.
This work presents Manticore: a parallel computer designed to accelerate RTL
simulation. Manticore uses a static bulk-synchronous parallel (BSP) execution
model to eliminate runtime synchronization barriers among many simple
processors. Manticore relies entirely on its compiler to schedule resources and
communication. Because RTL code is practically free of long divergent execution
paths, static scheduling is feasible. Communication and synchronization no
longer incur runtime overhead, enabling efficient fine-grain parallelism.
Moreover, static scheduling dramatically simplifies the physical
implementation, significantly increasing the potential parallelism on a chip.
Our 225-core FPGA prototype running at 475 MHz outperforms a state-of-the-art
RTL simulator on an Intel Xeon processor running at 3.3 GHz by up to
27.9 (geomean 5.3) in nine Verilog benchmarks
Design Space Exploration and Resource Management of Multi/Many-Core Systems
The increasing demand of processing a higher number of applications and related data on computing platforms has resulted in reliance on multi-/many-core chips as they facilitate parallel processing. However, there is a desire for these platforms to be energy-efficient and reliable, and they need to perform secure computations for the interest of the whole community. This book provides perspectives on the aforementioned aspects from leading researchers in terms of state-of-the-art contributions and upcoming trends
Recommended from our members
Error-efficient computing systems
This survey explores the theory and practice of techniques to make computing systems faster or more energy-efficient by allowing them to make controlled errors. In the same way that systems which only use as much energy as necessary are referred to as being energy-efficient, you can think of the class of systems addressed by this survey as being error-efficient: They only prevent as many errors as they need to. The definition of what constitutes an error varies across the parts of a system. And the errors which are acceptable depend on the application at hand. In computing systems, making errors, when behaving correctly would be too expensive, can conserve resources. The resources conserved may be time: By making some errors, systems may be faster. The resource may also be energy: A system may use less power from its batteries or from the electrical grid by only avoiding certain errors while tolerating benign errors that are associated with reduced power consumption. The resource in question may be an even more abstract quantity such as consistency of ordering of the outputs of a system. This survey is for anyone interested in an end-to-end view of one set of techniques that address the theory and practice of making computing systems more efficient by trading errors for improved efficiency
New Logic Synthesis As Nanotechnology Enabler (invited paper)
Nanoelectronics comprises a variety of devices whose electrical properties are more complex as compared to CMOS, thus enabling new computational paradigms. The potentially large space for innovation has to be explored in the search for technologies that can support large-scale and high- performance circuit design. Within this space, we analyze a set of emerging technologies characterized by a similar computational abstraction at the design level, i.e., a binary comparator or a majority voter. We demonstrate that new logic synthesis techniques, natively supporting this abstraction, are the technology enablers. We describe models and data-structures for logic design using emerging technologies and we show results of applying new synthesis algorithms and tools. We conclude that new logic synthesis methods are required to both evaluate emerging technologies and to achieve the best results in terms of area, power and performance
Business analytics in industry 4.0: a systematic review
Recently, the term “Industry 4.0” has emerged to characterize several Information Technology and Communication (ICT) adoptions in production processes (e.g., Internet-of-Things, implementation of digital production support information technologies). Business Analytics is often used within the Industry 4.0, thus incorporating its data intelligence (e.g., statistical analysis, predictive modelling, optimization) expert system component. In this paper, we perform a Systematic Literature Review (SLR) on the usage of Business Analytics within the Industry 4.0 concept, covering a selection of 169 papers obtained from six major scientific publication sources from 2010 to March 2020. The selected papers were first classified in three major types, namely, Practical Application, Reviews and Framework Proposal. Then, we analysed with more detail the practical application studies which were further divided into three main categories of the Gartner analytical maturity model, Descriptive Analytics, Predictive Analytics and Prescriptive Analytics. In particular, we characterized the distinct analytics studies in terms of the industry application and data context used, impact (in terms of their Technology Readiness Level) and selected data modelling method. Our SLR analysis provides a mapping of how data-based Industry 4.0 expert systems are currently used, disclosing also research gaps and future research opportunities.The work of P. Cortez was supported by FCT - Fundação para a Ciência e Tecnologia within the R&D Units Project Scope: UIDB/00319/2020. We
would like to thank to the three anonymous reviewers for their helpful suggestions
Techniques of Energy-Efficient VLSI Chip Design for High-Performance Computing
How to implement quality computing with the limited power budget is the key factor to move very large scale integration (VLSI) chip design forward. This work introduces various techniques of low power VLSI design used for state of art computing. From the viewpoint of power supply, conventional in-chip voltage regulators based on analog blocks bring the large overhead of both power and area to computational chips. Motivated by this, a digital based switchable pin method to dynamically regulate power at low circuit cost has been proposed to make computing to be executed with a stable voltage supply. For one of the widely used and time consuming arithmetic units, multiplier, its operation in logarithmic domain shows an advantageous performance compared to that in binary domain considering computation latency, power and area. However, the introduced conversion error reduces the reliability of the following computation (e.g. multiplication and division.). In this work, a fast calibration method suppressing the conversion error and its VLSI implementation are proposed. The proposed logarithmic converter can be supplied by dc power to achieve fast conversion and clocked power to reduce the power dissipated during conversion. Going out of traditional computation methods and widely used static logic, neuron-like cell is also studied in this work. Using multiple input floating gate (MIFG) metal-oxide semiconductor field-effect transistor (MOSFET) based logic, a 32-bit, 16-operation arithmetic logic unit (ALU) with zipped decoding and a feedback loop is designed. The proposed ALU can reduce the switching power and has a strong driven-in capability due to coupling capacitors compared to static logic based ALU. Besides, recent neural computations bring serious challenges to digital VLSI implementation due to overload matrix multiplications and non-linear functions. An analog VLSI design which is compatible to external digital environment is proposed for the network of long short-term memory (LSTM). The entire analog based network computes much faster and has higher energy efficiency than the digital one
Dynamic task scheduling and binding for many-core systems through stream rewriting
This thesis proposes a novel model of computation, called stream rewriting, for the specification and implementation of highly concurrent applications. Basically, the active tasks of an application and their dependencies are encoded as a token stream, which is iteratively modified by a set of rewriting rules at runtime. In order to estimate the performance and scalability of stream rewriting, a large number of experiments have been evaluated on many-core systems and the task management has been implemented in software and hardware.In dieser Dissertation wurde Stream Rewriting als eine neue Methode entwickelt, um Anwendungen mit einer großen Anzahl von dynamischen Tasks zu beschreiben und effizient zur Laufzeit verwalten zu können. Dabei werden die aktiven Tasks in einem Datenstrom verpackt, der zur Laufzeit durch wiederholtes Suchen und Ersetzen umgeschrieben wird. Um die Performance und Skalierbarkeit zu bestimmen, wurde eine Vielzahl von Experimenten mit Many-Core-Systemen durchgeführt und die Verwaltung von Tasks über Stream Rewriting in Software und Hardware implementiert
REED: Chiplet-Based Scalable Hardware Accelerator for Fully Homomorphic Encryption
Fully Homomorphic Encryption (FHE) has emerged as a promising technology for processing encrypted data without the need for decryption. Despite its potential, its practical implementation has faced challenges due to substantial computational overhead. To address this issue, we propose the chiplet-based FHE accelerator design `REED\u27, which enables scalability and offers high throughput, thereby enhancing homomorphic encryption deployment in real-world scenarios. It incorporates well-known wafer yield issues during fabrication which significantly impacts production costs. In contrast to state-of-the-art approaches, we also address data exchange overhead by proposing a non-blocking inter-chiplet communication strategy. We incorporate novel pipelined Number Theoretic Transform and automorphism techniques, leveraging parallelism and providing high throughput.
Experimental results demonstrate that REED 2.5D integrated circuit consumes 177 mm chip area, 82.5 W average power in 7nm technology, and achieves an impressive speedup of up to 5,982 compared to a CPU (24-core 2Intel X5690), and 2 better energy efficiency and 50\% lower development cost than state-of-the-art ASIC accelerator. To evaluate its practical impact, we are the to benchmark an encrypted deep neural network training. Overall, this work successfully enhances the practicality and deployability of fully homomorphic encryption in real-world scenarios
- …