43 research outputs found

    Application live-upgrading and error-recovery using code-data decoupling

    When applications have critical bugs that present security vulnerabilities or may result in serious failures with potentially massive business-level impact, they have to be updated as fast as possible to minimize the harm of the bug. However, mission-critical or other user-facing applications may maintain critical internal state that has to be serialized and restored during the update process, introducing significant cost and delay. Instead of serializing the internal state, we propose to implement applications in such a way that the application state is fully decoupled (e.g. in a different address space or shared memory segment) from the application logic. Such a decoupling allows, for example, upgrades to happen without serialization of the data, and even allows side-by-side execution of the updated and the failing version of the application, thereby reducing application downtime during the update process. Furthermore, this decoupling also allows applications to recover easily from failures by recovering the previous data of the crashed application instance.
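    The decoupling idea in this abstract can be illustrated with a minimal sketch, assuming the state is a single counter held in a shared memory segment. It is not the authors' implementation: the names `create_state`, `handle_request_v1`, and `handle_request_v2` are hypothetical, and both logic versions run in one process here for brevity, whereas in practice they would be separate processes attaching to the segment by name.

```python
import struct
from multiprocessing import shared_memory

FMT = "q"  # assumed application state: one signed 64-bit request counter
SIZE = struct.calcsize(FMT)


def create_state():
    """Allocate the state segment; it lives outside any logic version."""
    shm = shared_memory.SharedMemory(create=True, size=SIZE)
    struct.pack_into(FMT, shm.buf, 0, 0)
    return shm


def read_counter(name):
    """Attach by name, read the counter, and detach again."""
    shm = shared_memory.SharedMemory(name=name)
    try:
        return struct.unpack_from(FMT, shm.buf, 0)[0]
    finally:
        shm.close()


def _bump(name):
    """Attach by name, increment the counter in place, and detach."""
    shm = shared_memory.SharedMemory(name=name)
    try:
        count = struct.unpack_from(FMT, shm.buf, 0)[0] + 1
        struct.pack_into(FMT, shm.buf, 0, count)
        return count
    finally:
        shm.close()


def handle_request_v1(name):
    """Hypothetical buggy logic: fails on every third request."""
    count = _bump(name)
    if count % 3 == 0:
        raise RuntimeError("simulated crash in v1")
    return count


def handle_request_v2(name):
    """Hypothetical fixed logic: same state layout, no crash."""
    return _bump(name)


def demo():
    state = create_state()
    try:
        handle_request_v1(state.name)
        handle_request_v1(state.name)
        try:
            handle_request_v1(state.name)
        except RuntimeError:
            pass  # v1 is gone; the counter is still in the segment
        before_upgrade = read_counter(state.name)
        after_upgrade = handle_request_v2(state.name)  # no serialization step
        return before_upgrade, after_upgrade
    finally:
        state.close()
        state.unlink()
```

    Because the counter never leaves the segment, the replacement logic picks up exactly where the failed version stopped. A real design would also need a stable, versioned state layout and synchronization between concurrent versions, which this sketch leaves out.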

    X-TIME: An in-memory engine for accelerating machine learning on tabular data with CAMs

    Structured, or tabular, data is the most common format in data science. While deep learning models have proven formidable in learning from unstructured data such as images or speech, they are less accurate than simpler approaches when learning from tabular data. In contrast, modern tree-based Machine Learning (ML) models shine in extracting relevant information from structured data. An essential requirement in data science is to reduce model inference latency in cases where, for example, models are used in a closed loop with simulation to accelerate scientific discovery. However, the hardware acceleration community has mostly focused on deep neural networks and largely ignored other forms of machine learning. Previous work has described the use of an analog content addressable memory (CAM) component for efficiently mapping random forests. In this work, we focus on an overall analog-digital architecture implementing a novel increased-precision analog CAM and a programmable network on chip, allowing the inference of state-of-the-art tree-based ML models such as XGBoost and CatBoost. Results evaluated in a single chip at 16 nm technology show 119x lower latency at 9740x higher throughput compared with a state-of-the-art GPU, with a 19 W peak power consumption.
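    To make the idea of mapping trees onto a CAM concrete, here is a software sketch under one assumption: each root-to-leaf path is stored as a row of per-feature ranges, and a query matches the row whose ranges all contain it. The `Node` class and the functions `tree_to_rows` and `cam_lookup` are hypothetical names for illustration; the abstract does not describe the chip's actual row format or precision.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    feature: int = -1                 # index of the feature tested at this node
    threshold: float = 0.0            # go left when x[feature] < threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    value: Optional[float] = None     # set on leaves only


def tree_to_rows(node, n_features, bounds=None):
    """Flatten a tree into (ranges, leaf_value) rows, one per leaf."""
    if bounds is None:
        bounds = [(-math.inf, math.inf)] * n_features
    if node.value is not None:
        return [(list(bounds), node.value)]
    lo, hi = bounds[node.feature]
    left_bounds = list(bounds)
    left_bounds[node.feature] = (lo, min(hi, node.threshold))
    right_bounds = list(bounds)
    right_bounds[node.feature] = (max(lo, node.threshold), hi)
    return (tree_to_rows(node.left, n_features, left_bounds)
            + tree_to_rows(node.right, n_features, right_bounds))


def cam_lookup(rows, x):
    """Return the value of the row whose ranges all contain x.

    Hardware would compare every row in parallel; this loop is sequential.
    """
    for ranges, value in rows:
        if all(lo <= xi < hi for (lo, hi), xi in zip(ranges, x)):
            return value
    raise ValueError("no row matched")


# Toy tree: x0 < 5 -> 1.0, otherwise x1 < 2 -> 2.0, otherwise 3.0
toy_tree = Node(
    feature=0, threshold=5.0,
    left=Node(value=1.0),
    right=Node(feature=1, threshold=2.0,
               left=Node(value=2.0), right=Node(value=3.0)),
)
toy_rows = tree_to_rows(toy_tree, n_features=2)
```

    For the toy tree, `cam_lookup(toy_rows, [7.0, 1.0])` follows the same path as a conventional traversal and yields the leaf value 2.0. A boosted ensemble would repeat the lookup per tree and sum the matched leaf values.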

    Embedded Computing


    DESIGN AND CHARACTERIZATION OF A STANDARD CELL SET FOR DELAY INSENSITIVE VLSI DESIGN

    A working synthesis system for delay insensitive (DI) VLSI design is used as a case study to investigate the correspondence between theoretical formalization and electric circuit operation. Most of the previous research has treated DI VLSI design from a formal point of view. We illustrate the new features involved in the electrical design and characterization of DI cells, reporting circuit schematics and standard cell characterization results. Some integrated circuits built with the cells have been fabricated.

    DELAY INSENSITIVE MICRO-PIPELINED COMBINATIONAL LOGIC

    Considerable effort is being devoted to developing synthesis systems for hybrid asynchronous circuits, that is, circuits involving delay insensitive (DI) and non-DI parts. This paper presents a micro-architecture design methodology that gets the benefits of a DI control-path and a self-timed data-path. The control-path is automatically synthesized as a purely DI circuit from a behavioral specification. The data-path is partially designed by using locally clocked functional blocks, for registers and multiplexers, and partially by using DI combinational units. In particular, we focus on pipelined combinational units, composed of dedicated standard cells implementing Boolean functions in the double-rail convention. Each cell has a storage element, governed by request/acknowledge signals, allowing us to realize a DI micropipeline in which a stage is a single logic gate. No hardwired delays are needed. The approach is well suited for automated design using extant synthesis and optimization tools for combinational logic. A first example of utilization is reported, to evaluate performance and cost.
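    The double-rail convention mentioned here can be sketched behaviorally. The model below is an assumption, not the paper's cell design: it builds a double-rail AND gate from Muller C-elements, one per input minterm, with NULL encoded as (0, 0), TRUE as (1, 0), and FALSE as (0, 1). The request/acknowledge handshake of the real cells is not modeled; the C-elements only show why the output waits for complete inputs and holds until they are withdrawn.

```python
NULL, TRUE, FALSE = (0, 0), (1, 0), (0, 1)  # (true_rail, false_rail)


class CElement:
    """Muller C-element: output follows the inputs when they agree, else holds."""

    def __init__(self):
        self.out = 0

    def step(self, a, b):
        if a == b:
            self.out = a
        return self.out


class DualRailAnd:
    """Double-rail AND with one C-element per input minterm (assumed structure)."""

    def __init__(self):
        self.minterms = {(a, b): CElement() for a in (0, 1) for b in (0, 1)}

    def step(self, a, b):
        fired = {}
        for (va, vb), cell in self.minterms.items():
            rail_a = a[0] if va else a[1]  # pick the rail that signals value va
            rail_b = b[0] if vb else b[1]
            fired[(va, vb)] = cell.step(rail_a, rail_b)
        true_rail = fired[(1, 1)]
        false_rail = fired[(0, 0)] | fired[(0, 1)] | fired[(1, 0)]
        return (true_rail, false_rail)


def run_cycle(gate, a, b):
    """Apply one NULL -> data -> NULL cycle and record the output at each step."""
    return [
        gate.step(NULL, NULL),  # both inputs empty
        gate.step(a, NULL),     # only one input has arrived
        gate.step(a, b),        # both inputs valid
        gate.step(NULL, b),     # one input withdrawn
        gate.step(NULL, NULL),  # both withdrawn
    ]
```

    Calling `run_cycle(DualRailAnd(), TRUE, FALSE)` shows the output staying NULL until both inputs are valid, then holding FALSE until both inputs return to NULL. No delay assumption is needed to tell when the result is ready, which is the property the abstract relies on.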