4,622 research outputs found

    DFT and BIST of a multichip module for high-energy physics experiments

    Get PDF
    Engineers at Politecnico di Torino designed a multichip module for high-energy physics experiments conducted on the Large Hadron Collider. An array of these MCMs handles multichannel data acquisition and signal processing. Testing the MCM from board to die level required a combination of DFT strategie

    dReDBox: Materializing a full-stack rack-scale system prototype of a next-generation disaggregated datacenter

    Get PDF
    Current datacenters are based on server machines, whose mainboard and hardware components form the baseline, monolithic building block that the rest of the system software, middleware and application stack are built upon. This leads to the following limitations: (a) resource proportionality of a multi-tray system is bounded by the basic building block (mainboard), (b) resource allocation to processes or virtual machines (VMs) is bounded by the available resources within the boundary of the mainboard, leading to spare resource fragmentation and inefficiencies, and (c) upgrades must be applied to each and every server even when only a specific component needs to be upgraded. The dRedBox project (Disaggregated Recursive Datacentre-in-a-Box) addresses the above limitations, and proposes the next generation, low-power, across form-factor datacenters, departing from the paradigm of the mainboard-as-a-unit and enabling the creation of function-block-as-a-unit. Hardware-level disaggregation and software-defined wiring of resources is supported by a full-fledged Type-1 hypervisor that can execute commodity virtual machines, which communicate over a low-latency and high-throughput software-defined optical network. To evaluate its novel approach, dRedBox will demonstrate application execution in the domains of network functions virtualization, infrastructure analytics, and real-time video surveillance.This work has been supported in part by EU H2020 ICTproject dRedBox, contract #687632.Peer ReviewedPostprint (author's final draft

    The CMS Event Builder

    Full text link
    The data acquisition system of the CMS experiment at the Large Hadron Collider will employ an event builder which will combine data from about 500 data sources into full events at an aggregate throughput of 100 GByte/s. Several architectures and switch technologies have been evaluated for the DAQ Technical Design Report by measurements with test benches and by simulation. This paper describes studies of an EVB test-bench based on 64 PCs acting as data sources and data consumers and employing both Gigabit Ethernet and Myrinet technologies as the interconnect. In the case of Ethernet, protocols based on Layer-2 frames and on TCP/IP are evaluated. Results from ongoing studies, including measurements on throughput and scaling are presented. The architecture of the baseline CMS event builder will be outlined. The event builder is organised into two stages with intelligent buffers in between. The first stage contains 64 switches performing a first level of data concentration by building super-fragments from fragments of 8 data sources. The second stage combines the 64 super-fragments into full events. This architecture allows installation of the second stage of the event builder in steps, with the overall throughput scaling linearly with the number of switches in the second stage. Possible implementations of the components of the event builder are discussed and the expected performance of the full event builder is outlined.Comment: Conference CHEP0

    A Benes Based NoC Switching Architecture for Mixed Criticality Embedded Systems

    Get PDF
    Multi-core, Mixed Criticality Embedded (MCE) real-time systems require high timing precision and predictability to guarantee there will be no interference between tasks. These guarantees are necessary in application areas such as avionics and automotive, where task interference or missed deadlines could be catastrophic, and safety requirements are strict. In modern multi-core systems, the interconnect becomes a potential point of uncertainty, introducing major challenges in proving behaviour is always within specified constraints, limiting the means of growing system performance to add more tasks, or provide more computational resources to existing tasks. We present MCENoC, a Network-on-Chip (NoC) switching architecture that provides innovations to overcome this with predictable, formally verifiable timing behaviour that is consistent across the whole NoC. We show how the fundamental properties of Benes networks benefit MCE applications and meet our architecture requirements. Using SystemVerilog Assertions (SVA), formal properties are defined that aid the refinement of the specification of the design as well as enabling the implementation to be exhaustively formally verified. We demonstrate the performance of the design in terms of size, throughput and predictability, and discuss the application level considerations needed to exploit this architecture

    Automated CNN pipeline generation for heterogeneous architectures

    Get PDF
    Heterogeneity is a vital feature in emerging processor chip designing. Asymmetric multicore-clusters such as high-performance cluster and power efficient cluster are common in modern edge devices. One example is Intel\u27s Alder Lake featuring Golden Cove high-performance cores and Gracemont power-efficient cores. Chiplet-based technology allows organization of multi cores in form of multi-chip-modules, thus housing large number of cores in a processor. Interposer based packaging has enabled embedding High Bandwidth Memory (HBM) on chip and reduced transmission latency and energy consumption of chiplet-chiplet interconnect.\ua0For Instance Intel\u27s XeHPC Ponte Vecchio package integrates multi-chip GPU organization along with HBM modules.Since new devices feature heterogeneity at the level of cores, memory and on-chip interconnect, it has become important to steer optimization at application level in order to leverage the new heterogeneous, high-performing and power-efficient features of underlying computing platforms. An important high-performance application paradigm is Convolution Neural Networks (CNN). CNNs are widely used in many practical applications. The pipelined parallel implementation of CNN is favored for inference on edge devices. In this Licentiate thesis we present a novel scheme for automatic scheduling of CNN pipelines on heterogeneous devices. A pipeline schedule is a configuration that provides information on depth of pipeline, grouping of CNN layers into pipeline stages and mapping of pipeline stages onto computing units. We utilize simple compile-time hints which consists of workload information of individual CNN layers and performance hints of computing units.The proposed approach provides near optimal solution for a throughput maximizing pipeline. We model the problem as a design space exploration technique. We developed a time-efficient design space navigation through heuristics extracted from the knowledge of CNN structure and underlying computing platform. The proposed search scheme converges faster and utilizes real-time performance measurements as fitness values. The results demonstrate that the proposed scheme converges faster and can scale when used with larger networks and computing platforms. Since the scheme utilizes online performance measurements, one of the challenges is to avoid expensive configurations during online tuning. The results demonstrate that on average, ~80\% of the tested configurations are sub-optimal solutions.Another challenge is to reduce convergence time. The experiments show that proposed approach is 35x faster than stochastic optimization algorithms. Since the design space is large and complex, We show that the proposed scheme explores only ~0.1% of the total design space in case of large CNNs (having 50+ layers) and results in near-optimal solution

    Sustainable Fault-handling Of Reconfigurable Logic Using Throughput-driven Assessment

    Get PDF
    A sustainable Evolvable Hardware (EH) system is developed for SRAM-based reconfigurable Field Programmable Gate Arrays (FPGAs) using outlier detection and group testing-based assessment principles. The fault diagnosis methods presented herein leverage throughput-driven, relative fitness assessment to maintain resource viability autonomously. Group testing-based techniques are developed for adaptive input-driven fault isolation in FPGAs, without the need for exhaustive testing or coding-based evaluation. The techniques maintain the device operational, and when possible generate validated outputs throughout the repair process. Adaptive fault isolation methods based on discrepancy-enabled pair-wise comparisons are developed. By observing the discrepancy characteristics of multiple Concurrent Error Detection (CED) configurations, a method for robust detection of faults is developed based on pairwise parallel evaluation using Discrepancy Mirror logic. The results from the analytical FPGA model are demonstrated via a self-healing, self-organizing evolvable hardware system. Reconfigurability of the SRAM-based FPGA is leveraged to identify logic resource faults which are successively excluded by group testing using alternate device configurations. This simplifies the system architect\u27s role to definition of functionality using a high-level Hardware Description Language (HDL) and system-level performance versus availability operating point. System availability, throughput, and mean time to isolate faults are monitored and maintained using an Observer-Controller model. Results are demonstrated using a Data Encryption Standard (DES) core that occupies approximately 305 FPGA slices on a Xilinx Virtex-II Pro FPGA. With a single simulated stuck-at-fault, the system identifies a completely validated replacement configuration within three to five positive tests. The approach demonstrates a readily-implemented yet robust organic hardware application framework featuring a high degree of autonomous self-control
    • …
    corecore