172,495 research outputs found

    Studies on Core-Based Testing of System-on-Chips Using Functional Bus and Network-on-Chip Interconnects

    Get PDF
    The tests of a complex system such as a microprocessor-based system-onchip (SoC) or a network-on-chip (NoC) are difficult and expensive. In this thesis, we propose three core-based test methods that reuse the existing functional interconnects-a flat bus, hierarchical buses of multiprocessor SoC's (MPSoC), and a N oC-in order to avoid the silicon area cost of a dedicated test access mechanism (TAM). However, the use of functional interconnects as functional TAM's introduces several new problems. During tests, the interconnects-including the bus arbitrator, the bus bridges, and the NoC routers-operate in the functional mode to transport the test stimuli and responses, while the core under tests (CUT) operate in the test mode. Second, the test data is transported to the CUT through the functional bus, and not directly to the test port. Therefore, special core test wrappers that can provide the necessary control signals required by the different functional interconnect are proposed. We developed two types of wrappers, one buffer-based wrapper for the bus-based systems and another pair of complementary wrappers for the NoCbased systems. Using the core test wrappers, we propose test scheduling schemes for the three functionally different types of interconnects. The test scheduling scheme for a flat bus is developed based on an efficient packet scheduling scheme that minimizes both the buffer sizes and the test time under a power constraint. The schedulingscheme is then extended to take advantage of the hierarchical bus architecture of the MPSoC systems. The third test scheduling scheme based on the bandwidth sharing is developed specifically for the NoC-based systems. The test scheduling is performed under the objective of co-optimizing the wrapper area cost and the resulting test application time using the two complementary NoC wrappers. For each of the proposed methodology for the three types of SoC architec .. ture, we conducted a thorough experimental evaluation in order to verify their effectiveness compared to other methods

    A Memory Scheduling Infrastructure for Multi-Core Systems with Re-Programmable Logic

    Get PDF
    The sharp increase in demand for performance has prompted an explosion in the complexity of modern multi-core embedded systems. This has lead to unprecedented temporal unpredictability concerns in Cyber-Physical Systems (CPS). On-chip integration of programmable logic (PL) alongside a conventional Processing System (PS) in modern Systems-on-Chip (SoC) establishes a genuine compromise between specialization, performance, and reconfigurability. In addition to typical use-cases, it has been shown that the PL can be used to observe, manipulate, and ultimately manage memory traffic generated by a traditional multi-core processor. This paper explores the possibility of PL-aided memory scheduling by proposing a Scheduler In-the-Middle (SchIM). We demonstrate that the SchIM enables transaction-level control over the main memory traffic generated by a set of embedded cores. Focusing on extensibility and reconfigurability, we put forward a SchIM design covering two main objectives. First, to provide a safe playground to test innovative memory scheduling mechanisms; and second, to establish a transition path from software-based memory regulation to provably correct hardware-enforced memory scheduling. We evaluate our design through a full-system implementation on a commercial PS-PL platform using synthetic and real-world benchmarks

    An efficient multi-core implementation of a novel HSS-structured multifrontal solver using randomized sampling

    Full text link
    We present a sparse linear system solver that is based on a multifrontal variant of Gaussian elimination, and exploits low-rank approximation of the resulting dense frontal matrices. We use hierarchically semiseparable (HSS) matrices, which have low-rank off-diagonal blocks, to approximate the frontal matrices. For HSS matrix construction, a randomized sampling algorithm is used together with interpolative decompositions. The combination of the randomized compression with a fast ULV HSS factorization leads to a solver with lower computational complexity than the standard multifrontal method for many applications, resulting in speedups up to 7 fold for problems in our test suite. The implementation targets many-core systems by using task parallelism with dynamic runtime scheduling. Numerical experiments show performance improvements over state-of-the-art sparse direct solvers. The implementation achieves high performance and good scalability on a range of modern shared memory parallel systems, including the Intel Xeon Phi (MIC). The code is part of a software package called STRUMPACK -- STRUctured Matrices PACKage, which also has a distributed memory component for dense rank-structured matrices

    RESPECT: Reinforcement Learning based Edge Scheduling on Pipelined Coral Edge TPUs

    Full text link
    Deep neural networks (DNNs) have substantial computational and memory requirements, and the compilation of its computational graphs has a great impact on the performance of resource-constrained (e.g., computation, I/O, and memory-bound) edge computing systems. While efficient execution of their computational graph requires an effective scheduling algorithm, generating the optimal scheduling solution is a challenging NP-hard problem. Furthermore, the complexity of scheduling DNN computational graphs will further increase on pipelined multi-core systems considering memory communication cost, as well as the increasing size of DNNs. Using the synthetic graph for the training dataset, this work presents a reinforcement learning (RL) based scheduling framework RESPECT, which learns the behaviors of optimal optimization algorithms and generates near-optimal scheduling results with short solving runtime overhead. Our framework has demonstrated up to āˆ¼2.5Ɨ\sim2.5\times real-world on-chip inference runtime speedups over the commercial compiler with ten popular ImageNet models deployed on the physical Coral Edge TPUs system. Moreover, compared to the exact optimization methods, the proposed RL scheduling improves the scheduling optimization runtime by up to 683Ɨ\times speedups compared to the commercial compiler and matches the exact optimal solutions with up to 930Ɨ\times speedups. Finally, we perform a comprehensive generalizability test, which demonstrates RESPECT successfully imitates optimal solving behaviors from small synthetic graphs to large real-world DNNs computational graphs.Comment: 6 pages, ACM/IEEE Design Automation Conference (DAC'23

    Thermal-Safe Test Scheduling for Core-Based System-on-a-Chip Integrated Circuits

    No full text
    Overheating has been acknowledged as a major problem during the testing of complex system-on-chip (SOC) integrated circuits. Several power-constrained test scheduling solutions have been recently proposed to tackle this problem during system integration. However, we show that these approaches cannot guarantee hot-spot-free test schedules because they do not take into account the non-uniform distribution of heat dissipation across the die and the physical adjacency of simultaneously active cores. This paper proposes a new test scheduling approach that is able to produce short test schedules and guarantee thermal-safety at the same time. Two thermal-safe test scheduling algorithms are proposed. The first algorithm computes an exact (shortest) test schedule that is guaranteed to satisfy a given maximum temperature constraint. The second algorithm is a heuristic intended for complex systems with a large number of embedded cores, for which the exact thermal-safe test scheduling algorithm may not be feasible. Based on a low-complexity test session thermal cost model, this algorithm produces near-optimal length test schedules with significantly less computational effort compared to the optimal algorithm

    MARACAS: a real-time multicore VCPU scheduling framework

    Full text link
    This paper describes a multicore scheduling and load-balancing framework called MARACAS, to address shared cache and memory bus contention. It builds upon prior work centered around the concept of virtual CPU (VCPU) scheduling. Threads are associated with VCPUs that have periodically replenished time budgets. VCPUs are guaranteed to receive their periodic budgets even if they are migrated between cores. A load balancing algorithm ensures VCPUs are mapped to cores to fairly distribute surplus CPU cycles, after ensuring VCPU timing guarantees. MARACAS uses surplus cycles to throttle the execution of threads running on specific cores when memory contention exceeds a certain threshold. This enables threads on other cores to make better progress without interference from co-runners. Our scheduling framework features a novel memory-aware scheduling approach that uses performance counters to derive an average memory request latency. We show that latency-based memory throttling is more effective than rate-based memory access control in reducing bus contention. MARACAS also supports cache-aware scheduling and migration using page recoloring to improve performance isolation amongst VCPUs. Experiments show how MARACAS reduces multicore resource contention, leading to improved task progress.http://www.cs.bu.edu/fac/richwest/papers/rtss_2016.pdfAccepted manuscrip

    A Hierachical Infrastrucutre for SOC Test Management

    Get PDF
    HD2BIST - a complete hierarchical framework for BIST scheduling, data-patterns delivery, and diagnosis of complex systems - maximizes and simplifies the reuse of built-in test architectures. HD2BIST optimizes the flexibility for chip designers in planning an overall SoC test strategy by defining a test access method that provides direct virtual access to each core of the system

    ATMP: An Adaptive Tolerance-based Mixed-criticality Protocol for Multi-core Systems

    Get PDF
    Ā© 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted ncomponent of this work in other works.The challenge of mixed-criticality scheduling is to keep tasks of higher criticality running in case of resource shortages caused by faults. Traditionally, mixedcriticality scheduling has focused on methods to handle faults where tasks overrun their optimistic worst-case execution time (WCET) estimate. In this paper we present the Adaptive Tolerance based Mixed-criticality Protocol (ATMP), which generalises the concept of mixed-criticality scheduling to handle also faults of other nature, like failure of cores in a multi-core system. ATMP is an adaptation method triggered by resource shortage at runtime. The first step of ATMP is to re-partition the task to the available cores and the second step is to optimise the utility at each core using the tolerance-based real-time computing model (TRTCM). The evaluation shows that the utility optimisation of ATMP can achieve a smoother degradation of service compared to just abandoning tasks
    • ā€¦
    corecore