U-EDF: An Unfair But Optimal Multiprocessor Scheduling Algorithm for Sporadic Tasks
A multiprocessor scheduling algorithm named U-EDF was presented in [1] for the scheduling of periodic tasks with implicit deadlines. It was claimed that U-EDF is optimal for periodic tasks (i.e., it can meet all deadlines of every schedulable task set), and extensive simulations showed a drastic improvement in the number of task preemptions and migrations in comparison to state-of-the-art optimal algorithms. However, there was no proof of its optimality, and U-EDF was not designed to schedule sporadic tasks. In this work, we propose a generalization of U-EDF for the scheduling of sporadic tasks with implicit deadlines, and we prove its optimality. Contrary to all other existing optimal multiprocessor scheduling algorithms for sporadic tasks, U-EDF is not based on the fairness property. Instead, it extends the main principles of EDF so that it achieves optimality while benefiting from a substantial reduction in the number of preemptions and migrations. © 2012 IEEE.
Power-Aware Real-Time Scheduling upon Identical Multiprocessor Platforms
In this paper, we address the power-aware scheduling of sporadic
constrained-deadline hard real-time tasks using dynamic voltage scaling upon
multiprocessor platforms. We propose two distinct algorithms. Our first
algorithm is an off-line speed determination mechanism which provides an
identical speed for each processor. That speed guarantees that all deadlines
are met if the jobs are scheduled using EDF. The second algorithm is an on-line
and adaptive speed adjustment mechanism which reduces the energy consumption
while the system is running.
Comment: The manuscript corresponds to the final version of SUTC 2008 conference
Impact of gate-level clustering on automated system partitioning of 3D-ICs
When partitioning gate-level netlists using graphs, it is beneficial to
cluster gates to reduce the order of the graph and preserve some
characteristics of the circuit that the partitioning might degrade. Gate
clustering is even more important for netlist partitioning targeting 3D system
integration. In this paper, we make the argument that the choice of clustering
method for 3D-ICs partitioning is not trivial and deserves careful
consideration. To support our claim, we implemented three clustering methods
that were used prior to partitioning two synthetic designs representing two
extremes of the circuits' medium/long interconnect diversity spectrum.
Automatically partitioned netlists are then placed and routed in 3D to compare
the impact of clustering methods on several metrics. From our experiments, we
see that the clustering method indeed has a different impact depending on the
design considered and that a circuit-blind, universal partitioning method is
not the way to go, with wire-length savings of up to 31%, total power savings
of up to 22%, and effective frequency gains of up to 15% compared to other
methods.
Furthermore, we highlight that 3D-ICs open new opportunities to design systems
with a denser interconnect, drastically reducing the design utilization of
circuits that would not be considered viable in 2D.
Comment: 8 pages, 6 figures
Techniques Optimizing the Number of Processors to Schedule Multi-threaded Tasks
In recent years, we have witnessed a dramatic increase in the number of cores available in computational platforms. Concurrently, a new coding paradigm that divides tasks into smaller execution instances called threads was developed to take advantage of the inherent parallelism of multiprocessor platforms. However, only a few methods have been proposed to efficiently schedule hard real-time multi-threaded tasks on multiprocessors. In this paper, we propose techniques optimizing the number of processors needed to schedule such sporadic parallel tasks with constrained deadlines. We first define an optimization problem determining, for each thread, an intermediate (artificial) deadline minimizing the number of processors needed to schedule the whole task set. The scheduling algorithm can then schedule threads as if they were independent sequential sporadic tasks. The second contribution is an efficient and nevertheless optimal algorithm that can be executed online to determine the threads' deadlines. Hence, it can be used in dynamic systems where all tasks and their characteristics are not known a priori. We finally prove that our techniques achieve a resource augmentation bound of 2 when the threads are scheduled with algorithms such as U-EDF, PD2, LLREF, DP-Wrap, etc. © 2012 IEEE.
MemPool-3D: Boosting Performance and Efficiency of Shared-L1 Memory Many-Core Clusters with 3D Integration
Three-dimensional integrated circuits promise power, performance, and
footprint gains compared to their 2D counterparts, thanks to drastic reductions
in the interconnects' length through their smaller form factor. We can leverage
the potential of 3D integration by enhancing MemPool, an open-source many-core
design with 256 cores and a shared pool of L1 scratchpad memory connected with
a low-latency interconnect. MemPool's baseline 2D design is severely limited by
routing congestion and wire propagation delay, making the design ideal for 3D
integration. In architectural terms, we increase MemPool's scratchpad memory
capacity beyond the sweet spot for 2D designs, improving performance in a
common digital signal processing kernel. We propose a 3D MemPool design that
leverages a smart partitioning of the memory resources across two layers to
balance the size and utilization of the stacked dies. In this paper, we explore
the architectural and the technology parameter spaces by analyzing the power,
performance, area, and energy efficiency of MemPool instances in 2D and 3D with
1 MiB, 2 MiB, 4 MiB, and 8 MiB of scratchpad memory in a commercial 28 nm
technology node. We observe a performance gain of 9.1% when running a matrix
multiplication on the MemPool-3D design with 4 MiB of scratchpad memory
compared to the MemPool 2D counterpart. In terms of energy efficiency, we can
implement the MemPool-3D instance with 4 MiB of L1 memory on an energy budget
15% smaller than its 2D counterpart, and even 3.7% smaller than the MemPool-2D
instance with one-fourth of the L1 scratchpad memory capacity.
Comment: Accepted for publication in DATE 2022 -- Design, Automation and Test
in Europe Conference
PathFinding - Design Methodology for 3D-Stacked Integrated Circuits: Tool Chain and Case Studies
Mathematical morphology and Parallel reconfigurable systems based on FPGAs
3D-ICs: Technology, System design challenges, Solutions & Benefits