Low Power Processor Architectures and Contemporary Techniques for Power Optimization – A Review
Technological evolution has significantly increased the number of transistors for a given die area and raised switching speeds from a few MHz to the GHz range. This combination of shrinking feature size and rising performance demands a lower supply voltage and effective management of power dissipation in chips with millions of transistors. This has triggered a substantial amount of research into power reduction techniques for almost every aspect of the chip, particularly the processor cores it contains. This paper presents an overview of techniques for achieving power efficiency mainly at the processor core level, but it also visits related domains such as buses and memories. Various processor parameters and features, such as supply voltage, clock frequency, cache, and pipelining, can be optimized to reduce the power consumption of the processor. This paper discusses various ways in which these parameters can be optimized. Emerging power-efficient processor architectures are also reviewed, and research activities are discussed that should help the reader identify how these factors contribute to a processor's power consumption. Some of these concepts are already established, whereas others are still active research areas. © 2009 ACADEMY PUBLISHER
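For orientation, the parameters named in this abstract (supply voltage and clock frequency) appear directly in the standard first-order model of dynamic CMOS power. The expression below is textbook background, not a formula quoted from the paper; the symbols $\alpha$ (switching activity factor), $C_L$ (switched load capacitance), $V_{DD}$ (supply voltage), and $f$ (clock frequency) are notation chosen here for illustration.

```latex
P_{\mathrm{dyn}} = \alpha \, C_L \, V_{DD}^{2} \, f
```

Because $V_{DD}$ enters quadratically while $f$ enters linearly, this model suggests why lowering the supply voltage is usually the more effective lever of the two.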
VLSI implementation of a multi-mode turbo/LDPC decoder architecture
Flexible and reconfigurable architectures have gained wide popularity in the communications field. In particular, reconfigurable architectures for the physical layer are an attractive solution not only to switch among different coding modes but also to achieve interoperability. This work concentrates on the design of a reconfigurable architecture for decoding both turbo and LDPC codes. The novel contributions of this paper are: i) tackling the reconfiguration issue by introducing a formal and systematic treatment that, to the best of our knowledge, was not previously addressed; ii) proposing a reconfigurable NoC-based turbo/LDPC decoder architecture and showing that wide flexibility can be achieved with a small complexity overhead. The results show that dynamic switching between most of the considered communication standards is possible without pausing the decoding activity. Moreover, post-layout results show that tailoring the proposed architecture to the WiMAX standard leads to an area occupation of 2.75 mm² and a power consumption of 101.5 mW in the worst case
Studies on Core-Based Testing of System-on-Chips Using Functional Bus and Network-on-Chip Interconnects
Testing a complex system such as a microprocessor-based system-on-chip
(SoC) or a network-on-chip (NoC) is difficult and expensive. In this thesis,
we propose three core-based test methods that reuse the existing functional
interconnects, namely a flat bus, hierarchical buses of multiprocessor SoCs
(MPSoCs), and a NoC, in order to avoid the silicon area cost of a dedicated
test access mechanism (TAM). However, the use of functional interconnects as
functional TAMs introduces several new problems.
First, during tests, the interconnects, including the bus arbiter, the bus bridges,
and the NoC routers, operate in the functional mode to transport the test stimuli
and responses, while the cores under test (CUTs) operate in the test mode. Second,
the test data is transported to the CUT through the functional bus, and not
directly to the test port. Therefore, special core test wrappers that can provide
the necessary control signals required by the different functional interconnects are
proposed. We developed two types of wrappers: a buffer-based wrapper for the
bus-based systems and a pair of complementary wrappers for the NoC-based
systems.
Using the core test wrappers, we propose test scheduling schemes for the three
functionally different types of interconnects. The test scheduling scheme for a flat
bus is developed based on an efficient packet scheduling scheme that minimizes
both the buffer sizes and the test time under a power constraint. The scheduling
scheme is then extended to take advantage of the hierarchical bus architecture of
the MPSoC systems. The third test scheduling scheme, based on bandwidth
sharing, is developed specifically for the NoC-based systems. The test scheduling
is performed with the objective of co-optimizing the wrapper area cost and the
resulting test application time using the two complementary NoC wrappers.
For each of the proposed methodologies for the three types of SoC architecture,
we conducted a thorough experimental evaluation in order to verify its
effectiveness compared to other methods
A survey on scheduling and mapping techniques in 3D Network-on-chip
Networks-on-Chip (NoCs) have been widely employed in the design of
multiprocessor systems-on-chip (MPSoCs) as a scalable communication solution.
NoCs enable communication between on-chip Intellectual Property (IP) cores and
allow those cores to achieve higher performance by outsourcing their
communication tasks. Mapping and scheduling methodologies are key elements in
assigning application tasks, allocating the tasks to the IPs, and organising
communication among them to achieve specified objectives. The goal of this
paper is to present a detailed survey of the state of the art in
mapping and scheduling of applications on 3D NoCs, classifying the works
along several dimensions and suggesting some potential research directions
Run-time Spatial Mapping of Streaming Applications to Heterogeneous Multi-Processor Systems
In this paper, we define the problem of spatial mapping. We present reasons why performing spatial mappings at run-time is both necessary and desirable. We propose what is—to our knowledge—the first attempt at a formal description of spatial mappings for the embedded real-time streaming application domain. Thereby, we introduce criteria for a qualitative comparison of these spatial mappings. As an illustration of how our formalization relates to practice, we relate our own spatial mapping algorithm to the formal model
A Survey of Prediction and Classification Techniques in Multicore Processor Systems
In multicore processor systems, being able to accurately predict the future provides new optimization opportunities that otherwise could not be exploited. For example, an oracle able to predict the behavior of a certain application running on a smartphone could direct the power manager to switch to appropriate dynamic voltage and frequency scaling modes that would guarantee minimum levels of desired performance while reducing energy consumption and thereby prolonging battery life. Using predictions enables systems to become proactive rather than continue to operate in a reactive manner. This prediction-based proactive approach has become increasingly popular in the design and optimization of integrated circuits and of multicore processor systems. Prediction has evolved from simple forecasting to sophisticated machine-learning-based prediction and classification that learns from existing data, employs data mining, and predicts future behavior. This can be exploited by novel optimization techniques that span all layers of the computing stack. In this survey paper, we present a discussion of the most popular techniques for prediction and classification in the general context of computing systems, with emphasis on multicore processors. The paper is far from comprehensive, but it will help the reader interested in employing prediction in the optimization of multicore processor systems
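The smartphone example in this abstract (predict behavior, then pick a dynamic voltage and frequency scaling mode) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the survey: the exponentially weighted moving average predictor, the function names `ewma_predict` and `choose_level`, and the frequency table `DVFS_LEVELS_MHZ` are all hypothetical.

```python
from typing import Sequence

# Hypothetical DVFS operating points in MHz, ordered lowest to highest.
DVFS_LEVELS_MHZ = (400, 800, 1200, 1600)


def ewma_predict(history: Sequence[float], alpha: float = 0.5) -> float:
    """Predict next-interval utilization (0..1) with a simple EWMA.

    This stands in for the far more sophisticated predictors a survey
    would cover; it is only a simple forecasting baseline.
    """
    if not history:
        raise ValueError("history must not be empty")
    estimate = history[0]
    for sample in history[1:]:
        # Blend the newest sample with the running estimate.
        estimate = alpha * sample + (1 - alpha) * estimate
    return estimate


def choose_level(predicted_util: float,
                 levels: Sequence[int] = DVFS_LEVELS_MHZ) -> int:
    """Return the lowest frequency that covers the predicted demand.

    Demand is expressed relative to the highest available frequency.
    """
    demand_mhz = predicted_util * levels[-1]
    for freq in levels:
        if freq >= demand_mhz:
            return freq
    return levels[-1]  # Demand exceeds capacity: run at the maximum.


if __name__ == "__main__":
    util = ewma_predict([0.2, 0.4])
    print(choose_level(util))  # prints 800
```

A proactive power manager in this style would call `ewma_predict` at the end of each interval and apply `choose_level` before the next one begins, instead of reacting after a performance shortfall has been observed.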
Test-Delivery Optimization in Manycore SOCs
We present two test-data delivery optimization algorithms
for system-on-chip (SOC) designs with hundreds of cores,
where a network-on-chip (NOC) is used as the interconnection
fabric. We first present an effective algorithm based on a subset-sum
formulation to solve the test-delivery problem in NOCs
with arbitrary topology that use dedicated routing. We further
propose an algorithm for the important class of NOCs with grid
topology and XY routing. The proposed algorithm is the first to
co-optimize the number of access points, access-point locations,
pin distribution to access points, and assignment of cores to access
points for optimal test resource utilization of such NOCs. Test-time
minimization is modeled as an NOC partitioning problem
and solved with dynamic programming in polynomial time. Both
the proposed methods yield high-quality results and are scalable
to large SOCs with many cores. We present results on synthetic
grid topology NOC-based SOCs constructed using cores from
the ITC’02 benchmark, and demonstrate the scalability of our
approach for two SOCs of the future, one with nearly 1,000 cores
and the other with 1,600 cores. Test scheduling under power
constraints is also incorporated in the optimization framework
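The XY routing assumed for grid-topology NOCs in this abstract is the usual dimension-ordered scheme: a packet first travels along the X dimension until its column matches the destination, then along Y. The sketch below shows only that routing rule, not the paper's optimization algorithms; the function name `xy_route` and the `(x, y)` coordinate convention are assumptions made for illustration.

```python
from typing import List, Tuple

Node = Tuple[int, int]  # (x, y) position of a router in the grid


def xy_route(src: Node, dst: Node) -> List[Node]:
    """Return the routers visited from src to dst under XY routing."""
    x, y = src
    path = [(x, y)]

    # Phase 1: move along X until the destination column is reached.
    step_x = 1 if dst[0] > x else -1
    while x != dst[0]:
        x += step_x
        path.append((x, y))

    # Phase 2: move along Y until the destination row is reached.
    step_y = 1 if dst[1] > y else -1
    while y != dst[1]:
        y += step_y
        path.append((x, y))

    return path


if __name__ == "__main__":
    print(xy_route((0, 0), (2, 1)))  # prints [(0, 0), (1, 0), (2, 0), (2, 1)]
```

Because the path between any two routers is fixed by this rule, the links used by test data from a given access point to a given core are known in advance, which is what makes access-point placement amenable to a partitioning formulation.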
Self-adaptivity of applications on network on chip multiprocessors: the case of fault-tolerant Kahn process networks
Technology scaling, accompanied by higher operating frequencies and the ability to integrate more functionality in the same chip, has been the driving force behind delivering higher-performance computing systems at lower costs. Embedded computing systems, which have been riding the same wave of success, have evolved into complex architectures encompassing a high number of cores interconnected by an on-chip network (usually identified as a Multiprocessor System-on-Chip). However, these trends are hindered by issues that arise as technology scaling continues towards deep submicron scales. Firstly, the growing complexity of these systems and the variability introduced by process technologies make it ever harder to perform a thorough optimization of the system at design time. Secondly, designers are faced with a reliability wall that emerges as age-related degradation reduces the lifetime of transistors and as the probability of defects escaping post-manufacturing testing increases. In this thesis, we take on these challenges within the context of streaming applications running in network-on-chip-based parallel (not necessarily homogeneous) systems-on-chip that adopt the no-remote-memory-access model. In particular, this thesis tackles two main problems: (1) fault-aware online task remapping and (2) application-level self-adaptation for quality management. For the former, by viewing fault tolerance as a self-adaptation aspect, we adopt a cross-layer approach that aims at graceful performance degradation by addressing permanent faults in processing elements mostly at the system level, in particular by exploiting the redundancy available in multi-core platforms. We propose an optimal solution based on an integer linear programming formulation (suitable for design-time adoption) as well as heuristic-based solutions to be used at run-time. We assess the impact of our approach on lifetime reliability.
We propose two recovery schemes based on a checkpoint-and-rollback technique and a rollforward technique. For the latter, we propose two variants of a monitor-controller-adapter loop that adapts application-level parameters to meet performance goals. We demonstrate not only that fault tolerance and self-adaptivity can be achieved in embedded platforms, but also that this can be done without incurring large overheads. In addressing these problems, we present techniques which have been realized (depending on their characteristics) in the form of a design tool, a run-time library or a hardware core to be added to the basic architecture