553 research outputs found
Energy Efficient Network Generation for Application Specific NoC
Networks-on-Chip is emerging as a communication platform for future complex SoC designs, composed of a large number of homogenous or heterogeneous processing resources. Most SoC platforms are customized to the domainspecific requirements of their applications, which communicate in a specific, mostly irregular way. The specific but often diverse communication requirements among cores of the SoC call for the design of application-specific network of SoC for improved performance in terms of communication energy, latency, and throughput. In this work, we propose a methodology for the design of customized irregular network architecture of SoC. The proposed method exploits priori knowledge of the application2019;s communication characteristic to generate an energy optimized network and corresponding routing tables
Understanding the impact of 3D stacked layouts on ILP
Journal Article3D die-stacked chips can alleviate the penalties imposed by long wires within micro-processor circuits. Many recent studies have attempted to partition each microprocessor structure across three dimensions to reduce their access times. In this paper, we implement each microprocessor structure on a single 2D die and leverage 3D to reduce the lengths of wires that communicate data between microprocessor structures within a single core. We begin with a criticality analysis of inter-structure wire delays and show that for most tra- ditional simple superscalar cores, 2D floorplans are already very efficient at minimizing critical wire delays. For an aggressive wire-constrained clustered superscalar architecture, an exploration of the design space reveals that 3D can yield higher benefit. However, this benefit may be negated by the higher power density and temperature entailed by 3D integration. Overall, we report a negative result and argue against leveraging 3D for higher ILP
FPGA dynamic and partial reconfiguration : a survey of architectures, methods, and applications
Dynamic and partial reconfiguration are key differentiating capabilities of field programmable gate arrays (FPGAs). While they have been studied extensively in academic literature, they find limited use in deployed systems. We review FPGA reconfiguration, looking at architectures built for the purpose, and the properties of modern commercial architectures. We then investigate design flows, and identify the key challenges in making reconfigurable FPGA systems easier to design. Finally, we look at applications where reconfiguration has found use, as well as proposing new areas where this capability places FPGAs in a unique position for adoption
Optimizing Scrubbing by Netlist Analysis for FPGA Configuration Bit Classification and Floorplanning
Existing scrubbing techniques for SEU mitigation on FPGAs do not guarantee an
error-free operation after SEU recovering if the affected configuration bits do
belong to feedback loops of the implemented circuits. In this paper, we a)
provide a netlist-based circuit analysis technique to distinguish so-called
critical configuration bits from essential bits in order to identify
configuration bits which will need also state-restoring actions after a
recovered SEU and which not. Furthermore, b) an alternative classification
approach using fault injection is developed in order to compare both
classification techniques. Moreover, c) we will propose a floorplanning
approach for reducing the effective number of scrubbed frames and d),
experimental results will give evidence that our optimization methodology not
only allows to detect errors earlier but also to minimize the
Mean-Time-To-Repair (MTTR) of a circuit considerably. In particular, we show
that by using our approach, the MTTR for datapath-intensive circuits can be
reduced by up to 48.5% in comparison to standard approaches
3D IC optimal layout design. A parallel and distributed topological approach
The task of 3D ICs layout design involves the assembly of millions of
components taking into account many different requirements and constraints such
as topological, wiring or manufacturability ones. It is a NP-hard problem that
requires new non-deterministic and heuristic algorithms. Considering the time
complexity, the commonly applied Fiduccia-Mattheyses partitioning algorithm is
superior to any other local search method. Nevertheless, it can often miss to
reach a quasi-optimal solution in 3D spaces. The presented approach uses an
original 3D layout graph partitioning heuristics implemented with use of the
extremal optimization method. The goal is to minimize the total wire-length in
the chip. In order to improve the time complexity a parallel and distributed
Java implementation is applied. Inside one Java Virtual Machine separate
optimization algorithms are executed by independent threads. The work may also
be shared among different machines by means of The Java Remote Method
Invocation system.Comment: 26 pages, 9 figure
Heurísticas bioinspiradas para el problema de Floorplanning 3D térmico de dispositivos MPSoCs
Tesis inédita de la Universidad Complutense de Madrid, Facultad de Informática, Departamento de Arquitectura de Computadores y Automática, leída el 20-06-2013Depto. de Arquitectura de Computadores y AutomáticaFac. de InformáticaTRUEunpu
Real-Time neural signal decoding on heterogeneous MPSocs based on VLIW ASIPs
An important research problem, at the basis of the development of embedded systems for neuroprosthetic applications, is the development of algorithms and platforms able to extract the patient's motion intention by decoding the information encoded in neural signals. At the state of the art, no portable and reliable integrated solutions implementing such a decoding task have been identified. To this aim, in this paper, we investigate the possibility of using the MPSoC paradigm in this application domain. We perform a design space exploration that compares different custom MPSoC embedded architectures, implementing two versions of a on-line neural signal decoding algorithm, respectively targeting decoding of single and multiple acquisition channels. Each considered design points features a different application configuration, with a specific partitioning and mapping of parallel software tasks, executed on customized VLIW ASIP processing cores. Experimental results, obtained by means of FPGA-based prototyping and post-floorplanning power evaluation on a 40nm technology library, assess the performance and hardware-related costs of the considered configurations. The reported power figures demonstrate the usability of the MPSoC paradigm within the processing of bio-electrical signals and show the benefits achievable by the exploitation of the instruction-level parallelism within tasks
- …