450 research outputs found
Virtual Runtime Application Partitions for Resource Management in Massively Parallel Architectures
This thesis presents a novel design paradigm, called Virtual Runtime Application Partitions (VRAP), to judiciously utilize the on-chip resources. As the dark silicon era approaches, where the power considerations will allow only a fraction chip to be powered on, judicious resource management will become a key consideration in future designs. Most of the works on resource management treat only the physical components (i.e. computation, communication, and memory blocks) as resources and manipulate the component to application mapping to optimize various parameters (e.g. energy efficiency). To further enhance the optimization potential, in addition to the physical resources we propose to manipulate abstract resources (i.e. voltage/frequency operating point, the fault-tolerance strength, the degree of parallelism, and the configuration architecture). The proposed framework (i.e. VRAP) encapsulates methods, algorithms, and hardware blocks to provide each application with the abstract resources tailored to its needs. To test the efficacy of this concept, we have developed three distinct self adaptive environments: (i) Private Operating Environment (POE), (ii) Private Reliability Environment (PRE), and (iii) Private Configuration Environment (PCE) that collectively ensure that each application meets its deadlines using minimal platform resources. In this work several novel architectural enhancements, algorithms and policies are presented to realize the virtual runtime application partitions efficiently. Considering the future design trends, we have chosen Coarse Grained Reconfigurable Architectures (CGRAs) and Network on Chips (NoCs) to test the feasibility of our approach. Specifically, we have chosen Dynamically Reconfigurable Resource Array (DRRA) and McNoC as the representative CGRA and NoC platforms. The proposed techniques are compared and evaluated using a variety of quantitative experiments. Synthesis and simulation results demonstrate VRAP significantly enhances the energy and power efficiency compared to state of the art.Siirretty Doriast
Memory hierarchy and data communication in heterogeneous reconfigurable SoCs
The miniaturization race in the hardware industry aiming at continuous increasing
of transistor density on a die does not bring respective application performance
improvements any more. One of the most promising alternatives is to
exploit a heterogeneous nature of common applications in hardware. Supported by
reconfigurable computation, which has already proved its efficiency in accelerating
data intensive applications, this concept promises a breakthrough in contemporary
technology development.
Memory organization in such heterogeneous reconfigurable architectures becomes
very critical. Two primary aspects introduce a sophisticated trade-off. On
the one hand, a memory subsystem should provide well organized distributed data
structure and guarantee the required data bandwidth. On the other hand, it should
hide the heterogeneous hardware structure from the end-user, in order to support
feasible high-level programmability of the system.
This thesis work explores the heterogeneous reconfigurable hardware architectures
and presents possible solutions to cope the problem of memory organization
and data structure. By the example of the MORPHEUS heterogeneous platform,
the discussion follows the complete design cycle, starting from decision making
and justification, until hardware realization. Particular emphasis is made on the
methods to support high system performance, meet application requirements, and
provide a user-friendly programmer interface.
As a result, the research introduces a complete heterogeneous platform enhanced
with a hierarchical memory organization, which copes with its task by
means of separating computation from communication, providing reconfigurable
engines with computation and configuration data, and unification of heterogeneous
computational devices using local storage buffers. It is distinguished from the
related solutions by distributed data-flow organization, specifically engineered
mechanisms to operate with data on local domains, particular communication infrastructure
based on Network-on-Chip, and thorough methods to prevent computation
and communication stalls. In addition, a novel advanced technique to accelerate
memory access was developed and implemented
FPGA dynamic and partial reconfiguration : a survey of architectures, methods, and applications
Dynamic and partial reconfiguration are key differentiating capabilities of field programmable gate arrays (FPGAs). While they have been studied extensively in academic literature, they find limited use in deployed systems. We review FPGA reconfiguration, looking at architectures built for the purpose, and the properties of modern commercial architectures. We then investigate design flows, and identify the key challenges in making reconfigurable FPGA systems easier to design. Finally, we look at applications where reconfiguration has found use, as well as proposing new areas where this capability places FPGAs in a unique position for adoption
Proceedings of the 5th International Workshop on Reconfigurable Communication-centric Systems on Chip 2010 - ReCoSoC\u2710 - May 17-19, 2010 Karlsruhe, Germany. (KIT Scientific Reports ; 7551)
ReCoSoC is intended to be a periodic annual meeting to expose and discuss gathered expertise as well as state of the art research around SoC related topics through plenary invited papers and posters. The workshop aims to provide a prospective view of tomorrow\u27s challenges in the multibillion transistor era, taking into account the emerging techniques and architectures exploring the synergy between flexible on-chip communication and system reconfigurability
Smart Chips for Smart Surroundings -- 4S
The overall mission of the 4S project (Smart Chips for Smart Surroundings) was to define and develop efficient flexible, reconfigurable core building blocks, including the supporting tools, for future Ambient System Devices. Reconfigurability offers the needed flexibility and adaptability, it provides the efficiency needed for these systems, it enables systems that can adapt to rapidly changing environmental conditions, it enables communication over heterogeneous wireless networks, and it reduces risks: reconfigurable systems can adapt to standards that may vary from place to place or standards that have changed during and after product development. In 4S we focused on heterogeneous building blocks such as analogue, hardwired functions, fine and coarse grain reconfigurable tiles and microprocessors. Such a platform can adapt to a wide application space without the need for specialized ASICs. A novel power aware design flow and runtime system was developed. The runtime system decides dynamically about the near-optimal application mapping to the given hardware platform. The overall concept was verified on hardware platforms based on an existing SoC and in a second step with novel silicon. DRM (Digital Radio Mondiale) and MPEG4 Video applications have been implemented on the platforms demonstrating the adaptability of the 4S concept
High-Level Synthesis Hardware Design for FPGA-Based Accelerators: Models, Methodologies, and Frameworks
Hardware accelerators based on field programmable gate array (FPGA) and system on chip (SoC) devices have gained attention in recent years. One of the main reasons is that these devices contain reconfigurable logic, which makes them feasible for boosting the performance of applications. High-level synthesis (HLS) tools facilitate the creation of FPGA code from a high level of abstraction using different directives to obtain an optimized hardware design based on performance metrics. However, the complexity of the design space depends on different factors such as the number of directives used in the source code, the available resources in the device, and the clock frequency. Design space exploration (DSE) techniques comprise the evaluation of multiple implementations with different combinations of directives to obtain a design with a good compromise between different metrics. This paper presents a survey of models, methodologies, and frameworks proposed for metric estimation, FPGA-based DSE, and power consumption estimation on FPGA/SoC. The main features, limitations, and trade-offs of these approaches are described. We also present the integration of existing models and frameworks in diverse research areas and identify the different challenges to be addressed
Generic low power reconfigurable distributed arithmetic processor
Higher performance, lower cost, increasingly minimizing integrated circuit components, and
higher packaging density of chips are ongoing goals of the microelectronic and computer
industry. As these goals are being achieved, however, power consumption and flexibility are
increasingly becoming bottlenecks that need to be addressed with the new technology in Very
Large-Scale Integrated (VLSI) design.
For modern systems, more energy is required to support the powerful computational capability
which accords with the increasing requirements, and these requirements cause the change of
standards not only in audio and video broadcasting but also in communication such as wireless
connection and network protocols. Powerful flexibility and low consumption are repellent, but
their combination in one system is the ultimate goal of designers.
A generic domain-specific low-power reconfigurable processor for the distributed
arithmetic algorithm is presented in this dissertation. This domain reconfigurable processor
features high efficiency in terms of area, power and delay, which approaches the
performance of an ASIC design, while retaining the flexibility of programmable platforms.
The architecture not only supports typical distributed arithmetic algorithms which can be
found in most still picture compression standards and video conferencing standards, but
also offers implementation ability for other distributed arithmetic algorithms found in
digital signal processing, telecommunication protocols and automatic control.
In this processor, a simple reconfigurable low power control unit is implemented with
good performance in area, power and timing. The generic characteristic of the architecture
makes it applicable for any small and medium size finite state machines which can be used
as control units to implement complex system behaviour and can be found in almost all
engineering disciplines. Furthermore, to map target applications efficiently onto the
proposed architecture, a new algorithm is introduced for searching for the best common
sharing terms set and it keeps the area and power consumption of the implementation at
low level. The software implementation of this algorithm is presented, which can be used
not only for the proposed architecture in this dissertation but also for all the
implementations with adder-based distributed arithmetic algorithms. In addition, some low
power design techniques are applied in the architecture, such as unsymmetrical design
style including unsymmetrical interconnection arranging, unsymmetrical PTBs selection
and unsymmetrical mapping basic computing units. All these design techniques achieve
extraordinary power consumption saving. It is believed that they can be extended to more
low power designs and architectures.
The processor presented in this dissertation can be used to implement complex, high
performance distributed arithmetic algorithms for communication and image processing
applications with low cost in area and power compared with the traditional
methods
- …