250 research outputs found
Methodology for complex dataflow application development
This thesis addresses problems inherent to the development of complex applications for reconfig- urable systems. Many projects fail to complete or take much longer than originally estimated by relying on traditional iterative software development processes typically used with conventional computers. Even though designer productivity can be increased by abstract programming and execution models, e.g., dataflow, development methodologies considering the specific properties of reconfigurable systems do not exist.
The first contribution of this thesis is a design methodology to facilitate systematic develop- ment of complex applications using reconfigurable hardware in the context of High-Performance Computing (HPC). The proposed methodology is built upon a careful analysis of the original application, a software model of the intended hardware system, an analytical prediction of performance and on-chip area usage, and an iterative architectural refinement to resolve identi- fied bottlenecks before writing a single line of code targeting the reconfigurable hardware. It is successfully validated using two real applications and both achieve state-of-the-art performance.
The second contribution extends this methodology to provide portability between devices in two steps. First, additional tool support for contemporary multi-die Field-Programmable Gate Arrays (FPGAs) is developed. An algorithm to automatically map logical memories to hetero- geneous physical memories with special attention to die boundaries is proposed. As a result, only the proposed algorithm managed to successfully place and route all designs used in the evaluation while the second-best algorithm failed on one third of all large applications. Second, best practices for performance portability between different FPGA devices are collected and evaluated on a financial use case, showing efficient resource usage on five different platforms.
The third contribution applies the extended methodology to a real, highly demanding emerging application from the radiotherapy domain. A Monte-Carlo based simulation of dose accumu- lation in human tissue is accelerated using the proposed methodology to meet the real time requirements of adaptive radiotherapy.Open Acces
AI/ML Algorithms and Applications in VLSI Design and Technology
An evident challenge ahead for the integrated circuit (IC) industry in the
nanometer regime is the investigation and development of methods that can
reduce the design complexity ensuing from growing process variations and
curtail the turnaround time of chip manufacturing. Conventional methodologies
employed for such tasks are largely manual; thus, time-consuming and
resource-intensive. In contrast, the unique learning strategies of artificial
intelligence (AI) provide numerous exciting automated approaches for handling
complex and data-intensive tasks in very-large-scale integration (VLSI) design
and testing. Employing AI and machine learning (ML) algorithms in VLSI design
and manufacturing reduces the time and effort for understanding and processing
the data within and across different abstraction levels via automated learning
algorithms. It, in turn, improves the IC yield and reduces the manufacturing
turnaround time. This paper thoroughly reviews the AI/ML automated approaches
introduced in the past towards VLSI design and manufacturing. Moreover, we
discuss the scope of AI/ML applications in the future at various abstraction
levels to revolutionize the field of VLSI design, aiming for high-speed, highly
intelligent, and efficient implementations
Proceedings of the 5th International Workshop on Reconfigurable Communication-centric Systems on Chip 2010 - ReCoSoC\u2710 - May 17-19, 2010 Karlsruhe, Germany. (KIT Scientific Reports ; 7551)
ReCoSoC is intended to be a periodic annual meeting to expose and discuss gathered expertise as well as state of the art research around SoC related topics through plenary invited papers and posters. The workshop aims to provide a prospective view of tomorrow\u27s challenges in the multibillion transistor era, taking into account the emerging techniques and architectures exploring the synergy between flexible on-chip communication and system reconfigurability
Physical parameter-aware Networks-on-Chip design
PhD ThesisNetworks-on-Chip (NoCs) have been proposed as a scalable, reliable
and power-efficient communication fabric for chip multiprocessors
(CMPs) and multiprocessor systems-on-chip (MPSoCs). NoCs determine
both the performance and the reliability of such systems, with a
significant power demand that is expected to increase due to developments
in both technology and architecture. In terms of architecture, an
important trend in many-core systems architecture is to increase the
number of cores on a chip while reducing their individual complexity.
This trend increases communication power relative to computation
power. Moreover, technology-wise, power-hungry wires are dominating
logic as power consumers as technology scales down. For these
reasons, the design of future very large scale integration (VLSI) systems
is moving from being computation-centric to communication-centric.
On the other hand, chip’s physical parameters integrity, especially
power and thermal integrity, is crucial for reliable VLSI systems. However,
guaranteeing this integrity is becoming increasingly difficult with
the higher scale of integration due to increased power density and operating
frequencies that result in continuously increasing temperature
and voltage drops in the chip. This is a challenge that may prevent
further shrinking of devices. Thus, tackling the challenge of power
and thermal integrity of future many-core systems at only one level
of abstraction, the chip and package design for example, is no longer
sufficient to ensure the integrity of physical parameters. New designtime
and run-time strategies may need to work together at different
levels of abstraction, such as package, application, network, to provide
the required physical parameter integrity for these large systems. This
necessitates strategies that work at the level of the on-chip network
with its rising power budget.
This thesis proposes models, techniques and architectures to improve
power and thermal integrity of Network-on-Chip (NoC)-based
many-core systems. The thesis is composed of two major parts: i)
minimization and modelling of power supply variations to improve
power integrity; and ii) dynamic thermal adaptation to improve thermal
integrity. This thesis makes four major contributions. The first is
a computational model of on-chip power supply variations in NoCs.
The proposed model embeds a power delivery model, an NoC activity
simulator and a power model. The model is verified with SPICE simulation
and employed to analyse power supply variations in synthetic
and real NoC workloads. Novel observations regarding power supply
noise correlation with different traffic patterns and routing algorithms
are found. The second is a new application mapping strategy aiming
vii
to minimize power supply noise in NoCs. This is achieved by defining
a new metric, switching activity density, and employing a force-based
objective function that results in minimizing switching density. Significant
reductions in power supply noise (PSN) are achieved with a low
energy penalty. This reduction in PSN also results in a better link timing
accuracy. The third contribution is a new dynamic thermal-adaptive
routing strategy to effectively diffuse heat from the NoC-based threedimensional
(3D) CMPs, using a dynamic programming (DP)-based distributed
control architecture. Moreover, a new approach for efficient extension
of two-dimensional (2D) partially-adaptive routing algorithms
to 3D is presented. This approach improves three-dimensional networkon-
chip (3D NoC) routing adaptivity while ensuring deadlock-freeness.
Finally, the proposed thermal-adaptive routing is implemented in
field-programmable gate array (FPGA), and implementation challenges,
for both thermal sensing and the dynamic control architecture are addressed.
The proposed routing implementation is evaluated in terms
of both functionality and performance.
The methodologies and architectures proposed in this thesis open a
new direction for improving the power and thermal integrity of future
NoC-based 2D and 3D many-core architectures
Digital design techniques for dependable High-Performance Computing
L'abstract è presente nell'allegato / the abstract is in the attachmen
An Analysis of Electromagnetic Interference (EMI) of Ultra Wideband(UWB) and IEEE 802.11A Wireless Local Area Network (WLAN) Employing Orthogonal Frequency Division Multiplexing (OFDM)
Military communications require the rapid deployment of mobile, high-bandwidth systems. These systems must provide anytime, anywhere capabilities with minimal interference to existing military, private, and commercial communications. Ultra Wideband (UWB) technology is being advanced as the next generation radio technology and has the potential to revolutionize indoor wireless communications. The ability of UWB to mitigate multipath fading, provide high-throughput data rates (e.g., greater than 100 Mbps), provide excellent signal penetration (e.g., through walls), and low implementation costs makes it an ideal technology for a wide range of private and public sector applications. Preliminary UWB studies conducted by The Institute for Telecommunications Science (ITS) and the Defense Advanced Research Projects Agency (DARPA) have discovered that potential exists for harmful interference to occur. While these studies have provided initial performance estimates, the interference effects of UWB transmissions on coexisting spectral users are largely unknown. This research characterizes the electromagnetic interference (EMI) effects of UWB on the throughput performance of an IEEE 802.11a ad-hoc network. Radiated measurements in an anechoic chamber investigate interference performance using three modulation schemes (BPSK, BPPM, and OOK) and four pulse repetition frequencies over two Unlicensed National Information Infrastructure (U-NII) channels. Results indicate that OOK and BPPM can degrade throughput performance by up to 20% at lower pulse repetition frequencies (PRFs) in lower U-NII channels. Minimal performance degradation (less than one percent) due to interference was observed for BPSK at the lower PRFs and higher U-NII channels
The Customizable Virtual FPGA: Generation, System Integration and Configuration of Application-Specific Heterogeneous FPGA Architectures
In den vergangenen drei Jahrzehnten wurde die Entwicklung von Field Programmable Gate Arrays (FPGAs) stark von Moore’s Gesetz, Prozesstechnologie (Skalierung) und kommerziellen Märkten beeinflusst. State-of-the-Art FPGAs bewegen sich einerseits dem Allzweck näher, aber andererseits, da FPGAs immer mehr traditionelle Domänen der Anwendungsspezifischen integrierten Schaltungen (ASICs) ersetzt haben, steigen die Effizienzerwartungen. Mit dem Ende der Dennard-Skalierung können Effizienzsteigerungen nicht mehr auf Technologie-Skalierung allein zurückgreifen. Diese Facetten und Trends in Richtung rekonfigurierbarer System-on-Chips (SoCs) und neuen Low-Power-Anwendungen wie Cyber Physical Systems und Internet of Things erfordern eine bessere Anpassung der Ziel-FPGAs. Neben den Trends für den Mainstream-Einsatz von FPGAs in Produkten des täglichen Bedarfs und Services wird es vor allem bei den jüngsten Entwicklungen, FPGAs in Rechenzentren und Cloud-Services einzusetzen, notwendig sein, eine sofortige Portabilität von Applikationen über aktuelle und zukünftige FPGA-Geräte hinweg zu gewährleisten. In diesem Zusammenhang kann die Hardware-Virtualisierung ein nahtloses Mittel für Plattformunabhängigkeit und Portabilität sein. Ehrlich gesagt stehen die Zwecke der Anpassung und der Virtualisierung eigentlich in einem Konfliktfeld, da die Anpassung für die Effizienzsteigerung vorgesehen ist, während jedoch die Virtualisierung zusätzlichen Flächenaufwand hinzufügt. Die Virtualisierung profitiert aber nicht nur von der Anpassung, sondern fügt auch mehr Flexibilität hinzu, da die Architektur jederzeit verändert werden kann. Diese Besonderheit kann für adaptive Systeme ausgenutzt werden.
Sowohl die Anpassung als auch die Virtualisierung von FPGA-Architekturen wurden in der Industrie bisher kaum adressiert. Trotz einiger existierenden akademischen Werke können diese Techniken noch als unerforscht betrachtet werden und sind aufstrebende Forschungsgebiete.
Das Hauptziel dieser Arbeit ist die Generierung von FPGA-Architekturen, die auf eine effiziente Anpassung an die Applikation zugeschnitten sind. Im Gegensatz zum üblichen Ansatz mit kommerziellen FPGAs, bei denen die FPGA-Architektur als gegeben betrachtet wird und die Applikation auf die vorhandenen Ressourcen abgebildet wird, folgt diese Arbeit einem neuen Paradigma, in dem die Applikation oder Applikationsklasse fest steht und die Zielarchitektur auf die effiziente Anpassung an die Applikation zugeschnitten ist. Dies resultiert in angepassten anwendungsspezifischen FPGAs.
Die drei Säulen dieser Arbeit sind die Aspekte der Virtualisierung, der Anpassung und des Frameworks. Das zentrale Element ist eine weitgehend parametrierbare virtuelle FPGA-Architektur, die V-FPGA genannt wird, wobei sie als primäres Ziel auf jeden kommerziellen FPGA abgebildet werden kann, während Anwendungen auf der virtuellen Schicht ausgeführt werden. Dies sorgt für Portabilität und Migration auch auf Bitstream-Ebene, da die Spezifikation der virtuellen Schicht bestehen bleibt, während die physische Plattform ausgetauscht werden kann. Darüber hinaus wird diese Technik genutzt, um eine dynamische und partielle Rekonfiguration auf Plattformen zu ermöglichen, die sie nicht nativ unterstützen. Neben der Virtualisierung soll die V-FPGA-Architektur auch als eingebettetes FPGA in ein ASIC integriert werden, das effiziente und dennoch flexible System-on-Chip-Lösungen bietet. Daher werden Zieltechnologie-Abbildungs-Methoden sowohl für Virtualisierung als auch für die physikalische Umsetzung adressiert und ein Beispiel für die physikalische Umsetzung in einem 45 nm Standardzellen Ansatz aufgezeigt.
Die hochflexible V-FPGA-Architektur kann mit mehr als 20 Parametern angepasst werden, darunter LUT-Grösse, Clustering, 3D-Stacking, Routing-Struktur und vieles mehr. Die Auswirkungen der Parameter auf Fläche und Leistung der Architektur werden untersucht und eine umfangreiche Analyse von über 1400 Benchmarkläufen zeigt eine hohe Parameterempfindlichkeit bei Abweichungen bis zu ±95, 9% in der Fläche und ±78, 1% in der Leistung, was die hohe Bedeutung von Anpassung für Effizienz aufzeigt. Um die Parameter systematisch an die Bedürfnisse der Applikation anzupassen, wird eine parametrische Entwurfsraum-Explorationsmethode auf der Basis geeigneter Flächen- und Zeitmodellen vorgeschlagen.
Eine Herausforderung von angepassten Architekturen ist der Entwurfsaufwand und die Notwendigkeit für angepasste Werkzeuge. Daher umfasst diese Arbeit ein Framework für die Architekturgenerierung, die Entwurfsraumexploration, die Anwendungsabbildung und die Evaluation. Vor allem ist der V-FPGA in einem vollständig synthetisierbaren generischen Very High Speed Integrated Circuit Hardware Description Language (VHDL) Code konzipiert, der sehr flexibel ist und die Notwendigkeit für externe Codegeneratoren eliminiert. Systementwickler können von verschiedenen Arten von generischen SoC-Architekturvorlagen profitieren, um die Entwicklungszeit zu reduzieren. Alle notwendigen Konstruktionsschritte für die Applikationsentwicklung und -abbildung auf den V-FPGA werden durch einen Tool-Flow für Entwurfsautomatisierung unterstützt, der eine Sammlung von vorhandenen kommerziellen und akademischen Werkzeugen ausnutzt, die durch geeignete Modelle angepasst und durch ein neues Werkzeug namens V-FPGA-Explorer ergänzt werden. Dieses neue Tool fungiert nicht nur als Back-End-Tool für die Anwendungsabbildung auf dem V-FPGA sondern ist auch ein grafischer Konfigurations- und Layout-Editor, ein Bitstream-Generator, ein Architekturdatei-Generator für die Place & Route Tools, ein Script-Generator und ein Testbenchgenerator. Eine Besonderheit ist die Unterstützung der Just-in-Time-Kompilierung mit schnellen Algorithmen für die In-System Anwendungsabbildung.
Die Arbeit schliesst mit einigen Anwendungsfällen aus den Bereichen industrielle Prozessautomatisierung, medizinische Bildgebung, adaptive Systeme und Lehre ab, in denen der V-FPGA eingesetzt wird
Recommended from our members
Scalable System-on-Chip Design
The crisis of technology scaling led the industry of semiconductors towards the adoption of disruptive technologies and innovations to sustain the evolution of microprocessors and keep under control the timing of the design cycle. Multi-core and many-core architectures sought more energy-efficient computation by replacing a power-hungry processor with multiple simpler cores exploiting parallelism. Multi-core processors alone, however, turned out to be insufficient to sustain the ever growing demand for energy and power-efficient computation without compromising performance. Therefore, designers were pushed to drift from homogeneous architectures towards more complex heterogeneous systems that employ the large number of available transistors to incorporate a combination of customized energy-efficient accelerators, along with the general-purpose processor cores. Meanwhile, enhancements in manufacturing processes allowed designers to move a variety of peripheral components and analog devices into the chip. This paradigm shift defined the concept of {\em system-on-chip} (SoC) as a single-chip design that integrates several heterogeneous components. The rise of SoCs corresponds to a rapid decrease of the opportunity cost for integrating accelerators. In fact, on one hand, employing more transistors for powerful cores is not feasible anymore, because transistors cannot be active all at once within reasonable power budgets. On the other hand, increasing the number of homogeneous cores incurs more and more diminishing returns. The availability of cost effective silicon area for specialized hardware creates an opportunity to enter the market of semiconductors for new small players: engineers from several different scientific areas can develop competitive algorithms suitable for acceleration for domain-specific applications, such as multimedia systems, self-driving vehicles, robotics, and more. However, turning these algorithms into SoC components, referred to as {\em intellectual property}, still requires expert hardware designers who are typically not familiar with the specific domain of the target application. Furthermore, heterogeneity makes SoC design and programming much more difficult, especially because of the challenges of the integration process. This is a fine art in the hands of few expert engineers who understand system-level trade-offs, know how to design good hardware, how to handle memory and power management, how to shape and balance the traffic over an interconnect, and are able to deal with many different hardware-software interfaces. Designers need solutions enabling them to build scalable and heterogeneous SoCs. My thesis is that {\em the key to scalable SoC designs is a regular and flexible architecture that hides the complexity of heterogeneous integration from designers, while helping them focus on the important aspects of domain-specific applications through a companion system-level design methodology.} I open a path towards this goal by proposing an architecture that mitigates heterogeneity with regularity and addresses the challenges of heterogeneous component integration by implementing a set of {\em platform services}. These are hardware and software interfaces that from a system-level viewpoint give the illusion of working with a homogeneous SoC, thus making it easier to reuse accelerators and port applications across different designs, each with its own target workload and cost-performance trade-off point. A companion system-level design methodology exploits the regularity of the architecture to guide designers in implementing their intellectual property and enables an extensive design-space exploration across multiple levels of abstraction. Throughout the dissertation, I present a fully automated flow to deploy heterogeneous SoCs on single or multiple field-programmable-gate-array devices. The flow provides non-expert designers with a set of knobs for tuning system-level features based on the given mix of accelerators that they have integrated. Many contributions of my dissertation have already influenced other research projects as well as the content of an advanced course for graduate and senior undergraduate students, which aims to form a new generation of system-level designers. These new professionals need not to be circuit or register-transfer level design experts, and not even gurus of operating systems. Instead, they are trained to design efficient intellectual property by considering system-level trade-offs, while the architecture and the methodology that I describe in this dissertation empower them to integrate their components into an SoC. Finally, with the open-source release of the entire infrastructure, including the SoC-deployment flow and the software stack, I hope I will be able to inspire other research groups and help them implement ideas that further reduce the cost and design-time of future heterogeneous systems
- …