13 research outputs found
A hardware-software codesign framework for cellular computing
Until recently, the ever-increasing demand of computing power has been met on one hand by increasing the operating frequency of processors and on the other hand by designing architectures capable of exploiting parallelism at the instruction level through hardware mechanisms such as super-scalar execution. However, both these approaches seem to have reached a plateau, mainly due to issues related to design complexity and cost-effectiveness. To face the stabilization of performance of single-threaded processors, the current trend in processor design seems to favor a switch to coarser-grain parallelization, typically at the thread level. In other words, high computational power is achieved not only by a single, very fast and very complex processor, but through the parallel operation of several processors, each executing a different thread. Extrapolating this trend to take into account the vast amount of on-chip hardware resources that will be available in the next few decades (either through further shrinkage of silicon fabrication processes or by the introduction of molecular-scale devices), together with the predicted features of such devices (e.g., the impossibility of global synchronization or higher failure rates), it seems reasonable to foretell that current design techniques will not be able to cope with the requirements of next-generation electronic devices and that novel design tools and programming methods will have to be devised. A tempting source of inspiration to solve the problems implied by a massively parallel organization and inherently error-prone substrates is biology. In fact, living beings possess characteristics, such as robustness to damage and self-organization, which were shown in previous research as interesting to be implemented in hardware. For instance, it was possible to realize relatively simple systems, such as a self-repairing watch. Overall, these bio-inspired approaches seem very promising but their interest for a wider audience is problematic because their heavily hardware-oriented designs lack some of the flexibility achievable with a general purpose processor. In the context of this thesis, we will introduce a processor-grade processing element at the heart of a bio-inspired hardware system. This processor, based on a single-instruction, features some key properties that allow it to maintain the versatility required by the implementation of bio-inspired mechanisms and to realize general computation. We will also demonstrate that the flexibility of such a processor enables it to be evolved so it can be tailored to different types of applications. In the second half of this thesis, we will analyze how the implementation of a large number of these processors can be used on a hardware platform to explore various bio-inspired mechanisms. Based on an extensible platform of many FPGAs, configured as a networked structure of processors, the hardware part of this computing framework is backed by an open library of software components that provides primitives for efficient inter-processor communication and distributed computation. We will show that this dual software–hardware approach allows a very quick exploration of different ways to solve computational problems using bio-inspired techniques. In addition, we also show that the flexibility of our approach allows it to exploit replication as a solution to issues that concern standard embedded applications
Optimal Global Instruction Scheduling for the Itanium® Processor Architecture
On the Itanium 2 processor, effective global instruction scheduling is crucial to high performance. At the same time, it poses a challenge to the compiler: This code generation subtask involves strongly interdependent decisions and complex trade-offs that are difficult to cope with for heuristics. We tackle this NP-complete problem with integer linear programming (ILP), a search-based method that yields provably optimal results. This promises faster code as well as insights into the potential of the architecture. Our ILP model comprises global code motion with compensation copies, predication, and Itanium-specific features like control/data speculation. In integer linear programming, well-structured models are the key to acceptable solution times. The feasible solutions of an ILP are represented by integer points inside a polytope. If all vertices of this polytope are integral, then the ILP can be solved in polynomial time. We define two subproblems of global scheduling in which some constraint classes are omitted and show that the corresponding two subpolytopes of our ILP model are integral and polynomial sized. This substantiates that the found model is of high efficiency, which is also confirmed by the reasonable solution times. The ILP formulation is extended by further transformations like cyclic code motion, which moves instructions upwards out of a loop, circularly in the opposite direction of the loop backedges. Since the architecture requires instructions to be encoded in fixed-sized bundles of three, a bundler is developed that computes bundle sequences of minimal size by means of precomputed results and dynamic programming. Experiments have been conducted with a postpass tool that implements the ILP scheduler. It parses assembly procedures generated by Intel�s Itanium compiler and reschedules them as a whole. Using this tool, we optimize a selection of hot functions from the SPECint 2000 benchmark. The results show a significant speedup over the original code.Globale Instruktionsanordnung hat beim Itanium-2-Prozessor groĂźen
EinfluĂź auf die Leistung und stellt dabei gleichzeitig eine Herausforderung
fĂĽr den Compiler dar: Sie ist mit zahlreichen komplexen, wechselseitig
voneinander abhängigen Entscheidungen verbunden, die für Heuristiken
nur schwer zu beherrschen sind.Wir lösen diesesNP-vollständige
Problem mit ganzzahliger linearer Programmierung (ILP), einer suchbasierten
Methode mit beweisbar optimalen Ergebnissen. Das ermöglicht
neben schnellerem Code auch Einblicke in das Potential der Itanium-
Prozessorarchitektur. Unser ILP-Modell umfaĂźt globale Codeverschiebungen
mit Kompensationscode, Prädikation und Itanium-spezifische
Techniken wie Kontroll- und Datenspekulation.
Bei ganzzahliger linearer Programmierung sind wohlstrukturierte
Modelle der Schlüssel zu akzeptablen Lösungszeiten. Die zulässigen Lösungen
eines ILPs werden durch ganzzahlige Punkte innerhalb eines
Polytops repräsentiert. Sind die Eckpunkte dieses Polytops ganzzahlig,
kann das ILP in Polynomialzeit gelöst werden. Wir definieren zwei Teilprobleme
globaler Instruktionsanordnung durch Auslassung bestimmter
Klassen von Nebenbedingungen und beweisen, daĂź die korrespondierenden
Teilpolytope unseres ILP-Modells ganzzahlig und von polynomieller
Größe sind. Dies untermauert die hohe Effizienz des gefundenen Modells,
die auch durch moderate Lösungszeiten bestätigt wird.
Das ILP-Modell wird um weitere Transformationen wie zyklische Codeverschiebung
erweitert; letztere bezeichnet das Verschieben von Befehlen
aufwärts aus einer Schleife heraus, in Gegenrichtung ihrer Rückwärtskanten.
Da die Architektur eine Kodierung der Befehle in DreierbĂĽndeln
fester Größe vorschreibt, wird ein Bundler entwickelt, der Bündelsequenzen
minimaler Länge mit Hilfe vorberechneter Teilergebnisse und dynamischer
Programmierung erzeugt.
FĂĽr die Experimente wurde ein Postpassoptimierer erstellt. Er liest
von Intels Itanium-Compiler erzeugte Assemblerroutinen ein und ordnet
die enthaltenen Instruktionen mit Hilfe der ILP-Methode neu an. Angewandt
auf eine Auswahl von Funktionen aus dem Benchmark SPECint
2000 erreicht der Optimierer eine signifikante Beschleunigung gegenĂĽber
dem Originalcode
Understanding Quantum Technologies 2022
Understanding Quantum Technologies 2022 is a creative-commons ebook that
provides a unique 360 degrees overview of quantum technologies from science and
technology to geopolitical and societal issues. It covers quantum physics
history, quantum physics 101, gate-based quantum computing, quantum computing
engineering (including quantum error corrections and quantum computing
energetics), quantum computing hardware (all qubit types, including quantum
annealing and quantum simulation paradigms, history, science, research,
implementation and vendors), quantum enabling technologies (cryogenics, control
electronics, photonics, components fabs, raw materials), quantum computing
algorithms, software development tools and use cases, unconventional computing
(potential alternatives to quantum and classical computing), quantum
telecommunications and cryptography, quantum sensing, quantum technologies
around the world, quantum technologies societal impact and even quantum fake
sciences. The main audience are computer science engineers, developers and IT
specialists as well as quantum scientists and students who want to acquire a
global view of how quantum technologies work, and particularly quantum
computing. This version is an extensive update to the 2021 edition published in
October 2021.Comment: 1132 pages, 920 figures, Letter forma
Driving the Network-on-Chip Revolution to Remove the Interconnect Bottleneck in Nanoscale Multi-Processor Systems-on-Chip
The sustained demand for faster, more powerful chips has been met by the
availability of chip manufacturing processes allowing for the integration of increasing
numbers of computation units onto a single die. The resulting outcome,
especially in the embedded domain, has often been called SYSTEM-ON-CHIP
(SoC) or MULTI-PROCESSOR SYSTEM-ON-CHIP (MP-SoC).
MPSoC design brings to the foreground a large number of challenges, one of
the most prominent of which is the design of the chip interconnection. With a
number of on-chip blocks presently ranging in the tens, and quickly approaching
the hundreds, the novel issue of how to best provide on-chip communication
resources is clearly felt.
NETWORKS-ON-CHIPS (NoCs) are the most comprehensive and scalable
answer to this design concern. By bringing large-scale networking concepts to
the on-chip domain, they guarantee a structured answer to present and future
communication requirements. The point-to-point connection and packet switching
paradigms they involve are also of great help in minimizing wiring overhead
and physical routing issues. However, as with any technology of recent inception,
NoC design is still an evolving discipline. Several main areas of interest
require deep investigation for NoCs to become viable solutions:
• The design of the NoC architecture needs to strike the best tradeoff among
performance, features and the tight area and power constraints of the onchip
domain.
• Simulation and verification infrastructure must be put in place to explore,
validate and optimize the NoC performance.
• NoCs offer a huge design space, thanks to their extreme customizability in
terms of topology and architectural parameters. Design tools are needed
to prune this space and pick the best solutions.
• Even more so given their global, distributed nature, it is essential to evaluate
the physical implementation of NoCs to evaluate their suitability for
next-generation designs and their area and power costs.
This dissertation performs a design space exploration of network-on-chip architectures,
in order to point-out the trade-offs associated with the design of
each individual network building blocks and with the design of network topology
overall. The design space exploration is preceded by a comparative analysis
of state-of-the-art interconnect fabrics with themselves and with early networkon-
chip prototypes. The ultimate objective is to point out the key advantages
that NoC realizations provide with respect to state-of-the-art communication
infrastructures and to point out the challenges that lie ahead in order to make
this new interconnect technology come true. Among these latter, technologyrelated
challenges are emerging that call for dedicated design techniques at all
levels of the design hierarchy. In particular, leakage power dissipation, containment
of process variations and of their effects. The achievement of the above
objectives was enabled by means of a NoC simulation environment for cycleaccurate
modelling and simulation and by means of a back-end facility for the
study of NoC physical implementation effects. Overall, all the results provided
by this work have been validated on actual silicon layout