380 research outputs found
AnICA: Analyzing Inconsistencies in Microarchitectural Code Analyzers
Microarchitectural code analyzers, i.e., tools that estimate the throughput
of machine code basic blocks, are important utensils in the tool belt of
performance engineers. Recent tools like llvm-mca, uiCA, and Ithemal use a
variety of techniques and different models for their throughput predictions.
When put to the test, it is common to see these state-of-the-art tools give
very different results. These inconsistencies are either errors, or they point
to different and rarely documented assumptions made by the tool designers.
In this paper, we present AnICA, a tool taking inspiration from differential
testing and abstract interpretation to systematically analyze inconsistencies
among these code analyzers. Our evaluation shows that AnICA can summarize
thousands of inconsistencies in a few dozen descriptions that directly lead to
high-level insights into the different behavior of the tools. In several case
studies, we further demonstrate how AnICA automatically finds and characterizes
known and unknown bugs in llvm-mca, as well as a quirk in AMD's Zen
microarchitectures.

Comment: To appear in Proceedings of the ACM on Programming Languages (PACMPL), Vol. 6, No. OOPSLA
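The differential-testing core of such a comparison can be pictured with a small sketch. Everything here is illustrative: the two stub predictors stand in for real analyzers such as llvm-mca or uiCA, and the per-opcode costs, tolerance, and abstraction key are invented, not taken from AnICA.

```python
# Hypothetical sketch of differential testing between two throughput
# predictors; the stubs below stand in for tools like llvm-mca or uiCA.
import itertools

COST_A = {"add": 1.0, "mul": 2.0, "load": 3.0}  # pretend port-based costs

def predict_a(block):
    return sum(COST_A[op] for op in block) / 2  # pretend width-2 machine

def predict_b(block):
    return 1.5 * len(block) / 2                 # pretend learned model

def inconsistent(block, tol=0.4):
    a, b = predict_a(block), predict_b(block)
    return abs(a - b) / max(a, b) > tol         # relative disagreement

# Enumerate tiny basic blocks and summarize inconsistencies by an
# abstraction of the block (here: the multiset of opcodes), in the
# spirit of grouping many findings into a few descriptions.
findings = {}
for block in itertools.product(COST_A, repeat=2):
    if inconsistent(list(block)):
        findings.setdefault(tuple(sorted(block)), []).append(block)
```

With these invented costs, only blocks consisting of two loads disagree beyond the tolerance, so the summary collapses to a single abstract description.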
An approach for automatic design of application specific instruction set processors (ASIP)
Today's hectic world is primarily dominated by electronics. Large and bulky electronic devices are rapidly being replaced with simple, light, easy-to-carry ones. The past was dominated by the notion of performing varied tasks under one roof, and gadgets were built to serve that purpose through the introduction of General Purpose Processors (GPPs). On the contrary, the present tendency is towards devices that perform specific tasks. In line with this modern concept, the world of embedded systems is chiefly dominated by Application Specific Instruction set Processors (ASIPs), because they are geared to perform specific tasks without placing heavy burdens on their users. ASIPs, also known as customized processors, are processors designed for a particular application or a set of applications. They can be optimized for speed, chip area, and power consumption, taking advantage of the flexibility of a synthesized semi-custom implementation. The development of application-specific instruction-set processors is currently the exclusive domain of semiconductor houses and core vendors, because building such an architecture is a difficult task that requires expertise in several domains. The main aim of this paper is to propose an approach that automatically designs an ASIP based on the application requirement, which is given as input to the system. We propose to achieve this by analyzing different types of RISC MIPS assembly instructions, counting their occurrences, and mapping them onto a target VHDL processor so as to customize the memory access and the components of the central processing unit for a 32-bit RISC CPU based on MIPS. In this work, we also analyze the MIPS instruction formats, the instruction datapath, and the RISC CPU instruction set.
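The instruction-occurrence analysis described above can be sketched as a small profiling pass. This is a minimal illustration, not the paper's tool, and the format tables cover only a common subset of the MIPS instruction set.

```python
# Minimal sketch: count opcode occurrences in a MIPS assembly listing
# and bucket them by instruction format (R/I/J), as a first step toward
# deciding which datapath units a customized ASIP needs.
from collections import Counter

R_TYPE = {"add", "sub", "and", "or", "slt", "sll", "srl", "jr"}
I_TYPE = {"addi", "andi", "ori", "lw", "sw", "beq", "bne", "slti", "lui"}
J_TYPE = {"j", "jal"}

def profile(asm_lines):
    opcounts, formats = Counter(), Counter()
    for line in asm_lines:
        tokens = line.split("#")[0].split()     # drop comments, tokenize
        if tokens and tokens[0].endswith(":"):  # strip a leading label
            tokens = tokens[1:]
        if not tokens:
            continue
        op = tokens[0].lower()
        opcounts[op] += 1
        for fmt, table in (("R", R_TYPE), ("I", I_TYPE), ("J", J_TYPE)):
            if op in table:
                formats[fmt] += 1
    return opcounts, formats

counts, formats = profile([
    "loop: lw   $t0, 0($s0)",
    "      addi $t0, $t0, 1   # increment",
    "      sw   $t0, 0($s0)",
    "      bne  $t0, $t1, loop",
    "      jr   $ra",
])
```

For the five-instruction loop above, the pass reports four I-type instructions and one R-type (`jr`), suggesting where a customized datapath should spend its resources.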
Proactive Aging Mitigation in CGRAs through Utilization-Aware Allocation
Resource balancing has been effectively used to mitigate the long-term aging
effects of Negative Bias Temperature Instability (NBTI) in multi-core and
Graphics Processing Unit (GPU) architectures. In this work, we investigate this
strategy in Coarse-Grained Reconfigurable Arrays (CGRAs) with a novel
application-to-CGRA allocation approach. By introducing important extensions to
the reconfiguration logic and the datapath, we enable the dynamic movement of
configurations throughout the fabric and allow overutilized Functional Units
(FUs) to recover from stress-induced NBTI aging. Implementing the approach in a
resource-constrained state-of-the-art CGRA reveals a lifetime
improvement with negligible performance overhead and only a small increase
in area.

Comment: Please cite this as: M. Brandalero, B. N. Lignati, A. Carlos Schneider Beck, M. Shafique and M. Hübner, "Proactive Aging Mitigation in CGRAs through Utilization-Aware Allocation," 2020 57th ACM/IEEE Design Automation Conference (DAC), San Francisco, CA, USA, 2020, pp. 1-6, doi: 10.1109/DAC18072.2020.921858
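A utilization-aware placement policy of this flavour can be illustrated with a toy greedy allocator. This is our simplification for illustration, not the paper's algorithm: each operation of a new configuration goes to the functional unit with the least accumulated active time, so stress, and hence NBTI aging, evens out across the fabric.

```python
# Toy utilization-aware allocator (illustrative simplification, not the
# paper's algorithm): map each operation onto the FU with the least
# accumulated busy time so NBTI stress is balanced across the CGRA.
def allocate(ops, n_fus, stress=None):
    """ops: list of (name, active_cycles); returns (mapping, stress)."""
    stress = list(stress) if stress is not None else [0] * n_fus
    mapping = {}
    for name, cycles in ops:
        fu = min(range(n_fus), key=stress.__getitem__)  # least-aged FU
        mapping[name] = fu
        stress[fu] += cycles
    return mapping, stress

mapping, stress = allocate([("mac0", 100), ("mac1", 100), ("add0", 40)], 2)
```

Passing the returned `stress` vector back in for the next configuration is what makes the balancing proactive: heavily used FUs are skipped until the others catch up.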
New Techniques for On-line Testing and Fault Mitigation in GPUs
The abstract is in the attachment.
On static execution-time analysis
Proving timeliness is an integral part of the verification of safety-critical real-time systems. To this end, timing analysis computes upper bounds on the execution times of programs that execute on a given hardware platform. Modern hardware platforms commonly exhibit counter-intuitive timing behaviour: a locally slower execution can lead to a faster overall execution. Such behaviour challenges efficient timing analysis.

In this work, we present and discuss a hardware design, the strictly in-order pipeline, that behaves monotonically w.r.t. the progress of a program's execution. Based on monotonicity, we prove the absence of the aforementioned counter-intuitive behaviour.

At least since multi-core processors have emerged, timing analysis separates concerns by analysing different aspects of the system's timing behaviour individually. In this work, we validate the underlying assumption that a timing bound can be soundly composed from individual contributions. We show that even simple processors exhibit counter-intuitive behaviour - a locally slow execution can lead to an even slower overall execution - that impedes the soundness of the composition. We present the compositional base bound analysis that accounts for any such amplifying effects within its timing contribution. This enables a sound compositional analysis even for complex processors. Furthermore, we discuss hardware modifications that enable efficient compositional analyses.
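The counter-intuitive behaviour can be reproduced in a few lines. In this toy single-resource schedule (our own construction, not one of the thesis's case studies), a non-preemptive shared bus starts the highest-priority ready request whenever it frees. A faster first access lets a long low-priority request grab the bus before the critical request becomes ready, so the locally faster case finishes the critical request much later.

```python
# Toy timing anomaly on one non-preemptive shared resource: when the
# resource frees, the highest-priority *ready* job runs to completion.
# All latencies and release times are invented for illustration.
def finish_time_of_critical(first_access_latency):
    # L: long low-priority job, ready at t=1; H: short critical job,
    # higher priority, ready at t=2.
    pending = {"L": {"ready": 1, "len": 10, "prio": 0},
               "H": {"ready": 2, "len": 1, "prio": 1}}
    t = first_access_latency  # resource busy with the first access until t
    finish = {}
    while pending:
        ready = [j for j, v in pending.items() if v["ready"] <= t]
        if not ready:                        # idle until the next release
            t = min(v["ready"] for v in pending.values())
            continue
        job = max(ready, key=lambda j: pending[j]["prio"])
        t += pending.pop(job)["len"]         # non-preemptive execution
        finish[job] = t
    return finish["H"]

slow = finish_time_of_critical(2)  # first access misses the cache
fast = finish_time_of_critical(1)  # first access hits the cache
```

With the slower first access, H finishes at t=3; with the faster one, L seizes the bus at t=1 and H finishes only at t=12. The locally faster execution is globally slower, which is exactly the behaviour the strictly in-order design rules out.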