Parallel and Distributed Computing
The 14 chapters in this book cover a wide variety of representative works, ranging from hardware design to application development. The topics addressed are programmable and reconfigurable devices and systems, dependability of GPUs (graphics processing units), network topologies, cache coherence protocols, resource allocation, scheduling algorithms, peer-to-peer networks, large-scale network simulation, and parallel routines and algorithms. The articles included in this book therefore constitute an excellent reference for engineers and researchers with particular interests in any of these topics in parallel and distributed computing.
A Simple MPI Library for Lightweight Manycore Processors
Undergraduate thesis (TCC) - Universidade Federal de Santa Catarina. Centro Tecnológico. Ciências da Computação.
In recent decades, improving the performance of individual cores and increasing the number of high-power cores per chip have been the main trends in processor design. However, this combination led not only to an increase in computing capacity but also to considerable growth in energy consumption. There is growing concern in the scientific community about the energy efficiency of modern supercomputers. In recent years, considerable research effort has gone into alternative solutions to this problem of scalability and energy efficiency. The performance and energy efficiency provided by lightweight manycores are undeniable. However, the lack of rich and portable support for these processors, such as high-performance standard interfaces that deliver portable source code, makes software development a challenging task. Currently, two approaches are employed to improve programmability on lightweight manycores: operating systems (OSes) and bare-metal runtime systems. The former provides portability but exposes complex OS-level programming interfaces to developers. The latter focuses on providing rich, high-performance interfaces, which are vendor-specific and yield non-portable software. Thus, the existing solutions force software engineers to choose between software portability and a faster development process. To address this dilemma, we propose a portable and lightweight MPI library (LWMPI) designed from scratch to cope with the restrictions and intricacies of lightweight manycores. We integrated LWMPI into a distributed OS that targets these processors, thus providing better programmability and implicit portability for lightweight manycores without incurring excessive performance overheads that could hinder its use. To deliver a comprehensive evaluation of LWMPI, we relied on three applications from a representative benchmark suite used to assess the performance of lightweight manycores, plus a synthetic benchmark. Our results on the Kalray MPPA-256 processor show that LWMPI delivers better performance and scalability than a purpose-built solution that uses the raw Nanvix Inter-Process Communication (IPC) abstractions, while exposing a richer programming interface.
Architectural Exploration of KeyRing Self-Timed Processors
ABSTRACT
In recent years, microprocessors have had to increase their performance while keeping their power envelope within tight bounds, as dictated by the needs of various markets: from the ultra-low-power requirements of the IoT, to the electrical power budget of enterprise servers, by way of passive cooling and day-long battery life in mobile devices. This high demand for power-efficient processors, coupled with the limitations of technology scaling (which no longer provides improved performance at constant power density), is leading designers to explore new microarchitectures with the goal of extracting more performance from a fixed power budget. This work follows this trend by proposing a new processor microarchitecture, called KeyRing, designed with low power consumption in mind.
The switching activity of integrated circuits (i.e., transistors switching on and off) directly affects their dynamic power consumption. Circuit-level design techniques such as clock gating are widely adopted because they dramatically reduce the impact of the global clock, which is the main source of switching activity in synchronous circuits. The KeyRing microarchitecture presented in this work uses an asynchronous clocking scheme that relies on decentralized synchronization mechanisms to reduce the switching activity of circuits. It is derived from the AnARM, a power-efficient ARM processor developed by Octasic using an ad hoc asynchronous microarchitecture. Although the AnARM delivers better power efficiency than synchronous alternatives, it is for the most part incompatible with standard timing-driven synthesis and Static Timing Analysis (STA). In addition, its design style does not fit well within the existing asynchronous design paradigms. This work lays the foundations for a more rigorous definition of this rather unorthodox design style, using circuits and methods from the asynchronous literature. The resulting KeyRing microarchitecture is developed in combination with Electronic Design Automation (EDA) methods that alleviate incompatibility issues related to ad hoc clocking, enabling timing-driven optimization and verification of KeyRing circuits using industry-standard design flows. In addition to bridging the gap with standard design practices, this work also proposes comprehensive experimental protocols that aim to strengthen the causal relation between the reported asynchronous microarchitecture and a reduced power consumption compared with synchronous alternatives.
The main achievement of this work is a framework that enables the architectural exploration of circuits using the KeyRing microarchitecture.
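The link between switching activity and dynamic power stated in the abstract above is commonly summarized by the standard first-order CMOS model. The expression below is a textbook approximation, not a formula taken from the thesis; the symbols $\alpha$ (activity factor, the fraction of capacitance switched per cycle), $C_L$ (switched load capacitance), $V_{DD}$ (supply voltage), and $f$ (clock frequency) are assumptions introduced here for illustration.

```latex
P_{\text{dyn}} \approx \alpha \, C_L \, V_{DD}^{2} \, f
```

Under this model, clock gating lowers $\alpha$ by withholding the clock from idle blocks. A decentralized synchronization scheme of the kind the abstract describes can be read as pursuing the same reduction by triggering logic only when there is work to do.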
- …