Architectural Exploration of KeyRing Self-Timed Processors
ABSTRACT
Over the last decades, microprocessors have had to increase their performance while keeping
their power envelope within tight bounds, as dictated by the needs of various markets: from
the ultra-low-power requirements of the IoT, to the electrical power consumption budget
of enterprise servers, by way of passive cooling and day-long battery life in mobile devices.
This high demand for power-efficient processors, coupled with the limitations of technology
scaling, which no longer provides improved performance at constant power density, is
leading designers to explore new microarchitectures with the goal of extracting more
performance from a fixed power budget. This work follows this trend by proposing a new
processor microarchitecture, called KeyRing, designed with a low-power intent.
The switching activity of integrated circuits—i.e. transistors switching on and off—directly
affects their dynamic power consumption. Circuit-level design techniques such as clock-gating
are widely adopted as they dramatically reduce the impact of the global clock in synchronous
circuits, which constitutes the main source of switching activity. The KeyRing microarchitecture
presented in this work uses an asynchronous clocking scheme that relies on decentralized
synchronization mechanisms to reduce the switching activity of circuits. It is derived from
the AnARM, a power-efficient ARM processor developed by Octasic using an ad hoc asynchronous
microarchitecture. Although it delivers better power-efficiency than synchronous
alternatives, it is for the most part incompatible with standard timing-driven synthesis and
Static Timing Analysis (STA). In addition, its design style does not fit well within the existing
asynchronous design paradigms. This work lays the foundations for a more rigorous
definition of this rather unorthodox design style, using circuits and methods coming from the
asynchronous literature. The resulting KeyRing microarchitecture is developed in combination
with Electronic Design Automation (EDA) methods that alleviate incompatibility issues
related to ad hoc clocking, enabling timing-driven optimizations and verifications of KeyRing
circuits using industry-standard design flows. In addition to bridging the gap with standard
design practices, this work also proposes comprehensive experimental protocols that aim to
strengthen the causal relation between the reported asynchronous microarchitecture and
reduced power consumption compared with equivalent synchronous alternatives.
The main achievement of this work is a framework that enables the architectural exploration
of circuits using the KeyRing microarchitecture.
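The decentralized, clockless synchronization idea behind self-timed designs such as the AnARM and KeyRing can be illustrated with a toy event-driven model. This is a sketch only: the single-token ring and the stage delays below are invented for illustration and do not reproduce the actual KeyRing circuit.

```python
import heapq

def simulate_ring(delays, n_events):
    """Toy event-driven model of a self-timed ring: each stage fires as
    soon as it receives the token from its predecessor, after its own
    local delay. There is no global clock; timing is purely local."""
    n = len(delays)
    fires = []                       # (time, stage) firing log
    heap = [(0.0, 0)]                # token starts at stage 0 at t = 0
    while heap and len(fires) < n_events:
        t, stage = heapq.heappop(heap)
        fires.append((t, stage))
        # Pass the token to the next stage after this stage's delay.
        heapq.heappush(heap, (t + delays[stage], (stage + 1) % n))
    return fires

events = simulate_ring([1.0, 1.5, 0.5], 6)
```

Because each stage fires as soon as its predecessor is done, the loop period equals the sum of the actual stage delays (3.0 here), whereas a global clock would have to run every stage at the slowest stage's delay (1.5), giving 4.5 per loop: the average-case timing benefit often cited for self-timed design.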
Elastic bundles: modelling and architecting asynchronous circuits with granular rigidity
PhD Thesis
Integrated Circuit (IC) designs these days are predominantly System-on-Chips (SoCs).
The complexity of designing a SoC has increased rapidly over the years due to growing
process and environmental variations coupled with global clock distribution difficulty.
Moreover, traditional synchronous design is not apt to handle the heterogeneous timing
nature of modern SoCs. As a countermeasure, the semiconductor industry witnessed
a strong revival of asynchronous design principles. A new paradigm of digital circuits
emerged, as a result, namely mixed synchronous-asynchronous circuits. With a wave
of recent innovations in synchronous-asynchronous CAD integration, this paradigm is
showing signs of commercial adoption in future SoCs mainly due to the scope for reuse
of synchronous functional blocks and IP cores, and the co-existence of synchronous and
asynchronous design styles in a common EDA framework.
However, there is a lack of formal methods and tools to facilitate mixed synchronous-asynchronous
design. In this thesis, we propose a formal model based on Petri nets with
step semantics to describe these circuits behaviourally. The implications of this model for the
verification and synthesis of mixed synchronous-asynchronous circuits are studied. To
date, this paradigm has been mainly explored on the basis of Globally Asynchronous
Locally Synchronous (GALS) systems. Despite decades of research, GALS design has
failed to gain traction commercially. To understand its drawbacks, a simulation framework
characterising the physical and functional aspects of GALS SoCs is presented.
A novel method for synthesising mixed synchronous-asynchronous circuits with varying
levels of rigidity is proposed. Starting with a high-level dataflow model of a system which
is intrinsically asynchronous, the key idea is to introduce rigidity of chosen granularity
levels in the model without changing functional behaviour. The system is then partitioned
into functional blocks of synchronous and asynchronous elements before being transformed
into an equivalent circuit which can be synthesised using standard EDA tools.
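The step semantics mentioned above can be sketched in a few lines: under step semantics, a *set* of transitions fires atomically, which is what makes Petri nets a natural fit for describing clocked (synchronous) behaviour embedded in an otherwise asynchronous net. A minimal sketch follows; the net and markings are invented examples, not the thesis's formal model.

```python
def fire_step(marking, pre, post, step):
    """Fire a set of transitions as one atomic step (step semantics).
    The step must be enabled as a whole: the combined token demand of
    all transitions in the step is met by the current marking."""
    demand = {}
    for t in step:
        for p, n in pre[t].items():
            demand[p] = demand.get(p, 0) + n
    if any(marking.get(p, 0) < n for p, n in demand.items()):
        raise ValueError("step not enabled")
    m = dict(marking)
    for t in step:
        for p, n in pre[t].items():       # consume input tokens
            m[p] -= n
        for p, n in post[t].items():      # produce output tokens
            m[p] = m.get(p, 0) + n
    return m

# Two concurrent transitions consuming from distinct places, both
# producing into p3: they can fire together in a single step.
pre  = {'t1': {'p1': 1}, 't2': {'p2': 1}}
post = {'t1': {'p3': 1}, 't2': {'p3': 1}}
m1 = fire_step({'p1': 1, 'p2': 1}, pre, post, ['t1', 't2'])
```

Firing both transitions in one step models a single clock tick; firing them one at a time recovers the usual asynchronous interleaving semantics.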
Self-adaptivity of applications on network on chip multiprocessors: the case of fault-tolerant Kahn process networks
Technology scaling accompanied with higher operating frequencies and the ability to integrate more functionality in the same chip has been the driving force behind delivering higher-performance computing systems at lower costs. Embedded computing systems, which have been riding the same wave of success, have evolved into complex architectures encompassing a high number of cores interconnected by an on-chip network (usually identified as a Multiprocessor System-on-Chip). However, these trends are hindered by issues that arise as technology scaling continues towards deep submicron scales. Firstly, the growing complexity of these systems and the variability introduced by process technologies make it ever harder to perform a thorough optimization of the system at design time. Secondly, designers are faced with a reliability wall that emerges as age-related degradation reduces the lifetime of transistors, and as the probability of defects escaping post-manufacturing testing increases. In this thesis, we take on these challenges within the context of streaming applications running on network-on-chip based parallel (not necessarily homogeneous) systems-on-chip that adopt the no-remote-memory-access model. In particular, this thesis tackles two main problems: (1) fault-aware online task remapping, and (2) application-level self-adaptation for quality management. For the former, viewing fault tolerance as a self-adaptation aspect, we adopt a cross-layer approach that aims at graceful performance degradation by addressing permanent faults in processing elements mostly at system level, in particular by exploiting the redundancy available in multi-core platforms. We propose an optimal solution based on an integer linear programming formulation (suitable for design-time adoption) as well as heuristic-based solutions to be used at run time. We assess the impact of our approach on lifetime reliability.
We propose two recovery schemes based on a checkpoint-and-rollback and a roll-forward technique. For the latter, we propose two variants of a monitor-controller-adapter loop that adapts application-level parameters to meet performance goals. We demonstrate not only that fault tolerance and self-adaptivity can be achieved in embedded platforms, but also that they can be achieved without incurring large overheads. In addressing these problems, we present techniques which have been realized (depending on their characteristics) in the form of a design tool, a run-time library or a hardware core to be added to the basic architecture.
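A run-time remapping heuristic of the kind described, exploiting the redundancy of a multi-core platform after a permanent fault, might look like the following sketch. The greedy least-loaded policy, task names and loads are illustrative assumptions, not the thesis's ILP formulation or its actual heuristics.

```python
def remap_tasks(mapping, loads, faulty):
    """Greedy run-time heuristic: move the tasks of a faulty core onto
    surviving cores, heaviest task first, always picking the currently
    least-loaded target core."""
    alive = {c: 0.0 for c in set(mapping.values()) - {faulty}}
    for t, c in mapping.items():          # current load of survivors
        if c in alive:
            alive[c] += loads[t]
    new_mapping = dict(mapping)
    for t in sorted((t for t, c in mapping.items() if c == faulty),
                    key=lambda t: -loads[t]):
        target = min(alive, key=alive.get)   # least-loaded survivor
        new_mapping[t] = target
        alive[target] += loads[t]
    return new_mapping

# Tasks a and b run on core 0, which fails; c and d stay on cores 1, 2.
mapping = {'a': 0, 'b': 0, 'c': 1, 'd': 2}
loads   = {'a': 3, 'b': 1, 'c': 2, 'd': 4}
new_map = remap_tasks(mapping, loads, faulty=0)
```

An ILP solver would find the globally optimal placement at design time; a heuristic like this trades optimality for the low latency needed when remapping online.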
Computer Aided Verification
This open access two-volume set LNCS 10980 and 10981 constitutes the refereed proceedings of the 30th International Conference on Computer Aided Verification, CAV 2018, held in Oxford, UK, in July 2018. The 52 full and 13 tool papers presented together with 3 invited papers and 2 tutorials were carefully reviewed and selected from 215 submissions. The papers cover a wide range of topics and techniques, from algorithmic and logical foundations of verification to practical applications in distributed, networked, cyber-physical, and autonomous systems. They are organized in topical sections on model checking, program analysis using polyhedra, synthesis, learning, runtime verification, hybrid and timed systems, tools, probabilistic systems, static analysis, theory and security, SAT, SMT and decision procedures, concurrency, and CPS, hardware, and industrial applications.
Automated application-specific optimisation of interconnects in multi-core systems
In embedded computer systems there are often tasks, implemented as stand-alone devices,
that are both application-specific and compute intensive. A recurring problem
in this area is to design these application-specific embedded systems as close to the
power and efficiency envelope as possible. Work has been done on optimizing single-core
systems and memory organisation, but current methods for achieving system design
goals are proving limited as system capabilities and system size increase in the
multi- and many-core era. To address this problem, this thesis investigates machine
learning approaches to managing the design space presented in the interconnect design
of embedded multi-core systems. The design space presented is large due to the
system scale and level of interconnectivity, and also features interdependent parameters,
further complicating analysis. The results presented in this thesis demonstrate
that machine learning approaches, particularly wkNN and random forest, work well
in handling the complexity of the design space. The benefits of this approach are in
automation, saving time and effort in the system design phase as well as energy and
execution time in the finished system.
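Distance-weighted kNN (wkNN), one of the two models reported to handle this design space well, can be sketched in a few lines of pure Python. The toy design space below, mapping (buffer depth, link width) to an energy figure, is invented for illustration.

```python
def wknn_predict(train_x, train_y, query, k=3):
    """Distance-weighted k-nearest-neighbour regression: estimate a
    metric (e.g. energy) of an unseen interconnect configuration from
    the k closest already-evaluated ones, weighted by 1/distance."""
    nearest = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)) ** 0.5, y)
        for x, y in zip(train_x, train_y))[:k]
    if nearest[0][0] == 0.0:          # exact match in the training set
        return nearest[0][1]
    wsum = sum(1.0 / d for d, _ in nearest)
    return sum(y / d for d, y in nearest) / wsum

# Toy design space: (buffer_depth, link_width) -> measured energy.
configs  = [(0, 0), (1, 0), (0, 1), (2, 2)]
energies = [0.0, 1.0, 1.0, 4.0]
p = wknn_predict(configs, energies, (0.5, 0), k=3)
```

The appeal in design-space exploration is that each training point is one expensive simulation or synthesis run, and wkNN interpolates between them with no training phase at all.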
Programming Languages and Systems
This open access book constitutes the proceedings of the 31st European Symposium on Programming, ESOP 2022, which was held during April 5-7, 2022, in Munich, Germany, as part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2022. The 21 regular papers presented in this volume were carefully reviewed and selected from 64 submissions. They deal with fundamental issues in the specification, design, analysis, and implementation of programming languages and systems
Circuit design and analysis for on-FPGA communication systems
On-chip communication has emerged as a prominently important subject in Very-Large-
Scale-Integration (VLSI) design, as the trend of technology scaling favours logic more than
interconnect. Interconnect often dictates the system performance, and, therefore, research into
new methodologies and system architectures that deliver high-performance communication services
across the chip is mandatory. The interconnect challenge is exacerbated in the Field-Programmable
Gate Array (FPGA), an integrated circuit whose hardware can be programmed post-fabrication.
Communication across an FPGA deteriorates as a result of interconnect scaling. The programmable
fabrics, switches and the specific routing architecture also introduce additional latency
and bandwidth degradation, further hindering intra-chip communication performance.
Past research efforts mainly focused on optimizing logic elements and functional units in FPGAs.
Communication with programmable interconnect received little attention and is inadequately understood.
This thesis is among the first to research on-chip communication systems that are built on
top of programmable fabrics and proposes methodologies to maximize the interconnect throughput
performance. This thesis makes three major contributions: (i) an analysis of on-chip
interconnect fringing, which degrades the bandwidth of communication channels due to routing
congestion in reconfigurable architectures; (ii) a new analogue wave signalling scheme that
significantly improves the interconnect throughput by exploiting the fundamental electrical
characteristics of the reconfigurable interconnect structures, and can potentially mitigate
the interconnect scaling challenges; and (iii) a novel Dynamic Programming (DP) network that
provides adaptive routing in Network-on-Chip (NoC) systems. The DP-network architecture performs
run-time optimization for route planning and dynamic routing, which effectively utilizes the
in-silicon bandwidth. This thesis explores a new horizon in reconfigurable system design, in which
new methodologies and concepts are proposed to enhance the on-FPGA communication throughput
performance that is of vital importance in new technology processes.
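The DP-network idea in contribution (iii), run-time route planning by iteratively propagating cost estimates between neighbouring nodes, is essentially a distributed Bellman-Ford recurrence. A minimal software sketch follows; the 2x2 mesh and link costs are invented, and the real DP-network performs this propagation in hardware, per router.

```python
def dp_route_costs(nodes, links, dest):
    """Bellman-Ford-style dynamic programming: each node repeatedly
    refines its cheapest-cost estimate to `dest` from its neighbours'
    estimates, so routes adapt when link costs reflect congestion."""
    INF = float('inf')
    cost = {n: (0.0 if n == dest else INF) for n in nodes}
    for _ in range(len(nodes) - 1):            # relax to a fixed point
        for (u, v), w in links.items():        # w = cost of hop u -> v
            if cost[v] + w < cost[u]:
                cost[u] = cost[v] + w
    return cost

# 2x2 mesh; the direct link 1 -> 3 is congested (cost 5), so traffic
# from node 0 should prefer the detour 0 -> 2 -> 3.
links = {(0, 1): 1, (1, 0): 1, (0, 2): 1, (2, 0): 1,
         (1, 3): 5, (3, 1): 5, (2, 3): 1, (3, 2): 1}
cost = dp_route_costs([0, 1, 2, 3], links, dest=3)
```

Each router then forwards a packet toward whichever neighbour currently advertises the lowest cost, so a congestion change re-routes traffic without any central recomputation.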
Model driven certification of Cloud service security based on continuous monitoring
Cloud Computing technology offers an advanced approach for the provision of infrastructure, platform and software services without the extensive cost of owning, operating or maintaining the required computational infrastructure. However, despite being cost-effective, this technology has raised concerns regarding the security, privacy and compliance of data or services offered through cloud systems. This is mainly due to the lack of transparency of services to the consumers, or due to the fact that service providers are unwilling to take full responsibility for the security of services that they offer through cloud systems, and accept liability for security breaches [18]. In such circumstances, there is a trust deficiency that needs to be addressed.
The potential of certification as a means of addressing the lack of trust regarding the security of different types of services, including the cloud, has been widely recognised [149]. However, the recognition of this potential has not led to wide adoption, as was expected. The reason could be that certification has traditionally been carried out through standards and certification schemes (e.g., ISO27001 [149], ISO27002 [149] and Common Criteria [65]), which involve predominantly manual security auditing, testing and inspection processes. Such processes tend to be lengthy and have a significant financial cost, which often prevents small technology vendors from adopting them [87].
In this thesis, we present an automated approach for cloud service certification, where the evidence is gathered through continuous monitoring. This approach can be used to: (a) automatically define and execute certification models that continuously acquire and analyse evidence regarding the provision of services on cloud infrastructures; (b) use this evidence to assess whether the provision complies with required security properties; and (c) generate and manage digital certificates confirming the compliance of services with specific security properties.
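Steps (a) to (c) amount to a monitor-assess-certify loop, which can be sketched as follows. The event schema, the example property and the certificate fields are invented for illustration and are not the thesis's certification model.

```python
def assess_property(events, max_violations=0):
    """Evidence check for one security property, here 'every monitored
    access was authenticated': it holds if at most `max_violations`
    events in the monitored stream violate it."""
    violations = sum(1 for e in events if not e.get('authenticated', False))
    return violations <= max_violations

def certify(service, events):
    """Issue or revoke a certificate for `service` based on the
    continuously gathered monitoring evidence."""
    status = 'certified' if assess_property(events) else 'revoked'
    return {'service': service, 'status': status, 'evidence': len(events)}

ok  = certify('object-store', [{'authenticated': True}] * 3)
bad = certify('object-store',
              [{'authenticated': True}, {'authenticated': False}])
```

Re-running such a loop on each new batch of monitoring events is what makes the certificate "continuous": compliance is re-confirmed against live evidence rather than a one-off audit.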