35 research outputs found
Petri nets based components within globally asynchronous locally synchronous systems
Dissertação apresentada na Faculdade de Ciências e Tecnologias da Universidade Nova de Lisboa para a obtenção do grau de Mestre em Engenharia Electrotécnica e ComputadoresThe main goal is to develop a solution for the interconnection of components constituent of a GALS - Globally Asynchronous, Locally Synchronous – system. The components are implemented in parallel obtained as a result of the partition of a model expressed a Petri net (PN), performed using the PNs editor SNOOPY-IOPT in conjunction with the Split tool and the tools to automatically generate the VHDL code from the representations of the PNML models resulting from the partition (these tools were developed under the project FORDESIGN and are available at http://www.uninova.pt/FORDESIGN). Typical solutions will be analyzed to ensure proper communication between components of the GALS system, as well as characterized and developed an appropriate solution for the interconnection of the components associated with the PN sub-models. The final goal (not attained with this thesis) would be to acquire a tool that allows generation of code for the interconnection solution from the associated components, considering a specific application. The solution proposed for componentes interconnection was coded in VHDL and the implementation platforms used for testing include the Xilinx FPGA Spartan-3 and Virtex-II
Developing Globally-Asynchronous Locally- Synchronous Systems through the IOPT-Flow Framework
Throughout the years, synchronous circuits have increased in size and com-plexity, consequently, distributing a global clock signal has become a laborious task. Globally-Asynchronous Locally-Synchronous (GALS) systems emerge as a possible solution; however, these new systems require new tools.
The DS-Pnet language formalism and the IOPT-Flow framework aim to support and accelerate the development of cyber-physical systems. To do so it offers a tool chain that comprises a graphical editor, a simulator and code gener-ation tools capable of generating C, JavaScript and VHDL code. However, DS-Pnets and IOPT-Flow are not yet tuned to handle GALS systems, allowing for partial specification, but not a complete one.
This dissertation proposes extensions to the DS-Pnet language and the IOPT-Flow framework in order to allow development of GALS systems. Addi-tionally, some asynchronous components were created, these form interfaces that allow synchronous blocks within a GALS system to communicate with each other
Inter-module Interfacing techniques for SoCs with multiple clock domains to address challenges in modern deep sub-micron technologies
Miniaturization of integrated circuits (ICs) due to the improvement in lithographic techniques in modem deep sub-micron (DSM) technologies allows several complex processing elements to coexist in one IC, which are called System-on-Chip. As a first contribution, this thesis quantitatively analyzes the severity of timing constraints associated with Clock Distribution Network (CDN) in modem DSM technologies and shows that different processing elements may work in different dock domains to alleviate these constraints. Such systems are known as Globally Asynchronous Locally Synchronous (GALS) systems. It is imperative that different processing elements of a GALS system need to communicate with each other through some interfacing technique, and these interfaces can be asynchronous or synchronous. Conventionally, the asynchronous interfaces are described at the Register Transfer Logic (RTL) or system level. Such designs are susceptible to certain design constraints that cannot be addressed at higher abstraction levels; crosstalk glitch is one such constraint. This thesis initially identifies, using an analytical model, the possibility of asynchronous interface malfunction due to crosstalk glitch propagation. Next, we characterize crosstalk glitch propagation under normal operating conditions for two different classes of asynchronous protocols, namely bundled data protocol based and delay insensitive asynchronous designs. Subsequently, we propose a logic abstraction level modeling technique, which provides a framework to the designer to verify the asynchronous protocols against crosstalk glitches. The utility of this modeling technique is demonstrated experimentally on a Xilinx Virtex-II Pro FPGA. Furthermore, a novel methodology is proposed to quench such crosstalk glitch propagation through gating the asynchronous interface from sending the signal during potential glitch vulnerable instances. This methodology is termed as crosstalk glitch gating. This technique is successfully applied to obtain crosstalk glitch quenching in the representative interfaces. This thesis also addresses the dock skew challenges faced by high-performance synchronous interfacing methodologies in modem DSM technologies. The proposed methodology allows communicating modules to run at a frequency that is independent of the dock skew. Leveraging a novel clock-scheduling algorithm, our technique permits a faster module to communicate safely with a slower module without slowing down. Safe data communications for mesochronous schemes and for the cases when communicating modules have dock frequency ratios of integer or coprime numbers are theoretically explained and experimentally demonstrated. A clock-scheduling technique to dynamically accommodate phase variations is also proposed. These methods are implemented to the Xilinx Virtex II Pro technology. Experiments prove that the proposed interfacing scheme allows modules to communicate data safely, for mesochronous schemes, at 350 MHz, which is the limit of the technology used, under a dock skew of more than twice the time period (i.e. a dock skew of 12 ns
Low Power Processor Architectures and Contemporary Techniques for Power Optimization – A Review
The technological evolution has increased the number of transistors for a given die area significantly and increased the switching speed from few MHz to GHz range. Such inversely proportional decline in size and boost in performance consequently demands shrinking of supply voltage and effective power dissipation in chips with millions of transistors. This has triggered substantial amount of research in power reduction techniques into almost every aspect of the chip and particularly the processor cores contained in the chip. This paper presents an overview of techniques for achieving the power efficiency mainly at the processor core level but also visits related domains such as buses and memories. There are various processor parameters and features such as supply voltage, clock frequency, cache and pipelining which can be optimized to reduce the power consumption of the processor. This paper discusses various ways in which these parameters can be optimized. Also, emerging power efficient processor architectures are overviewed and research activities are discussed which should help reader identify how these factors in a processor contribute to power consumption. Some of these concepts have been already established whereas others are still active research areas. © 2009 ACADEMY PUBLISHER
PALS: Distributed Gradient Clocking on Chip
Consider an arbitrary network of communicating modules on a chip, each
requiring a local signal telling it when to execute a computational step. There
are three common solutions to generating such a local clock signal: (i) by
deriving it from a single, central clock source, (ii) by local, free-running
oscillators, or (iii) by handshaking between neighboring modules. Conceptually,
each of these solutions is the result of a perceived dichotomy in which
(sub)systems are either clocked or asynchronous. We present a solution and its
implementation that lies between these extremes. Based on a distributed
gradient clock synchronization algorithm, we show a novel design providing
modules with local clocks, the frequency bounds of which are almost as good as
those of free-running oscillators, yet neighboring modules are guaranteed to
have a phase offset substantially smaller than one clock cycle. Concretely,
parameters obtained from a 15nm ASIC simulation running at 2GHz yield
mathematical worst-case bounds of 20ps on the phase offset for a
node grid network
Practical advances in asynchronous design and in asynchronous/synchronous interfaces
Journal ArticleAsynchronous systems are being viewed as an increasingly viable alternative to purely synchronous systems. This paper gives an overview of the current state of the art in practical asynchronous circuit and system design in four areas: controllers, datapaths, processors, and the design of asynchronous/synchronous interfaces
Elastic bundles :modelling and architecting asynchronous circuits with granular rigidity
PhD ThesisIntegrated Circuit (IC) designs these days are predominantly System-on-Chips (SoCs).
The complexity of designing a SoC has increased rapidly over the years due to growing
process and environmental variations coupled with global clock distribution di culty.
Moreover, traditional synchronous design is not apt to handle the heterogeneous timing
nature of modern SoCs. As a countermeasure, the semiconductor industry witnessed
a strong revival of asynchronous design principles. A new paradigm of digital circuits
emerged, as a result, namely mixed synchronous-asynchronous circuits. With a wave
of recent innovations in synchronous-asynchronous CAD integration, this paradigm is
showing signs of commercial adoption in future SoCs mainly due to the scope for reuse
of synchronous functional blocks and IP cores, and the co-existence of synchronous and
asynchronous design styles in a common EDA framework.
However, there is a lack of formal methods and tools to facilitate mixed synchronousasynchronous
design. In this thesis, we propose a formal model based on Petri nets with
step semantics to describe these circuits behaviourally. Implication of this model in the
veri cation and synthesis of mixed synchronous-asynchronous circuits is studied. Till
date, this paradigm has been mainly explored on the basis of Globally Asynchronous
Locally Synchronous (GALS) systems. Despite decades of research, GALS design has
failed to gain traction commercially. To understand its drawbacks, a simulation framework
characterising the physical and functional aspects of GALS SoCs is presented.
A novel method for synthesising mixed synchronous-asynchronous circuits with varying
levels of rigidity is proposed. Starting with a high-level data ow model of a system which
is intrinsically asynchronous, the key idea is to introduce rigidity of chosen granularity
levels in the model without changing functional behaviour. The system is then partitioned
into functional blocks of synchronous and asynchronous elements before being transformed
into an equivalent circuit which can be synthesised using standard EDA tools
An Asynchronous GALS Interface with Applications
A low-latency asynchronous interface for use in globally-asynchronous locally-synchronous (GALS) integrated circuits is presented. The interface is compact and does not alter the local clocks of the interfaced local clock domains in any way (unlike many existing GALS interfaces). Two applications of the interface to GALS systems are shown. The first is a single-chip shared-memory multiprocessor for generic supercomputing use. The second is an application-specific coprocessor for hardware acceleration of the Smith-Waterman algorithm. This is a bioinformatics algorithm used for sequence alignment (similarity searching) between DNA or amino acid (protein) sequences and sequence databases such as the recently completed human genome database
Metastability Tolerant Computing
International audienceSynchronization using flip-flop chains imposes a latency of a few clock cycles when transferring data and control signals between clock domains. We propose a design scheme that avoids this latency by performing synchronization as part of state/data computations while guaranteeing that metastability is contained and its effects tolerated (with an acceptable failure probability). We present a theoretical framework for modeling synchronous state machines in the presence of metastability and use it to prove properties that guarantee some form of reliability. Specifically, we show that the inevitable state/data corruption resulting from propagating metastable states can be confined to a subset of computations. Applications that can tolerate certain failures can exploit this property to leverage low-latency and quasi-reliable operation simultaneously. We demonstrate the approach by designing a Network-on-Chip router with zero-latency asynchronous ports and show via simulation that it outperforms a variant with two flip-flop synchronizers at a negligible cost in packet transfer reliability
Circuit design and analysis for on-FPGA communication systems
On-chip communication system has emerged as a prominently important subject in Very-Large-
Scale-Integration (VLSI) design, as the trend of technology scaling favours logics more than interconnects.
Interconnects often dictates the system performance, and, therefore, research for new
methodologies and system architectures that deliver high-performance communication services
across the chip is mandatory. The interconnect challenge is exacerbated in Field-Programmable
Gate Array (FPGA), as a type of ASIC where the hardware can be programmed post-fabrication.
Communication across an FPGA will be deteriorating as a result of interconnect scaling. The programmable
fabrics, switches and the specific routing architecture also introduce additional latency
and bandwidth degradation further hindering intra-chip communication performance.
Past research efforts mainly focused on optimizing logic elements and functional units in FPGAs.
Communication with programmable interconnect received little attention and is inadequately understood.
This thesis is among the first to research on-chip communication systems that are built on
top of programmable fabrics and proposes methodologies to maximize the interconnect throughput
performance. There are three major contributions in this thesis: (i) an analysis of on-chip
interconnect fringing, which degrades the bandwidth of communication channels due to routing
congestions in reconfigurable architectures; (ii) a new analogue wave signalling scheme that significantly
improves the interconnect throughput by exploiting the fundamental electrical characteristics
of the reconfigurable interconnect structures. This new scheme can potentially mitigate
the interconnect scaling challenges. (iii) a novel Dynamic Programming (DP)-network to provide
adaptive routing in network-on-chip (NoC) systems. The DP-network architecture performs runtime
optimization for route planning and dynamic routing which, effectively utilizes the in-silicon
bandwidth. This thesis explores a new horizon in reconfigurable system design, in which new
methodologies and concepts are proposed to enhance the on-FPGA communication throughput
performance that is of vital importance in new technology processes