3,389 research outputs found

    Placement driven retiming with a coupled edge timing model

    Get PDF
    Retiming is a widely investigated technique for performance optimization. It performs powerful modifications on a circuit netlist. However, often it is not clear, whether the predicted performance improvement will still be valid after placement has been performed. This paper presents a new retiming algorithm using a highly accurate timing model taking into account the effect of retiming on capacitive loads of single wires as well as fanout systems. We propose the integration of retiming into a timing-driven standard cell placement environment based on simulated annealing. Retiming is used as an optimization technique throughout the whole placement process. The experimental results show the benefit of the proposed approach. In comparison with the conventional design flow based on standard FEAS our approach achieved an improvement in cycle time of up to 34% and 17% on the average

    Tight coupling of timing driven placement and retiming

    Get PDF
    Retiming is a widely investigated technique for performance optimization. In general, it performs extensive modifications on a circuit netlist, leaving it unclear, whether the achieved performance improvement will still be valid after placement has been performed. This paper presents an approach for integrating retiming into a timing-driven placement environment. The experimental results show the benefit of the proposed approach on circuit performance in comparison with design flows using retiming only as a pre- or postplacement optimization method

    All-optical pulse reshaping and retiming systems incorporating pulse shaping fiber Bragg grating

    No full text
    This paper demonstrates two optical pulse retiming and reshaping systems incorporating superstructured fiber Bragg gratings (SSFBGs) as pulse shaping elements. A rectangular switching window is implemented to avoid conversion of the timing jitter on the original data pulses into pulse amplitude noise at the output of a nonlinear optical switch. In a first configuration, the rectangular pulse generator is used at the (low power) data input to a nonlinear optical loop mirror (NOLM) to perform retiming of an incident noisy data signal using a clean local clock signal to control the switch. In a second configuration, the authors further amplify the data signal and use it to switch a (low power) clean local clock signal. The S-shaped nonlinear characteristic of the NOLM results in this instance in a reduction of both timing and amplitude jitter on the data signal. The underlying technologies required for the implementation of this technique are such that an upgrade of the scheme for the regeneration of ultrahigh bit rate signals at data rates in excess of 320 Gb/s should be achievable

    All-optical 160 Gbit/s RZ data retiming system incorporating a pulse shaping fibre Bragg grating

    No full text
    We characterize a 160Gbit/s retimer based on flat-topped pulses shaped using a superstructured fibre Bragg grating. The benefits of using shaped rather than conventional pulse forms in terms of timing jitter reduction are confirmed by bit-error-rate measurements

    Effect of Jitter on the Settling Time of Mesochronous Clock Retiming Circuits

    Full text link
    It is well known that timing jitter can degrade the bit error rate (BER) of receivers that recover the clock from input data. However, timing jitter can also result in an indefinite increase in the settling time of clock recovery circuits, particularly in low swing mesochronous systems. Mesochronous clock retiming circuits are required in repeaterless low swing on-chip interconnects. We first discuss how timing jitter can result in a large increase in the settling time of the clock recovery circuit. Next, the circuit is modelled as a Markov chain with absorbing states. The mean time to absorption of the Markov chain, which represents the mean settling time of the circuit, is determined. The model is validated through behavioural simulations of the circuit, the results of which match well with the model predictions. We consider circuits with (i) data dependent jitter, (ii) random jitter, and (iii) combination of both of them. We show that a mismatch between the strengths of up and down corrections of the retiming can reduce the settling time. In particular, a 10% mismatch can reduce the mean settling time by up to 40%. We leverage this fact toward improving the settling time performance, and propose useful techniques based on biased training sequences and mismatched charge pumps. We also present a coarse+fine clock retiming circuit, which can operate in coarse first mode, to reduce the settling time substantially. These fast settling retiming circuits are verified with circuit simulations.Comment: 23 pages, 40 figure

    Loop pipelining with resource and timing constraints

    Get PDF
    Developing efficient programs for many of the current parallel computers is not easy due to the architectural complexity of those machines. The wide variety of machine organizations often makes it more difficult to port an existing program than to reprogram it completely. Therefore, powerful translators are necessary to generate effective code and free the programmer from concerns about the specific characteristics of the target machine. This work focuses on techniques to be used by an important class of translators, whose objective is to transform sequential programs into equivalent more parallel programs. The transformations are performed at instruction level in order to exploit low level parallelism and increase memory locality.Most of the current applications are programmed in languages which do not allow us to express parallelism between high-level sentences (as Pascal, C or Fortran). Furthermore, a lot of applications written ten or more years ago are still used today, and it is not feasible to rewrite such applications for many reasons (not only technical reasons, but also economic ones). Translators enable programmers to write the application in a familiar sequential programming language, without concerning their selves with the architecture of the target machine. Current compilers for parallel architectures not only translate a program written on a high-level language to the appropriate machine language, but also perform some transformations in the final code in order to execute the program in a more parallel way. The transformations improve the performance in the execution of the program by making use of the knowledge that the compiler has about the machine architecture. The semantics of the program remain intact after any transformation.Experiments show that limiting parallelization to basic blocks not included in loops limits maximum speedup. This is because loops often comprise a large portion of the parallelism available to be exploited in a program. For this reason, a lot of effort has been devoted in the recent years to parallelize loop execution. Several parallel computer architectures and compilation techniques have been proposed to exploit such a parallelism at different granularities. Multiprocessors exploit coarse grained parallelism by distributing entire loop iterations to different processors. Systems oriented to the high-level synthesis (HLS) of VLSI circuits, superscalar processors and very long instruction word (VLIW) processors exploit fine-grained parallelism at instruction level. This work addresses fine-grained parallelization of loops addressed to the HLS of VLSI circuits. Two algorithms are proposed for resource constraints and for timing constraints. An algorithm to reduce the number of registers required to execute a loop in a given architecture is also proposed.Postprint (published version
    corecore