Cycle time optimization by timing driven placement with simultaneous netlist transformations
We present new concepts to integrate logic synthesis and physical design. Our methodology uses general Boolean transformations as known from technology-independent synthesis, and a recursive bi-partitioning placement algorithm. In each partitioning step, the precision of the layout data increases. This allows effective guidance of the logic synthesis operations for cycle time optimization. An additional advantage of our approach is that no complicated layout corrections are needed when the netlist is changed.
PyCARL: A PyNN Interface for Hardware-Software Co-Simulation of Spiking Neural Network
We present PyCARL, a PyNN-based common Python programming interface for
hardware-software co-simulation of spiking neural networks (SNNs). Through
PyCARL, we make the following two key contributions. First, we provide an
interface of PyNN to CARLsim, a computationally-efficient, GPU-accelerated and
biophysically-detailed SNN simulator. PyCARL facilitates joint development of
machine learning models and code sharing between CARLsim and PyNN users,
promoting an integrated and larger neuromorphic community. Second, we integrate
cycle-accurate models of state-of-the-art neuromorphic hardware such as
TrueNorth, Loihi, and DynapSE in PyCARL, to accurately model hardware latencies
that delay spikes between communicating neurons and degrade performance. PyCARL
allows users to analyze and optimize the performance difference between
software-only simulation and hardware-software co-simulation of their machine
learning models. We show that system designers can also use PyCARL to perform
design-space exploration early in the product development stage, facilitating
faster time-to-deployment of neuromorphic products. We evaluate the memory
usage and simulation time of PyCARL using functionality tests, synthetic SNNs,
and realistic applications. Our results demonstrate that for large SNNs, PyCARL
does not lead to any significant overhead compared to CARLsim. We also use
PyCARL to analyze these SNNs for a state-of-the-art neuromorphic hardware and
demonstrate a significant performance deviation from software-only simulations.
PyCARL allows users to evaluate and minimize such differences early during model
development.
Comment: 10 pages, 25 figures. Accepted for publication at International Joint
Conference on Neural Networks (IJCNN) 202
DD-AMG on QPACE 3
We describe our experience porting the Regensburg implementation of the
DD-AMG solver from QPACE 2 to QPACE 3. We first review how the code was
ported from the first generation Intel Xeon Phi processor (Knights Corner) to
its successor (Knights Landing). We then describe the modifications in the
communication library necessitated by the switch from InfiniBand to Omni-Path.
Finally, we present the performance of the code on a single processor as well
as the scaling on many nodes, where in both cases the speedup factor is close
to the theoretical expectations.
Comment: 12 pages, 6 figures, Proceedings of Lattice 201
On the Scalability of Data Reduction Techniques in Current and Upcoming HPC Systems from an Application Perspective
We implement and benchmark parallel I/O methods for the fully-manycore driven
particle-in-cell code PIConGPU. Identifying throughput and overall I/O size as
a major challenge for applications on today's and future HPC systems, we
present a scaling law characterizing performance bottlenecks in
state-of-the-art approaches for data reduction. Consequently, we propose,
implement and verify multi-threaded data-transformations for the I/O library
ADIOS as a feasible way to trade underutilized host-side compute potential on
heterogeneous systems for reduced I/O latency.
Comment: 15 pages, 5 figures, accepted for DRBSD-1 in conjunction with ISC'1
Cell replication and redundancy elimination during placement for cycle time optimization
This paper presents a new timing driven approach for cell replication tailored to the practical needs of standard cell layout design. Cell replication methods have been studied extensively in the context of generic partitioning problems. However, until now it has remained unclear what practical benefit can be obtained from this concept in a realistic environment for timing driven layout synthesis. Therefore, this paper presents a timing driven cell replication procedure, demonstrates its incorporation into a standard cell placement and routing tool and examines its benefit on the final circuit performance in comparison with conventional gate or transistor sizing techniques. Furthermore, we demonstrate that cell replication can deteriorate the stuck-at fault testability of circuits and show that stuck-at redundancy elimination must be integrated into the placement procedure. Experimental results demonstrate the usefulness of the proposed methodology and suggest that cell replication should be an integral part of the physical design flow complementing traditional gate sizing techniques.
Very Large Scale Integration Cell Based Path Extractor For Physical To Layout Mapping In Fault Isolation Work
Post-silicon debug and diagnosis place growing demands on physical-to-layout mapping capabilities. One area that needs such innovation is fault isolation in the failure analysis of semiconductor devices at the post-silicon stage. Because fault isolation begins at the Register Transfer Level (RTL), where a suspected boundary spanning multiple logic elements from one end to the other is formed, an automated layout-to-schematic mapping tool helps to identify a design fault within that boundary. A path extractor program that extracts all possible paths between the start and end signals can therefore save engineers time in tracing the components along a fault line. This capability is significant in Electronic Design Automation (EDA) because it yields net name sequences that are stored in a database of mapper files. These mapper files can be used in layout design debug, since each net sequence represents schematic signals. Retrieving all possible signals within a suspected boundary is a well-known search problem. The proposed path extractor program therefore incorporates the characteristics of a depth-first search algorithm while taking the specifications of a cell-based design into account. The path extraction results remained consistent even when the search depth was varied, which supports the reliability of the approach. With the maximum allowable search depth held constant, performance differs by an average of 12.6% (iteration count). Paths of net sequences were consistent throughout the verification of the path extractor program. The development and study of this path extraction method is significant for EDA and for debug and diagnosis work.
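To make the idea of depth-limited path extraction concrete, the following is a minimal sketch of a depth-first search that enumerates every path of nets between a start and an end signal. It is not the authors' program: the function name `extract_paths`, the adjacency-dictionary netlist format, and the net names are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def extract_paths(
    netlist: Dict[str, List[str]], start: str, end: str, max_depth: int
) -> List[List[str]]:
    """Return all simple paths from start to end using at most max_depth hops.

    netlist maps each net name to the nets it drives (hypothetical format).
    """
    paths: List[List[str]] = []

    def dfs(node: str, path: List[str]) -> None:
        if node == end:
            paths.append(list(path))  # record a copy of the completed path
            return
        if len(path) - 1 >= max_depth:
            return  # depth limit reached, stop exploring this branch
        for nxt in netlist.get(node, []):
            if nxt not in path:  # skip nets already on the path (no loops)
                path.append(nxt)
                dfs(nxt, path)
                path.pop()  # backtrack

    dfs(start, [start])
    return paths


# Hypothetical netlist: two parallel branches that reconverge before the output.
example_netlist = {
    "in": ["n1", "n2"],
    "n1": ["n3"],
    "n2": ["n3"],
    "n3": ["out"],
}
found = extract_paths(example_netlist, "in", "out", max_depth=3)
```

Lowering `max_depth` prunes longer paths, which is one way a search-depth setting can change the iteration count without changing which paths are reported within the limit.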
Placement driven retiming with a coupled edge timing model
Retiming is a widely investigated technique for performance optimization. It performs powerful modifications on a circuit netlist. However, it is often unclear whether the predicted performance improvement will still hold after placement has been performed. This paper presents a new retiming algorithm using a highly accurate timing model that takes into account the effect of retiming on the capacitive loads of single wires as well as fanout systems. We propose the integration of retiming into a timing-driven standard cell placement environment based on simulated annealing. Retiming is used as an optimization technique throughout the whole placement process. The experimental results show the benefit of the proposed approach. In comparison with the conventional design flow based on standard FEAS, our approach achieved an improvement in cycle time of up to 34%, and of 17% on average.
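For readers unfamiliar with retiming, the sketch below shows only the classical formulation in which a lag r(v) is assigned to each gate and the register count on an edge from u to v becomes w(u, v) + r(v) - r(u). It does not model the coupled edge timing model or the placement integration described in the abstract; the function names and the three-gate example circuit are hypothetical.

```python
from typing import Dict, Tuple

Edge = Tuple[str, str]


def retime(edges: Dict[Edge, int], lag: Dict[str, int]) -> Dict[Edge, int]:
    """Apply the classical retiming rule w_r(u, v) = w(u, v) + r(v) - r(u).

    edges maps (u, v) to the number of registers on that connection;
    lag maps a gate name to its retiming value r (missing gates use 0).
    """
    return {
        (u, v): w + lag.get(v, 0) - lag.get(u, 0)
        for (u, v), w in edges.items()
    }


def is_legal(retimed: Dict[Edge, int]) -> bool:
    """A retiming is legal only if no edge ends up with a negative register count."""
    return all(w >= 0 for w in retimed.values())


# Hypothetical three-gate loop with two registers in total.
circuit = {("a", "b"): 1, ("b", "c"): 0, ("c", "a"): 1}

moved = retime(circuit, {"b": -1})   # moves the register from a->b to b->c
broken = retime(circuit, {"c": -1})  # would need a register that b->c lacks
```

The total register count around a cycle is unchanged by any choice of lags, which is why retiming preserves circuit behavior while shifting where the registers sit.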