2,197 research outputs found

    Speeding Up Technology-Independent Timing Optimization by Network Partitioning

    Get PDF
    Abstract Technology-independenttiming optimizationis an importantproblem in logic synthesis. Although many promising techniques have been proposed in the past, unfortunately they are quite slow and thus impractical for large networks. In this paper, we propose DE-PART, a delay-basedpartitioner-cum-optimizer, which purports to solve this problem. Given a combinational logic network that is to be optimized for timing, DEPART divides it into sub-networks using timing information and a constraint on the maximum number of gates allowed in a single sub-network. These sub-networks are then dispatched, one by one, to a standard timing optimizer. The optimized sub-networks are re-glued, generating an optimized network. The challenge is how to partition the original network into sub-networks so that the nal solution quality after partitioning and optimization is comparable to that from the timing optimizer. We propose a partitioning technique that is timing-driven and is simple yet e ective. We compare DEPART with speed up 21 , a state-of-the-art timing optimization tool, and with various partitioning techniques such as min-cut based and region growing, on a suite of large industrial and ISCAS circuits. On more than half of the benchmarks, DEPART yields run-time improvements of 20 to 450 times over a normal invocation of speed up the overall average improvement being 8 times, without compromising the solution quality m uch. Min-cut and region growing partitioning schemes, not being timing-driven, perform poorly in terms of the nal circuit delay

    New FPGA design tools and architectures

    Get PDF

    Doctor of Philosophy

    Get PDF
    dissertationWith the explosion of chip transistor counts, the semiconductor industry has struggled with ways to continue scaling computing performance in line with historical trends. In recent years, the de facto solution to utilize excess transistors has been to increase the size of the on-chip data cache, allowing fast access to an increased portion of main memory. These large caches allowed the continued scaling of single thread performance, which had not yet reached the limit of instruction level parallelism (ILP). As we approach the potential limits of parallelism within a single threaded application, new approaches such as chip multiprocessors (CMP) have become popular for scaling performance utilizing thread level parallelism (TLP). This dissertation identifies the operating system as a ubiquitous area where single threaded performance and multithreaded performance have often been ignored by computer architects. We propose that novel hardware and OS co-design has the potential to significantly improve current chip multiprocessor designs, enabling increased performance and improved power efficiency. We show that the operating system contributes a nontrivial overhead to even the most computationally intense workloads and that this OS contribution grows to a significant fraction of total instructions when executing several common applications found in the datacenter. We demonstrate that architectural improvements have had little to no effect on the performance of the OS over the last 15 years, leaving ample room for improvements. We specifically consider three potential solutions to improve OS execution on modern processors. First, we consider the potential of a separate operating system processor (OSP) operating concurrently with general purpose processors (GPP) in a chip multiprocessor organization, with several specialized structures acting as efficient conduits between these processors. Second, we consider the potential of segregating existing caching structures to decrease cache interference between the OS and application. Third, we propose that there are components within the OS itself that should be refactored to be both multithreaded and cache topology aware, which in turn, improves the performance and scalability of many-threaded applications
    corecore