3,111 research outputs found

    The effectiveness of loop unrolling for modulo scheduling in clustered VLIW architectures

    Get PDF
    Clustered organizations are becoming a common trend in the design of VLIW architectures. In this work we propose a novel modulo scheduling approach for such architectures. The proposed technique performs the cluster assignment and the instruction scheduling in a single pass, which is shown to be more effective than doing first the assignment and later the scheduling. We also show that loop unrolling significantly enhances the performance of the proposed scheduler especially when the communication channel among clusters is the main performance bottleneck. By selectively unrolling some loops, we can obtain the best performance with the minimum increase in code size. Performance evaluation for the SPECfp95 shows that the clustered architecture achieves about the same IPC (Instructions Per Cycle) as a unified architecture with the same resources. Moreover when the cycle time is taken into account, a 4-cluster configurations is 3.6 times faster than the unified architecture.Peer ReviewedPostprint (published version

    Linear Encodings of Bounded LTL Model Checking

    Full text link
    We consider the problem of bounded model checking (BMC) for linear temporal logic (LTL). We present several efficient encodings that have size linear in the bound. Furthermore, we show how the encodings can be extended to LTL with past operators (PLTL). The generalised encoding is still of linear size, but cannot detect minimal length counterexamples. By using the virtual unrolling technique minimal length counterexamples can be captured, however, the size of the encoding is quadratic in the specification. We also extend virtual unrolling to Buchi automata, enabling them to accept minimal length counterexamples. Our BMC encodings can be made incremental in order to benefit from incremental SAT technology. With fairly small modifications the incremental encoding can be further enhanced with a termination check, allowing us to prove properties with BMC. Experiments clearly show that our new encodings improve performance of BMC considerably, particularly in the case of the incremental encoding, and that they are very competitive for finding bugs. An analysis of the liveness-to-safety transformation reveals many similarities to the BMC encodings in this paper. Using the liveness-to-safety translation with BDD-based invariant checking results in an efficient method to find shortest counterexamples that complements the BMC-based approach.Comment: Final version for Logical Methods in Computer Science CAV 2005 special issu

    Solving Large-Scale Optimization Problems Related to Bell's Theorem

    Get PDF
    Impossibility of finding local realistic models for quantum correlations due to entanglement is an important fact in foundations of quantum physics, gaining now new applications in quantum information theory. We present an in-depth description of a method of testing the existence of such models, which involves two levels of optimization: a higher-level non-linear task and a lower-level linear programming (LP) task. The article compares the performances of the existing implementation of the method, where the LPs are solved with the simplex method, and our new implementation, where the LPs are solved with a matrix-free interior point method. We describe in detail how the latter can be applied to our problem, discuss the basic scenario and possible improvements and how they impact on overall performance. Significant performance advantage of the matrix-free interior point method over the simplex method is confirmed by extensive computational results. The new method is able to solve problems which are orders of magnitude larger. Consequently, the noise resistance of the non-classicality of correlations of several types of quantum states, which has never been computed before, can now be efficiently determined. An extensive set of data in the form of tables and graphics is presented and discussed. The article is intended for all audiences, no quantum-mechanical background is necessary.Comment: 19 pages, 7 tables, 1 figur

    Lanczos eigensolution method for high-performance computers

    Get PDF
    The theory, computational analysis, and applications are presented of a Lanczos algorithm on high performance computers. The computationally intensive steps of the algorithm are identified as: the matrix factorization, the forward/backward equation solution, and the matrix vector multiples. These computational steps are optimized to exploit the vector and parallel capabilities of high performance computers. The savings in computational time from applying optimization techniques such as: variable band and sparse data storage and access, loop unrolling, use of local memory, and compiler directives are presented. Two large scale structural analysis applications are described: the buckling of a composite blade stiffened panel with a cutout, and the vibration analysis of a high speed civil transport. The sequential computational time for the panel problem executed on a CONVEX computer of 181.6 seconds was decreased to 14.1 seconds with the optimized vector algorithm. The best computational time of 23 seconds for the transport problem with 17,000 degs of freedom was on the the Cray-YMP using an average of 3.63 processors
    • 

    corecore