The effectiveness of loop unrolling for modulo scheduling in clustered VLIW architectures
Clustered organizations are becoming a common trend in the design of VLIW architectures. In this work we propose a novel modulo scheduling approach for such architectures. The proposed technique performs the cluster assignment and the instruction scheduling in a single pass, which is shown to be more effective than performing the assignment first and the scheduling afterwards. We also show that loop unrolling significantly enhances the performance of the proposed scheduler, especially when the communication channel among clusters is the main performance bottleneck. By selectively unrolling some loops, we can obtain the best performance with the minimum increase in code size. Performance evaluation on SPECfp95 shows that the clustered architecture achieves about the same IPC (Instructions Per Cycle) as a unified architecture with the same resources. Moreover, when the cycle time is taken into account, a 4-cluster configuration is 3.6 times faster than the unified architecture. Peer Reviewed. Postprint (published version)
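As an illustration of the loop unrolling transformation mentioned in this abstract, here is a minimal sketch in Python. It only shows the source-level idea of replicating a loop body; it is not the scheduler or the compiler transformation from the paper. The unroll factor of 4 and the function names are assumptions chosen for the example.

```python
def dot_rolled(a, b):
    """Reference loop: one multiply-add per iteration."""
    total = 0.0
    for i in range(len(a)):
        total += a[i] * b[i]
    return total


def dot_unrolled4(a, b):
    """Same computation with the body replicated 4 times (assumed factor).

    The four partial sums are independent of each other, which is what
    gives a scheduler more operations to overlap per iteration.
    """
    n = len(a)
    s0 = s1 = s2 = s3 = 0.0
    i = 0
    # Main unrolled loop: handles 4 elements per iteration.
    while i + 4 <= n:
        s0 += a[i] * b[i]
        s1 += a[i + 1] * b[i + 1]
        s2 += a[i + 2] * b[i + 2]
        s3 += a[i + 3] * b[i + 3]
        i += 4
    total = (s0 + s1) + (s2 + s3)
    # Remainder loop: handles the last n % 4 elements.
    while i < n:
        total += a[i] * b[i]
        i += 1
    return total
```

The unrolled version is longer than the original, which is the code-size cost that the abstract proposes to limit by unrolling only selected loops.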
Linear Encodings of Bounded LTL Model Checking
We consider the problem of bounded model checking (BMC) for linear temporal
logic (LTL). We present several efficient encodings that have size linear in
the bound. Furthermore, we show how the encodings can be extended to LTL with
past operators (PLTL). The generalised encoding is still of linear size, but
cannot detect minimal length counterexamples. By using the virtual unrolling
technique, minimal length counterexamples can be captured; however, the size of
the encoding is then quadratic in the specification. We also extend virtual
unrolling to Büchi automata, enabling them to accept minimal length
counterexamples.
Our BMC encodings can be made incremental in order to benefit from
incremental SAT technology. With fairly small modifications the incremental
encoding can be further enhanced with a termination check, allowing us to prove
properties with BMC. Experiments clearly show that our new encodings improve
performance of BMC considerably, particularly in the case of the incremental
encoding, and that they are very competitive for finding bugs. An analysis of
the liveness-to-safety transformation reveals many similarities to the BMC
encodings in this paper. Using the liveness-to-safety translation with
BDD-based invariant checking results in an efficient method to find shortest
counterexamples that complements the BMC-based approach.
Comment: Final version for Logical Methods in Computer Science CAV 2005 special issue
Solving Large-Scale Optimization Problems Related to Bell's Theorem
The impossibility of finding local realistic models for quantum correlations due
to entanglement is an important fact in the foundations of quantum physics, now
gaining new applications in quantum information theory. We present an in-depth
description of a method of testing the existence of such models, which involves
two levels of optimization: a higher-level non-linear task and a lower-level
linear programming (LP) task. The article compares the performance of the
existing implementation of the method, where the LPs are solved with the
simplex method, and our new implementation, where the LPs are solved with a
matrix-free interior point method. We describe in detail how the latter can be
applied to our problem, discuss the basic scenario and possible improvements,
and how they affect overall performance. A significant performance advantage
of the matrix-free interior point method over the simplex method is confirmed
by extensive computational results. The new method is able to solve problems
which are orders of magnitude larger. Consequently, the noise resistance of the
non-classicality of correlations of several types of quantum states, which has
never been computed before, can now be efficiently determined. An extensive set
of data in the form of tables and graphics is presented and discussed. The
article is intended for all audiences; no quantum-mechanical background is
necessary.
Comment: 19 pages, 7 tables, 1 figure
Lanczos eigensolution method for high-performance computers
The theory, computational analysis, and applications of a Lanczos algorithm on high-performance computers are presented. The computationally intensive steps of the algorithm are identified as the matrix factorization, the forward/backward equation solution, and the matrix-vector multiplications. These computational steps are optimized to exploit the vector and parallel capabilities of high-performance computers. The savings in computational time from applying optimization techniques such as variable-band and sparse data storage and access, loop unrolling, use of local memory, and compiler directives are presented. Two large-scale structural analysis applications are described: the buckling of a composite blade-stiffened panel with a cutout, and the vibration analysis of a high-speed civil transport. The sequential computational time for the panel problem, 181.6 seconds on a CONVEX computer, was decreased to 14.1 seconds with the optimized vector algorithm. The best computational time for the transport problem with 17,000 degrees of freedom was 23 seconds, obtained on the Cray-YMP using an average of 3.63 processors.
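For orientation, the basic Lanczos recurrence can be sketched as follows. This is a minimal pure-Python sketch under stated assumptions, not the optimized implementation described in the abstract: it assumes a small dense symmetric matrix stored as a list of lists, uses full reorthogonalization for simplicity, and omits the factorization and forward/backward solution steps. The names `lanczos`, `matvec` and `dot` are hypothetical helpers introduced for the example.

```python
import math


def matvec(A, x):
    """Dense matrix-vector product (the dominant cost per Lanczos step)."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def lanczos(A, v0, m):
    """Run up to m Lanczos steps on symmetric A, starting from v0.

    Returns (alphas, betas): the diagonal and off-diagonal entries of the
    tridiagonal matrix T whose eigenvalues approximate those of A.
    """
    norm = math.sqrt(dot(v0, v0))
    q = [x / norm for x in v0]
    basis = [q]
    alphas, betas = [], []
    q_prev, beta = [0.0] * len(v0), 0.0
    for _ in range(m):
        w = matvec(A, q)
        w = [wi - beta * pi for wi, pi in zip(w, q_prev)]
        alpha = dot(w, q)
        w = [wi - alpha * qi for wi, qi in zip(w, q)]
        # Full reorthogonalization against all previous Lanczos vectors
        # (a simplifying assumption; production codes are more selective).
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        alphas.append(alpha)
        beta = math.sqrt(dot(w, w))
        # Stop on breakdown (invariant subspace found) or after m steps.
        if beta < 1e-12 or len(alphas) == m:
            break
        betas.append(beta)
        q_prev, q = q, [wi / beta for wi in w]
        basis.append(q)
    return alphas, betas
```

The eigenvalues of the small tridiagonal matrix built from `alphas` and `betas` would then be computed with a standard dense eigensolver; that step is left out here.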