
3,111 research outputs found

The effectiveness of loop unrolling for modulo scheduling in clustered VLIW architectures

Author: González Colás Antonio María
Sánchez Navarro F. Jesús
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2000

Clustered organizations are becoming a common trend in the design of VLIW architectures. In this work we propose a novel modulo scheduling approach for such architectures. The proposed technique performs the cluster assignment and the instruction scheduling in a single pass, which is shown to be more effective than performing the assignment first and the scheduling afterwards. We also show that loop unrolling significantly enhances the performance of the proposed scheduler, especially when the communication channel among clusters is the main performance bottleneck. By selectively unrolling some loops, we can obtain the best performance with the minimum increase in code size. Performance evaluation for SPECfp95 shows that the clustered architecture achieves about the same IPC (Instructions Per Cycle) as a unified architecture with the same resources. Moreover, when the cycle time is taken into account, a 4-cluster configuration is 3.6 times faster than the unified architecture. Peer Reviewed. Postprint (published version).
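As a rough illustration of the loop unrolling the abstract refers to, the sketch below unrolls a dot-product loop by a factor of four. It is written in Python for readability only; the function names are hypothetical and nothing here is taken from the paper, whose scheduler works on compiled loop bodies. The four independent accumulators hint at why unrolled copies might be placed on separate clusters with little inter-cluster communication.

```python
def dot_rolled(a, b):
    """Reference version: one multiply-add per iteration."""
    total = 0.0
    for i in range(len(a)):
        total += a[i] * b[i]
    return total


def dot_unrolled4(a, b):
    """Same computation with the loop body unrolled four times."""
    n = len(a)
    limit = n - n % 4          # largest multiple of 4 not exceeding n
    s0 = s1 = s2 = s3 = 0.0    # independent partial sums (assumed one per cluster)
    for i in range(0, limit, 4):
        s0 += a[i] * b[i]
        s1 += a[i + 1] * b[i + 1]
        s2 += a[i + 2] * b[i + 2]
        s3 += a[i + 3] * b[i + 3]
    total = (s0 + s1) + (s2 + s3)
    # Remainder loop handles the last n % 4 elements.
    for i in range(limit, n):
        total += a[i] * b[i]
    return total
```

The unrolled body is roughly four times larger, which is the code-size cost that selective unrolling tries to limit.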

UPCommons. Portal del coneixement obert de la UPC

Linear Encodings of Bounded LTL Model Checking

Author: Armin Biere
Keijo Heljanko
Kousha Etessami
Timo Latvala
Tommi Junttila
Viktor Schuppan
Publication venue: 'Logical Methods in Computer Science e.V.'
Publication date: 01/01/2006

We consider the problem of bounded model checking (BMC) for linear temporal logic (LTL). We present several efficient encodings that have size linear in the bound. Furthermore, we show how the encodings can be extended to LTL with past operators (PLTL). The generalised encoding is still of linear size, but cannot detect minimal length counterexamples. By using the virtual unrolling technique, minimal length counterexamples can be captured; however, the size of the encoding is then quadratic in the specification. We also extend virtual unrolling to Büchi automata, enabling them to accept minimal length counterexamples. Our BMC encodings can be made incremental in order to benefit from incremental SAT technology. With fairly small modifications the incremental encoding can be further enhanced with a termination check, allowing us to prove properties with BMC. Experiments clearly show that our new encodings improve the performance of BMC considerably, particularly in the case of the incremental encoding, and that they are very competitive for finding bugs. An analysis of the liveness-to-safety transformation reveals many similarities to the BMC encodings in this paper. Using the liveness-to-safety translation with BDD-based invariant checking results in an efficient method to find shortest counterexamples that complements the BMC-based approach. Comment: Final version for Logical Methods in Computer Science CAV 2005 special issue

arXiv.org e-Print Archive

CiteSeerX

Crossref

Episciences.org

Directory of Open Access Journals

Solving Large-Scale Optimization Problems Related to Bell's Theorem

Author: Jacek Gondzio
Jacek A. Gruca
J.A. Julian Hall
Wiesław Laskowski
Marek Żukowski
Publication venue: 'Elsevier BV'
Publication date: 18/01/2014

The impossibility of finding local realistic models for quantum correlations due to entanglement is an important fact in the foundations of quantum physics, and is now gaining new applications in quantum information theory. We present an in-depth description of a method of testing the existence of such models, which involves two levels of optimization: a higher-level non-linear task and a lower-level linear programming (LP) task. The article compares the performance of the existing implementation of the method, where the LPs are solved with the simplex method, and our new implementation, where the LPs are solved with a matrix-free interior point method. We describe in detail how the latter can be applied to our problem, and discuss the basic scenario, possible improvements, and how they affect overall performance. A significant performance advantage of the matrix-free interior point method over the simplex method is confirmed by extensive computational results. The new method is able to solve problems which are orders of magnitude larger. Consequently, the noise resistance of the non-classicality of correlations of several types of quantum states, which has never been computed before, can now be efficiently determined. An extensive set of data in the form of tables and graphics is presented and discussed. The article is intended for all audiences; no quantum-mechanical background is necessary. Comment: 19 pages, 7 tables, 1 figure

arXiv.org e-Print Archive

Crossref

Edinburgh Research Explorer

Lanczos eigensolution method for high-performance computers

Author: Bostic Susan W.

The theory, computational analysis, and applications of a Lanczos algorithm on high performance computers are presented. The computationally intensive steps of the algorithm are identified as the matrix factorization, the forward/backward equation solution, and the matrix-vector multiplications. These computational steps are optimized to exploit the vector and parallel capabilities of high performance computers. The savings in computational time from applying optimization techniques such as variable-band and sparse data storage and access, loop unrolling, use of local memory, and compiler directives are presented. Two large scale structural analysis applications are described: the buckling of a composite blade stiffened panel with a cutout, and the vibration analysis of a high speed civil transport. The sequential computational time of 181.6 seconds for the panel problem executed on a CONVEX computer was decreased to 14.1 seconds with the optimized vector algorithm. The best computational time of 23 seconds for the transport problem with 17,000 degrees of freedom was obtained on the Cray-YMP using an average of 3.63 processors.
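For orientation, here is a minimal sketch of the basic Lanczos recurrence for a small dense symmetric matrix, assuming plain Python lists and no reorthogonalization. It is not the report's implementation, which also involves matrix factorization and forward/backward solves for structural eigenproblems. The names `matvec` and `lanczos` are hypothetical. The matrix-vector product inside the loop is the kind of kernel the abstract identifies as computationally intensive.

```python
import math


def matvec(A, v):
    """Dense matrix-vector product; the dominant cost of each Lanczos step."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in A]


def lanczos(A, v0, m, tol=1e-12):
    """Run up to m Lanczos steps on symmetric A starting from v0.

    Returns (alphas, betas): the diagonal and off-diagonal entries of the
    tridiagonal matrix whose eigenvalues approximate those of A.
    """
    norm = math.sqrt(sum(x * x for x in v0))
    v = [x / norm for x in v0]
    v_prev = [0.0] * len(v)
    alphas, betas = [], []
    beta = 0.0
    for j in range(m):
        w = matvec(A, v)
        alpha = sum(wi * vi for wi, vi in zip(w, v))
        alphas.append(alpha)
        if j == m - 1:
            break
        # Three-term recurrence: remove components along v and v_prev.
        w = [wi - alpha * vi - beta * pi for wi, vi, pi in zip(w, v, v_prev)]
        beta = math.sqrt(sum(x * x for x in w))
        if beta < tol:  # invariant subspace found; stop early
            break
        betas.append(beta)
        v_prev, v = v, [x / beta for x in w]
    return alphas, betas
```

In exact arithmetic, when `m` equals the matrix dimension and no early stop occurs, the tridiagonal matrix built from `alphas` and `betas` has the same eigenvalues as `A`.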

NASA Technical Reports Server