1,632 research outputs found
Performance analysis methods for understanding scaling bottlenecks in multi-threaded applications
In dit proefschrift stellen we drie nieuwe methodes voor om de prestatie van meerdradige programma's te analyseren. Onze eerste methode, criticality stacks, is bruikbaar voor het analyseren van onevenwicht tussen draden. Om deze stacks te construeren stellen we een nieuwe criticaliteitsmetriek voor, die de uitvoeringstijd van een applicatie opsplitst in een deel voor iedere draad. Hoe groter dit deel is voor een draad, hoe kritischer deze draad is voor de applicatie. De tweede methode, bottle graphs, stelt iedere draad van een meerdradig programma voor als een rechthoek in een grafiek. De hoogte van de rechthoek wordt berekend door middel van onze criticaliteitsmetriek, en de breedte stelt het parallellisme van een draad voor. Rechthoeken die bovenaan in de grafiek zitten, als het ware in de hals van de fles, hebben een beperkt parallellisme, waardoor we ze beschouwen als “bottlenecks” voor de applicatie. Onze derde methode, speedup stacks, toont de bereikte speedup van een applicatie en de verschillende componenten die speedup beperken in een gestapelde grafiek. De intuïtie achter dit concept is dat door het reduceren van de invloed van een bepaalde component, de speedup van een applicatie proportioneel toeneemt met de grootte van die component in de stapel
Quantum resource estimates for computing elliptic curve discrete logarithms
We give precise quantum resource estimates for Shor's algorithm to compute
discrete logarithms on elliptic curves over prime fields. The estimates are
derived from a simulation of a Toffoli gate network for controlled elliptic
curve point addition, implemented within the framework of the quantum computing
software tool suite LIQ. We determine circuit implementations for
reversible modular arithmetic, including modular addition, multiplication and
inversion, as well as reversible elliptic curve point addition. We conclude
that elliptic curve discrete logarithms on an elliptic curve defined over an
-bit prime field can be computed on a quantum computer with at most qubits using a quantum circuit of at most Toffoli gates. We are able to classically simulate the
Toffoli networks corresponding to the controlled elliptic curve point addition
as the core piece of Shor's algorithm for the NIST standard curves P-192,
P-224, P-256, P-384 and P-521. Our approach allows gate-level comparisons to
recent resource estimates for Shor's factoring algorithm. The results also
support estimates given earlier by Proos and Zalka and indicate that, for
current parameters at comparable classical security levels, the number of
qubits required to tackle elliptic curves is less than for attacking RSA,
suggesting that indeed ECC is an easier target than RSA.Comment: 24 pages, 2 tables, 11 figures. v2: typos fixed and reference added.
ASIACRYPT 201
PyCUDA and PyOpenCL: A Scripting-Based Approach to GPU Run-Time Code Generation
High-performance computing has recently seen a surge of interest in
heterogeneous systems, with an emphasis on modern Graphics Processing Units
(GPUs). These devices offer tremendous potential for performance and efficiency
in important large-scale applications of computational science. However,
exploiting this potential can be challenging, as one must adapt to the
specialized and rapidly evolving computing environment currently exhibited by
GPUs. One way of addressing this challenge is to embrace better techniques and
develop tools tailored to their needs. This article presents one simple
technique, GPU run-time code generation (RTCG), along with PyCUDA and PyOpenCL,
two open-source toolkits that support this technique.
In introducing PyCUDA and PyOpenCL, this article proposes the combination of
a dynamic, high-level scripting language with the massive performance of a GPU
as a compelling two-tiered computing platform, potentially offering significant
performance and productivity advantages over conventional single-tier, static
systems. The concept of RTCG is simple and easily implemented using existing,
robust infrastructure. Nonetheless it is powerful enough to support (and
encourage) the creation of custom application-specific tools by its users. The
premise of the paper is illustrated by a wide range of examples where the
technique has been applied with considerable success.Comment: Submitted to Parallel Computing, Elsevie
Recommended from our members
Methods for Performance Evaluation of Parallel Computer Systems
Although parallel computers have existed for many years, recently there has been a surge of academic, industrial and governmental interest in parallel computing. Commercially manufactured parallel computers have started to become available. Many new experimental parallel architectures are reported in the literature every year. Software for many types of applications, from scientific number crunching to artificial intelligence, is being written to run on parallel machines. Performance is an essential consideration both in the design of new systems and the deployment of existing systems. Users of computers wish to utilize their hardware and software systems as efficiently as possible. Over the years, a field known as computer performance evaluation has arisen to address the problem of quantifying and predicting computer performance. Methods exist that can determine how efficiently a system's resources are being used. These can help track down the probable causes of performance problems
Full-Stack Performance Model Evaluation using Probabilistic Garbage Collection Simulation
ABSTRACT Performance models can represent the performance relevant aspects of an enterprise application. Corresponding simulation engines use such models for simulating performance metrics (e.g., response times, resource utilization, throughput) and allow for performance evaluations without load testing the actual system. Creating such models manually often outweighs their benefits. Therefore, recent research created performance model generators, which can generate such models out of Application Performance Management software. However, a full-stack evaluation containing all relevant resources of an enterprise application (Central Processing Unit, memory, network and Hard Disk Drive) has not been conducted to the best of our knowledge. This work closes this gap using a pre-release version of the next generation industry benchmark SPECjEnterpriseNEXT of the Standard Performance Evaluation Corporation as example enterprise application, the Palladio Component Model as performance model and the performance model generator of the RETIT Capacity Manager. Furthermore, this work extends the generated model with a probabilistic garbage collection model to simulate memory allocation and releases more accurately
Declassification: transforming java programs to remove intermediate classes
Computer applications are increasingly being written in object-oriented languages like Java and C++ Object-onented programming encourages the use of small methods and classes. However, this style of programming introduces much overhead as each method call results in a dynamic dispatch and each field access becomes a pointer dereference to the heap allocated object. Many of the classes in these programs are included to provide structure rather than to act as reusable code, and can therefore be regarded as intermediate. We have therefore developed an optimisation technique, called declassification, which will transform Java programs into equivalent programs from which these intermediate classes have been removed.
The optimisation technique developed involves two phases, analysis and transformation. The analysis involves the identification of intermediate classes for removal. A suitable class is defined to be a class which is used exactly once within a program. Such classes are identified by this analysis The subsequent transformation involves eliminating these intermediate classes from the program. This involves inlinmg the fields and methods of each intermediate class within the enclosing class which uses it.
In theory, declassification reduces the number of classes which are instantiated and used in a program during its execution. This should reduce the overhead of object creation and maintenance as child objects are no longer created, and it should also reduce the number of field accesses and dynamic dispatches required by a program to execute. An important feature of the declassification technique, as opposed to other similar techniques, is that it guarantees there will be no increase in code size. An empirical study was conducted on a number of reasonable-sized Java programs and it was found that very few suitable classes were identified for miming. The results showed that the declassification technique had a small influence on the memory consumption and a negligible influence on the run-time performance of these programs. It is therefore concluded that the declassification technique was not successful in optimizing the test programs but further extensions to this technique combined with an intrinsically object-onented set of test programs could greatly improve its success
- …