Search CORE

35,009 research outputs found

Efficient methods for trace analysis parallelization

Author: Dagenais Michel R.
Ezzati-Jivan Naser
Reumont-Locke Fabien
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 09/02/2019
Field of study

Tracing provides a low-impact, high-resolution way to observe the execution of a system. As the amount of parallelism in traced systems increases, so does the data generated by the trace. Most trace analysis tools work in a single thread, which hinders their performance as the scale of data increases. In this paper, we explore parallelization as an approach to speedup system trace analysis. We propose a solution which uses the inherent aspects of the CTF trace format to create balanced and parallelizable workloads. Our solution takes into account key factors of parallelization, such as good load balancing, low synchronization overhead and an efficient resolution of data dependencies. We also propose an algorithm to detect and resolve data dependencies during trace analysis, with minimal locking and synchronization. Using this approach, we implement three different trace analysis programs: event counting, CPU usage analysis and I/O usage analysis, to assess the scalability in terms of parallel efficiency. The parallel implementations achieve parallel efficiency above 56% with 32 cores, which translates to a speedup of 18 times the serial speed, when running the parallel trace analyses and using trace data stored on consumer-grade solid state storage devices. We also show the scalability and potential of our approach by measuring the effect of future improvements to trace decoding on parallel efficiency

PolyPublie

Deterministic Consistency: A Programming Model for Shared Memory Parallelism

Author: Aviram Amittai
Ford Bryan
Publication venue
Publication date: 01/01/2010
Field of study

The difficulty of developing reliable parallel software is generating interest in deterministic environments, where a given program and input can yield only one possible result. Languages or type systems can enforce determinism in new code, and runtime systems can impose synthetic schedules on legacy parallel code. To parallelize existing serial code, however, we would like a programming model that is naturally deterministic without language restrictions or artificial scheduling. We propose "deterministic consistency", a parallel programming model as easy to understand as the "parallel assignment" construct in sequential languages such as Perl and JavaScript, where concurrent threads always read their inputs before writing shared outputs. DC supports common data- and task-parallel synchronization abstractions such as fork/join and barriers, as well as non-hierarchical structures such as producer/consumer pipelines and futures. A preliminary prototype suggests that software-only implementations of DC can run applications written for popular parallel environments such as OpenMP with low (<10%) overhead for some applications.Comment: 7 pages, 3 figure

arXiv.org e-Print Archive

CiteSeerX

RPPM : Rapid Performance Prediction of Multithreaded workloads on multicore processors

Author: Akram Shoaib
De Pestel Sander
Eeckhout Lieven
Van den Steen Sam
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2019
Field of study

Analytical performance modeling is a useful complement to detailed cycle-level simulation to quickly explore the design space in an early design stage. Mechanistic analytical modeling is particularly interesting as it provides deep insight and does not require expensive offline profiling as empirical modeling. Previous work in mechanistic analytical modeling, unfortunately, is limited to single-threaded applications running on single-core processors. This work proposes RPPM, a mechanistic analytical performance model for multi-threaded applications on multicore hardware. RPPM collects microarchitecture-independent characteristics of a multi-threaded workload to predict performance on a previously unseen multicore architecture. The profile needs to be collected only once to predict a range of processor architectures. We evaluate RPPM's accuracy against simulation and report a performance prediction error of 11.2% on average (23% max). We demonstrate RPPM's usefulness for conducting design space exploration experiments as well as for analyzing parallel application performance

Crossref

Ghent University Academic Bibliography

Easier Debugging of Multithreaded Software

Author: Kathare Sampada
Publication venue: SJSU ScholarWorks
Publication date: 01/10/2012
Field of study

Software activation is a technique designed to avoid illegal use of a licensed software. This is achieved by having a legitimate user enter a software activation key to validate the purchase of the software. Generally, a software is a single-threaded program. From an attacker’s perspective, who does not wish to pay for this software, it is not hard to reverse engineer such a single threaded program and trace its path of execution. With tools such as OllyDbg, the attacker can look into the disassembled code of this software and find out where the verification logic is being performed and then patch it to skip the verification altogether. In order to make the attacker’s task difficult, a multi-threaded approach towards software development was proposed [1]. According to this approach, you should break the verification logic into several pieces, each of which should run in a separate thread. Any debugger, such as OllyDbg, is capable of single-stepping through only one thread at a time, although it is aware of the existence of other threads. This makes it difficult for an attacker to trace the verification logic. Not just for an attacker, it is also difficult for any ethical developer to debug a multithreaded program. The motivation behind this project is to develop the prototype of a debugger that will make it easer to trace the execution path of a multi-threaded program. The intended debugger has to be able to single-step through all of the threads in lockstep

SJSU ScholarWorks

Communicating Java Threads

Author: Bakkers A.
Bakkers André
Broenink Johannes F.
Hilderink G.H.
Vervoort Wiek
Publication venue: IOS Press
Publication date: 01/01/1997
Field of study

The incorporation of multithreading in Java may be considered a significant part of the Java language, because it provides udimentary facilities for concurrent programming. However, we belief that the use of channels is a fundamental concept for concurrent programming. The channel approach as described in this paper is a realization of a systematic design method for concurrent programming in Java based on the CSP paradigm. CSP requires the availability of a Channel class and the addition of composition constructs for sequential, parallel and alternative processes. The Channel class and the constructs have been implemented in Java in compliance with the definitions in CSP. As a result, implementing communication between processes is facilitated, enabling the programmer to avoid deadlock more easily, and freeing the programmer from synchronization and scheduling constructs. The use of the Channel class and the additional constructs is illustrated in a simple application

CiteSeerX

University of Twente Research Information

TRACTABLE DATA-FLOW ANALYSIS FOR DISTRIBUTED SYSTEMS

Author: CHEUNG SC
KRAMER J
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/1994
Field of study

Automated behavior analysis is a valuable technique in the development and maintainence of distributed systems. In this paper, we present a tractable dataflow analysis technique for the detection of unreachable states and actions in distributed systems. The technique follows an approximate approach described by Reif and Smolka, but delivers a more accurate result in assessing unreachable states and actions. The higher accuracy is achieved by the use of two concepts: action dependency and history sets. Although the technique, does not exhaustively detect all possible errors, it detects nontrivial errors with a worst-case complexity quadratic to the system size. It can be automated and applied to systems with arbitrary loops and nondeterministic structures. The technique thus provides practical and tractable behavior analysis for preliminary designs of distributed systems. This makes it an ideal candidate for an interactive checker in software development tools. The technique is illustrated with case studies of a pump control system and an erroneous distributed program. Results from a prototype implementation are presented

Spiral - Imperial College Digital Repository

Hong Kong University of Science and Technology Institutional Repository

Safe and Verifiable Design of Concurrent Java Programs

Author: Bakkers A.W.P.
Hilderink G.H.
Stiles G.S.
Welch P.H.
Publication venue: Acta Press
Publication date: 01/01/2001
Field of study

The design of concurrent programs has a reputation for being difficult, and thus potentially dangerous in safetycritical real-time and embedded systems. The recent appearance of Java, whilst cleaning up many insecure aspects of OO programming endemic in C++, suffers from a deceptively simple threads model that is an insecure variant of ideas that are over 25 years old [1]. Consequently, we cannot directly exploit a range of new CASE tools -- based upon modern developments in parallel computing theory -- that can verify and check the design of concurrent systems for a variety of dangers\ud such as deadlock and livelock that otherwise plague us during testing and maintenance and, more seriously, cause catastrophic failure in service. \ud Our approach uses recently developed Java class\ud libraries based on Hoare's Communicating Sequential Processes (CSP); the use of CSP greatly simplifies the design of concurrent systems and, in many cases, a parallel approach often significantly simplifies systems originally approached sequentially. New CSP CASE tools permit designs to be verified against formal specifications\ud and checked for deadlock and livelock. Below we introduce CSP and its implementation in Java and develop a small concurrent application. The formal CSP description of the application is provided, as well as that of an equivalent sequential version. FDR is used to verify the correctness of both implementations, their\ud equivalence, and their freedom from deadlock and livelock

Crossref

University of Twente Research Information