Search CORE

412 research outputs found

Online Modeling and Tuning of Parallel Stream Processing Systems

Author: Beard Jonathan Curtis
Publication venue: Washington University Open Scholarship
Publication date: 15/08/2015
Field of study

Writing performant computer programs is hard. Code for high performance applications is profiled, tweaked, and re-factored for months specifically for the hardware for which it is to run. Consumer application code doesn\u27t get the benefit of endless massaging that benefits high performance code, even though heterogeneous processor environments are beginning to resemble those in more performance oriented arenas. This thesis offers a path to performant, parallel code (through stream processing) which is tuned online and automatically adapts to the environment it is given. This approach has the potential to reduce the tuning costs associated with high performance code and brings the benefit of performance tuning to consumer applications where otherwise it would be cost prohibitive. This thesis introduces a stream processing library and multiple techniques to enable its online modeling and tuning. Stream processing (also termed data-flow programming) is a compute paradigm that views an application as a set of logical kernels connected via communications links or streams. Stream processing is increasingly used by computational-x and x-informatics fields (e.g., biology, astrophysics) where the focus is on safe and fast parallelization of specific big-data applications. A major advantage of stream processing is that it enables parallelization without necessitating manual end-user management of non-deterministic behavior often characteristic of more traditional parallel processing methods. Many big-data and high performance applications involve high throughput processing, necessitating usage of many parallel compute kernels on several compute cores. Optimizing the orchestration of kernels has been the focus of much theoretical and empirical modeling work. Purely theoretical parallel programming models can fail when the assumptions implicit within the model are mis-matched with reality (i.e., the model is incorrectly applied). Often it is unclear if the assumptions are actually being met, even when verified under controlled conditions. Full empirical optimization solves this problem by extensively searching the range of likely configurations under native operating conditions. This, however, is expensive in both time and energy. For large, massively parallel systems, even deciding which modeling paradigm to use is often prohibitively expensive and unfortunately transient (with workload and hardware). In an ideal world, a parallel run-time will re-optimize an application continuously to match its environment, with little additional overhead. This work presents methods aimed at doing just that through low overhead instrumentation, modeling, and optimization. Online optimization provides a good trade-off between static optimization and online heuristics. To enable online optimization, modeling decisions must be fast and relatively accurate. Online modeling and optimization of a stream processing system first requires the existence of a stream processing framework that is amenable to the intended type of dynamic manipulation. To fill this void, we developed the RaftLib C++ template library, which enables usage of the stream processing paradigm for C++ applications (it is the run-time which is the basis of almost all the work within this dissertation). An application topology is specified by the user, however almost everything else is optimizable by the run-time. RaftLib takes advantage of the knowledge gained during the design of several prior streaming languages (notably Auto-Pipe). The resultant framework enables online migration of tasks, auto-parallelization, online buffer-reallocation, and other useful dynamic behaviors that were not available in many previous stream processing systems. Several benchmark applications have been designed to assess the performance gains through our approaches and compare performance to other leading stream processing frameworks. Information is essential to any modeling task, to that end a low-overhead instrumentation framework has been developed which is both dynamic and adaptive. Discovering a fast and relatively optimal configuration for a stream processing application often necessitates solving for buffer sizes within a finite capacity queueing network. We show that a generalized gain/loss network flow model can bootstrap the process under certain conditions. Any modeling effort, requires that a model be selected; often a highly manual task, involving many expensive operations. This dissertation demonstrates that machine learning methods (such as a support vector machine) can successfully select models at run-time for a streaming application. The full set of approaches are incorporated into the open source RaftLib framework

Washington University St. Louis: Open Scholarship

Process algebra approach to parallel DBMS performance modelling

Author: Pua Chai Seng
Publication venue: Computing and Electrical Engineering
Publication date: 01/01/1999
Field of study

Abstract unavailable please refer to PD

ROS: The Research Output Service. Heriot-Watt University Edinburgh

OpenGrey Repository

Analytical models of a fault-tolerant multiple module microprocessor system

Author: Haeri-Gharavi A. A.
Haeri-Gharavi A. A.
Publication venue: Department of Electrical Engineering, Imperial College London
Publication date: 01/01/1984
Field of study

Imperial Users onl

Spiral - Imperial College Digital Repository

The embedded operating system project

Author: Campbell R. H.
Publication venue
Publication date
Field of study

This progress report describes research towards the design and construction of embedded operating systems for real-time advanced aerospace applications. The applications concerned require reliable operating system support that must accommodate networks of computers. The report addresses the problems of constructing such operating systems, the communications media, reconfiguration, consistency and recovery in a distributed system, and the issues of realtime processing. A discussion is included on suitable theoretical foundations for the use of atomic actions to support fault tolerance and data consistency in real-time object-based systems. In particular, this report addresses: atomic actions, fault tolerance, operating system structure, program development, reliability and availability, and networking issues. This document reports the status of various experiments designed and conducted to investigate embedded operating system design issues

NASA Technical Reports Server

Performance Problem Diagnostics by Systematic Experimentation

Author: Wert Alexander
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2015
Field of study

Diagnostics of performance problems requires deep expertise in performance engineering and entails a high manual effort. As a consequence, performance evaluations are postponed to the last minute of the development process. In this thesis, we introduce an automatic, experiment-based approach for performance problem diagnostics in enterprise software systems. With this approach, performance engineers can concentrate on their core competences instead of conducting repeating tasks

KITopen

NASA RECON: Course Development, Administration, and Evaluation

Author: Dominick W. D.
Roquemore L.
Publication venue
Publication date
Field of study

The R and D activities addressing the development, administration, and evaluation of a set of transportable, college-level courses to educate science and engineering students in the effective use of automated scientific and technical information storage and retrieval systems, and, in particular, in the use of the NASA RECON system, are discussed. The long-range scope and objectives of these contracted activities are overviewed and the progress which has been made toward these objectives during FY 1983-1984 is highlighted. In addition, the results of a survey of 237 colleges and universities addressing course needs are presented

NASA Technical Reports Server

Space station data system analysis/architecture study. Task 2: Options development, DR-5. Volume 2: Design options

Author
Publication venue
Publication date
Field of study

The primary objective of Task 2 is the development of an information base that will support the conduct of trade studies and provide sufficient data to make key design/programmatic decisions. This includes: (1) the establishment of option categories that are most likely to influence Space Station Data System (SSDS) definition; (2) the identification of preferred options in each category; and (3) the characterization of these options with respect to performance attributes, constraints, cost and risk. This volume contains the options development for the design category. This category comprises alternative structures, configurations and techniques that can be used to develop designs that are responsive to the SSDS requirements. The specific areas discussed are software, including data base management and distributed operating systems; system architecture, including fault tolerance and system growth/automation/autonomy and system interfaces; time management; and system security/privacy. Also discussed are space communications and local area networking

NASA Technical Reports Server

Software Performance Engineering using Virtual Time Program Execution

Author: Baltas Nikolaos
Publication venue: Computing, Imperial College London
Publication date: 01/07/2013
Field of study

In this thesis we introduce a novel approach to software performance engineering that is based on the execution of code in virtual time. Virtual time execution models the timing-behaviour of unmodified applications by scaling observed method times or replacing them with results acquired from performance model simulation. This facilitates the investigation of "what-if" performance predictions of applications comprising an arbitrary combination of real code and performance models. The ability to analyse code and models in a single framework enables performance testing throughout the software lifecycle, without the need to to extract performance models from code. This is accomplished by forcing thread scheduling decisions to take into account the hypothetical time-scaling or model-based performance specifications of each method. The virtual time execution of I/O operations or multicore targets is also investigated. We explore these ideas using a Virtual EXecution (VEX) framework, which provides performance predictions for multi-threaded applications. The language-independent VEX core is driven by an instrumentation layer that notifies it of thread state changes and method profiling events; it is then up to VEX to control the progress of application threads in virtual time on top of the operating system scheduler. We also describe a Java Instrumentation Environment (JINE), demonstrating the challenges involved in virtual time execution at the JVM level. We evaluate the VEX/JINE tools by executing client-side Java benchmarks in virtual time and identifying the causes of deviations from observed real times. Our results show that VEX and JINE transparently provide predictions for the response time of unmodified applications with typically good accuracy (within 5-10%) and low simulation overheads (25-50% additional time). We conclude this thesis with a case study that shows how models and code can be integrated, thus illustrating our vision on how virtual time execution can support performance testing throughout the software lifecycle

Spiral - Imperial College Digital Repository

2HOT: An Improved Parallel Hashed Oct-Tree N-Body Algorithm for Cosmological Simulation

Author: Warren Michael S.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 16/10/2013
Field of study

We report on improvements made over the past two decades to our adaptive treecode N-body method (HOT). A mathematical and computational approach to the cosmological N-body problem is described, with performance and scalability measured up to 256k (

2^{18}

) processors. We present error analysis and scientific application results from a series of more than ten 69 billion (

4096^3

) particle cosmological simulations, accounting for

4 \times 10^{20}

floating point operations. These results include the first simulations using the new constraints on the standard model of cosmology from the Planck satellite. Our simulations set a new standard for accuracy and scientific throughput, while meeting or exceeding the computational efficiency of the latest generation of hybrid TreePM N-body methods.Comment: 12 pages, 8 figures, 77 references; To appear in Proceedings of SC '1

arXiv.org e-Print Archive

CiteSeerX

Directory of Open Access Journals

Parameter dependencies for reusable performance specifications of software components

Author: Koziolek Heiko
Publication venue: KIT Scientific Publishing, Karlsruhe
Publication date: 01/01/2008
Field of study

To avoid design-related performance problems, model-driven performance prediction methods analyse the response times, throughputs, and resource utilizations of software architectures before and during implementation. This thesis proposes new modeling languages and according model transformations, which allow a reusable description of usage profile dependencies to the performance of software components. Predictions based on this new methods can support performance-related design decisions

KITopen

Directory of Open Access Books (DOAB)