
    Advanced Memory Data Structures for Scalable Event Trace Analysis

    The thesis presents a contribution to the analysis and visualization of computational performance based on event traces, with a particular focus on parallel programs and High Performance Computing (HPC). Event traces contain detailed information about specified incidents (events) during the run-time of programs and allow minute investigation of dynamic program behavior, various performance metrics, and possible causes of performance flaws. Because programs are long-running and highly parallel, and because events are recorded at very fine resolution, event traces can accumulate huge amounts of data, which become a challenge for interactive as well as automatic analysis and visualization tools. The thesis proposes a method of exploiting redundancy in the event traces in order to reduce the memory requirements and the computational complexity of event trace analysis. The sources of redundancy are repeated segments of the original program, arising either from iterative or recursive algorithms or from SPMD-style parallel programs, which produce equal or similar repeated event sequences. The data reduction technique is based on the novel Complete Call Graph (CCG) data structure, which allows domain-specific data compression for event traces using a combination of lossless and lossy methods. All deviations due to lossy data compression can be controlled by constant bounds. The compression of the CCG data structure is incorporated in the construction process, so that at no point do substantial uncompressed parts have to be stored. Experiments with real-world example traces reveal the potential for very high data compression. The results range from factors of 3 to 15 for small-scale compression with minimum deviation of the data to factors > 100 for large-scale compression with moderate deviation. Based on the CCG data structure, new algorithms for the most common evaluation and analysis methods for event traces are presented, which require no explicit decompression. By avoiding repeated evaluation of formerly redundant event sequences, the computational effort of the new algorithms can be reduced to the same extent as memory consumption. The thesis includes a comprehensive discussion of the state of the art and related work, a detailed presentation of the design of the CCG data structure, an elaborate description of algorithms for construction, compression, and analysis of CCGs, and an extensive experimental validation of all components.
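The idea of storing repeated call sequences only once, with a constant bound on timing deviation, can be illustrated with a small sketch. This is not the thesis's CCG implementation: the `Node` and `NodePool` names, the hash-consing pool, and the grid-snapping rule for durations are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple


@dataclass(frozen=True)
class Node:
    """One function call: name, duration, and child calls (hypothetical layout)."""
    name: str
    duration: int
    children: Tuple["Node", ...] = ()


class NodePool:
    """Hash-consing pool: structurally equal subtrees are stored only once."""

    def __init__(self, tolerance: int = 0) -> None:
        self.tolerance = tolerance  # assumed bound on per-node timing deviation
        self._pool: Dict[Node, Node] = {}

    def make(self, name: str, duration: int, children: Sequence[Node] = ()) -> Node:
        # Lossy step (assumption): snap the duration to a grid so that
        # near-equal calls become equal. The error is at most `tolerance`.
        if self.tolerance > 0:
            step = 2 * self.tolerance + 1
            duration = (duration // step) * step + self.tolerance
        node = Node(name, duration, tuple(children))
        # Reuse an existing equal node if there is one, else register this one.
        return self._pool.setdefault(node, node)

    def __len__(self) -> int:
        return len(self._pool)


def build_trace(pool: NodePool, iterations: int, durations: Sequence[int]) -> Node:
    """Build a call tree for a loop whose body repeats with similar timings."""
    loops = []
    for i in range(iterations):
        d = durations[i % len(durations)]
        compute = pool.make("compute", d)
        exchange = pool.make("MPI_Sendrecv", 5)
        loops.append(pool.make("iteration", d + 5, (compute, exchange)))
    return pool.make("main", sum(n.duration for n in loops), loops)


lossless = NodePool(tolerance=0)
lossy = NodePool(tolerance=2)
build_trace(lossless, 6, [100, 101, 102])
build_trace(lossy, 6, [100, 101, 102])
print(len(lossless), len(lossy))
```

An uncompressed tree for this toy trace would hold 19 nodes. The lossless pool stores 8, and the pool with a tolerance of 2 time units stores 4, because the three slightly different iteration timings collapse into one shared subtree.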

    WMTrace : a lightweight memory allocation tracker and analysis framework

    The widening gap between processor and memory performance has been a well-discussed aspect of computer architecture literature for some years. The use of multi-core processor designs has, however, brought new problems to the design of memory architectures: increased core density without matched improvement in memory capacity is reducing the available memory per parallel process. Multiple cores accessing memory simultaneously degrade performance as a result of resource contention for memory channels and physical DIMMs. These issues combine to ensure that memory remains an ongoing challenge in the design of parallel algorithms which scale. In this paper we present WMTrace, a lightweight tool to trace and analyse memory allocation events in parallel applications. This tool is able to dynamically link to pre-existing application binaries, requiring no source code modification or recompilation. A post-execution analysis stage enables in-depth analysis of traces, allowing memory allocations to be analysed by time, size or function. The second half of this paper features a case study in which we apply WMTrace to five parallel scientific applications and benchmarks, demonstrating its effectiveness at recording high-water mark memory consumption as well as memory use per function over time. An in-depth analysis is provided for an unstructured mesh benchmark, which reveals significant memory allocation imbalance across its participating processes.
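A post-execution pass that derives the high-water mark and per-function peaks from an allocation trace might look like the sketch below. The event tuple layout and the `analyse` function are hypothetical; WMTrace's actual trace format and analysis API are not described in the abstract.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical event record: (kind, address, size_in_bytes, function_name)
Event = Tuple[str, int, int, str]


def analyse(events: Iterable[Event]) -> Tuple[int, Dict[str, int]]:
    """Return (overall high-water mark, peak live bytes per allocating function)."""
    live: Dict[int, Tuple[int, str]] = {}  # address -> (size, allocating function)
    current = 0
    high_water = 0
    func_current: Dict[str, int] = defaultdict(int)
    func_peak: Dict[str, int] = defaultdict(int)

    for kind, addr, size, func in events:
        if kind == "malloc":
            live[addr] = (size, func)
            current += size
            func_current[func] += size
            func_peak[func] = max(func_peak[func], func_current[func])
            high_water = max(high_water, current)
        elif kind == "free":
            # Frees of unknown addresses are ignored (size 0).
            freed, owner = live.pop(addr, (0, func))
            current -= freed
            func_current[owner] -= freed

    return high_water, dict(func_peak)


trace = [
    ("malloc", 0x10, 100, "init"),
    ("malloc", 0x20, 50, "solve"),
    ("free", 0x10, 0, "init"),
    ("malloc", 0x30, 30, "solve"),
]
peak, by_function = analyse(trace)
print(peak, by_function)
```

Running the same pass once per process and comparing the resulting peaks is one simple way to expose the kind of allocation imbalance the case study reports.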

    A Novel Framework for Online Amnesic Trajectory Compression in Resource-constrained Environments

    State-of-the-art trajectory compression methods usually involve high space-time complexity or yield unsatisfactory compression rates, leading to rapid exhaustion of memory, computation, storage and energy resources. Their usefulness is commonly limited in a resource-constrained environment, especially when the data volume (even when compressed) far exceeds the storage limit. Hence we propose a novel online framework for error-bounded trajectory compression and ageing called the Amnesic Bounded Quadrant System (ABQS), whose core is the Bounded Quadrant System (BQS) algorithm family, which includes a normal version (BQS), a fast version (FBQS), and a progressive version (PBQS). ABQS intelligently manages a given amount of storage and compresses the trajectories with different error tolerances according to their ages. In the experiments, we conduct comprehensive evaluations of the BQS algorithm family and the ABQS framework. Using empirical GPS traces from flying foxes and cars, and synthetic data from simulation, we demonstrate the effectiveness of the standalone BQS algorithms in significantly reducing the time and space complexity of trajectory compression, while greatly improving on the compression rates of state-of-the-art algorithms (by up to 45%). We also show that the operational time of the target resource-constrained hardware platform can be prolonged by up to 41%. We then verify that, given data volumes that are far greater than the storage space, ABQS is able to achieve 15 to 400 times smaller errors than the baselines. We also show that the algorithm is robust to extreme trajectory shapes. Comment: arXiv admin note: substantial text overlap with arXiv:1412.032
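To make "error-bounded online compression" concrete, here is a minimal sketch of a generic opening-window compressor. It is not the BQS algorithm, whose quadrant-based bounds are not given in the abstract. It only shows the guarantee being targeted: every dropped point lies within `eps` of the segment that replaces it. The function names are assumptions.

```python
import math
from typing import Iterable, List, Tuple

Point = Tuple[float, float]


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from p to the segment a-b."""
    (ax, ay), (bx, by), (px, py) = a, b, p
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the line and clamp the projection to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def compress(points: Iterable[Point], eps: float) -> List[Point]:
    """Opening-window compression: keep a point only when dropping it
    would push some buffered point more than eps away from the segment."""
    pts = list(points)
    if len(pts) <= 2:
        return pts
    kept = [pts[0]]
    anchor = pts[0]
    window: List[Point] = []  # points seen since the last kept point
    for p in pts[1:]:
        if any(point_segment_distance(q, anchor, p) > eps for q in window):
            # The previous point was the last one that satisfied the bound.
            anchor = window[-1]
            kept.append(anchor)
            window = []
        window.append(p)
    kept.append(pts[-1])
    return kept


straight = compress([(0, 0), (1, 0), (2, 0), (3, 0)], eps=0.1)
corner = compress([(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)], eps=0.1)
print(straight, corner)
```

This sketch rechecks the whole window for each incoming point, so its cost grows with the window length. Avoiding that per-point cost is the kind of overhead the abstract says the BQS family reduces.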

    SimpleTrack:Adaptive Trajectory Compression with Deterministic Projection Matrix for Mobile Sensor Networks

    Some mobile sensor network applications require the sensor nodes to transfer their trajectories to a data sink. This paper proposes an adaptive (lossy) trajectory compression algorithm based on compressive sensing. The algorithm has two innovative elements. First, we propose a method to compute a deterministic projection matrix from a learnt dictionary. Second, we propose a method for the mobile nodes to adaptively predict the number of projections needed based on their speed. Extensive evaluation of the proposed algorithm using 6 datasets shows that it can achieve sub-metre accuracy. In addition, our method of computing projection matrices outperforms two existing methods. Finally, a comparison of our algorithm against a state-of-the-art trajectory compression algorithm shows that our algorithm can reduce the error by 10-60 cm for the same compression ratio.

    Hydrogen SI and HCCI Combustion in a Direct-Injection Optical Engine

    Hydrogen has been widely proposed as a possible alternative fuel for internal combustion engines. Its wide flammability range allows higher engine efficiency with leaner operation than conventional fuels, giving both reduced toxic emissions and no CO2. Independently, Homogeneous Charge Compression Ignition (HCCI) also allows higher thermal efficiency and lower fuel consumption with reduced NOx emissions when compared to Spark-Ignition (SI) engine operation. For HCCI combustion, a mixture of air and fuel is supplied to the cylinder and autoignition occurs from compression; the engine is operated throttle-less and load is controlled by the quality of the mixture, avoiding the large fluid-dynamic losses in the intake manifold of SI engines. HCCI can be induced and controlled by varying the mixture temperature, either by Exhaust Gas Recirculation (EGR) or intake air preheating. A combination of HCCI combustion with hydrogen fuelling has great potential for virtually zero CO2 and NOx emissions. Nevertheless, combustion of such a fast-burning fuel with wide flammability limits and a high octane number poses many challenges, such as controlling backfiring and the speed of autoignition, and there is almost no literature on the subject, particularly in optical engines. Experiments were conducted in a single-cylinder research engine equipped with both Port Fuel Injection (PFI) and Direct Injection (DI) systems running at 1000 RPM. Optical access to in-cylinder phenomena was enabled through an extended piston and optical crown. Combustion images were acquired by a high-speed camera at 1° or 2° crank angle resolution for a series of engine cycles. Spark-ignition tests were initially carried out to benchmark the operation of the engine with hydrogen against gasoline. DI of hydrogen after intake valve closure was found to be preferable in order to overcome problems related to backfiring and air displacement from hydrogen's low density.
    HCCI combustion of hydrogen was initially enabled by means of a pilot port injection of n-heptane preceding the main direct injection of hydrogen, along with intake air preheating. HCCI with hydrogen as the sole fuel was finally achieved and made sustainable, even at the low compression ratio of the optical engine, by means of closed-valve DI in synergy with intake air preheating and negative valve overlap to promote internal EGR. Various operating conditions were analysed, such as fuelling in the range of air excess ratio 1.2-3.0 and intake air temperatures of 200-400°C. Finally, both single and double injections per cycle were compared to identify their effects on combustion development. Copyright © 2009 SAE International