296 research outputs found

    Kernel and user level execution trace analysis for multi-core distributed Linux systems

    Get PDF
    The presentation will briefly introduce the organization of the DORSAL laboratory and the collaborative research and development projects undertaken with the financial support of industrial partners, including Ericsson, and the Canadian and Quebec governments. Then, several recent results in the area of kernel and user level tracing and trace analysis will be outlined, with emphasis on real-time, distributed, multi-core and heterogeneous Linux systems. Different widely used open-source tools developed collaboratively in the DORSAL laboratory, such as LTTng, Trace Compass and the Common Trace Format, will be described as well as how they were exploited to quickly find the critical path and uncover hard to find problems in several real industrial systems

    Fast Recompilation of Object Oriented Modules

    Full text link
    Once a program file is modified, the recompilation time should be minimized, without sacrificing execution speed or high level object oriented features. The recompilation time is often a problem for the large graphical interactive distributed applications tackled by modern OO languages. A compilation server and fast code generator were developed and integrated with the SRC Modula-3 compiler and Linux ELF dynamic linker. The resulting compilation and recompilation speedups are impressive. The impact of different language features, processor speed, and application size are discussed

    Disks, Partitions, Volumes and RAID Performance with the Linux Operating System

    Full text link
    Block devices in computer operating systems typically correspond to disks or disk partitions, and are used to store files in a filesystem. Disks are not the only real or virtual device which adhere to the block accessible stream of bytes block device model. Files, remote devices, or even RAM may be used as a virtual disks. This article examines several common combinations of block device layers used as virtual disks in the Linux operating system: disk partitions, loopback files, software RAID, Logical Volume Manager, and Network Block Devices. It measures their relative performance using different filesystems: Ext2, Ext3, ReiserFS, JFS, XFS,NFS

    virtFlow: guest independent execution flow analysis across virtualized environments

    Get PDF
    An agent-less technique to understand virtual machines (VMs) behavior and their changes during the VM life-cycle is essential for many performance analysis and debugging tasks in the cloud environment. Because of privacy and security issues, ease of deployment and execution overhead, the method preferably limits its data collection to the physical host level, without internal access to the VMs. We propose a host-based, precise method to recover execution flow of virtualized environments, regardless of the level of virtualization. Given a VM, the Any-Level VM Detection Algorithm (ADA) and Nested VM State Detection (NSD) Algorithm compute its execution path along with the state of virtual CPUs (vCPUs) from the host kernel trace. The state of vCPUs is displayed in an interactive trace viewer (TraceCompass) for further inspection. Then, a new approach for profiling threads and processes inside the VMs is proposed. Our proposed VM trace analysis algorithms have been open-sourced for further enhancements and to the benefit of other developers. Our new techniques are being evaluated with workloads generated by different benchmarking tools. These approaches are based on host hypervisor tracing, which brings a lower overhead (around 1%) as compared to other approaches

    Linux Low-Latency Tracing for Multicore Hard Real-Time Systems

    Get PDF
    Real-time systems have always been difficult to monitor and debug because of the timing constraints which rule out any tool significantly impacting the system latency and performance. Tracing is often the most reliable tool available for studying real-time systems. The real-time behavior of Linux systems has improved recently and it is possible to have latencies in the low microsecond range. Therefore, tracers must ensure that their overhead is within that range and predictable and scales well to multiple cores. The LTTng 2.0 tools have been optimized for multicore performance, scalability, and flexibility. We used and extended the real-time verification tool rteval to study the impact of LTTng on the maximum latency on hard real-time applications. We introduced a new real-time analysis tool to establish the baseline of real-time system performance and then to measure the impact added by tracing the kernel and userspace (UST) with LTTng. We then identified latency problems and accordingly modified LTTng-UST and the procedure to isolate the shielded real-time cores from the RCU interprocess synchronization routines. This work resulted in extended tools to measure the real-time properties of multicore Linux systems, a characterization of the impact of LTTng kernel and UST tracing tools, and improvements to LTTng

    Tracing and profiling machine learning dataflow applications on GPU

    Get PDF
    In this paper, we propose a profiling and tracing method for dataflow applications with GPU acceleration. Dataflow models can be represented by graphs and are widely used in many domains like signal processing or machine learning. Within the graph, the data flows along the edges, and the nodes correspond to the computing units that process the data. To accelerate the execution, some co-processing units, like GPUs, are often used for computing intensive nodes. The work in this paper aims at providing useful information about the execution of the dataflow graph on the available hardware, in order to understand and possibly improve the performance. The collected traces include low-level information about the CPU, from the Linux Kernel (system calls), as well as mid-level and high-level information respectively about intermediate libraries like CUDA, HIP or HSA, and the dataflow model. This is followed by post-mortem analysis and visualization steps in order to enhance the trace and show useful information to the user. To demonstrate the effectiveness of the method, it was evaluated for TensorFlow, a well-known machine learning library that uses a dataflow computational graph to represent the algorithms. We present a few examples of machine learning applications that can be optimized with the help of the information provided by our proposed method. For example, we reduce the execution time of a face recognition application by a factor of 5X. We suggest a better placement of the computation nodes on the available hardware components for a distributed application. Finally, we also enhance the memory management of an application to speed up the execution

    Hypertracing: Tracing through virtualization layers

    Get PDF
    Cloud computing enables on-demand access to remote computing resources. It provides dynamic scalability and elasticity with a low upfront cost. As the adoption of this computing model is rapidly growing, this increases the system complexity, since virtual machines (VMs) running on multiple virtualization layers become very difficult to monitor without interfering with their performance. In this paper, we present hypertracing, a novel method for tracing VMs by using various paravirtualization techniques, enabling efficient monitoring across virtualization boundaries. Hypertracing is a monitoring infrastructure that facilitates seamless trace sharing among host and guests. Our toolchain can detect latencies and their root causes within VMs, even for boot-up and shutdown sequences, whereas existing tools fail to handle these cases. We propose a new hypervisor optimization, for handling efficient nested paravirtualization, which allows hypertracing to be enabled in any nested environment without triggering VM exit multiplication. This is a significant improvement over current monitoring tools, with their large I/O overhead associated with activating monitoring within each virtualization layer

    ros2_tracing: Multipurpose Low-Overhead Framework for Real-Time Tracing of ROS 2

    Full text link
    Testing and debugging have become major obstacles for robot software development, because of high system complexity and dynamic environments. Standard, middleware-based data recording does not provide sufficient information on internal computation and performance bottlenecks. Other existing methods also target very specific problems and thus cannot be used for multipurpose analysis. Moreover, they are not suitable for real-time applications. In this paper, we present ros2_tracing, a collection of flexible tracing tools and multipurpose instrumentation for ROS 2. It allows collecting runtime execution information on real-time distributed systems, using the low-overhead LTTng tracer. Tools also integrate tracing into the invaluable ROS 2 orchestration system and other usability tools. A message latency experiment shows that the end-to-end message latency overhead, when enabling all ROS 2 instrumentation, is on average 0.0033 ms, which we believe is suitable for production real-time systems. ROS 2 execution information obtained using ros2_tracing can be combined with trace data from the operating system, enabling a wider range of precise analyses, that help understand an application execution, to find the cause of performance bottlenecks and other issues. The source code is available at: https://github.com/ros2/ros2_tracing.Comment: 8 pages, 8 figures, 3 table

    A Stateful Approach to Generate Synthetic Events from Kernel Traces

    Get PDF
    We propose a generic synthetic event generator from kernel trace events. The proposed method makes use of patterns of system states and environment-independent semantic events rather than platform-specific raw events. This method can be applied to different kernel and user level trace formats. We use a state model to store intermediate states and events. This stateful method supports partial trace abstraction and enables users to seek and navigate through the trace events and to abstract out the desired part. Since it uses the current and previous values of the system states and has more knowledge of the underlying system execution, it can generate a wide range of synthetic events. One of the obvious applications of this method is the identification of system faults and problems that will appear later in this paper. We will discuss the architecture of the method, its implementation, and the performance results

    Vnode: Low-overhead Transparent Tracing of Node.js-based Microservice Architectures

    Full text link
    Tracing serves as a key method for evaluating the performance of microservices-based architectures, which are renowned for their scalability, resource efficiency, and high availability. Despite their advantages, these architectures often pose unique debugging challenges that necessitate trade-offs, including the burden of instrumentation overhead. With Node.js emerging as a leading development environment, recognized for its rapidly growing ecosystem, there is a pressing need for innovative approaches that reduce the telemetry data collection efforts, and the overhead incurred by the environment instrumentation. In response, we introduce a new approach designed for transparent tracing and seamless deployment of microservices in cloud settings. This approach is centered around our newly developed Internal Transparent Tracing and Context Reconstruction (ITTCR) algorithm. ITTCR is adept at correlating internal metrics from various distributed trace files, to reconstruct the intricate execution contexts of microservices operating in a Node.js environment. Our method achieves transparency by directly instrumenting the Node.js virtual machine, enabling the collection and analysis of trace events in a transparent manner. This process facilitates the creation of visualization tools, enhancing the understanding and analysis of microservice performance in cloud environments
    • …
    corecore