
    Bio-inspired call-stack reconstruction for performance analysis

    The correlation of performance bottlenecks with their associated source code has become a cornerstone of performance analysis. It allows understanding why the efficiency of an application falls behind the computer's peak performance and ultimately enables optimizations of the code. To this end, performance analysis tools collect the processor call-stack and then combine this information with measurements to allow the analyst to comprehend the application behavior. Some tools modify the call-stack during run-time to reduce the collection expense, but at the cost of non-portable solutions. In this paper, we present a novel portable approach to associate performance issues with their source code counterpart. We capture a reduced segment of the call-stack (up to three levels) and then process the segments using an algorithm inspired by multi-sequence alignment techniques. The results of our approach are easily mapped to detailed performance views, enabling the analyst to unveil the application behavior and its corresponding region of code. To demonstrate the usefulness of our approach, we have applied the algorithm to several in-production applications that we had not seen before, describing them in detail and optimizing them with small modifications based on the analyses.
    We thankfully acknowledge Mathis Bode for giving us access to the Arts CF binaries, and Miguel Castrillo and Kim Serradell for their valuable insight regarding Nemo. We would like to thank Forschungszentrum Jülich for the computation time on their Blue Gene/Q system. This research has been partially funded by the CICYT under contracts No. TIN2012-34557 and TIN2015-65316-P.
    Peer Reviewed. Postprint (author's final draft)
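    As a rough illustration of the alignment idea mentioned in the abstract above, the following Python sketch aligns two short sequences of sampled call-stack frames with a standard pairwise global alignment (Needleman-Wunsch). It is not the paper's algorithm, which is only described as "inspired by multi-sequence alignment techniques"; the function name `align_frames`, the scoring values, and the sample frame names are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

Pair = Tuple[Optional[str], Optional[str]]


def align_frames(a: List[str], b: List[str],
                 match: int = 1, mismatch: int = -1,
                 gap: int = -1) -> Tuple[int, List[Pair]]:
    """Globally align two frame sequences (hypothetical helper).

    Returns the alignment score and the aligned pairs; None marks a gap.
    Scoring values are illustrative assumptions, not taken from the paper.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                pairs.append((a[i - 1], b[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            pairs.append((a[i - 1], None))  # frame only in the first sample
            i -= 1
        else:
            pairs.append((None, b[j - 1]))  # frame only in the second sample
            j -= 1
    pairs.reverse()
    return score[n][m], pairs


# Hypothetical frame names, outermost caller first.
sample_1 = ["main", "solve", "mpi_wait"]
sample_2 = ["main", "mpi_wait"]
best, aligned = align_frames(sample_1, sample_2)
```

    Here the two samples share `main` and `mpi_wait`, and the alignment places a gap opposite `solve`, which hints at how partial call-stack segments can be lined up against each other.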

    Dynamic Virtualized Deployment of Particle Physics Environments on a High Performance Computing Cluster

    The NEMO High Performance Computing Cluster at the University of Freiburg has been made available to researchers of the ATLAS and CMS experiments. Users access the cluster from external machines connected to the World-wide LHC Computing Grid (WLCG). This paper describes how the full software environment of the WLCG is provided in a virtual machine image. The interplay between the schedulers for NEMO and for the external clusters is coordinated through the ROCED service. A cloud computing infrastructure is deployed at NEMO to orchestrate the simultaneous usage by bare metal and virtualized jobs. Through the setup, resources are provided to users in a transparent, automated, and on-demand way. The performance of the virtualized environment has been evaluated for particle physics applications.

    Nemo: a computational tool for analyzing nematode locomotion

    The nematode Caenorhabditis elegans responds to an impressive range of chemical, mechanical and thermal stimuli and is extensively used to investigate the molecular mechanisms that mediate chemosensation, mechanotransduction and thermosensation. The main behavioral output of these responses is manifested as alterations in animal locomotion. Monitoring and examination of such alterations requires tools to capture and quantify features of nematode movement. In this paper, we introduce Nemo (nematode movement), a computationally efficient and robust two-dimensional object tracking algorithm for automated detection and analysis of C. elegans locomotion. This algorithm enables precise measurement and feature extraction of nematode movement components. In addition, we develop a Graphical User Interface designed to facilitate processing and interpretation of movement data. While, in this study, we focus on the simple sinusoidal locomotion of C. elegans, our approach can be readily adapted to handle complicated locomotory behaviour patterns by including additional movement characteristics and parameters subject to quantification. Our software tool offers the capacity to extract, analyze and measure nematode locomotion features by processing simple video files. By allowing precise and quantitative assessment of behavioral traits, this tool will assist the genetic dissection and elucidation of the molecular mechanisms underlying specific behavioral responses.
    Comment: 12 pages, 2 figures. Accepted by BMC Neuroscience 2007, 8:8

    Proceedings of the 5th bwHPC Symposium

    In modern science, the demand for more powerful and integrated research infrastructures is growing constantly to address computational challenges in data analysis, modeling and simulation. The bwHPC initiative, founded by the Ministry of Science, Research and the Arts and the universities in Baden-Württemberg, is a state-wide federated approach aimed at assisting scientists with mastering these challenges. At the 5th bwHPC Symposium in September 2018, scientific users, technical operators and government representatives came together for two days at the University of Freiburg. The symposium provided an opportunity to present scientific results that were obtained with the help of bwHPC resources. Additionally, the symposium served as a platform for discussing and exchanging ideas concerning the use of these large scientific infrastructures as well as their further development.

    ECG-TCN: Wearable Cardiac Arrhythmia Detection with a Temporal Convolutional Network

    Personalized ubiquitous healthcare solutions require energy-efficient wearable platforms that provide an accurate classification of bio-signals while consuming low average power for long-term battery-operated use. Single lead electrocardiogram (ECG) signals provide the ability to detect, classify, and even predict cardiac arrhythmia. In this paper, we propose a novel temporal convolutional network (TCN) that achieves high accuracy while still being feasible for wearable platform use. Experimental results on the ECG5000 dataset show that the TCN has an accuracy score (94.2%) similar to that of the state-of-the-art (SoA) network while achieving an improvement of 16.5% in the balanced accuracy score. This accurate classification is done with 27 times fewer parameters and 37 times fewer multiply-accumulate operations. We test our implementation on two publicly available platforms, the STM32L475, which is based on ARM Cortex M4F, and the GreenWaves Technologies GAP8 on the GAPuino board, based on 1+8 RISC-V CV32E40P cores. Measurements show that the GAP8 implementation respects the real-time constraints while consuming 0.10 mJ per inference. With 9.91 GMAC/s/W, it is 23.0 times more energy-efficient and 46.85 times faster than an implementation on the ARM Cortex M4F (0.43 GMAC/s/W). Overall, we obtain 8.1% higher accuracy while consuming 19.6 times less energy and being 35.1 times faster compared to a previous SoA embedded implementation.
    Comment: 4 pages, 1 figure, 2 tables
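    The energy-efficiency factor quoted in the abstract above can be checked with a short calculation. The Python sketch below uses only the two GMAC/s/W figures given in the abstract; the variable names are assumptions introduced here for illustration.

```python
# Energy-efficiency figures as quoted in the abstract (GMAC/s/W).
gap8_efficiency = 9.91       # GAP8 implementation
cortex_m4f_efficiency = 0.43  # ARM Cortex M4F implementation

# Ratio of the two figures: how many times more energy-efficient GAP8 is.
efficiency_ratio = gap8_efficiency / cortex_m4f_efficiency
print(f"GAP8 is about {efficiency_ratio:.1f}x more energy-efficient")
```

    Rounded to one decimal place, the ratio agrees with the factor of 23.0 stated in the abstract.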

    A Survey on Handover Management in Mobility Architectures

    This work presents a comprehensive and structured taxonomy of available techniques for managing the handover process in mobility architectures. Representative works from the existing literature have been divided into appropriate categories, based on their ability to support horizontal handovers, vertical handovers and multihoming. We describe approaches designed to work on the current Internet (i.e. IPv4-based networks), as well as those that have been devised for the "future" Internet (e.g. IPv6-based networks and extensions). Quantitative measures and qualitative indicators are also presented and used to evaluate and compare the examined approaches. This critical review provides some valuable guidelines and suggestions for designing and developing mobility architectures, including some practical expedients (e.g. those required in the current Internet environment) aimed at coping with the presence of NAT/firewalls and at supporting legacy systems and several communication protocols working at the application layer.