Bio-inspired call-stack reconstruction for performance analysis
The correlation of performance bottlenecks with their associated source code has become a cornerstone of performance analysis. It allows analysts to understand why the efficiency of an application falls behind the computer's peak performance and, ultimately, enables optimizations of the code. To this end, performance analysis tools collect the processor call-stack and then combine this information with measurements to allow the analyst to comprehend the application behavior. Some tools modify the call-stack during run-time to reduce the collection expense, but at the cost of resulting in non-portable solutions. In this paper, we present a novel portable approach to associate performance issues with their source code counterpart. To do so, we capture a reduced segment of the call-stack (up to three levels) and then process the segments using an algorithm inspired by multi-sequence alignment techniques. The results of our approach are easily mapped to detailed performance views, enabling the analyst to unveil the application behavior and its corresponding region of code. To demonstrate the usefulness of our approach, we have applied the algorithm to several in-production applications that we had not seen before, describing them in detail and optimizing them with small modifications based on the analyses. We thankfully acknowledge Mathis Bode for giving us access to the Arts CF binaries, and Miguel Castrillo and Kim Serradell for their valuable insight regarding Nemo. We would like to thank Forschungszentrum Jülich for the computation time on their Blue Gene/Q system. This research has been partially funded by the CICYT under contracts No. TIN2012-34557 and TIN2015-65316-P. Peer Reviewed. Postprint (author's final draft)
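The abstract above does not spell out the alignment algorithm, so the following is only a minimal sketch of the general idea behind sequence alignment applied to call-stack segments. It uses plain pairwise global alignment (Needleman-Wunsch), which is a swapped-in textbook technique and not the authors' method. The function name `align_segments`, the scoring values, and the frame names in the usage example are all hypothetical.

```python
from typing import List, Optional, Tuple

GAP = None  # marker for a frame that is missing from one of the segments


def align_segments(
    a: List[str],
    b: List[str],
    match: int = 1,
    mismatch: int = -1,
    gap: int = -1,
) -> Tuple[List[Optional[str]], List[Optional[str]]]:
    """Globally align two call-stack segments (hypothetical helper).

    Assumption: each segment is a list of frame names ordered from the
    outermost to the innermost captured level.
    """
    n, m = len(a), len(b)

    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # pair the two frames
                score[i - 1][j] + gap,      # frame of a faces a gap
                score[i][j - 1] + gap,      # frame of b faces a gap
            )

    # Walk back from the bottom-right corner to recover one best alignment.
    out_a: List[Optional[str]] = []
    out_b: List[Optional[str]] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append(GAP)
            i -= 1
        else:
            out_a.append(GAP)
            out_b.append(b[j - 1])
            j -= 1

    out_a.reverse()
    out_b.reverse()
    return out_a, out_b


# Hypothetical frame names: the second sample lost its outermost level.
full = ["main", "solve", "mpi_send"]
partial = ["solve", "mpi_send"]
aligned_full, aligned_partial = align_segments(full, partial)
```

Here the shorter segment is padded with a gap where the outermost frame was not captured, so both samples can be attributed to the same code region. A multi-sequence variant would align many such segments rather than two.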
Dynamic Virtualized Deployment of Particle Physics Environments on a High Performance Computing Cluster
The NEMO High Performance Computing Cluster at the University of Freiburg has
been made available to researchers of the ATLAS and CMS experiments. Users
access the cluster from external machines connected to the World-wide LHC
Computing Grid (WLCG). This paper describes how the full software environment
of the WLCG is provided in a virtual machine image. The interplay between the
schedulers for NEMO and for the external clusters is coordinated through the
ROCED service. A cloud computing infrastructure is deployed at NEMO to
orchestrate the simultaneous usage by bare metal and virtualized jobs. Through
the setup, resources are provided to users in a transparent, automatized, and
on-demand way. The performance of the virtualized environment has been
evaluated for particle physics applications.
Nemo: a computational tool for analyzing nematode locomotion
The nematode Caenorhabditis elegans responds to an impressive range of
chemical, mechanical and thermal stimuli and is extensively used to investigate
the molecular mechanisms that mediate chemosensation, mechanotransduction and
thermosensation. The main behavioral output of these responses is manifested as
alterations in animal locomotion. Monitoring and examination of such
alterations requires tools to capture and quantify features of nematode
movement. In this paper, we introduce Nemo (nematode movement), a
computationally efficient and robust two-dimensional object tracking algorithm
for automated detection and analysis of C. elegans locomotion. This algorithm
enables precise measurement and feature extraction of nematode movement
components. In addition, we develop a Graphical User Interface designed to
facilitate processing and interpretation of movement data. While, in this
study, we focus on the simple sinusoidal locomotion of C. elegans, our approach
can be readily adapted to handle complicated locomotory behaviour patterns by
including additional movement characteristics and parameters subject to
quantification. Our software tool offers the capacity to extract, analyze and
measure nematode locomotion features by processing simple video files. By
allowing precise and quantitative assessment of behavioral traits, this tool
will assist the genetic dissection and elucidation of the molecular mechanisms
underlying specific behavioral responses. Comment: 12 pages, 2 figures. Accepted by BMC Neuroscience 2007, 8:8
Proceedings of the 5th bwHPC Symposium
In modern science, the demand for more powerful and integrated research
infrastructures is growing constantly to address computational challenges
in data analysis, modeling and simulation. The bwHPC initiative, founded
by the Ministry of Science, Research and the Arts and the universities in
Baden-Württemberg, is a state-wide federated approach aimed at assisting
scientists with mastering these challenges. At the 5th bwHPC Symposium
in September 2018, scientific users, technical operators and government
representatives came together for two days at the University of Freiburg. The
symposium provided an opportunity to present scientific results that were
obtained with the help of bwHPC resources. Additionally, the symposium served
as a platform for discussing and exchanging ideas concerning the use of these
large scientific infrastructures as well as their further development.
ECG-TCN: Wearable Cardiac Arrhythmia Detection with a Temporal Convolutional Network
Personalized ubiquitous healthcare solutions require energy-efficient
wearable platforms that provide an accurate classification of bio-signals while
consuming low average power for long-term battery-operated use. Single lead
electrocardiogram (ECG) signals provide the ability to detect, classify, and
even predict cardiac arrhythmia. In this paper, we propose a novel temporal
convolutional network (TCN) that achieves high accuracy while still being
feasible for wearable platform use. Experimental results on the ECG5000 dataset
show that the TCN achieves an accuracy (94.2%) similar to that of the
state-of-the-art (SoA) network while improving the balanced accuracy score by
16.5%. This accurate classification is done with 27 times fewer parameters and
37 times fewer multiply-accumulate operations. We test our implementation on two
publicly available platforms, the STM32L475, which is based on ARM Cortex M4F,
and the GreenWaves Technologies GAP8 on the GAPuino board, based on 1+8 RISC-V
CV32E40P cores. Measurements show that the GAP8 implementation respects the
real-time constraints while consuming 0.10 mJ per inference. With 9.91
GMAC/s/W, it is 23.0 times more energy-efficient and 46.85 times faster than an
implementation on the ARM Cortex M4F (0.43 GMAC/s/W). Overall, we obtain 8.1%
higher accuracy while consuming 19.6 times less energy and being 35.1 times
faster compared to a previous SoA embedded implementation. Comment: 4 pages, 1 figure, 2 tables
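As a quick sanity check on the energy-efficiency figure quoted in the ECG-TCN abstract, the ratio of the two reported GMAC/s/W values can be recomputed directly. The variable names are illustrative only; the two numbers are taken from the abstract.

```python
# Energy efficiency reported in the abstract, in GMAC/s/W.
gap8_gmacs_per_watt = 9.91  # GAP8 implementation
m4f_gmacs_per_watt = 0.43   # ARM Cortex M4F implementation

# How many times more energy-efficient the GAP8 implementation is.
efficiency_ratio = gap8_gmacs_per_watt / m4f_gmacs_per_watt

print(f"{efficiency_ratio:.1f}x")  # prints "23.0x"
```

The result agrees with the "23.0 times more energy-efficient" claim in the abstract.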
A Survey on Handover Management in Mobility Architectures
This work presents a comprehensive and structured taxonomy of available
techniques for managing the handover process in mobility architectures.
Representative works from the existing literature have been divided into
appropriate categories, based on their ability to support horizontal handovers,
vertical handovers and multihoming. We describe approaches designed to work on
the current Internet (i.e. IPv4-based networks), as well as those that have
been devised for the "future" Internet (e.g. IPv6-based networks and
extensions). Quantitative measures and qualitative indicators are also
presented and used to evaluate and compare the examined approaches. This
critical review provides some valuable guidelines and suggestions for designing
and developing mobility architectures, including some practical expedients
(e.g. those required in the current Internet environment), aimed at coping with
the presence of NAT/firewalls and at providing support to legacy systems and
several communication protocols working at the application layer.