129,522 research outputs found
Memory usage verification using Hip/Sleek.
Embedded systems often come with constrained memory footprints. It is therefore essential to ensure that software running on such platforms fulfils memory usage specifications at compile-time, to prevent memory-related software failure after deployment. Previous proposals on memory usage verification are not satisfactory as they usually can only handle restricted subsets of programs, especially when shared mutable data structures are involved. In this paper, we propose a simple but novel solution. We instrument programs with explicit memory operations so that memory usage verification can be done along with the verification of other properties, using an automated verification system Hip/Sleek developed recently by Chin et al.[10,19]. The instrumentation can be done automatically and is proven sound with respect to an underlying semantics. One immediate benefit is that we do not need to develop from scratch a specific system for memory usage verification. Another benefit is that we can verify more programs, especially those involving shared mutable data structures, which previous systems failed to handle, as evidenced by our experimental results
tf-Darshan: Understanding Fine-grained I/O Performance in Machine Learning Workloads
Machine Learning applications on HPC systems have been gaining popularity in
recent years. The upcoming large scale systems will offer tremendous
parallelism for training through GPUs. However, another heavy aspect of Machine
Learning is I/O, and this can potentially be a performance bottleneck.
TensorFlow, one of the most popular Deep-Learning platforms, now offers a new
profiler interface and allows instrumentation of TensorFlow operations.
However, the current profiler only enables analysis at the TensorFlow platform
level and does not provide system-level information. In this paper, we extend
TensorFlow Profiler and introduce tf-Darshan, both a profiler and tracer, that
performs instrumentation through Darshan. We use the same Darshan shared
instrumentation library and implement a runtime attachment without using a
system preload. We can extract Darshan profiling data structures during
TensorFlow execution to enable analysis through the TensorFlow profiler. We
visualize the performance results through TensorBoard, the web-based TensorFlow
visualization tool. At the same time, we do not alter Darshan's existing
implementation. We illustrate tf-Darshan by performing two case studies on
ImageNet image and Malware classification. We show that by guiding optimization
using data from tf-Darshan, we increase POSIX I/O bandwidth by up to 19% by
selecting data for staging on fast tier storage. We also show that Darshan has
the potential of being used as a runtime library for profiling and providing
information for future optimization.Comment: Accepted for publication at the 2020 International Conference on
Cluster Computing (CLUSTER 2020
An Integrated Framework to Achieve Interoperability in Person-Centric Health Management
The need for high-quality out-of-hospital healthcare is a known socioeconomic problem. Exploiting ICT's evolution, ad-hoc telemedicine solutions have been proposed in the past. Integrating such ad-hoc solutions in order to cost-effectively support the entire healthcare cycle is still a research challenge. In order to handle the heterogeneity of relevant information and to overcome the fragmentation of out-of-hospital instrumentation in person-centric healthcare systems, a shared and open source interoperability component can be adopted, which is ontology driven and based on the semantic web data model. The feasibility and the advantages of the proposed approach are demonstrated by presenting the use case of real-time monitoring of patients' health and their environmental context
Fast and Precise Symbolic Analysis of Concurrency Bugs in Device Drivers
© 2015 IEEE.Concurrency errors, such as data races, make device drivers notoriously hard to develop and debug without automated tool support. We present Whoop, a new automated approach that statically analyzes drivers for data races. Whoop is empowered by symbolic pairwise lockset analysis, a novel analysis that can soundly detect all potential races in a driver. Our analysis avoids reasoning about thread interleavings and thus scales well. Exploiting the race-freedom guarantees provided by Whoop, we achieve a sound partial-order reduction that significantly accelerates Corral, an industrial-strength bug-finder for concurrent programs. Using the combination of Whoop and Corral, we analyzed 16 drivers from the Linux 4.0 kernel, achieving 1.5 - 20× speedups over standalone Corral
Instrumenting self-modifying code
Adding small code snippets at key points to existing code fragments is called
instrumentation. It is an established technique to debug certain otherwise hard
to solve faults, such as memory management issues and data races. Dynamic
instrumentation can already be used to analyse code which is loaded or even
generated at run time.With the advent of environments such as the Java Virtual
Machine with optimizing Just-In-Time compilers, a new obstacle arises:
self-modifying code. In order to instrument this kind of code correctly, one
must be able to detect modifications and adapt the instrumentation code
accordingly, preferably without incurring a high penalty speedwise. In this
paper we propose an innovative technique that uses the hardware page protection
mechanism of modern processors to detect such modifications. We also show how
an instrumentor can adapt the instrumented version depending on the kind of
modificiations as well as an experimental evaluation of said techniques.Comment: In M. Ronsse, K. De Bosschere (eds), proceedings of the Fifth
International Workshop on Automated Debugging (AADEBUG 2003), September 2003,
Ghent. cs.SE/030902
Non-intrusive on-the-fly data race detection using execution replay
This paper presents a practical solution for detecting data races in parallel
programs. The solution consists of a combination of execution replay (RecPlay)
with automatic on-the-fly data race detection. This combination enables us to
perform the data race detection on an unaltered execution (almost no probe
effect). Furthermore, the usage of multilevel bitmaps and snooped matrix clocks
limits the amount of memory used. As the record phase of RecPlay is highly
efficient, there is no need to switch it off, hereby eliminating the
possibility of Heisenbugs because tracing can be left on all the time.Comment: In M. Ducasse (ed), proceedings of the Fourth International Workshop
on Automated Debugging (AAdebug 2000), August 2000, Munich. cs.SE/001003
- …