LIKWID Monitoring Stack: A flexible framework enabling job specific performance monitoring for the masses
System monitoring is an established tool to measure the utilization and
health of HPC systems. Usually system monitoring infrastructures make no
connection to job information and do not utilize hardware performance
monitoring (HPM) data. Automatic and continuous performance monitoring of jobs
is essential to using HPC systems more efficiently. It can
help to identify pathological cases, provide instant performance feedback to
users, offer initial data for judging the optimization potential of
applications, and help to build a statistical foundation on application-specific
system usage. The LIKWID monitoring stack is a modular framework built
on top of the LIKWID tools library. It aims to enable job-specific
performance monitoring using HPM data, system metrics and application-level
data for small to medium sized commodity clusters. Moreover, it is designed to
integrate into existing monitoring infrastructures to speed up the change from
pure system monitoring to job-aware monitoring.
Comment: 4 pages, 4 figures. Accepted for HPCMASPA 2017, the Workshop on
Monitoring and Analysis for High Performance Computing Systems Plus
Applications, held in conjunction with IEEE Cluster 2017, Honolulu, HI,
September 5, 201
RELEASE: A High-level Paradigm for Reliable Large-scale Server Software
Erlang is a functional language with a much-emulated model for building reliable distributed systems. This paper outlines the RELEASE project and describes the progress in the first six months. The project's aim is to scale Erlang's radical concurrency-oriented programming paradigm to build reliable general-purpose software, such as server-based systems, on massively parallel machines. Currently Erlang has inherently scalable computation and reliability models, but in practice scalability is constrained by aspects of the language and virtual machine. We are working at three levels to address these challenges: evolving the Erlang virtual machine so that it can work effectively on large-scale multicore systems; evolving the language to Scalable Distributed (SD) Erlang; and developing a scalable Erlang infrastructure to integrate multiple, heterogeneous clusters. We are also developing state-of-the-art tools that allow programmers to understand the behaviour of massively parallel SD Erlang programs. We will demonstrate the effectiveness of the RELEASE approach using demonstrators and two large case studies on a Blue Gene
Spatial Organization and Molecular Correlation of Tumor-Infiltrating Lymphocytes Using Deep Learning on Pathology Images
Beyond sample curation and basic pathologic characterization, the digitized H&E-stained images
of TCGA samples remain underutilized. To highlight this resource, we present mappings of tumor-infiltrating lymphocytes (TILs) based on H&E images from 13 TCGA tumor types. These TIL
maps are derived through computational staining using a convolutional neural network trained to
classify patches of images. Affinity propagation revealed local spatial structure in TIL patterns and
correlation with overall survival. TIL map structural patterns were grouped using standard
histopathological parameters. These patterns are enriched in particular T cell subpopulations
derived from molecular measures. TIL densities and spatial structure were differentially enriched
among tumor types, immune subtypes, and tumor molecular subtypes, implying that spatial
infiltrate state could reflect particular tumor cell aberration states. Obtaining spatial lymphocytic
patterns linked to the rich genomic characterization of TCGA samples demonstrates one use for
the TCGA image archives with insights into the tumor-immune microenvironment
Optimizing egalitarian performance in the side-effects model of colocation for data center resource management
In data centers, up to dozens of tasks are colocated on a single physical
machine. Machines are used more efficiently, but tasks' performance
deteriorates, as colocated tasks compete for shared resources. As tasks are
heterogeneous, the resulting performance dependencies are complex. In our
previous work [18] we proposed a new combinatorial optimization model that uses
two parameters of a task - its size and its type - to characterize how a task
influences the performance of other tasks allocated to the same machine.
In this paper, we study the egalitarian optimization goal: maximizing the
worst-off performance. This problem generalizes the classic makespan
minimization on multiple processors (P||Cmax). We prove that
polynomially-solvable variants of multiprocessor scheduling are NP-hard and
hard to approximate when the number of types is not constant. For a constant
number of types, we propose a PTAS, a fast approximation algorithm, and a
series of heuristics. We simulate the algorithms on instances derived from a
trace of one of Google's clusters. Algorithms aware of jobs' types lead to better
performance compared with algorithms solving P||Cmax.
The notion of type enables us to model the performance degradation caused by
colocation using standard combinatorial optimization methods. Types add a layer of
additional complexity. However, our results - approximation algorithms and good
average-case performance - show that types can be handled efficiently.
Comment: Author's version of a paper published in Euro-Par 2017 Proceedings,
extends the published paper with additional results and proof
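For readers unfamiliar with the P||Cmax baseline mentioned in this abstract, the following minimal sketch shows the classic Longest Processing Time (LPT) greedy rule for makespan minimization on identical machines. It is not the paper's type-aware method: it ignores task types entirely, and the names `lpt_schedule`, `sizes`, and `machines` are hypothetical, chosen only for illustration.

```python
import heapq


def lpt_schedule(sizes, machines):
    """Greedy LPT heuristic for P||Cmax (illustrative only, type-unaware).

    sizes    -- list of task sizes (hypothetical input format)
    machines -- number of identical machines
    Returns (makespan, assignment), where assignment[m] lists task indices.
    """
    # Min-heap of (current load, machine index): the least-loaded machine is on top.
    heap = [(0, m) for m in range(machines)]
    heapq.heapify(heap)
    assignment = [[] for _ in range(machines)]

    # Consider tasks from largest to smallest, placing each on the least-loaded machine.
    for task, size in sorted(enumerate(sizes), key=lambda pair: -pair[1]):
        load, m = heapq.heappop(heap)
        assignment[m].append(task)
        heapq.heappush(heap, (load + size, m))

    # The makespan is the load of the most heavily loaded machine.
    makespan = max(load for load, _ in heap)
    return makespan, assignment


makespan, assignment = lpt_schedule([7, 5, 4, 3, 3, 2], machines=2)
print(makespan, assignment)
```

With the six example tasks on two machines, the sketch balances the total size of 24 into two loads of 12. A type-aware variant in the spirit of the abstract would additionally have to account for how colocated tasks of different types affect each other, which this baseline does not attempt.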