1,455 research outputs found
Many-Task Computing and Blue Waters
This report discusses many-task computing (MTC) generically and in the
context of the proposed Blue Waters systems, which is planned to be the largest
NSF-funded supercomputer when it begins production use in 2012. The aim of this
report is to inform the BW project about MTC, including understanding aspects
of MTC applications that can be used to characterize the domain and
understanding the implications of these aspects to middleware and policies.
Many MTC applications do not neatly fit the stereotypes of high-performance
computing (HPC) or high-throughput computing (HTC) applications. Like HTC
applications, by definition MTC applications are structured as graphs of
discrete tasks, with explicit input and output dependencies forming the graph
edges. However, MTC applications have significant features that distinguish
them from typical HTC applications. In particular, different engineering
constraints for hardware and software must be met in order to support these
applications. HTC applications have traditionally run on platforms such as
grids and clusters, through either workflow systems or parallel programming
systems. MTC applications, in contrast, will often demand a short time to
solution, may be communication intensive or data intensive, and may comprise
very short tasks. Therefore, hardware and software for MTC must be engineered
to support the additional communication and I/O and must minimize task dispatch
overheads. The hardware of large-scale HPC systems, with its high degree of
parallelism and support for intensive communication, is well suited for MTC
applications. However, HPC systems often lack a dynamic resource-provisioning
feature, are not ideal for task communication via the file system, and have an
I/O system that is not optimized for MTC-style applications. Hence, additional
software support is likely to be required to gain full benefit from the HPC
hardware
Harnessing the Power of Many: Extensible Toolkit for Scalable Ensemble Applications
Many scientific problems require multiple distinct computational tasks to be
executed in order to achieve a desired solution. We introduce the Ensemble
Toolkit (EnTK) to address the challenges of scale, diversity and reliability
they pose. We describe the design and implementation of EnTK, characterize its
performance and integrate it with two distinct exemplar use cases: seismic
inversion and adaptive analog ensembles. We perform nine experiments,
characterizing EnTK overheads, strong and weak scalability, and the performance
of two use case implementations, at scale and on production infrastructures. We
show how EnTK meets the following general requirements: (i) implementing
dedicated abstractions to support the description and execution of ensemble
applications; (ii) support for execution on heterogeneous computing
infrastructures; (iii) efficient scalability up to O(10^4) tasks; and (iv)
fault tolerance. We discuss novel computational capabilities that EnTK enables
and the scientific advantages arising thereof. We propose EnTK as an important
addition to the suite of tools in support of production scientific computing
Why High-Performance Modelling and Simulation for Big Data Applications Matters
Modelling and Simulation (M&S) offer adequate abstractions to manage the complexity of analysing big data in scientific and engineering domains. Unfortunately, big data problems are often not easily amenable to efficient and effective use of High Performance Computing (HPC) facilities and technologies. Furthermore, M&S communities typically lack the detailed expertise required to exploit the full potential of HPC solutions while HPC specialists may not be fully aware of specific modelling and simulation requirements and applications. The COST Action IC1406 High-Performance Modelling and Simulation for Big Data Applications has created a strategic framework to foster interaction between M&S experts from various application domains on the one hand and HPC experts on the other hand to develop effective solutions for big data applications. One of the tangible outcomes of the COST Action is a collection of case studies from various computing domains. Each case study brought together both HPC and M&S experts, giving witness of the effective cross-pollination facilitated by the COST Action. In this introductory article we argue why joining forces between M&S and HPC communities is both timely in the big data era and crucial for success in many application domains. Moreover, we provide an overview on the state of the art in the various research areas concerned
GOPHER, an HPC framework for large scale graph exploration and inference
Biological ontologies, such as the Human Phenotype Ontology (HPO) and the Gene Ontology (GO), are extensively used in biomedical research to investigate the complex relationship that exists between the phenome and the genome. The interpretation of the encoded information requires methods that efficiently interoperate between multiple ontologies providing molecular details of disease-related features. To this aim, we present GenOtype PHenotype ExplOrer (GOPHER), a framework to infer associations between HPO and GO terms harnessing machine learning and large-scale parallelism and scalability in High-Performance Computing. The method enables to map genotypic features to phenotypic features thus providing a valid tool for bridging functional and pathological annotations. GOPHER can improve the interpretation of molecular processes involved in pathological conditions, displaying a vast range of applications in biomedicine.This work has been developed with the support of the Severo Ochoa Program (SEV-2015-0493); the Spanish Ministry of Science and Innovation (TIN2015- 65316-P); and the Joint Study Agreement no. W156463 under the IBM/BSC Deep Learning Center agreement.Peer ReviewedPostprint (author's final draft
The role of concurrency in an evolutionary view of programming abstractions
In this paper we examine how concurrency has been embodied in mainstream
programming languages. In particular, we rely on the evolutionary talking
borrowed from biology to discuss major historical landmarks and crucial
concepts that shaped the development of programming languages. We examine the
general development process, occasionally deepening into some language, trying
to uncover evolutionary lineages related to specific programming traits. We
mainly focus on concurrency, discussing the different abstraction levels
involved in present-day concurrent programming and emphasizing the fact that
they correspond to different levels of explanation. We then comment on the role
of theoretical research on the quest for suitable programming abstractions,
recalling the importance of changing the working framework and the way of
looking every so often. This paper is not meant to be a survey of modern
mainstream programming languages: it would be very incomplete in that sense. It
aims instead at pointing out a number of remarks and connect them under an
evolutionary perspective, in order to grasp a unifying, but not simplistic,
view of the programming languages development process
Soft Computing Techiniques for the Protein Folding Problem on High Performance Computing Architectures
The protein-folding problem has been extensively studied during the last
fifty years. The understanding of the dynamics of global shape of a protein and the influence
on its biological function can help us to discover new and more effective
drugs to deal with diseases of pharmacological relevance. Different computational approaches
have been developed by different researchers in order to foresee the threedimensional
arrangement of atoms of proteins from their sequences. However, the
computational complexity of this problem makes mandatory the search for new models,
novel algorithmic strategies and hardware platforms that provide solutions in a
reasonable time frame. We present in this revision work the past and last tendencies
regarding protein folding simulations from both perspectives; hardware and software.
Of particular interest to us are both the use of inexact solutions to this computationally hard problem as
well as which hardware platforms have been used for running this kind of Soft Computing techniques.This work is jointly supported by the FundaciónSéneca (Agencia Regional de Ciencia y Tecnología, Región de Murcia) under grants 15290/PI/2010 and 18946/JLI/13, by the Spanish MEC and European Commission FEDER under grant with reference TEC2012-37945-C02-02 and TIN2012-31345, by the Nils Coordinated Mobility under grant 012-ABEL-CM-2014A, in part financed by the European Regional Development Fund (ERDF). We also thank NVIDIA for hardware donation within UCAM GPU educational and research centers.Ingeniería, Industria y Construcció
New technologies to bridge the gap between High Performance Computing (HPC) and Big Data
The unification of HPC and Big Data has received increasing attention in the
last years. It is a common belief that exascale computing and Big Data are closely associated since HPC requires
processing large-scale data from scientific instruments and simulations. But, at the same time, it was observed that
tools and cultures of HPC and Big Data communities differ significantly. One of the most important issues in the
path to the convergence is caused by the differences in their software stacks. This thesis will address the research
challenge of bridging the gap between Big Data and HPC worlds. With this goal in mind, a set of tools and
technologies will be developed and integrated into a new unified Big Data-HPC framework that will allow the
execution of scientific multi-language applications on both environments using containers
- …