Large Scale In Silico Screening on Grid Infrastructures
Large-scale grid infrastructures for in silico drug discovery open opportunities of particular interest for neglected and emerging diseases. In 2005 and 2006, we deployed large-scale in silico docking within the framework of the WISDOM initiative against malaria and avian flu, requiring about 105 years of CPU time on the EGEE, Auvergrid and TWGrid infrastructures. These achievements demonstrated the relevance of large-scale grid infrastructures for virtual screening by molecular docking. They also allowed us to evaluate the performance of the grid infrastructures and to identify specific issues raised by large-scale deployment.
Comment: 14 pages, 2 figures, 2 tables, The Third International Life Science Grid Workshop, LSGrid 2006, Yokohama, Japan, 13-14 October 2006, to appear in the proceedings
Virtual Screening on Large Scale Grids
PCSV, article in press in Parallel Computing.
Large-scale grids for in silico drug discovery open opportunities of particular interest for neglected and emerging diseases. In 2005 and 2006, we deployed large-scale virtual docking within the framework of the WISDOM initiative against malaria and avian influenza, requiring about 100 years of CPU time on the EGEE, Auvergrid and TWGrid infrastructures. These achievements demonstrated the relevance of large-scale grids for virtual screening by molecular docking. They also allowed us to evaluate the performance of the grid infrastructures and to identify specific issues raised by large-scale deployment.
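At their core, deployments like these are embarrassingly parallel sweeps: a large compound library is split into chunks, and each chunk becomes an independent docking job submitted to the grid. The sketch below shows only that partitioning step; the chunk size and file layout are illustrative assumptions, not part of the WISDOM middleware.

```python
# Illustrative sketch: split a compound library into independent docking
# jobs, the pattern behind large-scale virtual screening on a grid.
# Chunk size and file layout are hypothetical placeholders.

from pathlib import Path

def chunk(items, size):
    """Yield successive fixed-size chunks of a list."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def prepare_docking_jobs(library_file: str, target_file: str, chunk_size: int = 1000):
    """Write one job directory per chunk of ligands, ready for submission."""
    ligands = Path(library_file).read_text().splitlines()
    jobs = []
    for n, batch in enumerate(chunk(ligands, chunk_size)):
        job_dir = Path(f"job_{n:05d}")
        job_dir.mkdir(exist_ok=True)
        (job_dir / "ligands.smi").write_text("\n".join(batch))
        # Each job docks its batch against the same protein target and
        # runs independently, so failed jobs can be retried in isolation.
        jobs.append({"dir": str(job_dir), "target": target_file})
    return jobs
```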
Many-Task Computing and Blue Waters
This report discusses many-task computing (MTC) generically and in the context of the proposed Blue Waters system, which is planned to be the largest NSF-funded supercomputer when it begins production use in 2012. The aim of this report is to inform the BW project about MTC, including understanding aspects of MTC applications that can be used to characterize the domain and understanding the implications of these aspects for middleware and policies.
Many MTC applications do not neatly fit the stereotypes of high-performance
computing (HPC) or high-throughput computing (HTC) applications. Like HTC
applications, by definition MTC applications are structured as graphs of
discrete tasks, with explicit input and output dependencies forming the graph
edges. However, MTC applications have significant features that distinguish
them from typical HTC applications. In particular, different engineering
constraints for hardware and software must be met in order to support these
applications. HTC applications have traditionally run on platforms such as
grids and clusters, through either workflow systems or parallel programming
systems. MTC applications, in contrast, will often demand a short time to
solution, may be communication intensive or data intensive, and may comprise
very short tasks. Therefore, hardware and software for MTC must be engineered
to support the additional communication and I/O and must minimize task dispatch
overheads. The hardware of large-scale HPC systems, with its high degree of
parallelism and support for intensive communication, is well suited for MTC
applications. However, HPC systems often lack a dynamic resource-provisioning
feature, are not ideal for task communication via the file system, and have an
I/O system that is not optimized for MTC-style applications. Hence, additional
software support is likely to be required to gain full benefit from the HPC
hardware.
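To make the "graphs of discrete tasks" structure concrete, the toy sketch below treats tasks as nodes and explicit input/output dependencies as edges, dispatching a task as soon as all of its inputs exist. The data structures and scheduler are illustrative, not Blue Waters middleware; in a real MTC system, per-task dispatch overhead must stay far below the task runtime.

```python
# Toy many-task graph: nodes are short tasks, edges are explicit
# input/output dependencies. Here "dispatch" is a plain function call,
# purely for illustration of the dependency-driven execution order.

from collections import defaultdict, deque

def run_task_graph(tasks, deps):
    """tasks: name -> callable; deps: name -> list of prerequisite names."""
    indegree = {t: len(deps.get(t, [])) for t in tasks}
    dependents = defaultdict(list)
    for t, prereqs in deps.items():
        for p in prereqs:
            dependents[p].append(t)
    ready = deque(t for t, d in indegree.items() if d == 0)
    results = {}
    while ready:
        t = ready.popleft()
        results[t] = tasks[t](results)          # dispatch the task
        for child in dependents[t]:
            indegree[child] -= 1
            if indegree[child] == 0:            # all inputs now available
                ready.append(child)
    return results

# Example: two independent tasks feeding one reduction.
out = run_task_graph(
    tasks={"a": lambda r: 1, "b": lambda r: 2,
           "sum": lambda r: r["a"] + r["b"]},
    deps={"sum": ["a", "b"]},
)
print(out["sum"])  # 3
```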
Integration and mining of malaria molecular, functional and pharmacological data: how far are we from a chemogenomic knowledge space?
The organization and mining of malaria genomic and post-genomic data are strongly motivated by the need to predict and characterize new biological targets and new drugs. Biological targets are sought in a biological space built from the genomic data of Plasmodium falciparum, but also drawing on the millions of genomic sequences from other species. Drug candidates are sought in a chemical space containing the millions of small molecules stored in public and private chemolibraries. Data management should therefore be as reliable and versatile as possible. In this context, we examined five aspects of the organization and mining of malaria genomic and post-genomic data: 1) the comparison of protein sequences, including the compositionally atypical malaria sequences; 2) the high-throughput reconstruction of molecular phylogenies; 3) the representation of biological processes, particularly metabolic pathways; 4) versatile methods to integrate genomic data, biological representations and functional profiling obtained from X-omic experiments after drug treatments; and 5) the determination and prediction of protein structures and their molecular docking with drug candidate structures. Progress toward a grid-enabled chemogenomic knowledge space is discussed.
Comment: 43 pages, 4 figures, to appear in Malaria Journal
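One concrete difficulty behind point 1 is that Plasmodium falciparum proteins are compositionally biased (the AT-rich genome yields asparagine- and lysine-rich, low-complexity stretches), which skews naive similarity scores. The sketch below shows the kind of composition check such a pipeline might apply before sequence comparison; the background frequencies and threshold are illustrative assumptions, not values from the paper.

```python
# Illustrative composition check: flag protein sequences whose amino
# acid usage deviates strongly from a background distribution, a known
# issue for P. falciparum (Asn/Lys-rich, low-complexity regions).

from collections import Counter

BACKGROUND = {"N": 0.041, "K": 0.058}  # approximate typical Asn/Lys frequencies

def composition_bias(seq: str, residues=("N", "K")) -> float:
    """Return the summed excess frequency of the given residues."""
    counts = Counter(seq.upper())
    total = max(len(seq), 1)
    return sum(counts[r] / total - BACKGROUND[r] for r in residues)

def is_atypical(seq: str, threshold: float = 0.10) -> bool:
    """Heuristic: combined Asn+Lys excess above ~10 percentage points."""
    return composition_bias(seq) > threshold

print(is_atypical("MNNNNNNKKKKNNNNKKK"))  # True: strongly Asn/Lys-biased
```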
funcX: A Federated Function Serving Fabric for Science
Exploding data volumes and velocities, new computational methods and
platforms, and ubiquitous connectivity demand new approaches to computation in
the sciences. These new approaches must enable computation to be mobile, so
that, for example, it can occur near data, be triggered by events (e.g.,
arrival of new data), be offloaded to specialized accelerators, or run remotely
where resources are available. They also require new design approaches in which
monolithic applications can be decomposed into smaller components that may in turn be executed separately and on the most suitable resources. To address these needs we present funcX, a distributed function-as-a-service (FaaS) platform that enables flexible, scalable, and high-performance remote function
execution. funcX's endpoint software can transform existing clouds, clusters,
and supercomputers into function serving systems, while funcX's cloud-hosted
service provides transparent, secure, and reliable function execution across a
federated ecosystem of endpoints. We motivate the need for funcX with several
scientific case studies, present our prototype design and implementation, show
optimizations that deliver throughput in excess of 1 million functions per
second, and demonstrate, via experiments on two supercomputers, that funcX can
scale to more than 130,000 concurrent workers.
Comment: Accepted to ACM Symposium on High-Performance Parallel and Distributed Computing (HPDC 2020). arXiv admin note: substantial text overlap with arXiv:1908.0490
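A minimal sketch of the usage pattern the abstract describes follows: register a plain Python function with the cloud-hosted service, then invoke it on a remote endpoint and fetch the result. It is written against the funcX SDK as published around the HPDC 2020 paper; the endpoint UUID is a placeholder, and exact client method names should be verified against the SDK version in use.

```python
# Sketch of remote function execution with the funcX SDK (circa 2020).
# Treat this as the shape of the API, not a contract: method names may
# differ between releases, and the endpoint UUID below is a placeholder.

from funcx.sdk.client import FuncXClient

def double(x):
    # Runs remotely on whatever resource backs the chosen endpoint
    # (cloud, cluster, or supercomputer node).
    return 2 * x

fxc = FuncXClient()                      # authenticates with the service
func_id = fxc.register_function(double)  # ship the function definition once

ENDPOINT = "00000000-0000-0000-0000-000000000000"  # placeholder UUID
task_id = fxc.run(21, endpoint_id=ENDPOINT, function_id=func_id)

print(fxc.get_result(task_id))           # 42, once the task completes
```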
Survey and Analysis of Production Distributed Computing Infrastructures
This report has two objectives. First, we describe a set of the production
distributed infrastructures currently available, so that the reader has a basic
understanding of them. This includes explaining why each infrastructure was
created and made available and how it has succeeded and failed. The set is not
complete, but we believe it is representative.
Second, we describe the infrastructures in terms of their use, which is a
combination of how they were designed to be used and how users have found ways
to use them. Applications are often designed and created with specific
infrastructures in mind, with both an appreciation of the existing capabilities
provided by those infrastructures and an anticipation of their future
capabilities. Here, the infrastructures we discuss were often designed and
created with specific applications in mind, or at least specific types of
applications. The reader should understand how the interplay between the
infrastructure providers and the users leads to such usages, which we call
usage modalities. These usage modalities are abstractions that sit between the infrastructures and the applications: they influence the infrastructures by representing the applications, and they influence the applications by representing the infrastructures.
WISDOM: A Grid-Enabled Drug Discovery Initiative Against Malaria
The goal of this chapter is to present the WISDOM initiative, one of the main accomplishments in the use of grid infrastructures for biomedical sciences in Europe. Researchers in the life sciences are among the most active scientific communities on the EGEE infrastructure. As a consequence, the biomedical virtual organization stands fourth in terms of resources consumed in 2007, with an average of 7,000 jobs submitted to the grid every day and more than 4 million hours of CPU consumed in the last 12 months. Only three experiments on the CERN Large Hadron Collider have used more resources. Compared to particle physics, the use of resources is much less centralized, as about 40 different scientific applications are currently deployed on EGEE, each requiring an amount of CPU ranging from a few to a few hundred CPU years. Thanks to the 20,000 processors available to the users of the biomedical virtual organization, crunching factors in the hundreds are routinely observed. Such performance was previously achievable on supercomputers, but at the cost of reservation and long delays in access to resources. In contrast, grid infrastructures are continuously open to their user communities.
Such changes in the scale of the computing resources continuously available to researchers in the biomedical sciences open opportunities for exploring new fields or changing the approach to existing challenges. In this chapter, we show the potential impact of grids in the field of drug discovery through the example of the WISDOM initiative.
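As a rough sanity check on these figures: a crunching factor is the ratio of CPU time consumed to wall-clock time elapsed, i.e. the average number of processors working concurrently. The short computation below uses only the numbers quoted in this chapter; the per-application factors "in the hundreds" refer to individual runs, while this gives the virtual organization's year-long average.

```python
# Crunching factor = CPU time consumed / wall-clock time elapsed.
# Using the figures quoted above: ~4 million CPU hours over 12 months.

cpu_hours = 4_000_000          # CPU consumed by the biomedical VO in a year
wall_hours = 365 * 24          # one year of wall-clock time

print(round(cpu_hours / wall_hours))  # ~457 processors busy on average
```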