401 research outputs found
Survey and Analysis of Production Distributed Computing Infrastructures
This report has two objectives. First, we describe a set of the production
distributed infrastructures currently available, so that the reader has a basic
understanding of them. This includes explaining why each infrastructure was
created and made available and how it has succeeded and failed. The set is not
complete, but we believe it is representative.
Second, we describe the infrastructures in terms of their use, which is a
combination of how they were designed to be used and how users have found ways
to use them. Applications are often designed and created with specific
infrastructures in mind, with both an appreciation of the existing capabilities
provided by those infrastructures and an anticipation of their future
capabilities. Here, the infrastructures we discuss were often designed and
created with specific applications in mind, or at least specific types of
applications. The reader should understand how the interplay between the
infrastructure providers and the users leads to such usages, which we call
usage modalities. These usage modalities are really abstractions that exist
between the infrastructures and the applications; they influence the
infrastructures by representing the applications, and they influence the ap-
plications by representing the infrastructures
EST-PAC HPC - a web portal for high-throughput EST annotation and protein sequence prediction
Expressed Sequence Tags (ESTs) are short DNA sequences generated by sequencing the transcribed cDNAs coming from a gene expression. They can provide significant functional, structural and evolutionary information and thus are a primary resource for gene discovery. EST annotation basically refers to the analysis of unknown ESTs that can be performed by database similarity search for possible identities and database search for functional prediction of translation products. Such kind of annotation typically consists of a series of repetitive tasks which should be automated, and be customizable and amenable to using distributed computing resources. Furthermore, processing of EST data should be done efficiently using a high performance computing platform. In this paper, we describe an EST annotator, EST-PACHPC, which has been developed for harnessing HPC resources potentially from Grid and Cloud systems for high throughput EST annotations. The performance analysis of EST-PACHPC has shown that it provides substantial performance gain in EST annotation.<br /
HPC Cloud for Scientific and Business Applications: Taxonomy, Vision, and Research Challenges
High Performance Computing (HPC) clouds are becoming an alternative to
on-premise clusters for executing scientific applications and business
analytics services. Most research efforts in HPC cloud aim to understand the
cost-benefit of moving resource-intensive applications from on-premise
environments to public cloud platforms. Industry trends show hybrid
environments are the natural path to get the best of the on-premise and cloud
resources---steady (and sensitive) workloads can run on on-premise resources
and peak demand can leverage remote resources in a pay-as-you-go manner.
Nevertheless, there are plenty of questions to be answered in HPC cloud, which
range from how to extract the best performance of an unknown underlying
platform to what services are essential to make its usage easier. Moreover, the
discussion on the right pricing and contractual models to fit small and large
users is relevant for the sustainability of HPC clouds. This paper brings a
survey and taxonomy of efforts in HPC cloud and a vision on what we believe is
ahead of us, including a set of research challenges that, once tackled, can
help advance businesses and scientific discoveries. This becomes particularly
relevant due to the fast increasing wave of new HPC applications coming from
big data and artificial intelligence.Comment: 29 pages, 5 figures, Published in ACM Computing Surveys (CSUR
Design and evaluation of a genomics variant analysis pipeline using GATK Spark tools
Scalable and efficient processing of genome sequence data, i.e. for variant
discovery, is key to the mainstream adoption of High Throughput technology for
disease prevention and for clinical use. Achieving scalability, however,
requires a significant effort to enable the parallel execution of the analysis
tools that make up the pipelines. This is facilitated by the new Spark versions
of the well-known GATK toolkit, which offer a black-box approach by
transparently exploiting the underlying Map Reduce architecture. In this paper
we report on our experience implementing a standard variant discovery pipeline
using GATK 4.0 with Docker-based deployment over a cluster. We provide a
preliminary performance analysis, comparing the processing times and cost to
those of the new Microsoft Genomics Services
On Evaluating Commercial Cloud Services: A Systematic Review
Background: Cloud Computing is increasingly booming in industry with many
competing providers and services. Accordingly, evaluation of commercial Cloud
services is necessary. However, the existing evaluation studies are relatively
chaotic. There exists tremendous confusion and gap between practices and theory
about Cloud services evaluation. Aim: To facilitate relieving the
aforementioned chaos, this work aims to synthesize the existing evaluation
implementations to outline the state-of-the-practice and also identify research
opportunities in Cloud services evaluation. Method: Based on a conceptual
evaluation model comprising six steps, the Systematic Literature Review (SLR)
method was employed to collect relevant evidence to investigate the Cloud
services evaluation step by step. Results: This SLR identified 82 relevant
evaluation studies. The overall data collected from these studies essentially
represent the current practical landscape of implementing Cloud services
evaluation, and in turn can be reused to facilitate future evaluation work.
Conclusions: Evaluation of commercial Cloud services has become a world-wide
research topic. Some of the findings of this SLR identify several research gaps
in the area of Cloud services evaluation (e.g., the Elasticity and Security
evaluation of commercial Cloud services could be a long-term challenge), while
some other findings suggest the trend of applying commercial Cloud services
(e.g., compared with PaaS, IaaS seems more suitable for customers and is
particularly important in industry). This SLR study itself also confirms some
previous experiences and reveals new Evidence-Based Software Engineering (EBSE)
lessons
- ā¦