HPC Cloud for Scientific and Business Applications: Taxonomy, Vision, and Research Challenges
High Performance Computing (HPC) clouds are becoming an alternative to
on-premise clusters for executing scientific applications and business
analytics services. Most research efforts in HPC cloud aim to understand the
cost-benefit of moving resource-intensive applications from on-premise
environments to public cloud platforms. Industry trends show hybrid
environments are the natural path to get the best of the on-premise and cloud
resources---steady (and sensitive) workloads can run on on-premise resources
and peak demand can leverage remote resources in a pay-as-you-go manner.
Nevertheless, there are plenty of open questions in HPC cloud, ranging
from how to extract the best performance from an unknown underlying
platform to which services are essential to make its usage easier. Moreover, the
discussion on the right pricing and contractual models to fit small and large
users is relevant for the sustainability of HPC clouds. This paper brings a
survey and taxonomy of efforts in HPC cloud and a vision on what we believe is
ahead of us, including a set of research challenges that, once tackled, can
help advance businesses and scientific discoveries. This becomes particularly
relevant due to the fast-increasing wave of new HPC applications coming from
big data and artificial intelligence. Comment: 29 pages, 5 figures, Published in ACM Computing Surveys (CSUR).
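The hybrid on-premise/cloud model described above can be illustrated with a minimal scheduling sketch. All names, capacities, and the queue-based trigger below are hypothetical, not taken from the survey:

```python
# Hypothetical sketch of a hybrid "cloud bursting" policy: steady (and
# sensitive) load stays on-premise; demand beyond local capacity is sent
# to pay-as-you-go cloud nodes.

ON_PREM_CAPACITY = 64  # assumed number of locally available nodes

def place_jobs(pending_jobs, sensitive):
    """Assign each job to 'on-prem' or 'cloud'.

    Sensitive workloads are pinned on-premise; the rest burst to the
    cloud once local capacity is exhausted.
    """
    placements = {}
    used = 0
    for job in pending_jobs:
        if sensitive.get(job, False) or used < ON_PREM_CAPACITY:
            placements[job] = "on-prem"
            used += 1
        else:
            placements[job] = "cloud"  # pay-as-you-go overflow
    return placements

jobs = [f"job{i}" for i in range(70)]
result = place_jobs(jobs, sensitive={"job0": True})
print(sum(1 for p in result.values() if p == "cloud"))  # → 6 jobs burst to cloud
```

Real bursting policies also weigh data-transfer cost and per-hour pricing, which is exactly the pricing/contractual discussion the survey raises.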
FABRIC: A National-Scale Programmable Experimental Network Infrastructure
FABRIC is a unique national research infrastructure to enable cutting-edge and exploratory research at scale in networking, cybersecurity, distributed computing and storage systems, machine learning, and science applications. It is an everywhere-programmable nationwide instrument composed of novel extensible network elements equipped with large amounts of compute and storage, interconnected by high-speed, dedicated optical links. It will connect a number of specialized testbeds for cloud research (NSF Cloud testbeds CloudLab and Chameleon) and for research beyond 5G technologies (Platforms for Advanced Wireless Research, or PAWR), as well as production high-performance computing facilities and science instruments, to create a rich fabric for a wide variety of experimental activities.
From Bare Metal to Virtual: Lessons Learned when a Supercomputing Institute Deploys its First Cloud
As primary provider for research computing services at the University of
Minnesota, the Minnesota Supercomputing Institute (MSI) has long been
responsible for serving the needs of a user-base numbering in the thousands.
In recent years, MSI---like many other HPC centers---has observed a growing
need for self-service, on-demand, data-intensive research, as well as the
emergence of many new controlled-access datasets for research purposes. In
light of this, MSI constructed a new on-premise cloud service, named Stratus,
which is architected from the ground up to easily satisfy data-use agreements
and fill four gaps left by traditional HPC. The resulting OpenStack cloud,
constructed from HPC-specific compute nodes and backed by Ceph storage, is
designed to fully comply with controls set forth by the NIH Genomic Data
Sharing Policy.
Herein, we present twelve lessons learned during the ambitious sprint to take
Stratus from inception into production in less than 18 months. Important,
and often overlooked, components of this timeline included the development of
new leadership roles, staff and user training, and user support documentation.
Along the way, the lessons learned extended well beyond the technical
challenges often associated with acquiring, configuring, and maintaining
large-scale systems. Comment: 8 pages, 5 figures, PEARC '18: Practice and Experience in Advanced
Research Computing, July 22--26, 2018, Pittsburgh, PA, USA.
HIL: designing an exokernel for the data center
We propose a new exokernel-like layer to allow mutually untrusting, physically deployed services to efficiently share the resources of a data center. We believe that such a layer offers not only efficiency gains, but may also enable new economic models, new applications, and new security-sensitive uses. A prototype (currently in active use) demonstrates that the proposed layer is viable and can support a variety of existing provisioning tools and use cases. Partial support for this work was provided by the MassTech Collaborative Research Matching Grant Program and National Science Foundation awards 1347525 and 1149232, as well as the several commercial partners of the Massachusetts Open Cloud, who may be found at http://www.massopencloud.org.
An Application-Based Performance Evaluation of NASA's Nebula Cloud Computing Platform
The high performance computing (HPC) community has shown tremendous interest in exploring cloud computing because of its high potential. In this paper, we examine the feasibility, performance, and scalability of production-quality scientific and engineering applications of interest to NASA on NASA's cloud computing platform, called Nebula, hosted at Ames Research Center. This work represents a comprehensive evaluation of Nebula using NUTTCP, HPCC, NPB, I/O, and MPI function benchmarks, as well as four applications representative of the NASA HPC workload. Specifically, we compare Nebula performance on some of these benchmarks and applications to that of NASA's Pleiades supercomputer, a traditional HPC system. We also investigate the impact of virtIO and jumbo frames on interconnect performance. Overall results indicate that on Nebula (i) virtIO and jumbo frames improve network bandwidth by a factor of 5x, (ii) there is a significant virtualization layer overhead of about 10% to 25%, (iii) write performance is lower by a factor of 25x, (iv) latency for short MPI messages is very high, and (v) overall performance is 15% to 48% lower than that on Pleiades for NASA HPC applications. We also comment on the usability of the cloud platform.
High-Performance Cloud Computing: A View of Scientific Applications
Scientific computing often requires the availability of a massive number of
computers for performing large scale experiments. Traditionally, these needs
have been addressed by using high-performance computing solutions and installed
facilities such as clusters and supercomputers, which are difficult to set up,
maintain, and operate. Cloud computing provides scientists with a completely
new model of utilizing the computing infrastructure. Compute resources, storage
resources, as well as applications, can be dynamically provisioned (and
integrated within the existing infrastructure) on a pay-per-use basis. These
resources can be released when they are no longer needed. Such services are often
offered within the context of a Service Level Agreement (SLA), which ensures the
desired Quality of Service (QoS). Aneka, an enterprise Cloud computing
solution, harnesses the power of compute resources by relying on private and
public Clouds and delivers to users the desired QoS. Its flexible and service
based infrastructure supports multiple programming paradigms that make Aneka
address a variety of different scenarios: from finance applications to
computational science. As examples of scientific computing in the Cloud, we
present a preliminary case study on using Aneka for the classification of gene
expression data and the execution of an fMRI brain imaging workflow. Comment: 13 pages, 9 figures, conference paper.
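The provision-use-release lifecycle described above can be sketched in a few lines. The class and function names below are hypothetical stand-ins, not Aneka's actual API:

```python
# Hypothetical sketch of the pay-per-use lifecycle described above:
# provision resources, run under an SLA deadline, release when done.

import time

class Provisioner:
    """Toy stand-in for a cloud provisioner (not Aneka's real API)."""
    def __init__(self):
        self.active = []

    def provision(self, n):
        nodes = [f"node-{i}" for i in range(n)]
        self.active.extend(nodes)
        return nodes

    def release(self, nodes):
        for node in nodes:
            self.active.remove(node)

def run_with_sla(provisioner, n_nodes, deadline_s, task):
    """Provision, execute, and release; report whether the SLA held."""
    nodes = provisioner.provision(n_nodes)
    start = time.monotonic()
    try:
        task(nodes)
    finally:
        provisioner.release(nodes)  # pay-per-use: never hold idle resources
    return (time.monotonic() - start) <= deadline_s

p = Provisioner()
met = run_with_sla(p, 4, deadline_s=5.0, task=lambda nodes: None)
print(met, len(p.active))  # → True 0
```

The `finally` release is the key pay-per-use discipline: resources are returned even if the task fails, so billing stops with the work.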
Challenges and complexities in application of LCA approaches in the case of ICT for a sustainable future
In this work, three of many ICT-specific challenges of LCA are discussed.
First, the inconsistency versus uncertainty is reviewed with regard to the
meta-technological nature of ICT. As an example, the semiconductor technologies
are used to highlight the complexities especially with respect to energy and
water consumption. The need for specific representations and metrics to
separately assess products and technologies is discussed. It is highlighted
that applying product-oriented approaches would result in abandoning or
disfavoring new technologies that could otherwise help toward a better
world. Second, several hot spots commonly believed to be untouchable are
highlighted to emphasize their importance and footprint. The list includes, but
is not limited to, i) User-Computer Interfaces (UCIs), especially screens and
displays, ii) Network-Computer Interfaces (NCIs), such as electronic and
optical ports, and iii) electricity power interfaces. In addition, considering
cross-regional social and economic impacts, and taking into account the
marketing-driven nature of the demand for many ICT products and services in both
hardware and software forms, the complexity of the End-of-Life (EoL) stage of ICT products,
technologies, and services is explored. Finally, the impact of smart management
and intelligence, and in general software, in ICT solutions and products is
highlighted. In particular, it is observed that, even using the same
technology, the significance of software could be highly variable depending on
the level of intelligence and awareness deployed. With examples from an
interconnected network of data centers managed using Dynamic Voltage and
Frequency Scaling (DVFS) technology and smart cooling systems, it is shown that
the unadjusted assessments could be highly uncertain, and even inconsistent, in
calculating the management component's significance on the ICT impacts. Comment: 10 pages. Preprint/accepted version of a paper submitted to the ICT4S Conference.
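The DVFS effect mentioned above follows from the standard first-order dynamic-power model P = C·V²·f. A minimal sketch (with illustrative capacitance and operating points, not values from the paper) shows why software-level management can dominate the footprint even on identical hardware:

```python
# Dynamic CMOS power: P = C * V^2 * f  (standard first-order model).
# All numeric values below are illustrative assumptions.

def dynamic_power(c_eff, voltage, freq_hz):
    """First-order dynamic power of a CMOS circuit."""
    return c_eff * voltage**2 * freq_hz

C_EFF = 1e-9  # effective switched capacitance (farads), assumed

# Same hardware, two management policies for a fixed amount of work:
# run at full speed, or scale frequency/voltage down to just meet demand.
full = dynamic_power(C_EFF, voltage=1.2, freq_hz=3.0e9)    # full-speed point
scaled = dynamic_power(C_EFF, voltage=0.9, freq_hz=1.5e9)  # DVFS point

# Energy for the same work: the scaled point takes twice as long but
# draws far less power, so total energy drops.
work_cycles = 3.0e9  # one second of work at full speed, assumed
energy_full = full * (work_cycles / 3.0e9)
energy_scaled = scaled * (work_cycles / 1.5e9)

print(round(energy_scaled / energy_full, 3))  # energy ratio under DVFS
```

Because the voltage term enters quadratically, two deployments of the same technology can differ substantially in energy depending purely on the management software, which is the assessment-variability point the abstract makes.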