988 research outputs found

    HPC Cloud for Scientific and Business Applications: Taxonomy, Vision, and Research Challenges

    Full text link
    High Performance Computing (HPC) clouds are becoming an alternative to on-premise clusters for executing scientific applications and business analytics services. Most research efforts in HPC cloud aim to understand the cost-benefit of moving resource-intensive applications from on-premise environments to public cloud platforms. Industry trends show hybrid environments are the natural path to get the best of the on-premise and cloud resources---steady (and sensitive) workloads can run on on-premise resources and peak demand can leverage remote resources in a pay-as-you-go manner. Nevertheless, there are plenty of questions to be answered in HPC cloud, which range from how to extract the best performance of an unknown underlying platform to what services are essential to make its usage easier. Moreover, the discussion on the right pricing and contractual models to fit small and large users is relevant for the sustainability of HPC clouds. This paper brings a survey and taxonomy of efforts in HPC cloud and a vision on what we believe is ahead of us, including a set of research challenges that, once tackled, can help advance businesses and scientific discoveries. This becomes particularly relevant due to the fast increasing wave of new HPC applications coming from big data and artificial intelligence.Comment: 29 pages, 5 figures, Published in ACM Computing Surveys (CSUR

    From Bare Metal to Virtual: Lessons Learned when a Supercomputing Institute Deploys its First Cloud

    Full text link
    As primary provider for research computing services at the University of Minnesota, the Minnesota Supercomputing Institute (MSI) has long been responsible for serving the needs of a user-base numbering in the thousands. In recent years, MSI---like many other HPC centers---has observed a growing need for self-service, on-demand, data-intensive research, as well as the emergence of many new controlled-access datasets for research purposes. In light of this, MSI constructed a new on-premise cloud service, named Stratus, which is architected from the ground up to easily satisfy data-use agreements and fill four gaps left by traditional HPC. The resulting OpenStack cloud, constructed from HPC-specific compute nodes and backed by Ceph storage, is designed to fully comply with controls set forth by the NIH Genomic Data Sharing Policy. Herein, we present twelve lessons learned during the ambitious sprint to take Stratus from inception and into production in less than 18 months. Important, and often overlooked, components of this timeline included the development of new leadership roles, staff and user training, and user support documentation. Along the way, the lessons learned extended well beyond the technical challenges often associated with acquiring, configuring, and maintaining large-scale systems.Comment: 8 pages, 5 figures, PEARC '18: Practice and Experience in Advanced Research Computing, July 22--26, 2018, Pittsburgh, PA, US

    HIL: designing an exokernel for the data center

    Full text link
    We propose a new Exokernel-like layer to allow mutually untrusting physically deployed services to efficiently share the resources of a data center. We believe that such a layer offers not only efficiency gains, but may also enable new economic models, new applications, and new security-sensitive uses. A prototype (currently in active use) demonstrates that the proposed layer is viable, and can support a variety of existing provisioning tools and use cases.Partial support for this work was provided by the MassTech Collaborative Research Matching Grant Program, National Science Foundation awards 1347525 and 1149232 as well as the several commercial partners of the Massachusetts Open Cloud who may be found at http://www.massopencloud.or

    An Application-Based Performance Evaluation of NASAs Nebula Cloud Computing Platform

    Get PDF
    The high performance computing (HPC) community has shown tremendous interest in exploring cloud computing as it promises high potential. In this paper, we examine the feasibility, performance, and scalability of production quality scientific and engineering applications of interest to NASA on NASA's cloud computing platform, called Nebula, hosted at Ames Research Center. This work represents the comprehensive evaluation of Nebula using NUTTCP, HPCC, NPB, I/O, and MPI function benchmarks as well as four applications representative of the NASA HPC workload. Specifically, we compare Nebula performance on some of these benchmarks and applications to that of NASA s Pleiades supercomputer, a traditional HPC system. We also investigate the impact of virtIO and jumbo frames on interconnect performance. Overall results indicate that on Nebula (i) virtIO and jumbo frames improve network bandwidth by a factor of 5x, (ii) there is a significant virtualization layer overhead of about 10% to 25%, (iii) write performance is lower by a factor of 25x, (iv) latency for short MPI messages is very high, and (v) overall performance is 15% to 48% lower than that on Pleiades for NASA HPC applications. We also comment on the usability of the cloud platform

    High-Performance Cloud Computing: A View of Scientific Applications

    Full text link
    Scientific computing often requires the availability of a massive number of computers for performing large scale experiments. Traditionally, these needs have been addressed by using high-performance computing solutions and installed facilities such as clusters and super computers, which are difficult to setup, maintain, and operate. Cloud computing provides scientists with a completely new model of utilizing the computing infrastructure. Compute resources, storage resources, as well as applications, can be dynamically provisioned (and integrated within the existing infrastructure) on a pay per use basis. These resources can be released when they are no more needed. Such services are often offered within the context of a Service Level Agreement (SLA), which ensure the desired Quality of Service (QoS). Aneka, an enterprise Cloud computing solution, harnesses the power of compute resources by relying on private and public Clouds and delivers to users the desired QoS. Its flexible and service based infrastructure supports multiple programming paradigms that make Aneka address a variety of different scenarios: from finance applications to computational science. As examples of scientific computing in the Cloud, we present a preliminary case study on using Aneka for the classification of gene expression data and the execution of fMRI brain imaging workflow.Comment: 13 pages, 9 figures, conference pape

    Challenges and complexities in application of LCA approaches in the case of ICT for a sustainable future

    Get PDF
    In this work, three of many ICT-specific challenges of LCA are discussed. First, the inconsistency versus uncertainty is reviewed with regard to the meta-technological nature of ICT. As an example, the semiconductor technologies are used to highlight the complexities especially with respect to energy and water consumption. The need for specific representations and metric to separately assess products and technologies is discussed. It is highlighted that applying product-oriented approaches would result in abandoning or disfavoring of new technologies that could otherwise help toward a better world. Second, several believed-untouchable hot spots are highlighted to emphasize on their importance and footprint. The list includes, but not limited to, i) User Computer-Interfaces (UCIs), especially screens and displays, ii) Network-Computer Interlaces (NCIs), such as electronic and optical ports, and iii) electricity power interfaces. In addition, considering cross-regional social and economic impacts, and also taking into account the marketing nature of the need for many ICT's product and services in both forms of hardware and software, the complexity of End of Life (EoL) stage of ICT products, technologies, and services is explored. Finally, the impact of smart management and intelligence, and in general software, in ICT solutions and products is highlighted. In particular, it is observed that, even using the same technology, the significance of software could be highly variable depending on the level of intelligence and awareness deployed. With examples from an interconnected network of data centers managed using Dynamic Voltage and Frequency Scaling (DVFS) technology and smart cooling systems, it is shown that the unadjusted assessments could be highly uncertain, and even inconsistent, in calculating the management component's significance on the ICT impacts.Comment: 10 pages. Preprint/Accepted of a paper submitted to the ICT4S Conferenc
    • …
    corecore