4,769 research outputs found

    HIL: designing an exokernel for the data center

    Full text link
    We propose a new Exokernel-like layer to allow mutually untrusting physically deployed services to efficiently share the resources of a data center. We believe that such a layer offers not only efficiency gains, but may also enable new economic models, new applications, and new security-sensitive uses. A prototype (currently in active use) demonstrates that the proposed layer is viable, and can support a variety of existing provisioning tools and use cases. Partial support for this work was provided by the MassTech Collaborative Research Matching Grant Program, National Science Foundation awards 1347525 and 1149232, as well as the several commercial partners of the Massachusetts Open Cloud, who may be found at http://www.massopencloud.org
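
    As a loose illustration of the kind of interface such an isolation layer might expose, the Python sketch below models mutually untrusting projects allocating bare-metal nodes from a shared pool and wiring them to project-private networks. Every class, method, and name here is hypothetical, invented for this illustration; it is not HIL's actual API.

```python
# Hypothetical sketch of an exokernel-like isolation layer for physical
# nodes: projects allocate nodes from a free pool and connect them to
# project-private networks. All names are illustrative, not HIL's API.

class AllocationError(Exception):
    pass

class IsolationLayer:
    def __init__(self, nodes):
        self.free_nodes = set(nodes)   # unallocated physical nodes
        self.owner = {}                # node -> project that owns it
        self.networks = {}             # network name -> owning project

    def allocate_node(self, project, node):
        if node not in self.free_nodes:
            raise AllocationError(f"{node} is not free")
        self.free_nodes.remove(node)
        self.owner[node] = project

    def create_network(self, project, net):
        self.networks[net] = project

    def connect(self, project, node, net):
        # Mutual distrust: only the owner may wire up a node, and only
        # to a network that the same project also owns.
        if self.owner.get(node) != project or self.networks.get(net) != project:
            raise AllocationError("isolation violation")
        print(f"{node} attached to {net}")

layer = IsolationLayer(["n1", "n2"])
layer.allocate_node("alice", "n1")
layer.create_network("alice", "alice-net")
layer.connect("alice", "n1", "alice-net")       # ok
try:
    layer.connect("bob", "n1", "alice-net")     # bob owns neither -> rejected
except AllocationError as e:
    print("rejected:", e)
```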

    The state of SQL-on-Hadoop in the cloud

    Get PDF
    Managed Hadoop in the cloud, especially SQL-on-Hadoop, has been gaining attention recently. On Platform-as-a-Service (PaaS), analytical services like Hive and Spark come preconfigured for general-purpose use and ready to run, giving companies a quick entry point and on-demand deployment of ready-made SQL-like solutions for their big data needs. This study evaluates cloud services from an end-user perspective, comparing providers including Microsoft Azure, Amazon Web Services, Google Cloud, and Rackspace. The study focuses on the performance, readiness, scalability, and cost-effectiveness of the different solutions at entry/test-level cluster sizes. Results are based on over 15,000 Hive queries derived from the industry-standard TPC-H benchmark. The study is framed within the ALOJA research project, which features an open source benchmarking and analysis platform that has recently been extended to support SQL-on-Hadoop engines. The ALOJA project aims to lower the total cost of ownership (TCO) of big data deployments and to study their performance characteristics for optimization. The study benchmarks cloud providers across a diverse range of instance types, using input data scales from 1 GB to 1 TB, in order to survey the popular entry-level PaaS SQL-on-Hadoop solutions and thereby establish a common results base upon which subsequent research by the project can be carried out. Initial results already show the main performance trends with respect to hardware and software configuration and pricing, as well as similarities and architectural differences among the evaluated PaaS solutions. Whereas some providers focus on decoupling storage and computing resources while offering network-based elastic storage, others choose to keep the local processing model from Hadoop for high performance, at the cost of reduced flexibility. Results also show the importance of application-level tuning and how keeping hardware and software stacks up to date can influence performance even more than replicating the on-premises model in the cloud. This work is partially supported by the Microsoft Azure for Research program, the European Research Council (ERC) under the EU's Horizon 2020 programme (GA 639595), the Spanish Ministry of Education (TIN2015-65316-P), and the Generalitat de Catalunya (2014-SGR-1051). Peer reviewed. Postprint (author's final draft).
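
    To make this kind of benchmarking workflow concrete, here is a hedged Python sketch that times TPC-H-derived queries against a HiveServer2 endpoint using the PyHive client. The host name, table, and query text are placeholder assumptions, not artifacts of the ALOJA study; the query is a simplified TPC-H Q1-style aggregation over a `lineitem` table assumed to be loaded at the chosen scale factor.

```python
# Hedged sketch: timing Hive queries from Python, in the spirit of the
# study described above. Assumes a reachable HiveServer2 (host/port below
# are placeholders) and a TPC-H `lineitem` table already loaded.
# Requires `pip install pyhive`.
import time
from pyhive import hive

# Simplified TPC-H Q1-style aggregation (illustrative, not the full query).
QUERY = """
SELECT l_returnflag, l_linestatus,
       SUM(l_quantity)      AS sum_qty,
       AVG(l_extendedprice) AS avg_price,
       COUNT(*)             AS count_order
FROM lineitem
WHERE l_shipdate <= '1998-09-02'
GROUP BY l_returnflag, l_linestatus
"""

def time_query(cursor, sql, runs=3):
    """Run `sql` several times; return per-run wall-clock seconds."""
    timings = []
    for _ in range(runs):
        start = time.monotonic()
        cursor.execute(sql)
        cursor.fetchall()          # force full result materialization
        timings.append(time.monotonic() - start)
    return timings

if __name__ == "__main__":
    conn = hive.connect(host="hive.example.com", port=10000)  # placeholder
    for t in time_query(conn.cursor(), QUERY):
        print(f"{t:.2f} s")
```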

    IMTeract Tool for Monitoring and Profiling HPC systems and applications

    Get PDF
    Energy usage of computing equipment is an important consideration, and energy inefficiency of computer systems has been identified as the single biggest obstacle to advances in computing. Research into low-energy computing products ranges from operating system code, applications, and energy-aware schedulers to cooling systems for data centres. To monitor energy consumption in data and HPC centres it is necessary to develop tools for measuring the energy usage of computer equipment and applications. We have developed power-measuring apparatus and a tool, IMTeract, for measuring the energy consumption of HPC applications. IMTeract was used for energy usage profiling of HPC clusters running FLUENT and DL-POLY software and a GPU cluster running different implementations of an FFT algorithm. Our experimental results are encouraging and suggest that the IMTeract tool can be used to measure the CPU, memory, disk I/O and network I/O usage for an application or a process and report on the energy used.
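
    The arithmetic behind this kind of per-application energy profiling is the integral of sampled power over time. The minimal Python sketch below, assuming timestamped power readings in watts, approximates that integral with the trapezoidal rule; the sample values are invented, and nothing here reflects IMTeract's internals.

```python
# Energy in joules as the integral of power over time, approximated
# with the trapezoidal rule over timestamped samples. Sample data and
# the sampling mechanism are assumptions, not IMTeract's.

def energy_joules(samples):
    """samples: time-ordered list of (timestamp_seconds, power_watts)."""
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total += 0.5 * (p0 + p1) * (t1 - t0)   # trapezoid area
    return total

readings = [(0.0, 110.0), (1.0, 142.5), (2.0, 139.0), (3.0, 120.0)]
print(f"{energy_joules(readings):.1f} J")      # 396.5 J over 3 s
```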

    On a course on computer cluster configuration and administration

    Full text link
    Computer clusters are today a cost-effective way of providing high performance and/or high availability. The flexibility of their configuration aims to fit the needs of multiple environments, from small servers to SME and large Internet servers. For these reasons, their usage has expanded not only in academia but also in many companies. However, each environment needs a different 'cluster flavour'. High-performance and high-throughput computing are required in universities and research centres, while high-performance service and high availability are usually reserved for use in companies. Despite this fact, most university cluster computing courses continue to cover only high-performance computing, usually ignoring other possibilities. In this paper, a master-level course which attempts to fill this gap is discussed. It explores the different types of cluster computing as well as their functional basis, from a very practical point of view. As part of the teaching methodology, each student builds from scratch a computer cluster based on a virtualization tool. The entire process is designed to be scalable. The goal is to be able to apply it to an actual computer cluster with a larger number of nodes, such as those the students may subsequently encounter in their professional life. This work was supported in part by the Spanish Ministerio de Economia y Competitividad (MINECO) and by FEDER funds under Grant TIN2015-66972-C5-1-R. López Rodríguez, P.J.; Baydal Cardona, M.E. (2017). On a course on computer cluster configuration and administration. Journal of Parallel and Distributed Computing. 105:127-137. https://doi.org/10.1016/j.jpdc.2017.01.009
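
    As a flavour of the hands-on administration tasks such a course involves, here is a hedged Python sketch that fans a health-check command out to every node of a small (possibly virtualized) cluster over SSH. The hostnames are placeholders, and passwordless SSH is assumed, as is commonly set up when building a teaching cluster from scratch; this is an illustrative exercise, not material from the course itself.

```python
# Probe every cluster node over SSH and report which ones respond.
# Hostnames are placeholders; assumes passwordless SSH is configured.
import subprocess

NODES = ["node01", "node02", "node03", "node04"]   # placeholder hostnames

def check_node(host, command="uptime"):
    """Return (host, ok, output) for one SSH probe with a short timeout."""
    try:
        out = subprocess.run(
            ["ssh", "-o", "ConnectTimeout=3", host, command],
            capture_output=True, text=True, timeout=10,
        )
        return host, out.returncode == 0, out.stdout.strip()
    except subprocess.TimeoutExpired:
        return host, False, "timed out"

if __name__ == "__main__":
    for host, ok, info in map(check_node, NODES):
        print(f"{host}: {'up' if ok else 'DOWN'} {info}")
```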

    Survey and Analysis of Production Distributed Computing Infrastructures

    Full text link
    This report has two objectives. First, we describe a set of the production distributed infrastructures currently available, so that the reader has a basic understanding of them. This includes explaining why each infrastructure was created and made available and how it has succeeded and failed. The set is not complete, but we believe it is representative. Second, we describe the infrastructures in terms of their use, which is a combination of how they were designed to be used and how users have found ways to use them. Applications are often designed and created with specific infrastructures in mind, with both an appreciation of the existing capabilities provided by those infrastructures and an anticipation of their future capabilities. Here, the infrastructures we discuss were often designed and created with specific applications in mind, or at least specific types of applications. The reader should understand how the interplay between the infrastructure providers and the users leads to such usages, which we call usage modalities. These usage modalities are really abstractions that exist between the infrastructures and the applications; they influence the infrastructures by representing the applications, and they influence the applications by representing the infrastructures.

    The High Performance Linpack (HPL) Benchmark Evaluation on UTP High Performance Computing Cluster

    Get PDF
    UTP High Performance Computing Cluster (HPCC) is a collection of computing nodes built from commercially available hardware and interconnected within a network to communicate among the nodes. This campus-wide cluster is used by researchers from within UTP and by external parties to run compute-intensive applications. However, the HPCC has never been benchmarked before, so it is imperative to carry out a performance study to measure the true computing ability of this cluster. This project aims to test the performance of a campus-wide computing cluster using a selected benchmarking tool, the High Performance Linpack (HPL). HPL was selected as a result of comparative studies and analysis against other HPC performance benchmarking tools. The optimal configuration of the HPL benchmark parameters will be determined and the benchmark run on the cluster to obtain the best performance. The author hopes that the outcome of this research project will help to determine the peak potential performance of the computing cluster.
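
    The parameter tuning this abstract refers to concerns HPL's standard knobs: problem size N, block size NB, and the P x Q process grid. The Python sketch below shows the widely used rule-of-thumb arithmetic for choosing them, sizing N to roughly 80% of aggregate memory and picking a near-square process grid; the node count and memory figures are placeholders, not UTP HPCC's actual configuration.

```python
# Rule-of-thumb arithmetic for HPL's main parameters (generic guidance,
# not from the paper): size N so the N x N double-precision matrix fills
# ~80% of aggregate RAM, and pick the most square P x Q process grid.
import math

def suggest_N(total_mem_gib, mem_fraction=0.80, NB=192):
    """Pick N from memory (8 bytes per double), rounded down
    to a multiple of the block size NB."""
    mem_bytes = total_mem_gib * 2**30
    n = int(math.sqrt(mem_fraction * mem_bytes / 8))
    return (n // NB) * NB

def suggest_grid(nprocs):
    """Most square P x Q factorization with P <= Q."""
    p = int(math.sqrt(nprocs))
    while nprocs % p:
        p -= 1
    return p, nprocs // p

# Example: 16 nodes x 64 GiB RAM, 256 MPI ranks (placeholder figures).
print("N     =", suggest_N(16 * 64))    # ~331k
print("P x Q =", suggest_grid(256))     # (16, 16)
```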