HIL: designing an exokernel for the data center
We propose a new Exokernel-like layer to allow mutually untrusting, physically deployed services to efficiently share the resources of a data center. We believe that such a layer offers not only efficiency gains but may also enable new economic models, new applications, and new security-sensitive uses. A prototype (currently in active use) demonstrates that the proposed layer is viable and can support a variety of existing provisioning tools and use cases. Partial support for this work was provided by the MassTech Collaborative Research Matching Grant Program, National Science Foundation awards 1347525 and 1149232, as well as the several commercial partners of the Massachusetts Open Cloud who may be found at http://www.massopencloud.or
The state of SQL-on-Hadoop in the cloud
Managed Hadoop in the cloud, especially SQL-on-Hadoop, has been gaining attention recently. On Platform-as-a-Service (PaaS), analytical services such as Hive and Spark come preconfigured for general-purpose use and ready to run, giving companies a quick entry point and on-demand deployment of SQL-like solutions for their big data needs. This study evaluates cloud services from an end-user perspective, comparing providers including Microsoft Azure, Amazon Web Services, Google Cloud, and Rackspace. The study focuses on the performance, readiness, scalability, and cost-effectiveness of the different solutions at entry/test-level cluster sizes. Results are based on over 15,000 Hive queries derived from the industry-standard TPC-H benchmark.
The study is framed within the ALOJA research project, which features an open-source benchmarking and analysis platform that has recently been extended to support SQL-on-Hadoop engines.
The ALOJA Project aims to lower the total cost of ownership (TCO) of big data deployments and study their performance characteristics for optimization.
The study benchmarks cloud providers across a diverse range of instance types and uses input data scales from 1 GB to 1 TB in order to survey the popular entry-level PaaS SQL-on-Hadoop solutions, thereby establishing a common results base upon which the project can carry out subsequent research. Initial results already show the main performance trends with respect to hardware and software configuration and pricing, as well as the similarities and architectural differences of the evaluated PaaS solutions. Whereas some providers focus on decoupling storage and computing resources while offering network-based elastic storage, others choose to keep Hadoop's local processing model for high performance at the cost of reduced flexibility. Results also show the importance of application-level tuning, and that keeping hardware and software stacks up to date can influence performance even more than replicating the on-premises model in the cloud.
This work is partially supported by the Microsoft Azure for Research program, the European Research Council (ERC) under the EU's Horizon 2020 programme (GA 639595), the Spanish Ministry of Education (TIN2015-65316-P), and the Generalitat de Catalunya (2014-SGR-1051). Peer reviewed. Postprint (author's final draft).
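The abstract does not state how cost-effectiveness is computed. As a minimal sketch, assuming cost is simply billed time multiplied by an hourly cluster price (per-second billing, no minimum charge), the example below ranks two configurations by cost per benchmark run. The class and function names, configuration labels, runtimes, and prices are hypothetical placeholders, not results from the study.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One benchmark run on a cluster configuration (all values hypothetical)."""
    config: str
    runtime_s: float       # wall-clock time of the query set, in seconds
    price_per_hour: float  # cluster price in USD/hour

    @property
    def cost(self) -> float:
        # Assumption: cost = hours used x hourly price (per-second billing, no minimum).
        return self.runtime_s / 3600.0 * self.price_per_hour


def rank_by_cost(runs):
    """Return the runs sorted from cheapest to most expensive."""
    return sorted(runs, key=lambda r: r.cost)


if __name__ == "__main__":
    runs = [
        Run("provider-A/local-disk", runtime_s=1800, price_per_hour=4.0),
        Run("provider-B/elastic-storage", runtime_s=2700, price_per_hour=2.0),
    ]
    for r in rank_by_cost(runs):
        print(f"{r.config}: {r.runtime_s:.0f} s, ${r.cost:.2f}")
```

With these made-up numbers the slower configuration is the cheaper one per run, which is why performance and cost-effectiveness are worth reporting separately.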
IMTeract Tool for Monitoring and Profiling HPC systems and applications
Energy usage of computing equipment is an important consideration, and the energy inefficiency of computer systems has been identified as the single biggest obstacle to advances in computing. Research into low-energy computing products ranges from operating system code, applications, and energy-aware schedulers to cooling systems for data centres. To monitor energy consumption in data and HPC centres, it is necessary to develop tools for measuring the energy usage of computer equipment and applications. We have developed power-measuring apparatus and a tool, IMTeract, for measuring the energy consumption of HPC applications. IMTeract was used for energy usage profiling of HPC clusters running FLUENT and DL-POLY software, and of a GPU cluster running different implementations of an FFT algorithm. Our experimental results are encouraging and suggest that the IMTeract tool can be used to measure the CPU, memory, disk I/O, and network I/O of an application or a process and to report on the energy used.
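IMTeract's internal method is not described in this abstract. As a minimal sketch of the underlying arithmetic only, assuming a power meter that reports timestamped readings in watts, energy in joules is the time integral of power and can be approximated with the trapezoidal rule. The helper name and the readings below are hypothetical.

```python
def energy_joules(samples):
    """Approximate energy (J) from (time_s, power_W) samples via the trapezoidal rule.

    Samples must be sorted by time. Hypothetical helper, not part of IMTeract.
    """
    if len(samples) < 2:
        return 0.0
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        if t1 < t0:
            raise ValueError("samples must be sorted by time")
        total += 0.5 * (p0 + p1) * (t1 - t0)  # area of one trapezoid
    return total


if __name__ == "__main__":
    # Hypothetical 1 Hz readings from a power meter during an application run.
    readings = [(0.0, 200.0), (1.0, 220.0), (2.0, 240.0), (3.0, 210.0)]
    e = energy_joules(readings)
    print(f"{e:.1f} J = {e / 3600:.4f} Wh")
```

Attributing a node's whole measured power to a single application, rather than splitting it across processes, is a further simplifying assumption of this sketch.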
On a course on computer cluster configuration and administration
Computer clusters are today a cost-effective way of providing high performance, high availability, or both. The flexibility of their configuration allows them to fit the needs of multiple environments, from small servers to SMEs and large Internet servers. For these reasons, their usage has expanded not only in academia but also in many companies. However, each environment needs a different "cluster flavour". High-performance and high-throughput computing are required in universities and research centres, while high-performance service and high availability are usually required in companies. Despite this, most university cluster computing courses continue to cover only high-performance computing, usually ignoring other possibilities. This paper discusses a master-level course that attempts to fill this gap.
It explores the different types of cluster computing and their functional basis from a very practical point of view. As part of the teaching methodology, each student builds a computer cluster from scratch using a virtualization tool. The entire process is designed to be scalable, so that it can be applied to an actual computer cluster with a larger number of nodes, such as those students may subsequently encounter in their professional life. This work was supported in part by the Spanish Ministerio de Economia y Competitividad (MINECO) and by FEDER funds under Grant TIN2015-66972-C5-1-R. López Rodríguez, PJ.; Baydal Cardona, ME. (2017). On a course on computer cluster configuration and administration. Journal of Parallel and Distributed Computing. 105:127-137. https://doi.org/10.1016/j.jpdc.2017.01.009
Survey and Analysis of Production Distributed Computing Infrastructures
This report has two objectives. First, we describe a set of the production
distributed infrastructures currently available, so that the reader has a basic
understanding of them. This includes explaining why each infrastructure was
created and made available and how it has succeeded and failed. The set is not
complete, but we believe it is representative.
Second, we describe the infrastructures in terms of their use, which is a
combination of how they were designed to be used and how users have found ways
to use them. Applications are often designed and created with specific
infrastructures in mind, with both an appreciation of the existing capabilities
provided by those infrastructures and an anticipation of their future
capabilities. Here, the infrastructures we discuss were often designed and
created with specific applications in mind, or at least specific types of
applications. The reader should understand how the interplay between the
infrastructure providers and the users leads to such usages, which we call
usage modalities. These usage modalities are really abstractions that exist
between the infrastructures and the applications; they influence the
infrastructures by representing the applications, and they influence the
applications by representing the infrastructures.
The High Performance Linpack (HPL) Benchmark Evaluation on UTP High Performance Computing Cluster
UTP High Performance Computing Cluster (HPCC) is a collection of computing nodes built from commercially available hardware and interconnected by a network through which the nodes communicate. This campus-wide cluster is used by researchers within UTP and by external parties to run compute-intensive applications. However, the HPCC has never been benchmarked, so a performance study is needed to measure the true computing capability of this cluster.
This project aims to test the performance of a campus-wide computing cluster using a selected benchmarking tool, High Performance Linpack (HPL). HPL was selected following a comparative analysis against other HPC performance benchmarking tools. The optimal configuration of the HPL benchmark parameters will be determined and run on the cluster to obtain the best performance.
The author hopes that the outcome of this research project will help determine the peak potential performance of the computing cluster.
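The abstract does not give the parameter values chosen for the UTP cluster. As a hedged sketch of how HPL inputs are commonly sized, the helpers below apply widely used rules of thumb: pick the problem size N so the N×N double-precision matrix fills roughly 80% of total memory, round N down to a multiple of the block size NB, choose a near-square P×Q process grid with P ≤ Q, and compare a measured result (Rmax) against the theoretical peak (Rpeak). The cluster figures, NB=192, the 80% fraction, the measured Rmax, and the function names are all hypothetical assumptions.

```python
import math


def hpl_problem_size(nodes, mem_per_node_gib, nb=192, mem_fraction=0.8):
    """Largest N (a multiple of NB) whose N x N float64 matrix fits in mem_fraction of RAM.

    The 0.8 fraction and NB=192 are rules of thumb, not values from the UTP study.
    """
    total_bytes = nodes * mem_per_node_gib * 1024**3
    n = int(math.sqrt(mem_fraction * total_bytes / 8))  # 8 bytes per double
    return (n // nb) * nb


def hpl_grid(num_procs):
    """Pick a P x Q grid with P <= Q and P * Q == num_procs, as close to square as possible."""
    if num_procs < 1:
        raise ValueError("num_procs must be positive")
    p = math.isqrt(num_procs)
    while num_procs % p != 0:
        p -= 1
    return p, num_procs // p


def rpeak_gflops(nodes, cores_per_node, ghz, flops_per_cycle):
    """Theoretical peak: nodes x cores x clock (GHz) x double-precision FLOPs per cycle."""
    return nodes * cores_per_node * ghz * flops_per_cycle


if __name__ == "__main__":
    # Hypothetical cluster: 8 nodes, 16 cores and 64 GiB each, 2.5 GHz, 16 DP FLOPs/cycle.
    nodes, cores = 8, 16
    n = hpl_problem_size(nodes, 64)
    p, q = hpl_grid(nodes * cores)
    peak = rpeak_gflops(nodes, cores, 2.5, 16)
    print(f"Ns={n} NBs=192 Ps={p} Qs={q} Rpeak={peak:.0f} GFLOPS")
    rmax = 3500.0  # hypothetical measured HPL result, in GFLOPS
    print(f"efficiency = {rmax / peak:.1%}")
```

The resulting values would then go into the Ns, NBs, Ps, and Qs fields of HPL.dat, usually alongside neighbouring NB and grid choices, since the best setting is typically found empirically.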