
    Exploring Scientific Application Performance Using Large Scale Object Storage

    One of the major performance and scalability bottlenecks in large scientific applications is parallel reading from and writing to supercomputer I/O systems. The use of parallel file systems and the POSIX consistency requirements, to which all traditional HPC parallel I/O interfaces adhere, limit the scalability of scientific applications. Object storage is a widely used storage technology in cloud computing and is increasingly proposed for HPC workloads to improve the current scalability and performance of I/O in scientific applications. While object storage is a promising technology, it is still unclear how scientific applications will use it and what the main performance benefits will be. This work addresses these questions by emulating an object store beneath a traditional scientific application and evaluating the potential performance benefits. We show that scientific applications can benefit from object storage at large scale.
    Comment: Preprint submitted to the WOPSSS workshop at ISC 201
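
    The abstract does not say which object interface was emulated; as a minimal sketch of the idea, assuming an S3-compatible store accessed via boto3 (endpoint, bucket, and key layout are hypothetical), each simulation rank can write its checkpoint as an independent object, sidestepping shared-file POSIX consistency:

        # Minimal sketch: each rank writes its checkpoint as an independent
        # object instead of a shared POSIX file. Endpoint, bucket, and key
        # layout are hypothetical; assumes an S3-compatible store and boto3.
        import io
        import boto3
        import numpy as np

        s3 = boto3.client("s3", endpoint_url="http://object-store.example:9000")
        BUCKET = "simulation-checkpoints"  # hypothetical bucket

        def write_checkpoint(rank: int, step: int, field: np.ndarray) -> None:
            """One object per rank per step: no shared-file locking or POSIX
            consistency is needed, which is what allows I/O to scale."""
            buf = io.BytesIO()
            np.save(buf, field)  # serialize this rank's local field
            key = f"step-{step:06d}/rank-{rank:05d}.npy"
            s3.put_object(Bucket=BUCKET, Key=key, Body=buf.getvalue())

        # Example: rank 3 storing its local slab at timestep 42
        write_checkpoint(rank=3, step=42, field=np.random.rand(256, 256))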

    Extending Science Gateway Frameworks to Support Big Data Applications in the Cloud

    Cloud computing offers the massive scalability and elasticity required by many scientific and commercial applications. Combining the computational and data-handling capabilities of clouds with parallel processing also has the potential to tackle Big Data problems efficiently. Science gateway frameworks and workflow systems enable application developers to implement complex applications and make them available to end-users via simple graphical user interfaces. The integration of such frameworks with Big Data processing tools in the cloud opens new opportunities for application developers. This paper investigates how workflow systems and science gateways can be extended with Big Data processing capabilities. A generic approach based on infrastructure-aware workflows is suggested, and a proof of concept is implemented based on the WS-PGRADE/gUSE science gateway framework and its integration with the Hadoop parallel data processing solution, based on the MapReduce paradigm, in the cloud. The provided analysis demonstrates that the described methods for integrating Big Data processing with workflows and science gateways work well in different cloud infrastructures and application scenarios, and can be used to create massively parallel applications for scientific analysis of Big Data.
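
    The paper's own integration code is not shown here; the MapReduce paradigm that Hadoop brings to the gateway can be illustrated with a generic Hadoop Streaming word count (a mapper and a reducer reading stdin), a standard pattern rather than the authors' implementation:

        #!/usr/bin/env python3
        # Generic Hadoop Streaming word count, illustrating the MapReduce
        # paradigm only; not the authors' gateway code. Run e.g. as:
        #   hadoop jar hadoop-streaming.jar -input in/ -output out/ \
        #       -mapper "wc.py map" -reducer "wc.py reduce" -file wc.py
        import sys

        def mapper():
            # Emit (word, 1) pairs; the framework sorts and groups by key.
            for line in sys.stdin:
                for word in line.split():
                    print(f"{word}\t1")

        def reducer():
            # Keys arrive grouped; sum the counts for each word.
            current, total = None, 0
            for line in sys.stdin:
                word, count = line.rstrip("\n").split("\t")
                if word != current:
                    if current is not None:
                        print(f"{current}\t{total}")
                    current, total = word, 0
                total += int(count)
            if current is not None:
                print(f"{current}\t{total}")

        if __name__ == "__main__":
            mapper() if sys.argv[1] == "map" else reducer()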

    A cloudification methodology for multidimensional analysis: Implementation and application to a railway power simulator

    Many scientific areas make extensive use of computer simulations to study complex real-world processes. These computations are typically very resource-intensive and present scalability issues as experiments get larger, even in dedicated clusters, since these are limited by their own hardware resources. Cloud computing arises as an option for moving toward the ideal of unlimited scalability by providing virtually infinite resources, yet applications must be adapted to this new paradigm. The process of converting and/or migrating an application and its data to make use of cloud computing is sometimes known as cloudifying the application. We propose a generalist cloudification method based on the MapReduce paradigm to migrate scientific simulations into the cloud and provide greater scalability. We analysed its viability by applying it to a real-world railway power consumption simulator and running the resulting implementation on Hadoop YARN over Amazon EC2. Our tests show that the cloudified application is highly scalable, that there is still a large margin to improve the theoretical model and its implementations, and that it can be extended to a wider range of simulations. We also propose and evaluate a multidimensional analysis tool based on the cloudified application. It generates, executes and evaluates several experiments in parallel for the same simulation kernel. The results we obtained indicate that our methodology is suitable for resource-intensive simulations and multidimensional analysis, as it improves infrastructure utilization, efficiency and scalability when running many complex experiments.
    This work has been partially funded under grant TIN2013-41350-P of the Spanish Ministry of Economy and Competitiveness, and the COST Action IC1305 "Network for Sustainable Ultrascale Computing Platforms" (NESUS).
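
    As a hedged sketch of the multidimensional-analysis idea (the kernel and parameters below are invented stand-ins, not the railway simulator), a sweep can be expressed as the cross product of parameter values fanned out as independent map-style tasks:

        # Sketch of the multidimensional analysis: every combination of
        # parameter values is evaluated in parallel with the same kernel.
        # The kernel and parameters are invented stand-ins, not the paper's
        # railway power simulator.
        from itertools import product
        from multiprocessing import Pool

        def simulate(params):
            """Stand-in simulation kernel returning (params, result)."""
            voltage, trains, headway = params
            energy = voltage * trains / headway  # placeholder model
            return params, energy

        if __name__ == "__main__":
            grid = product(
                [600, 750, 1500],  # line voltage (V)
                [4, 8, 16],        # trains in service
                [2.0, 3.0, 5.0],   # headway (min)
            )
            # Each combination is an independent map-style task.
            with Pool() as pool:
                for params, energy in pool.map(simulate, grid):
                    print(params, "->", round(energy, 2))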

    ElasticBroker: Combining HPC with Cloud to Provide Realtime Insights into Simulations

    For large-scale scientific simulations, it is expensive to store raw simulation results for post-analysis. To minimize expensive I/O, "in-situ" analysis is often used, where analysis applications are tightly coupled with scientific simulations and can access and process the simulation results in memory. Increasingly, scientific domains employ Big Data approaches to analyze simulations for scientific discoveries. However, it remains a challenge to organize, transform, and transport data at scale between the two semantically different ecosystems (HPC and Cloud systems). To address these challenges, we design and implement the ElasticBroker software framework, which bridges HPC and Cloud applications to form an "in-situ" scientific workflow. Instead of writing simulation results to parallel file systems, ElasticBroker performs data filtering, aggregation, and format conversion to close the gap between an HPC ecosystem and a distinct Cloud ecosystem. To achieve this goal, ElasticBroker reorganizes simulation snapshots into continuous data streams and sends them to the Cloud, where we deploy a distributed stream processing service to perform online data analysis. In our experiments, we use ElasticBroker to set up and execute a cross-ecosystem scientific workflow, which consists of a parallel computational fluid dynamics (CFD) simulation running on a supercomputer and a parallel dynamic mode decomposition (DMD) analysis application running in a Cloud computing platform. Our results show that running scientific workflows consisting of decoupled HPC and Big Data jobs in their native environments with ElasticBroker can achieve high quality of service, good scalability, and high-quality analytics for ongoing simulations.
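
    ElasticBroker's actual transport and APIs are not given in the abstract; the sketch below assumes a Kafka topic (via kafka-python) as the cloud-side stream, and shows only the general pattern of filtering a snapshot and publishing it as a stream record instead of writing a file:

        # Sketch of the broker pattern: filter/aggregate an in-memory
        # snapshot and publish it as one record on a continuous stream,
        # instead of writing a file. The transport (Kafka via kafka-python)
        # and every name here are assumptions, not ElasticBroker's API.
        import json
        import numpy as np
        from kafka import KafkaProducer

        producer = KafkaProducer(bootstrap_servers="cloud-broker.example:9092")

        def stream_snapshot(step: int, velocity_field: np.ndarray) -> None:
            record = {
                "step": step,
                # aggregation: ship coarse per-row means, not the raw field
                "row_means": velocity_field.mean(axis=1).tolist(),
            }
            producer.send("cfd-snapshots", value=json.dumps(record).encode())

        for step in range(3):  # stand-in for the simulation's time loop
            stream_snapshot(step, np.random.rand(64, 64))
        producer.flush()  # make sure buffered records reach the Cloud side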

    HPC Cloud for Scientific and Business Applications: Taxonomy, Vision, and Research Challenges

    High Performance Computing (HPC) clouds are becoming an alternative to on-premise clusters for executing scientific applications and business analytics services. Most research efforts in HPC cloud aim to understand the cost-benefit of moving resource-intensive applications from on-premise environments to public cloud platforms. Industry trends show hybrid environments are the natural path to getting the best of on-premise and cloud resources: steady (and sensitive) workloads can run on on-premise resources, and peak demand can leverage remote resources in a pay-as-you-go manner. Nevertheless, there are plenty of questions to be answered in HPC cloud, ranging from how to extract the best performance from an unknown underlying platform to what services are essential to make its usage easier. Moreover, the discussion on the right pricing and contractual models to fit small and large users is relevant for the sustainability of HPC clouds. This paper brings a survey and taxonomy of efforts in HPC cloud and a vision of what we believe is ahead of us, including a set of research challenges that, once tackled, can help advance businesses and scientific discoveries. This becomes particularly relevant due to the fast-increasing wave of new HPC applications coming from big data and artificial intelligence.
    Comment: 29 pages, 5 figures, published in ACM Computing Surveys (CSUR).
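
    As a toy illustration of the hybrid placement idea described above (the thresholds and job fields are invented for illustration, not from the survey), a burst-to-cloud policy might look like:

        # Toy placement policy: sensitive or steady work stays on-premise,
        # overflow bursts to the cloud. Thresholds and fields are invented.
        from dataclasses import dataclass

        @dataclass
        class Job:
            name: str
            sensitive: bool    # e.g., data that must not leave the site
            core_hours: float  # estimated resource demand

        def place(job: Job, free_core_hours: float) -> str:
            if job.sensitive:
                return "on-premise"  # sensitive workloads never leave
            if job.core_hours <= free_core_hours:
                return "on-premise"  # steady load fits local capacity
            return "cloud"           # peak demand bursts, pay-as-you-go

        print(place(Job("nightly-report", True, 50.0), 1000.0))   # on-premise
        print(place(Job("param-sweep", False, 5000.0), 1000.0))   # cloud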

    On a Catalogue of Metrics for Evaluating Commercial Cloud Services

    Given the continually increasing number of commercial Cloud services in the market, the evaluation of different services plays a significant role in cost-benefit analysis and decision making when choosing Cloud computing. In particular, employing suitable metrics is essential in evaluation implementations. However, to the best of our knowledge, there is no systematic discussion of metrics for evaluating Cloud services. Using the method of Systematic Literature Review (SLR), we have collected the de facto metrics adopted in existing Cloud services evaluation work. The collected metrics were arranged according to the different Cloud service features to be evaluated, which essentially constitutes an evaluation metrics catalogue, as shown in this paper. This metrics catalogue can be used to facilitate future practice and research in the area of Cloud services evaluation. Moreover, considering that metrics selection is a prerequisite of benchmark selection in evaluation implementations, this work also supplements the existing research on benchmarking commercial Cloud services.
    Comment: 10 pages, Proceedings of the 13th ACM/IEEE International Conference on Grid Computing (Grid 2012), pp. 164-173, Beijing, China, September 20-23, 2012.
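
    The paper's actual catalogue is not reproduced here; as an illustrative sketch, such a catalogue can be organized as a mapping from Cloud service features to candidate metrics, so that metric selection can precede benchmark selection:

        # Illustrative catalogue shape: Cloud service features mapped to
        # candidate evaluation metrics. Entries are examples, not the
        # paper's actual SLR-derived catalogue.
        CATALOGUE = {
            "compute": ["FLOPS", "benchmark runtime (s)", "speedup"],
            "storage": ["sequential throughput (MB/s)", "IOPS", "latency (ms)"],
            "network": ["bandwidth (Gb/s)", "round-trip latency (ms)"],
            "economics": ["cost per workload ($)", "cost-performance ratio"],
        }

        def metrics_for(feature: str) -> list[str]:
            """Pick the feature under evaluation first, then choose
            benchmarks that actually produce these metrics."""
            return CATALOGUE.get(feature, [])

        print(metrics_for("storage"))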

    Early Observations on Performance of Google Compute Engine for Scientific Computing

    Although Cloud computing emerged for business applications in industry, public Cloud services have been widely accepted and encouraged for scientific computing in academia. The recently available Google Compute Engine (GCE) is claimed to support high-performance and computationally intensive tasks, yet few evaluation studies can be found that reveal GCE's scientific capabilities. Considering that fundamental performance benchmarking is the strategy for early-stage evaluation of new Cloud services, we followed the Cloud Evaluation Experiment Methodology (CEEM) to benchmark GCE and compare it with Amazon EC2, to help understand the elementary capability of GCE for dealing with scientific problems. The experimental results and analyses show both potential advantages of, and possible threats to, applying GCE to scientific computing. For example, compared to Amazon's EC2 service, GCE may better suit applications that require frequent disk operations, while it may not yet be ready for single-VM-based parallel computing. Following the same evaluation methodology, different evaluators can replicate and/or supplement this fundamental evaluation of GCE. Based on the fundamental evaluation results, suitable GCE environments can be further established for case studies of solving real science problems.
    Comment: Proceedings of the 5th International Conference on Cloud Computing Technologies and Science (CloudCom 2013), pp. 1-8, Bristol, UK, December 2-5, 2013.
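
    As a minimal sketch of the kind of fundamental disk microbenchmark such an evaluation relies on (the file and block sizes are arbitrary choices, not the paper's configuration), one can time sequential writes and report throughput:

        # Minimal sequential-write microbenchmark: time writing 256 MiB in
        # 1 MiB blocks and report MB/s. Sizes are arbitrary illustrations.
        import os
        import time

        PATH, BLOCK, COUNT = "bench.tmp", 1 << 20, 256
        buf = os.urandom(BLOCK)

        start = time.perf_counter()
        with open(PATH, "wb") as f:
            for _ in range(COUNT):
                f.write(buf)
            f.flush()
            os.fsync(f.fileno())  # force data to disk before stopping the clock
        elapsed = time.perf_counter() - start

        print(f"sequential write: {COUNT * BLOCK / elapsed / 1e6:.1f} MB/s")
        os.remove(PATH)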