
    The state of SQL-on-Hadoop in the cloud

    Managed Hadoop in the cloud, especially SQL-on-Hadoop, has been gaining attention recently. On Platform-as-a-Service (PaaS) offerings, analytical services like Hive and Spark come preconfigured for general-purpose use, giving companies quick entry and on-demand deployment of ready SQL-like solutions for their big data needs. This study evaluates cloud services from an end-user perspective, comparing providers including Microsoft Azure, Amazon Web Services, Google Cloud, and Rackspace. The study focuses on the performance, readiness, scalability, and cost-effectiveness of the different solutions at entry/test-level cluster sizes. Results are based on over 15,000 Hive queries derived from the industry-standard TPC-H benchmark. The study is framed within the ALOJA research project, which features an open-source benchmarking and analysis platform recently extended to support SQL-on-Hadoop engines. The ALOJA project aims to lower the total cost of ownership (TCO) of big data deployments and to study their performance characteristics for optimization. The study benchmarks cloud providers across a diverse range of instance types, using input data scales from 1 GB to 1 TB, in order to survey the popular entry-level PaaS SQL-on-Hadoop solutions and establish a common results base upon which subsequent research by the project can be carried out. Initial results already show the main performance trends with respect to hardware and software configuration, pricing, and the similarities and architectural differences of the evaluated PaaS solutions. Whereas some providers focus on decoupling storage and compute resources while offering network-based elastic storage, others keep Hadoop's local processing model for high performance at the cost of reduced flexibility.
    Results also show the importance of application-level tuning and how keeping hardware and software stacks up to date can influence performance even more than replicating the on-premises model in the cloud. This work is partially supported by the Microsoft Azure for Research program, the European Research Council (ERC) under the EU's Horizon 2020 programme (GA 639595), the Spanish Ministry of Education (TIN2015-65316-P), and the Generalitat de Catalunya (2014-SGR-1051). Peer reviewed. Postprint (author's final draft).
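    The cost-effectiveness comparison the abstract describes can be reduced to a simple price-performance calculation: the dollar cost of completing one full benchmark run on a given cluster. The sketch below illustrates that calculation only; the cluster names, runtimes, and hourly prices are made up for the example and are not results from the study.

```python
# Hypothetical price-performance comparison in the spirit of a TCO study.
# All cluster names and figures below are illustrative, not measured data.

def price_performance(total_runtime_hours: float, cluster_price_per_hour: float) -> float:
    """Cost in dollars to complete one full benchmark run
    (e.g. a TPC-H-derived query set): runtime * hourly cluster price."""
    return total_runtime_hours * cluster_price_per_hour

# Illustrative entry-level clusters (assumed values):
clusters = {
    "provider_a_local_disk":   {"runtime_h": 2.0, "price_h": 3.2},
    "provider_b_object_store": {"runtime_h": 2.8, "price_h": 2.1},
}

for name, c in clusters.items():
    cost = price_performance(c["runtime_h"], c["price_h"])
    print(f"{name}: ${cost:.2f} per benchmark run")
```

    A faster but pricier cluster can still win on this metric, which is why runtime alone is not a sufficient comparison between providers.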

    Performance Benchmarks for Custom Applications: Considerations and Strategies

    The motivation for this research came from the need to solve a problem affecting not only the company used in this study but also many other companies in the information technology industry facing a similar problem: how to conduct performance benchmarks for custom applications in an effective, unbiased, and accurate manner. This paper presents the pros and cons of existing benchmark methodologies. It proposes a combination of the best characteristics of these benchmarks into a methodology that addresses the problem from an application perspective, considering the overall synergy between the operating system and the software. The author also discusses a software design to implement the proposed methodology. The proposed methodology is generic enough to be adapted to any particular application performance-benchmarking situation.
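    The core of any application-level benchmark of the kind the abstract discusses is a harness that controls for transient effects and reports spread rather than a single number. The following is a minimal sketch of such a harness, not the paper's own design: the `warmup`/`repeats` parameters and the example workload are assumptions for illustration.

```python
import statistics
import time

def benchmark(fn, *, warmup: int = 3, repeats: int = 10):
    """Minimal application-level benchmark harness: run a few warmup
    iterations (to absorb caching and other transient effects), then
    time repeated runs and report the median and spread rather than
    trusting a single measurement."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return {
        "median_s": statistics.median(samples),
        "stdev_s": statistics.stdev(samples),
        "samples": samples,
    }

# Example workload standing in for the custom application under test:
result = benchmark(lambda: sum(range(100_000)))
print(f"median: {result['median_s'] * 1e3:.3f} ms")
```

    Reporting the median with a dispersion measure is one way to keep results unbiased by outliers such as a cold cache on the first run.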

    Network of European Facilities I: European Network of Crisis Management Laboratories (ENCML)

    This policy report focuses on the Network of European Facilities. It draws attention to the requirements to initiate the network, methods to carry out experiments, and how the activities of the ENCML will feed into the Disaster Risk Management Knowledge Centre (DRMKC). JRC.E.1 - Disaster Risk Management

    Performance evaluation and benchmarking of the JXTA peer-to-peer platform

    Peer-to-peer (P2P) systems are a relatively new addition to the large area of distributed computer systems. The emphasis on sharing resources, self-organization, and the use of discovery mechanisms sets P2P systems apart from other forms of distributed computing. Project JXTA is the first P2P application development platform, consisting of standard protocols, programming tools, and multi-language implementations. A JXTA peer network is a complex overlay, constructed on top of the physical network, with its own identification scheme and routing. This thesis investigates the performance of JXTA using benchmarking. The presented work includes the development of the JXTA Performance Model and Benchmark Suite, as well as the collection and analysis of performance results. By evaluating three major versions of the protocol implementations in a variety of configurations, the performance characteristics, limitations, bottlenecks, and trade-offs are observed and discussed. It is shown that the complexity of JXTA allows many factors to affect its performance and that several JXTA components exhibit unintuitive and unexpected behavior. However, the results also reveal ways to maximize the performance of deployed and newly designed systems. The evolution of JXTA through several versions shows notable improvements, especially in the search and discovery models and the added messaging components, which make JXTA a promising member of the future generation of computer systems.
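    A benchmark suite for an overlay network like JXTA ultimately reduces raw round-trip measurements to a few summary statistics. The sketch below shows one common reduction (median, 95th percentile, and derived serial throughput); it is a generic illustration, not part of the JXTA Benchmark Suite, and the sample latencies are invented.

```python
import statistics

def summarize_latencies(samples_ms):
    """Summarize round-trip latency samples the way a P2P benchmark
    might: median, 95th percentile, and the messages-per-second
    throughput implied by serial round trips."""
    ordered = sorted(samples_ms)
    # Nearest-rank style p95 index over n samples (one simple convention).
    p95_index = min(len(ordered) - 1, round(0.95 * (len(ordered) - 1)))
    total_s = sum(ordered) / 1000.0
    return {
        "median_ms": statistics.median(ordered),
        "p95_ms": ordered[p95_index],
        "throughput_msg_per_s": len(ordered) / total_s,
    }

# Invented samples for illustration (not real JXTA measurements):
stats = summarize_latencies([12.0, 15.0, 11.0, 40.0, 13.0])
print(stats)
```

    Tail percentiles matter here because, as the abstract notes, overlay components can behave unintuitively: a healthy median can hide occasional slow discovery or routing paths that only a p95/p99 figure exposes.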

    The global unified parallel file system (GUPFS) project: FY 2002 activities and results


    A framework for the introduction of knowledge management within an engineering environment

    This research is based on real issues that have been recognised within the global organisation Rolls-Royce. The first aim concerned an issue that many companies face: the difficulty employees have in locating the knowledge and information they require, especially in larger organisations. The developed solution, an Information Map, proved a success in providing people within the Submarines business with the location of Configuration Management information. The concept of the Information Map is one that can be adopted by any business, as the stages in the tool's development have been well documented within Chapters Four, Five, and Six. Analysis of the success of the Information Map led to the derivation of ten lessons learned. These were then verified in a second case study of an intranet development. The second aim of the research was to create a Knowledge Management framework that could be adapted by companies looking to invest in Knowledge Management and provide them with a guide to use. This framework was built from the lessons learned from the Information Map and from other best practice derived from the available literature and from within Rolls-Royce. The work conducted within the Support business tries to fill gaps in current research by offering companies a new approach to Knowledge Management based upon the way industries work today. The creation of the Knowledge Management framework simplifies the work conducted and offers practitioners an accessible, high-level approach to the adoption of Knowledge Management by grouping the process into ten steps. This is presented in a fashion that is easy to follow and ultimately offers a guide to making the best use of the resources and budget available to Knowledge Management practitioners. Overall, the research addresses the 'real' issues faced by Knowledge Management practitioners.
    The main contributions to the Knowledge Management domain are the Information Map, the action research approach, the implementation of Knowledge Management tools for users' needs, and a framework as a guide for industry.