322 research outputs found

    A storage architecture for data-intensive computing

    The assimilation of computing into our daily lives is enabling the generation of data at unprecedented rates. In 2008, IDC estimated that the "digital universe" contained 486 exabytes of data [9]. The computing industry is being challenged to develop methods for the cost-effective processing of data at these large scales. The MapReduce programming model has emerged as a scalable way to perform data-intensive computations on commodity cluster computers. Hadoop is a popular open-source implementation of MapReduce. To manage storage resources across the cluster, Hadoop uses a distributed user-level filesystem. This filesystem, HDFS, is written in Java and designed for portability across heterogeneous hardware and software platforms. The efficiency of a Hadoop cluster depends heavily on the performance of this underlying storage system. This thesis is the first to analyze the interactions between Hadoop and storage. It describes how the user-level Hadoop filesystem, instead of efficiently capturing the full performance potential of the underlying cluster hardware, actually degrades application performance significantly. Architectural bottlenecks in the Hadoop implementation result in inefficient HDFS usage due to delays in scheduling new MapReduce tasks. Further, HDFS implicitly makes assumptions about how the underlying native platform manages storage resources, even though native filesystems and I/O schedulers vary widely in design and behavior. Methods to eliminate these bottlenecks in HDFS are proposed and evaluated, both in terms of their application performance improvement and their impact on the portability of the Hadoop framework. In addition to improving the performance and efficiency of the Hadoop storage system, this thesis also focuses on improving its flexibility. The goal is to allow Hadoop to coexist in cluster computers shared with a variety of other applications through the use of virtualization technology. The introduction of virtualization breaks the traditional Hadoop storage architecture, where persistent HDFS data is stored on local disks installed directly in the computation nodes. To overcome this challenge, a new flexible network-based storage architecture is proposed, along with changes to the HDFS framework. Network-based storage enables Hadoop to operate efficiently in a dynamic virtualized environment and furthers the spread of the MapReduce parallel programming model to new applications.
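    The abstract above names the MapReduce programming model and the HDFS storage layer without showing what a Hadoop job looks like. The following is a minimal, self-contained sketch of the model using the standard org.apache.hadoop.mapreduce API (the classic word-count pattern); it is not code from the thesis, and class names such as WordCount, TokenizerMapper, and IntSumReducer are illustrative.

        import java.io.IOException;
        import java.util.StringTokenizer;

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.Path;
        import org.apache.hadoop.io.IntWritable;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.mapreduce.Job;
        import org.apache.hadoop.mapreduce.Mapper;
        import org.apache.hadoop.mapreduce.Reducer;
        import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
        import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

        public class WordCount {

          // Map phase: emit (word, 1) for every token in the input split read from HDFS.
          public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            @Override
            protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
              StringTokenizer itr = new StringTokenizer(value.toString());
              while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
              }
            }
          }

          // Reduce phase: sum the counts for each word and write the result back to HDFS.
          public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            private final IntWritable result = new IntWritable();

            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
              int sum = 0;
              for (IntWritable val : values) {
                sum += val.get();
              }
              result.set(sum);
              context.write(key, result);
            }
          }

          public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class);   // local aggregation before the shuffle
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            // Input and output paths are typically HDFS URIs supplied on the command line.
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
          }
        }

    A job of this shape is packaged into a jar and submitted with the hadoop jar command, after which the framework schedules map tasks near the HDFS blocks holding the input splits; it is exactly this task-scheduling and HDFS interaction that the thesis identifies as a source of bottlenecks.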

    Nas nuvens ou fora delas, eis a questão (In the clouds or out of them, that is the question)

    Master's dissertation in Information Systems. The purpose of this dissertation is to contribute towards a better understanding of the decision to go, or not to go, for a cloud solution when an organization is confronted with the need to create or enlarge an information system. This is done by identifying the technical and economic factors that must be taken into account when planning a new solution, and by developing a framework to help decision makers. The following aspects are considered:
    • Definition of a generic reference model for information system functionalities.
    • Identification of some basic metrics characterizing information system performance and costs.
    • Analysis and characterization of on-premises information systems: architectures, cost elements, performance issues.
    • Analysis and characterization of cloud information systems: topologies, cost structures, performance issues.
    • Establishment of a comparison framework for cloud versus on-premises solutions as possible instances of information systems.
    • Use cases comparing cloud and on-premises solutions.
    • Production of guidelines (focused on the public cloud case).
    To illustrate the procedure, two business cases are used, each with two approaches: one dedicated to IT professionals (technical approach), the other to managers/decision makers (economic approach).
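    To make the economic side of the comparison framework concrete, the sketch below contrasts an amortized on-premises cost with a pay-per-use cloud cost. It is not taken from the dissertation; the method names, cost model, and all figures are hypothetical assumptions chosen only to illustrate the kind of calculation such a framework would support.

        public class CloudVsOnPremises {

          /** Annualized cost of an on-premises deployment: hardware amortized over its
              service life plus yearly operating expenses (power, space, administration). */
          static double onPremisesAnnualCost(double hardwareCapex, int amortizationYears,
                                             double annualOpex) {
            return hardwareCapex / amortizationYears + annualOpex;
          }

          /** Annual cost of an equivalent public-cloud deployment billed per instance-hour. */
          static double cloudAnnualCost(double pricePerInstanceHour, int instances,
                                        double utilizedHoursPerYear) {
            return pricePerInstanceHour * instances * utilizedHoursPerYear;
          }

          public static void main(String[] args) {
            // All numbers below are illustrative assumptions, not data from the use cases.
            double onPrem = onPremisesAnnualCost(120_000.0, 4, 18_000.0); // 4-year amortization
            double cloud  = cloudAnnualCost(0.40, 10, 8_760.0);           // 10 instances, 24/7

            System.out.printf("On-premises: %.2f per year%n", onPrem);
            System.out.printf("Cloud:       %.2f per year%n", cloud);
            System.out.println(onPrem < cloud
                ? "On-premises is cheaper under these assumptions"
                : "Cloud is cheaper under these assumptions");
          }
        }

    In a fuller version of such a framework, the performance metrics listed in the abstract (not modeled here) would be weighed alongside these cost figures before a decision is made.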
    • …