
    The Family of MapReduce and Large Scale Data Processing Systems

    In the last two decades, the continuous increase of computational power has produced an overwhelming flow of data, which has called for a paradigm shift in computing architectures and large-scale data processing mechanisms. MapReduce is a simple and powerful programming model that enables easy development of scalable parallel applications to process vast amounts of data on large clusters of commodity machines. It isolates the application from the details of running a distributed program, such as data distribution, scheduling and fault tolerance. However, the original implementation of the MapReduce framework had some limitations that have been tackled in many follow-up research efforts since its introduction. This article provides a comprehensive survey of a family of approaches and mechanisms for large-scale data processing that have been built on the original idea of the MapReduce framework and are currently gaining considerable momentum in both the research and industrial communities. We also cover systems that have been implemented to provide declarative programming interfaces on top of the MapReduce framework. In addition, we review several large-scale data processing systems that resemble some of the ideas of the MapReduce framework but target different purposes and application scenarios. Finally, we discuss some of the future research directions for implementing the next generation of MapReduce-like solutions.
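
    The survey above treats the MapReduce programming model as a given; as a minimal illustration of that model (not taken from any of the surveyed systems), the sketch below expresses the classic word-count job as a map function and a reduce function in Python, simulating in memory the shuffle/grouping step that a framework such as Hadoop would perform across a cluster.

        from collections import defaultdict

        def map_phase(document):
            """Map: emit a (word, 1) pair for every word in the input record."""
            for word in document.split():
                yield word.lower(), 1

        def reduce_phase(word, counts):
            """Reduce: sum the partial counts emitted for a single key."""
            return word, sum(counts)

        def run_word_count(documents):
            # Shuffle/group step: gather all values emitted for each key.
            # A real MapReduce framework performs this grouping across
            # machines; here it is simulated in memory for illustration.
            grouped = defaultdict(list)
            for doc in documents:
                for word, count in map_phase(doc):
                    grouped[word].append(count)
            return dict(reduce_phase(w, c) for w, c in grouped.items())

        if __name__ == "__main__":
            docs = ["map reduce simplifies large scale data processing",
                    "map and reduce run on large clusters of commodity machines"]
            print(run_word_count(docs))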

    Live capture and reuse of project knowledge

    It is important that the knowledge generated on construction projects is captured and shared between project team members for continuous improvement, to prevent the ‘reinvention of the wheel’ and to avoid repetition of previous mistakes. However, this is undermined mainly by the loss of important insights and knowledge due to the time lapse in capturing the knowledge, staff turnover and people’s reluctance to share knowledge. To address this, it is crucial for knowledge to be captured ‘live’ in a collaborative environment while the project is being executed, and presented in a format that facilitates its reuse during and after the project. This paper uses a case study approach to investigate the end-users’ requirements for a ‘live’ knowledge capture and reuse methodology, and the shortcomings of current practice in meeting these requirements. A framework for the ‘live’ methodology that satisfies these requirements is then presented and discussed.

    Active architecture for pervasive contextual services

    International Workshop on Middleware for Pervasive and Ad-hoc Computing (MPAC 2003), ACM/IFIP/USENIX International Middleware Conference (Middleware 2003), Rio de Janeiro, Brazil. This work was supported by the FP5 Gloss project IST2000-26070, with partners at Trinity College Dublin and Université Joseph Fourier, and by EPSRC grants GR/M78403/GR/M76225, Supporting Internet Computation in Arbitrary Geographical Locations, and GR/R45154, Bulk Storage of XML Documents. Pervasive services may be defined as services that are available "to any client (anytime, anywhere)". Here we focus on the software and network infrastructure required to support pervasive contextual services operating over a wide area. One of the key requirements is a matching service capable of assimilating and filtering information from various sources and determining matches relevant to those services. We consider some of the challenges in engineering a globally distributed matching service that is scalable, manageable, and able to evolve incrementally as usage patterns, data formats, services, network topologies and deployment technologies change. We outline an approach based on the use of a peer-to-peer architecture to distribute user events and data, and to support the deployment and evolution of the infrastructure itself.
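
    As a rough sketch of the matching-service idea described above (the event and subscription formats here are hypothetical, not the Gloss project's actual design), the Python snippet below shows a single-node content-based matcher that delivers contextual events to the services whose predicates they satisfy; a wide-area deployment would instead partition subscriptions and route events over the peer-to-peer overlay.

        from dataclasses import dataclass, field
        from typing import Callable, Dict, List

        # Hypothetical types for illustration; the real infrastructure's data
        # formats and matching rules are not specified in the abstract.
        Event = Dict[str, str]              # e.g. {"user": "alice", "location": "Dublin"}
        Predicate = Callable[[Event], bool]

        @dataclass
        class MatchingService:
            """Toy content-based matcher: each service registers a predicate and
            incoming events are delivered to every service whose predicate matches."""
            subscriptions: Dict[str, Predicate] = field(default_factory=dict)

            def subscribe(self, service_name: str, predicate: Predicate) -> None:
                self.subscriptions[service_name] = predicate

            def publish(self, event: Event) -> List[str]:
                # Return the names of the services interested in this event.
                return [name for name, pred in self.subscriptions.items() if pred(event)]

        if __name__ == "__main__":
            matcher = MatchingService()
            matcher.subscribe("city_guide", lambda e: e.get("location") == "Dublin")
            matcher.subscribe("campus_alerts", lambda e: e.get("location") == "St Andrews")
            print(matcher.publish({"user": "alice", "location": "Dublin"}))  # ['city_guide']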

    Early Accurate Results for Advanced Analytics on MapReduce

    Approximate results based on samples often provide the only way in which advanced analytical applications on very massive data sets can satisfy their time and resource constraints. Unfortunately, methods and tools for the computation of accurate early results are currently not supported in MapReduce-oriented systems, although these are intended for 'big data'. Therefore, we proposed and implemented a non-parametric extension of Hadoop which allows the incremental computation of early results for arbitrary workflows, along with reliable on-line estimates of the degree of accuracy achieved so far in the computation. These estimates are based on a technique called bootstrapping that has been widely employed in statistics and can be applied to arbitrary functions and data distributions. In this paper, we describe our Early Accurate Result Library (EARL) for Hadoop, which was designed to minimize the changes required to the MapReduce framework. Various tests of EARL on Hadoop are presented to characterize the frequent situations where EARL can provide major speed-ups over the current version of Hadoop.
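
    The accuracy estimates mentioned above rest on the bootstrap: repeatedly resampling the data processed so far and observing how much the aggregate varies. The standalone Python sketch below illustrates that statistical idea on a toy mean computation; it is a generic illustration of bootstrapping, not EARL's actual Hadoop-integrated implementation.

        import random
        import statistics

        def bootstrap_estimate(sample, aggregate, iterations=1000, seed=0):
            """Estimate an aggregate and its uncertainty from a partial sample.

            Resamples the available data with replacement `iterations` times,
            recomputes the aggregate on each resample, and reports the mean and
            spread of the resulting values as an early result with an error bar.
            """
            rng = random.Random(seed)
            n = len(sample)
            estimates = [
                aggregate([sample[rng.randrange(n)] for _ in range(n)])
                for _ in range(iterations)
            ]
            return statistics.mean(estimates), statistics.stdev(estimates)

        if __name__ == "__main__":
            # Pretend only 200 records of a much larger data set have been processed.
            random.seed(42)
            partial = [random.gauss(50.0, 10.0) for _ in range(200)]
            estimate, error = bootstrap_estimate(partial, statistics.mean)
            print(f"early estimate of the mean: {estimate:.2f} +/- {error:.2f}")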

    American Foulbrood and the Risk in the Use of Antibiotics as a Treatment

    Honeybees (Apis mellifera) crucially pollinate agricultural crops and endemic species, in addition to producing various apiculture products. The most economically relevant and abundant beehive product is honey, a sweet substance made from the secretions of melliferous plants. Honey is a natural food rich in nutrients, including certain bioactive compounds inherited from floral nectar and pollen. Among the most dangerous diseases for bees is American foulbrood. Spores of the causative microorganism, Paenibacillus larvae, can contaminate larval food or the operculum wax in which the larval stages of honeybees are kept. Infection is further promoted by common apiculture practices, such as reusing inert material contaminated with spores, even after months of storage. American foulbrood is untreatable, and management entails completely incinerating the infected hive and all material that could have come into contact with pathogenic spores. The purpose of such drastic measures is to decrease the risk of propagation to other beehives. While evidence indicates that antibiotics could effectively control and combat this disease, antibiotic use is prohibited in most honey-producing countries due to the increased risk of antimicrobial resistance. Antibiotic residues in honey can affect consumer health, since the natural biological attributes of honey can be altered.