
    Distributed replicated macro-components

    Dissertation submitted for the degree of Master in Informatics Engineering. In recent years, several approaches have been proposed for improving application performance on multi-core machines. However, exploiting the power of multi-core processors remains complex for most programmers. A Macro-component is an abstraction that tackles this problem by exploiting the power of multi-core machines without requiring changes to the programs. A Macro-component encapsulates several diverse implementations of the same specification, which makes it possible to obtain the best performance of each operation and/or to distribute load among replicas, while keeping contention and synchronization overhead to a minimum. In real-world applications, relying on a single server to provide a service leads to limited fault tolerance and scalability. To address this problem, it is common to replicate services across multiple machines. This work addresses the problem of supporting such a replication solution while exploiting the power of multi-core machines. To this end, we propose to support the replication of Macro-components in a cluster of machines. In this dissertation we present the design of a middleware solution for achieving this goal. Using the implemented replication middleware, we have successfully deployed a replicated Macro-component of in-memory databases, which are known to have scalability problems on multi-core machines. The proposed solution combines multi-master replication across nodes with primary-secondary replication within a node, where several instances of the database run on a single machine. This approach deals with the lack of scalability of databases on multi-core systems while minimizing communication costs, which ultimately results in an overall improvement of the service. Results show that the proposed solution scales as the number of nodes and clients increases, and that it is able to take advantage of multi-core architectures. RepComp project (PTDC/EIAEIA/108963/2008)
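    To make the abstraction concrete, the following minimal sketch (Python, written for this listing; the names MacroComponent, write_all and read_one are invented and not taken from the dissertation's middleware) wraps several replicas that satisfy the same specification, applies updates to all of them and balances reads among them:

        # Illustrative sketch only, not the dissertation's implementation.
        import threading

        class MacroComponent:
            def __init__(self, replicas):
                # All replicas implement the same specification (here: dict-like stores).
                self.replicas = replicas
                self.lock = threading.Lock()
                self._next = 0

            def write_all(self, key, value):
                # Updates are applied to every replica so they stay equivalent.
                with self.lock:
                    for replica in self.replicas:
                        replica[key] = value

            def read_one(self, key):
                # Reads can be served by any replica; round-robin spreads the load.
                with self.lock:
                    replica = self.replicas[self._next]
                    self._next = (self._next + 1) % len(self.replicas)
                return replica[key]

        # Usage: two replicas of the same mapping specification (both plain dicts
        # here for brevity; in the abstraction they could be different implementations).
        store = MacroComponent([dict(), dict()])
        store.write_all("x", 42)
        print(store.read_one("x"))  # 42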

    The Family of MapReduce and Large Scale Data Processing Systems

    In the last two decades, the continuous increase of computational power has produced an overwhelming flow of data, which has called for a paradigm shift in computing architecture and large-scale data processing mechanisms. MapReduce is a simple and powerful programming model that enables the easy development of scalable parallel applications that process vast amounts of data on large clusters of commodity machines. It isolates the application from the details of running a distributed program, such as data distribution, scheduling, and fault tolerance. However, the original implementation of the MapReduce framework had some limitations that have been tackled by many research efforts in several follow-up works since its introduction. This article provides a comprehensive survey of a family of approaches and mechanisms for large-scale data processing that have been implemented based on the original idea of the MapReduce framework and are currently gaining a lot of momentum in both the research and industrial communities. We also cover a set of systems that have been implemented to provide declarative programming interfaces on top of the MapReduce framework. In addition, we review several large-scale data processing systems that resemble some of the ideas of the MapReduce framework for different purposes and application scenarios. Finally, we discuss some of the future research directions for implementing the next generation of MapReduce-like solutions. Comment: arXiv admin note: text overlap with arXiv:1105.4252 by other authors
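    As a reminder of the programming model that the surveyed systems build on, the following minimal in-process word-count sketch illustrates the map/reduce contract; it deliberately ignores the distribution, scheduling and fault-tolerance machinery that the real frameworks provide, and is not code from any of them:

        from collections import defaultdict

        def map_phase(documents):
            # map: emit a (word, 1) pair for every word in every document
            for doc in documents:
                for word in doc.split():
                    yield word, 1

        def reduce_phase(pairs):
            # shuffle: group values by key; reduce: sum the counts per word
            grouped = defaultdict(list)
            for key, value in pairs:
                grouped[key].append(value)
            return {key: sum(values) for key, values in grouped.items()}

        docs = ["the quick brown fox", "the lazy dog", "the fox"]
        print(reduce_phase(map_phase(docs)))  # {'the': 3, 'quick': 1, 'fox': 2, ...}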

    McRunjob: A High Energy Physics Workflow Planner for Grid Production Processing

    McRunjob is a powerful grid workflow manager used to manage the generation of large numbers of production processing jobs in High Energy Physics. In use at both the DZero and CMS experiments, McRunjob has managed large Monte Carlo production processing since 1999 and is being extended to regular production processing for analysis and reconstruction. Described at CHEP 2001, McRunjob converts core metadata into jobs submittable in a variety of environments. The powerful core metadata description language includes methods for converting the metadata into persistent forms, job descriptions, multi-step workflows, and data provenance information. The language features allow for structure in the metadata by including full expressions, namespaces, functional dependencies, site-specific parameters in a grid environment, and ontological definitions. It also has simple control structures for the parallelization of large jobs. McRunjob features a modular design which allows for easy expansion to new job description languages or new application-level tasks. Comment: CHEP 2003 serial number TUCT00
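    The general idea of expanding core metadata into submittable jobs can be illustrated with the hypothetical sketch below; it is not McRunjob's actual description language or API, only an analogy for how one metadata record might be turned into per-partition job commands:

        # Hypothetical illustration; the field names and command line are invented.
        def expand_jobs(metadata):
            """Expand one metadata record into a list of per-partition job commands."""
            jobs = []
            for part in range(metadata["n_partitions"]):
                jobs.append(
                    "run_simulation --dataset {d} --events {e} --partition {p}".format(
                        d=metadata["dataset"],
                        e=metadata["events_per_partition"],
                        p=part,
                    )
                )
            return jobs

        meta = {"dataset": "mc_sample_01", "events_per_partition": 1000, "n_partitions": 3}
        for job in expand_jobs(meta):
            print(job)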

    ATLAS Data Challenge 1

    In 2002 the ATLAS experiment started a series of Data Challenges (DC) whose goals are the validation of the Computing Model, of the complete software suite, and of the data model, and to ensure the correctness of the technical choices to be made. A major feature of the first Data Challenge (DC1) was the preparation and deployment of the software required for the production of large event samples for the High Level Trigger (HLT) and physics communities, and the production of those samples as a world-wide distributed activity. The first phase of DC1 was run during summer 2002 and involved 39 institutes in 18 countries. More than 10 million physics events and 30 million single-particle events were fully simulated. Over a period of about 40 calendar days, 71000 CPU-days were used to produce 30 Tbytes of data in about 35000 partitions. In the second phase the next processing step was performed with the participation of 56 institutes in 21 countries (~4000 processors used in parallel). The basic elements of the ATLAS Monte Carlo production system are described. We also present how the software suite was validated and how the participating sites were certified. These productions were already partly performed by using different flavours of Grid middleware at ~20 sites. Comment: 10 pages; 3 figures; CHEP03 Conference, San Diego; Reference MOCT00

    A Coordination Language for Databases

    We present a coordination language for the modeling of distributed database applications. The language, baptized Klaim-DB, borrows the concepts of localities and nets from the coordination language Klaim but re-incarnates the tuple spaces of Klaim as databases. It provides high-level abstractions and primitives for the access and manipulation of structured data, with integrity and atomicity considerations. We present the formal semantics of Klaim-DB and develop a type system that avoids potential runtime errors, such as certain evaluation errors and mismatches of data formats in tables, which are monitored in the semantics. The use of the language is illustrated in a scenario where the sales from different branches of a chain of department stores are aggregated from their local databases. Raising the abstraction level and encapsulating integrity checks in the language primitives have benefited the modeling task considerably.
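    As an informal analogy for the department-store scenario (plain Python rather than Klaim-DB's process-calculus syntax; the branch names and table layout are invented), each locality holds its own sales table and an aggregation query collects totals across branches:

        # (item, quantity, unit_price) rows per branch-local database.
        branches = {
            "branch_A": [("tv", 2, 499.0), ("phone", 5, 199.0)],
            "branch_B": [("tv", 1, 499.0), ("laptop", 3, 899.0)],
        }

        def aggregate_sales(localities):
            # Sum revenue per item across all local databases.
            totals = {}
            for rows in localities.values():
                for item, qty, price in rows:
                    totals[item] = totals.get(item, 0.0) + qty * price
            return totals

        print(aggregate_sales(branches))  # {'tv': 1497.0, 'phone': 995.0, 'laptop': 2697.0}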