
    The Family of MapReduce and Large Scale Data Processing Systems

    In the last two decades, the continuous increase of computational power has produced an overwhelming flow of data, which has called for a paradigm shift in computing architectures and large-scale data processing mechanisms. MapReduce is a simple yet powerful programming model that enables the easy development of scalable parallel applications that process vast amounts of data on large clusters of commodity machines. It isolates the application from the details of running a distributed program, such as data distribution, scheduling, and fault tolerance. However, the original implementation of the MapReduce framework had some limitations that have been tackled by many research efforts in several follow-up works. This article provides a comprehensive survey of a family of approaches and mechanisms for large-scale data processing that have been implemented based on the original idea of the MapReduce framework and are currently gaining momentum in both the research and industrial communities. We also cover a set of systems that provide declarative programming interfaces on top of the MapReduce framework. In addition, we review several large-scale data processing systems that resemble some of the ideas of the MapReduce framework but target different purposes and application scenarios. Finally, we discuss some future research directions for implementing the next generation of MapReduce-like solutions.
    Comment: arXiv admin note: text overlap with arXiv:1105.4252 by other authors
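The programming model the abstract describes can be illustrated with a minimal, single-process sketch (not taken from the survey itself): the user writes only a map and a reduce function, and the framework takes care of data distribution, shuffling, and fault tolerance on a real cluster.

```python
from collections import defaultdict
from itertools import chain

def map_fn(document):
    # Emit an intermediate (word, 1) pair for every word in one input split.
    for word in document.split():
        yield word, 1

def reduce_fn(key, values):
    # Combine all counts shuffled to the same key.
    return key, sum(values)

def run_mapreduce(inputs, map_fn, reduce_fn):
    # Map phase: apply map_fn to each input record.
    intermediate = chain.from_iterable(map_fn(x) for x in inputs)
    # Shuffle phase: group intermediate pairs by key.
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)
    # Reduce phase: one reduce_fn call per distinct key.
    return dict(reduce_fn(k, v) for k, v in groups.items())

counts = run_mapreduce(["to be or not to be"], map_fn, reduce_fn)
# counts == {"to": 2, "be": 2, "or": 1, "not": 1}
```

In a distributed implementation the map and reduce phases run on many machines; only the two user-supplied functions change per application.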

    Solving Parity Games in Scala

    Parity games are two-player games, played on directed graphs, whose nodes are labeled with priorities. Along a play, the maximal priority occurring infinitely often determines the winner. In the last two decades, a variety of algorithms and successive optimizations have been proposed. The majority of them have been implemented in PGSolver, written in OCaml, which has been elected by the community as the de facto platform for solving parity games efficiently, as well as for evaluating their performance in several specific cases. PGSolver includes the Zielonka recursive algorithm, which has been shown to perform better than the others on randomly generated games. However, even for arenas with a few thousand nodes (especially over dense graphs), it requires minutes to solve the corresponding game. In this paper, we deeply revisit the implementation of the recursive algorithm, introducing several improvements and making use of the Scala programming language. These choices have proved very successful, gaining up to two orders of magnitude in running time.
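For readers unfamiliar with the recursive algorithm the abstract refers to, here is a compact Python sketch of Zielonka's algorithm (the paper's implementation is in Scala; the game encoding and names below are illustrative). It assumes a total edge relation, i.e., every node has at least one successor.

```python
def attractor(nodes, player, target, owner, succ):
    # Nodes from which `player` can force the play into `target`.
    attr = set(target)
    changed = True
    while changed:
        changed = False
        for v in nodes - attr:
            succs = [w for w in succ[v] if w in nodes]
            if owner[v] == player and any(w in attr for w in succs):
                attr.add(v); changed = True
            elif owner[v] != player and succs and all(w in attr for w in succs):
                attr.add(v); changed = True
    return attr

def zielonka(nodes, owner, priority, succ):
    # Returns (W0, W1): the nodes from which player 0 / player 1 wins.
    if not nodes:
        return set(), set()
    p = max(priority[v] for v in nodes)
    i = p % 2                                 # player favoured by the top priority
    top = {v for v in nodes if priority[v] == p}
    a = attractor(nodes, i, top, owner, succ)
    w = zielonka(nodes - a, owner, priority, succ)
    if not w[1 - i]:                          # opponent wins nowhere in the subgame
        win = [set(), set()]
        win[i] = set(nodes)
        return tuple(win)
    b = attractor(nodes, 1 - i, w[1 - i], owner, succ)
    w2 = zielonka(nodes - b, owner, priority, succ)
    win = [set(), set()]
    win[1 - i] = w2[1 - i] | b
    win[i] = w2[i]
    return tuple(win)

# Two-node cycle owned by player 0 with priorities {2, 1}: the only play
# cycles forever, the maximal recurring priority is 2 (even), so player 0
# wins everywhere.
w0, w1 = zielonka({0, 1}, {0: 0, 1: 0}, {0: 2, 1: 1}, {0: [1], 1: [0]})
# w0 == {0, 1}, w1 == set()
```

Removing an attractor set preserves totality of the remaining subgame, which is why the recursion stays well-defined; the paper's improvements concern exactly how these attractor and subgame computations are engineered.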

    CrocoPat 2.1 Introduction and Reference Manual

    CrocoPat is an efficient, powerful, and easy-to-use tool for manipulating relations of arbitrary arity, including directed graphs. This manual provides an introduction to and a reference for CrocoPat and its programming language RML. It includes several application examples, in particular from the analysis of structural models of software systems.
    Comment: 19 pages + cover, 2 eps figures, uses llncs.cls and cs_techrpt_cover.sty; for the source code, binaries, and RML examples, see http://www.software-systemtechnik.de/CrocoPat
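CrocoPat's own language is RML; as an illustration of the kind of relational manipulation the tool performs, the following hypothetical Python sketch computes the transitive closure of a binary relation, a typical operation when analysing directed graphs such as call or dependency relations.

```python
def transitive_closure(relation):
    # relation: a set of (x, y) pairs representing a directed graph.
    # Repeatedly add (x, z) whenever (x, y) and (y, z) are both present,
    # until a fixed point is reached. (A semi-naive evaluation, as used by
    # real relational engines, would be faster; this is the simplest
    # correct version.)
    closure = set(relation)
    while True:
        new_pairs = {(x, z)
                     for (x, y1) in closure
                     for (y2, z) in closure
                     if y1 == y2} - closure
        if not new_pairs:
            return closure
        closure |= new_pairs

# Edges 1→2→3 imply the derived reachability pair 1→3.
# transitive_closure({(1, 2), (2, 3)}) == {(1, 2), (2, 3), (1, 3)}
```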

    Dynamic Trace-Based Data Dependency Analysis for Parallelization of C Programs

    Writing parallel code is traditionally considered a difficult task, even when it is tackled from the beginning of a project. In this paper, we demonstrate an innovative toolset that faces this challenge directly. It provides software developers with profile data and directs them to possible top-level, pipeline-style parallelization opportunities for an arbitrary sequential C program. This approach is complementary to methods based on static code analysis and automatic code rewriting, and does not impose restrictions on the structure of the sequential code or the parallelization style, even though it is mostly aimed at coarse-grained task-level parallelization. The proposed toolset has been utilized to define parallel code organizations for a number of real-world representative applications, and is based on, and provided as, free software.
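The paper's toolset instruments the running program; as a loose illustration of trace-based dependency analysis (not the authors' implementation), the sketch below extracts read-after-write dependencies from a hypothetical memory-access trace.

```python
def raw_dependencies(trace):
    # trace: ordered list of (task_id, op, address), op in {"R", "W"},
    # recorded while running the instrumented sequential program.
    # Returns the set of (producer, consumer) read-after-write pairs.
    # These true dependencies constrain any pipeline decomposition;
    # WAR/WAW hazards can instead be removed by privatising storage.
    last_writer = {}
    deps = set()
    for task, op, addr in trace:
        if op == "W":
            last_writer[addr] = task
        elif addr in last_writer and last_writer[addr] != task:
            deps.add((last_writer[addr], task))
    return deps

# Task 1 produces x, task 2 reads x and produces y, task 3 reads both:
trace = [(1, "W", "x"), (2, "R", "x"), (2, "W", "y"),
         (3, "R", "y"), (3, "R", "x")]
# raw_dependencies(trace) == {(1, 2), (2, 3), (1, 3)}
```

A dependency graph like this one is what lets a tool suggest pipeline stages: tasks with no path between them can run concurrently.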

    Efficient annotated terms


    Medical image compression for high-performance archives

    Information systems and medicine are two widespread fields that have interwoven so that medical care can become more efficient. This relation has bred PACS and the international standard DICOM, directed at the organization of digital medical information. Image compression is applied to most images throughout the web, yet the compression formats used for medical imaging have become outdated. The new formats developed in the past few years are candidates for replacing the old ones in such contexts, possibly enhancing the process. Before they are adopted, an evaluation should be carried out that validates their admissibility. This dissertation reviews the state of the art of medical imaging information systems, namely PACS systems and the DICOM standard. Furthermore, some topics of image compression are covered, such as the metrics for evaluating the algorithms' performance, finalizing with a survey of three modern formats: JPEG XL, AVIF, and WebP. Two software projects were developed: the first carries out an analysis of the formats based on those metrics, using DICOM datasets and producing results that can be used for creating recommendations on the formats' use; the second is an application that encodes and decodes medical images with the formats covered in this dissertation. This proof of concept works as a medical imaging archive for the storage, distribution, and visualization of compressed data.
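One standard metric for evaluating lossy image codecs, and a plausible ingredient of such an analysis, is the peak signal-to-noise ratio (PSNR); the sketch below is illustrative and not taken from the dissertation.

```python
import math

def psnr(original, decoded, max_value=255):
    # Peak signal-to-noise ratio between an original image and its
    # lossy-compressed reconstruction, both given as flat sequences of
    # pixel intensities in [0, max_value]. Higher is better; identical
    # images give infinity.
    assert len(original) == len(decoded)
    mse = sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)
    if mse == 0:
        return math.inf
    return 10 * math.log10(max_value ** 2 / mse)
```

PSNR is computed per image against the uncompressed DICOM pixel data; combined with the achieved compression ratio, it allows formats such as JPEG XL, AVIF, and WebP to be compared at equal quality or equal size.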