571 research outputs found

    Bridging the gap between design and implementation of components libraries

    Get PDF
    Object-oriented design is usually driven by three main reusability principles: step-by-step design, design for reuse, and design with reuse. However, these principles are only partially carried over to the subsequent object-oriented implementation, often due to efficiency constraints, leaving a gap between design and implementation. In this paper we provide a solution for bridging this gap in a concrete setting: the design and implementation of container-like component libraries, such as the STL or the Booch Components. Our approach is based on a new design pattern together with its corresponding implementation. The proposal applies the same principles that drive the design process: step-by-step implementation (adding just what is needed at every step), implementation with reuse (component implementations are reused as the library implementation progresses and component hierarchies grow), and implementation for reuse (intermediate component implementations can be reused at many different points of the hierarchy). We use our approach in two different ways: to build a brand-new container-like component library, and to reengineer an existing one, the Booch Components in Ada95.
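
    The layering idea generalizes beyond Ada95. A minimal Scala sketch (my illustration with hypothetical class names, not the paper's actual pattern) of implementation with and for reuse, where an intermediate implementation is shared by several points of the hierarchy:

```scala
// Hedged sketch of "implementation for reuse": an intermediate, partial
// implementation (ListBacked) is written once and reused at several points
// of the component hierarchy; each component adds only what it needs.
trait Container[A] {
  def isEmpty: Boolean
  def size: Int
}

// Intermediate implementation: reusable list-backed storage.
trait ListBacked[A] extends Container[A] {
  protected def elems: List[A]
  def isEmpty: Boolean = elems.isEmpty
  def size: Int = elems.length
}

// Two different components reuse the same intermediate implementation,
// each adding just what its level of the hierarchy needs (step-by-step).
final class Stack[A](protected val elems: List[A] = Nil) extends ListBacked[A] {
  def push(a: A): Stack[A] = new Stack(a :: elems)
  def top: A = elems.head
}

final class Queue[A](protected val elems: List[A] = Nil) extends ListBacked[A] {
  def enqueue(a: A): Queue[A] = new Queue(elems :+ a)
  def front: A = elems.head
}
```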

    GraphX: Unifying Data-Parallel and Graph-Parallel Analytics

    Full text link
    From social networks to language modeling, the growing scale and importance of graph data have driven the development of numerous new graph-parallel systems (e.g., Pregel, GraphLab). By restricting the computation that can be expressed and introducing new techniques to partition and distribute the graph, these systems can efficiently execute iterative graph algorithms orders of magnitude faster than more general data-parallel systems. However, the same restrictions that enable the performance gains also make it difficult to express many of the important stages in a typical graph-analytics pipeline: constructing the graph, modifying its structure, or expressing computation that spans multiple graphs. As a consequence, existing graph-analytics pipelines compose graph-parallel and data-parallel systems through external storage systems, leading to extensive data movement and a complicated programming model. To address these challenges we introduce GraphX, a distributed graph computation framework that unifies graph-parallel and data-parallel computation. GraphX provides a small, core set of graph-parallel operators expressive enough to implement the Pregel and PowerGraph abstractions, yet simple enough to be cast in relational algebra. GraphX uses a collection of query optimization techniques, such as automatic join rewrites, to efficiently implement these graph-parallel operators. We evaluate GraphX on real-world graphs and workloads and demonstrate that GraphX achieves performance comparable to specialized graph computation systems, while outperforming them in end-to-end graph pipelines. Moreover, GraphX achieves a balance between expressiveness, performance, and ease of use.
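
    As a concrete illustration of that unification, a minimal Spark/GraphX sketch (toy data, local mode; not from the paper): the graph is assembled from ordinary RDDs, queried with a graph-parallel operator, and the result joins back into the data-parallel world.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph}

object GraphXSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("graphx-sketch").setMaster("local[*]"))

    // Data-parallel side: plain RDDs of vertices and edges.
    val users = sc.parallelize(Seq((1L, "alice"), (2L, "bob"), (3L, "carol")))
    val links = sc.parallelize(Seq(Edge(1L, 2L, 1), Edge(2L, 3L, 1), Edge(3L, 1L, 1)))
    val graph = Graph(users, links)

    // Graph-parallel side: compute each vertex's in-degree with aggregateMessages.
    val inDeg = graph.aggregateMessages[Int](ctx => ctx.sendToDst(1), _ + _)

    // Back to the data-parallel world: the result is again an RDD, joinable at will.
    inDeg.join(users).collect().foreach { case (id, (deg, name)) =>
      println(s"$name ($id) has in-degree $deg")
    }
    sc.stop()
  }
}
```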

    Optimizing sequences traversal and extensibility

    Get PDF
    Dissertation for obtaining the Master's degree in Informatics and Computer Engineering. Yield generators are a well-known programming feature, available in most widely used programming environments such as JavaScript, Python, and many others. They allow easy and compact extensibility of stream operations, such as on iterators or enumerable types. Yet, two questions arise about their use: 1) are generators the most efficient choice for extending sequences with new user-defined operations? 2) What if the development programming language does not provide the yield feature, as in Java? The research work I describe in this dissertation aims to answer these two questions. To that end, I analyzed the sequence-type designs of two different programming languages, Java and JavaScript. I also studied the state-of-the-art alternatives to the out-of-the-box sequences included in each language across a set of features, devising benchmarks to analyze their performance on real-world use cases, available for developers to consult when choosing a sequence type according to their needs. Beyond that, I propose my own sequence type, based on a minimalist design that allows both concise extension of its API and fluent chaining of user-defined operations. My proposal aims to be as simple and transparent as possible, so that developers can clearly understand what they are using. Finally, I answer the question "When should you use parallelism?" with a set of benchmarks that compare Java Streams' sequential processing with its parallel counterpart.
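
    To make the extensibility question concrete, here is a minimal push-style sketch in Scala (my illustration, not the dissertation's actual design): the sequence is reduced to a single traverse method, so a user-defined operation chains as fluently as the built-ins, with no yield generator required.

```scala
// Hedged sketch: a sequence as one abstract traversal function. New
// operations compose by wrapping traverse.
trait Tiny[A] { self =>
  def traverse(k: A => Unit): Unit                 // push every element to k

  def map[B](f: A => B): Tiny[B] = k => self.traverse(a => k(f(a)))
  def filter(p: A => Boolean): Tiny[A] = k => self.traverse(a => if (p(a)) k(a))

  // A user-defined operation is written the same way as the built-ins.
  def evenIndexed: Tiny[A] = k => {
    var i = 0                                      // per-traversal counter
    self.traverse { a => if (i % 2 == 0) k(a); i += 1 }
  }
}

object Tiny {
  def of[A](as: A*): Tiny[A] = k => as.foreach(k)
}

// Usage: fluent chaining of built-in and user-defined operations.
// Tiny.of(1, 2, 3, 4, 5).evenIndexed.map(_ * 10).traverse(println)  // 10, 30, 50
```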

    Subtyping with Generics: A Unified Approach

    Get PDF
    Reusable software increases programmers' productivity and reduces repetitive code and software bugs. Variance is a key programming language mechanism for writing reusable software. Variance is concerned with the interplay of parametric polymorphism (i.e., templates, generics) and subtype (inclusion) polymorphism. Parametric polymorphism enables programmers to write abstract types and is known to enhance the readability, maintainability, and reliability of programs. Subtyping promotes software reuse by allowing code to be applied to a larger set of terms. Integrating parametric and subtype polymorphism while maintaining type safety is a difficult problem. Existing variance mechanisms enable greater subtyping between parametric types, but they suffer from severe deficiencies: they are unable to express several common type abstractions; they can cause a proliferation of types and redundant code; and they are difficult for programmers to use due to their inherent complexity. This dissertation aims to improve variance mechanisms in programming languages supporting parametric polymorphism. To address the shortcomings of current mechanisms, I combine two popular approaches, definition-site variance and use-site variance, in a single programming language. I have developed formal languages, or calculi, for reasoning about variance. The calculi are example languages supporting both definition-site and use-site variance. They enable stating precise properties that can be proved rigorously. The VarLang calculus demonstrates fundamental issues in variance from a language-neutral perspective. The VarJ calculus illustrates realistic complications by modeling a mainstream programming language, Java. VarJ supports not only both use-site and definition-site variance but also language features that interact with variance in complex ways, such as F-bounded polymorphism and wildcard capture. A mapping from Java to VarLang was implemented in software that infers definition-site variance for Java. Large, standard Java libraries (e.g., Oracle's JDK 1.6) were analyzed using the software to compute metrics measuring the benefits of adding definition-site variance to Java, which supports only use-site variance. Applying this technique to six Java generic libraries shows that 21-47% (depending on the library) of generic definitions are inferred to have single variance; 7-29% of method signatures can be relaxed through this inference; and up to 100% of existing wildcard annotations are unnecessary and can be elided. Although the VarJ calculus proposes how to extend Java with definition-site variance, no mainstream language currently supports both definition-site and use-site variance. To assist programmers in utilizing both notions with existing technology, I developed a refactoring tool that refactors Java code by inferring definition-site variance and adding wildcard annotations. This tool is practical and immediately applicable: it assumes no changes to the Java type system, while taking into account all its intricacies. The system allows users to select declarations (variables, method parameters, return types, etc.) to generalize, and considers declarations not declared in available source code. I evaluated our technique on six Java generic libraries and found that 34% of available declarations of variant type signatures can be generalized, i.e., relaxed with more general wildcard types. On average, 146 other declarations need to be updated when a declaration is generalized, showing that this refactoring would be too tedious and error-prone to perform manually. The result of applying this refactoring is a more general interface that supports greater software reuse.
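
    Scala is a convenient vehicle for seeing the two notions side by side, since unlike Java it already supports both. A minimal sketch (my illustration, not the dissertation's calculi):

```scala
// Hedged sketch of the two variance notions the dissertation unifies.
class Animal; class Cat extends Animal

// Definition-site variance: declared once, on the type definition.
class Source[+T](val value: T)               // Source[Cat] <: Source[Animal]
class Sink[-T] { def put(t: T): Unit = () }  // Sink[Animal] <: Sink[Cat]

// Use-site variance: declared at each use, as with Java's wildcards.
class Box[T](var value: T)                   // invariant by definition
def readAll(boxes: List[Box[? <: Animal]]): Unit =  // Scala 3 syntax; `_ <:` in Scala 2
  boxes.foreach(b => println(b.value))

// Definition-site variance makes the subtyping automatic at every use:
val cats: Source[Cat] = new Source(new Cat)
val animals: Source[Animal] = cats           // accepted with no annotation
```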

    The Family of MapReduce and Large Scale Data Processing Systems

    Full text link
    In the last two decades, the continuous increase of computational power has produced an overwhelming flow of data, which has called for a paradigm shift in computing architectures and large-scale data processing mechanisms. MapReduce is a simple and powerful programming model that enables the easy development of scalable parallel applications that process vast amounts of data on large clusters of commodity machines. It isolates the application from the details of running a distributed program, such as data distribution, scheduling, and fault tolerance. However, the original implementation of the MapReduce framework had some limitations that have been tackled by many research efforts in several follow-up works since its introduction. This article provides a comprehensive survey of a family of approaches and mechanisms for large-scale data processing that have been implemented based on the original idea of the MapReduce framework and are currently gaining a lot of momentum in both the research and industrial communities. We also cover a set of systems that have been introduced to provide declarative programming interfaces on top of the MapReduce framework. In addition, we review several large-scale data processing systems that resemble some of the ideas of the MapReduce framework for different purposes and application scenarios. Finally, we discuss some future research directions for implementing the next generation of MapReduce-like solutions.
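
    The model's appeal is how little the user writes. A toy in-memory Scala sketch of the programming model itself (not Hadoop's API): user code supplies only the map and reduce functions, and the framework owns the grouping ("shuffle") between the two phases.

```scala
// Hedged sketch of the MapReduce contract in a few lines.
def mapReduce[K, V, K2, V2](
    input: Seq[(K, V)],
    mapper: (K, V) => Seq[(K2, V2)],
    reducer: (K2, Seq[V2]) => V2): Map[K2, V2] =
  input
    .flatMap { case (k, v) => mapper(k, v) }  // map phase
    .groupBy(_._1)                            // shuffle: group by key
    .map { case (k2, kvs) => k2 -> reducer(k2, kvs.map(_._2)) }  // reduce phase

// Word count, the canonical example:
val docs = Seq("d1" -> "to be or not to be", "d2" -> "to do")
val counts = mapReduce[String, String, String, Int](
  docs,
  (_, text) => text.split(" ").toSeq.map(_ -> 1),
  (_, ones) => ones.sum)
// counts: Map(to -> 3, be -> 2, or -> 1, not -> 1, do -> 1)
```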

    Implementation of an agents based system for anonymous medical database retrieval

    Get PDF
    The aim of this thesis is to describe in detail the development and implementation of a Dynamic Health Data Aggregation System for research purposes within the framework of ISyPeM2. The motivation of this work responds to the need for simple, efficient, and secure systems to perform medical research using sensitive patient data, which for the time being is hindered by several requirements, such as previously established agreements between institutions, heterogeneous databases, and privacy issues. This thesis documents the implementation of a system that addresses these issues. To implement the system, we used the principles of P2P networks for the discovery of data sources (the nodes in the system), the functionality of a multi-agent system for coordination between data sources, and anonymization techniques to preserve patient privacy. The thesis first examines the motivation and application for such systems, as well as the concepts needed to understand the components that make up the system; it then presents the state of the art for similar or related systems and technologies being developed for the same purpose. Afterwards, the components of the implementation are detailed, followed by an in-depth explanation of how the system works and the components it comprises, along with other implementation details and a description of the development environment. The final chapters present the results of the tests performed and the conclusions of this work; the appendices contain a description of the datasets used for the performance tests, as well as the last revision of the implementation code.
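
    For flavor, a hedged Scala sketch of an anonymization step of this kind (hypothetical field names, not the thesis's code): direct identifiers are pseudonymized with a salted hash and quasi-identifiers are coarsened before records leave a data-source node.

```scala
import java.security.MessageDigest

final case class Record(patientId: String, age: Int, zip: String, diagnosis: String)
final case class AnonRecord(pseudonym: String, ageBand: String, zipPrefix: String, diagnosis: String)

def anonymize(r: Record, salt: String): AnonRecord = {
  // Salted SHA-256 pseudonym: stable across queries, not reversible to the ID.
  val digest = MessageDigest.getInstance("SHA-256")
    .digest((salt + r.patientId).getBytes("UTF-8"))
  AnonRecord(
    pseudonym = digest.map("%02x".format(_)).mkString.take(16),
    ageBand   = s"${(r.age / 10) * 10}-${(r.age / 10) * 10 + 9}",  // e.g. 40-49
    zipPrefix = r.zip.take(3),                                     // drop fine-grained location
    diagnosis = r.diagnosis)
}
```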

    Adding Reference Immutability to Scala

    Get PDF
    Scala is a multi-paradigm programming language combining the power of functional and object-oriented programming. While Scala has many features promoting immutability, it lacks a built-in mechanism for controlling and enforcing reference immutability. Reference immutability means that the state of an object, and of all other objects reachable from it, cannot be mutated through an immutable reference. This thesis presents a system for reference immutability in Scala, along with a simple implementation in the Dotty (Scala 3) compiler. By extending the Scala type system and encoding mutability as types within annotations, my system enables tracking and enforcing reference immutability for any type. It addresses challenges such as the complexity of the Scala type system and context sensitivity with nested classes and functions. The design offers binary compatibility with existing Scala code and promotes predictable object behavior, reducing the risk of bugs in software development.
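
    To illustrate the property being enforced (a hedged sketch with a hypothetical annotation spelling; the thesis's concrete syntax may differ): through an immutable reference, nothing transitively reachable may be mutated.

```scala
// Hedged sketch: what a reference-immutability checker would reject and accept.
class Account(var balance: Int)
class Customer(val account: Account, var name: String)

// Suppose `c` were typed readonly, e.g. via an annotation such as @readonly
// (hypothetical name; the thesis encodes mutability as types inside annotations).
def audit(c: Customer /* @readonly */): Int = {
  // c.name = "anonymous"    // would be rejected: mutation through a readonly reference
  // c.account.balance = 0   // would be rejected: mutation of transitively reachable state
  c.account.balance          // reading through the reference is allowed
}
```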

    Transforming OCL to PVS: Using Theorem Proving Support for Analysing Model Constraints

    Get PDF
    The Unified Modelling Language (UML) is a de facto standard language for describing software systems. UML models are often supplemented with Object Constraint Language (OCL) constraints to capture detailed properties of components and systems. Sophisticated tools exist for analysing UML models, e.g., to check that well-formedness rules have been satisfied. Tools are also becoming available to analyse and reason about OCL constraints. Previous work has analysed OCL constraints by translating them to formal languages and then analysing the translated constraints with tools such as theorem provers. This project contributes a transformation from OCL to the specification language of the Prototype Verification System (PVS). PVS can be used to analyse and reason about translated OCL constraints. A particular novelty of this project is that it carries out the transformation from OCL to PVS using model transformation, as exemplified by the OMG's Model-Driven Architecture. The project implements and automates the model transformations from OCL to PVS using the Epsilon Transformation Language (ETL) and tests the results using the Epsilon Comparison Language (ECL).
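
    The flavor of such a mapping can be sketched in a few lines. A toy OCL fragment and pretty-printer in Scala (my own illustration, not the project's ETL rules):

```scala
// Hedged sketch: a tiny OCL-like AST rendered as a PVS proof obligation.
sealed trait Ocl
case class Attr(obj: String, name: String) extends Ocl
case class IntLit(v: Int) extends Ocl
case class Gt(l: Ocl, r: Ocl) extends Ocl
case class Invariant(context: String, body: Ocl) extends Ocl

def toPvs(e: Ocl): String = e match {
  case Attr(o, n)      => s"$n($o)"          // attribute access as a PVS accessor
  case IntLit(v)       => v.toString
  case Gt(l, r)        => s"${toPvs(l)} > ${toPvs(r)}"
  case Invariant(c, b) => s"${c}_inv: THEOREM FORALL (self: $c): ${toPvs(b)}"
}

// OCL: context Account inv: self.balance > 0
// PVS: Account_inv: THEOREM FORALL (self: Account): balance(self) > 0
println(toPvs(Invariant("Account", Gt(Attr("self", "balance"), IntLit(0)))))
```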