Bridging the gap between design and implementation of components libraries
Object-oriented design is usually driven by three main reusability principles:
step-by-step design, design for reuse and design with reuse. However, these
principles are only partially applied to the subsequent object-oriented
implementation, often due to efficiency constraints, creating a gap between
design and implementation. In this paper we provide a solution for bridging
this gap in a concrete setting: the design and implementation of container-like
component libraries, such as STL or the Booch Components. Our approach is based
on a new design pattern together with its corresponding implementation. The
proposal upholds the same principles that drive the design process:
step-by-step implementation (adding just what is needed at every step),
implementation with reuse (component implementations are reused while the
library implementation progresses and component hierarchies grow) and
implementation for reuse (intermediate component implementations can be reused
at many different points of the hierarchy). We apply our approach in two
different ways: to build a brand-new container-like component library, and to
reengineer an existing one, the Booch Components in Ada95.
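The three implementation principles above can be made concrete with a small sketch. The Java setting and all class names here are illustrative choices of this summary, not taken from the paper: a minimal core component is written first (step-by-step implementation), and a later layer of the hierarchy reuses it while adding only what its own step needs (implementation with and for reuse).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal core: only what a stack needs (step-by-step implementation).
class StackCore<T> {
    protected final List<T> items = new ArrayList<>();
    public void push(T x) { items.add(x); }
    public T pop() { return items.remove(items.size() - 1); }
    public boolean isEmpty() { return items.isEmpty(); }
}

// Implementation with reuse: the deque layer adds only the front-end operations,
// reusing StackCore's storage and back-end operations as the hierarchy grows.
// The intermediate StackCore remains reusable at other points of the hierarchy.
class DequeLayer<T> extends StackCore<T> {
    public void pushFront(T x) { items.add(0, x); }
    public T popFront() { return items.remove(0); }
}

public class Containers {
    public static void main(String[] args) {
        DequeLayer<Integer> d = new DequeLayer<>();
        d.push(1);        // inherited, reused from the core
        d.pushFront(2);   // added in this step of the hierarchy
        System.out.println(d.popFront()); // 2
        System.out.println(d.pop());      // 1
    }
}
```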
GraphX: Unifying Data-Parallel and Graph-Parallel Analytics
From social networks to language modeling, the growing scale and importance
of graph data have driven the development of numerous new graph-parallel systems
(e.g., Pregel, GraphLab). By restricting the computation that can be expressed
and introducing new techniques to partition and distribute the graph, these
systems can efficiently execute iterative graph algorithms orders of magnitude
faster than more general data-parallel systems. However, the same restrictions
that enable the performance gains also make it difficult to express many of the
important stages in a typical graph-analytics pipeline: constructing the graph,
modifying its structure, or expressing computation that spans multiple graphs.
As a consequence, existing graph analytics pipelines compose graph-parallel and
data-parallel systems using external storage systems, leading to extensive data
movement and a complicated programming model.
To address these challenges we introduce GraphX, a distributed graph
computation framework that unifies graph-parallel and data-parallel
computation. GraphX provides a small, core set of graph-parallel operators
expressive enough to implement the Pregel and PowerGraph abstractions, yet
simple enough to be cast in relational algebra. GraphX uses a collection of
query optimization techniques such as automatic join rewrites to efficiently
implement these graph-parallel operators. We evaluate GraphX on real-world
graphs and workloads and demonstrate that GraphX achieves performance
comparable to specialized graph computation systems, while outperforming them
in end-to-end graph pipelines. Moreover, GraphX achieves a balance between
expressiveness, performance, and ease of use.
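The claim that graph-parallel operators can be cast in relational algebra can be illustrated with a toy, single-machine sketch. Nothing here is GraphX's actual API; the names and structure are this summary's own. One superstep of a Pregel-style connected-components computation is a join of the edge table with the vertex table (producing one message per edge), followed by a group-by aggregation of the messages per destination vertex.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class JoinSuperstep {
    // Edge table as (src, dst) pairs; vertex table as vertex id -> component label.
    public record Edge(int src, int dst) {}

    // One connected-components superstep: join edges with vertex labels on src
    // (each match is a "message"), then aggregate messages per dst with min.
    public static Map<Integer, Integer> superstep(List<Edge> edges,
                                                  Map<Integer, Integer> labels) {
        Map<Integer, Integer> next = new HashMap<>(labels);
        for (Edge e : edges) {                    // join: edges ⋈ vertices on src
            int msg = labels.get(e.src());        // message = neighbour's label
            next.merge(e.dst(), msg, Math::min);  // group by dst, aggregate with min
        }
        return next;
    }

    public static void main(String[] args) {
        List<Edge> edges = List.of(new Edge(1, 2), new Edge(2, 3));
        Map<Integer, Integer> labels = Map.of(1, 1, 2, 2, 3, 3);
        // After one superstep, vertex 2 adopts label 1 and vertex 3 adopts label 2;
        // iterating to a fixed point would label the whole chain with 1.
        System.out.println(superstep(edges, labels));
    }
}
```

Expressing the superstep as join plus aggregation is what lets a relational optimizer apply techniques such as automatic join rewrites to it.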
Optimizing sequences traversal and extensibility
Dissertation for obtaining the degree of Master in Informatics and Computer Engineering.
Yield generators are a well-known programming feature, available in the most widely used programming environments such as JavaScript, Python and many others. They allow easy and compact extensibility of stream operations, such as on iterators or enumerable types. Yet, two questions arise about their use: 1) Are generators the most efficient choice to extend sequences with new user-defined operations? 2) What if the programming language does not provide the yield feature, as in Java? The research work that I describe in this dissertation aims to answer these two questions. To that end, I analyzed the sequence-type designs of two different programming languages, namely Java and JavaScript. I also studied the state-of-the-art alternatives to the out-of-the-box sequences included in each language across a set of features, devising benchmarks to analyze their performance on real-world use cases, made available so that developers can consult them when choosing a sequence type according to their needs. Beyond that, I propose my own sequence type, based on a minimalist design that allows both concise extension of its API and fluent chaining of user-defined operations. My proposal aims to be as simple and transparent as possible, so that developers may clearly understand what they are using.
Finally, I answer the question "When should you use parallelism?" with a set of benchmarks that compare Java Streams' sequential processing with its parallel counterpart.
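The dissertation's second question, what to do when the language lacks yield as Java does, can be sketched in a few lines (the class and operation here are illustrative, not the dissertation's library). Where a generator would simply `yield` each matching element, Java forces the programmer to hand-write the state machine that `yield` would generate, typically as an Iterator with lookahead buffering.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// A user-defined lazy sequence operation written without yield: keep even numbers.
public class EvensOnly implements Iterable<Integer> {
    private final Iterable<Integer> source;
    public EvensOnly(Iterable<Integer> source) { this.source = source; }

    public Iterator<Integer> iterator() {
        Iterator<Integer> it = source.iterator();
        return new Iterator<>() {
            Integer buffered;   // hand-written lookahead replacing yield's state machine
            public boolean hasNext() {
                while (buffered == null && it.hasNext()) {
                    Integer candidate = it.next();
                    if (candidate % 2 == 0) buffered = candidate;
                }
                return buffered != null;
            }
            public Integer next() {
                if (!hasNext()) throw new NoSuchElementException();
                Integer out = buffered;
                buffered = null;
                return out;
            }
        };
    }

    public static void main(String[] args) {
        for (int x : new EvensOnly(List.of(1, 2, 3, 4)))
            System.out.println(x); // prints 2 then 4
    }
}
```

In a language with generators the whole iterator collapses to a loop containing a single conditional `yield`, which is exactly the extensibility gap the benchmarks quantify.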
Subtyping with Generics: A Unified Approach
Reusable software increases programmers' productivity and reduces repetitive code and software bugs. Variance is a key programming language mechanism for writing reusable software. Variance is concerned with the interplay of parametric polymorphism (i.e., templates, generics) and subtype (inclusion) polymorphism. Parametric polymorphism enables programmers to write abstract types and is known to enhance the readability, maintainability, and reliability of programs. Subtyping promotes software reuse by allowing code to be applied to a larger set of terms. Integrating parametric and subtype polymorphism while maintaining type safety is a difficult problem. Existing variance mechanisms enable greater subtyping between parametric types, but they suffer from severe deficiencies: they are unable to express several common type abstractions; they can cause a proliferation of types and redundant code; and they are difficult for programmers to use due to their inherent complexity. This dissertation aims to improve variance mechanisms in programming languages supporting parametric polymorphism. To address the shortcomings of current mechanisms, I combine two popular approaches, definition-site variance and use-site variance, in a single programming language. I have developed formal languages, or calculi, for reasoning about variance. The calculi are example languages supporting both notions of definition-site and use-site variance. They enable stating precise properties that can be proved rigorously. The VarLang calculus demonstrates fundamental issues in variance from a language-neutral perspective. The VarJ calculus illustrates realistic complications by modeling a mainstream programming language, Java. VarJ not only supports both notions of use-site and definition-site variance but also language features with complex interactions with variance, such as F-bounded polymorphism and wildcard capture.
A mapping from Java to VarLang was implemented in software that infers definition-site variance for Java. Large, standard Java libraries (e.g. Oracle's JDK 1.6) were analyzed using the software to compute metrics measuring the benefits of adding definition-site variance to Java, which only supports use-site variance. Applying this technique to six Java generic libraries shows that 21-47% (depending on the library) of generic definitions are inferred to have single-variance; 7-29% of method signatures can be relaxed through this inference, and up to 100% of existing wildcard annotations are unnecessary and can be elided. Although the VarJ calculus proposes how to extend Java with definition-site variance, no mainstream language currently supports both definition-site and use-site variance. To assist programmers with utilizing both notions with existing technology, I developed a refactoring tool that refactors Java code by inferring definition-site variance and adding wildcard annotations. This tool is practical and immediately applicable: it assumes no changes to the Java type system, while taking into account all its intricacies. The system allows users to select declarations (variables, method parameters, return types, etc.) to generalize and considers declarations not declared in available source code. I evaluated the technique on six Java generic libraries and found that 34% of available declarations of variant type signatures can be generalized, i.e., relaxed with more general wildcard types. On average, 146 other declarations need to be updated when a declaration is generalized, showing that this refactoring would be too tedious and error-prone to perform manually. The result of applying this refactoring is a more general interface that supports greater software reuse.
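The kind of generalization the refactoring performs can be shown with plain Java wildcards (the example itself is this summary's, not the dissertation's). A signature that is invariant in its type parameter rejects perfectly safe callers; relaxing it with a use-site variance annotation admits them without any change to the method body.

```java
import java.util.List;

public class Variance {
    // Invariant signature: only exactly List<Number> is accepted,
    // even though the method never writes to the list.
    static double sumExact(List<Number> xs) {
        double s = 0;
        for (Number n : xs) s += n.doubleValue();
        return s;
    }

    // Generalized (use-site covariant) signature: the method only reads from xs,
    // so `? extends Number` safely admits List<Integer>, List<Double>, and so on.
    public static double sumRelaxed(List<? extends Number> xs) {
        double s = 0;
        for (Number n : xs) s += n.doubleValue();
        return s;
    }

    public static void main(String[] args) {
        List<Integer> ints = List.of(1, 2, 3);
        // sumExact(ints);  // does not compile: List<Integer> is not a List<Number>
        System.out.println(sumRelaxed(ints)); // 6.0
    }
}
```

Inferring that `sumExact` is covariant in its parameter, and rewriting its signature to the relaxed form, is exactly the mechanical-but-tedious edit the tool automates; the dissertation's figure of 146 dependent declarations per generalization shows why doing this by hand does not scale.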
The Family of MapReduce and Large Scale Data Processing Systems
In the last two decades, the continuous increase of computational power has
produced an overwhelming flow of data which has called for a paradigm shift in
the computing architecture and large scale data processing mechanisms.
MapReduce is a simple and powerful programming model that enables easy
development of scalable parallel applications to process vast amounts of data
on large clusters of commodity machines. It isolates the application from the
details of running a distributed program such as issues on data distribution,
scheduling and fault tolerance. However, the original implementation of the
MapReduce framework had some limitations that have been tackled by many
research efforts in several follow-up works since its introduction. This
article provides a comprehensive survey of a family of approaches and
mechanisms for large-scale data processing that have been implemented based on
the original idea of the MapReduce framework and are currently gaining a lot of
momentum in both the research and industrial communities. We also cover a set
of systems that have been implemented to provide declarative
programming interfaces on top of the MapReduce framework. In addition, we
review several large scale data processing systems that resemble some of the
ideas of the MapReduce framework for different purposes and application
scenarios. Finally, we discuss some of the future research directions for
implementing the next generation of MapReduce-like solutions.
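The programming model the survey builds on can be summarized in a few lines. This is a toy, single-machine sketch, not any particular framework's API: a map phase emits key-value pairs from each input record, the framework groups all pairs by key, and a reduce phase aggregates each key's values. Everything else (partitioning, scheduling, fault tolerance) is the distributed machinery the model hides.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.stream.Collectors;

public class WordCount {
    // Toy MapReduce word count: map emits one record per word, the "shuffle"
    // groups records by key, and reduce counts each key's group. Real systems
    // run these phases across a cluster of commodity machines.
    public static Map<String, Long> wordCount(String text) {
        return Arrays.stream(text.split("\\s+"))      // map: one record per word
                .collect(Collectors.groupingBy(        // shuffle: group by key
                        w -> w,
                        Collectors.counting()));       // reduce: aggregate per key
    }

    public static void main(String[] args) {
        System.out.println(wordCount("a b a")); // {a=2, b=1}
    }
}
```

The declarative layers the survey covers (Pig, Hive and similar systems) let users write queries at roughly this level of abstraction and compile them down to chains of such map and reduce phases.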
Implementation of an agents based system for anonymous medical database retrieval
The aim of this thesis is to describe and develop in detail the implementation of a Dynamic Health
Data Aggregation System for research purposes in the framework of ISyPeM2.
The motivation of this work responds to the need for simple, efficient and secure systems for
performing medical research using sensitive patient data, which is currently hindered by several
obstacles such as the need for previously established agreements between institutions,
heterogeneous databases and privacy issues. This thesis aims to document the implementation of a
system that addresses these issues.
To implement the system, we used the principles of P2P networks for discovery of data sources
(the nodes in the system), the functionality of a Multi-Agent System for coordination between data
sources and the use of anonymization techniques to preserve the patient’s privacy.
The thesis first examines the motivation for and applications of such systems, as well as the
concepts needed to understand the components that make up the system, and then proceeds to
present the state of the art for similar or related systems and technologies being developed for
the same purpose. Afterwards, the components of the implementation are detailed, followed by an
in-depth explanation of how the system works and the components of which it is composed, along
with other implementation details and a description of the development environment.
In the final chapters, the results from the tests performed and the conclusions of this work will be
presented, and in the appendices, there is a description of the datasets used for the performance
tests, as well as the final revision of the implementation code.
Adding Reference Immutability to Scala
Scala is a multi-paradigm programming language combining the power of functional and object-oriented programming. While Scala has many features promoting immutability, it lacks a built-in mechanism for controlling and enforcing reference immutability. Reference immutability means the state of an object, and of all other objects reachable from it, cannot be mutated through an immutable reference. This thesis presents a system for reference immutability in Scala, along with a simple implementation in the Dotty (Scala 3) compiler. By extending the Scala type system and encoding mutability as types within annotations, my system enables tracking and enforcing reference immutability for any type. It addresses challenges such as the complexities of the Scala type system and context sensitivity with nested classes and functions. The design offers binary compatibility with existing Scala code, and promotes predictable object behavior, reducing the risk of bugs in software development.
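The transitive property the thesis enforces can be contrasted with what the JVM's standard libraries offer today (the example is this summary's, in Java for concreteness): `Collections.unmodifiableList` blocks mutation through the wrapper itself, but not mutation of objects reachable from it, which is precisely what reference immutability forbids, and at compile time rather than at run time.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ShallowImmutability {
    // Mutates an element's content through an "unmodifiable" view, showing the
    // wrapper is shallow: it protects the list's structure, not reachable state.
    public static String mutateThroughView() {
        List<StringBuilder> inner = new ArrayList<>();
        inner.add(new StringBuilder("a"));
        List<StringBuilder> view = Collections.unmodifiableList(inner);

        try {
            view.add(new StringBuilder("b"));   // structural mutation: rejected
        } catch (UnsupportedOperationException e) {
            // expected: the wrapper guards the list's own structure at run time
        }

        view.get(0).append("!");                // reachable state: silently mutated
        return view.get(0).toString();
    }

    public static void main(String[] args) {
        // A reference-immutability type system would reject the append above at
        // compile time; the runtime wrapper lets it through.
        System.out.println(mutateThroughView()); // a!
    }
}
```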
Transforming OCL to PVS: Using Theorem Proving Support for Analysing Model Constraints
The Unified Modelling Language (UML) is a de facto standard language for describing
software systems. UML models are often supplemented with Object Constraint
Language (OCL) constraints, to capture detailed properties of components and systems.
Sophisticated tools exist for analysing UML models, e.g., to check that well-formedness
rules have been satisfied. In addition, tools are becoming available to analyse and reason
about OCL constraints. Previous work has been done on analysing OCL constraints by
translating them to formal languages and then analysing the translated constraints with
tools such as theorem provers.
This project contributes a transformation from OCL to the specification language of the
Prototype Verification System (PVS). PVS can be used to analyse and reason about
translated OCL constraints. A particular novelty of this project is that it carries out the
transformation of OCL to PVS by using model transformation, as exemplified by the
OMG's Model-Driven Architecture. The project implements and automates model
transformations from OCL to PVS using the Epsilon Transformation Language (ETL)
and tests the results using the Epsilon Comparison Language (ECL).