965 research outputs found

    DataHub: Collaborative Data Science & Dataset Version Management at Scale

    Get PDF
    Relational databases have limited support for data collaboration, where teams collaboratively curate and analyze large datasets. Inspired by software version control systems like git, we propose (a) a dataset version control system, giving users the ability to create, branch, merge, difference and search large, divergent collections of datasets, and (b) a platform, DataHub, that gives users the ability to perform collaborative data analysis building on this version control system. We outline the challenges in providing dataset version control at scale.Comment: 7 page

    A Configuration Management System for Software Product Lines

    Get PDF
    Software product line engineering (SPLE) is a methodology for developing a family of software products in a particular domain by systematic reuse of shared code in order to improve product quality and reduce development time and cost. Currently, there are no software configuration management (SCM) tools that support software product line evolution. Conventional SCM tools are designed to support single product development. The use of conventional SCM tools forces developers to treat a software product line as a single software project by introducing new programming language constructs or using conditional compilation. We propose a research conguration management prototype called Molhado SPL that is designed specifically to support the evolution of software product lines. Molhado SPL addresses the evolution problem at the configuration level instead of at the code level. We studied the type of operations needed to support the evolution of software product lines and proposed a versioning model and eight cases of change propagation. Molhado SPL supports independent evolution of core assets and products, the sharing of code and the tracking relationships between products and shared code, and the eight cases of change propagation. The Molhado SPL consists of four layers with each layer providing a different type of service. At the heart of Molhado SPL are the versioning model, component object, shared component object, and project objects that allow for independent evolution of products and shared artifacts, for sharing, and for supporting change propagation. Furthermore,they allow product specific changes to shared code without interfering with the core asset that is shared. Products can also introduce product specific assets that only exist in that product. In order to for Molhado SPL to support product line, we implemented XML merging, feature model editing and debugging, and version-aware XML documents. To support merging of XML documents, we implemented a 3-way XML document merging algorithm that uses versioned data structures, change detection, and node identity. To support software product line derivation or modeling of software product line, we implemented support for feature model including editing and debugging. Finally, we created the version-aware XML document framework to support collaborative editing of XML documents without requiring a version repository. The version history is embedded in the documents using XML namespaces, so that the documents remain valid under the XML specification. The version-aware XML framework can also be used to support the exporting of documents from Molhado SPL repository to be edit outside and import back the change history made to the document. We evaluated Molhado SPL with two product lines: a document product line and a the graph data structures product line. This evaluation showed that Molhado SPL supports independently evolution of products and core assets and the eight change propagation cases. We did not evaluate MolhadoSPL in terms of scalability or usability. The main contributions of this dissertation research are: 1) Molhado SPL that supports the evolution of product lines, 2) a fast 3-way XML merge algorithm, 3) a version-aware XML document framework, and 4) a feature model editor and debugger

    Decibel: the relational dataset branching system

    Get PDF
    As scientific endeavors and data analysis become increasingly collaborative, there is a need for data management systems that natively support the versioning or branching of datasets to enable concurrent analysis, cleaning, integration, manipulation, or curation of data across teams of individuals. Common practice for sharing and collaborating on datasets involves creating or storing multiple copies of the dataset, one for each stage of analysis, with no provenance information tracking the relationships between these datasets. This results not only in wasted storage, but also makes it challenging to track and integrate modifications made by different users to the same dataset. In this paper, we introduce the Relational Dataset Branching System, Decibel, a new relational storage system with built-in version control designed to address these short-comings. We present our initial design for Decibel and provide a thorough evaluation of three versioned storage engine designs that focus on efficient query processing with minimal storage overhead. We also develop an exhaustive benchmark to enable the rigorous testing of these and future versioned storage engine designs.National Science Foundation (U.S.) (1513972)National Science Foundation (U.S.) (1513407)National Science Foundation (U.S.) (1513443)Intel Science and Technology Center for Big Dat

    User Interfaces and Difference Visualizations for Alternatives

    Get PDF
    Designers often create multiple iterations to evaluate alternatives. Todays computer-based tools do not support such easy exploration of a design space, despite the fact that such support has been advocated. This dissertation is centered on this. I begin by investigating the effectiveness of various forms of difference visualizations and support for merging changes within a system targeted at diagrams with node and edge attributes. I evaluated the benefits of the introduced difference visualization techniques in two user studies. I found that the basic side-by-side juxtaposition visualization was not effective and also not well received. For comparing diagrams with matching node positions, participants preferred the side-by-side option with a difference layer. For diagrams with non-matching positions animation was beneficial, but the combination with a difference layer was preferred. Thus, the difference layer technique was useful and a good complement to animation. I continue by investigating if explicit support for design alternatives better supports exploration and creativity in a generative design system. To investigate the new techniques to better support exploration, I built a new system that supports parallel exploration of alternative designs and generation of new structural combinations. I investigate the usefulness of my prototype in two user studies and interviews. The results and feedback suggest and confirm that supporting design alternatives explicitly enables designers to work more creatively. Generative models are often represented as DAGs (directed acyclic graphs) in a dataflow programming environment. Existing approaches to compare such DAGs do not generalize to multiple alternatives. Informed by and building on the first part of my dissertation, I introduce a novel user interface that enables visual differencing and editing alternative graphsspecifically more than two alternatives simultaneously, something that has not been presented before. I also explore multi-monitor support to demonstrate that the difference visualization technique scales well to up to 18 alternatives. The novel jamming space feature makes organizing alternatives on a 23 monitor system easier. To investigate the usability of the new difference visualization method I conducted an exploratory interview with three expert designers. The received comments confirmed that it meets their design goals

    Management and Visualisation of Non-linear History of Polygonal 3D Models

    Get PDF
    The research presented in this thesis concerns the problems of maintenance and revision control of large-scale three dimensional (3D) models over the Internet. As the models grow in size and the authoring tools grow in complexity, standard approaches to collaborative asset development become impractical. The prevalent paradigm of sharing files on a file system poses serious risks with regards, but not limited to, ensuring consistency and concurrency of multi-user 3D editing. Although modifications might be tracked manually using naming conventions or automatically in a version control system (VCS), understanding the provenance of a large 3D dataset is hard due to revision metadata not being associated with the underlying scene structures. Some tools and protocols enable seamless synchronisation of file and directory changes in remote locations. However, the existing web-based technologies are not yet fully exploiting the modern design patters for access to and management of alternative shared resources online. Therefore, four distinct but highly interconnected conceptual tools are explored. The first is the organisation of 3D assets within recent document-oriented No Structured Query Language (NoSQL) databases. These "schemaless" databases, unlike their relational counterparts, do not represent data in rigid table structures. Instead, they rely on polymorphic documents composed of key-value pairs that are much better suited to the diverse nature of 3D assets. Hence, a domain-specific non-linear revision control system 3D Repo is built around a NoSQL database to enable asynchronous editing similar to traditional VCSs. The second concept is that of visual 3D differencing and merging. The accompanying 3D Diff tool supports interactive conflict resolution at the level of scene graph nodes that are de facto the delta changes stored in the repository. The third is the utilisation of HyperText Transfer Protocol (HTTP) for the purposes of 3D data management. The XML3DRepo daemon application exposes the contents of the repository and the version control logic in a Representational State Transfer (REST) style of architecture. At the same time, it manifests the effects of various 3D encoding strategies on the file sizes and download times in modern web browsers. The fourth and final concept is the reverse-engineering of an editing history. Even if the models are being version controlled, the extracted provenance is limited to additions, deletions and modifications. The 3D Timeline tool, therefore, implies a plausible history of common modelling operations such as duplications, transformations, etc. Given a collection of 3D models, it estimates a part-based correspondence and visualises it in a temporal flow. The prototype tools developed as part of the research were evaluated in pilot user studies that suggest they are usable by the end users and well suited to their respective tasks. Together, the results constitute a novel framework that demonstrates the feasibility of a domain-specific 3D version control

    Synchronized Architecture Evolution in Software Product Line Using Bidirectional Transformation

    Full text link

    Reengineering legacy software products into software product line

    Get PDF
    Ph.DDOCTOR OF PHILOSOPH

    Handling High-Level Model Changes Using Search Based Software Engineering

    Full text link
    Model-Driven Engineering (MDE) considers models as first-class artifacts during the software lifecycle. The number of available tools, techniques, and approaches for MDE is increasing as its use gains traction in driving quality, and controlling cost in evolution of large software systems. Software models, defined as code abstractions, are iteratively refined, restructured, and evolved. This is due to many reasons such as fixing defects in design, reflecting changes in requirements, and modifying a design to enhance existing features. In this work, we focus on four main problems related to the evolution of software models: 1) the detection of applied model changes, 2) merging parallel evolved models, 3) detection of design defects in merged model, and 4) the recommendation of new changes to fix defects in software models. Regarding the first contribution, a-posteriori multi-objective change detection approach has been proposed for evolved models. The changes are expressed in terms of atomic and composite refactoring operations. The majority of existing approaches detects atomic changes but do not adequately address composite changes which mask atomic operations in intermediate models. For the second contribution, several approaches exist to construct a merged model by incorporating all non-conflicting operations of evolved models. Conflicts arise when the application of one operation disables the applicability of another one. The essence of the problem is to identify and prioritize conflicting operations based on importance and context – a gap in existing approaches. This work proposes a multi-objective formulation of model merging that aims to maximize the number of successfully applied merged operations. For the third and fourth contributions, the majority of existing works focuses on refactoring at source code level, and does not exploit the benefits of software design optimization at model level. However, refactoring at model level is inherently more challenging due to difficulty in assessing the potential impact on structural and behavioral features of the software system. This requires analysis of class and activity diagrams to appraise the overall system quality, feasibility, and inter-diagram consistency. This work focuses on designing, implementing, and evaluating a multi-objective refactoring framework for detection and fixing of design defects in software models.Ph.D.Information Systems Engineering, College of Engineering and Computer ScienceUniversity of Michigan-Dearbornhttp://deepblue.lib.umich.edu/bitstream/2027.42/136077/1/Usman Mansoor Final.pdfDescription of Usman Mansoor Final.pdf : Dissertatio
    corecore