10 research outputs found

    A methodology to take account of diversity in collective adaptive system

    Collective Adaptive Systems (CASs) are composed of a heterogeneous set of components often developed in a distributed manner. Their users are diverse with respect to their profiles, preferences, interests and goals, and hence have different requirements. We propose a typology for the diversity of these components, users, and their requirements. We then present a methodology which provides steps to integrate features that record diversity to support accountability. The foundation of accountability is provided by provenance data and a CAS vocabulary; these knowledge representation languages provide the core vocabulary that can be exploited by agents and services.

    Detached Provenance Analysis

    Data provenance is the research field of the algorithmic derivation of the source and processing history of data. In this work, the derivation of Where- and Why-provenance in sub-cell-level granularity is pursued for a rich SQL dialect. For example, we support the provenance analysis for individual elements of nested rows and/or arrays. The SQL dialect incorporates window functions and correlated subqueries. We accomplish this goal using a novel method called detached provenance analysis. This method carries out a SQL-level rewrite of any user query Q, yielding (Q1, Q2). Employing two queries facilitates a low-invasive provenance analysis, i.e., both queries can be evaluated using an unmodified DBMS as backend. The queries implement a split of responsibilities: Q1 carries out a runtime analysis and Q2 derives the actual data provenance. One drawback of this method is that it induces a synchronization overhead between Q1 and Q2. Experiments quantify the overheads based on the TPC-H benchmark and the PostgreSQL DBMS. A second set of experiments, carried out in row-level granularity, compares our approach with the PERM approach (as described by B. Glavic et al.). The aggregated results show that basic queries (typically, a single SFW expression with aggregations) perform slightly better in the PERM approach, while complex queries (nested SFW expressions and correlated subqueries) perform considerably better in our approach.
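The split of responsibilities between Q1 and Q2 can be illustrated with a loose Python sketch. It is not the SQL rewrite from the thesis: the function names `phase1` and `phase2`, the sample rows, and the decision to record Why-provenance only for rows that pass the filter are all assumptions made for illustration. The first pass stands in for Q1 (it sees real values and logs each value-dependent decision); the second stands in for Q2 (it sees only row identifiers plus the log and derives provenance sets).

```python
def phase1(rows, threshold):
    """Q1 stand-in (hypothetical): evaluate on real values and log decisions."""
    log = []
    total = 0.0
    for row in rows:
        keep = row["qty"] > threshold   # value-dependent decision
        log.append(keep)                # recorded for the second pass
        if keep:
            total += row["price"]
    return total, log


def phase2(row_ids, log):
    """Q2 stand-in (hypothetical): replay the logged decisions without
    reading any data values, collecting provenance sets instead."""
    where = set()   # cells whose values flow into the result
    why = set()     # cells inspected to decide that a row contributes
    for rid, keep in zip(row_ids, log):
        if keep:
            where.add((rid, "price"))
            why.add((rid, "qty"))
    return where, why


# Illustrative data for: SELECT SUM(price) FROM items WHERE qty > 3
rows = [
    {"id": 1, "qty": 5, "price": 10.0},
    {"id": 2, "qty": 1, "price": 20.0},
    {"id": 3, "qty": 7, "price": 2.5},
]

total, log = phase1(rows, threshold=3)
where, why = phase2([r["id"] for r in rows], log)

print("result:", total)
print("where-provenance:", sorted(where))
print("why-provenance:", sorted(why))
```

The log is the only channel between the two passes, which is one way to picture the synchronization overhead the abstract mentions.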

    Big Data Analytics in Static and Streaming Provenance

    Thesis (Ph.D.) - Indiana University, Informatics and Computing, 2016. With recent technological and computational advances, scientists increasingly integrate sensors and model simulations to understand spatial, temporal, social, and ecological relationships at unprecedented scale. Data provenance traces relationships of entities over time, thus providing a unique view on over-time behavior under study. However, provenance can be overwhelming in both volume and complexity; the now forecasting potential of provenance creates additional demands. This dissertation focuses on Big Data analytics of static and streaming provenance. It develops filters and a non-preprocessing slicing technique for in-situ querying of static provenance. It presents a stream processing framework for online processing of provenance data at a high receiving rate. While the former is sufficient for answering queries that are given prior to the application start (forward queries), the latter deals with queries whose targets are unknown beforehand (backward queries). Finally, it explores data mining on large collections of provenance and proposes a temporal representation of provenance that can reduce the high dimensionality while effectively supporting mining tasks like clustering, classification and association rule mining. The temporal representation can also be applied to streaming provenance. The proposed techniques are verified through software prototypes applied to Big Data provenance captured from computer network data, weather models, ocean models, remote (satellite) imagery data, and agent-based simulations of agricultural decision making.

    Curated Databases

    Curated databases are databases that are populated and updated with a great deal of human effort. Most reference works that one traditionally found on the reference shelves of libraries – dictionaries, encyclopedias, gazetteers etc. – are now curated databases. Since it is now easy to publish databases on the web, there has been an explosion in the number of new curated databases used in scientific research. The value of curated databases lies in the organization and the quality of the data they contain. Like the paper reference works they have replaced, they usually represent the efforts of a dedicated group of people to produce a definitive description of some subject area. Curated databases present a number of challenges for database research. The topics of annotation, provenance, and citation are central, because curated databases are heavily cross-referenced with, and include data from, other databases, and much of the work of a curator is annotating existing data. Evolution of structure is important because these databases often evolve from semistructured representations, and because they have to accommodate new scientific discoveries. Much of the work in these areas is in its infancy, but it is beginning to suggest new research for both theory and practice. We discuss some of this research and emphasize the need to find appropriate models of the processes associated with curated databases.

    An Analysis of the Current Program Slicing and Algorithmic Debugging Based Techniques

    This thesis presents a classification of program slicing based techniques. The classification allows us to identify the differences between existing techniques, but it also allows us to predict new slicing techniques. The study identifies and compares the dimensions that influence current techniques. Silva Galiana, JF. (2008). An Analysis of the Current Program Slicing and Algorithmic Debugging Based Techniques. http://hdl.handle.net/10251/14300

    Workflow Provenance: from Modeling to Reporting

    Workflow provenance is a crucial part of a workflow system as it enables data lineage analysis, error tracking, workflow monitoring, usage pattern discovery, and so on. Integrating provenance into a workflow system or modifying a workflow system to capture or analyze different provenance information is burdensome, requiring extensive development because provenance mechanisms rely heavily on the modelling, architecture, and design of the workflow system. Various tools and technologies exist for logging events in a software system. Unfortunately, logging tools and technologies are not designed for capturing and analyzing provenance information. Workflow provenance is not only about logging, but also about retrieving workflow-related information from logs. In this work, we propose a taxonomy of provenance questions and, guided by these questions, we created a workflow programming model 'ProvMod' with a supporting run-time library to provide automated provenance and log analysis for any workflow system. The design and provenance mechanism of ProvMod is based on recommendations from prominent research and is easy to integrate into any workflow system. ProvMod offers Neo4j graph database support to manage semi-structured heterogeneous JSON logs. The log structure is adaptable to any NoSQL technology. For each provenance question in our taxonomy, ProvMod provides the answer with data visualization using Neo4j and the ELK Stack. Besides analyzing performance from various angles, we demonstrate the ease of integration by integrating ProvMod with Apache Taverna and evaluate ProvMod usability by engaging users. Finally, we present two Software Engineering research cases (clone detection and architecture extraction) where our proposed model ProvMod and provenance questions taxonomy can be applied to discover meaningful insights.

    Language-integrated provenance

    Provenance is metadata about the where, the why, and the how of data. It is evidence which can answer questions such as: Where exactly did this piece of data come from? Why is this row in my result? How was it produced? Answers to these questions are useful for judging the trustworthiness of data, and for finding and correcting mistakes. Most programs that use a database at all already use one crude form of provenance: they manually propagate row identifiers together with database values, just in case they need to be updated later. More sophisticated forms of provenance are exceedingly rare, because they are more difficult to implement manually. Tools to calculate data provenance systematically exist only as research prototypes. Even standard database systems are hard to set up, as evidenced by the rise of hosted database services, so there is little surprise that prototypes of provenance systems are not used much. This dissertation shows how a programming language can provide support for provenance. Based on language-integrated query technology, it can systematically rewrite queries to produce various forms of provenance. We describe such query transformations for where-provenance and lineage, and discuss how to enable programmers to define their own forms of provenance. Thanks to query normalization, the resulting queries still execute efficiently on mainstream database systems. A programming language can help further by giving provenance metadata precise types to ensure that it is handled appropriately. Language-integrated queries make it easy to write programs that deal with data, no special query language needed. Language-integrated provenance makes it as easy to deal with data provenance, no special database needed.
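As a rough illustration of where-provenance, the sketch below tags every value with the table, column, and row identifier it was read from and carries that tag through a filter-and-project query. It is plain Python, not the dissertation's language-integrated query rewriting; the `Cell` class, the `load` and `select` helpers, and the sample table are hypothetical names and data chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """A value tagged with its origin: (table, column, row id)."""
    value: object
    table: str
    column: str
    row_id: int


def load(table, rows):
    """Wrap each field in a Cell recording where it came from.
    Assumes every row dict has a unique integer under the key 'id'."""
    return [
        {col: Cell(val, table, col, row["id"]) for col, val in row.items()}
        for row in rows
    ]


def select(rows, predicate, columns):
    """Filter and project; copied cells keep their annotation unchanged."""
    return [{c: r[c] for c in columns} for r in rows if predicate(r)]


# Illustrative data only.
agencies = load("agencies", [
    {"id": 1, "name": "EdinTours", "phone": "412 1200"},
    {"id": 2, "name": "Burns's", "phone": "607 3000"},
])

result = select(agencies, lambda r: r["name"].value == "EdinTours", ["phone"])
cell = result[0]["phone"]
print(cell.value, "came from", (cell.table, cell.column, cell.row_id))
```

Wrapping values in a distinct type also hints at the abstract's point about types: code that wants the raw value has to unwrap the `Cell` explicitly, so the annotation is harder to drop or confuse with data by accident.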

    Language-based Enforcement of User-defined Security Policies (As Applied to Multi-tier Web Programs)

    Over the last 35 years, researchers have proposed many different forms of security policies to control how information is managed by software, e.g., multi-level information flow policies, role-based or history-based access control, data provenance management etc. A large body of work in programming language design and analysis has aimed to ensure that particular kinds of security policies are properly enforced by an application. However, these approaches typically fix the style of security policy and overall security goal, e.g., information flow policies with a goal of noninterference. This limits the programmer's ability to combine policy styles and to apply customized enforcement techniques while still being assured the system is secure. This dissertation presents a series of programming-language calculi each intended to verify the enforcement of a range of user-defined security policies. Rather than "bake in" the semantics of a particular model of security policy, our languages are parameterized by a programmer-provided specification of the policy and enforcement mechanism (in the form of code). Our approach relies on a novel combination of dependent types to correctly associate security policies with the objects they govern, and affine types to account for policy or program operations that include side effects. We have shown that our type systems are expressive enough to verify the enforcement of various forms of access control, provenance, information flow, and automata-based policies. Additionally, our approach facilitates straightforward proofs that programs implementing a particular policy achieve their high-level security goals. We have proved our languages sound and we have proved relevant security properties for each of the policies we have explored. To our knowledge, no prior framework enables the enforcement of such a wide variety of security policies with an equally high level of assurance.
    To evaluate the practicality of our solution, we have implemented one of our type systems as part of the Links web-programming language; we call the resulting language SELinks. We report on our experience using SELinks to build two substantial applications, a wiki and an on-line store, equipped with a combination of access control and provenance policies. In general, we have found the mechanisms SELinks provides to be both sufficient and relatively easy to use for many common policies, and that the modular separation of user-defined policy code permitted some reuse between the two applications.