
    Provenance Threat Modeling

    Provenance systems are used to capture history metadata; applications include ownership attribution and assessing the quality of a particular data set. Provenance systems are also used for debugging, process improvement, understanding data, proof of ownership, certification of validity, and more. The provenance of data includes information about the processes and source data that lead to its current representation. In this paper we study the security risks to which provenance systems might be exposed and recommend security solutions to better protect the provenance information.
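
    A minimal sketch of one such protection (illustrative Python, not a solution proposed in the paper; all names are hypothetical): hash-chaining the provenance records makes the history tamper-evident, since any retroactive edit invalidates every later record.

    import hashlib
    import json

    class ProvenanceLog:
        """Append-only provenance log; each record chains to the previous one."""

        def __init__(self):
            self.records = []

        def append(self, actor, action, inputs, output):
            prev = self.records[-1]["hash"] if self.records else "0" * 64
            body = {"actor": actor, "action": action,
                    "inputs": inputs, "output": output, "prev": prev}
            digest = hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()).hexdigest()
            self.records.append({**body, "hash": digest})

        def verify(self):
            prev = "0" * 64
            for rec in self.records:
                body = {k: rec[k] for k in
                        ("actor", "action", "inputs", "output", "prev")}
                digest = hashlib.sha256(
                    json.dumps(body, sort_keys=True).encode()).hexdigest()
                if rec["prev"] != prev or rec["hash"] != digest:
                    return False
                prev = rec["hash"]
            return True

    log = ProvenanceLog()
    log.append("alice", "clean", ["raw.csv"], "clean.csv")
    log.append("bob", "aggregate", ["clean.csv"], "report.csv")
    print(log.verify())  # True; editing any earlier record breaks the chain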

    Using schema transformation pathways for data lineage tracing

    With the increasing amount and diversity of information available on the Internet, there has been a huge growth in information systems that need to integrate data from distributed, heterogeneous data sources. Tracing the lineage of the integrated data is one of the problems being addressed in data warehousing research. This paper presents a data lineage tracing approach based on schema transformation pathways. Our approach is not limited to one specific data model or query language, and would be useful in any data transformation/integration framework based on sequences of primitive schema transformations.
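
    A minimal sketch of the underlying idea (illustrative Python over rows; the paper's actual formalism works over schema transformations, so this is an analogy, not their method): if each primitive step in a pathway records which input rows every output row came from, lineage tracing becomes a backward walk along the pathway.

    def apply_step(rows, step):
        """Apply one primitive transformation; record per-output-row lineage."""
        out, lineage = [], {}
        for i, row in enumerate(rows):
            for new_row in step(row):
                lineage[len(out)] = [i]   # output row index -> source row indices
                out.append(new_row)
        return out, lineage

    def trace(lineages, output_index):
        """Walk the pathway backwards to find the source rows of one output row."""
        indices = {output_index}
        for lineage in reversed(lineages):
            indices = {src for i in indices for src in lineage[i]}
        return indices

    rows = [{"net": 10}, {"net": -5}, {"net": 7}]
    steps = [
        lambda r: [r] if r["net"] > 0 else [],                  # primitive filter
        lambda r: [{**r, "gross": round(r["net"] * 1.2, 2)}],   # primitive extend
    ]
    data, lineages = rows, []
    for step in steps:
        data, lineage = apply_step(data, step)
        lineages.append(lineage)
    print(trace(lineages, 1))  # {2}: this output row derives from rows[2]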

    Towards Automatic Capturing of Manual Data Processing Provenance

    Often data processing is not implemented by a workflow system or an integration application but is performed manually by humans along the lines of a more or less specified procedure. Collecting provenance information during manual data processing cannot be automated; moreover, collecting provenance information manually is error-prone and time-consuming. Therefore, we propose to infer provenance information based on the read and write accesses of users. The derived provenance information is complete, but has low precision. We therefore further propose introducing organizational guidelines in order to improve the precision of the inferred provenance information.
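
    A minimal sketch of this inference (the event format is an assumption, not the authors' system): treat every artifact a user has read as a potential input of everything that user subsequently writes. The result is complete but, as the abstract notes, over-approximate.

    from collections import defaultdict

    def infer_provenance(events):
        """events: time-ordered (user, op, artifact) tuples, op in {'read', 'write'}."""
        reads = defaultdict(set)          # user -> artifacts read so far
        derived_from = defaultdict(set)   # artifact -> inferred input artifacts
        for user, op, artifact in events:
            if op == "read":
                reads[user].add(artifact)
            elif op == "write":
                derived_from[artifact] |= reads[user]
        return dict(derived_from)

    events = [
        ("ann", "read", "survey.xlsx"),
        ("ann", "read", "codebook.doc"),
        ("ann", "write", "cleaned.xlsx"),
    ]
    print(infer_provenance(events))
    # {'cleaned.xlsx': {'survey.xlsx', 'codebook.doc'}} -- complete, low precision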

    A framework for detecting unnecessary industrial data in ETL processes

    Extract, transform, and load (ETL) is a critical process used by industrial organisations to shift data from one database to another, such as from an operational system to a data warehouse. With the increasing amount of data stored by industrial organisations, some ETL processes can take in excess of 12 hours to complete; this can leave decision makers stranded while they wait for the data needed to support their decisions. After the ETL processes have been designed, data requirements can inevitably change, and much of the data that goes through the ETL process may never be used or needed. This paper therefore proposes a framework for dynamically detecting and predicting unnecessary data and preventing it from slowing down ETL processes, either by removing it entirely or by deprioritising it. Other advantages of the framework include being able to prioritise data cleansing tasks and to determine which data should be processed first and placed into fast-access memory. We show existing example algorithms that can be used for each component of the framework, and present some initial testing results as part of our research to determine whether the framework can help to reduce ETL time.
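
    A hedged sketch of what one detection component might look like (the query-log format and column names are illustrative assumptions, not the paper's algorithms): scan the warehouse query log for column references and flag never-used columns as candidates to remove or deprioritise.

    import re
    from collections import Counter

    def column_usage(query_log, columns):
        """Count, per column, how many logged queries reference it."""
        usage = Counter({c: 0 for c in columns})
        for query in query_log:
            for col in columns:
                if re.search(rf"\b{re.escape(col)}\b", query, re.IGNORECASE):
                    usage[col] += 1
        return usage

    columns = ["order_id", "customer_id", "fax_number", "legacy_code"]
    log = [
        "SELECT order_id, customer_id FROM sales WHERE order_id > 100",
        "SELECT customer_id, SUM(amount) FROM sales GROUP BY customer_id",
    ]
    usage = column_usage(log, columns)
    unused = [c for c in columns if usage[c] == 0]
    print(unused)  # ['fax_number', 'legacy_code'] -> candidates to skip in the ETL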

    bdbms -- A Database Management System for Biological Data

    Biologists are increasingly using databases for storing and managing their data. Biological databases typically consist of a mixture of raw data, metadata, sequences, annotations, and related data obtained from various sources. Current database technology lacks several functionalities that are needed by biological databases. In this paper, we introduce bdbms, an extensible prototype database management system for supporting biological data. bdbms extends the functionalities of current DBMSs to include: (1) annotation and provenance management, including storage, indexing, manipulation, and querying of annotation and provenance as first-class objects in bdbms; (2) local dependency tracking, to track the dependencies and derivations among data items; (3) update authorization, to support data curation via content-based, in contrast to identity-based, authorization; and (4) new access methods and their supporting operators for pattern matching on various types of compressed biological data. This paper presents the design of bdbms along with the techniques proposed to support these functionalities, including an extension to SQL. We also outline some open issues in building bdbms.
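
    A minimal sketch of local dependency tracking (illustrative Python, not bdbms's actual design or its SQL extension): derivations form a graph, and an update to one item flags everything transitively derived from it as outdated.

    from collections import defaultdict

    class DependencyTracker:
        def __init__(self):
            self.derived = defaultdict(set)  # item -> items derived from it
            self.outdated = set()

        def record_derivation(self, source, target):
            self.derived[source].add(target)

        def on_update(self, item):
            self.outdated.discard(item)      # the updated item itself is fresh
            stack = list(self.derived[item])
            while stack:
                node = stack.pop()
                if node not in self.outdated:
                    self.outdated.add(node)
                    stack.extend(self.derived[node])

    t = DependencyTracker()
    t.record_derivation("gene_seq", "annotation_1")
    t.record_derivation("annotation_1", "figure_3")
    t.on_update("gene_seq")
    print(t.outdated)  # {'annotation_1', 'figure_3'} need re-derivation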

    Instructive of Ooze Information

    We study the following problem: a data distributor has given sensitive data to a set of supposedly trusted agents (third parties). Some of the data are leaked and found in an unauthorized place (e.g., on the web or on somebody's laptop). The distributor must evaluate the likelihood that the leaked data came from one or more agents, as opposed to having been independently gathered by other means. We propose data distribution strategies (across the agents) that improve the likelihood of identifying leakages. These methods do not rely on alterations of the released data (e.g., watermarks). In some cases, we can also inject "realistic but fake" data records to further improve our chances of detecting leakage and identifying the guilty party. In the course of doing business, sensitive data must sometimes be handed over to supposedly trusted third parties. For example, a hospital may give patient records to researchers who will devise new treatments. Similarly, a company may have partnerships with other companies that require sharing customer data. Another enterprise may outsource its data processing, so data must be given to various other companies. There always remains a risk of the data being leaked by an agent. Perturbation is a very valuable technique whereby the data are modified and made "less sensitive" before being handed to agents. For example, one can add random noise to certain attributes, or one can replace exact values with ranges. But this technique requires modification of the data. Leakage detection is handled by watermarking, e.g., a unique code is embedded in each distributed copy. If that copy is later discovered in the hands of an unauthorized party, the leaker can be identified. But again this requires modifying the data, and watermarks can sometimes be destroyed if the data recipient is malicious.
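
    A much-simplified sketch of such a likelihood estimate (a common simplification from the data-leakage literature, not necessarily the authors' exact model): each leaked item implicates the agents who received it, discounted by the probability p that it could have been gathered independently.

    def guilt_scores(leaked, allocations, p=0.2):
        """leaked: set of leaked items; allocations: agent -> items given to them.
        p: probability a leaked item was obtained by independent means."""
        holders = {t: [a for a, items in allocations.items() if t in items]
                   for t in leaked}
        scores = {}
        for agent, items in allocations.items():
            pr_innocent = 1.0
            for t in leaked & items:
                # With probability p the item leaked from elsewhere; otherwise
                # it came from one of its holders, assumed uniformly at random.
                pr_innocent *= 1 - (1 - p) / len(holders[t])
            scores[agent] = 1 - pr_innocent
        return scores

    allocations = {"agent_a": {1, 2, 3}, "agent_b": {3, 4}}
    print(guilt_scores({2, 3}, allocations))
    # agent_a is implicated far more strongly: it alone held item 2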

    Data Leakage Detection


    Middleware non-repudiation service for the data warehouse

    Nowadays, storing information is fundamental to the correct functioning of any organization, and the critical factor is to guarantee the security of the stored data. In traditional database systems the security requirements are limited to confidentiality, integrity, availability of the data, and user authorization. The criticality of database systems and data repositories for modern business, together with new legal and governmental requirements, makes it necessary to develop new system architectures that ensure a sophisticated set of security services. In this paper we propose a database architecture that ensures the non-repudiation of user queries and data warehouse actions. These security services are accomplished by means of a middleware layer in the data warehouse architecture.
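
    A minimal sketch of the middleware idea (the record format and key handling are assumptions; a real deployment would resolve keys through a PKI rather than store them beside the log): the middleware signs every query with the issuing user's private key before forwarding it to the warehouse, so the query cannot later be repudiated. The sketch uses the third-party cryptography package.

    import json
    import time

    from cryptography.hazmat.primitives.asymmetric import ed25519

    class NonRepudiationMiddleware:
        def __init__(self):
            self.audit_log = []  # (signed record, signature, signer's public key)

        def execute(self, user, private_key, sql, warehouse):
            record = json.dumps({"user": user, "sql": sql,
                                 "ts": time.time()}).encode()
            signature = private_key.sign(record)
            # Keeping the public key in the log is a simplification; see above.
            self.audit_log.append((record, signature, private_key.public_key()))
            return warehouse(sql)

        def verify_log(self):
            for record, signature, public_key in self.audit_log:
                public_key.verify(signature, record)  # raises if forged

    mw = NonRepudiationMiddleware()
    key = ed25519.Ed25519PrivateKey.generate()
    mw.execute("carol", key, "SELECT * FROM sales", warehouse=lambda q: [])
    mw.verify_log()  # passes; a tampered record or signature would raise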

    Using domain ontologies to help track data provenance.

    Contents: motivating example; POESIA ontologies and ontological coverages; ontological estimation of data provenance; ontological nets for data integration; data integration operators; data reconciling through articulation of ontologies; semantic workflows; related work; conclusions.