22,688 research outputs found

    PRIVAS - automatic anonymization of databases

    Get PDF
    Currently, given the technological evolution, data and information are increasingly valuable in the most diverse areas for the most various purposes. Although the information and knowledge discovered by the exploration and use of data can be very valuable in many applications, people have been increasingly concerned about the other side, that is, the privacy threats that these processes bring. The system Privas, described in this paper, will aid the Data Publisher to pre-process the database before publishing. For that, a DSL is used to define the database schema description, identify the sensitive data and the desired privacy level. After that a Privas processor will process the DSL program and interpret it to automatically transform the repository schema. The automatization of the anonymization process is the main contribution and novelty of this work.info:eu-repo/semantics/publishedVersio

    Data access and integration in the ISPIDER proteomics grid

    Get PDF
    Grid computing has great potential for supporting the integration of complex, fast changing biological data repositories to enable distributed data analysis. One scenario where Grid computing has such potential is provided by proteomics resources which are rapidly being developed with the emergence of affordable, reliable methods to study the proteome. The protein identifications arising from these methods derive from multiple repositories which need to be integrated to enable uniform access to them. A number of technologies exist which enable these resources to be accessed in a Grid environment, but the independent development of these resources means that significant data integration challenges, such as heterogeneity and schema evolution, have to be met. This paper presents an architecture which supports the combined use of Grid data access (OGSA-DAI), Grid distributed querying (OGSA-DQP) and data integration (AutoMed) software tools to support distributed data analysis. We discuss the application of this architecture for the integration of several autonomous proteomics data resources

    Using schema transformation pathways for data lineage tracing

    Get PDF
    With the increasing amount and diversity of information available on the Internet, there has been a huge growth in information systems that need to integrate data from distributed, heterogeneous data sources. Tracing the lineage of the integrated data is one of the problems being addressed in data warehousing research. This paper presents a data lineage tracing approach based on schema transformation pathways. Our approach is not limited to one specific data model or query language, and would be useful in any data transformation/integration framework based on sequences of primitive schema transformations

    An Ontology Based Method to Solve Query Identifier Heterogeneity in Post-Genomic Clinical Trials

    Get PDF
    The increasing amount of information available for biomedical research has led to issues related to knowledge discovery in large collections of data. Moreover, Information Retrieval techniques must consider heterogeneities present in databases, initially belonging to different domains—e.g. clinical and genetic data. One of the goals, among others, of the ACGT European is to provide seamless and homogeneous access to integrated databases. In this work, we describe an approach to overcome heterogeneities in identifiers inside queries. We present an ontology classifying the most common identifier semantic heterogeneities, and a service that makes use of it to cope with the problem using the described approach. Finally, we illustrate the solution by analysing a set of real queries
    • 

    corecore