
    Towards Automatic Capturing of Manual Data Processing Provenance

    Data processing is often not implemented by a workflow system or an integration application but is performed manually by humans, following a more or less specified procedure. The collection of provenance information during manual data processing cannot be automated, and manual collection of provenance information is error-prone and time-consuming. Therefore, we propose to infer provenance information from the read and write accesses of users. The derived provenance information is complete but has low precision. We therefore further propose introducing organizational guidelines to improve the precision of the inferred provenance information.
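As a rough illustration of inferring provenance from read and write accesses, the following minimal sketch treats every file a user read before a write as a candidate source of the written file. The log format, the `Access` type, and the `infer_provenance` function are hypothetical and are not taken from the paper; the sketch only shows why such an inference captures all true sources while also including unrelated ones.

```python
from collections import defaultdict
from typing import NamedTuple


class Access(NamedTuple):
    """One entry of a hypothetical access log (the format is an assumption)."""
    time: int
    user: str
    op: str    # "read" or "write"
    path: str


def infer_provenance(log):
    """Map each written file to every file the same user read before the write."""
    reads_so_far = defaultdict(set)   # user -> files read so far
    derived_from = defaultdict(set)   # written file -> candidate source files
    for entry in sorted(log, key=lambda e: e.time):
        if entry.op == "read":
            reads_so_far[entry.user].add(entry.path)
        elif entry.op == "write":
            # Over-approximation: every earlier read is a candidate source.
            derived_from[entry.path] |= reads_so_far[entry.user] - {entry.path}
    return dict(derived_from)


# Made-up example log, for illustration only.
log = [
    Access(1, "alice", "read", "sales.csv"),
    Access(2, "alice", "read", "notes.txt"),
    Access(3, "alice", "write", "report.xlsx"),
    Access(4, "bob", "read", "report.xlsx"),
    Access(5, "bob", "write", "summary.doc"),
]
provenance = infer_provenance(log)
```

In this made-up log, `notes.txt` is listed as a source of `report.xlsx` even if it was never used, which is the kind of imprecision the abstract describes. An organizational guideline that limits which files are open during a task would shrink the candidate set.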

    A policy language definition for provenance in pervasive computing

    Recent advances in computing technology have led to the paradigm of pervasive computing, which provides a means of simplifying daily life by integrating information processing into the everyday physical world. Pervasive computing draws its power from knowing the surroundings and creates an environment which combines computing and communication capabilities. Sensors that provide high-resolution spatial and instant measurement are most commonly used for forecasting, monitoring and real-time environmental modelling. Sensor data generated by a sensor network depends on several influences, such as the configuration and location of the sensors or the processing performed on the raw measurements. Storing sufficient metadata that gives meaning to the recorded observation is important in order to draw accurate conclusions or to enhance the reliability of the result dataset that uses this automatically collected data. This kind of metadata is called provenance data, as the origin of the data and the process by which it arrived from its origin are recorded. Provenance is still an exploratory field in pervasive computing and many open research questions are yet to emerge. The context information and the different characteristics of the pervasive environment call for different approaches to a provenance support system. This work implements a policy language definition that specifies the collecting model for provenance management systems and addresses the challenges that arise with stream data and sensor environments. The structure graph of the proposed model is mapped to the Open Provenance Model in order to facilitate the sharing of provenance data and interoperability with other systems. As provenance security has been recognized as one of the most important components in any provenance system, an access control language has been developed that is tailored to support the special requirements of provenance: fine-grained policies, privacy policies and preferences. Experimental evaluation findings show a reasonable overhead for provenance collecting and a reasonable time for provenance query performance, while a numerical analysis was used to evaluate the storage overhead.

    A provenance model for trend extraction in time series (Um modelo de proveniência para extração de tendências em séries temporais)

    Advisor: Prof. Dr. Marcos Sfair Sunye. Co-advisor: Profa. Dra. Maria Salete Marcon Gomes Vaz. Thesis (doctorate) - Universidade Federal do Paraná, Setor de Ciências Exatas, Programa de Pós-Graduação em Informática. Defense: Curitiba, 29/08/2014. Includes references: f. 201-216.
    Abstract: Many areas of knowledge involve the analysis of time series, which consist of a sequence of data observations over time. Time series analysis differs from traditional data analysis because of its intrinsic nature: the observations are dependent. Statistical procedures that assume independent data therefore do not apply, and specific methods are required. Time series analysis usually occurs in two phases, preprocessing and data analysis. In the preprocessing phase, corrections are made to remove phenomena that occur over time, such as trend extraction (detrending). Several detrending software packages can be applied for this purpose, which improves the analysis, as most statistical methods are developed for stationary time series. In a detrending process, provenance information about the time series and about how they were detrended is not always explicit or easy to interpret. Such information can be obtained through metadata, but metadata can introduce ambiguity into the results and can be insufficient to semantically enrich the detrending process. Ontologies, on the other hand, make it possible to generate and share knowledge about the time series and about the statistical methods applied to correct them, and they also support inference. The main goal of this doctoral thesis is to define a provenance model that uses ontologies to semantically enrich trend extraction in time series. The model is validated by a case study with real photometric time series. The main contribution is the generation of semantic knowledge that makes it possible to identify, besides the data, agents and processes involved, information about the statistical methods used for detrending. This makes it easier to understand how the time series were generated and corrected, and it improves decision making about the use of statistical methods. The novelty of this thesis is the definition of a provenance model for trend extraction with a modular design, centered on the reuse and extension of ontologies to generate provenance about time series and detrending processes. The model semantically enriches a relevant step of the preprocessing phase of time series analysis and contributes to the generation of scientific knowledge. Keywords: Provenance Model, Ontologies, OWL, Nonstationary Time Series, Detrending