5 research outputs found

    Provenance documentation to enable explainable and trustworthy AI: A literature review

    Get PDF
    ABSTRACTRecently artificial intelligence (AI) and machine learning (ML) models have demonstrated remarkable progress with applications developed in various domains. It is also increasingly discussed that AI and ML models and applications should be transparent, explainable, and trustworthy. Accordingly, the field of Explainable AI (XAI) is expanding rapidly. XAI holds substantial promise for improving trust and transparency in AI-based systems by explaining how complex models such as the deep neural network (DNN) produces their outcomes. Moreover, many researchers and practitioners consider that using provenance to explain these complex models will help improve transparency in AI-based systems. In this paper, we conduct a systematic literature review of provenance, XAI, and trustworthy AI (TAI) to explain the fundamental concepts and illustrate the potential of using provenance as a medium to help accomplish explainability in AI-based systems. Moreover, we also discuss the patterns of recent developments in this area and offer a vision for research in the near future. We hope this literature review will serve as a starting point for scholars and practitioners interested in learning about essential components of provenance, XAI, and TAI

    Polyflow: a Polystore-compliant mechanism to provide interoperability to heterogeneous provenance graphs

    Get PDF
    Many scientific experiments are modeled as workflows. Workflows usually output massive amounts of data. To guarantee the reproducibility of workflows, they are usually orchestrated by Workflow Management Systems (WfMS), that capture provenance data. Provenance represents the lineage of a data fragment throughout its transformations by activities in a workflow. Provenance traces are usually represented as graphs. These graphs allows scientists to analyze and evaluate results produced by a workflow. However, each WfMS has a proprietary format for provenance and do it in different granularity levels. Therefore, in more complex scenarios in which the scientist needs to interpret provenance graphs generated by multiple WfMSs and workflows, a challenge arises. To first understand the research landscape, we conduct a Systematic Literature Mapping, assessing existing solutions under several different lenses. With a clearer understanding of the state of the art, we propose a tool called Polyflow, which is based on the concept of Polystore systems, integrating several databases of heterogeneous origin by adopting a global ProvONE schema. Polyflow allows scientists to query multiple provenance graphs in an integrated way. Polyflow was evaluated by experts using provenance data collected from real experiments that generate phylogenetic trees through workflows. The experiment results suggest that Polyflow is a viable solution for interoperating heterogeneous provenance data generated by different WfMSs, from both a usability and performance standpoint.Muitos experimentos científicos são modelados como workflows (fluxos de trabalho). Workflows produzem comumente um grande volume de dados. De forma a garantir a reprodutibilidade desses workflows, estes geralmente são orquestrados por Sistemas de Gerência de Workflows (SGWfs), garantindo que dados de proveniência sejam capturados. Dados de proveniência representam o histórico de derivação de um dado ao longo da execução do workflow. Assim, o histórico de derivação dos dados pode ser representado por meio de um grafo de proveniência. Este grafo possibilita aos cientistas analisarem e avaliarem resultados produzidos por um workflow. Todavia, cada SGWf tem seu formato proprietário de representação para dados de proveniência, e os armazenam em diferentes granularidades. Consequentemente, em cenários mais complexos em que um cientista precisa analisar de forma integrada grafos de proveniência gerados por múltiplos workflows, isso se torna desafiador. Primeiramente, para entender o campo de pesquisa, realizamos um Mapeamento Sistemático da Literatura, avaliando soluções existentes sob diferentes lentes. Com uma compreensão mais clara do atual estado da arte, propomos uma ferramenta chamada Polyflow, inspirada em conceitos de sistemas Polystore, possibilitando a integração de várias bases de dados heterogêneas por meio de uma interface de consulta única que utiliza o ProvONE como schema global. Polyflow permite que cientistas submetam consultas em múltiplos grafos de proveniência de maneira integrada. Polyflow foi avaliado em conjunto com especialistas usando dados de proveniência coletados de workflows reais que apoiam o estudo de geração de árvores filogenéticas. O resultado da avaliação mostrou a viabilidade do Polyflow para interoperar semanticamente dados de proveniência gerado por distintos SGWfs, tanto do ponto de vista de desempenho quanto de usabilidade

    Investigando a Mobilidade Urbana Através de Dados Abertos Governamentais Enriquecidos com Proveniência

    Get PDF
    Atualmente, os principais desafios para a consolidação das cidades inteligentes em países emergentes ainda são a limitada oferta de dados abertos de qualidade e a disponibilidade de ferramentas que aprofundem a colaboração entre o governo e a sociedade civil. Este artigo tem como objetivo contribuir com a oferta de estudos relacionados aos desafios da mobilidade urbana em cidades inteligentes. Apresentamos uma arquitetura distribuída e seu protótipo intitulado BusInRio. Diferentemente dos trabalhos relacionados, nossa proposta utiliza exclusivamente dados abertos governamentais enriquecidos por proveniência do tipo retrospectiva. Este artigo também avalia quantitativamente a proposta através de experimentos de campo baseados na análise de dados de proveniência oriundos das interações de usuários reais. As primeiras análises e resultados indicam os graus de correção e aceitação da proposta

    Yavaa: supporting data workflows from discovery to visualization

    Get PDF
    Recent years have witness an increasing number of data silos being opened up both within organizations and to the general public: Scientists publish their raw data as supplements to articles or even standalone artifacts to enable others to verify and extend their work. Governments pass laws to open up formerly protected data treasures to improve accountability and transparency as well as to enable new business ideas based on this public good. Even companies share structured information about their products and services to advertise their use and thus increase revenue. Exploiting this wealth of information holds many challenges for users, though. Oftentimes data is provided as tables whose sheer endless rows of daunting numbers are barely accessible. InfoVis can mitigate this gap. However, offered visualization options are generally very limited and next to no support is given in applying any of them. The same holds true for data wrangling. Only very few options to adjust the data to the current needs and barely any protection are in place to prevent even the most obvious mistakes. When it comes to data from multiple providers, the situation gets even bleaker. Only recently tools emerged to search for datasets across institutional borders reasonably. Easy-to-use ways to combine these datasets are still missing, though. Finally, results generally lack proper documentation of their provenance. So even the most compelling visualizations can be called into question when their coming about remains unclear. The foundations for a vivid exchange and exploitation of open data are set, but the barrier of entry remains relatively high, especially for non-expert users. This thesis aims to lower that barrier by providing tools and assistance, reducing the amount of prior experience and skills required. It covers the whole workflow ranging from identifying proper datasets, over possible transformations, up until the export of the result in the form of suitable visualizations
    corecore