149 research outputs found

    Concept explainability for plant diseases classification

    Plant diseases remain a considerable threat to food security and agricultural sustainability. Rapid and early identification of these diseases has become a significant concern, motivating several studies to rely on increasing global digitalization and recent advances in computer vision based on deep learning. In fact, plant disease classification based on deep convolutional neural networks has shown impressive performance. However, these methods have yet to be adopted globally due to concerns regarding their robustness, transparency, and lack of explainability compared with their human expert counterparts. Methods such as saliency-based approaches, which relate the network output to perturbations of the input pixels, have been proposed to give insight into these algorithms. Still, they are not easily comprehensible or intuitive for human users, and they are susceptible to bias. In this work, we deploy a method called Testing with Concept Activation Vectors (TCAV) that shifts the focus from pixels to user-defined concepts. To the best of our knowledge, our paper is the first to employ this method in the field of plant disease classification. Important concepts related to color, texture, and disease were analyzed. The results suggest that concept-based explanation methods can significantly benefit automated plant disease identification. Comment: Accepted at VISAPP 202
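
    The abstract names Testing with Concept Activation Vectors (TCAV) but gives no implementation detail, so the following is a minimal sketch of the core computation, assuming layer activations and class-logit gradients have already been extracted from a trained network; the helper names, array shapes, and example data are illustrative, not the authors' code.

```python
# Minimal, framework-agnostic sketch of the TCAV idea: activations and
# gradients are assumed to be precomputed NumPy arrays.
import numpy as np
from sklearn.linear_model import LogisticRegression

def compute_cav(concept_acts: np.ndarray, random_acts: np.ndarray) -> np.ndarray:
    """Fit a linear classifier separating concept from random activations;
    the Concept Activation Vector is the normal of its decision boundary."""
    X = np.vstack([concept_acts, random_acts])
    y = np.concatenate([np.ones(len(concept_acts)), np.zeros(len(random_acts))])
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    cav = clf.coef_.ravel()
    return cav / np.linalg.norm(cav)

def tcav_score(class_gradients: np.ndarray, cav: np.ndarray) -> float:
    """Fraction of class examples whose class logit increases when the
    activation moves in the concept direction (positive directional derivative)."""
    return float(np.mean(class_gradients @ cav > 0))

# Usage with random placeholder data standing in for real layer activations
# and d(logit)/d(activation) gradients of a hypothetical disease class:
rng = np.random.default_rng(0)
cav = compute_cav(rng.normal(1.0, 1.0, (50, 128)), rng.normal(0.0, 1.0, (50, 128)))
print("TCAV score:", tcav_score(rng.normal(0.2, 1.0, (200, 128)), cav))
```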

    Reproducible Domain-Specific Knowledge Graphs in the Life Sciences: a Systematic Literature Review

    Knowledge graphs (KGs) are widely used for representing and organizing structured knowledge in diverse domains. However, the creation and upkeep of KGs pose substantial challenges. Developing a KG demands extensive expertise in data modeling, ontology design, and data curation. Furthermore, KGs are dynamic, requiring continuous updates and quality control to ensure accuracy and relevance. These intricacies contribute to the considerable effort required for their development and maintenance. One critical dimension of KGs that warrants attention is reproducibility. The ability to replicate and validate KGs is fundamental for ensuring the trustworthiness and sustainability of the knowledge they represent. Reproducible KGs not only support open science by allowing others to build upon existing knowledge but also enhance transparency and reliability in disseminating information. Despite the growing number of domain-specific KGs, a comprehensive analysis of their reproducibility has been lacking. This paper addresses this gap by offering a general overview of domain-specific KGs and comparing them based on various reproducibility criteria. Our study, covering 19 different domains, shows that only eight out of 250 domain-specific KGs (3.2%) provide publicly available source code. Among these, only one system successfully passed our reproducibility assessment (14.3%). These findings highlight the challenges and gaps in achieving reproducibility across domain-specific KGs. Our finding that only 0.4% of published domain-specific KGs are reproducible shows a clear need for further research and a shift in cultural practices.

    Provenance-based Semantic Approach for the Reproducibility of Scientific Experiments

    Data provenance has become an integral part of the natural sciences, where data flow through several complex steps of processing and analysis to generate intermediate and final results. To reproduce scientific experiments, scientists need to understand how the steps were performed in order to check the validity of the results. Scientific experiments consist of activities in the real world (e.g., wet lab or field work) and activities in cyberspace. Many scientists now write scripts as part of their field research for different tasks, including data analysis, statistical modeling, numerical simulation, computation, and visualization of results. Reproducibility of both the computational and the non-computational parts is an important step towards reproducibility of the experiments as a whole. In order to reproduce results or to trace an error in the output, it is necessary to know which input data was responsible for the output, the steps involved in generating it, the devices and materials used, the settings of those devices, the dependencies, the agents involved, and the execution environment. The aim of our work is to semantically describe the provenance of the complete execution of a scientific experiment in a structured form using linked data, without the user having to worry about the underlying technologies. We propose an approach to ensure this reproducibility by collecting the provenance data of the experiment and using the REPRODUCE-ME ontology, which extends existing W3C vocabularies, to describe the steps and the sequence of steps performed in an experiment. The ontology is developed to describe a scientific experiment along with its steps, input and output variables, and their relationships with each other. The semantic layer on top of the captured provenance, provided with ontology-based data access, allows scientists to understand and visualize the complete path taken in a computational experiment along with its execution environment. We also provide a provenance-based semantic approach which captures data from interactive notebooks in a multi-user environment provided by JupyterHub and semantically describes the data using the REPRODUCE-ME ontology.
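
    As a concrete illustration of describing an experiment step as linked-data provenance, here is a small sketch using rdflib. The PROV-O terms are standard W3C vocabulary; the REPRODUCE-ME namespace URI and the class and property names used for it here are placeholders for illustration, not necessarily the ontology's exact terms.

```python
# Sketch: one experiment step that reads an input, runs on a device with
# specific settings, is carried out by an agent, and produces an output.
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import RDF

PROV = Namespace("http://www.w3.org/ns/prov#")          # standard W3C PROV-O
REPR = Namespace("https://example.org/reproduce-me#")   # placeholder namespace
EX = Namespace("https://example.org/experiment/")

g = Graph()
g.bind("prov", PROV)
g.bind("repr", REPR)

g.add((EX.exp1, RDF.type, REPR.Experiment))              # the experiment
g.add((EX.step1, RDF.type, PROV.Activity))               # one step of it
g.add((EX.step1, REPR.partOfExperiment, EX.exp1))
g.add((EX.step1, PROV.used, EX.raw_image))               # input data
g.add((EX.step1, PROV.used, EX.microscope_settings))     # device settings
g.add((EX.step1, PROV.wasAssociatedWith, EX.alice))      # the agent involved
g.add((EX.alice, RDF.type, PROV.Agent))
g.add((EX.raw_image, RDF.type, PROV.Entity))
g.add((EX.result_csv, RDF.type, PROV.Entity))
g.add((EX.result_csv, PROV.wasGeneratedBy, EX.step1))    # output data
g.add((EX.microscope_settings, REPR.setting, Literal("40x magnification")))

print(g.serialize(format="turtle"))
```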

    Mobile Datenbanken - heute, morgen und in 20 Jahren. Tagungsband zum 8. Workshop des GI-Arbeitskreises "Mobile Datenbanken und Informationssysteme" am 28.2.2005 im Rahmen der BTW 2005 in Karlsruhe

    The workshop "Mobile Databases - today, tomorrow, and in 20 years" is the eighth workshop of the GI working group "Mobile Databases and Information Systems". The workshop takes place as part of BTW 2005, the GI conference on database systems for business, technology, and the web, from February 28 to March 1, 2005 in Karlsruhe. The workshop program comprises two invited talks as well as seven scientific contributions selected by the program committee from the submissions. For the second workshop day, which is intended to be devoted to intensive discussion, two further submissions were selected as a basis for discussion. In terms of content, the workshop spans a wide arc: from almost classical questions at the core of mobile databases, such as transaction processing in these systems, to new multimedia applications on mobile devices, and from query processing in ad hoc networks to an analysis of the state of the art in the design of mobile applications. This breadth reflects the breadth of the questions that come to light when considering mobile information use. With this workshop we hope to contribute to a better understanding of these questions and to offer a forum for exchanging questions, solution approaches, and problem statements between practitioners and researchers from academia.

    From human experts to machines: An LLM supported approach to ontology and knowledge graph construction

    The conventional process of building ontologies and knowledge graphs (KGs) relies heavily on human domain experts to define entities and relationship types, establish hierarchies, maintain relevance to the domain, fill the ABox (i.e., populate it with instances), and ensure data quality (including, among other aspects, accuracy and completeness). On the other hand, Large Language Models (LLMs) have recently gained popularity for their ability to understand and generate human-like natural language, offering promising ways to automate aspects of this process. This work explores the (semi-)automatic construction of KGs facilitated by open-source LLMs. Our pipeline involves formulating competency questions (CQs), developing an ontology (TBox) based on these CQs, constructing KGs using the developed ontology, and evaluating the resultant KG with minimal to no involvement of human experts. We showcase the feasibility of our semi-automated pipeline by creating a KG of deep learning methodologies from scholarly publications. To evaluate the answers generated via Retrieval-Augmented Generation (RAG) as well as the KG concepts automatically extracted using LLMs, we design a judge LLM, which rates the generated content against ground truth. Our findings suggest that employing LLMs could potentially reduce the human effort involved in the construction of KGs, although a human-in-the-loop approach is recommended to evaluate automatically generated KGs.
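
    To make the shape of such a CQ -> TBox -> KG -> judge-LLM pipeline concrete, here is a schematic sketch. The prompts, the injected `llm` callable, and the line-based triple format are assumptions for illustration only; they are not the authors' actual prompts, models, or data formats.

```python
# Schematic pipeline: competency questions -> ontology (TBox) -> ABox triples
# extracted from papers -> judge-LLM rating against ground truth.
from typing import Callable, List, Tuple

LLM = Callable[[str], str]  # any text-in/text-out model, e.g. an open-source LLM

def formulate_cqs(llm: LLM, domain: str, n: int = 5) -> List[str]:
    """Ask the model for competency questions the KG should be able to answer."""
    out = llm(f"List {n} competency questions for a knowledge graph about {domain}.")
    return [q.strip() for q in out.splitlines() if q.strip()]

def derive_tbox(llm: LLM, cqs: List[str]) -> str:
    """Derive classes and relations (the TBox) that can answer the CQs."""
    return llm("Propose ontology classes and relations answering:\n" + "\n".join(cqs))

def extract_triples(llm: LLM, tbox: str, paper_text: str) -> List[Tuple[str, ...]]:
    """Populate the ABox: extract (subject, predicate, object) triples from a paper."""
    out = llm(f"Using this ontology:\n{tbox}\nExtract triples as 's|p|o' lines from:\n{paper_text}")
    return [tuple(line.split("|", 2)) for line in out.splitlines() if line.count("|") == 2]

def judge(llm: LLM, generated: str, ground_truth: str) -> str:
    """Judge-LLM step: rate generated content against a ground-truth reference."""
    return llm(f"Rate 1-5 how well this matches the reference.\n"
               f"Reference: {ground_truth}\nCandidate: {generated}")

# Usage with a dummy model (replace with a real LLM client):
echo = lambda prompt: "model output placeholder"
cqs = formulate_cqs(echo, "deep learning methodologies")
```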

    Data lifecycle is not a cycle, but a plane!

    Most of the data-intensive scientific domains, e.g., the life, natural, and geo-sciences, have come up with data lifecycles. These cycles feature, in various ways, a set of core data-centric steps, e.g., planning, collecting, describing, integrating, analyzing, and publishing. Although they differ in the steps they identify and in the execution order, they collectively suffer from a set of shortcomings. They mainly promote a waterfall-like model of sequentially executing the lifecycle's steps. For example, the lifecycle used by DataONE suggests that "analyze" happens after "integrate". In practice, however, a scientist may need to analyze data without performing the integration. In general, scientists may not need to accomplish all the steps. Also, in many cases they simply jump from, e.g., "collect" to "analyze" in order to evaluate the feasibility and fitness of the data and then return to the "describe" and "preserve" steps. This causes the cycle to gradually turn into a mesh. Indeed, this problem has been recognized and addressed by the GFBio and USGS data lifecycles. The former has added a set of direct links between non-neighboring steps to allow shortcuts, while the latter has factored out cross-cutting steps, e.g., "describe" and "manage quality", and argued that these tasks must be performed continually across all stages of the lifecycle. Although the aforementioned lifecycles have recognized these issues, they do not offer customization guidelines based on, e.g., project requirements, resource availability, priorities, or effort estimations. In this work, we propose a two-dimensional, Cartesian-like plane in which the x- and y-axes represent phases and disciplines, respectively. A phase is a stage of the project with a predefined focus that leads the work towards achieving a set of targeted objectives in a specific timespan. We identify four phases: conception, implementation, publishing, and preservation. Phases can be repeated in a run and do not need to have equal timespans. However, each phase should satisfy its exit criteria before the project proceeds to the next phase. A discipline, on the vertical axis, is a set of correlated activities that, when performed, make measurable progress in the data-centric project. We have incorporated these disciplines: plan, acquire, assure, describe, preserve, discover, integrate, analyze, maintain, and execute. An execution plan is developed by placing the required activities in their respective disciplines' lanes on the plane. Each task (activity instance) is visualized as a rectangle whose width and height indicate, respectively, the duration and the estimated effort needed to complete it. The phases, as well as the characteristics of the project (requirements, size, team, time, and budget), may influence these dimensions. A discipline or an activity can be utilized several times in different phases. For example, a planning activity carries more weight in conception and fades out over the course of the project, while analysis activities start in mid-conception, receive full focus during implementation, and may still need some attention during the publishing phase. Also, multiple activities of different disciplines can run in parallel. However, each task's objective should remain aligned with the phase's focus and exit criteria. For instance, an analysis task in the conception phase may utilize multiple methodologies to perform experimentation on a small sample of a designated dataset, while the same task in the implementation phase conducts a full-fledged analysis using the chosen methodology on the whole dataset.
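
    As a small illustration of the plane model described above (phases on the x-axis, disciplines on the y-axis, tasks as rectangles sized by duration and effort), here is a hedged sketch; the field names, units, and example tasks are illustrative choices, not part of the proposal itself.

```python
# Sketch of the phase/discipline plane: an execution plan is a set of task
# "rectangles" placed in a discipline lane within a phase.
from dataclasses import dataclass
from enum import Enum

Phase = Enum("Phase", "CONCEPTION IMPLEMENTATION PUBLISHING PRESERVATION")
Discipline = Enum(
    "Discipline",
    "PLAN ACQUIRE ASSURE DESCRIBE PRESERVE DISCOVER INTEGRATE ANALYZE MAINTAIN EXECUTE",
)

@dataclass
class Task:
    name: str
    phase: Phase            # x-position: stage of the project
    discipline: Discipline  # y-position: lane on the plane
    duration_weeks: float   # rectangle width
    effort_pm: float        # rectangle height (person-months, illustrative unit)

# Example plan: the same "analyze" discipline appears in two phases with
# different weight, as the abstract describes.
plan = [
    Task("feasibility analysis on sample data", Phase.CONCEPTION, Discipline.ANALYZE, 2, 0.5),
    Task("full analysis on the whole dataset", Phase.IMPLEMENTATION, Discipline.ANALYZE, 8, 3.0),
    Task("metadata description", Phase.PUBLISHING, Discipline.DESCRIBE, 3, 1.0),
]

def effort_per_phase(tasks):
    """Aggregate estimated effort per phase, e.g. when checking exit criteria."""
    totals = {}
    for t in tasks:
        totals[t.phase] = totals.get(t.phase, 0.0) + t.effort_pm
    return totals

print(effort_per_phase(plan))
```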

    Engineering incentive schemes for ad hoc networks: a case study for the lanes overlay [online]

    In ad hoc networks, devices have to cooperate in order to compensate for the absence of infrastructure. Yet, autonomous devices tend to abstain from cooperation in order to save their own resources. Incentive schemes have been proposed as a means of fostering cooperation under these circumstances. In order to work effectively, an incentive scheme needs to be carefully tailored to the characteristics of the cooperation protocol it is meant to support. This is a complex and demanding task. However, up to now, engineers have been given virtually no help in designing incentive schemes. Even worse, there exists no systematic investigation into which characteristics should be taken into account and what they imply. Therefore, in this paper, we propose a systematic approach for the engineering of incentive schemes. The suggested procedure comprises the analysis and adjustment of the cooperation protocol, the choice of appropriate incentives for cooperation, and guidelines for the evaluation of the incentive scheme. Finally, we show how the proposed procedure is successfully applied to a service discovery overlay.