Concept explainability for plant diseases classification
Plant diseases remain a considerable threat to food security and agricultural
sustainability. Rapid and early identification of these diseases has become a
significant concern motivating several studies to rely on the increasing global
digitalization and the recent advances in computer vision based on deep
learning. In fact, plant disease classification based on deep convolutional
neural networks has shown impressive performance. However, these methods have
yet to be adopted globally due to concerns regarding their robustness,
transparency, and lack of explainability compared with their human expert
counterparts. Saliency-based methods, which relate the network output to
perturbations of the input pixels, have been proposed to give insights into
these algorithms. Still, they are neither easily comprehensible nor intuitive
for human users, and they are susceptible to bias. In this work, we deploy a
method called Testing with Concept Activation Vectors (TCAV) that shifts the
focus from pixels to user-defined concepts. To the best of our knowledge, our
paper is the first to employ this method in the field of plant disease
classification. Concepts such as color, texture, and disease-related
characteristics were analyzed. The results suggest that concept-based
explanation methods can significantly benefit automated plant disease
identification.
Comment: Accepted at VISAPP 202
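To make this concrete, the following is a minimal sketch of the core TCAV computation, assuming a trained PyTorch classifier; it is not the paper's implementation, and `model`, `layer`, and the image tensors are placeholders. A concept activation vector (CAV) is obtained as the normal of a linear probe separating activations of concept images (e.g., a lesion texture) from random images, and the TCAV score is the fraction of disease-class examples whose directional derivative along the CAV is positive.

```python
# Hedged sketch of TCAV (Kim et al., 2018) for a plant-disease classifier.
# `model`, `layer`, and the image tensors are placeholders.
import numpy as np
import torch
from sklearn.linear_model import LogisticRegression

def layer_activations(model, layer, images):
    """Collect flattened activations of `layer` for a batch of images."""
    acts = []
    handle = layer.register_forward_hook(
        lambda mod, inp, out: acts.append(out.detach().flatten(1)))
    with torch.no_grad():
        model(images)
    handle.remove()
    return acts[0]

def compute_cav(concept_acts, random_acts):
    """Train a linear probe; its normal vector is the concept activation vector."""
    X = torch.cat([concept_acts, random_acts]).cpu().numpy()
    y = np.r_[np.ones(len(concept_acts)), np.zeros(len(random_acts))]
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    return torch.tensor(clf.coef_[0], dtype=torch.float32)

def tcav_score(model, layer, class_idx, class_images, cav):
    """Fraction of class examples with positive sensitivity to the concept."""
    acts = []
    handle = layer.register_forward_hook(lambda mod, inp, out: acts.append(out))
    logits = model(class_images)
    handle.remove()
    grads = torch.autograd.grad(logits[:, class_idx].sum(), acts[0])[0]
    sensitivities = grads.flatten(1) @ cav
    return (sensitivities > 0).float().mean().item()

# Usage (all tensors are placeholders):
# cav = compute_cav(layer_activations(model, layer, concept_imgs),
#                   layer_activations(model, layer, random_imgs))
# score = tcav_score(model, layer, class_idx=3, class_images=diseased_imgs, cav=cav)
```

In practice the procedure is repeated with several random counterexample sets and combined with a statistical significance test, as in the original TCAV work.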
Reproducible Domain-Specific Knowledge Graphs in the Life Sciences: a Systematic Literature Review
Knowledge graphs (KGs) are widely used for representing and organizing
structured knowledge in diverse domains. However, the creation and upkeep of
KGs pose substantial challenges. Developing a KG demands extensive expertise in
data modeling, ontology design, and data curation. Furthermore, KGs are
dynamic, requiring continuous updates and quality control to ensure accuracy
and relevance. These intricacies contribute to the considerable effort required
for their development and maintenance. One critical dimension of KGs that
warrants attention is reproducibility. The ability to replicate and validate
KGs is fundamental for ensuring the trustworthiness and sustainability of the
knowledge they represent. Reproducible KGs not only support open science by
allowing others to build upon existing knowledge but also enhance transparency
and reliability in disseminating information. Despite the growing number of
domain-specific KGs, a comprehensive analysis concerning their reproducibility
has been lacking. This paper addresses this gap by offering a general overview
of domain-specific KGs and comparing them based on various reproducibility
criteria. Our study, covering 19 different domains, shows that only eight out of 250
domain-specific KGs (3.2%) provide publicly available source code. Among these,
only one system could successfully pass our reproducibility assessment (14.3%).
These findings highlight the challenges and gaps in achieving reproducibility
across domain-specific KGs. Our finding that only 0.4% of published
domain-specific KGs are reproducible shows a clear need for further research
and a shift in cultural practices.
Provenance-based Semantic Approach for the Reproducibility of Scientific Experiments
Data provenance has become an integral part of the natural sciences, where data flow through several complex steps of processing and analysis to generate intermediate and final results. To reproduce scientific
experiments, scientists need to understand how the steps were performed in order to check the validity of the results. Scientific experiments consist of activities in the real world (e.g., wet lab or field work) and
activities in cyberspace. Many scientists now write scripts as part of their field research for different tasks including data analysis, statistical modeling, numerical simulation, computation, and visualization of results.
Reproducibility of the computational and non-computational parts is an important step towards reproducibility of the experiments as a whole. In order to reproduce results or to locate errors in the output, one needs to know which input data was responsible for the output, the steps involved in generating it, the devices and materials used, the settings of those devices, the dependencies, the agents involved,
and the execution environment. The aim of our work is to semantically describe the provenance of the complete execution of a scientific
experiment in a structured form using linked data, without requiring scientists to deal with the underlying technologies. We propose an approach that supports this reproducibility by collecting the provenance data of the
experiment and using the REPRODUCE-ME ontology, which extends existing W3C vocabularies, to describe the steps and the sequence of steps performed in an experiment. The ontology is developed to
describe a scientific experiment along with its steps, its input and output variables, and their relationships with each other. The semantic layer on top of the captured provenance, provided through ontology-based data access,
allows scientists to understand and visualize the complete path taken in a computational experiment along with its execution environment. We also provide a provenance-based semantic approach that
captures data from interactive notebooks in a multi-user environment provided by JupyterHub and semantically describes it using the REPRODUCE-ME ontology.
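As an illustration of how such provenance can be expressed as linked data, the sketch below uses rdflib and the W3C PROV-O vocabulary, one of the W3C vocabularies that ontologies like REPRODUCE-ME build upon. The experiment step, datasets, and the `ex:` namespace are invented for the example and are not taken from the paper.

```python
# Illustrative only, not the authors' code: one experiment step described as
# linked data with rdflib and the W3C PROV-O vocabulary. Resource names under
# the `ex:` namespace are invented for this sketch.
from rdflib import Graph, Literal, Namespace, RDF, XSD

PROV = Namespace("http://www.w3.org/ns/prov#")
EX = Namespace("http://example.org/experiment/")

g = Graph()
g.bind("prov", PROV)
g.bind("ex", EX)

step = EX["image-segmentation-step"]
raw_data = EX["raw-microscopy-images"]
result = EX["segmented-images"]
scientist = EX["scientist-1"]

g.add((step, RDF.type, PROV.Activity))
g.add((raw_data, RDF.type, PROV.Entity))
g.add((result, RDF.type, PROV.Entity))
g.add((scientist, RDF.type, PROV.Agent))

g.add((step, PROV.used, raw_data))                  # input of the step
g.add((result, PROV.wasGeneratedBy, step))          # output of the step
g.add((step, PROV.wasAssociatedWith, scientist))    # responsible agent
g.add((step, PROV.startedAtTime,
       Literal("2024-01-15T10:00:00", datatype=XSD.dateTime)))

print(g.serialize(format="turtle"))
```

Queried through ontology-based data access, a graph built this way lets a scientist trace which inputs, settings, and agents produced a given result.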
Mobile Datenbanken - heute, morgen und in 20 Jahren. Tagungsband zum 8. Workshop des GI-Arbeitskreises "Mobile Datenbanken und Informationssysteme" am 28.2.2005 im Rahmen der BTW 2005 in Karlsruhe
The workshop "Mobile Databases - today, tomorrow and in 20 years" is
the eighth workshop of the GI working group "Mobile Databases and
Information Systems". It takes place within the scope of BTW 2005, the
GI conference on database systems in business, technology and the web,
from 28 February to 1 March 2005 in Karlsruhe.
The workshop program comprises two invited talks as well as seven
scientific contributions selected by the program committee from the
submissions. For the second workshop day, which is dedicated to
intensive discussion, two further submissions were selected as a basis
for discussion. In terms of content, the workshop covers a wide range:
from almost classical questions in the core area of mobile databases,
such as transaction processing in these systems, to new multimedia
applications on mobile devices, and from query processing in ad hoc
networks to an analysis of the state of the art in designing mobile
applications. This breadth reflects the breadth of the issues that
arise when considering mobile information use. With this workshop we
hope to contribute to a better understanding of these issues and to
offer a forum in which practitioners and academic researchers can
exchange questions, solution approaches, and open problems.
From human experts to machines: An LLM supported approach to ontology and knowledge graph construction
The conventional process of building Ontologies and Knowledge Graphs (KGs)
heavily relies on human domain experts to define entities and relationship
types, establish hierarchies, maintain relevance to the domain, fill the ABox
(or populate with instances), and ensure data quality (including, amongst
others, accuracy and completeness). On the other hand, Large Language Models (LLMs)
have recently gained popularity for their ability to understand and generate
human-like natural language, offering promising ways to automate aspects of
this process. This work explores the (semi-)automatic construction of KGs
facilitated by open-source LLMs. Our pipeline involves formulating competency
questions (CQs), developing an ontology (TBox) based on these CQs, constructing
KGs using the developed ontology, and evaluating the resultant KG with minimal
to no involvement of human experts. We showcase the feasibility of our
semi-automated pipeline by creating a KG on deep learning methodologies
from scholarly publications. To evaluate the answers generated via
Retrieval-Augmented-Generation (RAG) as well as the KG concepts automatically
extracted using LLMs, we design a judge LLM, which rates the generated content
based on ground truth. Our findings suggest that employing LLMs could
potentially reduce the human effort involved in the construction of KGs,
although a human-in-the-loop approach is recommended to evaluate automatically
generated KGs.
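As a rough sketch of two of the stages described above, the snippet below shows triple extraction from publication text and LLM-as-judge scoring against ground truth. The prompt wording, the relation list, and the `generate` callable (to be bound to whatever open-source LLM backend is used) are assumptions for illustration, not the authors' implementation.

```python
# Hedged sketch: (1) extracting KG triples from a publication with an LLM,
# (2) having a "judge" LLM rate an answer against ground truth.
# `generate` is a placeholder for any open-source instruction-tuned LLM;
# the prompts and relation names are invented for this sketch.
import json
from typing import Callable, List, Tuple

EXTRACTION_PROMPT = """Extract (subject, relation, object) triples about deep
learning methodology from the text below. Use only relations from this list:
uses_architecture, trained_on, evaluated_with, reports_metric.
Return a JSON list of 3-element lists.

Text:
{text}
"""

JUDGE_PROMPT = """You are a strict evaluator. Score the candidate answer
against the ground truth on a scale from 1 (wrong) to 5 (fully correct).
Return only the number.

Question: {question}
Ground truth: {truth}
Candidate answer: {answer}
"""

def extract_triples(generate: Callable[[str], str],
                    text: str) -> List[Tuple[str, str, str]]:
    """Prompt the LLM and parse its JSON output into triples."""
    raw = generate(EXTRACTION_PROMPT.format(text=text))
    try:
        return [tuple(t) for t in json.loads(raw) if len(t) == 3]
    except (json.JSONDecodeError, TypeError):
        return []  # malformed output is dropped rather than guessed at

def judge_answer(generate: Callable[[str], str], question: str,
                 truth: str, answer: str) -> int:
    """Ask the judge LLM for a 1-5 score; fall back to 1 if unparseable."""
    reply = generate(JUDGE_PROMPT.format(question=question, truth=truth,
                                         answer=answer)).strip()
    return int(reply) if reply.isdigit() else 1
```

Keeping extraction and judging as separate, swappable functions mirrors the human-in-the-loop recommendation: either stage can be replaced by, or reviewed against, an expert.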
Data lifecycle is not a cycle, but a plane!
Most data-intensive scientific domains, e.g., the life, natural, and geo-sciences, have come up with data lifecycles. These lifecycles feature, in various ways, a set of core data-centric steps, e.g., planning, collecting, describing, integrating, analyzing, and publishing. Although they differ in the steps they identify and in the execution order, they collectively suffer from a number of shortcomings.
They mainly promote a waterfall-like model of sequentially executing the lifecycle's steps. For example, the lifecycle used by DataONE suggests that "analyze" happens after "integrate". However, in practice, a scientist may need to analyze data without performing the integration. In general, scientists may not need to accomplish all the steps. Also, in many cases, they simply jump from, e.g., "collect" to "analyze" in order to evaluate the feasibility and fitness of the data, and then return to the "describe" and "preserve" steps. This causes the cycle to gradually turn into a mesh. Indeed, this problem has been recognized and dealt with by the GFBio and USGS data lifecycles. The former has added a set of direct links between non-neighboring steps to allow shortcuts, while the latter has factored out cross-cutting steps, e.g., "describe" and "manage quality", and argued that these tasks must be performed continually across all stages of the lifecycle. Although the aforementioned lifecycles have recognized these issues, they do not offer customization guidelines based on, e.g., project requirements, resource availability, priorities, or effort estimations.
In this work, we propose a two-dimensional Cartesian-like plane, in which the x- and y-axes represent phases and disciplines, respectively. A phase is a stage of the project with a predefined focus that leads the work towards achieving a set of targeted objectives in a specific timespan. We identify four phases: conception, implementation, publishing, and preservation. Phases can be repeated in a run and do not need to have equal timespans. However, each phase should satisfy its exit criteria before proceeding to the next phase. A discipline, on the vertical axis, is a set of correlated activities that, when performed, makes measurable progress in the data-centric project. We have incorporated these disciplines: plan, acquire, assure, describe, preserve, discover, integrate, analyze, maintain, and execute.
An execution plan is developed by placing the required activities in their respective disciplines' lanes on the plane. Each task (activity instance) is visualized as a rectangle whose width and height indicate, respectively, the duration and the estimated effort needed to complete it. The phases, as well as the characteristics of the project (requirements, size, team, time, and budget), may influence these dimensions.
A discipline or an activity can be utilized several times in different phases. For example, a planning activity gains more weight in conception and fades out over the course of the project, while analysis activities start in mid-conception, receive full focus during implementation, and may still need some attention during the publishing phase. Also, multiple activities of different disciplines can run in parallel. However, each task's objective should remain aligned with the phase's focus and exit criteria. For instance, an analysis task in the conception phase may apply multiple methodologies to a small sample of a designated dataset, while the same task in the implementation phase conducts a full-fledged analysis using the chosen methodology on the whole dataset.
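As a rough illustration of the plane model (the example tasks and their numbers are invented for this sketch, not taken from the abstract), an execution plan can be represented as tasks placed by phase and discipline, each carrying the duration (width) and effort (height) of its rectangle:

```python
# Illustrative sketch of the proposed plane: tasks are placed in a
# (phase, discipline) grid with duration (width) and effort (height).
# Phase and discipline names follow the abstract; the tasks are invented.
from dataclasses import dataclass
from typing import List

PHASES = ["conception", "implementation", "publishing", "preservation"]
DISCIPLINES = ["plan", "acquire", "assure", "describe", "preserve",
               "discover", "integrate", "analyze", "maintain", "execute"]

@dataclass
class Task:
    name: str
    phase: str                 # position on the x-axis
    discipline: str            # lane on the y-axis
    duration_weeks: float      # rectangle width
    effort_person_days: float  # rectangle height

def plan_by_phase(tasks: List[Task], phase: str) -> List[Task]:
    """All tasks scheduled for a given phase, ordered by discipline lane."""
    return sorted((t for t in tasks if t.phase == phase),
                  key=lambda t: DISCIPLINES.index(t.discipline))

# Analysis appears in conception (small sample) and again, with more
# effort, in implementation (whole dataset), as described above.
plan = [
    Task("draft data management plan", "conception", "plan", 2, 5),
    Task("exploratory analysis on a sample", "conception", "analyze", 1, 3),
    Task("full analysis on the whole dataset", "implementation", "analyze", 6, 30),
    Task("describe datasets with metadata", "implementation", "describe", 2, 4),
]

for task in plan_by_phase(plan, "conception"):
    print(f"{task.discipline:>10}: {task.name} "
          f"({task.duration_weeks} wk, {task.effort_person_days} pd)")
```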
Engineering incentive schemes for ad hoc networks: a case study for the lanes overlay [online]
In ad hoc networks, devices have to cooperate in order to
compensate for the absence of infrastructure. Yet, autonomous
devices tend to abstain from cooperation in order to save their
own resources.
Incentive schemes have been proposed as a means of fostering
cooperation under these circumstances. In order to work
effectively, incentive schemes need to be carefully tailored to
the characteristics of the cooperation protocol they should
support. This is a complex and demanding task. However, up to
now, engineers have been given virtually no help in designing an
incentive scheme. Even worse, there exists no systematic
investigation into which characteristics should be taken into
account and what they imply. Therefore, in this paper, we
propose a systematic approach for the engineering of incentive
schemes. The suggested procedure comprises the analysis and
adjustment of the cooperation protocol, the choice of
appropriate incentives for cooperation, and guidelines for the
evaluation of the incentive scheme. Finally, we show how the
proposed procedure is successfully applied to a service
discovery overlay.