Ontology-based explanation of classifiers
The rise of data mining and machine learning in many applications has brought new challenges related to classification. Here, we address the following challenge: how to interpret and understand the reasons behind a classifier's prediction. Indeed, understanding the behavior of a classifier is widely recognized as an important task for the wide and safe adoption of machine learning and data mining technologies, especially in high-risk domains and in dealing with bias. We present preliminary work on a proposal to use the Ontology-Based Data Management paradigm to explain the behavior of a classifier in terms of the concepts and relations that are meaningful in the domain relevant to the classifier.
Introducing Dynamic Behavior in Amalgamated Knowledge Bases
The problem of integrating knowledge from multiple and heterogeneous sources
is a fundamental issue in current information systems. In order to cope with
this problem, the concept of mediator has been introduced as a software
component providing intermediate services, linking data resources and
application programs, and making transparent the heterogeneity of the
underlying systems. In designing a mediator architecture, we believe that an
important aspect is the definition of a formal framework by which one is able
to model integration according to a declarative style. To this purpose, the use
of a logical approach seems very promising. Another important aspect is the
ability to model both static integration aspects, concerning query execution,
and dynamic ones, concerning data updates and their propagation among the
various data sources. Unfortunately, as far as we know, no formal proposals for
logically modeling mediator architectures from both a static and a dynamic point
of view have yet been developed. In this paper, we extend the framework for
amalgamated knowledge bases, presented by Subrahmanian, to deal with dynamic
aspects. The language we propose is based on the Active U-Datalog language, and
extends it with annotated logic and amalgamation concepts. We model the sources
of information and the mediator (also called supervisor) as Active U-Datalog
deductive databases, thus modeling queries, transactions, and active rules,
interpreted according to the PARK semantics. By using active rules, the system
can efficiently perform update propagation among different databases. The
result is a logical environment, integrating active and deductive rules, to
perform queries and update propagation in a heterogeneous mediated framework.
Comment: Other Keywords: Deductive databases; Heterogeneous databases; Active
rules; Update
Samples and data accessibility in research biobanks. An explorative survey
Biobanks, which contain human biological samples and/or data, provide a crucial contribution to the progress of biomedical research. However, the effective and efficient use of biobank resources depends on their accessibility. In fact, making bio-resources promptly accessible to everybody may increase the benefits for society. Furthermore, optimizing their use and ensuring their quality will promote scientific creativity and, in general, contribute to the progress of biomedical research. Although this has become a rather common belief, several laboratories are still secretive and continue to withhold samples and data. In this study, we conducted a questionnaire-based survey to investigate sample and data accessibility in research biobanks operating all over the world. The survey involved a total of 46 biobanks. Most of them gave permission to access their samples (95.7%) and data (85.4%), but free and unconditional access did not seem to be common practice. The analysis of the resource access guidelines of the biobanks that responded to the survey highlights three issues: (i) the request that applicants explain what they would like to do with the resources requested; (ii) the role of funding, public or private, in the establishment of fruitful collaborations between biobanks and research labs; (iii) the request for co-authorship as a condition for access to their data. These results suggest that economic and academic aspects are involved in determining the extent to which samples and data stored in biobanks are shared. As a second step of this study, we investigated the reasons behind the high diversity of requirements for accessing biobank resources. The analysis of informative answers suggested that the different modalities of resource accessibility seem to be largely influenced by both the social context and the legislation of the countries where the biobanks operate.
Learning Tuple Probabilities
Learning the parameters of complex probabilistic-relational models from
labeled training data is a standard technique in machine learning, which has
been intensively studied in the subfield of Statistical Relational Learning
(SRL), but so far this is still an under-investigated topic in the context
of Probabilistic Databases (PDBs). In this paper, we focus on learning the
probability values of base tuples in a PDB from labeled lineage formulas. The
resulting learning problem can be viewed as the inverse problem to confidence
computations in PDBs: given a set of labeled query answers, learn the
probability values of the base tuples, such that the marginal probabilities of
the query answers again yield the assigned probability labels. We analyze
the learning problem from a theoretical perspective, cast it into an
optimization problem, and provide an algorithm based on stochastic gradient
descent. Finally, we conclude with an experimental evaluation on three real-world
and one synthetic dataset, thus comparing our approach to various techniques
from SRL, reasoning in information extraction, and optimization.
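The inverse problem described in this abstract can be illustrated with a small sketch. It is not the authors' algorithm: it assumes independent base tuples, a squared-error loss between each computed marginal and its label, and exact marginals obtained by brute-force enumeration of possible worlds, which is only feasible for a handful of tuples. The names `marginal`, `learn_tuple_probabilities`, `TUPLES`, and `EXAMPLES` are hypothetical. The gradient uses the fact that the marginal of a Boolean formula over independent variables is linear in each tuple probability, so its partial derivative is P(formula | tuple present) minus P(formula | tuple absent).

```python
import itertools
import random


def marginal(formula, tuples, probs, fixed=None):
    """Exact P(formula) by enumerating possible worlds over independent tuples.

    `formula` maps a world (dict: tuple id -> bool) to a bool.
    `fixed` pins some tuples to a truth value (used for conditioning).
    """
    fixed = fixed or {}
    free = [t for t in tuples if t not in fixed]
    total = 0.0
    for values in itertools.product([False, True], repeat=len(free)):
        world = dict(fixed)
        world.update(zip(free, values))
        if formula(world):
            weight = 1.0
            for t, present in zip(free, values):
                weight *= probs[t] if present else 1.0 - probs[t]
            total += weight
    return total


def learn_tuple_probabilities(examples, tuples, steps=5000, lr=0.1, seed=0):
    """Stochastic gradient descent on the squared error (P(formula) - label)^2."""
    rng = random.Random(seed)
    probs = {t: 0.5 for t in tuples}  # assumed uninformative starting point
    for _ in range(steps):
        formula, label = rng.choice(examples)  # one labeled lineage formula per step
        err = marginal(formula, tuples, probs) - label
        grads = {
            t: 2.0 * err * (marginal(formula, tuples, probs, {t: True})
                            - marginal(formula, tuples, probs, {t: False}))
            for t in tuples
        }
        for t in tuples:
            # Gradient step, then clip so each value stays a valid probability.
            probs[t] = min(1.0, max(0.0, probs[t] - lr * grads[t]))
    return probs


# Hypothetical toy instance: three base tuples and three labeled lineage formulas.
TUPLES = ["t1", "t2", "t3"]
EXAMPLES = [
    (lambda w: w["t1"] and w["t2"], 0.3),
    (lambda w: w["t1"] or w["t3"], 0.8),
    (lambda w: w["t2"], 0.5),
]

if __name__ == "__main__":
    learned = learn_tuple_probabilities(EXAMPLES, TUPLES)
    for formula, label in EXAMPLES:
        print(label, round(marginal(formula, TUPLES, learned), 3))
```

Enumeration keeps the sketch short and exact, but its cost grows exponentially with the number of tuples; a practical system would compute marginals from the structure of the lineage formulas instead.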
A logic programming framework for modeling temporal objects
Published versio