    Ontology-based explanation of classifiers

    The rise of data mining and machine learning in many applications has brought new challenges related to classification. Here, we address one such challenge: how to interpret and understand the reasons behind a classifier's prediction. Indeed, understanding the behaviour of a classifier is widely recognized as an important task for the wide and safe adoption of machine learning and data mining technologies, especially in high-risk domains and in dealing with bias. We present preliminary work on a proposal to use the Ontology-Based Data Management paradigm to explain the behaviour of a classifier in terms of the concepts and relations that are meaningful in the classifier's domain.

    Introducing Dynamic Behavior in Amalgamated Knowledge Bases

    The problem of integrating knowledge from multiple heterogeneous sources is a fundamental issue in current information systems. To cope with this problem, the concept of a mediator has been introduced: a software component that provides intermediate services, links data resources and application programs, and makes the heterogeneity of the underlying systems transparent. In designing a mediator architecture, we believe that an important aspect is the definition of a formal framework in which integration can be modeled in a declarative style. For this purpose, a logical approach seems very promising. Another important aspect is the ability to model both static integration aspects, concerning query execution, and dynamic ones, concerning data updates and their propagation among the various data sources. Unfortunately, as far as we know, no formal proposals for logically modeling mediator architectures from both a static and a dynamic point of view have yet been developed. In this paper, we extend the framework for amalgamated knowledge bases, presented by Subrahmanian, to deal with dynamic aspects. The language we propose is based on the Active U-Datalog language, and extends it with annotated logic and amalgamation concepts. We model the sources of information and the mediator (also called supervisor) as Active U-Datalog deductive databases, thus modeling queries, transactions, and active rules, interpreted according to the PARK semantics. By using active rules, the system can efficiently perform update propagation among different databases. The result is a logical environment, integrating active and deductive rules, to perform queries and update propagation in a heterogeneous mediated framework. Comment: Other Keywords: Deductive databases; Heterogeneous databases; Active rules; Update

    Samples and data accessibility in research biobanks. An explorative survey

    Biobanks, which contain human biological samples and/or data, make a crucial contribution to the progress of biomedical research. However, the effective and efficient use of biobank resources depends on their accessibility. In fact, making bio-resources promptly accessible to everybody may increase the benefits for society. Furthermore, optimizing their use and ensuring their quality will promote scientific creativity and, in general, contribute to the progress of biomedical research. Although this has become a rather common belief, several laboratories are still secretive and continue to withhold samples and data. In this study, we conducted a questionnaire-based survey to investigate sample and data accessibility in research biobanks operating all over the world. The survey involved a total of 46 biobanks. Most of them gave permission to access their samples (95.7%) and data (85.4%), but free and unconditional accessibility did not seem to be common practice. The analysis of the resource-access guidelines of the biobanks that responded to the survey highlights three issues: (i) the request for applicants to explain what they would like to do with the resources requested; (ii) the role of funding, public or private, in the establishment of fruitful collaborations between biobanks and research labs; (iii) the request for co-authorship in exchange for access to their data. These results suggest that economic and academic aspects are involved in determining the extent to which samples and data stored in biobanks are shared. As a second step of this study, we investigated the reasons behind the high diversity of requirements to access biobank resources. The analysis of informative answers suggested that the different modalities of resource accessibility seem to be largely influenced by both the social context and the legislation of the countries where the biobanks operate.

    Learning Tuple Probabilities

    Learning the parameters of complex probabilistic-relational models from labeled training data is a standard technique in machine learning, which has been intensively studied in the subfield of Statistical Relational Learning (SRL), but so far it remains an under-investigated topic in the context of Probabilistic Databases (PDBs). In this paper, we focus on learning the probability values of base tuples in a PDB from labeled lineage formulas. The resulting learning problem can be viewed as the inverse problem to confidence computations in PDBs: given a set of labeled query answers, learn the probability values of the base tuples, such that the marginal probabilities of the query answers again yield the assigned probability labels. We analyze the learning problem from a theoretical perspective, cast it into an optimization problem, and provide an algorithm based on stochastic gradient descent. Finally, we conclude with an experimental evaluation on three real-world datasets and one synthetic dataset, comparing our approach to various techniques from SRL, reasoning in information extraction, and optimization.
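
    To make the inverse problem in this abstract concrete, the following is a minimal sketch and not the authors' algorithm. It assumes independent base tuples, a squared-error loss between the computed marginal and the label, and exact marginals obtained by enumerating possible worlds (feasible only for tiny formulas). The names `marginal`, `learn`, the tuple identifiers `t1` to `t3`, and the labels are all hypothetical.

```python
import itertools
import random


def marginal(formula, variables, probs, fixed=None):
    """Exact P(formula) under tuple independence, by enumerating possible worlds.

    `fixed` optionally pins some tuples to True/False (conditioning).
    """
    fixed = fixed or {}
    free = [v for v in variables if v not in fixed]
    total = 0.0
    for values in itertools.product([False, True], repeat=len(free)):
        world = dict(fixed)
        world.update(zip(free, values))
        if formula(world):
            weight = 1.0
            for v, val in zip(free, values):
                weight *= probs[v] if val else 1.0 - probs[v]
            total += weight
    return total


def learn(examples, probs, rate=0.5, epochs=2000, seed=0):
    """Stochastic gradient descent on the squared error (P(phi) - label)^2."""
    rng = random.Random(seed)
    probs = dict(probs)
    for _ in range(epochs):
        formula, variables, label = rng.choice(examples)
        error = marginal(formula, variables, probs) - label
        # For independent tuples: dP(phi)/dp_v = P(phi | v) - P(phi | not v).
        grads = {
            v: marginal(formula, variables, probs, {v: True})
            - marginal(formula, variables, probs, {v: False})
            for v in variables
        }
        for v in variables:
            step = rate * 2.0 * error * grads[v]
            # Clip so every value stays a valid probability.
            probs[v] = min(1.0, max(0.0, probs[v] - step))
    return probs


# Hypothetical labeled lineage formulas: (formula, tuples involved, target probability).
examples = [
    (lambda w: w["t1"] and w["t2"], ["t1", "t2"], 0.3),
    (lambda w: w["t1"] or w["t3"], ["t1", "t3"], 0.8),
    (lambda w: w["t2"], ["t2"], 0.5),
]
learned = learn(examples, {"t1": 0.5, "t2": 0.5, "t3": 0.5})
for formula, variables, label in examples:
    print(label, round(marginal(formula, variables, learned), 3))
```

    The labels here were chosen so that a consistent assignment exists; with inconsistent labels, a constant step size would leave the estimates oscillating around a compromise rather than settling exactly.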

    A logic programming framework for modeling temporal objects
