The relationship between IR and multimedia databases
Modern extensible database systems support multimedia data through ADTs. However, because of the problems with multimedia query formulation, this support is not sufficient.
Multimedia querying requires an iterative search process involving many different representations of the objects in the database. The support that is needed is very similar to the processes in information retrieval.
Based on this observation, we develop the miRRor architecture for multimedia query processing. We design a layered framework based on information retrieval techniques, to provide a usable query interface to the multimedia database.
First, we introduce a concept layer to enable reasoning over low-level concepts in the database.
Second, we add an evidential reasoning layer as an intermediate between the user and the concept layer.
Third, we add the functionality to process the users' relevance feedback.
We then adapt the inference network model from text retrieval to an evidential reasoning model for multimedia query processing.
We conclude with an outline for implementation of miRRor on top of the Monet extensible database system.
08421 Abstracts Collection -- Uncertainty Management in Information Systems
From October 12 to 17, 2008, the Dagstuhl Seminar 08421 "Uncertainty Management in Information Systems" was held in Schloss Dagstuhl - Leibniz Center for Informatics. This paper collects the abstracts of the plenary and session talks given during the seminar, as well as those of the demos shown.
Improving average ranking precision in user searches for biomedical research datasets
Availability of research datasets is a keystone for health and life science
study reproducibility and scientific progress. Due to the heterogeneity and
complexity of these data, a main challenge to be overcome by research data
management systems is to provide users with the best answers for their search
queries. In the context of the 2016 bioCADDIE Dataset Retrieval Challenge, we
investigate a novel ranking pipeline to improve the search of datasets used in
biomedical experiments. Our system comprises a query expansion model based on
word embeddings, a similarity measure algorithm that takes into consideration
the relevance of the query terms, and a dataset categorisation method that
boosts the rank of datasets matching query constraints. The system was
evaluated using a corpus with 800k datasets and 21 annotated user queries. Our
system provides competitive results when compared to the other challenge
participants. In the official run, it achieved the highest infAP among the
participants, being +22.3% higher than the median infAP of the participants'
best submissions. Overall, it is ranked in the top 2 if an aggregated metric using
the best official measures per participant is considered. The query expansion
method showed positive impact on the system's performance increasing our
baseline up to +5.0% and +3.4% for the infAP and infNDCG metrics, respectively.
Our similarity measure algorithm seems to be robust, in particular compared to
the Divergence From Randomness framework, having smaller performance variations
under different training conditions. Finally, the result categorization did not
have significant impact on the system's performance. We believe that our
solution could be used to enhance biomedical dataset management systems. In
particular, the use of data driven query expansion methods could be an
alternative to the complexity of biomedical terminologies.
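As a rough illustration of the query expansion idea described in this abstract, the sketch below adds to each query term its nearest neighbours in an embedding space. The abstract does not say which embedding model was used or how neighbours were selected, so the toy vectors, the function name `expand_query`, and the parameters `k` and `min_sim` are all assumptions made for illustration only.

```python
import math

# Hypothetical 3-dimensional toy embeddings; a real system would load
# vectors trained on a biomedical corpus.
TOY_EMBEDDINGS = {
    "tumor": [0.9, 0.1, 0.0],
    "neoplasm": [0.85, 0.15, 0.05],
    "cancer": [0.8, 0.2, 0.1],
    "mouse": [0.0, 0.9, 0.3],
    "murine": [0.05, 0.85, 0.35],
    "protein": [0.1, 0.1, 0.95],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def expand_query(terms, embeddings, k=2, min_sim=0.8):
    """Append up to k embedding neighbours of each original query term.

    A neighbour is kept only if its cosine similarity to the term is at
    least min_sim (an assumed cut-off, not taken from the paper).
    """
    expanded = list(terms)
    for term in terms:
        if term not in embeddings:
            continue  # out-of-vocabulary terms are left unexpanded
        scored = [
            (cosine(embeddings[term], vec), word)
            for word, vec in embeddings.items()
            if word != term
        ]
        scored.sort(reverse=True)
        for sim, word in scored[:k]:
            if sim >= min_sim and word not in expanded:
                expanded.append(word)
    return expanded


print(expand_query(["tumor"], TOY_EMBEDDINGS))
```

With these toy vectors, the query term "tumor" is expanded with "neoplasm" and "cancer", while unrelated terms such as "protein" fall below the similarity cut-off.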
Type-Constrained Representation Learning in Knowledge Graphs
Large knowledge graphs increasingly add value to various applications that
require machines to recognize and understand queries and their semantics, as in
search or question answering systems. Latent variable models have increasingly
gained attention for the statistical modeling of knowledge graphs, showing
promising results in tasks related to knowledge graph completion and cleaning.
Besides storing facts about the world, schema-based knowledge graphs are backed
by rich semantic descriptions of entities and relation-types that allow
machines to understand the notion of things and their semantic relationships.
In this work, we study how type-constraints can generally support the
statistical modeling with latent variable models. More precisely, we integrated
prior knowledge in the form of type-constraints in various state-of-the-art latent
variable approaches. Our experimental results show that prior knowledge on
relation-types significantly improves these models by up to 77% in link-prediction
tasks. The achieved improvements are especially prominent when a low model
complexity is enforced, a crucial requirement when these models are applied to
very large datasets. Unfortunately, type-constraints are neither always
available nor always complete; e.g., they can become fuzzy when entities lack
proper typing. We show that in these cases, it can be beneficial to apply a
local closed-world assumption that approximates the semantics of relation-types
based on observations made in the data.
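One way to read the local closed-world assumption mentioned in this abstract is sketched below: when no schema-level typing is available, the domain of a relation-type is approximated by the entities observed as its subjects, and the range by the entities observed as its objects. The abstract does not spell out the exact rule, so this reading, the function names, and the toy triples are assumptions for illustration.

```python
from collections import defaultdict


def lcwa_type_constraints(triples):
    """Approximate domain and range per relation-type from observed triples.

    Assumed rule: an entity belongs to the domain (range) of a relation
    if it appears as subject (object) in at least one observed triple
    with that relation.
    """
    domain = defaultdict(set)
    range_ = defaultdict(set)
    for subject, relation, obj in triples:
        domain[relation].add(subject)
        range_[relation].add(obj)
    return dict(domain), dict(range_)


def satisfies_constraints(triple, domain, range_):
    """Check whether a candidate triple respects the approximated constraints."""
    subject, relation, obj = triple
    return subject in domain.get(relation, set()) and obj in range_.get(relation, set())


# Hypothetical toy knowledge graph.
observed = [
    ("alice", "bornIn", "paris"),
    ("bob", "bornIn", "berlin"),
    ("paris", "capitalOf", "france"),
]
domain, range_ = lcwa_type_constraints(observed)

# A candidate whose subject and object were both seen in the right roles.
print(satisfies_constraints(("alice", "bornIn", "berlin"), domain, range_))
# A candidate whose subject was never seen as a subject of "bornIn".
print(satisfies_constraints(("paris", "bornIn", "alice"), domain, range_))
```

In a link-prediction setting, such approximated constraints could be used to restrict the candidate entities that a latent variable model scores for each relation-type.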
Handling uncertainty in information extraction
This position paper proposes an interactive approach for developing information extractors based on the ontology definition process with knowledge about possible (in)correctness of annotations. We discuss the problem of managing and manipulating probabilistic dependencies.