382,350 research outputs found
Type-Constrained Representation Learning in Knowledge Graphs
Large knowledge graphs increasingly add value to various applications that
require machines to recognize and understand queries and their semantics, as in
search or question answering systems. Latent variable models have increasingly
gained attention for the statistical modeling of knowledge graphs, showing
promising results in tasks related to knowledge graph completion and cleaning.
Besides storing facts about the world, schema-based knowledge graphs are backed
by rich semantic descriptions of entities and relation-types that allow
machines to understand the notion of things and their semantic relationships.
In this work, we study how type-constraints can generally support the
statistical modeling with latent variable models. More precisely, we integrated
prior knowledge in form of type-constraints in various state of the art latent
variable approaches. Our experimental results show that prior knowledge on
relation-types significantly improves these models up to 77% in link-prediction
tasks. The achieved improvements are especially prominent when a low model
complexity is enforced, a crucial requirement when these models are applied to
very large datasets. Unfortunately, type-constraints are neither always
available nor always complete e.g., they can become fuzzy when entities lack
proper typing. We show that in these cases, it can be beneficial to apply a
local closed-world assumption that approximates the semantics of relation-types
based on observations made in the data
The Role of Quasi-identifiers in k-Anonymity Revisited
The concept of k-anonymity, used in the recent literature to formally
evaluate the privacy preservation of published tables, was introduced based on
the notion of quasi-identifiers (or QI for short). The process of obtaining
k-anonymity for a given private table is first to recognize the QIs in the
table, and then to anonymize the QI values, the latter being called
k-anonymization. While k-anonymization is usually rigorously validated by the
authors, the definition of QI remains mostly informal, and different authors
seem to have different interpretations of the concept of QI. The purpose of
this paper is to provide a formal underpinning of QI and examine the
correctness and incorrectness of various interpretations of QI in our formal
framework. We observe that in cases where the concept has been used correctly,
its application has been conservative; this note provides a formal
understanding of the conservative nature in such cases.Comment: 17 pages. Submitted for publicatio
From Data Fusion to Knowledge Fusion
The task of {\em data fusion} is to identify the true values of data items
(eg, the true date of birth for {\em Tom Cruise}) among multiple observed
values drawn from different sources (eg, Web sites) of varying (and unknown)
reliability. A recent survey\cite{LDL+12} has provided a detailed comparison of
various fusion methods on Deep Web data. In this paper, we study the
applicability and limitations of different fusion techniques on a more
challenging problem: {\em knowledge fusion}. Knowledge fusion identifies true
subject-predicate-object triples extracted by multiple information extractors
from multiple information sources. These extractors perform the tasks of entity
linkage and schema alignment, thus introducing an additional source of noise
that is quite different from that traditionally considered in the data fusion
literature, which only focuses on factual errors in the original sources. We
adapt state-of-the-art data fusion techniques and apply them to a knowledge
base with 1.6B unique knowledge triples extracted by 12 extractors from over 1B
Web pages, which is three orders of magnitude larger than the data sets used in
previous data fusion papers. We show great promise of the data fusion
approaches in solving the knowledge fusion problem, and suggest interesting
research directions through a detailed error analysis of the methods.Comment: VLDB'201
On the emergent Semantic Web and overlooked issues
The emergent Semantic Web, despite being in its infancy, has already received a lotof attention from academia and industry. This resulted in an abundance of prototype systems and discussion most of which are centred around the underlying infrastructure. However, when we critically review the work done to date we realise that there is little discussion with respect to the vision of the Semantic Web. In particular, there is an observed dearth of discussion on how to deliver knowledge sharing in an environment such as the Semantic Web in effective and efficient manners. There are a lot of overlooked issues, associated with agents and trust to hidden assumptions made with respect to knowledge representation and robust reasoning in a distributed environment. These issues could potentially hinder further development if not considered at the early stages of designing Semantic Web systems. In this perspectives paper, we aim to help engineers and practitioners of the Semantic Web by raising awareness of these issues
Acoustic Space Learning for Sound Source Separation and Localization on Binaural Manifolds
In this paper we address the problems of modeling the acoustic space
generated by a full-spectrum sound source and of using the learned model for
the localization and separation of multiple sources that simultaneously emit
sparse-spectrum sounds. We lay theoretical and methodological grounds in order
to introduce the binaural manifold paradigm. We perform an in-depth study of
the latent low-dimensional structure of the high-dimensional interaural
spectral data, based on a corpus recorded with a human-like audiomotor robot
head. A non-linear dimensionality reduction technique is used to show that
these data lie on a two-dimensional (2D) smooth manifold parameterized by the
motor states of the listener, or equivalently, the sound source directions. We
propose a probabilistic piecewise affine mapping model (PPAM) specifically
designed to deal with high-dimensional data exhibiting an intrinsic piecewise
linear structure. We derive a closed-form expectation-maximization (EM)
procedure for estimating the model parameters, followed by Bayes inversion for
obtaining the full posterior density function of a sound source direction. We
extend this solution to deal with missing data and redundancy in real world
spectrograms, and hence for 2D localization of natural sound sources such as
speech. We further generalize the model to the challenging case of multiple
sound sources and we propose a variational EM framework. The associated
algorithm, referred to as variational EM for source separation and localization
(VESSL) yields a Bayesian estimation of the 2D locations and time-frequency
masks of all the sources. Comparisons of the proposed approach with several
existing methods reveal that the combination of acoustic-space learning with
Bayesian inference enables our method to outperform state-of-the-art methods.Comment: 19 pages, 9 figures, 3 table
Event-based simulation of quantum physics experiments
We review an event-based simulation approach which reproduces the statistical
distributions of wave theory not by requiring the knowledge of the solution of
the wave equation of the whole system but by generating detection events
one-by-one according to an unknown distribution. We illustrate its
applicability to various single photon and single neutron interferometry
experiments and to two Bell test experiments, a single-photon
Einstein-Podolsky-Rosen experiment employing post-selection for photon pair
identification and a single-neutron Bell test interferometry experiment with
nearly detection efficiency.Comment: Lectures notes of the Advanced School on Quantum Foundations and Open
Quantum Systems, Jo\~ao Pessoa, Brazil, July 2012, edited by T. M.
Nieuwenhuizen et al, World Scientific, to appea
Using Description Logics for RDF Constraint Checking and Closed-World Recognition
RDF and Description Logics work in an open-world setting where absence of
information is not information about absence. Nevertheless, Description Logic
axioms can be interpreted in a closed-world setting and in this setting they
can be used for both constraint checking and closed-world recognition against
information sources. When the information sources are expressed in well-behaved
RDF or RDFS (i.e., RDF graphs interpreted in the RDF or RDFS semantics) this
constraint checking and closed-world recognition is simple to describe. Further
this constraint checking can be implemented as SPARQL querying and thus
effectively performed.Comment: Extended version of a paper of the same name that will appear in
AAAI-201
Integrating and Ranking Uncertain Scientific Data
Mediator-based data integration systems resolve exploratory queries by joining data elements across sources. In the presence of uncertainties, such multiple expansions can quickly lead to spurious connections and incorrect results. The BioRank project investigates formalisms for modeling uncertainty during scientific data integration and for ranking uncertain query results. Our motivating application is protein function prediction. In this paper we show that: (i) explicit modeling of uncertainties as probabilities increases our ability to predict less-known or previously unknown functions (though it does not improve predicting the well-known). This suggests that probabilistic uncertainty models offer utility for scientific knowledge discovery; (ii) small perturbations in the input probabilities tend to produce only minor changes in the quality of our result rankings. This suggests that our methods are robust against slight variations in the way uncertainties are transformed into probabilities; and (iii) several techniques allow us to evaluate our probabilistic rankings efficiently. This suggests that probabilistic query evaluation is not as hard for real-world problems as theory indicates
- âŠ