5,882 research outputs found
Information Extraction, Data Integration, and Uncertain Data Management: The State of The Art
Information Extraction, data Integration, and uncertain data management are different areas of research that got vast focus in the last two decades. Many researches tackled those areas of research individually. However, information extraction systems should have integrated with data integration methods to make use of the extracted information. Handling uncertainty in extraction and integration process is an important issue to enhance the quality of the data in such integrated systems. This article presents the state of the art of the mentioned areas of research and shows the common grounds and how to integrate information extraction and data integration under uncertainty management cover
Coarse-to-Fine Lifted MAP Inference in Computer Vision
There is a vast body of theoretical research on lifted inference in
probabilistic graphical models (PGMs). However, few demonstrations exist where
lifting is applied in conjunction with top of the line applied algorithms. We
pursue the applicability of lifted inference for computer vision (CV), with the
insight that a globally optimal (MAP) labeling will likely have the same label
for two symmetric pixels. The success of our approach lies in efficiently
handling a distinct unary potential on every node (pixel), typical of CV
applications. This allows us to lift the large class of algorithms that model a
CV problem via PGM inference. We propose a generic template for coarse-to-fine
(C2F) inference in CV, which progressively refines an initial coarsely lifted
PGM for varying quality-time trade-offs. We demonstrate the performance of C2F
inference by developing lifted versions of two near state-of-the-art CV
algorithms for stereo vision and interactive image segmentation. We find that,
against flat algorithms, the lifted versions have a much superior anytime
performance, without any loss in final solution quality.Comment: Published in IJCAI 201
Truncating the loop series expansion for Belief Propagation
Recently, M. Chertkov and V.Y. Chernyak derived an exact expression for the
partition sum (normalization constant) corresponding to a graphical model,
which is an expansion around the Belief Propagation solution. By adding
correction terms to the BP free energy, one for each "generalized loop" in the
factor graph, the exact partition sum is obtained. However, the usually
enormous number of generalized loops generally prohibits summation over all
correction terms. In this article we introduce Truncated Loop Series BP
(TLSBP), a particular way of truncating the loop series of M. Chertkov and V.Y.
Chernyak by considering generalized loops as compositions of simple loops. We
analyze the performance of TLSBP in different scenarios, including the Ising
model, regular random graphs and on Promedas, a large probabilistic medical
diagnostic system. We show that TLSBP often improves upon the accuracy of the
BP solution, at the expense of increased computation time. We also show that
the performance of TLSBP strongly depends on the degree of interaction between
the variables. For weak interactions, truncating the series leads to
significant improvements, whereas for strong interactions it can be
ineffective, even if a high number of terms is considered.Comment: 31 pages, 12 figures, submitted to Journal of Machine Learning
Researc
Lifted Bayesian filtering in multi-entity systems
This thesis focuses on Bayesian filtering for systems that consist of multiple, interacting entites (e.g. agents or objects), which can naturally be described by Multiset Rewriting Systems (MRSs). The main insight is that the state space that is underling an MRS exhibits a certain symmetry, which can be exploited to increase inference efficiency. We provide an efficient, lifted filtering algorithm, which is able to achieve a factorial reduction in space and time complexity, compared to conventional, ground filtering.Diese Arbeit betrachtet Bayes'sche Filter in Systemen, die aus mehreren, interagierenden Entitäten (z.B. Agenten oder Objekten) bestehen. Die Systemdynamik solcher Systeme kann auf natürliche Art durch Multiset Rewriting Systems (MRS) spezifiziert werden. Die wesentliche Erkenntnis ist, dass der Zustandraum Symmetrien aufweist, die ausgenutzt werden können, um die Effizienz der Inferenz zu erhöhen. Wir führen einen effizienten, gelifteten Filter-Algorithmus ein, dessen Zeit- und Platzkomplexität gegenüber dem grundierten Algorithmus um einen faktoriellen Faktor reduziert ist
A Bayesian Approach to Graphical Record Linkage and De-duplication
We propose an unsupervised approach for linking records across arbitrarily
many files, while simultaneously detecting duplicate records within files. Our
key innovation involves the representation of the pattern of links between
records as a bipartite graph, in which records are directly linked to latent
true individuals, and only indirectly linked to other records. This flexible
representation of the linkage structure naturally allows us to estimate the
attributes of the unique observable people in the population, calculate
transitive linkage probabilities across records (and represent this visually),
and propagate the uncertainty of record linkage into later analyses. Our method
makes it particularly easy to integrate record linkage with post-processing
procedures such as logistic regression, capture-recapture, etc. Our linkage
structure lends itself to an efficient, linear-time, hybrid Markov chain Monte
Carlo algorithm, which overcomes many obstacles encountered by previously
record linkage approaches, despite the high-dimensional parameter space. We
illustrate our method using longitudinal data from the National Long Term Care
Survey and with data from the Italian Survey on Household and Wealth, where we
assess the accuracy of our method and show it to be better in terms of error
rates and empirical scalability than other approaches in the literature.Comment: 39 pages, 8 figures, 8 tables. Longer version of arXiv:1403.0211, In
press, Journal of the American Statistical Association: Theory and Methods
(2015
- …