Linear and Parallel Learning of Markov Random Fields
We introduce a new embarrassingly parallel parameter learning algorithm for
Markov random fields with untied parameters which is efficient for a large
class of practical models. Our algorithm parallelizes naturally over cliques
and, for graphs of bounded degree, its complexity is linear in the number of
cliques. Unlike its competitors, our algorithm is fully parallel and, for
log-linear models, also data-efficient, requiring only the local
sufficient statistics of the data to estimate parameters.
Multi-path Probabilistic Available Bandwidth Estimation through Bayesian Active Learning
Knowing the largest rate at which data can be sent on an end-to-end path such
that the egress rate equals the ingress rate with high probability is of
great practical value when choosing transmission rates in video streaming or
selecting
peers in peer-to-peer applications. We introduce probabilistic available
bandwidth, which is defined in terms of ingress rates and egress rates of
traffic on a path, rather than in terms of capacity and utilization of the
constituent links of the path like the standard available bandwidth metric. In
this paper, we describe a distributed algorithm, based on a probabilistic
graphical model and Bayesian active learning, for simultaneously estimating the
probabilistic available bandwidth of multiple paths through a network. Our
procedure exploits the fact that each packet train provides information not
only about the path it traverses, but also about any path that shares a link
with the monitored path. Simulations and PlanetLab experiments indicate that
this process can dramatically reduce the number of probes required to generate
accurate estimates.
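A stripped-down, single-path version of the underlying estimation loop can be sketched as follows. Everything here is our own simplification (the paper couples multiple paths through a graphical model, and real probes are noisy): we keep a discrete posterior over the unknown threshold rate y, and actively probe at the posterior median, which roughly halves the uncertainty per probe, a Bayesian bisection.

```python
# Illustrative sketch, not the paper's algorithm: estimate the largest
# rate y such that a probe at rate r <= y passes (egress == ingress).
# For reproducibility the simulated probe below is noise-free; the EPS
# term in the likelihood marks where measurement noise would enter.
RATES = list(range(1, 101))              # candidate rates, e.g. Mbps
EPS = 0.05                               # assumed probe mislabel prob.

def probe(rate, true_y):
    """Simulated path probe: succeeds iff rate <= true_y."""
    return rate <= true_y

def update(post, rate, success):
    """Bayes rule with a noisy-threshold likelihood."""
    like = [(1 - EPS) if ((rate <= y) == success) else EPS for y in RATES]
    post = [p * l for p, l in zip(post, like)]
    z = sum(post)
    return [p / z for p in post]

def next_rate(post):
    """Active choice: probe at the posterior median of y."""
    c = 0.0
    for y, p in zip(RATES, post):
        c += p
        if c >= 0.5:
            return y
    return RATES[-1]

true_y = 42
post = [1.0 / len(RATES)] * len(RATES)   # uniform prior
for _ in range(30):                      # far fewer probes than |RATES|
    r = next_rate(post)
    post = update(post, r, probe(r, true_y))
estimate = max(zip(post, RATES))[1]      # MAP estimate
```

The active choice is what reduces probe count: instead of sweeping all candidate rates, each probe is placed where it is expected to be most informative, mirroring the abstract's claim that shared information dramatically reduces the number of probes needed.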
Probabilistic Bag-Of-Hyperlinks Model for Entity Linking
Many fundamental problems in natural language processing rely on determining
what entities appear in a given text. Commonly referred to as entity linking,
this step is a fundamental component of many NLP tasks such as text
understanding, automatic summarization, semantic search or machine translation.
Name ambiguity, word polysemy, context dependencies and a heavy-tailed
distribution of entities contribute to the complexity of this problem.
We here propose a probabilistic approach that makes use of an effective
graphical model to perform collective entity disambiguation. Input mentions
(i.e., linkable token spans) are disambiguated jointly across an entire
document by combining a document-level prior of entity co-occurrences with
local information captured from mentions and their surrounding context. The
model is based on simple sufficient statistics extracted from data, thus
requiring only a few parameters to be learned.
Our method does not require extensive feature engineering, nor an expensive
training procedure. We use loopy belief propagation to perform approximate
inference. The low complexity of our model makes this step sufficiently fast
for real-time usage. We demonstrate the accuracy of our approach on a wide
range of benchmark datasets, showing that it matches, and in many cases
outperforms, existing state-of-the-art methods.
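The collective-disambiguation step can be sketched with a toy factor graph. The mentions, candidate entities, and all scores below are invented for illustration; the sketch only shows the shape of the computation: a unary score from local context, a pairwise entity-coherence score from co-occurrences, and max-product loopy belief propagation over a fully connected model on the mentions.

```python
# Toy sketch of collective entity disambiguation via max-product
# loopy BP. Entities and scores are hypothetical, not from the paper.
import itertools

mentions = ["Jaguar", "Apple"]
cands = {"Jaguar": ["jaguar_car", "jaguar_cat"],
         "Apple":  ["apple_inc", "apple_fruit"]}
# local (context) compatibility scores, assumed values
unary = {("Jaguar", "jaguar_car"): 0.6, ("Jaguar", "jaguar_cat"): 0.4,
         ("Apple", "apple_inc"): 0.5,  ("Apple", "apple_fruit"): 0.5}
# document-level co-occurrence prior, assumed values
pair = {frozenset(["jaguar_car", "apple_inc"]): 0.9,
        frozenset(["jaguar_cat", "apple_fruit"]): 0.6}

def phi(e1, e2):
    return pair.get(frozenset([e1, e2]), 0.1)

# messages msg[(src, dst)][entity], initialised uniform
msg = {(a, b): {e: 1.0 for e in cands[b]}
       for a, b in itertools.permutations(mentions, 2)}

for _ in range(10):                       # loopy max-product sweeps
    new = {}
    for a, b in msg:
        new[(a, b)] = {}
        for eb in cands[b]:
            best = 0.0
            for ea in cands[a]:
                inc = 1.0                 # messages into a, except from b
                for c in mentions:
                    if c not in (a, b):
                        inc *= msg[(c, a)][ea]
                best = max(best, unary[(a, ea)] * phi(ea, eb) * inc)
            new[(a, b)][eb] = best
        z = sum(new[(a, b)].values())     # normalise for stability
        new[(a, b)] = {e: v / z for e, v in new[(a, b)].items()}
    msg = new

beliefs = {}
for m in mentions:
    beliefs[m] = {}
    for e in cands[m]:
        b = unary[(m, e)]
        for c in mentions:
            if c != m:
                b *= msg[(c, m)][e]       # combine incoming messages
        beliefs[m][e] = b
linked = {m: max(beliefs[m], key=beliefs[m].get) for m in mentions}
```

Note how the coherence prior pulls both mentions toward the mutually compatible readings even though "Apple" is locally ambiguous; this joint effect is what a per-mention classifier cannot capture, and it is cheap enough for the real-time usage the abstract claims.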