Bayesian Information Extraction Network
Dynamic Bayesian networks (DBNs) offer an elegant way to integrate various
aspects of language in one model. Many existing algorithms developed for
learning and inference in DBNs are applicable to probabilistic language
modeling. To demonstrate the potential of DBNs for natural language processing,
we employ a DBN in an information extraction task. We show how to assemble a
wealth of emerging linguistic instruments for shallow parsing, syntactic and
semantic tagging, morphological decomposition, named entity recognition, etc.,
in order to incrementally build a robust information extraction system. Our method
outperforms previously published results on an established benchmark domain.
Comment: 6 page
Pairwise Network Information and Nonlinear Correlations
Reconstructing the structural connectivity between interacting units from
observed activity is a challenge across many different disciplines. The
fundamental first step is to establish whether or to what extent the
interactions between the units can be considered pairwise and, thus, can be
modeled as an interaction network with simple links corresponding to pairwise
interactions. In principle this can be determined by comparing the maximum
entropy given the bivariate probability distributions to the true joint
entropy. In many practical cases this is not an option since the bivariate
distributions needed may not be reliably estimated, or the optimization is too
computationally expensive. Here we present an approach that allows one to use
mutual informations as a proxy for the bivariate distributions. This has the
advantage of being less computationally expensive and easier to estimate. We
achieve this by introducing a novel entropy maximization scheme that is based
on conditioning on entropies and mutual informations. This renders our approach
typically superior to other methods based on linear approximations. The
advantages of the proposed method are documented using oscillator networks and
a resting-state human brain network as generic relevant examples.
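The abstract above relies on mutual informations between pairs of units as the estimated quantities. As a minimal sketch only, and not the authors' entropy maximization scheme, the following Python function computes a plug-in (empirical frequency) estimate of the mutual information between two discrete sequences. The function name and the choice of a plug-in estimator are assumptions made for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples.

    Hypothetical helper for illustration; it uses raw empirical
    frequencies and applies no bias correction.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))   # counts of (x, y) pairs
    px = Counter(xs)               # marginal counts of x
    py = Counter(ys)               # marginal counts of y
    # I(X;Y) = sum_{x,y} p(x,y) * log2( p(x,y) / (p(x) p(y)) )
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


if __name__ == "__main__":
    print(mutual_information([0, 0, 1, 1], [0, 0, 1, 1]))  # identical sequences
    print(mutual_information([0, 0, 1, 1], [0, 1, 0, 1]))  # empirically independent
```

Plug-in estimates of this kind are biased for small samples, so in practice a corrected estimator may be preferable.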
Lecture Notes on Network Information Theory
These lecture notes have been converted to a book titled Network Information
Theory published recently by Cambridge University Press. This book provides a
significantly expanded exposition of the material in the lecture notes as well
as problems and bibliographic notes at the end of each chapter. The authors are
currently preparing a set of slides based on the book that will be posted in
the second half of 2012. More information about the book can be found at
http://www.cambridge.org/9781107008731/. The previous (and obsolete) version of
the lecture notes can be found at http://arxiv.org/abs/1001.3404v4/
Network Information Flow with Correlated Sources
In this paper, we consider a network communications problem in which multiple
correlated sources must be delivered to a single data collector node, over a
network of noisy independent point-to-point channels. We prove that perfect
reconstruction of all the sources at the sink is possible if and only if, for
all partitions of the network nodes into two subsets S and S^c such that the
sink is always in S^c, we have that H(U_S|U_{S^c}) < \sum_{i\in S,j\in S^c}
C_{ij}. Our main finding is that in this setup a general source/channel
separation theorem holds, and that Shannon information behaves as a classical
network flow, identical in nature to the flow of water in pipes. At first
glance, it might seem surprising that separation holds in a fairly general
network situation like the one we study. A closer look, however, reveals that
the reason for this is that our model allows only for independent
point-to-point channels between pairs of nodes, and not multiple-access and/or
broadcast channels, for which separation is well known not to hold. This
``information as flow'' view provides an algorithmic interpretation for our
results, among which perhaps the most important one is the optimality of
implementing codes using a layered protocol stack.
Comment: Final version, to appear in the IEEE Transactions on Information
Theory -- contains (very) minor changes based on the last round of review
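The cut condition stated in the abstract above, H(U_S|U_{S^c}) < \sum_{i\in S,j\in S^c} C_{ij} for every S that excludes the sink, can be checked by brute force on a small example. The sketch below is illustrative only and rests on assumptions that are not in the abstract: every non-sink node i carries exactly one discrete source U_i, the joint distribution is given as a Python dict, and all function names are hypothetical. It enumerates every non-empty subset S, so it is exponential in the number of nodes.

```python
from itertools import combinations
from math import log2


def entropy(pmf):
    """Shannon entropy in bits of a pmf given as {outcome: probability}."""
    return -sum(p * log2(p) for p in pmf.values() if p > 0)


def marginal(joint, keep):
    """Marginalize a joint pmf over tuples onto the coordinates in `keep`."""
    out = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out


def cut_condition_holds(joint, capacity, sink):
    """Check H(U_S | U_{S^c}) < sum of C_ij across the cut, for all S.

    joint:    {(u_0, ..., u_{n-1}): probability}, source i sits at node i
              (assumption: one source per non-sink node).
    capacity: {(i, j): C_ij} for the directed point-to-point channels.
    sink:     label of the sink node, which is always placed in S^c.
    """
    n = len(next(iter(joint)))
    sources = list(range(n))
    h_all = entropy(joint)
    for r in range(1, n + 1):
        for S in combinations(sources, r):
            rest = [i for i in sources if i not in S]
            # Chain rule: H(U_S | U_rest) = H(U_all) - H(U_rest)
            h_cond = h_all - entropy(marginal(joint, rest))
            outside = set(rest) | {sink}
            cut = sum(c for (i, j), c in capacity.items()
                      if i in S and j in outside)
            if not h_cond < cut:
                return False
    return True


if __name__ == "__main__":
    # Two perfectly correlated uniform bits at nodes 0 and 1; the sink is node 2.
    joint = {(0, 0): 0.5, (1, 1): 0.5}
    print(cut_condition_holds(joint, {(0, 2): 0.6, (1, 2): 0.6}, sink=2))
    print(cut_condition_holds(joint, {(0, 2): 0.4, (1, 2): 0.4}, sink=2))
```

In the example, the two sources jointly carry one bit, so the condition holds when the total capacity into the sink is 1.2 and fails when it is 0.8.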
LINE: Large-scale Information Network Embedding
This paper studies the problem of embedding very large information networks
into low-dimensional vector spaces, which is useful in many tasks such as
visualization, node classification, and link prediction. Most existing graph
embedding methods do not scale to real-world information networks, which
usually contain millions of nodes. In this paper, we propose a novel network
embedding method called the "LINE," which is suitable for arbitrary types of
information networks: undirected, directed, and/or weighted. The method
optimizes a carefully designed objective function that preserves both the local
and global network structures. An edge-sampling algorithm is proposed that
addresses the limitation of the classical stochastic gradient descent and
improves both the effectiveness and the efficiency of the inference. Empirical
experiments prove the effectiveness of the LINE on a variety of real-world
information networks, including language networks, social networks, and
citation networks. The algorithm is very efficient: it can learn the
embedding of a network with millions of vertices and billions of edges in a few
hours on a typical single machine. The source code of the LINE is available
online.
Comment: WWW 201
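The abstract above mentions an edge-sampling algorithm but does not describe how edges are drawn. One standard way to sample edges with probability proportional to their weights in constant time per draw is the alias method. The Python sketch below illustrates that general technique under this assumption; it is not taken from the LINE source code, and the function names are hypothetical.

```python
import random


def build_alias_table(weights):
    """Build alias-method tables for sampling index i with prob. w_i / sum(w)."""
    n = len(weights)
    total = sum(weights)
    scaled = [w * n / total for w in weights]   # mean of scaled weights is 1
    prob = [0.0] * n
    alias = [0] * n
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s = small.pop()
        l = large.pop()
        prob[s] = scaled[s]      # keep s with this probability ...
        alias[s] = l             # ... otherwise fall through to l
        scaled[l] = scaled[l] + scaled[s] - 1.0
        (small if scaled[l] < 1.0 else large).append(l)
    for i in small + large:      # leftovers are (numerically) exactly 1
        prob[i] = 1.0
        alias[i] = i
    return prob, alias


def sample_edge(prob, alias, rng):
    """Draw one edge index in O(1) using the alias tables."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]


if __name__ == "__main__":
    edge_weights = [1.0, 3.0]    # toy weights for two edges
    prob, alias = build_alias_table(edge_weights)
    rng = random.Random(0)
    draws = [sample_edge(prob, alias, rng) for _ in range(20000)]
    print(draws.count(1) / len(draws))   # should be close to 0.75
```

Sampling edges by weight lets each sampled edge be treated as an unweighted edge in a gradient step, which avoids multiplying gradients by widely varying edge weights.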
Improving information filtering via network manipulation
Recommender systems are a very promising way to address the problem of
overabundant information for online users. Though information filtering for
online commercial systems has received much attention recently, almost all
previous works are dedicated to designing new algorithms and treat the
user-item bipartite networks as given and constant information. However, many
problems for recommender systems such as the cold-start problem (i.e. low
recommendation accuracy for the small degree items) are actually due to the
limitation of the underlying user-item bipartite networks. In this letter, we
propose a strategy to enhance the performance of the already existing
recommendation algorithms by directly manipulating the user-item bipartite
networks, namely adding some virtual connections to the networks. Numerical
analyses on two benchmark data sets, MovieLens and Netflix, show that our
method can remarkably improve the recommendation performance. Specifically, it
not only improves the recommendation accuracy (especially for the small-degree
items), but also helps the recommender systems generate more diverse and novel
recommendations.
Comment: 6 pages, 5 figure
