Hierarchical relational models for document networks
We develop the relational topic model (RTM), a hierarchical model of both
network structure and node attributes. We focus on document networks, where the
attributes of each document are its words, that is, discrete observations taken
from a fixed vocabulary. For each pair of documents, the RTM models their link
as a binary random variable that is conditioned on their contents. The model
can be used to summarize a network of documents, predict links between them,
and predict words within them. We derive efficient inference and estimation
algorithms based on variational methods that take advantage of sparsity and
scale with the number of links. We evaluate the predictive performance of the
RTM for large networks of scientific abstracts, web documents, and
geographically tagged news. Comment: Published at http://dx.doi.org/10.1214/09-AOAS309 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
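The abstract above describes each link as a binary random variable conditioned on the contents of two documents. A minimal sketch of that idea follows, assuming a logistic link function applied to the element-wise product of the two documents' topic proportions. The function name, the parameters `eta` and `nu`, and the toy numbers are illustrative assumptions, not values taken from the abstract.

```python
import math


def link_probability(zbar_a, zbar_b, eta, nu):
    """Probability of a link between two documents (illustrative sketch).

    zbar_a, zbar_b: per-document topic proportions (hypothetical inputs).
    eta: one weight per topic; nu: intercept. Both are assumed parameters.
    """
    if not (len(zbar_a) == len(zbar_b) == len(eta)):
        raise ValueError("all vectors must have one entry per topic")
    # Weighted similarity: sum_k eta[k] * zbar_a[k] * zbar_b[k], plus intercept.
    score = sum(e * a * b for e, a, b in zip(eta, zbar_a, zbar_b)) + nu
    # Logistic function maps the score to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-score))


# Toy topic proportions over three topics (made up for illustration).
doc_a = [0.8, 0.1, 0.1]
doc_b = [0.7, 0.2, 0.1]  # similar topic mix to doc_a
doc_c = [0.1, 0.1, 0.8]  # different topic mix from doc_a
eta = [5.0, 5.0, 5.0]
nu = -2.0

p_ab = link_probability(doc_a, doc_b, eta, nu)
p_ac = link_probability(doc_a, doc_c, eta, nu)
```

With these toy values, documents that share topics receive a higher link probability than documents that do not, which is the behaviour a content-conditioned link model is meant to capture.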
Mixed membership stochastic blockmodels
Observations consisting of measurements on relationships for pairs of objects
arise in many settings, such as protein interaction and gene regulatory
networks, collections of author-recipient email, and social networks. Analyzing
such data with probabilistic models can be delicate because the simple
exchangeability assumptions underlying many boilerplate models no longer hold.
In this paper, we describe a latent variable model of such data called the
mixed membership stochastic blockmodel. This model extends blockmodels for
relational data to ones which capture mixed membership latent relational
structure, thus providing an object-specific low-dimensional representation. We
develop a general variational inference algorithm for fast approximate
posterior inference. We explore applications to social and protein interaction
networks. Comment: 46 pages, 14 figures, 3 tables
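To make the mixed membership idea in the abstract above concrete, here is a minimal generative sketch. It assumes each node draws a membership vector from a Dirichlet prior, each ordered pair of nodes draws one group indicator per endpoint, and the link is a Bernoulli draw from a block matrix entry. All names (`sample_network`, `block_matrix`, `alpha`) and the toy values are illustrative assumptions. The sketch only simulates data and does not implement the variational inference the abstract mentions.

```python
import random


def sample_dirichlet(rng, alpha):
    """Draw a probability vector by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_category(rng, probs):
    """Draw one group index with the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def sample_network(n_nodes, alpha, block_matrix, seed=0):
    """Simulate memberships and directed links (hypothetical helper)."""
    rng = random.Random(seed)
    # One mixed membership vector per node.
    memberships = [sample_dirichlet(rng, alpha) for _ in range(n_nodes)]
    links = [[0] * n_nodes for _ in range(n_nodes)]
    for p in range(n_nodes):
        for q in range(n_nodes):
            if p == q:
                continue  # no self-links in this sketch
            g = sample_category(rng, memberships[p])  # group p adopts toward q
            h = sample_category(rng, memberships[q])  # group q adopts toward p
            # Link probability depends only on the pair of sampled groups.
            links[p][q] = 1 if rng.random() < block_matrix[g][h] else 0
    return memberships, links


# Two groups with dense within-group and sparse between-group links (toy values).
block_matrix = [[0.9, 0.05], [0.05, 0.8]]
memberships, links = sample_network(6, [0.5, 0.5], block_matrix, seed=1)
```

Because the group indicators are redrawn for every pair, a node can act as a member of different groups in different interactions, which is what distinguishes mixed membership from a single hard block assignment.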
Automatic information retrieval through text-mining
Dissertation presented to obtain the Master's Degree in Electrical Engineering and Computer Science at Universidade Nova de Lisboa, Faculdade de Ciências e Tecnologia. Nowadays, a large number of firms in the European Union, catalogued as Small and Medium Enterprises (SMEs), employ a large portion of the active workforce in Europe. Nonetheless, SMEs cannot afford to implement either methods or tools to systematically adopt innovation as part of their business process. Innovation is the engine of competitiveness in the globalized environment, especially in the current
socio-economic situation. This thesis provides a platform that, when integrated with the ExtremeFactories (EF) project, helps SMEs become more competitive by means of a monitoring schedule functionality.
In this thesis, a text-mining platform able to schedule the gathering of
information through keywords is presented. Developing the platform required several
implementation choices. The one that deserves particular emphasis
is the framework, Apache Lucene Core 2, which supplies an efficient text-mining tool and is used extensively for the purpose of the thesis
Pervasive sensing to model political opinions in face-to-face networks
Exposure and adoption of opinions in social networks are
important questions in education, business, and government. We
describe a novel application of pervasive computing based on using mobile
phone sensors to measure and model the face-to-face interactions and
subsequent opinion changes amongst undergraduates, during the 2008
US presidential election campaign. We find that self-reported political
discussants have characteristic interaction patterns and can be predicted
from sensor data. Mobile features can be used to estimate unique individual
exposure to different opinions, and help discover surprising patterns
of dynamic homophily related to external political events, such as election
debates and election day. To our knowledge, this is the first time
such dynamic homophily effects have been measured. Automatically estimated
exposure explains individual opinions on election day. Finally, we
report statistically significant differences in the daily activities of individuals
that change political opinions versus those that do not, by modeling
and discovering dominant activities using topic models. We find people
who decrease their interest in politics are routinely exposed (face-to-face)
to friends with little or no interest in politics.
U.S. Army Research Laboratory (Cooperative Agreement No. W911NF-09-2-0053); United States. Air Force Office of Scientific Research (Award No. FA9550-10-1-0122); Swiss National Science Foundation
Transforming Graph Representations for Statistical Relational Learning
Relational data representations have become an increasingly important topic
due to the recent proliferation of network datasets (e.g., social, biological,
information networks) and a corresponding increase in the application of
statistical relational learning (SRL) algorithms to these domains. In this
article, we examine a range of representation issues for graph-based relational
data. Since the choice of relational data representation for the nodes, links,
and features can dramatically affect the capabilities of SRL algorithms, we
survey approaches and opportunities for relational representation
transformation designed to improve the performance of these algorithms. This
leads us to introduce an intuitive taxonomy for data representation
transformations in relational domains that incorporates link transformation and
node transformation as symmetric representation tasks. In particular, the
transformation tasks for both nodes and links include (i) predicting their
existence, (ii) predicting their label or type, (iii) estimating their weight
or importance, and (iv) systematically constructing their relevant features. We
motivate our taxonomy through detailed examples and use it to survey and
compare competing approaches for each of these tasks. We also discuss general
conditions for transforming links, nodes, and features. Finally, we highlight
challenges that remain to be addressed