Structure Selection from Streaming Relational Data
Statistical relational learning techniques have been successfully applied in
a wide range of relational domains. In most of these applications, the human
designers capitalized on their background knowledge by following a
trial-and-error trajectory, where relational features are manually defined by a
human engineer, parameters are learned for those features on the training data,
the resulting model is validated, and the cycle repeats as the engineer adjusts
the set of features. This paper seeks to streamline application development in
large relational domains by introducing a light-weight approach that
efficiently evaluates relational features on pieces of the relational graph
that are streamed to it one at a time. We evaluate our approach on two social
media tasks and demonstrate that it leads to more accurate models that are
learned faster.
Ranking relations using analogies in biological and information networks
Analogical reasoning depends fundamentally on the ability to learn and
generalize about relations between objects. We develop an approach to
relational learning which, given a set of pairs of objects, measures how well
other pairs A:B fit in with the set. Our work
addresses the following question: is the relation between objects A and B
analogous to those relations found in the set? Such questions are
particularly relevant in information retrieval, where an investigator might
want to search for analogous pairs of objects that match the query set of
interest. There are many ways in which objects can be related, making the task
of measuring analogies very challenging. Our approach combines a similarity
measure on function spaces with Bayesian analysis to produce a ranking. It
requires data containing features of the objects of interest and a link matrix
specifying which relationships exist; no further attributes of such
relationships are necessary. We illustrate the potential of our method on text
analysis and information networks. An application on discovering functional
interactions between pairs of proteins is discussed in detail, where we show
that our approach can work in practice even if a small set of protein pairs is
provided. Comment: Published at http://dx.doi.org/10.1214/09-AOAS321 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
Subgraph Pattern Matching over Uncertain Graphs with Identity Linkage Uncertainty
There is a growing need for methods which can capture uncertainties and
answer queries over graph-structured data. Two common types of uncertainty are
uncertainty over the attribute values of nodes and uncertainty over the
existence of edges. In this paper, we combine those with identity uncertainty.
Identity uncertainty represents uncertainty over the mapping from objects
mentioned in the data, or references, to the underlying real-world entities. We
propose the notion of a probabilistic entity graph (PEG), a probabilistic graph
model that defines a distribution over possible graphs at the entity level. The
model takes into account node attribute uncertainty, edge existence
uncertainty, and identity uncertainty, and thus enables us to systematically
reason about all three types of uncertainties in a uniform manner. We introduce
a general framework for constructing a PEG given uncertain data at the
reference level and develop highly efficient algorithms to answer subgraph
pattern matching queries in this setting. Our algorithms are based on two novel
ideas: context-aware path indexing and reduction by join-candidates, which
drastically reduce the query search space. A comprehensive experimental
evaluation shows that our approach outperforms baseline implementations by
orders of magnitude.
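As a rough illustration of what "a distribution over possible graphs" means, the sketch below handles only edge existence uncertainty and assumes each edge exists independently with a given probability. It is not the PEG model or the indexing algorithms described in the abstract: it ignores attribute and identity uncertainty and enumerates every possible world by brute force. The function name `match_probability` and the toy edge probabilities are hypothetical.

```python
from itertools import product


def match_probability(edge_probs, pattern_edges):
    """Probability that all pattern edges exist, summed over possible worlds.

    edge_probs: dict mapping an edge (u, v) to its existence probability.
    pattern_edges: iterable of edges that must all be present for a match.
    Assumption: edges exist independently of one another.
    """
    edges = list(edge_probs)
    required = set(pattern_edges)
    total = 0.0
    # Each possible world is one assignment of present/absent to every edge.
    for world in product([False, True], repeat=len(edges)):
        present = {e for e, keep in zip(edges, world) if keep}
        # Probability of this world under the independence assumption.
        p = 1.0
        for e, keep in zip(edges, world):
            p *= edge_probs[e] if keep else 1.0 - edge_probs[e]
        # The pattern matches in this world if all required edges are present.
        if required <= present:
            total += p
    return total


# Toy uncertain graph (made-up probabilities).
edge_probs = {("a", "b"): 0.9, ("b", "c"): 0.5, ("a", "c"): 0.2}
prob = match_probability(edge_probs, [("a", "b"), ("b", "c")])
print(prob)
```

Enumeration costs time exponential in the number of edges, which is one way to see why index-based pruning of the search space matters for larger graphs.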
Constrained Clustering Based on the Link Structure of a Directed Graph
In many segmentation applications, data objects are clustered purely on attribute-level similarities. This practice neglects the useful information that resides in the link structure among data objects and the valuable expert domain knowledge about the desirable cluster assignment. Link structure can carry valuable information about the similarity between data objects (e.g., citations), and existing domain information on the preferred outcome should also be incorporated when segmenting data. In this paper, we investigate the segmentation problem combining these three sources of information, which has not been addressed in the existing literature. We propose a segmentation method for directed graphs that incorporates attribute values, link structure, and expert domain information (represented as constraints). The proposed method combines these three types of information to achieve good-quality segmentation on data that can be represented as a directed graph. We conducted comprehensive experiments to evaluate various aspects of our approach and demonstrate the effectiveness of our method.