Grammar-Based Geodesics in Semantic Networks
A geodesic is the shortest path between two vertices in a connected network.
The geodesic is the kernel of various network metrics including radius,
diameter, eccentricity, closeness, and betweenness. These metrics are the
foundation of much network research and thus, have been studied extensively in
the domain of single-relational networks (both in their directed and undirected
forms). However, geodesics for single-relational networks do not translate
directly to multi-relational, or semantic networks, where vertices are
connected to one another by any number of edge labels. Here, a more
sophisticated method for calculating a geodesic is necessary. This article
presents a technique for calculating geodesics in semantic networks with a
focus on semantic networks represented according to the Resource Description
Framework (RDF). In this framework, a discrete "walker" utilizes an abstract
path description called a grammar to determine which paths to include in its
geodesic calculation. The grammar-based model forms a general framework for
studying geodesic metrics in semantic networks.
Comment: First draft written in 200
Node Ranking in Labeled Directed Graphs
Our work is motivated by the problem of ranking hyperlinked documents for a given query. Given an arbitrary directed graph with edge and node labels, we present a new flow-based model and an efficient method to dynamically rank the nodes of this graph with respect to any of the original labels. Ranking documents for a given query in a hyperlinked document set and ranking authors or articles for a given topic in a citation database are typical applications of our method. We outline the structural conditions that the graph must satisfy for our ranking to differ from traditional PageRank. We have built a system using two indices that is capable of dynamically ranking documents for any given query. We validate our system and method with experiments on several datasets: a crawl of the IBM Intranet (12 million pages), a crawl of the Web (30 million pages), and the DBLP citation dataset. We compare our method to traditional PageRank and to existing schemes for topic-biased ranking that require a classifier. In these experiments, we demonstrate that our method is well suited for fine-grained ranking and performs better than the existing schemes. We also demonstrate that our system can obtain an improved ranking with very little impact on query time.
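The flow-based model itself is not reproduced here, but the general idea of ranking nodes "with respect to a label" can be sketched with a personalized-PageRank-style power iteration in which teleportation is restricted to nodes carrying the target label. This is an illustrative stand-in under that assumption, not the paper's method, and all names below are hypothetical:

```python
def label_biased_rank(edges, node_labels, label, damping=0.85, iters=50):
    """Random-walk ranking biased toward a label: the walker teleports
    only to nodes that carry `label`.
    edges: list of (u, v); node_labels: dict node -> set of labels."""
    nodes = sorted(node_labels)
    out = {n: [] for n in nodes}
    for u, v in edges:
        out[u].append(v)
    seeds = [n for n in nodes if label in node_labels[n]]
    base = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(base)
    for _ in range(iters):
        # Teleport mass goes only to the labeled seed nodes.
        new = {n: (1 - damping) * base[n] for n in nodes}
        for u in nodes:
            if out[u]:
                share = damping * rank[u] / len(out[u])
                for v in out[u]:
                    new[v] += share
            else:
                # Dangling node: redistribute its mass to the seed set.
                for s in seeds:
                    new[s] += damping * rank[u] / len(seeds)
        rank = new
    return rank

ranks = label_biased_rank(
    [("a", "b"), ("b", "c"), ("c", "a")],
    {"a": {"topic"}, "b": set(), "c": set()},
    "topic",
)
# nodes closer to the "topic" seed receive more probability mass
```

Unlike a classifier-based topic-biased scheme, nothing here is trained: the bias comes entirely from where the walk is allowed to restart, which is why such rankings coincide with plain PageRank only under special structural conditions on the graph.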
A model for handling approximate, noisy or incomplete labeling in text classification
We introduce a Bayesian model, BayesANIL, that is capable of estimating the uncertainties associated with the labeling process. Given a labeled or partially labeled training corpus of text documents, the model estimates the joint distribution of training documents and class labels using a generalization of the Expectation-Maximization algorithm. These estimates can be used in standard classification models to reduce error rates. Since uncertainties in the labeling are taken into account, the model provides an elegant mechanism for dealing with noisy labels. We provide an intuitive modification to the EM iterations that re-estimates the empirical distribution in order to reinforce feature values in unlabeled data and to reduce the influence of noisily labeled examples. Considerable improvement in the classification accuracies of two popular classification algorithms on standard labeled datasets, with and without artificially introduced noise and in the presence and absence of unlabeled data, indicates that this may be a promising method to reduce the burden of manual labeling.
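The key mechanism is treating each training label as a distribution over classes rather than a hard assignment, and letting EM re-estimate those distributions. The following is a minimal sketch in that spirit (a soft-label EM over a multinomial naive-Bayes model), not BayesANIL's exact formulation; the function name and smoothing choices are assumptions:

```python
import math
from collections import defaultdict

def soft_label_em(docs, init_probs, n_classes=2, iters=10, alpha=1.0):
    """EM where each document's label is a probability vector.
    docs: list of token lists; init_probs: per-document class
    distributions (noisy labels become soft priors, unlabeled
    documents get a uniform vector). Returns re-estimated posteriors."""
    post = [list(p) for p in init_probs]
    vocab = sorted({w for d in docs for w in d})
    for _ in range(iters):
        # M-step: class priors and word counts weighted by soft labels.
        prior = [sum(p[c] for p in post) / len(docs) for c in range(n_classes)]
        counts = [defaultdict(float) for _ in range(n_classes)]
        totals = [0.0] * n_classes
        for d, p in zip(docs, post):
            for w in d:
                for c in range(n_classes):
                    counts[c][w] += p[c]
                    totals[c] += p[c]
        # E-step: Laplace-smoothed posterior over classes per document.
        new_post = []
        for d in docs:
            logp = [math.log(prior[c] + 1e-12) +
                    sum(math.log((counts[c][w] + alpha) /
                                 (totals[c] + alpha * len(vocab))) for w in d)
                    for c in range(n_classes)]
            m = max(logp)
            ex = [math.exp(lp - m) for lp in logp]
            z = sum(ex)
            new_post.append([e / z for e in ex])
        post = new_post
    return post
```

Because a noisily labeled document enters the M-step with weight split across classes, a single wrong label contributes less to the parameter estimates than it would under hard labeling, which is the intuition behind re-estimating the empirical distribution during the iterations.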