600 research outputs found
Adaptive image retrieval using a graph model for semantic feature integration
The variety of features available to represent multimedia data constitutes a rich pool of information. However, the plethora of data poses a challenge in terms of feature selection and integration for effective retrieval. Moreover, to further improve effectiveness, the
retrieval model should ideally incorporate context-dependent feature representations to allow for retrieval on a higher semantic level. In this paper we present a retrieval model and learning framework for the purpose of interactive information retrieval. We describe
how semantic relations between multimedia objects based on user interaction can be learnt and then integrated with visual and textual features into a unified framework. The framework models both feature similarities and semantic relations in a single graph. Querying in this model is implemented using the theory of random walks. In addition, we present ideas to implement short-term learning from relevance feedback. Systematic experimental results validate the effectiveness of the proposed approach for image retrieval. However, the model is not restricted to the image domain and could easily be employed for retrieving multimedia data (and even a combination of different domains, e.g., images, audio and text documents).
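The abstract above says only that querying uses the theory of random walks on a single graph. As a rough illustration, the sketch below ranks nodes by a random walk with restart from a query node. The function name `random_walk_with_restart`, the restart probability, and the toy weight matrix are all assumptions made for this example, not details taken from the paper.

```python
import numpy as np

def random_walk_with_restart(adjacency, query_index, restart=0.15,
                             tol=1e-10, max_iter=1000):
    """Visiting probabilities of a walk that keeps restarting at the query node.

    adjacency: square array of non-negative edge weights (hypothetical input;
    in a retrieval setting these could mix feature similarities and learnt
    semantic relations).
    """
    adjacency = np.asarray(adjacency, dtype=float)
    row_sums = adjacency.sum(axis=1, keepdims=True)
    if np.any(row_sums == 0):
        raise ValueError("every node needs at least one outgoing edge")
    transition = adjacency / row_sums          # row-stochastic matrix

    restart_vec = np.zeros(adjacency.shape[0])
    restart_vec[query_index] = 1.0

    scores = restart_vec.copy()
    for _ in range(max_iter):
        # With probability `restart` jump back to the query, else take a step.
        updated = restart * restart_vec + (1.0 - restart) * (scores @ transition)
        converged = np.abs(updated - scores).sum() < tol
        scores = updated
        if converged:
            break
    return scores

# Toy undirected graph with four nodes; node 0 is the query.
W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]])
scores = random_walk_with_restart(W, query_index=0)
ranking = np.argsort(-scores)   # node indices, most relevant first
```

Nodes that are reachable from the query through many short, heavily weighted paths receive higher scores, which is why this kind of walk is a common choice for graph-based ranking.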
Random Walks on Hypergraphs with Edge-Dependent Vertex Weights
Hypergraphs are used in machine learning to model higher-order relationships
in data. While spectral methods for graphs are well-established, spectral
theory for hypergraphs remains an active area of research. In this paper, we
use random walks to develop a spectral theory for hypergraphs with
edge-dependent vertex weights: hypergraphs where every vertex $v$ has a weight
$\gamma_e(v)$ for each incident hyperedge $e$ that describes the contribution
of $v$ to the hyperedge $e$. We derive a random walk-based hypergraph
Laplacian, and bound the mixing time of random walks on such hypergraphs.
Moreover, we give conditions under which random walks on such hypergraphs are
equivalent to random walks on graphs. As a corollary, we show that current
machine learning methods that rely on Laplacians derived from random walks on
hypergraphs with edge-independent vertex weights do not utilize higher-order
relationships in the data. Finally, we demonstrate the advantages of
hypergraphs with edge-dependent vertex weights on ranking applications using
real-world datasets.
Comment: Accepted to ICML 201
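The abstract does not spell out the transition rule of the walk. The sketch below assumes a two-step rule: from a vertex, pick an incident hyperedge in proportion to a hyperedge weight, then pick a vertex inside that hyperedge in proportion to its edge-dependent weight. The function name, the array layout, and the toy weights are illustrative assumptions only.

```python
import numpy as np

def hypergraph_transition_matrix(vertex_weights, edge_weights):
    """Transition matrix of a two-step random walk on a hypergraph.

    vertex_weights: array of shape (num_edges, num_vertices); entry [e, v] is
        the weight of vertex v in hyperedge e, and 0 means v is not in e.
    edge_weights: array of shape (num_edges,) with positive hyperedge weights.
    Assumes every vertex lies in at least one hyperedge and no hyperedge is empty.
    """
    R = np.asarray(vertex_weights, dtype=float)
    w = np.asarray(edge_weights, dtype=float)

    # Step 1: vertex -> incident hyperedge, proportional to hyperedge weight.
    incidence = (R > 0).astype(float)              # shape (E, V)
    vertex_to_edge = incidence.T * w               # shape (V, E)
    vertex_to_edge /= vertex_to_edge.sum(axis=1, keepdims=True)

    # Step 2: hyperedge -> vertex, proportional to the edge-dependent weight.
    edge_to_vertex = R / R.sum(axis=1, keepdims=True)   # shape (E, V)

    return vertex_to_edge @ edge_to_vertex         # shape (V, V), rows sum to 1

# Toy hypergraph: e0 = {0, 1, 2}, e1 = {1, 2}, with unequal vertex weights.
R = np.array([[2.0, 1.0, 1.0],
              [0.0, 3.0, 1.0]])
P = hypergraph_transition_matrix(R, edge_weights=[1.0, 1.0])
```

If every row of `R` held the same weight for a vertex wherever it appears, the weights would be edge-independent, which is the case the abstract says reduces to a random walk on an ordinary graph.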
Integration of multiple data sources to prioritize candidate genes using discounted rating system
Background: Identifying disease genes from a list of candidate genes is an important task in bioinformatics. The main strategy is to prioritize candidate genes based on their similarity to known disease genes. Most existing gene prioritization methods access only one genomic data source, which is noisy and incomplete. Thus, there is a need to integrate multiple data sources containing different information. Results: In this paper, we propose a combination strategy called the discounted rating system (DRS). We performed leave-one-out cross-validation to compare it with the N-dimensional order statistics (NDOS) used in Endeavour. The AUC (area under the curve) values achieved by DRS were comparable with those of NDOS on most of the disease families, but DRS ran much faster than NDOS, especially as the number of data sources increased. With 100 candidate genes and 20 data sources, DRS ran more than 180 times faster than NDOS. Within the DRS framework, different data sources can be given different weights. The weighted DRS achieved significantly higher AUC values than NDOS. Conclusion: The proposed DRS algorithm is a powerful and effective framework for candidate gene prioritization. If the weights of the different data sources are chosen properly, the DRS algorithm will perform better.
Discriminative Link Prediction using Local Links, Node Features and Community Structure
A link prediction (LP) algorithm is given a graph, and has to rank, for each
node, other nodes that are candidates for new linkage. LP is strongly motivated
by social search and recommendation applications. LP techniques often focus on
global properties (graph conductance, hitting or commute times, Katz score) or
local properties (Adamic-Adar and many variations, or node feature vectors),
but rarely combine these signals. Furthermore, neither of these extremes
exploits link densities at the intermediate level of communities. In this paper
we describe a discriminative LP algorithm that exploits two new signals. First,
a co-clustering algorithm provides community level link density estimates,
which are used to qualify observed links with a surprise value. Second, links
in the immediate neighborhood of the link to be predicted are not interpreted
at face value, but through a local model of node feature similarities. These
signals are combined into a discriminative link predictor. We evaluate the new
predictor using five diverse data sets that are standard in the literature. We
report on significant accuracy boosts compared to standard LP methods
(including Adamic-Adar and random walk). Apart from the new predictor, another
contribution is a rigorous protocol for benchmarking and reporting LP
algorithms, which reveals the regions of strengths and weaknesses of all the
predictors studied here, and establishes the new proposal as the most robust.
Comment: 10 pages, 5 figures
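For context, the Adamic-Adar baseline named in the abstract scores a candidate link by summing, over the common neighbours of the two nodes, the inverse logarithm of each neighbour's degree. The sketch below is a minimal version of that baseline only, not the discriminative predictor the paper proposes. The function name and the toy graph are made up for illustration.

```python
import math
from itertools import combinations

def adamic_adar_scores(neighbors):
    """Adamic-Adar score for every pair of nodes that is not yet linked.

    neighbors: dict mapping each node to the set of its neighbours in an
    undirected graph (each edge must appear in both directions).
    """
    scores = {}
    for u, v in combinations(sorted(neighbors), 2):
        if v in neighbors[u]:
            continue  # already linked, so not a candidate
        common = neighbors[u] & neighbors[v]
        # Rare (low-degree) common neighbours contribute more than hubs.
        scores[(u, v)] = sum(
            1.0 / math.log(len(neighbors[z]))
            for z in common
            if len(neighbors[z]) > 1
        )
    return scores

# Toy undirected graph, given as adjacency sets.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}
scores = adamic_adar_scores(graph)
# Candidates for node "a", best first.
candidates_for_a = sorted(
    (pair for pair in scores if "a" in pair),
    key=lambda pair: -scores[pair],
)
```

Sorting the scored pairs per node gives the kind of per-node ranking of candidate links that the abstract describes as the output of a link prediction algorithm.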
Random Walk on Multiple Networks
Random Walk is a basic algorithm to explore the structure of networks, which
can be used in many tasks, such as local community detection and network
embedding. Existing random walk methods are based on single networks that
contain limited information. In contrast, real data often contain entities with
different types and/or from different sources, which are comprehensive and can
be better modeled by multiple networks. To take advantage of rich information
in multiple networks and make better inferences on entities, in this study, we
propose random walk on multiple networks, RWM. RWM is flexible and supports
both multiplex networks and general multiple networks, which may form
many-to-many node mappings between networks. RWM sends a random walker on each
network to obtain the local proximity (i.e., node visiting probabilities)
w.r.t. the starting nodes. Walkers with similar visiting probabilities
reinforce each other. We theoretically analyze the convergence properties of
RWM. Two approximation methods with theoretical performance guarantees are
proposed for efficient computation. We apply RWM in link prediction, network
embedding, and local community detection. Comprehensive experiments conducted
on both synthetic and real-world datasets demonstrate the effectiveness and
efficiency of RWM.
Comment: Accepted to IEEE TKD