Search Efficient Binary Network Embedding
Traditional network embedding primarily focuses on learning a dense vector
representation for each node, which encodes network structure and/or node
content information, such that off-the-shelf machine learning algorithms can be
easily applied to the vector-format node representations for network analysis.
However, the learned dense vector representations are inefficient for
large-scale similarity search, which requires finding the nearest neighbors
measured by Euclidean distance in a continuous vector space. In this paper, we
propose a search efficient binary network embedding algorithm called BinaryNE
to learn a sparse binary code for each node, by simultaneously modeling node
context relations and node attribute relations through a three-layer neural
network. BinaryNE learns binary node representations efficiently through a
stochastic gradient descent based online learning algorithm. The learned binary
encoding not only reduces memory usage to represent each node, but also allows
fast bit-wise comparisons to support much quicker network node search compared
to Euclidean distance or other distance measures. Our experiments and
comparisons show that BinaryNE not only delivers more than 23 times faster
search speed, but also provides comparable or better search quality than
traditional continuous vector based network embedding methods.
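The fast bit-wise comparison the abstract refers to can be illustrated with a
minimal sketch (not BinaryNE itself): binary codes packed into uint64 words are
compared by XOR followed by a population count, which replaces floating-point
Euclidean distance with a handful of integer operations per node.

```python
import numpy as np

def hamming_search(query_code, database_codes, k=5):
    """Return indices of the k database codes nearest to query_code in
    Hamming distance. Codes are packed into uint64 words; XOR exposes the
    differing bits and a popcount over each row counts them."""
    diff = np.bitwise_xor(database_codes, query_code)        # (n, words)
    # popcount: view each uint64 word as 8 bytes, unpack to bits, sum per row
    dists = np.unpackbits(diff.view(np.uint8), axis=1).sum(axis=1)
    return np.argsort(dists)[:k]

# Toy database of 3 nodes with 64-bit codes
db = np.array([[7], [1], [255]], dtype=np.uint64)
query = np.array([1], dtype=np.uint64)
print(hamming_search(query, db, k=3))   # node 1 matches exactly
```

Because XOR and popcount map to single machine instructions on modern CPUs,
this style of search scales to millions of codes where exhaustive Euclidean
scans over dense vectors become the bottleneck.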
Identification of functionally related enzymes by learning-to-rank methods
Enzyme sequences and structures are routinely used in the biological sciences
as queries to search for functionally related enzymes in online databases. To
this end, one usually departs from some notion of similarity, comparing two
enzymes by looking for correspondences in their sequences, structures or
surfaces. For a given query, the search operation results in a ranking of the
enzymes in the database, from very similar to dissimilar enzymes, while
information about the biological function of annotated database enzymes is
ignored.
In this work we show that rankings of that kind can be substantially improved
by applying kernel-based learning algorithms. This approach enables the
detection of statistical dependencies between similarities of the active cleft
and the biological function of annotated enzymes. This is in contrast to
search-based approaches, which do not take annotated training data into
account. Similarity measures based on the active cleft are known to outperform
sequence-based or structure-based measures under certain conditions. We
consider the Enzyme Commission (EC) classification hierarchy for obtaining
annotated enzymes during the training phase. The results of a set of sizeable
experiments indicate a consistent and significant improvement for a set of
similarity measures that exploit information about small cavities in the
surface of enzymes.
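The core idea of scoring database entries with a kernel-based learner trained
on annotated data, rather than by raw similarity alone, can be sketched as
follows. This is a minimal stand-in (kernel ridge regression with an RBF
kernel over hypothetical feature vectors), not the paper's actual
learning-to-rank model or active-cleft similarity measure.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian similarity between every pair of rows in X and Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def rank_scores(X_train, y_train, X_test, gamma=1.0, lam=1e-2):
    """Fit kernel ridge regression on labeled training items
    (e.g. y = 1 if an enzyme shares the query's EC class, else 0)
    and score test items; sorting by score gives the ranking."""
    K = rbf_kernel(X_train, X_train, gamma)
    alpha = np.linalg.solve(K + lam * np.eye(len(X_train)), y_train)
    return rbf_kernel(X_test, X_train, gamma) @ alpha

# Hypothetical 1-D features: two functional groups
X_tr = np.array([[0.0], [0.1], [5.0], [5.1]])
y_tr = np.array([1.0, 1.0, 0.0, 0.0])
X_te = np.array([[0.05], [5.05]])
scores = rank_scores(X_tr, y_tr, X_te)
```

The training labels inject the functional annotation that a pure
similarity-based ranking ignores, which is the contrast the abstract draws.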
FREDE: Linear-Space Anytime Graph Embeddings
Low-dimensional representations, or embeddings, of a graph's nodes facilitate
data mining tasks. Known embedding methods explicitly or implicitly rely on a
similarity measure among nodes. As the similarity matrix is quadratic, a
tradeoff between space complexity and embedding quality arises; past research
initially opted for heuristics and linear-transform factorizations, which allow
for linear space but compromise on quality; recent research has proposed a
quadratic-space solution as a viable option too.
In this paper we observe that embedding methods effectively aim to preserve
the covariance among the rows of a similarity matrix, and raise the question:
is there a method that combines (i) linear space complexity, (ii) a nonlinear
transform as its basis, and (iii) nontrivial quality guarantees? We answer this
question in the affirmative, with FREDE (FREquent Directions Embedding), a
sketching-based method that iteratively improves on quality while processing
rows of the similarity matrix individually; thereby, it provides, at any
iteration, column-covariance approximation guarantees that are, in due course,
almost indistinguishable from those of the optimal row-covariance approximation
by SVD. Our experimental evaluation on variably sized networks shows that FREDE
performs as well as SVD and competitively against current state-of-the-art
methods in diverse data mining tasks, even when it derives an embedding based
on only 10% of node similarities.
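The sketching primitive underlying FREDE, Frequent Directions, can be
illustrated in a few lines. This is the textbook algorithm, not FREDE's full
pipeline: rows of the (similarity) matrix are streamed one at a time into a
fixed ell-row sketch B, and whenever the sketch fills up, an SVD-based shrink
frees a row while provably preserving the row covariance A^T A.

```python
import numpy as np

def frequent_directions(A, ell):
    """Stream the rows of A (n x d, ell <= d) into an ell x d sketch B with
    the guarantee  ||A^T A - B^T B||_2 <= ||A||_F^2 / ell,  using only
    O(ell * d) space regardless of n."""
    n, d = A.shape
    B = np.zeros((ell, d))
    for i in range(n):
        B[-1] = A[i]                      # last row is zero by invariant
        U, s, Vt = np.linalg.svd(B, full_matrices=False)
        delta = s[-1] ** 2                # smallest squared singular value
        s_shrunk = np.sqrt(np.maximum(s ** 2 - delta, 0.0))
        B = s_shrunk[:, None] * Vt        # shrink; last row is zero again
    return B
```

Processing rows individually is what gives the "anytime" behavior the abstract
describes: the sketch, and hence an embedding derived from it, is valid after
any prefix of the rows, with quality improving as more rows are consumed.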