1,775 research outputs found

    Search Efficient Binary Network Embedding

    Full text link
    Traditional network embedding primarily focuses on learning a dense vector representation for each node, which encodes network structure and/or node content information, such that off-the-shelf machine learning algorithms can be easily applied to the vector-format node representations for network analysis. However, the learned dense vector representations are inefficient for large-scale similarity search, which requires to find the nearest neighbor measured by Euclidean distance in a continuous vector space. In this paper, we propose a search efficient binary network embedding algorithm called BinaryNE to learn a sparse binary code for each node, by simultaneously modeling node context relations and node attribute relations through a three-layer neural network. BinaryNE learns binary node representations efficiently through a stochastic gradient descent based online learning algorithm. The learned binary encoding not only reduces memory usage to represent each node, but also allows fast bit-wise comparisons to support much quicker network node search compared to Euclidean distance or other distance measures. Our experiments and comparisons show that BinaryNE not only delivers more than 23 times faster search speed, but also provides comparable or better search quality than traditional continuous vector based network embedding methods

    Identification of functionally related enzymes by learning-to-rank methods

    Full text link
    Enzyme sequences and structures are routinely used in the biological sciences as queries to search for functionally related enzymes in online databases. To this end, one usually departs from some notion of similarity, comparing two enzymes by looking for correspondences in their sequences, structures or surfaces. For a given query, the search operation results in a ranking of the enzymes in the database, from very similar to dissimilar enzymes, while information about the biological function of annotated database enzymes is ignored. In this work we show that rankings of that kind can be substantially improved by applying kernel-based learning algorithms. This approach enables the detection of statistical dependencies between similarities of the active cleft and the biological function of annotated enzymes. This is in contrast to search-based approaches, which do not take annotated training data into account. Similarity measures based on the active cleft are known to outperform sequence-based or structure-based measures under certain conditions. We consider the Enzyme Commission (EC) classification hierarchy for obtaining annotated enzymes during the training phase. The results of a set of sizeable experiments indicate a consistent and significant improvement for a set of similarity measures that exploit information about small cavities in the surface of enzymes

    FREDE: Linear-Space Anytime Graph Embeddings

    Full text link
    Low-dimensional representations, or embeddings, of a graph's nodes facilitate data mining tasks. Known embedding methods explicitly or implicitly rely on a similarity measure among nodes. As the similarity matrix is quadratic, a tradeoff between space complexity and embedding quality arises; past research initially opted for heuristics and linear-transform factorizations, which allow for linear space but compromise on quality; recent research has proposed a quadratic-space solution as a viable option too. In this paper we observe that embedding methods effectively aim to preserve the covariance among the rows of a similarity matrix, and raise the question: is there a method that combines (i) linear space complexity, (ii) a nonlinear transform as its basis, and (iii) nontrivial quality guarantees? We answer this question in the affirmative, with FREDE(FREquent Directions Embedding), a sketching-based method that iteratively improves on quality while processing rows of the similarity matrix individually; thereby, it provides, at any iteration, column-covariance approximation guarantees that are, in due course, almost indistinguishable from those of the optimal row-covariance approximation by SVD. Our experimental evaluation on variably sized networks shows that FREDE performs as well as SVD and competitively against current state-of-the-art methods in diverse data mining tasks, even when it derives an embedding based on only 10% of node similarities
    • …
    corecore