
    Search Efficient Binary Network Embedding

    Traditional network embedding primarily focuses on learning a dense vector representation for each node that encodes network structure and/or node content information, so that off-the-shelf machine learning algorithms can be easily applied to the vector-format node representations for network analysis. However, the learned dense vector representations are inefficient for large-scale similarity search, which requires finding the nearest neighbors measured by Euclidean distance in a continuous vector space. In this paper, we propose a search-efficient binary network embedding algorithm called BinaryNE that learns a sparse binary code for each node by simultaneously modeling node context relations and node attribute relations through a three-layer neural network. BinaryNE learns binary node representations efficiently through a stochastic gradient descent based online learning algorithm. The learned binary encoding not only reduces the memory needed to represent each node, but also allows fast bit-wise comparisons that support much quicker network node search than Euclidean distance or other distance measures. Our experiments and comparisons show that BinaryNE delivers more than 23 times faster search speed while providing comparable or better search quality than traditional continuous-vector-based network embedding methods.
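
    The speed advantage of binary codes comes from replacing floating-point distance computations with XOR and popcount. Below is a minimal sketch of Hamming-distance search over packed binary node codes; it is not the authors' BinaryNE implementation, and the code length and data are made-up values for illustration.

```python
import numpy as np

def pack_codes(bits):
    """Pack an (n, d) array of 0/1 codes into Python integers, one per node."""
    return [int("".join(map(str, row)), 2) for row in bits]

def hamming_search(query_code, packed, k=5):
    """Indices of the k codes closest to query_code: one XOR plus a popcount per item."""
    dists = [bin(query_code ^ c).count("1") for c in packed]
    return np.argsort(dists)[:k]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    codes = rng.integers(0, 2, size=(10_000, 128))   # hypothetical 128-bit node codes
    packed = pack_codes(codes)
    query = pack_codes(rng.integers(0, 2, size=(1, 128)))[0]
    print(hamming_search(query, packed, k=3))
```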

    Nearest neighbor search with multiple random projection trees: core method and improvements

    Nearest neighbor search is a crucial tool in computer science and a part of many machine learning algorithms, the most obvious example being the venerable k-NN classifier. More generally, nearest neighbors are used in numerous fields such as classification, regression, computer vision, recommendation systems, robotics and compression, to name just a few examples. In general, nearest neighbor problems cannot be answered in sublinear time: to identify the actual nearest data points, clearly all objects have to be accessed at least once. However, in the class of applications where nearest neighbor searches are repeatedly made within a fixed data set that is available upfront, such as recommendation systems (Spotify, e-commerce, etc.), we can do better. In a computationally expensive offline phase the data set is indexed with a data structure, and in the online phase the index is used to answer nearest neighbor queries at a much higher rate. The cost of indexing is usually much larger than that of performing a single query, but with a high number of queries the initial indexing cost is eventually amortized. The need for efficient index structures for nearest neighbor search has sparked a lot of research, and hundreds of papers have been published to date. We look into the class of structures called binary space partitioning trees, specifically the random projection tree. Random projection trees have favorable properties especially when working with data sets of low intrinsic dimensionality. However, they have rarely been used in real-life nearest neighbor solutions due to limiting factors such as the relatively high cost of projection computations in high-dimensional spaces. We present a new index structure for approximate nearest neighbor search that consists of multiple random projection trees, and several variants of algorithms that use it for efficient nearest neighbor search. We start by specifying our variant of the random projection tree and show how to construct an index of multiple random projection trees (MRPT), along with a simple query that combines the results from independent random projection trees to achieve much higher query accuracy with faster query times. This is followed by a discussion of further methods to optimize accuracy and storage. The focus is on algorithmic details, accompanied by a thorough analysis of memory and time complexity. Finally, we show experimentally that a real-life implementation of these ideas leads to an algorithm that achieves faster query times than the currently available open source libraries for high-recall approximate nearest neighbor search.
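
    The following is a rough sketch of the idea, not the MRPT library itself: each random projection tree recursively splits the data by projecting onto a random direction, a query descends each tree to a leaf, and the leaves of several independent trees are merged and re-ranked by exact distance. The leaf size, the number of trees, and the data are illustrative assumptions.

```python
import numpy as np

class RPTree:
    """One random projection tree: split on the median of a random projection."""
    def __init__(self, data, indices, leaf_size=32, rng=None):
        self.rng = rng or np.random.default_rng()
        if len(indices) <= leaf_size:
            self.leaf, self.indices = True, indices
            return
        self.direction = self.rng.normal(size=data.shape[1])   # random projection direction
        proj = data[indices] @ self.direction
        self.split = np.median(proj)
        left, right = indices[proj <= self.split], indices[proj > self.split]
        if len(left) == 0 or len(right) == 0:                  # degenerate split: stop here
            self.leaf, self.indices = True, indices
            return
        self.leaf = False
        self.left = RPTree(data, left, leaf_size, self.rng)
        self.right = RPTree(data, right, leaf_size, self.rng)

    def query(self, q):
        """Route the query point to a leaf and return the indices stored there."""
        if self.leaf:
            return self.indices
        child = self.left if q @ self.direction <= self.split else self.right
        return child.query(q)

def mrpt_query(trees, data, q, k=10):
    """Merge candidates from all trees, then re-rank them by exact Euclidean distance."""
    candidates = np.unique(np.concatenate([t.query(q) for t in trees]))
    dists = np.linalg.norm(data[candidates] - q, axis=1)
    return candidates[np.argsort(dists)[:k]]

data = np.random.default_rng(1).normal(size=(5000, 64))
trees = [RPTree(data, np.arange(len(data))) for _ in range(10)]
print(mrpt_query(trees, data, data[0], k=5))
```

    Using more trees raises the chance that the true neighbors fall into at least one queried leaf, which is the accuracy/time trade-off the thesis analyzes.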

    Hashing for Similarity Search: A Survey

    Similarity search (nearest neighbor search) is the problem of finding the data items whose distances to a query item are the smallest in a large database. Various methods have been developed to address this problem, and recently a lot of effort has been devoted to approximate search. In this paper, we present a survey on one of the main solutions, hashing, which has been widely studied since the pioneering work on locality sensitive hashing. We divide the hashing algorithms into two main categories: locality sensitive hashing, which designs hash functions without exploring the data distribution, and learning to hash, which learns hash functions according to the data distribution. We review them from various aspects, including hash function design, distance measures, and search schemes in the hash coding space.
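
    As a concrete instance of the first, data-independent category, the sketch below implements random hyperplane hashing: nearby points tend to fall on the same side of random hyperplanes, so they share hash codes and land in the same bucket. This is only one of the many hash function designs the survey covers, and the bit width, single hash table, and exact re-ranking step are simplifying assumptions.

```python
import numpy as np
from collections import defaultdict

class HyperplaneLSH:
    """Single-table LSH with random hyperplane hash functions."""
    def __init__(self, dim, n_bits=12, seed=0):
        self.planes = np.random.default_rng(seed).normal(size=(n_bits, dim))
        self.buckets = defaultdict(list)

    def _code(self, x):
        # Each random hyperplane contributes one bit: which side of it x lies on.
        return tuple((self.planes @ x > 0).astype(int))

    def index(self, data):
        self.data = data
        for i, x in enumerate(data):
            self.buckets[self._code(x)].append(i)

    def query(self, q, k=5):
        # Candidates are the items sharing the query's bucket; re-rank them exactly.
        candidates = self.buckets.get(self._code(q), [])
        if not candidates:
            return []
        cand = np.array(candidates)
        dists = np.linalg.norm(self.data[cand] - q, axis=1)
        return cand[np.argsort(dists)[:k]]

data = np.random.default_rng(2).normal(size=(2000, 32))
lsh = HyperplaneLSH(dim=32)
lsh.index(data)
print(lsh.query(data[0]))
```

    Learning-to-hash methods would instead fit the hyperplanes (or a more general hash function) to the data distribution rather than drawing them at random.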

    Neural Distributed Autoassociative Memories: A Survey

    Introduction. Neural network models of autoassociative, distributed memory allow storage and retrieval of many items (vectors), where the number of stored items can exceed the vector dimension (the number of neurons in the network). This opens the possibility of a sublinear-time search (in the number of stored items) for approximate nearest neighbors among vectors of high dimension. The purpose of this paper is to review models of autoassociative, distributed memory that can be naturally implemented by neural networks (mainly with local learning rules and iterative dynamics based on information locally available to neurons). Scope. The survey focuses mainly on the networks of Hopfield, Willshaw and Potts, which have connections between pairs of neurons and operate on sparse binary vectors. We discuss not only autoassociative memory, but also the generalization properties of these networks. We also consider neural networks with higher-order connections and networks with a bipartite graph structure for non-binary data with linear constraints. Conclusions. In conclusion we discuss the relation to similarity search, the advantages and drawbacks of these techniques, and topics for further research. An interesting and still not completely resolved question is whether neural autoassociative memories can search for approximate nearest neighbors faster than other index structures for similarity search, in particular for the case of very high-dimensional vectors.
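
    To make the Hopfield case concrete, here is a minimal sketch (illustrative, not taken from the survey) of a Hopfield autoassociative memory: a local Hebbian rule stores bipolar patterns in the weight matrix, and iterative updates pull a corrupted probe toward the nearest stored item. The pattern count, dimension, and noise level are arbitrary assumptions.

```python
import numpy as np

def train_hopfield(patterns):
    """Hebbian weights for an (m, n) array of +/-1 patterns; no self-connections."""
    n = patterns.shape[1]
    W = patterns.T @ patterns / n
    np.fill_diagonal(W, 0.0)
    return W

def recall(W, probe, steps=20):
    """Synchronous updates until the state stops changing (or the step budget runs out)."""
    s = probe.copy()
    for _ in range(steps):
        new = np.sign(W @ s)
        new[new == 0] = 1
        if np.array_equal(new, s):
            break
        s = new
    return s

rng = np.random.default_rng(3)
patterns = rng.choice([-1, 1], size=(5, 100))          # 5 stored items, 100 neurons
W = train_hopfield(patterns)
noisy = patterns[0].copy()
noisy[:10] *= -1                                       # corrupt 10 of 100 entries
print(np.array_equal(recall(W, noisy), patterns[0]))   # typically recovers the stored item
```

    Retrieval here is content-addressable rather than index-based, which is exactly why the comparison with conventional similarity-search index structures raised above remains open.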