11,675 research outputs found
Fast Approximate Algorithms for k-NN Search and k-NN Graph Construction
Thesis (Ph.D.) -- Seoul National University Graduate School: Department of Electrical and Computer Engineering, 2015. 2. Sang-goo Lee.
Finding k-nearest neighbors (k-NN) is an essential part of recommender systems, information retrieval, and many data mining and machine learning algorithms. However, there are two main problems in finding k-nearest neighbors: 1) Existing approaches require a huge amount of time when the number of objects or dimensions scales up. 2) The k-NN computation methods do not perform consistently across different search tasks and types of data. In this dissertation, we present fast and versatile algorithms for finding k-nearest neighbors in order to cope with these problems. The main contributions are summarized as follows: first, we present an efficient and scalable algorithm for finding an approximate k-NN graph by filtering out node pairs whose large-value dimensions do not match at all. Second, a fast collaborative filtering algorithm that utilizes a k-NN graph is presented. The main idea of this approach is to reverse the process of finding k-nearest neighbors in item-based collaborative filtering. Last, we propose a fast approximate algorithm for k-NN search that selects query-specific signatures from a signature pool to pick high-quality k-NN candidates. The experimental results show that the proposed algorithms guarantee a high level of accuracy while also being much faster than the other algorithms over different types of search tasks and datasets.
Abstract
Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Motivation and Challenges
1.1.1 Fast Approximation
1.1.2 Versatility
1.2 Our Solutions
1.2.1 Greedy Filtering
1.2.2 Signature Selection LSH
1.2.3 Reversed CF
1.3 Contributions
1.4 Outline
Chapter 2 Background and Related Work
2.1 k-NN Search
2.1.1 Locality Sensitive Hashing
2.1.2 LSH-based k-NN Search
2.2 k-NN Graph Construction
2.2.1 LSH-based Approach
2.2.2 Clustering-based Approach
2.2.3 Heuristic-based Approach
2.2.4 Similarity Join
2.3 Summary
Chapter 3 Fast Approximate k-NN Graph Construction
3.1 Introduction
3.2 Problem Formulation
3.3 Constructing a k-Nearest Neighbor Graph
3.3.1 Greedy Filtering
3.3.2 Prefix Selection Scheme
3.3.3 Optimization
3.4 Theoretical Analysis
3.4.1 Preliminaries
3.4.2 Graph Construction Time
3.4.3 Graph Accuracy
3.5 Experiments
3.5.1 Experimental Setup
3.5.2 Performance Comparison
3.6 Summary
Chapter 4 Fast Collaborative Filtering
4.1 Introduction
4.2 Related Work
4.3 Fast Collaborative Filtering
4.3.1 Nearest Neighbor Graph Construction
4.3.2 Fast Recommendation Algorithm
4.4 Experiments
4.4.1 Experimental Setup
4.4.2 Overall Comparison
4.4.3 Effects of Parameter Changes
4.5 Summary
Chapter 5 Fast Approximate k-NN Search
5.1 Introduction
5.2 Signature Selection LSH
5.2.1 Data-dependent LSH
5.2.2 Signature Pool Generation
5.2.3 Signature Selection
5.2.4 Optimization Techniques
5.3 S2LSH for Graph Construction
5.3.1 Feature Selection
5.3.2 Signature Selection
5.3.3 Optimization Techniques
5.4 Theoretical Analysis
5.5 Experiments
5.5.1 Experimental Setup
5.5.2 Experimental Results
5.5.3 Performance Analysis
5.6 Summary
Chapter 6 Conclusion
Bibliography
Abstract (in Korean)
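The first contribution in the abstract above (skipping node pairs whose large-value dimensions do not match) can be illustrated with a small sketch. This is not the dissertation's Greedy Filtering algorithm: the fixed `prefix_size`, the function names, and the use of cosine similarity are all assumptions made for illustration. The sketch only shows the general idea of comparing two vectors when they share at least one of their largest-value dimensions.

```python
import math
from collections import defaultdict
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {dimension: value} dicts."""
    dot = sum(val * v[dim] for dim, val in u.items() if dim in v)
    norm_u = math.sqrt(sum(val * val for val in u.values()))
    norm_v = math.sqrt(sum(val * val for val in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def approx_knn_graph(vectors, k, prefix_size):
    """Approximate k-NN graph built from prefix-filtered candidate pairs.

    Hypothetical helper: `prefix_size` is a fixed, illustrative parameter,
    not the prefix selection scheme described in the thesis.
    """
    # Index every node under its `prefix_size` largest-value dimensions.
    buckets = defaultdict(list)
    for node, vec in vectors.items():
        top_dims = sorted(vec, key=vec.get, reverse=True)[:prefix_size]
        for dim in top_dims:
            buckets[dim].append(node)

    # Only pairs that share at least one prefix dimension become candidates.
    candidates = set()
    for nodes in buckets.values():
        candidates.update(combinations(sorted(nodes), 2))

    # Compute exact similarities for the candidates only.
    neighbours = {node: [] for node in vectors}
    for a, b in candidates:
        sim = cosine(vectors[a], vectors[b])
        neighbours[a].append((sim, b))
        neighbours[b].append((sim, a))

    # Keep the k most similar candidates per node (ties broken by node name).
    return {
        node: [nbr for _, nbr in sorted(found, key=lambda t: (-t[0], t[1]))[:k]]
        for node, found in neighbours.items()
    }


vectors = {
    "a": {"x": 0.9, "y": 0.1},
    "b": {"x": 0.8, "z": 0.2},
    "c": {"w": 1.0},
    "d": {"y": 0.7, "x": 0.6},
}
graph = approx_knn_graph(vectors, k=1, prefix_size=2)
```

A smaller `prefix_size` prunes more pairs and runs faster but can miss true neighbours. A node that shares no prefix dimension with any other node, such as `"c"` here, ends up with an empty neighbour list.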
Graph Convolutional Matrix Completion
We consider matrix completion for recommender systems from the point of view
of link prediction on graphs. Interaction data such as movie ratings can be
represented by a bipartite user-item graph with labeled edges denoting observed
ratings. Building on recent progress in deep learning on graph-structured data,
we propose a graph auto-encoder framework based on differentiable message
passing on the bipartite interaction graph. Our model shows competitive
performance on standard collaborative filtering benchmarks. In settings where
complementary feature information or structured data such as a social network
is available, our framework outperforms recent state-of-the-art methods.
Comment: 9 pages, 3 figures, updated with additional experimental evaluatio
A Harmonic Extension Approach for Collaborative Ranking
We present a new perspective on graph-based methods for collaborative ranking
for recommender systems. Unlike user-based or item-based methods that compute a
weighted average of ratings given by the nearest neighbors, or low-rank
approximation methods using convex optimization and the nuclear norm, we
formulate matrix completion as a series of semi-supervised learning problems,
and propagate the known ratings to the missing ones on the user-user or
item-item graph globally. The semi-supervised learning problems are expressed
as Laplace-Beltrami equations on a manifold, or namely, harmonic extension, and
can be discretized by a point integral method. We show that our approach does
not impose a low-rank Euclidean subspace on the data points, but instead
minimizes the dimension of the underlying manifold. Our method, named LDM (low
dimensional manifold), turns out to be particularly effective in generating
rankings of items, showing decent computational efficiency and robust ranking
quality compared to state-of-the-art methods.
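The idea of propagating known ratings to missing ones over an item-item graph can be sketched with the classical discrete harmonic function, where every unrated item takes the weighted average of its neighbours' values. This is an illustrative stand-in, not the LDM method: the abstract describes a point integral discretization of a Laplace-Beltrami equation, which this sketch does not implement. The function name, the graph, and the ratings below are hypothetical.

```python
def harmonic_extension(weights, known, iterations=200):
    """Extend known values to all nodes of a weighted graph.

    weights: {node: {neighbour: weight}}, assumed symmetric and non-negative.
    known:   {node: value} for the nodes whose values stay fixed.
    Unknown nodes are repeatedly replaced by the weighted average of their
    neighbours (Gauss-Seidel sweeps), which converges to the discrete
    harmonic function when every unknown node is connected to a known one.
    """
    values = {node: known.get(node, 0.0) for node in weights}
    for _ in range(iterations):
        for node, nbrs in weights.items():
            if node in known:
                continue  # known ratings act as fixed boundary values
            total = sum(nbrs.values())
            if total > 0:
                values[node] = sum(w * values[n] for n, w in nbrs.items()) / total
    return values


# One user's ratings on a chain of four items: i1 - i2 - i3 - i4.
item_graph = {
    "i1": {"i2": 1.0},
    "i2": {"i1": 1.0, "i3": 1.0},
    "i3": {"i2": 1.0, "i4": 1.0},
    "i4": {"i3": 1.0},
}
observed = {"i1": 5.0, "i4": 2.0}
filled = harmonic_extension(item_graph, observed)
```

On this chain the two missing ratings interpolate linearly between the observed endpoints, which is the expected behaviour of a harmonic function on a path.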
Knowledge Graph semantic enhancement of input data for improving AI
Intelligent systems designed using machine learning algorithms require a
large amount of labeled data. Background knowledge provides complementary,
real-world factual information that can augment the limited labeled data used
to train a machine learning algorithm. The term Knowledge Graph (KG) is in
vogue because, for many practical applications, it is convenient and useful to
organize this background knowledge in the form of a graph. Recent academic research and
implemented industrial intelligent systems have shown promising performance for
machine learning algorithms that combine training data with a knowledge graph.
In this article, we discuss the use of relevant KGs to enhance input data for
two applications that use machine learning -- recommendation and community
detection. The KG improves both accuracy and explainability.
Social Collaborative Retrieval
Socially-based recommendation systems have recently attracted significant
interest, and a number of studies have shown that social information can
dramatically improve a system's predictions of user interests. Meanwhile, there
are now many potential applications that involve aspects of both recommendation
and information retrieval, and the task of collaborative retrieval---a
combination of these two traditional problems---has recently been introduced.
Successful collaborative retrieval requires overcoming severe data sparsity,
making additional sources of information, such as social graphs, particularly
valuable. In this paper we propose a new model for collaborative retrieval, and
show that our algorithm outperforms current state-of-the-art approaches by
incorporating information from social networks. We also provide empirical
analyses of the ways in which cultural interests propagate along a social graph
using a real-world music dataset.
Comment: 10 page
A Graphical Model Formulation of Collaborative Filtering Neighbourhood Methods with Fast Maximum Entropy Training
Item neighbourhood methods for collaborative filtering learn a weighted graph
over the set of items, where each item is connected to those it is most similar
to. The prediction of a user's rating on an item is then given by the user's ratings
of neighbouring items, weighted by their similarity. This paper presents a new
neighbourhood approach which we call item fields, whereby an undirected
graphical model is formed over the item graph. The resulting prediction rule is
a simple generalization of the classical approaches, which takes into account
non-local information in the graph, allowing its best results to be obtained
when using drastically fewer edges than other neighbourhood approaches. A fast
approximate maximum entropy training method based on the Bethe approximation is
presented, which uses a simple gradient ascent procedure. When using
precomputed sufficient statistics on the Movielens datasets, our method is
faster than maximum likelihood approaches by two orders of magnitude.
Comment: ICML201
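The classical item-neighbourhood prediction rule that the abstract above generalizes can be sketched as a similarity-weighted average of the user's ratings on neighbouring items. This shows only the classical baseline, not the item fields model or its maximum entropy training. The function name, the normalization by the sum of absolute similarities, and the data are illustrative assumptions.

```python
def predict_rating(user_ratings, item_neighbours, item):
    """Predict a user's rating for `item` from the item neighbourhood graph.

    user_ratings:    {item: rating} for the items this user has rated.
    item_neighbours: {item: {neighbour: similarity}} weighted item graph.
    Returns None when the user has rated none of the item's neighbours.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for neighbour, similarity in item_neighbours.get(item, {}).items():
        if neighbour in user_ratings:
            weighted_sum += similarity * user_ratings[neighbour]
            weight_total += abs(similarity)
    return weighted_sum / weight_total if weight_total else None


# Hypothetical item graph and ratings.
item_neighbours = {"m3": {"m1": 0.8, "m2": 0.2, "m4": 0.5}}
user_ratings = {"m1": 5.0, "m2": 2.0}
prediction = predict_rating(user_ratings, item_neighbours, "m3")
```

Only neighbours the user has actually rated contribute, so `"m4"` is ignored here and the prediction is pulled toward the rating of the more similar item `"m1"`.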