Transfer Learning for Content-Based Recommender Systems using Tree Matching
In this paper we present a new approach to content-based transfer learning
for solving the data sparsity problem in cases when the users' preferences in
the target domain are either scarce or unavailable, but the necessary
information on the preferences exists in another domain. We show that training
a system to use such information across domains can produce better performance.
Specifically, we represent users' behavior patterns based on topological graph
structures. Each behavior pattern represents the behavior of a set of users,
where a user's behavior is defined as the items they rated and the items'
rating values. In the next step we find a correlation between behavior patterns
in the source domain and behavior patterns in the target domain. This mapping
is considered a bridge between the two domains. Based on the correlation and
content-attributes of the items, we train a machine learning model to predict
users' ratings in the target domain. When we compare our approach to the
popularity approach and KNN-cross-domain on a real-world dataset, the results
show that our approach outperforms both methods in 83% of the cases on
average.
Cartesian Tree Matching and Indexing
We introduce a new metric of match, called Cartesian tree matching, under which two strings match if they have the same Cartesian trees. Based on Cartesian tree matching, we define single pattern matching for a text of length n and a pattern of length m, and multiple pattern matching for a text of length n and k patterns of total length m. We present an O(n+m) time algorithm for single pattern matching, and an O((n+m) log k) deterministic time or O(n+m) randomized time algorithm for multiple pattern matching. We also define an index data structure called the Cartesian suffix tree, and present an O(n) randomized time algorithm to build it. Our efficient algorithms for Cartesian tree matching use a representation of the Cartesian tree called the parent-distance representation.
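The parent-distance representation mentioned above can be illustrated with a minimal sketch. It assumes the usual definition, in which entry i is the distance from position i to the nearest position on its left holding a value less than or equal to the value at i, or 0 if no such position exists. Under that definition, two equal-length sequences have the same Cartesian tree exactly when their parent-distance arrays are equal. The function names are hypothetical, and the matcher is a naive O(nm) window comparison for illustration only, not the O(n+m) algorithm of the paper.

```python
def parent_distance(s):
    """Return the parent-distance array of sequence s.

    pd[i] = i - j, where j is the nearest index left of i with s[j] <= s[i],
    or 0 if no such index exists (assumed definition, see lead-in).
    """
    pd = []
    stack = []  # indices whose values are non-decreasing from bottom to top
    for i, x in enumerate(s):
        # Discard indices holding strictly larger values; they can never be
        # the nearest "<=" position for i or for any later position.
        while stack and s[stack[-1]] > x:
            stack.pop()
        pd.append(i - stack[-1] if stack else 0)
        stack.append(i)
    return pd


def cartesian_tree_match_naive(text, pattern):
    """Naive matcher: recompute the array for every window of the text.

    Runs in O(n * m) time; this is an illustrative baseline only.
    """
    m = len(pattern)
    target = parent_distance(pattern)
    return [
        i
        for i in range(len(text) - m + 1)
        if parent_distance(text[i:i + m]) == target
    ]


if __name__ == "__main__":
    print(parent_distance([3, 1, 4, 1, 5]))
    print(cartesian_tree_match_naive([5, 3, 8, 9, 2, 7], [2, 1, 3]))
```

The stack-based pass computes the array in linear time because each index is pushed and popped at most once.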
Fast Sublinear Sparse Representation using Shallow Tree Matching Pursuit
Sparse approximation using highly over-complete dictionaries is a
state-of-the-art tool for many imaging applications including denoising,
super-resolution, compressive sensing, light-field analysis, and object
recognition. Unfortunately, the applicability of such methods is severely
hampered by the computational burden of sparse approximation: these algorithms
are linear or super-linear in both the data dimensionality and size of the
dictionary. We propose a framework for learning the hierarchical structure of
over-complete dictionaries that enables fast computation of sparse
representations. Our method builds on tree-based strategies for nearest
neighbor matching, and presents domain-specific enhancements that are highly
efficient for the analysis of image patches. Contrary to most popular methods
for building spatial data structures, our methods rely on shallow, balanced
trees with relatively few layers. We show an extensive array of experiments on
several applications, such as image denoising/super-resolution and compressive
video/light-field sensing, where we achieve 100-1000x speedups in practice
(with less than 1 dB loss in accuracy).
Optimal Enumeration: Efficient Top-k Tree Matching
Driven by many real applications, graph pattern matching has attracted a great deal of attention recently. A twig-pattern matching may result in an extremely large number of matches in a graph; this may not only confuse users by providing too many results but also lead to high computational costs. In this paper, we study the problem of top-k tree pattern matching; that is, given a rooted tree T, compute its top-k matches in a directed graph G based on the twig-pattern matching semantics. We first present a novel and optimal enumeration paradigm based on the principle of Lawler's procedure. We show that our enumeration algorithm runs in O(nT + log k) time in each round, where nT is the number of nodes in T. Considering that the time complexity to output a match of T is O(nT) and nT ≥ log k in practice, our enumeration technique is optimal. Moreover, the cost of generating the top-1 match of T in our algorithm is O(mR), where mR is the number of edges in the transitive closure of a data graph G involving all nodes relevant to T. O(mR) is also optimal in the worst case without prior knowledge of G. Consequently, our algorithm is optimal with running time O(mR + k(nT + log k)), in contrast to the time complexity O(mR log k + k nT (log k + dT)) of the existing technique, where dT is the maximal node degree in T. Secondly, a novel priority-based access technique is proposed, which greatly reduces the number of edges accessed and results in a significant performance improvement. Finally, we apply our techniques to the general form of the top-k graph pattern matching problem (i.e., the query is a graph) to improve the existing techniques. Comprehensive empirical studies demonstrate that our techniques may improve the existing techniques by orders of magnitude.
Incorporating structured text retrieval into the extended Boolean model
Conventional information retrieval models are inappropriate for use in databases containing semi-structured biographical data. A hybrid algorithm that effectively addresses many of the problems in searching biographical databases is presented in this article. An overview of applicable structured text retrieval algorithms is given, with a specific focus on the tree matching model. Small adaptations to the Extended Boolean Model, to make it more applicable to biographical databases, are described. The adaptation of tree matching models to the hierarchical nature of data in a person record is described, and a distance function between query and record is defined. A hybrid model between the Extended Boolean Model and the adapted Tree Matching Model is then presented. A fast ranking algorithm appropriate for general searches and a more effective (but more resource-intensive) algorithm for more advanced searches are given. It is shown how dates can be incorporated in the hybrid model to create a more powerful search algorithm. The hybrid algorithm can be used to rank records in descending order of relevance to a user's query.
From tree matching to sparse graph alignment
In this paper we consider alignment of sparse graphs, for which we introduce
the Neighborhood Tree Matching Algorithm (NTMA). For correlated
Erdős-Rényi random graphs, we prove that the algorithm returns -- in
polynomial time -- a positive fraction of correctly matched vertices, and a
vanishing fraction of mismatches. This result holds with average degree of the
graphs in O(1) and correlation parameter s that can be bounded away from 1,
conditions under which random graph alignment is particularly challenging. As a
byproduct of the analysis we introduce a matching metric between trees and
characterize it for several models of correlated random trees. These results
may be of independent interest, yielding for instance efficient tests for
determining whether two random trees are correlated or independent.
Comment: 33 pages, 10 figures, accepted at COLT 2020. Typos corrected, some
new figures, some remarks and explanations detailed, minor changes in proof
of Th. 1.