
    Transfer Learning for Content-Based Recommender Systems using Tree Matching

    In this paper we present a new approach to content-based transfer learning for solving the data sparsity problem in cases where users' preferences in the target domain are scarce or unavailable, but the necessary preference information exists in another domain. We show that training a system to use such information across domains can improve performance. Specifically, we represent users' behavior patterns as topological graph structures. Each behavior pattern represents the behavior of a set of users, where a user's behavior is defined as the items they rated and the rating values they gave. Next, we find a correlation between behavior patterns in the source domain and behavior patterns in the target domain; this mapping serves as a bridge between the two domains. Based on this correlation and the content attributes of the items, we train a machine learning model to predict users' ratings in the target domain. When we compare our approach to the popularity approach and to KNN-cross-domain on a real-world dataset, our approach outperforms both methods in 83% of the cases on average.

    Cartesian Tree Matching and Indexing

    We introduce a new matching metric, called Cartesian tree matching, under which two strings match if they have the same Cartesian tree. Based on Cartesian tree matching, we define single pattern matching for a text of length n and a pattern of length m, and multiple pattern matching for a text of length n and k patterns of total length m. We present an O(n+m) time algorithm for single pattern matching, and an O((n+m) log k) deterministic time or O(n+m) randomized time algorithm for multiple pattern matching. We also define an index data structure called the Cartesian suffix tree, and present an O(n) randomized time algorithm to build it. Our efficient algorithms for Cartesian tree matching use a representation of the Cartesian tree called the parent-distance representation.
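The parent-distance representation mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes the definition PD[i] = i - j for the nearest earlier position j with S[j] <= S[i] (PD[i] = 0 when no such j exists), and that two strings have the same Cartesian tree exactly when their PD arrays are equal. The matcher is a deliberately naive O(nm) window-by-window check, not the O(n+m) algorithm described in the abstract; the function names `parent_distance` and `cartesian_match_naive` are hypothetical.

```python
from typing import List, Sequence


def parent_distance(s: Sequence[int]) -> List[int]:
    """PD[i] = i - j for the nearest j < i with s[j] <= s[i], else 0."""
    pd: List[int] = []
    stack: List[int] = []  # indices whose values are non-decreasing bottom to top
    for i, x in enumerate(s):
        # Elements larger than x can never be the nearest "<=" element again.
        while stack and s[stack[-1]] > x:
            stack.pop()
        pd.append(i - stack[-1] if stack else 0)
        stack.append(i)
    return pd


def cartesian_match_naive(text: Sequence[int], pattern: Sequence[int]) -> List[int]:
    """Start positions of text windows with the same Cartesian tree as pattern."""
    m = len(pattern)
    target = parent_distance(pattern)
    return [i for i in range(len(text) - m + 1)
            if parent_distance(text[i:i + m]) == target]


print(cartesian_match_naive([5, 3, 1, 4, 2, 6], [2, 1, 3]))  # [1, 3]
```

Each PD array is built in linear time with a monotone stack; a linear-time matcher must additionally avoid recomputing PD from scratch for every window, which this sketch does not attempt.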

    Fast Sublinear Sparse Representation using Shallow Tree Matching Pursuit

    Sparse approximation using highly over-complete dictionaries is a state-of-the-art tool for many imaging applications, including denoising, super-resolution, compressive sensing, light-field analysis, and object recognition. Unfortunately, the applicability of such methods is severely hampered by the computational burden of sparse approximation: these algorithms are linear or super-linear in both the data dimensionality and the size of the dictionary. We propose a framework for learning the hierarchical structure of over-complete dictionaries that enables fast computation of sparse representations. Our method builds on tree-based strategies for nearest-neighbor matching and presents domain-specific enhancements that are highly efficient for the analysis of image patches. Contrary to most popular methods for building spatial data structures, our methods rely on shallow, balanced trees with relatively few layers. We present an extensive array of experiments on several applications, such as image denoising/super-resolution and compressive video/light-field sensing, where we achieve a 100-1000x speedup in practice (with less than 1 dB loss in accuracy).
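To make tree-based atom matching concrete, the following NumPy sketch runs plain matching pursuit but replaces the exhaustive atom search with a two-level (shallow) tree: the residual is first compared against a few group representatives, then only against the atoms in the winning group. This is an illustration under stated assumptions, not the authors' learned hierarchy: the grouping simply uses randomly chosen atoms as representatives, and the names `build_shallow_tree` and `tree_matching_pursuit` are hypothetical.

```python
import numpy as np


def build_shallow_tree(D, n_groups, rng):
    """Two-level index over the columns of D (assumed unit-norm).

    Hypothetical grouping: pick n_groups atoms at random as representatives
    and assign every atom to the representative it correlates with most.
    """
    reps = rng.choice(D.shape[1], size=n_groups, replace=False)
    C = D[:, reps]                                   # (d, n_groups)
    assign = np.argmax(np.abs(C.T @ D), axis=0)      # group id per atom
    groups = [np.flatnonzero(assign == g) for g in range(n_groups)]
    return C, groups


def tree_matching_pursuit(x, D, C, groups, n_iter):
    """Matching pursuit whose atom search descends the shallow tree."""
    coef = np.zeros(D.shape[1])
    r = np.asarray(x, dtype=float).copy()
    nonempty = [g for g, members in enumerate(groups) if members.size > 0]
    for _ in range(n_iter):
        # Level 1: choose the group whose representative best matches the residual.
        scores = np.abs(C[:, nonempty].T @ r)
        members = groups[nonempty[int(np.argmax(scores))]]
        # Level 2: exhaustive search restricted to that group's atoms.
        j = members[int(np.argmax(np.abs(D[:, members].T @ r)))]
        a = D[:, j] @ r
        coef[j] += a
        r -= a * D[:, j]                             # ||r|| never increases
    return coef, r


rng = np.random.default_rng(0)
D = rng.standard_normal((16, 256))
D /= np.linalg.norm(D, axis=0)                       # unit-norm atoms
C, groups = build_shallow_tree(D, n_groups=16, rng=rng)
x = 1.0 * D[:, 3] - 0.5 * D[:, 50]
coef, r = tree_matching_pursuit(x, D, C, groups, n_iter=10)
print(np.linalg.norm(r) / np.linalg.norm(x))         # relative residual
```

Each iteration scores 16 representatives plus one group instead of all 256 atoms, which is the source of the speedup; the price is that the chosen atom may not be the globally best one, so the residual can shrink more slowly than with exhaustive matching pursuit.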

    Optimal Enumeration: Efficient Top-k Tree Matching

    Driven by many real applications, graph pattern matching has attracted a great deal of attention recently. Because twig-pattern matching may produce an extremely large number of matches in a graph, it can not only confuse users with too many results but also incur high computational costs. In this paper, we study the problem of top-k tree pattern matching: given a rooted tree T, compute its top-k matches in a directed graph G under twig-pattern matching semantics. First, we present a novel, optimal enumeration paradigm based on the principle of Lawler's procedure. We show that our enumeration algorithm runs in $O(n_T + \log k)$ time per round, where $n_T$ is the number of nodes in T. Since the time complexity to output a match of T is $O(n_T)$ and $n_T \geq \log k$ in practice, our enumeration technique is optimal. Moreover, the cost of generating the top-1 match of T in our algorithm is $O(m_R)$, where $m_R$ is the number of edges in the transitive closure of the data graph G involving all nodes relevant to T; $O(m_R)$ is also optimal in the worst case without prior knowledge of G. Consequently, our algorithm is optimal, with running time $O(m_R + k(n_T + \log k))$, in contrast to the time complexity $O(m_R \log k + k n_T (\log k + d_T))$ of the existing technique, where $d_T$ is the maximal node degree in T. Second, a novel priority-based access technique is proposed, which greatly reduces the number of edges accessed and results in a significant performance improvement. Finally, we apply our techniques to the general form of the top-k graph pattern matching problem (i.e., where the query is a graph) to improve the existing techniques. Comprehensive empirical studies demonstrate that our techniques may improve on the existing techniques by orders of magnitude.
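The principle of Lawler's procedure can be illustrated on a deliberately simplified model. The following Python sketch assumes that each pattern node independently picks one candidate from a non-empty, ascending list of costs and that a match's score is the sum of the chosen costs. The reachability constraints of real twig-pattern matching, and the paper's per-round bookkeeping, are omitted. After outputting the best solution of a subspace, the sketch splits the rest of that subspace into disjoint subspaces, each fixing a prefix of the choices and excluding the current choice at the next position. The function name `top_k_lawler` is hypothetical.

```python
import heapq
from typing import List, Tuple


def top_k_lawler(costs: List[List[float]], k: int) -> List[Tuple[float, Tuple[int, ...]]]:
    """Return the k cheapest choices (one index per node), cheapest first.

    costs[u] must be non-empty and sorted ascending, so the best solution of a
    subspace {lo[u] <= idx[u] <= hi[u]} is simply idx = lo.
    """
    n = len(costs)

    def score(lo: Tuple[int, ...]) -> float:
        return sum(costs[u][lo[u]] for u in range(n))

    lo0 = tuple(0 for _ in range(n))
    hi0 = tuple(len(c) - 1 for c in costs)
    heap = [(score(lo0), lo0, hi0)]
    results = []
    while heap and len(results) < k:
        s, lo, hi = heapq.heappop(heap)
        results.append((s, lo))
        # Lawler partition of (subspace minus its best solution lo):
        # subspace i keeps positions < i fixed to lo and forces index i past lo[i].
        for i in range(n):
            if lo[i] + 1 > hi[i]:
                continue
            new_lo = lo[:i] + (lo[i] + 1,) + lo[i + 1:]
            new_hi = lo[:i] + hi[i:]
            heapq.heappush(heap, (score(new_lo), new_lo, new_hi))
    return results


print(top_k_lawler([[1, 2], [0, 5]], k=3))
# [(1, (0, 0)), (2, (1, 0)), (6, (0, 1))]
```

Because the subspaces are disjoint and together cover everything except the solution just output, no match is ever produced twice. This naive version spends $O(n^2)$ per round building subspaces, so it does not reach the per-round bound claimed in the abstract.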

    Incorporating structured text retrieval into the extended Boolean model

    Conventional information retrieval models are inappropriate for use in databases containing semi-structured biographical data. This article presents a hybrid algorithm that effectively addresses many of the problems in searching biographical databases. An overview of applicable structured text retrieval algorithms is given, with a specific focus on the tree matching model. Small adaptations to the Extended Boolean Model that make it more applicable to biographical databases are described. The adaptation of tree matching models to the hierarchical nature of the data in a person record is described, and a distance function between a query and a record is defined. A hybrid of the Extended Boolean Model and the adapted Tree Matching Model is then presented. A fast ranking algorithm appropriate for general searches and a more effective (but more resource-intensive) algorithm for more advanced searches are given. It is shown how dates can be incorporated into the hybrid model to create a more powerful search algorithm. The hybrid algorithm can be used to rank records in descending order of relevance to a user's query.
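For readers unfamiliar with the Extended Boolean Model, the following Python sketch shows the standard p-norm similarity functions for unweighted query terms. Here w_i in [0, 1] is a document's weight for the i-th query term. It illustrates only the baseline model, not the biographical-database adaptations or the hybrid tree matching model described in the article. The function names `pnorm_or` and `pnorm_and` are hypothetical.

```python
from typing import Sequence


def pnorm_or(weights: Sequence[float], p: float) -> float:
    """Similarity of a document to (t1 OR ... OR tm) under the p-norm model."""
    m = len(weights)
    return (sum(w ** p for w in weights) / m) ** (1.0 / p)


def pnorm_and(weights: Sequence[float], p: float) -> float:
    """Similarity of a document to (t1 AND ... AND tm) under the p-norm model."""
    m = len(weights)
    return 1.0 - (sum((1.0 - w) ** p for w in weights) / m) ** (1.0 / p)


# A document containing only one of two query terms, for increasing p:
for p in (1, 2, 10):
    print(p, round(pnorm_or([1.0, 0.0], p), 3), round(pnorm_and([1.0, 0.0], p), 3))
```

The parameter p controls how strictly the Boolean operators are applied. At p = 1 both operators reduce to the mean term weight. As p grows, OR approaches the maximum weight and AND approaches the minimum, which recovers strict Boolean behavior on 0/1 weights.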

    From tree matching to sparse graph alignment

    In this paper we consider alignment of sparse graphs, for which we introduce the Neighborhood Tree Matching Algorithm (NTMA). For correlated Erdős–Rényi random graphs, we prove that the algorithm returns, in polynomial time, a positive fraction of correctly matched vertices and a vanishing fraction of mismatches. This result holds with average degree of the graphs in O(1) and correlation parameter s that can be bounded away from 1, conditions under which random graph alignment is particularly challenging. As a byproduct of the analysis, we introduce a matching metric between trees and characterize it for several models of correlated random trees. These results may be of independent interest, yielding, for instance, efficient tests for determining whether two random trees are correlated or independent. Comment: 33 pages, 10 figures, accepted at COLT 2020. Typos corrected, some new figures, some remarks and explanations detailed, minor changes in proof of Th. 1.
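As a rough illustration of a matching metric between trees, the following Python sketch works on rooted, unlabeled trees. It computes the size of the largest common rooted subtree truncated at depth d: a node pair contributes 1 plus the best one-to-one matching of their children's scores. This recursive definition is an assumption made for illustration and may differ from the exact metric characterized in the paper. Trees are encoded as nested tuples of children, and the function name `common_subtree_size` is hypothetical. The child matching uses a brute-force bitmask search, which is only practical for small degrees.

```python
from functools import lru_cache
from typing import Tuple

Tree = Tuple["Tree", ...]  # a node is the tuple of its children; a leaf is ()


def common_subtree_size(t1: Tree, t2: Tree, depth: int) -> int:
    """Nodes in the largest common rooted subtree of t1 and t2 up to `depth`."""
    if depth == 0:
        return 1
    # Score every pair of children one level down.
    w = [[common_subtree_size(a, b, depth - 1) for b in t2] for a in t1]

    @lru_cache(maxsize=None)
    def best(i: int, used: int) -> int:
        # Best one-to-one matching of children t1[i:] to unused children of t2.
        if i == len(t1):
            return 0
        result = best(i + 1, used)  # leave child i of t1 unmatched
        for j in range(len(t2)):
            if not used >> j & 1:
                result = max(result, w[i][j] + best(i + 1, used | 1 << j))
        return result

    return 1 + best(0, 0)


leaf: Tree = ()
t1: Tree = (leaf, (leaf,))
t2: Tree = ((leaf,), (leaf,))
print(common_subtree_size(t1, t2, depth=2))  # 4
```

In an NTMA-style aligner, a score like this could be computed for the depth-d neighborhoods of candidate vertex pairs, with large scores suggesting that the two vertices correspond. How the paper thresholds and uses its metric is not reproduced here.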