38,757 research outputs found

    Fast Structural Search in Phylogenetic Databases

    Get PDF
    As the size of phylogenetic databases grows, the need for efficiently searching these databases arises. Thanks to previous and ongoing research, searching by attribute value and by text has become commonplace in these databases. However, searching by topological or physical structure, especially for large databases and especially for approximate matches, is still an art. We propose structural search techniques that, given a query or pattern tree P and a database of phylogenies D, find trees in D that are sufficiently close to P. The “closeness” is a measure of the topological relationships in P that are found to be the same or similar in a tree D in D. We develop a filtering technique that accelerates searches and present algorithms for rooted and unrooted trees where the trees can be weighted or unweighted. Experimental results on comparing the similarity measure with existing tree metrics and on evaluating the efficiency of the search techniques demonstrate that the proposed approach is promising

    Automatic Classification of Text Databases through Query Probing

    Get PDF
    Many text databases on the web are "hidden" behind search interfaces, and their documents are only accessible through querying. Search engines typically ignore the contents of such search-only databases. Recently, Yahoo-like directories have started to manually organize these databases into categories that users can browse to find these valuable resources. We propose a novel strategy to automate the classification of search-only text databases. Our technique starts by training a rule-based document classifier, and then uses the classifier's rules to generate probing queries. The queries are sent to the text databases, which are then classified based on the number of matches that they produce for each query. We report some initial exploratory experiments that show that our approach is promising to automatically characterize the contents of text databases accessible on the web.Comment: 7 pages, 1 figur

    An approximate search engine for structure

    Get PDF
    As the size of structural databases grows, the need for efficiently searching these databases arises. Thanks to previous and ongoing research, searching by attribute-value and by text has become commonplace in these databases. However, searching by topological or physical structure, especially for large databases and especially for approximate matches, is still an art. In this dissertation, efficient search techniques are presented for retrieving trees from a database that are similar to a given query tree. Rooted ordered labeled trees, rooted unordered labeled trees and free trees are considered. Ordered labeled trees are trees in which each node has a label and the left-to-right order among siblings matters. Unordered labeled trees are trees in which the parent-child relationship is significant, but the order among siblings is unimportant. Free trees (unrooted unordered trees) are acyclic graphs. These trees find many applications in bioinformatics, Web log analysis, phyloinformatics, XML processing, etc. Two types of similarity measures are investigated: (i) counting the mismatching paths in the query tree and a data tree, and (ii) measuring the topological relationship between the trees. The proposed approaches include storing the paths of trees in a suffix array, employing hashing techniques to speed up retrieval, and counting the number of up-down operations to move a token from one node to another node in a tree. Various filters for accelerating a search, different strategies for parallelizing these search algorithms and applications of these algorithms to XML and phylogenetic data management are discussed. The proposed techniques have been implemented into a phylogenetic search engine which is fully operational and is available on the World Wide Web. Experimental results on comparing the similarity measures with existing tree metrics and on evaluating the efficiency of the search techniques demonstrate the effectiveness of the search engine. Future work includes extending the techniques to other structural data, as well as developing new filters and algorithms for speeding up searching and mining in complex structures

    Fast Search for Dynamic Multi-Relational Graphs

    Full text link
    Acting on time-critical events by processing ever growing social media or news streams is a major technical challenge. Many of these data sources can be modeled as multi-relational graphs. Continuous queries or techniques to search for rare events that typically arise in monitoring applications have been studied extensively for relational databases. This work is dedicated to answer the question that emerges naturally: how can we efficiently execute a continuous query on a dynamic graph? This paper presents an exact subgraph search algorithm that exploits the temporal characteristics of representative queries for online news or social media monitoring. The algorithm is based on a novel data structure called the Subgraph Join Tree (SJ-Tree) that leverages the structural and semantic characteristics of the underlying multi-relational graph. The paper concludes with extensive experimentation on several real-world datasets that demonstrates the validity of this approach.Comment: SIGMOD Workshop on Dynamic Networks Management and Mining (DyNetMM), 201

    Searching digital music libraries

    Get PDF
    There has been a recent explosion of interest in digital music libraries. In particular, interactive melody retrieval is a striking example of a search paradigm that differs radically from the standard full-text search. Many different techniques have been proposed for melody matching, but the area lacks standard databases that allow them to be compared on common grounds––and copyright issues have stymied attempts to develop such a corpus. This paper focuses on methods for evaluating different symbolic music matching strategies, and describes a series of experiments that compare and contrast results obtained using three dominant paradigms. Combining two of these paradigms yields a hybrid approach which is shown to have the best overall combination of efficiency and effectiveness

    When Hashing Met Matching: Efficient Spatio-Temporal Search for Ridesharing

    Full text link
    Carpooling, or sharing a ride with other passengers, holds immense potential for urban transportation. Ridesharing platforms enable such sharing of rides using real-time data. Finding ride matches in real-time at urban scale is a difficult combinatorial optimization task and mostly heuristic approaches are applied. In this work, we mathematically model the problem as that of finding near-neighbors and devise a novel efficient spatio-temporal search algorithm based on the theory of locality sensitive hashing for Maximum Inner Product Search (MIPS). The proposed algorithm can find kk near-optimal potential matches for every ride from a pool of nn rides in time O(n1+ρ(k+logn)logk)O(n^{1 + \rho} (k + \log n) \log k) and space O(n1+ρlogk)O(n^{1 + \rho} \log k) for a small ρ<1\rho < 1. Our algorithm can be extended in several useful and interesting ways increasing its practical appeal. Experiments with large NY yellow taxi trip datasets show that our algorithm consistently outperforms state-of-the-art heuristic methods thereby proving its practical applicability

    Term-Specific Eigenvector-Centrality in Multi-Relation Networks

    Get PDF
    Fuzzy matching and ranking are two information retrieval techniques widely used in web search. Their application to structured data, however, remains an open problem. This article investigates how eigenvector-centrality can be used for approximate matching in multi-relation graphs, that is, graphs where connections of many different types may exist. Based on an extension of the PageRank matrix, eigenvectors representing the distribution of a term after propagating term weights between related data items are computed. The result is an index which takes the document structure into account and can be used with standard document retrieval techniques. As the scheme takes the shape of an index transformation, all necessary calculations are performed during index tim
    corecore