103 research outputs found

    Approximate String Joins in a Database (Almost) for Free -- Erratum

    Get PDF
    In [GIJ+01a, GIJ+01b] we described how to use q-grams in an RDBMS to perform approximate string joins. We also showed how to implement the approximate join using plain SQL queries. Specifically, we described three filters, count filter, position filter, and length filter, which can be used to execute efficiently the approximate join. The intuition behind the count filter was that strings that are similar have many q-grams in common. In particular, two strings s1 and s2 can have up to max max {|s1|, |s2|} + q - 1 common q-grams. When s1 = s2, they have exactly that many q-grams in common. When s1 and s2 are within edit distance k, they share at least (max {|s1|, |s2|} + q - 1) - kq q-grams, since kq is the maximum numbers of q-grams that can be affected by k edit distance operations

    Using Element Clustering to Increase the Efficiency of XML Schema Matching

    Get PDF
    Schema matching attempts to discover semantic mappings between elements of two schemas. Elements are cross compared using various heuristics (e.g., name, data-type, and structure similarity). Seen from a broader perspective, the schema matching problem is a combinatorial problem with an exponential complexity. This makes the naive matching algorithms for large schemas prohibitively inefficient. In this paper we propose a clustering based technique for improving the efficiency of large scale schema matching. The technique inserts clustering as an intermediate step into existing schema matching algorithms. Clustering partitions schemas and reduces the overall matching load, and creates a possibility to trade between the efficiency and effectiveness. The technique can be used in addition to other optimization techniques. In the paper we describe the technique, validate the performance of one implementation of the technique, and open directions for future research

    Approximate Two-Party Privacy-Preserving String Matching with Linear Complexity

    Full text link
    Consider two parties who want to compare their strings, e.g., genomes, but do not want to reveal them to each other. We present a system for privacy-preserving matching of strings, which differs from existing systems by providing a deterministic approximation instead of an exact distance. It is efficient (linear complexity), non-interactive and does not involve a third party which makes it particularly suitable for cloud computing. We extend our protocol, such that it mitigates iterated differential attacks proposed by Goodrich. Further an implementation of the system is evaluated and compared against current privacy-preserving string matching algorithms.Comment: 6 pages, 4 figure

    On Demand Quality of web services using Ranking by multi criteria

    Get PDF
    In the Web database scenario, the records to match are highly query-dependent, since they can only be obtained through online queries. Moreover, they are only a partial and biased portion of all the data in the source Web databases. Consequently, hand-coding or offline-learning approaches are not appropriate for two reasons. First, the full data set is not available beforehand, and therefore, good representative data for training are hard to obtain. Second, and most importantly, even if good representative data are found and labeled for learning, the rules learned on the representatives of a full data set may not work well on a partial and biased part of that data set. Keywords: SOA, Web Services, Network

    Type Ahead Search in Database using SQL

    Get PDF
    A type ahead search system computes answers on the fly as a user types in a keyword query character by character. We are going to study how to support type ahead search on data in a relational DBMS. We focus on how to help this type of search using the SQL. A prominent task that tests is how to influence existing database functionalities to meet the high performance to achieve an interactive speed. We extended the efficient way to the case of fuzzy queries, and suggested various techniques to improve query performance. We suggested incremental computation method to answer multi keyword queries, and calculated how to support first N queries and incremental updates. Our experimental results on large and real data sets showed that the proposed techniques can enables DBMS systems to support search as you type on large tables. DOI: 10.17762/ijritcc2321-8169.15024
    • …
    corecore