Approximate String Joins in a Database (Almost) for Free -- Erratum

Author: Gravano Luis
Ipeirotis Panagiotis G.
Jagadish H. V.
Koudas Nick
Muthukrishnan S.
Srivastava Divesh
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2003

In [GIJ+01a, GIJ+01b] we described how to use q-grams in an RDBMS to perform approximate string joins. We also showed how to implement the approximate join using plain SQL queries. Specifically, we described three filters (count filter, position filter, and length filter) that can be used to execute the approximate join efficiently. The intuition behind the count filter was that strings that are similar have many q-grams in common. In particular, two strings s1 and s2 can have up to max {|s1|, |s2|} + q - 1 common q-grams. When s1 = s2, they have exactly that many q-grams in common. When s1 and s2 are within edit distance k, they share at least (max {|s1|, |s2|} + q - 1) - kq q-grams, since kq is the maximum number of q-grams that can be affected by k edit distance operations.
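The count filter bound stated in this abstract can be illustrated with a small Python sketch. It assumes each string is padded with q - 1 copies of `#` at the start and `$` at the end, so that a string of length n yields n + q - 1 q-grams. The padding characters and the function names (`qgrams`, `shared_qgrams`, `count_filter_bound`, `passes_count_filter`) are illustrative choices, not taken from the abstract, and the paper itself implements the filter in SQL rather than Python.

```python
from collections import Counter


def qgrams(s, q):
    """Return the q-grams of s after padding (assumed padding: '#' and '$')."""
    padded = "#" * (q - 1) + s + "$" * (q - 1)
    return [padded[i:i + q] for i in range(len(padded) - q + 1)]


def shared_qgrams(s1, s2, q):
    """Count q-grams common to both strings, treating them as multisets."""
    c1, c2 = Counter(qgrams(s1, q)), Counter(qgrams(s2, q))
    return sum((c1 & c2).values())


def count_filter_bound(s1, s2, q, k):
    """Lower bound from the abstract: (max{|s1|, |s2|} + q - 1) - kq."""
    return max(len(s1), len(s2)) + q - 1 - k * q


def passes_count_filter(s1, s2, q, k):
    """True if the pair shares enough q-grams to possibly be within distance k."""
    return shared_qgrams(s1, s2, q) >= count_filter_bound(s1, s2, q, k)


print(len(qgrams("john", 2)))                               # n + q - 1 q-grams
print(passes_count_filter("john smith", "jon smith", 2, 1))
print(passes_count_filter("john smith", "mary jones", 2, 1))
```

Passing the filter is only a necessary condition: a pair that passes is a candidate and would still need its actual edit distance checked.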

CiteSeerX

Columbia University Academic Commons

Using Element Clustering to Increase the Efficiency of XML Schema Matching

Author: Jonker Willem
Keulen Maurice van
Smiljanic Marko
Publication date: 01/01/2006

Schema matching attempts to discover semantic mappings between elements of two schemas. Elements are cross-compared using various heuristics (e.g., name, data-type, and structure similarity). Seen from a broader perspective, the schema matching problem is a combinatorial problem with exponential complexity. This makes naive matching algorithms prohibitively inefficient for large schemas. In this paper we propose a clustering-based technique for improving the efficiency of large-scale schema matching. The technique inserts clustering as an intermediate step into existing schema matching algorithms. Clustering partitions schemas, reduces the overall matching load, and makes it possible to trade efficiency against effectiveness. The technique can be used in addition to other optimization techniques. In the paper we describe the technique, validate the performance of one implementation of the technique, and open directions for future research.
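The effect of partitioning on matching load can be shown with a toy Python sketch. It is not the authors' technique: the clustering key (the first letter of an element name), the element names, and the function names are all hypothetical, chosen only to show how restricting comparisons to elements in the same cluster reduces the number of element pairs.

```python
from collections import defaultdict


def naive_comparisons(a, b):
    """Every element of schema a is compared with every element of schema b."""
    return len(a) * len(b)


def clustered_comparisons(a, b, key):
    """Only elements that fall into the same cluster are compared."""
    clusters_a, clusters_b = defaultdict(list), defaultdict(list)
    for element in a:
        clusters_a[key(element)].append(element)
    for element in b:
        clusters_b[key(element)].append(element)
    return sum(
        len(clusters_a[c]) * len(clusters_b[c])
        for c in clusters_a
        if c in clusters_b
    )


# Hypothetical element names from two schemas.
schema_a = ["name", "address", "author", "title"]
schema_b = ["nickname", "addr", "author_name", "topic", "tag"]


def first_letter(element):
    """Deliberately crude clustering key, assumed for illustration only."""
    return element[0]


print(naive_comparisons(schema_a, schema_b))
print(clustered_comparisons(schema_a, schema_b, first_letter))
```

The sketch also hints at the trade-off the abstract mentions: with this key, "name" and "author_name" land in different clusters and are never compared, so a saving in comparisons can cost a missed match.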

Crossref

University of Twente Research Information

Approximate Two-Party Privacy-Preserving String Matching with Linear Complexity

Author: Beck Martin
Kerschbaum Florian
Publication date: 12/02/2013

Consider two parties who want to compare their strings, e.g., genomes, but do not want to reveal them to each other. We present a system for privacy-preserving matching of strings, which differs from existing systems by providing a deterministic approximation instead of an exact distance. It is efficient (linear complexity), non-interactive, and does not involve a third party, which makes it particularly suitable for cloud computing. We extend our protocol so that it mitigates iterated differential attacks proposed by Goodrich. Further, an implementation of the system is evaluated and compared against current privacy-preserving string matching algorithms. Comment: 6 pages, 4 figures

arXiv.org e-Print Archive

Crossref

On Demand Quality of web services using Ranking by multi criteria

Author: Kumar Pradeep
Meena B.
Rajanikath Nagelli
Publication venue: The International Institute for Science, Technology and Education (IISTE)
Publication date: 04/11/2011

In the Web database scenario, the records to match are highly query-dependent, since they can only be obtained through online queries. Moreover, they are only a partial and biased portion of all the data in the source Web databases. Consequently, hand-coding or offline-learning approaches are not appropriate for two reasons. First, the full data set is not available beforehand, and therefore, good representative data for training are hard to obtain. Second, and most importantly, even if good representative data are found and labeled for learning, the rules learned on the representatives of a full data set may not work well on a partial and biased part of that data set. Keywords: SOA, Web Services, Network

International Institute for Science, Technology and Education (IISTE): E-Journals

Type Ahead Search in Database using SQL

Author: Salunke Shrikant Dadasaheb
Prof. Bere Sachin Sukhadeo
Publication venue: 'Auricle Technologies, Pvt., Ltd.'
Publication date: 26/02/2015

A type-ahead search system computes answers on the fly as a user types in a keyword query character by character. We study how to support type-ahead search on data in a relational DBMS, focusing on how to support this type of search using SQL. A main challenge is how to leverage existing database functionality to meet the high performance required for interactive speed. We extended the approach to the case of fuzzy queries and suggested various techniques to improve query performance. We suggested an incremental computation method to answer multi-keyword queries, and studied how to support first-N queries and incremental updates. Our experimental results on large, real data sets showed that the proposed techniques can enable DBMS systems to support search-as-you-type on large tables. DOI: 10.17762/ijritcc2321-8169.15024
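The basic idea of answering a query after every keystroke with plain SQL can be sketched in Python with the standard `sqlite3` module. This is a minimal illustration under stated assumptions, not the method from the paper: the table `docs`, its column `title`, the sample rows, and the helper names `build_index` and `type_ahead` are all hypothetical. It covers only exact prefix matching with a first-N limit, and leaves out fuzzy matching, multi-keyword queries, and incremental computation.

```python
import sqlite3


def build_index(titles):
    """Load sample rows into an in-memory table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE docs (title TEXT)")
    conn.executemany("INSERT INTO docs (title) VALUES (?)", [(t,) for t in titles])
    return conn


def type_ahead(conn, prefix, n=3):
    """Return the first n titles that start with the typed prefix."""
    rows = conn.execute(
        "SELECT title FROM docs WHERE title LIKE ? || '%' ORDER BY title LIMIT ?",
        (prefix, n),
    ).fetchall()
    return [row[0] for row in rows]


conn = build_index(["database", "data mining", "datalog", "query", "queue"])

# Re-run the query after each keystroke, as a type-ahead front end would.
for typed in ("d", "da", "dat", "data", "datab"):
    print(typed, type_ahead(conn, typed))
```

Note that `%` and `_` in the typed prefix would be interpreted as `LIKE` wildcards here. A real system would need to escape them and would likely rely on an index to keep each keystroke's query fast.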

International Journal on Recent and Innovation Trends in Computing and Communication