Search CORE

593 research outputs found

SoK: Cryptographically Protected Database Search

Author: Cunningham Robert K.
Fuller Benjamin
Gadepally Vijay
Hamlin Ariel
Mitchell John Darby
Shay Richard
Shen Emily
Varia Mayank
Yerukhimovich Arkady
Publication venue
Publication date: 01/01/2017
Field of study

Protected database search systems cryptographically isolate the roles of reading from, writing to, and administering the database. This separation limits unnecessary administrator access and protects data in the case of system breaches. Since protected search was introduced in 2000, the area has grown rapidly; systems are offered by academia, start-ups, and established companies. However, there is no best protected search system or set of techniques. Design of such systems is a balancing act between security, functionality, performance, and usability. This challenge is made more difficult by ongoing database specialization, as some users will want the functionality of SQL, NoSQL, or NewSQL databases. This database evolution will continue, and the protected search community should be able to quickly provide functionality consistent with newly invented databases. At the same time, the community must accurately and clearly characterize the tradeoffs between different approaches. To address these challenges, we provide the following contributions: 1) An identification of the important primitive operations across database paradigms. We find there are a small number of base operations that can be used and combined to support a large number of database paradigms. 2) An evaluation of the current state of protected search systems in implementing these base operations. This evaluation describes the main approaches and tradeoffs for each base operation. Furthermore, it puts protected search in the context of unprotected search, identifying key gaps in functionality. 3) An analysis of attacks against protected search for different base queries. 4) A roadmap and tools for transforming a protected search system into a protected database, including an open-source performance evaluation platform and initial user opinions of protected search.Comment: 20 pages, to appear to IEEE Security and Privac

arXiv.org e-Print Archive

Crossref

Boston University Institutional Repository (OpenBU)

Space-Economical Partial Gram Indices for Exact Substring Matching

Author: Boncz P.A. (Peter)
Sidirourgos E. (Eleftherios)
Tang N. (Nan)
Publication venue: 'American College of Medical Physics (ACMP)'
Publication date: 01/11/2009
Field of study

CWI's Institutional Repository

Geographical Search with Approximate String in Spatial Databases

Author: S.Anandhi, B .Anantharaj, R.Hariraman
Publication venue: 'Auricle Technologies, Pvt., Ltd.'
Publication date: 30/04/2014
Field of study

International Journal on Recent and Innovation Trends in Computing and Communication

Cloud-Scale Entity Resolution: Current State and Open Challenges

Author: Eike Schallehn
Gunter Saake
Xiao Chen
Publication venue: RonPub
Publication date: 01/01/2018
Field of study

Entity resolution (ER) is a process to identify records in information systems, which refer to the same real-world entity. Because in the two recent decades the data volume has grown so large, parallel techniques are called upon to satisfy the ER requirements of high performance and scalability. The development of parallel ER has reached a relatively prosperous stage, and has found its way into several applications. In this work, we first comprehensively survey the state of the art of parallel ER approaches. From the comprehensive overview, we then extract the classification criteria of parallel ER, classify and compare these approaches based on these criteria. Finally, we identify open research questions and challenges and discuss potential solutions and further research potentials in this field

RonPub -- Research Online Publishing

Approximate String Joins in a Database (Almost) for Free -- Erratum

Author: Gravano Luis
Ipeirotis Panagiotis G.
Jagadish H. V.
Koudas Nick
Muthukrishnan S.
Srivastava Divesh
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2003
Field of study

In [GIJ+01a, GIJ+01b] we described how to use q-grams in an RDBMS to perform approximate string joins. We also showed how to implement the approximate join using plain SQL queries. Specifically, we described three filters, count filter, position filter, and length filter, which can be used to execute efficiently the approximate join. The intuition behind the count filter was that strings that are similar have many q-grams in common. In particular, two strings s1 and s2 can have up to max max {|s1|, |s2|} + q - 1 common q-grams. When s1 = s2, they have exactly that many q-grams in common. When s1 and s2 are within edit distance k, they share at least (max {|s1|, |s2|} + q - 1) - kq q-grams, since kq is the maximum numbers of q-grams that can be affected by k edit distance operations

CiteSeerX

Columbia University Academic Commons

Query-Driven Sampling for Collective Entity Resolution

Author: Grant Christan
Wang Daisy Zhe
Wick Michael L.
Publication venue
Publication date: 13/08/2015
Field of study

Probabilistic databases play a preeminent role in the processing and management of uncertain data. Recently, many database research efforts have integrated probabilistic models into databases to support tasks such as information extraction and labeling. Many of these efforts are based on batch oriented inference which inhibits a realtime workflow. One important task is entity resolution (ER). ER is the process of determining records (mentions) in a database that correspond to the same real-world entity. Traditional pairwise ER methods can lead to inconsistencies and low accuracy due to localized decisions. Leading ER systems solve this problem by collectively resolving all records using a probabilistic graphical model and Markov chain Monte Carlo (MCMC) inference. However, for large datasets this is an extremely expensive process. One key observation is that, such exhaustive ER process incurs a huge up-front cost, which is wasteful in practice because most users are interested in only a small subset of entities. In this paper, we advocate pay-as-you-go entity resolution by developing a number of query-driven collective ER techniques. We introduce two classes of SQL queries that involve ER operators --- selection-driven ER and join-driven ER. We implement novel variations of the MCMC Metropolis Hastings algorithm to generate biased samples and selectivity-based scheduling algorithms to support the two classes of ER queries. Finally, we show that query-driven ER algorithms can converge and return results within minutes over a database populated with the extraction from a newswire dataset containing 71 million mentions

arXiv.org e-Print Archive

Crossref