4,241 research outputs found
Fuzzy Extractors: How to Generate Strong Keys from Biometrics and Other Noisy Data
We provide formal definitions and efficient secure techniques for
- turning noisy information into keys usable for any cryptographic
application, and, in particular,
- reliably and securely authenticating biometric data.
Our techniques apply not just to biometric information, but to any keying
material that, unlike traditional cryptographic keys, is (1) not reproducible
precisely and (2) not distributed uniformly. We propose two primitives: a
"fuzzy extractor" reliably extracts nearly uniform randomness R from its input;
the extraction is error-tolerant in the sense that R will be the same even if
the input changes, as long as it remains reasonably close to the original.
Thus, R can be used as a key in a cryptographic application. A "secure sketch"
produces public information about its input w that does not reveal w, and yet
allows exact recovery of w given another value that is close to w. Thus, it can
be used to reliably reproduce error-prone biometric inputs without incurring
the security risk inherent in storing them.
We define the primitives to be both formally secure and versatile,
generalizing much prior work. In addition, we provide nearly optimal
constructions of both primitives for various measures of ``closeness'' of input
data, such as Hamming distance, edit distance, and set difference.
(Comment: 47 pp., 3 figures. Preliminary version in Eurocrypt 2004, Springer LNCS 3027, pp. 523-540. Differences from version 3: minor edits for grammar, clarity, and typos.)
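To make the two primitives concrete, here is a minimal, illustrative Python sketch of the classic code-offset idea for the Hamming-distance setting: a toy repetition code plays the role of the error-correcting code, and a plain hash stands in for a strong randomness extractor. The names (`sketch`, `recover`, `extract_key`) and all parameters are illustrative assumptions, not the paper's constructions, which are far more careful about security and entropy loss.

```python
# Toy code-offset secure sketch for Hamming distance, plus a hash-based
# stand-in for a fuzzy extractor. Illustration only, not a secure construction.
import hashlib
import secrets

REP = 3  # each data bit is repeated REP times; tolerates < REP/2 flips per group

def _encode(bits):                      # repetition-code encoder
    return [b for b in bits for _ in range(REP)]

def _decode(codeword):                  # majority-vote decoder
    return [int(sum(codeword[i:i + REP]) * 2 > REP)
            for i in range(0, len(codeword), REP)]

def _xor(a, b):
    return [x ^ y for x, y in zip(a, b)]

def sketch(w):
    """Public helper data: w XOR (random codeword)."""
    msg = [secrets.randbelow(2) for _ in range(len(w) // REP)]
    return _xor(w, _encode(msg))

def recover(w_noisy, s):
    """Exactly recover the original w from a close reading w' and the sketch s."""
    c_noisy = _xor(w_noisy, s)          # close to the hidden codeword
    c = _encode(_decode(c_noisy))       # error-correct back to the codeword
    return _xor(s, c)                   # w = s XOR c

def extract_key(w):
    """Toy 'extractor': derive a key string from the recovered input."""
    return hashlib.sha256(bytes(w)).hexdigest()

# Usage: enrollment stores only the sketch; a later noisy reading reproduces the key.
w = [secrets.randbelow(2) for _ in range(30)]   # noisy secret (e.g., biometric bits)
s = sketch(w)
w_noisy = w.copy()
w_noisy[4] ^= 1                                 # one bit flipped at reading time
assert recover(w_noisy, s) == w
assert extract_key(recover(w_noisy, s)) == extract_key(w)
```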
Scalable string reconciliation by recursive content-dependent shingling
We consider the problem of reconciling similar strings in a distributed system. Specifically, we are interested in performing this reconciliation efficiently, minimizing the communication cost. The problem applies to several types of large-scale distributed networks, file synchronization utilities, and any system that manages the consistency of string-encoded ordered data. We present the novel Recursive Content-Dependent Shingling (RCDS) protocol, which can handle large strings and whose communication complexity scales with the edit distance between the reconciling strings. We also provide analysis, experimental results, and comparisons of an implementation of our protocol against existing synchronization software such as the Rsync utility.
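The core idea of content-dependent shingling can be illustrated with a short, hypothetical Python sketch: chunk boundaries are derived from the string's own content (here, a CRC over a small sliding window hitting a target residue), so a local edit perturbs only nearby chunks and only those need to be exchanged. The window size, modulus, and hash are illustrative choices, and the recursion of the actual RCDS protocol is omitted.

```python
# Content-defined chunking sketch: boundaries depend on content, not position,
# so an insertion shifts offsets without invalidating all downstream chunks.
import zlib

WINDOW, MOD = 4, 16   # cut where the CRC of the last WINDOW chars is 0 mod MOD

def _h(s):
    return zlib.crc32(s.encode())

def chunks(s):
    out, start = [], 0
    for i in range(WINDOW, len(s)):
        if _h(s[i - WINDOW:i]) % MOD == 0:   # content-defined cut point
            out.append(s[start:i])
            start = i
    out.append(s[start:])
    return out

def chunks_to_send(local, remote_digests):
    """Chunks of `local` that the remote side does not already have."""
    return [c for c in chunks(local) if _h(c) not in remote_digests]

a = "the quick brown fox jumps over the lazy dog " * 20
b = a[:300] + "EDIT" + a[300:]                      # one local insertion
missing = chunks_to_send(b, {_h(c) for c in chunks(a)})
print(f"{len(missing)} of {len(chunks(b))} chunks need to be sent")
```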
Reconciling Graphs and Sets of Sets
We explore a generalization of set reconciliation, where the goal is to
reconcile sets of sets. Alice and Bob each have a parent set consisting of
child sets, each containing a bounded number of elements from a common universe.
They want to reconcile their sets of sets in a scenario where the total number
of differences between all of their child sets (under the minimum difference
matching between their child sets) is bounded. We give several algorithms for this
problem, and discuss applications to reconciliation problems on graphs,
databases, and collections of documents. We specifically focus on graph
reconciliation, providing protocols based on set of sets reconciliation for
random graphs and for forests of rooted trees.
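As a point of reference for the difference measure used above, the following toy Python sketch computes the minimum-difference-matching cost directly by brute force over pairings; the real protocols never materialize this matching, and the helper name and example data are assumptions made for illustration.

```python
# Minimum difference matching: pair up Alice's and Bob's child sets so that the
# total symmetric-difference size is minimized. Brute force, for tiny toy inputs.
from itertools import permutations

def min_difference_matching(alice, bob):
    """Total symmetric-difference size under the best pairing of child sets."""
    assert len(alice) == len(bob)       # pad with empty sets beforehand if needed
    best = None
    for perm in permutations(range(len(bob))):
        cost = sum(len(alice[i] ^ bob[j]) for i, j in enumerate(perm))
        if best is None or cost < best:
            best = cost
    return best

# Example: two parent sets of child sets differing in 3 elements overall.
alice = [{1, 2, 3}, {4, 5}, {9}]
bob   = [{4, 5, 6}, {1, 2}, {9, 10}]
print(min_difference_matching(alice, bob))   # -> 3
```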
Automatic transformation of raw clinical data into clean data using decision tree learning combining with string similarity algorithm
It is challenging to conduct statistical analyses of complex scientific datasets. Finding the relationships within data is a time-consuming process, whether for a scientist or a statistician. The process involves preprocessing the raw data, selecting appropriate statistics, performing the analysis, and providing correct interpretations; among these, data preprocessing is tedious and a particular time drain. In large volumes of data provided for analysis, there is no standard for recording the information, and errors of spelling, typing, or transmission are common. As a result, the same meaning is expressed in many different ways in the data, and it is impossible for an analysis system to handle these inconsistencies automatically. What is needed is an automatic method for transforming raw clinical data into data that can be processed automatically. In this paper we propose a method combining decision tree learning with a string similarity algorithm, which is fast and accurate for clinical data cleaning. Experimental results show that it outperforms individual string similarity algorithms and the traditional data cleaning process.
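The string-similarity half of such a pipeline can be sketched as follows in Python: raw values with spelling or typing variations are mapped to the nearest entry of a canonical vocabulary. The vocabulary, similarity measure, and threshold here are illustrative assumptions, and the decision-tree stage that learns which cleaning rule to apply per field is not shown.

```python
# Map noisy free-text clinical values to a canonical vocabulary by nearest
# string similarity; values below the threshold are left unchanged for review.
from difflib import SequenceMatcher

CANONICAL = ["hypertension", "diabetes mellitus", "asthma", "none"]  # assumed vocabulary

def similarity(a, b):
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def normalize(raw, vocabulary=CANONICAL, threshold=0.75):
    """Return the closest canonical value, or the raw value if nothing is close enough."""
    best = max(vocabulary, key=lambda v: similarity(raw, v))
    return best if similarity(raw, best) >= threshold else raw

dirty = ["hypertnsion", "Diabetes melitus", "asthma ", "unknown"]
print([normalize(v) for v in dirty])
# -> ['hypertension', 'diabetes mellitus', 'asthma', 'unknown']
```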