    A practical index for approximate dictionary matching with few mismatches

    Approximate dictionary matching is a classic string matching problem (checking whether a query string occurs, possibly with errors, in a collection of strings) with applications in, e.g., spellchecking, online catalogs, geolocation, and web search. We present a surprisingly simple solution called a split index, which is based on the Dirichlet (pigeonhole) principle, for matching a keyword with few mismatches, and experimentally show that it offers competitive space-time tradeoffs. Our C++ implementation focuses mostly on data compaction, which benefits search speed (e.g., by being cache friendly). We compare our solution with other algorithms and show that it performs better for the Hamming distance. Query times on the order of 1 microsecond were reported for one mismatch on dictionaries of a few megabytes on a medium-end PC. We also demonstrate that a basic compression technique consisting of q-gram substitution can significantly reduce the index size (to roughly 50% of the input text size for DNA) while still keeping the query time relatively low.
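    The pigeonhole argument behind a split index is easy to state: if a pattern differs from a dictionary word in at most k positions, then cutting both into k + 1 pieces guarantees that at least one piece matches exactly. Below is a minimal Python sketch of that idea for the k = 1 case; it is an illustration only, not the authors' cache-conscious C++ implementation, and all function names are ours.

        import collections

        def build_split_index(words):
            # For k = 1 mismatch, index every word under each of its two
            # halves: a query within one mismatch of a word must match at
            # least one half exactly (Dirichlet / pigeonhole principle).
            index = collections.defaultdict(set)
            for w in words:
                mid = len(w) // 2
                index[(0, w[:mid])].add(w)
                index[(1, w[mid:])].add(w)
            return index

        def hamming(a, b):
            # Hamming distance; strings of different lengths count as "far".
            if len(a) != len(b):
                return float("inf")
            return sum(c != d for c, d in zip(a, b))

        def query(index, pattern):
            # Look up both halves of the pattern, then verify candidates.
            mid = len(pattern) // 2
            candidates = index[(0, pattern[:mid])] | index[(1, pattern[mid:])]
            return sorted(w for w in candidates if hamming(w, pattern) <= 1)

        idx = build_split_index(["banana", "ananas"])
        print(query(idx, "banena"))  # -> ['banana']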

    Simple, compact and robust approximate string dictionary

    This paper is concerned with practical implementations of approximate string dictionaries that allow edit errors. In this problem, we have as input a dictionary D of d strings of total length n over an alphabet of size σ. Given a bound k and a pattern x of length m, a query has to return all the strings of the dictionary which are at edit distance at most k from x, where the edit distance between two strings x and y is defined as the minimum-cost sequence of edit operations that transforms x into y. The cost of a sequence of operations is defined as the sum of the costs of the operations involved in the sequence. In this paper, we assume that each of these operations has unit cost and consider only three operations: deletion of one character, insertion of one character, and substitution of a character by another. We present a practical implementation of the data structure we recently proposed, which works only for one error, and extend the scheme to 2 ≤ k < m. Our implementation has many desirable properties: it has a very fast and space-efficient building algorithm, the dictionary data structure is compact, and query times are fast and robust. Finally, our data structure is simple to implement, as it only uses basic techniques from the literature, mainly hashing (linear probing and hash signatures) and succinct data structures (bitvectors supporting rank queries).
    Comment: Accepted to a journal (19 pages, 2 figures).
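    The unit-cost metric defined above is the classic Levenshtein distance. As a point of reference, here is a short Python sketch of that definition together with a brute-force query that scans the whole dictionary; the paper's contribution is precisely a compact index that avoids this scan, so the sketch only pins down the query semantics. Function names are ours.

        def edit_distance(x, y):
            # Unit-cost Levenshtein: minimum number of single-character
            # insertions, deletions and substitutions turning x into y.
            m, n = len(x), len(y)
            prev = list(range(n + 1))  # distances from "" to each prefix of y
            for i in range(1, m + 1):
                cur = [i] + [0] * n
                for j in range(1, n + 1):
                    cur[j] = min(prev[j] + 1,                          # delete x[i-1]
                                 cur[j - 1] + 1,                       # insert y[j-1]
                                 prev[j - 1] + (x[i - 1] != y[j - 1])) # substitute
                prev = cur
            return prev[n]

        def query(dictionary, x, k):
            # Naive reference query: return all strings within distance k.
            return [y for y in dictionary if edit_distance(x, y) <= k]

        print(edit_distance("kitten", "sitting"))  # -> 3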

    Bloom Filters in Adversarial Environments

    Many efficient data structures use randomness, allowing them to improve upon deterministic ones. Usually, their efficiency and correctness are analyzed using probabilistic tools under the assumption that the inputs and queries are independent of the internal randomness of the data structure. In this work, we consider data structures in a more robust model, which we call the adversarial model. Roughly speaking, this model allows an adversary to choose inputs and queries adaptively according to previous responses. Specifically, we consider a data structure known as a "Bloom filter" and prove a tight connection between Bloom filters in this model and cryptography. A Bloom filter represents a set S of elements approximately, by using fewer bits than a precise representation. The price for succinctness is allowing some errors: for any x ∈ S it should always answer 'Yes', and for any x ∉ S it should answer 'Yes' only with small probability. In the adversarial model, we consider both efficient adversaries (that run in polynomial time) and computationally unbounded adversaries that are only bounded in the number of queries they can make. For computationally bounded adversaries, we show that non-trivial (memory-wise) Bloom filters exist if and only if one-way functions exist. For unbounded adversaries, we show that there exists a Bloom filter for sets of size n and error ε that is secure against t queries and uses only O(n log(1/ε) + t) bits of memory. In comparison, n log(1/ε) bits is the best possible under a non-adaptive adversary.
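    For orientation, here is a Python sketch of the standard (non-adversarial) Bloom filter whose guarantees the paper revisits: an m-bit array with k salted hash functions. Members always answer 'Yes'; non-members answer 'Yes' only with small probability. Using a cryptographic hash here is suggestive of the paper's one-way-function connection, but this sketch claims no adversarial security; the class and parameter names are ours.

        import hashlib

        class BloomFilter:
            def __init__(self, m, k):
                # m-bit array (one byte per bit for simplicity), k hashes.
                self.m, self.k = m, k
                self.bits = bytearray(m)

            def _positions(self, item):
                # Derive k indices from salted SHA-256 digests.
                for salt in range(self.k):
                    digest = hashlib.sha256(f"{salt}:{item}".encode()).digest()
                    yield int.from_bytes(digest[:8], "big") % self.m

            def add(self, item):
                for p in self._positions(item):
                    self.bits[p] = 1

            def __contains__(self, item):
                # 'Yes' iff every probed bit is set; false positives possible.
                return all(self.bits[p] for p in self._positions(item))

        bf = BloomFilter(m=1 << 16, k=4)
        bf.add("alice")
        print("alice" in bf)    # True, always
        print("mallory" in bf)  # False, except with small probability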

    Token Based Authentication and Authorization with Zero-Knowledge Proofs for Enhancing Web API Security and Privacy

    This design science study showcases an innovative artifact that utilizes Zero-Knowledge Proofs for API authentication and authorization. A comprehensive examination of existing literature and technology is conducted to evaluate the effectiveness of this alternative approach. The study reveals that existing APIs use slower techniques that do not scale, cannot take advantage of newer hardware, and have been unable to adequately address current security issues. In contrast, the novel technique presented in this study performs better, is more resilient in privacy- and security-sensitive settings, and is easy to implement and deploy. Additionally, this study identifies potential avenues for further research that could help advance the field of Web API development in terms of security, privacy, and simplicity.

    Universal Neural-Cracking-Machines: Self-Configurable Password Models from Auxiliary Data

    We develop the first universal password model -- a password model that, once pre-trained, can automatically adapt to any password distribution. To achieve this result, the model does not need to access any plaintext passwords from the target set. Instead, it exploits users' auxiliary information, such as email addresses, as a proxy signal to predict the underlying target password distribution. The model uses deep learning to capture the correlation between the auxiliary data of a group of users (e.g., users of a web application) and their passwords. It then exploits those patterns to create a tailored password model for the target community at inference time. No further training steps, targeted data collection, or prior knowledge of the community's password distribution is required. Besides defining a new state-of-the-art for password strength estimation, our model enables any end user (e.g., a system administrator) to autonomously generate tailored password models for their systems without the often unworkable requirement of collecting suitable training data and fitting the underlying password model. Ultimately, our framework enables the democratization of well-calibrated password models to the community, addressing a major challenge in the deployment of password security solutions on a large scale.
    Comment: v0.0
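    The pipeline the abstract describes is "auxiliary data in, tailored password distribution out, with no training at inference time". The Python toy below imitates only that shape with character bigram statistics over email local parts; it is emphatically not the paper's pre-trained deep model, and every name and parameter here is ours.

        import collections, math

        def tailored_model(emails, alpha=1.0):
            # Summarize a community's auxiliary data (email local parts)
            # into bigram counts, then score candidate passwords against
            # them -- a crude, runnable stand-in for inference-time
            # adaptation; no gradient steps, no target passwords.
            counts = collections.Counter()
            for e in emails:
                local = e.split("@")[0].lower()
                counts.update(zip(local, local[1:]))
            total = sum(counts.values())
            vocab = 36 * 36  # rough bigram-space size for smoothing

            def log_score(pw):
                # Length-normalized, smoothed per-bigram log-likelihood.
                bigrams = list(zip(pw.lower(), pw.lower()[1:]))
                if not bigrams:
                    return float("-inf")
                return sum(math.log((counts[bg] + alpha) / (total + alpha * vocab))
                           for bg in bigrams) / len(bigrams)
            return log_score

        score = tailored_model(["alice@example.com", "bob@example.com"])
        # Higher (less negative) scores mark passwords more typical of
        # this particular community.
        print(score("alice2024") > score("zqxwvut"))  # True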

    Mobile security and smart systems
