A practical index for approximate dictionary matching with few mismatches
Approximate dictionary matching is a classic string matching problem
(checking whether a query string occurs in a collection of strings) with
applications in, e.g., spellchecking, online catalogs, geolocation, and web
search. We present a surprisingly simple solution called a split index,
which is based on the Dirichlet (pigeonhole) principle, for matching a keyword
with few mismatches, and we experimentally show that it offers competitive
space-time tradeoffs. Our C++ implementation focuses mostly on data
compaction, which benefits the search speed (e.g., by being cache
friendly). We compare our solution with other algorithms and show that it
performs better for the Hamming distance. Query times on the order of 1
microsecond were reported for one mismatch on a dictionary of a few
megabytes on a medium-end PC. We also demonstrate that a basic compression
technique consisting of q-gram substitution can significantly reduce the
index size (up to 50% of the input text size for DNA), while still keeping
the query time relatively low.
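The pigeonhole idea behind the split index can be sketched as follows. This is an illustrative Python sketch, not the authors' C++ implementation: for one mismatch under the Hamming distance, a query matching a word with at most one error must agree exactly with it on at least one of its two halves, so indexing both halves of every word suffices to find all candidates. All function names are invented for this example.

```python
def hamming(a, b):
    """Hamming distance between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def build_index(words):
    """Map each half (tagged with its side and the word length) to its words."""
    index = {}
    for w in words:
        mid = len(w) // 2
        index.setdefault(("pre", w[:mid], len(w)), []).append(w)
        index.setdefault(("suf", w[mid:], len(w)), []).append(w)
    return index

def search(index, q, k=1):
    """By the pigeonhole principle, a match with <= 1 mismatch shares
    an exact prefix half or suffix half with the query; look up both
    halves, then verify the candidates explicitly."""
    mid = len(q) // 2
    cands = (index.get(("pre", q[:mid], len(q)), [])
             + index.get(("suf", q[mid:], len(q)), []))
    return sorted({w for w in cands if hamming(q, w) <= k})
```

For example, `search(build_index(["karolin", "kathrin", "kerstin"]), "karoxin")` finds `karolin` via its exact prefix half `kar`, even though the suffix halves differ.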
Lightweight Fingerprints for Fast Approximate Keyword Matching Using Bitwise Operations
We aim to speed up approximate keyword matching with the use of a lightweight, fixed-size block of data for each string, called a fingerprint. Fingerprints work in a similar way to hash values; however, they can also be used for matching with errors. They store information about symbol occurrences using individual bits, and they can be compared against each other with a constant number of bitwise operations. In this way, certain strings can be deduced to be at a distance greater than k from each other (using the Hamming or Levenshtein distance) without performing an explicit verification. We show experimentally that for a preprocessed collection of strings, fingerprints can provide substantial speedups for k = 1: over 2.5 times for the Hamming distance and over 30 times for the Levenshtein distance. Tests were conducted on synthetic and real-world English and URL data.
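A minimal sketch of an occurrence-bit fingerprint and its filtering test, assuming lowercase ASCII input (the paper's exact fingerprint layout may differ; function names are illustrative). One bit per alphabet symbol records whether it occurs, and since a single edit can change the occurrence of at most two symbols, more than 2k differing bits rules out a match with k errors:

```python
def fingerprint(s):
    """26-bit fingerprint: bit i is set iff the i-th lowercase letter occurs in s."""
    fp = 0
    for c in s:
        fp |= 1 << (ord(c) - ord("a"))
    return fp

def may_match(fp_a, fp_b, k):
    """Filter in O(1) bitwise operations: a single substitution, insertion,
    or deletion flips at most two occurrence bits, so if more than 2*k bits
    differ, the true distance must exceed k and verification can be skipped."""
    return bin(fp_a ^ fp_b).count("1") <= 2 * k
```

Only pairs passing `may_match` need an explicit Hamming or Levenshtein verification; the rest are discarded after one XOR and one popcount.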
A Study on Fuzzy Cognitive Map Optimization Using Metaheuristics
Part 8: Intelligent Distributed Systems

Fuzzy Cognitive Maps (FCMs) are a framework based on weighted directed graphs which can be used for system modeling. The relationships between the concepts are stored in graph edges and are expressed as real numbers from the interval [-1, 1] (called weights). Our goal was to evaluate the effectiveness of non-deterministic optimization algorithms which can calculate weight matrices (i.e., collections of all weights) of FCMs for synthetic and real-world time series data sets. The best results were reported for Differential Evolution (DE) with recombination based on 3 random individuals, as well as for Particle Swarm Optimization (PSO) where each particle is guided by its neighbors and the best particle. The choice of the algorithm was not crucial for maps of roughly up to 10 nodes; however, the difference in performance was substantial (orders of magnitude) for bigger matrices.
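The optimization target can be sketched as follows, assuming the common sigmoid-transfer FCM update rule (the paper's exact variant and error measure may differ; the names below are illustrative). A metaheuristic such as DE or PSO searches over candidate weight matrices to minimize the one-step-ahead prediction error on the time series:

```python
import math

def step(weights, activations):
    """One FCM update: a_i(t+1) = sigmoid(sum_j w[j][i] * a_j(t)),
    with sigmoid squashing each new activation into (0, 1)."""
    n = len(activations)
    out = []
    for i in range(n):
        s = sum(weights[j][i] * activations[j] for j in range(n))
        out.append(1.0 / (1.0 + math.exp(-s)))
    return out

def fitness(weights, series):
    """Mean squared one-step-ahead error over the time series --
    the objective a metaheuristic would minimize when calculating
    the weight matrix."""
    err, count = 0.0, 0
    for t in range(len(series) - 1):
        pred = step(weights, series[t])
        err += sum((p - x) ** 2 for p, x in zip(pred, series[t + 1]))
        count += len(pred)
    return err / count
```

DE or PSO would then evolve a population of such weight matrices, evaluating each with `fitness`; the cited scaling gap suggests this search grows much harder as the number of nodes (and hence the n-by-n weight matrix) grows.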