1,530 research outputs found

New algorithms for exact and approximate text matching

Author: Grabowski Szymon
Publication venue: Lodz University of Technology. Press
Publication date: 01/01/2010

This work presents the main results in the domain of text algorithms obtained in the Computer Engineering Dept. in the years 2004-2009. The algorithms concern a variety of exact and approximate string matching problems, including the scenario involving compression, which has been actively studied in recent years.

Lodz University of Technology Repository

The C-BRAHMS Project

Author: Lemström Kjell
Mäkinen Veli
Pienimäki Anna
Turkia Mika
Ukkonen Esko
Publication date: 01/01/2003

Bononia University Press; 88-7395-155-4; Peer reviewed

CiteSeerX

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Helsingin yliopiston digitaalinen arkisto

JScholarship

A hybrid algorithm for the longest common transposition-invariant subsequence problem

Author: Deorowicz Sebastian
Grabowski Szymon
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 26/01/2012

The longest common transposition-invariant subsequence (LCTS) problem is a music information retrieval oriented variation of the classic LCS problem. Only two efficient approaches for calculating the length of the LCTS are known, one based on sparse dynamic programming and the other on bit-parallelism. In this work, we propose a hybrid algorithm that picks the better of the two algorithms for individual subproblems. Experiments on music (MIDI), with 32-bit and 64-bit implementations, show that the proposed algorithm outperforms the faster of the two component algorithms by a factor of 1.4–2.0, depending on sequence lengths. Similar, if not better, improvements can be observed for random data with Gaussian distribution. For uniformly random data, the hybrid algorithm also wins if the alphabet is neither too small (at least 32 symbols) nor too large (up to 128 symbols). Part of the success of our scheme is attributed to a quite robust component selection heuristic.
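As a point of reference for the problem definition only, the LCTS length can be computed by brute force: try every transposition that aligns at least one pair of symbols and run the classic LCS dynamic program for each. The sketch below assumes pitches are plain integers (for example MIDI note numbers); the function names are hypothetical, and this is neither the sparse dynamic programming nor the bit-parallel algorithm the abstract refers to, nor the proposed hybrid.

```python
def lcs_length(a, b):
    """Classic LCS length via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcts_length(a, b):
    """Brute-force LCTS length: maximize LCS(a + t, b) over transpositions t.

    Only transpositions that map some symbol of a onto some symbol of b
    can produce a non-empty match, so only those are tried.
    """
    if not a or not b:
        return 0
    transpositions = {y - x for x in a for y in b}
    return max(lcs_length([x + t for x in a], b) for t in transpositions)


if __name__ == "__main__":
    melody = [60, 62, 64, 65]      # assumed MIDI pitches
    transposed = [67, 69, 71, 72]  # the same melody, 7 semitones higher
    print(lcts_length(melody, transposed))
```

With an alphabet of size σ this baseline runs one full LCS computation per candidate transposition, which is the cost the specialized algorithms aim to reduce.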

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)

Matemaattisen morfologian käyttö geometrisessa musiikinhaussa (Using mathematical morphology in geometric music retrieval)

Author: Karvonen Mikko
Publication venue: Helsingin yliopisto
Publication date: 01/01/2008

The usual task in music information retrieval (MIR) is to find occurrences of a monophonic query pattern within a music database, which can contain both monophonic and polyphonic content. The so-called query-by-humming systems are a famous instance of content-based MIR. In such a system, the user's hummed query is converted into symbolic form to perform search operations in a similarly encoded database. The symbolic representation (e.g., textual, MIDI or vector data) is typically a quantized and simplified version of the sampled audio data, yielding faster search algorithms and space requirements that can be met in real-life situations. In this thesis, we investigate geometric approaches to MIR. We first study some musicological properties often needed in MIR algorithms, and then give a literature review on traditional (e.g., string-matching-based) MIR algorithms and novel techniques based on geometry. We also introduce some concepts from digital image processing, namely mathematical morphology, which we use to develop and implement four algorithms for geometric music retrieval. The symbolic representation in the case of our algorithms is a binary 2-D image. We use various morphological pre- and post-processing operations on the query and the database images to perform template matching / pattern recognition for the images. The algorithms are essentially extensions of the classic image correlation and hit-or-miss transformation techniques widely used in template matching applications. They aim to be a future extension to the retrieval engine of C-BRAHMS, which is a research project of the Department of Computer Science at the University of Helsinki.
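The basic morphological building block behind this kind of template matching can be illustrated with plain binary erosion. The sketch below assumes the binary image is stored as a set of foreground pixels `(onset, pitch)` and that the query acts as the structuring element; the function name and the sample data are hypothetical, and the code does not reproduce any of the four algorithms developed in the thesis.

```python
def erode(database, query):
    """Binary erosion of a point set by a structuring element.

    Returns every translation (dx, dy) for which all query points,
    shifted by (dx, dy), are foreground pixels of the database.
    """
    database = set(database)
    query = list(query)
    if not query:
        return set()
    # A valid translation must map the first query point onto some
    # database point, so those are the only candidates worth checking.
    q0x, q0y = query[0]
    candidates = {(x - q0x, y - q0y) for (x, y) in database}
    return {
        (dx, dy)
        for (dx, dy) in candidates
        if all((qx + dx, qy + dy) in database for (qx, qy) in query)
    }


if __name__ == "__main__":
    # Assumed toy data: (onset time step, MIDI pitch) pairs.
    database = {(0, 60), (1, 62), (2, 64), (4, 67), (5, 69), (6, 71)}
    query = [(0, 0), (1, 2), (2, 4)]  # a three-note rising figure
    print(sorted(erode(database, query)))
```

Because a translation shifts both time and pitch, each reported position is an exact occurrence of the query that is invariant under transposition; tolerating missing or extra notes would need the more flexible operations the abstract mentions.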

Helsingin yliopiston digitaalinen arkisto

String Indexing for Patterns with Wildcards

Author: A. Tam
B. Chazelle
D. Harel
D. Tsur
G. Chen
G. Landau
G. Navarro
H.L. Chan
K. Hofmann
L.P. Coelho
M. Lewenstein
M. Maas
M.L. Fredman
P. Bille
P. Clifford
T.-W. Lam
Z. Galil
Publication date: 01/01/2012

We consider the problem of indexing a string $t$ of length $n$ to report the occurrences of a query pattern $p$ containing $m$ characters and $j$ wildcards. Let $occ$ be the number of occurrences of $p$ in $t$, and $\sigma$ the size of the alphabet. We obtain the following results.

- A linear space index with query time $O(m+\sigma^j \log \log n + occ)$. This significantly improves the previously best known linear space index by Lam et al. [ISAAC 2007], which requires query time $\Theta(jn)$ in the worst case.
- An index with query time $O(m+j+occ)$ using space $O(\sigma^{k^2} n \log^k \log n)$, where $k$ is the maximum number of wildcards allowed in the pattern. This is the first non-trivial bound with this query time.
- A time-space trade-off, generalizing the index by Cole et al. [STOC 2004].

We also show that these indexes can be generalized to allow variable length gaps in the pattern. Our results are obtained using a novel combination of well-known and new techniques, which could be of independent interest.
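To make the query semantics concrete, the following sketch reports all occurrences of a pattern with wildcards by scanning the text directly, with no index at all. It assumes that a wildcard matches exactly one arbitrary character and is written as `*`; both the symbol and the function name are illustrative choices, not taken from the paper. The scan takes time proportional to the product of text and pattern lengths, which is the kind of cost an index is meant to avoid.

```python
def find_wildcard_occurrences(text, pattern, wildcard="*"):
    """Return all start positions where pattern matches text.

    Each wildcard symbol in the pattern matches any single character.
    This is a naive scan, not one of the indexes from the abstract.
    """
    n, m = len(text), len(pattern)
    occurrences = []
    for start in range(n - m + 1):
        if all(
            pc == wildcard or pc == text[start + k]
            for k, pc in enumerate(pattern)
        ):
            occurrences.append(start)
    return occurrences


if __name__ == "__main__":
    print(find_wildcard_occurrences("abracadabra", "a*a"))
    print(find_wildcard_occurrences("abracadabra", "a**a"))
```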

arXiv.org e-Print Archive

Crossref

Online Research Database In Technology

Wavefront Longest Common Subsequence Algorithm On Multicore And GPGPU Platform

Author: Shehabat Bilal Mahmoud
Publication date: 01/06/2010

String comparison is a central operation in numerous applications. It plays a critical role in areas such as data mining, spelling error correction and molecular biology (Tan et al., 2007; Michailidis and Margaritis, 2000).

Repository@USM