13 research outputs found

Improved Algorithms for Approximate String Matching (Extended Abstract)

Author: D Lowrance
D Sankoff
Dimitris Papamichail
DR Powell
DS Hirschberg
E Ukkonen
EW Myers
Georgios Papamichail
P Sellers
R Wagner
S Needleman
T Vintsyuk
V Levenshtein
VL Arlazarov
W Masek
Publication date: 28/07/2008

The problem of approximate string matching is important in many different areas such as computational biology, text processing and pattern recognition. A great effort has been made to design efficient algorithms addressing several variants of the problem, including comparison of two strings, approximate pattern identification in a string, and calculation of the longest common subsequence that two strings share. We designed an output-sensitive algorithm solving the edit distance problem between two strings of lengths n and m respectively in time O((s-|n-m|)min(m,n,s)+m+n) and linear space, where s is the edit distance between the two strings. This worst-case time bound makes the quadratic factor of the algorithm independent of the length of the longer string and improves existing theoretical bounds for this problem. The implementation of our algorithm also excels in practice, especially in cases where the two strings compared differ significantly in length. Source code of our algorithm is available at http://www.cs.miami.edu/~dimitris/edit_distance
Comment: 10 pages
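
For orientation, the edit distance problem named in this abstract can be sketched with the classical dynamic program over prefixes. This is not the output-sensitive algorithm the abstract describes: it is an assumed baseline with unit costs that runs in O(mn) time and keeps a single row, so it shares only the linear-space property. The function name `edit_distance` is illustrative.

```python
def edit_distance(a: str, b: str) -> int:
    """Unit-cost edit distance (insert, delete, substitute), one DP row at a time."""
    # Put the shorter string along the row so extra space is O(min(m, n)).
    if len(a) < len(b):
        a, b = b, a
    # previous[j] = distance between the empty prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance between a[:i] and the empty prefix of b
        for j, cb in enumerate(b, start=1):
            substitution = previous[j - 1] + (ca != cb)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
```

The quadratic term here always depends on both string lengths, which is the cost the abstract's bound is designed to avoid when s is small.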

arXiv.org e-Print Archive

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Pitch- and spectral-based dynamic time warping methods for comparing field recordings of harmonic avian vocalizations

Author: Boersma P.
C Daniel Meliza
Dustin R. Rubenstein
Goller F.
Sara C. Keen
Searcy W. A.
Vintsyuk T. K.
Wang C.
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2013

Quantitative measures of acoustic similarity can reveal patterns of shared vocal behavior in social species. Many methods for computing similarity have been developed, but their performance has not been extensively characterized in noisy environments and with vocalizations characterized by complex frequency modulations. This paper describes methods of bioacoustic comparison based on dynamic time warping (DTW) of the fundamental frequency or spectrogram. Fundamental frequency is estimated using a Bayesian particle filter adaptation of harmonic template matching. The methods were tested on field recordings of flight calls from superb starlings, Lamprotornis superbus, to determine how well they could separate distinct categories of call elements (motifs). The fundamental-frequency-based method performed best, but the spectrogram-based method was less sensitive to noise. Both DTW methods provided better separation of categories than spectrographic cross correlation, likely due to substantial variability in the duration of superb starling flight call motifs.
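
As a rough illustration of the DTW comparison mentioned in this abstract, the sketch below aligns two one-dimensional contours (for example, fundamental-frequency tracks) with the textbook recurrence. The absolute-difference local cost, the unconstrained warping path, and the name `dtw_distance` are assumptions made for the example, not details taken from the paper.

```python
import math


def dtw_distance(x, y):
    """Cumulative DTW cost between two numeric sequences (no window constraint)."""
    n, m = len(x), len(y)
    # cost[i][j] = cheapest alignment of x[:i] with y[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(x[i - 1] - y[j - 1])
            # Extend the best of: step in x only, step in y only, step in both.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


if __name__ == "__main__":
    # The second contour holds one value for an extra frame.
    print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

Because the path may repeat frames, two contours with the same shape but different durations align at low cost, which fits the abstract's point about variable motif duration.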

Crossref

Columbia University Academic Commons

PubMed Central

Medical record linkage in health information systems by approximate string matching and clustering

Author: A Baxter
A Ben-Dor
AE Monge
AK McCallum
Antoine Buemi
AP Dempster
B Everitt
C Quantin
E Hartuv
EH Porter
Erik A Sauleau
G Navarro
H Kawaji
HB Newcombe
I Fellegi
J Hartigan
JA Hylthon
Jean-Philippe Paumier
M Fortini
M Hernandez
M Pavan
MA Jaro
P Eades
P Sellers
R Baeza-Yates
R Sharan
T Fruchterman
T Kamada
T Vintsyuk
TF Smith
TR Belin
V Levenshtein
W Cohen
WE Winkler
WE Yancey
Publication venue: BioMed Central
Publication date: 01/01/2005

BACKGROUND: Multiplication of data sources within heterogeneous healthcare information systems always results in redundant information, split among multiple databases. Our objective is to detect exact and approximate duplicates within identity records, in order to attain a better quality of information and to permit cross-linkage among stand-alone and clustered databases. Furthermore, we need to assist human decision making by computing a value reflecting identity proximity. METHODS: The proposed method has three steps. The first step is to standardise and index elementary identity fields, using blocking variables, in order to speed up information analysis. The second is to match similar record pairs, relying on a global similarity value taken from the Porter-Jaro-Winkler algorithm. The third is to create clusters of coherent related records, using graph drawing, agglomerative clustering methods and partitioning methods. RESULTS: The batch analysis of 300,000 "supposedly" distinct identities isolates 240,000 true unique records, 24,000 duplicates (clusters composed of 2 records) and 3,000 clusters whose size is greater than or equal to 3 records. CONCLUSION: Duplicate-free databases, used in conjunction with relevant indexes and similarity values, allow immediate (i.e., real-time) proximity detection when inserting a new identity.
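
The second step's field-level comparison can be sketched with the standard Jaro-Winkler similarity. How the paper combines field scores into its global "Porter-Jaro-Winkler" value is not stated in this abstract, so the code below covers only the usual two-string measure; the function names, the prefix scale of 0.1, and the prefix cap of 4 characters are the conventional choices, assumed here.

```python
def jaro(s: str, t: str) -> float:
    """Standard Jaro similarity in [0, 1]."""
    if s == t:
        return 1.0
    if not s or not t:
        return 0.0
    # Characters match only if they are equal and at most `window` positions apart.
    window = max(max(len(s), len(t)) // 2 - 1, 0)
    s_matched = [False] * len(s)
    t_matched = [False] * len(t)
    matches = 0
    for i, ch in enumerate(s):
        for j in range(max(0, i - window), min(len(t), i + window + 1)):
            if not t_matched[j] and t[j] == ch:
                s_matched[i] = t_matched[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order, halved.
    s_seq = [c for c, hit in zip(s, s_matched) if hit]
    t_seq = [c for c, hit in zip(t, t_matched) if hit]
    transpositions = sum(a != b for a, b in zip(s_seq, t_seq)) // 2
    return (matches / len(s) + matches / len(t)
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s: str, t: str, prefix_scale: float = 0.1) -> float:
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    score = jaro(s, t)
    prefix = 0
    for a, b in zip(s[:4], t[:4]):
        if a != b:
            break
        prefix += 1
    return score + prefix * prefix_scale * (1.0 - score)


if __name__ == "__main__":
    print(round(jaro_winkler("MARTHA", "MARHTA"), 4))
```

In a linkage pipeline, a score like this would be computed only for record pairs that share a blocking key, which is what keeps the pairwise comparison tractable.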

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Dynamic time warping — An improved method for 4D and tomography time shift estimation?

Author: Jon Marius Venstad
Kolbjørnsen O.
Vintsyuk T.
Zinke A.
Publication venue: 'Society of Exploration Geophysicists'

Crossref

Optimum partitioning of a sequence of elements into subsequences

Author: E. Kamke
G. Ya. Volochin
M. I. Shlezinger
T. K. Vintsyuk
V. I. Rybak
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/1973