Searching digital music libraries
There has been a recent explosion of interest in digital music libraries. In particular, interactive melody retrieval is a striking example of a search paradigm that differs radically from the standard full-text search. Many different techniques have been proposed for melody matching, but the area lacks standard databases that allow them to be compared on common grounds, and copyright issues have stymied attempts to develop such a corpus. This paper focuses on methods for evaluating different symbolic music matching strategies, and describes a series of experiments that compare and contrast results obtained using three dominant paradigms. Combining two of these paradigms yields a hybrid approach which is shown to have the best overall combination of efficiency and effectiveness.
Indexing large genome collections on a PC
Motivation: The availability of thousands of individual genomes of one species
should boost rapid progress in personalized medicine or understanding of the
interaction between genotype and phenotype, to name a few applications. A key
operation useful in such analyses is aligning sequencing reads against a
collection of genomes, which is costly with the use of existing algorithms due
to their large memory requirements.
Results: We present MuGI, Multiple Genome Index, which reports all
occurrences of a given pattern, in the exact and approximate matching models,
against a collection of thousand(s) of genomes. Its unique feature is the small
index size, fitting in a standard computer with 16-32 GB, or even 8 GB, of
RAM, for the 1000GP collection of 1092 diploid human genomes. The solution is
also fast. For example, the exact matching queries are handled in average time
of 39 s and with up to 3 mismatches in 373 s on the test PC with
the index size of 13.4 GB. For a smaller index, occupying 7.4 GB in memory,
the respective times grow to 76 s and 917 s.
Availability: Software and Supplementary material:
http://sun.aei.polsl.pl/mugi
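The query model in this abstract (report every occurrence of a pattern, either exactly or with a bounded number of mismatches) can be illustrated with a naive scan over a single sequence. This is only a sketch of the matching semantics, not MuGI's index or algorithm; the function name `find_with_mismatches` and its parameters are hypothetical.

```python
def find_with_mismatches(text, pattern, max_mismatches=0):
    """Return start positions where `pattern` occurs in `text`
    with at most `max_mismatches` substituted characters.

    Naive O(len(text) * len(pattern)) scan; an index-based tool
    would avoid visiting every position.
    """
    hits = []
    m = len(pattern)
    for start in range(len(text) - m + 1):
        mismatches = 0
        for a, b in zip(text[start:start + m], pattern):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # too many substitutions at this position
        else:
            # Inner loop finished without exceeding the budget.
            hits.append(start)
    return hits
```

Calling `find_with_mismatches(read, pattern, 0)` corresponds to the exact model, and raising `max_mismatches` to 3 corresponds to the largest approximate setting quoted in the abstract.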
Collaborative video searching on a tabletop
Almost all system and application design for multimedia systems is based around a single user working in isolation to perform some task, yet much of the work for which we use computers to help us is based on working collaboratively with colleagues. Groupware systems do support user collaboration, but typically this is supported through software and users still physically work independently. Tabletop systems, such as the DiamondTouch from MERL, are interface devices which support direct user collaboration on a tabletop. When a tabletop is used as the interface for a multimedia system, such as a video search system, this kind of direct collaboration raises many questions for system design. In this paper we present a tabletop system for supporting a pair of users in a video search task, and we evaluate the system not only in terms of search performance but also in terms of user-user interaction and how different user personalities within each pair of searchers impact search performance and user interaction. Incorporating the user into the system evaluation as we have done here reveals several interesting results and has important ramifications for the design of a multimedia search system.
Rust-Bio - a fast and safe bioinformatics library
We present Rust-Bio, the first general purpose bioinformatics library for the
innovative Rust programming language. Rust-Bio leverages the unique combination
of speed, memory safety and high-level syntax offered by Rust to provide a fast
and safe set of bioinformatics algorithms and data structures with a focus on
sequence analysis.
Fast algorithms for matching CCD images to a stellar catalogue
Two new algorithms are described for matching two dimensional coordinate
lists of point sources that are significantly faster than previous methods. By
matching rarely occurring triangles (or more complex shapes) in the two lists,
and by ordering searches by decreasing probability of success, it is
demonstrated that very few candidates need be considered to find a successful
match. Moreover, by immediately testing the suitability of a potential match
using an efficient mechanism, the need to process the entire candidate set is
avoided, yielding considerable performance improvements. Triangles are
described by a cosine metric that reduces the density of triangle space,
permitting efficient searches. An alternative shape characterization method
that reduces computational overhead in the construction phase is discussed. The
algorithms are tested on a set of 10 063 wide-field survey images, with
fields-of-view up to 4.8 x 3.6 deg, successfully matching 100% of the images in
a mean elapsed time of 6 ms (2.4 GHz Athlon CPU). The elapsed time of the
searching phase is shown to vary by less than 1 ms for list sizes between 10
and 200 points, demonstrating that fast, robust searches may be completed in
nearly constant time, independent of list size.
Comment: Accepted for publication in Publications of the Astronomical Society
of Australi
TALON - The Telescope Alert Operation Network System: Intelligent Linking of Distributed Autonomous Robotic Telescopes
The internet has brought about great change in the astronomical community,
but this interconnectivity is just starting to be exploited for use in
instrumentation. Utilizing the internet for communicating between distributed
astronomical systems is still in its infancy, but it already shows great
potential. Here we present an example of a distributed network of telescopes
that performs more efficiently in synchronous operation than as individual
instruments. RAPid Telescopes for Optical Response (RAPTOR) is a system of
telescopes at LANL that has intelligent intercommunication, combined with
wide-field optics, temporal monitoring software, and deep-field follow-up
capability all working in closed-loop real-time operation. The Telescope ALert
Operations Network (TALON) is a network server that allows intercommunication
of alert triggers from external and internal resources and controls the
distribution of these to each of the telescopes on the network. TALON is
designed to grow, allowing any number of telescopes to be linked together and
communicate. Coupled with an intelligent alert client at each telescope, it can
analyze and respond to each distributed TALON alert based on the telescope's
needs and schedule.
Comment: Presentation at SPIE 2004, Glasgow, Scotland (UK
Error-tolerant Finite State Recognition with Applications to Morphological Analysis and Spelling Correction
Error-tolerant recognition enables the recognition of strings that deviate
mildly from any string in the regular set recognized by the underlying finite
state recognizer. Such recognition has applications in error-tolerant
morphological processing, spelling correction, and approximate string matching
in information retrieval. After a description of the concepts and algorithms
involved, we give examples from two applications: In the context of
morphological analysis, error-tolerant recognition allows misspelled input word
forms to be corrected, and morphologically analyzed concurrently. We present an
application of this to error-tolerant analysis of agglutinative morphology of
Turkish words. The algorithm can be applied to morphological analysis of any
language whose morphology is fully captured by a single (and possibly very
large) finite state transducer, regardless of the word formation processes and
morphographemic phenomena involved. In the context of spelling correction,
error-tolerant recognition can be used to enumerate correct candidate forms
from a given misspelled string within a certain edit distance. Again, it can be
applied to any language with a word list comprising all inflected forms, or
whose morphology is fully described by a finite state transducer. We present
experimental results for spelling correction for a number of languages. These
results indicate that such recognition works very efficiently for candidate
generation in spelling correction for many European languages such as English,
Dutch, French, German, Italian (and others) with very large word lists of root
and inflected forms (some containing well over 200,000 forms), generating all
candidate solutions within 10 to 45 milliseconds (with edit distance 1) on a
SparcStation 10/41. For spelling correction in Turkish, error-tolerant
Comment: Replaces 9504031. gzipped, uuencoded postscript file. To appear in
Computational Linguistics Volume 22 No:1, 1996, Also available as
ftp://ftp.cs.bilkent.edu.tr/pub/ko/clpaper9512.ps.
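The candidate-generation step described in this abstract (enumerating word-list entries within a given edit distance of a misspelled string) can be sketched with a trie and a row-by-row edit-distance computation. This is a minimal illustration and not the paper's finite-state algorithm: it assumes plain Levenshtein distance (insertions, deletions, substitutions), which may differ from the paper's metric, and the names `build_trie`, `candidates`, and `END` are hypothetical.

```python
END = ""  # end-of-word key; never collides with a real one-character key


def build_trie(words):
    """Store the word list as nested dicts, one level per character."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = word
    return root


def candidates(trie, query, max_dist):
    """Return sorted (word, distance) pairs with distance <= max_dist."""
    results = []
    first_row = list(range(len(query) + 1))

    def walk(node, ch, prev_row):
        # Extend the edit-distance table by one row for character `ch`.
        row = [prev_row[0] + 1]
        for i in range(1, len(query) + 1):
            cost = 0 if query[i - 1] == ch else 1
            row.append(min(row[i - 1] + 1,          # insertion
                           prev_row[i] + 1,         # deletion
                           prev_row[i - 1] + cost)) # match or substitution
        if END in node and row[-1] <= max_dist:
            results.append((node[END], row[-1]))
        # Prune: descend only if some prefix alignment is still in budget.
        if min(row) <= max_dist:
            for next_ch, child in node.items():
                if next_ch != END:
                    walk(child, next_ch, row)

    for ch, child in trie.items():
        if ch != END:
            walk(child, ch, first_row)
    return sorted(results)
```

The pruning test on `min(row)` is what keeps the search from visiting the whole word list: once every cell of a row exceeds the budget, no extension of that prefix can come back within it.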