A unifying framework for seed sensitivity and its application to subset seeds
We propose a general approach to computing seed sensitivity that can be
applied to different definitions of seeds. It treats the three components of
the seed sensitivity problem separately: a set of target alignments, an
associated probability distribution, and a seed model, each specified by a
distinct finite automaton. The approach is then applied to a new concept of
subset seeds, for which we propose an efficient automaton construction.
Experimental results confirm that sensitive subset seeds can be efficiently
designed using our approach and can then be used in similarity search,
producing better results than ordinary spaced seeds.
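As a rough illustration of how the three components fit together, the Python sketch below computes the probability that a subset seed hits an i.i.d. random alignment of fixed length. The three-letter alignment alphabet (0 = transversion, 1 = transition, 2 = match), the seed letters `#`, `@` and `_`, and the function names are assumptions made for the example. The automaton here is a naive one whose states are the last span-1 alignment columns, not the efficient construction proposed in the paper.

```python
# Assumed alignment alphabet: 0 = transversion, 1 = transition, 2 = match.
# Assumed seed letters: '#' needs a match, '@' a match or transition, '_' anything.
ACCEPTS = {"#": {2}, "@": {1, 2}, "_": {0, 1, 2}}


def seed_matches(seed, window):
    """True if every seed letter accepts the aligned column in `window`."""
    return all(col in ACCEPTS[s] for s, col in zip(seed, window))


def subset_seed_sensitivity(seed, n, probs):
    """Probability that `seed` hits a random alignment of length n.

    probs[a] is the i.i.d. probability of alignment letter a. States are the
    last span-1 columns read without a hit so far (a naive automaton).
    """
    span = len(seed)
    dist = {(): 1.0}  # state -> probability of reaching it with no hit yet
    for _ in range(n):
        nxt = {}
        for state, p in dist.items():
            for a, pa in enumerate(probs):
                window = state + (a,)
                if len(window) == span and seed_matches(seed, window):
                    continue  # hit: this mass is absorbed and no longer tracked
                key = window[-(span - 1):] if span > 1 else ()
                nxt[key] = nxt.get(key, 0.0) + p * pa
        dist = nxt
    return 1.0 - sum(dist.values())
```

For example, `subset_seed_sensitivity("#@_#", 64, (0.15, 0.15, 0.7))` evaluates one candidate seed. The number of states grows as 3^(span-1), which is exactly why a more compact automaton construction matters for longer seeds.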
A Coverage Criterion for Spaced Seeds and its Applications to Support Vector Machine String Kernels and k-Mer Distances
Spaced seeds have recently been shown not only to detect more alignments but
also to give a more accurate measure of phylogenetic distances (Boden et al.,
2013; Horwege et al., 2014; Leimeister et al., 2014) and to provide a lower
misclassification rate when used with Support Vector Machines (SVMs) (Onodera
and Shibuya, 2013). We confirm these two results by independent experiments,
and in this article we propose using a coverage criterion (Benson and Mak, 2008;
Martin, 2013; Martin and Noé, 2014) to measure seed efficiency in both
cases in order to design better seed patterns. We first show how this coverage
criterion can be measured directly by a full automaton-based approach. We then
compare this criterion with two other frequently used criteria, namely the
single-hit and multiple-hit criteria, through correlation coefficients with
the correct classification or the true distance. Finally, for alignment-free
distances, we propose an extension that adopts the coverage criterion, show
how it performs, and indicate how it can be efficiently computed.
Comment: http://online.liebertpub.com/doi/abs/10.1089/cmb.2014.017
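To make the difference between the three criteria concrete, the sketch below scans a binary alignment string (`1` = match, `0` = mismatch) and reports whether any hit exists (single-hit), how many hits there are (multiple-hit), and the coverage. It assumes a position counts as covered when it lies under a `#` of at least one hit; the function names are hypothetical, and this direct scan stands in for the automaton-based computation described above.

```python
def care_positions(seed):
    """Offsets of the '#' (must-match) positions of a spaced seed."""
    return [j for j, s in enumerate(seed) if s == "#"]


def hit_positions(seed, alignment):
    """Start offsets at which every '#' of the seed faces a match ('1')."""
    care = care_positions(seed)
    return [i for i in range(len(alignment) - len(seed) + 1)
            if all(alignment[i + j] == "1" for j in care)]


def single_hit(seed, alignment):
    """Single-hit criterion: does the seed hit at least once?"""
    return len(hit_positions(seed, alignment)) > 0


def multiple_hit(seed, alignment):
    """Multiple-hit criterion: number of hits."""
    return len(hit_positions(seed, alignment))


def coverage(seed, alignment):
    """Coverage criterion: positions under a '#' of at least one hit."""
    care = care_positions(seed)
    covered = {i + j for i in hit_positions(seed, alignment) for j in care}
    return len(covered)
```

On `"1111"`, the contiguous seed `"##"` produces three overlapping hits that together cover only four positions. Overlapping hits inflate the hit count without adding much coverage, so the two criteria can rank seeds differently.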
Track Reconstruction in the ALICE TPC using GPUs for LHC Run 3
In LHC Run 3, ALICE will significantly increase its data-taking rate, moving to
continuous readout of 50 kHz minimum-bias Pb-Pb collisions. The reconstruction
strategy of the online-offline computing upgrade foresees a first, synchronous
online reconstruction stage during data taking, which enables detector
calibration, and a subsequent, calibrated asynchronous reconstruction stage.
We present a tracking algorithm for the Time Projection Chamber (TPC), the main
tracking detector of ALICE. The reconstruction must yield results comparable to
the current offline reconstruction and meet time constraints like those of the
current High Level Trigger (HLT), while processing 50 times as many collisions
per second as today. The algorithm is derived from the current online tracking
in the HLT, which is based on a cellular automaton and the Kalman filter, and we
add features from offline tracking that it lacks, for improved resolution. The
continuous TPC readout and overlapping collisions pose new challenges:
conversion to spatial coordinates and the application of time- and
location-dependent calibration must happen between track seeding and track
fitting, while the TPC occupancy increases five-fold. The huge data volume
requires a data reduction factor of 20, which imposes additional requirements:
the momentum range must be extended to identify low-pT looping tracks, and a
special refit in uncalibrated coordinates improves the track-model entropy
encoding. Our TPC track finding leverages hardware accelerators via the OpenCL
and CUDA APIs in a shared source code for CPUs, GPUs, and both reconstruction
stages. Porting more reconstruction steps, such as the remainder of the TPC
reconstruction and tracking for other detectors, will shift the computing
balance from traditional processors to GPUs.
Comment: 13 pages, 10 figures, proceedings of the Connecting The Dots Workshop,
Seattle, 201
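The Kalman-filter fitting step mentioned above can be illustrated, in a heavily simplified form, by a toy straight-line fit in one projection. This is not the ALICE implementation: it assumes a two-parameter state (position `y`, slope `t`), no magnetic field, no material effects or process noise, a position-only measurement, and a hypothetical function name.

```python
def kalman_line_fit(xs, ys, meas_var, init_var=1e6):
    """Toy linear Kalman filter for y = y0 + t*x; returns (y, t) at the last x."""
    y, t = 0.0, 0.0                           # state: position and slope
    c00, c01, c11 = init_var, 0.0, init_var   # symmetric 2x2 covariance
    x_prev = xs[0]
    for x, m in zip(xs, ys):
        dx = x - x_prev
        # Prediction: propagate state and covariance with F = [[1, dx], [0, 1]]
        y = y + t * dx
        c00, c01 = c00 + 2 * dx * c01 + dx * dx * c11, c01 + dx * c11
        # Update with a measurement of the position only (H = [1, 0])
        s = c00 + meas_var                    # innovation variance
        k0, k1 = c00 / s, c01 / s             # Kalman gain
        r = m - y                             # residual
        y, t = y + k0 * r, t + k1 * r
        # Covariance update C = (I - K H) C, using the predicted values
        c00, c01, c11 = (1 - k0) * c00, (1 - k0) * c01, c11 - k1 * c01
        x_prev = x
    return y, t
```

For noise-free points on a line, such as `ys = [2 + 0.5 * x for x in range(5)]`, the filter converges to the line's position at the last point and its slope. A real TPC fit would use a helix track model with multiple-scattering and energy-loss terms instead.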
On Sloane's persistence problem
We investigate the so-called persistence problem of Sloane, exploiting
connections with the dynamics of circle maps and the ergodic theory of
actions. We also formulate a conjecture concerning the
asymptotic distribution of digits in long products of finitely many primes
whose truth would, in particular, solve the persistence problem. The heuristics
that we propose to complement our numerical studies can be thought of in terms of
a simple model in statistical mechanics.
Comment: 5 figures
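For reference, Sloane's problem concerns the multiplicative persistence of an integer: the number of times its digits must be multiplied together before a single digit remains. A minimal Python sketch of that definition follows. The function names and the `base` parameter are illustrative, and the paper's own numerical methods are not reproduced here.

```python
def digit_product(n, base=10):
    """Product of the digits of n written in the given base."""
    p = 1
    while n:
        n, d = divmod(n, base)
        p *= d
    return p


def persistence(n, base=10):
    """Number of digit-product steps needed to reach a single digit."""
    steps = 0
    while n >= base:
        n = digit_product(n, base)
        steps += 1
    return steps
```

For instance, `persistence(39)` is 3 (39 → 27 → 14 → 4), and `persistence(277777788888899)` is 11.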
Compressive image sensor architecture with on-chip measurement matrix generation
A CMOS image sensor architecture that uses a cellular automaton to generate the pseudo-random compressive sampling matrix is presented. The image sensor employs in-pixel pulse-frequency modulation and column-wise pulse counters to produce compressed samples. A common problem of compressive sampling applied to image sensors is that the sampling matrix of a full-frame compressive strategy is too large to be stored in on-chip memory. Since this matrix has to be transmitted to or from the reconstruction system, its size would also prevent practical applications. A full-frame compressive strategy generated by a 1-D cellular automaton exhibiting Class III behavior needs neither storage memory nor continuous transmission. In-pixel pulse-frequency modulation and up-down counters allow differential compressed samples to be generated directly in the digital domain, where it is easier to obtain the required dynamic range. Combined, these solutions improve the accuracy of the compressed samples, thus improving the performance of any generic reconstruction algorithm.
Ministerio de Economía y Competitividad TEC2015-66878-C3-1-R; Junta de Andalucía TIC 2338-2013; Office of Naval Research (USA) N00014141035
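The following Python sketch illustrates why a cellular automaton removes the need to store or transmit the matrix: every row is regenerated from a small seed state. It assumes elementary rule 30 (a well-known Class III rule) on a ring as wide as the pixel vector, and it maps live cells to +1 (counter counts up) and dead cells to -1 (counter counts down). The rule, boundary conditions, mapping, and function names are assumptions, not details taken from the paper.

```python
def rule30_step(cells):
    """One step of elementary CA rule 30 on a ring: new = left XOR (center OR right)."""
    n = len(cells)
    return [cells[(i - 1) % n] ^ (cells[i] | cells[(i + 1) % n]) for i in range(n)]


def measurement_rows(seed_cells, m):
    """Regenerate m rows of a +/-1 sampling matrix from the CA seed (no storage)."""
    rows, cells = [], list(seed_cells)
    for _ in range(m):
        cells = rule30_step(cells)
        rows.append([1 if c else -1 for c in cells])
    return rows


def differential_samples(pixels, seed_cells, m):
    """Each sample counts pixels up where the row is +1 and down where it is -1."""
    return [sum(w * x for w, x in zip(row, pixels))
            for row in measurement_rows(seed_cells, m)]
```

Because the matrix is a deterministic function of the seed state, a reconstruction system given the same seed and row count could regenerate it instead of receiving it.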