76,745 research outputs found
The 3of5 web application for complex and comprehensive pattern matching in protein sequences
BACKGROUND: The identification of patterns in biological sequences is a key challenge in genome analysis and in proteomics. Frequently such patterns are complex and highly variable, especially in protein sequences. They are frequently described using terms of regular expressions (RegEx) because of the user-friendly terminology. Limitations arise for queries with the increasing complexity of patterns and are accompanied by requirements for enhanced capabilities. This is especially true for patterns containing ambiguous characters and positions and/or length ambiguities. RESULTS: We have implemented the 3of5 web application in order to enable complex pattern matching in protein sequences. 3of5 is named after a special use of its main feature, the novel n-of-m pattern type. This feature allows for an extensive specification of variable patterns where the individual elements may vary in their position, order, and content within a defined stretch of sequence. The number of distinct elements can be constrained by operators, and individual characters may be excluded. The n-of-m pattern type can be combined with common regular expression terms and thus also allows for a comprehensive description of complex patterns. 3of5 increases the fidelity of pattern matching and finds ALL possible solutions in protein sequences in cases of length-ambiguous patterns instead of simply reporting the longest or shortest hits. Grouping and combined search for patterns provides a hierarchical arrangement of larger patterns sets. The algorithm is implemented as internet application and freely accessible. The application is available at . CONCLUSION: The 3of5 application offers an extended vocabulary for the definition of search patterns and thus allows the user to comprehensively specify and identify peptide patterns with variable elements. The n-of-m pattern type offers an improved accuracy for pattern matching in combination with the ability to find all solutions, without compromising the user friendliness of regular expression terms
Leveraging Deep Visual Descriptors for Hierarchical Efficient Localization
Many robotics applications require precise pose estimates despite operating
in large and changing environments. This can be addressed by visual
localization, using a pre-computed 3D model of the surroundings. The pose
estimation then amounts to finding correspondences between 2D keypoints in a
query image and 3D points in the model using local descriptors. However,
computational power is often limited on robotic platforms, making this task
challenging in large-scale environments. Binary feature descriptors
significantly speed up this 2D-3D matching, and have become popular in the
robotics community, but also strongly impair the robustness to perceptual
aliasing and changes in viewpoint, illumination and scene structure. In this
work, we propose to leverage recent advances in deep learning to perform an
efficient hierarchical localization. We first localize at the map level using
learned image-wide global descriptors, and subsequently estimate a precise pose
from 2D-3D matches computed in the candidate places only. This restricts the
local search and thus allows to efficiently exploit powerful non-binary
descriptors usually dismissed on resource-constrained devices. Our approach
results in state-of-the-art localization performance while running in real-time
on a popular mobile platform, enabling new prospects for robotics research.Comment: CoRL 2018 Camera-ready (fix typos and update citations
Revisiting Waiting Times in DNA evolution
Transcription factors are short stretches of DNA (or -mers) mainly located
in promoters sequences that enhance or repress gene expression. With respect to
an initial distribution of letters on the DNA alphabet, Behrens and Vingron
consider a random sequence of length that does not contain a given -mer
or word of size . Under an evolution model of the DNA, they compute the
probability that this -mer appears after a unit time of 20
years. They prove that the waiting time for the first apparition of the -mer
is well approximated by . Their work relies on the
simplifying assumption that the -mer is not self-overlapping. They observe
in particular that the waiting time is mostly driven by the initial
distribution of letters.
Behrens et al. use an approach by automata that relaxes the assumption
related to words overlaps. Their numerical evaluations confirms the validity of
Behrens and Vingron approach for non self-overlapping words, but provides up to
44% corrections for highly self-overlapping words such as . We
devised an approach of the problem by clump analysis and generating functions;
this approach leads to prove a quasi-linear behaviour of for a
large range of values of , an important result for DNA evolution. We present
here this clump analysis, first by language decomposition, and next by an
automaton construction; finally, we describe an equivalent approach by
construction of Markov automata.Comment: 19 pages, 3 Figures, 2 Table
An Efficient Dynamic Programming Algorithm for the Generalized LCS Problem with Multiple Substring Exclusion Constrains
In this paper, we consider a generalized longest common subsequence problem
with multiple substring exclusion constrains. For the two input sequences
and of lengths and , and a set of constrains
of total length , the problem is to find a common subsequence of and
excluding each of constrain string in as a substring and the length of
is maximized. The problem was declared to be NP-hard\cite{1}, but we
finally found that this is not true. A new dynamic programming solution for
this problem is presented in this paper. The correctness of the new algorithm
is proved. The time complexity of our algorithm is .Comment: arXiv admin note: substantial text overlap with arXiv:1301.718
- …