Linear Algorithm for Conservative Degenerate Pattern Matching
A degenerate symbol x* over an alphabet A is a non-empty subset of A, and a
sequence of such symbols is a degenerate string. A degenerate string is said to
be conservative if its number of non-solid symbols is upper-bounded by a fixed
positive constant k. We consider here the matching problem of conservative
degenerate strings and present the first linear-time algorithm that can find,
for given degenerate strings P* and T* of total length n containing k non-solid
symbols in total, the occurrences of P* in T* in O(nk) time.
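To make the definitions above concrete, here is a minimal Python sketch of degenerate strings and a naive matcher. It is not the O(nk) algorithm from the abstract. It is a quadratic baseline that assumes the usual convention that two degenerate symbols match when their sets share at least one letter. The bracket notation and the helper names `parse`, `count_non_solid`, and `occurrences` are hypothetical, chosen only for illustration.

```python
def parse(s):
    """Parse 'AC[GT]A' into a list of frozensets; '[GT]' is a non-solid symbol.

    The bracket notation is an assumption made for this illustration.
    """
    symbols, i = [], 0
    while i < len(s):
        if s[i] == "[":
            j = s.index("]", i)
            symbols.append(frozenset(s[i + 1:j]))
            i = j + 1
        else:
            symbols.append(frozenset(s[i]))
            i += 1
    return symbols


def count_non_solid(degenerate):
    """Number of symbols that contain more than one letter."""
    return sum(1 for sym in degenerate if len(sym) > 1)


def occurrences(pattern, text):
    """Naive O(|P| * |T|) baseline, NOT the linear-time algorithm.

    Assumes two degenerate symbols match when their intersection is non-empty.
    """
    m = len(pattern)
    return [
        i
        for i in range(len(text) - m + 1)
        if all(pattern[j] & text[i + j] for j in range(m))
    ]


text = parse("AC[GT]A[CT]GT")
pattern = parse("[AG]T")
k = count_non_solid(text) + count_non_solid(pattern)
hits = occurrences(pattern, text)
```

In this example the pattern occurs at positions 3 and 5 of the text (0-based), and the two strings contain k = 3 non-solid symbols in total.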
Linear-Time Superbubble Identification Algorithm for Genome Assembly
DNA sequencing is the process of determining the exact order of the
nucleotide bases of an individual's genome in order to catalogue sequence
variation and understand its biological implications. Whole-genome sequencing
techniques produce masses of data in the form of short sequences known as
reads. Assembling these reads into a whole genome constitutes a major
algorithmic challenge. Most assembly algorithms utilize de Bruijn graphs
constructed from reads for this purpose. A critical step of these algorithms is
to detect typical motif structures in the graph caused by sequencing errors and
genome repeats, and filter them out; one such complex subgraph class is a
so-called superbubble. In this paper, we propose an O(n+m)-time algorithm to
detect all superbubbles in a directed acyclic graph with n nodes and m
(directed) edges, improving the best-known O(m log m)-time algorithm by Sung et
al.
Erratum to: Circular sequence comparison: algorithms and applications
[This corrects the article DOI: 10.1186/s13015-016-0076-6.]
Highlights from the 1st Student Symposium on Computational Biology and Life Sciences, organised by ISCB Regional Student Group, UK
This short report summarises the scientific content and activities of a student-led event, the 1st student symposium by the UK Regional Student Group of the International Society for Computational Biology. The event took place in October 2014.
Circular sequence comparison: algorithms and applications
Background: Sequence comparison is a fundamental step in many important tasks in bioinformatics, from phylogenetic reconstruction to the reconstruction of genomes. Traditional algorithms for measuring approximation in sequence comparison are based on the notions of distance or similarity, and are generally computed through sequence alignment techniques. As circular molecular structure is a common phenomenon in nature, a caveat of adapting alignment techniques to circular sequence comparison is that they are computationally expensive, requiring from super-quadratic to cubic time in the length of the sequences. Results: In this paper, we introduce a new distance measure based on q-grams, and show how it can be applied effectively and computed efficiently for circular sequence comparison. Experimental results, using real DNA, RNA, and protein sequences as well as synthetic data, demonstrate orders-of-magnitude superiority of our approach in terms of efficiency, while maintaining accuracy that is very competitive with the state of the art.
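As a rough illustration of why q-grams suit circular sequences, the Python sketch below computes a plain q-gram distance over circular q-gram profiles. This is an assumption-laden stand-in, not the specific distance measure introduced in the paper: it uses the classical definition (sum of absolute differences of q-gram counts) and reads q-grams with wrap-around, so any rotation of a sequence yields the same profile. The function names `circular_qgram_profile` and `circular_qgram_distance` are hypothetical, and q is assumed to be no larger than either sequence length.

```python
from collections import Counter


def circular_qgram_profile(seq, q):
    """Count every q-gram of seq read circularly (wrapping past the end).

    Assumes 1 <= q <= len(seq).
    """
    extended = seq + seq[:q - 1]  # append the prefix so q-grams can wrap
    return Counter(extended[i:i + q] for i in range(len(seq)))


def circular_qgram_distance(x, y, q):
    """Classical q-gram distance on circular profiles.

    Illustrative only: the measure proposed in the paper may differ.
    """
    px = circular_qgram_profile(x, q)
    py = circular_qgram_profile(y, q)
    return sum(abs(px[g] - py[g]) for g in px.keys() | py.keys())


same = circular_qgram_distance("ACGT", "GTAC", 2)   # GTAC is a rotation of ACGT
apart = circular_qgram_distance("AAAA", "ACGT", 2)
```

Here `same` is 0 because the two sequences are rotations of each other, while `apart` is 8 because the profiles share no 2-gram. Building both profiles takes time linear in the sequence lengths for fixed q, which contrasts with the super-quadratic cost of alignment-based circular comparison mentioned above.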
Highlights of the 1st Student Symposium of the ISCB RSG UK
This short report summarises the scientific content and activities of a student-led event, the 1st student symposium by the UK Regional Student Group of the International Society for Computational Biology. The event took place on the 8th of October 2014.
Highlights of the 2nd Bioinformatics Student Symposium by ISCB RSG-UK
Following the success of the 1st Student Symposium by ISCB RSG-UK, a 2nd Student Symposium took place on 7th October 2015 at The Genome Analysis Centre, Norwich, UK. This short report summarizes the main highlights from the 2nd Bioinformatics Student Symposium.