17 research outputs found

    Algorithms for the analysis of molecular sequences

    Linear Algorithm for Conservative Degenerate Pattern Matching

    A degenerate symbol x* over an alphabet A is a non-empty subset of A, and a sequence of such symbols is a degenerate string. A degenerate string is said to be conservative if its number of non-solid symbols is upper-bounded by a fixed positive constant k. We consider here the matching problem of conservative degenerate strings and present the first linear-time algorithm that can find, for given degenerate strings P* and T* of total length n containing k non-solid symbols in total, the occurrences of P* in T* in O(nk) time.
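
    To make the definitions in this abstract concrete, here is a minimal sketch of degenerate matching. It is a naive quadratic-time check, not the O(nk) algorithm the abstract announces. It assumes that two degenerate symbols match when their letter sets share at least one letter; the function names and the set-based representation are illustrative choices, not taken from the paper.

```python
def symbols_match(a, b):
    """Two degenerate symbols (non-empty sets of letters) match
    if they have at least one letter in common (assumed definition)."""
    return not a.isdisjoint(b)


def naive_degenerate_match(pattern, text):
    """Return every start position where `pattern` occurs in `text`.

    Both arguments are lists of sets of letters. This is a brute-force
    O(n * m) scan, shown only to illustrate the problem statement.
    """
    m, n = len(pattern), len(text)
    return [
        i
        for i in range(n - m + 1)
        if all(symbols_match(pattern[j], text[i + j]) for j in range(m))
    ]


# A solid symbol is a singleton set; {"b", "c"} is a non-solid symbol.
pattern = [{"a"}, {"b", "c"}]
text = [{"a"}, {"b"}, {"a"}, {"c"}, {"a", "b"}, {"c"}]
occurrences = naive_degenerate_match(pattern, text)
```

    Here the pattern contains one non-solid symbol and the text contains one, so k = 2 in the abstract's terms.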

    Linear-Time Superbubble Identification Algorithm for Genome Assembly

    DNA sequencing is the process of determining the exact order of the nucleotide bases of an individual's genome in order to catalogue sequence variation and understand its biological implications. Whole-genome sequencing techniques produce masses of data in the form of short sequences known as reads. Assembling these reads into a whole genome constitutes a major algorithmic challenge. Most assembly algorithms utilize de Bruijn graphs constructed from reads for this purpose. A critical step of these algorithms is to detect typical motif structures in the graph caused by sequencing errors and genome repeats, and filter them out; one such complex subgraph class is a so-called superbubble. In this paper, we propose an O(n+m)-time algorithm to detect all superbubbles in a directed acyclic graph with n nodes and m (directed) edges, improving the best-known O(m log m)-time algorithm by Sung et al.
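
    The abstract takes the de Bruijn graph as given. As background only, the sketch below shows one common way to build such a graph from reads: each k-mer in a read contributes a directed edge from its (k-1)-length prefix to its (k-1)-length suffix. It does not detect superbubbles and is not the paper's algorithm; the function name and the dictionary-of-sets representation are assumptions made for illustration.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Build a de Bruijn graph as {node: set of successor nodes}.

    Nodes are (k-1)-mers; every k-mer found in a read adds the edge
    prefix -> suffix. Reads shorter than k contribute nothing.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


# Two overlapping toy reads; k = 3 gives 2-mer nodes.
graph = de_bruijn_graph(["ACGT", "CGTA"], 3)
```

    The resulting graph has n nodes (the distinct (k-1)-mers) and m directed edges, the quantities in which the abstract's O(n+m) bound is expressed.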

    Highlights from the 1st Student Symposium on Computational Biology and Life Sciences, organised by ISCB Regional Student Group, UK

    This short report summarises the scientific content and activities of a student-led event, the 1st student symposium by the UK Regional Student Group of the International Society for Computational Biology. The event took place in October 2014.

    Circular sequence comparison: algorithms and applications

    Background: Sequence comparison is a fundamental step in many important tasks in bioinformatics, from phylogenetic reconstruction to the reconstruction of genomes. Traditional algorithms for measuring approximation in sequence comparison are based on the notions of distance or similarity, and are generally computed through sequence alignment techniques. As circular molecular structure is a common phenomenon in nature, a caveat of the adaptation of alignment techniques for circular sequence comparison is that they are computationally expensive, requiring from super-quadratic to cubic time in the length of the sequences. Results: In this paper, we introduce a new distance measure based on q-grams, and show how it can be applied effectively and computed efficiently for circular sequence comparison. Experimental results, using real DNA, RNA, and protein sequences as well as synthetic data, demonstrate orders-of-magnitude superiority of our approach in terms of efficiency, while maintaining accuracy that is very competitive with the state of the art.
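
    The abstract does not define its new measure, so the sketch below is not the paper's distance. It illustrates the general idea with the classic q-gram distance (the sum of absolute differences between q-gram counts), applied to q-grams read circularly so that the profile does not depend on where the sequence is cut open. The function names are hypothetical.

```python
from collections import Counter


def circular_qgram_profile(s, q):
    """Count the q-grams of `s` read as a circle (wrapping past the end)."""
    if not 0 < q <= len(s):
        raise ValueError("q must satisfy 0 < q <= len(s)")
    extended = s + s[:q - 1]  # append the wrap-around prefix
    return Counter(extended[i:i + q] for i in range(len(s)))


def circular_qgram_distance(x, y, q):
    """Sum of |count_x(g) - count_y(g)| over all q-grams g."""
    px = circular_qgram_profile(x, q)
    py = circular_qgram_profile(y, q)
    return sum(abs(px[g] - py[g]) for g in px.keys() | py.keys())


# "GTAC" is a rotation of "ACGT", so their circular profiles coincide.
d_rotation = circular_qgram_distance("ACGT", "GTAC", 2)
d_different = circular_qgram_distance("ACGT", "ACGA", 2)
```

    Each profile takes time linear in the sequence length for fixed q, in contrast with the super-quadratic to cubic cost the abstract attributes to alignment-based circular comparison.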

    Preface

    Highlights of the 1st Student Symposium of the ISCB RSG UK

    This short report summarises the scientific content and activities of a student-led event, the 1st student symposium by the UK Regional Student Group of the International Society for Computational Biology. The event took place on the 8th of October 2014.

    Highlights of the 2nd Bioinformatics Student Symposium by ISCB RSG-UK

    Following the success of the 1st Student Symposium by ISCB RSG-UK, a 2nd Student Symposium took place on 7th October 2015 at The Genome Analysis Centre, Norwich, UK. This short report summarizes the main highlights from the 2nd Bioinformatics Student Symposium.