403 research outputs found

    Faster subsequence recognition in compressed strings

    Computation on compressed strings is one of the key approaches to processing massive data sets. We consider local subsequence recognition problems on strings compressed by straight-line programs (SLPs), a compression model closely related to Lempel–Ziv compression. For an SLP-compressed text of length m̄ and an uncompressed pattern of length n, Cégielski et al. gave an algorithm for local subsequence recognition running in time O(m̄ n^2 log n). We improve the running time to O(m̄ n^1.5). Our algorithm can also be used to compute the longest common subsequence between a compressed text and an uncompressed pattern in time O(m̄ n^1.5); the same problem with a compressed pattern is known to be NP-hard.
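    The improved algorithm is intricate, but the basic building block of subsequence processing on SLPs can be sketched in Python: for every SLP symbol, precompute where greedy matching of each pattern suffix ends up after reading that symbol's expansion. This is a plain O(m̄·n) illustration of the idea, not the O(m̄ n^1.5) algorithm of the abstract, and the rule encoding is an assumption:

```python
def subsequence_state_tables(rules, pattern):
    """For each SLP symbol, compute f[sym][p]: the pattern position reached
    after greedily matching pattern[p:] against the symbol's expansion."""
    n = len(pattern)
    f = {}
    for sym, rhs in rules:                # children must precede parents
        if isinstance(rhs, str):          # terminal rule: sym -> one character
            f[sym] = [p + (p < n and pattern[p] == rhs) for p in range(n + 1)]
        else:                             # binary rule: sym -> (left, right)
            left, right = rhs
            f[sym] = [f[right][f[left][p]] for p in range(n + 1)]
    return f

# The text "abab" as an SLP: A -> 'a', B -> 'b', C -> AB, D -> CC
rules = [("A", "a"), ("B", "b"), ("C", ("A", "B")), ("D", ("C", "C"))]
tables = subsequence_state_tables(rules, "aab")
print(tables["D"][0] == len("aab"))       # True: "aab" is a subsequence of "abab"
```

    Because each symbol's table is computed once from its children's tables, the work is linear in the number of SLP rules rather than in the (possibly exponentially longer) decompressed text.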

    Phylogenetic comparative assembly

    Husemann P, Stoye J. Phylogenetic Comparative Assembly. Algorithms for Molecular Biology. 2010;5(1):3. BACKGROUND: Recent high-throughput sequencing technologies are capable of generating a huge amount of data for bacterial genome sequencing projects. Although current sequence assemblers successfully merge the overlapping reads, often several contigs remain which cannot be assembled any further. It is still costly and time-consuming to close all the gaps in order to acquire the whole genomic sequence. RESULTS: Here we propose an algorithm that takes several related genomes and their phylogenetic relationships into account to create a graph that contains the likelihood for each pair of contigs to be adjacent. Subsequently, this graph can be used to compute a layout graph that shows the most promising contig adjacencies in order to aid biologists in finishing the complete genomic sequence. The layout graph shows unique contig orderings where possible, and the best alternatives where necessary. CONCLUSIONS: Our new algorithm for contig ordering uses sequence similarity as well as phylogenetic information to estimate adjacencies of contigs. An evaluation of our implementation shows that it performs better than recent approaches while being much faster at the same time.
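    The step from a weighted adjacency graph to a layout can be illustrated with a greedy sketch that takes adjacencies in order of decreasing likelihood while keeping at most one neighbor per contig side and avoiding cycles. This is a simplified stand-in for the paper's layout-graph computation; the function name and example weights are illustrative:

```python
def layout_from_adjacencies(contigs, adjacency_weights):
    """Greedy chain layout: accept contig adjacencies in order of decreasing
    likelihood, keeping at most one neighbor per contig side and no cycles."""
    right, left = {}, {}
    for (a, b), w in sorted(adjacency_weights.items(), key=lambda kv: -kv[1]):
        if a in right or b in left:
            continue                      # that side is already taken
        end = b                           # would this edge close a cycle?
        while end in right:
            end = right[end]
        if end == a:
            continue
        right[a], left[b] = b, a
    chains = []
    for c in contigs:                     # walk chains from left-most contigs
        if c not in left:
            chain = [c]
            while chain[-1] in right:
                chain.append(right[chain[-1]])
            chains.append(chain)
    return chains

contigs = ["c1", "c2", "c3"]
weights = {("c1", "c2"): 0.9, ("c2", "c3"): 0.8, ("c3", "c1"): 0.5}
print(layout_from_adjacencies(contigs, weights))  # [['c1', 'c2', 'c3']]
```

    In the example, the low-likelihood adjacency (c3, c1) is rejected because it would close a cycle, so a single unambiguous ordering remains.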

    BFAST: An Alignment Tool for Large Scale Genome Resequencing

    BACKGROUND: The new generation of massively parallel DNA sequencers, combined with the challenge of whole human genome resequencing, results in the need for rapid and accurate alignment of billions of short DNA sequence reads to a large reference genome. Speed is obviously of great importance, but equally important is maintaining alignment accuracy of short reads, in the 25-100 base range, in the presence of errors and true biological variation. METHODOLOGY: We introduce a new algorithm specifically optimized for this task, as well as a freely available implementation, BFAST, which can align data produced by any of the current sequencing platforms, allows for user-customizable levels of speed and accuracy, supports paired-end data, and provides for efficient parallel and multi-threaded computation on a computer cluster. The new method is based on creating flexible, efficient whole-genome indexes to rapidly map reads to candidate alignment locations, with arbitrarily many independent indexes allowed to achieve robustness against read errors and sequence variants. The final local alignment uses a Smith-Waterman method with gaps to support the detection of small indels. CONCLUSIONS: We compare BFAST to a selection of large-scale alignment tools -- BLAT, MAQ, SHRiMP, and SOAP -- in terms of both speed and accuracy, using simulated and real-world datasets. We show that BFAST can achieve substantially greater alignment sensitivity in the context of errors and true variants, especially insertions and deletions, and minimize false mappings, while maintaining adequate speed compared to other current methods. We show that BFAST can align the amount of data needed to fully resequence a human genome, one billion reads, with high sensitivity and accuracy on a modest computer cluster in less than 24 hours. BFAST is available at http://bfast.sourceforge.net.
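    The final gapped alignment step mentioned above is the classical Smith-Waterman local-alignment dynamic program. A minimal score-only sketch follows; the scoring parameters are illustrative defaults, not BFAST's actual settings:

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment score with a linear gap penalty:
    H[i][j] is the best score of any local alignment ending at a[i-1], b[j-1],
    clamped at zero so alignments can restart anywhere."""
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best

print(smith_waterman("ACGTT", "ACGAT"))   # 7: four matches and one mismatch
```

    The zero floor in the recurrence is what makes the alignment local, and the gap terms are what allow the small indels the abstract refers to.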

    Analysis of cancer metabolism with high-throughput technologies

    Background: Recent advances in genomics and proteomics have allowed us to study the nuances of the Warburg effect – a long-standing puzzle in cancer energy metabolism – at an unprecedented level of detail. While modern next-generation sequencing technologies are extremely powerful, the lack of appropriate data analysis tools makes this study difficult. To meet this challenge, we developed a novel application for comparative analysis of gene expression and visualization of RNA-Seq data. Results: We analyzed two biological samples (normal human brain tissue and human cancer cell lines) with high energy-metabolic requirements. We calculated the digital topology and the copy number of every expressed transcript. We observed subtle but remarkable qualitative and quantitative differences between the citric acid (TCA) cycle and glycolysis pathways. We found that in the first three steps of the TCA cycle, digital expression of aconitase 2 (ACO2) in the brain exceeded both citrate synthase (CS) and isocitrate dehydrogenase 2 (IDH2), while in cancer cells this trend was quite the opposite. In the glycolysis pathway, all genes showed higher expression levels in cancer cell lines; most notably, digital gene expression of glyceraldehyde-3-phosphate dehydrogenase (GAPDH) and enolase (ENO) was considerably increased compared to the brain sample. Conclusions: The variations we observed should affect the rates and quantities of ATP production. We expect that the developed tool will provide insights into the subtleties of the causality between the Warburg effect and neoplastic transformation. Even though we focused on well-known and extensively studied metabolic pathways, the data analysis and visualization pipeline that we developed is particularly valuable as it is global and pathway-independent.
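    Comparing digital expression between samples requires normalizing raw read counts for transcript length and sequencing depth. An RPKM-style normalization is one common way to do this; it is used here as an assumed illustration, since the paper's exact normalization may differ:

```python
def digital_expression(read_counts, transcript_lengths):
    """RPKM-style normalization: reads per kilobase of transcript per
    million mapped reads, making counts comparable across genes and samples."""
    total = sum(read_counts.values())
    return {gene: count / (transcript_lengths[gene] / 1e3) / (total / 1e6)
            for gene, count in read_counts.items()}

# Toy counts: GAPDH gets twice the reads of ACO2 but is half as long,
# so its normalized expression is four times higher.
expr = digital_expression({"GAPDH": 1000, "ACO2": 500},
                          {"GAPDH": 1000, "ACO2": 2000})
print(expr["GAPDH"] / expr["ACO2"])       # ~4.0
```

    Without the length correction, the longer ACO2 transcript would appear more highly expressed than it really is, which is exactly the kind of distortion that makes raw counts unsuitable for pathway comparisons.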

    Horizontal DNA transfer mechanisms of bacteria as weapons of intragenomic conflict

    Horizontal DNA transfer (HDT) is a pervasive mechanism of diversification in many microbial species, but its primary evolutionary role remains controversial. Much recent research has emphasised the adaptive benefit of acquiring novel DNA, but here we argue instead that intragenomic conflict provides a coherent framework for understanding the evolutionary origins of HDT. To test this hypothesis, we developed a mathematical model of a clonally descended bacterial population undergoing HDT through transmission of mobile genetic elements (MGEs) and genetic transformation. Including the known bias of transformation toward the acquisition of shorter alleles into the model suggested it could be an effective means of counteracting the spread of MGEs. Both constitutive and transient competence for transformation were found to provide an effective defence against parasitic MGEs; transient competence could also be effective at permitting the selective spread of MGEs conferring a benefit on their host bacterium. The coordination of transient competence with cell-cell killing, observed in multiple species, was found to result in synergistic blocking of MGE transmission through releasing genomic DNA for homologous recombination while simultaneously reducing horizontal MGE spread by lowering the local cell density. To evaluate the feasibility of the functions suggested by the modelling analysis, we analysed genomic data from longitudinal sampling of individuals carrying Streptococcus pneumoniae. This revealed the frequent within-host coexistence of clonally descended cells that differed in their MGE infection status, a necessary condition for the proposed mechanism to operate. Additionally, we found multiple examples of MGEs inhibiting transformation through integrative disruption of genes encoding the competence machinery across many species, providing evidence of an ongoing "arms race." 
Reduced rates of transformation have also been observed in cells infected by MGEs that reduce the concentration of extracellular DNA through secretion of DNases. Simulations predicted that either mechanism of limiting transformation would benefit individual MGEs, but also that this tactic's effectiveness was limited by competition with other MGEs coinfecting the same cell. A further observed behaviour we hypothesised to reduce elimination by transformation was MGE activation when cells become competent. Our model predicted that this response was effective at counteracting transformation independently of competing MGEs. Therefore, this framework is able to explain both common properties of MGEs and the seemingly paradoxical bacterial behaviours of transformation and cell-cell killing within clonally related populations, as the consequences of intragenomic conflict between self-replicating chromosomes and parasitic MGEs. The antagonistic nature of the different mechanisms of HDT over short timescales means their contribution to bacterial evolution is likely to be substantially greater than previously appreciated.
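    The core dynamic — horizontal MGE spread opposed by transformation-mediated curing — can be sketched as a toy stochastic simulation. This is an illustrative caricature with made-up parameters, not the paper's mathematical model:

```python
import random

def simulate(n_cells=1000, steps=2000, beta=0.05, tau=0.02, seed=1):
    """Toy infection/cure dynamics (illustrative, not the paper's model):
    an MGE spreads horizontally at rate beta * (infected fraction), while
    transformation cures infected cells at rate tau, reflecting the bias
    toward acquiring the shorter, MGE-free allele."""
    random.seed(seed)
    infected = n_cells // 100                  # start with 1% MGE carriers
    for _ in range(steps):
        frac = infected / n_cells
        gains = sum(random.random() < beta * frac
                    for _ in range(n_cells - infected))
        losses = sum(random.random() < tau for _ in range(infected))
        infected = max(0, min(n_cells, infected + gains - losses))
    return infected / n_cells

# With transformation switched off (tau=0) the MGE sweeps the population;
# with curing enabled it is held at a lower equilibrium frequency.
print(simulate(tau=0.0) > simulate(tau=0.02))  # True
```

    In this caricature the equilibrium infected fraction is roughly 1 − tau/beta, so even a modest curing rate keeps a substantial MGE-free subpopulation, consistent with the defensive role the abstract proposes for transformation.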

    Liveable Open Public Space - From Flaneur to Cyborg

    Open public spaces have always been key elements of the city, and they are now also crucial for mixed reality. They are the main carriers of urban life: places for socialization, where users rest, have fun and talk. Moreover, "seeing others and being seen" is a condition of socialization. The intensity of life in public spaces provides qualities such as safety, comfort and attractiveness. Furthermore, open public spaces represent a spatial framework for meetings and multilevel interactions, and should include virtual flows, stimulating the merging of physical and digital reality. The aim of this chapter is to present a critical analysis of public open spaces and aspects of their social role and liveability. It also suggests how new technologies, in a mixed-reality world, may enhance design approaches and upgrade the relationship between users and their surroundings. New technologies are necessary for creating physical/digital spaces that become playable and liveable, encouraging walking, cycling, standing and interacting. Hence, they will attract more citizens and visitors and ensure a healthy environment, quality of life and sociability. Public space, acting as an open book of the city's history and future, should play a new role as a place of reference for the personal and social life of the flaneur/cyborg citizen. The key result is a framework for understanding the particular importance of cyberparks in contemporary urban life, in order to better adapt technologies to the needs of modern urban life.

    CNV-seq, a new method to detect copy number variation using high-throughput sequencing

    Background: DNA copy number variation (CNV) has been recognized as an important source of genetic variation. Array comparative genomic hybridization (aCGH) is commonly used for CNV detection, but the microarray platform has a number of inherent limitations. Results: Here, we describe a method to detect copy number variation using shotgun sequencing, CNV-seq. The method is based on a robust statistical model that describes the complete analysis procedure and allows the computation of essential confidence values for the detection of CNV. Our results show that the number of reads, not the length of the reads, is the key factor determining the resolution of detection. This favors next-generation sequencing methods that rapidly produce large amounts of short reads. Conclusion: Simulation of various sequencing methods with coverage between 0.1× and 8× shows overall specificity between 91.7% and 99.9%, and sensitivity between 72.2% and 96.5%. We also show results for the assessment of CNV between two individual human genomes.
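    The core of shotgun-based CNV detection — comparing read counts per genomic window between two samples — can be sketched as follows. This computes depth-normalized log2 ratios only; the paper's actual statistical model additionally derives confidence values, and the pseudocount here is an assumption to avoid taking the log of zero:

```python
import math

def cnv_log2_ratios(case_positions, ref_positions, genome_len, window):
    """Count mapped-read start positions per fixed-size window in two samples
    and report depth-normalized log2 ratios; windows with a ratio far from 0
    are CNV candidates (pseudocount 0.5 avoids log of zero)."""
    n_win = (genome_len + window - 1) // window
    case = [0] * n_win
    ref = [0] * n_win
    for pos in case_positions:
        case[pos // window] += 1
    for pos in ref_positions:
        ref[pos // window] += 1
    scale = sum(ref) / sum(case)               # correct for sequencing depth
    return [math.log2((c * scale + 0.5) / (r + 0.5))
            for c, r in zip(case, ref)]

# Identical read distributions give a flat log2 ratio of zero everywhere.
print(cnv_log2_ratios([5, 150], [5, 150], genome_len=200, window=100))  # [0.0, 0.0]
```

    Because the statistic depends only on how many reads land in each window, adding reads sharpens the resolution regardless of read length — the property the abstract highlights in favor of short-read platforms.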

    Fast Identification and Removal of Sequence Contamination from Genomic and Metagenomic Datasets

    High-throughput sequencing technologies have strongly impacted microbiology, providing a rapid and cost-effective way of generating draft genomes and exploring microbial diversity. However, sequences obtained from impure nucleic acid preparations may contain DNA from sources other than the sample. Such sequence contamination is a serious concern for the quality of the data used for downstream analysis, causing misassembly of sequence contigs and erroneous conclusions. Therefore, the removal of sequence contaminants is a necessary step for all sequencing projects. We developed DeconSeq, a robust framework for the rapid, automated identification and removal of sequence contamination in longer-read datasets (150 bp mean read length). DeconSeq is publicly available as standalone and web-based versions. The results can be exported for subsequent analysis, and the databases used for the web-based version are automatically updated on a regular basis. DeconSeq categorizes possible contamination sequences, eliminates redundant hits with higher similarity to non-contaminant genomes, and provides graphical visualizations of the alignment results and classifications. Using DeconSeq, we conducted an analysis of possible human DNA contamination in 202 previously published microbial and viral metagenomes and found possible contamination in 145 (72%) metagenomes, with as much as 64% contaminating sequences. This new framework allows scientists to automatically detect and efficiently remove unwanted sequence contamination from their datasets while eliminating critical limitations of current methods. DeconSeq's web interface is simple and user-friendly. The standalone version allows offline analysis and integration into existing data processing pipelines. DeconSeq's results reveal whether the sequencing experiment has succeeded, whether the correct sample was sequenced, and whether the sample contains any sequence contamination from the DNA preparation or the host. 
In addition, the analysis of 202 metagenomes demonstrated significant contamination of the non-human-associated metagenomes, suggesting that this method is appropriate for screening all metagenomes. DeconSeq is available at http://deconseq.sourceforge.net/
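    The kind of screen DeconSeq performs can be illustrated with a naive exact k-mer filter: any read sharing a k-mer with a contaminant reference is dropped. This is illustrative only — DeconSeq itself uses alignment against contaminant databases rather than exact k-mer matching, and the sequences and `k` below are toy values:

```python
def remove_contaminants(reads, contaminant_ref, k=21):
    """Naive contamination screen: drop any read that shares an exact k-mer
    with the contaminant reference. (Illustrative only -- DeconSeq uses
    alignment against contaminant databases, not exact k-mer matching.)"""
    contam_kmers = {contaminant_ref[i:i + k]
                    for i in range(len(contaminant_ref) - k + 1)}
    return [read for read in reads
            if not any(read[i:i + k] in contam_kmers
                       for i in range(len(read) - k + 1))]

reads = ["AAAAACC", "TTTTTTT"]                 # toy reads
human = "AAAAACCCCCGGGGG"                      # toy "human" contaminant sequence
print(remove_contaminants(reads, human, k=5))  # ['TTTTTTT']
```

    An alignment-based screen tolerates sequencing errors and biological variation that break exact k-mer matches, which is why real tools pay the extra computational cost.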