
    Cerulean: A hybrid assembly using high throughput short and long reads

    Genome assembly using high throughput data with short reads, arguably, remains an unresolvable task in repetitive genomes, since when the length of a repeat exceeds the read length, it becomes difficult to unambiguously connect the flanking regions. The emergence of third generation sequencing (Pacific Biosciences) with long reads makes it possible to resolve complicated repeats that could not be resolved by the short read data. However, these long reads have a high error rate and it is an uphill task to assemble the genome without using additional high quality short reads. Recently, Koren et al. 2012 proposed an approach to use high quality short read data to correct these long reads and, thus, make the assembly from long reads possible. However, due to the large size of both datasets (short and long reads), error correction of these long reads requires excessively high computational resources, even on small bacterial genomes. In this work, instead of error correction of long reads, we first assemble the short reads and later map these long reads on the assembly graph to resolve repeats. Contribution: We present a hybrid assembly approach that is both computationally effective and produces high quality assemblies. Our algorithm first operates with a simplified version of the assembly graph consisting only of long contigs and gradually improves the assembly by adding smaller contigs in each iteration. In contrast to the state-of-the-art long read error correction technique, which requires high computational resources and long running time on a supercomputer even for bacterial genome datasets, our software can produce a comparable assembly using only a standard desktop in a short running time. Comment: Peer-reviewed and presented as part of the 13th Workshop on Algorithms in Bioinformatics (WABI2013)

    Computational and Biological Analogies for Understanding Fine-Tuned Parameters in Physics

    In this philosophical paper, we explore computational and biological analogies to address the fine-tuning problem in cosmology. We first clarify what it means for physical constants or initial conditions to be fine-tuned. We review important distinctions such as that between dimensionless and dimensional physical constants, and the classification of constants proposed by Levy-Leblond. Then we explore how two great analogies, computational and biological, can give new insights into our problem. This paper includes a preliminary study to examine the two analogies. Importantly, analogies are both useful and fundamental cognitive tools, but can also be misused or misinterpreted. The idea that our universe might be modelled as a computational entity is analysed, and we discuss the distinction between physical laws and initial conditions using algorithmic information theory. Smolin introduced the theory of "Cosmological Natural Selection" with a biological analogy in mind. We examine an extension of this analogy involving intelligent life. We discuss if and how this extension could be legitimated. Keywords: origin of the universe, fine-tuning, physical constants, initial conditions, computational universe, biological universe, role of intelligent life, cosmological natural selection, cosmological artificial selection, artificial cosmogenesis. Comment: 25 pages, Foundations of Science, in press

    Reconstructing complex regions of genomes using long-read sequencing technology

    Obtaining high-quality sequence continuity of complex regions of recent segmental duplication remains one of the major challenges of finishing genome assemblies. In the human and mouse genomes, this was achieved by targeting large-insert clones using costly and laborious capillary-based sequencing approaches. Sanger shotgun sequencing of clone inserts, however, has now been largely abandoned, leaving most of these regions unresolved in newer genome assemblies generated primarily by next-generation sequencing hybrid approaches. Here we show that it is possible to resolve regions that are complex in a genome-wide context but simple in isolation for a fraction of the time and cost of traditional methods using long-read single molecule, real-time (SMRT) sequencing and assembly technology from Pacific Biosciences (PacBio). We sequenced and assembled BAC clones corresponding to a 1.3-Mbp complex region of chromosome 17q21.31, demonstrating 99.994% identity to Sanger assemblies of the same clones. We targeted 44 differences using Illumina sequencing and find that PacBio and Sanger assemblies share a comparable number of validated variants, albeit with different sequence context biases. Finally, we targeted a poorly assembled 766-kbp duplicated region of the chimpanzee genome and resolved the structure and organization for a fraction of the cost and time of traditional finishing approaches. Our data suggest a straightforward path for upgrading genomes to a higher quality finished state.

    Resolving the complexity of the human genome using single-molecule sequencing

    The human genome is arguably the most complete mammalian reference assembly, yet more than 160 euchromatic gaps remain and aspects of its structural variation remain poorly understood ten years after its completion. To identify missing sequence and genetic variation, here we sequence and analyse a haploid human genome (CHM1) using single-molecule, real-time DNA sequencing. We close or extend 55% of the remaining interstitial gaps in the human GRCh37 reference genome - 78% of which carried long runs of degenerate short tandem repeats, often several kilobases in length, embedded within (G+C)-rich genomic regions. We resolve the complete sequence of 26,079 euchromatic structural variants at the base-pair level, including inversions, complex insertions and long tracts of tandem repeats. Most have not been previously reported, with the greatest increases in sensitivity occurring for events less than 5 kilobases in size. Compared to the human reference, we find a significant insertional bias (3:1) in regions corresponding to complex insertions and long short tandem repeats. Our results suggest a greater complexity of the human genome in the form of variation of longer and more complex repetitive DNA that can now be largely resolved with the application of this longer-read sequencing technology

    Error threshold in optimal coding, numerical criteria and classes of universalities for complexity

    The free energy of the Random Energy Model at the transition point between ferromagnetic and spin glass phases is calculated. At this point, equivalent to the decoding error threshold in optimal codes, the free energy has finite-size corrections proportional to the square root of the number of degrees. The response of the magnetization to the ferromagnetic couplings is maximal at values of magnetization equal to one half. We give several criteria of complexity and define different universality classes. According to our classification, the lowest class of complexity contains random graphs, Markov Models and Hidden Markov Models. At the next level is the Sherrington-Kirkpatrick spin glass, connected with neuron-network models. On a higher level are critical theories, the spin glass phase of the Random Energy Model, percolation, and self-organized criticality (SOC). The top-level class involves HOT design, the error threshold in optimal coding, language, and, maybe, financial markets. Living systems are also related to the last class. A concept of anti-resonance is suggested for complex systems. Comment: 17 pages

    Simple tools for assembling and searching high-density picolitre pyrophosphate sequence data

    Background: The advent of pyrophosphate sequencing makes large volumes of sequencing data available at a lower cost than previously possible. However, the short read lengths are difficult to assemble and the large dataset is difficult to handle. During the sequencing of a virus from the tsetse fly, Glossina pallidipes, we found the need for tools to search quickly a set of reads for near exact text matches. Methods: A set of tools is provided to search a large data set of pyrophosphate sequence reads under a "live" CD version of Linux on a standard PC that can be used by anyone without prior knowledge of Linux and without having to install a Linux setup on the computer. The tools permit short lengths of de novo assembly, checking of existing assembled sequences, selection and display of reads from the data set and gathering counts of sequences in the reads. Results: Demonstrations are given of the use of the tools to help with checking an assembly against the fragment data set; investigating homopolymer lengths, repeat regions and polymorphisms; and resolving inserted bases caused by incomplete chain extension. Conclusion: The additional information contained in a pyrophosphate sequencing data set beyond a basic assembly is difficult to access due to a lack of tools. The set of simple tools presented here would allow anyone with basic computer skills and a standard PC to access this information.
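
The idea of searching a set of reads for near exact text matches can be illustrated with a minimal sketch. This is not the tool set described in the abstract: the function names, the toy reads, and the choice of a mismatch-only (Hamming) criterion are all assumptions made here for illustration. Pyrophosphate reads are also prone to insertions and deletions in homopolymer runs, which this simplified criterion does not handle.

```python
def hamming(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_near_matches(reads, query, max_mismatches=1):
    """Return (read_index, offset, matched_text) for every window of a read
    that differs from `query` in at most `max_mismatches` positions.

    Hypothetical helper for illustration; it scans every window of every
    read, so it is simple rather than fast.
    """
    hits = []
    for i, read in enumerate(reads):
        for start in range(len(read) - len(query) + 1):
            window = read[start:start + len(query)]
            if hamming(window, query) <= max_mismatches:
                hits.append((i, start, window))
    return hits


# Toy reads, invented for this example.
reads = ["ACGTACGTGA", "TTACGAACGT", "GGGGGGGG"]
hits = find_near_matches(reads, "ACGTAC", max_mismatches=1)
for read_index, offset, text in hits:
    print(read_index, offset, text)
# 0 0 ACGTAC
# 1 2 ACGAAC
```

Allowing one mismatch picks up the second read, where the query occurs with a single substituted base; with `max_mismatches=0` only the exact occurrence in the first read is returned.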

    Efficacy of secondary isoniazid preventive therapy among HIV-infected Southern Africans: time to change policy?

    Objective. To determine the efficacy of secondary preventive therapy against tuberculosis (TB) among goldminers working in South Africa. Design. An observational study. Methods. The incidence of recurrent TB was compared between two cohorts of HIV-infected miners: one cohort had received secondary preventive therapy with isoniazid and the other had not. Setting. Health service providing comprehensive care for goldminers. Participants. 338 men received secondary preventive therapy and 221 did not. Main outcome measure. Incidence of recurrent TB. Results. The overall incidence of recurrent TB was reduced by 55% among men who received isoniazid preventive therapy (IPT) compared to those who did not (incidence rates 8.6 and 19.1 per 100 person-years respectively, incidence rate ratio 0.45; 95% CI 0.26 – 0.78). The efficacy of isoniazid preventive therapy was unchanged after controlling for CD4 count and age. The number of person-years of isoniazid preventive therapy required to prevent one case of recurrent TB among individuals with a CD4 count < 200/µl and ≥ 200/µl was 5 and 19, respectively. Conclusion. Secondary preventive therapy reduces TB recurrence: the absolute impact appears to be greatest among individuals with low CD4 counts. International TB preventive therapy guidelines for HIV-infected individuals need to be expanded to include recommendations for secondary preventive therapy in settings where TB prevalence is high. Southern African Journal of HIV Medicine Vol. 5(3) 2004: 8-1
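
The reported incidence rate ratio and the 55% reduction follow directly from the two incidence rates quoted in the abstract. The short sketch below reproduces that arithmetic; the variable names are chosen here for illustration, and the crude person-years figure at the end is derived from the overall rates only. It is not a number reported in the abstract, which gives CD4-stratified values of 5 and 19.

```python
# Incidence rates of recurrent TB per 100 person-years, as quoted in the abstract.
rate_ipt = 8.6      # men who received isoniazid preventive therapy
rate_no_ipt = 19.1  # men who did not

# Incidence rate ratio: rate in the treated cohort over rate in the untreated cohort.
irr = rate_ipt / rate_no_ipt

# Relative reduction in incidence is one minus the rate ratio.
relative_reduction = 1 - irr

# Crude (unadjusted) person-years of therapy per case prevented:
# the reciprocal of the absolute rate difference, rescaled from "per 100".
# Derived here for illustration; not reported in the abstract.
absolute_reduction = rate_no_ipt - rate_ipt
person_years_per_case = 100 / absolute_reduction

print(f"incidence rate ratio = {irr:.2f}")                            # 0.45
print(f"relative reduction   = {relative_reduction:.0%}")             # 55%
print(f"person-years per case prevented = {person_years_per_case:.1f}")  # 9.5
```

The crude overall figure of about 9.5 person-years falls between the two stratified values in the abstract, which is what one would expect if the absolute benefit is larger at low CD4 counts.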

    Rotation-disk connection for very low mass and substellar objects in the Orion Nebula Cluster

    Angular momentum loss requires magnetic interaction between the forming star and both the circumstellar disk and the magnetically driven outflows. In order to test these predictions many authors have investigated a rotation-disk connection in pre-main sequence objects with masses larger than about 0.4 Msun. For brown dwarfs this connection has not yet been investigated because very few samples are available. We aim to extend this investigation well down into the substellar regime for our large sample of BDs in the Orion Nebula Cluster, for which we have recently measured rotational periods. In order to investigate a rotation-disk correlation, we derived near-infrared (NIR) excesses for a sample of 732 periodic variables in the Orion Nebula Cluster with masses ranging from 1.5 to 0.02 Msun and whose IJHK colors are available. Circumstellar NIR excesses were derived from the Delta[I-K] index. We performed our analysis in three mass bins. We found a rotation-disk correlation in the high and intermediate mass regime, in which objects with NIR excess tend to rotate slower than objects without NIR excess. Interestingly, we found no correlation in the substellar regime. A tight correlation between the peak-to-peak (ptp) amplitude of the rotational modulation and the NIR excess was found, however, for all objects with available ptp values. We discuss possible scenarios which may explain the lack of rotation-disk connection in the substellar mass regime. One possible reason could be the strong dependence of the mass accretion rate on stellar mass in the investigated mass range. Comment: 12 pages, 7 figures, accepted for publication in "Astronomy and Astrophysics"

    Clinicopathological evaluation of chronic traumatic encephalopathy in players of American football

    IMPORTANCE: Players of American football may be at increased risk of long-term neurological conditions, particularly chronic traumatic encephalopathy (CTE). OBJECTIVE: To determine the neuropathological and clinical features of deceased football players with CTE. DESIGN, SETTING, AND PARTICIPANTS: Case series of 202 football players whose brains were donated for research. Neuropathological evaluations and retrospective telephone clinical assessments (including head trauma history) with informants were performed blinded. Online questionnaires ascertained athletic and military history. EXPOSURES: Participation in American football at any level of play. MAIN OUTCOMES AND MEASURES: Neuropathological diagnoses of neurodegenerative diseases, including CTE, based on defined diagnostic criteria; CTE neuropathological severity (stages I to IV or dichotomized into mild [stages I and II] and severe [stages III and IV]); informant-reported athletic history and, for players who died in 2014 or later, clinical presentation, including behavior, mood, and cognitive symptoms and dementia. RESULTS: Among 202 deceased former football players (median age at death, 66 years [interquartile range, 47-76 years]), CTE was neuropathologically diagnosed in 177 players (87%; median age at death, 67 years [interquartile range, 52-77 years]; mean years of football participation, 15.1 [SD, 5.2]), including 0 of 2 pre–high school, 3 of 14 high school (21%), 48 of 53 college (91%), 9 of 14 semiprofessional (64%), 7 of 8 Canadian Football League (88%), and 110 of 111 National Football League (99%) players. Neuropathological severity of CTE was distributed across the highest level of play, with all 3 former high school players having mild pathology and the majority of former college (27 [56%]), semiprofessional (5 [56%]), and professional (101 [86%]) players having severe pathology. 
    Among 27 participants with mild CTE pathology, 26 (96%) had behavioral or mood symptoms or both, 23 (85%) had cognitive symptoms, and 9 (33%) had signs of dementia. Among 84 participants with severe CTE pathology, 75 (89%) had behavioral or mood symptoms or both, 80 (95%) had cognitive symptoms, and 71 (85%) had signs of dementia. CONCLUSIONS AND RELEVANCE: In a convenience sample of deceased football players who donated their brains for research, a high proportion had neuropathological evidence of CTE, suggesting that CTE may be related to prior participation in football. This study received support from NINDS (grants U01 NS086659, R01 NS078337, R56 NS078337, U01 NS093334, and F32 NS096803), the National Institute on Aging (grants K23 AG046377, P30AG13846 and supplement 0572063345-5, R01 AG1649), the US Department of Defense (grant W81XWH-13-2-0064), the US Department of Veterans Affairs (I01 CX001038), the Veterans Affairs Biorepository (CSP 501), the Veterans Affairs Rehabilitation Research and Development Traumatic Brain Injury Center of Excellence (grant B6796-C), the Department of Defense Peer Reviewed Alzheimer’s Research Program (grant 13267017), the National Operating Committee on Standards for Athletic Equipment, the Alzheimer’s Association (grants NIRG-15-362697 and NIRG-305779), the Concussion Legacy Foundation, the Andlinger Family Foundation, the WWE, and the NFL.