
    Pebble and Rock Band: Heuristic Resolution of Repeats and Scaffolding in the Velvet Short-Read de Novo Assembler

    BACKGROUND: Despite the short length of their reads, micro-read sequencing technologies have shown their usefulness for de novo sequencing. However, especially in eukaryotic genomes, complex repeat patterns are an obstacle to large assemblies. PRINCIPAL FINDINGS: We present a novel heuristic algorithm, Pebble, which uses paired-end read information to resolve repeats and scaffold contigs to produce large-scale assemblies. In simulations, we achieve weighted median scaffold lengths (N50) above 1 Mbp in bacteria and above 100 kbp in more complex organisms. Using real datasets, we obtained an N50 of 96 kbp in Pseudomonas syringae and a single 147 kbp scaffold of a ferret BAC clone. We also present an efficient algorithm called Rock Band for the resolution of repeats in mixed-length assemblies, where different sequencing platforms are combined to obtain a cost-effective assembly. CONCLUSIONS: These algorithms extend the utility of short-read-only assemblies to large, complex genomes. They have been implemented and made available within the open-source Velvet short-read de novo assembler.
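    The N50 figures above use the standard definition: sort scaffolds from longest to shortest and report the length at which the running total first covers half of all assembled bases. The following is a minimal Python sketch of that definition only; the function name `n50` is hypothetical and is not taken from Velvet, Pebble or Rock Band.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths.

    Lengths are sorted in descending order and accumulated; N50 is the
    length at which the running total first reaches half the total size.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare doubled sums so the half-way test stays in integer arithmetic.
        if 2 * running >= total:
            return length
    return 0  # empty input has no N50


print(n50([5, 4, 3, 2, 1]))  # 5 + 4 = 9 covers half of 15, so prints 4
```

    Working in doubled integers avoids floating-point rounding at the exact half-way point.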

    CONDOR: a database resource of developmentally associated conserved non-coding elements

    Background: Comparative genomics is currently one of the most popular approaches to study the regulatory architecture of vertebrate genomes. Fish-mammal genomic comparisons have proved powerful in identifying conserved non-coding elements (CNEs) likely to be distal cis-regulatory modules, such as enhancers, silencers or insulators, that control the expression of genes involved in the regulation of early development. The scientific community is showing increasing interest in characterizing the function, evolution and language of these sequences. Despite this, there remains little in the way of user-friendly access to a large dataset of such elements in conjunction with the analysis and visualization tools needed to study them. Description: Here we present CONDOR (COnserved Non-coDing Orthologous Regions), available at http://condor.fugu.biology.qmul.ac.uk. In an interactive and intuitive way, the website displays data on more than 6,800 non-coding elements associated with over 120 early developmental genes and conserved across vertebrates. The database regularly incorporates the results of ongoing in-house in vivo zebrafish enhancer assays of the CNEs, which currently number about 100. Included and highlighted within this set are elements derived from duplication events both at the origin of vertebrates and more recently in the teleost lineage, providing valuable data for studying the divergence of regulatory roles between paralogs. CONDOR therefore provides a number of tools and facilities that allow scientists to progress in their own studies on the function and evolution of developmental cis-regulation. Conclusion: By providing access to data through an approachable graphical interface, the CONDOR database presents a rich resource for further studies into the regulation and evolution of genes involved in early development.

    Fosmid-based whole genome haplotyping of a HapMap trio child: evaluation of Single Individual Haplotyping techniques

    Determining the underlying haplotypes of individual human genomes is an essential, but currently difficult, step toward a complete understanding of genome function. Fosmid pool-based next-generation sequencing allows genome-wide generation of 40-kb haploid DNA segments, which can be phased computationally into contiguous molecular haplotypes by Single Individual Haplotyping (SIH). Many SIH algorithms have been proposed, but their accuracy has been difficult to assess owing to the lack of real benchmark data. To address this problem, we generated whole-genome fosmid sequence data from a HapMap trio child, NA12878, for whom reliable haplotypes have already been produced. We assembled haplotypes using eight SIH algorithms and directly compared their accuracy, completeness and efficiency. Our comparisons indicate that fosmid-based haplotyping can deliver highly accurate results even at low coverage and that our SIH algorithm, ReFHap, efficiently produces high-quality haplotypes. We expanded the haplotypes for NA12878 by combining the existing haplotypes with our fosmid-based haplotypes, producing new, near-complete gold-standard haplotypes containing almost 98% of heterozygous SNPs. This improvement includes notable fractions of disease-related and genome-wide association (GWA) SNPs. Integrated with other molecular biological data sets, this phase information will advance the emerging field of diploid genomics.
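    Comparing SIH output against reliable reference haplotypes is often summarised by counting switch errors: places where the inferred phase flips relative to the truth between consecutive heterozygous SNPs. The sketch below is an illustration under simplifying assumptions (a single contiguous block, no missing calls, alleles coded 0/1 for haplotype 1); it is not the accuracy procedure used in the study, and `switch_errors` is a hypothetical name.

```python
def switch_errors(inferred: list[int], truth: list[int]) -> int:
    """Count switch errors between an inferred and a reference haplotype.

    Each list holds, for every heterozygous SNP in genomic order, the allele
    (0 or 1) assigned to haplotype 1. Orientation is 1 where the inferred
    allele matches the reference and 0 where it is flipped; every change in
    orientation between consecutive SNPs is one switch error.
    """
    if len(inferred) != len(truth):
        raise ValueError("haplotypes must cover the same SNPs")
    orientation = [int(i == t) for i, t in zip(inferred, truth)]
    return sum(1 for prev, cur in zip(orientation, orientation[1:]) if prev != cur)


print(switch_errors([0, 1, 1, 0, 0], [0, 1, 0, 1, 1]))  # phase flips once, prints 1
```

    A haplotype that is the exact complement of the reference scores zero, because phase is only defined relative to the partner haplotype.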

    Seasonal host and ecological drivers may promote restricted water as a viral vector

    In climates with seasonally limited precipitation, terrestrial animals congregate at high densities at scarce water sources. We hypothesize that viruses can exploit the recurrence of these diverse animal congregations to spread. In this study, we test the central prediction of this hypothesis: that viruses employing this transmission strategy remain stable and infectious in water. Equid herpesviruses (EHVs) were chosen as a model because they have been shown to remain stable and infectious in water for weeks under laboratory conditions. Using fecal data from wild equids from a previous study, we establish that EHVs are shed more frequently by their hosts during the dry season, increasing the probability that water sources become contaminated with EHV. We document several EHV strains present at high genome copy number in the surface water and sediments of waterholes sampled across a variety of mammalian assemblages, locations, temperatures and pH values. Phylogenetic analysis reveals that the EHV strains found exhibit little divergence despite representing ancient lineages. We used molecular approaches to show that shed EHVs remain stable in waterholes, with detection in sediments decreasing as temperature increases. Infectivity experiments using cell culture reveal that EHVs remain infectious in water derived from waterholes. The results support water as an abiotic viral vector for EHV.

    Functional Analysis of Conserved Non-Coding Regions Around the Short Stature hox Gene (shox) in Whole Zebrafish Embryos

    Background: Mutations in the SHOX gene are responsible for Leri-Weill Dyschondrosteosis, a disorder characterised by mesomelic limb shortening. Recent investigations into regulatory elements surrounding SHOX have shown that deletions of conserved non-coding elements (CNEs) downstream of the SHOX gene produce a phenotype indistinguishable from Leri-Weill Dyschondrosteosis. As this gene is not found in rodents, we used zebrafish as a model to characterise the expression pattern of the shox gene across the whole embryo and to define the enhancer domains of different CNEs associated with this gene. Methodology/Principal Findings: Expression of the shox gene in zebrafish was identified using in situ hybridization, with embryos showing expression in the blood, putative heart, hatching gland, brain, pharyngeal arch, olfactory epithelium and fin bud apical ectodermal ridge. By identifying sequences showing 65% identity over at least 40 nucleotides between Fugu, human, dog and opossum, we uncovered 35 CNEs around the shox gene. These CNEs were compared with CNEs previously discovered by Sabherwal et al., resulting in the identification of smaller, more deeply conserved sub-sequences. Sabherwal et al.'s CNEs were assayed for regulatory function in whole zebrafish embryos, resulting in the identification of additional tissues under the regulatory control of these CNEs. Conclusion/Significance: Our results using whole zebrafish embryos provide a more comprehensive picture of the expression pattern of the shox gene and a better understanding of its regulation via deeply conserved non-coding elements. In particular, we identify additional tissues under the regulatory control of previously identified SHOX CNEs. We also demonstrate the importance of these CNEs in evolution by identifying duplicated shox CNEs and more deeply conserved sub-sequences within already identified CNEs.
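    The CNE criterion above (65% identity over at least 40 nucleotides) can be read as a sliding-window scan. The sketch below assumes a single pairwise alignment already given as two equal-length strings with `-` for gaps. It reports merged column intervals covered by qualifying 40-column windows. The multi-species comparison across Fugu, human, dog and opossum, and the alignment tool actually used, are not reproduced here; `cne_windows` is a hypothetical name.

```python
def cne_windows(a: str, b: str, window: int = 40,
                min_identity_pct: int = 65) -> list[tuple[int, int]]:
    """Return merged [start, end) alignment-column intervals covered by
    windows of `window` columns whose percent identity is at least
    `min_identity_pct`. Gap-gap columns never count as matches."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    matches = [1 if x == y and x != "-" else 0 for x, y in zip(a, b)]
    regions: list[tuple[int, int]] = []
    current = sum(matches[:window])  # matches in the first window
    for start in range(len(a) - window + 1):
        if start > 0:
            # Slide the window one column: add the new column, drop the old one.
            current += matches[start + window - 1] - matches[start - 1]
        # Integer comparison avoids floating-point error in 0.65 * 40.
        if current * 100 >= min_identity_pct * window:
            end = start + window
            if regions and start <= regions[-1][1]:
                regions[-1] = (regions[-1][0], end)  # extend overlapping region
            else:
                regions.append((start, end))
    return regions


print(cne_windows("A" * 40 + "C" * 40, "A" * 40 + "G" * 40))  # [(0, 54)]
```

    Windows keep qualifying until fewer than 26 of the 40 columns match, which is why the merged region extends 14 columns past the conserved block.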

    Retroviral integrations contribute to elevated host cancer rates during germline invasion

    © 2021, The Author(s). Repeated retroviral infections of vertebrate germlines have made endogenous retroviruses ubiquitous features of mammalian genomes. However, millions of years of evolution obscure many of the immediate repercussions of retroviral endogenisation on host health. Here we examine retroviral endogenisation during its earliest stages in the koala (Phascolarctos cinereus), a species undergoing germline invasion by koala retrovirus (KoRV) and affected by high cancer prevalence. We characterise KoRV integration sites (IS) in tumour and healthy tissues from 10 koalas, detecting 1002 unique IS, with integration hotspots in the vicinity of known cancer genes. We find that tumours accumulate novel IS, with proximate genes over-represented for cancer associations. We detect dysregulation of genes containing IS and identify a highly expressed transduced oncogene. Our data provide insights into the tremendous mutational load suffered by the host during active retroviral germline invasion, a process repeatedly experienced and overcome during the evolution of vertebrate lineages.
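    Claims that genes near integration sites are "over-represented" for cancer associations are commonly tested with a one-sided hypergeometric test. The sketch below computes that tail probability from four counts. It is an assumed, generic formulation rather than the statistical method reported in the study, and `enrichment_p_value` and its parameter names are hypothetical.

```python
from math import comb


def enrichment_p_value(total_genes: int, cancer_genes: int,
                       genes_near_is: int, cancer_near_is: int) -> float:
    """One-sided hypergeometric P(X >= cancer_near_is).

    X is the number of cancer-associated genes expected among
    `genes_near_is` genes drawn without replacement from `total_genes`
    genes, of which `cancer_genes` are cancer-associated.
    """
    upper = min(genes_near_is, cancer_genes)
    # Exact integer numerator; math.comb returns 0 when k > n.
    numerator = sum(
        comb(cancer_genes, i) * comb(total_genes - cancer_genes, genes_near_is - i)
        for i in range(cancer_near_is, upper + 1)
    )
    return numerator / comb(total_genes, genes_near_is)


print(enrichment_p_value(10, 2, 2, 2))  # 1/45, about 0.022
```

    Summing exact integer binomial products before a single division keeps the tail probability free of accumulated floating-point error.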

    A recent gibbon ape leukemia virus germline integration in a rodent from New Guinea

    Germline colonization by retroviruses results in the formation of endogenous retroviruses (ERVs). Most colonizations occurred millions of years ago. However, in the Australo-Papuan region (Australia and New Guinea), several recent germline colonization events have been discovered. The Wallace Line separates much of Southeast Asia from the Australo-Papuan region, restricting faunal and pathogen dispersal. West of the Wallace Line, gibbon ape leukemia viruses (GALVs) have been isolated from captive gibbons, and two microbat species from China appear to have been infected naturally. East of the Wallace Line, the woolly monkey virus (a GALV) and the closely related koala retrovirus (KoRV) have been detected in eutherians and marsupials of the Australo-Papuan region, often vertically transmitted. The vertical transmission of GALV-like viruses detected in Australo-Papuan fauna, in contrast to sporadic horizontal transmission in Southeast Asia and China, suggests that the GALV-KoRV clade originated in the former region and that further models of early-stage genome colonization may be found there. We screened 278 samples from seven bat families and one rodent family endemic to the Australo-Papuan region, as well as from bat and rodent species found on both sides of the Wallace Line. We identified GALV-like retroviruses in two rodents (Melomys) from Australia and Papua New Guinea and in no bat species. Melomys leucogaster from New Guinea harbored a genomically complete, replication-competent retrovirus with an integration site shared among individuals. The integration was present in only some individuals of the species, indicating that this retrovirus is at the earliest stages of germline colonization of the Melomys genome and providing a new small wild mammal model of early-stage genome colonization.

    Early Evolution of Conserved Regulatory Sequences Associated with Development in Vertebrates

    Comparisons between diverse vertebrate genomes have uncovered thousands of highly conserved non-coding sequences, an increasing number of which have been shown to function as enhancers during early development. Despite their extreme conservation over 500 million years from humans to cartilaginous fish, these elements appear to be largely absent in invertebrates, and, to date, there has been little understanding of their mode of action or of the evolutionary processes that have shaped them. We have now exploited emerging genomic sequence data for the sea lamprey, Petromyzon marinus, to explore the depth of conservation of this type of element in the earliest-diverging extant vertebrate lineage, the jawless fish (agnathans). We searched for conserved non-coding elements (CNEs) at 13 human gene loci and identified lamprey elements associated with all but two of these gene regions. Although markedly shorter and less well conserved than their counterparts in jawed vertebrates, the identified lamprey CNEs are able to drive specific patterns of expression in zebrafish embryos that are almost identical to those driven by the equivalent human elements. These CNEs are therefore a unique and defining characteristic of all vertebrates. Furthermore, alignment of lamprey and other vertebrate CNEs should permit the identification of persistent sequence signatures that are responsible for common patterns of expression and contribute to the elucidation of the regulatory language in CNEs. Identifying the core regulatory code for development, common to all vertebrates, provides a foundation upon which regulatory networks can be constructed and might also illuminate how large conserved regulatory sequence blocks evolve and become fixed in genomic DNA.
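    One simple way to look for "persistent sequence signatures" shared by equivalent CNEs from lamprey and other vertebrates is to intersect their k-mer sets. The sketch below does only that. It assumes plain nucleotide strings and exact matches, whereas real motif discovery would tolerate mismatches and use alignments. `shared_kmers` is a hypothetical helper, not a method from the study.

```python
def shared_kmers(seqs: list[str], k: int) -> set[str]:
    """Return the k-mers present (exactly, case-insensitively) in every sequence."""
    if not seqs:
        return set()

    def kmers(seq: str) -> set[str]:
        seq = seq.upper()
        return {seq[i:i + k] for i in range(len(seq) - k + 1)}

    common = kmers(seqs[0])
    for seq in seqs[1:]:
        common &= kmers(seq)  # keep only k-mers seen in every sequence so far
    return common


print(shared_kmers(["ACGTAC", "TACGTT", "GACGTA"], 4))  # {'ACGT'}
```

    Set intersection keeps the scan linear in total sequence length, which suits a first pass before any alignment-based refinement.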