13 research outputs found

    Using paired-end sequences to optimise parameters for alignment of sequence reads against related genomes

    Background: The advent of cheap high-throughput sequencing methods has facilitated low-coverage skims of a large number of organisms. To maximise the utility of the sequences, assembly into contigs and then ordering of those contigs is required. Whilst sequences can be assembled into contigs de novo, using assembled genomes of closely related organisms as a framework can considerably aid the process. However, the preferred search programs and parameters that will optimise the sensitivity and specificity of the alignments between the sequence reads and the framework genome(s) are not necessarily obvious. Here we demonstrate a process that uses paired-end sequence reads to choose an optimal program and alignment parameters. Results: Unlike two single fragment reads, in paired-end sequence reads, such as BAC-end sequences, the two sequences in the pair have a known positional relationship in the original genome. This provides an additional level of confidence, over match scores and e-values, in the accuracy of the positional assignment of the reads in the comparative genome. Three commonly used sequence alignment programs (MegaBLAST, Blastz and PatternHunter) were used to align a set of ovine BAC-end sequences against the equine genome assembly. A range of different search parameters, with a particular focus on contiguous and discontiguous seeds, was used for each program. The number of reads with a hit and the number of read pairs with hits for the two end sequences in the tail-to-tail paired-end configuration were plotted relative to the theoretical maximum expected curve. Of the programs tested, MegaBLAST with short contiguous seed lengths (word size 8-11) performed best in this particular task. In addition, the data provide estimates of the false positive and false negative rates, which can be used to determine the appropriate values of additional parameters, such as score cut-off, to balance sensitivity and specificity. To determine whether the approach also worked for the alignment of shorter reads, the first 240 bases of each BAC-end sequence were also aligned to the equine genome. Again, contiguous MegaBLAST performed the best in optimising the sensitivity and specificity with which sheep BAC-end reads map to the equine and bovine genomes.
Conclusions: Paired-end reads, such as BAC-end sequences, provide an efficient mechanism to optimise sequence alignment parameters, for example for comparative genome assemblies, by providing an objective standard to evaluate performance.
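
The counting step described in this abstract (reads with a hit, and read pairs whose two ends land in a plausible paired configuration) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the `Hit` record, the `max_span` threshold, and the simplified concordance rule (same chromosome, opposite strands, ends within `max_span` bases), which does not check which end lies upstream.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Hit:
    """Best alignment of one end read (hypothetical record layout)."""
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"


def is_concordant(a: Hit, b: Hit, max_span: int) -> bool:
    """Simplified pairing rule: same chromosome, opposite strands,
    and both ends contained within max_span bases (an assumed threshold)."""
    if a.chrom != b.chrom or a.strand == b.strand:
        return False
    span = max(a.end, b.end) - min(a.start, b.start)
    return span <= max_span


def summarise(
    pairs: Dict[str, Tuple[Optional[Hit], Optional[Hit]]],
    max_span: int = 500_000,
) -> Dict[str, int]:
    """Count reads with a hit, pairs with both ends hit, and concordant pairs."""
    reads_with_hit = 0
    pairs_both_hit = 0
    concordant = 0
    for first, second in pairs.values():
        reads_with_hit += int(first is not None) + int(second is not None)
        if first is not None and second is not None:
            pairs_both_hit += 1
            if is_concordant(first, second, max_span):
                concordant += 1
    return {
        "reads_with_hit": reads_with_hit,
        "pairs_both_hit": pairs_both_hit,
        "concordant_pairs": concordant,
    }


if __name__ == "__main__":
    example = {
        # Both ends on chr1, opposite strands, about 150 kb apart.
        "bac1": (Hit("chr1", 1_000, 1_500, "+"), Hit("chr1", 150_000, 150_500, "-")),
        # Ends on different chromosomes: at least one hit is likely wrong.
        "bac2": (Hit("chr1", 5_000, 5_500, "+"), Hit("chr2", 9_000, 9_500, "-")),
        # Only one end aligned.
        "bac3": (Hit("chr3", 100, 600, "+"), None),
    }
    print(summarise(example))
```

Running the same summary for each program and parameter setting would give comparable counts per setting. Pairs in which both ends hit but fail the concordance rule give a rough indication of false positive placements.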

    Analysis of the complement and molecular evolution of tRNA genes in cow

    Background: Detailed information regarding the number and organization of transfer RNA (tRNA) genes at the genome level is becoming readily available with the increase of DNA sequencing of whole genomes. However, the identification of functional tRNA genes is challenging for species that have large numbers of repetitive elements containing tRNA-derived sequences, such as Bos taurus. Reliable identification and annotation of entire sets of tRNA genes allows the evolution of tRNA genes to be understood on a genomic scale. Results: In this study, we explored the B. taurus genome using bioinformatics and comparative genomics approaches to catalogue and analyze cow tRNA genes. The initial analysis of the cow genome using tRNAscan-SE identified 31,868 putative tRNA genes and 189,183 pseudogenes, where 28,830 of the 31,868 predicted tRNA genes were classified as repetitive elements by the RepeatMasker program. We then used comparative genomics to further discriminate between functional tRNA genes and tRNA-derived sequences for the remaining set of 3,038 putative tRNA genes. For our analysis, we used the human, chimpanzee, mouse, rat, horse, dog, chicken and fugu genomes to predict that the number of active tRNA genes in cow lies in the vicinity of 439. Of this set, 150 tRNA genes were 100% identical in their sequences across all nine vertebrate genomes studied. Using clustering analyses, we identified a new tRNA-Gly(CCC) subfamily present in all analyzed mammalian genomes. We suggest that this subfamily originated from an ancestral tRNA-Gly(GCC) gene via a point mutation prior to the radiation of the mammalian lineages. Lastly, in a separate analysis we created phylogenetic profiles for each putative cow tRNA gene using a representative set of genomes to gain an overview of common evolutionary histories of tRNA genes.
Conclusion: The use of a combination of bioinformatics and comparative genomics approaches has allowed the confident identification of a set of cow tRNA genes that will facilitate further studies in understanding the molecular evolution of cow tRNA genes.

    A multiway analysis for identifying high integrity bovine BACs

    Background: In large genomics projects involving many different types of analyses of bacterial artificial chromosomes (BACs), such as fingerprinting, end sequencing (BES) and full BAC sequencing, there are many opportunities for the identities of BACs to become confused. However, by comparing the results from the different analyses, inconsistencies can be identified and a set of high integrity BACs preferred for future research can be defined. Results: The locations of each bovine BAC in the BAC fingerprint-based genome map and in the genome assembly were compared based on the reported BESs, and for a smaller number of BACs the full sequence. BACs with consistent positions in all three datasets, or if the full sequence was not available, for both the fingerprint map and BES-based alignments, were deemed to be correctly positioned. BACs with consistent BES-based and fingerprint-based locations, but with conflicting locations based on the fully sequenced BAC, appeared to have been misidentified during sequencing, and included a number of apparently swapped BACs. Inconsistencies between BES-based and fingerprint map positions identified thirty-one plates from the CHORI-240 library that appear to have suffered substantial systematic problems during the end-sequencing of the BACs. No systematic problems were identified in the fingerprinting of the BACs. Analysis of BACs overlapping in the assembly identified a small overrepresentation of clones with substantial overlap in the library and a substantial enrichment of highly overlapping BACs on the same plate in the CHORI-240 library. More than half of these BACs appear to have been present as duplicates on the original BAC-library plates and thus should be avoided in subsequent projects.
Conclusion: Our analysis shows that ~95% of the bovine CHORI-240 library clones with both a BAC fingerprint and two BESs mapping to the genome in the expected orientations (~27% of all BACs) have consistent locations in the BAC fingerprint map and the genome assembly. We have developed a broadly applicable methodology for checking the integrity of BAC-based datasets even where only incomplete and partially assembled genomic sequence is available.
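
The cross-checking idea in this abstract (compare where each BAC sits according to two independent datasets, then look for plates with an unusual share of disagreements) can be sketched as follows. This is only an illustration under stated assumptions: the `BacRecord` fields, the single-coordinate positions, and the `tolerance` value are hypothetical and are not the criteria used by the authors.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class BacRecord(NamedTuple):
    """One clone with two independent placements (hypothetical layout)."""
    clone: str
    plate: str
    fpc_chrom: str   # chromosome from the fingerprint map
    fpc_pos: int     # approximate position from the fingerprint map
    bes_chrom: str   # chromosome from the BAC-end sequence alignments
    bes_pos: int     # approximate position from the BES alignments


def is_consistent(rec: BacRecord, tolerance: int = 1_000_000) -> bool:
    """Placements agree if on the same chromosome and within an assumed tolerance."""
    return rec.fpc_chrom == rec.bes_chrom and abs(rec.fpc_pos - rec.bes_pos) <= tolerance


def inconsistent_fraction_by_plate(
    records: Iterable[BacRecord], tolerance: int = 1_000_000
) -> Dict[str, float]:
    """Fraction of clones per plate whose two placements disagree."""
    total: Dict[str, int] = defaultdict(int)
    bad: Dict[str, int] = defaultdict(int)
    for rec in records:
        total[rec.plate] += 1
        if not is_consistent(rec, tolerance):
            bad[rec.plate] += 1
    return {plate: bad[plate] / total[plate] for plate in total}


if __name__ == "__main__":
    records = [
        BacRecord("c1", "P1", "chr1", 10_000_000, "chr1", 10_200_000),
        BacRecord("c2", "P1", "chr5", 40_000_000, "chr5", 40_050_000),
        BacRecord("c3", "P2", "chr2", 7_000_000, "chr2", 7_100_000),
        BacRecord("c4", "P2", "chr3", 1_000_000, "chr9", 55_000_000),
    ]
    print(inconsistent_fraction_by_plate(records))
```

Plates whose inconsistent fraction stands well above the library-wide background would be candidates for a systematic handling problem. Where to draw that line is a judgment call that this sketch does not attempt to settle.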

    Origin, evolution, and biological role of miRNA cluster in DLK-DIO3 genomic region in placental mammals

    MicroRNAs (miRNAs) are a rapidly growing family of small regulatory RNAs modulating gene expression in plants and animals. In animals, most of the miRNAs discovered in early studies were found to be evolutionarily conserved across the whole kingdom. More recent studies, however, have identified many miRNAs that are specific to a particular group of organisms or even a single species. These raise a question about the evolution of individual miRNAs and their role in establishing and maintaining lineage-specific functions and characteristics. In this study, we describe a detailed analysis of the miRNA cluster (hereafter mir-379/mir-656 cluster) located within the imprinted DLK-DIO3 region on human chromosome 14. We show that orthologous miRNA clusters are present in all sequenced genomes of the placental (eutherian) mammals but not in the marsupial (metatherian), monotreme (prototherian), or any other vertebrate genomes. We provide evidence that the locus encompassing this cluster emerged in an early eutherian ancestor prior to the radiation of modern placental mammals by tandem duplication of the ancient precursor sequence. The original amplified cluster may have contained in excess of 250 miRNA precursor sequences, most of which now appear to be inactive. Examination of the eutherian genomes showed that the cluster has been maintained in evolution for approximately 100 Myr. Analysis of genes that contain predicted evolutionarily conserved targets for miRNAs from this cluster revealed significant overrepresentation of the Gene Ontology terms associated with biological processes such as neurogenesis, embryonic development, transcriptional regulation, and RNA metabolism. Consistent with these findings, a survey of the miRNA expression data within the cluster demonstrates a strong bias toward brain and placenta samples from adult organisms and some embryonic tissues. Our results suggest that emergence of the mir-379/mir-656 miRNA cluster was one of the factors that facilitated evolution of the placental mammals. Overrepresentation of genes involved in regulation of neurogenesis among predicted miRNA targets indicates an important role of the mir-379/mir-656 cluster in this biological process in the placental mammals.

    A microRNA catalog of the developing chicken embryo identified by a deep sequencing approach

    MicroRNA (miRNA) and other types of small regulatory RNAs play a crucial role in the regulation of gene expression in eukaryotes. Several distinct classes of small regulatory RNAs have been discovered in recent years. To extend the repertoire of small regulatory RNAs characterized in chickens, we used a deep sequencing approach developed by Solexa (now Illumina Inc.). We sequenced three small RNA libraries prepared from different developmental stages of the chicken embryo (days five, seven, and nine) to produce over 9.5 million short sequence reads. We developed a bioinformatics pipeline to distinguish authentic mature miRNA sequences from other classes of small RNAs and short RNA fragments represented in the sequencing data. Using this approach, we detected almost all of the previously known chicken miRNAs and their respective miRNA* sequences. In addition, we discovered 449 new chicken miRNAs, including 88 miRNA candidates. Of these, 430 miRNAs appear to be specific to the avian lineage. Another six new miRNAs had evidence of evolutionary conservation in at least one vertebrate species outside of the bird lineage. The remaining 13 putative miRNAs appear to represent chicken orthologs of known vertebrate miRNAs. We discovered 39 additional putative miRNA candidates originating from miRNA-generating intronic sequences known as mirtrons.