102 research outputs found

    Quality control of the sheep bacterial artificial chromosome library, CHORI-243

    Background: The sheep CHORI-243 bacterial artificial chromosome (BAC) library is being used in the construction of the virtual sheep genome, in the sequencing and construction of the actual sheep genome assembly, and as a source of DNA for regions of the genome of biological interest. The objective of our study is to assess the integrity of the clones and plates that make up the CHORI-243 library using the virtual sheep genome.
    Findings: A series of analyses was undertaken based on mapping the sheep BAC-end sequences (BESs) to the virtual sheep genome. Overall, very few plate-specific biases were identified, with only three of the 528 plates in the library significantly affected. The analysis of the number of tail-to-tail (concordant) BACs on the plates identified a number of plates with lower than average numbers of such BACs. For plates 198 and 213, a partial swap of the BESs determined with one of the two primers appears to have occurred. A third plate, 341, also with a significant deficit in tail-to-tail BACs, appeared to contain a substantial number of sequences determined from contaminating eubacterial 16S rRNA DNA. Additionally, a small number of eubacterial 16S rRNA DNA sequences were present on two other plates in the library, 111 and 338.
    Conclusions: The comparative genomic approach can be used to assess BAC library integrity in the absence of fingerprinting. The sequences of the sheep CHORI-243 library BACs have high integrity, especially with the corrections detailed above. The library represents a high-quality resource for use by the sheep genomics community.
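
The per-plate screen described in this abstract can be sketched roughly as follows. This is only an illustration, not the authors' pipeline: it assumes each clone has already been labelled concordant or not, and the z-score cutoff and function names are hypothetical choices made for the example.

```python
from collections import defaultdict
from statistics import mean, pstdev


def concordant_fraction_by_plate(clones):
    """clones: iterable of (plate_id, is_concordant) pairs, one per BAC."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for plate, is_concordant in clones:
        totals[plate] += 1
        if is_concordant:
            hits[plate] += 1
    return {plate: hits[plate] / totals[plate] for plate in totals}


def flag_low_plates(fractions, z_cutoff=2.0):
    """Return plates whose concordant fraction is far below the library mean.

    z_cutoff is an assumed threshold, not a value taken from the study.
    """
    values = list(fractions.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all plates identical, nothing to flag
    return sorted(p for p, f in fractions.items() if (f - mu) / sigma < -z_cutoff)
```

A plate flagged this way would still need manual inspection to tell a primer swap from contamination, as the abstract describes for plates 198, 213 and 341.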

    Mining tissue specificity, gene connectivity and disease association to reveal a set of genes that modify the action of disease causing genes

    Background: The tissue specificity of gene expression has been linked to a number of significant outcomes, including level of expression and differential rates of polymorphism, evolution and disease association. Recent studies have also shown the importance of exploring differential gene connectivity and sequence conservation in the identification of disease-associated genes. However, no study relates gene interactions with tissue specificity and disease association.
    Methods: We adopted an a priori approach, making as few assumptions as possible, to analyse the interplay between gene-gene interactions and tissue specificity, and the subsequent likelihood of association with disease. We mined three large datasets comprising expression data drawn from massively parallel signature sequencing across 32 tissues, describing a set of 55,606 true positive interactions for 7,197 genes, and microarray expression results generated during the profiling of systemic inflammation, from which 126,543 interactions among 7,090 genes were reported.
    Results: Amongst the myriad complex relationships identified between expression, disease, connectivity and tissue specificity, some interesting patterns emerged. These include elevated rates of expression and network connectivity in housekeeping and disease-associated tissue-specific genes. We found that disease-associated genes are more likely to show tissue-specific expression and most frequently interact with other disease genes. Using the thresholds defined in these observations, we develop a guilt-by-association algorithm and discover a group of 112 non-disease-annotated genes that predominantly interact with disease-associated genes, impacting on disease outcomes.
    Conclusion: We conclude that parameters such as tissue specificity and network connectivity can be used in combination to identify a group of genes, not previously confirmed as disease causing, that are involved in interactions with disease-causing genes. Our guilt-by-association algorithm should be useful for the discovery of additional modifiers of genetic diseases and, more generally, for associating genes of unknown function with clusters of genes with defined functions, allowing for novel biological inference that can be subsequently validated.
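
A guilt-by-association filter of the kind this abstract describes might look like the sketch below. The abstract does not give the actual thresholds or scoring rule, so `min_partners` and `min_fraction` are hypothetical parameters, and "predominantly interacts" is interpreted here as "more than half of a gene's interaction partners are disease genes".

```python
from collections import defaultdict


def guilt_by_association(edges, disease_genes, min_partners=3, min_fraction=0.5):
    """Return non-disease genes whose partners are mostly disease genes.

    edges: iterable of (gene_a, gene_b) undirected interactions.
    disease_genes: collection of genes already annotated as disease-associated.
    min_partners, min_fraction: assumed thresholds, not taken from the study.
    """
    disease = set(disease_genes)
    neighbours = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-interactions
            neighbours[a].add(b)
            neighbours[b].add(a)

    candidates = {}
    for gene, partners in neighbours.items():
        if gene in disease or len(partners) < min_partners:
            continue
        fraction = len(partners & disease) / len(partners)
        if fraction > min_fraction:
            candidates[gene] = fraction
    return candidates
```

The minimum-degree filter is a design choice for the example: without it, a gene with a single disease partner would score a fraction of 1.0 on very little evidence.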

    Using paired-end sequences to optimise parameters for alignment of sequence reads against related genomes

    Background: The advent of cheap high-throughput sequencing methods has facilitated low-coverage skims of a large number of organisms. To maximise the utility of the sequences, assembly into contigs and then ordering of those contigs is required. Whilst sequences can be assembled into contigs de novo, using assembled genomes of closely related organisms as a framework can considerably aid the process. However, the preferred search programs and parameters that will optimise the sensitivity and specificity of the alignments between the sequence reads and the framework genome(s) are not necessarily obvious. Here we demonstrate a process that uses paired-end sequence reads to choose an optimal program and alignment parameters.
    Results: Unlike two single-fragment reads, the two sequences of a paired-end read, such as a pair of BAC-end sequences, have a known positional relationship in the original genome. This provides an additional level of confidence, over match scores and e-values, in the accuracy of the positional assignment of the reads in the comparative genome. Three commonly used sequence alignment programs (MegaBLAST, Blastz and PatternHunter) were used to align a set of ovine BAC-end sequences against the equine genome assembly. A range of different search parameters, with a particular focus on contiguous and discontiguous seeds, was used for each program. The number of reads with a hit and the number of read pairs with hits for the two end sequences in the tail-to-tail paired-end configuration were plotted relative to the theoretical maximum expected curve. Of the programs tested, MegaBLAST with short contiguous seed lengths (word size 8-11) performed best in this particular task. In addition, the data provide estimates of the false positive and false negative rates, which can be used to determine the appropriate values of additional parameters, such as score cut-off, to balance sensitivity and specificity. To determine whether the approach also worked for the alignment of shorter reads, the first 240 bases of each BAC-end sequence were also aligned to the equine genome. Again, contiguous MegaBLAST performed the best in optimising the sensitivity and specificity with which sheep BAC-end reads map to the equine and bovine genomes.
    Conclusions: Paired-end reads, such as BAC-end sequences, provide an efficient mechanism to optimise sequence alignment parameters, for example for comparative genome assemblies, by providing an objective standard to evaluate performance.
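
The two quantities tracked in this abstract (reads with a hit, and read pairs in a concordant configuration) can be computed per parameter set along the lines of the sketch below. It is a simplified illustration under stated assumptions: a pair counts as concordant when both ends hit the same chromosome on opposite strands within a plausible span, and the span limits, the `Hit` type and the function names are hypothetical rather than taken from the paper.

```python
from typing import NamedTuple, Optional


class Hit(NamedTuple):
    chrom: str
    pos: int
    strand: str  # "+" or "-"


def is_concordant(a: Hit, b: Hit, min_span: int = 50_000, max_span: int = 350_000) -> bool:
    """Assumed concordance rule: same chromosome, opposite strands, plausible span."""
    if a.chrom != b.chrom or a.strand == b.strand:
        return False
    return min_span <= abs(a.pos - b.pos) <= max_span


def summarise(pairs):
    """pairs: list of (Optional[Hit], Optional[Hit]), one tuple per clone."""
    n_reads = 2 * len(pairs)
    reads_hit = sum(h is not None for pair in pairs for h in pair)
    concordant = sum(
        1
        for a, b in pairs
        if a is not None and b is not None and is_concordant(a, b)
    )
    return {
        "read_hit_rate": reads_hit / n_reads if n_reads else 0.0,
        "concordant_pair_rate": concordant / len(pairs) if pairs else 0.0,
    }
```

Running `summarise` once per program and parameter set gives a pair of numbers per run; a setting that raises the hit rate without raising the concordant pair rate is likely adding false positives.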

    Analysis of the complement and molecular evolution of tRNA genes in cow

    Background: Detailed information regarding the number and organization of transfer RNA (tRNA) genes at the genome level is becoming readily available with the increase of DNA sequencing of whole genomes. However, the identification of functional tRNA genes is challenging for species that have large numbers of repetitive elements containing tRNA-derived sequences, such as Bos taurus. Reliable identification and annotation of entire sets of tRNA genes allows the evolution of tRNA genes to be understood on a genomic scale.
    Results: In this study, we explored the B. taurus genome using bioinformatics and comparative genomics approaches to catalogue and analyze cow tRNA genes. The initial analysis of the cow genome using tRNAscan-SE identified 31,868 putative tRNA genes and 189,183 pseudogenes; 28,830 of the 31,868 predicted tRNA genes were classified as repetitive elements by the RepeatMasker program. We then used comparative genomics to further discriminate between functional tRNA genes and tRNA-derived sequences for the remaining set of 3,038 putative tRNA genes. For our analysis, we used the human, chimpanzee, mouse, rat, horse, dog, chicken and fugu genomes to predict that the number of active tRNA genes in cow lies in the vicinity of 439. Of this set, 150 tRNA genes were 100% identical in their sequences across all nine vertebrate genomes studied. Using clustering analyses, we identified a new tRNA-Gly(CCC) subfamily present in all analyzed mammalian genomes. We suggest that this subfamily originated from an ancestral tRNA-Gly(GCC) gene via a point mutation prior to the radiation of the mammalian lineages. Lastly, in a separate analysis, we created phylogenetic profiles for each putative cow tRNA gene using a representative set of genomes to gain an overview of common evolutionary histories of tRNA genes.
    Conclusion: The use of a combination of bioinformatics and comparative genomics approaches has allowed the confident identification of a set of cow tRNA genes that will facilitate further studies in understanding the molecular evolution of cow tRNA genes.

    Characterisation and application of a bovine U6 promoter for expression of short hairpin RNAs

    Background: The use of small interfering RNA (siRNA) molecules in animals to achieve double-stranded RNA-mediated interference (RNAi) has recently emerged as a powerful method of sequence-specific gene knockdown. As DNA-based expression of short hairpin RNA (shRNA) for RNAi may offer some advantages over chemical and in vitro synthesised siRNA, a number of vectors for expression of shRNA have been developed. These often feature polymerase III (pol. III) promoters of either mouse or human origin.
    Results: To develop a shRNA expression vector specifically for bovine RNAi applications, we identified and characterised a novel bovine U6 small nuclear RNA (snRNA) promoter from bovine sequence data. This promoter is the putative bovine homologue of the human U6-8 snRNA promoter, and features a number of functional sequence elements that are characteristic of these types of pol. III promoters. A PCR-based cloning strategy was used to incorporate this promoter sequence into plasmid vectors along with shRNA sequences for RNAi. The promoter was then used to express shRNAs, which resulted in the efficient knockdown of an exogenous reporter gene and an endogenous bovine gene.
    Conclusion: We have mined data from the bovine genome sequencing project to identify a functional bovine U6 promoter and used the promoter sequence to construct a shRNA expression vector. The use of this native bovine promoter in shRNA expression is an important component of our future development of RNAi therapeutic and transgenic applications in bovine species.

    An Always Correlated gene expression landscape for ovine skeletal muscle, lessons learnt from comparison with an “equivalent” bovine landscape

    Background: We have recently described a method for the construction of an informative gene expression correlation landscape for a single tissue, longissimus muscle (LM) of cattle, using a small number (less than a hundred) of diverse samples. Does this approach facilitate interspecies comparison of networks?
    Findings: Using gene expression datasets from LM samples from a single postnatal time point for high and low muscling sheep, and from a developmental time course (prenatal to postnatal) for normal sheep and sheep exhibiting the Callipyge muscling phenotype, gene expression correlations were calculated across subsets of the data comparable to the bovine analysis. An “Always Correlated” gene expression landscape was constructed by integrating the correlations from the subsets of data and was compared to the equivalent landscape for bovine LM muscle. Whilst at the high level apparently equivalent modules were identified in the two species, at the detailed level overlap between genes in the equivalent modules was limited and generally not significant. Indeed, only 395 genes and 18 edges were in common between the two landscapes.
    Conclusions: Since it is unlikely that the equivalent muscles of two closely related species are as different as this analysis suggests, within-tissue gene expression correlations appear to be very sensitive to the samples chosen for their construction, compounded by the different platforms used. Thus users need to be very cautious in interpreting the differences. In future experiments, attention will be required to ensure equivalent experimental designs and to use a cross-species gene expression platform, to enable the identification of true differences between species.
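
One way to picture an "Always Correlated" landscape is as the set of gene pairs whose expression is strongly correlated in every data subset. The sketch below implements that reading only; the abstract does not specify how the correlations were integrated, so the all-subsets rule, the Pearson measure and the 0.8 threshold are assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def always_correlated(subsets, threshold=0.8):
    """Return gene pairs with |r| >= threshold in every subset.

    subsets: list of dicts mapping gene name -> list of expression values.
    threshold: assumed cutoff, not taken from the study.
    """
    if not subsets:
        return set()
    genes = set(subsets[0])
    for subset in subsets[1:]:
        genes &= set(subset)  # only genes measured in every subset
    edges = set()
    for g1, g2 in combinations(sorted(genes), 2):
        if all(abs(pearson(s[g1], s[g2])) >= threshold for s in subsets):
            edges.add((g1, g2))
    return edges
```

Comparing two such edge sets across species then reduces to a set intersection, which is where platform and sampling differences of the kind noted in the conclusions would show up as missing overlap.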

    A Genome Wide Survey of SNP Variation Reveals the Genetic Structure of Sheep Breeds

    The genetic structure of sheep reflects their domestication and subsequent formation into discrete breeds. Understanding genetic structure is essential for achieving genetic improvement through genome-wide association studies, genomic selection and the dissection of quantitative traits. After identifying the first genome-wide set of SNP for sheep, we report on levels of genetic variability both within and between a diverse sample of ovine populations. Then, using cluster analysis and the partitioning of genetic variation, we demonstrate that sheep are characterised by weak phylogeographic structure, overlapping genetic similarity and generally low differentiation, which is consistent with their short evolutionary history. The degree of population substructure was, however, sufficient to cluster individuals based on geographic origin and known breed history. Specifically, African and Asian populations clustered separately from breeds of European origin sampled from Australia, New Zealand, Europe and North America. Furthermore, we demonstrate the presence of stratification within some, but not all, ovine breeds. The results emphasize that careful documentation of genetic structure will be an essential prerequisite when mapping the genetic basis of complex traits. Furthermore, the identification of a subset of SNP able to assign individuals into broad groupings demonstrates that even a small panel of markers may be suitable for applications such as traceability.