25 research outputs found

    Quality control of the sheep bacterial artificial chromosome library, CHORI-243

    Background: The sheep CHORI-243 bacterial artificial chromosome (BAC) library is being used in the construction of the virtual sheep genome, in the sequencing and construction of the actual sheep genome assembly, and as a source of DNA for regions of the genome of biological interest. The objective of our study is to assess the integrity of the clones and plates which make up the CHORI-243 library using the virtual sheep genome.
    Findings: A series of analyses was undertaken based on mapping the sheep BAC-end sequences (BESs) to the virtual sheep genome. Overall, very few plate-specific biases were identified, with only three of the 528 plates in the library significantly affected. The analysis of the number of tail-to-tail (concordant) BACs on the plates identified a number of plates with lower-than-average numbers of such BACs. For plates 198 and 213, a partial swap of the BESs determined with one of the two primers appears to have occurred. A third plate, 341, also with a significant deficit in tail-to-tail BACs, appeared to contain a substantial number of sequences determined from contaminating eubacterial 16S rRNA DNA. Additionally, a small number of eubacterial 16S rRNA DNA sequences were present on two other plates, 111 and 338, in the library.
    Conclusions: The comparative genomic approach can be used to assess BAC library integrity in the absence of fingerprinting. The sequences of the sheep CHORI-243 library BACs have high integrity, especially with the corrections detailed above. The library represents a high-quality resource for use by the sheep genomics community.
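
The plate-level check described in the findings (looking for plates with a deficit of tail-to-tail BACs) can be sketched roughly as follows. This is a minimal illustration and not the authors' pipeline: the input format (one `(plate, is_tail_to_tail)` record per BAC), the function names and the z-score cutoff are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean, pstdev


def concordant_fraction_by_plate(bacs):
    """Return {plate: fraction of BACs whose end pair is tail-to-tail}.

    `bacs` is an iterable of (plate, is_tail_to_tail) tuples; this record
    layout is a hypothetical one chosen for the sketch.
    """
    totals = defaultdict(int)
    concordant = defaultdict(int)
    for plate, is_tail_to_tail in bacs:
        totals[plate] += 1
        if is_tail_to_tail:
            concordant[plate] += 1
    return {plate: concordant[plate] / totals[plate] for plate in totals}


def flag_low_plates(fractions, z_cutoff=-2.0):
    """Return plates whose concordant fraction is far below the library mean.

    The z-score rule and its cutoff are assumptions, not values from the study.
    """
    values = list(fractions.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # every plate looks the same, nothing to flag
    return sorted(p for p, f in fractions.items() if (f - mu) / sigma < z_cutoff)
```

A plate flagged this way would still need manual follow-up (for example, checking for swapped primer reads or contaminating sequence), as the abstract describes.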

    Using paired-end sequences to optimise parameters for alignment of sequence reads against related genomes

    Background: The advent of cheap high-throughput sequencing methods has facilitated low-coverage skims of a large number of organisms. To maximise the utility of the sequences, assembly into contigs and then ordering of those contigs is required. Whilst sequences can be assembled into contigs de novo, using assembled genomes of closely related organisms as a framework can considerably aid the process. However, the preferred search programs and parameters that will optimise the sensitivity and specificity of the alignments between the sequence reads and the framework genome(s) are not necessarily obvious. Here we demonstrate a process that uses paired-end sequence reads to choose an optimal program and alignment parameters.
    Results: Unlike two single fragment reads, in paired-end sequence reads, such as BAC-end sequences, the two sequences in the pair have a known positional relationship in the original genome. This provides an additional level of confidence, over match scores and e-values, in the accuracy of the positional assignment of the reads in the comparative genome. Three commonly used sequence alignment programs (MegaBLAST, Blastz and PatternHunter) were used to align a set of ovine BAC-end sequences against the equine genome assembly. A range of different search parameters, with a particular focus on contiguous and discontiguous seeds, was used for each program. The number of reads with a hit and the number of read pairs with hits for the two end sequences in the tail-to-tail paired-end configuration were plotted relative to the theoretical maximum expected curve. Of the programs tested, MegaBLAST with short contiguous seed lengths (word size 8-11) performed best in this particular task. In addition, the data provide estimates of the false positive and false negative rates, which can be used to determine the appropriate values of additional parameters, such as score cut-off, to balance sensitivity and specificity. To determine whether the approach also worked for the alignment of shorter reads, the first 240 bases of each BAC-end sequence were also aligned to the equine genome. Again, contiguous MegaBLAST performed the best in optimising the sensitivity and specificity with which sheep BAC-end reads map to the equine and bovine genomes.
    Conclusions: Paired-end reads, such as BAC-end sequences, provide an efficient mechanism to optimise sequence alignment parameters, for example for comparative genome assemblies, by providing an objective standard to evaluate performance.
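
The two quantities compared in the results (the fraction of reads with a hit and the fraction of read pairs placed in the tail-to-tail configuration) can be computed along these lines. This is a sketch under stated assumptions, not the published procedure: the `Hit` record, the orientation rule used to call a pair tail-to-tail (same chromosome, opposite strands, forward-strand end upstream of the reverse-strand end) and the 350 kb `max_span` are hypothetical choices for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Hit:
    """Best alignment of one end read on the framework genome (hypothetical record)."""
    chrom: str
    pos: int
    strand: str  # '+' or '-'


def is_tail_to_tail(a: Optional[Hit], b: Optional[Hit], max_span: int = 350_000) -> bool:
    """True when both ends hit and their placement fits one BAC insert.

    Assumed rule: same chromosome, opposite strands, the '+' end lying
    upstream of the '-' end, no more than `max_span` bases apart.
    """
    if a is None or b is None:
        return False
    if a.chrom != b.chrom or a.strand == b.strand:
        return False
    fwd, rev = (a, b) if a.strand == '+' else (b, a)
    return 0 < rev.pos - fwd.pos <= max_span


def summarise(pairs: Sequence[Tuple[Optional[Hit], Optional[Hit]]]) -> dict:
    """Fraction of reads with a hit and fraction of pairs that are tail-to-tail."""
    if not pairs:
        return {"read_hit_fraction": 0.0, "tail_to_tail_pair_fraction": 0.0}
    reads_hit = sum((a is not None) + (b is not None) for a, b in pairs)
    concordant = sum(is_tail_to_tail(a, b) for a, b in pairs)
    return {
        "read_hit_fraction": reads_hit / (2 * len(pairs)),
        "tail_to_tail_pair_fraction": concordant / len(pairs),
    }
```

Running `summarise` once per program and parameter set gives one point per setting, which is the kind of comparison the abstract describes.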

    Identification of Genomic Regions Associated with Phenotypic Variation between Dog Breeds using Selection Mapping

    Peer reviewed

    Detecting Signatures of Selection within the Dog Genome

    Deciphering the genetic basis of phenotypic diversity is one of the central aims of biological research. Domestic animals provide a unique opportunity for making substantial progress towards this goal. Intense positive selection has led to a rich reservoir of phenotypes and underlying genotypes that can be interrogated using genetic tools to gain insight into the genetic basis of phenotypic diversity. The dog is the most phenotypically diverse mammal. It was domesticated from the grey wolf 11,000-30,000 years ago. After domestication, a period of intense breeding has led to the massive phenotypic diversity seen amongst dog breeds today. These two phases of strong positive selection, at domestication and at breed creation, are likely to have left their signature on the genome. In this thesis, we have analysed genome-wide patterns to detect genomic regions involved in selection in both of these phases. We used whole genome sequences from 60 dogs and 12 wolves to detect dog domestication selective sweeps. We find evidence for genes involved in memory formation, neurotransmission and starch digestion. To decipher the genetic signals underlying breed diversity, we used genome-wide genotype data from >170,000 SNPs in 509 dogs from 46 different breeds. We find evidence for genes under selection across many breeds, as well as genes under selection in only a few breeds. In addition, we identify novel sweeps underlying morphology and behavior. Recombination can influence the configuration of alleles present on a haplotype, and can thus increase or decrease the efficiency of selection. The PRDM9 protein has been shown to be important for determining recombination hotspot locations in humans and other mammals, but of all the mammals studied so far the dog is the only one to have a non-functional PRDM9. We used the genome-wide genotype data described above to characterise the fine-scale recombination map in dogs. We find that recombination hotspots exist in dogs despite the absence of PRDM9. Moreover, we show that these hotspots are enriched for GC-rich peaks and that these peaks are getting stronger over time. Our results show that the absence of PRDM9 has led to the stabilisation of the recombination landscape in dogs.

    Web-based tools for the visualisation of over-represented components of the genetic regulatory network in microarray datasets

    The advent of genome-wide high-throughput techniques has produced vast amounts of data that provide snapshots of cellular responses to change. The development of computational tools for the analysis and graphical representation of these data is a major challenge. For instance, the development of DNA microarray techniques has made it possible to study many aspects of the gene regulation underlying cellular responses. Genes are regulated by transcription factor (TF) interactions at gene promoter regions, and the genome-wide set of these interactions constitutes the Genetic Regulatory Network (GRN). We are developing web-based bioinformatic tools for the analysis of user-defined gene expression data that display computer-generated graphics of the GRN underlying the cellular response. Users may also compare two datasets of differing expression profiles (for instance up- and down-regulated genes) to generate graphics that represent how components of the GRN underlie genes with different expression profiles. The GRN graphic may be grouped according to a range of biological categories, and the choice of colour and shape provides an information-rich display that cannot be achieved through text-based representation alone. This provides investigators with an immediate overview of components of the GRN, allowing further hypotheses to be generated.

    Protein-protein interactions uncover candidate 'core genes' within omnigenic disease networks

    Genome-wide association studies (GWAS) of human diseases have generally identified many loci associated with risk, each with a relatively small effect size. The omnigenic model attempts to explain this observation by suggesting that diseases can be thought of as networks, where genes with direct involvement in disease-relevant biological pathways are named 'core genes', while peripheral genes influence disease risk via their interactions with, or regulatory effects on, core genes. Here, we demonstrate a method for identifying candidate core genes solely from genes in or near disease-associated SNPs (GWAS hits) in conjunction with protein-protein interaction network data. Applied to 1,381 GWAS studies from 5 ancestries, we identify a total of 1,865 candidate core genes in 343 GWAS studies. Our analysis identifies several well-known disease-related genes that are not identified by GWAS, including BRCA1 in Breast Cancer, Amyloid Precursor Protein (APP) in Alzheimer's Disease, INS in A1C measurement and Type 2 Diabetes, and PCSK9 in LDL cholesterol, amongst others. Notably, candidate core genes are preferentially enriched for disease relevance over GWAS hits and are enriched for both ClinVar pathogenic variants and known drug targets, consistent with the predictions of the omnigenic model. We subsequently use parent term annotations provided by the GWAS Catalog to merge related GWAS studies and identify candidate core genes in over-arching disease processes such as cancer, where we identify 109 candidate core genes.
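
The abstract does not spell out how GWAS hits and the protein-protein interaction (PPI) network are combined, so the following is only a toy illustration of the general idea and should not be read as the authors' method. It assumes a simple stand-in rule: a gene becomes a candidate when at least `min_hit_neighbours` of its direct PPI partners are GWAS hits. The function name, the threshold and the gene labels in the usage note are all hypothetical.

```python
from collections import defaultdict


def candidate_core_genes(ppi_edges, gwas_hits, min_hit_neighbours=2):
    """Rank genes by how many of their direct PPI partners are GWAS hits.

    ppi_edges: iterable of (gene_a, gene_b) undirected interactions.
    gwas_hits: iterable of gene names in or near disease-associated SNPs.
    The neighbour-count rule is an assumed stand-in, not the published method.
    """
    neighbours = defaultdict(set)
    for a, b in ppi_edges:
        if a != b:  # ignore self-interactions
            neighbours[a].add(b)
            neighbours[b].add(a)
    hits = set(gwas_hits)
    scores = {gene: len(partners & hits) for gene, partners in neighbours.items()}
    selected = [g for g, s in scores.items() if s >= min_hit_neighbours]
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(selected, key=lambda g: (-scores[g], g))
```

For example, with edges `[("GENE_X", "HIT1"), ("GENE_X", "HIT2")]` and hits `["HIT1", "HIT2"]`, the function returns `["GENE_X"]`: a gene that is not itself a GWAS hit but sits next to several of them in the network.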