91 research outputs found

    Genomic divergences among cattle, dog and human estimated from large-scale alignments of genomic sequences

    BACKGROUND: Approximately 11 Mb of finished high quality genomic sequences were sampled from cattle, dog and human to estimate genomic divergences and their regional variation among these lineages. RESULTS: Optimal three-way multi-species global sequence alignments for 84 cattle clones or loci (each >50 kb of genomic sequence) were constructed using the human and dog genome assemblies as references. Genomic divergences and substitution rates were examined for each clone and for various sequence classes under different functional constraints. Analysis of these alignments revealed that the overall genomic divergences are relatively constant (0.32–0.37 change/site) for pairwise comparisons among cattle, dog and human; however substitution rates vary across genomic regions and among different sequence classes. A neutral mutation rate (2.0–2.2 × 10⁻⁹ change/site/year) was derived from ancestral repetitive sequences, whereas the substitution rate in coding sequences (1.1 × 10⁻⁹ change/site/year) was approximately half of the overall rate (1.9–2.0 × 10⁻⁹ change/site/year). Relative rate tests also indicated that cattle have a significantly faster rate of substitution as compared to dog and that this difference is about 6%. CONCLUSION: This analysis provides a large-scale and unbiased assessment of genomic divergences and regional variation of substitution rates among cattle, dog and human. It is expected that these data will serve as a baseline for future mammalian molecular evolution studies.
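The abstract reports pairwise divergences and per-year substitution rates but not the divergence time that links them. As a rough sketch, assuming the textbook relation r = d / (2T) for two lineages evolving independently since their split (the abstract does not state that this is the formula the authors used), the implied split time can be recovered as follows. The function name and the pairing of range endpoints are illustrative assumptions.

```python
def implied_divergence_time(divergence: float, rate_per_year: float) -> float:
    """Return the split time T (years) implied by r = d / (2T).

    divergence    -- pairwise changes per site between two species
    rate_per_year -- substitutions per site per year along one lineage
    Assumption: a constant rate on both lineages since the split.
    """
    if rate_per_year <= 0:
        raise ValueError("rate_per_year must be positive")
    return divergence / (2.0 * rate_per_year)


# Range endpoints taken from the abstract; pairing low with low and
# high with high is an arbitrary choice made for illustration only.
low = implied_divergence_time(0.32, 1.9e-9)
high = implied_divergence_time(0.37, 2.0e-9)
print(f"implied split time: {low / 1e6:.0f} to {high / 1e6:.0f} million years")
```

Under these assumptions the reported figures imply a split on the order of tens of millions of years, which is only a consistency check and not a result from the paper.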

    Application of machine learning in SNP discovery

    BACKGROUND: Single nucleotide polymorphisms (SNP) constitute more than 90% of the genetic variation, and hence can account for most trait differences among individuals in a given species. Polymorphism detection software PolyBayes and PolyPhred give high false positive SNP predictions even with stringent parameter values. We developed a machine learning (ML) method to augment PolyBayes to improve its prediction accuracy. ML methods have also been successfully applied to other bioinformatics problems in predicting genes, promoters, transcription factor binding sites and protein structures. RESULTS: The ML program C4.5 was applied to a set of features in order to build a SNP classifier from training data based on human expert decisions (True/False). The training data were 27,275 candidate SNP generated by sequencing 1973 STS (sequence tag sites) (12 Mb) in both directions from 6 diverse homozygous soybean cultivars and PolyBayes analysis. Test data of 18,390 candidate SNP were generated similarly from 1359 additional STS (8 Mb). SNP from both sets were classified by experts. After training the ML classifier, it agreed with the experts on 97.3% of test data compared with 7.8% agreement between PolyBayes and experts. The PolyBayes positive predictive values (PPV) (i.e., fraction of candidate SNP being real) were 7.8% for all predictions and 16.7% for those with 100% posterior probability of being real. Using ML improved the PPV to 84.8%, a 5- to 10-fold increase. While both ML and PolyBayes produced a similar number of true positives, the ML program generated only 249 false positives as compared to 16,955 for PolyBayes. The complexity of the soybean genome may have contributed to high false SNP predictions by PolyBayes and hence results may differ for other genomes. CONCLUSION: A machine learning (ML) method was developed as a supplementary feature to the polymorphism detection software for improving prediction accuracies. The results from this study indicate that a trained ML classifier can significantly reduce human intervention and in this case achieved a 5–10 fold enhanced productivity. The optimized feature set and ML framework can also be applied to all polymorphism discovery software. ML support software is written in Perl and can be easily integrated into an existing SNP discovery pipeline.
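The positive predictive values quoted in this abstract can be checked against the reported false positive counts. The sketch below uses the standard definition PPV = TP / (TP + FP). It assumes that all 18,390 test candidates count as PolyBayes positive predictions, which is an interpretation of the abstract and not something it states outright. The function names are illustrative.

```python
def ppv(true_pos: int, false_pos: int) -> float:
    """Positive predictive value: fraction of predicted SNP that are real."""
    total = true_pos + false_pos
    if total == 0:
        raise ValueError("no positive predictions")
    return true_pos / total


def implied_true_positives(ppv_value: float, false_pos: int) -> float:
    """Solve PPV = TP / (TP + FP) for TP, given PPV and FP."""
    if not 0.0 <= ppv_value < 1.0:
        raise ValueError("ppv_value must be in [0, 1)")
    return ppv_value * false_pos / (1.0 - ppv_value)


# PolyBayes: 18,390 candidates, of which 16,955 were false positives
# (assumption: every test candidate is a PolyBayes positive call).
polybayes_tp = 18_390 - 16_955
print(f"PolyBayes PPV: {ppv(polybayes_tp, 16_955):.3f}")

# ML classifier: 249 false positives at a reported PPV of 84.8%.
ml_tp = implied_true_positives(0.848, 249)
print(f"implied ML true positives: {ml_tp:.0f}")
```

Under that assumption the two true positive counts come out close to each other, in line with the abstract's remark that both methods produced a similar number of true positives.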

    SNP-PHAGE – High throughput SNP discovery pipeline

    BACKGROUND: Single nucleotide polymorphisms (SNPs) as defined here are single base sequence changes or short insertion/deletions between or within individuals of a given species. As a result of their abundance and the availability of high throughput analysis technologies, SNP markers have begun to replace other traditional markers such as restriction fragment length polymorphisms (RFLPs), amplified fragment length polymorphisms (AFLPs) and simple sequence repeats (SSRs or microsatellites) for fine mapping and association studies in several species. For SNP discovery from chromatogram data, several bioinformatics programs have to be combined to generate an analysis pipeline. Results have to be stored in a relational database to facilitate interrogation through queries or to generate data for further analyses such as determination of linkage disequilibrium and identification of common haplotypes. Although these tasks are routinely performed by several groups, an integrated open source SNP discovery pipeline that can be easily adapted by new groups interested in SNP marker development is currently unavailable. RESULTS: We developed SNP-PHAGE (SNP discovery Pipeline with additional features for identification of common haplotypes within a sequence tagged site (Haplotype Analysis) and GenBank (-dbSNP) submissions). This tool was applied for analyzing sequence traces from diverse soybean genotypes to discover over 10,000 SNPs. This package was developed on UNIX/Linux platform, written in Perl and uses a MySQL database. Scripts to generate a user-friendly web interface are also provided with common queries for preliminary data analysis. A machine learning tool developed by this group for increasing the efficiency of SNP discovery is integrated as a part of this package as an optional feature. The SNP-PHAGE package is being made available open source at . CONCLUSION: SNP-PHAGE provides a bioinformatics solution for high throughput SNP discovery, identification of common haplotypes within an amplicon, and GenBank (dbSNP) submissions. SNP selection and visualization are aided through a user-friendly web interface. This tool is useful for analyzing sequence tagged sites (STSs) of genomic sequences, and this software can serve as a starting point for groups interested in developing SNP markers.

    A Limousin Specific Myostatin Allele Affects Longissimus Muscle Area and Fatty Acid Profiles in a Wagyu-Limousin F2 Population

    A microsatellite-based genome scan of a Wagyu x Limousin F2 cross population previously demonstrated QTL affecting LM area and fatty acid composition were present in regions near the centromere of BTA2. In this study, we used 70 SNP markers to examine the centromeric 24 megabases (Mb) of BTA2, including the Limousin-specific F94L myostatin allele (AB076403.1; 415C > A) located at approximately 6 Mb on the draft genome sequence of BTA2. A significant effect of the F94L marker was observed (F = 60.17) for LM area, which indicated that myostatin is most likely responsible for the effect. This is consistent with previous reports that the substitution of Leu for Phe at AA 94 of myostatin (caused by the 415C > A transversion) is associated with increased muscle growth. Surprisingly, several fatty acid trait QTL, which affected the amount of unsaturated fats, also mapped to or very near the myostatin marker, including the ratio of C16:1 MUFA to C16:0 saturated fat (F = 16.72), C18:1 to C18:0 (F = 18.88), and total content of MUFA (F = 17.12). In addition, QTL for extent of marbling (F = 14.73) approached significance (P = 0.05), and CLA concentration (F = 9.22) was marginally significant (P = 0.18). We also observed associations of SNP located at 16.3 Mb with KPH (F = 15.00) and for the amount of SFA (F = 12.01). These results provide insight into genetic differences between the Wagyu and Limousin breeds and may lead to a better tasting and healthier product for consumers through improved selection for lipid content of beef.

    Development and Characterization of a High Density SNP Genotyping Assay for Cattle

    The success of genome-wide association (GWA) studies for the detection of sequence variation affecting complex traits in human has spurred interest in the use of large-scale high-density single nucleotide polymorphism (SNP) genotyping for the identification of quantitative trait loci (QTL) and for marker-assisted selection in model and agricultural species. A cost-effective and efficient approach for the development of a custom genotyping assay interrogating 54,001 SNP loci to support GWA applications in cattle is described. A novel algorithm for achieving a compressed inter-marker interval distribution proved remarkably successful, with median interval of 37 kb and maximum predicted gap of <350 kb. The assay was tested on a panel of 576 animals from 21 cattle breeds and six outgroup species and revealed that from 39,765 to 46,492 SNP are polymorphic within individual breeds (average minor allele frequency (MAF) ranging from 0.24 to 0.27). The assay also identified 79 putative copy number variants in cattle. Utility for GWA was demonstrated by localizing known variation for coat color and the presence/absence of horns to their correct genomic locations. The combination of SNP selection and the novel spacing algorithm allows an efficient approach for the development of high-density genotyping platforms in species having full or even moderate quality draft sequence. Aspects of the approach can be exploited in species which lack an available genome sequence. The BovineSNP50 assay described here is commercially available from Illumina and provides a robust platform for mapping disease genes and QTL in cattle

    MicroRNA transcriptome profiles during swine skeletal muscle development

    BACKGROUND: MicroRNA (miR) are a class of small RNAs that regulate gene expression by inhibiting translation of protein encoding transcripts. To evaluate the role of miR in skeletal muscle of swine, global microRNA abundance was measured at specific developmental stages including proliferating satellite cells, three stages of fetal growth, day-old neonate, and the adult. RESULTS: Twelve potential novel miR were detected that did not match previously reported sequences. In addition, a number of miR previously reported to be expressed in mammalian muscle were detected, having a variety of abundance patterns through muscle development. Muscle-specific miR-206 was nearly absent in proliferating satellite cells in culture, but was the highest abundant miR at other time points evaluated. In addition, miR-1 was moderately abundant throughout developmental stages with highest abundance in the adult. In contrast, miR-133 was moderately abundant in adult muscle and either not detectable or lowly abundant throughout fetal and neonate development. Changes in abundance of ubiquitously expressed miR were also observed. MiR-432 abundance was highest at the earliest stage of fetal development tested (60 day-old fetus) and decreased throughout development to the adult. Conversely, miR-24 and miR-27 exhibited greatest abundance in proliferating satellite cells and the adult, while abundance of miR-368, miR-376, and miR-423-5p was greatest in the neonate. CONCLUSION: These data present a complete set of transcriptome profiles to evaluate miR abundance at specific stages of skeletal muscle growth in swine. Identification of these miR provides an initial group of miR that may play a vital role in muscle development and growth.

    A High Density Integrated Genetic Linkage Map of Soybean and the Development of a 1536 Universal Soy Linkage Panel for Quantitative Trait Locus Mapping

    Single nucleotide polymorphisms (SNPs) are the marker of choice for many researchers due to their abundance and the high-throughput methods available for their multiplex analysis. Only recently have SNP markers been available to researchers in soybean [Glycine max (L.) Merr.] with the release of the third version of the consensus genetic linkage map that added 1141 SNP markers to the map. Our objectives were to add 2500 additional SNP markers to the soybean integrated map and select a set of 1536 SNPs to create a universal linkage panel for high-throughput soybean quantitative trait locus (QTL) mapping. The GoldenGate assay is one high-throughput analysis method capable of genotyping 1536 SNPs in 192 DNA samples over a 3-d period. We designed GoldenGate assays for 3456 SNPs (2956 new plus 500 previously mapped) which were used to screen three recombinant inbred line populations and diverse germplasm. A total of 3000 workable assays were obtained which added about 2500 new SNP markers to create a fourth version of the soybean integrated linkage map. To create a “Universal Soy Linkage Panel” (USLP 1.0) of 1536 SNP loci, SNPs were selected based on even distribution throughout each of the 20 consensus linkage groups and to have a broad range of allele frequencies in diverse germplasm. The 1536 USLP 1.0 will be able to quickly create a comprehensive genetic map in most QTL mapping populations and thus will serve as a useful tool for high-throughput QTL mapping

    Genome-wide association analysis to identify loci related to net merit, productive life and somatic cell score in the Jersey breed

    A genome-wide scan of Jersey cattle in the USA was carried out using SNP markers to identify QTL associated with net merit, productive life and somatic cell score. The data used in this study came from the Animal Improvement Programs Laboratory, USDA, USA. DNA samples collected from 2,380 Jersey animals and the predicted transmitting abilities of 2,081 animals, published in February 2009 (http://greenbook.usjersey.com/), were used in the analyses. SNPs were genotyped with the Illumina BovineSNP50 BeadChip, which carries approximately 54,000 SNPs. SNPs with a call rate below 99%, in Hardy-Weinberg disequilibrium (exact test, p < 0.01), or with a minor allele frequency below 5% were excluded from the final analyses (30,342 SNPs used). A Bonferroni-corrected P-value threshold of 0.01 was used. For all traits, significant SNPs were found on several chromosomes, especially BTA 3, 4, 5, 6, 10, 12, 23 and 25. The results suggest that the solutions for marker effects in genomic evaluations can identify chromosomal regions that warrant further study.
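The quality-control thresholds in this abstract (call rate of at least 99%, Hardy-Weinberg exact test p of at least 0.01, minor allele frequency of at least 5%) and the Bonferroni correction can be sketched in a few lines. This is a minimal illustration and not the authors' pipeline: genotypes are assumed to be coded as 0, 1 or 2 copies of one allele with None for a missing call, the Hardy-Weinberg p-value is assumed to be computed elsewhere, and the per-test threshold assumes the corrected 0.01 level was obtained by dividing by the number of SNPs tested.

```python
from typing import Optional, Sequence

Genotype = Optional[int]  # 0, 1 or 2 copies of allele B; None = missing call


def call_rate(genotypes: Sequence[Genotype]) -> float:
    """Fraction of animals with a non-missing genotype call."""
    if not genotypes:
        return 0.0
    return sum(g is not None for g in genotypes) / len(genotypes)


def minor_allele_frequency(genotypes: Sequence[Genotype]) -> float:
    """Frequency of the rarer allele among called genotypes."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    freq_b = sum(called) / (2 * len(called))
    return min(freq_b, 1.0 - freq_b)


def passes_qc(genotypes: Sequence[Genotype], hwe_p: float,
              min_call_rate: float = 0.99, min_maf: float = 0.05,
              hwe_alpha: float = 0.01) -> bool:
    """Apply the three exclusion criteria described in the abstract."""
    return (call_rate(genotypes) >= min_call_rate
            and hwe_p >= hwe_alpha
            and minor_allele_frequency(genotypes) >= min_maf)


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


# Toy marker: 100 animals, all called, allele frequency 0.5.
toy = [0, 1, 2, 1] * 25
print(passes_qc(toy, hwe_p=0.5))
print(bonferroni_threshold(0.01, 30_342))
```

With 30,342 SNPs retained, this interpretation gives a per-SNP threshold of roughly 3.3 × 10⁻⁷.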

    An assessment of population structure in eight breeds of cattle using a whole genome SNP panel

    BACKGROUND: Analyses of population structure and breed diversity have provided insight into the origin and evolution of cattle. Previously, these studies have used a low density of microsatellite markers; however, with the large number of single nucleotide polymorphism markers that are now available, it is possible to perform genome wide population genetic analyses in cattle. In this study, we used a high-density panel of SNP markers to examine population structure and diversity among eight cattle breeds sampled from Bos indicus and Bos taurus. RESULTS: Two thousand six hundred and forty one single nucleotide polymorphisms (SNPs) spanning all of the bovine autosomal genome were genotyped in Angus, Brahman, Charolais, Dutch Black and White Dairy, Holstein, Japanese Black, Limousin and Nelore cattle. Population structure was examined using the linkage model in the program STRUCTURE and Fst estimates were used to construct a neighbor-joining tree to represent the phylogenetic relationship among these breeds. CONCLUSION: The whole-genome SNP panel identified several levels of population substructure in the set of examined cattle breeds. The greatest level of genetic differentiation was detected between the Bos taurus and Bos indicus breeds. When the Bos indicus breeds were excluded from the analysis, genetic differences among beef versus dairy and European versus Asian breeds were detected among the Bos taurus breeds. Exploration of the number of SNP loci required to differentiate between breeds showed that for 100 SNP loci, individuals could only be correctly clustered into breeds 50% of the time; thus a large number of SNP markers are required to replace the 30 microsatellite markers that are currently commonly used in genetic diversity studies.
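The population structure abstract above relies on pairwise Fst estimates between breeds but does not say which estimator was used. As a hedged illustration only, the sketch below computes Wright's Fst = (Ht - Hs) / Ht for two populations from biallelic allele frequencies, assuming equal population weights and pooling loci as a ratio of sums. The function name and toy frequencies are hypothetical.

```python
from typing import Sequence


def pairwise_fst(freqs_a: Sequence[float], freqs_b: Sequence[float]) -> float:
    """Wright's Fst between two populations over biallelic loci.

    freqs_a, freqs_b -- frequency of the same reference allele at each
    locus in population A and B. Assumes equal weighting of the two
    populations; loci are pooled as sum(Ht - Hs) / sum(Ht).
    """
    if len(freqs_a) != len(freqs_b):
        raise ValueError("frequency lists must have the same length")
    num = 0.0
    den = 0.0
    for p_a, p_b in zip(freqs_a, freqs_b):
        # Mean expected heterozygosity within populations.
        h_s = (2 * p_a * (1 - p_a) + 2 * p_b * (1 - p_b)) / 2
        # Expected heterozygosity of the pooled population.
        p_bar = (p_a + p_b) / 2
        h_t = 2 * p_bar * (1 - p_bar)
        num += h_t - h_s
        den += h_t
    return num / den if den > 0 else 0.0


# Toy example: one strongly differentiated locus, one shared locus.
print(pairwise_fst([0.1, 0.5], [0.9, 0.5]))
```

A matrix of such pairwise values between breeds is the kind of input a neighbor-joining tree is built from; the tree construction itself is not shown here.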