Meta-analysis of genome-wide association studies for cattle stature identifies common genes that regulate body size in mammals
Stature is affected by many polymorphisms of small effect in humans [1]. In contrast, variation in dogs, even within breeds, has been suggested to be largely due to variants in a small number of genes [2,3]. Here we use data from cattle to compare the genetic architecture of stature to those in humans and dogs. We conducted a meta-analysis for stature using 58,265 cattle from 17 populations with 25.4 million imputed whole-genome sequence variants. Results showed that the genetic architecture of stature in cattle is similar to that in humans, as the lead variants in 163 significantly associated genomic regions (P < 5 × 10⁻⁸) explained at most 13.8% of the phenotypic variance. Most of these variants were noncoding, including variants that were also expression quantitative trait loci (eQTLs) and in ChIP–seq peaks. There was significant overlap in loci for stature with humans and dogs, suggesting that a set of common genes regulates body size in mammals.
Genomic divergences among cattle, dog and human estimated from large-scale alignments of genomic sequences
BACKGROUND: Approximately 11 Mb of finished high quality genomic sequences were sampled from cattle, dog and human to estimate genomic divergences and their regional variation among these lineages. RESULTS: Optimal three-way multi-species global sequence alignments for 84 cattle clones or loci (each >50 kb of genomic sequence) were constructed using the human and dog genome assemblies as references. Genomic divergences and substitution rates were examined for each clone and for various sequence classes under different functional constraints. Analysis of these alignments revealed that the overall genomic divergences are relatively constant (0.32–0.37 change/site) for pairwise comparisons among cattle, dog and human; however, substitution rates vary across genomic regions and among different sequence classes. A neutral mutation rate (2.0–2.2 × 10⁻⁹ change/site/year) was derived from ancestral repetitive sequences, whereas the substitution rate in coding sequences (1.1 × 10⁻⁹ change/site/year) was approximately half of the overall rate (1.9–2.0 × 10⁻⁹ change/site/year). Relative rate tests also indicated that cattle have a significantly faster rate of substitution as compared to dog and that this difference is about 6%. CONCLUSION: This analysis provides a large-scale and unbiased assessment of genomic divergences and regional variation of substitution rates among cattle, dog and human. It is expected that these data will serve as a baseline for future mammalian molecular evolution studies.
Predicting live weight of rural African goats using body measurements
The goal of the current study was to develop simple regression-based equations that allow small-scale producers to use simple body measurements to accurately predict live weight of typical African goats. The data used in this study were recorded in five African countries and comprised 814 individuals of 40 indigenous breeds or populations and crosses, including 158 males and 656 females. Records included the live weight measured with a hanging scale, linear body measurements, country, breed, owner, and age. Country, breed, age, chest girth, height at withers, body length, and shoulder width had large effects (p76 cm, the prediction model that included linear terms for chest girth, body length, shoulder width and height at withers plus a quadratic term for chest girth was selected as the most accurate. When analyzed within country, for animals from Uganda and Zimbabwe with chest girth < 55 cm, the linear model with additional quadratic terms for chest girth and body length was selected. For animals with chest girth 55–75 cm, the linear model with the added quadratic terms for chest girth and body length was selected for animals from Malawi and Zimbabwe, while the linear model with a quadratic term for chest girth was selected for Mozambique, Tanzania and Uganda. For animals with chest girth > 76 cm, the linear model with a quadratic term for chest girth was chosen for Tanzania, while for the other countries the linear model with quadratic terms for chest girth and body length was most accurate. In all cases, the models produced smaller mean prediction errors than the BM method.
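The kind of model described above, linear terms for body measurements plus a quadratic term for chest girth, can be sketched as an ordinary least-squares fit. The following is a minimal illustration only: it uses two predictors instead of the four named in the abstract, and the measurements, the coefficients in TRUE_COEFS, and the centring constants (65 cm and 55 cm) are all hypothetical values chosen so the example is self-checking. None of them come from the study.

```python
def fit_least_squares(rows, targets):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y], one row per coefficient.
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(rows, targets))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting keeps the elimination numerically stable.
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * p for x, p in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def design_row(chest_girth_cm, body_length_cm):
    """Intercept, girth, length, and girth squared (centred and scaled)."""
    # Centring and scaling are an assumption made here for conditioning.
    g = (chest_girth_cm - 65.0) / 10.0
    length = (body_length_cm - 55.0) / 10.0
    return [1.0, g, length, g * g]


def predict_weight(coefs, chest_girth_cm, body_length_cm):
    """Predicted live weight (kg) for one animal."""
    row = design_row(chest_girth_cm, body_length_cm)
    return sum(c * x for c, x in zip(coefs, row))


# Hypothetical coefficients used only to generate noise-free example data.
TRUE_COEFS = [25.0, 6.0, 2.0, 1.5]
# Hypothetical (chest girth cm, body length cm) pairs.
measurements = [(50, 48), (55, 47), (60, 55), (65, 52),
                (70, 60), (75, 58), (58, 50), (68, 63)]
weights_kg = [predict_weight(TRUE_COEFS, g, l) for g, l in measurements]

coefs = fit_least_squares([design_row(g, l) for g, l in measurements],
                          weights_kg)
print([round(c, 3) for c in coefs])
print(round(predict_weight(coefs, 62, 54), 2))
```

Because the example data are generated without noise, the fit recovers the generating coefficients. With real records the same functions would return the least-squares estimates, and separate fits per chest-girth class or per country would follow the stratification described in the abstract.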
Box–Cox Transformation and Random Regression Models for Fecal Egg Count Data
Accurate genetic evaluation of livestock is based on appropriate modeling of phenotypic measurements. In ruminants, fecal egg count (FEC) is commonly used to measure resistance to nematodes. FEC values are not normally distributed and logarithmic transformations have been used in an effort to achieve normality before analysis. However, the transformed data are often still not normally distributed, especially when data are extremely skewed. A series of repeated FEC measurements may provide information about the population dynamics of a group or individual. A total of 6375 FEC measures were obtained for 410 animals between 1992 and 2003 from the Beltsville Agricultural Research Center Angus herd. Original data were transformed using an extension of the Box–Cox transformation to approach normality and to estimate (co)variance components. We also proposed using random regression models (RRM) for genetic and non-genetic studies of FEC. Phenotypes were analyzed using RRM and restricted maximum likelihood. Among the different orders of Legendre polynomials used, those with more parameters (order 4) fitted the FEC data best. Results indicated that transforming FEC data with the Box–Cox transformation family was effective in reducing skewness and kurtosis and dramatically increased estimates of heritability. Measurements of FEC obtained between weeks 12 and 26 of a 26-week experimental challenge period were genetically correlated.
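As a rough illustration of why a Box–Cox-type transformation helps with skewed counts, the sketch below applies the shifted (two-parameter) form, y* = ((y + c)^λ − 1)/λ for λ ≠ 0 and ln(y + c) for λ = 0, and compares sample skewness before and after. The abstract does not say which extension of the transformation was used, so the shift c = 1 and λ = 0 are assumptions, and the egg counts are hypothetical.

```python
import math


def box_cox(y, lam, shift=0.0):
    """Shifted Box-Cox transform of a single value.

    The shift is needed because fecal egg counts can be zero and the
    transform is only defined for positive arguments.
    """
    v = y + shift
    if v <= 0:
        raise ValueError("y + shift must be positive")
    if lam == 0:
        return math.log(v)
    return (v ** lam - 1.0) / lam


def skewness(values):
    """Sample skewness: third central moment over variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical egg counts with a long right tail.
fec = [0, 0, 50, 100, 150, 200, 400, 800, 1600, 5000]
transformed = [box_cox(y, lam=0, shift=1.0) for y in fec]

skew_raw = skewness(fec)
skew_transformed = skewness(transformed)
print(round(skew_raw, 2), round(skew_transformed, 2))
```

In practice λ would be estimated from the data (for example by maximum likelihood) rather than fixed in advance; the fixed value here only keeps the example short.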
Application of machine learning in SNP discovery
BACKGROUND: Single nucleotide polymorphisms (SNP) constitute more than 90% of the genetic variation, and hence can account for most trait differences among individuals in a given species. Polymorphism detection software PolyBayes and PolyPhred give high false positive SNP predictions even with stringent parameter values. We developed a machine learning (ML) method to augment PolyBayes to improve its prediction accuracy. ML methods have also been successfully applied to other bioinformatics problems in predicting genes, promoters, transcription factor binding sites and protein structures. RESULTS: The ML program C4.5 was applied to a set of features in order to build a SNP classifier from training data based on human expert decisions (True/False). The training data were 27,275 candidate SNP generated by sequencing 1973 STS (sequence tag sites) (12 Mb) in both directions from 6 diverse homozygous soybean cultivars and PolyBayes analysis. Test data of 18,390 candidate SNP were generated similarly from 1359 additional STS (8 Mb). SNP from both sets were classified by experts. After training the ML classifier, it agreed with the experts on 97.3% of test data compared with 7.8% agreement between PolyBayes and experts. The PolyBayes positive predictive values (PPV) (i.e., fraction of candidate SNP being real) were 7.8% for all predictions and 16.7% for those with 100% posterior probability of being real. Using ML improved the PPV to 84.8%, a 5- to 10-fold increase. While both ML and PolyBayes produced a similar number of true positives, the ML program generated only 249 false positives as compared to 16,955 for PolyBayes. The complexity of the soybean genome may have contributed to high false SNP predictions by PolyBayes and hence results may differ for other genomes. CONCLUSION: A machine learning (ML) method was developed as a supplementary feature to the polymorphism detection software for improving prediction accuracies. The results from this study indicate that a trained ML classifier can significantly reduce human intervention and in this case achieved a 5–10 fold enhanced productivity. The optimized feature set and ML framework can also be applied to all polymorphism discovery software. ML support software is written in Perl and can be easily integrated into an existing SNP discovery pipeline.
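The positive predictive values quoted above can be checked with a few lines of arithmetic, using PPV = TP / (TP + FP). The sketch below assumes that all 18,390 test candidates count as PolyBayes positive predictions, which is consistent with the reported 7.8% "for all predictions". The ML true-positive count is not given in the abstract, so it is back-calculated here from the reported PPV and false-positive count and should be read as an implied estimate, not a reported figure.

```python
def ppv(true_positives, false_positives):
    """Positive predictive value: fraction of positive calls that are real."""
    return true_positives / (true_positives + false_positives)


# Figures taken from the abstract.
polybayes_candidates = 18390
polybayes_fp = 16955
ml_fp = 249
ml_ppv_reported = 0.848

# Assumption: every test candidate is a PolyBayes positive prediction.
polybayes_tp = polybayes_candidates - polybayes_fp
polybayes_ppv = ppv(polybayes_tp, polybayes_fp)

# Implied ML true positives, solved from PPV = TP / (TP + FP).
ml_tp_implied = ml_ppv_reported * ml_fp / (1.0 - ml_ppv_reported)

print(polybayes_tp, round(polybayes_ppv * 100, 1))
print(round(ml_tp_implied))
```

Under these assumptions the implied ML true-positive count is close to the PolyBayes count, which matches the abstract's statement that both methods produced a similar number of true positives.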
SNP-PHAGE – High throughput SNP discovery pipeline
BACKGROUND: Single nucleotide polymorphisms (SNPs) as defined here are single base sequence changes or short insertion/deletions between or within individuals of a given species. As a result of their abundance and the availability of high throughput analysis technologies, SNP markers have begun to replace other traditional markers such as restriction fragment length polymorphisms (RFLPs), amplified fragment length polymorphisms (AFLPs) and simple sequence repeats (SSRs or microsatellites) for fine mapping and association studies in several species. For SNP discovery from chromatogram data, several bioinformatics programs have to be combined to generate an analysis pipeline. Results have to be stored in a relational database to facilitate interrogation through queries or to generate data for further analyses such as determination of linkage disequilibrium and identification of common haplotypes. Although these tasks are routinely performed by several groups, an integrated open source SNP discovery pipeline that can be easily adapted by new groups interested in SNP marker development is currently unavailable. RESULTS: We developed SNP-PHAGE (SNP discovery Pipeline with additional features for identification of common haplotypes within a sequence tagged site (Haplotype Analysis) and GenBank (-dbSNP) submissions). This tool was applied for analyzing sequence traces from diverse soybean genotypes to discover over 10,000 SNPs. This package was developed on a UNIX/Linux platform, is written in Perl and uses a MySQL database. Scripts to generate a user-friendly web interface are also provided with common queries for preliminary data analysis. A machine learning tool developed by this group for increasing the efficiency of SNP discovery is integrated as a part of this package as an optional feature. The SNP-PHAGE package is being made available open source at . CONCLUSION: SNP-PHAGE provides a bioinformatics solution for high throughput SNP discovery, identification of common haplotypes within an amplicon, and GenBank (dbSNP) submissions. SNP selection and visualization are aided through a user-friendly web interface. This tool is useful for analyzing sequence tagged sites (STSs) of genomic sequences, and this software can serve as a starting point for groups interested in developing SNP markers.
Variants Within Genes EDIL3 and ADGRB3 are Associated With Divergent Fecal Egg Counts in Katahdin Sheep at Weaning
Gastrointestinal nematodes (GIN) pose a severe threat to sheep production worldwide. Anthelmintic drug resistance coupled with growing concern regarding potential environmental effects of drug use have demonstrated the necessity of implementing other methods of GIN control. The aim of this study was to test for genetic variants associated with resistance or susceptibility to GIN in Katahdin sheep to improve the current understanding of the genetic mechanisms responsible for host response to GIN. Linear regression and case-control genome-wide association studies were conducted with high-density genotype data and cube-root transformed weaning fecal egg counts (tFEC) of 583 Katahdin sheep. The case-control GWAS identified two significant SNPs (P-values 1.49e-08 to 1.01e-08) within introns of the gene adhesion G protein-coupled receptor B3 (ADGRB3) associated with lower fecal egg counts. With linear regression, four significant SNPs (P-values 7.82e-08 to 3.34e-08) were identified within the first intron of the gene EGF-like repeats and discoidin domains 3 (EDIL3). These identified SNPs were in very high linkage disequilibrium (r² of 0.996–1), and animals with alternate homozygous genotypes had significantly higher median weaning tFEC phenotypes compared to all other genotypes. Significant SNPs were queried through public databases to identify putative transcription factor binding site (TFBS) and potential lncRNA differences between reference and alternate alleles. Changes in TFBS were predicted at two SNPs, and one significant SNP was found to be within a predicted lncRNA sequence with greater than 90% similarity to a known lncRNA in the bovine genome. The gene EDIL3 has been described in other species for its roles in the inhibition and resolution of inflammation. Potential changes of EDIL3 expression mediated through lncRNA expression and/or transcription factor binding may impact the overall immune response and reduce the ability of Katahdin sheep to control GIN infection. This study lays the foundation for further research of EDIL3 and ADGRB3 towards understanding genetic mechanisms of susceptibility to GIN, and suggests these SNPs may contribute to genetic strategies for improving parasite resistance traits in sheep.
Scaling up community-based goat breeding programmes via multi-stakeholder collaboration
Community-based livestock breeding programmes (CBBPs) have emerged as a potential approach to implement sustainable livestock breeding in smallholder systems. In Malawi and Uganda, goat CBBPs were introduced to improve production and productivity of indigenous goats through selective breeding. Scaling up CBBPs has recently received support due to evidence-based results from current implementation and results of CBBPs implemented in other regions of the world. This paper explores strategies for scaling up goat CBBPs in Malawi and Uganda, and documents experiences and lessons learned during implementation of the programme. A number of stakeholders supporting goat-based interventions for improving smallholders’ livelihoods exist. This offers an opportunity for different actors to work together by pooling financial resources and technical expertise for establishment and sustainability of goat CBBPs. Scaling up strategies should be an integral part of the pilot design; hence dissemination partners need to be engaged during the design and inception stages of the pilot CBBPs. Creation of self-sustaining CBBPs requires early collaborative programme planning, meaningful investment and long-term concerted and coordinated efforts by collaborating partners. Permanently established actors, like government agencies and research and training institutions, are better placed to coordinate such efforts. The overall goal of the scaling up programme should be creation of a financially sustainable system, in which smallholders are able, on their own, to transact and sustain operations of their local breeding institutions using locally generated revenue/resources. Since CBBP scaling up is a ‘learning by doing’ process, an effective monitoring and evaluation system should be an integral part of the process.
Development and Characterization of a High Density SNP Genotyping Assay for Cattle
The success of genome-wide association (GWA) studies for the detection of sequence variation affecting complex traits in humans has spurred interest in the use of large-scale high-density single nucleotide polymorphism (SNP) genotyping for the identification of quantitative trait loci (QTL) and for marker-assisted selection in model and agricultural species. A cost-effective and efficient approach for the development of a custom genotyping assay interrogating 54,001 SNP loci to support GWA applications in cattle is described. A novel algorithm for achieving a compressed inter-marker interval distribution proved remarkably successful, with a median interval of 37 kb and a maximum predicted gap of <350 kb. The assay was tested on a panel of 576 animals from 21 cattle breeds and six outgroup species and revealed that from 39,765 to 46,492 SNP are polymorphic within individual breeds (average minor allele frequency (MAF) ranging from 0.24 to 0.27). The assay also identified 79 putative copy number variants in cattle. Utility for GWA was demonstrated by localizing known variation for coat color and the presence/absence of horns to their correct genomic locations. The combination of SNP selection and the novel spacing algorithm allows an efficient approach for the development of high-density genotyping platforms in species having full or even moderate quality draft sequence. Aspects of the approach can be exploited in species which lack an available genome sequence. The BovineSNP50 assay described here is commercially available from Illumina and provides a robust platform for mapping disease genes and QTL in cattle.