
    A rapid conditional enumeration haplotyping method in pedigrees

    Haplotyping in pedigrees provides valuable information for genetic studies (e.g., linkage analysis and association studies). In our previous work, to identify the set of haplotype configurations with the highest likelihoods for a large pedigree with many linked loci, we proposed a conditional enumeration haplotyping method. That method sets a threshold on the conditional probabilities of the possible ordered genotypes at each unordered individual-marker, deleting ordered genotypes with low conditional probabilities and thereby eliminating haplotype configurations with low likelihoods. In this article we present a rapid haplotyping algorithm that modifies the previous method by adding a second threshold, on the ratio of the conditional probability of a haplotype configuration to the largest conditional probability among all haplotype configurations, so that configurations with relatively low conditional probabilities are eliminated. The new algorithm is much more efficient than both our previous method and the widely used software SimWalk2.
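
    The two pruning rules can be sketched as simple filters. This is a minimal illustration, not the authors' implementation: the dictionaries mapping ordered genotypes and haplotype configurations to conditional probabilities, the function names, and the threshold values are all hypothetical, and computing those conditional probabilities from the pedigree is assumed to happen elsewhere.

```python
from typing import Dict, Hashable

# Hypothetical helpers: the conditional probabilities are assumed to be
# computed elsewhere from the pedigree and marker data.

def prune_ordered_genotypes(cond_probs: Dict[Hashable, float], threshold: float) -> Dict[Hashable, float]:
    """First threshold: drop ordered genotypes at an unordered individual-marker
    whose conditional probability is below `threshold`."""
    return {g: p for g, p in cond_probs.items() if p >= threshold}


def prune_configurations(config_probs: Dict[Hashable, float], ratio_threshold: float) -> Dict[Hashable, float]:
    """Additional threshold: keep a haplotype configuration only if its conditional
    probability is at least `ratio_threshold` times the largest one."""
    best = max(config_probs.values())
    return {c: p for c, p in config_probs.items() if p / best >= ratio_threshold}


# Toy usage with made-up probabilities ("1|2" = allele 1 paternal, allele 2 maternal).
genotypes = prune_ordered_genotypes({"1|2": 0.97, "2|1": 0.03}, threshold=0.05)
configs = prune_configurations({"H1": 0.5, "H2": 0.3, "H3": 0.01}, ratio_threshold=0.1)
print(genotypes)  # {'1|2': 0.97}
print(configs)    # {'H1': 0.5, 'H2': 0.3}
```

    Because the second cutoff is relative to the best configuration, it adapts to the absolute scale of the probabilities, which can be very small for large pedigrees.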

    Bayesian QTL mapping using skewed Student-t distributions

    In most QTL mapping studies, phenotypes are assumed to follow normal distributions. Deviations from this assumption may lead to the detection of false positive QTL. To improve the robustness of Bayesian QTL mapping methods, the normal distribution for residuals is replaced with a skewed Student-t distribution. The latter distribution accounts for both heavy tails and skewness, each controlled by a single parameter. The Bayesian QTL mapping method using a skewed Student-t distribution is evaluated with simulated data sets under five different scenarios of residual error distributions and QTL effects.
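
    The abstract does not state which skewed Student-t parameterization is used. As one common choice, and purely as an assumption here, the Fernández and Steel construction rescales the two halves of a symmetric Student-t density by a skewness parameter gamma, while the degrees of freedom nu control tail weight, which fits the "one parameter per component" description. A minimal sketch of the residual density and log-likelihood:

```python
import math
from typing import Iterable

def student_t_pdf(x: float, nu: float) -> float:
    """Standard Student-t density with nu degrees of freedom (nu controls tail weight)."""
    log_c = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2) - 0.5 * math.log(nu * math.pi)
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(x * x / nu))

def skewed_t_pdf(x: float, nu: float, gamma: float) -> float:
    """Fernandez-Steel skewed Student-t (assumed parameterization).
    gamma > 1 skews right, gamma < 1 skews left, gamma == 1 gives the symmetric t."""
    norm = 2.0 / (gamma + 1.0 / gamma)  # makes the two rescaled halves integrate to 1
    if x >= 0:
        return norm * student_t_pdf(x / gamma, nu)
    return norm * student_t_pdf(x * gamma, nu)

def residual_loglik(residuals: Iterable[float], sigma: float, nu: float, gamma: float) -> float:
    """Log-likelihood of QTL-model residuals with scale sigma under the skewed t."""
    return sum(math.log(skewed_t_pdf(e / sigma, nu, gamma) / sigma) for e in residuals)
```

    In a Bayesian QTL sampler, a function like `residual_loglik` would take the place of the Gaussian residual likelihood when updating QTL effects, nu and gamma.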

    Influence of priors in Bayesian estimation of genetic parameters for multivariate threshold models using Gibbs sampling

    Simulated data were used to investigate the influence of the choice of priors on the estimation of genetic parameters in multivariate threshold models using Gibbs sampling. We simulated additive genetic values, residuals, and fixed effects for one continuous trait and for the liabilities of four binary traits, as well as QTL effects for one of the liabilities. Within each of four replicates, six datasets were generated that resembled different practical scenarios in horses with respect to the number and distribution of animals with trait records and the availability of QTL information. (Co)variance components were estimated with a Bayesian threshold animal model via Gibbs sampling. The Gibbs sampler was implemented with both a flat and a proper prior for the genetic covariance matrix. Convergence problems were encountered in more than 50% of the flat-prior analyses, with indications of potential or near posterior impropriety between about rounds 10 000 and 100 000. In flat-prior analyses of the smallest datasets, runs terminated because the genetic covariance matrix became non-positive definite. Use of a proper prior improved the mixing and convergence of the Gibbs chain. To avoid (near) impropriety of posteriors and extremely poorly mixing Gibbs chains, a proper prior should be used for the genetic covariance matrix when implementing the Gibbs sampler.
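
    The contrast between the two priors shows up in the full conditional of the genetic covariance matrix G. A minimal sketch under simplifying assumptions that are not taken from the study: breeding values U (q animals by t traits) have already been sampled, animals are treated as unrelated (A = I), and degrees of freedom are integers. With an inverse-Wishart prior IW(nu0, S0) the full conditional is IW(nu0 + q, S0 + U'U), whereas a flat prior gives IW(q - t - 1, U'U), which is not a proper distribution once q - t - 1 < t, i.e., for small datasets.

```python
import numpy as np

def sample_genetic_cov(U, prior_df=None, prior_scale=None, rng=None):
    """Draw G from its full conditional given breeding values U (q animals x t traits).

    Simplifying assumptions: u_i ~ N(0, G) independently (A = I), integer df.
    prior_df/prior_scale given -> proper inverse-Wishart prior IW(prior_df, prior_scale);
    prior_df None -> flat prior on G, whose full conditional is IW(q - t - 1, U'U).
    """
    rng = np.random.default_rng() if rng is None else rng
    q, t = U.shape
    S = U.T @ U
    if prior_df is None:
        df = q - t - 1
    else:
        df = prior_df + q
        S = S + prior_scale
    if df < t:
        raise ValueError(f"full conditional IW(df={df}) is improper for t={t} traits")
    # Wishart(df, S^-1) draw as a sum of df outer products of N(0, S^-1) vectors ...
    L = np.linalg.cholesky(np.linalg.inv(S))
    Z = rng.standard_normal((df, t)) @ L.T
    # ... whose inverse is an IW(df, S) draw (positive definite almost surely).
    return np.linalg.inv(Z.T @ Z)
```

    In the full threshold animal model, U'U would become U'A^-1 U over the pedigree; the sketch only shows how a proper prior adds nu0 and S0 to keep the update well defined.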

    Differential Protein Expression Analysis Using Stable Isotope Labeling and PQD Linear Ion Trap MS Technology

    An isobaric tags for relative and absolute quantitation (iTRAQ)-based reversed-phase liquid chromatography (RPLC)-tandem mass spectrometry (MS/MS) method was developed for differential protein expression profiling in complex cellular extracts. The estrogen receptor-positive MCF-7 cell line, cultured in the presence of 17β-estradiol (E2) and tamoxifen (Tam), was used as a model system. MS analysis was performed on a linear trap quadrupole (LTQ) instrument operated in pulsed Q dissociation (PQD) mode. Optimization experiments were conducted to maximize the iTRAQ labeling efficiency and the number of quantified proteins. MS data filtering criteria were chosen to give a false positive identification rate of <4%. The reproducibility of protein identifications was ∼60%–67% between duplicate and ∼50% among triplicate LC-MS/MS runs. The run-to-run reproducibility, in terms of relative standard deviations (RSD) of global mean iTRAQ ratios, was better than 10%. The quantitation accuracy improved with the number of peptides used for protein identification. From a total of 530 identified proteins (P < 0.001) in the E2/Tam-treated MCF-7 cells, a list of 255 proteins (quantified by at least two peptides) was generated for differential expression analysis. A method was developed for the selection, normalization, and statistical evaluation of such datasets. A change of ∼2-fold in protein expression level was required for a protein to be selected as a biomarker candidate. Using this data processing strategy, ∼16 proteins involved in biological processes such as apoptosis, RNA processing/metabolism, DNA replication/transcription/repair, cell proliferation, and metastasis were found to be up- or down-regulated.
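
    The abstract does not spell out the selection and normalization method. A minimal sketch of one plausible pipeline, where every step except the two-peptide requirement and the ∼2-fold cutoff is an assumption: protein log2 ratios are taken as the median of peptide ratios, a global median centering is applied, and |log2 ratio| ≥ 1 defines a candidate. The input structure, function name, and protein IDs are hypothetical.

```python
import math
from statistics import median
from typing import Dict, List

def select_candidates(peptide_ratios: Dict[str, List[float]],
                      fold: float = 2.0, min_peptides: int = 2) -> Dict[str, float]:
    """Return normalized protein log2 ratios that pass a fold-change cutoff.

    peptide_ratios: protein -> peptide-level iTRAQ reporter ratios (treated/control).
    """
    # Protein-level log2 ratio = median over peptides; require >= min_peptides peptides.
    protein_log2 = {p: median(math.log2(r) for r in rs)
                    for p, rs in peptide_ratios.items() if len(rs) >= min_peptides}
    # Global median normalization: shift so the median protein log2 ratio is 0.
    center = median(protein_log2.values())
    cutoff = math.log2(fold)
    return {p: v - center for p, v in protein_log2.items() if abs(v - center) >= cutoff}

ratios = {"P1": [4.0, 4.0], "P2": [1.0, 1.0], "P3": [0.25, 0.25], "P4": [1.0, 1.0], "P5": [8.0]}
print(select_candidates(ratios))  # {'P1': 2.0, 'P3': -2.0}
```

    P5 is dropped despite its large ratio because it is quantified by a single peptide.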

    StatSeq Systems Genetics Benchmark

    Description of published synthetic Systems Genetics datasets. The StatSeq benchmark dataset is meant to be used for training and evaluating algorithms and techniques for inferring networks from systems genetics data. The goal is to determine which methodology has the best overall inference performance, and which may perform better under particular conditions (e.g., population size, large or small marker distances, high or low heritability, network size). This short document describes how the data were generated with SysGenSIM. Detailed information is provided about the construction of the gene networks, the simulation of the genotypes and of gene expression, and the submission and evaluation of the predictions.
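
    Scoring a submitted network against the simulated gold standard is one of the steps the document covers, but the metric is not given here. As an illustrative assumption, a common choice for ranked edge predictions is the area under the ROC curve, which equals the probability that a randomly chosen true edge receives a higher confidence than a randomly chosen non-edge. The gene and edge names below are hypothetical.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]  # (regulator, target)

def auroc(scores: Dict[Edge, float], true_edges: Set[Edge]) -> float:
    """AUROC of ranked edge predictions via the Mann-Whitney statistic (ties count 1/2)."""
    pos = [s for e, s in scores.items() if e in true_edges]
    neg = [s for e, s in scores.items() if e not in true_edges]
    if not pos or not neg:
        raise ValueError("need at least one true edge and one non-edge among the scored pairs")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

scores = {("g1", "g2"): 0.9, ("g1", "g3"): 0.8, ("g2", "g3"): 0.3, ("g3", "g1"): 0.1}
print(auroc(scores, {("g1", "g2"), ("g2", "g3")}))  # 0.75
```

    Candidate pairs that a method does not score should be added with the lowest confidence, so that every possible edge is ranked before computing the metric.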

    Nonparametric Bayesian Variable Selection With Applications to Multiple Quantitative Trait Loci Mapping With Epistasis and Gene–Environment Interaction

    The joint action of multiple genes is an important source of variation for complex traits and human diseases. However, mapping genes with epistatic effects and gene–environment interactions is difficult because of relatively small sample sizes and the very large parameter spaces of quantitative trait locus models that include such interactions. Here we present a nonparametric Bayesian method to map multiple quantitative trait loci (QTL) by considering epistatic and gene–environment interactions. Unlike typical parametric QTL analyses, the proposed method is not restricted to pairwise interactions among genes. Rather than modeling each main and interaction term explicitly, our nonparametric Bayesian method measures the importance of each QTL, irrespective of whether it arises mostly from a main effect or from some interaction effect(s), via an unspecified function of the genotypes at all candidate QTL. A Gaussian process prior is assigned to this unknown function. In addition to the candidate QTL, nongenetic factors and covariates, such as age, gender, and environmental conditions, can also be included in the unspecified function. The importance of each genetic factor (QTL) and each nongenetic factor/covariate included in the function is estimated by a single hyperparameter, which enters the covariance function and captures any main or interaction effect associated with that factor/covariate. An initial evaluation of the performance of the proposed method is obtained via analysis of simulated and real data.
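
    The role of the per-factor hyperparameters can be illustrated with an automatic-relevance-determination style covariance function. The abstract does not give the exact kernel; this sketch assumes a squared-exponential form k(x, x') = sigma2 * exp(-sum_j eta_j (x_j - x'_j)^2), where eta_j >= 0 is the importance of factor j (a candidate QTL genotype code or a covariate). A factor with eta_j = 0 drops out of the function, and because the kernel is a product over factors, interactions of any order are allowed without being modeled term by term. The genotype codes and values are hypothetical.

```python
import math
from typing import List, Sequence

def ard_kernel(x: Sequence[float], y: Sequence[float], eta: Sequence[float], sigma2: float = 1.0) -> float:
    """Squared-exponential covariance with one importance hyperparameter eta[j] >= 0 per factor."""
    return sigma2 * math.exp(-sum(e * (a - b) ** 2 for e, a, b in zip(eta, x, y)))

def gram_matrix(X: Sequence[Sequence[float]], eta: Sequence[float], sigma2: float = 1.0) -> List[List[float]]:
    """Prior covariance of the unknown function at the rows of X (individuals x factors)."""
    return [[ard_kernel(xi, xj, eta, sigma2) for xj in X] for xi in X]

# Factors per individual: [QTL1 genotype code, QTL2 genotype code, scaled age] (hypothetical).
eta = [1.0, 0.0, 0.5]  # QTL2 has zero importance here
X = [[0, 1, 0.3], [0, -1, 0.3], [1, 1, 0.3]]
K = gram_matrix(X, eta)
# Individuals 0 and 1 differ only at QTL2, so their covariance equals sigma2 (= 1.0).
print(K[0][1])  # 1.0
```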

    Gaussian Process Based Bayesian Semiparametric Quantitative Trait Loci Interval Mapping

    In linkage analysis, it is often necessary to include covariates such as age or weight to increase power or avoid spurious false positive findings. However, if a covariate term in the model is misspecified (e.g., a quadratic term modeled as a linear term), including the covariate may reduce the power and accuracy of Quantitative Trait Loci (QTL) identification. Furthermore, some covariates may interact with one another in complicated ways. We implement semiparametric models for single and multiple QTL mapping. Both mapping methods include an unspecified function of any covariate found or suspected to have a relationship with the response variable that is more complex than linear but of unknown form. They also allow for interactions among covariates. The analysis is performed in a Bayesian inference framework using Markov chain Monte Carlo. The advantages of our methods are demonstrated through extensive simulations and real data analysis.
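
    Inside an MCMC sampler, the unspecified covariate function can be updated by conditioning on the current values of the other model terms. A minimal sketch, assuming (not from the source) a zero-mean Gaussian process prior with a squared-exponential kernel of fixed length-scale and variance, and partial residuals r = y - mu - (current QTL effects): the posterior mean of f at the observed covariate values is K (K + noise_var I)^-1 r. A full sampler would draw f from the corresponding Gaussian rather than use only its mean.

```python
import numpy as np

def se_kernel(a: np.ndarray, b: np.ndarray, length: float = 1.0, var: float = 1.0) -> np.ndarray:
    """Squared-exponential covariance between two 1-D arrays of covariate values."""
    d = a[:, None] - b[None, :]
    return var * np.exp(-0.5 * (d / length) ** 2)

def covariate_posterior_mean(x, partial_resid, noise_var, length=1.0, var=1.0):
    """E[f(x) | r] when r = f(x) + e, f ~ GP(0, k), e ~ N(0, noise_var * I)."""
    K = se_kernel(x, x, length, var)
    alpha = np.linalg.solve(K + noise_var * np.eye(len(x)), partial_resid)
    return K @ alpha

# Hypothetical example: age enters quadratically, which a linear covariate term would misfit.
age = np.linspace(-2.0, 2.0, 21)
f_hat = covariate_posterior_mean(age, age ** 2, noise_var=1e-4, length=1.0, var=4.0)
```

    No functional form is imposed on f, so the quadratic shape is recovered from the data rather than specified in advance.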

    Transcriptome Prediction Performance Across Machine Learning Models and Diverse Ancestries

    Transcriptome prediction methods such as PrediXcan and FUSION have become popular in complex trait mapping. Most transcriptome prediction models have been trained in European populations using methods that make parametric linear assumptions, such as the elastic net (EN). To potentially improve the imputation of gene expression across global populations, we built transcriptome prediction models using both linear and non-linear machine learning (ML) algorithms and evaluated their performance in comparison to EN. We trained models using genotype and blood monocyte transcriptome data from the Multi-Ethnic Study of Atherosclerosis (MESA), comprising individuals of African, Hispanic, and European ancestries, and tested them using genotype and whole-blood transcriptome data from the Modeling the Epidemiology Transition Study (METS), comprising individuals of African ancestry. We show that prediction performance is highest when the training and testing populations share similar ancestries, regardless of the prediction algorithm used. While EN generally outperformed random forest (RF), support vector regression (SVR), and K nearest neighbor (KNN), RF outperformed EN for some genes, particularly between disparate ancestries, suggesting potential robustness and reduced variability of RF imputation performance across global populations. When applied to a high-density lipoprotein (HDL) phenotype, including RF prediction models in PrediXcan revealed potential gene associations missed by EN models. Therefore, by integrating other ML models into PrediXcan and diversifying our training populations to include more global ancestries, we may uncover new genes associated with complex traits.
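
    A gene-by-gene comparison such as EN versus RF reduces to computing a prediction-performance statistic per gene for each model in the test cohort. The abstract does not name the statistic; as an assumption, this sketch uses the squared Pearson correlation between predicted and observed expression, and the gene names and values are made up.

```python
import math
from typing import Dict, List, Sequence

def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between predicted and observed expression."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def genes_where_b_wins(observed: Dict[str, List[float]],
                       pred_a: Dict[str, List[float]],
                       pred_b: Dict[str, List[float]]) -> List[str]:
    """Genes for which model B (e.g. RF) has a higher test-set R^2 than model A (e.g. EN)."""
    return [g for g, obs in observed.items()
            if pearson_r(pred_b[g], obs) ** 2 > pearson_r(pred_a[g], obs) ** 2]

# Made-up test-cohort values for two genes.
observed = {"GENE1": [1.0, 2.0, 3.0, 4.0], "GENE2": [1.0, 2.0, 3.0, 4.0]}
en_pred = {"GENE1": [1.0, 2.0, 3.0, 4.0], "GENE2": [1.0, 3.0, 2.0, 4.0]}
rf_pred = {"GENE1": [1.0, 3.0, 2.0, 4.0], "GENE2": [1.0, 2.0, 3.0, 4.0]}
print(genes_where_b_wins(observed, en_pred, rf_pred))  # ['GENE2']
```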

    Infection and genotype remodel the entire soybean transcriptome

    Background: High-throughput methods, such as high-density oligonucleotide microarray measurements of mRNA levels, are popular and critical to genome-scale analysis and systems biology. However, understanding the results of these analyses, and in particular the very wide range of magnitudes of the transcriptional changes observed, is still a significant challenge. Many researchers still use an arbitrary cutoff, such as two-fold, to identify changes that may be biologically significant. We used a very large-scale microarray experiment involving 72 biological replicates to analyze the response of soybean plants to infection by the pathogen Phytophthora sojae and to analyze transcriptional modulation resulting from genotypic variation.

    Results: With the unprecedented level of statistical sensitivity provided by the high degree of replication, we show unambiguously that almost the entire plant genome (97 to 99% of all detectable genes) undergoes transcriptional modulation in response to infection and genetic variation. The majority of the transcriptional differences are less than two-fold in magnitude. We show that low-amplitude modulation of gene expression (less than two-fold changes) is highly statistically significant and consistent across biological replicates, even for modulations of less than 20%. Our results are consistent across two different normalization methods and two different statistical analysis procedures.

    Conclusion: Our findings demonstrate that the entire plant genome undergoes transcriptional modulation in response to infection and genetic variation. The pervasive low-magnitude remodeling of the transcriptome may be an integral component of physiological adaptation in soybean, and in all eukaryotes.
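
    Why sub-two-fold changes can be highly significant follows from how the standard error of a mean difference shrinks with replication. A minimal sketch using Welch's t statistic on log2 expression, with a normal approximation to its null distribution (reasonable only for many replicates); the replicate values are synthetic, chosen only to illustrate a 20% change, and are not taken from the study.

```python
import math
from statistics import NormalDist, mean, variance

def welch_test(log2_a, log2_b):
    """Return (fold change, Welch t, approximate two-sided p) for log2 expression values.

    The p-value uses a standard normal reference, an approximation that is
    reasonable only when both groups have many replicates.
    """
    diff = mean(log2_b) - mean(log2_a)
    se = math.sqrt(variance(log2_a) / len(log2_a) + variance(log2_b) / len(log2_b))
    t = diff / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(t)))
    return 2.0 ** diff, t, p

# Synthetic example: 36 replicates per group, a 20% (1.2-fold) increase in group B.
group_a = [0.0, 0.2] * 18
group_b = [x + math.log2(1.2) for x in group_a]
fold, t, p = welch_test(group_a, group_b)
```

    With these values the fold change is only 1.2, yet t is about 11, because the standard error falls with the square root of the number of replicates.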