867 research outputs found

    Statistical issues and approaches in endophenotype research

    This special topic comprises 9 papers presented at the Symposium (Chinese Science Bulletin, 2011, v. 56 n. 32, Editorial; doi: 10.1007/s11434-011-4716-4), in the journal issue entitled "SPECIAL TOPIC: Endophenotype Strategies for the Study of Neuropsychiatric Disorders". The endophenotype concept was initially proposed to enhance the power of genetic studies of complex disorders. It is closely related to the genetic component in a liability-threshold model; a perfect endophenotype would have a correlation of 1 with the genetic component of the liability to disease. In reality, a putative endophenotype is unlikely to be a perfect representation of the genetic component of disease liability. The magnitude of the correlation between a putative endophenotype and the genetic component of disease liability can be estimated by fitting multivariate genetic models to twin data. A number of statistical methods have been developed for incorporating endophenotypes into genetic linkage and association analyses with the aim of improving statistical power. The most recent of these methods can handle multiple endophenotypes simultaneously for the greatest increase in power. Beyond increasing statistical power, endophenotype research plays an important role in understanding the mechanisms that connect associated genetic variants with disease occurrence. Novel statistical approaches may be required to analyse the complex relationships between endophenotypes at different levels and how they converge to cause disease. © 2011 Science China Press and Springer-Verlag Berlin Heidelberg.
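
    A minimal sketch of the quantitative-genetic idea behind the endophenotype correlation described above, in our own notation (not taken from the editorial): under a bivariate ACE twin model, the disease liability L and the endophenotype T are each decomposed into additive genetic (A), shared environmental (C) and unique environmental (E) parts, and the quantity of interest is the genetic correlation r_g.

    \[
    L = A_L + C_L + E_L, \qquad T = A_T + C_T + E_T,
    \]
    \[
    r_g = \frac{\operatorname{Cov}(A_T, A_L)}{\sqrt{\operatorname{Var}(A_T)\,\operatorname{Var}(A_L)}},
    \]
    \[
    \operatorname{Cov}_{\mathrm{MZ}}(T_1, L_2) = \operatorname{Cov}(A_T, A_L) + \operatorname{Cov}(C_T, C_L), \qquad
    \operatorname{Cov}_{\mathrm{DZ}}(T_1, L_2) = \tfrac{1}{2}\operatorname{Cov}(A_T, A_L) + \operatorname{Cov}(C_T, C_L).
    \]

    The cross-twin cross-trait covariances in the last line are what make r_g identifiable from twin data; a perfect endophenotype corresponds to r_g = 1.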

    FastPval: A fast and memory efficient program to calculate very low P-values from empirical distribution

    Motivation: Resampling methods, such as permutation and bootstrap, have been widely used to generate an empirical distribution for assessing the statistical significance of a measurement. However, to obtain a very low P-value, a large number of resamples is required, and computing speed, memory and storage consumption become bottlenecks, sometimes making the calculation impossible even on a computer cluster. Results: We have developed a multiple-stage P-value calculation program called FastPval that can efficiently calculate very low (down to 10^-9) P-values from a large number of resampled measurements. With only two input files and a few parameter settings from the user, the program can compute P-values from an empirical distribution very efficiently, even on a personal computer. When tested on the order of 10^9 resampled data points, our method used only 52.94% of the time taken by the conventional method, implemented with standard quicksort and binary search algorithms, and consumed only 0.11% of the memory and storage. Furthermore, our method can be applied to extra-large datasets that the conventional method fails to handle. The accuracy of the method was tested on data generated from Normal, Poisson and Gumbel distributions and was found to be no different from the exact ranking approach. © The Author(s) 2010. Published by Oxford University Press.
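
    For reference, a minimal sketch of the "conventional" empirical P-value calculation that FastPval is benchmarked against (sort the null resamples, then binary-search the observed statistic); FastPval's multi-stage, file-based algorithm itself is not reproduced here, and the function names are our own.

    import numpy as np

    def empirical_pvalue(observed, null_stats):
        """P(null >= observed), with the usual +1 correction to avoid zero P-values."""
        null_sorted = np.sort(null_stats)          # O(n log n); memory-heavy for ~1e9 resamples
        n_ge = len(null_sorted) - np.searchsorted(null_sorted, observed, side="left")
        return (n_ge + 1) / (len(null_sorted) + 1)

    rng = np.random.default_rng(0)
    null = rng.normal(size=1_000_000)              # stand-in for resampled measurements
    print(empirical_pvalue(4.5, null))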

    A simple bias correction in linear regression for quantitative trait association under two-tail extreme selection

    Selective genotyping can increase power in quantitative trait association studies. One example of selective genotyping is two-tail extreme selection, but simple linear regression analysis then gives a biased estimate of the genetic effect. Here, we present a simple correction for this bias. © The Author(s) 2011.
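
    A hedged illustration of the bias being corrected (our simulation, not the authors' correction formula): simulate a quantitative trait, keep only the two 10% tails, and fit ordinary least squares on the selected sample; the per-allele effect is overestimated relative to the full-sample analysis.

    import numpy as np

    rng = np.random.default_rng(1)
    n, maf, beta = 100_000, 0.3, 0.2
    g = rng.binomial(2, maf, n)                    # additive genotype coding 0/1/2
    y = beta * g + rng.normal(size=n)              # quantitative trait

    cut_lo, cut_hi = np.quantile(y, [0.1, 0.9])
    sel = (y <= cut_lo) | (y >= cut_hi)            # two-tail extreme selection

    slope_full = np.polyfit(g, y, 1)[0]            # close to the true beta = 0.2
    slope_sel = np.polyfit(g[sel], y[sel], 1)[0]   # substantially inflated under tail selection
    print(round(slope_full, 3), round(slope_sel, 3))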

    A knowledge-based weighting framework to boost the power of genome-wide association studies

    Background: We are moving to second-wave analysis of genome-wide association studies (GWAS), characterized by comprehensive bioinformatic and statistical evaluation of genetic associations. Existing biological knowledge is very valuable for GWAS and may help improve their detection power, particularly for disease susceptibility loci of moderate effect size. However, a challenging question is how to use these highly heterogeneous resources to quantitatively evaluate statistical significance. Methodology/Principal Findings: We present a novel knowledge-based weighting framework to boost the power of GWAS and strengthen their exploratory performance for follow-up replication and deep sequencing. Built upon diverse integrated biological knowledge, this framework directly models both the prior functional information and the association significance emerging from GWAS to optimally highlight single nucleotide polymorphisms (SNPs) for subsequent replication. In theoretical calculations and computer simulations, it shows great potential to achieve over 15% extra power to identify an association signal of moderate strength, or to reach similar power with hundreds fewer whole-genome subjects. In a proof-of-principle case study on late-onset Alzheimer disease (LOAD), it highlighted genes that showed positive association with LOAD in previous independent studies, and two important LOAD-related pathways. These genes and pathways might otherwise have been overlooked because the SNPs involved had only moderate association significance. Conclusions/Significance: With a user-friendly implementation in an open-source Java package, this powerful framework provides an important complementary solution for identifying more true susceptibility loci with modest or even small effect sizes in current GWAS of complex diseases. © 2010 Li et al.
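
    A minimal sketch of the general idea of knowledge-based SNP weighting (a generic weighted-Bonferroni-style scheme in our own notation, not the authors' exact framework): SNPs with stronger prior functional support receive weights above 1 and the rest below 1, with the weights normalised to average 1 so that the overall type-I error budget is preserved.

    import numpy as np

    def weighted_pvalues(pvals, prior_scores):
        w = prior_scores / prior_scores.mean()     # normalise so the mean weight is 1
        return np.minimum(pvals / w, 1.0)          # down-weight p-values of well-supported SNPs

    rng = np.random.default_rng(2)
    p = rng.uniform(size=5)                        # toy GWAS p-values
    prior = np.array([2.0, 0.5, 1.0, 3.0, 0.5])    # hypothetical knowledge-based scores
    print(weighted_pvalues(p, prior))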

    A novel two stage approach for causal effect estimation based on predictions of the response variable

    The Abstracts Book is available at https://sites.google.com/site/emgm2013/EMGM%202013%20Abstract%20Book-5.pdf?attredirects=0 (Session: Risk prediction and complex phenotypes, no. A31). Research builds upon pre-existing knowledge by investigating the relationship between novel variables and variables whose inter-relationships are well understood. This scenario can often be described as a complex model of known structure incorporating latent and manifest variables. In the parts of the model relating to well-understood variables, parameter values may already be well estimated as a result of previous research. For the rest of the model the parameters are unknown. The research question can be framed as the estimation of these unknown model parameters. Traditionally this would be add...

    Improving polygenic risk prediction from summary statistics by an empirical Bayes approach

    Polygenic risk scores (PRS) from genome-wide association studies (GWAS) are increasingly used to predict disease risks. However, some included variants could be false positives, and the raw estimates of their effect sizes may be subject to selection bias. In addition, the standard PRS approach requires testing over a range of p-value thresholds, which are often chosen arbitrarily. The prediction error estimated at the optimized threshold may also be subject to an optimistic bias. To improve genomic risk prediction, we propose new empirical Bayes approaches to recover the underlying effect sizes and use them as weights to construct PRS. We applied the new PRS to twelve cardio-metabolic traits in the Northern Finland Birth Cohort and demonstrated improvements in predictive power (in R2) compared with the standard PRS at the best p-value threshold. Importantly, for eleven of the twelve traits studied, the predictive performance from the entire set of genome-wide markers outperformed the best R2 from the standard PRS at optimal p-value thresholds. The proposed methodology essentially enables an automatic PRS weighting scheme without the need to choose tuning parameters. The new method also performed satisfactorily in simulations. It is computationally simple and does not require assumptions on the effect size distributions.
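
    A hedged sketch of the general idea (a simple normal-prior empirical Bayes shrinkage of summary-statistic effect sizes, not necessarily the exact estimator used in the paper): shrink each estimated beta towards zero according to its standard error, then use the shrunken betas as PRS weights.

    import numpy as np

    def eb_shrink(beta_hat, se):
        # Method-of-moments estimate of the prior variance tau^2, assuming
        # beta_j ~ N(0, tau^2) and beta_hat_j | beta_j ~ N(beta_j, se_j^2).
        tau2 = max(np.mean(beta_hat**2 - se**2), 0.0)
        return beta_hat * tau2 / (tau2 + se**2)

    def prs(genotypes, weights):
        # genotypes: n_individuals x n_snps matrix of allele counts (0/1/2)
        return genotypes @ weights

    rng = np.random.default_rng(3)
    beta_hat = rng.normal(0, 0.05, size=1000)      # toy GWAS summary statistics
    se = np.full(1000, 0.04)
    geno = rng.binomial(2, 0.3, size=(5, 1000))
    print(prs(geno, eb_shrink(beta_hat, se)))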

    Genetics of Lumbar Disk Degeneration: Technology, Study Designs, and Risk Factors

    Lumbar disk degeneration (LDD) is a common musculoskeletal condition, and genetic risk factors have been suggested to play a major role in its aetiology. This article reviews the main research strategies that have been used to study the genetics of LDD, and the genes identified so far as influencing susceptibility to LDD. With the rapid progress in genomic technologies, further advances in the genetics of LDD are expected in the next few years. © 2011 Elsevier Inc.

    Alterations in Gastric Microbiota After H. Pylori Eradication and in Different Histological Stages of Gastric Carcinogenesis

    The role of bacteria other than Helicobacter pylori (HP) in the stomach remains elusive. We characterized the gastric microbiota in individuals at different histological stages of gastric carcinogenesis and after HP eradication therapy. Endoscopic gastric biopsies were obtained from subjects with HP gastritis, gastric intestinal metaplasia (IM), gastric cancer (GC) and HP-negative controls. The gastric microbiota was characterized on the Illumina MiSeq platform targeting the 16S rDNA. Apart from the dominant H. pylori, we observed other Proteobacteria, including Haemophilus, Serratia, Neisseria and Stenotrophomonas, as major components of the human gastric microbiota. Although samples largely clustered according to the relative abundance of HP, a clear separation between GC and other samples was recovered. Whilst there was a strong inverse association between HP relative abundance and bacterial diversity, this association was weak in GC samples, which tended to have lower bacterial diversity than other samples with similar HP levels. Eradication of HP resulted in an increase in bacterial diversity and restoration of the relative abundance of other bacteria to levels similar to those in individuals without HP. In conclusion, HP colonization results in alterations of the gastric microbiota and a reduction in bacterial diversity, which can be restored by antibiotic treatment.
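
    A hedged sketch of one analysis described above (our own illustration with toy data, not the authors' pipeline): compute Shannon diversity per sample from 16S-derived relative abundances and correlate it with the relative abundance of H. pylori, where an inverse association is expected.

    import numpy as np
    from scipy.stats import spearmanr

    def shannon(rel_abund):
        p = rel_abund[rel_abund > 0]
        return -np.sum(p * np.log(p))

    rng = np.random.default_rng(4)
    counts = rng.poisson(5, size=(20, 50)).astype(float)   # 20 samples x 50 taxa (toy data)
    counts[:, 0] += rng.integers(0, 200, size=20)          # column 0 plays the role of HP, with variable load
    rel = counts / counts.sum(axis=1, keepdims=True)

    hp = rel[:, 0]
    div = np.array([shannon(r) for r in rel])
    print(spearmanr(hp, div))                              # negative correlation expected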

    dbPSHP: a database of recent positive selection across human populations

    The dbPSHP database (http://jjwanglab.org/dbpshp) aims to help researchers efficiently identify, validate and visualize putative positively selected loci in human evolution, and to further explore the mechanisms governing this natural selection. Recent evolution of human populations at the genomic level reflects adaptation to living environments, including climate change and the availability and stability of nutrients. Many genetic regions under positive selection have been identified, helping us understand how natural selection has shaped population differences. Here, we manually curate recent positive selection signals in different human populations, comprising 15,472 loci from 132 publications. We further compiled a database of 15 statistical measures of different evolutionary attributes for single nucleotide variant sites from HapMap 3 and the 1000 Genomes Project to identify putative regions under positive selection. These attributes include variant allele/genotype properties, variant heterozygosity, within-population diversity, long-range haplotypes, pairwise population differentiation and evolutionary conservation. We also provide interactive pages for visualization and annotation of the different selection signals. The database is freely available to the public and will be updated frequently.
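
    As a hedged illustration of one class of statistic catalogued above (pairwise population differentiation), here is a basic two-population Fst per SNP computed from allele frequencies using Wright's definition Fst = (Ht - Hs) / Ht; this is a textbook formula for illustration, not the database's exact computation.

    import numpy as np

    def fst_two_pops(p1, p2):
        p_bar = (p1 + p2) / 2.0
        ht = 2 * p_bar * (1 - p_bar)                      # expected heterozygosity in the pooled population
        hs = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population heterozygosity
        with np.errstate(invalid="ignore", divide="ignore"):
            return np.where(ht > 0, (ht - hs) / ht, 0.0)

    p_pop1 = np.array([0.10, 0.50, 0.95])                 # toy allele frequencies, population 1
    p_pop2 = np.array([0.60, 0.55, 0.10])                 # toy allele frequencies, population 2
    print(fst_two_pops(p_pop1, p_pop2))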

    Using glycosylated haemoglobin to define the metabolic syndrome in adults in the United States

    Introduction: The American Diabetes Association has recently proposed the use of glycosylated haemoglobin (GHb) in the definition of diabetes and of the category of increased diabetes risk. We therefore investigated whether GHb can be used instead of fasting plasma glucose to identify individuals with the metabolic syndrome, which is associated with an increased risk of cardiovascular disease. Methods: Participants of the US National Health and Nutrition Examination Survey (NHANES) 1999-2006 who had fasting blood glucose measurements were included (n=3551 in 1999-2002 and n=3412 in 2003-2006). The metabolic syndrome was defined using the 2009 International Diabetes Federation criteria. Raised blood glucose was defined either as fasting glucose ≥100 mg/dL (5.6 mmol/L) or as GHb ≥5.7%. Results: In 2003-2006, there was 91.3% agreement between GHb and fasting glucose when either was used to define the metabolic syndrome, although the use of GHb slightly lowered the syndrome's prevalence (34.8% vs 38.8%, P=0.012). The agreement was good (≥87%) irrespective of age, sex, race/ethnicity and body mass index. Only 2.3% of the sample had the metabolic syndrome when defined using GHb but not when using fasting glucose. The syndrome, defined using GHb alone, was associated with cardiovascular disease (ischaemic heart disease, heart failure or stroke) [OR=1.95, P=0.002]. Similar results were found in 1999-2002. Conclusions: Using GHb instead of fasting glucose to define the metabolic syndrome is feasible, and the syndrome defined in this way also identifies individuals at increased cardiovascular risk.
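
    A hedged sketch of the agreement calculation reported above (our illustration with made-up counts, not NHANES data): percent agreement and Cohen's kappa between metabolic-syndrome status defined with GHb versus fasting glucose.

    import numpy as np

    def agreement_and_kappa(table):
        # table: 2x2 array, rows = GHb-based definition (yes/no),
        #        columns = fasting-glucose-based definition (yes/no)
        n = table.sum()
        p_obs = np.trace(table) / n
        p_exp = np.sum(table.sum(axis=1) * table.sum(axis=0)) / n**2
        kappa = (p_obs - p_exp) / (1 - p_exp)
        return p_obs, kappa

    toy = np.array([[1100, 80],                    # hypothetical counts for illustration only
                    [220, 2012]])
    p_obs, kappa = agreement_and_kappa(toy)
    print(f"agreement = {p_obs:.1%}, kappa = {kappa:.2f}")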