47 research outputs found

    Large-scale whole exome sequencing studies identify two genes, CTSL and APOE, associated with lung cancer.

    Common genetic variants associated with lung cancer have been well studied in the past decade. However, only 12.3% heritability has been explained by these variants. In this study, we investigate the contribution of rare variants (RVs) (minor allele frequency < 0.01) to lung cancer through two large whole exome sequencing case-control studies. We first performed gene-based association tests using a novel Bayes Factor statistic in the International Lung Cancer Consortium, the discovery study (European, 1042 cases vs. 881 controls). The top genes identified were further assessed in the UK Biobank (European, 630 cases vs. 172 864 controls), the replication study. After controlling for the false discovery rate, we found two genes, CTSL and APOE, significantly associated with lung cancer in both studies. Single variant tests in UK Biobank identified 4 RVs (3 missense variants) in CTSL and 2 RVs (1 missense variant) in APOE strongly associated with lung cancer (OR between 2.0 and 139.0). The role of these genetic variants in the regulation of CTSL or APOE expression remains unclear. If such a role is established, this could have important therapeutic implications for lung cancer patients.

    Bayes Factor Approaches for Region-Based Analysis of Rare Variants from Next Generation Sequencing Studies

    The emergence of new high-throughput genotyping technologies, such as Next Generation Sequencing (NGS), allows the study of the human genome at an unprecedented depth and scale. The discovery of germline rare variants (RVs) through NGS is a very challenging issue in the field of human genetics. Since RVs have extremely low frequencies, traditional strategies that analyze one variant at a time are underpowered for detecting associations with RVs. Gene-level or region-based statistics can provide a first step in the analysis of RVs that can lead to further experimental validation. Bayesian analysis is not well developed for RV analysis. Our goal in this thesis is to develop such approaches and demonstrate their value for germline RV analyses in the context of case-control studies. Chapter 1 gives a general overview of NGS data analysis and methods for association tests with RV data. In Chapter 2, we propose a novel region-based statistical approach based on the Bayes Factor (BF) to assess evidence of association between a set of RVs located in the same genomic region and a disease outcome in the context of a case-control design. We derive the theoretical null distribution of the BF under our prior setting. Informative priors are introduced using prior evidence of association from a Kolmogorov-Smirnov test statistic. In Chapter 3, we introduce a Bayesian procedure to control the False Discovery Rate (BFDR) in the context of genome-wide inference. We develop a simulation program, sim1000G, to generate RV data similar to the 1,000 genomes sequencing project and assess our BFDR procedure. Our simulation studies show that the new BF statistic outperforms standard methods (SKAT, SKAT-O, Burden test) in case-control studies with moderate sample sizes and is equivalent to them under large sample size scenarios.
Chapter 4 concludes this thesis with an extension of the BF approach that integrates individual-level and variant-level covariates by using a Bayesian regression approach and inference based on the Integrated Nested Laplace Approximation (INLA). Finally, the value of our methodological developments is illustrated throughout the thesis by real data applications to a lung cancer case-control study seeking RV association with known and novel cancer genes. Ph.D.

    sim1000G: a user-friendly genetic variant simulator in R for unrelated individuals and family-based designs

    Background: Simulation of genetic variant data is frequently required for the evaluation of statistical methods in the fields of human and animal genetics. Although a number of high-quality genetic simulators have been developed, many of them require advanced knowledge in population genetics or in computation to be used effectively. In addition, generating simulated data in the context of family-based studies demands sophisticated methods and advanced computer programming. Results: To address these issues, we propose a new user-friendly and integrated R package, sim1000G, which simulates variants in genomic regions among unrelated individuals or among families. The only input needed is a raw phased Variant Call Format (VCF) file. Haplotypes are extracted to compute linkage disequilibrium (LD) in the simulated genomic regions and for the generation of new genotype data among unrelated individuals. The covariance across variants is used to preserve the LD structure of the original population. Pedigrees of arbitrary sizes are generated by modeling recombination events with sim1000G. To illustrate the application of sim1000G, various scenarios are presented assuming unrelated individuals from a single population or two distinct populations, or alternatively for three-generation pedigree data. Sim1000G can capture allele frequency diversity, short and long-range LD patterns and subtle population differences in LD structure without the need for any tuning parameters. Conclusion: Sim1000G fills a gap in the vast area of genetic variant simulators by its simplicity and independence from external tools. Currently, it is one of the few simulation packages completely integrated into R and able to simulate multiple genetic variants among unrelated individuals and within families. Its implementation will facilitate the application and development of computational methods for association studies with both rare and common variants.

    Factors controlling spatial variation in soil aggregate stability in a semi-humid watershed

    Soil aggregate stability (SAS) is a key soil property that affects soil erosion and the ability of soil to support ecosystem functions. The effects of different environmental factors on SAS are extensively documented. However, the relative importance of the factors that drive variation in SAS at watershed scale is not entirely clear. To investigate the effects of the interactions of environmental variables on spatial variation in SAS, 88 sampling sites were selected across an entire watershed (1.1 km²) on the Chinese Loess Plateau (CLP), from which undisturbed soil samples were collected at the 0-10 and 10-20 cm soil depths. Three indices were used to evaluate the SAS: water-stable aggregates greater than 0.25 mm (WSA>0.25, %), mean weight diameter (MWD, mm) and mean geometric diameter (MGD, mm). The results showed that variation of SAS across the watershed was moderate, with a coefficient of variation (CV) of 23.5-38.9%. From combined Spearman's correlation analysis (r), redundancy analysis (RDA) and structural equation modelling (SEM), it was found that soil intrinsic properties, mainly soil texture and organic carbon content (SOC), were the primary control on SAS variation. Topographic attributes, primarily wetness index (TWI) and altitude, were also important controls on SAS. These controls acted either directly or indirectly through SOC dynamics, spatial distribution of land use (LUT) or vegetation cover (NDVI). The effect of LUT on SAS was mainly driven by SOC and TWI at the 0-10 cm depth but by NDVI and TWI at the 10-20 cm depth. SAS was positively correlated with sand content and SOC, but negatively correlated with silt content, altitude, TWI and NDVI. For LUT, SAS in the apple orchard was significantly lower than in shrubland and grassland; however, it was comparable with that in forest.
Considering the effects of improving soil structure and the related economic cost, natural restoration of grassland was a good choice for preventing soil erosion in the study area. The results of this study could deepen our understanding of the controls on SAS variation and therefore become useful in soil management and vegetation restoration decisions on the CLP and in other regions with similar conditions.
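The three stability indices named in this abstract can be illustrated with a short calculation. The sketch below is not taken from the study: it assumes the commonly used definitions (MWD as the mass-weighted mean of class midpoint diameters, MGD as the exponential of the mass-weighted mean log diameter, WSA>0.25 as the mass percentage retained on sieves of 0.25 mm or coarser), and the sieve classes and masses in `SIEVE_CLASSES` are hypothetical values chosen only for illustration.

```python
import math

# Hypothetical wet-sieving result (not data from the study):
# (lower_mm, upper_mm, retained_mass_g) for each aggregate size class.
SIEVE_CLASSES = [
    (2.0, 5.0, 10.0),
    (1.0, 2.0, 15.0),
    (0.5, 1.0, 20.0),
    (0.25, 0.5, 25.0),
    (0.0, 0.25, 30.0),
]


def _midpoints_and_weights(classes):
    """Return (midpoint diameter, mass fraction) pairs; fractions sum to 1."""
    total = sum(mass for _, _, mass in classes)
    return [((lo + hi) / 2.0, mass / total) for lo, hi, mass in classes]


def wsa_above(classes, threshold_mm=0.25):
    """Percentage of mass in classes whose lower bound is >= threshold_mm."""
    total = sum(mass for _, _, mass in classes)
    retained = sum(mass for lo, _, mass in classes if lo >= threshold_mm)
    return 100.0 * retained / total


def mean_weight_diameter(classes):
    """Mass-weighted arithmetic mean of class midpoint diameters (mm)."""
    return sum(d * w for d, w in _midpoints_and_weights(classes))


def mean_geometric_diameter(classes):
    """Exponential of the mass-weighted mean of log midpoint diameters (mm)."""
    return math.exp(sum(w * math.log(d) for d, w in _midpoints_and_weights(classes)))


if __name__ == "__main__":
    print("WSA>0.25 (%):", wsa_above(SIEVE_CLASSES))
    print("MWD (mm):", mean_weight_diameter(SIEVE_CLASSES))
    print("MGD (mm):", mean_geometric_diameter(SIEVE_CLASSES))
```

Because the geometric mean weights small classes more heavily on the log scale, MGD is never larger than MWD for the same sample, which is one reason studies often report both.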

    Factors controlling the spatial variability of soil aggregates and associated organic carbon across a semi-humid watershed

    Soil aggregates (SA) play crucial roles in soil organic carbon (SOC) sequestration. Different SA fractions contribute differently to the sequestration of SOC. However, few studies have examined the factors controlling SA fractions and associated SOC contents across a watershed. Soil samples were collected at 0-10 cm (surface layer) and 10-20 cm (subsurface layer) from 88 sites across a semi-humid watershed (1.1 km²) on the Loess Plateau, China. These samples were separated into macroaggregates (MA), microaggregates (MI), and silt + clay fractions (SC) by wet-sieving, and the SOC content of each fraction was determined. The objectives were to: 1) investigate the spatial variability of SA fractions and associated SOC contents as well as their main controls across an entire watershed, and 2) explore the linkages between soil aggregation and SOC sequestration. The bulk and aggregate SOC contents of all SA fractions showed moderate variability, with coefficients of variation of 23.3-31.9%. Geostatistical analysis indicated that the spatial patterns of SA fractions and SOC content varied with aggregate size. From combined Spearman's correlation analysis and structural equation modelling, we found that soil texture was an important control on the spatial variability of all SA fractions and associated SOC contents. Vegetation dynamics and management practices associated with land use were also important controls on MA and MI and their associated SOC contents, especially in the surface layer. However, SC and its associated SOC content were more sensitive to ecohydrological processes related to topography. Among the land uses, grassland had the greatest SOC sequestration potential. The fine roots of herbs can wrap MI in MA and increase SOC content within MA, which is the primary mechanism responsible for SOC sequestration in grasslands.
These results indicate that using vegetation with fine root systems for restoration is a good strategy to increase SOC sequestration in this region. © 2021 Elsevier B.V. All rights reserved.

    A genome wide association study on Newfoundland colorectal cancer patients’ survival outcomes

    Background: In this study we performed genome-wide association studies to identify candidate SNPs that may predict the risk of disease outcome in colorectal cancer. Methods: The patient cohort consisted of 505 unrelated patients with Caucasian ancestry. Germline DNA samples were genotyped using the Illumina® human Omni-1quad SNP chip. Associations of SNPs with overall and disease-free survival were examined primarily for 431 patients with microsatellite instability-low (MSI-L) or stable (MSS) colorectal tumors using the Cox proportional hazards method, adjusting for clinical covariates. The bootstrap method was applied for internal validation of results. As exploratory analyses, association analyses for the colon (n = 334) and rectal (n = 171) cancer patients were also performed. Results: No SNP reached the genome-wide significance level (p < 5 × 10⁻⁸) in any of the analyses. A small number of genetic markers (n = 10) showed nominal associations (p < 10⁻⁶) for the MSS/MSI-L, colon, or rectal cancer patient groups. These markers were located in two non-coding RNA genes or intergenic regions and none were amino acid substituting polymorphisms. Bootstrap analysis for the MSS/MSI-L cohort data suggested the robustness of the observed nominal associations. Conclusions: Likely because of the small number of patients, our study did not identify an acceptable level of association of SNPs with outcome in MSS/MSI-L, colon, or rectal cancer patients. A number of SNPs with sub-optimal p-values were, however, identified; these loci may be promising and should be examined in other, larger patient cohorts.

    A Survival Association Study of 102 Polymorphisms Previously Associated with Survival Outcomes in Colorectal Cancer

    Several published studies identified associations of a number of polymorphisms with a variety of survival outcomes in colorectal cancer. In this study, we aimed to explore 102 previously reported common genetic polymorphisms and their associations with overall survival (OS) and disease-free survival (DFS) in a colorectal cancer patient cohort from Newfoundland (n = 505). Genotypes were obtained using a genome-wide SNP genotyping platform. For each polymorphism, the best possible genetic model was estimated for both overall survival and disease-free survival using a previously published approach. These SNPs were then analyzed under their genetic models by the Cox regression method. Correction for multiple comparisons was performed by the False Discovery Rate (FDR) method. Univariate analysis results showed that the RRM1-rs12806698, IFNGR1-rs1327474, DDX20-rs197412, and PTGS2-rs5275 polymorphisms were nominally associated with OS or DFS (p < 0.01). In stage-adjusted analysis, nominal associations of DDX20-rs197412, PTGS2-rs5275, and HSPA5-rs391957 with DFS were detected. However, after FDR correction none of these polymorphisms remained significantly associated with the survival outcomes. We conclude that the polymorphisms investigated in this study are not associated with OS or DFS in our colorectal cancer patient cohort.
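The abstract above names a False Discovery Rate correction without specifying the procedure. As a minimal sketch, assuming the widely used Benjamini-Hochberg step-up adjustment (an assumption; the source does not say which FDR method was applied), the following shows how nominally small p-values can lose significance once the number of tests is taken into account. The function name and the input p-values are hypothetical and are not results from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical example: 4 nominally small p-values among 102 tests,
    # with the remaining 98 tests set to clearly non-significant values.
    p_values = [0.002, 0.004, 0.006, 0.009] + [0.5] * 98
    adjusted = benjamini_hochberg(p_values)
    for raw, adj in zip(p_values[:4], adjusted[:4]):
        print(f"raw p = {raw:.3f}  adjusted = {adj:.3f}")
```

With about 100 tests, an isolated p-value has to be well below 0.01 to stay under a 0.05 FDR threshold, which is consistent with nominal associations disappearing after correction.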
