79 research outputs found

    Using Image Recognition to Process Unbalanced Data in Genetic Diseases From Biobanks

    With precision medicine as the goal, the human biobank of each country should be analyzed to determine the complete research results related to genetic diseases. In addition, with the increase in medical imaging data, automatic image processing with image recognition has been widely studied and applied in biomedicine. However, case–control data imbalance often occurs in human biobanks and is usually handled with the statistical method SAIGE. Because of the huge amount of genetic data in human biobanks, applying SAIGE directly often runs into insufficient computer memory and excessive calculation time. An alternative is to use sampling to balance the case–control ratio, as in the Synthetic Minority Oversampling Technique (SMOTE). Our study used Manhattan plots and genetic disease information from the Taiwan Biobank and adjusted the imbalance in the case–control ratio with SMOTE, an approach we call “TW-SMOTE.” We further used a deep learning image recognition system to identify the TW-SMOTE results. We found that TW-SMOTE can achieve the same results as SAIGE and the UK Biobank (UKB). Data processed in this way can be equivalent to data plots from the relatively large UKB sample and achieve the same effect as SAIGE in addressing data imbalance.
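
    The oversampling idea named in this abstract can be sketched as follows. This is a minimal, generic SMOTE illustration in plain Python, not the TW-SMOTE pipeline itself: the function name `smote`, the toy feature vectors, and the choice of `k` are all assumptions made for the example.

```python
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between a minority sample
    and one of its k nearest minority-class neighbours (Euclidean distance)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by squared distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


# Hypothetical toy data: 3 cases (minority class) against 9 controls.
cases = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2)]
n_controls = 9
new_cases = smote(cases, n_controls - len(cases), k=2)
balanced_case_count = len(cases) + len(new_cases)
```

    Each synthetic case lies on the segment between two real cases, so the minority class grows to match the controls without duplicating any record exactly.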

    Determining Population Stratification and Subgroup effects in Association Studies of Rare Genetic Variants for Nicotine Dependence

    Background Rare variants (minor allele frequency < 1% or 5%) can help researchers to deal with the confounding issue of ‘missing heritability’ and have a proven role in dissecting the etiology of human diseases and complex traits. Methods We extended the combined multivariate and collapsing (CMC) and weighted sum statistic (WSS) methods and accounted for population stratification and subgroup effects using analyses stratified by principal component analysis, named here ‘str-CMC’ and ‘str-WSS’. To evaluate the validity of the extended methods, we analyzed the Genetic Architecture of Smoking and Smoking Cessation database, which includes African Americans and European Americans genotyped on Illumina Human Omni2.5, and we compared the results with those obtained with the sequence kernel association test (SKAT) and its modification, SKAT-O, which included population stratification and subgroup effect as covariates. We utilized the Cochran–Mantel–Haenszel test to check for possible differences in single nucleotide polymorphism allele frequency between subgroups within a gene. We aimed to detect rare variants and considered population stratification and subgroup effects in the genomic region containing 39 acetylcholine receptor-related genes. Results The Cochran–Mantel–Haenszel test as applied to GABRG2 (P = 0.001) was significant. However, GABRG2 was detected both by str-CMC (P = 8.04E-06) and str-WSS (P = 0.046) in African Americans but not by SKAT or SKAT-O. Conclusions Our results imply that if associated rare variants are only specific to a subgroup, a stratified analysis might be a better approach than a combined analysis.
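
    The Cochran–Mantel–Haenszel check mentioned in this abstract can be illustrated with a short sketch. The table layout is an assumption, since the abstract does not spell it out: each stratum is taken to be one SNP within a gene, with rows for two subgroups and columns for minor and major allele counts. The function names and the counts are hypothetical, and no continuity correction is applied.

```python
import math


def cmh_statistic(tables):
    """Cochran-Mantel-Haenszel chi-square (1 degree of freedom, no continuity
    correction) for a list of 2x2 tables given as (a, b, c, d) per stratum."""
    observed = expected = variance = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum this small contributes no information
        observed += a
        expected += (a + b) * (a + c) / n
        variance += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
    if variance == 0:
        raise ValueError("all strata are degenerate")
    return (observed - expected) ** 2 / variance


def chi2_1df_pvalue(stat):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


# Hypothetical counts: (subgroup 1 minor, subgroup 1 major,
#                       subgroup 2 minor, subgroup 2 major) for one SNP.
single_snp = [(15, 5, 5, 15)]
stat = cmh_statistic(single_snp)
p_value = chi2_1df_pvalue(stat)
```

    With several SNPs per gene, each SNP would add one tuple to the list, and the statistic pools the evidence across them.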

    Modeling expression quantitative trait loci in data combining ethnic populations

    Background Combining data from different ethnic populations in a study can increase the efficacy of methods designed to identify expression quantitative trait loci (eQTL) compared to analyzing each population independently. In such studies, however, the genetic diversity of minor allele frequencies among populations has rarely been taken into account. Because allele frequency diversity and population-level expression differences are present in populations, no consensus has been reached on the optimal statistical approach for analysis of eQTL in data combining different populations. Results In this report, we explored the applicability of a constrained two-way model to identify eQTL for combined ethnic data that might contain genetic diversity among ethnic populations. In addition, gene expression differences resulting from ethnic allele frequency diversity between populations were directly estimated and analyzed by the constrained two-way model. Through simulation, we investigated the effects of genetic diversity on eQTL identification by examining gene expression data pooled from normal quantile transformation of each population. Reanalyzing data from Caucasian and Asian individuals available from HapMap with the constrained two-way model identified a large number of eQTL with similar genetic effects on the gene expression levels in these two populations. Furthermore, 19 single nucleotide polymorphisms with inter-population differences with respect to both genotype frequency and gene expression levels directed by genotypes were identified and reflected a clear distinction between Caucasian and Asian individuals. Conclusions This study illustrates the influence of minor allele frequencies on common eQTL identification using either separate or combined population data. Our findings are important for future eQTL studies in which different datasets are combined to increase the power of eQTL identification.

    Phenome-wide analysis of Taiwan Biobank reveals novel glycemia-related loci and genetic risks for diabetes

    To explore the complex genetic architecture of common diseases and traits, we conducted comprehensive PheWAS of ten diseases and 34 quantitative traits in the community-based Taiwan Biobank (TWB). We identified 995 significantly associated loci with 135 novel loci specific to the Taiwanese population. Further analyses highlighted the genetic pleiotropy of loci related to complex disease and associated quantitative traits. Extensive analysis on glycaemic phenotypes (T2D, fasting glucose and Hb

    Lack of association of genetic variants for diabetic retinopathy in Taiwanese patients with diabetic nephropathy

    Objective Diabetic nephropathy (DN) and diabetic retinopathy (DR) comprise major microvascular complications of diabetes that occur with a high concordance rate in patients and are considered to potentially share pathogeneses. In this case-control study, we sought to investigate whether DR-related single nucleotide polymorphisms (SNPs) exert pleiotropic effects on renal function outcomes among patients with diabetes. Research design and methods A total of 33 DR-related SNPs were identified by replicating published SNPs and via a genome-wide association study. Furthermore, we assessed the cumulative effects by creating a weighted genetic risk score and evaluated the discriminatory and predictive ability of these genetic variants using DN cases according to estimated glomerular filtration rate (eGFR) status along with a cohort with early renal functional decline (ERFD). Results Multivariate logistic regression models revealed that the DR-related SNPs afforded no individual or cumulative genetic effect on the nephropathy risk, eGFR status or ERFD outcome among patients with type 2 diabetes in Taiwan. Conclusion Our findings indicate that larger studies would be necessary to clearly ascertain the effects of individual genetic variants and further investigation is also required to identify other genetic pathways underlying DN.

    Amyloid-Beta (Aβ) D7H Mutation Increases Oligomeric Aβ42 and Alters Properties of Aβ-Zinc/Copper Assemblies

    Amyloid precursor protein (APP) mutations associated with familial Alzheimer's disease (AD) usually lead to increases in amyloid β-protein (Aβ) levels or aggregation. Here, we identified a novel APP mutation, located within the Aβ sequence (AβD7H), in a Taiwanese family with early onset AD and explored the pathogenicity of this mutation. Cellular and biochemical analyses reveal that this mutation increased Aβ production and the Aβ42/40 ratio and prolonged the Aβ42 oligomer state, with higher neurotoxicity. Because the D7H mutant Aβ has an additional metal ion-coordinating residue, histidine, we speculate that this mutation may promote susceptibility of Aβ to metal ions. When co-incubated with Zn2+ or Cu2+, AβD7H aggregated into low molecular weight oligomers. Together, the D7H mutation could contribute to AD pathology through a “double punch” effect on elevating both Aβ production and oligomerization. Although the pathogenic nature of this mutation needs further confirmation, our findings suggest that the Aβ N-terminal region potentially modulates APP processing and Aβ aggregation, and further provide a genetic indication of the importance of Zn2+ and Cu2+ in the etiology of AD.

    Predicting Risks of Dry Eye Disease Development Using a Genome-Wide Polygenic Risk Score Model

    Purpose: The purpose of this study was to conduct a large-scale genome-wide association study (GWAS) and construct a polygenic risk score (PRS) for risk stratification in patients with dry eye disease (DED) using the Taiwan Biobank (TWB) databases. Methods: This retrospective case-control study involved 40,112 subjects of Han Chinese ancestry, sourced from the publicly available TWB. Cases were patients with DED (n = 14,185), and controls were individuals without DED (n = 25,927). The patients with DED were further divided into 8072 young (<60 years old) and 6113 old participants (≥60 years old). Using PLINK (version 1.9) software, quality control was carried out, followed by logistic regression analysis with adjustments for sex, age, body mass index, depression, and manic episodes as covariates. We also built PRS prediction models using the standard clumping and thresholding method and evaluated their performance (area under the curve [AUC]) through five-fold cross-validation. Results: Eleven independent risk loci were identified for these patients with DED at the genome-wide significance levels, including DNAJB6, MAML3, LINC02267, DCHS1, SIRPB3P, HULC, MUC16, GAS2L3, and ZFPM2. Among these, MUC16 encodes mucin family protein. The PRS model incorporated 932 and 740 genetic loci for young and old populations, respectively. A higher PRS score indicated a greater DED risk, with the top 5% of PRS individuals having a 10-fold higher risk. After integrating these covariates into the PRS model, the area under the receiver operating characteristic curve (AUROC) increased from 0.509 and 0.537 to 0.600 and 0.648 for young and old populations, respectively, demonstrating the genetic-environmental interaction. Conclusions: Our study prompts potential candidates for the mechanism of DED and paves the way for more personalized medication in the future. Translational Relevance: Our study identified genes related to DED and constructed a PRS model to improve DED prediction.
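
    The AUC values reported in this abstract measure how well a score separates cases from controls. A minimal sketch of that metric, using its rank-based (Mann–Whitney) definition, is given below. The function name `auc` and the scores are hypothetical, and the study's cross-validation procedure is not reproduced here.

```python
def auc(case_scores, control_scores):
    """Area under the ROC curve as the probability that a randomly chosen case
    scores higher than a randomly chosen control (ties count as one half)."""
    if not case_scores or not control_scores:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for s in case_scores:
        for t in control_scores:
            if s > t:
                wins += 1.0
            elif s == t:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical risk scores for three cases and three controls.
example_auc = auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
```

    An AUC of 0.5 means the score ranks cases and controls no better than chance, which is why values near 0.51 indicate weak discrimination and values near 0.65 a moderate improvement.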

    Comparative analysis of genetic risk scores for predicting biochemical recurrence in prostate cancer patients after radical prostatectomy

    Background In recent years, genome-wide association studies (GWAS) have identified risk variants related to complex diseases, but most genetic variants have only a small impact on phenotypes. To address this problem, methods that can use variants with low genetic effects, such as the genetic risk score (GRS), have been developed to predict disease risk. Methods As the GRS model with the greatest predictive power for complex diseases has not been determined, our study used simulation data and prostate cancer data to explore the disease prediction power of three GRS models: the simple count genetic risk score (SC-GRS), the direct logistic regression genetic risk score (DL-GRS), and the explained variance weighted GRS based on directed logistic regression (EVDL-GRS). Results and Conclusions We used 26 SNPs to establish GRS models to predict the risk of biochemical recurrence (BCR) after radical prostatectomy. Combining clinical variables (age at diagnosis, body mass index, prostate-specific antigen, Gleason score, pathologic T stage, and surgical margin) with GRS models gave better predictive power for BCR. The results of simulation data (statistical power = 0.707) and prostate cancer data (area under curve = 0.8462) show that DL-GRS has the best prediction performance. The SNP rs455192 was the most relevant locus for BCR (p = 2.496 × 10⁻⁶) in our study.
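
    The difference between an unweighted and a weighted genetic risk score can be sketched as follows. The abstract names the models without giving formulas, so this is an assumption-laden illustration: the simple count score adds up risk alleles, and the weighted score multiplies each count by the log odds ratio of its SNP, as one might estimate from logistic regression. The function names, genotypes, and odds ratios are hypothetical, and the EVDL-GRS weighting is not shown.

```python
import math


def simple_count_grs(genotypes):
    """Unweighted score: total number of risk alleles (0, 1 or 2 per SNP)."""
    return sum(genotypes)


def weighted_grs(genotypes, odds_ratios):
    """Weighted score: risk-allele counts multiplied by per-SNP log odds ratios."""
    if len(genotypes) != len(odds_ratios):
        raise ValueError("one odds ratio is required per SNP")
    return sum(g * math.log(o) for g, o in zip(genotypes, odds_ratios))


# Hypothetical individual typed at three SNPs, with made-up odds ratios.
genotypes = [0, 1, 2]
odds_ratios = [1.5, 1.0, 2.0]
count_score = simple_count_grs(genotypes)
log_or_score = weighted_grs(genotypes, odds_ratios)
```

    The count score treats every risk allele alike, whereas the weighted score lets SNPs with larger estimated effects contribute more.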