79 research outputs found
Using Image Recognition to Process Unbalanced Data in Genetic Diseases From Biobanks
With precision medicine as the goal, the human biobank of each country should be analyzed to determine the complete research results related to genetic diseases. In addition, with the increase in medical imaging data, automatic image processing with image recognition has been widely studied and applied in biomedicine. However, case–control data imbalance often occurs in human biobanks and is usually addressed with the statistical method SAIGE. Because of the huge amount of genetic data in human biobanks, direct use of SAIGE often runs into insufficient computer memory and excessive calculation time. An alternative is to rebalance the case–control ratio by sampling, using the Synthetic Minority Oversampling Technique (SMOTE). Our study used Manhattan plots and genetic disease information from the Taiwan Biobank and adjusted the imbalance in the case–control ratio by SMOTE, an approach called “TW-SMOTE.” We further used a deep learning image recognition system to identify the TW-SMOTE. We found that TW-SMOTE can achieve the same results as SAIGE and the UK Biobank (UKB). The processed data can be equivalent to data plots from the relatively large UKB sample and achieve the same effect as SAIGE in addressing data imbalance.
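The oversampling step named in the abstract can be illustrated with a minimal sketch of the general SMOTE idea: each synthetic minority sample is placed at a random point on the line between a minority sample and one of its nearest minority neighbours. This is not the TW-SMOTE pipeline, whose details the abstract does not give; the function name `smote`, the parameter `k`, and the toy two-dimensional data are assumptions made for illustration.

```python
import random

def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples by linear interpolation.

    minority: list of equal-length numeric tuples (at least two samples).
    k: number of nearest minority neighbours to choose from (assumed value).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        # Pick a random point on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic

# Toy minority class (hypothetical feature vectors for "cases").
cases = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_cases = smote(cases, n_new=6)
```

Because every synthetic point is an interpolation of two real minority samples, it stays inside the region spanned by the original cases, which is the property that distinguishes SMOTE from simple duplication.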
Determining Population Stratification and Subgroup effects in Association Studies of Rare Genetic Variants for Nicotine Dependence
Background Rare variants (minor allele frequency < 1% or 5%) can help researchers to deal with the confounding issue of ‘missing heritability’ and have a proven role in dissecting the etiology for human diseases and complex traits.
Methods We extended the combined multivariate and collapsing (CMC) and weighted sum statistic (WSS) methods and accounted for the effects of population stratification and subgroup effects using stratified analyses by the principal component analysis, named here as ‘str-CMC’ and ‘str-WSS’. To evaluate the validity of the extended methods, we analyzed the Genetic Architecture of Smoking and Smoking Cessation database, which includes African Americans and European Americans genotyped on Illumina Human Omni2.5, and we compared the results with those obtained with the sequence kernel association test (SKAT) and its modification, SKAT-O that included population stratification and subgroup effect as covariates. We utilized the Cochran–Mantel–Haenszel test to check for possible differences in single nucleotide polymorphism allele frequency between subgroups within a gene. We aimed to detect rare variants and considered population stratification and subgroup effects in the genomic region containing 39 acetylcholine receptor-related genes.
Results The Cochran–Mantel–Haenszel test as applied to GABRG2 (P = 0.001) was significant. However, GABRG2 was detected both by str-CMC (P = 8.04E-06) and str-WSS (P = 0.046) in African Americans but not by SKAT or SKAT-O.
Conclusions
Our results imply that if associated rare variants are only specific to a subgroup, a stratified analysis might be a better approach than a combined analysis.
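The Cochran–Mantel–Haenszel test mentioned in the Methods can be sketched as follows. The sketch uses the textbook form of the statistic for a set of 2×2 tables with one degree of freedom; the table layout (one table per stratum, given as `(a, b, c, d)`), the continuity correction default, and the toy counts are assumptions, not the authors' actual setup.

```python
import math

def cmh_test(tables, correction=True):
    """Cochran-Mantel-Haenszel test over K strata of 2x2 tables.

    tables: iterable of (a, b, c, d), the cells of [[a, b], [c, d]].
    Returns (chi-square statistic, p-value) with 1 degree of freedom.
    """
    diff_sum = 0.0
    var_sum = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        # Observed minus expected count in the top-left cell.
        diff_sum += a - (a + b) * (a + c) / n
        # Hypergeometric variance of the top-left cell.
        var_sum += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
    diff = abs(diff_sum)
    if correction:
        diff = max(diff - 0.5, 0.0)
    stat = diff ** 2 / var_sum
    # Survival function of chi-square with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical allele counts in two strata.
null_stat, null_p = cmh_test([(10, 10, 10, 10), (10, 10, 10, 10)])
assoc_stat, assoc_p = cmh_test([(20, 5, 5, 20), (20, 5, 5, 20)])
```

Pooling the per-stratum differences before squaring is what lets the test detect an association that points in the same direction across strata while still adjusting for the stratifying variable.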
Modeling expression quantitative trait loci in data combining ethnic populations
Background
Combining data from different ethnic populations in a study can increase efficacy of methods designed to identify expression quantitative trait loci (eQTL) compared to analyzing each population independently. In such studies, however, the genetic diversity of minor allele frequencies among populations has rarely been taken into account. Because allele frequency diversity and population-level expression differences are present in populations, no consensus has been reached on the optimal statistical approach for analysis of eQTL in data combining different populations.
Results
In this report, we explored the applicability of a constrained two-way model to identify eQTL for combined ethnic data that might contain genetic diversity among ethnic populations. In addition, gene expression differences resulting from ethnic allele frequency diversity between populations were directly estimated and analyzed by the constrained two-way model. Through simulation, we investigated effects of genetic diversity on eQTL identification by examining gene expression data pooled from normal quantile transformation of each population. Using the constrained two-way model to reanalyze data from Caucasian and Asian individuals available from HapMap, a large number of eQTL were identified with similar genetic effects on the gene expression levels in these two populations. Furthermore, 19 single nucleotide polymorphisms with inter-population differences with respect to both genotype frequency and gene expression levels directed by genotypes were identified and reflected a clear distinction between Caucasian and Asian individuals.
Conclusions
This study illustrates the influence of minor allele frequencies on common eQTL identification using either separate or combined population data. Our findings are important for future eQTL studies in which different datasets are combined to increase the power of eQTL identification.
Phenome-wide analysis of Taiwan Biobank reveals novel glycemia-related loci and genetic risks for diabetes
To explore the complex genetic architecture of common diseases and traits, we conducted comprehensive PheWAS of ten diseases and 34 quantitative traits in the community-based Taiwan Biobank (TWB). We identified 995 significantly associated loci with 135 novel loci specific to Taiwanese population. Further analyses highlighted the genetic pleiotropy of loci related to complex disease and associated quantitative traits. Extensive analysis on glycaemic phenotypes (T2D, fasting glucose and Hb
Lack of association of genetic variants for diabetic retinopathy in Taiwanese patients with diabetic nephropathy
Objective Diabetic nephropathy (DN) and diabetic retinopathy (DR) comprise major microvascular complications of diabetes that occur with a high concordance rate in patients and are considered to potentially share pathogeneses. In this case-control study, we sought to investigate whether DR-related single nucleotide polymorphisms (SNPs) exert pleiotropic effects on renal function outcomes among patients with diabetes.
Research design and methods A total of 33 DR-related SNPs were identified by replicating published SNPs and via a genome-wide association study. Furthermore, we assessed the cumulative effects by creating a weighted genetic risk score and evaluated the discriminatory and prediction ability of these genetic variants using DN cases according to estimated glomerular filtration rate (eGFR) status along with a cohort with early renal functional decline (ERFD).
Results Multivariate logistic regression models revealed that the DR-related SNPs afforded no individual or cumulative genetic effect on the nephropathy risk, eGFR status or ERFD outcome among patients with type two diabetes in Taiwan.
Conclusion Our findings indicate that larger studies would be necessary to clearly ascertain the effects of individual genetic variants and further investigation is also required to identify other genetic pathways underlying DN.
Amyloid-Beta (Aβ) D7H Mutation Increases Oligomeric Aβ42 and Alters Properties of Aβ-Zinc/Copper Assemblies
Amyloid precursor protein (APP) mutations associated with familial Alzheimer's disease (AD) usually lead to increases in amyloid β-protein (Aβ) levels or aggregation. Here, we identified a novel APP mutation, located within the Aβ sequence (AβD7H), in a Taiwanese family with early onset AD and explored the pathogenicity of this mutation. Cellular and biochemical analyses reveal that this mutation increased Aβ production and the Aβ42/40 ratio and prolonged the Aβ42 oligomer state, with higher neurotoxicity. Because the D7H mutant Aβ has an additional metal ion-coordinating residue, histidine, we speculate that this mutation may promote susceptibility of Aβ to metal ions. When co-incubated with Zn2+ or Cu2+, AβD7H aggregated into low molecular weight oligomers. Together, the D7H mutation could contribute to AD pathology through a “double punch” effect on elevating both Aβ production and oligomerization. Although the pathogenic nature of this mutation needs further confirmation, our findings suggest that the Aβ N-terminal region potentially modulates APP processing and Aβ aggregation, and further provide a genetic indication of the importance of Zn2+ and Cu2+ in the etiology of AD.
Predicting Risks of Dry Eye Disease Development Using a Genome-Wide Polygenic Risk Score Model
Purpose: The purpose of this study was to conduct a large-scale genome-wide association study (GWAS) and construct a polygenic risk score (PRS) for risk stratification in patients with dry eye disease (DED) using the Taiwan Biobank (TWB) databases.
Methods: This retrospective case-control study involved 40,112 subjects of Han Chinese ancestry, sourced from the publicly available TWB. Cases were patients with DED (n = 14,185), and controls were individuals without DED (n = 25,927). The patients with DED were further divided into 8072 young (<60 years old) and 6113 old participants (≥60 years old). Using PLINK (version 1.9) software, quality control was carried out, followed by logistic regression analysis with adjustments for sex, age, body mass index, depression, and manic episodes as covariates. We also built PRS prediction models using the standard clumping and thresholding method and evaluated their performance (area under the curve [AUC]) through five-fold cross-validation.
Results: Eleven independent risk loci were identified for these patients with DED at the genome-wide significance levels, including DNAJB6, MAML3, LINC02267, DCHS1, SIRPB3P, HULC, MUC16, GAS2L3, and ZFPM2. Among these, MUC16 encodes mucin family protein. The PRS model incorporated 932 and 740 genetic loci for young and old populations, respectively. A higher PRS score indicated a greater DED risk, with the top 5% of PRS individuals having a 10-fold higher risk. After integrating these covariates into the PRS model, the area under the receiver operating curve (AUROC) increased from 0.509 and 0.537 to 0.600 and 0.648 for young and old populations, respectively, demonstrating the genetic-environmental interaction.
Conclusions: Our study prompts potential candidates for the mechanism of DED and paves the way for more personalized medication in the future.
Translational Relevance: Our study identified genes related to DED and constructed a PRS model to improve DED prediction.
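The thresholding half of the clumping and thresholding method named in the Methods can be sketched as below: keep only variants whose association p-value passes a cutoff, then sum effect size times allele dosage over the kept variants. This is a simplified illustration, not the study's PLINK workflow; the clumping step is omitted (the input is assumed to be already clumped), and the SNP labels, effect sizes, and thresholds are made-up values.

```python
def polygenic_risk_score(dosages, effects, p_threshold=5e-8):
    """Sum beta * dosage over variants with p-value <= p_threshold.

    dosages: {snp_id: effect-allele count (0, 1 or 2)} for one individual.
    effects: {snp_id: (beta, p_value)} from GWAS summary statistics,
             assumed here to be already clumped for linkage disequilibrium.
    """
    return sum(
        beta * dosages.get(snp, 0)
        for snp, (beta, p_value) in effects.items()
        if p_value <= p_threshold
    )

# Hypothetical summary statistics and one individual's genotypes.
effects = {"snp1": (0.2, 1e-9), "snp2": (-0.1, 1e-3), "snp3": (0.5, 1e-8)}
dosages = {"snp1": 2, "snp2": 1, "snp3": 1}

strict_score = polygenic_risk_score(dosages, effects)
lenient_score = polygenic_risk_score(dosages, effects, p_threshold=1e-2)
```

Loosening the threshold admits more variants into the score, which is why such models are usually compared across several thresholds by cross-validated AUC, as the abstract describes.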
Comparative analysis of genetic risk scores for predicting biochemical recurrence in prostate cancer patients after radical prostatectomy
Background
In recent years, genome-wide association studies (GWAS) have identified risk variants related to complex diseases, but most genetic variants have only a small impact on phenotypes. To address this problem, methods that can use variants with low genetic effects, such as the genetic risk score (GRS), have been developed to predict disease risk.
Methods
As the GRS model with the greatest predictive power for complex diseases has not been determined, our study used simulation data and prostate cancer data to explore the disease prediction power of three GRS models: the simple count genetic risk score (SC-GRS), the direct logistic regression genetic risk score (DL-GRS), and the explained variance weighted GRS based on directed logistic regression (EVDL-GRS).
Results and Conclusions
We used 26 SNPs to establish GRS models to predict the risk of biochemical recurrence (BCR) after radical prostatectomy. Combining the GRS models with clinical variables such as age at diagnosis, body mass index, prostate-specific antigen, Gleason score, pathologic T stage, and surgical margin gave better predictive power for BCR. The results of simulation data (statistical power = 0.707) and prostate cancer data (area under curve = 0.8462) show that DL-GRS has the best prediction performance. The SNP rs455192 was the most relevant locus for BCR (p = 2.496 × 10⁻⁶) in our study.
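The difference between the first two GRS models can be shown with a small sketch: SC-GRS counts risk alleles with equal weight, while DL-GRS is taken here to weight each count by a per-SNP logistic regression coefficient (log odds ratio). That reading of DL-GRS, and all the numbers below, are assumptions for illustration; EVDL-GRS is left out because the abstract does not give its weighting formula.

```python
def sc_grs(risk_allele_counts):
    """Simple count GRS: unweighted sum of risk-allele counts."""
    return sum(risk_allele_counts)

def dl_grs(risk_allele_counts, log_odds_ratios):
    """Weighted GRS: each count is multiplied by its log odds ratio
    (assumed to come from a per-SNP logistic regression)."""
    return sum(c * w for c, w in zip(risk_allele_counts, log_odds_ratios))

# Hypothetical patient with three SNPs and made-up weights.
counts = [2, 0, 1]
weights = [0.10, 0.40, 0.25]

simple_score = sc_grs(counts)
weighted_score = dl_grs(counts, weights)
```

Two patients with the same simple count can receive different weighted scores when their risk alleles sit at SNPs with different effect sizes, which is one plausible reason a weighted model could predict better.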
