Detecting disease-associated genotype patterns

Authors: Long Quan
Ott Jurg
Zhang Qingrun
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: In addition to single-locus (main) effects of disease variants, there is a growing consensus that gene-gene and gene-environment interactions may play important roles in disease etiology. However, for the very large numbers of genetic markers currently in use, it has proven difficult to develop suitable and efficient approaches for detecting effects other than the main effects of single variants.
Results: We developed a method for jointly detecting disease-causing single-locus effects and gene-gene interactions. Our method is based on finding differences in genotype pattern frequencies between case and control individuals. The single-nucleotide polymorphism (SNP) markers with the largest single-locus association test statistics are included in a pattern. For a logistic regression model comprising three disease variants exerting main and epistatic interaction effects, we demonstrate that our method is vastly superior to the traditional approach of looking for single-locus effects. In addition, our method is suitable for estimating the number of disease variants in a dataset. We successfully apply our approach to data on Parkinson disease and heroin addiction.
Conclusion: Our approach is suitable and powerful for detecting disease susceptibility variants with potentially small main effects and strong interaction effects. It can be applied to large numbers of genetic markers.

Available from: Crossref; Springer - Publisher Connector; Directory of Open Access Journals; PubMed Central
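
The abstract does not spell out the pattern statistic, so the following is only a minimal sketch of the general idea, under stated assumptions: genotypes are coded 0/1/2, SNPs are ranked by a 2 × 3 genotype chi-square, and the frequencies of multi-locus genotype patterns at the top-ranked SNPs are compared between cases and controls with one Pearson chi-square. The function names (`chi_square`, `single_locus_stat`, `pattern_test`) and the toy data are hypothetical, and the authors' actual statistic may differ.

```python
from collections import Counter


def chi_square(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:  # skip empty columns
                stat += (observed - expected) ** 2 / expected
    return stat


def single_locus_stat(cases, controls, snp):
    """2 x 3 genotype chi-square for one SNP; genotypes are coded 0, 1, 2."""
    table = [[sum(1 for g in group if g[snp] == k) for k in (0, 1, 2)]
             for group in (cases, controls)]
    return chi_square(table)


def pattern_test(cases, controls, k):
    """Hypothetical helper: keep the k SNPs with the largest single-locus
    statistics, then compare genotype-pattern frequencies at those SNPs
    between cases and controls with a 2 x (number of patterns) chi-square."""
    n_snps = len(cases[0])
    ranked = sorted(range(n_snps),
                    key=lambda s: single_locus_stat(cases, controls, s),
                    reverse=True)
    top = sorted(ranked[:k])
    case_counts = Counter(tuple(g[s] for s in top) for g in cases)
    control_counts = Counter(tuple(g[s] for s in top) for g in controls)
    patterns = sorted(set(case_counts) | set(control_counts))
    table = [[case_counts[p] for p in patterns],
             [control_counts[p] for p in patterns]]
    return top, chi_square(table)


# Toy data: each individual is a tuple of genotypes at three SNPs.
cases = [(2, 0, 1), (2, 1, 0), (2, 0, 0), (1, 0, 1)]
controls = [(0, 0, 1), (0, 1, 0), (0, 0, 0), (1, 1, 1)]
top_snps, stat = pattern_test(cases, controls, k=2)
```

With `k=2` the toy data keep SNPs 0 and 1. Because the SNPs are chosen on the same data that are then tested, a raw chi-square p-value for the pattern would be optimistic; permuting case/control labels is one common way to calibrate it, though that is an assumption rather than a detail given in the abstract.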

Stochastic Gradient Descent in the Viewpoint of Graduated Optimization

Authors: Li Da
Wu Jingjing
Zhang Qingrun
Publication date: 13/08/2023

The stochastic gradient descent (SGD) method is popular for solving non-convex optimization problems in machine learning. This work investigates SGD from the viewpoint of graduated optimization, a widely applied approach to non-convex optimization. Instead of solving the actual optimization problem directly, the graduated optimization approach solves a series of smoothed optimization problems, which can be constructed in various ways. In this work, a formal formulation of graduated optimization is provided based on the nonnegative approximate identity, which generalizes the idea of Gaussian smoothing. An asymptotic convergence result is also established using techniques from variational analysis. We then show that the traditional SGD method can be applied to solve the smoothed optimization problem. Monte Carlo integration is used to estimate the gradient of the smoothed problem, which may fit naturally with distributed computing schemes in real-life applications. Given the assumptions on the actual optimization problem, the convergence results of SGD for the smoothed problem can be derived straightforwardly. Numerical examples provide evidence that the graduated optimization approach may yield more accurate training results in certain cases.
Comment: 23 pages, 4 figures

Available from: arXiv.org e-Print Archive
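
As a rough illustration only, the sketch below runs SGD on a sequence of Gaussian-smoothed versions of a one-dimensional toy objective, estimating each smoothed gradient by Monte Carlo sampling. Gaussian smoothing is the special case that the paper's nonnegative approximate identity generalizes; the toy function, the schedule of smoothing widths `sigmas`, the fixed learning rate, and all function names are assumptions, not the paper's setup.

```python
import math
import random


def grad_f(x):
    # Toy non-convex objective (an assumption): f(x) = x**2 + 2*sin(5x),
    # which has many local minima; this returns f'(x).
    return 2.0 * x + 10.0 * math.cos(5.0 * x)


def smoothed_grad(x, sigma, n_samples, rng):
    # Monte Carlo estimate of the gradient of the Gaussian-smoothed objective
    # f_sigma(x) = E[f(x + u)], u ~ N(0, sigma^2), computed as E[f'(x + u)].
    return sum(grad_f(x + rng.gauss(0.0, sigma)) for _ in range(n_samples)) / n_samples


def graduated_sgd(x0, sigmas, steps_per_level=200, lr=0.01, n_samples=8, seed=0):
    # Solve a sequence of smoothed problems from coarse (large sigma) to fine
    # (sigma = 0, the original problem), warm-starting each from the last.
    rng = random.Random(seed)
    x = x0
    for sigma in sigmas:
        for _ in range(steps_per_level):
            x -= lr * smoothed_grad(x, sigma, n_samples, rng)
    return x


x_graduated = graduated_sgd(3.0, sigmas=[1.0, 0.5, 0.25, 0.0])
x_plain = graduated_sgd(3.0, sigmas=[0.0])  # ordinary gradient descent, no smoothing
```

On this toy function, starting from x = 3, plain gradient descent stops in the local minimum near x ≈ 3.31, while the coarse-to-fine schedule ends near the global minimum at x ≈ -0.30. The Monte Carlo average in `smoothed_grad` is embarrassingly parallel, which is one way to read the abstract's remark about distributed computing.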

Universal primers for HBV genome DNA amplification across subtypes: a case study for designing more effective viral primers

Authors: Jia Shan'gang
Richards Elliott
Wu Guanghua
Zeng Changqing
Zhang Qingrun
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: The high heterogeneity of viral genomes is the major obstacle to efficient DNA amplification. Taking advantage of the large number of viral DNA sequences in public databases to select conserved sites for primer design is an effective way to overcome the difficulties of viral genome amplification.
Results: Here we use hepatitis B virus (HBV) as an example to introduce a simple and efficient approach to viral primer design. Based on an alignment of HBV sequences from public databases and BxB, a program written in Perl, our method selected several optimal sites for HBV primer design. In polymerase chain reaction (PCR) experiments, one set of primers for full-length genome amplification and four sets of walking primers showed significantly higher success rates than the most popular primers for whole-genome amplification of HBV. These newly designed primers are suitable for most HBV subtypes.
Conclusion: Researchers can extend the method described here to design universal or subtype-specific primers for various types of viruses. The BxB program, which is based on multiple sequence alignment, can be used not only as a standalone tool but also integrated into any open-source primer design software to select conserved regions for primer design.

Available from: Crossref; Springer - Publisher Connector; Directory of Open Access Journals; PubMed Central
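
A minimal sketch of conserved-site selection from a multiple sequence alignment follows, assuming the sequences are already aligned to equal length with '-' marking gaps. It is not the BxB program, whose scoring rule is not described in the abstract: here each candidate window is scored by its least-conserved column, which is an assumption, and the function names are hypothetical. Practical primer design also filters on melting temperature, GC content, and secondary structure, which the sketch ignores.

```python
from collections import Counter


def column_conservation(alignment):
    """Per-column fraction of sequences carrying the most common base.
    Gaps ('-') count against conservation."""
    n_seqs = len(alignment)
    scores = []
    for column in zip(*alignment):
        counts = Counter(base.upper() for base in column if base != "-")
        scores.append(max(counts.values()) / n_seqs if counts else 0.0)
    return scores


def consensus(alignment, start, length):
    """Most common non-gap base at each column of a window."""
    bases = []
    for column in list(zip(*alignment))[start:start + length]:
        counts = Counter(base.upper() for base in column if base != "-")
        bases.append(counts.most_common(1)[0][0] if counts else "N")
    return "".join(bases)


def conserved_windows(alignment, primer_len, top=3):
    """Rank candidate primer sites by the least-conserved column in each
    window (ties broken by mean conservation, then by position)."""
    scores = column_conservation(alignment)
    windows = []
    for start in range(len(scores) - primer_len + 1):
        window = scores[start:start + primer_len]
        windows.append((min(window), sum(window) / primer_len, start))
    windows.sort(key=lambda w: (-w[0], -w[1], w[2]))
    return [(start, worst, mean, consensus(alignment, start, primer_len))
            for worst, mean, start in windows[:top]]


aln = ["ACGTACGTAA",
       "ACGTACGAAA",
       "ACGTTCGTAC",
       "ACGTACGTAG"]
best = conserved_windows(aln, primer_len=4, top=2)
```

Scoring by the worst column rather than the mean reflects that a single variable position inside a primer can be enough to break amplification for the subtypes carrying it; whether BxB weighs positions this way is not stated in the abstract.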

Stabilized COre Gene and Pathway Election Uncovers Pan-Cancer Shared Pathways and a Cancer-Specific Driver

Authors: Cai Weijia
Kossinna Pathum
Lu Xuewen
Shemanko Carrie S
Zhang Qingrun
Publication venue: Jefferson Digital Commons
Publication date: 21/12/2022

Approaches that systematically characterize interactions using transcriptomic data usually follow one of two strategies: (i) coexpression network analyses, which focus on correlations between genes, and (ii) linear regressions (usually regularized), which select multiple genes jointly. Both suffer from instability: a slight change in parameterization or dataset can lead to marked changes in the outcomes. Here, we propose Stabilized COre gene and Pathway Election (SCOPE), a tool that integrates the bootstrapped least absolute shrinkage and selection operator (LASSO) with coexpression analysis, yielding robust outcomes that are insensitive to variations in the data. By applying SCOPE to six cancer expression datasets (BRCA, COAD, KIRC, LUAD, PRAD, and THCA) from The Cancer Genome Atlas, we identified core genes capturing interaction effects in crucial pan-cancer pathways related to genome instability and the DNA damage response. Moreover, we highlighted the pivotal role of CD63 as an oncogenic driver and a potential therapeutic target in kidney cancer. SCOPE enables stable investigation of complex interactions using transcriptome data.

Available from: PubMed Central; Jefferson Digital Commons
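
The bootstrapped-LASSO half of this idea can be sketched as follows: refit a LASSO on many bootstrap resamples and record how often each feature (gene) receives a nonzero coefficient, treating frequently selected features as stable. This is a minimal pure-Python sketch under assumptions: the penalty `lam`, the number of resamples, the plain coordinate-descent solver, the synthetic data, and the function names are all illustrative, and SCOPE's coexpression component is not reproduced.

```python
import random


def soft_threshold(z, gamma):
    """Soft-thresholding operator used by coordinate-descent LASSO."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def center(X, y):
    """Subtract column means from X and the mean from y (so no intercept is needed)."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    return [[row[j] - means[j] for j in range(p)] for row in X], [v - y_mean for v in y]


def lasso_cd(X, y, lam, n_iter=50):
    """Minimize (1/2n)||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.
    X and y are assumed to be centered."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta for the current beta (all zeros)
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


def selection_frequency(X, y, lam, n_boot=50, seed=0):
    """Fraction of bootstrap resamples in which each feature gets a nonzero LASSO coefficient."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample rows with replacement
        Xb, yb = center([X[i] for i in idx], [y[i] for i in idx])
        for j, b in enumerate(lasso_cd(Xb, yb, lam)):
            if b != 0.0:
                counts[j] += 1
    return [c / n_boot for c in counts]


# Synthetic example: only feature 0 drives the response.
gen = random.Random(1)
X = [[gen.gauss(0.0, 1.0) for _ in range(3)] for _ in range(60)]
y = [3.0 * row[0] + 0.1 * gen.gauss(0.0, 1.0) for row in X]
freqs = selection_frequency(X, y, lam=0.5)
```

In this synthetic example feature 0 is selected in every resample while the two noise features essentially never are. With real expression data, a cutoff on the selection frequency would define the stable set; the cutoff SCOPE actually uses is not stated in the abstract.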

Editorial: Statistical methods for genome-wide association studies (GWAS) and transcriptome-wide association studies (TWAS) and their applications

Authors: Chen Cao
Huiyan Sun
Jingni He
Juexin Wang
Mengting Shao
Qingrun Zhang
Zilong Zhang
Publication venue: Frontiers Media S.A.
Publication date: 01/09/2023

Available from: Directory of Open Access Journals

Comprehensive Analysis of CRP, CFH Y402H and Environmental Risk Factors on Risk of Neovascular Age-Related Macular Degeneration

Authors: Adams Scott
Capone Antonio
DeAngelis Margaret
Dryja Thaddeus Peter
Ji Fei
Kim Ivana Kyung
Lane Anne Marie
Miller Joan Whitten
Morrison Margaux A.
Ott Jurg
Zhang Qingrun
Publication venue: Molecular Vision
Publication date: 22/02/2011

Purpose: To examine whether the gene encoding C-reactive protein (CRP), a biomarker of inflammation, confers risk for neovascular age-related macular degeneration (AMD) in the presence of other modifiers of inflammation, including body mass index (BMI), diabetes, smoking, and complement factor H (CFH) Y402 genotype. Additionally, we examined the degree to which common CRP variation was in linkage disequilibrium (LD) within our cohort.
Methods: We ascertained 244 individuals from 104 families in which at least one member had neovascular AMD and a sibling had normal maculae and was older than the index patient's age at diagnosis of neovascular AMD. We employed direct sequencing to analyze the 5′ promoter region, the entire coding region, and the 3′ untranslated region of the CRP gene. CFH Y402 genotype data were available for all participants. Lifestyle and medical factors were obtained through a standardized questionnaire. The family-based association test, haplotype analysis, McNemar's test, and conditional logistic regression were used to determine significant associations and interactions. Haploview was used to calculate the degree of LD (r²) between all identified CRP variants.
Results: Six single nucleotide polymorphisms (SNPs; rs3091244, rs1417938, rs1800947, rs1130864, rs1205, and rs3093068) comprised one haplotype block, within which only rs1130864 and rs1417938 were in high LD (r² = 0.94). SNP rs3093068 was also in LD, though less strongly, with rs3093059 (r² = 0.83), which is not part of the haplotype block. These six SNPs formed six distinct haplotypes with frequency ≥ 5%, none of which was significantly associated with AMD risk. No statistically significant association was detected between any of the nine common variants in CRP and neovascular AMD, either when considering disease status alone or when controlling for smoking exposure, BMI, diabetes, or CFH genotype. No significant interactions were found between CRP genotypes and any of the risk factors studied. No novel CRP variation was identified.
Conclusions: We provide evidence that if elevated serum/plasma levels of CRP are associated with neovascular AMD, this is likely not due to genetic variation within CRP but rather to other genetic and epidemiological factors.

Available from: Harvard University - DASH
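
The r² values reported above use the standard two-locus LD measure r² = D² / (p_A(1 − p_A) p_B(1 − p_B)), where D = p_AB − p_A p_B. A minimal sketch, assuming phased haplotypes are already available: a tool working from unphased genotypes would first have to estimate two-locus haplotype frequencies, a step skipped here. The function name `ld_r2` is hypothetical.

```python
def ld_r2(haplotypes):
    """Squared correlation r^2 between two biallelic loci.

    `haplotypes` is a list of phased (allele_at_locus_1, allele_at_locus_2)
    pairs with alleles coded 0/1.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n       # frequency of allele 1 at locus 1
    p_b = sum(b for _, b in haplotypes) / n       # frequency of allele 1 at locus 2
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                          # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom if denom > 0 else 0.0    # undefined for monomorphic loci; return 0


print(ld_r2([(1, 1), (0, 0), (1, 1), (0, 0)]))  # complete LD: 1.0
print(ld_r2([(1, 1), (1, 0), (0, 1), (0, 0)]))  # independent loci: 0.0
```

An r² of 0.94, as reported for rs1130864 and rs1417938, means one SNP's allele is an almost perfect proxy for the other's, which is why highly correlated SNPs are often represented by a single tag SNP.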

The Structural Characterization and Antigenicity of the S Protein of SARS-CoV

Authors: Bi Shengli
Deng Yajun
Dong Wei
Han Yujun
Ji Jia
Jiang Fanbo
Li Hongyan
Li Jingxiang
Li Shengbin
Li Wei
Li Yudong
Luo Chunqing
Tang Lin
Tong Wei
Wang Jian
Wang Jing
Wei Wei
Xu Zhao
Yang Huanming
Ye Jia
Zhang Qingrun
Publication venue: Beijing Institute of Genomics, the Chinese Academy of Sciences and the Genetics Society of China. Production and hosting by Elsevier B.V.
Publication date: 31/05/2003

The corona-like spikes, or peplomers, seen on the surface of the virion under the electron microscope are the most striking feature of coronaviruses. The S (spike) protein, with 1,255 amino acids, is the largest structural protein encoded by the viral genome. Its structure can be divided into three regions: a long N-terminal region on the exterior, a characteristic transmembrane (TM) region, and a short C-terminus in the interior of the virion. By comparison with the seventeen published SARS-CoV genome sequences, we detected fifteen nucleotide substitutions, eight (53.3%) of which are non-synonymous mutations leading to amino acid alterations with predicted physicochemical changes. The possible antigenic determinants of the S protein are predicted, and the result is confirmed by ELISA (enzyme-linked immunosorbent assay) with synthesized peptides. Another notable finding is that, based on the typical sequences and positions, three disulfide bonds are predicted between the C-terminus of the S protein and the N-terminus of the E (envelope) protein; if confirmed, this would establish a structural connection between these two important structural proteins. Phylogenetic analysis reveals several conserved regions that might be potential drug targets.

Available from: Elsevier - Publisher Connector
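
Classifying a substitution as synonymous or non-synonymous amounts to translating the reference codon and the mutated codon and checking whether the amino acid changes. A minimal sketch using the standard genetic code, assuming in-frame coding sequences containing only A/C/G/T and evaluating each substitution on its own against the reference codon; it does not reproduce the paper's analysis or its physicochemical predictions, and the function name is hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_substitutions(ref, alt):
    """Label each single-nucleotide difference between two in-frame coding
    sequences (A/C/G/T only) as synonymous or non-synonymous. Each change is
    evaluated on its own against the reference codon."""
    ref, alt = ref.upper(), alt.upper()
    results = []
    for pos, (r, a) in enumerate(zip(ref, alt)):
        start = pos - pos % 3
        if r == a or start + 3 > len(ref):
            continue  # identical base, or incomplete trailing codon
        ref_codon = ref[start:start + 3]
        offset = pos - start
        mut_codon = ref_codon[:offset] + a + ref_codon[offset + 1:]
        ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[mut_codon]
        kind = "synonymous" if ref_aa == alt_aa else "non-synonymous"
        results.append((pos, ref_aa, alt_aa, kind))
    return results


for row in classify_substitutions("ATGTTTCTG", "ATGTTCCCG"):
    print(row)
```

The non-synonymous fraction is the number of non-synonymous rows divided by the total; with the abstract's counts, 8 of 15 substitutions gives 8/15 ≈ 53.3%.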

PoolHap: Inferring Haplotype Frequencies from Pooled Samples by Next Generation Sequencing

Authors: A Fagotti
Chris Tyler-Smith
Daniel C. Jeffares
E Nanak
H Jiang
H Li
H Li
HP Liu
IP Gorlov
J Supabandhu
K Khrapko
Kai Ye
LE Fuhrman
M Al-Hajj
M Stephens
Magnus Nordborg
P Medvedev
P Navas
Qingrun Zhang
Quan Long
Thomas Mailund
TL Turner
Viktoria Nizhynska
Zemin Ning
Publication venue: Public Library of Science
Publication date: 01/01/2011

With the advance of next-generation sequencing (NGS) technologies, increasingly ambitious applications are becoming feasible. A particularly powerful one is the sequencing of polymorphic, pooled samples. The pool can be naturally occurring, as in the case of multiple pathogen strains in a blood sample, multiple types of cells in a cancerous tissue sample, or multiple isoforms of mRNA in a cell. In these cases, it is difficult or impossible to partition the subtypes experimentally before sequencing, so the subtype frequencies must be inferred. In addition, investigators may occasionally want to artificially pool samples from a large number of individuals for cost efficiency, e.g., when carrying out genetic mapping using bulked segregant analysis. Here we describe PoolHap, a computational tool for inferring haplotype frequencies from pooled samples when the haplotypes are known. The key insight into why PoolHap works is that the large number of SNPs that come with genome-wide coverage can compensate for uneven coverage across the genome. The performance of PoolHap is illustrated and discussed using simulated and real data. We show that PoolHap can estimate haplotype proportions with less than 2% error for 34-strain mixtures at 2X total coverage, using Arabidopsis thaliana whole-genome polymorphism data. This method should facilitate greater biological insight into heterogeneous samples that are difficult or impossible to isolate experimentally. Software and a user manual are freely available at http://arabidopsis.gmi.oeaw.ac.at/quan/poolhap/

Available from: Public Library of Science (PLOS); Crossref; Directory of Open Access Journals; UCL Discovery; PubMed Central
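
For the setting in the abstract, where the haplotypes are known and only their proportions in the pool are unknown, one simple estimator is an EM algorithm over pooled read counts: each read showing allele x at a SNP is split among the known haplotypes carrying x, in proportion to their current frequencies, and the frequencies are then renormalized. This is a minimal sketch under stated assumptions (error-free reads, biallelic SNPs coded 0/1, hypothetical function and argument names); it is not necessarily the algorithm PoolHap implements.

```python
def estimate_haplotype_freqs(haplotypes, alt_counts, depths, n_iter=500):
    """EM estimate of the mixing proportions of known haplotypes in a pool.

    haplotypes[h][s] is the allele (0 or 1) of haplotype h at SNP s;
    alt_counts[s] and depths[s] are the pooled read counts at SNP s.
    """
    k = len(haplotypes)
    p = [1.0 / k] * k                      # start from equal proportions
    for _ in range(n_iter):
        expected = [0.0] * k               # expected reads attributed to each haplotype
        total = 0
        for s, (alt, depth) in enumerate(zip(alt_counts, depths)):
            for allele, count in ((1, alt), (0, depth - alt)):
                carriers = [h for h in range(k) if haplotypes[h][s] == allele]
                mass = sum(p[h] for h in carriers)
                if count == 0 or mass == 0.0:
                    continue               # no reads, or allele carried by no known haplotype
                for h in carriers:         # E-step: split reads among carriers
                    expected[h] += count * p[h] / mass
                total += count
        p = [e / total for e in expected]  # M-step: renormalize
    return p


haps = [[1, 0, 0, 1],
        [0, 1, 0, 1],
        [0, 0, 1, 0]]
est = estimate_haplotype_freqs(haps, alt_counts=[50, 30, 20, 80], depths=[100] * 4)
print([round(x, 3) for x in est])
```

The counts in this toy example were generated from proportions (0.5, 0.3, 0.2) without noise, and the estimate converges to those values. Because each SNP contributes its own reads, adding more SNPs adds more constraints on the proportions, which is in the spirit of the abstract's point that many SNPs can compensate for low, uneven coverage at any one site.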