127 research outputs found

Relative effects of mutability and selection on single nucleotide polymorphisms in transcribed regions of the human genome

Author: Amos Christopher I
Gorlov Ivan P
Gorlova Olga Y
Publication venue: BioMed Central
Publication date: 01/01/2008

Motivation: Single nucleotide polymorphisms (SNPs) are the most common type of genetic variation in humans. However, the factors that affect SNP density are poorly understood. The goal of this study was to estimate the relative effects of mutability and selection on SNP density in transcribed regions of human genes. This is important for predicting the regions that harbor functional polymorphisms. Results: We used frequency-validated SNPs resulting from single-nucleotide substitutions. SNPs were subdivided into five functional categories: (i) 5' untranslated region (UTR) SNPs, (ii) 3' UTR SNPs, (iii) synonymous SNPs, (iv) SNPs producing conservative missense mutations, and (v) SNPs producing radical missense mutations. Each of these categories was further subdivided into nine mutational categories on the basis of the single-nucleotide substitution type. Thus, 45 functional/mutational categories were analyzed. The relative mutation rate in each mutational category was estimated on the basis of published data. The proportion of segregating sites (PSS) for each functional/mutational category was estimated by dividing the observed number of SNPs by the number of potential sites in the genome for a given functional/mutational category. By analyzing each functional group separately, we found significant positive correlations between PSSs and relative mutation rates (Spearman's correlation coefficient, at least r = 0.96, df = 9, P < 0.001). We adjusted the PSSs for the mutation rate and found that the functional category had a significant effect on SNP density (F = 5.9, df = 4, P = 0.001), suggesting that selection affects SNP density in transcribed regions of the genome. We used analyses of variance and covariance to estimate the relative effects of selection (functional category) and mutability (relative mutation rate) on the PSSs and found that approximately 87% of variation in PSS was due to variation in the mutation rate and approximately 13% was due to selection, suggesting that the probability that a site located in a transcribed region of a gene is polymorphic mostly depends on the mutability of the site.
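The PSS calculation and the rank correlation described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and all numbers are hypothetical, and only the definition (observed SNPs divided by potential sites) and the use of Spearman's coefficient come from the abstract.

```python
from statistics import mean


def proportion_segregating_sites(observed_snps, potential_sites):
    """PSS for one category: observed SNPs divided by potential sites."""
    if potential_sites <= 0:
        raise ValueError("potential_sites must be positive")
    return observed_snps / potential_sites


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's coefficient: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical values for five mutational categories (not data from the study).
relative_mutation_rate = [1.0, 1.5, 2.0, 4.0, 9.0]
observed = [10, 18, 19, 45, 80]
potential = [1000, 1000, 1000, 1000, 1000]

pss = [proportion_segregating_sites(o, p) for o, p in zip(observed, potential)]
rho = spearman(relative_mutation_rate, pss)
```

In this toy input the PSS increases with every increase in mutation rate, so the rank correlation is perfect; real categories would differ in their numbers of potential sites.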

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Genes With a Large Intronic Burden Show Greater Evolutionary Conservation on the Protein Level

Author: Amos Christopher
Fedorov Alexey
Gorlov Ivan
Gorlova Olga
Logothetis Christopher
Publication venue: Dartmouth Digital Commons
Publication date: 01/01/2014

Background: The existence of introns in eukaryotic genes is believed to provide an evolutionary advantage by increasing protein diversity through exon shuffling and alternative splicing. However, this eukaryotic feature is associated with the necessity of exclusion of intronic sequences, which requires considerable energy expenditure and can lead to splicing errors. The relationship between intronic burden and evolution is poorly understood. The goal of this study was to analyze the relationship between the intronic burden and the level of evolutionary conservation of the gene. Results: We found a positive correlation between the level of evolutionary conservation of a gene and its intronic burden. The level of evolutionary conservation was estimated using the conservation index (CI). The CI value was determined on the basis of the most distant ortholog of the human protein sequence and ranged from 0 (the gene was unique to the human genome) to 9 (an ortholog of the human gene was detected in plants). In a multivariable model, both the number of introns and total intron size remained significant predictors of CI. We also found that the number of alternative splice variants was positively correlated with CI. The expression level of a gene was negatively correlated with the number of introns and the total size of the intronic region. Genes with a greater intronic burden had a lower density of missense and nonsense mutations in the coding regions of the gene, which suggests that they are under stronger pressure from purifying selection. Conclusions: We identified a positive association between intronic burden and CI. One possible explanation is a cost-benefit balance. Evolutionarily conserved (functionally important) genes can “afford” the negative consequences of maintaining multiple introns because these consequences are outweighed by the benefit of maintaining the gene. Evolutionarily conserved and functionally important genes may use introns to create novel splice variants to tune the gene function to developmental stage and tissue type.

Crossref

Springer - Publisher Connector

PubMed Central

Dartmouth Digital Commons (Dartmouth College)

Building a Statistical Model for Predicting Cancer Genes

Author: Amos Christopher
Fang Shenying
Gorlov Ivan P
Gorlova Olga Y.
Logothetis Christopher J
Publication venue: Dartmouth Digital Commons
Publication date: 01/01/2012

More than 400 cancer genes have been identified in the human genome. The list is not yet complete. Statistical models predicting cancer genes may help with identification of novel cancer gene candidates. We used known prostate cancer (PCa) genes (identified through KnowledgeNet) as a training set to build a binary logistic regression model identifying PCa genes. Internal and external validation of the model was conducted using a validation set (also from KnowledgeNet), permutations, and external data on genes with recurrent prostate tumor mutations. We evaluated a set of 33 gene characteristics as predictors. Sixteen of the original 33 predictors were significant in the model. We found that a typical PCa gene is a prostate-specific transcription factor, kinase, or phosphatase with high interindividual variance of the expression level in adjacent normal prostate tissue and differential expression between normal prostate tissue and primary tumor. PCa genes are likely to have an antiapoptotic effect and to play a role in cell proliferation, angiogenesis, and cell adhesion. Their proteins are likely to be ubiquitinated or sumoylated but not acetylated. A number of novel PCa candidates have been proposed. Functional annotations of novel candidates identified antiapoptosis, regulation of cell proliferation, positive regulation of kinase activity, positive regulation of transferase activity, angiogenesis, positive regulation of cell division, and cell adhesion as top functions. We provide the list of the top 200 predicted PCa genes, which can be used as candidates for experimental validation. The model may be modified to predict genes for other cancer sites

Directory of Open Access Journals

PubMed Central

Dartmouth Digital Commons (Dartmouth College)

FigShare

How to Get the Most from Microarray Data: Advice from Reverse Genomics

Author: Amos Christopher
Byun Jinyoung
Do Kim-Anh
Gorlov Ivan P
Gorlova Olga Y
Logothetis Christopher
Yang Ji-Yeon
Publication venue: Dartmouth Digital Commons
Publication date: 01/01/2014

Whole-genome profiling of gene expression is a powerful tool for identifying cancer-associated genes. Genes differentially expressed between normal and tumorous tissues are usually considered to be cancer-associated. We recently demonstrated that the analysis of interindividual variation in gene expression can be useful for identifying cancer-associated genes. The goal of this study was to identify the best microarray-derived predictor of known cancer-associated genes. We found that the traditional approach of identifying cancer genes (identifying differentially expressed genes) is not very efficient. The analysis of interindividual variation of gene expression in tumor samples identifies cancer-associated genes more effectively. The results were consistent across 4 major types of cancer: breast, colorectal, lung, and prostate. We used recently reported cancer-associated genes (2011–2012) for validation and found that novel cancer-associated genes can be best identified by elevated variance of the gene expression in tumor samples.
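The two ranking strategies compared in this abstract can be contrasted in a few lines. This is an illustrative sketch only: the gene names and expression values are made up, and the abstract does not say how the study computed either statistic, so sample variance and absolute mean difference are assumptions.

```python
from statistics import mean, variance


def rank_by_tumor_variance(tumor_expr):
    """Rank genes by sample variance of expression across tumor samples."""
    return sorted(tumor_expr, key=lambda g: variance(tumor_expr[g]), reverse=True)


def rank_by_differential_expression(tumor_expr, normal_expr):
    """Rank genes by absolute difference in mean expression (tumor vs normal)."""
    return sorted(
        tumor_expr,
        key=lambda g: abs(mean(tumor_expr[g]) - mean(normal_expr[g])),
        reverse=True,
    )


# Hypothetical expression values: gene -> one value per sample.
tumor = {
    "GENE_A": [5.0, 9.0, 1.0, 7.0],   # highly variable, same mean as normal
    "GENE_B": [8.0, 8.1, 7.9, 8.0],   # shifted mean, almost no variation
    "GENE_C": [3.0, 3.2, 2.9, 3.1],   # neither
}
normal = {
    "GENE_A": [5.0, 6.0, 5.0, 6.0],
    "GENE_B": [5.0, 5.0, 5.0, 5.0],
    "GENE_C": [3.0, 3.0, 3.0, 3.1],
}

variance_ranking = rank_by_tumor_variance(tumor)
differential_ranking = rank_by_differential_expression(tumor, normal)
```

The two rankings disagree on the top gene here, which is the kind of difference the abstract evaluates against known cancer-associated genes.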

Crossref

Springer - Publisher Connector

PubMed Central

Dartmouth Digital Commons (Dartmouth College)

Prediction of the Gene Expression in Normal Lung Tissue by the Gene Expression in Blood

Author: Amos Christopher I
Byun Jinyoung
Gorlov Ivan P
Gorlova Olga Y
Halloran Justin W
Qian David C
Zhu Dakai
Publication venue: Dartmouth Digital Commons
Publication date: 12/08/2015

Background: Comparative analysis of gene expression in human tissues is important for understanding the molecular mechanisms underlying tissue-specific control of gene expression. It can also open an avenue for using gene expression in blood (which is the most easily accessible human tissue) to predict gene expression in other (less accessible) tissues, which would facilitate the development of novel gene-expression-based models for assessing disease risk and progression. Until recently, direct comparative analysis across different tissues was not possible due to the scarcity of paired tissue samples from the same individuals. Methods: In this study we used paired whole blood/lung gene expression data from the Genotype-Tissue Expression (GTEx) project. We built a generalized linear regression model for each gene using gene expression in lung as the outcome and gene expression in blood, age and gender as predictors. Results: For ~18% of the genes, gene expression in blood was a significant predictor of gene expression in lung. We found that the number of single nucleotide polymorphisms (SNPs) influencing expression of a given gene in either blood or lung, also known as the number of expression quantitative trait loci (eQTLs), was positively associated with efficacy of blood-based prediction of that gene’s expression in lung. This association was strongest for shared eQTLs: those influencing gene expression in both blood and lung. Conclusions: For a considerable number of human genes, expression levels in lung can be predicted using observable gene expression in blood. An abundance of shared eQTLs may explain the strong blood/lung correlations in gene expression.
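A per-gene model of the kind described in the Methods can be sketched with ordinary least squares. This is a minimal sketch under stated assumptions: the abstract says only "generalized linear regression", so plain OLS is a stand-in; the helper names, the 0/1 coding of gender, and all data are hypothetical; and significance testing is omitted.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(predictors, outcome):
    """Least-squares fit via the normal equations; an intercept is prepended."""
    design = [[1.0] + list(row) for row in predictors]
    p = len(design[0])
    xtx = [[sum(r[a] * r[b] for r in design) for b in range(p)] for a in range(p)]
    xty = [sum(r[a] * y for r, y in zip(design, outcome)) for a in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical data for ONE gene: (blood expression, age, gender coded 0/1).
samples = [(1.0, 30, 0), (2.0, 45, 1), (3.0, 50, 0),
           (4.0, 62, 1), (5.0, 38, 1), (6.0, 55, 0)]
# Synthetic lung expression generated from known coefficients, without noise.
lung = [2.0 + 0.5 * blood + 0.01 * age + 0.3 * sex for blood, age, sex in samples]

coefficients = fit_ols(samples, lung)  # [intercept, blood, age, gender]
```

In a real analysis this fit would be repeated for every gene, and the blood coefficient would be tested for significance in each model.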

Springer - Publisher Connector

PubMed Central

Dartmouth Digital Commons (Dartmouth College)

Modified Logistic Regression Models Using Gene Coexpression and Clinical Features to Predict Prostate Cancer Progression

Author: Christopher J. Logothetis
Hongya Zhao
Ivan P. Gorlov
Jia Zeng
Jianguo Dai
Publication venue: Hindawi Limited
Publication date: 01/01/2013

Predicting disease progression is one of the most challenging problems in prostate cancer research. Adding gene expression data to prediction models that are based on clinical features has been proposed to improve accuracy. In the current study, we applied a logistic regression (LR) model combining clinical features and gene co-expression data to improve the accuracy of the prediction of prostate cancer progression. The top-scoring pair (TSP) method was used to select genes for the model. The proposed models not only preserved the basic properties of the TSP algorithm but also incorporated the clinical features into the prognostic models. Based on statistical inference with iterative cross-validation, we demonstrated that LR prediction models that included genes selected by the TSP method provided better predictions of prostate cancer progression than those using clinical variables only and/or those that included genes selected by the one-gene-at-a-time approach. Thus, we conclude that TSP selection is a useful tool for feature (and/or gene) selection in prognostic models, and our model also provides an alternative for predicting prostate cancer progression.
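The gene-selection step can be illustrated with the standard top-scoring pair score: for a pair of genes, the absolute difference between the two classes in the fraction of samples where the first gene is expressed below the second. The sketch below uses hypothetical gene names and values, and it shows only the general TSP idea, not the modified models proposed in the paper.

```python
from itertools import combinations


def tsp_score(expr_i, expr_j, labels):
    """|P(x_i < x_j | class 1) - P(x_i < x_j | class 0)| for one gene pair."""
    def fraction_below(cls):
        idx = [k for k, lab in enumerate(labels) if lab == cls]
        return sum(expr_i[k] < expr_j[k] for k in idx) / len(idx)
    return abs(fraction_below(1) - fraction_below(0))


def top_scoring_pair(expr, labels):
    """Return the gene pair with the highest TSP score."""
    return max(
        combinations(sorted(expr), 2),
        key=lambda pair: tsp_score(expr[pair[0]], expr[pair[1]], labels),
    )


def pair_feature(expr, pair):
    """Binary indicator (first gene < second gene) usable as a model covariate."""
    return [int(a < b) for a, b in zip(expr[pair[0]], expr[pair[1]])]


# Hypothetical data: 0 = no progression, 1 = progression.
labels = [0, 0, 0, 1, 1, 1]
expression = {
    "GENE_A": [1.0, 2.0, 1.0, 5.0, 6.0, 5.0],
    "GENE_B": [3.0, 4.0, 3.0, 2.0, 3.0, 2.0],
    "GENE_C": [0.5, 3.0, 2.0, 6.0, 2.5, 1.0],
}

best_pair = top_scoring_pair(expression, labels)
best_score = tsp_score(expression[best_pair[0]], expression[best_pair[1]], labels)
indicator = pair_feature(expression, best_pair)
```

An indicator like this could then enter a logistic regression alongside clinical variables, which is one plausible reading of how the abstract combines the two sources of information.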

Crossref

Directory of Open Access Journals

Variants at IRF5-TNPO3, 17q12-21 and MMEL1 are associated with primary biliary cirrhosis

Author: Chen Wei
Coltescu Catalina
Gorlov Ivan P.
Han Younghun
Hirschfield Gideon M.
Juran Brian D.
Liu Xiangdong
Lu Yan
Lu Yue
Xu Chun
Publication venue: ScholarWorks @ UTRGV
Publication date: 01/08/2010

We genotyped individuals with primary biliary cirrhosis and unaffected controls for suggestive risk loci (genome-wide association P < 1 × 10^−4) identified in a previous genome-wide association study. Combined analysis of the genome-wide association and replication datasets identified IRF5-TNPO3 (combined P = 8.66 × 10^−13), 17q12-21 (combined P = 3.50 × 10^−13) and MMEL1 (combined P = 3.15 × 10^−8) as new primary biliary cirrhosis susceptibility loci. Fine-mapping studies showed that a single variant accounts for the IRF5-TNPO3 association. As these loci are implicated in other autoimmune conditions, these findings confirm genetic overlap among such diseases.

Scholarworks@UTRGV Univ. of Texas RioGrande Valley

INPP4B suppresses prostate cancer cell invasion

Author: Agoulnik Irina U.
Deryugina Elena I.
Gorlov Ivan P.
Hodgson Myles C.
Lin Dong
Lopez Sandra M.
Suarez Egla
Wang Yuzhuo
Xue Hui
Publication venue: FIU Digital Commons
Publication date: 01/01/2014

Background: INPP4B and PTEN dual specificity phosphatases are frequently lost during progression of prostate cancer to metastatic disease. We and others have previously shown that loss of INPP4B expression correlates with poor prognosis in multiple malignancies and with metastatic spread in prostate cancer. Results: We demonstrate that de novo expression of INPP4B in highly invasive human prostate carcinoma PC-3 cells suppresses their invasion both in vitro and in vivo. Using global gene expression analysis, we found that INPP4B regulates a number of genes associated with cell adhesion, the extracellular matrix, and the cytoskeleton. Importantly, de novo expressed INPP4B suppressed the proinflammatory chemokine IL-8 and induced PAK6. These genes were regulated in a reciprocal manner following downregulation of INPP4B in the independently derived INPP4B-positive LNCaP prostate cancer cell line. Inhibition of the PI3K/Akt pathway, which is highly active in both PC-3 and LNCaP cells, did not reproduce INPP4B-mediated suppression of IL-8 mRNA expression in either cell type. In contrast, inhibition of PKC signaling phenocopied the INPP4B-mediated inhibitory effect on IL-8 in both prostate cancer cell lines. In PC-3 cells, INPP4B overexpression caused a decline in the level of metastasis-associated BIRC5 protein, phosphorylation of PKC, and expression of the common PKC and IL-8 downstream target, COX-2. Reciprocally, COX-2 expression was increased in LNCaP cells following depletion of endogenous INPP4B. Conclusion: Taken together, we discovered that INPP4B is a novel suppressor of oncogenic PKC signaling, further emphasizing the role of INPP4B in maintaining normal physiology of the prostate epithelium and suppressing metastatic potential of prostate tumors.

Crossref

Springer - Publisher Connector

PubMed Central

DigitalCommons@Florida International University

Dartmouth Digital Commons (Dartmouth College)

FastPop: a rapid principal component derived method to infer intercontinental ancestry using genetic data.

Author: Amos Christopher I
Byun Jinyoung
Cai Guoshuai
Cornelis Olivier
Dennis Joe
Dinulos James E
Easton Douglas
Gorlov Ivan
Han Younghun
Li Yafang
Seldin Michael F
Xiao Xiangjun
Publication venue: BMC Bioinformatics
Publication date: 01/03/2016

BACKGROUND: Identifying subpopulations within a study and inferring intercontinental ancestry of the samples are important steps in genome-wide association studies. Two software packages are widely used in analysis of substructure: Structure and Eigenstrat. Structure assigns each individual to a population by using a Bayesian method with multiple tuning parameters. It requires considerable computational time when dealing with thousands of samples and lacks the ability to create scores that could be used as covariates. Eigenstrat uses a principal component analysis method to model all sources of sampling variation. However, it does not readily provide information directly relevant to ancestral origin; the eigenvectors generated by Eigenstrat are sample specific and thus cannot be generalized to other individuals. RESULTS: We developed FastPop, an efficient R package that fills the gap between Structure and Eigenstrat. It can (1) generate PCA scores that identify ancestral origins and can be used for multiple studies, and (2) infer ancestry information for data arising from two or more intercontinental origins. We demonstrate the use of FastPop using 2318 SNP markers selected from the genome based on high variability among European, Asian and West African (African) populations. We conducted an analysis of 505 HapMap samples with European, African or Asian ancestry along with 19661 additional samples of unknown ancestry. The results from FastPop are highly consistent with those obtained by Structure across the 19661 samples we studied. The correlations of the results between FastPop and Structure are 0.99, 0.97 and 0.99 for European, African and Asian ancestry scores, respectively. Compared with Structure, FastPop is more efficient, as it finished ancestry inference for 19661 samples in 16 min compared with the 21-24 h required by Structure. FastPop also provided scores based on SNP weights, so the scores of the reference population can be applied to other studies provided the same set of markers is used. We also present an application of the method for studying four continental populations (European, Asian, African, and Native American). CONCLUSIONS: We developed an algorithm that can infer ancestries on data involving two or more intercontinental origins. It is efficient for analyzing large datasets. Additionally, the PCA-derived scores can be applied to multiple data sets to ensure the same ancestry analysis is applied to all studies.
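The idea of reusable SNP weights can be sketched as follows: fit a principal component on reference samples once, keep the per-SNP means and weights, and score any later sample genotyped on the same markers. This is a Python illustration of the general technique only; it is not FastPop's R interface or its actual algorithm, and the function names, the power-iteration approach, and the toy genotypes are all assumptions.

```python
def fit_reference_pca(reference, iterations=100):
    """Return per-SNP means and first-principal-component weights.

    Uses power iteration on the centered reference genotype matrix.
    """
    n_snps = len(reference[0])
    means = [sum(row[j] for row in reference) / len(reference) for j in range(n_snps)]
    centered = [[row[j] - means[j] for j in range(n_snps)] for row in reference]
    weights = [1.0] + [0.0] * (n_snps - 1)  # deterministic starting vector
    for _ in range(iterations):
        proj = [sum(x * w for x, w in zip(row, weights)) for row in centered]
        weights = [sum(p * row[j] for p, row in zip(proj, centered)) for j in range(n_snps)]
        norm = sum(v * v for v in weights) ** 0.5
        if norm == 0:
            raise ValueError("starting vector is orthogonal to the data")
        weights = [v / norm for v in weights]
    return means, weights


def ancestry_score(sample, means, weights):
    """Project one sample onto the reference component using stored weights."""
    return sum((g - m) * w for g, m, w in zip(sample, means, weights))


# Hypothetical allele counts (0/1/2) at four markers for two reference groups.
reference_a = [[2, 2, 0, 0], [2, 1, 0, 0], [1, 2, 0, 1]]
reference_b = [[0, 0, 2, 2], [0, 1, 2, 2], [1, 0, 2, 1]]

means, weights = fit_reference_pca(reference_a + reference_b)
scores_a = [ancestry_score(s, means, weights) for s in reference_a]
scores_b = [ancestry_score(s, means, weights) for s in reference_b]

# A sample from another study, typed on the same four markers.
new_sample_score = ancestry_score([2, 2, 0, 1], means, weights)
```

Because the means and weights are stored rather than recomputed per dataset, samples from different studies land on the same scale, which is the property the abstract contrasts with sample-specific eigenvectors.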

Crossref

PubMed Central

eScholarship - University of California

Apollo (Cambridge)

Dartmouth Digital Commons (Dartmouth College)

GWAS Meets Microarray: Are the Results of Genome-Wide Association Studies and Gene-Expression Profiling Consistent? Prostate Cancer as an Example

Author: AR Ramjaun
Christopher Amos
Christopher J. Logothetis
D Duggan
DW Huang
E Delva
Eshel Ben-Jacob
G Thomas
Gary E. Gallick
H Goel
IP Gorlov
Ivan P. Gorlov
J Gudmundsson
L Lacroix
M Fornaro
M Piao
M Takkunen
MC Brown
MD Hansen
MD Mason
MI McCarthy
Olga Y. Gorlova
PA Konstantinopoulos
R Chen
R Rosenthal
RK Nam
S Etienne-Manneville
SA Ochsner
SJ Moschos
SR Browning
T Bao
VM Bazas
XS Ke
Publication venue: Public Library of Science
Publication date: 01/08/2009

Genome-wide association studies (GWASs) and global profiling of gene expression (microarrays) are two major technological breakthroughs that allow hypothesis-free identification of candidate genes associated with tumorigenesis. It is not obvious whether the candidate genes identified by GWAS (GWAS genes) are consistent with those identified by profiling gene expression (microarray genes). We used the Cancer Genetic Markers Susceptibility database to retrieve single nucleotide polymorphisms from candidate genes for prostate cancer. In addition, we conducted a large meta-analysis of gene expression data in normal prostate and prostate tumor tissue. We identified 13,905 genes that were interrogated by both GWASs and microarrays. On the basis of P values from GWASs, we selected the 1,649 most significantly associated genes for functional annotation by the Database for Annotation, Visualization and Integrated Discovery. We also conducted functional annotation analysis using the same number of top genes identified in the meta-analysis of the gene expression data. We found that genes involved in cell adhesion were overrepresented among both the GWAS and microarray genes. We conclude that combining GWAS and microarray data would be a more effective approach than analyzing individual datasets and can help to refine the identification of candidate genes and functions associated with tumor development.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central