102 research outputs found

    A novel hypothesis-unbiased method for gene ontology enrichment based on transcriptome data

    Gene Ontology (GO) classification of statistically significantly differentially expressed genes is commonly used to interpret transcriptomics data as a part of functional genomic analysis. In this approach, all significantly expressed genes contribute equally to the final GO classification regardless of their actual expression levels. Gene expression levels can significantly affect protein production and hence should be reflected in GO term enrichment. Genes with low expression levels can also participate in GO term enrichment through cumulative effects. In this report, we introduce a new GO enrichment method that is suitable for multiple samples and time series experiments. It uses a statistical outlier test to detect GO categories with special patterns of variation that can potentially identify candidate biological mechanisms. To demonstrate the value of our approach, we performed two case studies. Whole transcriptome expression profiles of Salmonella enteritidis and Alzheimer's disease (AD) were analysed in order to determine GO term enrichment across the entire transcriptome instead of the subset of differentially expressed genes used in traditional GO analysis. Our results highlight the key role of inflammation-related functional groups in AD pathology, as granulocyte colony-stimulating factor receptor binding, neuromedin U binding, and interleukin were remarkably upregulated in AD brain when using all of the gene expression data in the transcriptome. Mitochondrial components and the molybdopterin synthase complex were identified as potential key cellular components involved in AD pathology.
    Mario Fruzangohar, Esmaeil Ebrahimie, David L. Adelso
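The abstract above does not say which outlier test is used or how expression is aggregated, so the following is only a minimal sketch of the general idea under stated assumptions: expression is summed over all genes annotated to a GO term (so weakly expressed genes contribute cumulatively), the variation of a term is taken as the log2 change between the first and last sample, and outliers are flagged with Tukey's IQR fences. The function names, the pseudocount of 1, and the `GO:t*` identifiers are hypothetical placeholders, not the authors' method.

```python
import math
import statistics

def go_term_totals(expression, go_annotation):
    """Sum the expression of every annotated gene per GO term, sample by sample."""
    n_samples = len(next(iter(expression.values())))
    totals = {}
    for term, genes in go_annotation.items():
        sums = [0.0] * n_samples
        for gene in genes:
            # Genes missing from the expression table contribute nothing.
            for i, value in enumerate(expression.get(gene, ())):
                sums[i] += value
        totals[term] = sums
    return totals

def outlier_terms(totals, k=1.5):
    """Flag GO terms whose log2 change (last vs first sample) lies outside Tukey fences."""
    # Pseudocount of 1 avoids division by zero (an assumption of this sketch).
    change = {t: math.log2((v[-1] + 1) / (v[0] + 1)) for t, v in totals.items()}
    q1, _, q3 = statistics.quantiles(list(change.values()), n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return sorted(t for t, c in change.items() if c < low or c > high)

# Hypothetical toy data: gene -> expression in [first sample, last sample].
expression = {
    "g1": [3, 3], "g2": [1, 1], "g3": [7, 7], "g4": [15, 15],
    "g5": [3, 7], "g6": [1, 3], "g7": [7, 3], "g8": [0, 31], "g9": [0, 32],
}
# Hypothetical GO terms; the last one pools two genes.
go_annotation = {
    "GO:t1": ["g1"], "GO:t2": ["g2"], "GO:t3": ["g3"], "GO:t4": ["g4"],
    "GO:t5": ["g5"], "GO:t6": ["g6"], "GO:t7": ["g7"], "GO:t8": ["g8", "g9"],
}

totals = go_term_totals(expression, go_annotation)
flagged = outlier_terms(totals)
```

In this toy input only the pooled term stands out from the rest; a real analysis would need a variation measure suited to multiple samples or time points.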

    The Zebrafish equivalent of Alzheimer's disease-associated PRESENILIN Isoform PS2V regulates inflammatory and other responses to hypoxic stress

    Dominant mutations in the PRESENILIN genes PSEN1 and PSEN2 cause familial Alzheimer's disease (fAD) that usually shows onset before 65 years of age. In contrast, genetic variation at the PSEN1 and PSEN2 loci does not appear to contribute to risk for the sporadic, late onset form of the disease (sAD), leading to doubts that these genes play a role in the majority of AD cases. However, a truncated isoform of PSEN2, PS2V, is upregulated in sAD brains and is induced by hypoxia and high cholesterol intake. PS2V can increase γ-secretase activity and suppress the unfolded protein response (UPR), but detailed analysis of its function has been hindered by the lack of a suitable, genetically manipulable animal model, since mice and rats lack this PRESENILIN isoform. We recently showed that zebrafish possess an isoform, PS1IV, that is cognate to human PS2V. Using an antisense morpholino oligonucleotide, we can specifically block the induction of PS1IV that normally occurs under hypoxia. Here, we exploit this ability to identify gene regulatory networks that are modulated by PS1IV. When PS1IV is absent under hypoxia-like conditions, we observe changes in expression of genes controlling inflammation (particularly sAD-associated IL1B and CCR5), vascular development, the UPR, protein synthesis, calcium homeostasis, catecholamine biosynthesis, TOR signaling, and cell proliferation. Our results imply an important role for PS2V in sAD as a component of a pathological mechanism that includes hypoxia/oxidative stress and support investigation of the role of PS2V in other diseases, including schizophrenia, when these are implicated in the pathology.
    Esmaeil Ebrahimie, Seyyed Hani Moussavi Nik, Morgan Newman, Mark Van Der Hoek and Michael Lardell

    Amino Acid Features of P1B-ATPase Heavy Metal Transporters Enabling Small Numbers of Organisms to Cope with Heavy Metal Pollution

    Phytoremediation refers to the use of plants for extraction and detoxification of pollutants, providing a new and powerful weapon against a polluted environment. In some plants, such as Thlaspi spp, heavy metal ATPases are involved in overall metal ion homeostasis and hyperaccumulation. P1B-ATPases pump a wide range of cations, especially heavy metals, across membranes against their electrochemical gradients. Determination of the protein characteristics of P1B-ATPases in hyperaccumulator plants provides a new opportunity for engineering of phytoremediating plants. In this study, using diverse weighting and modeling approaches, 2644 protein characteristics of primary, secondary, and tertiary structures of P1B-ATPases in hyperaccumulator and nonhyperaccumulator plants were extracted and compared to identify differences between hyperaccumulator and nonhyperaccumulator pumps. Although the importance of individual protein characteristics varied across the weighting, tree, and rule induction models, glycine count, frequency of glutamine-valine, and valine-phenylalanine count were the most important attributes, highlighted by 10, five, and four models, respectively. In addition, a precise model was built to discriminate P1B-ATPases in different organisms based on their structural protein features. Moreover, reliable models for prediction of the hyperaccumulating activity of unknown P1B-ATPase pumps were developed. Uncovering important structural features of hyperaccumulator pumps in this study has provided the knowledge required for future modification and engineering of these pumps by techniques such as site-directed mutagenesis.
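The three attributes named above (glycine count, glutamine-valine frequency, valine-phenylalanine count) are simple sequence-composition features. As an illustration only, here is a minimal sketch of how such features could be computed from a one-letter protein sequence. It assumes that dipeptides are counted over overlapping positions and that frequency means the count divided by the number of dipeptides; the abstract does not define either, and the function name and toy sequence are hypothetical.

```python
from collections import Counter

def amino_acid_features(sequence: str) -> dict:
    """Return three simple composition features for a one-letter protein sequence."""
    seq = sequence.upper()
    counts = Counter(seq)
    # Overlapping dipeptides: positions (0,1), (1,2), ...
    dipeptides = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    n_dipeptides = max(len(seq) - 1, 1)  # guard against empty or 1-residue input
    return {
        "glycine_count": counts["G"],
        "gln_val_frequency": dipeptides["QV"] / n_dipeptides,
        "val_phe_count": dipeptides["VF"],
    }

# Hypothetical toy sequence, not a real P1B-ATPase.
features = amino_acid_features("MGGQVVFAGQV")
```

Feature vectors of this kind, computed for every protein, would then be the input to the attribute weighting and classification models that the abstract describes.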

    Amino acid features: a missing compartment of prediction of protein function

    Enormous computational efforts have been made to predict the structure and function of proteins. However, nearly all of these efforts have focused on predicting function from the primary nucleic acid sequence or on modeling the 3D structure of a protein from its nucleic acid sequence. In fact, it seems that amino acid attributes, which are an intermediate phase between DNA/RNA and advanced protein structure, have been overlooked. From 2010, we examined the possibility of precise prediction of structural protein function based on amino acid features by improving the following four aspects of amino acid research: (1) increasing the number of computationally calculated amino acid features, (2) testing different feature selection (attribute weighting) algorithms and selecting the most important amino acid attributes based on the overall conclusion of the algorithms, (3) examining different supervised and unsupervised data mining (machine learning) algorithms, and (4) joining attribute weighting with different data mining algorithms. We applied the resulting procedure to different biological examples, including protein thermostability, halostability, prediction of the function of heavy metal transporters, cancer diagnosis and prediction, and tracing EST-SSRs at the amino acid level. In the thermostability study, we successfully established an accurate expert system to predict the thermostability of any input sequence through mining of its calculated amino acid features. 
Interestingly, the performance of a clustering algorithm such as EMC can vary from 0.0% to 100%, depending upon which attribute weighting algorithm summarized the attributes of the dataset prior to running the clustering algorithm. In another recent study on halostability, the results showed that amino acid composition can be used to efficiently discriminate halostable protein groups with up to 98% accuracy, implying the possibility of precise prediction of halostability when an appropriate machine learning algorithm mines a large number of structural amino acid attributes of primary protein structure. Using our approach, simple amino acid features, without the need for advanced features of protein structure, could explain the difference between P1B-ATPases in hyperaccumulator and nonhyperaccumulator plants. More importantly, a precise model was built to discriminate P1B-ATPases in different organisms based on their structural amino acid features. In addition, for the first time, reliable models for prediction of the hyperaccumulating activity of unknown P1B-ATPase pumps were developed. We employed our method in monitoring and prediction of breast cancer. The results confirmed that amino acid composition can be used to discriminate between protein groups expressed in two forms of breast cancer: malignant and benign. This study was strong evidence that malignancy can be predicted from amino acid composition, and that malignant proteins can be distinguished based on the amino acid composition of their proteomes without further need for protein separation. An important outcome was the discovery of the role of dipeptides, in particular Ile-Ile, in cancer progression. 
In addition, Generalized Rule Induction (GRI) found association rules in the data, showing the 100 most important rules classifying benign, malignant, and commonly expressed proteins in breast cancers. In another investigation, we found that EST-SSRs in normal lung tissues differ from those in unhealthy tissues, and that ESTs tagged with SSRs cause remarkable differences in amino acid and protein expression patterns in cancerous tissue. This offers a glimpse of a new kind of biomarker based on the frequency of amino acids. Up to now, phylogenetic trees, drawn from nucleic acid or amino acid sequence alignments, have been employed as the basis of evolutionary studies. However, this method does not take into account the structural and functional features of sequences during evolution. In contrast, the classification presented here, based on decision tree, anomaly detection, and feature weighting models, provides an evolutionary separation of organisms based on the structural reasons for this diversity. Our findings have the potential to be used efficiently in the following areas: filling the gap between laboratory engineering of proteins and computational biology, developing amino acid feature-based biomarkers, increasing the accuracy of prediction of 3D protein structure based on important amino acid features, and developing websites/software for prediction of the results of mutation. In addition, the important amino acid features discovered can be employed as clues for discovering important DNA mutations and increasing the prediction accuracy of 3D structure from DNA sequence. Furthermore, this study offers new for protein function, irrespective of similarity searches.
Esmaeil Ebrahimie, Mansour Ebrahimi, Mahdi Ebrahim

    Gene Ontology-based analysis of zebrafish omics data using the web tool Comparative Gene Ontology

    Gene Ontology (GO) analysis is a powerful tool in systems biology, which uses a defined nomenclature to annotate genes/proteins within three categories: "Molecular Function," "Biological Process," and "Cellular Component." GO analysis can assist in revealing functional mechanisms underlying observed patterns in transcriptomic, genomic, and proteomic data. The already extensive and increasing use of zebrafish for modeling genetic and other diseases highlights the need to develop a GO analytical tool for this organism. The web tool Comparative GO was originally developed for GO analysis of bacterial data in 2013 (www.comparativego.com). We have now upgraded and elaborated this web tool for analysis of zebrafish genetic data using GOs and annotations from the Gene Ontology Consortium.
    Esmaeil Ebrahimie, Mario Fruzangohar, Seyyed Hani Moussavi-Nik, and Morgan Newma

    Cross-species meta-analysis of transcriptomic data in combination with supervised machine learning models identifies the common gene signature of lactation process

    Lactation, a physiologically complex process, takes place in the mammary gland after parturition. The expression profile of the genes involved in lactation has not been comprehensively elucidated. Herein, a meta-analysis using publicly available microarray data was conducted to identify the differentially expressed genes (DEGs) between pre- and post-peak milk production. Three microarray datasets from rat, Bos taurus, and tammar wallaby were used. Samples related to pre-peak (n = 85) and post-peak (n = 24) milk production were selected. Meta-analysis revealed 31 DEGs across the studied species. Interestingly, 10 genes, including MRPS18B, SF1, UQCRC1, NUCB1, RNF126, ADSL, TNNC1, FIS1, HES5 and THTPA, were not detected in the original studies, which highlights the power of meta-analysis in biosignature discovery. Common target and regulator analysis highlighted the high connectivity of CTNNB1, CDD4 and LPL as gene network hubs. As the data originally came from three different species, 10 attribute weighting (machine learning) algorithms were applied to check the effects of heterogeneous data sources on the DEGs. Attribute weighting results showed that the type of organism had little or no effect on the selected gene list. Systems biology analysis suggested that these DEGs affect milk production by improving immune system performance and mammary cell growth. This is the first study employing both meta-analysis and machine learning approaches for comparative analysis of the gene expression pattern of mammary glands at two important time points of the lactation process. The findings may pave the way for the use of publicly available data to elucidate the underlying molecular mechanisms of physiologically complex traits such as lactation in mammals.
    Mohammad Farhadian, Seyed A. Rafat, Karim Hasanpur, Mansour Ebrahimi and Esmaeil Ebrahimi

    Novel approach for identification of influenza virus host range and zoonotic transmissible sequences by determination of host-related associative positions in viral genome segments

    Background: Recent (2013 and 2009) zoonotic transmission of avian or porcine influenza to humans highlights an increase in host range by evading species barriers. Gene reassortment or antigenic shift between viruses from two or more hosts can generate a new life-threatening virus when the new shuffled virus is no longer recognized by antibodies existing within human populations. There is no large scale study to help understand the underlying mechanisms of host transmission. Furthermore, there is no clear understanding of how different segments of the influenza genome contribute to the final determination of host range. Methods: To obtain insight into the rules underpinning host range determination, various supervised machine learning algorithms were employed to mine reassortment changes in different viral segments in a range of hosts. Our multi-host dataset contained whole segments of 674 influenza strains organized into three host categories: avian, human, and swine. Some of the sequences were assigned to multiple hosts. The dataset is therefore multi-labeled, and we used a multi-label learning method to identify discriminative sequence sites. Then algorithms such as CBA, Ripper, and decision tree were applied to extract informative and descriptive association rules for each viral protein segment. Results: We found informative rules in all segments that are common within the same host class but vary between different hosts. For example, for infection of an avian host, HA14V and NS1230S were the most important discriminative and combinatorial positions. Conclusion: Host range identification is facilitated by the high-support combined rules found in this study. Our major goal was to detect discriminative genomic positions that were able to identify multi-host viruses, because such viruses are likely to cause pandemics or disastrous epidemics.
    Fatemeh Kargarfard, Ashkan Sami, Manijeh Mohammadi-Dehcheshmeh and Esmaeil Ebrahimi

    Prediction of potential cancer-risk regions based on transcriptome data: towards a comprehensive view

    A novel integrative pipeline is presented for discovery of potential cancer-susceptibility regions (PCSRs) by calculating the number of altered genes at each chromosomal region, using expression microarray datasets of different human cancers (HCs). Our approach first predicts PCSRs and then identifies key genes in these regions to obtain potential regions harboring new cancer-associated variants. In addition to finding new cancer causal variants, another advantage of predicting such risk regions is the simultaneous study of different types of genomic variants while focusing on specific chromosomal regions. Using this pipeline, we extracted a number of regions with highly altered expression levels in cancer. Regulatory networks were also constructed for different types of cancers following the identification of altered mRNAs and microRNAs. Interestingly, results showed that GAPDH, LIFR, ZEB2, mir-21, mir-30a, mir-141 and mir-200c, all located at PCSRs, are commonly altered factors in the constructed networks. We found a number of clusters of altered mRNAs and miRNAs on predicted PCSRs (e.g. 12p13.31) and their common regulators, including KLF4 and SOX10. Large scale prediction of risk regions based on transcriptome data can open a window onto the comprehensive study of cancer risk factors and other human diseases.
    Arghavan Alisoltani, Hossein Fallahi, Mahdi Ebrahimi, Mansour Ebrahimi, Esmaeil Ebrahimi
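The core counting step of this pipeline, the number of altered genes at each chromosomal region, can be illustrated with a short sketch. Everything beyond that counting idea is an assumption made for illustration: the gene names are placeholders, the gene-to-region mapping is invented toy data, and the fixed threshold stands in for whatever criterion the authors actually used to call a region a PCSR.

```python
from collections import Counter

def altered_genes_per_region(gene_to_region, altered_genes):
    """Count altered genes in each chromosomal region (e.g. a cytoband)."""
    counts = Counter()
    for gene in altered_genes:
        region = gene_to_region.get(gene)
        if region is not None:  # skip genes without a known location
            counts[region] += 1
    return counts

def candidate_regions(counts, min_altered):
    """Return regions with at least `min_altered` altered genes (hypothetical cutoff)."""
    return sorted(region for region, n in counts.items() if n >= min_altered)

# Hypothetical toy mapping; gene names and locations are placeholders.
gene_to_region = {
    "GENE_A": "12p13.31", "GENE_B": "12p13.31",
    "GENE_C": "5p13.1", "GENE_D": "2q22.3",
}
altered = ["GENE_A", "GENE_B", "GENE_C", "GENE_X"]  # GENE_X has no mapping

counts = altered_genes_per_region(gene_to_region, altered)
regions = candidate_regions(counts, min_altered=2)
```

A real analysis would presumably normalize for the number of genes annotated in each region, since gene-dense regions collect more altered genes by chance.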