A novel hypothesis-unbiased method for gene ontology enrichment based on transcriptome data
Gene Ontology (GO) classification of statistically significantly differentially expressed genes is commonly used to interpret transcriptomics data as part of functional genomic analysis. In this approach, all significantly differentially expressed genes contribute equally to the final GO classification, regardless of their actual expression levels. Gene expression levels can significantly affect protein production and hence should be reflected in GO term enrichment. Genes with low expression levels can also participate in GO term enrichment through cumulative effects. In this report, we introduce a new GO enrichment method, suitable for multiple-sample and time series experiments, that uses a statistical outlier test to detect GO categories with special patterns of variation that can potentially identify candidate biological mechanisms. To demonstrate the value of our approach, we performed two case studies. Whole transcriptome expression profiles of Salmonella enteritidis and Alzheimer's disease (AD) were analysed to determine GO term enrichment across the entire transcriptome instead of the subset of differentially expressed genes used in traditional GO analysis. Our results highlight the key role of inflammation-related functional groups in AD pathology: granulocyte colony-stimulating factor receptor binding, neuromedin U binding, and interleukin were remarkably upregulated in AD brain when all of the gene expression data in the transcriptome were used. Mitochondrial components and the molybdopterin synthase complex were identified as potential key cellular components involved in AD pathology.
Mario Fruzangohar, Esmaeil Ebrahimie, David L. Adelso
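The abstract does not state which outlier test or aggregation is used. A minimal sketch, assuming that (1) a GO term's expression is the sum over all genes annotated to it, so many weakly expressed genes can add up, (2) two samples are compared by a log2 ratio with a pseudocount, and (3) Tukey's fences (1.5 × IQR) serve as the outlier test. The functions `go_term_totals` and `outlier_terms` are hypothetical names, and this is not the published method.

```python
import math
from statistics import quantiles

def go_term_totals(expression, annotations):
    """Sum the expression of every gene annotated to each GO term, per sample.

    expression:  dict gene -> list of expression values (one per sample)
    annotations: dict gene -> set of GO term IDs
    """
    totals = {}
    for gene, values in expression.items():
        for term in annotations.get(gene, ()):
            acc = totals.setdefault(term, [0.0] * len(values))
            for i, v in enumerate(values):
                acc[i] += v  # low-expressed genes still contribute cumulatively
    return totals

def outlier_terms(expression, annotations, a=0, b=1, pseudocount=1.0, k=1.5):
    """Return GO terms whose log2(sample b / sample a) ratio lies outside Tukey's fences."""
    totals = go_term_totals(expression, annotations)
    ratios = {t: math.log2((v[b] + pseudocount) / (v[a] + pseudocount))
              for t, v in totals.items()}
    if len(ratios) < 2:
        return {}  # quantiles() needs at least two values
    q1, _, q3 = quantiles(ratios.values(), n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return {t: r for t, r in ratios.items() if r < low or r > high}
```

Because every annotated gene is summed, a term can be flagged even when none of its genes would individually pass a differential expression cutoff.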
Amino Acid Features of P1B-ATPase Heavy Metal Transporters Enabling Small Numbers of Organisms to Cope with Heavy Metal Pollution
Phytoremediation refers to the use of plants for extraction and detoxification of pollutants, providing a new and powerful weapon against a polluted environment. In some plants, such as Thlaspi spp., heavy metal ATPases are involved in overall metal ion homeostasis and hyperaccumulation. P1B-ATPases pump a wide range of cations, especially heavy metals, across membranes against their electrochemical gradients. Determining the protein characteristics of P1B-ATPases in hyperaccumulator plants provides a new opportunity for engineering phytoremediating plants. In this study, using diverse weighting and modeling approaches, 2644 protein characteristics of the primary, secondary, and tertiary structures of P1B-ATPases in hyperaccumulator and nonhyperaccumulator plants were extracted and compared to identify differences between hyperaccumulator and nonhyperaccumulator pumps. Although the important protein characteristics varied across the weighting, tree, and rule induction models, glycine count, glutamine-valine frequency, and valine-phenylalanine count were the most important attributes, highlighted by 10, five, and four models, respectively. In addition, a precise model was built to discriminate P1B-ATPases in different organisms based on their structural protein features. Moreover, reliable models were developed to predict the hyperaccumulating activity of unknown P1B-ATPase pumps. Uncovering important structural features of hyperaccumulator pumps in this study provides the knowledge required for future modification and engineering of these pumps by techniques such as site-directed mutagenesis.
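The study extracted 2644 features; as an illustration only, the hypothetical function below computes the simplest of them: single-residue counts plus dipeptide counts and frequencies, which cover the glycine count, Gln-Val frequency, and Val-Phe count named above. The feature keys (such as `count_G` and `freq_QV`) are assumptions, not the study's attribute labels.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def amino_acid_features(seq):
    """Residue counts and dipeptide counts/frequencies for one protein sequence."""
    seq = seq.upper()
    residue_counts = Counter(seq)
    features = {f"count_{aa}": residue_counts.get(aa, 0) for aa in AMINO_ACIDS}
    # Overlapping dipeptides, e.g. "GQV" -> "GQ", "QV"
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    pair_counts = Counter(p for p in pairs
                          if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS)
    n_pairs = max(len(pairs), 1)  # avoid division by zero for 1-residue input
    for pair, c in pair_counts.items():
        features[f"count_{pair}"] = c
        features[f"freq_{pair}"] = c / n_pairs
    return features
```

Dipeptides absent from a sequence are simply missing from the returned dict; a feature table for model training would fill them with 0.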
Efficient and simple production of insulin-producing cells from embryonal carcinoma stem cells using mouse neonate pancreas extract, as a natural inducer
An attractive approach to replacing destroyed insulin-producing cells (IPCs) is the generation of functional β cells from stem cells. Embryonal carcinoma (EC) stem cells are pluripotent cells that can differentiate into all cell types. The present study was carried out to establish a simple, nonselective inductive culture system for generating IPCs from P19 EC cells using extract of 1–2-week-old mouse pancreas (MPE). Because mouse pancreatic islets undergo further remodeling and maturation for 2–3 weeks after birth, we hypothesized that neonatal MPE contains essential factors to induce in vitro differentiation of pancreatic lineages. Pluripotency of P19 cells was first confirmed by expression analysis of the stem cell markers Oct3/4, Sox-2, and Nanog. To induce differentiation, the cells were cultured in medium supplemented with different concentrations of MPE (50, 100, 200, and 300 µg/ml). The results showed that P19 cells could differentiate into IPCs and form dithizone-positive cell clusters. The generated P19-derived IPCs were immunoreactive to proinsulin, insulin, and insulin receptor beta. Expression of the pancreatic β cell genes PDX-1, INS1, and INS2 was also confirmed. The 100 µg/ml MPE concentration, which produced the peak response, was used to investigate EP300 and CREB1 gene expression. When stimulated with glucose, these cells synthesized and secreted insulin. Network analysis of the key transcription factors (PDX-1, EP300, CREB1) during IPC generation identified novel regulatory candidates such as the MIR17 and VEZF1 transcription factors, as well as the MORN1, DKFZp761P0212, and WAC proteins. Altogether, we demonstrated the possibility of generating IPCs with the characteristics of pancreatic β cells from undifferentiated EC cells.
Deriving pancreatic cells from EC cells, which are siblings of ES cells, would provide a valuable experimental tool for studying pancreatic development and function, as well as for the rapid production of IPCs for transplantation.
Marzieh Ebrahimie, Fariba Esmaeili, Somayeh Cheraghi, Fariba Houshmand, Leila Shabani, Esmaeil Ebrahimi
Genome-wide analysis of alternative splicing events in Hordeum vulgare: highlighting retention of intron-based splicing and its possible function through network analysis
In this study, we identified alternative splicing events in barley using homology mapping of assembled expressed sequence tags (ESTs) against the genomic data. The results demonstrated that intron retention is frequently associated with specific abiotic stresses. Network analysis revealed specific sub-networks between miRNAs and transcription factors in genes with a high number of alternative splicing events, such as cross talk between SPL2, SPL10, and SPL11 regulated by the miR156 and miR157 families. To confirm the alternative splicing events, the elongation factor protein (MLOC_3412) was selected, and its predicted splice variants were verified experimentally by semi-quantitative reverse transcription PCR (qRT-PCR). Our novel integrative approach opens a new avenue for functional annotation of alternative splicing through regulatory-based network discovery.
Bahman Panahi, Seyed Abolghasem Mohammadi, Reyhaneh Ebrahimi Khaksefidi, Jalil Fallah Mehrabadi, Esmaeil Ebrahimi
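Assuming the EST-to-genome homology mapping has already produced aligned blocks, one simple criterion for an intron retention event is a single aligned EST block that spans an entire annotated intron. The sketch below uses hypothetical helpers and half-open genomic coordinates and implements only that criterion; a real pipeline would also check splice-site signals, alignment quality, and read support.

```python
def introns_from_exons(exons):
    """Derive introns as the gaps between consecutive exons (half-open intervals)."""
    exons = sorted(exons)
    return [(exons[i][1], exons[i + 1][0])
            for i in range(len(exons) - 1)
            if exons[i][1] < exons[i + 1][0]]

def retained_introns(introns, est_blocks):
    """Return introns fully spanned by one aligned EST block (candidate intron retention)."""
    return [(s, e) for s, e in introns
            if any(bs <= s and e <= be for bs, be in est_blocks)]
```

An EST whose first block runs continuously from exon 1 into exon 2 would flag the first intron as retained, while spliced blocks that stop at exon boundaries flag nothing.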
Amino acid features: a missing compartment of prediction of protein function
Enormous computational efforts have been made to predict protein structure and function. However, nearly all of these efforts have focused on predicting function from the primary nucleic acid sequence or on modeling the 3D structure of a protein from its nucleic acid sequence. In fact, amino acid attributes, which are an intermediate phase between DNA/RNA and advanced protein structure, seem to have been overlooked.
Since 2010, we have examined the possibility of precisely predicting structural protein function from amino acid features by improving the following four aspects of amino acid research: (1) increasing the number of computationally calculated amino acid features, (2) testing different feature selection (attribute weighting) algorithms and selecting the most important amino acid attributes based on the overall conclusion of the algorithms, (3) examining different supervised and unsupervised data mining (machine learning) algorithms, and (4) joining attribute weighting with different data mining algorithms. We applied this procedure to different biological examples, including protein thermostability, halostability, prediction of the function of heavy metal transporters, cancer diagnosis and prediction, and the study of EST-SSRs at the amino acid level.
In the thermostability study, we successfully established an accurate expert system to predict the thermostability of any input sequence through mining of its calculated amino acid features.
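Point (2) above, selecting attributes according to the overall conclusion of several weighting algorithms, can be sketched as a vote: each algorithm ranks the attributes, and an attribute is kept if it lands in the top `top_k` for at least `min_votes` algorithms. The function name, the thresholds, and any algorithm names used with it are assumptions, not the authors' settings.

```python
from collections import Counter

def consensus_attributes(weightings, top_k, min_votes):
    """Keep attributes ranked in the top_k by at least min_votes weighting algorithms.

    weightings: dict algorithm name -> dict attribute -> weight (higher = more important)
    Returns (attribute, votes) pairs, most-voted first, ties broken by name.
    """
    votes = Counter()
    for weights in weightings.values():
        top = sorted(weights, key=weights.get, reverse=True)[:top_k]
        votes.update(top)  # one vote per algorithm that ranks the attribute highly
    selected = [(attr, v) for attr, v in votes.items() if v >= min_votes]
    return sorted(selected, key=lambda av: (-av[1], av[0]))
```

Counting votes this way mirrors statements such as "highlighted by 10, five, and four models": the vote count itself is a rough measure of how robust an attribute's importance is across methods.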
Interestingly, the performance of a clustering algorithm such as EMC can vary from 0.0% to 100%, depending on which attribute weighting algorithm summarized the attributes of the dataset before the clustering algorithm was run.
In another recent study on halostability, the results showed that amino acid composition can efficiently discriminate halostable protein groups with up to 98% accuracy, implying that halostability can be predicted precisely when an appropriate machine learning algorithm mines a large number of structural amino acid attributes of the primary protein structure.
Using our approach, simple amino acid features, without the need for advanced features of protein structure, could explain the difference between P1B-ATPases in hyperaccumulator and nonhyperaccumulator plants. More importantly, a precise model was built to discriminate P1B-ATPases in different organisms based on their structural amino acid features. In addition, for the first time, reliable models were developed to predict the hyperaccumulating activity of unknown P1B-ATPase pumps.
We employed our method in monitoring and prediction of breast cancer. The results confirmed that amino acid composition can be used to discriminate between protein groups expressed in two forms of breast cancer: malignant and benign. This study provided strong evidence that malignancy can be predicted from amino acids, and that malignant proteins can be distinguished based on the amino acid composition of their proteomes without further need for protein separation. An important outcome was the discovery of the role of dipeptides, in particular Ile-Ile, in cancer progression.
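The studies above rely on several machine learning algorithms. As a deliberately simple stand-in, and not one of the authors' models, a nearest-centroid classifier shows how 20-dimensional amino acid composition vectors alone can separate protein groups. The class labels and toy sequences in the usage block are made up for illustration and are not data from the halostability or breast cancer studies.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(seq):
    """Fraction of each standard amino acid in the sequence (20-dimensional vector)."""
    residues = [aa for aa in seq.upper() if aa in AMINO_ACIDS]
    n = len(residues) or 1
    return [residues.count(aa) / n for aa in AMINO_ACIDS]

def train_centroids(labelled):
    """labelled: list of (sequence, class label). Returns class -> mean composition."""
    sums, counts = {}, {}
    for seq, label in labelled:
        acc = sums.setdefault(label, [0.0] * len(AMINO_ACIDS))
        for i, x in enumerate(composition(seq)):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {lab: [x / counts[lab] for x in acc] for lab, acc in sums.items()}

def predict(centroids, seq):
    """Assign the class whose centroid is closest in Euclidean distance."""
    vec = composition(seq)
    return min(centroids, key=lambda lab: math.dist(vec, centroids[lab]))

if __name__ == "__main__":
    # Toy, made-up sequences with illustrative labels (not study data)
    toy = [("DDEEDDEE", "acidic"), ("DEDEDEAA", "acidic"),
           ("KKRRKKRR", "basic"), ("KRKRKRAA", "basic")]
    print(predict(train_centroids(toy), "DDDEEEK"))  # prints "acidic"
```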
In addition, Generalized Rule Induction (GRI) found association rules in the data, yielding the 100 most important rules classifying benign, malignant, and commonly expressed proteins in breast cancers.
In another investigation, we found that EST-SSRs in normal lung tissues differ from those in unhealthy tissues, and that ESTs tagged with SSRs cause remarkable differences in amino acid and protein expression patterns in cancerous tissue. This may offer a first glimpse of a new class of biomarkers based on amino acid frequency.
To date, phylogenetic trees drawn from nucleic acid or amino acid sequence alignments have served as the basis of evolutionary studies. However, this method does not take into account the structural and functional features of sequences during evolution. In contrast, the classification presented here, based on decision tree, anomaly detection, and feature weighting models, separates organisms evolutionarily according to the structural basis of their diversity.
Our findings could be used efficiently in the following areas: filling the gap between laboratory protein engineering and computational biology, developing amino acid feature-based biomarkers, increasing the accuracy of 3D protein structure prediction based on important amino acid features, and developing websites/software for predicting the results of mutation. In addition, the important amino acid features discovered here can be used as clues for discovering important DNA mutations and for increasing the accuracy of 3D structure prediction from DNA sequence. Furthermore, this study offers new insights into protein function, irrespective of similarity searches.
Esmaeil Ebrahimie, Mansour Ebrahimi, Mahdi Ebrahim
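Before EST-SSRs can be examined at the amino acid level, the repeats have to be located. A minimal sketch that finds perfect tandem repeats of 1 to 6 nt motifs occurring at least `min_repeats` times. The single default threshold of 5 is an assumption (dedicated SSR finders typically use motif-length-specific thresholds), and `find_ssrs` is a hypothetical name.

```python
import re

def is_primitive(motif):
    """True unless the motif is itself a repeat of a shorter unit (e.g. 'ATAT')."""
    k = len(motif)
    return all(motif != motif[:d] * (k // d) for d in range(1, k) if k % d == 0)

def find_ssrs(seq, min_repeats=5, max_motif=6):
    """Return sorted (start, motif, repeat count) for perfect SSRs in a DNA sequence."""
    seq = seq.upper()
    hits = []
    for k in range(1, max_motif + 1):
        # A k-nt motif followed by at least (min_repeats - 1) further copies of itself
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):  # skip e.g. 'AA' runs already reported as 'A'
                hits.append((m.start(), motif, (m.end() - m.start()) // k))
    return sorted(hits)
```

Rejecting non-primitive motifs keeps a long mononucleotide run from being reported a second time as a dinucleotide or trinucleotide repeat.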
The Zebrafish equivalent of Alzheimer's disease-associated PRESENILIN Isoform PS2V regulates inflammatory and other responses to hypoxic stress
Dominant mutations in the PRESENILIN genes PSEN1 and PSEN2 cause familial Alzheimer's disease (fAD), which usually has onset before 65 years of age. In contrast, genetic variation at the PSEN1 and PSEN2 loci does not appear to contribute to risk for the sporadic, late-onset form of the disease (sAD), raising doubts about whether these genes play a role in the majority of AD cases. However, a truncated isoform of PSEN2, PS2V, is upregulated in sAD brains and is induced by hypoxia and high cholesterol intake. PS2V can increase γ-secretase activity and suppress the unfolded protein response (UPR), but detailed analysis of its function has been hindered by the lack of a suitable, genetically manipulable animal model, since mice and rats lack this PRESENILIN isoform. We recently showed that zebrafish possess an isoform, PS1IV, that is cognate to human PS2V. Using an antisense morpholino oligonucleotide, we can specifically block the induction of PS1IV that normally occurs under hypoxia. Here, we exploit this ability to identify gene regulatory networks that are modulated by PS1IV. When PS1IV is absent under hypoxia-like conditions, we observe changes in the expression of genes controlling inflammation (particularly the sAD-associated IL1B and CCR5), vascular development, the UPR, protein synthesis, calcium homeostasis, catecholamine biosynthesis, TOR signaling, and cell proliferation. Our results imply an important role for PS2V in sAD as a component of a pathological mechanism that includes hypoxia/oxidative stress, and they support investigation of the role of PS2V in other diseases, including schizophrenia, in which hypoxia/oxidative stress is implicated in the pathology.
Esmaeil Ebrahimie, Seyyed Hani Moussavi Nik, Morgan Newman, Mark Van Der Hoek and Michael Lardell
In silico analysis of high affinity potassium transporter (HKT) isoforms in different plants
BACKGROUND: High affinity potassium transporters (HKTs) are located in the plasma membrane of the vessels and have a significant influence on salt tolerance in some plants. They exclude Na(+) from the parenchyma cells to reduce Na(+) concentration. Despite many studies, the underlying regulatory mechanisms and the exact functions of HKTs within different genomic backgrounds remain largely unknown. In this study, various bioinformatics techniques, including promoter analysis, identification of HKT-surrounding genes, and construction of gene networks, were applied to investigate the HKT regulatory mechanism. RESULTS: Promoter analysis showed that rice HKTs carry ABA response elements. Additionally, jasmonic acid response elements were detected in the promoter region of TmHKT1;5. In silico synteny analysis highlighted several unknown and new loci near rice, Arabidopsis thaliana, and Physcomitrella patens HKTs, which may play a significant role in salt stress tolerance in concert with HKTs. Gene network prediction revealed that crosstalk between jasmonate and ethylene reduces AtHKT1;1 expression. Furthermore, antiporter and transferase proteins were found in the AtHKT1;1 gene network. Interestingly, regulatory elements in the promoter region of HKT in the wild genotype (TmHKT1;5) were more frequent and variable than those in cultivated wheat (TaHKT1;5), which may allow the wild genotype to respond more rapidly to, and better sense, environmental conditions. CONCLUSION: Detecting ABA and jasmonic acid response elements in the promoter regions of HKTs provides valuable clues to the underlying regulatory mechanisms of HKTs. In silico synteny and pathway discovery indicated several candidates that act in concert with HKTs under stress conditions. In this study, we highlighted the different arrangement of regulatory elements in the promoter region of wild wheat (TmHKT1;5) compared with bread wheat (TaHKT1;5).
Mahbobeh Zamani Babgohari, Esmaeil Ebrahimie, and Ali Niaz
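The abstract does not name the promoter analysis tool. In practice, cis-elements are taken from curated databases such as PlantCARE; the sketch below only shows the underlying idea of scanning both strands of a promoter for short core motifs. The motif strings (an ABRE core, ACGTG, and the MeJA-responsive TGACG motif) are illustrative assumptions, `scan_promoter` is a hypothetical name, and positions are 0-based coordinates on the plus strand.

```python
import re

# Illustrative core motifs (assumptions); curated databases use fuller consensus sets.
MOTIFS = {"ABRE (core)": "ACGTG", "MeJA (TGACG-motif)": "TGACG"}

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def scan_promoter(promoter, motifs=MOTIFS):
    """Return sorted (plus-strand position, strand, element name) hits on both strands."""
    promoter = promoter.upper()
    n = len(promoter)
    hits = []
    for name, motif in motifs.items():
        k = len(motif)
        for strand, seq in (("+", promoter), ("-", reverse_complement(promoter))):
            # Zero-width lookahead so overlapping occurrences are all reported
            for m in re.finditer("(?=%s)" % re.escape(motif), seq):
                pos = m.start() if strand == "+" else n - m.start() - k
                hits.append((pos, strand, name))
    return sorted(hits)
```

Comparing the resulting hit lists for two promoters (for example, wild versus cultivated alleles) gives the kind of frequency and arrangement contrast described in the abstract.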
Novel approach for identification of influenza virus host range and zoonotic transmissible sequences by determination of host-related associative positions in viral genome segments
Background: Recent (2013 and 2009) zoonotic transmissions of avian or porcine influenza to humans highlight an increase in host range by evasion of species barriers. Gene reassortment or antigenic shift between viruses from two or more hosts can generate a new, life-threatening virus when the new shuffled virus is no longer recognized by antibodies existing within human populations. No large-scale study has yet helped to explain the underlying mechanisms of host transmission. Furthermore, it is not clearly understood how different segments of the influenza genome contribute to the final determination of host range. Methods: To obtain insight into the rules underpinning host range determination, various supervised machine learning algorithms were employed to mine reassortment changes in different viral segments in a range of hosts. Our multi-host dataset contained whole segments of 674 influenza strains organized into three host categories: avian, human, and swine. Some of the sequences were assigned to multiple hosts, so the dataset is multi-labeled, and we used a multi-label learning method to identify discriminative sequence sites. Algorithms such as CBA, Ripper, and decision tree were then applied to extract informative and descriptive association rules for each viral protein segment. Results: We found informative rules in all segments that are common within the same host class but vary between hosts. For example, for infection of an avian host, HA14V and NS1230S were the most important discriminative and combinatorial positions. Conclusion: In this study, combined rules with high support facilitated host range identification. Our major goal was to detect discriminative genomic positions able to identify multi-host viruses, because such viruses are likely to cause pandemics or disastrous epidemics.
Fatemeh Kargarfard, Ashkan Sami, Manijeh Mohammadi-Dehcheshmeh and Esmaeil Ebrahimi
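The study used a multi-label learning method followed by CBA, Ripper, and decision trees. As a much simpler sketch, and not those algorithms, single-position rules of the form "residue r at aligned position p implies host h" (the shape of a rule such as HA14V) can be enumerated with support and confidence, letting each multi-host strain count toward every host label it carries. The function name and the default thresholds are assumptions.

```python
from collections import defaultdict

def position_rules(strains, host, min_support=0.1, min_confidence=0.9):
    """Mine rules 'residue at position p -> host' from a multi-labelled alignment.

    strains: list of (aligned_sequence, set_of_hosts); a multi-host strain carries
             several labels and counts toward each of them.
    Returns (position, residue, support, confidence), positions 1-based as in 'HA14V'.
    """
    n = len(strains)
    with_residue = defaultdict(int)
    with_residue_and_host = defaultdict(int)
    for seq, hosts in strains:
        for pos, aa in enumerate(seq, start=1):
            if aa == "-":
                continue  # ignore alignment gaps
            with_residue[(pos, aa)] += 1
            if host in hosts:
                with_residue_and_host[(pos, aa)] += 1
    rules = []
    for (pos, aa), both in with_residue_and_host.items():
        support = both / n                       # fraction of all strains
        confidence = both / with_residue[(pos, aa)]  # P(host | residue at pos)
        if support >= min_support and confidence >= min_confidence:
            rules.append((pos, aa, support, confidence))
    return sorted(rules, key=lambda r: (-r[3], -r[2], r[0]))
```

Combined rules, like those reported in the study, would extend this by testing conjunctions of positions (for example, one HA site together with one NS1 site) instead of single positions.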
Gene Ontology-based analysis of zebrafish omics data using the web tool Comparative Gene Ontology
Gene Ontology (GO) analysis is a powerful tool in systems biology, which uses a defined nomenclature to annotate genes/proteins within three categories: “Molecular Function,” “Biological Process,” and “Cellular Component.” GO analysis can assist in revealing functional mechanisms underlying observed patterns in transcriptomic, genomic, and proteomic data. The already extensive and increasing use of zebrafish for modeling genetic and other diseases highlights the need to develop a GO analytical tool for this organism. The web tool Comparative GO was originally developed for GO analysis of bacterial data in 2013 (www.comparativego.com). We have now upgraded and extended this web tool for analysis of zebrafish genetic data using GOs and annotations from the Gene Ontology Consortium.
Esmaeil Ebrahimie, Mario Fruzangohar, Seyyed Hani Moussavi-Nik, and Morgan Newma
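How Comparative GO computes its comparisons is not described here. A minimal sketch of the kind of side-by-side view such a tool reports is the percentage of genes in each of two lists annotated to each GO term within one category. The annotation format (gene mapped to a set of (namespace, term) pairs) and the function names are hypothetical; the namespace strings follow the GO convention `molecular_function`, `biological_process`, and `cellular_component`.

```python
from collections import Counter

NAMESPACES = ("molecular_function", "biological_process", "cellular_component")

def term_percentages(genes, annotations, namespace):
    """Percentage of genes in `genes` annotated to each GO term of one namespace."""
    if namespace not in NAMESPACES:
        raise ValueError(f"unknown GO namespace: {namespace}")
    counts = Counter()
    for gene in genes:
        # set() so a gene is counted once per term even if annotated twice
        for ns, term in set(annotations.get(gene, ())):
            if ns == namespace:
                counts[term] += 1
    n = len(genes) or 1
    return {term: 100.0 * c / n for term, c in counts.items()}

def compare_lists(list_a, list_b, annotations, namespace):
    """Map each term seen in either list to (percent in list_a, percent in list_b)."""
    pa = term_percentages(list_a, annotations, namespace)
    pb = term_percentages(list_b, annotations, namespace)
    return {t: (pa.get(t, 0.0), pb.get(t, 0.0)) for t in sorted(set(pa) | set(pb))}
```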
Determining the most important physiological and agronomic traits contributing to maize grain yield through machine learning algorithms: a new avenue in intelligent agriculture
Prediction is an attempt to accurately forecast the outcome of a specific situation using input information obtained from a set of variables that potentially describe the situation. Predictive models can be used to project physiological and agronomic processes, which matters because agronomic traits such as yield can be affected by a large number of variables. In this study, we analyzed a large number of physiological and agronomic traits with screening, clustering, and decision tree models to select the factors most relevant to accurately increasing maize grain yield. Decision tree models (with nearly the same performance evaluation) were the most useful tools for understanding the underlying relationships among physiological and agronomic features and for selecting the most important and relevant traits (sowing date-location, kernel number per ear, maximum water content, kernel weight, and season duration) corresponding to maize grain yield. In particular, the decision tree generated by the C&RT algorithm was the best model for yield prediction based on physiological and agronomic traits, and it could be employed extensively in future breeding programs. Applying feature selection filtering to the data produced no significant differences in the decision tree models, but it had a positive effect on the clustering models. Finally, the results showed that the proposed modeling techniques are useful tools for crop physiologists searching large datasets for patterns in physiological and agronomic factors, and they may assist the selection of the most important traits for an individual site and field. In particular, decision tree models are the method of choice, with the capability of illustrating different pathways of yield increase in breeding programs, governed by their hierarchical structure of feature ranking as well as pattern discovery via various combinations of features.
Avat Shekoofa, Yahya Emam, Navid Shekoufa, Mansour Ebrahimi, Esmaeil Ebrahimi
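For a continuous target such as grain yield, C&RT grows a regression tree by choosing splits that minimise the within-node sum of squared errors. The sketch below finds only the single best threshold on one feature (for example, a hypothetical column of kernel number per ear); a full C&RT implementation repeats this search over all features recursively and then prunes the tree. Function names are illustrative.

```python
def sse(values):
    """Sum of squared deviations from the mean (0 for an empty node)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def best_split(x, y):
    """Threshold on feature x minimising total SSE of y in the two child nodes.

    Returns (threshold, total_sse); samples with x <= threshold go left.
    threshold is None if no split improves on the unsplit node.
    """
    pairs = sorted(zip(x, y))
    best = (None, sse(y))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between equal feature values
        left = [yy for _, yy in pairs[:i]]
        right = [yy for _, yy in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, total)
    return best
```

Splitting at the midpoint between adjacent sorted values is a common convention; the feature ranking the abstract refers to emerges from which features win these searches near the root of the tree.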
- …