
    Gene ontology based transfer learning for protein subcellular localization

    Abstract
    Background: Prediction of protein subcellular localization generally involves many complex factors, and using only one or two aspects of the data may not tell the whole story. For this reason, some recent predictive models are deliberately designed to integrate multiple heterogeneous data sources to exploit multi-aspect protein feature information. Gene ontology (GO) uses a controlled vocabulary to describe biological molecules or gene products in terms of biological process, molecular function and cellular component. With the rapid expansion of annotated protein sequences, GO has become a general protein feature that can be used to construct predictive models in computational biology. Existing models generally either concatenate the GO terms into a flat binary vector or apply majority-vote ensemble learning for protein subcellular localization; neither approach can estimate the individual discriminative abilities of the three aspects of GO.
    Results: In this paper, we propose a Gene Ontology Based Transfer Learning Model (GO-TLM) for large-scale protein subcellular localization. The model transfers signature-based homologous GO terms to the target proteins and constructs a reliable learning system that reduces the adverse effect of potentially false GO terms resulting from evolutionary divergence. We derive three GO kernels from the three aspects of gene ontology to measure the GO similarity of two proteins, and two spectrum kernels to measure the similarity of two protein sequences. We use simple non-parametric cross validation to explicitly weigh the discriminative abilities of the five kernels, which greatly reduces the time and space complexity compared with the more complicated semi-definite programming and semi-infinite linear programming approaches. The five kernels are then linearly merged into a single kernel for protein subcellular localization. We evaluate GO-TLM against three baseline models, MultiLoc, MultiLoc-GO and Euk-mPLoc, on the benchmark datasets the baseline models adopted. In 5-fold cross validation, GO-TLM achieves substantial accuracy improvements over the baselines: 80.38% versus 67.40% for Euk-mPLoc (an increase of 12.98 percentage points); 96.65% and 96.27% versus 89.60% and 89.60% for MultiLoc-GO on the MultiLoc plant and MultiLoc animal datasets, respectively (increases of 7.05 and 6.67 points); and 97.14%, 95.90% and 96.85% versus 83.70%, 90.10% and 85.70% for MultiLoc-GO on the BaCelLoc plant, BaCelLoc fungi and BaCelLoc animal datasets, respectively (increases of 13.44, 5.80 and 11.15 points). On the BaCelLoc independent sets, GO-TLM achieves 81.25%, 80.45% and 79.46% on the BaCelLoc plant holdout, BaCelLoc fungi holdout and BaCelLoc animal holdout datasets, respectively, compared with 76.00%, 60.00% and 73.00% for MultiLoc-GO (increases of 5.25, 20.45 and 6.46 points).
    Conclusions: Since direct homology-based GO term transfer may introduce noise and outliers to the target protein, we design an explicitly weighted kernel learning system (GO-TLM) that transfers to the target protein the known knowledge about related homologous proteins. This reduces the risk of outliers, shares knowledge between homologous proteins, and thus achieves better predictive performance for protein subcellular localization. Cross validation and independent test results show that homology-based GO term transfer and explicit weighting of the GO kernels substantially improve prediction performance.
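
    The kernel-combination step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not GO-TLM's implementation: the k-spectrum kernel is the standard inner product of k-mer counts, the Jaccard set-overlap kernel is a hypothetical stand-in for the paper's GO kernels (the abstract does not define them), and `weights_from_cv` simply assumes each kernel's weight is proportional to its cross-validation accuracy.

```python
from collections import Counter


def spectrum_features(seq: str, k: int) -> Counter:
    """Count every length-k substring (k-mer) of a protein sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(a: str, b: str, k: int) -> int:
    """k-spectrum kernel: inner product of the two k-mer count vectors."""
    fa, fb = spectrum_features(a, k), spectrum_features(b, k)
    # Counter returns 0 for k-mers absent from b, so unshared k-mers add nothing.
    return sum(count * fb[kmer] for kmer, count in fa.items())


def go_set_kernel(terms_a: set, terms_b: set) -> float:
    """Hypothetical GO kernel: Jaccard overlap of two proteins' GO term sets."""
    union = terms_a | terms_b
    return len(terms_a & terms_b) / len(union) if union else 0.0


def weights_from_cv(accuracies: list) -> list:
    """Assumed weighting rule: normalize per-kernel CV accuracies to sum to 1."""
    total = sum(accuracies)
    return [acc / total for acc in accuracies]


def combined_kernel(kernel_values: list, weights: list) -> float:
    """Linearly merge base kernel values into a single kernel value."""
    if len(kernel_values) != len(weights):
        raise ValueError("need exactly one weight per kernel")
    return sum(w * kv for w, kv in zip(weights, kernel_values))
```

    For example, `spectrum_kernel("ACDAC", "ACD", 2)` is 3, since the shared 2-mers AC and CD contribute 2×1 + 1×1. In practice the combined kernel would be evaluated for every pair of proteins to form a Gram matrix for a kernel classifier such as an SVM; a non-negative weighted sum of valid kernels is itself a valid kernel, which is what makes a linear merge legitimate.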

    Rapid identification of the medicinal plant Taraxacum formosanum and distinguishing of this plant from its adulterants by ribosomal DNA internal transcribed spacer (ITS) based DNA barcode

    Accurate identification of medicinal plants is essential for quality control. In this study, the internal transcribed spacer 2 (ITS2) region of nuclear ribosomal DNA served as a DNA barcode and was amplified by allele-specific PCR to differentiate Taraxacum formosanum from five related adulterants. Using a set of designed primers, a highly specific 223 bp PCR product was amplified from T. formosanum, whereas no similar DNA fragment was amplified from any of the adulterants. This indicates that our allele-specific primers are highly specific and can accurately discriminate T. formosanum from its adulterant plants.
    Key words: Medicinal plant, polymerase chain reaction (PCR), authentication, Taraxacum formosanum, traditional Chinese medicine, internal transcribed spacer 2 (ITS2)
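
    The specificity argument above can be illustrated with a simplified in-silico PCR check. This sketch assumes exact primer matches on one strand and uses made-up sequences; real allele-specific PCR also depends on mismatches (especially at the primer's 3' end) and on annealing conditions, and the study's primer sequences are not given here.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def predicted_amplicon_length(template: str, forward: str, reverse: str):
    """Return the predicted product length in bp, or None if no product.

    Simplifying assumption: the forward primer must occur exactly in the
    template, and the reverse complement of the reverse primer must occur
    exactly downstream of it. Only the first forward-primer hit is used.
    """
    template = template.upper()
    fwd = forward.upper()
    rev_site = reverse_complement(reverse)
    start = template.find(fwd)
    if start == -1:
        return None
    end = template.find(rev_site, start + len(fwd))
    if end == -1:
        return None
    return end + len(rev_site) - start
```

    Running one primer pair against the target template and against each adulterant template, a product of the expected size for the target alone (223 bp in the study) is the in-silico analogue of the result reported above.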

    Direct calibration of PICKY-designed microarrays

    Abstract
    Background: Few microarrays have been quantitatively calibrated to identify optimal hybridization conditions, because it is difficult to precisely determine the hybridization characteristics of a microarray using biologically variable cDNA samples.
    Results: Using synthesized samples with known concentrations of specific oligonucleotides, a series of microarray experiments was conducted to evaluate microarrays designed by PICKY, an oligo microarray design software tool, and to test a direct microarray calibration method based on the PICKY-predicted, thermodynamically closest nontarget information. The complete set of microarray experiment results is archived in the GEO database under series accession number GSE14717. Additional data files and the Perl programs described in this paper can be obtained from http://www.complex.iastate.edu under the PICKY Download area.
    Conclusion: PICKY-designed microarray probes are highly reliable over a wide range of hybridization temperatures and sample concentrations. The calibration method reported here allows researchers to experimentally optimize their hybridization conditions. Because the method is straightforward and uses existing microarrays and relatively inexpensive synthesized samples, any lab that uses PICKY-designed microarrays can apply it. In addition, other microarrays can be reanalyzed by PICKY to obtain the thermodynamically closest nontarget information for calibration.

    A Multi-Label Predictor for Identifying the Subcellular Locations of Singleplex and Multiplex Eukaryotic Proteins

    Subcellular locations of proteins are important functional attributes, and an effective, efficient predictor is needed to annotate them rapidly and reliably. Most existing subcellular localization methods deal only with single-location proteins. In reality, proteins may simultaneously exist at, or move between, two or more subcellular locations. To better reflect the characteristics of such multiplex proteins, new methods are needed to handle them. In this paper, we develop a new predictor, called Euk-ECC-mPLoc, that can handle systems containing both singleplex and multiplex eukaryotic proteins. It introduces a powerful multi-label learning approach that exploits correlations between subcellular locations, and it hybridizes gene ontology with dipeptide composition information. It can identify eukaryotic proteins among the following 22 locations: (1) acrosome, (2) cell membrane, (3) cell wall, (4) centrosome, (5) chloroplast, (6) cyanelle, (7) cytoplasm, (8) cytoskeleton, (9) endoplasmic reticulum, (10) endosome, (11) extracellular, (12) Golgi apparatus, (13) hydrogenosome, (14) lysosome, (15) melanosome, (16) microsome, (17) mitochondrion, (18) nucleus, (19) peroxisome, (20) spindle pole body, (21) synapse, and (22) vacuole. Jackknife cross-validation on a stringent benchmark dataset of eukaryotic proteins shows that Euk-ECC-mPLoc achieved an average success rate of 69.70% and an overall success rate of 81.54%, indicating that the approach is quite promising. In particular, the success rates for small subsets were remarkably improved, indicating that the method holds high potential for stimulating the development of the area.
    As a user-friendly web server, Euk-ECC-mPLoc is freely accessible to the public at http://levis.tongji.edu.cn:8080/bioinfo/Euk-ECC-mPLoc/. We believe that Euk-ECC-mPLoc may become a useful high-throughput tool, or at least play a complementary role to existing predictors in identifying subcellular locations of eukaryotic proteins.
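
    Dipeptide composition, one of the two feature sources named above, is commonly computed as the normalized frequency of each of the 20 × 20 = 400 ordered amino acid pairs. A minimal sketch, assuming normalization by the number of valid adjacent pairs and skipping non-standard residues; how Euk-ECC-mPLoc fuses this vector with GO information and feeds it to its multi-label learner is not shown.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs, in a fixed order so vectors from different proteins align.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(seq: str) -> list:
    """Return a 400-D vector of dipeptide frequencies for a protein sequence."""
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skips pairs containing non-standard residues such as X
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]
```

    Normalizing by the number of counted pairs makes the vector independent of sequence length, so short and long proteins can be compared directly.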

    Classification and Analysis of Regulatory Pathways Using Graph Property, Biochemical and Physicochemical Property, and Functional Property

    Given a regulatory pathway system consisting of a set of proteins, can we predict which pathway class it belongs to? This problem is closely related to the biological function of the pathway in cells and is therefore fundamental in systems biology and proteomics; it is also difficult and challenging because of its complexity. To address it, we developed a novel approach that predicts query pathways among the following six functional categories: (i) “Metabolism”, (ii) “Genetic Information Processing”, (iii) “Environmental Information Processing”, (iv) “Cellular Processes”, (v) “Organismal Systems”, and (vi) “Human Diseases”. The prediction method was established through the following procedures: (i) following the general form of pseudo amino acid composition (PseAAC), each pathway is formulated as a 5570-D (dimensional) vector; (ii) each component of the 5570-D vector is derived by a series of feature extractions from the pathway system according to its graph property, biochemical and physicochemical property, and functional property; (iii) the minimum redundancy maximum relevance (mRMR) method is adopted to carry out the prediction. Jackknife cross-validation on a benchmark dataset of 146 regulatory pathways yielded an overall success rate of 78.8% in identifying query pathways among the six classes, which is quite promising and encouraging. To the best of our knowledge, this study is the first attempt to identify the type of a pathway system or its biological function. We anticipate that this report may stimulate a series of follow-up investigations in this new and challenging area.
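
    mRMR is usually described as a feature-ranking criterion: it greedily adds the feature with the highest mutual information (MI) with the class label minus its mean MI with the features already chosen (the "MID" form). The sketch below assumes the features have already been discretized and estimates MI from counts in plain Python; how the 5570-D vector was discretized and which classifier consumed the ranked features are not stated in the abstract and are not shown.

```python
import math
from collections import Counter


def mutual_information(x: list, y: list) -> float:
    """MI (in nats) between two discrete variables, estimated from counts."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    # p(a,b) / (p(a) p(b)) simplifies to c * n / (count(a) * count(b)).
    return sum((c / n) * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())


def mrmr(features: dict, target: list, k: int) -> list:
    """Greedy mRMR (MID form): pick k features by relevance minus redundancy."""
    relevance = {name: mutual_information(vals, target)
                 for name, vals in features.items()}
    selected = []
    remaining = sorted(features)  # sorted for deterministic tie-breaking

    def score(name):
        if not selected:
            return relevance[name]
        redundancy = sum(mutual_information(features[name], features[s])
                         for s in selected) / len(selected)
        return relevance[name] - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

    Because a duplicated feature carries no new information, its redundancy term cancels its relevance, so mRMR prefers a complementary feature over a copy of one already selected.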

    Shared probe design and existing microarray reanalysis using PICKY

    Abstract
    Background: Large genomes contain families of highly similar genes that cannot be individually identified by microarray probes. This limitation is due to thermodynamic restrictions and cannot be resolved by any computational method. Because gene annotations are updated more frequently than microarrays, another common issue facing microarray users is that existing microarrays must be routinely reanalyzed to determine which probes are still useful with respect to the updated annotations.
    Results: PICKY 2.0 can design shared probes for sets of genes that cannot be individually identified using unique probes. PICKY 2.0 uses novel algorithms to track sharable regions among genes and to strictly distinguish them from other highly similar but nontarget regions during thermodynamic comparisons. Therefore, PICKY does not sacrifice the quality of shared probes when choosing them. The latest version, PICKY 2.1, adds the capability to reanalyze existing microarray probes against updated gene sets to determine which probes are still valid. In addition, more precise nonlinear salt effect estimates and other improvements make PICKY 2.1 more versatile for microarray users.
    Conclusions: Shared probes allow expressed gene family members to be detected, which is generally more desirable than knowing nothing about these genes. Shared probes also enable the design of cross-genome microarrays, which facilitate multiple species identification in environmental samples. The new nonlinear salt effect calculation significantly increases the precision of probes at lower buffer salt concentrations, and the probe reanalysis function improves the interpretation of existing microarray results.
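
    The idea of a shared probe region can be illustrated with exact k-mer matching: a candidate region must occur in every target gene of a family and in no nontarget gene. This is a deliberately simplified, hypothetical sketch; PICKY itself compares candidates thermodynamically against highly similar nontarget regions rather than by exact string matching, and its algorithms are not reproduced here.

```python
def kmers(seq: str, k: int) -> set:
    """All distinct length-k substrings of a DNA sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_candidate_kmers(targets: list, nontargets: list, k: int) -> set:
    """k-mers present in every target gene and absent from every nontarget gene."""
    if not targets:
        return set()
    # Keep only k-mers common to the whole target family.
    shared = set.intersection(*(kmers(s.upper(), k) for s in targets))
    # Discard any k-mer that would also hybridize (here: match exactly) elsewhere.
    for s in nontargets:
        shared -= kmers(s.upper(), k)
    return shared
```

    Exact matching is far stricter in one direction and far looser in another than a thermodynamic comparison, which is why a real design tool evaluates near-matches by binding energy instead.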

    The Impact of Schistosoma japonicum Infection and Treatment on Ultrasound-Detectable Morbidity: A Five-Year Cohort Study in Southwest China

    Schistosomiasis is a disease caused by a water-borne parasite that infects approximately 200 million people worldwide. Schistosoma japonicum, found in Asia, causes disease by releasing eggs in the liver, leading to fibrosis, anemia and, in children, impaired growth. Ultrasound can assess liver pathology from schistosomiasis; however, more information is needed to evaluate the relevance of standard ultrasound measures. We followed 578 people for up to five years, testing for schistosomiasis infection and conducting ultrasound examinations to assess the relationship between infection and seven ultrasound measures, and to evaluate the impact of treatment with anti-schistosomiasis chemotherapy (praziquantel) on morbidity. All infections were promptly treated. Fibrosis of the liver parenchyma, a pathology unique to S. japonicum, was associated with schistosomiasis infection and was most advanced in people with high worm burdens. Liver fibrosis declined significantly following treatment, but reversal of severe liver fibrosis was rare. Other ultrasound measures were not consistently related to schistosomiasis infection or treatment. These findings suggest that parenchymal fibrosis can be used to measure morbidity attributable to S. japonicum and to evaluate the impact of disease control efforts. Because reversal of severe fibrosis was limited, disease control efforts will be most effective if they not only treat existing infections but also prevent new ones.

    Modular prediction of protein structural classes from sequences of twilight-zone identity with predicting sequences

    Abstract
    Background: Knowledge of structural class is used by numerous methods for identifying structural and functional characteristics of proteins and could be used to detect remote homologues, particularly for chains that share twilight-zone similarity. In contrast to existing sequence-based structural class predictors, which target four major classes and are designed for high-identity sequences, we predict seven classes from sequences that share twilight-zone identity with the training sequences.
    Results: The proposed MODular Approach to Structural class prediction (MODAS) method is unique in that it allows selection of any subset of the classes. MODAS is also the first to use a novel, custom-built, feature-based sequence representation that combines evolutionary profiles and predicted secondary structure. The features quantify information relevant to the definition of the classes, including conservation of residues and the arrangement and number of helix/strand segments. Our comprehensive design considers 8 feature selection methods and 4 classifiers to develop Support Vector Machine-based classifiers tailored to each of the seven classes. Tests on 5 twilight-zone and 1 high-similarity benchmark datasets, with comparison against over two dozen modern competing predictors, show that MODAS provides the best overall accuracy, ranging from 80% to 96.7% depending on the dataset (83.5% for the twilight-zone datasets). This translates into 19% and 8% error rate reductions compared with the best-performing competing method on the two largest datasets. MODAS predicts the membrane protein class, which most existing methods do not consider, with 58% accuracy, even though this class accounts for only 2% of the data. We analyze our predictive model to demonstrate how and why the input features are associated with the corresponding classes.
    Conclusions: The improved predictions stem from the novel features that express collocation of the secondary structure segments in the protein sequence and that combine evolutionary and secondary structure information. Our work demonstrates that the conservation and arrangement of secondary structure segments predicted along the protein chain can successfully predict structural classes, which are defined by the spatial arrangement of the secondary structures. A web server is available at http://biomine.ece.ualberta.ca/MODAS/.
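
    Features such as the number and arrangement of helix/strand segments can be derived from a predicted secondary structure string. A minimal sketch, assuming a three-state H/E/C alphabet (helix, strand, coil) and a hypothetical `min_len` cutoff for counting a segment; MODAS's actual feature definitions and their combination with evolutionary profiles are not reproduced here.

```python
from itertools import groupby


def segment_features(ss: str, min_len: int = 1) -> dict:
    """Summarize helix (H) and strand (E) segments in a 3-state SS string."""
    # Collapse runs of identical states into (state, run_length) segments.
    segments = [(state, sum(1 for _ in run)) for state, run in groupby(ss.upper())]
    kept = [(s, n) for s, n in segments if s in "HE" and n >= min_len]
    helices = [n for s, n in kept if s == "H"]
    strands = [n for s, n in kept if s == "E"]
    return {
        "num_helices": len(helices),
        "num_strands": len(strands),
        "longest_helix": max(helices, default=0),
        "longest_strand": max(strands, default=0),
        # Order of segments along the chain, e.g. "HEHE" for alternating segments.
        "arrangement": "".join(s for s, _ in kept),
    }
```

    For `"CCHHHHCCEEECCHHHCEEC"` the sketch finds two helices and two strands arranged as `"HEHE"`; raising `min_len` to 3 drops the final two-residue strand.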

    MicroRNA-34a modulates genes involved in cellular motility and oxidative phosphorylation in neural precursors derived from human umbilical cord mesenchymal stem cells

    Abstract
    Background: Mesenchymal stem cells (MSCs) found in bone marrow (BM-MSCs) and in the Wharton's jelly matrix of the human umbilical cord (WJ-MSCs) are able to transdifferentiate into neuronal lineage cells both in vitro and in vivo, and therefore hold the potential to treat neural disorders such as stroke or Parkinson's disease. In BM-MSCs, miR-130a and miR-206 have been shown to regulate the synthesis of the neurotransmitter substance P in human mesenchymal stem cell-derived neuronal cells. However, how neuronal differentiation is controlled in WJ-MSCs remains unclear.
    Methods: WJ-MSCs were isolated from human umbilical cords and induced to undergo neurogenesis using a published protocol. The miRNome patterns of WJ-MSCs and their neuronal progenitors (day 9 after differentiation) were analyzed with the Agilent microRNA microarray.
    Results: Five miRNAs were enriched in WJ-MSCs: miR-345, miR-106a, miR-17-5p, miR-20a and miR-20b. Another 11 miRNAs (miR-206, miR-34a, miR-374, miR-424, miR-100, miR-101, miR-323, miR-368, miR-137, miR-138 and miR-377) were abundantly expressed in transdifferentiated neuronal progenitors. Among these miRNAs, miR-34a and miR-206 were the only two that had been linked to BM-MSC neurogenesis. Overexpressing miR-34a in cells suppressed the expression of 136 neuronal progenitor genes, all of which possess putative miR-34a binding sites. Gene enrichment analysis using the Gene Ontology database showed that these 136 genes were associated with cell motility, energy production (including oxidative phosphorylation, electron transport and ATP synthesis) and actin cytoskeleton organization, indicating that miR-34a plays a critical role in precursor cell migration. Knocking down endogenous miR-34a expression in WJ-MSCs increased WJ-MSC motility.
    Conclusions: Our data suggest a critical role for miRNAs in MSC neuronal differentiation, and that miR-34a contributes to neuronal precursor motility, which may be crucial for stem cells to home to their target sites.
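
    GO enrichment of a gene list, such as the 136 suppressed genes above, is commonly assessed with a one-sided hypergeometric test: given N background genes of which K carry a GO term, how surprising is it to see k or more of them in a list of n genes? A minimal standard-library sketch; the abstract does not state which enrichment tool or multiple-testing correction was used, so neither is implied here.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided P(X >= k) for X ~ Hypergeometric(population N, successes K, draws n).

    k: genes in the list carrying the GO term
    n: size of the gene list
    K: background genes carrying the GO term
    N: size of the background gene set
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("require 0 <= k <= n <= N and 0 <= K <= N")
    # Sum the probability mass of every outcome at least as extreme as k.
    # math.comb returns 0 when choosing more items than exist.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)
```

    Because every GO term is tested separately, the resulting p-values would normally be corrected for multiple testing before calling a term enriched.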

    Synergistic Anti-Tumor Effects of Combination of Photodynamic Therapy and Arsenic Compound in Cervical Cancer Cells: In Vivo and In Vitro Studies

    The effects of As4O6 as an adjuvant to photodynamic therapy (PDT) were studied. As4O6 is considered to have anticancer activity through several biological actions, such as free radical production and inhibition of VEGF expression. In the MTT assay, PDT or As4O6 significantly inhibited TC-1 cell proliferation in a dose-dependent manner (P<0.05). The anti-proliferative effect of the combination treatment was significantly greater than that of either PDT or As4O6 alone, which decreased proliferation by 62.4% and 52.5%, respectively, compared with vehicle-only treated TC-1 cells (P<0.05); the combination of PDT and As4O6 decreased cell proliferation by 77.4% (P<0.05). Genes in the cell survival pathway (Naip1, Tert and Aip1) and the p53-dependent pathway (Bax, p21Cip1, Fas, Gadd45, IGFBP-3 and Mdm-2) were markedly upregulated by the combination of PDT and As4O6. In addition, immune response genes in the NEAT pathway (Ly-12, CD178 and IL-2) were also modulated after combination treatment, suggesting improved antitumor effects through control of unwanted growth-stimulatory pathways. In vivo, the combination treatment restricted tumor growth, consistent with the in vitro data, and involved some of the same signaling pathways observed in vitro. These findings suggest a benefit of combined treatment with PDT and As4O6 for inhibiting cervical cancer cell growth.