4,187 research outputs found

    Multi-Label Multi-Kernel Transfer Learning for Human Protein Subcellular Localization

    Recent years have witnessed much progress in computational modelling for protein subcellular localization. However, the existing sequence-based predictive models demonstrate moderate or unsatisfactory performance, and the gene ontology (GO) based models run the risk of performance overestimation for novel proteins. Furthermore, many human proteins have multiple subcellular locations, which makes the computational modelling more complicated. To date, very few studies have specialized in predicting the subcellular localization of human proteins that may reside in multiple cellular compartments. In this paper, we propose a multi-label multi-kernel transfer learning model for human protein subcellular localization (MLMK-TLM). MLMK-TLM proposes a multi-label confusion matrix, formally formulates three multi-labelling performance measures, and adapts one-against-all multi-class probabilistic outputs to the multi-label learning scenario. On this basis, it extends our published work GO-TLM (gene ontology based transfer learning model for protein subcellular localization) and MK-TLM (multi-kernel transfer learning based on Chou's PseAAC formulation for protein submitochondria localization) to multiplex human protein subcellular localization. With the advantages of proper homolog knowledge transfer, a comprehensive survey of model performance for novel proteins, and multi-labelling capability, MLMK-TLM should have greater practical applicability. The experiments on a human protein benchmark dataset show that MLMK-TLM significantly outperforms the baseline model and demonstrates good multi-labelling ability for novel human proteins. Some findings (predictions) are validated by the latest Swiss-Prot database. The software can be freely downloaded at http://soft.synu.edu.cn/upload/msy.rar

    Prediction of Protein Domain with mRMR Feature Selection and Analysis

    Domains are the structural and functional units of proteins. With the avalanche of protein sequences generated in the postgenomic age, it is highly desirable to develop effective methods for predicting protein domains from sequence information alone, so as to facilitate the structure prediction of proteins and speed up their functional annotation. However, although many efforts have been made in this regard, prediction of protein domains from sequence information remains a challenging and elusive problem. Here, a new method was developed by combining the techniques of RF (random forest), mRMR (maximum relevance minimum redundancy), and IFS (incremental feature selection), as well as by incorporating the features of physicochemical and biochemical properties, sequence conservation, residual disorder, secondary structure, and solvent accessibility. The overall success rate achieved by the new method on an independent dataset was around 73%, which was about 28–40% higher than that of the existing method on the same benchmark dataset. Furthermore, an in-depth analysis revealed that the features of evolution, codon diversity, electrostatic charge, and disorder played more important roles than the others in predicting protein domains, quite consistent with experimental observations. It is anticipated that the new method may become a high-throughput tool for annotating protein domains, or may, at the very least, play a complementary role to the existing domain prediction methods, and that the findings about the key features with high impact on domain prediction might provide useful insights or clues for further experimental investigations in this area. Finally, it has not escaped our notice that the current approach can also be utilized to study protein signal peptides, B-cell epitopes, and HIV protease cleavage sites, among many other important topics in protein science and biomedicine.

    BOLD Temporal Dynamics of Rat Superior Colliculus and Lateral Geniculate Nucleus following Short Duration Visual Stimulation

    Background: The superior colliculus (SC) and lateral geniculate nucleus (LGN) are important subcortical structures for vision. Much of our understanding of vision was obtained using invasive and small field of view (FOV) techniques. In this study, we use non-invasive, large FOV blood oxygenation level-dependent (BOLD) fMRI to measure the temporal dynamics of the SC and LGN responses following short duration (1 s) visual stimulation. Methodology/Principal Findings: Experiments are performed at 7 tesla on Sprague Dawley rats stimulated in one eye with flashing light. Gradient-echo and spin-echo sequences are used to provide complementary information. An anatomical image is acquired from one rat after injection of monocrystalline iron oxide nanoparticles (MION), a blood vessel contrast agent. BOLD responses are concentrated in the contralateral SC and LGN. The SC BOLD signal measured with gradient-echo rises to 50% of maximum amplitude (PEAK) 0.2±0.2 s before the LGN signal (p<0.05). The LGN signal returns to 50% of PEAK 1.4±1.2 s before the SC signal (p<0.05). These results indicate the SC signal rises faster than the LGN signal but settles slower. Spin-echo results support these findings. The post-MION image shows the SC and LGN lie beneath large blood vessels. This subcortical vasculature is similar to that in the cortex, which also lies beneath large vessels. The LGN lies closer to the large vessels than much of the SC. Conclusions/Significance: The differences in response timing between SC and LGN are very similar to those between deep and shallow cortical layers following electrical stimulation, which are related to depth-dependent blood vessel dilation rates. This, combined with the similarities in vasculature between subcortex and cortex, suggests the SC and LGN timing differences are also related to depth-dependent dilation rates.
This study shows for the first time that BOLD responses in the rat SC and LGN following short duration visual stimulation are temporally different. © 2011 Lau et al

    A deeply branching thermophilic bacterium with an ancient acetyl-CoA pathway dominates a subsurface ecosystem

    A nearly complete genome sequence of Candidatus ‘Acetothermum autotrophicum’, a presently uncultivated bacterium in candidate division OP1, was revealed by metagenomic analysis of a subsurface thermophilic microbial mat community. Phylogenetic analysis based on the concatenated sequences of proteins common among 367 prokaryotes suggests that Ca. ‘A. autotrophicum’ is one of the earliest diverging bacterial lineages. It possesses a folate-dependent Wood-Ljungdahl (acetyl-CoA) pathway of CO₂ fixation, is predicted to have an acetogenic lifestyle, and possesses the newly discovered archaeal-autotrophic type of bifunctional fructose 1,6-bisphosphate aldolase/phosphatase. A phylogenetic analysis of the core gene cluster of the acetyl-CoA pathway, shared by acetogens, methanogens, some sulfur- and iron-reducers and dechlorinators, supports the hypothesis that the core gene cluster of Ca. ‘A. autotrophicum’ is a particularly ancient bacterial pathway. The habitat, physiology and phylogenetic position of Ca. ‘A. autotrophicum’ support the view that the first bacterial and archaeal lineages were H₂-dependent acetogens and methanogens living in hydrothermal environments.

    Severe Acute Respiratory Syndrome–associated Coronavirus Infection

    Whether severe acute respiratory syndrome–associated coronavirus (SARS-CoV) infection can be asymptomatic is unclear. We examined the seroprevalence of SARS-CoV among 674 healthcare workers from a hospital in which a SARS outbreak had occurred. A total of 353 (52%) experienced mild self-limiting illnesses, and 321 (48%) were asymptomatic throughout the course of these observations. None of these healthcare workers had antibody to SARS-CoV, indicating that subclinical or mild infection attributable to SARS-CoV in adults is rare.

    Identification of Colorectal Cancer Related Genes with mRMR and Shortest Path in Protein-Protein Interaction Network

    One of the most important and challenging problems in biomedicine and genomics is how to identify disease genes. In this study, we developed a computational method to identify colorectal cancer-related genes based on (i) the gene expression profiles, and (ii) the shortest path analysis of functional protein association networks. The former has long been used to select differentially expressed genes as disease genes, while the latter has been widely used to study the mechanism of diseases. With the existing protein-protein interaction data from STRING (Search Tool for the Retrieval of Interacting Genes), a weighted functional protein association network was constructed. By means of the mRMR (Maximum Relevance Minimum Redundancy) approach, six genes were identified that can distinguish colorectal tumors from normal adjacent colonic tissues based on their gene expression profiles. Meanwhile, using the shortest path approach, we found an additional 35 genes, of which some have been reported to be relevant to colorectal cancer and some are very likely to be relevant to it. Interestingly, the genes we identified from both the gene expression profiles and the functional protein association network include more cancer genes than the genes identified from the gene expression profiles alone. In addition, these genes had greater functional similarity with the reported colorectal cancer genes than the genes identified from the gene expression profiles alone. All these results indicate that the method presented in this paper is quite promising. The method may become a useful tool, or at least play a complementary role to the existing method, for identifying colorectal cancer genes. It has not escaped our notice that the method can be applied to identify the genes of other diseases as well.

    Classification and Analysis of Regulatory Pathways Using Graph Property, Biochemical and Physicochemical Property, and Functional Property

    Given a regulatory pathway system consisting of a set of proteins, can we predict which pathway class it belongs to? Such a problem is closely related to the biological function of the pathway in cells and hence is quite fundamental and essential in systems biology and proteomics. This is also an extremely difficult and challenging problem due to its complexity. To address this problem, a novel approach was developed that can be used to predict query pathways among the following six functional categories: (i) “Metabolism”, (ii) “Genetic Information Processing”, (iii) “Environmental Information Processing”, (iv) “Cellular Processes”, (v) “Organismal Systems”, and (vi) “Human Diseases”. The prediction method was established through the following procedures: (i) according to the general form of pseudo amino acid composition (PseAAC), each of the pathways concerned is formulated as a 5570-D (dimensional) vector; (ii) each of the components in the 5570-D vector was derived by a series of feature extractions from the pathway system according to its graphic property, biochemical and physicochemical property, as well as functional property; (iii) the minimum redundancy maximum relevance (mRMR) method was adopted to operate the prediction. A cross-validation by the jackknife test on a benchmark dataset consisting of 146 regulatory pathways indicated that an overall success rate of 78.8% was achieved by our method in identifying query pathways among the above six classes, indicating the outcome is quite promising and encouraging. To the best of our knowledge, the current study represents the first effort in attempting to identify the type of a pathway system or its biological function. It is anticipated that our report may stimulate a series of follow-up investigations in this new and challenging area.

    A comparison of isolated circulating tumor cells and tissue biopsies using whole-genome sequencing in prostate cancer

    Previous studies have demonstrated focal but limited molecular similarities between circulating tumor cells (CTCs) and biopsies using isolated genetic assays. We hypothesized that molecular similarity between CTCs and tissue exists at the single cell level when characterized by whole genome sequencing (WGS). By combining the NanoVelcro CTC Chip with laser capture microdissection (LCM), we developed a platform for single-CTC WGS. We performed this procedure on CTCs and tissue samples from a patient with advanced prostate cancer who had serial biopsies over the course of his clinical history. We achieved 30X depth and ≥ 95% coverage. Twenty-nine percent of the somatic single nucleotide variations (SSNVs) identified were founder mutations that were also identified in CTCs. In addition, 86% of the clonal mutations identified in CTCs could be traced back to either the primary or metastatic tumors. In this patient, we identified structural variations (SVs) including an intrachromosomal rearrangement in chr3 and an interchromosomal rearrangement between chr13 and chr15. These rearrangements were shared between tumor tissues and CTCs. At the same time, highly heterogeneous short structural variants were discovered in PTEN, RB1, and BRCA2 in all tumor and CTC samples. Using high-quality WGS on single-CTCs, we identified the shared genomic alterations between CTCs and tumor tissues. This approach yielded insight into the heterogeneity of the mutational landscape of SSNVs and SVs. It may be possible to use this approach to study heterogeneity and characterize the biological evolution of a cancer during the course of its natural history

    Piperidinols that show anti-tubercular activity as inhibitors of arylamine N-acetyltransferase: an essential enzyme for mycobacterial survival inside macrophages

    Latent M. tuberculosis infection presents one of the major obstacles in the global eradication of tuberculosis (TB). Cholesterol plays a critical role in the persistence of M. tuberculosis within the macrophage during latent infection. Catabolism of cholesterol contributes to the pool of propionyl-CoA, a precursor that is incorporated into cell-wall lipids. Arylamine N-acetyltransferase (NAT) is encoded within a gene cluster that is involved in the cholesterol sterol-ring degradation and is essential for intracellular survival. The ability of the NAT from M. tuberculosis (TBNAT) to utilise propionyl-CoA links it to the cholesterol-catabolism pathway. Deleting the nat gene or inhibiting the NAT enzyme prevents intracellular survival and results in depletion of cell-wall lipids. TBNAT has been investigated as a potential target for TB therapies. From a previous high-throughput screen, 3-benzoyl-4-phenyl-1-methylpiperidinol was identified as a selective inhibitor of prokaryotic NAT that exhibited antimycobacterial activity. The compound resulted in time-dependent irreversible inhibition of the NAT activity when tested against NAT from M. marinum (MMNAT). To further evaluate the antimycobacterial activity and the NAT inhibition of this compound, four piperidinol analogues were tested. All five compounds exert potent antimycobacterial activity against M. tuberculosis with MIC values of 2.3-16.9 µM. Treatment of the MMNAT enzyme with this set of inhibitors resulted in an irreversible time-dependent inhibition of NAT activity. Here we investigate the mechanism of NAT inhibition by studying protein-ligand interactions using mass spectrometry in combination with enzyme analysis and structure determination. We propose a covalent mechanism of NAT inhibition that involves the formation of a reactive intermediate and selective cysteine residue modification. 
These piperidinols present a unique class of antimycobacterial compounds that have a novel mode of action different from known anti-tubercular drugs