
    Performance analysis of text classification algorithms for PubMed articles

    The Medical Subject Headings (MeSH) thesaurus is a controlled vocabulary developed by the US National Library of Medicine (NLM) for indexing articles in the PubMed Central (PMC) archive. The annotation process is a complex and time-consuming task that relies on subjective manual assignment of MeSH concepts. Automating this task with machine learning may provide a more efficient and less ambiguous way of organizing the biomedical literature. This research presents a case study comparing the performance of several machine learning algorithms (Topic Modelling, Random Forest, Logistic Regression, Support Vector Classifiers, Multinomial Naive Bayes, Convolutional Neural Network and Long Short-Term Memory (LSTM)) in reproducing manually assigned MeSH annotations. Records for this study were retrieved from PubMed using the E-utilities API to the Entrez system of databases at the NCBI (National Center for Biotechnology Information). The MeSH vocabulary is organised hierarchically, and article abstracts labelled with a single MeSH term from the top two layers of the hierarchy were selected for training the machine learning models. Three strategies for multiclass text classification were considered. The first used a chi-square test for feature selection to identify words relevant to each MeSH label. The second used Named Entity Recognition (NER) to extract entities from the unstructured text, and the third relied on word embeddings able to capture latent knowledge from the literature. At the start of the study, text was vectorised using Term Frequency-Inverse Document Frequency (Tf-idf) weighting, and topic modelling was performed to assess the correlation between the assigned topics (an unsupervised learning task) and the MeSH terms in PubMed. The degree of coupling was found to be low, although significant. Of all the classifier models trained, logistic regression on Tf-idf-vectorised entities achieved the highest accuracy.
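Retrieval through the E-utilities can be sketched as two requests: ESearch (query to PubMed IDs) and EFetch (PubMed IDs to records with abstracts). This minimal sketch only constructs the request URLs with the Python standard library and performs no network I/O; the MeSH query, the `retmax` value and the helper names `esearch_url` and `efetch_url` are hypothetical illustrations, not the query used in the study.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term: str, retmax: int = 100) -> str:
    """Build a (hypothetical-helper) ESearch URL returning PMIDs that match `term`."""
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def efetch_url(pmids: list[str]) -> str:
    """Build a (hypothetical-helper) EFetch URL returning PubMed records as XML."""
    params = {
        "db": "pubmed",
        "id": ",".join(pmids),  # comma-separated list of PMIDs
        "rettype": "abstract",
        "retmode": "xml",
    }
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical query: records indexed with the MeSH heading "Neoplasms".
    print(esearch_url('"Neoplasms"[MeSH Terms]', retmax=20))
    print(efetch_url(["12345", "67890"]))
```

In practice the PMIDs would be parsed from the ESearch response before calling EFetch, and requests should respect NCBI's published rate limits.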
Performance varied across the MeSH categories. In conclusion, automated curation of articles from their abstracts may be possible for target classes that can be classified reliably and reproducibly.
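The Tf-idf weighting and chi-square feature scoring named in this abstract can be illustrated with a small pure-Python sketch. It assumes pre-tokenised abstracts, uses the textbook weighting tf(t, d) · log(N / df(t)) and a 2x2 chi-square statistic on term presence versus label membership; libraries such as scikit-learn use smoothed or count-based variants, and the study's exact settings are not stated. The toy documents and labels are hypothetical.

```python
import math
from collections import Counter


def tfidf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Weight each token by relative term frequency times log(N / document frequency)."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each token at least once.
    df = Counter(tok for doc in docs for tok in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            tok: (c / len(doc)) * math.log(n_docs / df[tok])
            for tok, c in counts.items()
        })
    return vectors


def chi_square(docs: list[list[str]], labels: list[str], term: str, label: str) -> float:
    """2x2 chi-square statistic for presence of `term` versus membership in `label`."""
    # a: in class with term, b: other classes with term,
    # c: in class without term, d: other classes without term.
    a = b = c = d = 0
    for doc, y in zip(docs, labels):
        has_term = term in doc
        in_class = y == label
        if has_term and in_class:
            a += 1
        elif has_term:
            b += 1
        elif in_class:
            c += 1
        else:
            d += 1
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical toy corpus with two MeSH-like labels.
    docs = [["tumour", "cell", "growth"], ["tumour", "therapy"],
            ["heart", "failure"], ["heart", "valve", "therapy"]]
    labels = ["Neoplasms", "Neoplasms", "Cardiovascular", "Cardiovascular"]
    print(tfidf(docs)[0])
    print(chi_square(docs, labels, "tumour", "Neoplasms"))
```

A term that occurs only in one class (here "tumour") receives a high score, while a term spread evenly across classes ("therapy") scores zero, which is the property that makes the statistic useful for selecting label-relevant words.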

    Propionate metabolism in Mycobacterium tuberculosis: characterization of the vitamin B12-dependent methylmalonyl pathway

    Propionyl-CoA, a three-carbon (C3) short-chain fatty acid (SCFA) derivative, is generated by the catabolism of branched-chain amino acids, branched- and odd-chain fatty acids and cholesterol. Degradation of propionyl-CoA-generating carbon sources during infection (Pandey and Sassetti, 2008) requires a concomitant ability to oxidise this metabolite as a carbon and energy source, to avoid the cytotoxic effects of its accumulation. The methylcitrate cycle in Mycobacterium tuberculosis (MTB) has been characterized; it is essential for propionate oxidation in vitro but dispensable for growth and persistence in mice (Muñoz-Elias et al., 2006). This study reveals that MTB possesses an alternative pathway for propionate metabolism, the vitamin B12-dependent methylmalonyl pathway. Specifically, we demonstrate that MTB can utilise propionyl-CoA-generating carbon sources in the absence of the methylcitrate cycle, provided that vitamin B12 is supplied exogenously. This ability depends on methylmalonyl-CoA mutase (MCM; MutAB), which requires the adenosylcobalamin derivative of vitamin B12 for activity. The inability of MTB to synthesise vitamin B12 (Warner et al., 2007) is consistent with the essentiality of the methylcitrate cycle for growth on propionate (Muñoz-Elias et al., 2006). The demonstrated functionality of the methylmalonyl pathway offers an explanation for the dispensability of the methylcitrate cycle for survival of the mycobacterium in vivo, where access to vitamin B12 may be unrestricted. Gene expression analysis was used to infer flux through the two pathways during growth on the odd-chain fatty acids propionate (C3) and valerate (C5). In the presence of a functional methylmalonyl pathway, expression of methylcitrate dehydratase (MCD) and methylcitrate lyase (MCL) was reduced.
Consistent with reduced levels of the bifunctional isocitrate lyase ICL1/MCL in MTB (Gould et al., 2006; Muñoz-Elias et al., 2006), growth on propionate and valerate was shown to bypass the requirement for carbon anaplerosis by the glyoxylate cycle when propionyl-CoA was converted through the methylmalonyl pathway to succinyl-CoA, a tricarboxylic acid (TCA) cycle intermediate. These results demonstrate the potential for an autonomous methylmalonyl pathway in MTB and underscore the importance of vitamin B12 in MTB physiology. In contrast, MTB deficient in the methylcitrate cycle was able to grow on heptadecanoate (C17) without vitamin B12 supplementation. In the absence of either propionate-oxidizing pathway, the resulting propionyl-CoA may serve as a key precursor for the biosynthesis of several cell wall virulence lipids (Jain et al., 2007).
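For orientation, the two propionyl-CoA-consuming routes discussed in this abstract can be written as generic textbook reaction schemes. Enzyme names other than MCM, MCD and MCL/ICL1 are standard biochemistry rather than results reported here, the epimerase step is included as in the canonical pathway, and cofactor bookkeeping is simplified.

```latex
\begin{aligned}
&\textbf{Methylmalonyl pathway}\\
&\text{propionyl-CoA} + \mathrm{HCO_3^-} + \mathrm{ATP}
  \xrightarrow{\text{propionyl-CoA carboxylase}}
  (S)\text{-methylmalonyl-CoA} + \mathrm{ADP} + \mathrm{P_i}\\
&(S)\text{-methylmalonyl-CoA}
  \xrightarrow{\text{methylmalonyl-CoA epimerase}}
  (R)\text{-methylmalonyl-CoA}\\
&(R)\text{-methylmalonyl-CoA}
  \xrightarrow[\text{adenosylcobalamin}]{\text{MCM (MutAB)}}
  \text{succinyl-CoA}\\[6pt]
&\textbf{Methylcitrate cycle}\\
&\text{propionyl-CoA} + \text{oxaloacetate} + \mathrm{H_2O}
  \xrightarrow{\text{methylcitrate synthase}}
  \text{2-methylcitrate} + \text{CoA}\\
&\text{2-methylcitrate}
  \xrightarrow{\text{MCD}}
  \text{2-methyl-\textit{cis}-aconitate} + \mathrm{H_2O}\\
&\text{2-methyl-\textit{cis}-aconitate} + \mathrm{H_2O}
  \xrightarrow{\text{aconitase}}
  \text{2-methylisocitrate}\\
&\text{2-methylisocitrate}
  \xrightarrow{\text{MCL (ICL1 in MTB)}}
  \text{pyruvate} + \text{succinate}
\end{aligned}
```

Read this way, the methylmalonyl route delivers propionyl-CoA carbon into the TCA cycle as the net addition of succinyl-CoA, whereas in the methylcitrate cycle oxaloacetate is regenerated from succinate and the net product is pyruvate; this is consistent with the anaplerotic bypass of the glyoxylate cycle described above.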

    The Constrained Maximal Expression Level Owing to Haploidy Shapes Gene Content on the Mammalian X Chromosome.

    X chromosomes are unusual in many regards, not least of which is their nonrandom gene content. The causes of this bias are commonly discussed in the context of sexual antagonism and the avoidance of activity in the male germline. Here, we examine the notion that, at least in some taxa, functionally biased gene content may be shaped more profoundly by limits on gene expression imposed by haploid expression of the X chromosome. Notably, if the X, as in primates, is transcribed at rates comparable to the ancestral rate (per promoter) before X chromosome formation, then the X is not a tolerable environment for genes with very high maximal net levels of expression, owing to transcriptional traffic jams. We test this hypothesis using data from the Encyclopedia of DNA Elements (ENCODE) and the Functional Annotation of the Mammalian Genome (FANTOM5) project. As predicted, the maximal expression of human X-linked genes is much lower than that of genes on autosomes: on average, maximal expression is three times lower on the X chromosome than on autosomes. Similarly, autosome-to-X retroposition events are associated with lower maximal expression of retrogenes on the X than is seen for X-to-autosome retrogenes on autosomes. Also as expected, X-linked genes show a smaller increase in expression than autosomal ones (relative to the human-chimpanzee common ancestor) if highly expressed, but not if lowly expressed. The traffic jam model also explains the known lower breadth of expression of genes on the X (and on the Z of birds), as genes with broad expression are, on average, those with high maximal expression. As further predicted, highly expressed tissue-specific genes are also rare on the X, and broadly expressed genes on the X tend to be lowly expressed, both indicating that the trend is shaped by the maximal expression level and not by the breadth of expression per se.
Importantly, a limit to the maximal expression level explains the biased tissue-of-expression profiles of X-linked genes. Tissues whose tissue-specific genes are very highly expressed (e.g., secretory tissues, tissues abundant in structural proteins) are also tissues in which expression from the X chromosome is relatively rare. These trends cannot be fully accounted for by alternative models of biased expression. In conclusion, the notion that it is hard for genes on the therian X to be highly expressed, owing to transcriptional traffic jams, provides a simple yet robustly supported rationale for many peculiar features of the X's gene content, gene expression and evolution.
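One way to operationalise the comparison of maximal expression and expression breadth between X-linked and autosomal genes is sketched below. It assumes a gene-by-tissue expression table; the threshold for calling a gene "expressed", the expression unit, the helper names and the toy values are all hypothetical, and the abstract does not state exactly how maximal expression or breadth were computed from the ENCODE and FANTOM5 data.

```python
from statistics import mean

# Hypothetical input: gene -> (chromosome, expression level in each tissue).
Profiles = dict[str, tuple[str, list[float]]]


def maximal_expression(profile: list[float]) -> float:
    """Highest expression level the gene reaches in any tissue."""
    return max(profile)


def expression_breadth(profile: list[float], threshold: float = 1.0) -> float:
    """Fraction of tissues in which expression exceeds an assumed threshold."""
    return sum(level > threshold for level in profile) / len(profile)


def autosome_to_x_max_ratio(genes: Profiles) -> float:
    """Mean maximal expression of autosomal genes divided by that of X-linked genes."""
    autosomal = [maximal_expression(p) for chrom, p in genes.values()
                 if chrom not in ("X", "Y")]
    x_linked = [maximal_expression(p) for chrom, p in genes.values() if chrom == "X"]
    return mean(autosomal) / mean(x_linked)


if __name__ == "__main__":
    # Toy values chosen for illustration only; they are not data from the study.
    toy = {
        "GENE_A": ("1", [10.0, 200.0, 5.0]),
        "GENE_B": ("2", [50.0, 100.0, 400.0]),
        "GENE_X1": ("X", [20.0, 100.0, 10.0]),
        "GENE_X2": ("X", [1.0, 60.0, 3.0]),
    }
    print(autosome_to_x_max_ratio(toy))
    print(expression_breadth(toy["GENE_A"][1], threshold=8.0))
```

A ratio near 3 on real data would correspond to the "three times lower" statement above; the toy table is far too small to say anything about the hypothesis itself.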

    A Riboswitch Regulates Expression of the Coenzyme B12-Independent Methionine Synthase in Mycobacterium tuberculosis: Implications for Differential Methionine Synthase Function in Strains H37Rv and CDC1551

    We observed vitamin B12-mediated growth inhibition of Mycobacterium tuberculosis strain CDC1551. The B12 sensitivity was mapped to a polymorphism in metH, which encodes a coenzyme B12-dependent methionine synthase. We isolated vitamin B12-resistant suppressor mutants of CDC1551 carrying mutations in a B12 riboswitch upstream of the metE gene, which encodes a B12-independent methionine synthase. Expression analysis confirmed that the B12 riboswitch is a transcriptional regulator of metE in M. tuberculosis.

    Functional characterization of a vitamin B12-dependent methylmalonyl pathway in Mycobacterium tuberculosis: implications for propionate metabolism during growth on fatty acids

    Mycobacterium tuberculosis is predicted to subsist on alternative carbon sources during persistence within the human host. Catabolism of odd- and branched-chain fatty acids, branched-chain amino acids and cholesterol generates propionyl-coenzyme A (CoA) as a terminal three-carbon (C3) product. Propionate is a key precursor in lipid biosynthesis but is toxic if accumulated, potentially implicating its metabolism in M. tuberculosis pathogenesis. In addition to the well-characterized methylcitrate cycle, the M. tuberculosis genome contains a complete methylmalonyl pathway, including a mutAB-encoded methylmalonyl-CoA mutase (MCM) that requires a vitamin B12-derived cofactor for activity. Here, we demonstrate the ability of M. tuberculosis to utilize propionate as the sole carbon source in the absence of a functional methylcitrate cycle, provided that vitamin B12 is supplied exogenously. We show that this ability is dependent on mutAB and, furthermore, that an active methylmalonyl pathway allows the bypass of the glyoxylate cycle during growth on propionate in vitro. Importantly, although the glyoxylate and methylcitrate cycles supported robust growth of M. tuberculosis on the C17 fatty acid heptadecanoate, growth on valerate (C5) was significantly enhanced by vitamin B12 supplementation. Moreover, both wild-type and methylcitrate cycle mutant strains grew on B12-supplemented valerate in the presence of 3-nitropropionate, an inhibitor of the glyoxylate cycle enzyme isocitrate lyase, indicating an anaplerotic role for the methylmalonyl pathway. The demonstrated functionality of MCM reinforces the potential relevance of vitamin B12 to mycobacterial pathogenesis and suggests that vitamin B12 availability in vivo might resolve the paradoxical dispensability of the methylcitrate cycle for the growth and persistence of M. tuberculosis in mice.
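The carbon arithmetic behind odd-chain fatty acids yielding propionyl-CoA as a terminal C3 product can be made explicit with a short sketch. It assumes complete beta-oxidation of a saturated, straight-chain fatty acid and ignores energetics and cofactors; the helper names are hypothetical and this is simple stoichiometry, not an analysis from the study.

```python
def beta_oxidation_products(n_carbons: int) -> dict[str, int]:
    """Acetyl-CoA and propionyl-CoA from complete beta-oxidation of a saturated,
    straight-chain fatty acid with `n_carbons` carbons.

    Each round removes two carbons as acetyl-CoA; an odd-chain acid leaves a final
    three-carbon unit, propionyl-CoA. For propionate (C3) no round occurs: the acid
    is simply activated to propionyl-CoA.
    """
    if n_carbons < 2:
        raise ValueError("a fatty acid needs at least two carbons")
    if n_carbons % 2 == 0:
        return {"acetyl-CoA": n_carbons // 2, "propionyl-CoA": 0}
    return {"acetyl-CoA": (n_carbons - 3) // 2, "propionyl-CoA": 1}


def propionyl_carbon_fraction(n_carbons: int) -> float:
    """Share of the fatty acid's carbon that ends up in propionyl-CoA."""
    return 3 * beta_oxidation_products(n_carbons)["propionyl-CoA"] / n_carbons


if __name__ == "__main__":
    for name, n in [("propionate", 3), ("valerate", 5), ("heptadecanoate", 17)]:
        print(name, beta_oxidation_products(n), round(propionyl_carbon_fraction(n), 2))
```

Under this arithmetic, propionyl-CoA carries 3 of valerate's 5 carbons but only 3 of heptadecanoate's 17, with the remaining 14 released as 7 acetyl-CoA. This offers one simple way to read why growth on heptadecanoate can be supported by the glyoxylate and methylcitrate cycles while growth on valerate is more sensitive to vitamin B12 supplementation, although the abstract does not frame the result in these terms.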

    IL-4Rα-Dependent Alternative Activation of Macrophages Is Not Decisive for Mycobacterium tuberculosis Pathology and Bacterial Burden in Mice

    Classical activation of macrophages (caMph or M1) is crucial for host protection against Mycobacterium tuberculosis (Mtb) infection. Evidence suggests that IL-4/IL-13 alternatively activated macrophages (aaMph or M2) are exploited by Mtb to divert the microbicidal functions of caMph. To define the functions of M2 macrophages during tuberculosis (TB), we infected mice deficient for IL-4 receptor α on macrophages (LysMcreIL-4Rα-/lox) with Mtb. We show that the absence of IL-4Rα on macrophages has no major effect during infection with Mtb H37Rv or the clinical Beijing strain HN878, as demonstrated by similar mortality, bacterial burden, histopathology and T cell proliferation in infected wild-type (WT) and LysMcreIL-4Rα-/lox mice. Interestingly, we observed no differences among the Mtb-infected groups in lung expression of inducible nitric oxide synthase (iNOS) and Arginase 1 (Arg1), well-established markers of M1 and M2 macrophages, respectively. Kinetic expression studies of IL-4/IL-13-activated bone marrow-derived macrophages (BMDM) infected with HN878, followed by gene set enrichment analysis, revealed significant enrichment of the MyD88 and IL-6, IL-10, G-CSF pathways, but not of the IL-4Rα-driven pathway. Together, these results suggest that IL-4Rα-dependent alternatively activated macrophages do not play a central role in TB disease progression.

    MyD88 and IL-6, IL-10, G-CSF-dependent pathway genes are significantly enriched in HN878 infected vs. non-infected macrophages

    BMDM were stimulated with IL-4/IL-13 or left untreated. After 24 hours of stimulation, cells were infected with HN878. Total RNA was extracted at 4, 12 and 48 hours post-infection (PI) for microarray analysis and GSEA. Enrichment plots and heat maps are shown for the (A) MyD88, (B) IL-6, IL-10, G-CSF and (C) IL-4Rα pathways. Enrichment analysis compared log2-fold changes in Mtb-infected vs. non-infected samples. Heat map rows are ordered by pre-ranking metric score. Replicates shown are from two independent experiments.
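The pre-ranked enrichment analysis described in this caption can be sketched as two steps: rank genes by log2 fold change (infected vs. non-infected), then compute the weighted running-sum enrichment score used by GSEA. This is a minimal sketch under stated assumptions: the helper names are hypothetical, expression values are assumed to be non-negative and on a linear scale with a pseudocount added (microarray data already on a log2 scale would instead be subtracted), and the permutation-based significance testing and normalisation of the full GSEA procedure are omitted.

```python
import math


def rank_by_log2_fc(infected: dict[str, float], control: dict[str, float],
                    pseudocount: float = 1.0) -> list[tuple[str, float]]:
    """Rank genes by log2 fold change (infected vs. non-infected), highest first."""
    scores = {
        gene: math.log2((infected[gene] + pseudocount) / (control[gene] + pseudocount))
        for gene in infected.keys() & control.keys()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def enrichment_score(ranked: list[tuple[str, float]], gene_set: set[str],
                     p: float = 1.0) -> float:
    """Weighted running-sum enrichment score for a pre-ranked gene list.

    Walking down the list, hits add |score|^p / N_R and misses subtract
    1 / (N - N_H); the score is the running sum's maximum deviation from zero.
    """
    hits = [gene in gene_set for gene, _ in ranked]
    n, n_hits = len(ranked), sum(hits)
    # N_R: total weight of the genes that belong to the set.
    n_r = sum(abs(score) ** p for (_, score), hit in zip(ranked, hits) if hit)
    if n_hits == 0 or n_hits == n or n_r == 0:
        raise ValueError("gene set must be a non-empty proper subset with non-zero scores")
    miss_step = 1.0 / (n - n_hits)
    running = best = 0.0
    for (_, score), hit in zip(ranked, hits):
        running += abs(score) ** p / n_r if hit else -miss_step
        if abs(running) > abs(best):
            best = running
    return best
```

A positive score means the set's genes cluster near the top of the ranking (induced by infection); a negative score means they cluster near the bottom.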

    No major differences in expression of iNOS and Arg1, lung immune cell populations or T cell proliferation between wild-type and macrophage-specific IL-4Rα-deficient mice following low-dose Mtb H37Rv infection (100 CFU/mouse).

    (A) iNOS and Arg1 staining (brown colour) in lung sections collected at the indicated times PI; original magnification: 40X. Lung sections from 5 mice/group were quantified. N.D. = not detectable. (B) iNOS and Arg1 expression in various immune cell populations was analysed by flow cytometry at 18 weeks PI (6–7 mice/group; *P < 0.05). (C) T cell proliferation in co-culture with CD11c-sorted macrophages from the lungs of naïve (non-infected) mice and of mice infected with H37Rv (100 CFU/mouse) by aerosol, at 4 and 18 weeks. Data shown in A are representative of two independent experiments; results in B and C are from one experiment.