49 research outputs found

    Fizzy: feature subset selection for metagenomics

    BACKGROUND: Some of the current software tools for comparative metagenomics provide ecologists with the ability to investigate and explore bacterial communities using α- and β-diversity. Feature subset selection, a sub-field of machine learning, can also provide unique insight into the differences between metagenomic or 16S phenotypes. In particular, feature subset selection methods can identify the operational taxonomic units (OTUs), or functional features, that have a high level of influence on the condition being studied. For example, in a previous study we used information-theoretic feature selection to identify the protein family abundances that best discriminate between age groups in the human gut microbiome. RESULTS: We have developed a new Python command-line tool for microbial ecologists, compatible with the widely adopted BIOM format, that implements information-theoretic subset selection methods for biological data formats. We demonstrate the software tool's capabilities on publicly available datasets. CONCLUSIONS: We have made the software implementation of Fizzy available to the public under the GNU GPL license. The standalone implementation can be found at http://github.com/EESI/Fizzy. This item is part of the UA Faculty Publications collection.
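As a rough illustration of what information-theoretic feature selection means in the abstract above, the sketch below ranks discretized features by their empirical mutual information with the class label. It is not Fizzy's implementation or its interface: the function names, the toy OTU table, and the plain "rank by relevance" criterion are all assumptions made for this example.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y), in bits, of two discrete sequences."""
    n = len(xs)
    px = Counter(xs)
    py = Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def rank_features(table, labels, k):
    """Return the k feature names with the highest mutual information.

    table maps a (hypothetical) feature name to one discrete value per sample.
    """
    scores = {name: mutual_information(vals, labels) for name, vals in table.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy data: four samples, presence/absence of three OTUs.
labels = ["young", "young", "old", "old"]
table = {
    "otu_a": [1, 1, 0, 0],  # tracks the label exactly
    "otu_b": [1, 0, 1, 0],  # independent of the label
    "otu_c": [1, 1, 1, 0],  # partially informative
}
top = rank_features(table, labels, 2)
print(top)
```

Ranking each feature on its own ignores redundancy between features; subset selection criteria that also penalize redundancy would be a natural refinement of this sketch.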

    Cytoplasmic chromatin triggers inflammation in senescence and cancer

    Chromatin is traditionally viewed as a nuclear entity that regulates gene expression and silencing. However, we recently discovered the presence of cytoplasmic chromatin fragments that pinch off from intact nuclei of primary cells during senescence, a form of terminal cell-cycle arrest associated with pro-inflammatory responses. The functional significance of chromatin in the cytoplasm is unclear. Here we show that cytoplasmic chromatin activates the innate immunity cytosolic DNA-sensing cGAS-STING (cyclic GMP-AMP synthase linked to stimulator of interferon genes) pathway, leading both to short-term inflammation that restrains activated oncogenes and to chronic inflammation that is associated with tissue destruction and cancer. The cytoplasmic chromatin-cGAS-STING pathway promotes the senescence-associated secretory phenotype in primary human cells and in mice. Mice deficient in STING show impaired immuno-surveillance of oncogenic RAS and reduced tissue inflammation upon ionizing radiation. Furthermore, this pathway is activated in cancer cells and correlates with pro-inflammatory gene expression in human cancers. Overall, our findings indicate that genomic DNA serves as a reservoir to initiate a pro-inflammatory pathway in the cytoplasm in senescence and cancer. Targeting the cytoplasmic chromatin-mediated pathway may hold promise in treating inflammation-related disorders.

    Using the RDP Classifier to Predict Taxonomic Novelty and Reduce the Search Space for Finding Novel Organisms

    BACKGROUND: Currently, the naïve Bayesian classifier provided by the Ribosomal Database Project (RDP) is one of the most widely used tools to classify 16S rRNA sequences, mainly collected from environmental samples. We show that RDP has 97+% assignment accuracy and is fast for 250 bp and longer reads when the read originates from a taxon known to the database. Because most environmental samples will contain organisms from taxa whose 16S rRNA genes have not been previously sequenced, we aim to benchmark how well the RDP classifier and other competing methods can discriminate these novel taxa from known taxa. PRINCIPAL FINDINGS: Because each fragment is assigned a score (containing likelihood or confidence information such as the bootstrap score in the RDP classifier), we "train" a threshold to discriminate between novel and known organisms and observe its performance on a test set. The threshold that we determine tends to be conservative (low sensitivity but high specificity) for naïve Bayesian methods. Nonetheless, our method performs better with the RDP classifier than with the other methods tested, as measured by the F-measure and the area under the receiver operating characteristic curve on the test set. By constraining the database to well-represented genera, sensitivity improves by 3-15%. Finally, we show that the detector is a good predictor of novel abundant taxa (especially at finer levels of taxonomy, where novelty is more likely to be present). CONCLUSIONS: We conclude that selecting a read-length-appropriate RDP bootstrap score can significantly reduce the search space for identifying novel genera and higher levels in taxonomy. In addition, having a well-represented database significantly improves performance, while having genera that are "highly" similar does not make a significant improvement. On a real dataset from an Amazon Terra Preta soil sample, we show that the detector can predict (or correlates with) whether novel sequences will be assigned to new taxa when the RDP database "doubles" in the future.
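The idea of "training" a threshold on classifier scores, as described in the abstract above, can be sketched as follows. A read is called novel when its score falls below the threshold, and the threshold is chosen on a training set and then evaluated on a held-out test set. The scores, labels, and the choice to select the threshold by maximizing the F-measure are assumptions for illustration; the abstract does not specify the selection rule, and this is not the authors' code.

```python
def f_measure(scores, is_novel, threshold):
    """F1 score when a read is called 'novel' if its score is below threshold."""
    pairs = list(zip(scores, is_novel))
    tp = sum(1 for s, n in pairs if s < threshold and n)
    fp = sum(1 for s, n in pairs if s < threshold and not n)
    fn = sum(1 for s, n in pairs if s >= threshold and n)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def train_threshold(scores, is_novel):
    """Pick the observed score that maximizes F1 on the training data."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: f_measure(scores, is_novel, t))


# Hypothetical confidence scores in [0, 1] with ground-truth novelty labels.
train_scores = [0.95, 0.90, 0.85, 0.40, 0.30, 0.60]
train_novel = [False, False, False, True, True, False]
threshold = train_threshold(train_scores, train_novel)

# Held-out reads: evaluate the trained threshold without refitting it.
test_scores = [0.92, 0.35, 0.55, 0.70]
test_novel = [False, True, False, False]
test_f1 = f_measure(test_scores, test_novel, threshold)
print(threshold, round(test_f1, 3))
```

In this toy example the test F-measure is lower than the training one because a known read scores just under the threshold, which is the kind of trade-off between sensitivity and specificity the abstract refers to.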

    Job losses, unemployment duration, and new jobs in Spain

    The definitive version is available at www.blackwell-synergy.com. This article focuses on workers who permanently lost their jobs for involuntary reasons in Spain. We use a 1985 representative survey of the Spanish labor force containing retrospective questions related to workers' mobility. We evaluate several characteristics of job losers, as compared to other unemployed workers with experience by the end of 1985. Thereafter, an analysis of job losses addresses the following questions: (1) What types of jobs were lost? (2) How did workers perform after their job loss? (3) How long have they been out of work? (4) What are the characteristics of the new jobs found?

    The taxa breakdown when testing on all the taxonomic levels.

    While genera have 29% novel representation in the test set (in 15% of the sequences mentioned in Fig. 2 [http://www.plosone.org/article/info:doi/10.1371/journal.pone.0032491#pone-0032491-g002]), 14.3% of the families are novel (in 15% of the sequences), 11% of the orders are novel (in 5% of the sequences), 4% of the classes are novel (in 2.4% of the sequences), and 5% of the phyla are novel (in 0.07% of the sequences).