9 research outputs found

    metaQuantome: an integrated, quantitative metaproteomics approach reveals connections between taxonomy and protein function in complex microbiomes

    Microbiome research offers promising insights into the impact of microorganisms on biological systems. Metaproteomics, the study of microbial proteins at the community level, integrates genomic, transcriptomic, and proteomic data to determine the taxonomic and functional state of a microbiome. However, standard metaproteomics software is subject to several limitations, commonly supporting only spectral counts, emphasizing exploratory analysis rather than hypothesis testing, and rarely offering the ability to analyze the interaction of function and taxonomy, that is, which taxa are responsible for different processes. Here we present metaQuantome, a novel, multifaceted software suite that analyzes the state of a microbiome by leveraging complex taxonomic and functional hierarchies to summarize peptide-level quantitative information, emphasizing label-free intensity-based methods. For experiments with multiple experimental conditions, metaQuantome offers differential abundance analysis, principal components analysis, and clustered heat map visualizations, as well as exploratory analysis for a single sample or experimental condition. We benchmark metaQuantome analysis against standard methods, using two previously published datasets: (1) an artificially assembled microbial community dataset (taxonomy benchmarking) and (2) a dataset with a range of recombinant human proteins spiked into an Escherichia coli background (functional benchmarking). Furthermore, we demonstrate the use of metaQuantome on a previously published human oral microbiome dataset. In both the taxonomic and functional benchmarking analyses, metaQuantome quantified taxonomic and functional terms more accurately than standard summarization-based methods. We use the oral microbiome dataset to demonstrate metaQuantome's ability to produce publication-quality figures and elucidate biological processes of the oral microbiome.
metaQuantome enables advanced investigation of metaproteomic datasets, which should be broadly applicable to microbiome-related research. In the interest of accessible, flexible, and reproducible analysis, metaQuantome is open source and available on the command line and in Galaxy.

    Improve your Galaxy text life: The Query Tabular Tool [version 1; referees: 1 approved, 2 approved with reservations]

    Galaxy provides an accessible platform where multi-step data analysis workflows integrating disparate software can be run, even by researchers with limited programming expertise. Applications of such sophisticated workflows are many, including those which integrate software from different ’omic domains (e.g. genomics, proteomics, metabolomics). In these complex workflows, intermediate outputs are often generated as tabular text files, which must be transformed into customized formats that are compatible with the next software tools in the pipeline. Consequently, many text manipulation steps are added to an already complex workflow, overly complicating the process and decreasing usability, especially for non-expert bench researchers focused on obtaining results. In some cases, limitations to existing text manipulation are such that desired analyses can only be carried out using highly sophisticated processing steps beyond the reach of most users. As a solution, we have developed the Query Tabular Galaxy tool, which leverages a SQLite database generated from tabular input data. This database can be queried and manipulated to produce transformed and customized tabular outputs compatible with downstream processing steps. Regular expressions can also be utilized for even more sophisticated manipulations, such as find and replace and other filtering actions. Using several Galaxy-based multi-omic workflows as an example, we demonstrate how the Query Tabular tool dramatically streamlines and simplifies the creation of multi-step analyses, efficiently enabling complicated textual manipulations and processing. This tool should find broad utility for users of the Galaxy platform seeking to develop and use sophisticated workflows involving text manipulation on tabular outputs.

    Mathematical Modeling of Human Papillomavirus: Questioning Assumptions About Sexual Behavior

    Mathematical models of disease are tools for guiding health policy, such as human papillomavirus (HPV) vaccination. Model development requires the careful consideration of the necessary assumptions and simplifications. We introduce basic sexually transmitted infection modeling theory, then consider two assumptions in the context of HPV: first, we improve upon a representation of sexual partnerships between age groups that is not based on sexual behavior data; and second, we examine the exclusion of non-heterosexual partnerships from the vast majority of sexually transmitted infection models. We calculate the effect of the simplifying assumptions on estimates of vaccine benefits.

    An update on the moFF algorithm for label-free quantitative proteomics

    moFF is a modular and operating-system-independent tool for quantitative analysis of label-free mass-spectrometry-based proteomics data. The moFF workflow, comprising matching-between-runs and apex quantification, can be applied to any upstream search engine's output, along with the corresponding Thermo or mzML raw file. We here present moFF 2.0, with improvements in speed through multithreading, the use of a new raw file access library, and a novel filtering approach in the matching-between-runs module. This filter allows moFF to correctly identify features that are present in one run but not in another, as demonstrated using spiked-in iRT peptides. Moreover, moFF 2.0 also provides a new peptide summary export that can be used in downstream statistical analysis. moFF is open source and freely available and can be downloaded from https://github.com/compomics/moF

    Survey of metaproteomics software tools for functional microbiome analysis

    To gain a thorough appreciation of microbiome dynamics, researchers characterize the functional relevance of expressed microbial genes or proteins. This can be accomplished through metaproteomics, which characterizes the protein expression of microbiomes. Several software tools exist for analyzing microbiomes at the functional level by measuring their combined proteome-level response to environmental perturbations. In this survey, we explore the performance of six available tools, to enable researchers to make informed decisions regarding software choice based on their research goals. Tandem mass spectrometry-based proteomic data obtained from dental caries plaque samples grown with and without sucrose in paired biofilm reactors were used as representative data for this evaluation. Microbial peptides from one sample pair were identified by the X! Tandem search algorithm via SearchGUI and subjected to functional analysis using software tools including eggNOG-mapper, MEGAN5, MetaGOmics, MetaProteomeAnalyzer (MPA), ProPHAnE, and Unipept to generate functional annotation through Gene Ontology (GO) terms. Among these software tools, notable differences in functional annotation were detected after comparing differentially expressed protein functional groups. Based on the generated GO terms of these tools, we performed a peptide-level comparison to evaluate the quality of their functional annotations. A BLAST analysis against the NCBI non-redundant database revealed that the sensitivity and specificity of functional annotation varied between tools. For example, eggNOG-mapper mapped to the greatest number of GO terms, while Unipept generated more accurate GO terms. Based on our evaluation, metaproteomics researchers can choose the software according to their analytical needs and developers can use the resulting feedback to further optimize their algorithms.
To make more of these tools accessible via scalable metaproteomics workflows, eggNOG-mapper and Unipept 4.0 were incorporated into the Galaxy platform.

    Improve your Galaxy text life: The Query Tabular Tool [version 2; referees: 2 approved, 1 approved with reservations]

    Galaxy provides an accessible platform where multi-step data analysis workflows integrating disparate software can be run, even by researchers with limited programming expertise. Applications of such sophisticated workflows are many, including those which integrate software from different ’omic domains (e.g. genomics, proteomics, metabolomics). In these complex workflows, intermediate outputs are often generated as tabular text files, which must be transformed into customized formats that are compatible with the next software tools in the pipeline. Consequently, many text manipulation steps are added to an already complex workflow, overly complicating the process. In some cases, limitations to existing text manipulation are such that desired analyses can only be carried out using highly sophisticated processing steps beyond the reach of even advanced users and developers. For users with some SQL knowledge, these text operations could be combined into a single, concise query on a relational database. As a solution, we have developed the Query Tabular Galaxy tool, which leverages a SQLite database generated from tabular input data. This database can be queried and manipulated to produce transformed and customized tabular outputs compatible with downstream processing steps. Regular expressions can also be utilized for even more sophisticated manipulations, such as find and replace and other filtering actions. Using several Galaxy-based multi-omic workflows as an example, we demonstrate how the Query Tabular tool dramatically streamlines and simplifies the creation of multi-step analyses, efficiently enabling complicated textual manipulations and processing. This tool should find broad utility for users of the Galaxy platform seeking to develop and use sophisticated workflows involving text manipulation on tabular outputs.
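
The idea described in this abstract, loading tabular text into a SQLite database and replacing several text manipulation steps with one query, can be sketched in plain Python. This is a minimal illustration and not the Query Tabular tool's own code: the table name `psm`, the column names, the sample rows, and the `load_table` helper are all hypothetical, and the regular-expression filter is wired up through the standard `sqlite3.create_function` call.

```python
import csv
import io
import re
import sqlite3

# Hypothetical tab-separated input standing in for an intermediate workflow output.
TABULAR = (
    "peptide\tprotein\tscore\n"
    "AAK\tsp|P1|X\t0.9\n"
    "CCR\tsp|P1|X\t0.7\n"
    "DDE\ttr|Q2|Y\t0.8\n"
    "EEK\tsp|P3|Z\t0.5\n"
)


def load_table(conn, name, text):
    """Create a table from tab-separated text; the first row supplies column names."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    header, data = rows[0], rows[1:]
    columns = ", ".join(f'"{c}"' for c in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{name}" ({columns})')
    conn.executemany(f'INSERT INTO "{name}" VALUES ({placeholders})', data)


def regexp(pattern, value):
    """Backs SQLite's `value REGEXP pattern` operator, which calls regexp(pattern, value)."""
    return int(value is not None and re.search(pattern, value) is not None)


conn = sqlite3.connect(":memory:")
conn.create_function("REGEXP", 2, regexp)
load_table(conn, "psm", TABULAR)

# One query filters by regular expression, groups, counts, and takes a maximum,
# work that would otherwise need several separate text manipulation steps.
result = conn.execute(
    "SELECT protein, COUNT(*) AS n_peptides, MAX(CAST(score AS REAL)) AS best_score "
    "FROM psm WHERE protein REGEXP ? GROUP BY protein ORDER BY protein",
    (r"^sp\|",),
).fetchall()

for row in result:
    print("\t".join(str(field) for field in row))
```

Printing the rows tab-separated keeps the output in the same tabular form that a downstream step in a workflow would expect.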

    Disseminating Metaproteomic Informatics Capabilities and Knowledge Using the Galaxy-P Framework

    The impact of microbial communities, also known as the microbiome, on human health and the environment is receiving increased attention. Studying translated gene products (proteins) and comparing metaproteomic profiles may elucidate how microbiomes respond to specific environmental stimuli and interact with host organisms. Characterizing proteins expressed by a complex microbiome and interpreting their functional signature requires sophisticated informatics tools and workflows tailored to metaproteomics. Additionally, there is a need to disseminate these informatics resources to researchers undertaking metaproteomic studies, who could use them to make new and important discoveries in microbiome research. The Galaxy for proteomics platform (Galaxy-P) offers an open source, web-based bioinformatics platform for disseminating metaproteomics software and workflows. Within this platform, we have developed easily accessible and documented metaproteomic software tools and workflows aimed at training researchers in their operation and disseminating the tools for more widespread use. The modular workflows encompass the core requirements of metaproteomic informatics: (a) database generation; (b) peptide spectral matching; (c) taxonomic analysis; and (d) functional analysis. Much of the software available via the Galaxy-P platform was selected, packaged and deployed through an online metaproteomics “Contribution Fest” undertaken by a unique consortium of expert software developers and users from the metaproteomics research community, who have co-authored this manuscript. These resources are documented on GitHub and freely available through the Galaxy Toolshed, as well as a publicly accessible metaproteomics gateway Galaxy instance. These documented workflows are well suited for the training of novice metaproteomics researchers, through online resources such as the Galaxy Training Network, as well as hands-on training workshops.
Here, we describe the metaproteomics tools available within these Galaxy-based resources, as well as the process by which they were selected and implemented in our community-based work. We hope this description will increase access to and utilization of metaproteomics tools, as well as offer a framework for continued community-based development and dissemination of cutting-edge metaproteomics software.

    Exploring Online Crowdfunding for Cancer-Related Costs Among LGBTQ+ (Lesbian, Gay, Bisexual, Transgender, Queer, Plus) Cancer Survivors: Integration of Community-Engaged and Technology-Based Methodologies

    Background: Cancer survivors frequently experience cancer-related financial burdens. The extent to which Lesbian, Gay, Bisexual, Transgender, Queer, Plus (LGBTQ+) populations experience cancer-related cost-coping behaviors such as crowdfunding is largely unknown, owing to a lack of sexual orientation and gender identity data collection and social stigma. Web-scraping has previously been used to evaluate inequities in online crowdfunding, but these methods alone do not adequately engage populations facing inequities. Objective: We describe the methodological process of integrating technology-based and community-engaged methods to explore the financial burden of cancer among LGBTQ+ individuals via online crowdfunding. Methods: To center the LGBTQ+ community, we followed community engagement guidelines by forming a study advisory board (SAB) of LGBTQ+ cancer survivors, caregivers, and professionals who were involved in every step of the research. SAB member engagement was tracked through quarterly SAB meeting attendance and an engagement survey. We then used web-scraping methods to extract a data set of online crowdfunding campaigns. The study team followed an integrated technology-based and community-engaged process to develop and refine term dictionaries for analyses. Term dictionaries were developed and refined in order to identify crowdfunding campaigns that were cancer- and LGBTQ+-related. Results: Advisory board engagement was high according to metrics of meeting attendance, meeting participation, and anonymous board feedback. In collaboration with the SAB, the term dictionaries were iteratively edited and refined. The LGBTQ+ term dictionary was developed by the study team, while the cancer term dictionary was refined from an existing dictionary. The advisory board and analytic team members manually coded against the term dictionary and performed quality checks until high confidence in correct classification was achieved using pairwise agreement.
Through each phase of manual coding and quality checks, the advisory board identified more misclassified campaigns than the analytic team alone. When refining the LGBTQ+ term dictionary, the analytic team identified 11.8% misclassification while the SAB identified 20.7% misclassification. Once each term dictionary was finalized, the LGBTQ+ term dictionary resulted in a 95% pairwise agreement, while the cancer term dictionary resulted in an 89.2% pairwise agreement. Conclusions: The classification tools developed by integrating community-engaged and technology-based methods were more accurate because of the equity-based approach of centering LGBTQ+ voices and their lived experiences. This exemplar suggests integrating community-engaged and technology-based methods to study inequities is highly feasible and has applications beyond LGBTQ+ financial burden research.
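
The abstract reports pairwise agreement without stating how it was computed. One common reading, assumed here and not confirmed by the source, is the fraction of campaigns that two coders label identically, averaged over all pairs of coders. The sketch below uses made-up labels and a hypothetical `pairwise_agreement` helper purely to show that arithmetic.

```python
from itertools import combinations


def pairwise_agreement(codings):
    """Mean fraction of items labeled identically, averaged over all pairs of coders.

    `codings` is a list of equal-length label sequences, one per coder.
    This definition is an assumption; the study may have used a variant.
    """
    pairs = list(combinations(codings, 2))
    if not pairs:
        raise ValueError("need at least two coders")
    total = 0.0
    for a, b in pairs:
        if len(a) != len(b) or not a:
            raise ValueError("coders must label the same non-empty set of items")
        matches = sum(x == y for x, y in zip(a, b))
        total += matches / len(a)
    return total / len(pairs)


# Made-up labels: 1 = campaign classified as relevant, 0 = not relevant.
coder_a = [1, 1, 0, 1, 0]
coder_b = [1, 0, 0, 1, 0]
agreement = pairwise_agreement([coder_a, coder_b])
print(f"pairwise agreement: {agreement:.1%}")
```

Note that this simple percentage does not correct for agreement expected by chance, which is one reason the definition is labeled as an assumption.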

    Precursor intensity-based label-free quantification software tools for proteomic and multi-omic analysis within the galaxy platform

    For mass spectrometry-based peptide and protein quantification, label-free quantification (LFQ) based on precursor mass peak (MS1) intensities is considered reliable due to its dynamic range, reproducibility, and accuracy. LFQ enables peptide-level quantitation, which is useful in proteomics (analyzing peptides carrying post-translational modifications) and multi-omics studies such as metaproteomics (analyzing taxon-specific microbial peptides) and proteogenomics (analyzing non-canonical sequences). Bioinformatics workflows accessible via the Galaxy platform have proven useful for analysis of such complex multi-omic studies. However, workflows within the Galaxy platform have lacked well-tested LFQ tools. In this study, we have evaluated moFF and FlashLFQ, two open-source LFQ tools, and implemented them within the Galaxy platform to offer access and use via established workflows. Through rigorous testing and communication with the tool developers, we have optimized the performance of each tool. Software features evaluated include: (a) match-between-runs (MBR); (b) using multiple file formats as input for improved quantification; (c) use of containers and/or conda packages; (d) parameters needed for analyzing large datasets; and (e) optimization and validation of software performance. This work establishes a process for software implementation, optimization, and validation, and offers access to two robust software tools for LFQ-based analysis within the Galaxy platform.