7 research outputs found

    A proteomics sample metadata representation for multiomics integration and big data analysis

    The amount of public proteomics data is rapidly increasing but there is no standardized format to describe the sample metadata and their relationship with the dataset files in a way that fully supports their understanding or reanalysis. Here we propose to develop the transcriptomics data format MAGE-TAB into a standard representation for proteomics sample metadata. We implement MAGE-TAB-Proteomics in a crowdsourcing project to manually curate over 200 public datasets. We also describe tools and libraries to validate and submit sample metadata-related information to the PRIDE repository. We expect that these developments will improve the reproducibility and facilitate the reanalysis and integration of public proteomics datasets.

    Galaxy Training: A powerful framework for teaching!

    There is an ongoing explosion of scientific datasets being generated, brought on by recent technological advances in many areas of the natural sciences. As a result, the life sciences have become increasingly computational in nature, and bioinformatics has taken on a central role in research studies. However, basic computational skills, data analysis, and stewardship are still rarely taught in life science educational programs, resulting in a skills gap in many of the researchers tasked with analysing these big datasets. In order to address this skills gap and empower researchers to perform their own data analyses, the Galaxy Training Network (GTN) has previously developed the Galaxy Training Platform (https://training.galaxyproject.org), an open access, community-driven framework for the collection of FAIR (Findable, Accessible, Interoperable, Reusable) training materials for data analysis utilizing the user-friendly Galaxy framework as its primary data analysis platform. Since its inception, this training platform has thrived, with the number of tutorials and contributors growing rapidly, and the range of topics extending beyond life sciences to include topics such as climatology, cheminformatics, and machine learning. While initially aimed at supporting researchers directly, the GTN framework has proven to be an invaluable resource for educators as well. We have focused our efforts in recent years on adding increased support for this growing community of instructors. We have added new features to facilitate the use of the materials in a classroom setting, simplified the contribution flow for new materials, and added a set of train-the-trainer lessons. Here, we present the latest developments in the GTN project, aimed at facilitating the use of the Galaxy Training materials by educators, and its usage in different learning environments.

    Reproducible proteomics sample preparation for single FFPE tissue slices using acid-labile surfactant and direct trypsinization

    Background: Proteomic analyses of clinical specimens often rely on human tissues preserved through formalin-fixation and paraffin embedding (FFPE). Minimal sample consumption is key both to preserving the integrity of pathological archives and to working with minimally invasive core biopsies. This has been achieved by using the acid-labile surfactant RapiGest in combination with a direct trypsinization (DTR) strategy. A critical comparison of the DTR protocol with the most commonly used filter aided sample preparation (FASP) protocol is lacking. Furthermore, it is unknown how common histological stainings influence the outcome of the DTR protocol.
    Methods: Four single consecutive murine kidney tissue specimens were prepared with the DTR approach or with the FASP protocol using both 10 and 30 k filter devices and analyzed by label-free, quantitative liquid chromatography–tandem mass spectrometry (LC–MS/MS). We compared the different protocols in terms of proteome coverage, relative label-free quantitation, missed cleavages, physicochemical properties and gene ontology term annotations of the proteins. Additionally, we probed the compatibility of the DTR protocol with commonly used histological stainings, namely hematoxylin & eosin (H&E), hematoxylin and hemalaun. These were proteomically compared to an unstained control by analyzing four human tonsil FFPE tissue specimens per condition.
    Results: On average, the DTR protocol identified 1841 ± 22 proteins in a single, non-fractionated LC–MS/MS analysis, whereas the FASP 10 k and 30 k protocols identified 1857 ± 120 and 1970 ± 28 proteins, respectively. The DTR protocol showed 15% more missed cleavages, which did not adversely affect quantitation and intersample comparability. Hematoxylin or hemalaun staining did not adversely impact the performance of the DTR protocol. A minor perturbation was observed for H&E staining, decreasing overall protein identification by 13%.
    Conclusions: In essence, the DTR protocol can keep up with the FASP protocol in terms of qualitative and quantitative reproducibility and performed almost as well in terms of proteome coverage and missed cleavages. We highlight the suitability of the DTR protocol as a viable and straightforward alternative to the FASP protocol for proteomics-based clinical research.

    A Galaxy of informatics resources for MS-based proteomics

    Introduction: Continuous advances in mass spectrometry (MS) technologies have enabled deeper and more reproducible proteome characterization and a better understanding of biological systems when integrated with other 'omics data. Bioinformatic resources meeting the analysis requirements of increasingly complex MS-based proteomic data and associated multi-omic data are critically needed. These requirements include availability of software that spans diverse types of analyses, scalability for large-scale, compute-intensive applications, and mechanisms to ease adoption of the software. Areas covered: The Galaxy ecosystem meets these requirements by offering a multitude of open-source tools for MS-based proteomics analyses and applications, all in an adaptable, scalable, and accessible computing environment. A thriving global community maintains this software and associated training resources to empower researcher-driven analyses. Expert opinion: The community-supported Galaxy ecosystem remains a crucial contributor to basic biological and clinical studies using MS-based proteomics. In addition to the current status of Galaxy-based resources, we describe ongoing developments for meeting emerging challenges in MS-based proteomic informatics. We hope this review will catalyze increased use of Galaxy by researchers employing MS-based proteomics and inspire software developers to join the community and implement new tools, workflows, and associated training content that will add further value to this already rich ecosystem.
