
    Mining whole sample mass spectrometry proteomics data for biomarkers: an overview

    In this paper we aim to provide a concise overview of designing and conducting an MS proteomics experiment in such a way as to allow statistical analysis that may lead to the discovery of novel biomarkers. We provide a summary of the various stages that make up such an experiment, highlighting the need for experimental goals to be decided upon in advance. We discuss issues in experimental design at the sample collection stage, and good practice for standardising protocols within the proteomics laboratory. We then describe approaches to the data mining stage of the experiment, including the processing steps that transform a raw mass spectrum into a useable form. We propose a permutation-based procedure for determining the significance of reported error rates. Finally, because of its general advantages in speed and cost, we suggest that MS proteomics may be a good candidate for an early primary screening approach to disease diagnosis, identifying areas of risk and making referrals for more specific tests without necessarily making a diagnosis in its own right. Our discussion is illustrated with examples drawn from experiments on bovine blood serum conducted in the Centre for Proteomic Research (CPR) at Southampton University.
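
The abstract above mentions a permutation-based procedure for judging whether a reported error rate is significant, but it does not spell out the details. The following is a minimal sketch of the general idea only, not the authors' method: it assumes a nearest-centroid classifier scored by leave-one-out error, and the function names `loo_error` and `permutation_p_value` are hypothetical. The class labels are shuffled many times, and the p-value is the fraction of shuffles whose error rate is at least as good as the one observed on the true labels.

```python
import random


def loo_error(X, y):
    """Leave-one-out error rate of a nearest-centroid classifier.

    The classifier is an assumption made for illustration; any
    classifier with an error estimate could be substituted.
    """
    n = len(X)
    errors = 0
    for i in range(n):
        centroids = {}
        for label in sorted(set(y)):
            # Training rows of this class, excluding the held-out sample i.
            rows = [X[j] for j in range(n) if j != i and y[j] == label]
            if rows:
                centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        # Predict the class whose centroid is closest (squared Euclidean).
        pred = min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(X[i], centroids[c])),
        )
        errors += pred != y[i]
    return errors / n


def permutation_p_value(X, y, n_perm=200, seed=0):
    """Return (observed error, permutation p-value) for the error rate."""
    rng = random.Random(seed)
    observed = loo_error(X, y)
    labels = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any real link between spectra and class
        if loo_error(X, labels) <= observed:
            hits += 1
    # The +1 terms count the unpermuted labelling as one of the permutations.
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy one-feature data: two well-separated groups (made-up values).
    X = [[0.0], [0.1], [0.2], [0.3], [0.4], [5.0], [5.1], [5.2], [5.3], [5.4]]
    y = [0] * 5 + [1] * 5
    print(permutation_p_value(X, y))
```

A small p-value here suggests that the observed error rate would rarely arise if the class labels carried no information about the spectra.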

    Making open data work for plant scientists

    Despite the clear demand for open data sharing, its implementation within plant science is still limited. This is, at least in part, because open data-sharing raises several unanswered questions and challenges to current research practices. In this commentary, some of the challenges encountered by plant researchers at the bench when generating, interpreting, and attempting to disseminate their data have been highlighted. The difficulties involved in sharing sequencing, transcriptomics, proteomics, and metabolomics data are reviewed. The benefits and drawbacks of three data-sharing venues currently available to plant scientists are identified and assessed: (i) journal publication; (ii) university repositories; and (iii) community and project-specific databases. It is concluded that community and project-specific databases are the most useful to researchers interested in effective data sharing, since these databases are explicitly created to meet the researchers’ needs, support extensive curation, and embody a heightened awareness of what it takes to make data reuseable by others. Such bottom-up and community-driven approaches need to be valued by the research community, supported by publishers, and provided with long-term sustainable support by funding bodies and government. At the same time, these databases need to be linked to generic databases where possible, in order to be discoverable to the majority of researchers and thus promote effective and efficient data sharing. As we look forward to a future that embraces open access to data and publications, it is essential that data policies, data curation, data integration, data infrastructure, and data funding are linked together so as to foster data access and research productivity

    Extensive mass spectrometry-based analysis of the fission yeast proteome: the Schizosaccharomyces pombe PeptideAtlas

    We report a high quality and system-wide proteome catalogue covering 71% (3,542 proteins) of the predicted genes of fission yeast, Schizosaccharomyces pombe, presenting the largest protein dataset to date for this important model organism. We obtained this high proteome and peptide (11.4 peptides/protein) coverage by a combination of extensive sample fractionation, high resolution Orbitrap mass spectrometry, and combined database searching using the iProphet software as part of the Trans-Proteomics Pipeline. All raw and processed data are made accessible in the S. pombe PeptideAtlas. The identified proteins showed no biases in functional properties and allowed global estimation of protein abundances. The high coverage of the PeptideAtlas allowed correlation with transcriptomic data in a system-wide manner indicating that post-transcriptional processes control the levels of at least half of all identified proteins. Interestingly, the correlation was not equally tight for all functional categories ranging from r(s) >0.80 for proteins involved in translation to r(s) <0.45 for signal transduction proteins. Moreover, many proteins involved in DNA damage repair could not be detected in the PeptideAtlas despite their high mRNA levels, strengthening the translation-on-demand hypothesis for members of this protein class. In summary, the extensive and publicly available S. pombe PeptideAtlas together with the generated proteotypic peptide spectral library will be a useful resource for future targeted, in-depth, and quantitative proteomic studies on this microorganism

    Proteomic analysis of heart failure hospitalization among patients with chronic kidney disease: The Heart and Soul Study.

    BACKGROUND: Patients with chronic kidney disease (CKD) are at increased risk for heart failure (HF). We aimed to investigate differences in proteins associated with HF hospitalizations among patients with and without CKD in the Heart and Soul Study. METHODS AND RESULTS: We measured 1068 unique plasma proteins from baseline samples of 974 participants in The Heart and Soul Study who were followed for HF hospitalization over a median of 7 years. We sequentially applied forest regression and Cox survival analyses to select prognostic proteins. Among participants with CKD, four proteins were associated with HF at Bonferroni-level significance (p < 2.5 × 10^-4): Angiopoietin-2 (HR [95% CI] 1.45 [1.33, 1.59]), Spondin-1 (HR [95% CI] 1.13 [1.06, 1.20]), tartrate-resistant acid phosphatase type 5 (HR [95% CI] 0.65 [0.53, 0.78]) and neurogenic locus notch homolog protein 1 (NOTCH1) (HR [95% CI] 0.67 [0.55, 0.80]). These associations persisted at p < 0.01 after adjustment for age, estimated glomerular filtration and history of HF. CKD was a significant interaction term in the associations of NOTCH1 and Spondin-1 with HF. Pathway analysis showed a trend for higher representation of the Cardiac Hypertrophy and Complement/Coagulation pathways among proteins prognostic of HF in the CKD sub-group. CONCLUSIONS: These results suggest that markers of heart failure differ between patients with and without CKD. Further research is needed to validate novel markers in cohorts of patients with CKD and adjudicated HF events.

    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbates the ones associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues

    From access and integration to mining of secure genomic data sets across the grid

    The UK Department of Trade and Industry (DTI) funded BRIDGES project (Biomedical Research Informatics Delivered by Grid Enabled Services) has developed a Grid infrastructure to support cardiovascular research. This includes the provision of a compute Grid and a data Grid infrastructure with security at its heart. In this paper we focus on the BRIDGES data Grid. A primary aim of the BRIDGES data Grid is to help control the complexity in access to and integration of a myriad of genomic data sets through simple Grid-based tools. We outline these tools and how they are delivered to end-user scientists. We also describe how these tools are to be extended in the BBSRC-funded Grid Enabled Microarray Expression Profile Search (GEMEPS) project to provide a richer vocabulary of search capabilities for mining microarray data sets. As with BRIDGES, fine-grained Grid security underpins GEMEPS.

    Updates in metabolomics tools and resources: 2014-2015

    Data processing and interpretation represent the most challenging and time-consuming steps in high-throughput metabolomic experiments, regardless of the analytical platforms (MS or NMR spectroscopy based) used for data acquisition. Improved machinery in metabolomics generates increasingly complex datasets that create the need for more and better processing and analysis software and in silico approaches to understand the resulting data. However, a comprehensive source of information describing the utility of the most recently developed and released metabolomics resources, in the form of tools, software, and databases, is currently lacking. Thus, here we provide an overview of freely available, open-source tools, algorithms, and frameworks to make both upcoming and established metabolomics researchers aware of recent developments, in an attempt to advance and facilitate data processing workflows in their metabolomics research. The major topics include tools and resources for data processing, data annotation, and data visualization in MS and NMR-based metabolomics. Most tools described in this review are dedicated to untargeted metabolomics workflows; however, some more specialist tools are described as well. All tools and resources described, including their analytical and computational platform dependencies, are summarized in an overview table.

    Charting the protein complexome in yeast by mass spectrometry

    It has become evident over the past few years that many complex cellular processes, including control of the cell cycle and ubiquitin-dependent proteolysis, are carried out by sophisticated multisubunit protein machines that are dynamic in abundance, post-translational modification state, and composition. To understand better the nature of the macromolecular assemblages that carry out the cell cycle and ubiquitin-dependent proteolysis, we have used mass spectrometry extensively over the past few years to characterize both the composition of various protein complexes and the modification states of their subunits. In this article we review some of our recent efforts, and describe a promising new approach for using mass spectrometry to dissect protein interaction networks