
    Exploring Information Technologies to Support Shotgun Proteomics

    Shotgun proteomics refers to the direct analysis of complex protein mixtures to create a profile of the proteins present in the cell. These profiles can be used to study the underlying biological basis of cancer development. Closely studying the profiles as the cancer proliferates reveals the molecular interactions in the cell, and these interactions give researchers clues about potential drug targets for treating the disease. A little more than a decade old, shotgun proteomics is a relatively new discovery approach, one that is data intensive and requires complex data analysis. Early studies indicated a gap between the ability to analyze biological samples with a mass spectrometer and the information systems available to process and analyze the resulting data. This thesis reflects on an automated proteomic information system at the University of Colorado Central Analytical Facility, where investigators are using cutting-edge proteomic techniques to analyze melanoma (skin cancer) cell lines. The paper provides insight into key design processes in the development of an Oracle relational database and automation system to support high-throughput shotgun proteomics in the facility. It also discusses significant contributions, technologies, software, a data standard, and leaders in the field who are developing solutions and products for proteomics.

    The Proteomics Identifications database: 2010 update

    The Proteomics Identifications database (PRIDE, http://www.ebi.ac.uk/pride) at the European Bioinformatics Institute has become one of the main repositories of mass spectrometry-derived proteomics data. Over the last 2 years, PRIDE data holdings have grown substantially; by September 2009 they comprised 60 different species, more than 2.5 million protein identifications, 11.5 million peptides and over 50 million spectra. Here we describe several new and improved features in PRIDE, including the revised submission process, which now supports direct submission of fragment ion annotations. Correspondingly, it is now possible to visualize fragmentation annotations on tandem mass spectra, a key feature for compliance with journal data submission requirements. We also describe recent developments in the PRIDE BioMart interface, which now allows integrative queries that join PRIDE data to a growing number of biological resources such as Reactome, Ensembl, InterPro and UniProt. This ability to perform extremely powerful cross-domain queries will certainly be a cornerstone of future bioinformatics analyses. Finally, we highlight the importance of data sharing in the proteomics field, and the corresponding integration of PRIDE with other databases in the ProteomeXchange consortium. Funding: European Union (ProDaC grant LSHG-CT-2006-036814); Burroughs Wellcome Fund (Grant WT085949MA).

    A guide to the Proteomics Identifications Database proteomics data repository

    The Proteomics Identifications Database (PRIDE, http://www.ebi.ac.uk/pride) is one of the main repositories of MS-derived proteomics data. Here, we outline the main functionalities of PRIDE both as a submission repository and as a source of proteomics data. We describe the main features for data retrieval and visualization available through the PRIDE web and BioMart interfaces. We also highlight the mechanism by which tailored BioMart queries can join PRIDE to other resources such as Reactome, Ensembl or UniProt to execute extremely powerful cross-domain queries. We then present the latest improvements in the PRIDE submission process, using PRIDE Converter, a new, easy-to-use, platform-independent graphical submission tool. Finally, we discuss future plans and the role of PRIDE in the ProteomeXchange consortium.

    Computational methods and tools for protein phosphorylation analysis

    Signaling pathways represent a central regulatory mechanism of biological systems, and a key event in their correct functioning is the reversible phosphorylation of proteins. Protein phosphorylation affects at least one-third of all proteins and is the most widely studied posttranslational modification. Phosphorylation analysis is still generally perceived as difficult or cumbersome and is not readily attempted by many, despite the high value of such information. In particular, determining the exact location of a phosphorylation site is currently considered a major hurdle; reliable approaches are therefore needed for the detection and localization of protein phosphorylation. The goal of this PhD thesis was to develop computational methods and tools for mass spectrometry-based protein phosphorylation analysis, particularly the validation of phosphorylation sites. In the first two studies, we developed methods for improved identification of phosphorylation sites in MALDI-MS. In the first study, this was achieved through the automatic combination of spectra acquired with multiple matrices, while in the second study an optimized protocol for sample loading and washing conditions was proposed. In the third study, we proposed and evaluated the hypothesis that in ESI-MS, tandem CID and HCD spectra of phosphopeptides can be accurately predicted and used in spectral library searching. This novel strategy for phosphosite validation and identification achieved higher accuracy than other popular existing methods and proved applicable to complex biological samples. Finally, we significantly improved the performance of our command-line prototype tool and added a graphical user interface, customizable simulation parameters, and filtering of selected spectra, peptides or proteins. The new software, SimPhospho, is open source and can easily be integrated into a phosphoproteomics data analysis workflow. Together, these bioinformatics methods and tools enable confident phosphosite assignment and improve the reliability of phosphoproteome identification and reporting.
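    The spectral library searching step described in the third study can be illustrated with a normalized dot product (cosine similarity), a common scoring function for comparing an observed spectrum with library spectra. The sketch below is a minimal illustration under stated assumptions and is not SimPhospho's implementation: the 0.5 m/z bin width, square-root intensity scaling, the function names, and the candidate site labels (`site_T4`, `site_S7`) are all hypothetical. Each library entry stands for the predicted spectrum of one candidate phosphosite isoform, and the best-scoring entry is taken as the most likely site.

```python
import math
from typing import Dict, List, Tuple

# A spectrum is a list of (m/z, intensity) peaks.
Spectrum = List[Tuple[float, float]]


def bin_spectrum(peaks: Spectrum, bin_width: float = 0.5) -> Dict[int, float]:
    """Sum square-root-scaled intensities into fixed-width m/z bins (hypothetical settings)."""
    bins: Dict[int, float] = {}
    for mz, intensity in peaks:
        key = int(mz // bin_width)
        bins[key] = bins.get(key, 0.0) + math.sqrt(intensity)
    return bins


def cosine_score(a: Spectrum, b: Spectrum, bin_width: float = 0.5) -> float:
    """Normalized dot product of two binned spectra: 0 means no shared bins, 1 means identical."""
    va, vb = bin_spectrum(a, bin_width), bin_spectrum(b, bin_width)
    dot = sum(v * vb.get(k, 0.0) for k, v in va.items())
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_b = math.sqrt(sum(v * v for v in vb.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_site(query: Spectrum, library: Dict[str, Spectrum]) -> Tuple[str, float]:
    """Score the query against every library entry and return the best match and its score."""
    scored = {name: cosine_score(query, ref) for name, ref in library.items()}
    best = max(scored, key=lambda name: scored[name])
    return best, scored[best]


# Hypothetical predicted spectra for two candidate phosphosite isoforms of one peptide.
library = {
    "site_T4": [(300.1, 100.0), (450.2, 50.0), (600.3, 80.0)],
    "site_S7": [(300.1, 100.0), (480.2, 60.0), (630.3, 70.0)],
}
query = [(300.1, 90.0), (450.2, 55.0), (600.3, 75.0)]
site, score = best_site(query, library)
```

    Here the query shares all three binned peaks with `site_T4` but only one with `site_S7`, so `site_T4` is selected. Real library search tools typically add precursor-mass filtering, peak annotation and statistical thresholds, which this sketch omits.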

    Web Server for Protein Interaction Searching

    This thesis examines ways of collecting data from bioinformatics databases that contain protein interaction data. Starting from the emergence of bioinformatics as a merger of two fields of knowledge, biology and computer science, it introduces the reader to the importance of protein interactions and to the problem of accessing protein interaction data in the corresponding databases. The thesis explains why the IMEx consortium was founded, its goals, and the means by which it achieves them. The IMEx consortium has produced many standards and data formats that facilitate access to its members' data and the exchange of data among them. One of its achievements is the PSICQUIC web service, which was designed following the REST architectural style and can also be accessed via the SOAP protocol. Both the REST and SOAP approaches to web services are studied and compared in this thesis, and on the basis of this research an application is implemented for retrieving protein interaction data from the IMEx member databases served through PSICQUIC.

    QALM - a tool for automating quantitative analysis of LC-MS-MS/MS data

    The goal of bioinformatics is to support science and research in the field of biology through the application of information technology. Proteomics is a field within biology that deals with the study of proteins. This paper describes QALM, an application developed to automate and simplify a specific type of proteomics analysis. QALM is first and foremost a proof of concept through which certain options for implementing such automation have been explored. Although a functional and usable application has been created, it should primarily be considered a stepping stone for similar applications in the future. Currently, QALM is a desktop tool for importing and exporting data, integrating and communicating with external systems that analyze such data, and finally generating reports to present the results. It currently runs only under the Linux operating system, but it should be possible to change this fairly easily. Master in Informatics (MAMN-INF, INF39).

    Francisella tularensis novicida proteomic and transcriptomic data integration and annotation based on semantic web technologies

    This paper summarises the lessons and experiences gained from a case study applying semantic web technologies to the integration of data from the bacterial species Francisella tularensis novicida (Fn). Fn data sources are disparate and heterogeneous, as multiple laboratories across the world, using multiple technologies, perform experiments to understand the mechanism of virulence. It is hard to integrate these data sources in a flexible manner that allows new experimental data to be added and compared when required.