ProteoClade: A taxonomic toolkit for multi-species and metaproteomic analysis
We present ProteoClade, a Python toolkit that performs taxa-specific peptide assignment, protein inference, and quantitation for multi-species proteomics experiments. ProteoClade scales to hundreds of millions of protein sequences, requires minimal computational resources, and is open source, multi-platform, and accessible to non-programmers. We demonstrate its utility for processing quantitative proteomic data derived from patient-derived xenografts, and show that its speed and scalability enable a novel de novo proteomic workflow for complex microbiota samples.
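The core idea behind taxa-specific peptide assignment can be sketched in a few lines. This is an illustrative simplification, not ProteoClade's actual API (the function and variable names here are invented): a peptide is considered taxon-specific only if every protein it matches comes from a single taxon, while peptides shared across taxa are left unassigned.

```python
# Hypothetical sketch of taxa-specific peptide assignment (not ProteoClade's API):
# a peptide is assigned to a taxon only when all of its matching proteins
# belong to that one taxon; peptides shared across taxa stay unassigned.

def assign_peptides(peptide_hits):
    """peptide_hits maps a peptide sequence to the set of taxa it matches."""
    assignments = {}
    for peptide, taxa in peptide_hits.items():
        if len(taxa) == 1:
            assignments[peptide] = next(iter(taxa))  # taxon-unique peptide
        else:
            assignments[peptide] = None  # ambiguous: matches several taxa
    return assignments

hits = {
    "LVNELTEFAK": {"Homo sapiens"},                       # unique to human
    "GLSDGEWQQVLNVWGK": {"Homo sapiens", "Mus musculus"}, # shared
}
print(assign_peptides(hits))
```

In a patient-derived xenograft experiment this style of filter is what lets human (tumor) and mouse (stroma) signals be quantified separately.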
ISPIDER Central: an integrated database web-server for proteomics
Despite the growing volumes of proteomic data, integration of the underlying results remains problematic owing to differences in formats, data captured, protein accessions and services available from the individual repositories. To address this, we present the ISPIDER Central Proteomic Database search (http://www.ispider.manchester.ac.uk/cgi-bin/ProteomicSearch.pl), an integration service offering novel search capabilities over leading, mature proteomic repositories including the PRoteomics IDEntifications database (PRIDE), PepSeeker, PeptideAtlas and the Global Proteome Machine. It enables users to search for proteins and peptides that have been characterised in mass spectrometry-based proteomics experiments from different groups, stored in different databases, and to view the collated results with specialist viewers/clients. To overcome limitations imposed by the great variability in protein accessions used by individual laboratories, the European Bioinformatics Institute's Protein Identifier Cross-Reference (PICR) service is used to resolve accessions from different sequence repositories. Custom-built clients allow users to view peptide/protein identifications in different contexts from multiple experiments and repositories, as well as integration with the Dasty2 client, which supports any annotations available from Distributed Annotation System servers. Further information on the protein hits may also be added via external web services able to take a protein as input. This web server offers the first truly integrated access to proteomics repositories and provides a unique service to biologists interested in mass spectrometry-based proteomics.
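The accession-resolution step can be illustrated with a toy cross-reference table. This is a hedged sketch of the general technique, not the real PICR web-service API (the mapping entries and function names below are invented): repository-specific accessions are mapped to one canonical identifier so that hits from different databases can be collated.

```python
# Illustrative sketch of accession resolution across repositories, in the
# spirit of the PICR service (the mapping table is a made-up example, not
# real PICR output): map each source accession to a canonical identifier,
# then group hits by that canonical key.

CROSS_REFERENCE = {
    "IPI00021439": "P60709",      # IPI accession        -> UniProt
    "ENSP00000349960": "P60709",  # Ensembl protein ID   -> UniProt
    "NP_001092": "P60709",        # RefSeq accession     -> UniProt
}

def collate_hits(hits):
    """hits is a list of (accession, repository); group by canonical accession."""
    collated = {}
    for accession, repository in hits:
        canonical = CROSS_REFERENCE.get(accession, accession)  # fall back to itself
        collated.setdefault(canonical, []).append(repository)
    return collated

print(collate_hits([("IPI00021439", "PRIDE"), ("ENSP00000349960", "PeptideAtlas")]))
```

The fallback to the original accession when no mapping exists mirrors the practical reality that not every laboratory accession can be resolved.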
A network approach for managing and processing big cancer data in clouds
Translational cancer research requires integrative analysis of multiple levels of big cancer data to identify and treat cancer. These data are decentralised, continually growing and being updated, and the content held on different information sources partially overlaps, creating redundancies as well as contradictions and inconsistencies. To address these issues, we develop a data network model and technology for constructing and managing big cancer data. To support this data network approach to data processing and analysis, we employ a semantic content network and adopt the CELAR cloud platform. A prototype implementation shows that the CELAR cloud can satisfy the on-demand resource needs of managing and processing big cancer data.
Proteomic Analysis of Chloroplast-to-Chromoplast Transition in Tomato Reveals Metabolic Shifts Coupled with Disrupted Thylakoid Biogenesis Machinery and Elevated Energy-Production Components
A comparative proteomic approach was performed to identify differentially expressed proteins in plastids at three stages of tomato (Solanum lycopersicum) fruit ripening (mature-green, breaker, red). Stringent curation and processing of the data from three independent replicates identified 1,932 proteins, among which 1,529 were quantified by spectral counting. The quantification procedures were subsequently validated by immunoblot analysis of six proteins representative of distinct metabolic or regulatory pathways. Among the main features of the chloroplast-to-chromoplast transition revealed by the study, chromoplastogenesis appears to be associated with major metabolic shifts: (1) a strong decrease in abundance of proteins of light reactions (photosynthesis, Calvin cycle, photorespiration) and carbohydrate metabolism (starch synthesis/degradation), mostly between the breaker and red stages, and (2) an increase in terpenoid biosynthesis (including carotenoids) and stress-response proteins (ascorbate-glutathione cycle, abiotic stress, redox, heat shock). These metabolic shifts are preceded by the accumulation of plastid-encoded acetyl Coenzyme A carboxylase D proteins, accounting for the generation of a storage matrix that will accumulate carotenoids. Of particular note is the high abundance of proteins involved in providing energy and in metabolite import. Structural differentiation of the chromoplast is characterized by a sharp and continuous decrease of thylakoid proteins, whereas envelope and stroma proteins remain remarkably stable. This coincides with the disruption of the machinery for thylakoid and photosystem biogenesis (vesicular trafficking, provision of material for thylakoid biosynthesis, photosystem assembly) and the loss of the plastid division machinery. Altogether, the data provide new insights into the chromoplast differentiation process while enriching our knowledge of the plant plastid proteome.
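Quantification by spectral counting, as used above, is commonly normalized so that protein abundances are comparable within and across samples. A minimal sketch of one standard variant, the Normalized Spectral Abundance Factor (NSAF), is shown below; the formula is standard, but the protein names, counts and lengths are invented for illustration and the paper may use a different normalization.

```python
# Minimal sketch of label-free quantification by spectral counting using the
# standard NSAF formula (example data is invented): each protein's spectral
# count is divided by its length, then normalized by the sum over all proteins.

def nsaf(counts, lengths):
    """Return the Normalized Spectral Abundance Factor for each protein.

    counts:  protein -> number of spectra matched to that protein
    lengths: protein -> protein length in amino acids
    """
    saf = {p: counts[p] / lengths[p] for p in counts}  # length-normalized counts
    total = sum(saf.values())
    return {p: v / total for p, v in saf.items()}      # fractions summing to 1

abundances = nsaf({"RBCL": 120, "PSBA": 40}, {"RBCL": 475, "PSBA": 353})
print(abundances)
```

Because NSAF values sum to 1 within a sample, decreases in thylakoid proteins and increases in stress-response proteins can be compared directly across ripening stages.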
Automation on the generation of genome scale metabolic models
Background: The reconstruction of genome-scale metabolic models is currently a largely manual, interactive process based on expert decision-making. It typically takes a full year of one person's work to satisfactorily collect, analyze and validate the list of all metabolic reactions present in a specific organism. Compiling this list requires manually sifting through a huge amount of genomic, metabolomic and physiological information, and no algorithm currently exists that can traverse all of this information automatically and generate models according to the probabilistic criteria of uniqueness and completeness that a biologist would apply. Results: This work presents the automation of a methodology for the reconstruction of genome-scale metabolic models for any organism. The methodology is the automated version of the steps previously carried out manually to reconstruct the genome-scale metabolic model of a photosynthetic organism, Synechocystis sp. PCC 6803. The reconstruction steps are implemented in a computational platform (COPABI) that generates the models from the probabilistic algorithms that have been developed. Conclusions: To validate the robustness of the developed algorithm, metabolic models of several organisms generated by the platform were studied alongside published, manually curated models; network properties of the models, such as connectivity and average shortest path length, were compared and analyzed.
Comment: 24 pages, 2 figures, 2 tables
Establishment of an integrative multi-omics expression database CKDdb in the context of chronic kidney disease (CKD)
Complex human traits such as chronic kidney disease (CKD) are a major health and financial burden in modern societies. CKD onset and progression are still not fully understood at the molecular level. Meanwhile, the prolific use of high-throughput omic technologies in disease biomarker discovery studies has yielded a vast amount of disjointed data that cannot be easily collated. We therefore aimed to develop a molecule-centric database featuring CKD-related experiments from the available literature. We established the Chronic Kidney Disease database CKDdb, an integrated and clustered information resource that covers multi-omic studies (microRNAs, genomics, peptidomics, proteomics and metabolomics) of CKD and related disorders, built by literature data mining and manual curation. The CKDdb database contains differential expression data for 49,395 molecule entries (redundant), of which 16,885 are unique molecules (non-redundant), drawn from 377 manually curated studies in 230 publications. The database was intentionally built to allow disease pathway analysis through a systems approach, integrating all existing information to yield biological meaning, and therefore has the potential to provide an in-depth understanding of the key molecular events that modulate CKD pathogenesis.
Neuroinformatics: From Bioinformatics to Databasing the Brain
Neuroinformatics seeks to create and maintain web-accessible databases of experimental and computational data, together with innovative software tools, essential for understanding the nervous system in its normal function and in neurological disorders. Neuroinformatics includes traditional bioinformatics of gene and protein sequences in the brain; atlases of brain anatomy and localization of genes and proteins; imaging of brain cells; brain imaging by positron emission tomography (PET), functional magnetic resonance imaging (fMRI), electroencephalography (EEG), magnetoencephalography (MEG) and other methods; many electrophysiological recording methods; and clinical neurological data, among others. Building neuroinformatics databases and tools presents difficult challenges because they span a wide range of spatial scales and types of data stored and analyzed. Traditional bioinformatics, by comparison, focuses primarily on genomic and proteomic data (which of course also presents difficult challenges). Much of bioinformatics analysis focuses on sequences (DNA, RNA, and protein molecules) as the type of data that are stored, compared, and sometimes modeled. Bioinformatics is undergoing explosive growth with the addition, for example, of databases that catalog interactions between proteins, of databases that track the evolution of genes, and of systems biology databases which contain models of all aspects of organisms. This commentary briefly reviews neuroinformatics and clarifies its relationship to traditional and modern bioinformatics.