104 research outputs found

    In silico proteome analysis to facilitate proteomics experiments using mass spectrometry

    Proteomics experiments typically involve protein or peptide separation steps coupled to the identification of many hundreds to thousands of peptides by mass spectrometry. Development of methodology and instrumentation in this field is proceeding rapidly, and effective software is needed to link the different stages of proteomic analysis. We have developed an application, proteogest, written in Perl, that generates descriptive and statistical analyses of the biophysical properties of many (e.g. thousands of) protein sequences submitted by the user, for instance protein sequences inferred from the complete genome sequence of a model organism. The application also carries out in silico proteolytic digestion of the submitted proteomes, or subsets thereof, and presents the distribution of biophysical properties of the resulting peptides. proteogest is customizable: the user can select many options, for instance the cleavage pattern of the digestion treatment or the presence of modifications to specific amino acid residues. We show how proteogest can be used to compare the proteomes and digested proteome products of model organisms, to examine the added complexity generated by modification of residues, and to facilitate the design of proteomics experiments for optimal representation of component proteins.
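
The in silico digestion step described in this abstract can be illustrated with a minimal Python sketch. This is not proteogest itself (which is written in Perl) and does not reproduce its options or output. The function names `digest` and `peptide_mass` are hypothetical, and the cleavage rule (cut after K or R unless the next residue is P) is the commonly used trypsin rule, assumed here for illustration.

```python
from collections import Counter

# Monoisotopic residue masses in daltons (standard values; water is added once per peptide).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def digest(sequence, cleave_after="KR", blocked_by="P"):
    """Split a protein sequence after each cleavage residue.

    Assumed rule: cut after K or R unless the following residue is P.
    Missed cleavages are not modelled.
    """
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        is_last = i == len(sequence) - 1
        if residue in cleave_after and (is_last or sequence[i + 1] not in blocked_by):
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        # Trailing peptide that does not end in a cleavage residue.
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[r] for r in peptide) + WATER


if __name__ == "__main__":
    peptides = digest("AKRPGKLR")
    print(peptides)
    # Length distribution of the digestion products.
    print(Counter(len(p) for p in peptides))
    for p in peptides:
        print(p, round(peptide_mass(p), 4))
```

Running `digest` over every sequence in a proteome and tabulating lengths or masses gives the kind of peptide property distribution the abstract refers to; residue modifications could be modelled by adjusting entries in the mass table.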

    Missing value imputation for epistatic MAPs

    BACKGROUND: Epistatic miniarray profiling (E-MAP) is a high-throughput approach capable of quantifying aggravating or alleviating genetic interactions between gene pairs. The datasets resulting from E-MAP experiments typically take the form of a symmetric pairwise matrix of interaction scores. These datasets have a significant number of missing values (up to 35%) that can reduce the effectiveness of some data analysis techniques and prevent the use of others. An effective method for imputing interactions would therefore increase the types of possible analysis, as well as increase the potential to identify novel functional interactions between gene pairs. Several methods have been developed to handle missing values in microarray data, but it is unclear how applicable these methods are to E-MAP data because of their pairwise nature and the significantly larger number of missing values. Here we evaluate four alternative imputation strategies, three local (nearest neighbor-based) and one global (PCA-based), that have been modified to work with symmetric pairwise data. RESULTS: We identify different categories for the missing data based on their underlying cause, and show that values from the largest category can be imputed effectively. We compare local and global imputation approaches across a variety of distinct E-MAP datasets, showing that both are competitive and preferable to filling in with zeros. In addition, we show that these methods are effective in an E-MAP from a different species, suggesting that pairwise imputation techniques will be increasingly useful as analogous epistasis mapping techniques are developed in different species. We show that strongly alleviating interactions are significantly more difficult to predict than strongly aggravating interactions. Finally, we show that imputed interactions, generated using nearest neighbor methods, are enriched for annotations in the same manner as measured interactions. Therefore our method potentially expands the number of mapped epistatic interactions. In addition, we make implementations of our algorithms available for use by other researchers. CONCLUSIONS: We address the problem of missing value imputation for E-MAPs, and suggest the use of symmetric nearest neighbor based approaches as they offer consistently accurate imputations across multiple datasets in a tractable manner.
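
To make the idea of nearest-neighbor imputation on a symmetric pairwise matrix concrete, here is a simplified Python sketch. It is not the authors' implementation. The function names, the root-mean-square row distance, the unweighted averaging, and the default `k` are all assumptions chosen for illustration. For a missing score between genes i and j, it estimates the value once from the neighbors of i and once from the neighbors of j, averages the two estimates, and writes the result to both (i, j) and (j, i) so the matrix stays symmetric.

```python
import math


def row_distance(m, a, b):
    """Root-mean-square difference over columns observed in both rows a and b."""
    diffs = [(m[a][c] - m[b][c]) ** 2
             for c in range(len(m))
             if not math.isnan(m[a][c]) and not math.isnan(m[b][c])]
    return math.sqrt(sum(diffs) / len(diffs)) if diffs else math.inf


def neighbor_estimate(m, i, j, k):
    """Mean of m[n][j] over the k rows n closest to row i that have m[n][j] observed."""
    candidates = []
    for n in range(len(m)):
        if n in (i, j) or math.isnan(m[n][j]):
            continue
        d = row_distance(m, i, n)
        if math.isfinite(d):
            candidates.append((d, n))
    nearest = sorted(candidates)[:k]
    if not nearest:
        return None
    return sum(m[n][j] for _, n in nearest) / len(nearest)


def impute_symmetric_knn(m, k=2):
    """Return a copy of the symmetric matrix m with missing (NaN) off-diagonal scores filled."""
    filled = [row[:] for row in m]
    size = len(m)
    for i in range(size):
        for j in range(i + 1, size):
            if math.isnan(m[i][j]):
                estimates = [e for e in (neighbor_estimate(m, i, j, k),
                                         neighbor_estimate(m, j, i, k))
                             if e is not None]
                if estimates:
                    # Average both directions and keep the matrix symmetric.
                    filled[i][j] = filled[j][i] = sum(estimates) / len(estimates)
    return filled


if __name__ == "__main__":
    nan = math.nan
    scores = [
        [nan, 1.0, nan, -2.0],
        [1.0, nan, 0.5, -1.5],
        [nan, 0.5, nan, -1.0],
        [-2.0, -1.5, -1.0, nan],
    ]
    print(impute_symmetric_knn(scores, k=1)[0][2])
```

Neighbors are always computed from the original matrix rather than from already imputed values, so the result does not depend on the order in which missing entries are visited.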

    Distinct configurations of protein complexes and biochemical pathways revealed by epistatic interaction network motifs

    BACKGROUND: Gene and protein interactions are commonly represented as networks, with the genes or proteins comprising the nodes and the relationship between them as edges. Motifs, or small local configurations of edges and nodes that arise repeatedly, can be used to simplify the interpretation of networks. RESULTS: We examined triplet motifs in a network of quantitative epistatic genetic relationships, and found a non-random distribution of particular motif classes. Individual motif classes were found to be associated with different functional properties, suggestive of an underlying biological significance. These associations were apparent not only for motif classes, but for individual positions within the motifs. As expected, NNN (all negative) motifs were strongly associated with previously reported genetic (i.e. synthetic lethal) interactions, while PPP (all positive) motifs were associated with protein complexes. The two other motif classes (NNP: a positive interaction spanned by two negative interactions, and NPP: a negative spanned by two positives) showed very distinct functional associations, with physical interactions dominating for the former but alternative enrichments, typical of biochemical pathways, dominating for the latter. CONCLUSION: We present a model showing how NNP motifs can be used to recognize supportive relationships between protein complexes, while NPP motifs often identify opposing or regulatory behaviour between a gene and an associated pathway. The ability to use motifs to point toward underlying biological organizational themes is likely to be increasingly important as more extensive epistasis mapping projects in higher organisms begin.
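
The four triplet motif classes named in this abstract (NNN, NNP, NPP, PPP) can be counted with a short Python sketch. The function name `classify_triplets`, the input layout, and the score cutoffs for calling an interaction negative or positive are hypothetical; the abstract does not state the thresholds the authors used.

```python
from collections import Counter
from itertools import combinations


def classify_triplets(scores, neg_cutoff=-2.5, pos_cutoff=2.0):
    """Count triplet motif classes in a signed interaction network.

    scores maps frozenset({gene_a, gene_b}) to an epistasis score.
    The cutoffs are illustrative assumptions, not values from the paper.
    """
    sign = {}
    for pair, score in scores.items():
        if score <= neg_cutoff:
            sign[pair] = "N"
        elif score >= pos_cutoff:
            sign[pair] = "P"
        # Scores between the cutoffs are treated as neutral and ignored.

    genes = sorted({g for pair in sign for g in pair})
    counts = Counter()
    for a, b, c in combinations(genes, 3):
        edges = [frozenset(p) for p in ((a, b), (a, c), (b, c))]
        if all(e in sign for e in edges):
            # Sorting the three signs yields one of NNN, NNP, NPP, PPP.
            counts["".join(sorted(sign[e] for e in edges))] += 1
    return counts


if __name__ == "__main__":
    example = {
        frozenset({"A", "B"}): -3.0,
        frozenset({"A", "C"}): -3.0,
        frozenset({"B", "C"}): 3.0,
        frozenset({"A", "D"}): -3.0,
        frozenset({"B", "D"}): -3.0,
        frozenset({"C", "D"}): -3.0,
    }
    print(classify_triplets(example))
```

Comparing such counts against counts from networks with shuffled edge signs is one way to assess whether a motif class is over-represented; that comparison is not shown here.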

    msmsEval: tandem mass spectral quality assignment for high-throughput proteomics

    BACKGROUND: In proteomics experiments, database-search programs are the method of choice for protein identification from tandem mass spectra. As amino acid sequence databases grow, however, the computing resources required for these programs have become prohibitive, particularly in searches for modified proteins. Recently, methods to limit the number of spectra to be searched based on spectral quality have been proposed by different research groups, but rankings of spectral quality have thus far been based on arbitrary cut-off values. In this work, we develop a more readily interpretable spectral quality statistic by providing probability values for the likelihood that spectra will be identifiable. RESULTS: We describe an application, msmsEval, that builds on previous work by statistically modeling the spectral quality discriminant function using a Gaussian mixture model. This allows a researcher to filter spectra based on the probability that a spectrum will ultimately be identified by database searching. We show that spectra that are predicted by msmsEval to be of high quality, yet remain unidentified in standard database searches, are candidates for more intensive search strategies. Using a well-studied public dataset, we also show that a high proportion (83.9%) of the spectra predicted by msmsEval to be of high quality, but that elude standard search strategies, are in fact interpretable. CONCLUSION: msmsEval will be useful for high-throughput proteomics projects and is freely available for download from . It supports Windows, Mac OS X and Linux/Unix operating systems.
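
As a rough illustration of how a Gaussian mixture model turns a quality score into a probability, the Python sketch below computes the posterior probability that a score belongs to the "identifiable" component of a two-component, one-dimensional mixture. It is not msmsEval's code: the function names, the use of exactly two components, and the example parameters are assumptions, and in practice the mixture parameters would be fitted to data (for example by expectation-maximization) rather than set by hand.

```python
import math


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_high_quality(score, weight_high, high, low):
    """Posterior probability that a score came from the high-quality component.

    high and low are (mean, sd) tuples; weight_high is the mixing proportion
    of the high-quality component. All parameters here are hypothetical.
    """
    p_high = weight_high * gaussian_pdf(score, *high)
    p_low = (1.0 - weight_high) * gaussian_pdf(score, *low)
    return p_high / (p_high + p_low)


if __name__ == "__main__":
    # Hypothetical fitted parameters for a discriminant score.
    for s in (-2.0, 0.0, 2.0):
        print(s, posterior_high_quality(s, 0.5, high=(1.0, 1.0), low=(-1.0, 1.0)))
```

A filter would then keep spectra whose posterior exceeds a chosen probability, which is easier to interpret than a cut-off on the raw score. For scores extremely far from both components the two densities can underflow to zero, so a production version would work in log space.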

    Invadolysin, a conserved lipid-droplet-associated metalloproteinase, is required for mitochondrial function in Drosophila

    Mitochondria are the main producers of ATP, the principal energy source of the cell, and of reactive oxygen species (ROS), which are important signaling molecules. Mitochondrial morphogenesis and function depend on a hierarchical network of mechanisms in which proteases appear to be center stage. The invadolysin gene encodes an essential conserved metalloproteinase of the M8 family that is necessary for mitosis and cell migration during Drosophila development. We previously demonstrated that invadolysin is found associated with lipid droplets in cells. Here, we present data demonstrating that invadolysin interacts physically with three mitochondrial ATP synthase subunits. Our studies have focused on the genetic phenotypes of mutants of invadolysin and of bellwether, the Drosophila homolog of ATP synthase α. The invadolysin mutation presents defects in mitochondrial physiology similar to those observed in bellwether mutants. The invadolysin and bellwether mutants have parallel phenotypes that affect lipid storage and mitochondrial electron transport chain activity, which result in a reduction in ATP production and an accumulation of ROS. As a consequence, invadolysin mutant larvae show lower energetic status and higher oxidative stress. Our data demonstrate an essential role for invadolysin in mitochondrial function that is crucial for normal development and survival.

    Proteomics Strategy for Identifying Candidate Bioactive Proteins in Complex Mixtures: Application to the Platelet Releasate

    Proteomic approaches have proven powerful at identifying large numbers of proteins, but there are fewer reports of functional characterization of proteins in biological tissues. Here, we describe an experimental approach that fractionates proteins released from human platelets, linking bioassay activity to identity. We used consecutive orthogonal separation platforms to ensure sensitive detection: (a) ion-exchange of intact proteins, (b) SDS-PAGE separation of ion-exchange fractions and (c) HPLC separation of tryptic digests coupled to electrospray tandem mass spectrometry. Migration of THP-1 monocytes in response to complete or fractionated platelet releasate was assessed, and the promigratory activity was localized to just one of the forty-nine ion-exchange fractions. Over 300 proteins were identified in the releasate, with a wide range of annotated biophysical and biochemical properties, in particular platelet activation, adhesion, and wound healing. The presence of PEDF and involucrin, two proteins not previously reported in platelet releasate, was confirmed by western blotting. Proteins identified within the fraction with monocyte promigratory activity and not in other inactive fractions included vimentin, PEDF, and TIMP-1. We conclude that this analytical platform is effective for the characterization of complex bioactive samples.

    Improved functional overview of protein complexes using inferred epistatic relationships

    BACKGROUND: Epistatic Miniarray Profiling (E-MAP) quantifies the net effect on growth rate of disrupting pairs of genes, often producing phenotypes that may be more (negative epistasis) or less (positive epistasis) severe than the phenotype predicted based on single gene disruptions. Epistatic interactions are important for understanding cell biology because they define relationships between individual genes, and between sets of genes involved in biochemical pathways and protein complexes. Each E-MAP screen quantifies the interactions between a logically selected subset of genes (e.g. genes whose products share a common function). Interactions that occur between genes involved in different cellular processes are not as frequently measured, yet these interactions are important for providing an overview of cellular organization. RESULTS: We introduce a method for combining overlapping E-MAP screens and inferring new interactions between them. We use this method to infer with high confidence 2,240 new strongly epistatic interactions and 34,469 weakly epistatic or neutral interactions. We show that accuracy of the predicted interactions approaches that of replicate experiments and that, like measured interactions, they are enriched for features such as shared biochemical pathways and knockout phenotypes. We constructed an expanded epistasis map for yeast cell protein complexes and show that our new interactions increase the evidence for previously proposed inter-complex connections, and predict many new links. We validated a number of these in the laboratory, including new interactions linking the SWR-C chromatin modifying complex and the nuclear transport apparatus. CONCLUSION: Overall, our data support a modular model of yeast cell protein network organization and show how prediction methods can considerably extend the information that can be extracted from overlapping E-MAP screens.