19 research outputs found

    Exploring the limits of the geometric copolymerization model

    The geometric copolymerization model is a recently introduced statistical Markov chain model. Here, we investigate its practicality. First, several approaches to identify the optimal model parameters from observed copolymer fingerprints are evaluated using Monte Carlo simulated data. Directly optimizing the parameters is robust against noise but has impractically long running times. A compromise between robustness and running time is found by exploiting the relationship between monomer concentrations calculated by ordinary differential equations and the geometric model. Second, we investigate the applicability of the model to copolymerizations beyond living polymerization and show that the model is useful for copolymerizations involving termination and depropagation reactions.

    SAMPI: Protein Identification with Mass Spectra Alignments

    BACKGROUND: Mass spectrometry-based peptide mass fingerprints (PMFs) offer a fast, efficient, and robust method for protein identification. A protein is digested (usually by trypsin) and its mass spectrum is compared to simulated spectra for protein sequences in a database. However, existing tools for analyzing PMFs often suffer from missing or heuristic analysis of the significance of search results and insufficient handling of missing and additional peaks. RESULTS: We present a unified framework for analyzing peptide mass fingerprints that offers a number of advantages over existing methods. First, comparison of mass spectra is based on a scoring function that can be custom-designed for certain applications and explicitly takes missing and additional peaks into account. The method is able to simulate almost every additive scoring scheme. Second, we present an efficient deterministic method for assessing the significance of a protein hit, independent of the underlying scoring function and sequence database. We demonstrate the applicability of our approach using biological mass spectrometry data and compare our results to the standard software Mascot. CONCLUSION: The proposed framework for analyzing peptide mass fingerprints shows performance comparable to Mascot on small peak lists. When more noise peaks are introduced, we are able to keep identification rates at a similar level by using the flexibility introduced by scoring schemes.
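
As a rough illustration of what an additive scoring scheme with explicit terms for missing and additional peaks might look like, the following minimal sketch scores a measured peak list against a simulated one. It is not the scoring function from the paper: the function name `additive_pmf_score`, the greedy nearest-peak matching, the mass tolerance, and all weights are assumptions chosen only for illustration.

```python
from bisect import bisect_left


def additive_pmf_score(measured, theoretical, tol=0.5,
                       match=1.0, missing=-0.5, additional=-0.2):
    """Additive score of a measured peak list against a simulated one.

    Hypothetical illustration: `tol`, `match`, `missing` and `additional`
    are made-up values, and matching is greedy rather than an optimal
    alignment.
    """
    theoretical = sorted(theoretical)
    used = [False] * len(theoretical)
    score = 0.0
    for mass in sorted(measured):
        # Scan the theoretical peaks inside [mass - tol, mass + tol]
        # and keep the closest one that has not been matched yet.
        i = bisect_left(theoretical, mass - tol)
        best = None
        while i < len(theoretical) and theoretical[i] <= mass + tol:
            if not used[i] and (
                best is None
                or abs(theoretical[i] - mass) < abs(theoretical[best] - mass)
            ):
                best = i
            i += 1
        if best is None:
            score += additional  # measured peak without a theoretical partner
        else:
            used[best] = True
            score += match       # matched pair of peaks
    # Theoretical peaks that were never observed count as missing peaks.
    score += missing * used.count(False)
    return score
```

Because each peak contributes an independent term, changing the three weights is enough to emphasize or de-emphasize noise peaks, which is the kind of flexibility the abstract attributes to custom scoring schemes.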

    Peak intensity prediction in MALDI-TOF mass spectrometry: A machine learning study to support quantitative proteomics

    Timm W, Scherbart A, Boecker S, Kohlbacher O, Nattkemper TW. Peak intensity prediction in MALDI-TOF mass spectrometry: A machine learning study to support quantitative proteomics. BMC Bioinformatics. 2008;9(1):443. Background: Mass spectrometry is a key technique in proteomics and can be used to analyze complex samples quickly. One key problem with the mass spectrometric analysis of peptides and proteins, however, is that absolute quantification is severely hampered by the unclear relationship between the observed peak intensity and the peptide concentration in the sample. While there are numerous approaches to circumvent this problem experimentally (e.g., labeling techniques), reliable prediction of the peak intensities from peptide sequences could provide a peptide-specific correction factor. Thus, it would be a valuable tool towards label-free absolute quantification. Results: In this work we present machine learning techniques for peak intensity prediction for MALDI mass spectra. Features encoding the peptides' physico-chemical properties as well as string-based features were extracted. A feature subset was obtained from multiple forward feature selections on the extracted features. Based on these features, two advanced machine learning methods (support vector regression and local linear maps) are shown to yield good results for this problem (Pearson correlation of 0.68 in a ten-fold cross validation). Conclusion: The techniques presented here are a useful first step going beyond the binary prediction of proteotypic peptides towards a more quantitative prediction of peak intensities. These predictions are in turn expected to be beneficial for mass spectrometry-based quantitative proteomics.
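
For readers unfamiliar with the reported evaluation metric, the sketch below shows how a Pearson correlation and a ten-fold split can be computed in plain Python. It only illustrates the evaluation bookkeeping, not the study's regression methods; the helper names `pearson` and `k_fold_indices` and the contiguous, unshuffled fold layout are assumptions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def k_fold_indices(n, k=10):
    """Yield (train, test) index lists for k contiguous folds over n items.

    Hypothetical helper: real studies usually shuffle the data first.
    """
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size
```

In a cross-validated setting, a model would be fitted on each `train` split and `pearson` would compare predicted and observed intensities on the held-out `test` split.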

    Integrative analysis of multimodal mass spectrometry data in MZmine 3

    3 pages. We thank Christopher Jensen and Gauthier Boaglio for their contributions to the MZmine codebase. We thank Jianbo Zhang and Zachary Russ for their donations to MZmine development. The MZmine 3 logo was designed by the Bioinformatics & Research Computing group at the Whitehead Institute for Biomedical Research. T.P. is supported by Czech Science Foundation (GA CR) grant 21-11563M and by the European Union's Horizon 2020 research and innovation programme under Marie Skłodowska-Curie grant agreement 891397. Support for P.C.D. was from US NIH U19 AG063744, P50HD106463, 1U24DK133658 and BBSRC-NSF award 2152526. T.S. acknowledges funding by Deutsche Forschungsgemeinschaft (441958208). M. Wang acknowledges the US Department of Energy Joint Genome Institute (https://ror.org/04xm1d337, a DOE Office of Science User Facility) and is supported by the Office of Science of the US Department of Energy operated under subcontract No. 7601660. E.R. and H.H. thank Wen Jiang (HILICON AB) for providing the iHILIC Fusion(+) column for HILIC measurements. M.F., K.D. and S.B. are supported by Deutsche Forschungsgemeinschaft (BO 1910/20). L.-F.N. is supported by the Swiss National Science Foundation (project 189921). D.P. was supported by the Deutsche Forschungsgemeinschaft (German Research Foundation) through the CMFI Cluster of Excellence (EXC-2124 - 390838134 project-ID 1-03.006_0) and the Collaborative Research Center CellMap (TRR 261 - 398967434). J.-K.W. acknowledges the US National Science Foundation (MCB-1818132), the US Department of Agriculture, and the Chan Zuckerberg Initiative. MZmine developers have received support from the European COST Action CA19105 - Pan-European Network in Lipidomics and EpiLipidomics (EpiLipidNET). We acknowledge the support of the Google Summer of Code (GSoC) program, which has funded the development of several MZmine modules through student projects. We thank Adam Tenderholt for introducing MZmine to the GSoC program. Peer reviewed

    New Statistical Models for Copolymerization

    For many years, copolymerization has been studied using mathematical and statistical models. Here, we present new Markov chain models for copolymerization kinetics: the Bernoulli and Geometric models. They model copolymer synthesis as a random process and are based on a basic reaction scheme. In contrast to previous Markov chain approaches to copolymerization, both models take variable chain lengths and time-dependent monomer probabilities into account and allow for computing sequence likelihoods and copolymer fingerprints. Fingerprints can be computed from copolymer mass spectra, potentially allowing us to estimate the model parameters from measured fingerprints. We compare both models against Monte Carlo simulations. We find that computing the models is fast and memory efficient.
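
To make the idea of comparing a statistical model with Monte Carlo simulations concrete, here is a deliberately simplified sketch. It is not the Bernoulli or Geometric model from the paper: it assumes a fixed chain length and a constant probability `p_a` of adding monomer A, whereas the models described above allow variable chain lengths and time-dependent monomer probabilities. It also assumes that a fingerprint can be read as the relative abundance of chains by monomer counts (number of A, number of B). All function names are hypothetical.

```python
import random
from collections import Counter
from math import comb


def simulate_fingerprint(n_chains, chain_length, p_a, seed=0):
    """Monte Carlo estimate of the (n_A, n_B) abundance distribution.

    Simplifying assumptions: fixed chain length, constant probability
    p_a of adding monomer A at every step.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_chains):
        n_a = sum(rng.random() < p_a for _ in range(chain_length))
        counts[(n_a, chain_length - n_a)] += 1
    return {key: value / n_chains for key, value in counts.items()}


def exact_fingerprint(chain_length, p_a):
    """Closed-form counterpart under the same assumptions (binomial)."""
    return {
        (k, chain_length - k):
            comb(chain_length, k) * p_a ** k * (1 - p_a) ** (chain_length - k)
        for k in range(chain_length + 1)
    }
```

Under these assumptions the simulated abundances should approach the binomial values as the number of simulated chains grows, which mirrors, in miniature, the model-versus-simulation comparison the abstract describes.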

    Influence of age and level of activity on the applicability of a walker orthosis - a prospective study in different cohorts of healthy volunteers

    Background: Walker orthoses are frequently prescribed because they are removable, which allows wound control, body care and physiotherapy, and are adaptable to the soft tissue conditions. The prerequisite for successful treatment with any walker orthosis is correct use by the patient. Therefore, the aim of this study was to investigate patients' handling of a commonly used walker. Methods: Prospective observational study analyzing the applicability of a walker orthosis in different cohorts with varying age and level of activity. Volunteers were recruited from a mountain-biking team (Sport), a cardiovascular health sports group (Cardio) and a retirement home (Senior). The correct application was assessed following initial training (t0) and one week later (t1). Outcome parameters were an Application Score, strap tightness, vertical heel lift-off and subjective judgement of correct application. Results: Thirty-three volunteers were enrolled: 11 in the Sport group (31 ± 7 years), 12 in the Cardio group (59 ± 11 years) and 10 in the Senior group (82 ± 5 years). No differences for any parameter could be observed between t0 and t1. Age showed a moderate correlation with all outcome parameters, and the cohort influenced all variables. The Senior group presented significantly inferior results to the Sport and Cardio groups for the Application Score (p = 0.002 to p < 0.001) and strap tightness (p < 0.001). Heel lift-off was significantly inferior in the Cardio and Senior groups compared to the Sport group (p = 0.003 to p < 0.001). In the Application Score, 14% of the Sport group, 4% of the Cardio group and 83% of the Senior group achieved fewer than 9 points, which was considered insufficient. However, 90% of these participants believed the application to be correct. Conclusions: The elderly cohort living in a retirement home demonstrated impaired handling of the walker orthosis. Further, participants were unable to self-assess correct handling. These aspects should be considered when initiating treatment with a walker orthosis.
    Trial registration: Retrospectively registered on 16 February 2018, #DRKS00013728 on DRKS.

    Statistics for approximate gene clusters

    Jahn K, Winter S, Stoye J, Böcker S. Statistics for approximate gene clusters. BMC Bioinformatics. 2013;14(Suppl 15: Proc. of RECOMB-CG 2013): S14. Background: Genes occurring co-localized in multiple genomes can be strong indicators for either functional constraints on the genome organization or remnant ancestral gene order. The computational detection of these patterns, which are usually referred to as gene clusters, has become increasingly sensitive over the past decade. The most powerful approaches allow for various types of imperfect cluster conservation: Cluster locations may be internally rearranged. The individual cluster locations may contain only a subset of the cluster genes and may be disrupted by uninvolved genes. Moreover, cluster locations may not occur at all in some or even most of the studied genomes. The detection of such low-quality clusters increases the risk of mistaking faint patterns that occur merely by chance for genuine findings. Therefore, it is crucial to estimate the significance of computational gene cluster predictions and discriminate between true conservation and coincidental clustering. Results: In this paper, we present an efficient and accurate approach to estimate the significance of gene cluster predictions under the approximate common intervals model. Given a single gene cluster prediction, we calculate the probability of observing it with the same or a higher degree of conservation under the null hypothesis of random gene order, and add a correction factor to account for multiple testing. Our approach considers all parameters that define the quality of gene cluster conservation: the number of genomes in which the cluster occurs, the number of involved genes, the degree of conservation in the different genomes, as well as the frequency of the clustered genes within each genome. We apply our approach to evaluate gene cluster predictions in a large set of well-annotated genomes.