31 research outputs found
The model of proteolysis
This document presents an original approach to estimating the parameters of the proteolysis process. The data used to fit the model are taken from mass
spectrometric experiments. The parameters are estimated with the Levenberg-Marquardt algorithm. The model is motivated by the hypothesis
that discrimination between cancer patients and healthy donors can be based on the activity of peptide-cleaving enzymes (i.e., peptidases).
Computational methods for large-scale data in medical diagnostics (Metody obliczeniowe dla wielkoskalowych danych w diagnostyce medycznej)
This thesis addresses the fast and reliable processing of high-throughput biomedical data, which is currently needed in genetics and proteomics. We therefore concentrate on these two rapidly developing research areas in the life sciences.
First, we perform a systematic analysis of the human reference genome build in the context of its potential local instability caused by recurrent genomic rearrangements, e.g., deletions, duplications, and inversions. Our approach also makes it possible to analyze a large and unique clinical database.
Second, we present various analyses of mass spectrometry data. In particular, we model isotopic distributions at several levels of accuracy; more precisely, we consider aggregated and fine isotopic structures. We also show case studies involving high-throughput processing that are potentially applicable in proteomics and lipidomics.
Of note, this thesis also exemplifies an interdisciplinary approach to basic science, in which a deeper and more comprehensive understanding of both the biomedical and the computational aspects can be mutually beneficial.
This dissertation describes efficient methods for processing large-scale data in molecular biology, which is particularly important in genetics and proteomics. These two rapidly developing branches of the life sciences are the focus of our interest.
We begin with a systematic analysis of the human reference genome. Our study concerns its potential local instability caused by recurrent rearrangements such as deletions, duplications, and inversions. For deletions and duplications, the presented approach also makes it possible to analyze a large and unique database of clinical cases.
In the second part of the dissertation we present models used in the analysis of mass spectrometry data. In particular, we address the influence of isotopic variants on the results obtained in experiments. We carry out our study using different levels of accuracy in representing isotopic distributions: the aggregated approach and the exact (fine) approach. In addition, we present examples of the analysis of large-scale data in proteomics.
We wish to emphasize that this dissertation presents an interdisciplinary approach to basic research. Moreover, our work is an example of the comprehensive use, in the life sciences, of computational methods grounded in mathematical theory.
MIND: A Double-Linear Model To Accurately Determine Monoisotopic Precursor Mass in High-Resolution Top-Down Proteomics
Top-down proteomics approaches are becoming ever more popular, due to the advantages offered by knowledge of the intact protein mass in correctly identifying the various proteoforms that potentially arise due to point mutation, alternative splicing, post-translational modifications, etc. Usually, the average mass is used in this context; however, it is known that this can fluctuate significantly due to both natural and technical causes. Ideally, one would prefer to use the monoisotopic precursor mass, but this falls below the detection limit for all but the smallest proteins. Methods that predict the monoisotopic mass based on the average mass are potentially affected by imprecisions associated with the average mass. To address this issue, we have developed a framework based on simple, linear models that allows prediction of the monoisotopic mass based on the exact mass of the most-abundant (aggregated) isotope peak, which is a robust measure of mass, insensitive to the aforementioned natural and technical causes. This linear model was tested experimentally, as well as in silico, and typically predicts monoisotopic masses with an accuracy of only a few parts per million. A confidence measure is associated with the predicted monoisotopic mass to handle the off-by-one-Da prediction error. Furthermore, we introduce a correction function to extract the “true” (i.e., theoretically) most-abundant isotope peak from a spectrum, even if the observed isotope distribution is distorted by noise or poor ion statistics. The method is available online as an R shiny app: https://valkenborg-lab.shinyapps.io/mind
Inferring serum proteolytic activity from LC-MS/MS data
Background: In this paper we model the serum proteolysis process from tandem mass spectrometry data. The parameters of the peptide degradation process inferred from LC-MS/MS data correspond directly to the activity of specific enzymes present in the serum samples of patients and healthy donors. Our approach integrates the existing knowledge about peptidase activity stored in the MEROPS database with an efficient procedure for estimating the model parameters.
Results: Taking into account the inherent stochasticity of the process, the proteolytic activity is modeled with the Chemical Master Equation (CME). Assuming stationarity of the Markov process, we calculate the expected values of digested peptides in the model. The parameters are fitted to minimize the discrepancy between those expected values and the peptide activities observed in the MS data. The constrained optimization problem is solved with the Levenberg-Marquardt algorithm.
Conclusions: Our results demonstrate the feasibility and potential of high-level analysis of LC-MS proteomic data. The estimated enzyme activities give insights into the molecular pathology of colorectal cancer. Moreover, the developed framework is general and can be applied to study proteolytic activity in different systems.
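To illustrate the kind of least-squares fitting that the Levenberg-Marquardt algorithm performs, the following is a minimal sketch on a toy problem. It is not the CME-based model of the paper: the first-order decay model `a * exp(-k * t)`, the synthetic data, the starting values, and the damping schedule are all assumptions chosen for illustration, and the constraints mentioned in the abstract are not handled.

```python
import math

def model(t, a, k):
    # Hypothetical toy model: first-order decay of a peptide signal.
    return a * math.exp(-k * t)

def cost(ts, ys, a, k):
    # Sum of squared residuals between observations and model.
    return sum((y - model(t, a, k)) ** 2 for t, y in zip(ts, ys))

def levenberg_marquardt(ts, ys, a, k, lam=1e-3, iters=100):
    current = cost(ts, ys, a, k)
    for _ in range(iters):
        # Accumulate J^T J (2x2) and J^T r (2-vector) for parameters (a, k).
        jaa = jak = jkk = ga = gk = 0.0
        for t, y in zip(ts, ys):
            e = math.exp(-k * t)
            da, dk = e, -a * t * e      # partial derivatives of the model
            r = y - a * e               # residual
            jaa += da * da
            jak += da * dk
            jkk += dk * dk
            ga += da * r
            gk += dk * r
        # Damped normal equations: (J^T J + lam * diag(J^T J)) * step = J^T r.
        h11 = jaa * (1.0 + lam) + 1e-12
        h22 = jkk * (1.0 + lam) + 1e-12
        det = h11 * h22 - jak * jak
        step_a = (h22 * ga - jak * gk) / det
        step_k = (h11 * gk - jak * ga) / det
        if abs(step_a) + abs(step_k) < 1e-12:
            break                       # step is negligible: stop
        trial = cost(ts, ys, a + step_a, k + step_k)
        if trial < current:
            # Accept the step and move toward Gauss-Newton behaviour.
            a, k, current = a + step_a, k + step_k, trial
            lam /= 10.0
        else:
            # Reject the step and move toward gradient-descent behaviour.
            lam *= 10.0
    return a, k

# Usage on noise-free synthetic data generated with a = 2.0, k = 0.5.
ts = [float(i) for i in range(10)]
ys = [model(t, 2.0, 0.5) for t in ts]
a_fit, k_fit = levenberg_marquardt(ts, ys, a=1.0, k=0.1)
print(a_fit, k_fit)
```

The damping parameter `lam` interpolates between a Gauss-Newton step (small `lam`) and a short gradient-descent-like step (large `lam`), which is what makes the method robust when the starting guess is far from the optimum.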
Computational planning of the synthesis of complex natural products
Training algorithms to computationally plan multistep organic syntheses has been a challenge for more than 50 years(1-7). However, the field has progressed greatly since the development of early programs such as LHASA(1,7), for which reaction choices at each step were made by human operators. Multiple software platforms(6,8-14) are now capable of completely autonomous planning. But these programs 'think' only one step at a time and have so far been limited to relatively simple targets, the syntheses of which could arguably be designed by human chemists within minutes, without the help of a computer. Furthermore, no algorithm has yet been able to design plausible routes to complex natural products, for which much more far-sighted, multistep planning is necessary(15,16) and closely related literature precedents cannot be relied on. Here we demonstrate that such computational synthesis planning is possible, provided that the program's knowledge of organic chemistry and data-based artificial intelligence routines are augmented with causal relationships(17,18), allowing it to 'strategize' over multiple synthetic steps. Using a Turing-like test administered to synthesis experts, we show that the routes designed by such a program are largely indistinguishable from those designed by humans. We also successfully validated three computer-designed syntheses of natural products in the laboratory. Taken together, these results indicate that expert-level automated synthetic planning is feasible, pending continued improvements to the reaction knowledge base and further code optimization. A synthetic route-planning algorithm, augmented with causal relationships that allow it to strategize over multiple steps, can design complex natural-product syntheses that are indistinguishable from those designed by human experts
BRAIN 2.0: time and memory complexity improvements in the algorithm for calculating the isotope distribution
Recently, an elegant iterative algorithm called BRAIN (Baffling Recursive Algorithm for Isotopic distributioN calculations) was presented. The algorithm is based on the classic polynomial method for calculating aggregated isotope distributions, and it introduces algebraic identities using Newton-Girard and Viète’s formulae to solve the problem of polynomial expansion. Due to the iterative nature of the BRAIN method, the calculations must start from the lightest isotope variant. As such, the complexity of BRAIN scales quadratically with the mass of the putative molecule, since it depends on the number of aggregated peaks that need to be calculated. In this manuscript, we suggest two improvements of the algorithm to decrease both time and memory complexity in obtaining the aggregated isotope distribution. We also illustrate a concept to represent the element isotope distribution in a generic manner. This representation allows for omitting the root calculation of the element polynomial required in the original BRAIN method. A generic formulation for the roots is of special interest for higher-order element polynomials, so that root-finding algorithms and their inaccuracies can be avoided. Electronic supplementary material: the online version of this article (doi:10.1007/s13361-013-0796-5) contains supplementary material, which is available to authorized users.
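The classic polynomial method that BRAIN builds on can be sketched as repeated polynomial multiplication (convolution) of per-element isotope distributions. The sketch below is only that baseline, not BRAIN or BRAIN 2.0 themselves (no Newton-Girard or Viète identities are used). The abundance table holds approximate natural abundances, and the function names and the dictionary-based formula format are assumptions made for illustration.

```python
# Approximate natural isotope abundances, indexed by the number of
# extra neutrons relative to the lightest isotope (illustrative values).
ISOTOPES = {
    "C": [0.9893, 0.0107],
    "H": [0.999885, 0.000115],
    "N": [0.99636, 0.00364],
    "O": [0.99757, 0.00038, 0.00205],
}

def convolve(p, q):
    # Multiply two polynomials given as coefficient lists.
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def aggregated_distribution(formula):
    # formula maps element symbol -> atom count, e.g. {"C": 2, "H": 5}.
    # Entry n of the result is the probability that the molecule carries
    # n extra neutrons in total (the n-th aggregated isotope peak).
    dist = [1.0]
    for element, count in formula.items():
        for _ in range(count):
            dist = convolve(dist, ISOTOPES[element])
    return dist

# Usage: glycine, C2H5NO2.
glycine = aggregated_distribution({"C": 2, "H": 5, "N": 1, "O": 2})
for shift, prob in enumerate(glycine[:4]):
    print(shift, prob)
```

This naive expansion multiplies one atom at a time, so its cost grows quickly with molecule size; that growth is the motivation for recursive formulations such as the one described in the abstract.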
Navigating around Patented Routes by Preserving Specific Motifs along Computer-Planned Retrosynthetic Pathways
By keeping track of lists of specific bonds one wishes to preserve, a computer program is able to identify the key disconnections used in the patented syntheses and design synthetic routes that circumvent these approaches. Here, we provide examples of computer-designed syntheses relevant to medicinal chemistry, in which the machine avoids "strategic" disconnections common to industrial patents and is forced to use different starting materials. The ability of modern retrosynthetic planners to navigate around patented solutions may have significant implications for the ways in which intellectual property related to multistep syntheses is protected and/or challenged