
    How to find simple and accurate rules for viral protease cleavage specificities

    Background: Proteases of human pathogens are becoming increasingly important drug targets, hence it is necessary to understand their substrate specificity and to interpret this knowledge in practically useful ways. New methods are being developed that produce large amounts of cleavage information for individual proteases, and some have been applied to extract cleavage rules from data. However, the methods proposed so far for extracting rules have been neither easy to understand nor very accurate. To be practically useful, cleavage rules should be accurate, compact, and expressed in an easily understandable way.
    Results: A new method is presented for producing cleavage rules for viral proteases with seemingly complex cleavage profiles. The method is based on orthogonal search-based rule extraction (OSRE) combined with spectral clustering. It is demonstrated on substrate data sets for human immunodeficiency virus type 1 (HIV-1) protease and hepatitis C virus (HCV) NS3/4A protease, showing excellent prediction performance for both HIV-1 cleavage and HCV NS3/4A cleavage, and agreeing with observed HCV genotype differences. New cleavage rules (consensus sequences) are suggested for HIV-1 and HCV NS3/4A cleavages. The practical usability of the method is also demonstrated by using it to predict the location of an internal cleavage site in the HCV NS3 protease and to correct the location of a previously reported internal cleavage site in the HCV NS3 protease. The method is fast to converge and yields accurate rules, on par with previous results for HIV-1 protease and better than the previous state of the art for HCV NS3/4A protease. Moreover, the rules are fewer and simpler than those previously obtained with rule extraction methods.
    Conclusion: A rule extraction methodology that searches for multivariate low-order predicates yields results that significantly outperform existing rule bases on out-of-sample data and are also more transparent to expert users. The approach yields rules that are easy to use and useful for interpreting experimental data.

    A Method for Refining Knowledge Rules Using Exceptions

    The search for patterns in data sets is a fundamental task in Data Mining, where Machine Learning algorithms are generally used. However, Machine Learning algorithms have biases that strengthen the classification task while not taking exceptions into consideration. Exceptions contradict common sense rules. They are generally unknown, unexpected, and contradictory to the user's beliefs. For this reason, exceptions may be interesting. In this work we propose a method to find exceptions from common sense rules. In addition, we apply the proposed method to a real-world data set to discover rules and exceptions in the HIV protein cleavage process. Sociedad Argentina de Informática e Investigación Operativa

    Machine Learning Approaches to Modeling the Physiochemical Properties of Small Peptides

    Peptide and protein sequences are most commonly represented as strings: a series of letters selected from the twenty-character alphabet of abbreviations for the naturally occurring amino acids. Here, we experiment with representations of small peptide sequences that incorporate more physiochemical information. Specifically, we develop three different physiochemical representations for a set of roughly 700 HIV-I protease substrates. These representations are used as input to an array of six different machine learning models, which are used to predict whether or not a given peptide is likely to be an acceptable substrate for the protease. Our results show that, in general, higher-dimensional physiochemical representations tend to perform better than representations incorporating fewer dimensions selected on the basis of high information content. We contend that such representations are more biologically relevant than simple string-based representations and are likely to capture functionally important peptide characteristics more accurately. Singapore-MIT Alliance (SMA)
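
The contrast between a string-based and a physiochemical representation can be illustrated with a minimal Python sketch. It is not the authors' method: the abstract does not say which properties were used, so the Kyte-Doolittle hydropathy scale is assumed here purely as one familiar per-residue descriptor, and the function names `one_hot` and `hydropathy` are hypothetical.

```python
# Illustrative only: two ways to turn a peptide string into numeric features.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy index (assumed descriptor, not taken from the abstract).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def one_hot(peptide):
    """String-based view: 20 indicator values per residue."""
    vec = []
    for res in peptide:
        vec.extend(1.0 if res == aa else 0.0 for aa in AMINO_ACIDS)
    return vec


def hydropathy(peptide):
    """Physiochemical view: one hydropathy value per residue."""
    return [KYTE_DOOLITTLE[res] for res in peptide]


if __name__ == "__main__":
    octamer = "SQNYPIVQ"  # example eight-residue peptide
    print(len(one_hot(octamer)))  # 8 residues x 20 letters
    print(hydropathy(octamer))
```

Either vector could then be passed to a standard classifier. Stacking several per-residue scales side by side would give the kind of higher-dimensional physiochemical representation the abstract refers to.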

    A genetic approach for building different alphabets for peptide and protein classification

    Background: In this paper, an optimization approach is proposed for producing reduced alphabets for peptide classification using a Genetic Algorithm. The classification task is performed by a multi-classifier system in which each classifier (a Linear or Radial Basis Function Support Vector Machine) is trained using features extracted with a different reduced alphabet. Each alphabet is constructed by a Genetic Algorithm whose objective function is the maximization of the area under the ROC curve obtained in several classification problems.
    Results: The new approach has been tested on three peptide classification problems: HIV-protease, recognition of T-cell epitopes, and prediction of peptides that bind human leukocyte antigens. The tests demonstrate that training a pool of classifiers on reduced alphabets created by a Genetic Algorithm yields an improvement over other state-of-the-art feature extraction methods.
    Conclusion: The validity of the novel strategy for creating reduced alphabets is demonstrated by the performance improvement obtained by the proposed approach over other reduced-alphabet-based methods in the tested problems.
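
What a reduced alphabet does to a peptide can be sketched in a few lines of Python. The grouping below is a hand-picked, hypothetical partition of the twenty amino acids chosen only for illustration. It is not an alphabet produced by the Genetic Algorithm described above, and the helper names are assumptions.

```python
from collections import Counter

# Hypothetical partition of the 20 amino acids into 6 groups (illustrative only).
EXAMPLE_GROUPS = ["AVLIMC", "FWYH", "STNQ", "KR", "DE", "GP"]


def build_mapping(groups):
    """Map each amino acid to a group symbol: 'a' for group 0, 'b' for group 1, ..."""
    mapping = {}
    for idx, group in enumerate(groups):
        for aa in group:
            mapping[aa] = chr(ord("a") + idx)
    return mapping


def reduce_peptide(peptide, mapping):
    """Rewrite a peptide in the reduced alphabet."""
    return "".join(mapping[aa] for aa in peptide)


def composition(reduced, n_groups):
    """Fraction of residues falling in each group: a simple feature vector."""
    counts = Counter(reduced)
    return [counts.get(chr(ord("a") + i), 0) / len(reduced) for i in range(n_groups)]


if __name__ == "__main__":
    mapping = build_mapping(EXAMPLE_GROUPS)
    reduced = reduce_peptide("SQNYPIVQ", mapping)
    print(reduced)
    print(composition(reduced, len(EXAMPLE_GROUPS)))
```

In the setting the abstract describes, each candidate partition would be scored by the classification performance of features like these, and a search procedure would keep the partitions that score best. That search is not shown here.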

    Immuno-Competitive Capture Mass Spectrometry, a novel unbiased approach to study endogenous protein-protein interactions

    Protein-protein interactions (PPIs) control the majority of biological functions and are the main drivers of cellular processes observed in normal as well as pathological conditions. Such a level of control is only possible via a high degree of complexity, i.e. a massive number of protein-protein interactions (in the range of a couple hundred thousand), a variety of physical and structural properties, and their reversibility. Moreover, binding affinities can span from the micromolar to the high picomolar level, and some proteins act as “hubs” by having multiple partners. This sophisticated organization and regulation of PPIs explains why their study is so challenging. No single approach can capture the full picture, and there is an urgent need for innovative platforms to study and analyze PPIs. In this thesis, a novel platform named Immuno-Competitive Capture Mass Spectrometry (ICC-MS) was developed to screen intracellular PPIs in an unbiased fashion. ICC-MS was designed to reach higher specificity than classical affinity purification mass spectrometry by introducing a competition step between free and capturing antibody prior to immunoprecipitation. This antibody-based, label-free quantitative approach was then combined with a rigorous statistical analysis to extract the cellular interactome of proteins of interest while filtering out non-specifically binding proteins. ICC-MS was first applied to elucidate the interactome of hepatitis C viral non-structural protein 5A in human hepatoma cells, revealing LATS kinases as potentially important regulators of viral infection. The study of Glypican-2 and HtrA1 interacting partners further confirmed the ability of ICC-MS to deliver a limited number of high-confidence interacting proteins that are promising candidates for functional validation. Interestingly, ICC-MS can also be adapted to study interactions formed between proteins and oligonucleotides (Oligo-Competitive Capture Mass Spectrometry, or OCC-MS). 
While it contributed to a better understanding of the mode of action of an SMN2 splicing modifier, the approach could not elucidate the role of protein interactions in antisense oligonucleotide toxicity. Taken together, this innovative approach is suitable for improving the comprehensiveness and accuracy of current protein-protein interaction databases in terms of true biological interactome representation.

    Post-translational generation of constitutively active cores from larger phosphatases in the malaria parasite, Plasmodium falciparum: implications for proteomics

    BACKGROUND: Although the complete genome sequences of a large number of organisms have been determined, the exact proteomes need to be characterized. More specifically, the extent to which post-translational processes such as proteolysis affect the synthesized proteins has remained unappreciated. We examined this issue in selected protein phosphatases of the protease-rich malaria parasite, Plasmodium falciparum. RESULTS: P. falciparum encodes a number of Ser/Thr protein phosphatases (PP) whose catalytic subunits are composed of a catalytic core and accessory domains essential for regulation of the catalytic activity. Two examples of such regulatory domains are found in the Ca²⁺-regulated phosphatases, PP7 and PP2B (calcineurin). The EF-hand domains of PP7 and the calmodulin-binding domain of PP2B are essential for stimulation of the phosphatase activity by Ca²⁺. We present biochemical evidence that P. falciparum generates these full-length phosphatases as well as their catalytic cores, most likely as intermediates of a proteolytic degradation pathway. While the full-length phosphatases are activated by Ca²⁺, the processed cores are constitutively active and either less responsive or unresponsive to Ca²⁺. The processing is extremely rapid, specific, and occurs in vivo. CONCLUSIONS: Post-translational cleavage efficiently degrades complex full-length phosphatases in P. falciparum. In the course of such degradation, enzymatically active catalytic cores are produced as relatively stable intermediates. The universality of such proteolysis in other phosphatases or other multi-domain proteins and its potential impact on the overall proteome of a cell merit further investigation.

    HIV Drug Resistant Prediction and Featured Mutants Selection using Machine Learning Approaches

    HIV/AIDS is widespread and ranks as the sixth biggest killer worldwide. Moreover, due to the rapid replication rate of HIV and its lack of a proofreading mechanism, drug resistance is common and is one of the reasons for treatment failure. Even though drug resistance tests are offered to patients and help in choosing more effective drugs, such experiments may take up to two weeks to finish and are expensive. With the rapid development of computing, drug resistance prediction using machine learning has become feasible. To predict HIV drug resistance accurately, two main tasks need to be solved: how to encode the protein structure, extracting the most useful information and feeding it into the machine learning tools; and which kinds of machine learning tools to choose. In our research, we first proposed a new protein encoding algorithm that converts proteins of various sizes into a fixed-size vector. This algorithm enables feeding the protein structure information to most state-of-the-art machine learning algorithms. In the next step, we also proposed a new classification algorithm based on sparse representation. Following that, mean shift and quantile regression were included to help extract the feature information from the data. Our results show that encoding protein structure using our newly proposed method is very efficient and gives consistently higher accuracy regardless of the type of machine learning tool. Furthermore, our new classification algorithm based on sparse representation is the first application of sparse representation to biological data, and the result is comparable to other state-of-the-art classification algorithms, for example ANN, SVM, and multiple regression. 
Following that, mean shift and quantile regression provided us with the potentially most important drug-resistant mutants, and such results might help biologists and chemists determine which mutants are the most representative candidates for further research.

    Expanding the Boolean logic of the prokaryotic transcription factor XylR by functionalization of permissive sites with a protease-target sequence

    The σ54-dependent prokaryotic regulator XylR implements a one-input/one-output actuator that transduces the presence of the aromatic effector m-xylene into transcriptional activation of the cognate promoter Pu. Such a signal conversion involves the effector-mediated release of the intramolecular repression of the N-terminal A domain on the central C module of XylR. Against this background, we set out to endow this regulator with additional signal-sensing capabilities by inserting a target site of the viral protease NIa in permissive protein locations that, once cleaved in vivo, could either terminate XylR activity or generate an effector-independent, constitutive transcription factor. To find optimal protein positions to this end, we saturated the xylR gene DNA with a synthetic transposable element designed for randomly delivering in-frame polypeptides throughout the sequence of any given protein. This Tn5-based system supplies the target gene with insertions of a selectable marker that can later be excised, leaving behind the desired (poly)peptides grafted into the protein structure. Applying this knock-in-leave-behind (KILB) method to XylR was instrumental in producing a number of variants of this transcription factor (TF) that could compute in vivo two inputs (m-xylene and protease) into a single output following a logic that depended on the site of insertion of the NIa target sequence in the TF. Such NIa-sensitive XylR specimens afforded the design of novel regulatory nodes that entered protease expression as one of the signals recognized in vivo for controlling Pu. 
This approach is bound to facilitate the functionalization of TFs and other proteins with new traits, especially when their forward engineering is made difficult by, for example, the absence of structural data. This study was supported by the BIO and FEDER CONSOLIDER-INGENIO Program of the Spanish Ministry of Science and Innovation, the MICROME, ST-FLOW and ARYSIS Contracts of the EU, and the PROMT Project of the CAM. Peer reviewed

    Profiling COVID-19 Genetic Research: A Data-Driven Study Utilizing Intelligent Bibliometrics

    The COVID-19 pandemic constitutes an ongoing worldwide threat to human society and has caused massive impacts on global public health, the economy and the political landscape. The key to gaining control of the disease lies in understanding the genetics of SARS-CoV-2 and the disease spectrum that follows infection. This study leverages traditional and intelligent bibliometric methods to conduct a multi-dimensional analysis of 5,632 COVID-19 genetic research papers, revealing that 1) the key players include research institutions from the United States, China, Britain and Canada; 2) research topics predominantly focus on virus infection mechanisms, virus testing, gene expression related to immune reactions and patient clinical manifestation; 3) studies originated from the comparison of SARS-CoV-2 to previous human coronaviruses, following which research directions diverge into the analysis of virus molecular structure and genetics, the human immune response, vaccine development and gene expression related to immune responses; and 4) genes that are frequently highlighted include ACE2, IL6, TMPRSS2, and TNF. Genes emerging in COVID-19 research include FURIN, CXCL10, OAS1, OAS2, OAS3, and ISG15. This study demonstrates that our suite of novel bibliometric tools could help biomedical researchers follow this rapidly growing field and provide substantial evidence for policymakers' decision-making on science policy and public health administration.