
    Knowledge extraction from machine learning models and its application to bioinformatics

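    Kyoto University, new-system doctorate by coursework, Doctor (Informatics), Kō No. 23397, Jōhaku No. 766, call no. 新制||情||131 (University Library). Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University. Examiners: Professor Tatsuya Akutsu (chief examiner), Professor Akihiro Yamamoto, Professor Hisashi Kashima. Conferred under Article 4, Paragraph 1 of the Degree Regulations. Doctor of Informatics, Kyoto University. DFA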

    A neural network method for identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites

    We have developed a new method for identification of signal peptides and their cleavage sites based on neural networks trained on separate sets of prokaryotic and eukaryotic sequences. The method performs significantly better than previous prediction schemes and can easily be applied to genome-wide data sets. Discrimination between cleaved signal peptides and uncleaved N-terminal signal-anchor sequences is also possible, though with lower precision. Predictions can be made on a publicly available WWW server.
    (Present address: Novo Nordisk A/S, Scientific Computing, Building 9M1, Novo Alle, DK-2880 Bagsværd, Denmark)
    Introduction: Signal peptides control the entry of virtually all proteins to the secretory pathway, both in eukaryotes and prokaryotes (von Heijne, 1990; Gierasch, 1989; Rapoport, 1992). They comprise the N-terminal part of the amino acid chain and are cleaved off while the protein is translocated through the membrane. The common structure of signal peptides from variou..
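The abstract does not describe how sequences are presented to the networks. A common scheme for per-residue protein predictors is a sliding window in which each residue is one-hot ("sparse") encoded; the sketch below assumes that scheme. The window half-width, the padding symbol `-`, and the function names `one_hot` and `encode_windows` are hypothetical illustration choices, not details from the paper.

```python
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(residue: str) -> List[float]:
    """21-dim sparse vector: 20 residues plus one slot for padding/unknown."""
    vec = [0.0] * (len(AMINO_ACIDS) + 1)
    vec[INDEX.get(residue, len(AMINO_ACIDS))] = 1.0
    return vec


def encode_windows(seq: str, half_width: int = 7) -> List[List[float]]:
    """Return one flattened input vector per position of `seq`.

    Each vector concatenates the one-hot codes of the residues from
    position i - half_width to i + half_width; positions that fall off
    either end of the sequence are encoded as padding ('-').
    """
    padded = "-" * half_width + seq + "-" * half_width
    windows = []
    for i in range(len(seq)):
        window = padded[i : i + 2 * half_width + 1]
        vec: List[float] = []
        for residue in window:
            vec.extend(one_hot(residue))
        windows.append(vec)
    return windows


# Hypothetical use: each vector would be the input of a per-position
# classifier (e.g. "is a cleavage site located here?").
vectors = encode_windows("MKKTAIAIAVALAGFATVAQA", half_width=7)
print(len(vectors), len(vectors[0]))  # 21 315
```

Each of the 21 positions yields a window of 15 residues times 21 slots, i.e. 315 inputs. Training separate networks on prokaryotic and eukaryotic data, as the abstract describes, would reuse the same encoding with different training sets.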

    How to find simple and accurate rules for viral protease cleavage specificities

    Background: Proteases of human pathogens are becoming increasingly important drug targets, so it is necessary to understand their substrate specificity and to interpret this knowledge in practically useful ways. New methods are being developed that produce large amounts of cleavage information for individual proteases, and some have been applied to extract cleavage rules from data. However, the rule-extraction methods proposed so far have been neither easy to understand nor very accurate. To be practically useful, cleavage rules should be accurate, compact, and expressed in an easily understandable way.
    Results: A new method is presented for producing cleavage rules for viral proteases with seemingly complex cleavage profiles. The method is based on orthogonal search-based rule extraction (OSRE) combined with spectral clustering. It is demonstrated on substrate data sets for human immunodeficiency virus type 1 (HIV-1) protease and hepatitis C virus (HCV) NS3/4A protease, showing excellent prediction performance for both HIV-1 and HCV NS3/4A cleavage, in agreement with observed HCV genotype differences. New cleavage rules (consensus sequences) are suggested for HIV-1 and HCV NS3/4A cleavages. The practical usability of the method is also demonstrated by using it to predict the location of an internal cleavage site in the HCV NS3 protease and to correct the location of a previously reported internal cleavage site in the HCV NS3 protease. The method converges quickly and yields accurate rules, on par with previous results for HIV-1 protease and better than the previous state of the art for HCV NS3/4A protease. Moreover, the rules are fewer and simpler than those previously obtained with rule extraction methods.
    Conclusion: A rule extraction methodology that searches for multivariate low-order predicates yields results that significantly outperform existing rule bases on out-of-sample data while also being more transparent to expert users. The approach yields rules that are easy to use and useful for interpreting experimental data.
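The appeal of low-order rules is that each one constrains only a few substrate positions, so a rule base is easy to read and apply. The sketch below shows only how such a rule base could be applied to octamer substrates (positions P4 to P4'); it does not implement the OSRE search or the spectral clustering. The two rules in `HYPOTHETICAL_RULES` are made-up placeholders, not the consensus sequences reported in the paper, and all names are illustrative.

```python
from typing import Dict, List, Set

# Positions of an octamer substrate, P4..P1 | P1'..P4' (cleavage between P1 and P1').
POSITIONS = ["P4", "P3", "P2", "P1", "P1'", "P2'", "P3'", "P4'"]

# A low-order rule constrains only a few positions; each constrained
# position lists the residues allowed there.
Rule = Dict[str, Set[str]]

# HYPOTHETICAL placeholder rules, for illustration only.
HYPOTHETICAL_RULES: List[Rule] = [
    {"P1": {"F", "L", "M", "Y"}, "P1'": {"P"}},
    {"P2": {"V", "I"}, "P1": {"L", "F"}, "P2'": {"E", "Q"}},
]


def rule_matches(rule: Rule, octamer: str) -> bool:
    """True if every position constrained by `rule` holds an allowed residue."""
    if len(octamer) != len(POSITIONS):
        raise ValueError("expected an 8-residue substrate")
    residues = dict(zip(POSITIONS, octamer))
    return all(residues[pos] in allowed for pos, allowed in rule.items())


def predict_cleaved(rules: List[Rule], octamer: str) -> bool:
    """Treat the rule base as a disjunction: predict cleavage if any rule fires."""
    return any(rule_matches(rule, octamer) for rule in rules)


print(predict_cleaved(HYPOTHETICAL_RULES, "SQNYPIVQ"))  # True (first rule fires)
print(predict_cleaved(HYPOTHETICAL_RULES, "AAAAAAAA"))  # False
```

Representing each rule as a small dictionary keeps it directly readable as a consensus pattern, which is the transparency property the abstract emphasizes.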

    Predicting Off-target Effects in CRISPR-Cas9 System using Graph Convolutional Network

    CRISPR-Cas9 is a powerful genome editing technology that has been widely applied in target gene repair and gene expression regulation. One of the main challenges for the CRISPR-Cas9 system is unexpected cleavage at some sites (off-targets), and predicting these sites is necessary because of their relevance to gene editing research. So far, very few deep learning models have been developed that predict the off-target propensity of a single guide RNA (sgRNA) at specific DNA fragments using hand-crafted feature extraction and machine learning techniques. Unfortunately, these models rely on a convoluted process that is difficult for researchers to understand and implement. This thesis develops a novel graph-based approach to predicting the off-target efficacy of sgRNAs in the CRISPR-Cas9 system that is easy for researchers to understand and replicate. This is achieved by building a graph with sequences as nodes and performing link prediction with a Graph Convolutional Network (GCN) to predict the presence of links between sgRNAs and off-target-inducing target DNA sequences. Features for the sequences are extracted from within the sequences.
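The abstract frames off-target prediction as link prediction on a graph of sgRNA and DNA sequence nodes. As a minimal sketch, assuming the widely used Kipf-Welling GCN propagation rule and an inner-product edge decoder (as in graph autoencoders), which the abstract does not confirm, the forward pass could look like this. The toy graph, random features, and weights are placeholders; no training loop is shown, and all function names are hypothetical.

```python
import numpy as np


def normalized_adjacency(adj: np.ndarray) -> np.ndarray:
    """Symmetric GCN normalization D^-1/2 (A + I) D^-1/2 (self-loops added)."""
    a_hat = adj + np.eye(adj.shape[0])
    deg = a_hat.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    return d_inv_sqrt @ a_hat @ d_inv_sqrt


def gcn_embed(adj: np.ndarray, features: np.ndarray, weights: list) -> np.ndarray:
    """Stacked GCN layers H_{l+1} = ReLU(A_norm H_l W_l); the last layer is linear."""
    a_norm = normalized_adjacency(adj)
    h = features
    for i, w in enumerate(weights):
        h = a_norm @ h @ w
        if i < len(weights) - 1:
            h = np.maximum(h, 0.0)  # ReLU on hidden layers only
    return h


def link_scores(emb: np.ndarray, pairs: list) -> np.ndarray:
    """Inner-product decoder: sigmoid(z_u . z_v) as an edge probability."""
    logits = np.array([emb[u] @ emb[v] for u, v in pairs])
    return 1.0 / (1.0 + np.exp(-logits))


rng = np.random.default_rng(0)
# Hypothetical 4-node graph: nodes 0-1 stand for sgRNAs, 2-3 for DNA target sites.
adj = np.array([[0, 0, 1, 0],
                [0, 0, 0, 1],
                [1, 0, 0, 0],
                [0, 1, 0, 0]], dtype=float)
features = rng.normal(size=(4, 8))  # placeholder sequence-derived features
weights = [rng.normal(scale=0.1, size=(8, 16)), rng.normal(scale=0.1, size=(16, 4))]
emb = gcn_embed(adj, features, weights)
print(link_scores(emb, [(0, 3), (1, 2)]))  # untrained scores for two candidate links
```

In a trained model the weights would be fitted, for example with binary cross-entropy on known off-target links and sampled non-links, so the printed scores here carry no biological meaning.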

    Predicting Bevirimat resistance of HIV-1 from genotype

    Background: Maturation inhibitors are a new class of antiretroviral drugs. Bevirimat (BVM) was the first substance in this class of inhibitors to enter clinical trials. While the inhibitory function of BVM is well established, the molecular mechanisms of action and resistance are not well understood. It is known that mutations at the p24/p2 cleavage site (CS p24/p2) and in p2 can cause phenotypic resistance to BVM. We investigated a set of HIV-1 p24/p2 sequences with known phenotypic resistance to BVM to test whether BVM resistance can be predicted from sequence, and to identify possible molecular mechanisms of BVM resistance in HIV-1.
    Results: We used artificial neural networks and random forests with different descriptors to predict BVM resistance. Random forests with hydrophobicity as the descriptor performed best and classified the sequences with an area under the receiver operating characteristic (ROC) curve of 0.93 ± 0.001. For the collected data, we find that p2 sequence positions 369 to 376 have the highest impact on resistance, with positions 370 and 372 being particularly important. These findings partially agree with other recent studies. In addition to the complex machine learning models, we derived a number of simple rules that predict BVM resistance from sequence with surprising accuracy. According to computational predictions based on this data set, resistance mutations usually do not shift cleavage sites. However, we found that resistance mutations could shorten and weaken the α-helix in p2, which hints at a possible resistance mechanism.
    Conclusions: We found that BVM resistance of HIV-1 can be predicted well from the sequence of the p2 peptide, which may prove useful for personalized therapy if maturation inhibitors reach clinical practice. The results of the secondary structure analysis are compatible with a possible route to BVM resistance in which mutations weaken a six-helix bundle discovered in recent experiments and thus ease Gag cleavage by the retroviral protease.
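The abstract says random forests performed best with hydrophobicity as the descriptor but does not name the scale or the exact encoding. A minimal sketch, assuming the Kyte-Doolittle hydropathy scale and one value per aligned sequence position, turns each peptide into a fixed-length numeric vector that a random forest (for example scikit-learn's `RandomForestClassifier`) could consume. The function name `hydrophobicity_vector` is hypothetical.

```python
from typing import List

# Kyte-Doolittle hydropathy index (Kyte & Doolittle, 1982). The paper does
# not say which hydrophobicity scale it used; this choice is an assumption.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobicity_vector(peptide: str) -> List[float]:
    """Replace each residue of an aligned peptide by its hydropathy value.

    All peptides must be aligned to the same length so that index i always
    refers to the same sequence position (e.g. the same p2 residue).
    """
    try:
        return [KYTE_DOOLITTLE[aa] for aa in peptide.upper()]
    except KeyError as err:
        raise ValueError(f"unsupported residue {err.args[0]!r}") from err


print(hydrophobicity_vector("AVIL"))  # [1.8, 4.2, 4.5, 3.8]
```

Because each feature is tied to one sequence position, a random forest's feature importances can be read back as position importances, which is one way an analysis could single out positions such as 370 and 372.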

    A genetic approach for building different alphabets for peptide and protein classification

    Background: This paper proposes an optimization approach that uses a genetic algorithm to produce reduced alphabets for peptide classification. Classification is performed by a multi-classifier system in which each classifier (a linear or radial basis function support vector machine) is trained on features extracted with a different reduced alphabet. Each alphabet is constructed by a genetic algorithm whose objective function is the maximization of the area under the ROC curve obtained in several classification problems.
    Results: The new approach was tested on three peptide classification problems: HIV-protease cleavage, recognition of T-cell epitopes, and prediction of peptides that bind human leukocyte antigens. The tests demonstrate that training a pool of classifiers on reduced alphabets created by a genetic algorithm improves on other state-of-the-art feature extraction methods.
    Conclusion: The validity of the novel strategy for creating reduced alphabets is demonstrated by the performance improvement of the proposed approach over other reduced-alphabet methods on the tested problems.
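To make the two ingredients concrete, the sketch below shows a reduced alphabet as a partition of the 20 residues, one simple feature set derived from it (group composition), and the ROC AUC that a genetic algorithm's fitness function could maximize. It assumes a chromosome encoding of one group label per residue, which the abstract does not specify; the 4-group example alphabet is hypothetical, and the GA loop and SVM training are not shown.

```python
from typing import List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# One group label per residue, in AMINO_ACIDS order (an assumed chromosome
# encoding). HYPOTHETICAL 4-group alphabet: 0 = A,I,L,M,V; 1 = C,G,N,P,Q,S,T;
# 2 = D,E,H,K,R; 3 = F,W,Y.
EXAMPLE_CHROMOSOME = [0, 1, 2, 2, 3, 1, 2, 0, 2, 0, 0, 1, 1, 1, 2, 1, 1, 0, 3, 3]


def reduce_sequence(peptide: str, chromosome: Sequence[int]) -> List[int]:
    """Map each residue to its group label under the given reduced alphabet."""
    group_of = dict(zip(AMINO_ACIDS, chromosome))
    return [group_of[aa] for aa in peptide]


def composition(peptide: str, chromosome: Sequence[int]) -> List[float]:
    """Fraction of residues falling in each group (one simple feature set)."""
    counts = [0] * (max(chromosome) + 1)
    for g in reduce_sequence(peptide, chromosome):
        counts[g] += 1
    return [c / len(peptide) for c in counts]


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the Mann-Whitney statistic: P(score_pos > score_neg), ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(reduce_sequence("AKFW", EXAMPLE_CHROMOSOME))  # [0, 2, 3, 3]
print(composition("AKFW", EXAMPLE_CHROMOSOME))      # [0.25, 0.0, 0.25, 0.5]
print(roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
```

In a GA built on this encoding, mutation could reassign one residue to another group, and fitness would be the AUC of a classifier trained on features from the candidate alphabet.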

    The importance of physicochemical characteristics and nonlinear classifiers in determining HIV-1 protease specificity

    This paper reviews recent research on the application of bioinformatics approaches to determining HIV-1 protease specificity, outlines outstanding issues, and presents a new approach to addressing these issues. Leading machine learning theory for the problem currently suggests that directly encoding the physicochemical properties of the amino acid substrates is not required for optimal performance. Several amino acid encoding approaches that incorporate potentially relevant physicochemical properties of the substrate are identified and evaluated using a nonlinear neuroevolution algorithm based on task decomposition. The results are compared against a recent benchmark established with a nonlinear classifier that uses only amino acid sequence and identity information. Ensembles of these nonlinear classifiers using the physicochemical properties of the substrate are shown to consistently outperform the recently published state-of-the-art approach based on a linear support vector machine in out-of-sample evaluations.
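The contrast at issue is between identity-only ("orthogonal") encoding, where every residue is an independent indicator, and encodings that carry physicochemical properties. A minimal sketch of both for an octamer substrate follows. The property table (formal side-chain charge with histidine treated as neutral, an aromatic flag, and a coarse hydrophobic flag for A, V, I, L, M) is a hypothetical simplification, not one of the encodings evaluated in the paper, and the neuroevolution classifier is not shown.

```python
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# HYPOTHETICAL coarse property table (simplifications, not from the paper).
CHARGE = {"D": -1.0, "E": -1.0, "K": 1.0, "R": 1.0}  # His treated as neutral
AROMATIC = set("FWY")
HYDROPHOBIC = set("AVILM")


def orthogonal_encoding(peptide: str) -> List[float]:
    """Identity-only encoding: one 20-dim indicator vector per residue."""
    vec: List[float] = []
    for aa in peptide:
        indicator = [0.0] * len(AMINO_ACIDS)
        indicator[AMINO_ACIDS.index(aa)] = 1.0
        vec.extend(indicator)
    return vec


def physicochemical_encoding(peptide: str) -> List[float]:
    """Three property values per residue instead of an identity indicator."""
    vec: List[float] = []
    for aa in peptide:
        vec.extend([
            CHARGE.get(aa, 0.0),
            1.0 if aa in AROMATIC else 0.0,
            1.0 if aa in HYDROPHOBIC else 0.0,
        ])
    return vec


octamer = "SQNYPIVQ"
print(len(orthogonal_encoding(octamer)), len(physicochemical_encoding(octamer)))  # 160 24
```

Under the orthogonal encoding every pair of distinct residues is equally dissimilar, whereas the property encoding places, say, F and Y close together; whether such structure helps a nonlinear classifier is the empirical question the paper addresses.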

    Epigenetic regulation of Mash1 expression

    Mash1 is a proneural gene important for specifying neural fate. The Mash1 locus undergoes specific epigenetic changes in ES cells following neural induction. These include the loss of repressive H3K27 trimethylation and the acquisition of H3K9 acetylation at the promoter, a switch to early replication timing, and repositioning of the locus away from the nuclear periphery. Here I examine the relationship between nuclear localization and gene expression during neural differentiation, and the role of the neuronal repressor REST in silencing Mash1 expression in ES cells. Following neural induction of ES cells, I observed that relocation of the Mash1 locus occurs between days 4 and 6, whereas overt expression begins at day 6. Neither Mash1 expression nor the localization of the locus at the nuclear periphery was affected by REST removal in ES cells. In contrast, bona fide REST target genes were upregulated in REST-/- cells. Interestingly, among REST targets, loci that were more derepressed upon REST removal showed an interior location (Stathmin, Synaptophysin), while those more resistant to REST withdrawal showed a peripheral location (BDNF, Calbindin, Complexin). To ask whether the insulator protein CTCF, together with the cohesin complex, might be involved in regulating Mash1 in ES cells, I performed ChIP analysis of CTCF and cohesin binding across the Mash1 locus in ES cells and used RNAi to deplete CTCF and cohesin. A slight increase in Mash1 transcription was seen upon Rad21 knockdown, although it was not possible to exclude that this was a consequence of delayed cell cycle progression. Finally, ES cell lines carrying a Mash1 transgene were created as a tool to examine whether activation of Mash1 can affect the epigenetic properties of neighbouring genes.

    Peptide classification using optimal and information theoretic syntactic modeling

    We consider the problem of classifying peptides using the information residing in their syntactic representations. This problem, which has been studied for more than a decade, has typically been investigated using distance-based metrics that involve the edit operations required in the peptide comparisons. In this paper, we demonstrate that the Optimal and Information Theoretic (OIT) model of Oommen and Kashyap [22], applicable to syntactic pattern recognition, can be used to tackle the peptide classification problem. We advocate modeling the differences between compared strings as a mutation model consisting of random substitutions, insertions and deletions that obeys the OIT model. We show that the probability measure obtained from the OIT model can be perceived as a sequence similarity metric, from which a support vector machine (SVM)-based peptide classifier can be devised. The classifier we built was tested with eight different substitution matrices on two different data sets, namely HIV-1 protease cleavage sites and T-cell epitopes. The results show that the OIT model performs significantly better than a classifier that uses a Needleman-Wunsch sequence alignment score, that it is less sensitive to the substitution matrix than the other methods compared, and that, when combined with an SVM, it is among the best peptide classification methods available.
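The central idea is to score a pair of peptides by how probable it is that random substitutions, insertions and deletions turn one into the other. The actual OIT model uses specific distributions for the number and placement of these operations that the abstract does not give, so the sketch below is a much simpler stand-in, NOT the Oommen-Kashyap model: a dynamic program that sums the probability of every edit path under fixed per-operation probabilities. The parameter values and function names are hypothetical.

```python
import math
from typing import List

ALPHABET_SIZE = 20  # amino acids


def edit_path_probability(x: str, y: str, p_del: float = 0.05,
                          p_ins: float = 0.05, p_same: float = 0.8) -> float:
    """Sum over all edit paths of the probability of turning x into y.

    Simplified stand-in for a probabilistic edit model (NOT the OIT model):
    each symbol of x is deleted with probability p_del; otherwise it is kept
    with probability p_same or substituted uniformly by one of the other 19
    residues. Each inserted symbol contributes p_ins / 20.
    """
    n, m = len(x), len(y)
    f: List[List[float]] = [[0.0] * (m + 1) for _ in range(n + 1)]
    f[0][0] = 1.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            total = 0.0
            if i > 0:  # delete x[i-1]
                total += f[i - 1][j] * p_del
            if j > 0:  # insert y[j-1]
                total += f[i][j - 1] * p_ins / ALPHABET_SIZE
            if i > 0 and j > 0:  # keep or substitute x[i-1] -> y[j-1]
                same = x[i - 1] == y[j - 1]
                q = p_same if same else (1 - p_same) / (ALPHABET_SIZE - 1)
                total += f[i - 1][j - 1] * (1 - p_del) * q
            f[i][j] = total
    return f[n][m]


def log_similarity(x: str, y: str) -> float:
    """Log of the path-summed probability, usable as a similarity score."""
    return math.log(edit_path_probability(x, y))


print(log_similarity("SQNYPIVQ", "SQNYPIVQ") > log_similarity("SQNYPIVQ", "AAAAAAAA"))  # True
```

Because the recursion counts every path through the edit graph and has no termination probability, the result is a similarity score rather than a normalized distribution. A matrix of such scores is not guaranteed to be a valid (positive semi-definite) SVM kernel, so in practice it would need checking or correcting before use with an SVM.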