Knowledge extraction from machine learning models and its application to bioinformatics

Author: Liu Pengyu
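Publication venue: Kyoto University
Publication date: 24/05/2021

Kyoto University, new-system course doctorate, Doctor of Informatics. Degree no. Kō 23397 (Jōhaku no. 766); call number 新制||情||131 (Main Library). Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University. Examiners: Professor Tatsuya Akutsu (chief examiner), Professor Akihiro Yamamoto, Professor Hisashi Kashima. Conferred under Article 4, Paragraph 1 of the Degree Regulations. Doctor of Informatics, Kyoto University. DFA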

Kyoto University Research Information Repository

A neural network method for identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites

Author: Brunak Søren
Engelbrecht Jacob
Nielsen Henrik
von Heijne Gunnar
Publication date: 01/01/1997

We have developed a new method for identification of signal peptides and their cleavage sites based on neural networks trained on separate sets of prokaryotic and eukaryotic sequences. The method performs significantly better than previous prediction schemes, and can easily be applied to genome-wide data sets. Discrimination between cleaved signal peptides and uncleaved N-terminal signal-anchor sequences is also possible, though with lower precision. Predictions can be made on a publicly available WWW server. Present address: Novo Nordisk A/S, Scientific Computing, Building 9M1, Novo Alle, DK-2880 Bagsværd, Denmark. Introduction: Signal peptides control the entry of virtually all proteins to the secretory pathway, both in eukaryotes and prokaryotes (von Heijne, 1990; Gierasch, 1989; Rapoport, 1992). They comprise the N-terminal part of the amino acid chain, and are cleaved off while the protein is translocated through the membrane. The common structure of signal peptides from variou..

CiteSeerX

Online Research Database In Technology

How to find simple and accurate rules for viral protease cleavage specificities

Author: A Grakoui
A Kontijevskis
A Narayanan
A Urbani
AA Kolykhalov
AD Kwong
B Keil
BA Malcolm
BE Turk
BM Dunn
C Howson
CM Overall
Daniel Garwicz
DJC MacKay
E Berry
EB Fowlkes
H Eizert
H Neurath
HB Shen
I Schechter
Ian Jarman
IH Jarman
J Shi
JK Stoller
K Fujikawa
K Li
L You
Liwen You
MR Attwood
NA Thornberry
O Schilling
Paulo JG Lisboa
R Bartenschlager
R Rönn
R Zhang
RA Poorman
RE Stauber
SC Pettit
SH Yang
SM Best
SS Leinbach
SY Kim
T Rögnvaldsson
TA Etchells
Terence A Etchells
Thorsteinn Rögnvaldsson
X Hou
YH Kou
ZR Yang
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: Proteases of human pathogens are becoming increasingly important drug targets, hence it is necessary to understand their substrate specificity and to interpret this knowledge in practically useful ways. New methods are being developed that produce large amounts of cleavage information for individual proteases and some have been applied to extract cleavage rules from data. However, the hitherto proposed methods for extracting rules have been neither easy to understand nor very accurate. To be practically useful, cleavage rules should be accurate, compact, and expressed in an easily understandable way.
Results: A new method is presented for producing cleavage rules for viral proteases with seemingly complex cleavage profiles. The method is based on orthogonal search-based rule extraction (OSRE) combined with spectral clustering. It is demonstrated on substrate data sets for human immunodeficiency virus type 1 (HIV-1) protease and hepatitis C (HCV) NS3/4A protease, showing excellent prediction performance for both HIV-1 cleavage and HCV NS3/4A cleavage, agreeing with observed HCV genotype differences. New cleavage rules (consensus sequences) are suggested for HIV-1 and HCV NS3/4A cleavages. The practical usability of the method is also demonstrated by using it to predict the location of an internal cleavage site in the HCV NS3 protease and to correct the location of a previously reported internal cleavage site in the HCV NS3 protease. The method is fast to converge and yields accurate rules, on par with previous results for HIV-1 protease and better than previous state-of-the-art for HCV NS3/4A protease. Moreover, the rules are fewer and simpler than previously obtained with rule extraction methods.
Conclusion: A rule extraction methodology by searching for multivariate low-order predicates yields results that significantly outperform existing rule bases on out-of-sample data, but are more transparent to expert users. The approach yields rules that are easy to use and useful for interpreting experimental data.

Lund University Publications

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Högskolebiblioteket i Halmstad Publikationer

Predicting Off-target Effects in CRISPR-Cas9 System using Graph Convolutional Network

Author: Vinodkumar Prasoon Kumar
Publication venue: Tartu Ülikool
Publication date: 01/01/2021

CRISPR-Cas9 is a powerful genome editing technology that has been widely applied in target gene repair and gene expression regulation. One of the main challenges for the CRISPR-Cas9 system is the occurrence of unexpected cleavage at some sites (off-targets), and predicting them is necessary because of their relevance to gene editing research. Very few deep learning models have been developed so far that predict the off-target propensity of single guide RNA (sgRNA) at specific DNA fragments using artificial feature extraction operations and machine learning techniques. Unfortunately, they implement a convoluted process that is difficult for researchers to understand and implement. This thesis focuses on developing a novel graph-based approach to predict the off-target efficacy of sgRNA in the CRISPR-Cas9 system that is easy for researchers to understand and replicate. This is achieved by creating a graph with sequences as nodes and by performing link prediction using a Graph Convolutional Network (GCN) to predict the presence of links between sgRNA and off-target inducing target DNA sequences. Features for the sequences are extracted from within the sequences

DSpace at Tartu University Library

Predicting Bevirimat resistance of HIV-1 from genotype

Author: A Kernytsky
A Löytynoja
AD Sevin
C Cole
C Notredame
CS Adamson
D Heider
D Nguyen
D Wang
Daniel Hoffmann
DK Worthylake
Dominik Heider
E Frank
ER Wright
F Li
F Wilcoxon
GC Cawley
HB Shen
IH Witten
J Demsar
J Kingston
J Kyte
J Thompson
J Verheyen
J Zhou
Jens Verheyen
K Salzwedel
KC Chou
KV Baelen
L Breiman
L Nanni
M Borschbach
M Miller
M Riedmiller
MA Accola
N Beerenwinkel
N Margot
N Morellet
R Development Core Team
R King
R Lathrop
RC Edgar
RE Banfield
RJ Murray
S Draghici
S McCallister
S Ong
S Tzafestas
SR Eddy
T Fawcett
T Sing
W Resch
WW Cohen
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Maturation inhibitors are a new class of antiretroviral drugs. Bevirimat (BVM) was the first substance in this class of inhibitors entering clinical trials. While the inhibitory function of BVM is well established, the molecular mechanisms of action and resistance are not well understood. It is known that mutations in the regions CS p24/p2 and p2 can cause phenotypic resistance to BVM. We have investigated a set of p24/p2 sequences of HIV-1 of known phenotypic resistance to BVM to test whether BVM resistance can be predicted from sequence, and to identify possible molecular mechanisms of BVM resistance in HIV-1.
Results: We used artificial neural networks and random forests with different descriptors for the prediction of BVM resistance. Random forests with hydrophobicity as descriptor performed best and classified the sequences with an area under the Receiver Operating Characteristics (ROC) curve of 0.93 ± 0.001. For the collected data we find that p2 sequence positions 369 to 376 have the highest impact on resistance, with positions 370 and 372 being particularly important. These findings are in partial agreement with other recent studies. Apart from the complex machine learning models we derived a number of simple rules that predict BVM resistance from sequence with surprising accuracy. According to computational predictions based on the data set used, cleavage sites are usually not shifted by resistance mutations. However, we found that resistance mutations could shorten and weaken the α-helix in p2, which hints at a possible resistance mechanism.
Conclusions: We found that BVM resistance of HIV-1 can be predicted well from the sequence of the p2 peptide, which may prove useful for personalized therapy if maturation inhibitors reach clinical practice. Results of secondary structure analysis are compatible with a possible route to BVM resistance in which mutations weaken a six-helix bundle discovered in recent experiments, and thus ease Gag cleavage by the retroviral protease.
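
The hydrophobicity descriptor mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not say which hydrophobicity scale was used; the Kyte-Doolittle hydropathy index below is an assumption, and the function names are hypothetical. The sketch only builds the per-residue feature vector that a classifier such as a random forest could take as input; it does not reproduce the study's model or data.

```python
# Kyte-Doolittle hydropathy index per amino acid.
# ASSUMPTION: the abstract does not name the scale; this one is illustrative.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobicity_descriptor(sequence):
    """Return one hydropathy value per residue, in sequence order."""
    return [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]


def mean_hydrophobicity(sequence):
    """Average hydropathy over the whole sequence (a single summary feature)."""
    values = hydrophobicity_descriptor(sequence)
    return sum(values) / len(values)
```

Fixed-length, aligned sequences would yield feature vectors of equal length, which is what most off-the-shelf classifiers expect.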

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

A genetic approach for building different alphabets for peptide and protein classification

Author: A Kontijevskis
A Martin
A Narayanan
Alessandra Lumini
D Sarda
DR Madden
GL Zhang
GZ Liang
H Ogul
HB Shen
I Bozic
J Chen
J Hammer
J Huang
JJ Chou
KC Chou
L Huang
L Nanni
Loris Nanni
LR Murphy
M Halkidi
M Milik
MC Honeyman
N Cristianini
R Duda
T Fawcett
T Rögnvaldsson
T Sturniolo
V Brusic
Y Zhao
YD Cai
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: In this paper, an optimization approach is proposed for producing reduced alphabets for peptide classification, using a Genetic Algorithm. The classification task is performed by a multi-classifier system where each classifier (Linear or Radial Basis function Support Vector Machines) is trained using features extracted by different reduced alphabets. Each alphabet is constructed by a Genetic Algorithm whose objective function is the maximization of the area under the ROC-curve obtained in several classification problems.
Results: The new approach has been tested in three peptide classification problems: HIV-protease, recognition of T-cell epitopes and prediction of peptides that bind human leukocyte antigens. The tests demonstrate that the idea of training a pool of classifiers with reduced alphabets, created using a Genetic Algorithm, allows an improvement over other state-of-the-art feature extraction methods.
Conclusion: The validity of the novel strategy for creating reduced alphabets is demonstrated by the performance improvement obtained by the proposed approach with respect to other reduced alphabets-based methods in the tested problems.
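
To make the idea of a reduced alphabet concrete, here is a minimal sketch of how a peptide could be re-encoded and turned into features. The five-group alphabet below is a hypothetical, hand-picked grouping for illustration only; in the paper the alphabets are found by a Genetic Algorithm, which this sketch does not implement. All names are assumptions.

```python
from collections import Counter

# HYPOTHETICAL reduced alphabet: 20 amino acids mapped to 5 group symbols.
# Illustrative only; not an alphabet reported in the paper.
GROUPS = {
    "a": "AVLIMC",  # aliphatic / sulfur-containing
    "r": "FWYH",    # aromatic
    "p": "STNQ",    # polar, uncharged
    "c": "KRDE",    # charged
    "s": "GP",      # small / special conformation
}

# Invert the grouping into a per-residue lookup table.
AA_TO_GROUP = {aa: group for group, members in GROUPS.items() for aa in members}


def reduce_peptide(peptide):
    """Rewrite a peptide using the reduced alphabet's group symbols."""
    return "".join(AA_TO_GROUP[aa] for aa in peptide.upper())


def composition_features(peptide):
    """Fraction of residues in each group, ordered by group symbol."""
    reduced = reduce_peptide(peptide)
    counts = Counter(reduced)
    return [counts[group] / len(reduced) for group in sorted(GROUPS)]
```

Each candidate alphabet yields a different feature vector for the same peptide, which is what lets a pool of classifiers, one per alphabet, see complementary views of the data.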

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Archivio istituzionale della ricerca - Università di Padova

The importance of physicochemical characteristics and nonlinear classifiers in determining HIV-1 protease specificity

Author: Manning Timmy
Walsh Paul
Publication venue: Informa UK Limited
Publication date: 04/12/2015

This paper reviews recent research relating to the application of bioinformatics approaches to determining HIV-1 protease specificity, outlines outstanding issues, and presents a new approach to addressing these issues. Leading machine learning theory for the problem currently suggests that the direct encoding of the physicochemical properties of the amino acid substrates is not required for optimal performance. A number of amino acid encoding approaches which incorporate potentially relevant physicochemical properties of the substrate are identified, and are evaluated using a nonlinear task decomposition based neuroevolution algorithm. The results are evaluated, and compared against a recent benchmark set on a nonlinear classifier using only amino acid sequence and identity information. Ensembles of these nonlinear classifiers using the physicochemical properties of the substrate are demonstrated to consistently outperform the recently published state-of-the-art linear support vector machine based approach in out-of-sample evaluations

SWORD (Cork Inst. of Technology)

Epigenetic regulation of Mash1 expression

Author: Beretta Chiara
Publication venue: Medicine, Imperial College London
Publication date: 01/10/2010

Mash1 is a proneural gene important for specifying the neural fate. The Mash1 locus undergoes specific epigenetic changes in ES cells following neural induction. These include the loss of repressive H3K27 trimethylation and acquisition of H3K9 acetylation at the promoter, a switch to early replication timing, and repositioning of the locus away from the nuclear periphery. Here I examine the relationship between nuclear localization and gene expression during neural differentiation and the role of the neuronal repressor REST in silencing Mash1 expression in ES cells. Following neural induction of ES cells, I observed that relocation of the Mash1 locus occurs from day 4-6 whereas overt expression begins at day 6. REST removal in ES cells affected neither Mash1 expression nor the localization of the locus at the nuclear periphery. In contrast, bona fide REST target genes were upregulated in REST -/- cells. Interestingly, among REST targets, loci that were more derepressed upon REST removal showed an interior location (Stathmin, Synaptophysin), while those more resistant to REST withdrawal showed a peripheral location (BDNF, Calbindin, Complexin). To ask whether the insulator protein CTCF together with the cohesin complex might be involved in regulating Mash1 in ES cells, I performed ChIP analysis of CTCF and cohesin binding across the Mash1 locus in ES cells and used RNAi to deplete CTCF and cohesin expression. A slight increase in the transcription of Mash1 was seen in cells upon Rad21 knockdown, although it was not possible to exclude that this was a consequence of delayed cell cycle progression. Finally, ES cell lines that carried a Mash1 transgene were created as a tool to look at whether activation of Mash1 can affect the epigenetic properties of neighbouring genes.

Spiral - Imperial College Digital Repository

Peptide classification using optimal and information theoretic syntactic modeling

Author: Aygün Ezra
Cataltepe Z
Oommen B. John
Publication venue: Elsevier BV
Publication date: 01/01/2010

We consider the problem of classifying peptides using the information residing in their syntactic representations. This problem, which has been studied for more than a decade, has typically been investigated using distance-based metrics that involve the edit operations required in the peptide comparisons. In this paper, we demonstrate that the Optimal and Information Theoretic (OIT) model of Oommen and Kashyap [22], applicable to syntactic pattern recognition, can be used to tackle the peptide classification problem. We advocate that one can model the differences between compared strings as a mutation model consisting of random substitutions, insertions and deletions obeying the OIT model. Thus, we show that the probability measure obtained from the OIT model can be perceived as a sequence similarity metric, with which a support vector machine (SVM)-based peptide classifier can be devised. The classifier we have built has been tested for eight different substitution matrices and for two different data sets, namely, the HIV-1 Protease cleavage sites and the T-cell epitopes. The results show that the OIT model performs significantly better than the one which uses a Needleman-Wunsch sequence alignment score, that it is less sensitive to the substitution matrix than the other methods compared, and that, when combined with an SVM, it is among the best peptide classification methods available
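
For readers unfamiliar with the Needleman-Wunsch baseline named in this abstract, a minimal sketch of the global alignment score follows. It uses a uniform match/mismatch/gap scheme as a simplifying assumption, whereas the paper evaluates substitution matrices; the function name and default parameter values are hypothetical. It does not implement the OIT model.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score of strings a and b by dynamic programming.

    ASSUMPTION: a flat match/mismatch score stands in for a substitution
    matrix, and gaps carry a linear penalty.
    """
    # Row 0: aligning the empty prefix of a against prefixes of b (all gaps).
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a[:i] against the empty prefix of b.
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(
                prev[j - 1] + substitution,  # align a[i-1] with b[j-1]
                prev[j] + gap,               # gap in b
                curr[j - 1] + gap,           # gap in a
            ))
        prev = curr
    return prev[-1]
```

A score like this can serve as a similarity measure between peptides, which is the role the abstract assigns to the Needleman-Wunsch baseline.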

Crossref

NORA - Norwegian Open Research Archives

Agder University Research Archive