2,139 research outputs found

Use of the R-group descriptor for alignment-free QSAR

Author: Gedeck P.
Hirons L.
Holliday J.D.
Jelfs S.P.
Willett P.
Publication venue: Wiley
Publication date: 01/04/2005

An R-group descriptor characterises the distribution of some atom-based property, such as elemental type or partial atomic charge, at increasing numbers of bonds distant from the point of substitution on a parent ring system. Application of Partial Least Squares (PLS) to datasets for which bioactivity data and R-group descriptor information are available is shown to provide an effective way of generating QSAR models with a high level of predictive ability. The resulting models are competitive with the models produced by established QSAR approaches, are readily interpretable in structural terms, and are shown to be of value in the optimisation of a lead series.
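As a rough illustration of the descriptor idea in this abstract, the sketch below sums an atom property at each bond distance from the substituent atom attached to the ring. The function name `r_group_descriptor`, the graph encoding, the choice to count the attachment atom as distance 0, and the example substituent are all assumptions made for illustration; the paper's actual descriptor definition may differ.

```python
from collections import deque


def r_group_descriptor(atoms, bonds, attachment, prop, max_dist=3):
    """Sum an atom property at each bond distance from the attachment atom.

    atoms: dict mapping atom index -> element symbol (hypothetical encoding)
    bonds: list of (index, index) pairs
    attachment: index of the substituent atom bonded to the parent ring
    prop: function mapping an element symbol to a number
    """
    # Build an adjacency list for the substituent graph.
    neighbours = {index: [] for index in atoms}
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)

    # Breadth-first search gives the bond distance of every atom.
    dist = {attachment: 0}
    queue = deque([attachment])
    while queue:
        current = queue.popleft()
        for nxt in neighbours[current]:
            if nxt not in dist:
                dist[nxt] = dist[current] + 1
                queue.append(nxt)

    # Accumulate the property into one bin per bond distance.
    descriptor = [0.0] * (max_dist + 1)
    for index, d in dist.items():
        if d <= max_dist:
            descriptor[d] += prop(atoms[index])
    return descriptor


def is_carbon(element):
    return 1.0 if element == "C" else 0.0


# Hypothetical ethoxy substituent (heavy atoms only): ring-O-CH2-CH3
ethoxy_atoms = {0: "O", 1: "C", 2: "C"}
ethoxy_bonds = [(0, 1), (1, 2)]
carbon_profile = r_group_descriptor(
    ethoxy_atoms, ethoxy_bonds, attachment=0, prop=is_carbon, max_dist=2
)
print(carbon_profile)
```

Vectors of this kind, one per substitution position and property, could then be fed to a regression method such as PLS alongside the bioactivity values.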

White Rose Research Online

Machine Learning for In Silico Virtual Screening and Chemical Genomics: New Strategies

Author: Jacob Laurent
Vert Jean-Philippe
Publication venue: Bentham Science Publishers Ltd.
Publication date: 01/01/2008

Support vector machines and kernel methods belong to the same class of machine learning algorithms that has recently become prominent in both computational biology and chemistry, although the two fields have largely ignored each other. These methods are based on a sound mathematical and computationally efficient framework that implicitly embeds the data of interest, respectively proteins and small molecules, in high-dimensional feature spaces where various classification or regression tasks can be performed with linear algorithms. In this review, we present the main ideas underlying these approaches, survey how both the “biological” and the “chemical” spaces have been separately constructed using the same mathematical framework and tricks, and suggest different avenues to unify both spaces for the purpose of in silico chemogenomics.

ProtNN: Fast and Accurate Nearest Neighbor Protein Function Prediction based on Graph Embedding in Structural and Topological Space

Author: Dhifli Wajdi
Diallo Abdoulaye Baniré
Publication venue: Springer Science and Business Media LLC
Publication date: 24/01/2016

Studying the function of proteins is important for understanding the molecular mechanisms of life. The number of publicly available protein structures has grown extremely large. Still, determining the function of a protein structure remains a difficult, costly, and time-consuming task. The difficulties are often due to the essential role of spatial and topological structures in the determination of protein functions in living cells. In this paper, we propose ProtNN, a novel approach for protein function prediction. Given an unannotated protein structure and a set of annotated proteins, ProtNN finds the nearest neighbor annotated structures based on protein-graph pairwise similarities. Given a query protein, ProtNN finds the nearest neighbor reference proteins based on a graph representation model and a pairwise similarity between vector embeddings of both query and reference protein-graphs in structural and topological spaces. ProtNN assigns to the query protein the function with the highest number of votes across the set of k nearest neighbor reference proteins, where k is a user-defined parameter. Experimental evaluation demonstrates that ProtNN is able to accurately classify several datasets with an extremely fast runtime compared to state-of-the-art approaches. We further show that ProtNN is able to scale up to a whole PDB dataset in a single-process mode with no parallelization, with a runtime gain on the order of thousands compared to state-of-the-art approaches.
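The voting step described in this abstract can be sketched as a plain k-nearest-neighbor classifier over embedding vectors. This is only a minimal illustration: the cosine similarity, the two-dimensional embeddings, and the function labels below are assumptions, not the graph attributes or similarity measure that ProtNN actually uses.

```python
import math
from collections import Counter


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_predict(query, references, k=3):
    """Return the majority label among the k references most similar to query.

    references: list of (embedding, label) pairs (hypothetical data layout)
    """
    ranked = sorted(
        references,
        key=lambda ref: cosine_similarity(query, ref[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    # most_common breaks ties by first appearance, i.e. by similarity rank.
    return votes.most_common(1)[0][0]


# Hypothetical reference set: embeddings of annotated protein graphs.
references = [
    ([1.0, 0.0], "kinase"),
    ([0.9, 0.1], "kinase"),
    ([0.8, 0.3], "kinase"),
    ([0.0, 1.0], "protease"),
    ([0.1, 0.9], "protease"),
]
predicted = knn_predict([1.0, 0.2], references, k=3)
print(predicted)
```

Because each query only needs one similarity computation per reference vector, this style of prediction avoids pairwise graph alignment, which is consistent with the runtime advantage the abstract reports.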

arXiv.org e-Print Archive

Springer - Publisher Connector

7th German Conference on Chemoinformatics: 25 CIC-Workshop : Goslar, Germany, 6 - 8 November 2011 ; meeting abstracts / Edited by Frank Oellien, Uli Fechner and Thomas Engel

Author: Engel Thomas
Fechner Uli
Oellien Frank
Publication date: 01/05/2012

Hochschulschriftenserver - Universität Frankfurt am Main

A novel method to compare protein structures using local descriptors

Background: Protein structure comparison is one of the most widely performed tasks in bioinformatics. However, currently used methods have problems with the so-called "difficult similarities", including considerable shifts and distortions of structure, sequential swaps and circular permutations. There is a demand for efficient and automated systems capable of overcoming these difficulties, which may lead to the discovery of previously unknown structural relationships.

Results: We present a novel method for protein structure comparison based on the formalism of local descriptors of protein structure - DEscriptor Defined Alignment (DEDAL). Local similarities identified by pairs of similar descriptors are extended into global structural alignments. We demonstrate the method's capability by aligning structures in difficult benchmark sets: curated alignments in the SISYPHUS database, as well as SISY and RIPC sets, including non-sequential and non-rigid-body alignments. On the most difficult RIPC set of sequence alignment pairs the method achieves an accuracy of 77% (the second best method tested achieves 60% accuracy).

Conclusions: DEDAL is fast enough to be used in whole proteome applications, and by lowering the threshold of detectable structure similarity it may shed additional light on molecular evolution processes. It is well suited to improving automatic classification of structure domains, helping analyze protein fold space, or to improving protein classification schemes. DEDAL is available online at http://bioexploratorium.pl/EP/DEDAL.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

BOOL-AN: A method for comparative sequence analysis and phylogenetic reconstruction

Author: Ari Eszter
Horváth Arnold
Ittzés Péter
Jakó Éena
Podani János
Publication venue: Elsevier BV
Publication date: 01/01/2009

A novel discrete mathematical approach is proposed as an additional tool for molecular systematics which does not require prior statistical assumptions concerning the evolutionary process. The method is based on algorithms generating mathematical representations directly from DNA/RNA or protein sequences, followed by the output of numerical (scalar or vector) and visual characteristics (graphs). The binary encoded sequence information is transformed into a compact analytical form, called the Iterative Canonical Form (or ICF) of Boolean functions, which can then be used as a generalized molecular descriptor. The method provides raw vector data for calculating different distance matrices, which in turn can be analyzed by neighbor-joining or UPGMA to derive a phylogenetic tree, or by principal coordinates analysis to get an ordination scattergram. The new method and the associated software for inferring phylogenetic trees are called the Boolean analysis or BOOL-AN.
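The last stage of the pipeline in this abstract, going from per-sequence vectors to a distance matrix and then to a UPGMA tree, can be sketched as follows. The binary vectors, the Hamming distance, and the taxon names are placeholders chosen for illustration; they stand in for the ICF-derived vectors and distance measures of BOOL-AN, which are not reproduced here.

```python
from itertools import combinations


def hamming(u, v):
    """Number of positions at which two equal-length vectors differ."""
    return sum(a != b for a, b in zip(u, v))


def upgma(names, vectors):
    """Build a UPGMA tree (nested tuples) from one vector per taxon."""
    # Each cluster id maps to (subtree, number of leaves in the subtree).
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    dist = {
        frozenset((i, j)): float(hamming(vectors[i], vectors[j]))
        for i, j in combinations(range(len(names)), 2)
    }
    next_id = len(names)
    while len(clusters) > 1:
        # Pick the closest pair of current clusters.
        pair = min(
            (frozenset(p) for p in combinations(clusters, 2)),
            key=lambda p: dist[p],
        )
        a, b = sorted(pair)
        (tree_a, size_a), (tree_b, size_b) = clusters.pop(a), clusters.pop(b)
        # Distance to the merged cluster is the size-weighted average.
        for c in clusters:
            d_ac = dist[frozenset((a, c))]
            d_bc = dist[frozenset((b, c))]
            merged = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
            dist[frozenset((next_id, c))] = merged
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        next_id += 1
    return next(iter(clusters.values()))[0]


# Hypothetical binary descriptors for three taxa.
taxa = ["A", "B", "C"]
descriptors = [[0, 0, 0, 0], [0, 0, 0, 1], [1, 1, 1, 1]]
tree = upgma(taxa, descriptors)
print(tree)
```

The nested tuples record only the branching order; branch lengths are omitted to keep the sketch short.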

Crossref

Repository of the Academy's Library

11th German Conference on Chemoinformatics (GCC 2015) : Fulda, Germany. 8-10 November 2015.

Author: Abel R
Achenbach J
Adikwu UM
Ain QU
Al-Yamori R
Alhalabi Z
Aniceto N
Ansideri F
Baker D
Balducci A
Banting L
Barilla J
Barrett I
Basu D
Baumann K
Bender A
Berg E
Bergström F
Bermudez M
Bietz S
Bodnarchuk MS
Boeckler FM
Bojarski AJ
Borbulevych OY
Buchholz M
Bulusu KC
Bureau R
Böckler FM
Böttcher S
Büttner FM
Cao Q
Cappel D
Cheeseright T
Clark RD
Clark T
Da Costa FB
Dahlgren M
De Graaf C
Demuth H-U
Dorfman R
Dubrucq K
Ecker GF
Edman K
Egelkraut-Holtus M
Eid S
Eigner-Pitto V
Engel J
Engkvist O
Epple M
Essex JW
Evers A
Exner TE
Fan T-P
Fechner U
Finkelmann AR
Firaha DS
Firth M
Fourches D
Fraaije JH
Frach R
Fraczkiewicz R
Freitas A
Friedrich N-O
Friesner R
Fu X
Fuchs JE
Fulle S
Furtado F
Garg P
Gervasio FL
Ghafourian T
Glen R
Gracia RS
Grebner C
Guallar V
Göller AH
Günther MB
Günther S
Güssregen S
Haensele E
Heidrich J
Heil J
Hennig S
Herrmann G
Hessler G
Hilbig M
Himmler H-J
Hoffgaard F
Hogner A
Hollóczki O
Horinek D
Hošek P
Husch T
Ibezim A
Ihlenfeldt WD
Jardin C
Judson P
Jäger C
Kalinowski L
Kalliokoski T
Kast SM
Kibies P
Kirchmair J
Kirchner B
Kireeva N
Klute W
Koch O
Koch P
Kohlbacher O
Kolb P
Korth M
Kos A
Kramer C
Krilov G
Krotzky T
Kuhn H
Kuhn MA
Kurczab R
Kühne R
Lange A
Lanig H
Laufer S
Levine Z
Li X
Lifongo LL
Lin T
Lisurek M
Lokajíček MV
Mackey M
Masek BB
Mathea M
Matter H
Mbah CJ
Mbaze LM
McWilliams L
Mervin L
Mervin LH
Mittal S
Mohamad-Zobir SZ
Montanari F
Moser D
Mrugalla F
Mullen R
Murray DC
Nagy S
Nahum O
Naß A
Nguyen QD
Nogueira MS
Ntie-Kang F
Nwodo NJ
Oliveira Santos JS-D
Oliveira TB
Omoto K
Onlia I
Ostroumov D
Owen RM
Panecka J
Patel H
Pervov VS
Petrov A
Pisaková H
Pleik S
Polokoff M
Pongratz T
Pretzel J
Proschak E
Pryde DC
Pöhner IA
Rarey M
Rauh D
Renner G
Richmond NJ
Rickmeyer T
Rippmann F
Ross GA
Ruff M
Rupp B
Saladino G
Saleh N
Sandmann A
Schall C
Schmidt D
Schmidt TC
Schmidt TJ
Schmidtke P
Schneider G
Schomburg KT
Schram J
Schulz R
Schütter C
Segler MHS
Senderowitz H
Shaikh N
Shea J-E
Sherman W
Sievers-Engler A
Simoben CV
Simr P
Sippl W
Smith S
Solovev VP
Soltanshahi F
Sommer K
Sotriffer CA
Spiwok V
Stehle T
Steinbrecher TB
Steudle A
Sticht H
Strohfeldt S
Sánchez-García E
Tautermann CS
Torda AE
Torella R
Truszkowski A
Turk S
Tyrchan C
Ulander J
Van den Broek K
Van Oeyen A
Volkamer A
Wade RC
Waldman M
Waller MP
Wang L
Warszycki D
Weber J
Wessjohann L
Westerhoff LM
Whitley DC
Wieczorek V
Wolber G
Yosipof A
Zdrazil B
Zielesny A
Zimmermann MO
Zoufir A
Śmieja M
Publication venue: Springer Science and Business Media LLC
Publication date: 25/03/2016

Spiral - Imperial College Digital Repository

Knowledge-based energy functions for computational studies of proteins

Author: A. Ben-Naim
A. Godzik
A. Rossi
A.J. Bordner
A.V. Finkelstein
B. Fain
B. Krishnamoorthy
B. Kuhlman
B. Schölkopf
B.H. Park
B.I. Dahiyat
B.J. McConkey
B.O. Mitchell
C. Anfinsen
C. Carter Jr.
C. Czaplewski
C. Hoppe
C. Hu
C. Micheletti
C. Papadimitriou
C. Zhang
C.A. Rohl
C.B. Anfinsen
C.M.R Lemer
C.S. Mészáros
D. Gilis
D. Tobi
D. Xu
E. Venclovas
E.I. Shakhnovich
F.A. Momany
H. Dobbs
H. Edelsbrunner
H. Gan
H. Li
H. Lu
H. Zhou
H.S. Chan
I. Muegge
J. Khatun
J. Liang
J.A. Kocher
J.A. Rank
J.M. Deutsch
J.R. Bienkowska
K. Nishikawa
K. Sale
K.H. Lee
K.K. Koretke
K.T. Simons
L. Adamian
L.A. Mirny
L.L. Looger
L.M. Amzel
M. Karplus
M. Levitt
M. Vendruscolo
M.H. Hao
M.J. Sippl
M.P. Eastwood
M.R. Betancourt
M.S. Friedrichs
N. Karmarkar
N.V. Buchete
P. Koehl
P.D. Thomas
P.G. Wolynes
P.J. Munson
R. Goldstein
R. Guerois
R. Jackups Jr.
R. Janicke
R. Méndez
R. Samudrala
R.B. Hill
R.I. Dima
R.J. Vanderbei
R.K. Singh
R.L. Jernigan
R.S. DeWitte
S. Liu
S. Miyazawa
S. Shimizu
S. Tanaka
S.J. Wodak
T. Kortemme
T. Lazaridis
T.L. Chiu
U. Bastolla
V. Vapnik
V.N. Maiorov
W.P. Russ
X. Li
Y. Duan
Y. Park
Y. Xia
Publication venue: Springer Science and Business Media LLC
Publication date: 19/01/2006

This chapter discusses the theoretical framework and methods for developing knowledge-based potential functions essential for protein structure prediction, protein-protein interaction, and protein sequence design. We discuss in some detail the Miyazawa-Jernigan contact statistical potential, distance-dependent statistical potentials, and geometric statistical potentials. We also describe a geometric model for developing both linear and non-linear potential functions by optimization. Applications of knowledge-based potential functions in protein-decoy discrimination, in protein-protein interactions, and in protein design are then described. Finally, several issues with knowledge-based potential functions are discussed. Comment: 57 pages, 6 figures. To be published in a book by Springe
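The general idea behind a contact statistical potential can be sketched with the generic inverse-Boltzmann relation, e(i, j) = -ln(observed frequency / expected frequency), in units of kT. The sketch below assumes a simple random-mixing reference state based on residue composition. It is not the Miyazawa-Jernigan quasi-chemical formulation the chapter covers, and the contact counts and composition are invented for illustration.

```python
import math
from collections import Counter


def contact_potential(contacts, composition):
    """Estimate pairwise contact energies (units of kT) from contact counts.

    contacts: list of (type_i, type_j) residue-type pairs seen in contact
    composition: dict mapping residue type -> mole fraction
    """
    # Treat (a, b) and (b, a) as the same unordered contact.
    counts = Counter(tuple(sorted(pair)) for pair in contacts)
    total = sum(counts.values())
    potential = {}
    for (a, b), n in counts.items():
        observed = n / total
        # Random-mixing reference: unlike pairs can form in two orders.
        expected = composition[a] * composition[b] * (1 if a == b else 2)
        potential[(a, b)] = -math.log(observed / expected)
    return potential


# Invented contact list: leucine (L) contacts are over-represented.
contacts = [("L", "L")] * 6 + [("L", "K")] * 2 + [("K", "K")] * 2
composition = {"L": 0.5, "K": 0.5}
energies = contact_potential(contacts, composition)
for pair, energy in sorted(energies.items()):
    print(pair, round(energy, 3))
```

Pairs observed more often than the reference state predicts receive negative (favourable) energies, and under-represented pairs receive positive ones.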

arXiv.org e-Print Archive

Crossref