1,141 research outputs found

An integrative clustering approach combining particle swarm optimization and formal concept analysis

Author: A. Alizadeh
A. Brazma
E. Tsiporkova
G. Rustici
J. Besson
J. Handl
J. Kennedy
J.K. Choi
M. Kaytoue-Uberall
P. Rousseeuw
S. Maere
T. Golub
V. Boeva
V. Choi
Zhou
Publication venue: Springer Fachmedien Wiesbaden GmbH
Publication date: 01/01/2012

Crossref

Ghent University Academic Bibliography

On heuristic bias in fragment-assembly methods for protein structure prediction

Author: Garza-Fabre M
Handl J
Kandathil S
Lovell SC
Publication venue: American College of Medical Physics (ACMP)
Publication date: 15/07/2017

We discuss the issue of heuristic bias in fragment-assembly methods for protein structure prediction. We explain the importance of this issue, which has received insufficient attention from evolutionary computation researchers engaging with the structural biology community. We then describe preliminary data illustrating the significant (and expectable) impact that fragment library composition has on search performance, and discuss the challenges this poses for the development of improved fragment libraries.

Crossref

UCL Discovery

An optimized TOPS+ comparison method for enhanced TOPS models

Author: A Brazma
A Harrison
CA Orengo
CJ van Rijsbergen
D Gilbert
D Westhead
David Gilbert
G Valiente
Gabriel Valiente
GJ Barton
GM Torrance
HM Berman
HM Grindley
I Koch
I Michalopoulos
IN Shindyalov
J Handl
J Viksna
K Mizuguchi
L Holm
LP Chew
M Veeramalai
Mallika Veeramalai
N Krasnogor
RB Russell
S Goldsmith-Fischman
SB Needleman
SS Krishna
T Madej
TF Smith
VI Levenshtein
WR Taylor
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2010

This article has been made available through the Brunel Open Access Publishing Fund. Background: Although methods based on highly abstract descriptions of protein structures, such as VAST and TOPS, can perform very fast protein structure comparison, the results can lack a high degree of biological significance. Previously we have discussed the basic mechanisms of our novel method for structure comparison based on our TOPS+ model (Topological descriptions of Protein Structures Enhanced with Ligand Information). In this paper we show how these results can be significantly improved using parameter optimization, and we call the resulting optimised TOPS+ method the advanced TOPS+ comparison method, i.e. advTOPS+. Results: We have developed a TOPS+ string model as an improvement to the TOPS [1-3] graph model by considering loops as secondary structure elements (SSEs) in addition to helices and strands, representing ligands as first-class objects, describing interactions between SSEs, and between SSEs and ligands, by incoming and outgoing arcs, and annotating SSEs with the interaction direction and type. Benchmarking results of an all-against-all pairwise comparison using a large dataset of 2,620 non-redundant structures from the PDB40 dataset [4] demonstrate the biological significance, in terms of SCOP classification at the superfamily level, of our TOPS+ comparison method. Conclusions: Our advanced TOPS+ comparison shows better performance on the PDB40 dataset [4] than our basic TOPS+ method, giving 90 percent accuracy for SCOP alpha+beta, a 6 percent increase in accuracy compared to the TOPS and basic TOPS+ methods. It also outperforms the TOPS, basic TOPS+ and SSAP comparison methods on the Chew-Kedem dataset [5], achieving 98 percent accuracy. Software availability: The TOPS+ comparison server is available at http://balabio.dcs.gla.ac.uk/mallika/WebTOPS/.

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Brunel University Research Archive

Generating, maintaining and exploiting diversity in a memetic algorithm for protein structure prediction

Author: Cook W. J.
Engh R. A.
Garza-Fabre M.
Goldberg D.
Joshua Knowles
Julia Handl
Kandathil S.
Mario Garza-Fabre
Olson B. S.
Papadimitriou C.
Sastry K.
Shaun M. Kandathil
Simon C. Lovell
Wang C.
Xu D.
Publication venue: MIT Press - Journals
Publication date: 01/01/2016

Crossref

University of Birmingham Research Portal

The University of Manchester - Institutional Repository

Linear, Deterministic, and Order-Invariant Initialization Methods for the K-Means Clustering Algorithm

Over the past five decades, k-means has become the clustering algorithm of choice in many application domains, primarily due to its simplicity, time/space efficiency, and invariance to the ordering of the data points. Unfortunately, the algorithm's sensitivity to the initial selection of the cluster centers remains its most serious drawback. Numerous initialization methods have been proposed to address this drawback. Many of these methods, however, have time complexity superlinear in the number of data points, which makes them impractical for large data sets. On the other hand, linear methods are often random and/or sensitive to the ordering of the data points. These methods are generally unreliable in that the quality of their results is unpredictable. Therefore, it is common practice to perform multiple runs of such methods and take the output of the run that produces the best results. Such a practice, however, greatly increases the computational requirements of the otherwise highly efficient k-means algorithm. In this chapter, we investigate the empirical performance of six linear, deterministic (non-random), and order-invariant k-means initialization methods on a large and diverse collection of data sets from the UCI Machine Learning Repository. The results demonstrate that two relatively unknown hierarchical initialization methods due to Su and Dy outperform the remaining four methods with respect to two objective effectiveness criteria. In addition, a recent method due to Erisoglu et al. performs surprisingly poorly. Comment: 21 pages, 2 figures, 5 tables, Partitional Clustering Algorithms (Springer, 2014). arXiv admin note: substantial text overlap with arXiv:1304.7465, arXiv:1209.196
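To make the idea of a linear, deterministic, and order-invariant initialization concrete, here is a minimal sketch of a maximin-style scheme: the first center is the point with the largest norm, and each further center is the point farthest from the centers chosen so far. This is an illustration only; the abstract does not name all six methods it evaluates, so it should not be read as one of them. The function names `squared_distance` and `maximin_init` are hypothetical, and exact ties are broken by input position, so order invariance holds only when the maxima are unique.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def maximin_init(points, k):
    """Pick k initial centers deterministically (hypothetical helper).

    Cost is O(n * k * d): one pass over the data per chosen center.
    No random numbers are used, so repeated runs give the same centers.
    """
    points = [tuple(float(x) for x in p) for p in points]
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")

    # First center: the point with the largest norm (farthest from the origin).
    origin = (0.0,) * len(points[0])
    first = max(points, key=lambda p: squared_distance(p, origin))
    centers = [first]

    # min_dist[i] = squared distance from point i to its nearest chosen center.
    min_dist = [squared_distance(p, first) for p in points]

    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        idx = max(range(len(points)), key=min_dist.__getitem__)
        centers.append(points[idx])
        min_dist = [
            min(d, squared_distance(p, points[idx]))
            for d, p in zip(min_dist, points)
        ]
    return centers
```

The returned centers would then be passed to an ordinary k-means run as its starting configuration, removing the need for multiple random restarts.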

arXiv.org e-Print Archive

Crossref

Mycotoxin occurrence in commodities, feeds and feed ingredients sourced in the Middle East and Africa

Author: Diekman MA
E.M. Binder
Gareis M
Guthrie LD
I. Rodrigues
J. Handl
Jelinek CF
Leeson S
Pestka JJ
Sharma RP
Tangendjaja B
Vardon P
Whitlow LW
Wicklow DT
Publication venue: Taylor & Francis

Between February and October 2009, 324 grain, feed and feed commodity samples were sourced directly at animal farms or feed production sites in the Middle East and Africa and tested for the presence of A- and B-trichothecenes, zearalenone, fumonisins, aflatoxins and ochratoxin A, or for selected groups of mycotoxins only. Samples were analyzed after clean-up by immunoaffinity or solid-phase extraction followed by HPLC with derivatization where appropriate and fluorescence, UV or mass spectrometric detection. The percentage of positive samples of B-trichothecenes ranged from 0 to 87% of tested samples. The prevalence of fumonisins in the different countries was >50% in most cases. Zearalenone was present in tested commodities from all countries except three. The presence of aflatoxin in analyzed samples varied from 0 to 94%. Ochratoxin A was present in 67% of samples in Sudan and in 100% of Nigerian samples. No A-trichothecenes were found in this survey.

Crossref

PubMed Central

Discovering multi-level structures in bio-molecular data through the Bernstein inequality

Author: A Alizadeh
A Ben-Hur
A Bertoni
A Jain
Alberto Bertoni
D Achlioptas
E Levine
G Valentini
Giorgio Valentini
J Dopazo
J Handl
J Ward
L Dyrskjøt
L Kaufman
L McShane
M Smolkin
N Kaplan
S Dudoit
S Monti
T Golub
T Lange
W Hoeffding
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: The unsupervised discovery of structures (i.e. clusterings) underlying data is a central issue in several branches of bioinformatics. Methods based on the concept of stability have recently been proposed to assess the reliability of a clustering procedure and to estimate the "optimal" number of clusters in bio-molecular data. A major problem with stability-based methods is the detection of multi-level structures (e.g. hierarchical functional classes of genes) and the assessment of their statistical significance. In this context, a chi-square based statistical hypothesis test has been proposed; however, the correctness of this technique depends on some assumptions about the distribution of the data. Results: To assess the statistical significance and to discover multi-level structures in bio-molecular data, a new method based on Bernstein's inequality is proposed. This approach makes no assumptions about the distribution of the data, thus assuring a reliable application to a large range of bioinformatics problems. Results with synthetic and DNA microarray data show the effectiveness of the proposed method. Conclusions: The Bernstein test, due to its loose assumptions, is more sensitive than the chi-square test to the detection of multiple structures simultaneously present in the data. It is, however, less selective, that is, subject to more false positives; a more selective variant of the Bernstein inequality-based test, obtained by adding independence assumptions, is also presented. The proposed methods can be applied to discover multiple structures and to assess their significance in different types of bio-molecular data.
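For orientation, the classical form of Bernstein's inequality is reproduced below. The abstract does not state which variant the test uses, so the symbols here (independent zero-mean random variables $X_1, \dots, X_n$, a bound $M$ on their magnitude, and a threshold $t$) are assumptions of this sketch rather than the paper's notation.

```latex
% Classical Bernstein inequality (one-sided form).
% Assumptions: X_1, ..., X_n independent, E[X_i] = 0, |X_i| <= M almost surely.
\[
  \Pr\!\left( \sum_{i=1}^{n} X_i \ge t \right)
  \;\le\;
  \exp\!\left(
    - \frac{t^{2}/2}{\sum_{i=1}^{n} \mathbb{E}\!\left[X_i^{2}\right] + M t / 3}
  \right),
  \qquad t > 0 .
\]
```

The bound requires only boundedness and independence, not a particular distribution, which is consistent with the abstract's remark that the test makes no distributional assumptions.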

CiteSeerX

Crossref

AIR Universita degli studi di Milano

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Hierarchical information clustering by means of topologically embedded graphs

Author: A Alizadeh
A Jain
AI Saez
AJ Nathalie
BB Ding
C Rivera
D Arthur
D Garlaschelli
DL Davies
DM Rocke
G Caldarelli
G Lenz
G Ringel
G Romeo
GL Pellegrini
GP Coffey
H Hooyberghs
IS Lossos
IT Hernádvölgyi
J Dunn
J Handl
J McQueen
J Quackenbush
J Ruan
J Shi
J Wang
JM Boyer
JS Abramson
JSJ Andrade
KII Goh
L Amaral
L Chen
L Hubert
L Leseux
LL Lam
M Arsura
M Eisen
M Filipits
M Girvan
M Kitsak
M Tumminello
MC de Souto
N Wada
PF Jonsson
R Diestel
R Seki
R Xu
RA Fisher
S Fortunato
ShaunS Wang
SV Buldyrev
T Aste
T Di Matteo
T Kamijo
T Kohonen
T Sorensen
T. Di Matteo
Tomaso Aste
U von Luxburg
WM Song
Won-Min Song
X Zhao
XF Zhao
Ying Xu
Publication venue: Public Library of Science (PLoS)
Publication date: 20/10/2011

We introduce a graph-theoretic approach to extract clusters and hierarchies in complex data sets in an unsupervised and deterministic manner, without the use of any prior information. This is achieved by building topologically embedded networks containing the subset of most significant links and analyzing the network structure. For a planar embedding, this method provides both the intra-cluster hierarchy, which describes the way clusters are composed, and the inter-cluster hierarchy, which describes how clusters gather together. We discuss the performance, robustness and reliability of this method by first investigating several artificial data sets, finding that it can significantly outperform other established approaches. Then we show that our method can successfully differentiate meaningful clusters and hierarchies in a variety of real data sets. In particular, we find that the application to gene expression patterns of lymphoma samples uncovers biologically significant groups of genes which play key roles in the diagnosis, prognosis and treatment of some of the most relevant human lymphoid malignancies. Comment: 33 pages, 18 figures, 5 tables

arXiv.org e-Print Archive

Crossref

Directory of Open Access Journals

PubMed Central

Kent Academic Repository

King's Research Portal

FigShare