31 research outputs found

Clustering files of chemical structures using the Szekely-Rizzo generalization of Ward's method

Author: Bureau R., Mueller C., Varin T., Willett P.
Publication venue: 'Elsevier BV'
Publication date: 01/09/2009

Ward's method is extensively used for clustering chemical structures represented by 2D fingerprints. This paper compares Ward clusterings of 14 datasets (containing between 278 and 4332 molecules) with those obtained using the Székely–Rizzo clustering method, a generalization of Ward's method. The clusters resulting from these two methods were evaluated by the extent to which the various classifications were able to group active molecules together, using a novel criterion of clustering effectiveness. Analysis of a total of 1400 classifications (Ward and Székely–Rizzo clustering methods, 14 different datasets, 5 different fingerprints and 10 different distance coefficients) demonstrated the general superiority of the Székely–Rizzo method. The distance coefficient first described by Soergel performed extremely well in these experiments, and this was also the case when it was used in simulated virtual screening experiments.
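For readers unfamiliar with the Soergel coefficient mentioned in this abstract, the following is a minimal sketch of its usual textbook definition: the sum of absolute differences divided by the sum of element-wise maxima. For binary fingerprints this equals one minus the Tanimoto similarity. The function name and the example bit vectors are illustrative assumptions and are not taken from the paper.

```python
def soergel_distance(x, y):
    """Soergel distance between two equal-length, non-negative vectors.

    Textbook definition (assumed here, not quoted from the paper):
    sum(|x_i - y_i|) / sum(max(x_i, y_i)).
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    numerator = sum(abs(a - b) for a, b in zip(x, y))
    denominator = sum(max(a, b) for a, b in zip(x, y))
    # Two all-zero vectors are treated as identical (a convention chosen here).
    return 0.0 if denominator == 0 else numerator / denominator


# Hypothetical 4-bit fingerprints sharing two of their set bits.
fp_a = [1, 1, 0, 1]
fp_b = [1, 0, 1, 1]
print(soergel_distance(fp_a, fp_b))
```

In this example the two fingerprints differ in two positions out of four occupied positions, so the distance is 0.5, which matches one minus the Tanimoto similarity of 2/4.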

White Rose Research Online

Breaking the hierarchy - a new cluster selection mechanism for hierarchical clustering methods

Author: Hári Péter, Katona Gyula Y, Málnási-Csizmadia András, Zahoránszky László A, Zahoránszky-Köhalmi Gergely, Zweig Katharina A
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: Hierarchical clustering methods like Ward's method have been used for decades to understand biological and chemical data sets. In order to get a partition of the data set, it is necessary to choose an optimal level of the hierarchy by a so-called level selection algorithm. In 2005, a new kind of hierarchical clustering method was introduced by Palla et al. that differs in two ways from Ward's method: it can be used on data on which no full similarity matrix is defined, and it can produce overlapping clusters, i.e., allow for multiple membership of items in clusters. These features are optimal for biological and chemical data sets, but until now no level selection algorithm has been published for this method. Results: In this article we provide a general selection scheme, the level independent clustering selection method, called LInCS. With it, clusters can be selected from any level in quadratic time with respect to the number of clusters. Since hierarchically clustered data is not necessarily associated with a similarity measure, the selection is based on a graph theoretic notion of cohesive clusters. We present results of our method on two data sets, a set of drug-like molecules and a set of protein-protein interaction (PPI) data. In both cases the method provides a clustering with very good sensitivity and specificity values according to a given reference clustering. Moreover, we can show for the PPI data set that our graph theoretic cohesiveness measure indeed chooses biologically homogeneous clusters and disregards inhomogeneous ones in most cases. We finally discuss how the method can be generalized to other hierarchical clustering methods to allow for a level independent cluster selection. Conclusion: Using our new cluster selection method together with the method by Palla et al. provides a new and interesting clustering mechanism that allows overlapping clusters to be computed, which is especially valuable for biological and chemical data sets.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

ELTE Digital Institutional Repository (EDIT)

Clustering for 2D chemical structures

Author: Chu Chia-Wei
Publication venue: 'University of Sheffield Conference Proceedings'
Publication date: 01/01/2011

The clustering of chemical structures is important and widely used in several areas of chemoinformatics. A little-discussed aspect of clustering is standardization, which ensures that all descriptors in a chemical representation make a comparable contribution to the measurement of similarity. The initial study compares the effectiveness of seven different standardization procedures that have been suggested previously; the results were also compared with unstandardized datasets. It was found that no single standardization method consistently offered the best performance. Comparative studies of clustering effectiveness are helpful in assessing the suitability of different methods and in providing guidelines for their use. In order to examine the suitability of different clustering methods for chemoinformatics applications, especially those that had not previously been applied to chemoinformatics, the second study carries out an effectiveness comparison of nine clustering methods. However, the results revealed that it is unlikely that a single clustering method can consistently provide the best partition under all circumstances. Consensus clustering is a technique that combines multiple input partitions of the same set of objects into a single clustering that is expected to provide a more robust and more generally effective representation of the partitions that are submitted. The third study reports the use of seven different consensus clustering methods that had not previously been used on sets of chemical compounds represented by 2D fingerprints. Their effectiveness was compared with some traditional clustering methods discussed in the second study. No consistently best consensus clustering method was found.

White Rose E-theses Online

Classifying the World Anti-Doping Agency's 2005 prohibited list using the Chemistry Development Kit fingerprint

Author: Cannon EO, Mitchell JBO
Publication venue: COMPUTATIONAL LIFE SCIENCES II, PROCEEDINGS
Publication date: 01/01/2006

Presented at CompLife 2006, Cambridge, 27-29 September 2006. We used the freely available Chemistry Development Kit (CDK) fingerprint to classify 5235 representative molecules taken from ten banned classes in the 2005 World Anti-Doping Agency’s (WADA) prohibited list, including molecules taken from the corresponding activity classes in the MDL Drug Data Report (MDDR). We used both Random Forest and k-Nearest Neighbours (kNN) algorithms to generate classifiers. The kNN classifiers with k = 1 gave a very slightly better Matthews Correlation Coefficient than the Random Forest classifiers; the latter, however, predicted fewer false positives. The performance of kNN classifiers tended to decline with increasing k. The performance of the CDK fingerprint is essentially equivalent to that of Unity 2D. Our results suggest that it will be possible to use freely available chemoinformatics tools to aid the fight against drugs in sport, while minimising the risk of wrongfully penalising innocent athletes. EPSRC Unileve
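The Matthews Correlation Coefficient used to compare the classifiers in this abstract can be computed from the four cells of a binary confusion matrix. The sketch below uses the standard formula; the function name and the confusion-matrix counts are hypothetical and are not results from the paper.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from binary confusion-matrix counts.

    Standard formula:
    (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # When any marginal total is zero the coefficient is undefined;
    # returning 0.0 is a common convention, assumed here.
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts for one activity class versus the rest.
print(round(matthews_corrcoef(tp=90, tn=80, fp=20, fn=10), 3))
```

The coefficient ranges from -1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction). Because it uses all four cells, two classifiers can have similar values while differing in their false-positive counts, as the abstract reports for kNN and Random Forest.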

Apollo (Cambridge)

The Application of Spectral Clustering in Drug Discovery

Author: Gan Sonny
Publication venue: 'University of Sheffield Conference Proceedings'
Publication date: 01/09/2013

The application of clustering algorithms to chemical datasets is well established and has been reviewed extensively. Recently, a number of ‘modern’ clustering algorithms have been reported in other fields. One example is spectral clustering, which has yielded promising results in areas such as protein library analysis. The term spectral clustering is used to describe any clustering algorithm that utilises the eigenpairs of a matrix as the basis for partitioning a dataset. This thesis describes the development and optimisation of a non-overlapping spectral clustering method that is based upon a study by Brewer. The initial version of the spectral clustering algorithm was closely related to Brewer’s method and used a full matrix diagonalisation procedure to identify the eigenpairs of an input matrix. This spectral clustering method was compared to the k-means and Ward’s algorithms, producing encouraging results; for example, when coupled with extended connectivity fingerprints, this method outperformed the other clustering algorithms according to the QCI measure. Although the spectral clustering algorithm showed promising results, its operational costs restricted its application to small datasets. Hence, the method was optimised in successive studies. Firstly, the effect of matrix sparsity on the spectral clustering was examined, showing that spectral clustering with sparse input matrices can lead to an improvement in the results. Despite this improvement, the costs of spectral clustering remained prohibitive, so the full matrix diagonalisation procedure was replaced with the Lanczos algorithm, which has lower associated costs, as suggested by Brewer. This method led to a significant decrease in the computational costs when identifying a small number of clusters; however, a number of issues remained, leading to the adoption of an SVD-based eigendecomposition method. The SVD-based algorithm was shown to be highly efficient, accurate and scalable through a number of studies.

White Rose E-theses Online

Similarity Methods in Chemoinformatics

Author: A-Razzak, Adamson, Agrafiotis, Ajay Walters, Allen, Attias, Baber, Bajorath, Ballester, Barker, Barnard, Barton, Bawden, Bayley, Beitzel, Belkin, Ben-Dor, Bender, Berks, Berman, Blair, Boecker, Bohl, Bostrom, Boyd, Breiman, Bremser, Briem, Brint, Brown, Bunin, Burbridge, Butina, Byvatov, Böhm, Cannon, Capelli, Carbó, Carhart, Charifsen, Cheeseright, Chen, Cheng, Christianini, Clark, Cleves, Cole, Coles, Congreve, Corey, Cornell, Cosgrove, Cramer, Crandell, Croft, Cruciani, Cuissart, Dalby, Danziger, Davis, DesJarlais, Diestel, DiMasi, Dittmar, Dixon, Doman, Doweyko, Downie, Downs, Eckert, Edgar, Egan, El-Hamdouchi, Engels, Erickson, Estrada, Everitt, Ewing, Feher, Feldman, Fetchner, Fisanick, Fligner, Flower, Free, Freeland, Friesner, Frimurer, Gasteiger, Gedeck, Gillet, Ginn, Glen, Godden, Goldman, Good, Gorse, Graf, Grant, Gray, Greco, Green, Griffiths, Gund, Hagadone, Haigh, Hall, Hann, Hansch, Harper, Hassan, Hawkins, He, Hert, Hertzberg, Hessler, Hiller, Hinchcliffe, Holliday, Hsu, Huang, Hudson, Hurst, Hyland, Jakes, Jarvis, Jones, Jorissen, Kauvar, Kearsley, Keiser, Kelley, Kier, Klein, Kogej, Kubinyi, Kuntz, Kurogi, Lajiness, Langridge, Leach, Lee, Leeson, Leiter, Lemmen, Lengauer, Lesk, Lewis, Lind, Lindsay, Lipinski, Lipscomb, Loftus, Lombardino, Longley, Low, Lynch, Lyne, Maggiora, Mahe, Maizel, Makara, Maldonado, Marshall, Martin, Mason, Matter, Medina-Franco, Mestres, Monge, Moock, Moon, Morgan, Muller, Munk, Murrall, Murtagh, Ng, Nikolova, Nishibata, Nübling, Oda, Onodera, Oprea, Ott, Paolini, Paris, Patterson, Pearlman, Perekhodtsev, Pickett, Prathipati, Pretsch, Proudfoot, Raha, Rarey, Rasmussen, Ray, Raymond, Robertson, Rogers, Rush, Rusinko, Rössler, Sadowski, Saeh, Salim, Salton, Sasaki, Schneider, Schofield, Schreyer, Schuffenhauer, Shanmugasundaram, Shelley, Shemetulskis, Shenton, Sheridan, Shively, Sirois, Smeaton, Snarey, Sneath, Spärck Jones, Stahl, Stahura, Steinbach, Steindl, Stiefl, Sultan, Sussenguth, Svetnik, Takahashi, Tate, Taylor, Teague, Terrett, Thorner, Todeschini, Tong, Triballeau, Truchon, Tversky, Ullmann, van de Waterbeemd, van Rijsbergen, Veber, Verdonk, Verheij, Vieth, Vleduts, Wagener, Waldman, Walters, Wang, Ward, Warmuth, Warr, Warren, Weininger, Weisgerber, Whittle, Wild, Willett, Williams, Wilson, Wilton, Wipke, Worboys, Xia, Xue, Yang, Yin, Yu, Zernov, Zhang, Zupan
Publication venue: 'Wiley'
Publication date: 01/01/2009

CiteSeerX

Crossref

White Rose Research Online

Design of a Structure Search Engine for Chemical Compound Database

Author: Wang Hao
Publication venue: ScholarWorks @ Georgia State University
Publication date: 02/05/2008

The search for structural fragments (substructures) of compounds is very important in medicinal chemistry, QSAR, spectroscopy, and many other fields. In the last decade, with the development of hardware and the evolution of database technologies, more and more chemical compound database applications have been developed, along with interfaces for searching for targets based on user input. Due to the algorithmic complexity of structure comparison, essentially a graph isomorphism problem, the current applications mainly work by approximating the comparison problem based on certain chemical perceptions, and their search interfaces are often e-mail based. The approximation procedure usually invokes subjective assumptions. Therefore, the accuracy of the search is undermined, which may not be acceptable for researchers because in time-consuming drug design, accuracy is always the first priority. In this dissertation, a design of a search engine for chemical compound databases is presented. The design focuses on providing a solution to develop an accurate and fast search engine without sacrificing performance. The solution is comprehensive in that a series of related problems are addressed throughout the dissertation with proposed methods. Based on the design, a flexible computing model for a compound search engine can be established, and the model can be easily applied to other applications as well. To verify the solution in a practical manner, an implementation based on the presented solution was developed. The implementation clarifies the coupling between theoretical design and technical development. In addition, a workable implementation can be deployed to test the efficiency and effectiveness of the design under a variety of experimental data.

ScholarWorks @ Georgia State University

Multivariate Analysis in Management, Engineering and the Sciences

Author
Publication venue: 'IntechOpen'
Publication date: 20/04/2021

Recently, statistical knowledge has become an important requirement and occupies a prominent position in the exercise of various professions. In the real world, processes generate large volumes of data and are naturally multivariate, and as such require proper treatment. Under these conditions it is difficult or practically impossible to use univariate statistical methods. The wide application of multivariate techniques and the need to spread them more fully in academia and business justify the creation of this book. The objective is to demonstrate interdisciplinary applications to identify patterns, trends, associations and dependencies in the areas of Management, Engineering and Sciences. The book is addressed to both practicing professionals and researchers in the field.

Directory of Open Access Books (DOAB)

Understanding Designer Mental Models to Support Computer Directed Analogical Design

Author: Arlitt Ryan M.
Publication venue: 'Oregon State University'
Publication date

Analysis of alternative concepts has a significant impact on design project outcomes, and yet many design teams fail to consider a significantly broad range of conceptual solutions. Within the realm of conceptual design exists a technique called design by analogy (DBA) -- the practice of reapplying old solutions to new problems. DBA mitigates the effort required to generate a large field of candidate concepts by leveraging existing knowledge from a wide variety of domains, making it an attractive approach toward improving design outcomes. Unfortunately, DBA is challenging in the absence of expert knowledge. Designers need computational support in order to effectively identify a large number of high-quality analogical connections across a wide variety of domains. With this challenge in mind, the goal of this dissertation is to improve the body of knowledge regarding computational support for design by analogy. More specifically, this body of work includes five manuscripts. Manuscript 0 presents a review of several function-related design abstractions, including their impacts on education and industry. Manuscript 1 studies analogy retrieval in a novel design context and catalogs the types of abstract similarity (including function) commonly used to form analogies. Manuscript 2 examines a scalable approach to capturing analogy-relevant design knowledge to support large-scale analogy searching. Manuscripts 3 and 4 examine and modify a technique from de novo drug design for quickly indexing and retrieving design analogies. Manuscript 3 examines the domain independence of the technique, and manuscript 4 develops it as a large-scale design analogy search method. The body of work contributes to a greater understanding of (1) the abstractions used by designers during conceptual design, (2) the use of human computation to support conceptual design activities, and (3) large scale solution screening using a variety of mixed design abstractions. This understanding advances the creation of tools that enable designers to consider a wide range of conceptual solutions in spite of lacking domain expertise.

ScholarsArchive@OSU