
    Breaking the hierarchy - a new cluster selection mechanism for hierarchical clustering methods

    Abstract

    Background

    Hierarchical clustering methods like Ward's method have been used for decades to understand biological and chemical data sets. To obtain a partition of the data set, an optimal level of the hierarchy must be chosen by a so-called level selection algorithm. In 2005, Palla et al. introduced a new kind of hierarchical clustering method that differs from Ward's method in two ways: it can be used on data for which no full similarity matrix is defined, and it can produce overlapping clusters, i.e., it allows items to belong to multiple clusters. These features are optimal for biological and chemical data sets, but until now no level selection algorithm has been published for this method.

    Results

    In this article we provide a general selection scheme, the level independent clustering selection method, called LInCS. With it, clusters can be selected from any level in quadratic time with respect to the number of clusters. Since hierarchically clustered data is not necessarily associated with a similarity measure, the selection is based on a graph theoretic notion of cohesive clusters. We present results of our method on two data sets: a set of drug-like molecules and a set of protein-protein interaction (PPI) data. In both cases the method provides a clustering with very good sensitivity and specificity values with respect to a given reference clustering. Moreover, we can show for the PPI data set that our graph theoretic cohesiveness measure indeed chooses biologically homogeneous clusters and disregards inhomogeneous ones in most cases. We finally discuss how the method can be generalized to other hierarchical clustering methods to allow for a level independent cluster selection.

    Conclusion

    Using our new cluster selection method together with the method by Palla et al. provides a new and interesting clustering mechanism that allows overlapping clusters to be computed, which is especially valuable for biological and chemical data sets.
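The abstract reports sensitivity and specificity against a reference clustering but does not say how they are computed. One common convention, assumed here purely for illustration, is pair counting: a pair of items is "positive" if the two items share at least one cluster, which also works when clusters overlap. The function names below are hypothetical and this is a sketch, not the authors' evaluation code.

```python
from itertools import combinations


def co_clustered_pairs(clusters):
    """Return the set of unordered item pairs that share at least one cluster."""
    pairs = set()
    for cluster in clusters:
        # Sorting gives each pair a canonical (smaller, larger) order.
        for a, b in combinations(sorted(cluster), 2):
            pairs.add((a, b))
    return pairs


def pairwise_sensitivity_specificity(predicted, reference, items):
    """Pair-counting sensitivity and specificity (an assumed convention).

    predicted, reference: lists of sets of items; clusters may overlap.
    items: every item in the data set.
    """
    all_pairs = set(combinations(sorted(items), 2))
    pred = co_clustered_pairs(predicted)
    ref = co_clustered_pairs(reference)

    tp = len(pred & ref)              # together in both clusterings
    fn = len(ref - pred)              # together only in the reference
    fp = len(pred - ref)              # together only in the prediction
    tn = len(all_pairs - pred - ref)  # separated in both clusterings

    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


# Toy data: item "b" belongs to two predicted clusters.
items = ["a", "b", "c", "d", "e"]
reference = [{"a", "b", "c"}, {"d", "e"}]
predicted = [{"a", "b"}, {"b", "c", "d"}, {"e"}]
sens, spec = pairwise_sensitivity_specificity(predicted, reference, items)
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```

On this toy input, two of the four reference pairs are recovered and two spurious pairs are introduced, so sensitivity is 0.5 and specificity is 4/6.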

    NETWORK INFERENCE DRIVEN DRUG DISCOVERY

    The application of rational drug design principles in the era of network pharmacology requires the investigation of drug-target and target-target interactions in order to design new drugs. The presented research aimed to develop novel computational methods that enable the efficient analysis of complex biomedical data and promote hypothesis generation in the context of translational research. The three chapters of the Dissertation relate to various segments of the drug discovery and development process. The first chapter introduces the integrated predictive drug discovery platform “SmartGraph”. The novel collaborative-filtering based algorithm “Target Based Recommender (TBR)” was developed in the framework of this project and was validated on a set of 28,270 experimentally determined bioactivity data points involving 1,882 compounds and 869 targets. The TBR is integrated into the SmartGraph platform. The graphical interface of SmartGraph enables data analysis and hypothesis generation even for investigators without substantial bioinformatics knowledge. The platform can be utilized in the context of target identification, drug-target prediction and drug repurposing. The second chapter of the Dissertation introduces an information theory inspired dynamic network model and the novel “Luminosity Diffusion (LD)” algorithm. The model can be utilized to prioritize protein targets for drug discovery purposes on the basis of available information and the importance of the targets. The importance of targets is accounted for in the information flow simulation process and is derived solely from network topology. The LD algorithm was validated on 8,010 relations among 794 proteins extracted from the Target Central Resource Database developed in the framework of the “Illuminating the Druggable Genome” project. The last chapter discusses a fundamental problem pertaining to the generation of similarity networks of molecules and their clustering. The network generation process relies on the selection of a similarity threshold. The presented work introduces a network topology based systematic solution for selecting this threshold so that the likelihood of a reasonable clustering can be increased. Furthermore, the work proposes a solution for generating a so-called “pseudo-reference clustering” for large molecular data sets for performance evaluation purposes. The results of this chapter are applicable in the lead identification and development processes.
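To see why the similarity threshold matters, consider a minimal sketch that builds a network from a pairwise similarity matrix and reads clusters off as connected components. The matrix values, the function names, and the use of connected components are all assumptions made for illustration; the dissertation's topology-based threshold selection is not described in the abstract and is not reproduced here.

```python
def threshold_graph(similarity, threshold):
    """Build adjacency sets with an edge i-j when similarity[i][j] >= threshold."""
    n = len(similarity)
    adjacency = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if similarity[i][j] >= threshold:
                adjacency[i].add(j)
                adjacency[j].add(i)
    return adjacency


def connected_components(adjacency):
    """Return the connected components of the graph as a list of node sets."""
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        stack = [start]
        component = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(adjacency[node] - seen)
        components.append(component)
    return components


# Hypothetical symmetric similarity matrix for four molecules.
similarity = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.6, 0.1],
    [0.2, 0.6, 1.0, 0.3],
    [0.1, 0.1, 0.3, 1.0],
]

# Lowering the threshold adds edges and merges clusters.
for threshold in (0.8, 0.5, 0.25):
    clusters = connected_components(threshold_graph(similarity, threshold))
    print(threshold, clusters)
```

With these made-up values the same four molecules fall into three, two, or one cluster depending on the threshold, which is the sensitivity that a systematic selection procedure is meant to control.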