17 research outputs found

    Applying pattern recognition methods to analyze the molecular properties of a homologous series of nitrogen mustard agents

    The purpose of this research was to analyze the pharmacological properties of a homologous series of nitrogen mustard (N-mustard) agents formed by inserting 1 to 9 methylene groups (-CH2-) between two -N(CH2CH2Cl)2 groups. Pattern recognition methods revealed significant correlations and associations among the properties of these compounds. The methods included hierarchical classification, cluster analysis, nonmetric multi-dimensional scaling (MDS), detrended correspondence analysis, K-means cluster analysis, discriminant analysis, and self-organizing tree algorithm (SOTA) analysis. Detrended correspondence analysis showed a near-linear association among the 9 homologs. Hierarchical classification showed that each homolog was highly similar to at least one other member of the series, as did cluster analysis using the paired-group distance measure. Nonmetric MDS separated homologs 2 and 3 (numbered by methylene group count) from two further groups: homologs 4, 5, and 6, and homologs 7, 8, and 9. Discriminant analysis, K-means cluster analysis, and hierarchical classification distinguished the high molecular weight homologs from the low molecular weight homologs. As the number of methylene groups increased, aqueous solubility decreased, dermal permeation coefficient increased, Log P increased, molar volume increased, parachor increased, and index of refraction decreased. Application of pattern recognition methods discerned useful interrelationships within the homologous series that will determine specific and beneficial clinical applications for each homolog and methods of administration.

    Fast rule-based bioactivity prediction using associative classification mining

    Relating chemical features to bioactivities is critical in molecular design and is used extensively in the lead discovery and optimization process. A variety of techniques from statistics, data mining and machine learning have been applied to this process. In this study, we utilize a collection of methods, called associative classification mining (ACM), which are popular in the data mining community but have not yet been widely applied in cheminformatics. More specifically, classification based on predictive association rules (CPAR), classification based on multiple association rules (CMAR) and classification based on association rules (CBA) are employed on three datasets using various descriptor sets. Experimental evaluations on anti-tuberculosis (antiTB), mutagenicity and hERG (the human Ether-a-go-go-Related Gene) blocker datasets show that these three methods are computationally scalable and appropriate for high-speed mining. Additionally, they provide accuracy and efficiency comparable to the commonly used Bayesian and support vector machine (SVM) methods, and produce highly interpretable models.
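To make the idea of associative classification concrete, the following is a minimal sketch in the spirit of CBA, not the authors' implementation. It mines rules of the form "feature set -> class" that pass minimum support and confidence thresholds, ranks them, and classifies a compound by the first rule that matches. It omits the database-coverage pruning of the full CBA algorithm. The function names, thresholds, substructure flags, and activity labels are all invented for illustration and are not taken from the study's datasets.

```python
from collections import Counter
from itertools import combinations


def mine_rules(records, min_support=0.3, min_confidence=0.7, max_len=2):
    """Mine class association rules (itemset -> label).

    records: list of (set_of_feature_names, class_label) pairs.
    Returns rules as (itemset, label, support, confidence) tuples,
    ranked by confidence, then support, then shorter itemset.
    """
    n = len(records)
    itemset_counts = Counter()  # how often each itemset occurs at all
    rule_counts = Counter()     # how often it occurs together with a label
    for features, label in records:
        items = sorted(features)
        for k in range(1, max_len + 1):
            for itemset in combinations(items, k):
                itemset_counts[itemset] += 1
                rule_counts[(itemset, label)] += 1

    rules = []
    for (itemset, label), hits in rule_counts.items():
        support = hits / n
        confidence = hits / itemset_counts[itemset]
        if support >= min_support and confidence >= min_confidence:
            rules.append((itemset, label, support, confidence))
    rules.sort(key=lambda r: (-r[3], -r[2], len(r[0])))
    return rules


def classify(rules, features, default):
    """Return the label of the highest-ranked rule whose itemset is present."""
    for itemset, label, _support, _confidence in rules:
        if set(itemset) <= features:
            return label
    return default  # no rule fired


# Toy data: hypothetical substructure flags and made-up activity labels.
records = [
    ({"nitro", "aromatic_ring"}, "active"),
    ({"nitro", "halogen"}, "active"),
    ({"nitro", "aromatic_ring", "halogen"}, "active"),
    ({"aromatic_ring", "hydroxyl"}, "inactive"),
    ({"hydroxyl", "halogen"}, "inactive"),
    ({"hydroxyl"}, "inactive"),
]

rules = mine_rules(records)
prediction = classify(rules, {"nitro", "amine"}, default="inactive")
```

Each surviving rule can be read directly as an if-then statement about a chemical feature, which is the kind of interpretability the abstract refers to. The default label here is an arbitrary choice; a real classifier would typically fall back to the majority class of the training data.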