7 research outputs found

    Prototype definition through consensus analysis between fuzzy c-means and archetypal analysis

    The general aim of cluster analysis is to build prototypes, or typologies of units that present similar characteristics. In this paper we propose an alternative approach based on consensus analysis of two different clustering methods to suitably obtain prototypes. The clustering methods used are fuzzy c-means (centre approach) and archetypal analysis (extreme approach). The consensus clustering is used to assess the correspondence between the clustering solutions obtained
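
For readers unfamiliar with the centre-based method named in this abstract, the following is a minimal sketch of standard fuzzy c-means in plain Python. It is not the paper's implementation: the function name `fuzzy_c_means`, the fuzzifier default `m=2.0`, the iteration count and the random initialisation are all assumptions made for illustration, and the archetypal analysis and consensus steps are not shown.

```python
import math
import random

def fuzzy_c_means(points, c, m=2.0, iters=100, seed=0):
    """Return (centres, memberships) for a list of equal-length tuples.

    Hypothetical helper: names and defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        s = sum(row)
        u.append([v / s for v in row])
    centres = []
    for _ in range(iters):
        # Centre update: mean of the points weighted by u_ik ** m.
        centres = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            total = sum(w)
            centres.append(tuple(
                sum(w[i] * points[i][d] for i in range(n)) / total
                for d in range(dim)))
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij) ** (2 / (m - 1)).
        for i in range(n):
            dist = [max(math.dist(points[i], ck), 1e-12) for ck in centres]
            for k in range(c):
                u[i][k] = 1.0 / sum(
                    (dist[k] / dj) ** (2.0 / (m - 1.0)) for dj in dist)
    return centres, u
```

Each row of the returned membership matrix sums to 1, so a unit can belong partly to several prototypes; taking the largest membership per row gives a hard partition that could then be compared with another clustering solution.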



    Modular self-organization

    The aim of this paper is to provide a sound framework for addressing a difficult problem: the automatic construction of an autonomous agent's modular architecture. We combine results from two apparently unrelated domains: autonomous planning through Markov Decision Processes and a general data clustering approach using a kernel-like method. Our fundamental idea is that the former is a good framework for addressing autonomy, whereas the latter makes it possible to tackle self-organizing problems


    Cloud motion wind (CMW) determination requires tracking of individual cloud targets. This is achieved by first clustering and then tracking each cloud cluster. Ideally, different cloud clusters correspond to different pressure levels. Two new clustering techniques have been developed for the identification of cloud types in multi-spectral satellite imagery. The first technique is the Global-Local clustering algorithm. It is a cascade of a histogram clustering algorithm and a dynamic clustering algorithm. The histogram clustering algorithm divides the multi-spectral histogram into non-overlapping regions, and these regions are used to initialise the dynamic clustering algorithm. The dynamic clustering algorithm assumes that clusters have Gaussian probability density functions with different population sizes and variances. The second technique uses graph theory to exploit the spatial information which is often ignored in per-pixel clustering. The algorithm has two stages: spatial clustering and spectral clustering. The first stage extracts homogeneous objects in the image using a family of algorithms based on stepwise optimization. This family of algorithms can be further divided into two approaches: top-down and bottom-up. The second stage groups similar segments into clusters using a statistical hypothesis test on their similarities. The clusters generated are less noisy along class boundaries and are in hierarchical order. A criterion based on mutual information is derived to monitor the spatial clustering process and to suggest an optimal number of segments. An automated cloud motion tracking program has been developed. Three images (each separated by 30 minutes) are used to track cloud motion, and the middle image is clustered using Global-Local clustering prior to tracking.
Compared with traditional methods based on raw images, separating cloud types before cloud tracking is found to reduce the ambiguity caused by multiple layers of cloud moving at different speeds and in different directions. Three matching techniques are used and their reliability compared. Target sizes ranging from 4 x 4 to 32 x 32 are tested and their errors compared. The optimum target size for first generation METEOSAT images has also been found.
Meteorological Office, Bracknell
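
The tracking step described in this abstract can be illustrated with a small block-matching sketch. The abstract does not name its three matching techniques, so the sum-of-squared-differences criterion below is an assumption chosen for simplicity, and the names `ssd` and `track_target` and their parameters are hypothetical.

```python
def ssd(a, b, ay, ax, by, bx, size):
    """Sum of squared differences between two size x size windows."""
    return sum(
        (a[ay + r][ax + c] - b[by + r][bx + c]) ** 2
        for r in range(size) for c in range(size))

def track_target(prev, curr, top, left, size, search):
    """Return the (dy, dx) shift of a target window from prev to curr.

    prev and curr are images given as lists of rows; the target is the
    size x size window of prev whose top-left corner is (top, left).
    Shifts of up to `search` pixels in each direction are tried.
    """
    rows, cols = len(curr), len(curr[0])
    best, best_shift = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > rows or x + size > cols:
                continue  # candidate window falls outside the image
            score = ssd(prev, curr, top, left, y, x, size)
            if best is None or score < best:
                best, best_shift = score, (dy, dx)
    return best_shift
```

Dividing the returned shift by the time between images (30 minutes in the abstract) would give a motion estimate in pixels per unit time; the `size` argument corresponds to the target sizes of 4 x 4 to 32 x 32 that the abstract compares.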

    Computational strategies to identify, prioritize and design potential antimalarial agents from natural products

    Philosophiae Doctor - PhD
    Introduction: There is an urgent need to develop novel antimalarial drugs in view of the mounting disease burden and emerging resistance of malarial parasites to the drugs currently in use. A large number of natural products, especially those used in ethnomedicine for malaria, have shown varying in-vitro antiplasmodial activities. Facilitating antimalarial drug development from this wealth of natural products is an imperative and laudable mission. However, limited resources, high costs, low prospects of success and the high cost of failure during preclinical and clinical studies might militate against the pursuit of this mission. Chemoinformatics techniques can simulate and predict essential molecular properties required to characterize compounds, thus eliminating the cost of equipment and reagents needed to conduct essential preclinical studies, especially on compounds that may fail during drug development. Therefore, applying chemoinformatics techniques to natural products with in-vitro antiplasmodial activities may facilitate the identification and prioritization of natural products with potential for a novel mechanism of action, desirable pharmacokinetics and a high likelihood of development into antimalarial drugs. In addition, unique structural features mined from these natural products may serve as templates for designing new potential antimalarial compounds. Method: Four chemoinformatics techniques were applied to a collection of selected natural products with in-vitro antiplasmodial activity (NAA) and currently registered antimalarial drugs (CRAD): molecular property profiling, molecular scaffold analysis, machine learning and design of a virtual compound library. Molecular property profiling included computation of key molecular descriptors, physicochemical properties, molecular similarity analysis, estimation of drug-likeness, in-silico pharmacokinetic profiling and exploration of the structure-activity landscape.
Analysis of variance was used to assess statistically significant differences in these parameters between NAA and CRAD. Next, molecular scaffold exploration and diversity analyses were performed on three datasets (NAA, CRAD and malarial data from Medicines for Malaria Venture (MMV)) using scaffold counts and cumulative scaffold frequency plots. Scaffolds from the NAA were compared to those from CRAD and MMV. A Scaffold Tree was also generated for all the datasets. Thirdly, machine learning approaches were used to build four regression and four classifier models from bioactivity data of NAA using molecular descriptors and molecular fingerprints. Models were built and refined by leave-one-out cross-validation and evaluated with an independent test dataset. The applicability domain (AD), which defines the limit of reliable predictability by the models, was estimated from the training dataset and validated with the test dataset. Possible chemical features associated with reported antimalarial activities of the compounds were also extracted. Lastly, virtual compound libraries were generated with the unique molecular scaffolds identified from the NAA. The virtual compounds generated were characterized by evaluating selected molecular descriptors, toxicity profile, structural diversity from CRAD and prediction of antiplasmodial activity. Results: For the molecular property profiling, a total of 1040 natural products were selected and a total of 13 molecular descriptors were analyzed. Significant differences were observed between NAA and CRAD for at least 11 of the molecular descriptors. Molecular similarity and chemical space analysis identified NAA that were structurally diverse from CRAD. Over 50% of NAA with desirable drug-like properties were identified. However, nearly 70% of NAA were identified as potentially "promiscuous" compounds.
Structure-activity landscape analysis highlighted compound pairs that formed "activity cliffs". In all, prioritization strategies for the natural products with in-vitro antiplasmodial activities were proposed. The scaffold exploration and analysis results revealed that CRAD exhibited the greatest scaffold diversity, followed by NAA and MMV respectively. Unique scaffolds that were not contained in any other compounds in the CRAD datasets were identified in NAA. The Scaffold Tree showed the preponderance of ring systems in NAA and identified virtual scaffolds, which may be potential bioactive compounds or may help elucidate possible synthetic routes for NAA. From the machine learning study, the regression and classifier models that were most suitable for NAA were identified as model tree M5P (correlation coefficient = 0.84) and Sequential Minimal Optimization (accuracy = 73.46%) respectively. The test dataset fitted into the applicability domain (AD) defined by the training dataset. The "amine" group was observed to be essential for antimalarial activity in both the NAA and MMV datasets, but hydroxyl and carbonyl groups may also be relevant in the NAA dataset. The results of the characterization of the virtual compound library showed significant difference (p value 90%) of the virtual compound library. The virtual compound libraries showed sufficient diversity in structures and the majority were structurally diverse from currently registered antimalarial drugs. Finally, up to 70% of the virtual compounds were predicted as active antiplasmodial agents. Conclusions: Molecular property profiling of NAA and CRAD produced a wealth of information that may guide decisions and facilitate antimalarial drug development from natural products, and led to a prioritized list of natural products with in-vitro antiplasmodial activities.
Molecular scaffold analysis identified unique scaffolds and virtual scaffolds from NAA that possess desirable drug-like properties, which make them ideal starting points for antimalarial drug design. The machine learning study built, evaluated and identified sufficiently accurate regression and classifier models that were used for virtual screening of natural compound libraries to mine possible antimalarial compounds without the expense of bioactivity assays. Finally, a good proportion of the virtual compounds generated were structurally diverse from currently registered antimalarial drugs and were potentially active antiplasmodial agents. Filtering and optimization may lead to a collection of virtual compounds with unique chemotypes that may be synthesized and added to a screening deck against Plasmodium
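
The cumulative scaffold frequency plots mentioned in the Method section can be sketched as follows. This is an illustrative reading of the technique, not the thesis's code: the function name `cumulative_scaffold_frequency` is hypothetical, and the input is assumed to be one scaffold identifier (for example a scaffold string computed elsewhere) per compound.

```python
from collections import Counter

def cumulative_scaffold_frequency(scaffolds):
    """Return the fraction of compounds covered by the top-k scaffolds.

    scaffolds: one scaffold identifier per compound (assumed input).
    Entry k - 1 of the result is the share of all compounds that carry
    one of the k most frequent scaffolds.
    """
    counts = Counter(scaffolds)
    total = len(scaffolds)
    covered, curve = 0, []
    # Walk the scaffolds from most to least frequent, accumulating coverage.
    for _, n in counts.most_common():
        covered += n
        curve.append(covered / total)
    return curve
```

A curve that rises steeply means a few scaffolds account for most compounds (low diversity), while a curve close to the diagonal indicates that compounds are spread over many scaffolds, which is how datasets such as NAA, CRAD and MMV could be compared.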

    SIS 2017. Statistics and Data Science: new challenges, new generations

    The 2017 SIS Conference aims to highlight the crucial role of Statistics in Data Science. In this new domain of ‘meaning’ extracted from data, the increasing amount of data produced and available in databases has brought new challenges. These involve different fields: statistics, machine learning, information and computer science, optimization and pattern recognition. Together, these fields make a considerable contribution to the analysis of ‘Big data’, open data, relational and complex data, both structured and unstructured. The aim is to collect contributions from the different domains of Statistics on high-dimensional data quality validation, sample extraction, dimensionality reduction, pattern selection, data modelling, hypothesis testing and confirming conclusions drawn from the data