288 research outputs found

    Predicting Complexation Thermodynamic Parameters of β-Cyclodextrin with Chiral Guests by Using Swarm Intelligence and Support Vector Machines

    Particle Swarm Optimization (PSO) and Support Vector Machines (SVMs) are used to predict the thermodynamic parameters for the 1:1 inclusion complexation of chiral guests with β-cyclodextrin. PSO is adopted for descriptor selection in the quantitative structure-property relationships (QSPR) of a dataset of 74 chiral guests due to its simplicity, speed, and consistency. The modified PSO is then combined with SVMs, chosen for their good approximating properties, to generate a QSPR model with the selected features. Linear, polynomial, and Gaussian radial basis functions are used as kernels in the SVMs. All models achieved R² higher than 0.8.

    Data fusion by using machine learning and computational intelligence techniques for medical image analysis and classification

    Data fusion is the process of integrating information from multiple sources to produce specific, comprehensive, unified data about an entity. Data fusion is categorized as low level, feature level and decision level. This research is focused on both investigating and developing feature- and decision-level data fusion for automated image analysis and classification. The common procedure for solving these problems can be described as: 1) process the image to detect the region of interest, 2) extract features from the region of interest, and 3) create a learning model based on the feature data. Image processing techniques were performed using edge detection, a histogram threshold and a color drop algorithm to determine the region of interest. The extracted features were low-level features, including textual, color and symmetrical features. For image analysis and classification, feature- and decision-level data fusion techniques are investigated for model learning using and integrating computational intelligence and machine learning techniques. These techniques include artificial neural networks, evolutionary algorithms, particle swarm optimization, decision tree, clustering algorithms, fuzzy logic inference, and voting algorithms. This work presents both the investigation and development of data fusion techniques for the application areas of dermoscopy skin lesion discrimination, content-based image retrieval, and graphic image type classification. --Abstract, page v
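
The abstract lists voting algorithms among its decision-level fusion techniques but does not say which voting rule was used. As a rough illustration only, the sketch below assumes simple plurality voting over the labels produced by several classifiers; the function names `majority_vote` and `fuse_decisions` and the tie-breaking rule are hypothetical choices, not taken from the thesis.

```python
from collections import Counter


def majority_vote(predictions):
    """Fuse the labels that several classifiers gave to one sample.

    Assumption: plain plurality voting. Ties go to the label that
    appears first in `predictions`, so the result is deterministic.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    best = max(counts.values())
    # Walk the original order so that tie-breaking is well defined.
    for label in predictions:
        if counts[label] == best:
            return label


def fuse_decisions(per_classifier_labels):
    """Apply majority voting sample by sample.

    per_classifier_labels: one list of labels per classifier,
    all lists having the same length (one entry per sample).
    """
    return [majority_vote(list(votes)) for votes in zip(*per_classifier_labels)]
```

For example, `fuse_decisions([["a", "b"], ["a", "a"], ["b", "a"]])` combines three classifiers' outputs for two samples. Weighted or fuzzy voting would need per-classifier confidences, which this sketch does not model.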

    Evolutionary Computation and QSAR Research

    The successful high throughput screening of molecule libraries for a specific biological property is one of the main improvements in drug discovery. Virtual molecular filtering and screening relies greatly on quantitative structure-activity relationship (QSAR) analysis, a mathematical model that correlates the activity of a molecule with molecular descriptors. QSAR models have the potential to reduce the costly failure of drug candidates in advanced (clinical) stages by filtering combinatorial libraries, eliminating candidates with a predicted toxic effect and poor pharmacokinetic profiles, and reducing the number of experiments. To obtain a predictive and reliable QSAR model, scientists use methods from various fields such as molecular modeling, pattern recognition, machine learning or artificial intelligence. QSAR modeling relies on three main steps: molecular structure codification into molecular descriptors, selection of relevant variables in the context of the analyzed activity, and search for the optimal mathematical model that correlates the molecular descriptors with a specific activity. Since a variety of techniques from statistics and artificial intelligence can aid the variable selection and model building steps, this review focuses on the evolutionary computation methods supporting these tasks. Thus, this review explains the basics of genetic algorithms and genetic programming as evolutionary computation approaches, the selection methods for high-dimensional data in QSAR, the methods to build QSAR models, the current evolutionary feature selection methods and applications in QSAR, and the future trend towards joint or multi-task feature selection methods. Funding: Instituto de Salud Carlos III (PIO52048; RD07/0067/0005); Ministerio de Industria, Comercio y Turismo (TSI-020110-2009-53); Galicia. Consellería de Economía e Industria (10SIN105004P).

    Applying Statistical Mechanics to Improve Computational Sampling Algorithms and Interatomic Potentials

    In this dissertation the application of statistical mechanics is presented to improve classical simulated annealing and machine learning-based interatomic potentials. Classical simulated annealing is known to be among the most robust global optimization methods. Therefore, many variations of this method have been developed over the last few decades. This dissertation introduces simulated annealing with adaptive cooling and shows its efficiency with respect to classical simulated annealing. Adaptive cooling simulated annealing makes use of the on-the-fly evaluation of the statistical mechanical properties to adaptively adjust the cooling rate. In this case, the cooling rate is adaptively adjusted based on the instantaneous evaluations of the heat capacities, with the possible future extension to the density of states. Results are presented for Lennard-Jones clusters optimized by adaptive cooling simulated annealing and by classical simulated annealing. The adaptive cooling approach proved to be more efficient than classical simulated annealing. Statistical mechanics was also used to improve the quality and transferability of machine learning-based interatomic potentials. Machine learning (ML)-based interatomic potentials are currently garnering a lot of attention as they strive to achieve the accuracy of electronic structure methods at the computational cost of empirical potentials. Given their generic functional forms, the transferability of these potentials is highly dependent on the quality of the training set, the generation of which is a highly labor-intensive activity. Good training sets should at once contain a very diverse set of configurations while avoiding redundancies that incur cost without providing benefits. We formalize these requirements in a local entropy maximization framework and propose an automated sampling scheme to sample from this objective function. 
We show that this approach generates much more diverse training sets than unbiased sampling and is competitive with hand-crafted training sets [1].
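
The abstract states that the cooling rate is adjusted from on-the-fly heat capacity estimates but gives no formula. The sketch below is one possible reading under stated assumptions: the heat capacity is estimated as C = Var(E) / T² (with k_B = 1) from the energies sampled at the current temperature, and the temperature is reduced by a fraction `base_rate / (1 + C)`, so cooling slows where C is large. That update rule, the `min_rate` floor, all parameter values, and the one-dimensional toy energy used in place of Lennard-Jones clusters are illustrative assumptions rather than the dissertation's scheme.

```python
import math
import random


def adaptive_cooling_anneal(energy, x0, t_start=5.0, t_end=1e-3, sweeps=200,
                            step=0.5, base_rate=0.2, min_rate=0.01, seed=0):
    """Metropolis simulated annealing on a 1-D variable with a
    heat-capacity-dependent cooling schedule (assumed rule, see text)."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    t = t_start
    while t > t_end:
        energies = []
        for _ in range(sweeps):
            cand = x + rng.uniform(-step, step)
            e_cand = energy(cand)
            # Metropolis rule: always accept downhill moves, accept
            # uphill moves with probability exp(-dE / T).
            if e_cand <= e or rng.random() < math.exp(-(e_cand - e) / t):
                x, e = cand, e_cand
                if e < best_e:
                    best_x, best_e = x, e
            energies.append(e)
        mean = sum(energies) / len(energies)
        var = sum((v - mean) ** 2 for v in energies) / len(energies)
        heat_capacity = var / (t * t)  # C = Var(E) / T^2 with k_B = 1
        # Assumed schedule: a large heat capacity means a smaller
        # temperature decrement; min_rate guarantees termination.
        rate = max(base_rate / (1.0 + heat_capacity), min_rate)
        t *= 1.0 - rate
    return best_x, best_e
```

Calling `adaptive_cooling_anneal(lambda x: (x - 1.0) ** 2, 4.0)` anneals a single-well toy energy. A fixed geometric schedule corresponds to replacing the `rate` line with a constant.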

    A neuro-genetic hybrid approach to automatic identification of plant leaves

    Plants are essential for the existence of most living things on this planet. Plants are used for providing food, shelter, and medicine. The ability to identify plants is very important for several applications, including conservation of endangered plant species, rehabilitation of lands after mining activities and differentiating crop plants from weeds. In recent times, many researchers have made attempts to develop automated plant species recognition systems. However, current computer-based plant recognition systems have limitations because some plants are naturally complex, which makes it difficult to extract and represent their features. Further, natural differences of features within the same plant and similarities between plants of different species cause problems in classification. This thesis developed a novel hybrid intelligent system based on a neuro-genetic model for automatic recognition of plants using leaf image analysis. It is based on a novel approach of combining several image descriptors with Cellular Neural Networks (CNN), Genetic Algorithm (GA), and Probabilistic Neural Networks (PNN) to address classification challenges in computer-based plant species identification using images of plant leaves. A GA-based feature selection module was developed to select the best of these leaf features. Particle Swarm Optimization (PSO) and Principal Component Analysis (PCA) were also used alongside for comparison and to provide rigorous feature selection and analysis. Statistical analysis using ANOVA and correlation techniques confirmed the effectiveness of the GA-based and PSO-based techniques as there were no redundant features, since the subset of features selected by both techniques correlated well. In the past, the number of principal components (PCs) was selected by the conventional method associated with PCA. However, in this study, GA was used to select a minimum number of PCs from the original PC space. 
This reduced computational cost with respect to time and increased the accuracy of the classifier used. The algebraic nature of the GA’s fitness function ensures good performance of the GA. Furthermore, GA was also used to optimize the parameters of a CNN (CNN for image segmentation) and then uniquely combined with PNN to improve and stabilize the performance of the classification system. The CNN (being an ordinary differential equation (ODE)) was solved using the 4th-order Runge-Kutta algorithm in order to minimize discretisation errors associated with edge detection. This study involved the extraction of 112 features from the images of plant species found in the publicly available Flavia dataset using the MATLAB programming environment. These features include Zernike Moments (20 ZMs), Fourier Descriptors (21 FDs), Legendre Moments (20 LMs), Hu 7 Moments (7 Hu7Ms), Texture Properties (22 TP), Geometrical Properties (10 GP), and Colour features (12 CF). With the use of GA, only 14 features were finally selected for optimal accuracy. The PNN was genetically optimized to ensure optimal accuracy, since it is not best practice to fix the tuning parameters for the PNN arbitrarily. Two separate GAs were implemented to optimize the PNN, that is, the GA provided by the MATLAB Optimization Toolbox (GA1) and a separately implemented GA (GA2). The best chromosome (PNN spread) for GA1 was 0.035 with an associated classification accuracy of 91.3740%, while a spread value of 0.06 was obtained from GA2, giving rise to an improved classification accuracy of 92.62%. The PNN-based classifier used in this study was benchmarked against other classifiers such as Multi-layer Perceptron (MLP), k-Nearest Neighbour (kNN), Naive Bayes Classifier (NBC), Radial Basis Function (RBF), and Ensemble classifiers (Adaboost). The best candidate among these classifiers was the genetically optimized PNN. Some computational theoretic properties of PNN are also presented.
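
To make the role of the PNN spread concrete, here is a minimal sketch of a probabilistic neural network as a Gaussian Parzen-window classifier, plus a leave-one-out accuracy that a search procedure could use as a fitness value for a candidate spread. It is written in Python rather than the MATLAB used in the thesis. The function names are hypothetical, `spread` is taken here to be the Gaussian standard deviation (an assumption; the toolbox used in the thesis may scale its kernel differently), and the GA itself is not shown.

```python
import math


def pnn_classify(train_x, train_y, query, spread):
    """Return the class whose Parzen-window density estimate at
    `query` is largest (Gaussian kernel of width `spread`)."""
    scores = {}
    counts = {}
    for x, y in zip(train_x, train_y):
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, query))
        # Pattern layer: one Gaussian kernel per training sample.
        k = math.exp(-sq_dist / (2.0 * spread ** 2))
        scores[y] = scores.get(y, 0.0) + k
        counts[y] = counts.get(y, 0) + 1
    # Summation layer: mean kernel response per class.
    # Decision layer: pick the class with the largest mean response.
    return max(scores, key=lambda c: scores[c] / counts[c])


def loo_accuracy(train_x, train_y, spread):
    """Leave-one-out accuracy for a given spread; a candidate
    fitness function for tuning the spread (assumed setup)."""
    hits = 0
    for i in range(len(train_x)):
        rest_x = train_x[:i] + train_x[i + 1:]
        rest_y = train_y[:i] + train_y[i + 1:]
        hits += pnn_classify(rest_x, rest_y, train_x[i], spread) == train_y[i]
    return hits / len(train_x)
```

A very small spread makes the classifier behave like a nearest-neighbour rule, while a very large one smooths away class boundaries, which is why the spread is worth tuning rather than fixing arbitrarily.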

    Multidimensional Particle Swarm Optimization for Machine Learning

    Particle Swarm Optimization (PSO) is a stochastic nature-inspired optimization method. It has been successfully used in several application domains since it was introduced in 1995. It has been especially successful when applied to complicated multimodal problems, where simpler optimization methods, e.g., gradient descent, are not able to find satisfactory results. Multidimensional Particle Swarm Optimization (MD-PSO) and Fractional Global Best Formation (FGBF) are extensions of the basic PSO. MD-PSO allows searching for an optimum also when the solution dimensionality is unknown. With a dedicated dimensional PSO process, MD-PSO can search for optimal solution dimensionality. An interleaved positional PSO process simultaneously searches for the optimal solution in that dimensionality. Both the basic PSO and its multidimensional extension MD-PSO are susceptible to premature convergence. FGBF is a plug-in to (MD-)PSO that can help avoid premature convergence and find desired solutions faster. This thesis focuses on applications of MD-PSO and FGBF in different machine learning tasks. Multiswarm versions of MD-PSO and FGBF are introduced to perform dynamic optimization tasks. In dynamic optimization, the search space slowly changes. The locations of optima move and a former local optimum may transform into a global optimum and vice versa. We exploit multiple swarms to track different optima. In order to apply MD-PSO for clustering tasks, two key questions need to be answered: 1) How to encode the particles to represent different data partitions? 2) How to evaluate the fitness of the particles to evaluate the quality of the solutions proposed by the particle positions? The second question is considered especially carefully in this thesis. An extensive comparison of Clustering Validity Indices (CVIs) commonly used as fitness functions in Particle Swarm Clustering (PSC) is conducted. 
Furthermore, a novel approach to carry out fitness evaluation, namely Fitness Evaluation with Computational Centroids (FECC), is introduced. FECC gives the same fitness to any particle positions that lead to the same data partition. Therefore, it may save some computational effort and, above all, it can significantly improve the results obtained by using any of the best performing CVIs as the PSC fitness function. MD-PSO can also be used to evolve different neural networks. The results of training Multilayer Perceptrons (MLPs) using the common Backpropagation (BP) algorithm and a global technique based on PSO are compared. The pros and cons of BP and (MD-)PSO in MLP training are discussed. For training Radial Basis Function Neural Networks (RBFNNs), a novel technique based on class-specific clustering of the training samples is introduced. The proposed approach is compared to the common input and input-output clustering approaches and the benefits of using the class-specific approach are experimentally demonstrated. With the class-specific approach, the training complexity is reduced, while the classification performance of the trained RBFNNs may be improved. Collective Network of Binary Classifiers (CNBC) is an evolutionary semantic classifier consisting of several Networks of Binary Classifiers (NBCs) trained to recognize a certain semantic class. NBCs in turn consist of several Binary Classifiers (BCs), which are trained for a certain feature type. Thanks to its topology and the use of MD-PSO as its evolution technique, incremental training can be easily applied to add new training items, classes, and/or features. In feature synthesis, the objective is to exploit ground truth information to transform the original low-level features into more discriminative ones. To learn an efficient synthesis for a dataset, only a fraction of the data needs to be labeled. The learned synthesis can then be applied on unlabeled data to improve classification or retrieval results. 
In this thesis, two different feature synthesis techniques are introduced. In the first one, MD-PSO is directly used to find proper arithmetic operations to be applied on the elements of the original low-level feature vectors. In the second approach, feature synthesis is carried out using one-against-all perceptrons. In the latter technique, the best results were obtained when MD-PSO was used to train the perceptrons. In all the mentioned applications excluding MLP training, MD-PSO is used together with FGBF. Overall, MD-PSO and FGBF are indeed versatile tools in machine learning. However, computational limitations constrain their use in currently emerging machine learning systems operating on Big Data. Therefore, in the future, it is necessary to divide complex tasks into smaller subproblems and to conquer the large problems via solving the subproblems where the use of MD-PSO and FGBF becomes feasible. Several applications discussed in this thesis already exploit the divide-and-conquer operation model.
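
For readers unfamiliar with the basic PSO that MD-PSO and FGBF extend, the following is a minimal sketch of a common global-best, inertia-weight variant. It does not implement MD-PSO's dimensional search or FGBF. The function name `pso_minimize`, the parameter values, and the position clamping are illustrative assumptions.

```python
import random


def pso_minimize(f, dim, bounds=(-5.0, 5.0), n_particles=20, iters=200,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize f over a box using global-best PSO with an inertia weight."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]              # personal best positions
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]   # global best so far
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + pull towards the personal
                # best + pull towards the global best.
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move and clamp the position to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

Here `dim` is fixed in advance; according to the abstract, MD-PSO removes exactly that restriction by running a second, dimensional PSO process alongside the positional one.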

    Spatial and Content-based Audio Processing using Stochastic Optimization Methods

    Stochastic optimization (SO) represents a category of numerical optimization approaches in which the search for the optimal solution involves randomness in a constructive manner. As also shown in this thesis, stochastic optimization techniques and models have become an important and notable paradigm in a wide range of application areas, including transportation models, financial instruments, and network design. Stochastic optimization is especially developed for solving problems that are either too difficult or impossible to solve analytically by deterministic optimization approaches. In this thesis, the focus is put on applying several stochastic optimization algorithms to two audio-specific application areas, namely sniper positioning and content-based audio classification and retrieval. In short, the first application belongs to the area of spatial audio, whereas the latter is a topic of machine learning and, more specifically, multimedia information retrieval. The SO algorithms considered in the thesis are particle filtering (PF), particle swarm optimization (PSO), and simulated annealing (SA), which are extended, combined and applied to the specified problems in a novel manner. Because of their iterative and evolving nature, PSO algorithms in particular are often included in the category of evolutionary algorithms. For the sniper positioning application, the PF and SA algorithms are employed to optimize the parameters of a mathematical shock wave model based on observed firing event wavefronts. Such an inverse problem is suitable for a Bayesian approach, which is the main motivation for including the PF approach among the considered optimization methods. It is shown, also with SA, that by applying the stated shock wave model, the proposed stochastic parameter estimation approach provides statistically reliable and qualified results. 
The content-based audio classification part of the thesis is based on a dedicated framework consisting of several individual binary classifiers. In this work, artificial neural networks (ANNs) are used within the framework, for which the parameters and network structures are optimized based on the desired item outputs, i.e. the ground truth class labels. The optimization process is carried out using a multi-dimensional extension of the regular PSO algorithm (MD PSO). The audio retrieval experiments are performed in the context of feature generation (synthesis), which is an approach for generating new audio features/attributes based on some conventional features originally extracted from a particular audio database. Here the MD PSO algorithm is applied to optimize the parameters of the feature generation process, wherein the dimensionality of the generated feature vector is also optimized. Both from a practical perspective and from the viewpoint of complexity theory, stochastic optimization techniques are often computationally demanding. Because of this, the practical implementations discussed in this thesis are designed to be directly applicable to parallel computing. This is an important and topical issue considering the continuous increase of computing grids and cloud services. Indeed, many of the results achieved in this thesis are computed using a grid of several computers. Furthermore, since personal computers and mobile handsets also include an increasing number of processor cores, such parallel implementations are not limited to grid servers only.

    An Adjectival Interface for procedural content generation

    In this thesis, a new interface for the generation of procedural content is proposed, in which the user describes the content that they wish to create by using adjectives. Procedural models are typically controlled by complex parameters and often require expert technical knowledge. Since people communicate with each other using language, an adjectival interface to the creation of procedural content is a natural step towards addressing the needs of non-technical and non-expert users. The key problem addressed is that of establishing a mapping between adjectival descriptors and the parameters employed by procedural models. We show how this can be represented as a mapping between two multi-dimensional spaces, adjective space and parameter space, and approximate the mapping by applying novel function approximation techniques to points of correspondence between the two spaces. These corresponding point pairs are established through a training phase, in which random procedural content is generated and then described, allowing one to map from parameter space to adjective space. Since we ultimately seek a means of mapping from adjective space to parameter space, particle swarm optimisation is employed to select a point in parameter space that best matches any given point in adjective space. The overall result is a system in which the user can specify adjectives that are then used to create appropriate procedural content, by mapping the adjectives to a suitable set of procedural parameters and employing the standard procedural technique using those parameters as inputs. In this way, none of the control offered by procedural modelling is sacrificed: although the adjectival interface is simpler, it can at any point be stripped away to reveal the standard procedural model and give users access to the full set of procedural parameters. 
As such, the adjectival interface can be used for rapid prototyping to create an approximation of the content desired, after which the procedural parameters can be used to fine-tune the result. The adjectival interface also serves as a means of intermediate bridging, affording users a more comfortable interface until they are fully conversant with the technicalities of the underlying procedural parameters. Finally, the adjectival interface is compared and contrasted to an interface that allows for direct specification of the procedural parameters. Through user experiments, it is found that the adjectival interface presented in this thesis is not only easier to use and understand, but also that it produces content which more accurately reflects users' intentions.

    Artificial intelligence approaches for materials-by-design of energetic materials: state-of-the-art, challenges, and future directions

    Artificial intelligence (AI) is rapidly emerging as an enabling tool for solving various complex materials design problems. This paper aims to review recent advances in AI-driven materials-by-design and their applications to energetic materials (EM). Trained with data from numerical simulations and/or physical experiments, AI models can assimilate trends and patterns within the design parameter space, identify optimal material designs (micro-morphologies, combinations of materials in composites, etc.), and point to designs with superior/targeted property and performance metrics. We review approaches focusing on such capabilities with respect to the three main stages of materials-by-design, namely representation learning of microstructure morphology (i.e., shape descriptors), structure-property-performance (S-P-P) linkage estimation, and optimization/design exploration. We provide a perspective view of these methods in terms of their potential, practicality, and efficacy towards the realization of materials-by-design. Specifically, methods in the literature are evaluated in terms of their capacity to learn from a small/limited number of data, computational complexity, generalizability/scalability to other material species and operating conditions, interpretability of the model predictions, and the burden of supervision/data annotation. Finally, we suggest a few promising future research directions for EM materials-by-design, such as meta-learning, active learning, Bayesian learning, and semi-/weakly-supervised learning, to bridge the gap between machine learning research and EM research