74 research outputs found

    Evolutionary Computation and QSAR Research

    [Abstract] The successful high-throughput screening of molecule libraries for a specific biological property is one of the main improvements in drug discovery. Virtual molecular filtering and screening relies heavily on quantitative structure-activity relationship (QSAR) analysis, which builds a mathematical model that correlates the activity of a molecule with its molecular descriptors. QSAR models have the potential to reduce the costly failure of drug candidates in advanced (clinical) stages by filtering combinatorial libraries, eliminating candidates with a predicted toxic effect or poor pharmacokinetic profile, and reducing the number of experiments. To obtain a predictive and reliable QSAR model, scientists use methods from various fields such as molecular modeling, pattern recognition, machine learning and artificial intelligence. QSAR modeling relies on three main steps: codification of molecular structure into molecular descriptors, selection of the variables relevant to the analyzed activity, and search for the optimal mathematical model that correlates the molecular descriptors with a specific activity. Since a variety of techniques from statistics and artificial intelligence can aid the variable selection and model building steps, this review focuses on the evolutionary computation methods supporting these tasks. It explains the basics of genetic algorithms and genetic programming as evolutionary computation approaches, the selection methods for high-dimensional data in QSAR, the methods to build QSAR models, the current evolutionary feature selection methods and their applications in QSAR, and the future trend towards joint or multi-task feature selection methods. Funding: Instituto de Salud Carlos III (PIO52048; RD07/0067/0005); Ministerio de Industria, Comercio y Turismo (TSI-020110-2009-53); Galicia. Consellería de Economía e Industria (10SIN105004P).
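
    A minimal sketch of the evolutionary variable-selection step described in this abstract, using a genetic algorithm over binary descriptor masks. The toy fitness function, the population size, the mutation rate and the names `fitness`, `evolve` and `RELEVANT` are illustrative assumptions, not taken from the review; a real QSAR workflow would score each mask by something like the cross-validated error of the fitted model.

```python
import random

# Hypothetical toy problem: 10 descriptors, of which indices 0, 3 and 7
# are assumed to be the relevant ones. Not taken from the review.
N_DESCRIPTORS = 10
RELEVANT = {0, 3, 7}


def fitness(mask):
    """Toy score: reward selected relevant descriptors, penalise extras."""
    selected = {i for i, bit in enumerate(mask) if bit}
    return 2 * len(selected & RELEVANT) - len(selected - RELEVANT)


def tournament(population, rng, k=3):
    """Pick the best of k randomly drawn individuals."""
    return max(rng.sample(population, k), key=fitness)


def crossover(a, b, rng):
    """One-point crossover of two parent masks."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(mask, rng, rate=0.05):
    """Flip each bit independently with probability `rate`."""
    return [bit ^ 1 if rng.random() < rate else bit for bit in mask]


def evolve(generations=60, pop_size=30, seed=0):
    """Run a small generational GA with elitism and return the best mask."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(N_DESCRIPTORS)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        elite = max(population, key=fitness)  # elitism: keep the best mask
        children = [mutate(crossover(tournament(population, rng),
                                     tournament(population, rng), rng), rng)
                    for _ in range(pop_size - 1)]
        population = [elite] + children
    return max(population, key=fitness)


best = evolve()
print(best, fitness(best))
```

    Each individual is a list of 0/1 flags, one per descriptor; keeping the elite mask in every generation means the best score never decreases from one generation to the next.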

    Bio-AIMS collection of chemoinformatics web tools based on molecular graph information and artificial intelligence models

    [Abstract] Encoding molecular information into molecular descriptors is the first step of in silico chemoinformatics methods in drug design. Machine learning methods are a complex solution for finding prediction models for specific biological properties of molecules. These models connect molecular structure information, such as atom connectivity (molecular graphs) or the physical-chemical properties of an atom or group of atoms, to the molecular activity (Quantitative Structure-Activity Relationship, QSAR). Because of the complexity of proteins, predicting their activity is a complicated task and the models are more difficult to interpret. The current review presents a series of 11 prediction models for proteins, implemented as free Web tools on an Artificial Intelligence Model Server in Biosciences, Bio-AIMS (http://bio-aims.udc.es/TargetPred.php). Six tools predict protein activity, two models evaluate drug-protein target interactions and the other three calculate protein-protein interactions. The input information is based on the protein 3D structure for nine models, the 1D peptide amino acid sequence for three tools and drug SMILES formulas for two servers. The molecular graph descriptor-based machine learning models could be useful tools for in silico screening of new peptides/proteins as future drug targets for specific treatments. Funding: Red Gallega de Investigación y Desarrollo de Medicamentos (R2014/025); Instituto de Salud Carlos III (PI13/0028).

    How to find simple and accurate rules for viral protease cleavage specificities

    [Abstract] Background: Proteases of human pathogens are becoming increasingly important drug targets, hence it is necessary to understand their substrate specificity and to interpret this knowledge in practically useful ways. New methods are being developed that produce large amounts of cleavage information for individual proteases and some have been applied to extract cleavage rules from data. However, the hitherto proposed methods for extracting rules have been neither easy to understand nor very accurate. To be practically useful, cleavage rules should be accurate, compact, and expressed in an easily understandable way. Results: A new method is presented for producing cleavage rules for viral proteases with seemingly complex cleavage profiles. The method is based on orthogonal search-based rule extraction (OSRE) combined with spectral clustering. It is demonstrated on substrate data sets for human immunodeficiency virus type 1 (HIV-1) protease and hepatitis C virus (HCV) NS3/4A protease, showing excellent prediction performance for both HIV-1 cleavage and HCV NS3/4A cleavage, agreeing with observed HCV genotype differences. New cleavage rules (consensus sequences) are suggested for HIV-1 and HCV NS3/4A cleavages. The practical usability of the method is also demonstrated by using it to predict the location of an internal cleavage site in the HCV NS3 protease and to correct the location of a previously reported internal cleavage site in the HCV NS3 protease. The method is fast to converge and yields accurate rules, on par with previous results for HIV-1 protease and better than the previous state of the art for HCV NS3/4A protease. Moreover, the rules are fewer and simpler than those previously obtained with rule extraction methods. Conclusion: A rule extraction methodology that searches for multivariate low-order predicates yields results that significantly outperform existing rule bases on out-of-sample data, yet are more transparent to expert users. The approach yields rules that are easy to use and useful for interpreting experimental data.

    Cat Swarm based Optimization of Gene Expression Data Classification

    [Abstract] An Artificial Neural Network (ANN) can provide solutions to various complex problems. The generalization ability of an ANN, which stems from its massively parallel processing capability, can be used to learn the patterns present in a data set, and these patterns can be represented as a set of rules. The rules can then be used to solve a classification problem. However, the learning ability of the ANN is degraded by the high dimensionality of the datasets. To reduce this risk, we use Principal Component Analysis (PCA) and Factor Analysis (FA), which provide a feature-reduced dataset to the classifier, a Multi Layer Perceptron (MLP). In addition, since the weight matrices are randomly initialized, we use the Cat Swarm Optimization (CSO) method to update the values of the weight matrix. The experimental evaluation shows that using CSO with the MLP classifier provides better classification accuracy than using the classifier alone.

    An improved bees algorithm local search mechanism for numerical dataset

    The Bees Algorithm (BA), a heuristic optimization procedure, is one of the fundamental search techniques and is based on the food foraging activities of bees. The algorithm performs a kind of exploitative neighbourhood search combined with random explorative search. However, the main issue with the BA is that it requires a long computational time and numerous computational processes to obtain a good solution, especially for more complicated problems. The approach does not guarantee an optimum solution, mainly because of a lack of accuracy. To address this issue, the local search in the BA has been investigated: simple swap, 2-Opt and 3-Opt were proposed as the Massudi methods for Bees Algorithm Feature Selection (BAFS). This study proposes 4-Opt as an extension of the search neighbourhood. The proposal was implemented, and the performance of the methods was comprehensively compared and analysed with respect to accuracy and time. Furthermore, the feature selection algorithm was implemented and tested using the most popular datasets from the UCI Machine Learning Repository. The experimental results confirmed that the proposed extension of the search neighbourhood with the 4-Opt approach provides better accuracy in a suitable time than the Massudi methods.

    Investigating the structural diversity within a committee of classifiers and their generalization performance

    This study investigates measures of diversity within ensembles of classifiers. Neural networks are used, and ensemble diversity is measured with statistical and ecological methods and, to some extent, information theory. A new way of looking at ensemble diversity is proposed, called ensemble structural diversity: the study is concerned with the diversity within the structure of the individual classifiers forming an ensemble, not with the diversity of their outcomes. Ensemble structural diversity was induced within the ensemble by varying the structural parameters (learning parameters) of the artificial machines (classifiers). The usefulness of these measures was judged by comparing the measure of structural diversity with the ensemble generalization performance, so that the robustness of the idea of structural diversity and its relationship with ensemble generalization performance could be assessed. It was found that diversity can be induced by building ensembles with different structural and implicit (e.g. learning) parameters, and that this diversity does influence the predictive ability of the ensemble. This is consistent with the literature, even though the literature views ensemble diversity from the outputs rather than the structure of the individual classifiers. As the structural diversity increased, so did the generalization performance. However, beyond a certain point, further increases in structural diversity decreased the generalization performance of the ensemble. This makes sense, because too much diversity within the ensemble might mean that no consensus is reached at all. A disadvantage of comparing structural diversity with the generalization performance (accuracy) of the ensemble is that an ensemble can be structurally diverse even though all of its classifiers approximate the same function, in which case structural diversity is meaningless for improving the accuracy of the ensemble. The use of ensemble structural diversity measures in developing efficient ensembles remains to be explored. This study has nevertheless shown that diversity can be measured from the structural parameters, and that quantifying structural diversity makes the notion of diversity less abstract and makes it possible to map a relationship between structural diversity and accuracy. It was observed that structural diversity does improve the accuracy of the ensemble, but only within a limited region of structural diversity.
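
    As an illustration of quantifying diversity from structure rather than from outputs, the sketch below computes a Shannon diversity index over the structural configurations of ensemble members. Treating each distinct (hidden units, learning rate) pair as a "species", and choosing the Shannon index specifically, are assumptions made for this example; the abstract only states that statistical, ecological and information-theoretic measures were used.

```python
import math
from collections import Counter


def shannon_diversity(configs):
    """Shannon index H = sum(p_i * ln(1 / p_i)) over distinct configurations."""
    counts = Counter(configs)
    total = len(configs)
    return sum((n / total) * math.log(total / n) for n in counts.values())


# Hypothetical ensembles: each member is (hidden_units, learning_rate).
identical = [(10, 0.1)] * 4
mixed = [(10, 0.1), (10, 0.1), (20, 0.1), (20, 0.01)]
all_different = [(5, 0.1), (10, 0.1), (20, 0.1), (40, 0.01)]

for name, ensemble in [("identical", identical), ("mixed", mixed),
                       ("all_different", all_different)]:
    print(name, round(shannon_diversity(ensemble), 3))
```

    The index is zero when every member shares one configuration and reaches ln(N) when all N members differ, which gives a single number that could be plotted against ensemble accuracy.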

    Coarse-grained modelling of protein structure and internal dynamics: comparative methods and applications

    The first chapter is devoted to a brief summary of the basic techniques commonly used to characterise proteins' internal dynamics, and to perform those primary analyses which are the basis for our further developments. To this purpose we recall the basics of Principal Component Analysis of the covariance matrix of molecular dynamics (MD) trajectories. The overview is aimed at motivating and justifying a posteriori the introduction of coarse-grained models of proteins. In the second chapter we shall discuss dynamical features shared by different conformers of a protein. We shall review previously obtained results concerning the universality of the vibrational spectrum of globular proteins and the self-similar free energy landscape of specific molecules, namely the G-protein and Adk. Finally, a novel technique will be discussed, based on the theory of Random Matrices, to extract the robust collective coordinates in a set of protein conformers by comparison with a stochastic reference model. The third chapter reports on an extensive investigation of protein internal dynamics modelled in terms of the relative displacement of quasi-rigid groups of amino acids. Making use of the results obtained in the previous chapters, we shall discuss the development of a strategy to optimally partition a protein into units, or domains, whose internal strain is negligible compared to their relative fluctuation. These partitions will be used in turn to characterise the dynamical properties of proteins in the framework of a simplified, coarse-grained description of their motion. In the fourth chapter we shall report on the possibility of using the collective fluctuations of proteins as a guide to recognise relationships between them that may not be captured as significant when sequence or structural alignment methods are used. We shall review a method to perform the superposition of two proteins that optimises the similarity of the structures as well as the dynamical consistency of the aligned regions; we shall then discuss a generalisation of this scheme to accelerate the dynamics-based alignment, with a view to dataset-wide applications. Finally, the fifth chapter focuses on a different topic, namely the occurrence of topologically entangled states (knots) in proteins. Specifically, we shall investigate the sequence and structural properties of knotted proteins, reporting on an exhaustive dataset-wide comparison with unknotted ones. The correspondence, or the lack thereof, between knotted and unknotted proteins allowed us to identify, in knotted chains, small segments of the backbone whose 'virtual' excision results in an unknotted structure. These 'knot-promoting' loops are thus hypothesised to be involved in the formation of the protein knot, which in turn is likely to play some role in the biological function of the knotted proteins.

    Protein function and inhibitor prediction by statistical learning approach

    Ph.D. (Doctor of Philosophy)

    The use of machine learning to improve the effectiveness of ANRS in predicting HIV drug resistance.

    Master of TeleHealth in Medical Informatics. University of KwaZulu-Natal, Durban, 2016. BACKGROUND: HIV has placed a large burden of disease on developing countries. HIV drug resistance is inevitable due to selective pressure. Computer algorithms have been proven to help in determining optimal treatment for patients with HIV drug resistance. One such algorithm is the ANRS gold standard interpretation algorithm developed by the French National Agency for AIDS Research AC11 Resistance group. OBJECTIVES: The aim of this study is to investigate the possibility of improving the accuracy of the ANRS gold standard in predicting HIV drug resistance. METHODS: Data consisting of genome sequences and an HIV drug resistance measure were obtained from the Stanford HIV database. Machine learning factor analysis was performed to determine sequence positions where mutations lead to drug resistance. Sequence positions not found in ANRS were added to the ANRS rules and the accuracy was recalculated. RESULTS: The machine learning algorithm found sequence positions that are not associated with ANRS but that the model suggests are important in the prediction of HIV drug resistance. Preliminary results show that 10 sequence positions not associated with ANRS rules were found for IDV, 4 for LPV, and 8 for NFV. For NFV, ANRS misclassified 74 resistant profiles as being susceptible to the ARV. Sixty-eight of the 74 sequences (92%) were classified as resistant with the inclusion of the eight new sequence positions. No change was found for LPV, and a 78% improvement was associated with IDV. CONCLUSION: The study shows that there is a possibility of improving ANRS accuracy.
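
    The NFV figure quoted in this abstract can be checked directly from the reported counts: 68 of the 74 resistant profiles that ANRS had classified as susceptible were classified as resistant once the eight new positions were included. The variable names below are illustrative only.

```python
# Counts reported in the abstract for NFV.
misclassified_by_anrs = 74         # resistant profiles ANRS called susceptible
recovered_with_new_positions = 68  # of those, now classified as resistant

recovered_fraction = recovered_with_new_positions / misclassified_by_anrs
print(f"{recovered_fraction:.0%}")  # prints "92%"
```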