Inductive queries for a drug designing robot scientist
It is increasingly clear that machine learning algorithms need to be integrated into an iterative scientific discovery loop, in which data are queried repeatedly by means of inductive queries and the computer guides the experiments being performed. In this chapter, we summarise several key challenges in integrating machine learning and data mining algorithms into methods for the discovery of Quantitative Structure Activity Relationships (QSARs). We introduce the concept of a robot scientist, in which all steps of the discovery process are automated; we discuss how to represent molecular data so that knowledge discovery tools can analyse it; and we discuss the adaptation of machine learning and data mining algorithms to guide QSAR experiments.
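As a minimal illustration of what an inductive query can look like, the sketch below asks for every molecular fragment that occurs in at least `min_active` active compounds and at most `max_inactive` inactive ones. It assumes each molecule has already been reduced to a set of fragment strings (the SMILES-like labels in the usage are hypothetical placeholders), and the name `inductive_query` is our own; a real system would generate candidate fragments from the molecular graphs themselves.

```python
from collections import Counter


def inductive_query(actives, inactives, min_active, max_inactive):
    """Return fragments frequent in actives (>= min_active) and rare in
    inactives (<= max_inactive). Each molecule is a set of fragment labels."""
    # Count in how many molecules (not how many times) each fragment occurs.
    pos = Counter(f for mol in actives for f in set(mol))
    neg = Counter(f for mol in inactives for f in set(mol))
    # Counter returns 0 for fragments never seen among the inactives.
    return sorted(f for f, n in pos.items()
                  if n >= min_active and neg[f] <= max_inactive)


actives = [{"c1ccccc1", "C(=O)O"}, {"c1ccccc1", "N"}, {"c1ccccc1", "C(=O)O", "Cl"}]
inactives = [{"C(=O)O"}, {"N"}, {"C(=O)O", "Br"}]
print(inductive_query(actives, inactives, 2, 0))
```

Tightening `max_inactive` narrows the answer to fragments that discriminate actives from inactives; in a discovery loop, such a query would be re-run as new assay results arrive.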
Parsimonious Kernel Fisher Discrimination
By applying recent results in optimization transfer, a new algorithm for kernel Fisher Discriminant Analysis is presented that uses a non-smooth penalty on the coefficients to yield a parsimonious solution. The algorithm is simple, easily programmed and is shown to perform as well as or better than a number of leading machine learning algorithms on a substantial benchmark. It is then applied to a set of extremely small-sample-size problems in virtual screening, where it is found to be less accurate than a current leading approach but still comparable in a number of cases.
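The sketch below illustrates the general idea under several assumptions of our own: it uses a least-squares formulation of kernel Fisher discrimination (regressing ±1 class labels onto an RBF kernel expansion), takes the non-smooth penalty to be an L1 penalty on the expansion coefficients, and uses the standard quadratic majoriser of |a| as the optimization-transfer (majorize-minimize) step. It is not necessarily the paper's algorithm; `fit_sparse_kfd`, `decision` and all parameter values are hypothetical. Only the standard library is used, so the linear solve is written out by hand.

```python
import math


def rbf_kernel(x, z, gamma):
    """Gaussian (RBF) kernel exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def fit_sparse_kfd(X, y, lam=0.1, gamma=0.5, iters=100, eps=1e-8):
    """Minimise ||y - K a - b||^2 + lam * sum_i |a_i| by majorize-minimize.

    |a_i| is majorised by a_i^2 / (2 w_i) + w_i / 2 with w_i = |a_i| from the
    previous iterate, so each step is a weighted ridge problem with a
    closed-form solution; eps keeps w_i away from zero.
    """
    n = len(X)
    K = [[rbf_kernel(xi, xj, gamma) for xj in X] for xi in X]
    KtK = [[sum(K[r][i] * K[r][j] for r in range(n)) for j in range(n)]
           for i in range(n)]
    Kt1 = [sum(K[r][i] for r in range(n)) for i in range(n)]
    Kty = [sum(K[r][i] * y[r] for r in range(n)) for i in range(n)]
    w = [1.0] * n
    alpha, b = [0.0] * n, 0.0
    for _ in range(iters):
        # Normal equations in (a, b) for the current quadratic majoriser.
        A = [KtK[i][:] + [Kt1[i]] for i in range(n)]
        for i in range(n):
            A[i][i] += lam / (2.0 * w[i])
        A.append(Kt1[:] + [float(n)])
        sol = solve(A, Kty + [float(sum(y))])
        alpha, b = sol[:n], sol[n]
        w = [max(abs(a), eps) for a in alpha]
    return alpha, b


def decision(x, X, alpha, b, gamma=0.5):
    """Kernel expansion f(x) = sum_i a_i k(x_i, x) + b; its sign gives the class."""
    return sum(a * rbf_kernel(xi, x, gamma) for a, xi in zip(alpha, X)) + b


X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
y = [1, 1, 1, -1, -1, -1]
alpha, b = fit_sparse_kfd(X, y, lam=0.1)
print(decision((0.5, 0.5), X, alpha, b) > 0)
```

Coefficients pushed towards zero receive ever larger quadratic weights and stay near zero, which is where the parsimony comes from; increasing `lam` should generally drop more training points from the expansion.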
Artificial Intelligence-Based Drug Design and Discovery
The hit-to-lead phase of drug discovery is a challenging task that requires simultaneously optimizing numerous factors, from maximizing compound activity and efficacy to minimizing toxicity and adverse reactions. Recent advances in artificial intelligence techniques enable drug candidates to be proposed efficiently in silico, prior to chemical synthesis and experimental evaluation. In this chapter, we present fundamental concepts of artificial intelligence and their application in drug design and discovery. The emphasis is on machine learning and deep learning, which have demonstrated extensive utility in many branches of computer-aided drug discovery, including de novo drug design, QSAR (Quantitative Structure–Activity Relationship) analysis, drug repurposing and chemical space visualization. We demonstrate how artificial intelligence techniques can be leveraged to develop chemoinformatics pipelines and present real-world case studies and practical applications in drug design and discovery. Finally, we discuss limitations and future directions to guide this rapidly evolving field.
Machine Learning for In Silico Virtual Screening and Chemical Genomics: New Strategies
Support vector machines and other kernel methods belong to a class of machine learning algorithms that has recently become prominent in both computational biology and computational chemistry, although the two fields have largely ignored each other. These methods rest on a mathematically sound and computationally efficient framework that implicitly embeds the data of interest (proteins and small molecules, respectively) in high-dimensional feature spaces, where various classification or regression tasks can be performed with linear algorithms. In this review, we present the main ideas underlying these approaches, survey how both the “biological” and the “chemical” spaces have been separately constructed using the same mathematical framework and tricks, and suggest different avenues for unifying both spaces for the purpose of in silico chemogenomics.
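One way to make the shared framework concrete, assumed here for illustration, is to pick one kernel per space and multiply them: a k-spectrum kernel on protein sequences, a Tanimoto kernel on ligand fingerprints (given as sets of on-bits), and their product as a kernel on (protein, ligand) pairs. Because the product of two valid kernels is itself a valid kernel, any standard kernel method could then be trained on the resulting Gram matrix. The kernel choices, function names and toy inputs are ours, not taken from the review.

```python
from collections import Counter


def spectrum_kernel(s, t, k=3):
    """k-spectrum kernel: dot product of the k-mer count vectors of two sequences."""
    cs = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    ct = Counter(t[i:i + k] for i in range(len(t) - k + 1))
    # Counter returns 0 for k-mers absent from t.
    return sum(n * ct[kmer] for kmer, n in cs.items())


def tanimoto_kernel(a, b):
    """Tanimoto kernel on sets of fingerprint bits (0.0 if both are empty)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def pair_kernel(pair1, pair2, k=3):
    """Tensor-product kernel on (protein sequence, ligand fingerprint) pairs."""
    (p1, m1), (p2, m2) = pair1, pair2
    return spectrum_kernel(p1, p2, k) * tanimoto_kernel(m1, m2)


def gram(items, kernel):
    """Gram matrix of a kernel over a list of items."""
    return [[kernel(u, v) for v in items] for u in items]


pairs = [("ACDE", {1, 2}), ("ACDF", {2, 3})]  # toy protein/ligand pairs
print(gram(pairs, pair_kernel))
```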
Similarity-based virtual screening using 2D fingerprints
This paper summarises recent work at the University of Sheffield on virtual screening methods that use 2D fingerprint measures of structural similarity. A detailed comparison of a large number of similarity coefficients demonstrates that the well-known Tanimoto coefficient remains the method of choice for computing fingerprint-based similarity, despite some inherent biases related to the size of the molecules being sought. Group fusion combines the results of similarity searches based on multiple reference structures and a single similarity measure. We demonstrate the effectiveness of this approach to screening, and also describe an approximate form of group fusion, turbo similarity searching, that can be used when just a single reference structure is available.
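A minimal sketch of the three ideas, assuming fingerprints are given as Python sets of "on" bit positions rather than generated by a cheminformatics toolkit. The Tanimoto coefficient is c / (a + b - c), where a and b are the numbers of bits set in each fingerprint and c the number set in both. For group fusion the sketch uses the MAX rule (each database molecule keeps its best score against any reference); other rules such as SUM are possible. Turbo similarity searching is approximated here, as an assumption, by treating the top-k nearest neighbours of the single reference as extra references. Function names and the value of k are illustrative.

```python
def tanimoto(a, b):
    """Tanimoto coefficient c / (a + b - c) for fingerprints given as sets of on-bits."""
    common = len(a & b)
    denom = len(a) + len(b) - common
    return common / denom if denom else 0.0  # two empty fingerprints: scored 0 here


def group_fusion(references, database):
    """MAX group fusion: each molecule keeps its highest similarity to any reference."""
    return {name: max(tanimoto(ref, fp) for ref in references)
            for name, fp in database.items()}


def rank(scores):
    """Names sorted by decreasing score (ties broken alphabetically)."""
    return sorted(scores, key=lambda name: (-scores[name], name))


def turbo_similarity(reference, database, k=5):
    """Approximate group fusion from one reference: its k nearest neighbours
    in the database are added as extra (assumed active) references."""
    first_pass = {name: tanimoto(reference, fp) for name, fp in database.items()}
    neighbours = rank(first_pass)[:k]
    return group_fusion([reference] + [database[n] for n in neighbours], database)


ref = frozenset({1, 2, 3, 4})
db = {"m1": frozenset({1, 2, 3, 4, 5}), "m2": frozenset({1, 2, 9}), "m3": frozenset({7, 8, 9})}
print(rank(turbo_similarity(ref, db, k=1)))
```

The neighbours score 1.0 against themselves after fusion, so they head the final ranking; the benefit of the method comes from how they reshape the scores of the remaining molecules.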
Cheminformatics: A Patentometric Analysis
Cheminformatics has entrenched itself as a core discipline within chemistry, biology and allied sciences, particularly in drug design, discovery and development. The article begins with a patent analysis of the field of cheminformatics from 1996 to early 2021 using the Relecura and Lens patent databases. It then describes patents across various domains and aspects. A mind map shows the landscape of the cheminformatics patent search. The results reveal patent counts by star rating and the trends in technological sub-areas of research. The article closes with a discussion of quantum clustering and promising future directions for cheminformatics. This study aims to provide direction to academics, technology enthusiasts, researchers, stakeholders and investors, and to raise awareness of the potential of cheminformatics and quantum clustering.