4,531 research outputs found

    Chemoinformatics Research at the University of Sheffield: A History and Citation Analysis

    This paper reviews the work of the Chemoinformatics Research Group in the Department of Information Studies at the University of Sheffield, focusing particularly on the work carried out in the period 1985-2002. Four major research areas are discussed, these involving the development of methods for: substructure searching in databases of three-dimensional structures, including both rigid and flexible molecules; the representation and searching of the Markush structures that occur in chemical patents; similarity searching in databases of both two-dimensional and three-dimensional structures; and compound selection and the design of combinatorial libraries. An analysis of citations to 321 publications from the Group shows that it attracted a total of 3725 residual citations during the period 1980-2002. These citations appeared in 411 different journals, and involved 910 different citing organizations from 54 different countries, thus demonstrating the widespread impact of the Group's work.

    Machine learning-guided directed evolution for protein engineering

    Machine learning (ML)-guided directed evolution is a new paradigm for biological design that enables optimization of complex functions. ML methods use data to predict how sequence maps to function without requiring a detailed model of the underlying physics or biological pathways. To demonstrate ML-guided directed evolution, we introduce the steps required to build ML sequence-function models and use them to guide engineering, making recommendations at each stage. This review covers basic concepts relevant to using ML for protein engineering as well as the current literature and applications of this new engineering paradigm. ML methods accelerate directed evolution by learning from information contained in all measured variants and using that information to select sequences that are likely to be improved. We then provide two case studies that demonstrate the ML-guided directed evolution process. We also look to future opportunities where ML will enable discovery of new protein functions and uncover the relationship between protein sequence and function. Comment: Made significant revisions to focus on aspects most relevant to applying machine learning to speed up directed evolution.

    Antibody fragments as probe in biosensor development

    Today's proteomic analyses are generating increasing numbers of biomarkers, making it essential to possess highly specific probes able to recognize those targets. Antibodies are considered to be the first choice as molecular recognition units due to their target specificity and affinity, which make them excellent probes in biosensor development. However, several problems, such as difficult directional immobilization, unstable behavior, loss of specificity and steric hindrance, may arise from using these large molecules. Luckily, protein engineering techniques offer designed antibody formats suitable for biomarker analysis. Minimization strategies of antibodies into Fab fragments, scFv or even single-domain antibody fragments like VH, VL or VHHs are reviewed. Not only the size of the probe but also other issues like choice of immobilization tag, type of solid support and probe stability are of critical importance in assay development for biosensing. In this respect, multiple approaches to specifically orient and couple antibody fragments in a generic one-step procedure directly on a biosensor substrate are discussed.

    Software for Implementing the Sequential Elimination of Level Combinations Algorithm

    Genetic algorithms (GAs) are a popular technology to search for an optimum in a large search space. Using new concepts of forbidden array and weighted mutation, Mandal, Wu, and Johnson (2006) used elements of GAs to introduce a new global optimization technique called sequential elimination of level combinations (SELC), that efficiently finds optimums. A SAS macro, and MATLAB and R functions are developed to implement the SELC algorithm.

    Selecting RNA aptamers for synthetic biology: investigating magnesium dependence and predicting binding affinity.

    The ability to generate RNA aptamers for synthetic biology using in vitro selection depends on the informational complexity (IC) needed to specify functional structures that bind target ligands with desired affinities in physiological concentrations of magnesium. We investigate how selection for high-affinity aptamers is constrained by chemical properties of the ligand and the need to bind in low magnesium. We select two sets of RNA aptamers that bind planar ligands with dissociation constants (Kd values) ranging from 65 nM to 100 µM in physiological buffer conditions. Aptamers selected to bind the non-proteinogenic amino acid, p-amino phenylalanine (pAF), are larger and more informationally complex (i.e., rarer in a pool of random sequences) than aptamers selected to bind a larger fluorescent dye, tetramethylrhodamine (TMR). Interestingly, tighter binding aptamers show less dependence on magnesium than weaker-binding aptamers. Thus, selection for high-affinity binding may automatically lead to structures that are functional in physiological conditions (1-2.5 mM Mg²⁺). We hypothesize that selection for high-affinity binding in physiological conditions is primarily constrained by ligand characteristics such as molecular weight (MW) and the number of rotatable bonds. We suggest that it may be possible to estimate aptamer-ligand affinities and predict whether a particular aptamer-based design goal is achievable before performing the selection.

    Intelligent data acquisition for drug design through combinatorial library design

    A problem that occurs in machine learning methods for drug discovery is a need for standardized data. Methods and interest exist for producing new data, but due to material and budget constraints it is desirable that each iteration of producing data is as efficient as possible. In this thesis, we present two papers detailing methods for different problems in selecting data to produce. We investigate Active Learning for models that use the margin in model decisiveness to measure the model uncertainty to guide data acquisition. We demonstrate that the models perform better with Active Learning than with random acquisition of data, independent of machine learning model and starting knowledge. We also study the multi-objective optimization problem of combinatorial library design. Here we present a framework that could process the output of generative models for molecular design and give an optimized library design. The results show that the framework successfully optimizes a library based on molecule availability, which the framework also attempts to identify using retrosynthesis prediction. We conclude that the next step in intelligent data acquisition is to combine the two methods and create a library design model that uses the information of previous libraries to guide subsequent designs.
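    The margin-based acquisition mentioned in this abstract can be illustrated with a minimal sketch. It assumes that "margin" means the gap between the two highest predicted class probabilities, a common definition that the abstract itself does not spell out. The function names and the toy probabilities are hypothetical and are not taken from the thesis.

```python
from typing import List, Sequence


def margin(probs: Sequence[float]) -> float:
    """Gap between the two largest class probabilities (needs >= 2 classes).

    A small gap means the model is undecided between its top two classes.
    This definition is an assumption, not taken from the thesis.
    """
    top_two = sorted(probs, reverse=True)[:2]
    return top_two[0] - top_two[1]


def select_by_margin(pool_probs: List[Sequence[float]], batch_size: int) -> List[int]:
    """Return indices of the `batch_size` pool items with the smallest margin."""
    ranked = sorted(range(len(pool_probs)), key=lambda i: margin(pool_probs[i]))
    return ranked[:batch_size]


# Hypothetical predicted class probabilities for four unlabeled compounds.
pool = [
    [0.95, 0.05],  # confident prediction, large margin
    [0.55, 0.45],  # uncertain
    [0.70, 0.30],
    [0.51, 0.49],  # most uncertain, smallest margin
]
chosen = select_by_margin(pool, batch_size=2)
print(chosen)
```

    In an active learning loop, the selected items would be sent for measurement, added to the training set, and the model retrained before the next round of selection.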

    Evolutionary Computation and QSAR Research

    The successful high throughput screening of molecule libraries for a specific biological property is one of the main improvements in drug discovery. The virtual molecular filtering and screening relies greatly on quantitative structure-activity relationship (QSAR) analysis, a mathematical model that correlates the activity of a molecule with molecular descriptors. QSAR models have the potential to reduce the costly failure of drug candidates in advanced (clinical) stages by filtering combinatorial libraries, eliminating candidates with a predicted toxic effect and poor pharmacokinetic profiles, and reducing the number of experiments. To obtain a predictive and reliable QSAR model, scientists use methods from various fields such as molecular modeling, pattern recognition, machine learning or artificial intelligence. QSAR modeling relies on three main steps: molecular structure codification into molecular descriptors, selection of relevant variables in the context of the analyzed activity, and search of the optimal mathematical model that correlates the molecular descriptors with a specific activity. Since a variety of techniques from statistics and artificial intelligence can aid variable selection and model building steps, this review focuses on the evolutionary computation methods supporting these tasks. Thus, this review explains the basics of genetic algorithms and genetic programming as evolutionary computation approaches, the selection methods for high-dimensional data in QSAR, the methods to build QSAR models, the current evolutionary feature selection methods and applications in QSAR, and the future trends in joint or multi-task feature selection methods.
    Funding: Instituto de Salud Carlos III, PIO52048; Instituto de Salud Carlos III, RD07/0067/0005; Ministerio de Industria, Comercio y Turismo, TSI-020110-2009-53; Galicia. Consellería de Economía e Industria, 10SIN105004P

    Translation of Random Transcripts Generated by TdT: Potential Use in Polysome Peptide Libraries

    A thesis presented to the faculty of the College of Science and Technology at Morehead State University in partial fulfillment of the requirements for the Degree of Master of Science in Biology by Michael Lane Spencer on July 22, 1998