150 research outputs found

Chemoinformatics Research at the University of Sheffield: A History and Citation Analysis

Author: Bishop N.
Gillet V.J.
Holliday J.D.
Willett P.
Publication venue: 'SAGE Publications'
Publication date: 01/07/2003

This paper reviews the work of the Chemoinformatics Research Group in the Department of Information Studies at the University of Sheffield, focusing particularly on the work carried out in the period 1985-2002. Four major research areas are discussed, each involving the development of methods for: substructure searching in databases of three-dimensional structures, including both rigid and flexible molecules; the representation and searching of the Markush structures that occur in chemical patents; similarity searching in databases of both two-dimensional and three-dimensional structures; and compound selection and the design of combinatorial libraries. An analysis of citations to 321 publications from the Group shows that it attracted a total of 3725 residual citations during the period 1980-2002. These citations appeared in 411 different journals, and involved 910 different citing organizations from 54 different countries, thus demonstrating the widespread impact of the Group's work.

Crossref

White Rose Research Online

The computer storage, retrieval and searching of generic structures in chemical patents : the machine-readable representation of generic structures.

Author: Barnard John Mordaunt
Publication venue: 'University of Sheffield Conference Proceedings'
Publication date: 01/01/1983

The nature of the generic chemical structures found in patents is described, with a discussion of the types of statement commonly found in them. The available representations for such structures are reviewed, with particular note being given to the suitability of the representation for searching files of such structures. Requirements for the unambiguous representation of generic structures in an "ideal" storage and retrieval system are discussed. The basic principles of the theory of formal languages are reviewed, with particular consideration being given to parsing methods for context-free languages. The grammar and parsing of computer programming languages, as an example of artificial formal languages, is discussed. Applications of formal language theory to chemistry and information work are briefly reviewed. GENSAL, a formal language for the unambiguous description of generic structures from patents, is presented. It is designed to be intelligible to a chemist or patent agent, yet sufficiently formalised to be amenable to computer analysis. Detailed description is given of the facilities it provides for generic structure representation, and there is discussion of its limitations and the principles behind its design. A connection-table-based internal representation for generic structures, called an ECTR (Extended Connection Table Representation), is presented. It is designed to represent generic structures unambiguously, and to be generated automatically from structures encoded in GENSAL. It is compared to other proposed representations, and its implementation using data types of the programming language Pascal described. An interpreter program which generates an ECTR from structures encoded in a subset of the GENSAL language is presented. The principles of its operation are described.
Possible applications of GENSAL outside the area of patent documentation are discussed, and suggestions made for further work on the development of a generic structure storage and retrieval system based on GENSAL and ECTRs.

White Rose E-theses Online

A survey of chemical information systems

Author: Dominick Wayne D.
Shaikh Aneesa Bashir

A survey of the features, functions, and characteristics of a wide variety of chemical information storage and retrieval systems currently in operation is given. The types of systems (together with an identification of the specific systems) addressed within this survey are as follows: patents and bibliographies (Derwent's Patent System; IFI Comprehensive Database; PULSAR); pharmacology and toxicology (Chemfile; PAGODE; CBF; HEEDA; NAPRALERT; MAACS); the chemical information system (CAS Chemical Registry System; SANSS; MSSS; CSEARCH; GINA; NMRLIT; CRYST; XTAL; PDSM; CAISF; RTECS Search System; AQUATOX; WDROP; OHMTADS; MLAB; Chemlab); spectra (OCETH; ASTM); crystals (CRYSRC); and physical properties (DETHERM). Summary characteristics and current trends in chemical information systems development are also examined.

NASA Technical Reports Server

Similarity Methods in Chemoinformatics

Author: Willett P.
Publication venue: 'Wiley'
Publication date: 01/01/2009

CiteSeerX

Crossref

White Rose Research Online

Chemical Information Bulletin

Author: American Chemical Society. Division of Chemical Information.
Vogel Teri M.
Publication venue: American Chemical Society. Division of Chemical Information.
Publication date: 01/11/2020

Periodic supplement for "the regular journals of the American Chemical Society," containing annotated bibliographies of chemical documentation literature as well as information about meetings, conferences, awards, scholarships, and other news from the American Chemical Society (ACS) Division of Chemical Literature.

UNT Digital Library

A review of molecular representation in the age of machine learning

Author: Goodman Jonathan M
Lapkin Alexei A
Wigh Daniel S
Publication venue: WIREs Computational Molecular Science
Publication date: 07/02/2022

Funder: UCB; Id: http://dx.doi.org/10.13039/100011110

Research in chemistry increasingly requires interdisciplinary work prompted by, among other things, advances in computing, machine learning, and artificial intelligence. Everyone working with molecules, whether chemist or not, needs an understanding of the representation of molecules in a machine‐readable format, as this is central to computational chemistry. Four classes of representations are introduced: string, connection table, feature‐based, and computer‐learned representations. Three of the most significant representations are simplified molecular‐input line‐entry system (SMILES), International Chemical Identifier (InChI), and the MDL molfile, of which SMILES was the first to successfully be used in conjunction with a variational autoencoder (VAE) to yield a continuous representation of molecules. This is noteworthy because a continuous representation allows for efficient navigation of the immensely large chemical space of possible molecules. Since 2018, when the first model of this type was published, considerable effort has been put into developing novel and improved methodologies. Most, if not all, researchers in the community make their work easily accessible on GitHub, though discussion of computation time and domain of applicability is often overlooked. Herein, we present questions for consideration in future work which we believe will make chemical VAEs even more accessible. This article is categorized under: Data Science > Chemoinformatics

Apollo (Cambridge)
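
The difference between two of the representation classes named in the abstract above (string and connection table) can be illustrated with a minimal sketch. It is not taken from the review: the hand-written table for ethanol, the `VALENCE` lookup, and the helper names `implicit_hydrogens` and `formula` are all assumptions made for illustration, and the valence model only covers neutral C, N and O.

```python
from collections import Counter

# Ethanol written two ways: as a SMILES string and as a connection table.
ETHANOL_SMILES = "CCO"
ETHANOL_ATOMS = ["C", "C", "O"]          # heavy atoms, indexed 0..2
ETHANOL_BONDS = [(0, 1, 1), (1, 2, 1)]   # (atom index, atom index, bond order)

# Usual valences of a few neutral elements (a simplifying assumption).
VALENCE = {"C": 4, "N": 3, "O": 2}


def implicit_hydrogens(atoms, bonds):
    """Hydrogens needed to fill each heavy atom's usual valence."""
    used = [0] * len(atoms)
    for a, b, order in bonds:
        used[a] += order
        used[b] += order
    return [VALENCE[element] - u for element, u in zip(atoms, used)]


def formula(atoms, bonds):
    """Molecular formula with elements in alphabetical order (C, H, N, O)."""
    counts = Counter(atoms)
    hydrogens = sum(implicit_hydrogens(atoms, bonds))
    if hydrogens:
        counts["H"] = hydrogens
    return "".join(
        f"{element}{n if n > 1 else ''}" for element, n in sorted(counts.items())
    )


print(ETHANOL_SMILES, "->", formula(ETHANOL_ATOMS, ETHANOL_BONDS))
```

The string form is compact and easy to store, while the table makes atoms and bonds directly addressable, which is what graph-based operations such as substructure matching need.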

Enhancing Reaction-based de novo Design using Machine Learning

Author: Ghiandoni Gian Marco
Publication venue: 'University of Sheffield Conference Proceedings'
Publication date: 01/12/2019

De novo design is a branch of chemoinformatics that is concerned with the rational design of molecular structures with desired properties, which specifically aims at achieving suitable pharmacological and safety profiles when applied to drug design. Scoring, construction, and search methods are the main components that are exploited by de novo design programs to explore the chemical space to encourage the cost-effective design of new chemical entities. In particular, construction methods are concerned with providing strategies for compound generation to address issues such as drug-likeness and synthetic accessibility. Reaction-based de novo design consists of combining building blocks according to transformation rules that are extracted from collections of known reactions, intending to restrict the enumerated chemical space into a manageable number of synthetically accessible structures. The reaction vector is an example of a representation that encodes topological changes occurring in reactions, which has been integrated within a structure generation algorithm to increase the chances of generating molecules that are synthesisable. The general aim of this study was to enhance reaction-based de novo design by developing machine learning approaches that exploit publicly available data on reactions. A series of algorithms for reaction standardisation, fingerprinting, and reaction vector database validation were introduced and applied to generate new data on which the entirety of this work relies. First, these collections were applied to the validation of a new ligand-based design tool. The tool was then used in a case study to design compounds which were eventually synthesised using very similar procedures to those suggested by the structure generator. A reaction classification model and a novel hierarchical labelling system were then developed to introduce the possibility of applying transformations by class. 
The model was augmented with an algorithm for confidence estimation, and was used to classify two datasets from industry and the literature. Results from the classification suggest that the model can be used effectively to gain insights on the nature of reaction collections. Classified reactions were further processed to build a reaction class recommendation model capable of suggesting appropriate reaction classes to apply to molecules according to their fingerprints. The model was validated, then integrated within the reaction vector-based design framework, which was assessed on its performance against the baseline algorithm. Results from the de novo design experiments indicate that the use of the recommendation model leads to a higher synthetic accessibility and a more efficient management of computational resources.

White Rose E-theses Online

Information extraction from chemical patents

Author: Jessop David M
Publication venue: University of Cambridge
Publication date: 15/03/2011

The automated extraction of semantic chemical data from the existing literature is demonstrated. For reasons of copyright, the work is focused on the patent literature, though the methods are expected to apply equally to other areas of the chemical literature. Hearst Patterns are applied to the patent literature in order to discover hyponymic relations describing chemical species. The acquired relations are manually validated to determine the precision of the determined hypernyms (85.0%) and of the asserted hyponymic relations (94.3%). It is demonstrated that the system acquires relations that are not present in the ChEBI ontology, suggesting that it could function as a valuable aid to the ChEBI curators. The relations discovered by this process are formalised using the Web Ontology Language (OWL) to enable re-use. PatentEye – an automated system for the extraction of reactions from chemical patents and their conversion to Chemical Markup Language (CML) – is presented. Chemical patents published by the European Patent Office over a ten-week period are used to demonstrate the capability of PatentEye – 4444 reactions are extracted with a precision of 78% and recall of 64% with regard to determining the identity and amount of reactants employed and an accuracy of 92% with regard to product identification. NMR spectra are extracted from the text using OSCAR3, which is developed to greatly increase recall. The resulting system is presented as a significant advancement towards the large-scale and automated extraction of high-quality reaction information. Extended Polymer Markup Language (EPML), a CML dialect for the description of Markush structures as they are presented in the literature, is developed. Software to exemplify and to enable substructure searching of EPML documents is presented. Further work is recommended to refine the language and code to publication-quality before they are presented to the community.

Apollo (Cambridge)
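
The Hearst-pattern step mentioned in the abstract above can be sketched roughly as follows. This is an illustrative example only, not the thesis's implementation: it handles a single pattern ("X such as A, B and C"), assumes the hypernym is at most two words long, and the regular expression, the function name `extract_hyponyms`, and the example sentence are all hypothetical.

```python
import re

# One classic Hearst pattern: "<hypernym> such as <hyponym>, <hyponym> and <hyponym>".
# Simplifying assumption: the hypernym is one or two words immediately before "such as".
SUCH_AS = re.compile(r"(?P<hypernym>\w+(?: \w+)?) such as (?P<hyponyms>[^.;]+)")

# Separators inside the enumeration: commas (optionally followed by and/or),
# or a bare "and"/"or" between two items.
SEPARATOR = re.compile(r",\s*(?:and |or )?|\s+(?:and|or)\s+")


def extract_hyponyms(sentence):
    """Return (hyponym, hypernym) pairs found by the 'such as' pattern."""
    pairs = []
    for match in SUCH_AS.finditer(sentence):
        hypernym = match.group("hypernym")
        for item in SEPARATOR.split(match.group("hyponyms")):
            if item.strip():
                pairs.append((item.strip(), hypernym))
    return pairs


sentence = "The reaction is carried out in polar solvents such as methanol, ethanol and water."
for hyponym, hypernym in extract_hyponyms(sentence):
    print(f"{hyponym} is a kind of {hypernym}")
```

Precision figures like those quoted in the abstract would then come from manually checking a sample of the extracted pairs and dividing the number judged correct by the number extracted.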

Chemical Information Bulletin

Author: American Chemical Society. Division of Chemical Information.
Korolev Svetlana
Publication venue: American Chemical Society. Division of Chemical Information.

Created as a supplement for "the regular journals of the American Chemical Society," this publication contains annotated bibliographies of chemical documentation literature as well as information about meetings, conferences, awards, scholarships, and other news from the American Chemical Society (ACS) Division of Chemical Information (CINF).

UNT Digital Library

Open Source Workflow Engine for Cheminformatics: From Data Curation to Data Analysis

Author: Kuhn Thomas
Publication date: 01/01/2009

The recent release of large open access chemistry databases into the public domain generates a demand for flexible tools to process them so as to discover new knowledge. To support Open Drug Discovery and Open Notebook Science on top of these data resources, it is desirable for the processing tools to be Open Source and available to everyone. The aim of this project was the development of an Open Source workflow engine to solve crucial cheminformatics problems. As a consequence, the CDK-Taverna project developed in the course of this thesis builds a cheminformatics workflow solution through the combination of different Open Source projects such as Taverna (workflow engine), the Chemistry Development Kit (CDK, cheminformatics library) and Pgchem::Tigress (chemistry database cartridge). The work on this project includes the implementation of over 160 different workers, which focus on cheminformatics tasks. The application of the developed methods to real world problems was the final objective of the project. The validation of Open Source software libraries and of chemical data derived from different databases is mandatory for all cheminformatics workflows. Methods to detect the atom types of chemical structures were used to validate the atom typing of the Chemistry Development Kit and to identify curation problems while processing different public databases, including the EBI drug databases ChEBI and ChEMBL as well as the natural products Chapman & Hall Chemical Database. The CDK atom typing lacks atom types for heavier atoms but meets the needs of databases containing organic substances including natural products. To support combinatorial chemistry an implementation of a reaction enumeration workflow was realized. It is based on generic reactions with lists of reactants and allows the generation of chemical libraries of up to O(1000) molecules.
Supervised machine learning techniques (perceptron-type artificial neural networks and support vector machines) were used as a proof of concept for quantitative modelling of adhesive polymer kinetics with the Mathematica GNWI.CIP package. This opens the perspective of an integration of high-level "experimental mathematics" into the CDK-Taverna based scientific pipelining. A chemical diversity analysis based on two public databases and one proprietary database, together including over 200,000 molecules, was a large-scale application of the methods developed. For the chemical diversity analysis different molecular properties are calculated using the Chemistry Development Kit. The analysis of these properties was performed with Adaptive-Resonance-Theory (ART 2-A algorithm) for an automatic unsupervised classification of open categorical problems. The result shows a similar coverage of the chemical space of the two databases containing natural products (one public, one proprietary) whereas the ChEBI database covers a distinctly different chemical space. As a consequence these comparisons reveal interesting white spots in the proprietary database. The combination of these results with pharmacological annotations of the molecules leads to further research and modelling activities.

Kölner UniversitätsPublikationsServer