739 research outputs found
Similarity Methods in Chemoinformatics
The computer storage, retrieval and searching of generic structures in chemical patents : the machine-readable representation of generic structures.
The nature of the generic chemical structures found in patents is
described, with a discussion of the types of statement commonly
found in them. The available representations for such structures
are reviewed, with particular note being given to the suitability
of the representation for searching files of such structures.
Requirements for the unambiguous representation of generic
structures in an "ideal" storage and retrieval system are
discussed.
The basic principles of the theory of formal languages are
reviewed, with particular consideration being given to parsing
methods for context-free languages. The grammar and parsing of
computer programming languages, as examples of artificial
formal languages, are discussed. Applications of formal language
theory to chemistry and information work are briefly reviewed.
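As a rough illustration of parsing a context-free language, the sketch below is a recursive-descent parser for a toy grammar of bracketed molecular formulae. The grammar, the `FormulaParser` class and its method names are assumptions made for this example and are not taken from the thesis.

```python
import re
from collections import Counter

# Tokens: element symbols, integer counts, and parentheses.
TOKEN = re.compile(r"[A-Z][a-z]?|\d+|[()]")


def tokenize(text):
    tokens = TOKEN.findall(text)
    if "".join(tokens) != text:
        raise ValueError(f"unexpected character in {text!r}")
    return tokens


class FormulaParser:
    """Recursive-descent parser for a hypothetical toy grammar:

    formula := group+
    group   := (ELEMENT | "(" formula ")") COUNT?
    """

    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def parse(self):
        counts = self.formula()
        if self.peek() is not None:
            raise ValueError(f"unexpected token {self.peek()!r}")
        return counts

    def formula(self):
        total = Counter()
        while self.peek() is not None and self.peek() != ")":
            total.update(self.group())
        if not total:
            raise ValueError("empty formula")
        return total

    def group(self):
        token = self.peek()
        if token == "(":
            self.pos += 1
            inner = self.formula()  # nesting is what makes the grammar context-free
            if self.peek() != ")":
                raise ValueError("missing ')'")
            self.pos += 1
        elif token[0].isalpha():
            self.pos += 1
            inner = Counter({token: 1})
        else:
            raise ValueError(f"unexpected token {token!r}")
        multiplier = 1
        if self.peek() is not None and self.peek().isdigit():
            multiplier = int(self.peek())
            self.pos += 1
        return Counter({el: n * multiplier for el, n in inner.items()})


print(FormulaParser("Ca(OH)2").parse())
```

Each grammar rule maps to one method, which is the usual appeal of recursive descent for small artificial languages.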
GENSAL, a formal language for the unambiguous description of
generic structures from patents, is presented. It is designed to
be intelligible to a chemist or patent agent, yet sufficiently
formalised to be amenable to computer analysis. A detailed
description is given of the facilities it provides for generic
structure representation, and there is discussion of its
limitations and the principles behind its design.
A connection-table-based internal representation for generic
structures, called an ECTR (Extended Connection Table
Representation), is presented. It is designed to represent generic
structures unambiguously, and to be generated automatically from
structures encoded in GENSAL. It is compared to other proposed
representations, and its implementation using data types of the
programming language Pascal is described.
An interpreter program which generates an ECTR from structures
encoded in a subset of the GENSAL Language is presented. The
principles of its operation are described.
Possible applications of GENSAL outside the area of patent
documentation are discussed, and suggestions made for further
work on the development of a generic structure storage and
retrieval system based on GENSAL and ECTRs.
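The idea of a connection table extended with variable parts can be sketched as follows. This is not the ECTR itself, whose details are not given in the abstract. The `ConnectionTable` and `GenericStructure` classes, the `R1`/`R2` labels and the restriction to single-atom substituents are all simplifying assumptions for illustration.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass
class ConnectionTable:
    atoms: list  # element symbols, or variable-group labels such as "R1"
    bonds: list  # (atom_index_a, atom_index_b, bond_order) triples


@dataclass
class GenericStructure:
    core: ConnectionTable
    # Hypothetical: each variable group maps to a list of single-atom alternatives.
    alternatives: dict = field(default_factory=dict)

    def enumerate_specifics(self):
        """Yield one specific connection table per combination of alternatives."""
        labels = sorted(self.alternatives)
        for choice in product(*(self.alternatives[label] for label in labels)):
            mapping = dict(zip(labels, choice))
            atoms = [mapping.get(atom, atom) for atom in self.core.atoms]
            yield ConnectionTable(atoms, list(self.core.bonds))


# A carbon bearing two variable groups (hydrogens left implicit).
generic = GenericStructure(
    core=ConnectionTable(atoms=["C", "R1", "R2"], bonds=[(0, 1, 1), (0, 2, 1)]),
    alternatives={"R1": ["F", "Cl"], "R2": ["Br", "I"]},
)
for specific in generic.enumerate_specifics():
    print(specific.atoms)
```

Explicit enumeration like this grows multiplicatively with the number of alternatives, which is one reason a compact generic representation is preferable for storage and search.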
Chemical Information Bulletin
Created as a supplement for "the regular journals of the American Chemical Society," this publication contains annotated bibliographies of chemical documentation literature as well as information about meetings, conferences, awards, scholarships, and other news from the American Chemical Society (ACS) Division of Chemical Information (CINF).
Chemical Information Bulletin
Periodic supplement for "the regular journals of the American Chemical Society," containing annotated bibliographies of chemical documentation literature as well as information about meetings, conferences, awards, scholarships, and other news from the American Chemical Society (ACS) Division of Chemical Literature
Chemical Information Bulletin
Periodic supplement for "the regular journals of the American Chemical Society," containing annotated bibliographies of chemical documentation literature as well as information about meetings, conferences, awards, scholarships, and other news from the American Chemical Society (ACS) Division of Chemical Literature
The evolution of an on-line chemical search system for an industrial research unit.
The objectives of this study were to design an information
system, using modern computer technology, to meet a research
chemist's need for chemical structural information, to quantify
the effects of increasing degrees of computer technology on the
use made of the facilities, and to relate the use of the service
back to the individual chemist, his performance and background.
A computer system was developed based on Wiswesser Line Notation
and molecular formula as the chemical structure descriptors. Systems design and analysis were performed so that access to the
information could be obtained directly for individual compounds
and more generally for classes of compounds.
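A minimal sketch of the molecular-formula side of such a system is given below: formulae are parsed into element counts and indexed, so that one lookup retrieves individual compounds and another retrieves a class of compounds. The `FormulaIndex` class and its methods are hypothetical, Wiswesser Line Notation handling is not attempted, and only flat formulae without brackets are supported.

```python
import re
from collections import Counter, defaultdict

ELEMENT = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula):
    """Return element counts for a flat formula such as 'C2H6O'."""
    counts = Counter()
    for symbol, number in ELEMENT.findall(formula):
        counts[symbol] += int(number) if number else 1
    return counts


def canonical_key(counts):
    """Simplified ordering: carbon, then hydrogen, then the rest alphabetically."""
    order = sorted(counts, key=lambda s: (s != "C", s != "H", s))
    return "".join(f"{s}{counts[s] if counts[s] > 1 else ''}" for s in order)


class FormulaIndex:
    def __init__(self):
        self.by_formula = defaultdict(list)
        self.counts = {}

    def add(self, name, formula):
        counts = parse_formula(formula)
        self.by_formula[canonical_key(counts)].append(name)
        self.counts[name] = counts

    def exact(self, formula):
        """Individual-compound lookup: all names sharing this formula."""
        return list(self.by_formula.get(canonical_key(parse_formula(formula)), []))

    def with_elements(self, elements):
        """Class lookup: all names whose formula contains every given element."""
        return [n for n, c in self.counts.items() if all(c[e] > 0 for e in elements)]


index = FormulaIndex()
index.add("ethanol", "C2H6O")
index.add("dimethyl ether", "CH3OCH3")
index.add("chlorobenzene", "C6H5Cl")
print(index.exact("C2H6O"))
print(index.with_elements({"Cl"}))
```

Isomers such as ethanol and dimethyl ether share a formula, which is why a structure descriptor is needed alongside the molecular formula.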
As the system was being developed, its use by information staff
was monitored by constant interaction with the people concerned.
Where appropriate, the system was modified to meet information
staff's requirements, but a number of precautions had to be
introduced to prevent misuse.
The research chemists' use of the information services was
studied retrospectively over a two-year period. In addition
to the use made, several other factors were observed for each
chemist. These included performance measures and background
information on the chemists' research role.
The data showed a steady increase in the demand for the services
by the research chemist as the degree of computerisation
increased. The use made of the services related closely to the
number of compounds prepared by each chemist, but there was no
significant correlation between a chemist's success in preparing
biologically active compounds and his information use.
The very individual way in which chemists conduct their research
was highlighted by the wide range of use of the information
facilities and the low correlation with background factors. This
makes the design of on-line systems for use by chemists themselves
complex and justifies the existence of the information scientist
as an interface.
Management: A continuing bibliography with indexes
This bibliography lists 919 reports, articles, and other documents introduced into the NASA scientific and technical information system in 1981.
Enhancing Reaction-based de novo Design using Machine Learning
De novo design is a branch of chemoinformatics that is concerned with the rational design of molecular structures with desired properties, which specifically aims at achieving suitable pharmacological and safety profiles when applied to drug design. Scoring, construction, and search methods are the main components that are exploited by de novo design programs to explore the chemical space to encourage the cost-effective design of new chemical entities. In particular, construction methods are concerned with providing strategies for compound generation to address issues such as drug-likeness and synthetic accessibility.
Reaction-based de novo design consists of combining building blocks according to transformation rules that are extracted from collections of known reactions, intending to restrict the enumerated chemical space into a manageable number of synthetically accessible structures. The reaction vector is an example of a representation that encodes topological changes occurring in reactions, which has been integrated within a structure generation algorithm to increase the chances of generating molecules that are synthesisable.
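The arithmetic behind a reaction vector can be sketched as a difference of descriptor counts between product and reactant, which is then added to a new starting material. The example below is a deliberate simplification: it uses hypothetical bond-type counts instead of the richer topological descriptors a real implementation would need, and it stops at descriptor counts without rebuilding a structure. All function names are assumptions.

```python
from collections import Counter

BOND_SYMBOL = {1: "-", 2: "=", 3: "#"}


def bond_descriptors(atoms, bonds):
    """Count bonds by the sorted element symbols of their end atoms and bond order."""
    counts = Counter()
    for a, b, order in bonds:
        first, second = sorted((atoms[a], atoms[b]))
        counts[f"{first}{BOND_SYMBOL[order]}{second}"] += 1
    return counts


def reaction_vector(reactant, product):
    """Descriptor counts gained (positive) or lost (negative) in the reaction."""
    keys = set(reactant) | set(product)
    return {k: product[k] - reactant[k] for k in keys if product[k] != reactant[k]}


def apply_vector(start, vector):
    """Add the vector to a starting material; None if a lost feature is absent."""
    result = Counter(start)
    for key, delta in vector.items():
        result[key] += delta
        if result[key] < 0:
            return None
    return +result  # unary plus drops zero counts


# Heavy atoms only: ethanol -> acetaldehyde (alcohol oxidation).
ethanol = bond_descriptors(["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])
acetaldehyde = bond_descriptors(["C", "C", "O"], [(0, 1, 1), (1, 2, 2)])
vector = reaction_vector(ethanol, acetaldehyde)

propanol = bond_descriptors(["C", "C", "C", "O"], [(0, 1, 1), (1, 2, 1), (2, 3, 1)])
ethane = bond_descriptors(["C", "C"], [(0, 1, 1)])
print(vector)
print(apply_vector(propanol, vector))
print(apply_vector(ethane, vector))
```

Returning `None` when a required feature is missing mirrors the way a transformation rule restricts which starting materials it can be applied to.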
The general aim of this study was to enhance reaction-based de novo design by developing machine learning approaches that exploit publicly available data on reactions. A series of algorithms for reaction standardisation, fingerprinting, and reaction vector database validation were introduced and applied to generate new data on which the entirety of this work relies. First, these collections were applied to the validation of a new ligand-based design tool. The tool was then used in a case study to design compounds which were eventually synthesised using very similar procedures to those suggested by the structure generator.
A reaction classification model and a novel hierarchical labelling system were then developed to introduce the possibility of applying transformations by class. The model was augmented with an algorithm for confidence estimation, and was used to classify two datasets from industry and the literature. Results from the classification suggest that the model can be used effectively to gain insights on the nature of reaction collections.
Classified reactions were further processed to build a reaction class recommendation model capable of suggesting appropriate reaction classes to apply to molecules according to their fingerprints. The model was validated, then integrated within the reaction vector-based design framework, which was assessed on its performance against the baseline algorithm. Results from the de novo design experiments indicate that the use of the recommendation model leads to higher synthetic accessibility and more efficient management of computational resources.
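One simple way to suggest reaction classes from fingerprints is a nearest-neighbour vote weighted by Tanimoto similarity. The sketch below uses that stand-in technique and is not the recommendation model developed in the thesis. The fingerprints are hypothetical sets of bit positions, and the class labels and training pairs are invented purely for illustration.

```python
from collections import defaultdict


def tanimoto(a, b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def recommend_classes(query, labelled, k=3):
    """Rank reaction classes by summed similarity over the k nearest neighbours."""
    nearest = sorted(labelled, key=lambda item: tanimoto(query, item[0]), reverse=True)[:k]
    scores = defaultdict(float)
    for fingerprint, reaction_class in nearest:
        scores[reaction_class] += tanimoto(query, fingerprint)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical (fingerprint, reaction class) training pairs.
labelled = [
    ({1, 2, 3}, "amide coupling"),
    ({1, 2, 4}, "amide coupling"),
    ({7, 8, 9}, "Suzuki coupling"),
    ({7, 8}, "Suzuki coupling"),
]
print(recommend_classes({1, 2, 3, 5}, labelled))
```

Restricting structure generation to the top-ranked classes is the kind of pruning that could reduce wasted computation, in line with the resource savings the abstract reports.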