739 research outputs found

    Similarity Methods in Chemoinformatics


    The computer storage, retrieval and searching of generic structures in chemical patents : the machine-readable representation of generic structures.

    The nature of the generic chemical structures found in patents is described, with a discussion of the types of statement commonly found in them. The available representations for such structures are reviewed, with particular note being given to the suitability of the representation for searching files of such structures. Requirements for the unambiguous representation of generic structures in an "ideal" storage and retrieval system are discussed. The basic principles of the theory of formal languages are reviewed, with particular consideration being given to parsing methods for context-free languages. The grammar and parsing of computer programming languages, as an example of artificial formal languages, are discussed. Applications of formal language theory to chemistry and information work are briefly reviewed. GENSAL, a formal language for the unambiguous description of generic structures from patents, is presented. It is designed to be intelligible to a chemist or patent agent, yet sufficiently formalised to be amenable to computer analysis. A detailed description is given of the facilities it provides for generic structure representation, and there is discussion of its limitations and the principles behind its design. A connection-table-based internal representation for generic structures, called an ECTR (Extended Connection Table Representation), is presented. It is designed to represent generic structures unambiguously, and to be generated automatically from structures encoded in GENSAL. It is compared to other proposed representations, and its implementation using data types of the programming language Pascal is described. An interpreter program which generates an ECTR from structures encoded in a subset of the GENSAL language is presented. The principles of its operation are described. Possible applications of GENSAL outside the area of patent documentation are discussed, and suggestions are made for further work on the development of a generic structure storage and retrieval system based on GENSAL and ECTRs.
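
To make the idea of a connection table extended with variable groups more concrete, here is a minimal sketch in Python. It is not the ECTR format described in the abstract (which was implemented in Pascal and is not specified here); the class name `GenericStructure`, its fields, and the treatment of each substituent as a single label rather than a partial structure are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass
class GenericStructure:
    """Toy connection table: atoms, bonds, and named variable groups.

    Hypothetical illustration only; not the ECTR layout from the thesis.
    """
    atoms: list                      # element symbols or variable names such as "R1"
    bonds: list                      # (i, j, order) tuples indexing into atoms
    variables: dict = field(default_factory=dict)  # variable name -> alternatives

    def enumerate_specifics(self):
        """Yield one atom list per combination of variable-group choices."""
        names = sorted({a for a in self.atoms if a in self.variables})
        for choice in product(*(self.variables[n] for n in names)):
            mapping = dict(zip(names, choice))
            # Every occurrence of a variable receives the same substituent.
            yield [mapping.get(a, a) for a in self.atoms]


# An ethyl group bearing a variable substituent R1 = Cl, Br or OH.
ethyl_r1 = GenericStructure(
    atoms=["C", "C", "R1"],
    bonds=[(0, 1, 1), (1, 2, 1)],
    variables={"R1": ["Cl", "Br", "OH"]},
)
specifics = list(ethyl_r1.enumerate_specifics())
```

Enumerating every specific structure is only practical for small examples like this one; it is shown here because it makes the meaning of the variable group explicit.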

    The evolution of an on-line chemical search system for an industrial research unit.

    The objectives of this study were to design an information system, using modern computer technology, to meet a research chemist's need for chemical structural information, to quantify the effects of increasing degrees of computer technology on the use made of the facilities, and to relate the use of the service back to the individual chemist, his performance and background. A computer system was developed based on Wiswesser Line Notation and molecular formula as the chemical structure descriptors. Systems design and analysis were performed so that access to the information could be obtained directly for individual compounds and more generally for classes of compounds. As the system was being developed, its use by information staff was monitored by constant interaction with the people concerned. Where appropriate, the system was modified to meet information staff's requirements, but a number of precautions had to be introduced to prevent misuse. The research chemists' use of the information services was studied retrospectively over a two-year period. In addition to the use made, several other factors were observed for each chemist. These included performance measures and background information on the chemists' research role. The data showed a steady increase in the demand for the services by the research chemist as the degree of computerisation increased. The use made of the services related closely to the number of compounds prepared by each chemist, but there was no significant correlation between a chemist's success in preparing biologically active compounds and his information use. The very individual way in which chemists conduct their research was highlighted by the wide range of use of the information facilities and the low correlation with background factors. This makes the design of on-line systems for use by chemists themselves complex and justifies the existence of the information scientist as an interface.

    Management: A continuing bibliography with indexes

    This bibliography lists 919 reports, articles, and other documents introduced into the NASA scientific and technical information system in 1981.

    Matching algorithms for handling three dimensional molecular co-ordinate data.

    Enhancing Reaction-based de novo Design using Machine Learning

    De novo design is a branch of chemoinformatics that is concerned with the rational design of molecular structures with desired properties, which specifically aims at achieving suitable pharmacological and safety profiles when applied to drug design. Scoring, construction, and search methods are the main components that are exploited by de novo design programs to explore the chemical space to encourage the cost-effective design of new chemical entities. In particular, construction methods are concerned with providing strategies for compound generation to address issues such as drug-likeness and synthetic accessibility. Reaction-based de novo design consists of combining building blocks according to transformation rules that are extracted from collections of known reactions, intending to restrict the enumerated chemical space into a manageable number of synthetically accessible structures. The reaction vector is an example of a representation that encodes topological changes occurring in reactions, which has been integrated within a structure generation algorithm to increase the chances of generating molecules that are synthesisable. The general aim of this study was to enhance reaction-based de novo design by developing machine learning approaches that exploit publicly available data on reactions. A series of algorithms for reaction standardisation, fingerprinting, and reaction vector database validation were introduced and applied to generate new data on which the entirety of this work relies. First, these collections were applied to the validation of a new ligand-based design tool. The tool was then used in a case study to design compounds which were eventually synthesised using very similar procedures to those suggested by the structure generator. A reaction classification model and a novel hierarchical labelling system were then developed to introduce the possibility of applying transformations by class. The model was augmented with an algorithm for confidence estimation, and was used to classify two datasets from industry and the literature. Results from the classification suggest that the model can be used effectively to gain insights on the nature of reaction collections. Classified reactions were further processed to build a reaction class recommendation model capable of suggesting appropriate reaction classes to apply to molecules according to their fingerprints. The model was validated, then integrated within the reaction vector-based design framework, which was assessed on its performance against the baseline algorithm. Results from the de novo design experiments indicate that the use of the recommendation model leads to a higher synthetic accessibility and a more efficient management of computational resources.
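
The reaction vector mentioned in this abstract can be illustrated with a small sketch. It assumes the common formulation in which a reaction vector is the product descriptor counts minus the reactant descriptor counts. The descriptors used here (bond counts keyed by element pair), the function names, and the heavy-atom-only toy molecules are all assumptions for illustration, not the descriptors or code used in the work itself.

```python
from collections import Counter


def bond_descriptors(bonds):
    """Count bonds by the sorted pair of element symbols they join."""
    return Counter("-".join(sorted(pair)) for pair in bonds)


def reaction_vector(reactant_bonds, product_bonds):
    """Product descriptor counts minus reactant counts, zero entries dropped."""
    r = bond_descriptors(reactant_bonds)
    p = bond_descriptors(product_bonds)
    return {k: p[k] - r[k] for k in set(r) | set(p) if p[k] != r[k]}


def apply_vector(start_bonds, vector):
    """Add a reaction vector to a starting molecule's descriptor counts.

    Returns the expected product descriptors, or None when the starting
    molecule lacks a feature that the vector would remove.
    """
    result = bond_descriptors(start_bonds)
    for key, delta in vector.items():
        result[key] += delta
    if any(count < 0 for count in result.values()):
        return None
    return {k: v for k, v in result.items() if v > 0}


# Toy reaction (heavy atoms only): chloroethane -> ethanol.
chloroethane = [("C", "C"), ("C", "Cl")]
ethanol = [("C", "C"), ("C", "O")]
vector = reaction_vector(chloroethane, ethanol)

# Apply the same transformation to 1-chloropropane.
chloropropane = [("C", "C"), ("C", "C"), ("C", "Cl")]
expected_product = apply_vector(chloropropane, vector)
```

The sketch stops at descriptor counts: turning the resulting counts back into an actual molecular graph is the job of the structure generation algorithm and is not attempted here.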