    Chemoinformatics Research at the University of Sheffield: A History and Citation Analysis

    This paper reviews the work of the Chemoinformatics Research Group in the Department of Information Studies at the University of Sheffield, focusing particularly on the work carried out in the period 1985-2002. Four major research areas are discussed, involving the development of methods for: substructure searching in databases of three-dimensional structures, including both rigid and flexible molecules; the representation and searching of the Markush structures that occur in chemical patents; similarity searching in databases of both two-dimensional and three-dimensional structures; and compound selection and the design of combinatorial libraries. An analysis of citations to 321 publications from the Group shows that it attracted a total of 3725 residual citations during the period 1980-2002. These citations appeared in 411 different journals, and involved 910 different citing organizations from 54 different countries, thus demonstrating the widespread impact of the Group's work.

    The computer storage, retrieval and searching of generic structures in chemical patents : the machine-readable representation of generic structures.

    The nature of the generic chemical structures found in patents is described, with a discussion of the types of statement commonly found in them. The available representations for such structures are reviewed, with particular note being given to the suitability of the representation for searching files of such structures. Requirements for the unambiguous representation of generic structures in an "ideal" storage and retrieval system are discussed. The basic principles of the theory of formal languages are reviewed, with particular consideration being given to parsing methods for context-free languages. The grammar and parsing of computer programming languages, as an example of artificial formal languages, is discussed. Applications of formal language theory to chemistry and information work are briefly reviewed. GENSAL, a formal language for the unambiguous description of generic structures from patents, is presented. It is designed to be intelligible to a chemist or patent agent, yet sufficiently formalised to be amenable to computer analysis. Detailed description is given of the facilities it provides for generic structure representation, and there is discussion of its limitations and the principles behind its design. A connection-table-based internal representation for generic structures, called an ECTR (Extended Connection Table Representation), is presented. It is designed to represent generic structures unambiguously, and to be generated automatically from structures encoded in GENSAL. It is compared to other proposed representations, and its implementation using data types of the programming language Pascal described. An interpreter program which generates an ECTR from structures encoded in a subset of the GENSAL language is presented. The principles of its operation are described.
Possible applications of GENSAL outside the area of patent documentation are discussed, and suggestions made for further work on the development of a generic structure storage and retrieval system based on GENSAL and ECTRs.