101 research outputs found

    Chemoinformatics Research at the University of Sheffield: A History and Citation Analysis

    This paper reviews the work of the Chemoinformatics Research Group in the Department of Information Studies at the University of Sheffield, focusing particularly on the work carried out in the period 1985-2002. Four major research areas are discussed, these involving the development of methods for: substructure searching in databases of three-dimensional structures, including both rigid and flexible molecules; the representation and searching of the Markush structures that occur in chemical patents; similarity searching in databases of both two-dimensional and three-dimensional structures; and compound selection and the design of combinatorial libraries. An analysis of citations to 321 publications from the Group shows that it attracted a total of 3725 residual citations during the period 1980-2002. These citations appeared in 411 different journals, and involved 910 different citing organizations from 54 different countries, thus demonstrating the widespread impact of the Group's work.

    ClassyFire: automated chemical classification with a comprehensive, computable taxonomy

    Additional file 5. Use cases. Text-based search on the ClassyFire web server. (A) Building the query. (B) Sparteine, one of the returned compounds

    Development of deep learning applications for the automated extraction of chemical information from scientific literature

    This dissertation focuses on developing deep learning applications for extracting chemical information from scientific literature, particularly targeting the automated recognition of molecular structures in images. DECIMER Segmentation, a novel application, employs a Mask Region-based Convolutional Neural Network (MRCNN) model to segment chemical structures in documents, aided by a mask expansion algorithm, marking a significant advancement in processing chemical literature. The Optical Chemical Structure Recognition (OCSR) tool DECIMER Image Transformer uses an encoder-decoder architecture to convert chemical structure depictions into the machine-readable SMILES format. The model has been trained on over 450 million pairs of images and SMILES representations. Its ability to interpret various depiction styles, including hand-drawn structures, sets a new standard in OCSR. To artificially generate large and diverse OCSR training datasets using multiple cheminformatics toolkits, RanDepict was developed. The diversification of training data ensures robust model generalisation across different chemical structure depictions. A unique dataset of hand-drawn molecule images was created to evaluate the model's performance in interpreting these challenging depictions. This dataset further contributes to the understanding of automated structure recognition from diverse styles. The integration of these technologies led to the creation of DECIMER.ai, an open-source web application that combines segmentation and interpretation tools, allowing users to extract and process chemical information from literature efficiently. The work concludes with a discussion on the significance of open data in advancing molecular informatics, highlighting its potential for broader chemical research domains. By adhering to FAIR data standards and open-source principles, the tools developed for this dissertation are designed for adaptability and future development within the community.

    Patent Database: Their Importance in Prior Art Documentation and Patent Search

    In knowledge-based economies, a nation's economic status depends on the production, distribution and use of knowledge and information. The recent trend in the economic growth of nations is mainly determined by the innovative technological know-how of individuals. Intellectual property has gained attention in this era of knowledge. The vast amount of data generated through the application of intellectual assets is managed with the help of various in silico tools. Recently, patent databases have gained importance due to the detailed information available on granted patents and other details, such as the legal status of patent applications, which are not available through any other literature search. This review paper attempts to describe the different types of patent databases available, their unique features, strengths, weaknesses and major purposes. The paper details how to access a patent database, the relevance of patent information obtained from these databases in prior art search and patent analysis, and the drawbacks of these patent databases.

    Information retrieval and text mining technologies for chemistry

    Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys from different chemical disciplines. Retrieval of important chemical information in most cases starts with finding relevant documents for a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in the text, which commonly involves the extraction of the entire list of chemicals mentioned in a document, including any associated information. In this Review, we provide a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting these information demands. A strong focus is placed on community challenges addressing systems performance, more particularly CHEMDNER and CHEMDNER patents tasks of BioCreative IV and V, respectively. Considering the growing interest in the construction of automatically annotated chemical knowledge bases that integrate chemical information and biological data, cheminformatics approaches for mapping the extracted chemical names into chemical structures and their subsequent annotation together with text mining applications for linking chemistry with biological information are also presented. Finally, future trends and current challenges are highlighted as a roadmap proposal for research in this emerging field. A.V. and M.K. acknowledge funding from the European Community’s Horizon 2020 Program (project reference: 654021 - OpenMinted). M.K. additionally acknowledges the Encomienda MINETAD-CNIO as part of the Plan for the Advancement of Language Technology. O.R. and J.O. thank the Foundation for Applied Medical Research (FIMA), University of Navarra (Pamplona, Spain). This work was partially funded by Consellería de Cultura, Educación e Ordenación Universitaria (Xunta de Galicia), and FEDER (European Union), and the Portuguese Foundation for Science and Technology (FCT) under the scope of the strategic funding of UID/BIO/04469/2013 unit and COMPETE 2020 (POCI-01-0145-FEDER-006684). We thank Iñigo García-Yoldi for useful feedback and discussions during the preparation of the manuscript.

    Science Inside Law: The Making of a New Patent Class in the International Patent Classification

    Recent studies of patents have argued that the very materiality and techniques of legal media, such as the written patent document, are vital for the legal construction of a patentable invention. Developing the centrality placed on patent documents further, it becomes important to understand how these documents are ordered and mobilized. Patent classification answers the necessity of making the virtual nature of textual claims practicable by linking written inscription to bureaucracy. Here, the epistemological organization of documents overlaps with the grid of patent administration. How are scientific inventions represented in such a process? If we examine the process of creating a new patent category within the International Patent Classification (IPC), it becomes clear that disagreements about the substance of the novel inventive subject matter have been resolved by computer simulations of patent documents in draft classifications. The practical needs of patent examiners were the most important concerns in the making of a new category. Such a lack of epistemological mediation between the scientific and legal identities of an invention depicts a legal understanding that science is already inside patent law. From an internal legal perspective, the self-referential introduction of the new patent category may make practical sense; however it becomes problematic from a technological and scientific standpoint as the remit of the patent classification also affects other social contexts and practice

    Towards Inference of a Biochemical Ontology From a Metabolic Database

    In order to predict the metabolic fate of an arbitrary compound based solely on structure, it is useful to be able to identify substructural ‘functional groups’ that are biochemically reactive. These functional groups are the substructural elements that can be removed and replaced to transform one compound into another. This problem of identifying functional groups is related to the problem of classifying compounds. The research presented here discusses the state of the art in biochemical databases and how these sources may be applied to the problem of classifying compounds based solely on structure. We describe a biochemical informatics system for processing molecular data and describe how 100 255 compositional (hasA) relationships are inferred between 835 abstractions and 9500 metabolites from the KEGG Ligand database. Specifically, we focus on the identification of amino acids and consider ways in which the inference of biochemical ontologies for metabolites will be improved in the future.