Search CORE

34 research outputs found

Special Libraries, May-June 1978

Author: Special Libraries Association
Publication venue: SJSU ScholarWorks
Publication date: 01/05/1978
Field of study

Volume 69, Issue 5-6https://scholarworks.sjsu.edu/sla_sl_1978/1004/thumbnail.jp

SJSU ScholarWorks

The evolution of an on-line chemical search system for an industrial research unit.

Author: Eakin Diane Rosemary
Publication venue: 'University of Sheffield Conference Proceedings'
Publication date: 01/01/1977
Field of study

The objectives of this study were to design an information system, using modern computer technology, to meet a research chemist's need for chemical structural information, to quantify the effects of increasing degrees of computer technology on the use made of the facilities, and to relate the use of the service back to the individual chemist, his performance and background. A computer system was developed based on Wiswesser Line Notation and molecular formula as the chemical structure descriptors. Systems design and analysis were performed so that access to the information could be obtained directly for individual compounds and more generally for classes of compounds. As the system was being developed, its use by information staff was monitored by constant interaction with the people concerned. Where appropriate, the system was modifiea to meet information staff's requirements, but a number of precautions had to be introduced to prevent mis-use. The research chemists' use of the information services was studied retrospectively over a two-year period. In addition to the use made, several other factors were observed for each chemist. These included performance measures and background information on the chemists' research role. The data showed a steady increase in the demand for the services by the research chemist as the degree of computerisation increased. The use made of the services related closely to the number of compounds prepared by each chemist, but there was no significant correlation between a chemist's success in preparing biologically active compounds and his information use. The very individual way in which chemists conduct their research was highlighted by the wide range of use of the information facilities and the low correlation with background factors. This makes the design of on-line systems for use by chemists themselves complex and justifies the existence of the information scientist as an interface

White Rose E-theses Online

Information retrieval and text mining technologies for chemistry

Author: Abacha A. B.
Alberts D.
Alfonso Valencia
American Chemical Society
Anália Lourenço
Aphinyanaphongs Y.
Appelt D. E.
Aramaki E.
Aronson A. R.
Asahara M.
Babych B.
Baeza-Yates R.
Bambenek J.
Barnard J. M.
Bast H.
Batista-Navarro R.
Batista-Navarro R. T.
Bian J.
Bies A.
Bikel D. M.
Blaschke C.
Brecher J. S.
Brill E.
Bunescu R.
Bunescu R. C.
Califf M. E.
Carpenter B.
Caruana R.
Chee B. W.
Chhieng D.
Chinchor N.
Chiticariu L.
Chowdhury M. F. M.
Chowdhury M. F. M.
Ciravegna F.
Cleverdon C. W.
Coden A.
Cohen R.
Collier N.
Corbett P.
Corbett P.
Cover T. M.
Craven M.
Cummings M. D.
Currano J. N.
Currano J. N.
Currano J. N.
Currano J. N.
Cutting D. R.
Davis C. H.
Dieb T. M.
Dieb T. M.
Dogan R. I.
Downs G. M.
Dunikowski L. G.
Embarek M.
Eom J.-H.
Faber J.
Fall C. J.
Fattore M.
Fennell R. W.
Freund Y.
Fujiyoshi A.
Fukuda K.
Gale W. A.
Garcelon N.
Garnier J.-P.
Garten Y.
Ginn R.
Giuliano C.
Gold S.
Grefenstette G.
Grishman R.
Gurulingappa H.
Gurulingappa H.
Gusfield D.
He Y.
Hearst M. A.
Hersh W.
Hersh W.
Hirschman L.
Hobbs J. R.
Hodge G. M.
Holzinger A.
Hsueh P.-Y.
Huber T.
Iyer S. V
Jackson P.
Joachims T.
Johnson D.
Jonnalagadda S.
Jonnalagadda S.
Julen Oyarzabal
Jurafsky D.
Kaewphan S.
Kaewphan S.
Karkaletsis V.
Katragadda S.
Kazama J.
Kazawa H.
Kelly L.
Kenny P. W.
Kim J.-D.
Kim Y.
Kleene S. C.
Kolárik C.
Kongburan W.
Kornai A.
Kraaij W.
Krallinger M.
Krallinger M.
Krallinger M.
Kremer G.
Kreuzthaler M.
Kucera H.
Lai H.
Lawson A. J.
Leaman R.
Leaman R.
Lee C.-H.
Levenshtein V. I.
Levin M. A.
Li J.
Li N.
Li Y.
Liu X.
Locke W. N.
Lovins J. B.
Lowe D. M.
Lupu M.
Lupu M.
Mackenzie C. E.
Manning C. D.
Mansouri A.
Martin E.
Martin Krallinger
Mattmann C.
Maynard D.
McCallum A.
McEwen L.
McKnight L.
McNaught A.
Meystre S. M.
Michalski S. R.
Michie D.
Mihalcea R.
Mitton R.
Miwa M.
Mollá D.
Murray-Rust P.
Müller B.
Nebel A.
Nikfarjam A.
Névéol A.
Névéol A.
Obdulia Rabal
Pang B.
Panico R.
Perez-Iratxeta C.
Ponomareva N.
Ratinov L.
Ratnaparkhi A.
Read J.
Rebholz-Schuhmann D.
Reeker L. H.
Rocchio J. J.
Rohbeck H.-G.
Rosario B.
Roth D. L.
Rupp C. J.
Rupp C. J.
Sagae K.
Salim N.
Salton G.
Sanchez-Cisneros D.
Saracevic T.
Sasaki Y.
Schapire R. E.
Schenck R.
Schenck R. J.
Schlaf A.
Schuemie M. J.
Segura Bedmar I.
Segura-Bedmar I.
Sekine S.
Sequeira E.
Settles B.
Settles B.
Sewell W.
Shen D.
Shidha M. V
Singhal A.
Smith E. G.
Stamatatos E.
Sutton C.
Sætre R.
Taylor K. T.
Tharatipyakul A.
Tomanek K.
Tomanek K.
Tsuruoka Y.
Tsuruoka Y.
Täger W.
Urbain J.
van Rijsbergen C. J.
Vapnik V. N.
Vasserman A.
Visweswaran S.
Voorhees E. M.
Wang W.
Wang Y.
Wei C.-H.
Wei C.-H.
Wermter J.
Wilbur W. J.
Willett P.
Willett P.
Williams A. J.
Witten I. H.
Workman M. L.
Wrublewski D. T.
Xu R.
Xue N.
Yan S.
Yang C.
Yang C. C.
Yang Y.
Zass E.
Zipf G. K.
Zipf G. K.
Zitnik S.
Publication venue: 'American Chemical Society (ACS)'
Publication date: 01/01/2017
Field of study

Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys from different chemical disciplines. Retrieval of important chemical information in most cases starts with finding relevant documents for a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in the text, which commonly involves the extraction of the entire list of chemicals mentioned in a document, including any associated information. In this Review, we provide a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting these information demands. A strong focus is placed on community challenges addressing systems performance, more particularly CHEMDNER and CHEMDNER patents tasks of BioCreative IV and V, respectively. Considering the growing interest in the construction of automatically annotated chemical knowledge bases that integrate chemical information and biological data, cheminformatics approaches for mapping the extracted chemical names into chemical structures and their subsequent annotation together with text mining applications for linking chemistry with biological information are also presented. Finally, future trends and current challenges are highlighted as a roadmap proposal for research in this emerging field.A.V. and M.K. acknowledge funding from the European Community’s Horizon 2020 Program (project reference: 654021 - OpenMinted). M.K. additionally acknowledges the Encomienda MINETAD-CNIO as part of the Plan for the Advancement of Language Technology. O.R. and J.O. thank the Foundation for Applied Medical Research (FIMA), University of Navarra (Pamplona, Spain). This work was partially funded by Consellería de Cultura, Educación e Ordenación Universitaria (Xunta de Galicia), and FEDER (European Union), and the Portuguese Foundation for Science and Technology (FCT) under the scope of the strategic funding of UID/BIO/04469/2013 unit and COMPETE 2020 (POCI-01-0145-FEDER-006684). We thank Iñigo Garciá -Yoldi for useful feedback and discussions during the preparation of the manuscript.info:eu-repo/semantics/publishedVersio

Universidade do Minho: RepositoriUM

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Recommended from our members

Extraction of chemical structures and reactions from the literature

Author: Lowe Daniel Mark
Publication venue: University of Cambridge
Publication date: 09/10/2012
Field of study

The ever increasing quantity of chemical literature necessitates the creation of automated techniques for extracting relevant information. This work focuses on two aspects: the conversion of chemical names to computer readable structure representations and the extraction of chemical reactions from text. Chemical names are a common way of communicating chemical structure information. OPSIN (Open Parser for Systematic IUPAC Nomenclature), an open source, freely available algorithm for converting chemical names to structures was developed. OPSIN employs a regular grammar to direct tokenisation and parsing leading to the generation of an XML parse tree. Nomenclature operations are applied successively to the tree with many requiring the manipulation of an in-memory connection table representation of the structure under construction. Areas of nomenclature supported are described with attention being drawn to difficulties that may be encountered in name to structure conversion. Results on sets of generated names and names extracted from patents are presented. On generated names, recall of between 96.2% and 99.0% was achieved with a lower bound of 97.9% on precision with all results either being comparable or superior to the tested commercial solutions. On the patent names OPSIN s recall was 2-10% higher than the tested solutions when the patent names were processed as found in the patents. The uses of OPSIN as a web service and as a tool for identifying chemical names in text are shown to demonstrate the direct utility of this algorithm. A software system for extracting chemical reactions from the text of chemical patents was developed. The system relies on the output of ChemicalTagger, a tool for tagging words and identifying phrases of importance in experimental chemistry text. Improvements to this tool required to facilitate this task are documented. The structure of chemical entities are where possible determined using OPSIN in conjunction with a dictionary of name to structure relationships. Extracted reactions are atom mapped to confirm that they are chemically consistent. 424,621 atom mapped reactions were extracted from 65,034 organic chemistry USPTO patents. On a sample of 100 of these extracted reactions chemical entities were identified with 96.4% recall and 88.9% precision. Quantities could be associated with reagents in 98.8% of cases and 64.9% of cases for products whilst the correct role was assigned to chemical entities in 91.8% of cases. Qualitatively the system captured the essence of the reaction in 95% of cases. This system is expected to be useful in the creation of searchable databases of reactions from chemical patents and in facilitating analysis of the properties of large populations of reactions

Apollo (Cambridge)

Special Libraries, July 1984

Author: Special Libraries Association
Publication venue: SJSU ScholarWorks
Publication date: 01/07/1981
Field of study

Volume 75, Issue 3https://scholarworks.sjsu.edu/sla_sl_1984/1002/thumbnail.jp

SJSU ScholarWorks

Substructure Searching of Computer-Readable Chemical Abstracts Service Ninth Collective Index Chemical Nomenclature Files

Author: G. G. Vander Stouw
J. A. Scott
L. D. Mitchell
W. Fisanick
Publication venue: 'American Chemical Society (ACS)'
Publication date
Field of study

Crossref

Data bases and data base systems related to NASA's aerospace program. A bibliography with indexes

Author
Publication venue
Publication date
Field of study

This bibliography lists 1778 reports, articles, and other documents introduced into the NASA scientific and technical information system, 1975 through 1980

NASA Technical Reports Server

Cheminformatics and Computational Approaches for Identifying and Managing Unknown Chemicals in the Environment

Author: Lai Adelene
Publication venue: Friedrich Schiller University, Jena, Germany
Publication date: 01/01/2022
Field of study

In most societies, using chemical products has become a part of daily life. Worldwide, over 350,000 chemicals have been registered for use in e.g., daily household consumption, industrial processes, agriculture, etc. However, despite the benefits chemicals may bring to society, their usage, production, and disposal, which leads to their eventual release into the environment has multiple implications. Anthropogenic chemicals have been detected in myriad ecosystems all over the planet, as well as in the tissues of wildlife and humans. The potential consequences of such chemical pollution are not fully understood, but links to the onset of human disease and threats to biodiversity have been attributed to the presence of chemicals in our environment. Mitigating the potential negative effects of chemicals typically involves regulatory steps and multiple stakeholders. One key aspect thereof is environmental monitoring, which consists of environmental sampling, measurement, data analysis, and reporting. In recent years, advancements in Liquid Chromatography-High Resolution Mass Spectrometry (LC-HRMS), open chemical databases, and software have enabled researchers to identify known (e.g., pesticides) as well as unknown environmental chemicals, commonly referred to as suspect or non-target compounds. However, identifying unknown chemicals, particularly non-targets, remains extremely challenging because of the lack of a priori knowledge on the analytes - all that is available are their mass spectrometry signals. In fact, the number of unknown features in a typical mass spectrum of an environmental sample is in the range of thousands to tens of thousands, and therefore requires feature prioritisation before identification within a suitable workflow. In this dissertation work, collaborations with two regulatory authorities responsible for environmental monitoring sought to identify relevant unknown compounds in the environment, specifically by developing computational workflows for unknown identification in LC-HRMS data. The first collaboration culminated in Publication A, which involved a joint project with the Zürcher Amt für Wasser, Energie und Luft. Environmental samples taken from wastewater treatment plant sites in Switzerland were retrospectively analysed using a pre-screening workflow that prioritised features suitable for non-target identification. For this purpose, a multi-step Quality Control algorithm that checks the quality of mass spectral data in terms of peak intensities, alignment, and signal-to-noise ratio was developed and used within pre-screening. This algorithm was incorporated into the R package Shinyscreen. Features that were prioritised by pre-screening then underwent identification using the in silico fragmentation tool MetFrag. To obtain these identifications, MetFrag was coupled to various open chemical information resources such as spectral databases like MassBank Europe and MassBank of North America, as well as suspect lists from the NORMAN Suspect List Exchange and the CompTox Chemicals Dashboard database. One confirmed and twenty-one tentative compound identifications were achieved and reported according to an established confidence level scheme. Comprehensive data interpretation and detailed communication of MetFrag’s results was performed as a means of formulating evidence-based recommendations that may inform future environmental monitoring campaigns. Building on the pre-screening and identification workflow developed in Publication A, Publication B resulted from a collaboration with the Luxembourgish Administration de la gestion de l’eau that sought to identify, and where possible quantify unknown chemicals in Luxembourgish surface waters. More specifically, surface water samples collected as part of a two-year national monitoring campaign were measured using LC-HRMS and screened for pharmaceutical parent compounds and their transformation products. Compared to pharmaceutical compound information, which is publicly available from local authorities (and was used in the suspect list), information on transformation products is relatively scarce. Therefore, new approaches were developed in this work to mine data from the PubChem database as well as from the literature in order to formulate a suspect list containing pharmaceutical transformation products, in addition to their parent compounds. Overall, 94 pharmaceuticals and 14 transformation products were identified, of which 88 and 2 were confirmed identifications respectively. The spatio-temporal occurrence and distribution of these compounds throughout the Luxembourgish environment were analysed using advanced data visualisations that highlighted patterns in certain regions and time periods of high incidence. These findings may support future chemicals management measures, particularly in environmental monitoring. Another challenging aspect of managing chemicals is that they mostly exist as complex mixtures within the environment as well as chemical products. Substances of Unknown or Variable composition, Complex reaction products or Biological materials (UVCBs) make up 20-40% of international chemical registries and include chlorinated paraffins, polymer mixtures, petroleum fractions, and essential oils. However, little is known about their chemical identities and/or compositions, which poses formidable obstacles to assessing their environmental fate and toxicity, let alone identification in the environment. Publication C addresses the challenges of UVCBs by taking an interdisciplinary approach in reviewing the literature that incorporates considerations of their chemical representations, toxicity, environmental fate, exposure, and regulatory approaches. Improved substance registration requirements, grouping techniques to simplify assessment, and the use of Mixture InChI to represent UVCBs in a findable, accessible, interoperable, and reusable (FAIR) way in databases are amongst the key recommendations of this work. A specific type of UVCB, mixtures of homologous compounds, are commonly detected in environmental samples, including many High Production Volume (HPV) compounds such as surfactants. Compounds forming homologous series are related by a common core fragment and repeating chemical subunit, and can be represented using general formulae (e.g., CnF2n+1COOH) and/or Markush structures. However, a significant identification bottleneck is the inability to match their characteristic analytical signals in LC-HRMS data with chemicals in databases; while comb-like elution patterns and constant differences in mass-to-charge ratio indicate the presence of homologous series in samples, most chemical databases do not contain annotated homologous series. To address this gap, Publication D introduces a cheminformatics algorithm, OngLai, to detect homologous series within compound datasets. OngLai, openly implemented in Python using the RDKit, detects homologous series based on two inputs: a list of compounds and the chemical structure of a repeating unit. OngLai was applied to three open datasets from environmental chemistry, exposomics, and natural products, in which thousands of homologous series with a CH2 repeating unit were detected. Classification of homologous series in compound datasets is expected to advance their analytical detection in samples. Overall, the work in this dissertation contributed to the advancement of identifying and managing unknown chemicals in the environment using cheminformatics and computational approaches. All work conducted followed Open Science and FAIR data principles: all code, datasets, analyses, and results generated, including the final peer-reviewed publications, are openly available to the public. These efforts are intended to spur further developments in unknown chemical identification and management towards protecting the environment and human health

Open Repository and Bibliography - Luxembourg

Digitale Bibliothek Thüringen

Play Among Books

Author: _ch3n81 Alice
Roman Miro
Publication venue: 'Walter de Gruyter GmbH'
Publication date: 11/01/2022
Field of study

How does coding change the way we think about architecture? Miro Roman and his AI Alice_ch3n81 develop a playful scenario in which they propose coding as the new literacy of information. They convey knowledge in the form of a project model that links the fields of architecture and information through two interwoven narrative strands in an “infinite flow” of real books

Directory of Open Access Books (DOAB)

Play Among Books

Author: _ch3n81 Alice
Roman Miro
Publication venue: 'Walter de Gruyter GmbH'
Publication date
Field of study

OAPEN Library