99 research outputs found

    A user interface for extracting chemical structure information from the systematic name of an organic compound

    The user interface «Nomenclature Generator», designed to automatically extract chemical structure information from the systematic name of an organic compound given in IUPAC nomenclature, was developed at the All-Russian Institute for Scientific and Technical Information of the Russian Academy of Sciences (VINITI RAS).

    Chemoinformatics approaches for new drugs discovery

    Chemoinformatics uses computational methods and technologies to solve chemical problems. It works on molecular structures, their representations, properties and related data. The first and most important phase in this field is the translation of interconnected atomic systems into in-silico models, ensuring complete and correct transfer of chemical information. In the last 20 years chemical databases have evolved from molecular repositories into research tools for identifying new drugs, while modern high-throughput technologies allow chemical libraries to grow continuously, as highlighted by publicly available repositories such as PubChem [http://pubchem.ncbi.nlm.nih.gov/], ZINC [http://zinc.docking.org/] and ChemSpider [http://www.chemspider.com/]. The fundamental requirements of chemical libraries are molecular uniqueness, absence of ambiguity, chemical correctness (related to atoms, bonds, chemical orthography), and standardized storage and registration formats. The aim of this work is the development of chemoinformatics tools and data for the drug discovery process. The first part of the research project focused on analysing the accessible commercial chemical space, looking for molecular redundancy and checking the correctness of in-silico models in order to identify a unique and univocal molecular descriptor for indexing chemical libraries. This allowed 0% redundancy to be achieved on a library of 42 million compounds. The protocol was implemented as MMsDusty, a web-based tool for cleaning molecular databases. The major protocol developed is MMsINC, a chemoinformatics platform that started from 4 million non-redundant, high-quality, annotated and biomedically relevant chemical structures; the library is now being expanded up to 460 million compounds. MMsINC can perform various types of queries, such as substructure or similarity search and descriptor filtering.
MMsINC is interfaced with PDB (Protein Data Bank) [http://www.rcsb.org/pdb/home/home.do] and related to approved drugs. The second developed protocol is called pepMMsMIMIC, a peptidomimetic screening tool based on multiconformational chemical libraries; the screening process uses pharmacophoric fingerprint similarity to identify small molecules able to geometrically and chemically mimic endogenous peptides or proteins. The last part of this project led to the implementation of an optimized and exhaustive conformational space analysis protocol for small-molecule libraries; this is crucial for the prediction of high-quality 3D molecular models, as required in chemoinformatics applications. The torsional exploration was optimized in the range of the most frequent dihedral angles seen in the X-ray solved small-molecule structures of the CSD (Cambridge Structural Database); applying this to a library of 89 million structures generated a library of 2.6 × 10^7 high-quality conformers. The tools, protocols and platforms developed in this work allow for chemoinformatics analysis and screening of large chemical libraries, achieving high-quality, correct and unique chemical data and in-silico models.
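
The two operations this abstract leans on, removing redundant entries through a unique structure key and ranking molecules by fingerprint similarity, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract does not say which descriptor or similarity metric MMsDusty and MMsINC actually use, so the canonical key strings, the set-of-bits fingerprints and the choice of the Tanimoto coefficient below are all hypothetical stand-ins.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

# A record is (canonical_key, name). The key is a placeholder for whatever
# unique, unambiguous structure descriptor a library is indexed by
# (an assumption; the abstract does not name the descriptor).
Record = Tuple[str, str]


def deduplicate(records: Iterable[Record]) -> List[Record]:
    """Keep only the first record seen for each canonical key."""
    seen: Dict[str, Record] = {}
    for key, name in records:
        seen.setdefault(key, (key, name))
    return list(seen.values())


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


# Hypothetical toy data: two entries share the key "KEY-1".
library = [("KEY-1", "supplier A"), ("KEY-2", "supplier B"), ("KEY-1", "supplier C")]
unique_library = deduplicate(library)

# Two made-up fingerprints sharing 2 of 4 distinct bits.
similarity = tanimoto(frozenset({1, 2, 3}), frozenset({2, 3, 4}))
```

Indexing by a single key makes the redundancy check a dictionary lookup, which is what keeps deduplication practical at the scale of tens of millions of records.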

    Data Base Mapping Model and Search Scheme to Facilitate Resource Sharing: Volume 1, Mapping of Chemical Data Bases and Mapping of Data Base Data Elements Using a Rational Data Base Structure

    Coordinated Science Laboratory was formerly known as Control Systems Laboratory. National Science Foundation / NSF SIS 74-1855

    A treatment of stereochemistry in computer aided organic synthesis

    This thesis describes the author’s contributions to a new stereochemical processing module constructed for the ARChem retrosynthesis program. The purpose of the module is to add the ability to perform enantioselective and diastereoselective retrosynthetic disconnections and generate appropriate precursor molecules. The module uses evidence-based rules generated from a large database of literature reactions. Chapter 1 provides an introduction and critical review of the published body of work for computer aided synthesis design. The role of computer perception of key structural features (rings, functional groups etc.) and the construction and use of reaction transforms for generating precursors is discussed. Emphasis is also given to the application of strategies in retrosynthetic analysis. The availability of large reaction databases has enabled a new generation of retrosynthesis design programs to be developed that use automatically generated transforms assembled from published reactions. A brief description of the transform generation method employed by ARChem is given. Chapter 2 describes the algorithms devised by the author for handling the computer recognition and representation of the stereochemical features found in molecule and reaction scheme diagrams. The approach is generalised and uses flexible recognition patterns to transform information found in chemical diagrams into concise stereo descriptors for computer processing. An algorithm for efficiently comparing and classifying pairs of stereo descriptors is described. This algorithm is central for solving the stereochemical constraints in a variety of substructure matching problems addressed in chapter 3. The concise representation of reactions and transform rules as hyperstructure graphs is described. Chapter 3 is concerned with the efficient and reliable detection of stereochemical symmetry in molecules, reactions and rules. 
A novel symmetry perception algorithm, based on a constraint satisfaction problem (CSP) solver, is described. The use of a CSP solver to implement an isomorph‐free matching algorithm for stereochemical substructure matching is detailed. The prime function of this algorithm is to seek out unique retron locations in target molecules and then to generate precursor molecules without duplications due to symmetry. Novel algorithms for classifying asymmetric, pseudo‐asymmetric and symmetric stereocentres; meso, centro, and C2 symmetric molecules; and the stereotopicity of trigonal (sp2) centres are described. Chapter 4 introduces and formalises the annotated structural language used to create both retrosynthetic rules and the patterns used for functional group recognition. A novel functional group recognition package is described along with its use to detect important electronic features such as electron‐withdrawing or donating groups and leaving groups. The functional groups and electronic features are used as constraints in retron rules to improve transform relevance. Chapter 5 details the approach taken to design detailed stereoselective and substrate controlled transforms from organised hierarchies of rules. The rules employ a rich set of constraint annotations that concisely describe the keying retrons. The application of the transforms for collating evidence-based scoring parameters from published reaction examples is described. A survey of available reaction databases is given and techniques for mining stereoselective reactions are demonstrated. A data mining tool was developed for finding the best reputable stereoselective reaction types for coding as transforms. For various reasons it was not possible during the research period to fully integrate this work with the ARChem program. Instead, Chapter 6 introduces a novel one‐step retrosynthesis module to test the developed transforms. 
The retrosynthesis algorithms use the organisation of the transform rule hierarchy to efficiently locate the best retron matches using all applicable stereoselective transforms. This module was tested using a small set of selected target molecules and the generated routes were ranked using a series of measured parameters including: stereocentre clearance and bond cleavage; example reputation; estimated stereoselectivity with reliability; and evidence of tolerated functional groups. In addition, a method for detecting regioselectivity issues is presented. This work presents a number of algorithms using common set and graph theory operations and notations. Appendix A lists the set theory symbols and meanings. Appendix B summarises and defines the common graph theory terminology used throughout this thesis.
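
The idea of locating unique retron sites "without duplications due to symmetry" can be illustrated with a small Python sketch. It is not the thesis's method: the thesis uses a CSP solver and handles stereochemistry, whereas this sketch swaps in a brute-force enumeration over atom permutations, ignores stereo descriptors entirely, and uses a hypothetical graph encoding (a list of element labels plus a set of bonds). It is only workable for very small molecules.

```python
from itertools import permutations
from typing import FrozenSet, List, Set, Tuple

# Hypothetical encoding: (element labels by atom index, bonds as index pairs).
Graph = Tuple[List[str], Set[FrozenSet[int]]]


def embeddings(pattern: Graph, target: Graph) -> List[Tuple[int, ...]]:
    """Brute-force all label- and bond-preserving injective atom mappings."""
    p_labels, p_bonds = pattern
    t_labels, t_bonds = target
    found = []
    for image in permutations(range(len(t_labels)), len(p_labels)):
        if any(p_labels[i] != t_labels[image[i]] for i in range(len(p_labels))):
            continue
        if all(frozenset(image[i] for i in bond) in t_bonds for bond in p_bonds):
            found.append(image)
    return found


def unique_sites(pattern: Graph, target: Graph) -> List[Tuple[int, ...]]:
    """Keep one match per class of sites related by a symmetry of the target."""
    automorphisms = embeddings(target, target)  # symmetries of the target graph
    seen = set()
    unique = []
    for match in embeddings(pattern, target):
        # Canonical key: the smallest atom set reachable under any symmetry.
        key = min(tuple(sorted(sigma[a] for a in match)) for sigma in automorphisms)
        if key not in seen:
            seen.add(key)
            unique.append(match)
    return unique


# Propane-1,3-diol heavy-atom skeleton O-C-C-C-O and a C-O pattern.
diol: Graph = (["O", "C", "C", "C", "O"],
               {frozenset({0, 1}), frozenset({1, 2}), frozenset({2, 3}), frozenset({3, 4})})
c_o: Graph = (["C", "O"], {frozenset({0, 1})})

all_matches = embeddings(c_o, diol)   # both C-O ends match
sites = unique_sites(c_o, diol)       # the two ends are symmetry-equivalent
```

Both C-O bonds match the pattern, but the mirror symmetry of the chain maps one onto the other, so a disconnection at either end would yield the same precursor and only one site is kept.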

    Biochemistry students' difficulties with the symbolic and visual language used in molecular biology.

    Thesis (Ph.D.)-University of KwaZulu-Natal, Pietermaritzburg, 2007. This study reports on recurring difficulties experienced by undergraduate students with respect to understanding and interpretation of certain symbolism, nomenclature, terminology, shorthand notation, models and other visual representations employed in the field of Molecular Biology to communicate information. Based on teaching experience and guidelines set out by a four-level methodological framework, data on various topic-related difficulties was obtained by inductive analyses of students’ written responses to specifically designed, free-response and focused probes. In addition, interviews, think-aloud exercises and student-generated diagrams were also used to collect information. Both unanticipated and recurring difficulties were compared with scientifically correct propositional knowledge, categorized and subsequently classified. Students were adept at providing the meaning of the symbol “Δ” in various scientific contexts; however, some failed to recognize its use to depict the deletion of a leucine biosynthesis gene in the form, Δ leu. “Hazard to leucine”, “change to leucine” and “abbreviation for isoleucine” were some of the erroneous interpretations of this polysemic symbol. Investigations of these definitions suggest a constructivist approach to knowledge construction and the inappropriate transfer of knowledge from prior mental schemata. The symbol, “::”, was poorly differentiated by students in its use to indicate gene integration or transposition and in tandem gene fusion. Idiosyncratic perceptions emerged suggesting that it is, for example, a proteinaceous component linking genes in a chromosome or the centromere itself associated with the mitotic spindle or “electrons” between genes in the same way that it is symbolically shown in Lewis dot diagrams which illustrate covalent bonding between atoms. 
In an oligonucleotide shorthand notation, some students used valency to differentiate the phosphite trivalent form of the phosphorus atom from the pentavalent phosphodiester group, yet the concept of valency was poorly understood. By virtue of the visual form of a shorthand notation of the 3′,5′-phosphodiester link in DNA, the valency was incorrectly read. VSEPR theory and the Octet Rule were misunderstood or forgotten when trying to explain the valency of the phosphorus atom in synthetic oligonucleotide intermediates. Plasmid functional domains were generally well-understood although restriction mapping appeared to be a cognitively demanding task. Rote learning and substitution of definitions were evident in the explanation of promoter and operator functions. The concept of gene expression posed difficulties to many students who believed that genes contain the entity they encode. Transcription and translation of in tandem gene fusions were poorly explained by some students as was the effect of plasmid conformation on transformation and gene expression. With regard to the selection of transformants or the hybridoma, some students could not engage in reasoning or lateral thinking as protoconcepts and domain-specific information were poorly understood. A failure to integrate and reason with factual information on phenotypic traits, media components and biochemical pathways was evident in written and oral presentations. DNA-strand nomenclature and associated function were problematic to some students as they failed to differentiate coding strand from template strand and were prone to interchange the labelling of these. A substitution of labels with those characterizing DNA replication intermediates demonstrated erroneous information transfer. DNA replication models posed difficulties in integrating molecular mechanisms and detail with line drawings, coupled with inaccurate illustrations of sequential replication features. 
Finally, a remediation model is presented, demonstrating a shift in assessment score dispersion from a range of 0 - 4.5 to 4 - 9 when learners are guided metacognitively to work with domain-specific or critical knowledge from an information bank. The present work shows that varied forms of symbolism can present students with complex learning difficulties as the underlying information depicted by these is understood in a superficial way. It is imperative that future studies be focused on the standardization of symbol use, perhaps governed by convention that determines the manner in which threshold information is disseminated on symbol use, coupled with innovative teaching strategies which facilitate an improved understanding of the use of symbolic representations in Molecular Biology. As Molecular Biology advances, it is likely that experts will continue to use new and diverse forms of symbolic representations to explain their findings. The explanation of futuristic Science is likely to develop a symbolic language that will impose great teaching challenges and unimaginable learning difficulties on new-generation teachers and learners, respectively.