168 research outputs found

    A treatment of stereochemistry in computer aided organic synthesis

    This thesis describes the author's contributions to a new stereochemical processing module constructed for the ARChem retrosynthesis program. The purpose of the module is to add the ability to perform enantioselective and diastereoselective retrosynthetic disconnections and to generate appropriate precursor molecules. The module uses evidence-based rules generated from a large database of literature reactions. Chapter 1 provides an introduction and critical review of the published body of work on computer-aided synthesis design. The role of computer perception of key structural features (rings, functional groups, etc.) and the construction and use of reaction transforms for generating precursors is discussed. Emphasis is also given to the application of strategies in retrosynthetic analysis. The availability of large reaction databases has enabled a new generation of retrosynthesis design programs that use automatically generated transforms assembled from published reactions. A brief description of the transform generation method employed by ARChem is given. Chapter 2 describes the algorithms devised by the author for handling the computer recognition and representation of the stereochemical features found in molecule and reaction scheme diagrams. The approach is generalised and uses flexible recognition patterns to transform information found in chemical diagrams into concise stereo descriptors for computer processing. An algorithm for efficiently comparing and classifying pairs of stereo descriptors is described. This algorithm is central to solving the stereochemical constraints in a variety of substructure matching problems addressed in Chapter 3. The concise representation of reactions and transform rules as hyperstructure graphs is described. Chapter 3 is concerned with the efficient and reliable detection of stereochemical symmetry in molecules, reactions, and rules.
A novel symmetry perception algorithm, based on a constraint satisfaction problem (CSP) solver, is described. The use of a CSP solver to implement an isomorph-free matching algorithm for stereochemical substructure matching is detailed. The prime function of this algorithm is to seek out unique retron locations in target molecules and then to generate precursor molecules without duplications due to symmetry. Novel algorithms are described for classifying asymmetric, pseudo-asymmetric, and symmetric stereocentres; meso, centro-, and C2-symmetric molecules; and the stereotopicity of trigonal (sp2) centres. Chapter 4 introduces and formalises the annotated structural language used to create both retrosynthetic rules and the patterns used for functional group recognition. A novel functional group recognition package is described, along with its use to detect important electronic features such as electron-withdrawing or electron-donating groups and leaving groups. The functional groups and electronic features are used as constraints in retron rules to improve transform relevance. Chapter 5 details the approach taken to design detailed stereoselective and substrate-controlled transforms from organised hierarchies of rules. The rules employ a rich set of constraint annotations that concisely describe the keying retrons. The application of the transforms for collating evidence-based scoring parameters from published reaction examples is described. A survey of available reaction databases and of techniques for mining stereoselective reactions is presented. A data-mining tool was developed for finding the most reputable stereoselective reaction types for coding as transforms. For various reasons it was not possible during the research period to fully integrate this work with the ARChem program. Instead, Chapter 6 introduces a novel one-step retrosynthesis module to test the developed transforms.
The retrosynthesis algorithms use the organisation of the transform rule hierarchy to efficiently locate the best retron matches using all applicable stereoselective transforms. This module was tested on a small set of selected target molecules, and the generated routes were ranked using a series of measured parameters, including stereocentre clearance and bond cleavage; example reputation; estimated stereoselectivity with reliability; and evidence of tolerated functional groups. In addition, a method for detecting regioselectivity issues is presented. This work presents a number of algorithms using common set- and graph-theory operations and notations. Appendix A lists the set theory symbols and their meanings. Appendix B summarises and defines the common graph theory terminology used throughout this thesis.
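The descriptor-comparison step described in Chapter 2 of this abstract hinges on permutation parity: two tetrahedral stereo descriptors denote the same configuration exactly when the parity of the neighbour reordering matches the difference of their parity bits. A minimal, hypothetical sketch of that idea (the tuple encoding and function names are illustrative, not ARChem's actual representation; neighbour labels are assumed distinct):

```python
def permutation_parity(seq, ref):
    """Return 0 if seq is an even permutation of ref, 1 if odd.
    Assumes the elements are distinct."""
    seq = list(seq)
    swaps = 0
    for i, want in enumerate(ref):
        j = seq.index(want)
        if j != i:
            seq[i], seq[j] = seq[j], seq[i]
            swaps += 1
    return swaps % 2

def same_configuration(desc_a, desc_b):
    """A descriptor is (neighbour tuple, parity bit). Two descriptors denote
    the same tetrahedral configuration when the neighbour reordering parity
    equals the XOR of the stored parity bits."""
    (nbrs_a, par_a), (nbrs_b, par_b) = desc_a, desc_b
    if sorted(nbrs_a) != sorted(nbrs_b):
        return False  # different substituents: not comparable
    return permutation_parity(nbrs_b, nbrs_a) == (par_a ^ par_b)
```

Swapping any two neighbours flips the geometric sense, so a single swap together with a flipped parity bit leaves the configuration unchanged.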

    IST Austria Thesis

    This dissertation focuses on algorithmic aspects of program verification, and presents modeling and complexity advances on several problems related to the static analysis of programs, the stateless model checking of concurrent programs, and the competitive analysis of real-time scheduling algorithms. Our contributions can be broadly grouped into five categories. Our first contribution is a set of new algorithms and data structures for the quantitative and data-flow analysis of programs, based on the graph-theoretic notion of treewidth. It has been observed that the control-flow graphs of typical programs have special structure, and are characterized as graphs of small treewidth. We utilize this structural property to provide faster algorithms for the quantitative and data-flow analysis of recursive and concurrent programs. In most cases we give an algebraic treatment of the considered problem, where several interesting analyses, such as reachability, shortest path, and certain kinds of data-flow analysis problems, follow as special cases. We exploit the constant-treewidth property to obtain algorithmic improvements for on-demand versions of the problems, and provide data structures with various tradeoffs between the resources spent in the preprocessing and querying phases. We also improve on the algorithmic complexity of quantitative problems outside the algebraic path framework, namely the minimum mean-payoff, minimum ratio, and minimum initial credit for energy problems. Our second contribution is a set of algorithms for Dyck reachability with applications to data-dependence analysis and alias analysis. In particular, we develop an optimal algorithm for Dyck reachability on bidirected graphs, which are ubiquitous in context-insensitive, field-sensitive points-to analysis.
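The minimum mean-payoff problem mentioned above reduces to finding a minimum mean cycle in a weighted graph. As a baseline for what the treewidth-based improvements compete against, here is a sketch of Karp's classic O(nm) dynamic program (a textbook algorithm, not the dissertation's improved one):

```python
def min_mean_cycle(n, edges):
    """Karp's algorithm. d[k][v] = minimum weight of a k-edge walk ending at
    v (with every node usable as a start, i.e. an implicit 0-weight super
    source). The minimum cycle mean equals
    min over v of max over k of (d[n][v] - d[k][v]) / (n - k)."""
    INF = float('inf')
    d = [[INF] * n for _ in range(n + 1)]
    for v in range(n):
        d[0][v] = 0.0
    for k in range(1, n + 1):
        for u, v, w in edges:
            if d[k - 1][u] + w < d[k][v]:
                d[k][v] = d[k - 1][u] + w
    best = INF
    for v in range(n):
        if d[n][v] < INF:
            worst = max((d[n][v] - d[k][v]) / (n - k)
                        for k in range(n) if d[k][v] < INF)
            best = min(best, worst)
    return best
```

On constant-treewidth control-flow graphs the dissertation's results improve on such dense dynamic programs.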
Additionally, we develop an efficient algorithm for context-sensitive data-dependence analysis via Dyck reachability, where the task is to obtain analysis summaries of library code in the presence of callbacks. Our algorithm preprocesses libraries in almost linear time, after which the contribution of the library to the complexity of the client analysis is (i) linear in the number of call sites and (ii) only logarithmic in the size of the whole library, as opposed to linear in the size of the whole library. Finally, we prove that Dyck reachability is Boolean Matrix Multiplication-hard in general, and the hardness also holds for graphs of constant treewidth. This hardness result strongly indicates that there exist no combinatorial algorithms for Dyck reachability with truly subcubic complexity. Our third contribution is the formalization and algorithmic treatment of the Quantitative Interprocedural Analysis framework. In this framework, the transitions of a recursive program are annotated as good, bad, or neutral, and receive a weight which measures the magnitude of their respective effect. The Quantitative Interprocedural Analysis problem asks to determine whether there exists an infinite run of the program where the long-run ratio of the bad weights over the good weights is above a given threshold. We illustrate how several quantitative problems related to static analysis of recursive programs can be instantiated in this framework, and present some case studies in this direction. Our fourth contribution is a new dynamic partial-order reduction for the stateless model checking of concurrent programs. Traditional approaches rely on the standard Mazurkiewicz equivalence between traces, partitioning the trace space into equivalence classes and attempting to explore a few representatives from each class. We present a new dynamic partial-order reduction method called Data-centric Partial Order Reduction (DC-DPOR).
Our algorithm is based on a new equivalence between traces, called observation equivalence. DC-DPOR explores a coarser partitioning of the trace space than any exploration method based on the standard Mazurkiewicz equivalence. Depending on the program, the new partitioning can even be exponentially coarser. Additionally, DC-DPOR spends only polynomial time in each explored class. Our fifth contribution is the use of automata- and game-theoretic verification techniques in the competitive analysis and synthesis of real-time scheduling algorithms for firm-deadline tasks. On the analysis side, we leverage automata on infinite words to compute the competitive ratio of real-time schedulers subject to various environmental constraints. On the synthesis side, we introduce a new instance of two-player mean-payoff partial-information games, and show how the synthesis of an optimal real-time scheduler can be reduced to computing winning strategies in this new type of game.
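The bidirected Dyck-reachability result above exploits the fact that on bidirected graphs Dyck reachability is an equivalence relation, computable by merging nodes with a union-find structure: whenever one node has two outgoing edges with the same open-bracket label, their targets are Dyck-equivalent. A simplified quadratic fixpoint sketch of that idea (the dissertation's optimal algorithm is almost linear; the edge encoding here is hypothetical):

```python
class DSU:
    """Union-find with path halving."""
    def __init__(self, n):
        self.p = list(range(n))
    def find(self, x):
        while self.p[x] != x:
            self.p[x] = self.p[self.p[x]]
            x = self.p[x]
        return x
    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.p[ra] = rb
        return True

def bidirected_dyck_reach(n, edges):
    """edges: (u, label, v) open-bracket edges; bidirectedness supplies the
    matching close-bracket edge in reverse. Merge targets of same-labelled
    open edges from (merged) sources until no change; the resulting
    union-find classes are exactly the Dyck-reachability classes."""
    dsu = DSU(n)
    changed = True
    while changed:
        changed = False
        seen = {}
        for u, lab, v in edges:
            key = (dsu.find(u), lab)
            if key in seen:
                if dsu.union(seen[key], dsu.find(v)):
                    changed = True
            else:
                seen[key] = dsu.find(v)
    return dsu
```

Two nodes are then Dyck-reachable from each other iff `dsu.find` returns the same representative for both.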

    Compiling for parallel multithreaded computation on symmetric multiprocessors

    Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1998. Includes bibliographical references (p. 145-149). By Andrew Shaw.

    Automatic Loanword Identification Using Tree Reconciliation

    The use of computational methods in historical linguistics has increased in recent years. Phylogenetic methods, which explore the evolutionary history and relationships among organisms, have found their way into historical linguistics. The availability of machine-readable data accelerated their adaptation and development. While some methods addressing the evolution of languages are integrated into linguistics, scarcely any attention has been paid to methods analyzing horizontal transmission. Inspired by the parallel between horizontal gene transfer and borrowing, this thesis aims at adapting horizontal transfer methods into computational historical linguistics to identify borrowing scenarios along with the transferred loanwords. Computational methods modeling horizontal transfer are based on the framework of tree reconciliation. The methods attempt to detect horizontal transfer by fitting the evolutionary history of words to the evolution of their corresponding languages, both represented as phylogenetic trees. The discordance between the two evolutionary scenarios indicates the influence of loanwords due to language contact. The tree reconciliation framework is introduced in a linguistic setting along with an appropriate algorithm, which is applied to linguistic trees to detect loanwords. While the reconstruction of language trees is scientifically substantiated, little research has so far been done on the reconstruction of concept trees, which represent the words' histories. One major innovation of this thesis is the introduction of various methods to reconstruct reliable concept trees and determine their stability, in order to achieve reasonable results in loanword detection. The results of the tree reconciliation are evaluated against a newly developed gold standard and compared to three methods established for the task of language contact detection in computational historical linguistics. The main aim of this thesis is to clarify the purpose of tree reconciliation methods in linguistics. The analyses give insight into the degree to which the direct transfer of phylogenetic methods into linguistics is fruitful and can be used to discover borrowings along with the transferred loanwords. The identification of loanwords is a first step toward a deeper understanding of contact scenarios and the possible types of loanwords present in linguistic data. The adaptation of phylogenetic methods is not only worthwhile for shedding light on individual horizontal transmissions, but also serves as a basis for further, more detailed analyses in the field of contact linguistics.
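The core signal used by tree reconciliation, discordance between the language tree and a concept tree, can be illustrated with a toy clade-compatibility check (nested tuples stand in for real phylogenies; the thesis's reconciliation algorithm is considerably more involved):

```python
def clades(tree):
    """Collect the leaf set of every node of a nested-tuple tree.
    Returns (set of clades, full leaf set)."""
    if not isinstance(tree, tuple):  # a leaf
        return {frozenset([tree])}, frozenset([tree])
    out, leaves = set(), frozenset()
    for child in tree:
        sub, sub_leaves = clades(child)
        out |= sub
        leaves |= sub_leaves
    out.add(leaves)
    return out, leaves

def discordant_clades(language_tree, concept_tree):
    """Clades supported by the concept tree but incompatible with the
    language tree. Such discordance is the reconciliation signal that a
    borrowing (horizontal transfer) may explain the word's history."""
    lang, _ = clades(language_tree)
    conc, _ = clades(concept_tree)
    def compatible(c):
        # Two clades are compatible iff nested or disjoint.
        return all(c <= l or l <= c or not (c & l) for l in lang)
    return {c for c in conc if len(c) > 1 and not compatible(c)}
```

A concept tree that groups words against the language phylogeny yields non-empty discordance, flagging the grouped concepts as borrowing candidates.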

    LIPIcs, Volume 274, ESA 2023, Complete Volume

    LIPIcs, Volume 274, ESA 2023, Complete Volume

    Active self-diagnosis in telecommunication networks

    While modern networks and services are continuously growing in scale, complexity, and heterogeneity, the management of such systems is reaching the limits of human capabilities. Technically and economically, more automation of the classical management tasks is needed. This has triggered a significant research effort, gathered under the terms self-management and autonomic networking. The aim of this thesis is to contribute to the realization of some self-management properties in telecommunication networks. We propose an approach to automate the management of faults, covering the different segments of a network and the end-to-end services deployed over them. This is a model-based approach that addresses the two weaknesses of model-based diagnosis, namely: a) how to derive such a model, suited to a given network at a given time, in particular if one wishes to capture several network layers and segments, and b) how to reason over a potentially huge model, if one wishes to manage a nation-wide network, for example. To address the first point, we propose a new concept called self-modeling, which formulates off-line generic patterns of the model and identifies on-line the instances of these patterns that are deployed in the managed network. The second point is addressed by an active self-diagnosis engine, based on a Bayesian network formalism, that reasons on a progressively growing fragment of the network model, relying on the self-modeling ability: observations are collected and new tests are performed until the faults are localized with sufficient confidence. This active-diagnosis approach has been applied to perform cross-layer and cross-segment alarm management on an IMS network.
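The active self-diagnosis loop described above, collecting observations and running tests until a fault is localized with sufficient confidence, can be sketched as a greedy Bayesian loop. A minimal, hypothetical illustration (the fault names, the flat fault model, and the outcome-entropy test-selection heuristic are illustrative; the thesis reasons over a growing Bayesian-network fragment, not a flat table):

```python
import math

def active_diagnosis(prior, likelihood, run_test, threshold=0.95):
    """Greedy active diagnosis: keep a posterior over candidate faults, pick
    the pending test whose outcome is most uncertain (entropy heuristic),
    observe it, do a Bayesian update, and stop once one fault's posterior
    exceeds the threshold. likelihood[test][fault] = P(test passes | fault)."""
    posterior = dict(prior)
    pending = set(likelihood)

    def entropy(p):
        return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

    def outcome_uncertainty(test):
        p_pass = sum(likelihood[test][f] * posterior[f] for f in posterior)
        return entropy(p_pass)

    while pending and max(posterior.values()) < threshold:
        test = max(pending, key=outcome_uncertainty)
        pending.remove(test)
        passed = run_test(test)  # observation collected from the network
        for f in posterior:
            p = likelihood[test][f]
            posterior[f] *= p if passed else (1.0 - p)
        z = sum(posterior.values()) or 1.0
        posterior = {f: v / z for f, v in posterior.items()}
    return max(posterior, key=posterior.get), posterior
```

Each iteration enlarges the evidence, mirroring how the engine grows its model fragment until the fault is pinned down.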

    LIPIcs, Volume 248, ISAAC 2022, Complete Volume

    LIPIcs, Volume 248, ISAAC 2022, Complete Volume