An extensive English language bibliography on graph theory and its applications
Bibliography on graph theory and its application
Computational methods for small molecules
Metabolism is the system of chemical reactions that sustains life in the cells of living organisms. It is responsible for cellular processes that break down nutrients for energy and produce building blocks for necessary molecules. The study of metabolism is vital to many disciplines in medicine and pharmacy. Chemical reactions operate on small molecules called metabolites, which form the core of metabolism. In this thesis we propose efficient computational methods for small molecules in metabolic applications. The thesis comprises four distinct studies covering two major themes: the atom-level description of biochemical reactions, and the analysis of tandem mass spectrometric measurements of metabolites.
In the first part we study atom-level descriptions of organic reactions. We begin by proposing an optimal algorithm for determining the atom-to-atom correspondences between the reactant and product metabolites of organic reactions, and we introduce a cost based on graph edit distance as the mathematical formalism for determining the optimality of atom mappings. We then use the atom mappings to propose a compact single-graph representation of reactions. We investigate the utility of this representation in a reaction function classification task, in which a descriptive category of the reaction's function is predicted. To facilitate the prediction, we introduce the first feasible path-based graph kernel, which describes reactions as path sequences and achieves high classification accuracy.
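The general idea of a path-based graph kernel can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's kernel: the thesis operates on reaction graphs whose edges encode bond changes, whereas this sketch uses atom labels only, counts every simple path (no repeated atom) up to a hypothetical length limit `max_len`, and takes the plain inner product of the two count vectors. Restricting features to simple paths is what distinguishes this family from walk-based kernels, which allow vertices to repeat.

```python
from collections import Counter


def labeled_paths(labels, edges, max_len):
    """Count the label sequences of all simple paths with at most max_len edges."""
    adj = {v: set() for v in labels}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    counts = Counter()

    def extend(path):
        counts[tuple(labels[v] for v in path)] += 1
        if len(path) - 1 == max_len:
            return
        for nxt in adj[path[-1]]:
            if nxt not in path:  # simple paths only: no repeated atoms
                extend(path + [nxt])

    for start in labels:
        extend([start])
    return counts


def path_kernel(g1, g2, max_len=3):
    """Linear kernel: inner product of the two path-count feature vectors."""
    c1 = labeled_paths(*g1, max_len)
    c2 = labeled_paths(*g2, max_len)
    # Counter returns 0 for sequences absent from c2, so unshared paths add nothing.
    return sum(n * c2[seq] for seq, n in c1.items())


# Heavy-atom skeletons of ethanol (C-C-O) and methanol (C-O) as toy inputs.
ethanol = ({0: "C", 1: "C", 2: "O"}, [(0, 1), (1, 2)])
methanol = ({0: "C", 1: "O"}, [(0, 1)])
print(path_kernel(ethanol, methanol))  # 5
```

Each path is counted in both directions, so the feature vector is independent of how atoms are numbered; a practical kernel would usually also normalize by the self-similarities.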
In the second part we turn our focus to analysing tandem mass spectrometric measurements of metabolites. In a tandem mass spectrometer, an input molecule is fragmented into substructures, or fragments, whose masses are observed. We begin by studying the fragment identification problem. We present a combinatorial algorithm that enumerates candidate substructures matching the given masses, and we demonstrate the usefulness of approximate bond energies as a cost function for ranking the candidates by chemical feasibility. We propose fragmentation tree models that describe the dependencies between fragments to achieve higher identification accuracy.
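The fragment identification idea (enumerate substructures whose mass matches an observed peak, then rank them by the energy of the bonds that must break) can be sketched by brute force on a tiny molecule. This is an illustrative sketch, not the thesis's algorithm: `candidate_fragments` is a hypothetical name, fragments are treated as neutral connected subgraphs (ignoring charge and hydrogen rearrangements), the bond energies are approximate average bond enthalpies of the kind tabulated in chemistry textbooks, and the exhaustive subset search is only feasible for a handful of atoms.

```python
from itertools import combinations

# Monoisotopic masses (u) and approximate average bond energies (kJ/mol); illustrative values.
MASS = {"C": 12.0, "H": 1.007825, "O": 15.994915}
BOND_ENERGY = {("C", "C"): 348, ("C", "H"): 413, ("C", "O"): 358, ("H", "O"): 463}


def is_connected(nodes, adj):
    """Depth-first search restricted to the chosen atoms."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for w in adj[v]:
            if w in nodes and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == nodes


def candidate_fragments(atoms, bonds, target_mass, tol=0.01):
    """Return (cost, atom subset) pairs for connected substructures of matching mass.

    cost is the summed energy of the bonds cut to separate the fragment;
    a lower cost is taken to mean a chemically more plausible fragment.
    """
    adj = {i: set() for i in range(len(atoms))}
    for u, v in bonds:
        adj[u].add(v)
        adj[v].add(u)
    results = []
    for k in range(1, len(atoms) + 1):
        for sub in combinations(range(len(atoms)), k):
            mass = sum(MASS[atoms[i]] for i in sub)
            if abs(mass - target_mass) > tol or not is_connected(sub, adj):
                continue
            inside = set(sub)
            cost = sum(BOND_ENERGY[tuple(sorted((atoms[u], atoms[v])))]
                       for u, v in bonds if (u in inside) != (v in inside))
            results.append((cost, sub))
    return sorted(results)


# Methanol CH3-OH: atom 0 = C, 1 = O, 2-4 = H on carbon, 5 = H on oxygen.
atoms = ["C", "O", "H", "H", "H", "H"]
bonds = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 5)]
for cost, sub in candidate_fragments(atoms, bonds, target_mass=31.01839):
    print(cost, sub)
```

For a CH3O fragment mass, the three candidates that lose a carbon-bound hydrogen (cost 413) rank ahead of the one that loses the hydroxyl hydrogen (cost 463).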
We continue by studying a closely related problem in which an unknown metabolite is elucidated from its tandem mass spectrometric fragment signals. This metabolite identification task is an important problem in metabolomics, underpinning subsequent modelling and analysis efforts. We propose an automatic machine learning framework that predicts a set of structural properties of the unknown metabolite; a novel statistical model then turns these properties into candidate structures. We introduce the first mass spectral kernels and explore three feature classes to facilitate the prediction. The kernels support high-accuracy mass spectrometric measurements, which improves predictive accuracy.
This dissertation presents efficient computational methods for small molecules in metabolic applications. Metabolism is the system of chemical reactions that sustains life at the cellular level. Metabolic processes break down nutrients into energy and into building blocks for producing the molecules that cells need. The small molecules modified by chemical reactions are called metabolites. This dissertation contains four independent studies, which fall thematically into the atom-level description of biochemical reactions and the analysis of mass spectrometric measurements of metabolites.
The first part of the dissertation addresses atom-level descriptions of biochemical reactions. It presents an optimal algorithm for determining the atom mappings between the reactants and products of reactions. Optimality is defined by a cost function based on graph edit distance. An optimal atom mapping allows a reaction to be described unambiguously by a single graph. This new reaction representation is used in a reaction function prediction task, where the goal is to determine automatically a category that describes the reaction in words. The dissertation presents a path-based graph kernel that describes reactions as sequences of atom paths, rather than the walk sequences used previously, and achieves better prediction accuracy.
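The optimality criterion for atom mappings can be made concrete with a brute-force sketch. It assumes an atom-balanced reaction, considers only element-preserving bijections, and scores a mapping by the number of bonds broken plus bonds formed (the symmetric difference of the bond sets, i.e. the edge part of a graph edit distance once the node correspondence is fixed); bond orders are ignored. The names `edit_cost` and `optimal_atom_mapping` are hypothetical, and the factorial search only shows the objective being minimized, not the thesis's efficient algorithm.

```python
from itertools import permutations


def edit_cost(mapping, r_bonds, p_bonds):
    """Bonds broken plus bonds formed when reactant atoms are relabelled by mapping."""
    mapped = {frozenset((mapping[u], mapping[v])) for u, v in r_bonds}
    product = {frozenset(b) for b in p_bonds}
    return len(mapped ^ product)  # symmetric difference = edge edits


def optimal_atom_mapping(r_atoms, r_bonds, p_atoms, p_bonds):
    """Exhaustively search element-preserving bijections for the minimum edit cost."""
    if sorted(r_atoms) != sorted(p_atoms):
        raise ValueError("reaction must be atom-balanced")
    best = None
    for perm in permutations(range(len(p_atoms))):
        if any(r_atoms[i] != p_atoms[j] for i, j in enumerate(perm)):
            continue  # an atom may only map to an atom of the same element
        mapping = dict(enumerate(perm))
        cost = edit_cost(mapping, r_bonds, p_bonds)
        if best is None or cost < best[0]:
            best = (cost, mapping)
    return best


# Toy reaction: a C-C unit plus a free O becomes C-C-O (one bond formed).
print(optimal_atom_mapping(["C", "C", "O"], [(0, 1)],
                           ["O", "C", "C"], [(1, 2), (0, 2)]))
```

When several mappings tie, this sketch keeps the first one found; a real implementation would need a tie-breaking rule or would report all optimal mappings.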
The second part of the dissertation analyses tandem mass spectrometric measurements of metabolites. A tandem mass spectrometer breaks the input molecule under analysis into fragments and measures their mass-to-charge ratios. The dissertation presents a thorough combinatorial algorithm for identifying fragments. The method's cost function is based on comparing the bond energies of the fragments. Finally, the dissertation presents fragmentation trees, which can be used to model the relationships between fragments and to achieve better identification accuracy.
In addition to identifying fragments, the analysed metabolites themselves can also be identified. This problem is significant and a prerequisite for metabolic analyses. The dissertation presents a machine learning method that predicts features describing the structure of an unknown metabolite and uses them to form structure predictions statistically. The method introduces the first kernel functions designed specifically for mass spectrometry data and achieves good prediction accuracy.
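A kernel between mass spectra can be illustrated with a simple Gaussian peak-matching kernel. This is an assumed, generic construction, not necessarily one of the kernels in the dissertation: each spectrum is a list of (mass, intensity) peaks, every pair of peaks contributes the product of their intensities times a Gaussian similarity of their masses, and the hypothetical width `sigma` stands in for the instrument's mass accuracy. Because it is a sum of a positive semidefinite base kernel over weighted peaks, the result is itself a valid kernel.

```python
import math


def peak_kernel(spec1, spec2, sigma=0.01):
    """Sum of Gaussian mass similarities over all peak pairs, weighted by intensity."""
    return sum(i1 * i2 * math.exp(-((m1 - m2) ** 2) / (4 * sigma ** 2))
               for m1, i1 in spec1 for m2, i2 in spec2)


def normalized_peak_kernel(spec1, spec2, sigma=0.01):
    """Cosine-normalized version, so that every spectrum has self-similarity 1."""
    k12 = peak_kernel(spec1, spec2, sigma)
    return k12 / math.sqrt(peak_kernel(spec1, spec1, sigma) * peak_kernel(spec2, spec2, sigma))


# Hypothetical spectra as (m/z, relative intensity) pairs.
a = [(77.039, 0.5), (105.034, 1.0)]
b = [(77.040, 0.4), (51.023, 0.2)]
print(normalized_peak_kernel(a, b))
```

A small `sigma` is only meaningful when masses are measured accurately; with high-accuracy data it lets the kernel separate peaks that a coarse tolerance would merge.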
Impact of Symmetries in Graph Clustering
This dissertation examines graph symmetry, as defined by the automorphism group, and how it affects a vertex partition produced by graph clustering. An analysis of nearly 1,700 graphs from various application domains shows that more than 70% of these graphs contain symmetries. This contrasts with the combinatorial proof that the probability of a random graph being symmetric tends to zero as its size increases. The result therefore justifies further investigation of the possible effects of symmetry. The analysis considers both very small graphs and very large graphs (>10,000,000 vertices / >25,000,000 edges).
Furthermore, a theoretical framework is developed that allows graph symmetry to be quantified in detail and formalizes the stability of vertex partitions with respect to this symmetry. A partition of the vertex set, i.e. its division into disjoint subsets, is considered stable if no symmetry maps vertices from one subset into another in a way that changes the partition. The dissertation also defines how a possible decomposition of the automorphism group into independent subgroups can be interpreted as local symmetry, which affects only a specific region of the graph. To quantify the effects of symmetry on the whole graph and on partitions, an entropy definition modelled on the analysis of dynamical systems is presented as well. All definitions are general and can therefore be applied to arbitrary graphs. Some are even applicable to arbitrary cluster analyses, as long as the result is a partition and a symmetry relation on the data points can be expressed as a permutation group.
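The stability notion can be checked directly on small graphs. This sketch assumes one reading of the definition: a partition is stable when every automorphism maps it onto itself, so automorphisms may swap whole blocks but must never split or merge them. The names `automorphisms` and `is_stable` are hypothetical, and the brute-force enumeration over all vertex permutations stands in for the dedicated symmetry-detection algorithms the dissertation surveys; it is usable only for a handful of vertices.

```python
from itertools import permutations


def automorphisms(n, edges):
    """Brute force: yield every vertex permutation that maps the edge set onto itself."""
    edge_set = {frozenset(e) for e in edges}
    for perm in permutations(range(n)):
        if {frozenset((perm[u], perm[v])) for u, v in edges} == edge_set:
            yield perm


def is_stable(n, edges, partition):
    """True if no automorphism moves vertices so that the partition changes."""
    blocks = {frozenset(b) for b in partition}
    for perm in automorphisms(n, edges):
        image = {frozenset(perm[v] for v in b) for b in blocks}
        if image != blocks:
            return False
    return True


# Path graph 0-1-2-3: its only nontrivial automorphism is the reversal 0<->3, 1<->2.
path = [(0, 1), (1, 2), (2, 3)]
print(is_stable(4, path, [[0, 1], [2, 3]]))  # True: reversal swaps the two blocks
print(is_stable(4, path, [[0, 1, 2], [3]]))  # False: reversal maps {3} to {0}
```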
To investigate the actual effect of symmetry on graph clustering, a second analysis is carried out. It finds that 72 of the 629 symmetric graphs examined have an unstable partition. The analysis uses the definitions of the theoretical framework. It also finds that the locality of a graph's symmetry strongly influences whether its partition is stable. High locality usually results in a stable partition, and a stable partition usually implies high locality.
Before these results are defined and described, a comprehensive introduction to the required foundations is given. It covers the formal definitions of graphs and statistical graph models, partitions, finite permutation groups, graph clustering and its algorithms, and entropy. A separate chapter is devoted to graph symmetry, which is described by a finite permutation group, the automorphism group. Algorithms are also presented that can determine the symmetry of graphs and, in some cases, also solve the closely related graph isomorphism problem.
Using graph clustering as an example, the dissertation thus provides insights into possible effects of symmetry in data analysis that have so far received little or no attention in the literature.
Graph theory in America 1876-1950
This narrative is a history of the contributions made to graph theory in the United States of America by American mathematicians and others who supported the growth of scholarship in that country, between the years 1876 and 1950.
The beginning of this period coincided with the opening of the first research university in the United States of America, The Johns Hopkins University (although undergraduates were also taught), providing the facilities and impetus for the development of new ideas. The hiring, from England, of one of the foremost mathematicians of the time provided the necessary motivation for research and development for a new generation of American scholars. In addition, it was at this time that home-grown research mathematicians were first coming to prominence.
At the beginning of the twentieth century European interest in graph theory, and to some extent the four-colour problem, began to wane. Over three decades, American mathematicians took up this field of study - notably, Oswald Veblen, George Birkhoff, Philip Franklin, and Hassler Whitney. It is necessary to stress that these four mathematicians and all the other scholars mentioned in this history were not just graph theorists but worked in many other disciplines. Indeed, they not only made significant contributions to diverse fields but, in some cases, they created those fields themselves and set the standards for others to follow. Moreover, whilst they made considerable contributions to graph theory in general, two of them developed important ideas in connection with the four-colour problem. Grounded in a paper by Alfred Bray Kempe that was notorious for its fallacious 'proof' of the four-colour theorem, these ideas were the concepts of an unavoidable set and a reducible configuration.
To place the story of these scholars within the history of mathematics, America, and graph theory, brief accounts are presented of the early years of graph theory, the early years of mathematics and graph theory in the USA, and the effects of the founding of the first institute for postgraduate study in America. Information is also included on other influences, such as the two world wars, the Depression, the influx of European scholars into the United States of America (mainly during the 1930s), and the parallel development of graph theory in Europe.
Until the end of the nineteenth century, graph theory had been almost entirely the prerogative of European mathematicians. Perhaps the first work in graph theory carried out in America was by Charles Sanders Peirce, arguably America's greatest logician and philosopher at the time. In the 1860s, he studied the four-colour conjecture and claimed to have written at least two papers on the subject during that decade, but unfortunately neither of these has survived. William Edward Story entered the field in 1879, with unfortunate consequences, but it was not until 1897 that an American mathematician presented a lecture on the subject, albeit only to have the paper disappear. Paul Wernicke presented a lecture on the four-colour problem to the American Mathematical Society, but again the paper has not survived. However, his 1904 paper has survived and added to the story of graph theory, and particularly the four-colour conjecture.
The year 1912 saw the real beginning of American graph theory, with Veblen and Birkhoff publishing major contributions to the subject. It was around this time that European mathematicians appeared to lose interest in graph theory. In the period 1912 to 1950 much of the progress in the subject came from America, and by 1950 the United States of America had not only become the foremost country for mathematics but was also the leading centre for graph theory.
SELFIES and the future of molecular string representations
Artificial intelligence (AI) and machine learning (ML) are expanding in popularity for broad applications to challenging tasks in chemistry and materials science. Examples include the prediction of properties, the discovery of new reaction pathways, and the design of new molecules. For each of these tasks, the machine needs to read and write fluently in a chemical language. Strings are a common tool to represent molecular graphs, and the most popular molecular string representation, SMILES, has powered cheminformatics since the late 1980s. However, in the context of AI and ML in chemistry, SMILES has several shortcomings, most pertinently that most combinations of symbols lead to invalid results with no valid chemical interpretation. To overcome this issue, a new language for molecules that guarantees 100% robustness was introduced in 2020: SELF-referencing embedded string (SELFIES). SELFIES has since simplified and enabled numerous new applications in chemistry. In this perspective, we look to the future and discuss molecular string representations, along with their respective opportunities and challenges. We propose 16 concrete future projects for robust molecular representations. These involve the extension toward new chemical domains, exciting questions at the interface of AI and robust languages, and interpretability for both humans and machines. We hope that these proposals will inspire several follow-up works exploiting the full potential of molecular string representations for the future of AI in chemistry and materials science.
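Why a string language can guarantee that every string is valid can be illustrated with a toy decoder. This is not the SELFIES grammar or the `selfies` package API: it handles linear chains only (no branches, rings, or charges), and `decode_chain`, the token set, and the valence table are hypothetical. It borrows only the underlying idea that the decoder tracks remaining valence and caps each requested bond order, so any token sequence decodes to a chain that respects valence instead of failing.

```python
import re

# Toy valence table and bond symbols (assumption: a tiny subset, no charges).
VALENCE = {"C": 4, "N": 3, "O": 2, "F": 1}
BOND_ORDER = {"": 1, "=": 2, "#": 3}
TOKEN = re.compile(r"\[([=#]?)(C|N|O|F)\]")


def decode_chain(string):
    """Decode toy tokens into a linear chain whose bond orders respect valence.

    Text that does not match a known token is skipped, so every input decodes.
    """
    atoms, bonds = [], []
    free = 0  # remaining valence of the most recently added atom
    for bond_sym, element in TOKEN.findall(string):
        if atoms and free == 0:
            break  # the chain end is saturated, so decoding stops here
        if atoms:
            # Cap the requested bond order by what both atoms can still accept.
            order = min(BOND_ORDER[bond_sym], free, VALENCE[element])
            bonds.append(order)
            free = VALENCE[element] - order
        else:
            free = VALENCE[element]  # the first atom has no incoming bond
        atoms.append(element)
    return atoms, bonds


print(decode_chain("[C][#O][=C]"))  # (['C', 'O'], [2])
```

In the example, the requested triple bond to oxygen is reduced to a double bond, after which oxygen is saturated and the trailing token is ignored, so the output is still a chemically sensible skeleton.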
Applied and Computational Statistics
Research without statistics is like water in the sand; the latter is necessary to reap the benefits of the former. This collection of articles is designed to bring together different approaches to applied statistics. The studies presented in this book are a tiny piece of what applied statistics means and of how statistical methods prove useful in different fields of research, from theoretical frameworks to practical applications such as genetics, computational chemistry, and experimental design. This book presents several applications of statistics: a new five-parameter continuous distribution, the modified beta Gompertz distribution; a method to calculate the p-value associated with the Anderson–Darling statistic; an approach to repeated measurement designs; a validated model to predict the statement mutation score; and a new family of structural descriptors, called the extending characteristic polynomial (EChP) family, used to express the link between the structure of a compound and its properties. This collection brings together authors from Europe and Asia who contribute specifically to knowledge in theoretical and applied statistics.
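For reference, the Anderson–Darling statistic whose p-value the book addresses can be computed with the standard formula $A^2 = -n - \frac{1}{n}\sum_{i=1}^{n}(2i-1)\left[\ln F(x_{(i)}) + \ln\bigl(1 - F(x_{(n+1-i)})\bigr)\right]$ for a sample tested against a fully specified continuous CDF $F$. The sketch below computes only the statistic, assuming a standard normal $F$ by default; the book's p-value method is not reproduced here, and the function names are hypothetical.

```python
import math


def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of a normal distribution via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def anderson_darling(sample, cdf=normal_cdf):
    """A^2 statistic for a sample against a fully specified continuous CDF.

    Extreme observations can make cdf() round to exactly 0 or 1, which would
    make the logarithm fail; this sketch does not guard against that.
    """
    x = sorted(sample)
    n = len(x)
    s = sum((2 * i - 1) * (math.log(cdf(x[i - 1])) + math.log(1.0 - cdf(x[n - i])))
            for i in range(1, n + 1))
    return -n - s / n


print(anderson_darling([-0.5, 0.0, 0.5]))  # small: consistent with N(0, 1)
print(anderson_darling([5.0, 6.0, 7.0]))   # large: far from N(0, 1)
```

Larger values of $A^2$ indicate a worse fit, with the tails weighted more heavily than in the Kolmogorov–Smirnov statistic; turning $A^2$ into a p-value requires its null distribution, which is the subject of the book's chapter.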
Principles of Security and Trust: 7th International Conference, POST 2018, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2018, Thessaloniki, Greece, April 14-20, 2018, Proceedings
authentication; computer science; computer software selection and evaluation; cryptography; data privacy; formal logic; formal methods; formal specification; internet; privacy; program compilers; programming languages; security analysis; security systems; semantics; separation logic; software engineering; specifications; verification; world wide web