31 research outputs found

    Inductive queries for a drug designing robot scientist

    Get PDF
    It is increasingly clear that machine learning algorithms need to be integrated in an iterative scientific discovery loop, in which data is queried repeatedly by means of inductive queries and where the computer provides guidance to the experiments that are being performed. In this chapter, we summarise several key challenges in achieving this integration of machine learning and data mining algorithms in methods for the discovery of Quantitative Structure Activity Relationships (QSARs). We introduce the concept of a robot scientist, in which all steps of the discovery process are automated; we discuss the representation of molecular data such that knowledge discovery tools can analyse it, and we discuss the adaptation of machine learning and data mining algorithms to guide QSAR experiments

    Comparing graphs

    Get PDF
    Graphs are a well-studied mathematical concept, which has become ubiquitous to represent structured data in many application domains like computer vision, social network analysis or chem- and bioinformatics. The ever-increasing amount of data in these domains requires to efficiently organize and extract information from large graph data sets. In this context techniques for comparing graphs are fundamental, e.g., in order to obtain meaningful similarity measures between graphs. These are a prerequisite for the application of a variety of data mining algorithms to the domain of graphs. Hence, various approaches to graph comparison evolved and are wide-spread in practice. This thesis is dedicated to two different strategies for comparing graphs: maximum common subgraph problems and graph kernels. We study maximum common subgraph problems, which are based on classical graph-theoretical concepts for graph comparison and are NP-hard in the general case. We consider variants of the maximum common subgraph problem in restricted graph classes, which are highly relevant for applications in cheminformatics. We develop a polynomial-time algorithm, which allows to compute a maximum common subgraph under block and bridge preserving isomorphism in series-parallel graphs. This generalizes the problem of computing maximum common biconnected subgraphs in series-parallel graphs. We show that previous approaches to this problem, which are based on the separators represented by standard graph decompositions, fail. We introduce the concept of potential separators to overcome this issue and use them algorithmically to solve the problem in series-parallel graphs. We present algorithms with improved bounds on running time for the subclass of outerplanar graphs. Finally, we establish a sufficient condition for maximum common subgraph variants to allow derivation of graph distance metrics. This leads to polynomial-time computable graph distance metrics in restricted graph classes. This progress constitutes a step towards solving practically relevant maximum common subgraph problems in polynomial time. The second contribution of this thesis is to graph kernels, which have their origin in specific data mining algorithms. A key property of graph kernels is that they allow to consider a large (possibly infinite) number of features and can support graphs with arbitrary annotation, while being efficiently computable. The main contributions of this part of the thesis are (i) the development of novel graph kernels, which are especially designed for attributed graphs with arbitrary annotations and (ii) the systematic study of implicit and explicit mapping into a feature space for computation of graph kernels w.r.t. its impact on the running time and the ability to consider arbitrary annotations. We propose graph kernels based on bijections between subgraphs and walks of fixed length. In an experimental study we show that these approaches provide a viable alternative to known techniques, in particular for graphs with complex annotations

    Acta Cybernetica : Volume 18. Number 3.

    Get PDF

    LIPIcs, Volume 248, ISAAC 2022, Complete Volume

    Get PDF
    LIPIcs, Volume 248, ISAAC 2022, Complete Volum

    visone - Software for the Analysis and Visualization of Social Networks

    Get PDF
    We present the software tool visone which combines graph-theoretic methods for the analysis of social networks with tailored means of visualization. Our main contribution is the design of novel graph-layout algorithms which accurately reflect computed analyses results in well-arranged drawings of the networks under consideration. Besides this, we give a detailed description of the design of the software tool and the provided analysis methods

    Sublinear Computation Paradigm

    Get PDF
    This open access book gives an overview of cutting-edge work on a new paradigm called the “sublinear computation paradigm,” which was proposed in the large multiyear academic research project “Foundations of Innovative Algorithms for Big Data.” That project ran from October 2014 to March 2020, in Japan. To handle the unprecedented explosion of big data sets in research, industry, and other areas of society, there is an urgent need to develop novel methods and approaches for big data analysis. To meet this need, innovative changes in algorithm theory for big data are being pursued. For example, polynomial-time algorithms have thus far been regarded as “fast,” but if a quadratic-time algorithm is applied to a petabyte-scale or larger big data set, problems are encountered in terms of computational resources or running time. To deal with this critical computational and algorithmic bottleneck, linear, sublinear, and constant time algorithms are required. The sublinear computation paradigm is proposed here in order to support innovation in the big data era. A foundation of innovative algorithms has been created by developing computational procedures, data structures, and modelling techniques for big data. The project is organized into three teams that focus on sublinear algorithms, sublinear data structures, and sublinear modelling. The work has provided high-level academic research results of strong computational and algorithmic interest, which are presented in this book. The book consists of five parts: Part I, which consists of a single chapter on the concept of the sublinear computation paradigm; Parts II, III, and IV review results on sublinear algorithms, sublinear data structures, and sublinear modelling, respectively; Part V presents application results. The information presented here will inspire the researchers who work in the field of modern algorithms

    27th Annual European Symposium on Algorithms: ESA 2019, September 9-11, 2019, Munich/Garching, Germany

    Get PDF

    Proceedings. 19. Workshop Computational Intelligence, Dortmund, 2. - 4. Dezember 2009

    Get PDF
    Dieser Tagungsband enthält die Beiträge des 19. Workshops „Computational Intelligence“ des Fachausschusses 5.14 der VDI/VDE-Gesellschaft fĂĽr Mess- und Automatisierungstechnik (GMA) und der Fachgruppe „Fuzzy-Systeme und Soft-Computing“ der Gesellschaft fĂĽr Informatik (GI), der vom 2.-4. Dezember 2009 im Haus Bommerholz bei Dortmund stattfindet

    Network Theory Approaches for the Analysis of Ordered Data.

    Get PDF
    PhD ThesisGraphs are mathematical structures comprised of a set of nodes connected by edges, and network science is the application of graph theory to real world data. Networks are used as a model to analyse how entities, either individual actors, or complex systems, interact with one another. The research here will consist of extracting networks (we will use the terms \graph" and \network" interchangeably) from ordered series, which we will focus on series ordered by time. We will either do this with the aid of the visibility graph, which is a method, based on visibility, for mapping a time series in to a graph, or through estimating the wavelet correlation, a more conventional method used in neuroscience. The aim is to describe the structure of time series and their underlying dynamical properties in graphtheoretical terms, and then using this motivation to analyse large data sets spanning several disciplines. We will describe a method, using the visibility graph, for quantifying reversibility of non-stationary processes and apply this method to a large nancial data set, with the intent of ranking companies based on their irreversibility. We also use the visibility graph to develop a method which e ciently quanti es the asymmetries between minima and maxima in time series, and we then apply the method to a variety of data sets. Continuing with the theme of visibility, we study the spectral properties of visibility graphs extracted from trajectories of the logistic map undergoing a period-doubling route to chaos (known as the Feigenbaum scenario). Finally, we will use wavelet correlation to construct networks from fMRI time series, and examine community structure with the aim of di erentiating between brain networks of patients with schizophrenia from control subjects. The general format throughout this thesis will start with theory, followed by extensive numerical simulations, which we can then apply the methods to real data sets
    corecore