    An Incidence Geometry approach to Dictionary Learning

    We study the Dictionary Learning (aka Sparse Coding) problem of obtaining a sparse representation of data points, by learning dictionary vectors upon which the data points can be written as sparse linear combinations. We view this problem from a geometric perspective as the spanning set of a subspace arrangement, and focus on understanding the case when the underlying hypergraph of the subspace arrangement is specified. For this Fitted Dictionary Learning problem, we completely characterize the combinatorics of the associated subspace arrangements (i.e. their underlying hypergraphs). Specifically, a combinatorial rigidity-type theorem is proven for a type of geometric incidence system. The theorem characterizes the hypergraphs of subspace arrangements that generically yield (a) at least one dictionary, (b) a locally unique dictionary (i.e. at most a finite number of isolated dictionaries) of the specified size. We are unaware of prior application of combinatorial rigidity techniques in the setting of Dictionary Learning, or even in machine learning. We also provide a systematic classification of problems related to Dictionary Learning together with various algorithms, their assumptions and performance.

    MATCOS-10

    The Vadalog System: Datalog-based Reasoning for Knowledge Graphs

    Over the past years, there has been a resurgence of Datalog-based systems in the database community as well as in industry. In this context, it has been recognized that to handle the complex knowledge-based scenarios encountered today, such as reasoning over large knowledge graphs, Datalog has to be extended with features such as existential quantification. Yet, Datalog-based reasoning in the presence of existential quantification is in general undecidable. Many efforts have been made to define decidable fragments. Warded Datalog+/- is a very promising one, as it captures PTIME complexity while allowing ontological reasoning. Yet so far, no implementation of Warded Datalog+/- was available. In this paper we present the Vadalog system, a Datalog-based system for performing complex logic reasoning tasks, such as those required in advanced knowledge graphs. The Vadalog system is Oxford's contribution to the VADA research programme, a joint effort of the universities of Oxford, Manchester and Edinburgh and around 20 industrial partners. As the main contribution of this paper, we illustrate the first implementation of Warded Datalog+/-, a high-performance Datalog+/- system utilizing an aggressive termination control strategy. We also provide a comprehensive experimental evaluation. Comment: Extended version of VLDB paper <https://doi.org/10.14778/3213880.3213888>

    Succinct data structures for dynamics strings

    A new simple algorithm for optimal embedding of complete binary trees into hypercubes, as well as a node-by-node algorithm for embedding nearly complete binary trees into hypercubes, are presented.

    Modelling biomolecules through atomistic graphs: theory, implementation, and applications

    Describing biological molecules through computational models enjoys ever-growing popularity. Never before has access to computational resources been easier for scientists across the natural sciences. The need for accurate, efficient, and robust modelling tools is therefore irrefutable. This, in turn, calls for highly interdisciplinary research, which the thesis presented here is a product of. Through the successful marriage of techniques from mathematical graph theory, theoretical insights from chemistry and biology, and the tools of modern computer science, we are able to computationally construct accurate depictions of biomolecules as atomistic graphs, in which individual atoms become nodes and chemical bonds/interactions are represented by weighted edges. When combined with methods from graph theory and network science, this approach has previously been shown to successfully reveal various properties of proteins, such as dynamics, rigidity, multi-scale organisation, allostery, and protein-protein interactions, and is well poised to set new standards in terms of computational feasibility, multi-scale resolution (from atoms to domains) and time-scales (from nanoseconds to milliseconds). Therefore, building on previous work in our research group spanning over 15 years and to further encourage and facilitate research into this growing field, this thesis's main contribution is to provide a formalised foundation for the construction of atomistic graphs. The most crucial aspect of constructing atomistic graphs of large biomolecules compared to small molecules is the necessity to include a variety of different types of bonds and interactions, because larger biomolecules attain their unique structural layout mainly through weaker interactions, e.g. hydrogen bonds, the hydrophobic effect or π-π interactions. 
    Whilst most interaction types are well-studied and have readily available methodology which can be used to construct atomistic graphs, this is not the case for hydrophobic interactions. To fill this gap, the work presented herein includes novel methodology for encoding the hydrophobic effect in atomistic graphs that accounts for the many-body effect and non-additivity. Then, a standalone software package for constructing atomistic graphs from structural data is presented. Herein lies the heart of this thesis: the combination of a variety of methodologies for a range of bond/interaction types, as well as an implementation that is deterministic, easy-to-use and efficient. Finally, some promising avenues for utilising atomistic graphs in combination with graph theoretical tools such as Markov Stability, as well as other approaches such as Multilayer Networks, to study various properties of biomolecules are presented.