Operation of weaving partial Steiner triple systems
We introduce an operation of a kind of product which associates with a
partial Steiner triple system another partial Steiner triple system, the
starting one being a quotient of the result. We discuss relations of our
product to some other product-like constructions and prove some
preservation/non-preservation theorems. In particular, we present series of
anti-Pasch Steiner triple systems obtained in this way.
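To make the anti-Pasch property concrete, the following is a minimal sketch (not the authors' weaving construction) that checks a triple system for Pasch configurations. It assumes the input is a partial Steiner triple system, that is, any two triples share at most one point; under that assumption, four triples form a Pasch configuration exactly when together they cover six points. The function names and the two test systems (the Fano plane and the affine plane of order 3) are illustrative choices, not taken from the paper.

```python
from itertools import combinations, product

def is_partial_sts(blocks):
    """True if every block is a triple and any two blocks share at most one point."""
    blocks = [frozenset(b) for b in blocks]
    return all(len(b) == 3 for b in blocks) and all(
        len(b1 & b2) <= 1 for b1, b2 in combinations(blocks, 2)
    )

def pasch_configurations(blocks):
    """Return all sets of four triples that together cover exactly six points.

    Assumes `blocks` is a partial Steiner triple system; in that case these
    are exactly the Pasch configurations {a,b,c}, {a,d,e}, {b,d,f}, {c,e,f}.
    """
    blocks = [frozenset(b) for b in blocks]
    return [
        quad for quad in combinations(blocks, 4)
        if len(frozenset().union(*quad)) == 6
    ]

def is_anti_pasch(blocks):
    """True if the system contains no Pasch configuration."""
    return not pasch_configurations(blocks)

# Fano plane, STS(7): translates of {0, 1, 3} modulo 7.
fano = [frozenset(((0 + i) % 7, (1 + i) % 7, (3 + i) % 7)) for i in range(7)]

# Affine plane of order 3, STS(9): triples of distinct points of Z_3 x Z_3
# whose coordinates sum to zero modulo 3.
points = list(product(range(3), repeat=2))
ag23 = [
    frozenset(t) for t in combinations(points, 3)
    if all(sum(p[k] for p in t) % 3 == 0 for k in range(2))
]

print(len(pasch_configurations(fano)), is_anti_pasch(ag23))
```

The brute-force search over all 4-subsets of triples is only practical for small systems, but it is enough to confirm that the Fano plane contains Pasch configurations while the affine plane of order 3 does not.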
Inference of single-cell phylogenies from lineage tracing data using Cassiopeia.
The pairing of CRISPR/Cas9-based gene editing with massively parallel single-cell readouts now enables large-scale lineage tracing. However, the rapid growth in complexity of data from these assays has outpaced our ability to accurately infer phylogenetic relationships. First, we introduce Cassiopeia, a suite of scalable maximum parsimony approaches for tree reconstruction. Second, we provide a simulation framework for evaluating algorithms and exploring lineage tracer design principles. Finally, we generate the most complex experimental lineage tracing dataset to date, 34,557 human cells continuously traced over 15 generations, and use it for benchmarking phylogenetic inference approaches. We show that Cassiopeia outperforms traditional methods by several metrics and under a wide variety of parameter regimes, and provide insight into the principles for the design of improved Cas9-enabled recorders. Together, these should broadly enable large-scale mammalian lineage tracing efforts. Cassiopeia and its benchmarking resources are publicly available at www.github.com/YosefLab/Cassiopeia.
Self-Enforcing Access Control for Encrypted RDF
The amount of raw data exchanged via web protocols is
steadily increasing. Although the Linked Data infrastructure could
potentially be used to selectively share RDF data with different individuals
or organisations, the primary focus remains on the unrestricted
sharing of public data. In order to extend the Linked Data paradigm to
cater for closed data, there is a need to augment the existing infrastructure
with robust security mechanisms. At the most basic level both access
control and encryption mechanisms are required. In this paper, we propose
a flexible and dynamic mechanism for securely storing and efficiently
querying RDF datasets. By employing an encryption strategy based on
Functional Encryption (FE) in which controlled data access does not
require a trusted mediator, but is instead enforced by the cryptographic
approach itself, we allow for fine-grained access control over encrypted
RDF data while at the same time reducing the administrative overhead
associated with access control management.
Hydrogen bonding in organic systems: a study using x-ray and neutron diffraction and database analyses.
This thesis covers three topics related to the field of crystal engineering. Three different approaches to improving the understanding of hydrogen bonding are covered: analysis of a family of related molecules, investigations of specific functional groups and a systematic, data-driven study of intramolecular hydrogen bonding patterns. Chapters 2 to 4 and chapter 11 cover the background theory to the different methods used to obtain the data discussed in the remainder of the thesis. X-ray and neutron diffraction techniques are discussed, along with sections describing the Cambridge Structural Database, which was used as a data source throughout this work, and a brief section on intermolecular forces. Crystal structure analyses of seventeen gem-alkynol molecules are given in chapters 5 to 10. The gem-alkynol functionality is particularly interesting for a study of intermolecular interactions as it combines a strong and a weak hydrogen bonding group. The group of molecules was investigated with the aim of locating robust supramolecular motifs. The group is subdivided into sections containing molecules with similar structures, and their packing patterns are discussed. The second experimental section, chapters 12 and 13, comprises statistical studies into the function of the azido and cyano functional groups as hydrogen bond acceptors. The Cambridge Structural Database served as the data source for the main analysis, and the results were then complemented with simple theoretical calculations. The remaining chapter, 14, describes a systematic analysis of intermolecular hydrogen bonded motifs. A data-driven approach was designed which allows direct comparison of motifs by means of a probability ordered list.
Learning Models over Relational Data using Sparse Tensors and Functional Dependencies
Integrated solutions for analytics over relational databases are of great
practical importance as they avoid the costly repeated loop data scientists
have to deal with on a daily basis: select features from data residing in
relational databases using feature extraction queries involving joins,
projections, and aggregations; export the training dataset defined by such
queries; convert this dataset into the format of an external learning tool; and
train the desired model using this tool. These integrated solutions are also a
fertile ground of theoretically fundamental and challenging problems at the
intersection of relational and statistical data models.
This article introduces a unified framework for training and evaluating a
class of statistical learning models over relational databases. This class
includes ridge linear regression, polynomial regression, factorization
machines, and principal component analysis. We show that, by synergizing key
tools from database theory such as schema information, query structure,
functional dependencies, recent advances in query evaluation algorithms, and
from linear algebra such as tensor and matrix operations, one can formulate
relational analytics problems and design efficient (query and data)
structure-aware algorithms to solve them.
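As a rough illustration of what "structure-aware" can mean here, and not a description of the AC/DC algorithms, the sketch below fits a ridge regression from a handful of aggregates over a join, and computes those aggregates by grouping one relation before joining instead of materializing the joined training set. The relations `sales` and `items`, the feature choice (intercept and price), and all function names are hypothetical; the intercept is regularized along with the slope purely to keep the example short.

```python
from collections import defaultdict

# Hypothetical toy relations (not from the article).
sales = [("a", 3.0), ("a", 5.0), ("b", 2.0),
         ("c", 7.0), ("c", 8.0), ("c", 6.0)]   # (item, units sold)
items = {"a": 1.0, "b": 2.0, "c": 4.0}          # item -> price

def aggregates_materialized(sales, items):
    """Join first, then sum over every joined row with features x = (1, price)."""
    n = s_p = s_pp = s_y = s_py = 0.0
    for item, units in sales:
        price = items[item]          # assumes every sale references a known item
        n += 1
        s_p += price
        s_pp += price * price
        s_y += units
        s_py += price * units
    return n, s_p, s_pp, s_y, s_py

def aggregates_pushed_down(sales, items):
    """Group sales by item first, then combine one partial sum per item with its price."""
    count = defaultdict(float)
    units_sum = defaultdict(float)
    for item, units in sales:
        count[item] += 1
        units_sum[item] += units
    n = s_p = s_pp = s_y = s_py = 0.0
    for item, price in items.items():
        k, u = count[item], units_sum[item]
        n += k
        s_p += k * price
        s_pp += k * price * price
        s_y += u
        s_py += price * u
    return n, s_p, s_pp, s_y, s_py

def ridge_fit(aggs, lam):
    """Solve (X^T X + lam * I) theta = X^T y for the 2x2 case by Cramer's rule."""
    n, s_p, s_pp, s_y, s_py = aggs
    a, b, d = n + lam, s_p, s_pp + lam
    det = a * d - b * b
    return (d * s_y - b * s_py) / det, (a * s_py - b * s_y) / det

aggs = aggregates_pushed_down(sales, items)
theta = ridge_fit(aggs, lam=0.1)
print(aggs, theta)
```

Both aggregate functions return the same five numbers, but the second touches each item once after grouping, so its cost does not depend on how many joined rows each item would produce.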
This theoretical development informed the design and implementation of the
AC/DC system for structure-aware learning. We benchmark the performance of
AC/DC against R, MADlib, libFM, and TensorFlow. For typical retail forecasting
and advertisement planning applications, AC/DC can learn polynomial regression
models and factorization machines with at least the same accuracy as its
competitors and up to three orders of magnitude faster, whenever the
competitors do not run out of memory, exceed the 24-hour timeout, or encounter
internal design limitations.
Comment: 61 pages, 9 figures, 2 tables
Information Architecture for a Chemical Modeling Knowledge Graph
Machine learning models for chemical property predictions are high-dimensional design challenges spanning multiple disciplines. Free and open-source software libraries have streamlined the model implementation process, but the design complexity remains. In order to better navigate and understand the machine learning design space, model information needs to be organized and contextualized. In this work, instances of chemical property models and their associated parameters were stored in a Neo4j property graph database. Machine learning model instances were created with permutations of dataset, learning algorithm, molecular featurization, data scaling, data splitting, hyperparameters, and hyperparameter optimization techniques. The resulting graph contains over 83,000 nodes and 4 million edges and can be explored with interactive visualization software. The structure of the property graph is centered around models and molecules, which enables efficient and intuitive inter- and intra-model evaluation. We use a curated lipophilicity dataset to demonstrate graph use cases. Difficult-to-predict molecules were identified across multiple models simultaneously. Powerful and expressive graph queries were implemented to identify molecular fragments that were both prevalent and associated with high lipophilicity prediction error.