Compositional Algorithms on Compositional Data: Deciding Sheaves on Presheaves
Algorithmicists are well aware that fast dynamic programming algorithms are
very often the correct choice when computing on compositional (or even
recursive) graphs. Here we initiate the study of how to generalize this
folklore intuition to mathematical structures writ large. We achieve this
horizontal generality by adopting a categorical perspective which allows us to
show that: (1) structured decompositions (a recent, abstract generalization of
many graph decompositions) define Grothendieck topologies on categories of data
(adhesive categories) and that (2) any computational problem which can be
represented as a sheaf with respect to these topologies can be decided in
linear time on classes of inputs which admit decompositions of bounded width
and whose decomposition shapes have bounded feedback vertex number. This
immediately leads to algorithms on objects of any C-set category; these include,
to name but a few examples, symmetric graphs, directed graphs, directed
multigraphs, hypergraphs, directed hypergraphs, databases, simplicial complexes,
circular port graphs and half-edge graphs.
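The recipe behind claim (2) (solve the problem locally on each piece of a decomposition, then keep only local solutions that agree on the overlaps) can be seen in its simplest setting. Below is a minimal Python sketch, assuming the input graph comes with a path decomposition (a path-shaped decomposition, whose shape trivially has feedback vertex number 0) and taking graph k-colorability as the problem: the local solutions on a bag are its proper colorings, and restriction forgets colors outside the overlap with the next bag. This is our own illustration with hypothetical function names (`local_colorings`, `restrict`, `decide_colorable`), not the AlgebraicJulia StructuredDecompositions code, and it covers only path-shaped decompositions.

```python
from itertools import product


def local_colorings(bag, edges, k):
    """All proper k-colorings of the subgraph induced by `bag` (local solutions)."""
    bag = sorted(bag)
    result = []
    for colors in product(range(k), repeat=len(bag)):
        coloring = dict(zip(bag, colors))
        # Only edges with both endpoints in the bag constrain this bag.
        if all(coloring[u] != coloring[v]
               for u, v in edges if u in coloring and v in coloring):
            result.append(coloring)
    return result


def restrict(coloring, vertices):
    """Restriction: forget colors outside `vertices`; returned in hashable form."""
    return tuple(sorted((v, c) for v, c in coloring.items() if v in vertices))


def decide_colorable(bags, edges, k):
    """Decide k-colorability of a graph given a path decomposition `bags`.

    Hypothetical helper: `bags` is a list of vertex sets in path order.
    """
    surviving = local_colorings(bags[0], edges, k)
    for prev, bag in zip(bags, bags[1:]):
        overlap = set(prev) & set(bag)
        # Restrictions of the surviving solutions to the overlap (adhesion).
        seen = {restrict(c, overlap) for c in surviving}
        # Keep only local colorings that glue with some surviving one.
        surviving = [c for c in local_colorings(bag, edges, k)
                     if restrict(c, overlap) in seen]
        if not surviving:
            return False
    return bool(surviving)


if __name__ == "__main__":
    c5 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
    bags = [{0, 1, 4}, {1, 2, 4}, {2, 3, 4}]  # a path decomposition of the 5-cycle
    print(decide_colorable(bags, c5, 2))  # False: odd cycle
    print(decide_colorable(bags, c5, 3))  # True
```

For fixed k, each bag holds at most k^|bag| candidate colorings, so bounded width bounds the local work (this sketch rescans the whole edge list per bag for simplicity). The 5-cycle with k = 2 shows why the gluing step matters: every bag is 2-colorable on its own, yet no choice of local colorings agrees on all overlaps.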
Thus we initiate the bridging of tools from sheaf theory, structural graph
theory and parameterized complexity theory; we believe this to be a very
fruitful approach for a general, algebraic theory of dynamic programming
algorithms. Finally, we pair our theoretical results with concrete
implementations of our main algorithmic contribution in the AlgebraicJulia
ecosystem.
Comment: Revised and simplified notation and improved exposition. The
companion code can be found here:
https://github.com/AlgebraicJulia/StructuredDecompositions.j
Database queries and constraints via lifting problems
Previous work has demonstrated that categories are useful and expressive
models for databases. In the present paper we build on that model, showing that
certain queries and constraints correspond to lifting problems, as found in
modern approaches to algebraic topology. In our formulation, each so-called
SPARQL graph pattern query corresponds to a category-theoretic lifting problem,
whereby the set of solutions to the query is precisely the set of lifts. We
interpret constraints within the same formalism and then investigate some basic
properties of queries and constraints. In particular, to any database $\pi$ we
can associate a certain derived database $\mathrm{Qry}(\pi)$ of queries on $\pi$. As an
application, we explain how giving users access to certain parts of
$\mathrm{Qry}(\pi)$, rather than direct access to $\pi$, improves one's ability to
manage the impact of schema evolution.
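To make "the set of solutions to the query is precisely the set of lifts" more tangible, recall the usual set-based reading of a SPARQL basic graph pattern: a solution is an assignment of the pattern's variables that sends every triple pattern to a triple in the data while leaving constants fixed. Loosely, each such assignment is one way of extending the given map on constants to the whole pattern, which is the intuition behind reading solutions as lifts. The sketch below computes that solution set by naive backtracking, assuming a database stored as a Python set of (subject, predicate, object) strings and SPARQL's convention that variables start with `?`. The function names are hypothetical, and this is not the paper's category-theoretic construction.

```python
def is_var(term):
    """Terms beginning with '?' are query variables (SPARQL convention)."""
    return isinstance(term, str) and term.startswith("?")


def match_triple(pattern, triple, binding):
    """Extend `binding` so that `pattern` maps onto `triple`, or return None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            # Bind an unbound variable; reject a conflicting earlier binding.
            if new.setdefault(p, t) != t:
                return None
        elif p != t:
            # Constants must match exactly.
            return None
    return new


def solutions(query, database):
    """All variable bindings that send every triple pattern into the database."""
    bindings = [{}]
    for pattern in query:
        bindings = [b2 for b in bindings for triple in database
                    if (b2 := match_triple(pattern, triple, b)) is not None]
    return bindings


if __name__ == "__main__":
    db = {("alice", "knows", "bob"), ("bob", "knows", "carol"), ("alice", "age", "30")}
    query = [("?x", "knows", "?y"), ("?y", "knows", "?z")]
    print(solutions(query, db))  # [{'?x': 'alice', '?y': 'bob', '?z': 'carol'}]
```

The shared variable `?y` is what forces the two triple patterns to agree, much as a lift must commute with both legs of a lifting square.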
Non-Negative Discriminative Data Analytics
Due to advances in data acquisition techniques, datasets containing samples from multiple views have recently become more common (Jia et al. 2019). For instance, in genomics, a lymphoma patient’s dataset may include gene expression, single nucleotide polymorphism (SNP), and array comparative genomic hybridization (aCGH) measurements. Learning from multiple views of the same objects generally yields a better understanding of the hidden patterns in the data than learning from a single view. Most existing multi-view learning techniques, such as canonical correlation analysis (Hotelling et al. 1936), multi-view support vector machines (Farquhar et al. 2006), and multiple kernel learning (Zhang et al. 2016), focus on extracting the information shared among multiple datasets.
However, in some real-world applications it is desirable to extract the knowledge that discriminates one dataset from another, a task referred to as discriminative data analytics. For example, consider one dataset of gene-expression measurements from cancer patients and another of gene-expression levels from healthy volunteers, where the goal is to cluster the cancer patients according to their molecular subtypes. A single-view analysis such as principal component analysis (PCA) applied to either dataset yields information related to the knowledge common to both datasets (Garte et al. 1996). To address this challenge, contrastive PCA (Abid et al. 2017) and discriminative PCA (dPCA) (Jia et al. 2019) were proposed to extract the dataset-specific information that PCA often misses.
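The contrast between PCA and discriminative PCA can be made concrete on synthetic data. As we understand dPCA (Jia et al. 2019), it looks for directions u with large variance in the target data relative to the background data, i.e. it maximizes the ratio u^T C_x u / u^T C_y u of the two covariance matrices, which is a generalized symmetric eigenvalue problem. The NumPy sketch below rests on that assumption: it reduces the problem to an ordinary symmetric one through a Cholesky factor of C_y. The function names and the small `ridge` term are our own choices, not the authors' code.

```python
import numpy as np


def covariance(data):
    """Sample covariance of a (samples x features) array."""
    centered = data - data.mean(axis=0)
    return centered.T @ centered / (len(data) - 1)


def pca(x, d):
    """Top-d principal directions of x, as columns."""
    _, vecs = np.linalg.eigh(covariance(x))  # eigenvalues in ascending order
    return vecs[:, ::-1][:, :d]


def dpca(x, y, d, ridge=1e-6):
    """Top-d directions maximizing u^T Cx u / u^T Cy u (target x, background y)."""
    cx, cy = covariance(x), covariance(y)
    cy = cy + ridge * np.eye(cy.shape[0])           # keep Cy positive definite
    l_inv = np.linalg.inv(np.linalg.cholesky(cy))   # Cy = L L^T
    m = l_inv @ cx @ l_inv.T                        # symmetric reduced problem
    _, vecs = np.linalg.eigh(m)
    u = l_inv.T @ vecs[:, ::-1][:, :d]              # map back: u = L^{-T} v
    return u / np.linalg.norm(u, axis=0)            # unit-norm columns


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Axis 0: large variance in both datasets (shared); axis 1: target-specific.
    x = rng.normal(size=(2000, 3)) * np.array([5.0, 2.0, 1.0])
    y = rng.normal(size=(2000, 3)) * np.array([5.0, 1.0, 1.0])
    print(np.round(np.abs(pca(x, 1).ravel()), 2))      # dominated by axis 0
    print(np.round(np.abs(dpca(x, y, 1).ravel()), 2))  # dominated by axis 1
```

PCA on the target alone is drawn to the shared high-variance axis, whereas the ratio objective discounts variance the background also has and picks out the target-specific axis.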
Inspired by dPCA, we propose a novel discriminative multi-view learning algorithm, Non-negative Discriminative Analysis (DNA), to extract the information unique to one dataset (a.k.a. view) relative to the other. This boils down to solving a non-negative matrix factorization problem. Furthermore, we apply the proposed DNA framework to various real-world downstream machine learning tasks such as feature selection, dimensionality reduction, classification, and clustering.
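The abstract says DNA reduces to a non-negative matrix factorization (NMF) problem but does not state the objective. For orientation only, here is plain NMF with the classic multiplicative updates of Lee and Seung for the Frobenius loss ||V - WH||_F. The discriminative term that distinguishes DNA from ordinary NMF is not modeled here, and `nmf` and its parameters are hypothetical names.

```python
import numpy as np


def nmf(v, rank, iters=500, eps=1e-10, seed=0):
    """Factor a non-negative (m x n) matrix v as w @ h with w, h >= 0.

    Plain Frobenius-loss NMF via multiplicative updates; not the DNA objective.
    """
    rng = np.random.default_rng(seed)
    m, n = v.shape
    w = rng.random((m, rank)) + eps
    h = rng.random((rank, n)) + eps
    for _ in range(iters):
        # Each update multiplies by a non-negative ratio, so entries stay
        # non-negative; eps guards against division by zero.
        h *= (w.T @ v) / (w.T @ w @ h + eps)
        w *= (v @ h.T) / (w @ h @ h.T + eps)
    return w, h


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    v = rng.random((20, 2)) @ rng.random((2, 15))  # exactly rank-2, non-negative
    w, h = nmf(v, rank=2, iters=1000)
    rel_err = np.linalg.norm(v - w @ h) / np.linalg.norm(v)
    print(f"relative reconstruction error: {rel_err:.4f}")
```

Multiplicative updates are chosen here for brevity and because they preserve non-negativity without projections; they can converge slowly, so alternating non-negative least squares is a common alternative.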
- …