ExplainIt! -- A declarative root-cause analysis engine for time series data (extended version)
We present ExplainIt!, a declarative, unsupervised root-cause analysis engine
that uses time series monitoring data from large complex systems such as data
centres. ExplainIt! empowers operators to succinctly specify a large number of
causal hypotheses to search for causes of interesting events. ExplainIt! then
ranks these hypotheses, reducing the number of causal dependencies from
hundreds of thousands to a handful for human understanding. We show how a
declarative language, such as SQL, can be effective in declaratively
enumerating hypotheses that probe the structure of an unknown probabilistic
graphical causal model of the underlying system. Our thesis is that databases
are in a unique position to enable users to rapidly explore the possible causal
mechanisms in data collected from diverse sources. We empirically demonstrate
how ExplainIt! has helped us resolve over 30 performance issues in a commercial
product since late 2014, of which we discuss a few cases in detail. Comment: SIGMOD Industry Track 201
Some results related to dense families of database relations
The dense families of database relations were introduced by Järvinen [7]. The aim of this paper is to investigate some new properties of dense families of database relations and their applications. Specifically, we characterize functional dependencies and minimal keys in terms of dense families. We give a necessary and sufficient condition for an arbitrary family to be an R-dense family. We prove that, for a given relation R, the equality set E_R is an R-dense family whose size is at most m(m-1)/2, where m is the number of tuples in R. We also prove that the set of all minimal keys of relation R is the transversal hypergraph of the complement of the equality set E_R. We give an effective algorithm for finding all minimal keys of a given relation R. We also give an algorithm that, from a given relation R, finds a cover of the functional dependencies that hold in R. The complexity of these algorithms is also estimated.
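The link between the equality set and minimal keys can be illustrated with a small brute-force sketch. It is not the paper's algorithm, and it assumes the usual definition of the equality set as the collection of attribute sets on which two distinct tuples agree. Under that assumption, a set of attributes is a key exactly when it intersects the complement of every such agree set, so the minimal keys are the minimal transversals. The function names and the toy relation are hypothetical.

```python
from itertools import combinations


def equality_set(relation):
    """Agree sets {a : t[a] == u[a]} over all pairs of distinct tuples.

    Assumed definition; there are at most m(m-1)/2 pairs for m tuples.
    """
    attrs = range(len(relation[0]))
    return {
        frozenset(a for a in attrs if t[a] == u[a])
        for t, u in combinations(relation, 2)
    }


def minimal_keys(relation):
    """Minimal transversals of the complements of the agree sets.

    Brute force over attribute subsets by increasing size, so it is
    exponential in the number of attributes (illustration only).
    """
    n = len(relation[0])
    universe = frozenset(range(n))
    complements = [universe - e for e in equality_set(relation)]
    keys = []
    for size in range(n + 1):
        for cand in combinations(range(n), size):
            c = frozenset(cand)
            if any(k <= c for k in keys):
                continue  # a smaller key is contained in c, so c is not minimal
            if all(c & comp for comp in complements):
                keys.append(c)  # c separates every pair of tuples
    return keys


# Toy relation with three tuples over attributes 0, 1, 2 (hypothetical data).
relation = [(1, "a", "x"), (1, "b", "y"), (2, "a", "y")]
keys = minimal_keys(relation)
```

In this toy relation every pair of tuples agrees on exactly one attribute, so no single attribute is a key and each pair of attributes is a minimal key.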
Multi-omic Analyses of Extensively Decayed Pinus contorta Reveal Expression of a Diverse Array of Lignocellulose-Degrading Enzymes.
Fungi play a key role in cycling nutrients in forest ecosystems, but the mechanisms remain uncertain. To clarify the enzymatic processes involved in wood decomposition, the metatranscriptomics and metaproteomics of extensively decayed lodgepole pine were examined by RNA sequencing (RNA-seq) and liquid chromatography-tandem mass spectrometry (LC-MS/MS), respectively. Following de novo metatranscriptome assembly, 52,011 contigs were searched for functional domains and homology to database entries. Contigs similar to basidiomycete transcripts dominated, and many of these were most closely related to ligninolytic white rot fungi or cellulolytic brown rot fungi. A diverse array of carbohydrate-active enzymes (CAZymes) representing a total of 132 families or subfamilies were identified. Among these were 672 glycoside hydrolases, including highly expressed cellulases or hemicellulases. The CAZymes also included 162 predicted redox enzymes classified within auxiliary activity (AA) families. Eighteen of these were manganese peroxidases, which are key components of ligninolytic white rot fungi. The expression of other redox enzymes supported the working of hydroquinone reduction cycles capable of generating reactive hydroxyl radicals. These have been implicated as diffusible oxidants responsible for cellulose depolymerization by brown rot fungi. Thus, enzyme diversity and the coexistence of brown and white rot fungi suggest complex interactions of fungal species and degradative strategies during the decay of lodgepole pine.
IMPORTANCE: The deconstruction of recalcitrant woody substrates is a central component of carbon cycling and forest health. Laboratory investigations have contributed substantially toward understanding the mechanisms employed by model wood decay fungi, but few studies have examined the physiological processes in natural environments. Herein, we identify the functional genes present in field samples of extensively decayed lodgepole pine (Pinus contorta), a major species distributed throughout the North American Rocky Mountains. The classified transcripts and proteins revealed a diverse array of oxidative and hydrolytic enzymes involved in the degradation of lignocellulose. The evidence also strongly supports simultaneous attack by fungal species employing different enzymatic strategies.
Learning and Interpreting Multi-Multi-Instance Learning Networks
We introduce an extension of the multi-instance learning problem where
examples are organized as nested bags of instances (e.g., a document could be
represented as a bag of sentences, which in turn are bags of words). This
framework can be useful in various scenarios, such as text and image
classification, but also supervised learning over graphs. As a further
advantage, multi-multi instance learning enables a particular way of
interpreting predictions and the decision function. Our approach is based on a
special neural network layer, called bag-layer, whose units aggregate bags of
inputs of arbitrary size. We prove theoretically that the associated class of
functions contains all Boolean functions over sets of sets of instances and we
provide empirical evidence that functions of this kind can be actually learned
on semi-synthetic datasets. We finally present experiments on text
classification, on citation graphs, and social graph data, which show that our
model obtains competitive results with respect to accuracy when compared to
other approaches such as convolutional networks on graphs, while at the same
time it supports a general approach to interpret the learnt model, as well as
explain individual predictions. Comment: JML
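A minimal sketch of how a bag-layer might aggregate nested bags, assuming each unit applies a ReLU-activated affine map to every instance and then takes an element-wise maximum over the bag. The abstract does not specify the aggregation or activation, so both are assumptions here, as are the function names `bag_layer` and `nested_forward` and the toy weights.

```python
def bag_layer(bag, weights, bias):
    """Map a non-empty bag of vectors to one vector.

    Each instance x is sent through ReLU(W x + b); the bag is then
    reduced by an element-wise max (assumed aggregation), so the
    result does not depend on bag size or instance order.
    """
    def unit(x):
        return [
            max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)
        ]

    outputs = [unit(x) for x in bag]
    return [max(column) for column in zip(*outputs)]


def nested_forward(top_bag, inner, outer):
    """Two stacked bag-layers for a bag of bags of instances.

    For a document, the inner layer turns each sentence (bag of word
    vectors) into one vector; the outer layer aggregates those vectors.
    """
    sub_vectors = [bag_layer(sub_bag, *inner) for sub_bag in top_bag]
    return bag_layer(sub_vectors, *outer)


# Hypothetical parameters: identity inner map, summing outer map.
inner = ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
outer = ([[1.0, 1.0]], [0.0])

# A "document" with two "sentences" of 2-dimensional "word" vectors.
document = [[[1.0, 0.0], [0.0, 2.0]], [[3.0, -1.0]]]
result = nested_forward(document, inner, outer)
```

Because the reduction is a maximum over the bag, permuting the sentences or the words within a sentence leaves `result` unchanged, which is the property that lets such a layer accept bags of arbitrary size.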
GraphClust: alignment-free structural clustering of local RNA secondary structures
Motivation: Clustering according to sequence–structure similarity has now become a generally accepted scheme for ncRNA annotation. Its application to complete genomic sequences as well as whole transcriptomes is therefore desirable but hindered by extremely high computational costs