The Infinity Mirror Test for Graph Models
Graph models, like other machine learning models, have implicit and explicit
biases built-in, which often impact performance in nontrivial ways. The model's
faithfulness is often measured by comparing the newly generated graph against
the source graph using any number or combination of graph properties.
Differences in the size or topology of the generated graph therefore indicate a
loss in the model. Yet, in many systems, errors encoded in loss functions are
subtle and not well understood. In the present work, we introduce the Infinity
Mirror test for analyzing the robustness of graph models. This straightforward
stress test works by repeatedly fitting a model to its own outputs. A
hypothetically perfect graph model would have no deviation from the source
graph; however, the model's implicit biases and assumptions are exaggerated by
the Infinity Mirror test, exposing potential issues that were previously
obscured. Through an analysis of thousands of experiments on synthetic and
real-world graphs, we show that several conventional graph models degenerate in
exciting and informative ways. We believe that the observed degenerative
patterns are clues to the future development of better graph models.
Comment: This was submitted to IEEE TKDE 2020; 12 pages and 8 figures
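The fit-generate loop at the heart of the test can be sketched in a few lines. The snippet below uses a toy Erdős–Rényi model rather than any of the models studied in the paper; the function names and the choice of model are illustrative assumptions only.

```python
import random

def fit_er(n, edges):
    """Fit a toy Erdos-Renyi model: estimate edge probability p from the graph."""
    possible = n * (n - 1) // 2
    return len(edges) / possible

def generate_er(n, p, rng):
    """Sample a new graph (as a set of edges) from the fitted model."""
    return {(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p}

def infinity_mirror(n, edges, generations, rng):
    """Repeatedly fit the model to its own output; return the p estimate per generation."""
    history = []
    for _ in range(generations):
        p = fit_er(n, edges)
        history.append(p)
        edges = generate_er(n, p, rng)
    return history

rng = random.Random(0)
source = generate_er(30, 0.2, rng)           # stand-in for a source graph
trace = infinity_mirror(30, source, 10, rng)
```

Watching how the successive `p` estimates drift across generations is a miniature version of what the test exposes in richer models.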
DNA Analysis Using Grammatical Inference
An accurate language definition capable of distinguishing between coding and non-coding DNA has important applications and analytical significance to the field of computational biology. The method proposed here uses positive sample grammatical inference and statistical information to infer languages for coding DNA.
An algorithm is proposed for searching for an optimal subset of input sequences for the inference of regular grammars by optimizing a relevant accuracy metric. The algorithm does not guarantee finding the optimal subset; however, testing shows improvement in accuracy and performance over the base algorithm.
Testing shows that inferred languages for components of DNA are consistently accurate. Using the proposed algorithm, languages are inferred for coding DNA with an average conditional probability of over 80%. This reveals that languages for components of DNA can be inferred and are useful independently of the process that created them. These languages can then be analyzed or used for other tasks in computational biology.
To illustrate potential applications of regular grammars for DNA components, an inferred language for exon sequences is applied as post-processing to Hidden Markov model exon prediction to reduce the number of wrongly detected exons and significantly improve the specificity of the model.
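One standard positive-sample technique for inferring a regular language is k-testable inference, sketched below on toy sequences. The paper's actual algorithm, its subset search, and its accuracy metric are not reproduced here, and the sequence data is invented for illustration.

```python
def train_k_testable(sequences, k):
    """Infer a k-testable language from positive samples only:
    collect every length-k substring seen in training."""
    allowed = set()
    for s in sequences:
        for i in range(len(s) - k + 1):
            allowed.add(s[i:i + k])
    return allowed

def accepts(allowed, s, k):
    """A string is in the language iff all of its k-grams were seen in training."""
    return all(s[i:i + k] in allowed for i in range(len(s) - k + 1))

coding = ["ATGGCC", "ATGGCA", "ATGGCT"]   # toy positive samples
model = train_k_testable(coding, 3)
print(accepts(model, "ATGGC", 3))    # True: every 3-gram was seen
print(accepts(model, "ATGTTT", 3))   # False: contains unseen 3-grams
```

The same accept/reject decision is the kind of filter that could be applied as post-processing to candidate exon predictions.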
Learning Social Affordance Grammar from Videos: Transferring Human Interactions to Human-Robot Interactions
In this paper, we present a general framework for learning social affordance
grammar as a spatiotemporal AND-OR graph (ST-AOG) from RGB-D videos of human
interactions, and transfer the grammar to humanoids to enable a real-time
motion inference for human-robot interaction (HRI). Based on Gibbs sampling,
our weakly supervised grammar learning can automatically construct a
hierarchical representation of an interaction with long-term joint sub-tasks of
both agents and short-term atomic actions of individual agents. Based on a new
RGB-D video dataset with rich instances of human interactions, our experiments
of Baxter simulation, human evaluation, and real Baxter test demonstrate that
the model learned from limited training data successfully generates human-like
behaviors in unseen scenarios and outperforms both baselines.
Comment: The 2017 IEEE International Conference on Robotics and Automation (ICRA)
Network Traffic Analysis Using Stochastic Grammars
Network traffic analysis is widely used to infer information from Internet traffic, even when the traffic is encrypted. Previous work uses traffic characteristics, such as port numbers, packet sizes, and frequency, without looking for more subtle patterns in the network traffic. In this work, we use stochastic grammars, hidden Markov models (HMMs) and probabilistic context-free grammars (PCFGs), as pattern recognition tools for traffic analysis. HMMs are widely used for pattern recognition and detection, and we use an HMM inference approach. With inferred HMMs, we use confidence intervals (CIs) to detect whether a data sequence matches the HMM. To compare HMMs, we define a normalized Markov metric, and a statistical test is used to determine model equivalence. Our metric systematically removes the least likely events from both HMMs until the remaining models are statistically equivalent; this defines the distance between models. We extend the use of HMMs to PCFGs, which have more expressive power. We estimate PCFG production probabilities from data, and a statistical test is used for detection. We present three applications of HMM and PCFG detection to network traffic analysis. First, we infer the presence of protocol tunneling through the Tor (The Onion Router) anonymization network. The Markov metric quantifies the similarity of network traffic HMMs in Tor to identify the protocol; it also measures communication noise in the Tor network. Second, we use HMMs to detect centralized botnet traffic: we infer HMMs from botnet traffic data and detect botnet infections. Experimental results show that HMMs can accurately detect Zeus botnet traffic. Third, because newer botnets use P2P control structures to hide their locations better, and hierarchical P2P botnets contain recursive and hierarchical patterns, we use PCFGs to detect P2P botnet traffic. Experimentation on real-world traffic data shows that PCFGs can accurately differentiate between P2P botnet traffic and normal Internet traffic.
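The detection step, matching a traffic sequence against an inferred HMM via its likelihood, can be illustrated with a scaled forward algorithm. The two-state model, its probabilities, and the simple per-symbol threshold below are illustrative assumptions; the paper uses confidence intervals and its own inferred models.

```python
import math

def forward_loglik(obs, states, start, trans, emit):
    """Scaled forward algorithm: log P(obs | HMM) for a discrete-emission HMM."""
    alpha = {s: start[s] * emit[s][obs[0]] for s in states}
    loglik = 0.0
    for o in obs[1:]:
        scale = sum(alpha.values())           # rescale to avoid underflow
        loglik += math.log(scale)
        alpha = {s: alpha[s] / scale for s in states}
        alpha = {s2: sum(alpha[s1] * trans[s1][s2] for s1 in states) * emit[s2][o]
                 for s2 in states}
    loglik += math.log(sum(alpha.values()))
    return loglik

# Hypothetical two-state traffic model: bursty vs. idle flows.
states = ("burst", "idle")
start = {"burst": 0.5, "idle": 0.5}
trans = {"burst": {"burst": 0.8, "idle": 0.2},
         "idle":  {"burst": 0.3, "idle": 0.7}}
emit  = {"burst": {"big": 0.9, "small": 0.1},
         "idle":  {"big": 0.2, "small": 0.8}}

seq = ["big", "big", "small", "big"]          # observed packet-size symbols
ll = forward_loglik(seq, states, start, trans, emit)
# Toy detection rule: flag the flow as matching the model if its
# per-symbol log-likelihood clears a fixed threshold.
matches = ll / len(seq) > math.log(0.2)
```

A normalized per-symbol likelihood makes sequences of different lengths comparable, which is one reason detection thresholds are usually stated per observation.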
Unsupervised Lexicon Discovery from Acoustic Input
We present a model of unsupervised phonological lexicon discovery -- the problem of simultaneously learning phoneme-like and word-like units from acoustic input. Our model builds on earlier models of unsupervised phone-like unit discovery from acoustic data (Lee and Glass, 2012) and unsupervised symbolic lexicon discovery using the Adaptor Grammar framework (Johnson et al., 2006), integrating these earlier approaches using a probabilistic model of phonological variation. We show that the model is competitive with state-of-the-art spoken term discovery systems, and present analyses exploring the model's behavior and the kinds of linguistic structures it learns.
SG-VAE: Scene Grammar Variational Autoencoder to Generate New Indoor Scenes
Deep generative models have been used in recent years to learn coherent latent representations in order to synthesize high-quality images. In this work, we propose a neural network to learn a generative model for sampling consistent indoor scene layouts. Our method learns the co-occurrences, and appearance parameters such as shape and pose, for different object categories through a grammar-based auto-encoder, resulting in a compact and accurate representation for scene layouts. In contrast to existing grammar-based methods with a user-specified grammar, we construct the grammar automatically by extracting a set of production rules through reasoning about object co-occurrences in the training data. The extracted grammar is able to represent a scene by an augmented parse tree. The proposed auto-encoder encodes these parse trees to a latent code and decodes the latent code to a parse tree, thereby ensuring the generated scene is always valid. We experimentally demonstrate that the proposed auto-encoder learns not only to generate valid scenes (i.e. the arrangements and appearances of objects), but also coherent latent representations where nearby latent samples decode to similar scene outputs. The obtained generative model is applicable to several computer vision tasks such as 3D pose and layout estimation from RGB-D data.
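The rule-extraction idea, deriving productions from object co-occurrences in training scenes, can be caricatured as follows. The scenes, the `min_count` threshold, and the flat `Scene -> A B` rules are invented for illustration; the paper's grammar and augmented parse trees are far richer.

```python
from collections import Counter
from itertools import combinations

def extract_rules(scenes, min_count=2):
    """Derive toy production rules 'Scene -> A B' from object pairs
    that co-occur in at least min_count training scenes."""
    pairs = Counter()
    for objects in scenes:
        for a, b in combinations(sorted(set(objects)), 2):
            pairs[(a, b)] += 1
    return [f"Scene -> {a} {b}" for (a, b), c in pairs.items() if c >= min_count]

scenes = [
    ["bed", "nightstand", "lamp"],      # made-up training layouts
    ["bed", "nightstand", "wardrobe"],
    ["desk", "chair"],
]
rules = extract_rules(scenes)
# Only bed/nightstand co-occur twice, so a single rule survives.
```

Thresholding on co-occurrence counts is one simple way to keep the induced grammar from memorizing incidental pairings in small training sets.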
Prospects for Declarative Mathematical Modeling of Complex Biological Systems
Declarative modeling uses symbolic expressions to represent models. With such
expressions one can formalize high-level mathematical computations on models
that would be difficult or impossible to perform directly on a lower-level
simulation program written in a general-purpose programming language. Examples of such
computations on models include model analysis, relatively general-purpose
model-reduction maps, and the initial phases of model implementation, all of
which should preserve or approximate the mathematical semantics of a complex
biological model. The potential advantages are particularly relevant in the
case of developmental modeling, wherein complex spatial structures exhibit
dynamics at molecular, cellular, and organogenic levels to relate genotype to
multicellular phenotype. Multiscale modeling can benefit from both the
expressive power of declarative modeling languages and the application of model
reduction methods to link models across scales. Based on previous work, here we
define declarative modeling of complex biological systems by defining the
operator algebra semantics of an increasingly powerful series of declarative
modeling languages including reaction-like dynamics of parameterized and
extended objects; we define semantics-preserving implementation and
semantics-approximating model reduction transformations; and we outline a
"meta-hierarchy" for organizing declarative models and the mathematical methods
that can fruitfully manipulate them.
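As a minimal illustration of the declarative style, a reaction network can be written as symbolic data and then compiled into one concrete implementation, here mass-action ODE right-hand sides. The species names, rates, and the mass-action choice are illustrative assumptions, not the operator-algebra semantics developed in the paper.

```python
# Declarative model: each reaction is data, (reactants, products, rate constant),
# rather than hand-written simulation code.
reactions = [
    ({"A": 1, "B": 1}, {"C": 1}, 0.5),   # A + B -> C
    ({"C": 1}, {"A": 1, "B": 1}, 0.1),   # C -> A + B
]

def mass_action_rhs(reactions, conc):
    """Compile the declarative reaction list into ODE right-hand sides
    d[S]/dt under mass-action kinetics, one semantics-preserving
    implementation of the symbolic model."""
    deriv = {s: 0.0 for s in conc}
    for reactants, products, k in reactions:
        rate = k
        for s, n in reactants.items():
            rate *= conc[s] ** n
        for s, n in reactants.items():
            deriv[s] -= n * rate
        for s, n in products.items():
            deriv[s] += n * rate
    return deriv

d = mass_action_rhs(reactions, {"A": 1.0, "B": 2.0, "C": 0.5})
# d["A"] = -0.5*1.0*2.0 + 0.1*0.5 = -0.95
```

Because the model is data, the same reaction list could instead be compiled to a stochastic simulator or analyzed symbolically, which is the kind of high-level computation on models the abstract describes.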