Automatic Abstraction of a Network Protocol with Weakly-Supervised Clustering
Abstraction is a fundamental part of learning behavioral models of systems.
Usually, the abstraction is defined manually by domain experts. This paper
presents a method to perform automatic abstraction for network protocols. In
particular, a weakly supervised clustering algorithm is used to build an
abstraction with a small vocabulary size for the widely used TLS protocol. To
show the effectiveness of the proposed method, we compare the resulting
abstract messages to a manually constructed (reference) abstraction. With a
small amount of side information in the form of a few labeled examples, this
method finds an abstraction that matches the reference abstraction perfectly.
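The abstract does not name the clustering algorithm, so the following is only a
minimal sketch of one weakly supervised scheme matching the description: seeded
k-means, in which the few labeled examples fix the initial centroids and hence
the vocabulary size. The featurisation of TLS messages, the function name and
all parameters are illustrative assumptions.

```python
import numpy as np

def seeded_kmeans(X, seeds, n_iter=50):
    """Cluster rows of X (n, d); centroids are initialised from labelled
    seeds, a dict mapping abstract-message label -> array of seed points."""
    centroids = np.stack([s.mean(axis=0) for s in seeds.values()])
    for _ in range(n_iter):
        # assign every message to its nearest centroid
        dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1)
        z = dist.argmin(axis=1)
        # re-estimate centroids, keeping the old one if a cluster is empty
        for k in range(len(centroids)):
            if np.any(z == k):
                centroids[k] = X[z == k].mean(axis=0)
    return z  # one abstract-message id per input message
```

Messages would first be mapped to feature vectors (e.g. byte histograms); the
returned cluster ids then play the role of the abstract vocabulary.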
Sound event detection using weakly-labeled semi-supervised data with GCRNNS, VAT and Self-Adaptive Label Refinement
In this paper, we present a gated convolutional recurrent neural network
based approach to solve task 4, large-scale weakly labelled semi-supervised
sound event detection in domestic environments, of the DCASE 2018 challenge.
Gated linear units and a temporal attention layer are used to predict the onset
and offset of sound events in 10-second audio clips; only weakly-labelled data
is used for training. Virtual adversarial training is used for regularization,
utilizing both labelled and unlabelled data. Furthermore, we introduce
self-adaptive label refinement, a method which allows unsupervised adaptation
of our trained system to refine the accuracy of frame-level class predictions.
The proposed system reaches an overall macro-averaged event-based
F-score of 34.6%, resulting in a relative improvement of 20.5% over the
baseline system.
Comment: Accepted at the DCASE 2018 Workshop for oral presentation.
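For context, a minimal PyTorch sketch of two components named in the abstract,
a gated convolutional block and attention-based temporal pooling that turns
frame-level scores into a clip-level (weak-label) prediction; class names,
layer sizes and the omission of the recurrent part are assumptions, not the
authors' implementation.

```python
import torch
import torch.nn as nn

class GatedConvBlock(nn.Module):
    """2-D convolution followed by a gated linear unit over channels."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.conv = nn.Conv2d(c_in, 2 * c_out, kernel_size=3, padding=1)

    def forward(self, x):                    # x: (batch, c_in, time, freq)
        a, b = self.conv(x).chunk(2, dim=1)
        return a * torch.sigmoid(b)          # GLU: linear path times gate

class TemporalAttentionPooling(nn.Module):
    """Aggregate frame-level class scores into one clip-level score."""
    def __init__(self, d, n_classes):
        super().__init__()
        self.cls = nn.Linear(d, n_classes)   # frame-wise class scores
        self.att = nn.Linear(d, n_classes)   # frame-wise attention logits

    def forward(self, h):                    # h: (batch, time, d)
        p = torch.sigmoid(self.cls(h))       # frame-level probabilities
        w = torch.softmax(self.att(h), dim=1)  # attention weights over time
        return (w * p).sum(dim=1), p         # clip-level, frame-level
```

The frame-level probabilities p are what a label-refinement stage would iterate
on; onsets and offsets can be read off by thresholding them.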
Exact Maximum Margin Structure Learning of Bayesian Networks
Recently, there has been much interest in finding globally optimal Bayesian
network structures. These techniques were developed for generative scores and
cannot be directly extended to discriminative scores, as desired for
classification. In this paper, we propose an exact method for finding network
structures maximizing the probabilistic soft margin, a successfully applied
discriminative score. Our method is based on branch-and-bound techniques within
a linear programming framework and maintains an any-time solution, together
with worst-case sub-optimality bounds. We apply a set of order constraints for
enforcing the network structure to be acyclic, which allows a compact problem
representation and the use of general-purpose optimization techniques. In
classification experiments, our methods clearly outperform generatively trained
network structures and compete with support vector machines.
Comment: ICML.
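The abstract does not restate the probabilistic soft margin; one formulation
used in this line of work is a hinged log-ratio between the true class and the
strongest competitor, where θ denotes the network parameters and γ > 0 the
hinge level (both symbols are notational assumptions here):

$$ d^{(n)} = \frac{P_\theta\big(c^{(n)}, \mathbf{x}^{(n)}\big)}{\max_{c \neq c^{(n)}} P_\theta\big(c, \mathbf{x}^{(n)}\big)}, \qquad \max_\theta \sum_{n=1}^{N} \min\big(\gamma, \log d^{(n)}\big). $$

A sample is classified correctly exactly when d^(n) > 1, so maximizing the
hinged log-margin pushes samples past the decision boundary while capping the
reward per sample.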
Optimisation of Overparametrized Sum-Product Networks
It seems to be a pearl of conventional wisdom that parameter learning in deep
sum-product networks is surprisingly fast compared to shallow mixture models.
This paper examines the effects of overparameterization in sum-product networks
on the speed of parameter optimisation. Using theoretical analysis and
empirical experiments, we show that deep sum-product networks exhibit an
implicit acceleration compared to their shallow counterparts. In fact,
gradient-based optimisation in deep tree-structured sum-product networks is
equivalent to gradient ascent with adaptive and time-varying learning rates and
additional momentum terms.
Comment: Workshop on Tractable Probabilistic Models (TPM) at ICML 2019.
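The flavour of this implicit acceleration is easiest to see in the smallest
overparameterized model, a single weight written as a product of two factors
(the same mechanism known from deep linear networks); this toy derivation is an
illustration, not the paper's SPN-specific analysis:

$$ w = w_1 w_2, \quad \dot{w}_i = -\frac{\partial L}{\partial w_i} \;\Longrightarrow\; \dot{w} = \dot{w}_1 w_2 + w_1 \dot{w}_2 = -\big(w_1^2 + w_2^2\big)\,\frac{\partial L}{\partial w}. $$

The factor (w_1^2 + w_2^2) acts as an adaptive, time-varying learning rate, and
deeper factorizations additionally accumulate momentum-like terms.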
Sum-Product Networks for Sequence Labeling
We consider higher-order linear-chain conditional random fields (HO-LC-CRFs)
for sequence modelling, and use sum-product networks (SPNs) for representing
higher-order input- and output-dependent factors. SPNs are a recently
introduced class of deep models for which exact and efficient inference can be
performed. By combining HO-LC-CRFs with SPNs, expressive models over both the
output labels and the hidden variables are instantiated while still enabling
efficient exact inference. Furthermore, the use of higher-order factors allows
us to capture relations of multiple input segments and multiple output labels
as often present in real-world data. These relations cannot be modelled by the
commonly used first-order models, nor by higher-order models with local factors
including only a single output label. We demonstrate the effectiveness of our
proposed models for sequence labeling. In extensive experiments, we outperform
other state-of-the-art methods in optical character recognition and achieve
competitive results in phone classification.
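In notation assumed here (the abstract gives none), a K-th order linear-chain
CRF with input- and output-dependent factors has the form

$$ p(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z(\mathbf{x})} \prod_{t=1}^{T} \prod_{k=1}^{K} \Psi_k\big(y_{t-k+1}, \dots, y_t, \mathbf{x}, t\big), $$

where each factor Ψ_k couples k consecutive output labels with the input; the
paper's point is that representing these factors with SPNs keeps inference over
the chain exact and efficient.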
Bayesian network classifier versus k-NN classifier
The aim of this paper is to compare Bayesian network classifiers to the k-NN classifier based on a subset of features. This subset is established by means of sequential feature selection methods. Experimental results show that Bayesian network classifiers achieve a better classification rate than selective k-NN classifiers on most of the tested data sets, while the k-NN classifier performs well in cases where the number of samples available for learning the parameters of the Bayesian network is small. Bayesian network classifiers also outperform selective k-NN methods in terms of memory requirements and computational demands. This paper demonstrates the strength of Bayesian networks for classification.
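The paper's exact selection procedure is not given in this summary; below is a
minimal sketch of one common variant, greedy sequential forward selection
wrapped around a k-NN classifier and scored by cross-validation. The
scikit-learn usage and all parameter values are illustrative assumptions.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def sequential_forward_selection(X, y, n_features, k=5):
    """Repeatedly add the feature that most improves CV accuracy."""
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < n_features:
        scores = [
            cross_val_score(KNeighborsClassifier(n_neighbors=k),
                            X[:, selected + [j]], y, cv=5).mean()
            for j in remaining
        ]
        best = remaining[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
    return selected  # indices of the chosen feature subset
```

The resulting subset is what "selective k-NN" refers to above; the same subset
can then be fed to the Bayesian network classifiers for a fair comparison.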
Order-based Discriminative Structure Learning for Bayesian Network Classifiers
We introduce a simple empirical order-based greedy heuristic for learning discriminative Bayesian network structures. We propose two metrics for establishing the ordering of N features, both based on conditional mutual information. Given an ordering, we can find the discriminative classifier structure with O(N^q) score evaluations (where the constant q is the maximum number of parents per node). We present classification results on the UCI repository (Merz, Murphy, & Aha 1997), for a phonetic classification task using the TIMIT database (Lamel, Kassel, & Seneff 1986), and for the MNIST handwritten digit recognition task (LeCun et al. 1998). The discriminative structures found by our new procedures significantly outperform generatively produced structures, and achieve classification accuracy on par with the best discriminative (naive greedy) Bayesian network learning approach, but with a factor of ∼10 speedup. We also show that the advantages of generatively parameterized, discriminatively structured Bayesian network classifiers still hold in the case of missing features.
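The two ordering metrics themselves are not reproduced in this summary; both
are based on conditional mutual information, whose standard definition for
discrete features and class variable C is given below (one plausible
instantiation ranks a candidate feature X_i by its class-relevant information
given an already-ordered feature X_j):

$$ I(X_i; C \mid X_j) = \sum_{x_i, x_j, c} P(x_i, x_j, c) \log \frac{P(x_i, c \mid x_j)}{P(x_i \mid x_j)\, P(c \mid x_j)}. $$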
Differentiable TAN Structure Learning for Bayesian Network Classifiers
Learning the structure of Bayesian networks is a difficult combinatorial
optimization problem. In this paper, we consider learning of tree-augmented
naive Bayes (TAN) structures for Bayesian network classifiers with discrete
input features. Instead of performing a combinatorial optimization over the
space of possible graph structures, the proposed method learns a distribution
over graph structures. After training, we select the most probable structure of
this distribution. This allows for a joint training of the Bayesian network
parameters along with its TAN structure using gradient-based optimization. The
proposed method is agnostic to the specific loss and only requires that it is
differentiable. We perform extensive experiments using a hybrid
generative-discriminative loss based on the discriminative probabilistic
margin. Our method consistently outperforms random TAN structures and Chow-Liu
TAN structures.
Comment: Accepted at PGM 2020.
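A minimal PyTorch sketch of the core idea, one softmax distribution per feature
over its candidate feature-parents, trained jointly with the network parameters
and collapsed to the argmax after training; the parameterisation, names, and
the way per-parent log-likelihoods are mixed are assumptions, not the paper's
exact construction.

```python
import torch
import torch.nn as nn

class DifferentiableTANStructure(nn.Module):
    """Relaxed TAN structure: feature 0 is the root (the class is its only
    parent); every feature i >= 1 holds a softmax over parents j < i."""
    def __init__(self, n_features):
        super().__init__()
        self.logits = nn.ParameterList(
            [nn.Parameter(torch.zeros(i)) for i in range(1, n_features)]
        )

    def parent_weights(self, i):
        # distribution over candidate parents of feature i (i >= 1)
        return torch.softmax(self.logits[i - 1], dim=0)

    def expected_log_prob(self, i, per_parent_logp):
        # per_parent_logp[j] = log p(x_i | x_j, c); mixing instead of
        # selecting keeps the loss differentiable in the structure logits
        return (self.parent_weights(i) * per_parent_logp).sum()

    def most_probable_structure(self):
        # after training, commit to a single parent per feature
        return [int(torch.argmax(l)) for l in self.logits]
```

Restricting candidate parents to j < i fixes a topological order, so every
selected structure is a tree rooted at feature 0 by construction (whether the
paper enforces the tree constraint this way is not stated in the abstract).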
Blind Speech Separation and Dereverberation using Neural Beamforming
In this paper, we present the Blind Speech Separation and Dereverberation
(BSSD) network, which performs simultaneous speaker separation, dereverberation
and speaker identification in a single neural network. Speaker separation is
guided by a set of predefined spatial cues. Dereverberation is performed by
using neural beamforming, and speaker identification is aided by embedding
vectors and triplet mining. We introduce a frequency-domain model which uses
complex-valued neural networks, and a time-domain variant which performs
beamforming in latent space. Further, we propose a block-online mode to process
longer audio recordings, as they occur in meeting scenarios. We evaluate our
system in terms of Scale-Independent Signal-to-Distortion Ratio (SI-SDR), Word
Error Rate (WER) and Equal Error Rate (EER).
Comment: 13 pages, 9 figures.
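For reference, SI-SDR as it is usually computed (the standard definition, e.g.
Le Roux et al., 2019); a small NumPy sketch:

```python
import numpy as np

def si_sdr(estimate, target, eps=1e-8):
    """Scale-independent SDR in dB between 1-D signals of equal length."""
    target = target - target.mean()
    estimate = estimate - estimate.mean()
    # project the estimate onto the target to cancel any gain mismatch
    alpha = np.dot(estimate, target) / (np.dot(target, target) + eps)
    s_target = alpha * target
    noise = estimate - s_target
    return 10 * np.log10(np.dot(s_target, s_target) /
                         (np.dot(noise, noise) + eps))
```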
Fixed Points of Belief Propagation -- An Analysis via Polynomial Homotopy Continuation
Belief propagation (BP) is an iterative method to perform approximate
inference on arbitrary graphical models. Whether BP converges and if the
solution is a unique fixed point depends on both the structure and the
parametrization of the model. To understand this dependence it is interesting
to find \emph{all} fixed points. In this work, we formulate a set of polynomial
equations, the solutions of which correspond to BP fixed points. To solve such
a nonlinear system we present the numerical polynomial-homotopy-continuation
(NPHC) method. Experiments on binary Ising models and on error-correcting codes
show how our method is capable of obtaining all BP fixed points. On Ising
models with fixed parameters we show how the structure influences both the
number of fixed points and the convergence properties. We further assess the
accuracy of the marginals and weighted combinations thereof. Weighting
marginals with their respective partition function increases the accuracy in
all experiments. Contrary to the conjecture that uniqueness of BP fixed points
implies convergence, we find graphs for which BP fails to converge, even though
a unique fixed point exists. Moreover, we show that this fixed point gives a
good approximation, and the NPHC method is able to obtain this fixed point.
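To see why the fixed points form the solution set of a polynomial system (in
notation assumed here), recall BP's message update for a pairwise model:

$$ m_{ij}(x_j) = \frac{1}{Z_{ij}} \sum_{x_i} \psi_i(x_i)\, \psi_{ij}(x_i, x_j) \prod_{k \in \mathcal{N}(i) \setminus \{j\}} m_{ki}(x_i). $$

Requiring every message to be invariant under this update and multiplying
through by the normalizer Z_{ij} leaves one polynomial equation per message
entry; for binary variables each message reduces to a single scalar, so the
fixed points are exactly the roots of a square polynomial system, which is the
form of problem NPHC solvers are designed to enumerate.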