Towards faster numerical solution of Continuous Time Markov Chains stored by symbolic data structures
This work considers several aspects of model-based performance and dependability analysis, a research area that analyses systems (e.g. computer, telecommunication, or production systems) in order to quantify their performance and reliability. Such an analysis can be carried out as early as the planning phase, before a system physically exists. All aspects treated in this work assume finite state spaces (i.e. the models have only finitely many states) and a representation of the state graphs by Multi-Terminal Binary Decision Diagrams (MTBDDs). Many tools exist that transform high-level model specifications (e.g. process algebras or Petri nets) into low-level models (e.g. Markov chains), which can be represented by sparse matrices. For complex models, very large state spaces may occur (a phenomenon known in the literature as state space explosion) and, accordingly, very large matrices representing the state graphs. The problems of building the model from the specification and of storing the state graph can be regarded as solved: there are heuristics for compactly storing the state graph in MTBDD or Kronecker data structures, and there are efficient algorithms for model generation and functional analysis. For quantitative analysis, however, problems remain due to the size of the underlying state space. This work provides methods that alleviate these problems in the case of MTBDD-based storage of the state graph. The contribution is threefold:
1. For the generation of smaller state graphs in the model generation phase (which are usually easier to solve), a symbolic elimination algorithm is developed.
2. For the calculation of steady-state probabilities of Markov chains, a multilevel algorithm is developed that allows for faster solutions.
3. For calculating the most probable paths in a state graph, the mean time to the first failure of a system, and related measures, a path-based solver is developed.
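As a minimal illustration of the quantitative-analysis step that contribution 2 targets, the sketch below computes the steady-state probabilities of a small continuous-time Markov chain by uniformization and power iteration. The generator matrix and its rates are invented for the example, and no symbolic (MTBDD) storage or multilevel acceleration is involved:

```python
# Uniformization: P = I + Q/lam turns the CTMC generator Q into a
# stochastic matrix with the same stationary distribution, which a
# plain power iteration can then find.
def steady_state(Q, iters=2000):
    n = len(Q)
    lam = max(-Q[i][i] for i in range(n)) * 1.1   # uniformization rate
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / lam for j in range(n)]
         for i in range(n)]
    pi = [1.0 / n] * n                            # uniform starting guess
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

# Illustrative 3-state generator: each row sums to zero.
Q = [[-3.0,  2.0,  1.0],
     [ 1.0, -4.0,  3.0],
     [ 2.0,  2.0, -4.0]]
pi = steady_state(Q)   # satisfies pi @ Q = 0 and sum(pi) = 1
```

Real models of interest are far too large for such dense iteration, which is exactly why the thesis stores the state graph symbolically and accelerates the solve with a multilevel method.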
On the Learning Behavior of Adaptive Networks - Part I: Transient Analysis
This work carries out a detailed transient analysis of the learning behavior
of multi-agent networks, and reveals interesting results about the learning
abilities of distributed strategies. Among other results, the analysis reveals
how combination policies influence the learning process of networked agents,
and how these policies can steer the convergence point towards any of many
possible Pareto optimal solutions. The results also establish that the learning
process of an adaptive network undergoes three (rather than two) well-defined
stages of evolution with distinctive convergence rates during the first two
stages, while attaining a finite mean-square-error (MSE) level in the last
stage. The analysis reveals what aspects of the network topology influence
performance directly and suggests design procedures that can optimize
performance by adjusting the relevant topology parameters. Interestingly, it is
further shown that, in the adaptation regime, each agent in a sparsely
connected network is able to achieve the same performance level as that of a
centralized stochastic-gradient strategy even for left-stochastic combination
strategies. These results lead to a deeper understanding of, and useful insights into,
the convergence behavior of coupled distributed learners. The results also lead
to effective design mechanisms to help diffuse information more thoroughly over
networks.
Comment: to appear in IEEE Transactions on Information Theory, 201
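The adapt-then-combine (ATC) diffusion logic that such transient analyses describe can be sketched in a few lines. The three-agent network, the left-stochastic combination matrix, the step-size, and the scalar data model below are all illustrative assumptions, not parameters from the paper:

```python
import random

# Adapt-then-Combine (ATC) diffusion LMS over a small network. Every
# quantity here (topology, weights, step-size, data model) is an
# illustrative assumption for the sketch.
random.seed(0)
N = 3                       # number of agents
w_true = 0.8                # scalar parameter every agent estimates
mu = 0.05                   # step-size
# Left-stochastic combination matrix: each COLUMN sums to one, so the
# weights feeding into agent k are A[0][k], A[1][k], A[2][k].
A = [[0.6, 0.2, 0.1],
     [0.2, 0.6, 0.3],
     [0.2, 0.2, 0.6]]

w = [0.0] * N
for _ in range(3000):
    # Adaptation: each agent takes a stochastic-gradient (LMS) step
    # using only its own streaming data.
    psi = []
    for k in range(N):
        u = random.gauss(0, 1)                  # regressor sample
        d = u * w_true + random.gauss(0, 0.01)  # noisy measurement
        psi.append(w[k] + mu * u * (d - u * w[k]))
    # Combination: each agent fuses its neighbours' intermediate estimates.
    w = [sum(A[l][k] * psi[l] for l in range(N)) for k in range(N)]
```

After the loop every agent's estimate sits close to `w_true`, illustrating the paper's point that local adaptation plus combination lets sparsely connected agents match centralized performance.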
Understanding and mitigating universal adversarial perturbations for computer vision neural networks
Deep neural networks (DNNs) have become the algorithm of choice for many computer vision applications. They are able to achieve human level performance in many computer vision tasks, and enable the automation and large-scale deployment of applications such as object tracking, autonomous vehicles, and medical imaging. However, DNNs expose software applications to systemic vulnerabilities in the form of Universal Adversarial Perturbations (UAPs): input perturbation attacks that can cause DNNs to make classification errors on large sets of inputs.
Our aim is to improve the robustness of computer vision DNNs to UAPs without sacrificing the models' predictive performance. To this end, we increase our understanding of these vulnerabilities by investigating the visual structures and patterns commonly appearing in UAPs. We demonstrate the efficacy and pervasiveness of UAPs by showing how Procedural Noise patterns can be used to generate efficient zero-knowledge attacks on different computer vision models and tasks at minimal cost to the attacker. We then evaluate the UAP robustness of various shape- and texture-biased models, and find that applying them in ensembles provides only marginal improvements in robustness.
To mitigate UAP attacks, we develop two novel approaches. First, we propose using the Jacobian of a DNN to measure the sensitivity of computer vision DNNs. We derive theoretical bounds and provide empirical evidence showing how a combination of Jacobian regularisation and ensemble methods allows for increased model robustness against UAPs without degrading the predictive performance of computer vision DNNs. Our results evince a robustness-accuracy trade-off against UAPs that is better than that of models trained in conventional ways. Finally, we design a detection method that analyses hidden-layer activation values to identify a variety of UAP attacks in real time with low latency. We show that our work outperforms existing defences under realistic time and computation constraints.
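The mechanism that makes a perturbation "universal" can be seen already in a linear toy model: a single, input-agnostic perturbation fools the classifier on a large fraction of inputs. Everything below (the classifier, the Gaussian data, the perturbation budget) is an invented stand-in for the illustration, not a DNN or an attack from the thesis:

```python
import random, math

# Toy illustration of a universal adversarial perturbation: one fixed
# perturbation, applied unchanged to every input, flips predictions on
# a large share of the data set.
random.seed(1)
w = [1.0, 2.0]                                   # linear classifier weights
norm_w = math.sqrt(sum(c * c for c in w))
xs = [[random.gauss(0.5, 1), random.gauss(0.5, 1)] for _ in range(200)]

def predict(x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else -1

# A single perturbation aligned against the weight vector shifts every
# input towards the decision boundary by the same amount.
eps = 1.5
v = [-eps * c / norm_w for c in w]

fooled = sum(predict(x) != predict([a + b for a, b in zip(x, v)])
             for x in xs)
fooling_rate = fooled / len(xs)   # fraction of inputs whose label flips
```

For a deep network the boundary is not a single hyperplane, but the same picture explains why UAPs exploit shared, input-independent directions of sensitivity, the quantity the Jacobian-based analysis above bounds.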
Sparse reduced-rank regression for imaging genetics studies: models and applications
We present a novel statistical technique, the sparse reduced-rank regression (sRRR) model,
which is a strategy for multivariate modelling of high-dimensional imaging responses and
genetic predictors. By adopting penalisation techniques, the model is able to enforce sparsity
in the regression coefficients, identifying subsets of genetic markers that best explain
the variability observed in subsets of the phenotypes. To properly exploit the rich structure
present in each of the imaging and genetics domains, we additionally propose the use of
several structured penalties within the sRRR model. Using simulation procedures that accurately
reflect realistic imaging genetics data, we present detailed evaluations of the sRRR
method in comparison with the more traditional univariate linear modelling approach. In
all settings considered, we show that sRRR possesses better power to detect the deleterious
genetic variants. Moreover, using a simple genetic model, we demonstrate the potential
benefits, in terms of statistical power, of carrying out voxel-wise searches as opposed to
extracting averages over regions of interest in the brain. Since this entails the use of phenotypic
vectors of enormous dimensionality, we suggest the use of a sparse classification
model as a de-noising step, prior to the imaging genetics study. Finally, we present the
application of a data re-sampling technique within the sRRR model for model selection.
Using this approach we are able to rank the genetic markers in order of importance of association
to the phenotypes, and similarly rank the phenotypes in order of importance to
the genetic markers. Finally, we illustrate the application of the proposed
statistical models on three real imaging genetics datasets and highlight some potential
associations.
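A rank-one, two-block sketch of the sRRR idea on synthetic data: alternate between a least-squares update for the response (phenotype) weights and an l1-penalised, soft-thresholded update for the predictor (genotype) weights, so that uninformative predictors are driven to zero. The dimensions, penalty level, data-generating model, and the plain ISTA optimiser are all illustrative choices, not the thesis's algorithm:

```python
import random

# Rank-one sparse reduced-rank regression sketch: fit Y ~ X u v' with a
# lasso-style penalty on u. Predictors 2..5 are pure noise and should be
# zeroed out by the soft-thresholding step.
random.seed(2)
n, p, q = 200, 6, 3
u_true = [1.0, -1.0, 0.0, 0.0, 0.0, 0.0]   # only predictors 0 and 1 matter
v_true = [1.0, 0.5, -0.5]
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
Y = [[sum(X[i][j] * u_true[j] for j in range(p)) * v_true[k]
      + random.gauss(0, 0.1) for k in range(q)] for i in range(n)]

def soft(z, t):
    """Soft-thresholding: the proximal operator of the l1 penalty."""
    return (abs(z) - t) * (1.0 if z > 0 else -1.0) if abs(z) > t else 0.0

u = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]          # crude initialisation
lam = 6.0                                    # l1 penalty strength on u
for _ in range(10):                          # alternate over v and u
    # v-step: least-squares fit of each response on the score s = X u.
    s = [sum(X[i][j] * u[j] for j in range(p)) for i in range(n)]
    ss = sum(t * t for t in s)
    v = [sum(s[i] * Y[i][k] for i in range(n)) / ss for k in range(q)]
    # u-step: proximal-gradient (ISTA) iterations on the penalised fit.
    vv = sum(t * t for t in v)
    z = [sum(Y[i][k] * v[k] for k in range(q)) for i in range(n)]
    eta = 1.0 / (2.0 * vv * n)               # conservative step-size
    for _ in range(100):
        Xu = [sum(X[i][j] * u[j] for j in range(p)) for i in range(n)]
        g = [vv * sum(X[i][j] * Xu[i] for i in range(n))
             - sum(X[i][j] * z[i] for i in range(n)) for j in range(p)]
        u = [soft(u[j] - eta * g[j], eta * lam) for j in range(p)]
```

The recovered `u` concentrates its weight on the two informative predictors with the correct opposite signs, mirroring how sRRR identifies subsets of genetic markers that explain the phenotype variability.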
Efficient operator-coarsening multigrid schemes for local discontinuous Galerkin methods
An efficient multigrid scheme is presented for local discontinuous
Galerkin (LDG) discretizations of elliptic problems, formulated around the idea
of separately coarsening the underlying discrete gradient and divergence
operators. We show that traditional multigrid coarsening of the primal
formulation leads to poor and suboptimal multigrid performance, whereas
coarsening of the flux formulation leads to optimal convergence and is
equivalent to a purely geometric multigrid method. The resulting
operator-coarsening schemes do not require the entire mesh hierarchy to be
explicitly built, thereby obviating the need to compute quadrature rules,
lifting operators, and other mesh-related quantities on coarse meshes. We show
that good multigrid convergence rates are achieved in a variety of numerical
tests on 2D and 3D uniform and adaptive Cartesian grids, as well as for curved
domains using implicitly defined meshes and for multi-phase elliptic interface
problems with complex geometry. Extension to non-LDG discretizations is briefly
discussed.
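For readers unfamiliar with the geometric multigrid machinery the operator-coarsening scheme reduces to, a standard V-cycle for the 1-D Poisson problem -u'' = f is sketched below. This is generic textbook multigrid on a nodal finite-difference grid with homogeneous Dirichlet conditions, not the LDG flux-formulation coarsening of the paper:

```python
# Geometric multigrid V-cycle for -u'' = f on [0,1], u(0) = u(1) = 0,
# on a uniform grid of n = 2^k + 1 points (boundaries included).

def smooth(u, f, h, sweeps=3):
    """Gauss-Seidel sweeps on the 3-point Laplacian stencil."""
    n = len(u)
    for _ in range(sweeps):
        for i in range(1, n - 1):
            u[i] = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i])
    return u

def residual(u, f, h):
    n = len(u)
    r = [0.0] * n
    for i in range(1, n - 1):
        r[i] = f[i] - (2 * u[i] - u[i - 1] - u[i + 1]) / (h * h)
    return r

def restrict(r):
    """Full-weighting restriction to the next coarser grid."""
    return [0.0] + [0.25 * r[2 * i - 1] + 0.5 * r[2 * i] + 0.25 * r[2 * i + 1]
                    for i in range(1, len(r) // 2)] + [0.0]

def prolong(e, n_fine):
    """Linear interpolation of the coarse correction to the fine grid."""
    u = [0.0] * n_fine
    for i in range(1, len(e) - 1):
        u[2 * i] += e[i]
        u[2 * i - 1] += 0.5 * e[i]
        u[2 * i + 1] += 0.5 * e[i]
    return u

def vcycle(u, f, h):
    n = len(u)
    if n <= 3:
        return smooth(u, f, h, sweeps=50)      # "exact" coarse solve
    u = smooth(u, f, h)                        # pre-smoothing
    r = restrict(residual(u, f, h))            # coarse-grid residual
    e = vcycle([0.0] * len(r), r, 2 * h)       # recursive correction
    u = [a + b for a, b in zip(u, prolong(e, n))]
    return smooth(u, f, h)                     # post-smoothing
```

The paper's point is that for LDG one must coarsen the discrete gradient and divergence operators (the flux formulation) to recover the mesh-independent convergence this kind of geometric hierarchy provides.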
Parameter Estimation with Maximal Updated Densities
A recently developed measure-theoretic framework solves a stochastic inverse
problem (SIP) for models where uncertainties in model output data are
predominantly due to aleatoric (i.e., irreducible) uncertainties in model
inputs (i.e., parameters). The subsequent inferential target is a distribution
on parameters. Another type of inverse problem is to quantify uncertainties in
estimates of "true" parameter values under the assumption that such
uncertainties should be reduced as more data are incorporated into the problem,
i.e., the uncertainty is considered epistemic. A major contribution of this
work is the formulation and solution of such a parameter identification problem
(PIP) within the measure-theoretic framework developed for the SIP. The
approach is novel in that it utilizes a solution to a stochastic forward
problem (SFP) to update an initial density only in the parameter directions
informed by the model output data. In other words, this method performs
"selective regularization" only in the parameter directions not informed by
data. The solution is defined by a maximal updated density (MUD) point where
the updated density defines the measure-theoretic solution to the PIP. Another
significant contribution of this work is the full theory of existence and
uniqueness of MUD points for linear maps with Gaussian distributions.
Data-constructed Quantity of Interest (QoI) maps are also presented and
analyzed for solving the PIP within this measure-theoretic framework as a means
of reducing uncertainties in the MUD estimate. We conclude with a demonstration
of the general applicability of the method on two problems involving either
spatial or temporal data for estimating uncertain model parameters.
Comment: Code: github.com/mathematicalmichael/mud.gi
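The update described above can be written down in one dimension, where every density has a closed form. The sketch below evaluates the updated density for a linear map with Gaussian initial, observed, and predicted (pushforward) densities, and locates its maximiser, the MUD point, by grid search; the map and the three Gaussians are illustrative choices. Because the single parameter direction is fully informed by the data here, the initial-density factor cancels against the predicted density and the MUD point maps exactly onto the observed mean:

```python
import math

# 1-D sketch of the updated density and its MUD point for a linear map
# Q(lam) = a * lam. All numbers below are illustrative.
def gauss(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

a = 2.0                          # the linear map Q(lam) = a * lam
init_mean, init_sd = 0.0, 1.0    # initial density on the parameter
obs_mean, obs_sd = 1.0, 0.25     # observed density on the model output
pred_sd = abs(a) * init_sd       # pushforward (predicted) density scale

def updated(lam):
    """Updated density: initial times the observed-to-predicted ratio."""
    q = a * lam
    return (gauss(lam, init_mean, init_sd)
            * gauss(q, obs_mean, obs_sd)
            / gauss(q, a * init_mean, pred_sd))

# MUD point: the maximiser of the updated density (grid search here).
grid = [i / 1000.0 for i in range(-3000, 3001)]
mud = max(grid, key=updated)     # lands at obs_mean / a
```

In a direction not informed by the data the ratio would be flat and the initial density would decide the answer, which is the "selective regularization" behaviour described above.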
Optimisation Methods For Training Deep Neural Networks in Speech Recognition
Automatic Speech Recognition (ASR) is an example of a sequence-to-sequence classification task where, given an acoustic waveform, the goal is to produce the correct word-level hypotheses. In machine learning, a classification problem such as ASR is solved in two stages: an inference stage that models the uncertainty associated with the choice of hypothesis given the acoustic waveform using a mathematical model, and a decision stage which employs the inference model in conjunction with decision theory to make optimal class assignments. With the advent of careful network initialisation and GPU computing, hybrid Hidden Markov Models (HMMs) augmented with Deep Neural Networks (DNNs) have been shown to outperform traditional HMMs using Gaussian Mixture Models (GMMs) in solving the inference problem for ASR. In comparison to GMMs, DNNs possess a better capability to model the underlying non-linear data manifold due to their deep and complex structure. While the structure of such models gives rich modelling capability, it also creates complex dependencies between the parameters which can make learning difficult via first-order stochastic gradient descent (SGD). The task of finding the best procedure to train DNNs continues to be an active area of research and has been made even more challenging by the availability of ever more training data. This thesis focuses on designing better optimisation approaches to train hybrid HMM-DNN models using a sequence-level discriminative criterion, a natural loss function that preserves the sequential ordering of frames within a spoken utterance. The thesis presents an implementation of the second-order Hessian Free (HF) optimisation method, and shows how the method can be made efficient through appropriate modifications to the Conjugate Gradient algorithm. To achieve better convergence than SGD, this work also explores the Natural Gradient (NG) method to train DNNs with discriminative sequence training.
In the DNN literature, the NG method has been applied to train models under the Maximum Likelihood objective criterion. A novel contribution of this thesis is to extend this approach to the domain of Minimum Bayes Risk objective functions for discriminative sequence training. With sigmoid models trained on 50-hour and 200-hour training sets from the Multi-Genre Broadcast 1 (MGB1) transcription task, the NG method applied in a HF-styled optimisation framework is shown to achieve better Word Error Rate (WER) reductions on the MGB1 development set than SGD-based sequence training.
This thesis also addresses the particular issue of overfitting between the training criterion and WER that primarily arises during sequence training of DNN models using Rectified Linear Units (ReLUs) as activation functions. It is shown how, by scaling with the Gauss-Newton matrix, the HF method, unlike other approaches, can overcome this issue. Since different optimisers work best with different models, it is attractive to have a consistent optimisation framework that is agnostic to the choice of activation function. To address this, the thesis develops the geometry of the underlying function space captured by different realisations of DNN model parameters, and presents the design considerations for an optimisation algorithm to be well defined on this space. Building on this analysis, a novel optimisation technique called NGHF is presented that uses both the direction of steepest descent on a probabilistic manifold and local curvature information to effectively probe the error surface. The method relies on an alternative derivation of Taylor's theorem using the concepts of manifolds, tangent vectors and directional derivatives from the perspective of Information Geometry. Apart from being well defined on the function space, when framed within a HF-style optimisation framework, NGHF is shown to achieve the greatest WER reductions from sequence training on the MGB1 development set with both sigmoid and ReLU based models trained on the 200-hour MGB1 training set. An evaluation of the above optimisation methods in training different DNN model architectures is also presented.
IDB Cambridge International Scholarshi
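The computational core shared by HF and the NGHF variant is solving a Newton-like system with conjugate gradients using only matrix-vector products, never the curvature matrix itself. A minimal sketch is given below; the explicit 2x2 matrix stands in for the Gauss-Newton curvature that a real implementation would apply via an R-operator pass through the network:

```python
# Conjugate gradients for H d = -g using only Hessian-vector products
# hv(v). This is the inner solver of Hessian-free optimisation; in DNN
# training, hv would be a Gauss-Newton-vector product computed by
# forward/backward passes rather than an explicit matrix.
def cg(hv, g, iters=50, tol=1e-10):
    n = len(g)
    d = [0.0] * n
    r = [-gi for gi in g]          # residual of H d = -g at d = 0
    p = r[:]
    rs = sum(x * x for x in r)
    for _ in range(iters):
        hp = hv(p)
        alpha = rs / sum(pi * hpi for pi, hpi in zip(p, hp))
        d = [di + alpha * pi for di, pi in zip(d, p)]
        r = [ri - alpha * hpi for ri, hpi in zip(r, hp)]
        rs_new = sum(x * x for x in r)
        if rs_new < tol:
            break
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return d

# Illustrative curvature H = diag(1, 10) and gradient g = (1, 1):
# the returned step is the Newton direction -H^{-1} g.
H = [[1.0, 0.0], [0.0, 10.0]]
hv = lambda v: [sum(H[i][j] * v[j] for j in range(2)) for i in range(2)]
step = cg(hv, [1.0, 1.0])
```

Truncating the CG iteration, and the choice of curvature matrix supplied to `hv` (Gauss-Newton for HF, a Fisher-based matrix for NG/NGHF), are exactly the design levers the thesis studies.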