Learning sparse models for a dynamic Bayesian network classifier of protein secondary structure
<p>Abstract</p> <p>Background</p> <p>Protein secondary structure prediction provides insight into protein function and is a valuable preliminary step for predicting the 3D structure of a protein. Dynamic Bayesian networks (DBNs) and support vector machines (SVMs) have been shown to provide state-of-the-art performance in secondary structure prediction. As the size of the protein database grows, it becomes feasible to use a richer model in an effort to capture subtle correlations among the amino acids and the predicted labels. In this context, it is beneficial to derive sparse models that discourage over-fitting and provide biological insight.</p> <p>Results</p> <p>In this paper, we first show that we are able to obtain accurate secondary structure predictions. Our per-residue accuracy on a well established and difficult benchmark (CB513) is 80.3%, which is comparable to the state-of-the-art evaluated on this dataset. We then introduce an algorithm for sparsifying the parameters of a DBN. Using this algorithm, we can automatically remove up to 70-95% of the parameters of a DBN while maintaining the same level of predictive accuracy on the SD576 set. At 90% sparsity, we are able to compute predictions three times faster than a fully dense model evaluated on the SD576 set. We also demonstrate, using simulated data, that the algorithm is able to recover true sparse structures with high accuracy, and using real data, that the sparse model identifies known correlation structure (local and non-local) related to different classes of secondary structure elements.</p> <p>Conclusions</p> <p>We present a secondary structure prediction method that employs dynamic Bayesian networks and support vector machines. We also introduce an algorithm for sparsifying the parameters of the dynamic Bayesian network. 
The sparsification approach yields a significant speed-up in generating predictions, and we demonstrate that the amino acid correlations identified by the algorithm correspond to several known features of protein secondary structure. Datasets and source code used in this study are available at <url>http://noble.gs.washington.edu/proj/pssp</url>.</p>
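The speed-up from sparsification comes from skipping zeroed parameters during inference. As a rough illustration only (generic magnitude-based pruning of a conditional probability table, not the paper's sparsification algorithm):

```python
import numpy as np

def sparsify_cpt(cpt, target_sparsity=0.7):
    """Zero out the smallest entries of a conditional probability table,
    then renormalize each row so it remains a valid distribution.
    Illustrative magnitude-based pruning, not the paper's algorithm."""
    flat = np.sort(cpt, axis=None)
    k = min(int(target_sparsity * flat.size), flat.size - 1)
    threshold = flat[k]
    pruned = np.where(cpt < threshold, 0.0, cpt)
    row_sums = pruned.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0.0] = 1.0  # guard rows that were pruned entirely
    return pruned / row_sums

rng = np.random.default_rng(0)
cpt = rng.dirichlet(np.ones(20), size=10)  # 10 parent contexts x 20 outcomes
sparse_cpt = sparsify_cpt(cpt, target_sparsity=0.7)
```

At 70% sparsity, only roughly a third of the table entries participate in each inference pass, which is the source of the runtime savings the abstract reports.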
Event-Driven Contrastive Divergence for Spiking Neuromorphic Systems
Restricted Boltzmann Machines (RBMs) and Deep Belief Networks have been
demonstrated to perform efficiently in a variety of applications, such as
dimensionality reduction, feature learning, and classification. Their
implementation on neuromorphic hardware platforms emulating large-scale
networks of spiking neurons can have significant advantages from the
perspectives of scalability, power dissipation and real-time interfacing with
the environment. However, the traditional RBM architecture and the commonly used
training algorithm known as Contrastive Divergence (CD) are based on discrete
updates and exact arithmetic, which do not directly map onto a dynamical neural
substrate. Here, we present an event-driven variation of CD to train an RBM
constructed with Integrate & Fire (I&F) neurons, constrained by the
limitations of existing and near-future neuromorphic hardware platforms. Our
strategy is based on neural sampling, which allows us to synthesize a spiking
neural network that samples from a target Boltzmann distribution. The recurrent
activity of the network replaces the discrete steps of the CD algorithm, while
Spike Time Dependent Plasticity (STDP) carries out the weight updates in an
online, asynchronous fashion. We demonstrate our approach by training an RBM
composed of leaky I&F neurons with STDP synapses to learn a generative model of
the MNIST hand-written digit dataset, and by testing it in recognition,
generation and cue integration tasks. Our results contribute to a machine
learning-driven approach for synthesizing networks of spiking neurons capable
of carrying out practical, high-level functionality.
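The STDP weight updates mentioned above can be illustrated with the standard pairwise exponential STDP rule (a textbook form, not necessarily the exact rule used in this work):

```python
import numpy as np

def stdp_update(w, t_pre, t_post, a_plus=0.01, a_minus=0.012, tau=20.0):
    """Pairwise exponential STDP (textbook form): potentiate when the
    presynaptic spike precedes the postsynaptic spike, depress when it
    follows. Spike times are in milliseconds."""
    dt = t_post - t_pre
    if dt >= 0:
        return w + a_plus * np.exp(-dt / tau)   # causal pair -> potentiation
    return w - a_minus * np.exp(dt / tau)       # anti-causal -> depression

w_pot = stdp_update(0.5, t_pre=10.0, t_post=15.0)  # pre fires before post
w_dep = stdp_update(0.5, t_pre=15.0, t_post=10.0)  # post fires before pre
```

Because each update depends only on local spike timing, the rule can run asynchronously as spikes arrive, which is what lets it replace the synchronous weight updates of standard CD.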
Active labeling in deep learning and its application to emotion prediction
Recent breakthroughs in deep learning have made possible the learning of deep layered hierarchical representations of sensory input. Stacked restricted Boltzmann machines (RBMs), also called deep belief networks (DBNs), and stacked autoencoders are two representative deep learning methods. The key idea is greedy layer-wise unsupervised pre-training followed by supervised fine-tuning, which can be done efficiently and overcomes the difficulty of local minima when training all layers of a deep neural network at once. Deep learning has been shown to achieve outstanding performance in a number of challenging real-world applications. Existing deep learning methods involve a large number of meta-parameters, such as the number of hidden layers, the number of hidden nodes, the sparsity target, the initial values of weights, the type of units, the learning rate, etc. Existing applications usually do not explain why the decisions were made and how changes would affect performance. Thus, it is difficult for a novice user to make good decisions for a new application in order to achieve good performance. In addition, most of the existing works are done on simple and clean datasets and assume a fixed set of labeled data, which is not necessarily true for real-world applications. The main objectives of this dissertation are to investigate the optimal meta-parameters of deep learning networks as well as the effects of various data pre-processing techniques, propose a new active labeling framework for cost-effective selection of labeled data, and apply deep learning to a real-world application--emotion prediction via physiological sensor data, based on real-world, complex, noisy, and heterogeneous sensor data. 
For meta-parameters and data pre-processing techniques, this study uses the benchmark MNIST digit recognition image dataset and a sleep-stage-recognition sensor dataset and empirically compares the deep network's performance with a number of different meta-parameters and decisions, including raw data vs. pre-processed data by Principal Component Analysis (PCA) with or without whitening, various structures in terms of the number of layers and the number of nodes in each layer, stacked RBMs vs. stacked autoencoders. For active labeling, a new framework for both stacked RBMs and stacked autoencoders is proposed based on three metrics: least confidence, margin sampling, and entropy. On the MNIST dataset, the methods outperform random labeling consistently by a significant margin. On the other hand, the proposed active labeling methods perform similarly to random labeling on the sleep-stage-recognition dataset due to the noisiness and inconsistency in the data. For the application of deep learning to emotion prediction via physiological sensor data, a software pipeline has been developed. The system first extracts features from the raw data of four channels in an unsupervised fashion and then builds three classifiers to classify the levels of arousal, valence, and liking based on the learned features. The classification accuracy is 0.609, 0.512, and 0.684, respectively, which is comparable with existing methods based on expert-designed features.
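The three uncertainty metrics used for active labeling are standard quantities computable from a model's class-probability outputs. A minimal sketch (the variable names are illustrative, not from the dissertation):

```python
import numpy as np

def least_confidence(p):
    """One minus the top predicted probability; higher = more uncertain."""
    return 1.0 - p.max(axis=1)

def margin_sampling(p):
    """Negated gap between the top two probabilities; higher = more uncertain."""
    s = np.sort(p, axis=1)
    return -(s[:, -1] - s[:, -2])

def entropy(p):
    """Shannon entropy of the predictive distribution."""
    return -(p * np.log(p + 1e-12)).sum(axis=1)

probs = np.array([[0.90, 0.05, 0.05],   # confident prediction
                  [0.40, 0.35, 0.25]])  # uncertain prediction
# under all three metrics, the second example scores as more uncertain,
# so it would be selected for labeling first
```

Active labeling then ranks the unlabeled pool by one of these scores and requests labels for the highest-scoring examples.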
Sparse graphical models for cancer signalling
Protein signalling networks play a key role in cellular function, and their dysregulation is central to many diseases, including cancer. Recent advances in biochemical technology have begun to allow high-throughput, data-driven studies of signalling. In this thesis, we investigate multivariate statistical methods, rooted in sparse graphical models, aimed at probing questions in cancer signalling.
First, we propose a Bayesian variable selection method for identifying subsets of proteins that jointly influence an output of interest, such as drug response. Ancillary biological information is incorporated into inference using informative prior distributions. Prior information is selected and weighted in an automated manner using an empirical Bayes formulation. We present examples of informative pathway and network-based priors, and illustrate the proposed method on both synthetic and drug response data.
Second, we use dynamic Bayesian networks to perform structure learning of context-specific signalling network topology from proteomic time-course data. We exploit a connection between variable selection and network structure learning to efficiently carry out exact inference. Existing biology is incorporated using informative network priors, weighted automatically by an empirical Bayes approach. The overall approach is computationally efficient and essentially free of user-set parameters.
We show results from an empirical investigation, comparing the approach to several existing methods, and from an application to breast cancer cell line data. Hypotheses are generated regarding novel signalling links, some of which are validated by independent experiments.
Third, we describe a network-based clustering approach for the discovery of cancer subtypes that differ in terms of subtype-specific signalling network structure.
Model-based clustering is combined with penalised likelihood estimation of undirected graphical models to allow simultaneous learning of cluster assignments and cluster-specific network structure. Results are shown from an empirical investigation comparing several penalisation regimes, and an application to breast cancer proteomic data.
Random Projection in Deep Neural Networks
This work investigates the ways in which deep learning methods can benefit
from random projection (RP), a classic linear dimensionality reduction method.
We focus on two areas where, as we have found, employing RP techniques can
improve deep models: training neural networks on high-dimensional data and
initialization of network parameters. Training deep neural networks (DNNs) on
sparse, high-dimensional data with no exploitable structure implies a network
architecture with an input layer that has a huge number of weights, which often
makes training infeasible. We show that this problem can be solved by
prepending the network with an input layer whose weights are initialized with
an RP matrix. We propose several modifications to the network architecture and
training regime that make it possible to efficiently train DNNs with a learnable
RP layer on data with as many as tens of millions of input features and
training examples. In comparison to state-of-the-art methods, neural
networks with an RP layer achieve competitive performance or improve the results
on several extremely high-dimensional real-world datasets. The second area
where the application of RP techniques can be beneficial for training deep
models is weight initialization. Setting the initial weights in DNNs to
elements of various RP matrices enabled us to train deep residual networks to
higher levels of performance.
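The property that makes a random projection input layer viable can be sketched directly: a suitably scaled Gaussian matrix approximately preserves vector norms (the Johnson-Lindenstrauss effect), so projecting a huge input down to a compact code does not destroy its geometry. The dimensions below are illustrative, far smaller than the datasets the abstract describes:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_proj = 10_000, 512  # illustrative sizes only

# Gaussian RP matrix, scaled so squared norms are preserved in expectation
R = rng.standard_normal((d_in, d_proj)) / np.sqrt(d_proj)

x = rng.standard_normal(d_in)  # one high-dimensional input vector
z = x @ R                      # compact code fed to the trainable layers

# Johnson-Lindenstrauss: ||z|| is close to ||x|| with high probability
ratio = np.linalg.norm(z) / np.linalg.norm(x)
```

Prepending such a matrix as a (fixed or learnable) first layer replaces an input layer with `d_in * hidden` weights by one with only `d_proj * hidden`, which is what makes training feasible on tens of millions of input features.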
Influence modelling and learning between dynamic Bayesian networks using score-based structure learning
A Ph.D. thesis submitted to the Faculty of Science, University of the Witwatersrand,
in fulfillment of the requirements for the degree of Doctor of Philosophy in Computer
Science
May 2018

Although partially observable stochastic processes are ubiquitous in many fields of science,
little work has been devoted to discovering and analysing the means by which several such
processes may interact to influence each other. In this thesis we extend probabilistic structure
learning between random variables to the context of temporal models which represent
partially observable stochastic processes. Learning an influence structure and distribution
between processes can be useful for density estimation and knowledge discovery.
A common approach to structure learning, in observable data, is score-based structure
learning, where we search for the most suitable structure by using a scoring metric to value
structural configurations relative to the data. Most popular structure scores are variations on
the likelihood score which calculates the probability of the data given a potential structure.
In observable data, the decomposability of the likelihood score, which is the ability to
represent the score as a sum of family scores, allows for efficient learning procedures and
significant computational savings. However, with incomplete data (whether due to latent variables or
missing samples), the likelihood score is not decomposable and we have to perform
inference to evaluate it. This forces us to use non-linear optimisation techniques to optimise
the likelihood function. Furthermore, local changes to the network can affect other parts of
the network, which makes learning with incomplete data all the more difficult.
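For complete discrete data, the decomposability described above can be made concrete: the maximum-likelihood score of a candidate structure is a sum of per-family terms, each computable from simple counts. A minimal sketch (the toy data and structures are hypothetical):

```python
import math
from collections import Counter

def family_score(data, child, parents):
    """Maximum-likelihood family score: sum over configurations of
    N(x, u) * log(N(x, u) / N(u)), where u is the parent configuration."""
    joint = Counter((tuple(r[p] for p in parents), r[child]) for r in data)
    marg = Counter(tuple(r[p] for p in parents) for r in data)
    return sum(n * math.log(n / marg[u]) for (u, x), n in joint.items())

def structure_score(data, parent_sets):
    """Decomposability: the likelihood score is a sum of family scores,
    so a local change to one family only requires rescoring that family."""
    return sum(family_score(data, c, ps) for c, ps in parent_sets.items())

data = [(0, 0), (0, 0), (1, 1), (1, 0)]  # samples over two binary variables
score_edge = structure_score(data, {0: [], 1: [0]})  # structure with edge 0 -> 1
score_empty = structure_score(data, {0: [], 1: []})  # structure with no edges
```

Note that the raw likelihood score never decreases when parents are added, which is why practical structure scores add a complexity penalty (e.g. BIC) on top of this term.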
We define two general types of influence scenarios: direct influence and delayed influence,
which can be used to define influence around richly structured spaces consisting of
multiple processes that are interrelated in various ways. We will see that although it is
possible to capture both types of influence in a single complex model by a suitable setting of
the parameters, such complex representations run into fragmentation issues. This is handled by
extending the language of dynamic Bayesian networks to allow us to construct single
compact models that capture the properties of a system’s dynamics, and produce influence
distributions dynamically.
The novelty and intuition of our approach is to learn the optimal influence structure in
layers. We firstly learn a set of independent temporal models, and thereafter, optimise a
structure score over possible structural configurations between these temporal models. Since
the search for the optimal structure is done using complete data we can take advantage of
efficient learning procedures from the structure learning literature. We provide the
following contributions: we (a) introduce the notion of influence between temporal models;
(b) extend traditional structure scores for random variables to structure scores for temporal
models; (c) provide a complete algorithm to recover the influence structure between
temporal models; (d) provide a notion of structural assembles to relate temporal models for
types of influence; and finally, (e) provide empirical evidence for the effectiveness of our
method with respect to generative ground-truth distributions.
The presented results emphasise the trade-off between the likelihood of an influence structure
relative to the ground truth and the computational complexity required to express it. Depending on the
availability of samples we might choose different learning methods to express influence
relations between processes. On one hand, when given too few samples, we may choose to
learn a sparse structure using tree-based structure learning or even using no influence
structure at all. On the other hand, when given an abundant number of samples, we can use
penalty-based procedures that achieve rich meaningful representations using local search
techniques.
Once we consider high-level representations of dynamic influence between temporal models,
we open the door to very rich and expressive representations which emphasise the
importance of knowledge discovery and density estimation in the temporal setting.