9 research outputs found
Denoising Deep Neural Networks Based Voice Activity Detection
Recently, the deep-belief-networks (DBN) based voice activity detection (VAD)
has been proposed. It is powerful in fusing the advantages of multiple
features, and achieves the state-of-the-art performance. However, the deep
layers of the DBN-based VAD do not show an apparent superiority to the
shallower layers. In this paper, we propose a denoising-deep-neural-network
(DDNN) based VAD to address the aforementioned problem. Specifically, we
pre-train a deep neural network in a special unsupervised denoising greedy
layer-wise mode, and then fine-tune the whole network in a supervised way by
the common back-propagation algorithm. In the pre-training phase, we take the
noisy speech signals as the visible layer and try to extract a new feature that
minimizes the reconstruction cross-entropy loss between the noisy speech
signals and its corresponding clean speech signals. Experimental results show
that the proposed DDNN-based VAD not only outperforms the DBN-based VAD but
also shows an apparent performance improvement of the deep layers over
shallower layers.Comment: This paper has been accepted by IEEE ICASSP-2013, and will be
published online after May, 201
High Performance Computing Techniques to Better Understand Protein Conformational Space
This thesis presents an amalgamation of high performance computing techniques to get better insight into protein molecular dynamics. Key aspects of protein function and dynamics can be learned from their conformational space. Datasets that represent the complex nuances of a protein molecule are high dimensional. Efficient dimensionality reduction becomes indispensable for the analysis of such exorbitant datasets. Dimensionality reduction forms a formidable portion of this work and its application has been explored for other datasets as well. It begins with the parallelization of a known non-liner feature reduction algorithm called Isomap. The code for the algorithm was re-written in C with portions of it parallelized using OpenMP. Next, a novel data instance reduction method was devised which evaluates the information content offered by each data point, which ultimately helps in truncation of the dataset with much fewer data points to evaluate. Once a framework has been established to reduce the number of variables representing a dataset, the work is extended to explore algebraic topology techniques to extract meaningful information from these datasets. This step is the one that helps in sampling the conformations of interest of a protein molecule. The method employs the notion of hierarchical clustering to identify classes within a molecule, thereafter, algebraic topology is used to analyze these classes. Finally, the work is concluded by presenting an approach to solve the open problem of protein folding. A Monte-Carlo based tree search algorithm is put forth to simulate the pathway that a certain protein conformation undertakes to reach another conformation.
The dissertation, in its entirety, offers solutions to a few problems that hinder the progress of solution for the vast problem of understanding protein dynamics. The motion of a protein molecule is guided by changes in its energy profile. In this course the molecule gradually slips from one energy class to another. Structurally, this switch is transient spanning over milliseconds or less and hence is difficult to be captured solely by the work in wet laboratories
Kernel-Based Ranking. Methods for Learning and Performance Estimation
Machine learning provides tools for automated construction of predictive
models in data intensive areas of engineering and science. The family of
regularized kernel methods have in the recent years become one of the mainstream
approaches to machine learning, due to a number of advantages the
methods share. The approach provides theoretically well-founded solutions
to the problems of under- and overfitting, allows learning from structured
data, and has been empirically demonstrated to yield high predictive performance
on a wide range of application domains. Historically, the problems
of classification and regression have gained the majority of attention in the
field. In this thesis we focus on another type of learning problem, that of
learning to rank.
In learning to rank, the aim is from a set of past observations to learn
a ranking function that can order new objects according to how well they
match some underlying criterion of goodness. As an important special case
of the setting, we can recover the bipartite ranking problem, corresponding
to maximizing the area under the ROC curve (AUC) in binary classification.
Ranking applications appear in a large variety of settings, examples
encountered in this thesis include document retrieval in web search, recommender
systems, information extraction and automated parsing of natural
language. We consider the pairwise approach to learning to rank, where
ranking models are learned by minimizing the expected probability of ranking
any two randomly drawn test examples incorrectly. The development
of computationally efficient kernel methods, based on this approach, has in
the past proven to be challenging. Moreover, it is not clear what techniques
for estimating the predictive performance of learned models are the most
reliable in the ranking setting, and how the techniques can be implemented
efficiently.
The contributions of this thesis are as follows. First, we develop
RankRLS, a computationally efficient kernel method for learning to rank,
that is based on minimizing a regularized pairwise least-squares loss. In
addition to training methods, we introduce a variety of algorithms for tasks
such as model selection, multi-output learning, and cross-validation, based
on computational shortcuts from matrix algebra. Second, we improve the fastest known training method for the linear version of the RankSVM algorithm,
which is one of the most well established methods for learning to
rank. Third, we study the combination of the empirical kernel map and reduced
set approximation, which allows the large-scale training of kernel machines
using linear solvers, and propose computationally efficient solutions
to cross-validation when using the approach. Next, we explore the problem
of reliable cross-validation when using AUC as a performance criterion,
through an extensive simulation study. We demonstrate that the proposed
leave-pair-out cross-validation approach leads to more reliable performance
estimation than commonly used alternative approaches. Finally, we present
a case study on applying machine learning to information extraction from
biomedical literature, which combines several of the approaches considered
in the thesis. The thesis is divided into two parts. Part I provides the background
for the research work and summarizes the most central results, Part
II consists of the five original research articles that are the main contribution
of this thesis.Siirretty Doriast