Adversarial Connective-exploiting Networks for Implicit Discourse Relation Classification
Implicit discourse relation classification is highly challenging due to the
lack of connectives as strong linguistic cues, which motivates the use of
annotated implicit connectives to improve recognition. We propose a feature
imitation framework in which an implicit relation network is driven to learn
from another neural network with access to connectives, and thus encouraged to
extract similarly salient features for accurate classification. We develop an
adversarial model to enable an adaptive imitation scheme through competition
between the implicit network and a rival feature discriminator. Our method
effectively transfers discriminability of connectives to the implicit features,
and achieves state-of-the-art performance on the PDTB benchmark. Comment: To appear in ACL 2017.
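The adversarial imitation scheme above can be sketched in miniature. In the following toy (the one-dimensional "features", sizes, and learning rates are illustrative assumptions, not the paper's implementation), a logistic discriminator is trained to separate connective-informed features from implicit ones, while the implicit features receive the adversarial gradient that makes them look connective-like:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: "connective" features cluster near 2, "implicit" near 0.
conn = rng.normal(2.0, 0.5, size=200)   # features from the connective-informed network
impl = rng.normal(0.0, 0.5, size=200)   # features from the implicit network

w, b = 0.0, 0.0   # logistic discriminator d(f) = sigmoid(w*f + b)
lr = 0.1

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for _ in range(300):
    # Discriminator step: push d toward 1 on connective features, 0 on implicit.
    p_conn, p_impl = sigmoid(w * conn + b), sigmoid(w * impl + b)
    w -= lr * (np.mean((p_conn - 1) * conn) + np.mean(p_impl * impl))
    b -= lr * (np.mean(p_conn - 1) + np.mean(p_impl))
    # Adversarial step: move implicit features so d mistakes them for connective
    # ones (gradient of -log d(impl) with respect to the features themselves).
    p_impl = sigmoid(w * impl + b)
    impl -= lr * (p_impl - 1) * w

print(abs(conn.mean() - impl.mean()))  # much smaller than the initial gap of ~2
```

In the paper the "feature" is of course a learned vector and the moved quantity is the implicit network's parameters, but the competition structure is the same: the discriminator sharpens, and the implicit side is driven to produce similarly salient features.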
Automated Website Fingerprinting through Deep Learning
Several studies have shown that the network traffic that is generated by a
visit to a website over Tor reveals information specific to the website through
the timing and sizes of network packets. By capturing traffic traces between
users and their Tor entry guard, a network eavesdropper can leverage this
meta-data to reveal which website Tor users are visiting. The success of such
attacks heavily depends on the particular set of traffic features that are used
to construct the fingerprint. Typically, these features are manually engineered
and, as such, any change introduced to the Tor network can render these
carefully constructed features ineffective. In this paper, we show that an
adversary can automate the feature engineering process, and thus automatically
deanonymize Tor traffic by applying our novel method based on deep learning. We
collect a dataset comprised of more than three million network traces, which is
the largest dataset of web traffic ever used for website fingerprinting, and
find that the performance achieved by our deep learning approaches is
comparable to that of known methods that reflect multiple years of research
effort. The obtained success rate exceeds 96% for a closed world
of 100 websites and 94% for our biggest closed world of 900 classes. In our
open world evaluation, the most performant deep learning model is 2% more
accurate than the state-of-the-art attack. Furthermore, we show that the
implicit features automatically learned by our approach are far more resilient
to dynamic changes of web content over time. We conclude that the ability to
automatically construct the most relevant traffic features and perform accurate
traffic recognition makes our deep learning based approach an efficient,
flexible and robust technique for website fingerprinting. Comment: To appear in the 25th Symposium on Network and Distributed System
Security (NDSS 2018).
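The pipeline being automated can be sketched as a forward pass of a tiny 1-D convolutional classifier over a packet-direction sequence. Everything here (filter count, kernel width, the random untrained weights, the 100-site closed world) is an illustrative assumption; the paper's models are trained and far deeper:

```python
import numpy as np

rng = np.random.default_rng(1)

# A Tor trace represented as a sequence of packet directions (+1 out, -1 in).
trace = rng.choice([-1.0, 1.0], size=5000)

kernels = rng.normal(0.0, 0.1, size=(32, 8))   # 32 conv filters of width 8
W_out = rng.normal(0.0, 0.1, size=(100, 32))   # classify into 100 websites

# 1-D convolution via sliding windows, then ReLU and global max-pooling:
windows = np.lib.stride_tricks.sliding_window_view(trace, 8)  # (4993, 8)
feature_maps = np.maximum(windows @ kernels.T, 0.0)           # (4993, 32)
features = feature_maps.max(axis=0)                           # (32,)

# Softmax over site logits:
logits = W_out @ features
probs = np.exp(logits - logits.max())
probs /= probs.sum()
print(probs.argmax())  # predicted website index
```

The point of the paper is that the `features` vector here is learned from raw traces by gradient descent, replacing the manually engineered timing/size statistics that earlier fingerprinting attacks depended on.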
Noise Injection Node Regularization for Robust Learning
We introduce Noise Injection Node Regularization (NINR), a method of
injecting structured noise into Deep Neural Networks (DNN) during the training
stage, resulting in an emergent regularizing effect. We present theoretical and
empirical evidence for substantial improvement in robustness against various
test data perturbations for feed-forward DNNs when trained under NINR. The
novelty in our approach comes from the interplay of adaptive noise injection
and initialization conditions such that noise is the dominant driver of
dynamics at the start of training. As it simply requires the addition of
external nodes without altering the existing network structure or optimization
algorithms, this method can be easily incorporated into many standard problem
specifications. We find improved stability against a number of data
perturbations, including domain shifts, with the most dramatic improvement
obtained for unstructured noise, where our technique outperforms other existing
methods such as Dropout or L2 regularization in some cases. We further show
that desirable generalization properties on clean data are generally
maintained. Comment: 16 pages, 9 figures.
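The core mechanism, external noise nodes appended to the input without altering the existing network structure, can be sketched as follows (the function name, sizes, and noise scale are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)

def add_noise_nodes(x_batch, n_noise, scale=1.0):
    """Append n_noise extra input nodes whose values are freshly sampled
    structured noise, so the first layer sees them only during training."""
    noise = scale * rng.standard_normal((x_batch.shape[0], n_noise))
    return np.concatenate([x_batch, noise], axis=1)

x = rng.standard_normal((16, 10))        # a batch of 16 examples, 10 features
x_aug = add_noise_nodes(x, n_noise=4)    # now 14 inputs per example
print(x_aug.shape)
```

The rest of the network and the optimizer are untouched; only the input width changes, which is why the method drops into standard training setups so easily.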
Learning with Limited Labeled Data in Biomedical Domain by Disentanglement and Semi-Supervised Learning
In this dissertation, we are interested in improving the generalization of deep neural networks for biomedical data (e.g., electrocardiogram signals, x-ray images, etc.). Although deep neural networks have attained state-of-the-art performance and, thus, deployment across a variety of domains, similar performance in the clinical setting remains challenging due to their inability to generalize across unseen data (e.g., a new patient cohort).
We address this challenge of generalization in the deep neural network from two perspectives: 1) learning disentangled representations from the deep network, and 2) developing efficient semi-supervised learning (SSL) algorithms using the deep network.
In the former, we are interested in designing specific architectures and objective functions to learn representations where variations in the data are well separated, i.e., disentangled. In the latter, we are interested in designing regularizers that encourage the underlying neural function's behavior toward a common inductive bias to avoid over-fitting the function to small labeled data.
Our end goal is to improve the generalization of the deep network for the diagnostic model in both of these approaches. For disentangled representations, this translates to appropriately learning latent representations from the data, capturing the observed input's underlying explanatory factors in an independent and interpretable way. With the data's explanatory factors well separated, such a disentangled latent space can then be useful for a wide variety of tasks and domains within the data distribution, even with a small amount of labeled data, thus improving generalization. In developing efficient semi-supervised algorithms, this translates to utilizing a large volume of unlabeled data to assist learning from the limited labeled dataset, a situation commonly encountered in the biomedical domain.
By drawing on ideas from different areas within deep learning, such as representation learning (e.g., autoencoders), variational inference (e.g., variational autoencoders), Bayesian nonparametrics (e.g., the beta-Bernoulli process), learning theory (e.g., analytical learning theory), and function smoothing (e.g., Lipschitz smoothness), we propose several learning algorithms to improve generalization on the associated tasks. We test our algorithms on real-world clinical data and show that our approach yields significant improvements over existing methods. Moreover, we demonstrate the efficacy of the proposed models on benchmark and simulated data to understand different aspects of the proposed learning methods.
We conclude by identifying some of the limitations of the proposed methods, areas of further improvement, and broader future directions for the successful adoption of AI models in the clinical environment.
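As one concrete illustration of the kind of regularizer described above, a smoothness/consistency term on unlabeled data penalizes a model for changing its prediction under a small input perturbation. The toy linear model and all names here are assumptions for illustration, not the dissertation's algorithms:

```python
import numpy as np

rng = np.random.default_rng(3)

def model(x, W):
    """A toy linear 'network' with a softmax head, standing in for a deep
    diagnostic model."""
    z = x @ W
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def consistency_loss(x_unlabeled, W, eps=0.01):
    """Mean squared change in prediction when the unlabeled input is slightly
    perturbed; added to the supervised loss on the small labeled subset."""
    noise = eps * rng.standard_normal(x_unlabeled.shape)
    p_clean = model(x_unlabeled, W)
    p_noisy = model(x_unlabeled + noise, W)
    return float(np.mean((p_clean - p_noisy) ** 2))

W = rng.standard_normal((10, 3))
x_u = rng.standard_normal((64, 10))   # abundant unlabeled examples
loss = consistency_loss(x_u, W)
print(loss)
```

Minimizing such a term pushes the learned function toward smooth behavior around the unlabeled data, which is one way a large unlabeled pool can regularize learning from a small labeled one.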
General Greedy De-bias Learning
Neural networks often make predictions by relying on spurious correlations in
the datasets rather than on the intrinsic properties of the task of interest,
and thus suffer sharp degradation on out-of-distribution (OOD) test data. Existing
de-bias learning frameworks try to capture specific dataset biases using
annotations, but they fail to handle complicated OOD scenarios. Others
implicitly identify dataset bias through specially designed low-capability
biased models or losses, but they degrade when the training and testing data
come from the same distribution.
In this paper, we propose a General Greedy De-bias learning framework (GGD),
which greedily trains the biased models and the base model. The base model is
encouraged to focus on examples that are hard to solve with biased models, thus
remaining robust against spurious correlations in the test stage. GGD largely
improves models' OOD generalization ability on various tasks, but sometimes
over-estimates the bias level and degrades on the in-distribution test. We
further re-analyze the ensemble process of GGD and introduce the Curriculum
Regularization inspired by curriculum learning, which achieves a good trade-off
between in-distribution and out-of-distribution performance. Extensive
experiments on image classification, adversarial question answering, and visual
question answering demonstrate the effectiveness of our method. GGD can learn a
more robust base model under the settings of both task-specific biased models
with prior knowledge and a self-ensemble biased model without prior knowledge. Comment: This work has been submitted to the IEEE for possible publication.
Copyright may be transferred without notice, after which this version may no
longer be accessible.
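The de-bias principle that GGD builds on can be sketched with a simple example reweighting: down-weight training examples the biased model already solves, so the base model is pushed toward examples that require the true task signal. The specific form used here (weight = 1 minus the biased model's probability on the gold label) is one common choice in this family, not necessarily the paper's exact greedy ensemble rule:

```python
import numpy as np

rng = np.random.default_rng(4)

n, c = 8, 4
y = rng.integers(0, c, size=n)                 # gold labels
p_biased = rng.dirichlet(np.ones(c), size=n)   # biased model's predictions
p_base = rng.dirichlet(np.ones(c), size=n)     # base model's predictions

# Examples that are easy for the biased model get a low weight:
weights = 1.0 - p_biased[np.arange(n), y]

# The base model's cross-entropy is reweighted accordingly:
ce = -np.log(p_base[np.arange(n), y])
loss = float(np.mean(weights * ce))
print(loss)
```

The trade-off the abstract describes follows directly: the harder this reweighting suppresses bias-solvable examples, the more in-distribution accuracy can suffer, which is what the Curriculum Regularization is introduced to balance.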
Blockout: Dynamic Model Selection for Hierarchical Deep Networks
Most deep architectures for image classification--even those that are trained
to classify a large number of diverse categories--learn shared image
representations with a single model. Intuitively, however, categories that are
more similar should share more information than those that are very different.
While hierarchical deep networks address this problem by learning separate
features for subsets of related categories, current implementations require
simplified models using fixed architectures specified via heuristic clustering
methods. Instead, we propose Blockout, a method for regularization and model
selection that simultaneously learns both the model architecture and
parameters. A generalization of Dropout, our approach gives a novel
parametrization of hierarchical architectures that allows for structure
learning via back-propagation. To demonstrate its utility, we evaluate Blockout
on the CIFAR and ImageNet datasets, demonstrating improved classification
accuracy, better regularization performance, faster training, and the clear
emergence of hierarchical network structures.
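The block-structured masking that generalizes Dropout can be sketched as follows. Here the block assignments are fixed random values for illustration; in Blockout itself the assignment parameters are learned by back-propagation, which is what makes the architecture itself trainable:

```python
import numpy as np

rng = np.random.default_rng(5)

units, k = 12, 3
# Each unit is assigned to a subset of k blocks (a learnable quantity in the
# actual method; sampled once here for illustration).
assign = rng.random((units, k)) < 0.5

# A weight between two units is kept only if they share at least one block,
# unlike Dropout, where each unit is kept or dropped independently.
mask = (assign.astype(float) @ assign.T.astype(float)) > 0   # (units, units)

W = rng.standard_normal((units, units))
W_eff = W * mask   # effective, block-structured weight matrix
print(mask.shape)
```

Because connectivity is controlled by shared block membership, groups of related categories can end up routed through the same blocks, which is the hierarchical structure the paper reports emerging during training.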