MEG Decoding Across Subjects
Brain decoding is a data analysis paradigm for neuroimaging experiments that
is based on predicting the stimulus presented to the subject from the
concurrent brain activity. In order to make inference at the group level, a
straightforward but sometimes unsuccessful approach is to train a classifier on
the trials of a group of subjects and then to test it on unseen trials from new
subjects. The difficulty arises from the structural and functional
variability across subjects. We call this approach "decoding across
subjects". In this work, we address the problem of decoding across subjects for
magnetoencephalographic (MEG) experiments and we provide the following
contributions: first, we formally describe the problem and show that it belongs
to a machine learning sub-field called transductive transfer learning (TTL).
Second, we propose to use a simple TTL technique that accounts for the
differences between train data and test data. Third, we propose the use of
ensemble learning, and specifically of stacked generalization, to address the
variability across subjects within train data, with the aim of producing more
stable classifiers. On a face vs. scramble task MEG dataset of 16 subjects, we
compare the standard approach of not modelling the differences across subjects,
to the proposed one of combining TTL and ensemble learning. We show that the
proposed approach is consistently more accurate than the standard one.
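As a rough illustration of stacked generalization across subjects, the sketch below trains one base classifier per training subject and then fits a second-level classifier on the base outputs computed for a held-out subject. Everything here is an assumption made for illustration and not the authors' pipeline: the 1-D synthetic "trials", the nearest-centroid base learner, the logistic-regression combiner, the hold-out variant of stacking, and all function names. The transductive transfer learning step mentioned in the abstract is not modelled.

```python
import math
import random

def make_subject(offset, n_per_class=40, rng=None):
    """Synthetic 1-D trials for one subject; offset mimics subject variability."""
    rng = rng or random
    trials = []
    for label, centre in ((0, -2.0), (1, 2.0)):
        for _ in range(n_per_class):
            trials.append((rng.gauss(centre + offset, 0.5), label))
    return trials

def fit_centroids(trials):
    """Level-0 learner: nearest-centroid classifier fitted on a single subject."""
    c0 = [x for x, y in trials if y == 0]
    c1 = [x for x, y in trials if y == 1]
    return sum(c0) / len(c0), sum(c1) / len(c1)

def base_score(model, x):
    """Signed score: positive when x is closer to the class-1 centroid."""
    c0, c1 = model
    return abs(x - c0) - abs(x - c1)

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clip to keep exp() well behaved
    return 1.0 / (1.0 + math.exp(-z))

def fit_meta(features, labels, lr=0.1, epochs=200):
    """Level-1 learner: logistic regression trained by batch gradient descent."""
    w = [0.0] * len(features[0])
    b = 0.0
    n = len(labels)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for z, y in zip(features, labels):
            err = sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b) - y
            for i, zi in enumerate(z):
                grad_w[i] += err * zi / n
            grad_b += err / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def stacked_predict(base_models, meta, x):
    """Feed the base scores for x into the level-1 classifier."""
    w, b = meta
    z = [base_score(m, x) for m in base_models]
    return int(sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b) > 0.5)

rng = random.Random(0)
base_subjects = [make_subject(o, rng=rng) for o in (-0.5, 0.0, 0.4)]
meta_subject = make_subject(0.2, rng=rng)   # held out from the base learners
test_subject = make_subject(0.3, rng=rng)   # unseen "new" subject

base_models = [fit_centroids(s) for s in base_subjects]
meta_features = [[base_score(m, x) for m in base_models] for x, _ in meta_subject]
meta = fit_meta(meta_features, [y for _, y in meta_subject])

predictions = [stacked_predict(base_models, meta, x) for x, _ in test_subject]
accuracy = sum(p == y for p, (_, y) in zip(predictions, test_subject)) / len(test_subject)
print(f"accuracy on the unseen subject: {accuracy:.2f}")
```

Fitting the combiner on a subject that no base learner has seen keeps the level-1 training data free of the base learners' own training trials, which is the central idea of stacking.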
Automated Website Fingerprinting through Deep Learning
Several studies have shown that the network traffic that is generated by a
visit to a website over Tor reveals information specific to the website through
the timing and sizes of network packets. By capturing traffic traces between
users and their Tor entry guard, a network eavesdropper can leverage this
meta-data to reveal which website Tor users are visiting. The success of such
attacks heavily depends on the particular set of traffic features that are used
to construct the fingerprint. Typically, these features are manually engineered
and, as such, any change introduced to the Tor network can render these
carefully constructed features ineffective. In this paper, we show that an
adversary can automate the feature engineering process, and thus automatically
deanonymize Tor traffic by applying our novel method based on deep learning. We
collect a dataset comprising more than three million network traces, which is
the largest dataset of web traffic ever used for website fingerprinting, and
find that the performance achieved by our deep learning approaches is
comparable to known methods which include various research efforts spanning
multiple years. The obtained success rate exceeds 96% for a closed world
of 100 websites and 94% for our biggest closed world of 900 classes. In our
open world evaluation, the most performant deep learning model is 2% more
accurate than the state-of-the-art attack. Furthermore, we show that the
implicit features automatically learned by our approach are far more resilient
to dynamic changes of web content over time. We conclude that the ability to
automatically construct the most relevant traffic features and perform accurate
traffic recognition makes our deep learning based approach an efficient,
flexible and robust technique for website fingerprinting.
Comment: To appear in the 25th Symposium on Network and Distributed System
Security (NDSS 2018).
Stacking classifiers for anti-spam filtering of e-mail
We evaluate empirically a scheme for combining classifiers, known as stacked
generalization, in the context of anti-spam filtering, a novel cost-sensitive
application of text categorization. Unsolicited commercial e-mail, or "spam",
floods mailboxes, causing frustration, wasting bandwidth, and exposing minors
to unsuitable content. Using a public corpus, we show that stacking can improve
the efficiency of automatically induced anti-spam filters, and that such
filters can be used in real-life applications.
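The abstract describes spam filtering as cost-sensitive but does not say how the cost enters. One common formulation, assumed here purely for illustration, treats blocking a legitimate message as `lam` times more costly than letting a spam message through: a message is flagged only when its estimated spam probability exceeds `lam / (1 + lam)`, and filters are compared by a weighted accuracy that counts each legitimate message `lam` times. The probabilities and labels below are made up.

```python
def spam_threshold(lam):
    """Probability above which a message is flagged, given cost ratio lam."""
    return lam / (1.0 + lam)

def classify(p_spam, lam):
    """Flag as spam only when the filter is confident enough for this cost ratio."""
    return "spam" if p_spam > spam_threshold(lam) else "legit"

def weighted_accuracy(true_labels, predicted, lam):
    """Accuracy in which every legitimate message counts lam times."""
    legit_total = sum(1 for t in true_labels if t == "legit")
    spam_total = sum(1 for t in true_labels if t == "spam")
    legit_ok = sum(1 for t, p in zip(true_labels, predicted) if t == p == "legit")
    spam_ok = sum(1 for t, p in zip(true_labels, predicted) if t == p == "spam")
    return (lam * legit_ok + spam_ok) / (lam * legit_total + spam_total)

# Hypothetical filter outputs (estimated P(spam)) and true labels.
probabilities = [0.95, 0.60, 0.99, 0.20, 0.70]
labels = ["spam", "spam", "spam", "legit", "legit"]

results = {}
for lam in (1, 9):
    predicted = [classify(p, lam) for p in probabilities]
    results[lam] = weighted_accuracy(labels, predicted, lam)
    print(lam, predicted, round(results[lam], 3))
```

Raising `lam` moves the threshold from 0.5 to 0.9, so the borderline legitimate message at 0.70 is no longer blocked, at the price of missing the spam message at 0.60.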
miSTAR : miRNA target prediction through modeling quantitative and qualitative miRNA binding site information in a stacked model structure
In microRNA (miRNA) target prediction, typically two levels of information need to be modeled: the number of potential miRNA binding sites present in a target mRNA and the genomic context of each individual site. Single model structures cope poorly with this complex training data structure, which consists of feature vectors of unequal length as a consequence of the varying number of miRNA binding sites in different mRNAs. To circumvent this problem, we developed a two-layered, stacked model, in which the influence of binding site context is modeled separately. Using logistic regression and random forests, we applied the stacked model approach to a unique data set of 7990 probed miRNA-mRNA interactions, thereby including the largest number of miRNAs in model training to date. Compared to lower-complexity models, a particular stacked model, named miSTAR (miRNA stacked model target prediction; www.mi-star.org), displays a higher general performance and precision on top-scoring predictions. More importantly, our model outperforms published and widely used miRNA target prediction algorithms. Finally, we highlight flaws in cross-validation schemes for evaluation of miRNA target prediction models and adopt a fairer and more stringent approach.
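To make the variable-length problem concrete, the sketch below shows one way a two-layer structure can turn a varying number of binding sites into a fixed-length input. It is not the miSTAR model: the feature names (`seed_match`, `au_content`), the hand-set weights, and the summary statistics are all hypothetical, and a real model would learn both layers from data.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical first-layer parameters: score one binding site from its context.
SITE_WEIGHTS = {"seed_match": 2.0, "au_content": 1.0}
SITE_BIAS = -1.5

# Hypothetical second-layer parameters over [site count, best score, total score].
TARGET_WEIGHTS = [0.3, 2.0, 0.5]
TARGET_BIAS = -2.0

def site_score(site):
    """Layer 1: score a single site from its context features."""
    z = SITE_BIAS + sum(SITE_WEIGHTS[name] * value for name, value in site.items())
    return logistic(z)

def summarize(sites):
    """Collapse any number of sites into a fixed-length feature vector."""
    scores = [site_score(s) for s in sites]
    if not scores:
        return [0.0, 0.0, 0.0]
    return [float(len(scores)), max(scores), sum(scores)]

def target_score(sites):
    """Layer 2: score the whole mRNA from the fixed-length summary."""
    summary = summarize(sites)
    z = TARGET_BIAS + sum(w * f for w, f in zip(TARGET_WEIGHTS, summary))
    return logistic(z)

mrna_a = [{"seed_match": 1.0, "au_content": 0.8},
          {"seed_match": 1.0, "au_content": 0.5}]   # two strong sites
mrna_b = [{"seed_match": 0.0, "au_content": 0.3}]   # one weak site
mrna_c = []                                         # no candidate sites

scores = [target_score(m) for m in (mrna_a, mrna_b, mrna_c)]
print([round(s, 3) for s in scores])
```

Because the second layer only ever sees three numbers, it can be trained on mRNAs with any number of sites, while the site-level layer handles the context of each site on its own.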
A Subband-Based SVM Front-End for Robust ASR
This work proposes a novel support vector machine (SVM) based robust
automatic speech recognition (ASR) front-end that operates on an ensemble of
the subband components of high-dimensional acoustic waveforms. The key issues
of selecting the appropriate SVM kernels for classification in frequency
subbands and the combination of individual subband classifiers using ensemble
methods are addressed. The proposed front-end is compared with state-of-the-art
ASR front-ends in terms of robustness to additive noise and linear filtering.
Experiments performed on the TIMIT phoneme classification task demonstrate the
benefits of the proposed subband based SVM front-end: it outperforms the
standard cepstral front-end in the presence of noise and linear filtering for
signal-to-noise ratio (SNR) below 12 dB. A combination of the proposed
front-end with a conventional front-end such as MFCC yields further
improvements over the individual front ends across the full range of noise
levels.
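The abstract does not specify which ensemble methods combine the subband classifiers. As a minimal sketch under that caveat, the code below shows two generic combination rules, score averaging and majority voting, applied to made-up per-subband scores for two phoneme classes. The scores, class labels, and function names are hypothetical.

```python
from collections import Counter

def combine_by_mean(subband_scores):
    """Average each class score over all subbands and pick the best class."""
    classes = subband_scores[0].keys()
    n = len(subband_scores)
    mean = {c: sum(s[c] for s in subband_scores) / n for c in classes}
    return max(mean, key=mean.get)

def combine_by_vote(subband_scores):
    """Let each subband vote for its top class and pick the most frequent vote."""
    votes = Counter(max(s, key=s.get) for s in subband_scores)
    return votes.most_common(1)[0][0]

# Hypothetical classifier outputs, one dict per frequency subband.
scores = [
    {"aa": 0.70, "iy": 0.30},
    {"aa": 0.60, "iy": 0.40},
    {"aa": 0.20, "iy": 0.80},  # subband assumed to be corrupted by noise
    {"aa": 0.65, "iy": 0.35},
]

mean_decision = combine_by_mean(scores)
vote_decision = combine_by_vote(scores)
print(mean_decision, vote_decision)
```

In this toy case a single corrupted subband is outweighed by the remaining three under both rules, which is the kind of robustness a subband ensemble aims for.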