13,609 research outputs found
COMET: A Recipe for Learning and Using Large Ensembles on Massive Data
COMET is a single-pass MapReduce algorithm for learning on large-scale data.
It builds multiple random forest ensembles on distributed blocks of data and
merges them into a mega-ensemble. This approach is appropriate when learning
from massive-scale data that is too large to fit on a single machine. To get
the best accuracy, IVoting should be used instead of bagging to generate the
training subset for each decision tree in the random forest. Experiments with
two large datasets (5GB and 50GB compressed) show that COMET compares favorably
(in both accuracy and training time) to learning on a subsample of data using a
serial algorithm. Finally, we propose a new Gaussian approach for lazy ensemble
evaluation which dynamically decides how many ensemble members to evaluate per
data point; this can reduce evaluation cost by 100X or more
Hashmod: A Hashing Method for Scalable 3D Object Detection
We present a scalable method for detecting objects and estimating their 3D
poses in RGB-D data. To this end, we rely on an efficient representation of
object views and employ hashing techniques to match these views against the
input frame in a scalable way. While a similar approach already exists for 2D
detection, we show how to extend it to estimate the 3D pose of the detected
objects. In particular, we explore different hashing strategies and identify
the one which is more suitable to our problem. We show empirically that the
complexity of our method is sublinear with the number of objects and we enable
detection and pose estimation of many 3D objects with high accuracy while
outperforming the state-of-the-art in terms of runtime.Comment: BMVC 201
PerfWeb: How to Violate Web Privacy with Hardware Performance Events
The browser history reveals highly sensitive information about users, such as
financial status, health conditions, or political views. Private browsing modes
and anonymity networks are consequently important tools to preserve the privacy
not only of regular users but in particular of whistleblowers and dissidents.
Yet, in this work we show how a malicious application can infer opened websites
from Google Chrome in Incognito mode and from Tor Browser by exploiting
hardware performance events (HPEs). In particular, we analyze the browsers'
microarchitectural footprint with the help of advanced Machine Learning
techniques: k-th Nearest Neighbors, Decision Trees, Support Vector Machines,
and in contrast to previous literature also Convolutional Neural Networks. We
profile 40 different websites, 30 of the top Alexa sites and 10 whistleblowing
portals, on two machines featuring an Intel and an ARM processor. By monitoring
retired instructions, cache accesses, and bus cycles for at most 5 seconds, we
manage to classify the selected websites with a success rate of up to 86.3%.
The results show that hardware performance events can clearly undermine the
privacy of web users. We therefore propose mitigation strategies that impede
our attacks and still allow legitimate use of HPEs
Recommended from our members
Improving music genre classification using automatically induced harmony rules
We present a new genre classification framework using both low-level signal-based features and high-level harmony features. A state-of-the-art statistical genre classifier based on timbral features is extended using a first-order random forest containing for each genre rules derived from harmony or chord sequences. This random forest has been automatically induced, using the first-order logic induction algorithm TILDE, from a dataset, in which for each chord the degree and chord category are identified, and covering classical, jazz and pop genre classes. The audio descriptor-based genre classifier contains 206 features, covering spectral, temporal, energy, and pitch characteristics of the audio signal. The fusion of the harmony-based classifier with the extracted feature vectors is tested on three-genre subsets of the GTZAN and ISMIR04 datasets, which contain 300 and 448 recordings, respectively. Machine learning classifiers were tested using 5 Ă— 5-fold cross-validation and feature selection. Results indicate that the proposed harmony-based rules combined with the timbral descriptor-based genre classification system lead to improved genre classification rates
Recommended from our members
Improving music genre classification using automatically induced harmony rules
We present a new genre classification framework using both low-level signal-based features and high-level harmony features. A state-of-the-art statistical genre classifier based on timbral features is extended using a first-order random forest containing for each genre rules derived from harmony or chord sequences. This random forest has been automatically induced, using the first-order logic induction algorithm TILDE, from a dataset, in which for each chord the degree and chord category are identified, and covering classical, jazz and pop genre classes. The audio descriptor-based genre classifier contains 206 features, covering spectral, temporal, energy, and pitch characteristics of the audio signal. The fusion of the harmony-based classifier with the extracted feature vectors is tested on three-genre subsets of the GTZAN and ISMIR04 datasets, which contain 300 and 448 recordings, respectively. Machine learning classifiers were tested using 5 Ă— 5-fold cross-validation and feature selection. Results indicate that the proposed harmony-based rules combined with the timbral descriptor-based genre classification system lead to improved genre classification rates
Improving the performance of free-text keystroke dynamics authentication by fusion
Free-text keystroke dynamics is invariably hampered by the huge amount of data needed to train the system. This problem has been addressed in this paper by suggesting a system that combines two methods, both of which provide a reduced training requirement for user authentication using free-text keystrokes. The two methods were fused to achieve error rates lower than those produced by each method separately. Two fusion schemes, namely: decision-level fusion and feature-level fusion, were applied. Feature-level fusion was done by concatenating two sets of features before the learning stage. The two sets of features were: a timing feature set and a non-conventional feature set. Moreover, decision-level fusion was used to merge the output of two methods using majority voting. One is Support Vector Machines (SVMs) together with Ant Colony Optimization (ACO) feature selection and the other is decision trees (DTs). Even though the classifiers using the parameters merged at feature level produced low error rates, its results were outperformed by the results achieved by the decision-level fusion scheme. Decision-level fusion was employed to achieve the best performance of 0.00% False Accept Rate (FAR) and 0.00% False Reject Rate (FRR)
- …