PDE-Foam - a probability-density estimation method using self-adapting phase-space binning
Probability Density Estimation (PDE) is a multivariate discrimination
technique based on sampling signal and background densities defined by event
samples from data or Monte-Carlo (MC) simulations in a multi-dimensional phase
space. In this paper, we present a modification of the PDE method that uses a
self-adapting binning method to divide the multi-dimensional phase space into a
finite number of hyper-rectangles (cells). The binning algorithm adjusts the
size and position of a predefined number of cells inside the multi-dimensional
phase space, minimising the variance of the signal and background densities
inside the cells. The implementation of the binning algorithm PDE-Foam is based
on the MC event-generation package Foam. We present performance results for
representative examples (toy models) and discuss the dependence of the obtained
results on the choice of parameters. The new PDE-Foam shows improved
classification capability for small training samples and reduced classification
time compared to the original PDE method based on range searching.
Comment: 19 pages, 11 figures; replaced with revised version accepted for publication in NIM A; corrected typos in the description of Fig. 7.
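The self-adapting split step can be illustrated with a toy one-dimensional sketch. This is a deliberate simplification, not the actual PDE-Foam algorithm: the real method works on hyper-rectangles in many dimensions and balances signal and background densities, whereas `best_split` (our name) just minimises the summed within-cell variance along one axis:

```python
# Toy sketch of variance-minimising cell splitting, in the spirit of
# PDE-Foam's self-adapting binning (illustrative 1-D simplification).

def best_split(values):
    """Return the split point that minimises the summed variance
    of the two resulting sub-cells."""
    def var(xs):
        if len(xs) < 2:
            return 0.0
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    xs = sorted(values)
    best = None
    for i in range(1, len(xs)):
        # Candidate split between neighbouring sample points.
        cost = var(xs[:i]) + var(xs[i:])
        if best is None or cost < best[0]:
            best = (cost, (xs[i - 1] + xs[i]) / 2)
    return best[1]
```

Applied recursively to a fixed budget of cells, this kind of rule adapts bin edges to where the event density changes, which is the intuition behind the foam construction.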
On the performance of helper data template protection schemes
The use of biometrics looks promising, as it is already being applied in electronic passports (ePassports) on a global scale. Because the biometric data has to be stored as a reference template on either a central or personal storage device, its widespread use introduces new security and privacy risks such as (i) identity fraud, (ii) cross-matching, (iii) irrevocability and (iv) leaking sensitive medical information. Mitigating these risks is essential to obtain acceptance from the subjects of the biometric systems and thereby facilitate successful implementation on a large scale. A solution to mitigate these risks is to use template protection techniques. The required protection properties of the stored reference template according to ISO guidelines are (i) irreversibility, (ii) renewability and (iii) unlinkability.
A known template protection scheme is the helper data system (HDS). The fundamental principle of the HDS is to bind a key with the biometric sample with use of helper data and cryptography, such that the key can be reproduced or released given another biometric sample of the same subject. The identity check is then performed in a secure way by comparing the hash of the key. Hence, the size of the key determines the amount of protection.
This thesis extensively investigates the HDS, namely (i) the theoretical classification performance, (ii) the maximum key size, (iii) the irreversibility and unlinkability properties, and (iv) the optimal multi-sample and multi-algorithm fusion method. The theoretical classification performance of the biometric system is determined by assuming that the features extracted from the biometric sample are Gaussian distributed. With this assumption we investigate the influence of the bit extraction scheme on the classification performance. Using this theoretical framework, the maximum size of the key is determined by assuming the error-correcting code to operate on Shannon's bound.
We also show three vulnerabilities of the HDS that affect the irreversibility and unlinkability properties and propose solutions. Finally, we study the optimal level of applying multi-sample and multi-algorithm fusion with the HDS at either feature, score, or decision level.
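The key-binding principle described above can be sketched in a few lines. This is a minimal fuzzy-commitment-style illustration under a strong assumption of noiseless features; a real HDS wraps the key in an error-correcting code so that noisy biometric samples still release it, and the function names here are ours:

```python
import hashlib
import secrets

# Minimal sketch of the HDS binding step (fuzzy-commitment style).
# Assumption for illustration: noiseless biometric features, so the
# error-correcting code used by a real HDS is omitted.

def enroll(biometric_bits: bytes, key: bytes):
    # Helper data binds the key to the biometric sample via XOR.
    helper = bytes(b ^ k for b, k in zip(biometric_bits, key))
    # Only the hash of the key is stored for the identity check.
    return helper, hashlib.sha256(key).hexdigest()

def verify(probe_bits: bytes, helper: bytes, stored_hash: str) -> bool:
    # Reproduce the key from a fresh sample and compare hashes.
    key = bytes(p ^ h for p, h in zip(probe_bits, helper))
    return hashlib.sha256(key).hexdigest() == stored_hash

sample = secrets.token_bytes(16)
key = secrets.token_bytes(16)
helper, digest = enroll(sample, key)
assert verify(sample, helper, digest)        # same subject: key released
```

Note that neither the key nor the raw biometric is stored, which is what gives the scheme its irreversibility and renewability properties.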
Multi-signal Anomaly Detection for Real-Time Embedded Systems
This thesis presents MuSADET, an anomaly detection framework targeting timing anomalies found in event traces from real-time embedded systems. The method leverages stationary event generators, signal processing, and distance metrics to classify inter-arrival time sequences as normal/anomalous. Experimental evaluation of traces collected from two real-time embedded systems provides empirical evidence of MuSADET’s anomaly detection performance.
MuSADET is appropriate for embedded systems, where many event generators are intrinsically recurrent and generate stationary sequences of timestamps. To find timing anomalies, MuSADET compares the frequency-domain features of an unknown trace to a normal model trained from well-behaved executions of the system. Each signal in the analysis trace receives a normal/anomalous score, which can help engineers isolate the source of the anomaly.
Empirical evidence of anomaly detection performed on traces collected from an industry-grade hexacopter and the Controller Area Network (CAN) bus deployed in a real vehicle demonstrates the feasibility of the proposed method. In all case studies, anomaly detection did not require an anomaly model while achieving high detection rates. For some of the studied scenarios, the true-positive detection rate goes above 99 %, with false-positive rates below 1 %. The visualization of classification scores shows that some timing anomalies can propagate to multiple signals within the system. Comparison with a similar method, Signal Processing for Trace Analysis (SiPTA), indicates that MuSADET is superior in detection performance and provides complementary information that can help link anomalies to the process where they occurred.
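The core comparison of frequency-domain features can be sketched as follows. This is an illustrative stand-in, not MuSADET itself: a plain DFT magnitude spectrum and a Euclidean distance substitute for the framework's actual feature extraction and distance metric, and all names are ours:

```python
import math

# Sketch of frequency-domain anomaly scoring for inter-arrival times
# (illustrative: plain DFT magnitudes and Euclidean distance stand in
# for MuSADET's actual features and metric).

def magnitude_spectrum(x):
    n = len(x)
    return [abs(sum(x[t] * complex(math.cos(2 * math.pi * k * t / n),
                                   -math.sin(2 * math.pi * k * t / n))
                    for t in range(n)))
            for k in range(n // 2 + 1)]

def anomaly_score(normal_iats, test_iats):
    """Distance between a trace's spectrum and the normal model's spectrum."""
    a = magnitude_spectrum(normal_iats)
    b = magnitude_spectrum(test_iats)
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

periodic = [10.0, 10.0] * 8   # well-behaved periodic event generator
jittered = [10.0, 10.1] * 8   # mild jitter: small score
bursty   = [1.0, 19.0] * 8    # timing anomaly: large score
assert anomaly_score(periodic, jittered) < anomaly_score(periodic, bursty)
```

Because the generators are stationary, the normal spectrum is stable across healthy runs, so no anomaly model is needed, only a model of normal behaviour.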
A New Model Averaging Approach in Predicting Credit Risk Default
The paper introduces a novel approach to ensemble modeling as a weighted model average technique. The proposed idea is prudent, simple to understand, and easy to implement compared to the Bayesian and frequentist approaches. The paper provides both theoretical and empirical contributions for assessing credit risk (probability of default) effectively in a new way by creating an ensemble model as a weighted linear combination of machine learning models. The idea can be generalized to any classification problem in other domains where ensemble-type modeling is a subject of interest and is not limited to unbalanced datasets or credit risk assessment. The results suggest a better forecasting performance compared to the single best well-known machine learning models of parametric, non-parametric, and other ensemble types. The scope of our approach can be extended to any further improvement in estimating weights differently that may be beneficial to enhance the performance of the model average as a future research direction.
Paritosh Navinchandra Jha; Marco Cucculelli
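The weighted-linear-combination idea can be sketched generically. The weighting rule below (inverse validation error) is our own plausible assumption for illustration; the paper's actual weighting scheme may differ, and all names are hypothetical:

```python
# Hypothetical sketch of a weighted model average for default prediction.
# Assumption: weights inversely proportional to each model's validation
# error (the paper's actual weighting scheme may differ).

def average_weights(val_errors):
    """Normalised inverse-error weights for the component models."""
    inv = [1.0 / e for e in val_errors]
    total = sum(inv)
    return [w / total for w in inv]

def ensemble_prob(model_probs, weights):
    """Weighted linear combination of per-model default probabilities."""
    return sum(w * p for w, p in zip(weights, model_probs))

# Three hypothetical models with validation errors 0.10, 0.20, 0.40:
w = average_weights([0.10, 0.20, 0.40])
p = ensemble_prob([0.8, 0.6, 0.3], w)   # blended probability of default
```

The attraction of such a rule is that it needs no posterior simulation or asymptotic theory, which is what makes it simpler than Bayesian or frequentist model averaging.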
Confidence Bands for ROC Curves
In this paper we study techniques for generating and evaluating
confidence bands on ROC curves. ROC curve evaluation is
rapidly becoming a commonly used evaluation metric in machine
learning, although evaluating ROC curves has thus far been limited
to studying the area under the curve (AUC) or generation of
one-dimensional confidence intervals by freezing one variable:
the false-positive rate, or threshold on the classification scoring
function. Researchers in the medical field have long been using
ROC curves and have many well-studied methods for analyzing
such curves, including generating confidence intervals as
well as simultaneous confidence bands. In this paper we introduce
these techniques to the machine learning community and
show their empirical fitness on the Covertype data set, a standard
machine learning benchmark from the UCI repository. We
show how some of these methods work remarkably well, others
are too loose, and that existing machine learning methods for generation
of 1-dimensional confidence intervals do not translate well
to generation of simultaneous bands; their bands are too tight.
Information Systems Working Papers Series
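As a concrete, deliberately simplified illustration of the bootstrap machinery behind such intervals, the sketch below computes a pointwise percentile interval for the true-positive rate at one frozen threshold; the simultaneous bands discussed in the paper require stronger constructions, and the function names and parameters here are ours:

```python
import random

# Rough sketch of a bootstrap (pointwise percentile) confidence interval
# for the TPR at a fixed threshold. Simultaneous ROC bands need stronger
# constructions than this single-point illustration.

def tpr(scores_pos, thr):
    """Fraction of positive-class scores at or above the threshold."""
    return sum(s >= thr for s in scores_pos) / len(scores_pos)

def bootstrap_tpr_interval(scores_pos, thr, n_boot=500, alpha=0.05, seed=0):
    rng = random.Random(seed)
    stats = sorted(
        tpr([rng.choice(scores_pos) for _ in scores_pos], thr)
        for _ in range(n_boot)
    )
    lo = stats[int(alpha / 2 * n_boot)]
    hi = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lo, hi

pos = [0.9] * 80 + [0.2] * 20           # classifier scores for positives
low, high = bootstrap_tpr_interval(pos, thr=0.5)
assert low <= tpr(pos, 0.5) <= high     # point estimate lies inside the band
```

A simultaneous band must cover the whole curve at once, so naively stitching such pointwise intervals together yields bands that are too tight, which is exactly the failure mode the paper reports.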