259 research outputs found
Classifier combination schemes in speech impediment therapy systems
In the therapy of the hearing impaired one of the key problems is how to deal with the lack of proper auditive feedback which impedes the development of intelligible speech. The effectiveness of the therapy relies heavily on accurate phoneme recognition [1, 4, 17]. Because of the environmental difficulties, simple recognition algorithms may have a weak classification performance, so various techniques such as normalization and classifier combination are applied to increase the recognition accuracy. This paper examines Vocal Tract Length Normalization techniques [5, 13] focusing mainly on the real-time parameter estimation [12], and the majority of classifier combination schemes, including the traditional (Prod, Sum, Min, Max) [7], basic linear (simple, weighted, AHP-based [6] averaging), and some special linear (Bagging, Boosting) combinations. Based on the results we conclude that hybrid combinations can improve the effectiveness of the real-time normalization methods
DeltaPhish: Detecting Phishing Webpages in Compromised Websites
The large-scale deployment of modern phishing attacks relies on the automatic
exploitation of vulnerable websites in the wild, to maximize profit while
hindering attack traceability, detection and blacklisting. To the best of our
knowledge, this is the first work that specifically leverages this adversarial
behavior for detection purposes. We show that phishing webpages can be
accurately detected by highlighting HTML code and visual differences with
respect to other (legitimate) pages hosted within a compromised website. Our
system, named DeltaPhish, can be installed as part of a web application
firewall, to detect the presence of anomalous content on a website after
compromise, and eventually prevent access to it. DeltaPhish is also robust
against adversarial attempts in which the HTML code of the phishing page is
carefully manipulated to evade detection. We empirically evaluate it on more
than 5,500 webpages collected in the wild from compromised websites, showing
that it is capable of detecting more than 99% of phishing webpages, while only
misclassifying less than 1% of legitimate pages. We further show that the
detection rate remains higher than 70% even under very sophisticated attacks
carefully designed to evade our system.Comment: Preprint version of the work accepted at ESORICS 201
Local learning for multi-layer, multi-component predictive system
This study introduces a new multi-layer multi-component ensemble. The components of this ensemble are trained locally on subsets of features for disjoint sets of data. The data instances are assigned to local regions using the similarity of their features pairwise squared correlation. Many ensemble methods encourage diversity among their base predictors by training them on different subsets of data or different subsets of features. In the proposed architecture the local regions contain disjoint sets of data and for this data only the most similar features are selected. The pairwise squared correlations of the features are used to weight the predictions of the ensemble's models. The proposed architecture has been tested on a number of data sets and its performance was compared to five benchmark algorithms. The results showed that the testing accuracy of the developed architecture is comparable to the rotation forest and is better than the other benchmark algorithms
Multi-Class Classification Averaging Fusion for Detecting Steganography
Multiple classifier fusion has the capability of increasing classification accuracy over individual classifier systems. This paper focuses on the development of a multi-class classification fusion based on weighted averaging of posterior class probabilities. This fusion system is applied to the steganography fingerprint domain, in which the classifier identifies the statistical patterns in an image which distinguish one steganography algorithm from another. Specifically we focus on algorithms in which jpeg images provide the cover in order to communicate covertly. The embedding methods targeted are F5, JSteg, Model Based, OutGuess, and StegHide. The developed multi-class steganalvsis system consists of three levels: (1) feature preprocessing in which a projection function maps the input vectors into a separable space, (2) classifier system using an ensemble of classifiers, and (3) two weighted fusion techniques are compared, the first is a well known variance weighted fusion and an Gaussian weighted fusion. Results show that through the novel addition of the classifier fusion step to the multi-class steganalysis system, the classification accuracy is improved by up to 12%
Comparison of Classifier Fusion Methods for Predicting Response to Anti HIV-1 Therapy
BACKGROUND: Analysis of the viral genome for drug resistance mutations is state-of-the-art for guiding treatment selection for human immunodeficiency virus type 1 (HIV-1)-infected patients. These mutations alter the structure of viral target proteins and reduce or in the worst case completely inhibit the effect of antiretroviral compounds while maintaining the ability for effective replication. Modern anti-HIV-1 regimens comprise multiple drugs in order to prevent or at least delay the development of resistance mutations. However, commonly used HIV-1 genotype interpretation systems provide only classifications for single drugs. The EuResist initiative has collected data from about 18,500 patients to train three classifiers for predicting response to combination antiretroviral therapy, given the viral genotype and further information. In this work we compare different classifier fusion methods for combining the individual classifiers. PRINCIPAL FINDINGS: The individual classifiers yielded similar performance, and all the combination approaches considered performed equally well. The gain in performance due to combining methods did not reach statistical significance compared to the single best individual classifier on the complete training set. However, on smaller training set sizes (200 to 1,600 instances compared to 2,700) the combination significantly outperformed the individual classifiers (p<0.01; paired one-sided Wilcoxon test). Together with a consistent reduction of the standard deviation compared to the individual prediction engines this shows a more robust behavior of the combined system. Moreover, using the combined system we were able to identify a class of therapy courses that led to a consistent underestimation (about 0.05 AUC) of the system performance. Discovery of these therapy courses is a further hint for the robustness of the combined system. CONCLUSION: The combined EuResist prediction engine is freely available at http://engine.euresist.org
- …