Combination of linear classifiers using score function -- analysis of possible combination strategies
In this work, we addressed the issue of combining linear classifiers using
their score functions. The value of the scoring function depends on the
distance from the decision boundary. Two score functions have been tested and
four different combination strategies were investigated. During the
experimental study, the proposed approach was applied to the heterogeneous
ensemble and it was compared to two reference methods -- majority voting and
model averaging respectively. The comparison was made in terms of seven
different quality criteria. The results show that combination strategies based
on the simple average and the trimmed average are the best combination strategies of
the geometrical combination
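As an illustration of the idea described in this abstract, the following is a minimal sketch of combining linear classifiers through distance-based scores. It assumes the score is simply the signed distance to each hyperplane; the abstract does not specify the two score functions actually tested, and the function names (`signed_distances`, `simple_average`, `trimmed_average`) are hypothetical.

```python
import numpy as np

def signed_distances(X, weights, biases):
    """Signed distance of each row of X to each hyperplane w.x + b = 0.

    Assumption: the score is the plain signed distance; the paper's own
    score functions are not given in the abstract.
    """
    W = np.asarray(weights, dtype=float)   # shape (n_classifiers, n_features)
    b = np.asarray(biases, dtype=float)    # shape (n_classifiers,)
    X = np.asarray(X, dtype=float)         # shape (n_samples, n_features)
    return (X @ W.T + b) / np.linalg.norm(W, axis=1)

def simple_average(scores):
    """Average the scores of all classifiers for each sample."""
    return scores.mean(axis=1)

def trimmed_average(scores, trim=1):
    """Drop the `trim` lowest and `trim` highest scores per sample, then average."""
    s = np.sort(scores, axis=1)
    if 2 * trim >= s.shape[1]:
        raise ValueError("trim removes every classifier score")
    return s[:, trim:s.shape[1] - trim].mean(axis=1)

def predict(combined):
    """Map a combined score to a class label in {-1, +1}."""
    return np.where(combined >= 0, 1, -1)
```

With three classifiers, `trimmed_average(scores, trim=1)` reduces to the median score, which makes the combined decision less sensitive to a single classifier whose boundary lies far from the others.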
TSE-IDS: A Two-Stage Classifier Ensemble for Intelligent Anomaly-based Intrusion Detection System
Intrusion detection systems (IDS) play a pivotal role in computer security by discovering and repelling malicious activities in computer networks. Anomaly-based IDS, in particular, rely on classification models trained using historical data to discover such malicious activities. In this paper, an improved IDS based on hybrid feature selection and two-level classifier ensembles is proposed. A hybrid feature selection technique comprising three methods, i.e. particle swarm optimization, ant colony algorithm, and genetic algorithm, is utilized to reduce the feature size of the training datasets (NSL-KDD and UNSW-NB15 are considered in this paper). Features are selected based on the classification performance of a reduced error pruning tree (REPT) classifier. Then, a two-level classifier ensemble based on two meta learners, i.e., rotation forest and bagging, is proposed. On the NSL-KDD dataset, the proposed classifier shows 85.8% accuracy, 86.8% sensitivity, and 88.0% detection rate, which remarkably outperform other classification techniques recently proposed in the literature. Results on the UNSW-NB15 dataset also improve on those achieved by several state-of-the-art techniques. Finally, to verify the results, a two-step statistical significance test is conducted. Such testing has not usually been considered in IDS research thus far and, therefore, adds value to the experimental results achieved by the proposed classifier
Fair comparison of skin detection approaches on publicly available datasets
Skin detection is the process of discriminating skin and non-skin regions in
a digital image, and it is widely used in applications ranging from hand
gesture analysis to body-part tracking and face detection. Skin detection is a
challenging problem which has drawn extensive attention from the research
community; nevertheless, a fair comparison among approaches is very difficult
due to the lack of a common benchmark and a unified testing protocol. In this
work, we investigate the most recent research in this field and we propose a
fair comparison among approaches using several different datasets. The major
contributions of this work are an exhaustive literature review of skin color
detection approaches, a framework to evaluate and combine different skin
detector approaches, whose source code is made freely available for future
research, and an extensive experimental comparison among several recent methods
which have also been used to define an ensemble that works well in many
different problems. Experiments are carried out on 10 different datasets
including more than 10000 labelled images: experimental results confirm that
the best method proposed here obtains very good performance with respect to
other stand-alone approaches, without requiring ad hoc parameter tuning. A
MATLAB version of the framework for testing and of the methods proposed in this
paper will be freely available from https://github.com/LorisNann
Ensembles of probability estimation trees for customer churn prediction
Customer churn prediction is one of the most important elements of a company's Customer Relationship Management (CRM) strategy. In this study, two strategies are investigated to increase the lift performance of ensemble classification models, i.e. (i) using probability estimation trees (PETs) instead of standard decision trees as base classifiers; and (ii) implementing alternative fusion rules based on lift weights for the combination of ensemble members' outputs. Experiments are conducted for four popular ensemble strategies on five real-life churn data sets. In general, the results demonstrate how lift performance can be substantially improved by using alternative base classifiers and fusion rules. However, the effect varies for the different ensemble strategies. In particular, the results indicate an increase of lift performance of (i) Bagging by implementing C4.4 base classifiers, (ii) the Random Subspace Method (RSM) by using lift-weighted fusion rules, and (iii) AdaBoost, by implementing both
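To make the lift criterion concrete, here is a minimal sketch under stated assumptions. Lift is taken to be the churn rate among the top-scored fraction of customers divided by the overall churn rate, and the fusion rule is a weighted average of member scores with weights proportional to each member's lift. The abstract does not give the exact fusion rules used in the study, so `lift_weighted_fusion` is a hypothetical illustration only.

```python
import numpy as np

def top_fraction_lift(y_true, scores, fraction=0.1):
    """Churn rate in the top-scored `fraction` of customers over the base churn rate."""
    y = np.asarray(y_true, dtype=float)
    s = np.asarray(scores, dtype=float)
    n_top = max(1, int(np.ceil(fraction * len(y))))
    top = np.argsort(-s, kind="stable")[:n_top]   # indices of the highest scores
    return y[top].mean() / y.mean()

def lift_weighted_fusion(member_scores, lifts):
    """Weighted average of member scores, weights proportional to member lift.

    member_scores: shape (n_samples, n_members)
    lifts:         shape (n_members,), e.g. measured on a validation set
                   (an assumption; not stated in the abstract).
    """
    w = np.asarray(lifts, dtype=float)
    return np.asarray(member_scores, dtype=float) @ (w / w.sum())
```

A lift above 1 means that the model concentrates churners among its highest-scored customers more than random targeting would.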
Short-Term Forecasting of Passenger Demand under On-Demand Ride Services: A Spatio-Temporal Deep Learning Approach
Short-term passenger demand forecasting is of great importance to the
on-demand ride service platform, which can incentivize vacant cars moving from
over-supply regions to over-demand regions. The spatial dependences, temporal
dependences, and exogenous dependences need to be considered simultaneously,
however, which makes short-term passenger demand forecasting challenging. We
propose a novel deep learning (DL) approach, named the fusion convolutional
long short-term memory network (FCL-Net), to address these three dependences
within one end-to-end learning architecture. The model is stacked and fused by
multiple convolutional long short-term memory (LSTM) layers, standard LSTM
layers, and convolutional layers. The fusion of convolutional techniques and
the LSTM network enables the proposed DL approach to better capture the
spatio-temporal characteristics and correlations of explanatory variables. A
tailored spatially aggregated random forest is employed to rank the importance
of the explanatory variables. The ranking is then used for feature selection.
The proposed DL approach is applied to the short-term forecasting of passenger
demand under an on-demand ride service platform in Hangzhou, China.
Experimental results, validated on real-world data provided by DiDi Chuxing,
show that the FCL-Net achieves better predictive performance than traditional
approaches including both classical time-series prediction models and neural
network based algorithms (e.g., artificial neural network and LSTM). This paper
is one of the first DL studies to forecast the short-term passenger demand of
an on-demand ride service platform by examining the spatio-temporal
correlations.
Comment: 39 pages, 10 figures
Combining Multiple Clusterings via Crowd Agreement Estimation and Multi-Granularity Link Analysis
The clustering ensemble technique aims to combine multiple clusterings into a
probably better and more robust clustering and has been receiving increasing
attention in recent years. There are two main limitations in the
existing clustering ensemble approaches. Firstly, many approaches lack the
ability to weight the base clusterings without access to the original data and
can be affected significantly by low-quality, or even ill, clusterings.
Secondly, they generally focus on the instance level or cluster level in the
ensemble system and fail to integrate multi-granularity cues into a unified
model. To address these two limitations, this paper proposes to solve the
clustering ensemble problem via crowd agreement estimation and
multi-granularity link analysis. We present the normalized crowd agreement
index (NCAI) to evaluate the quality of base clusterings in an unsupervised
manner and thus weight the base clusterings in accordance with their clustering
validity. To explore the relationship between clusters, the source aware
connected triple (SACT) similarity is introduced with regard to their common
neighbors and the source reliability. Based on NCAI and multi-granularity
information collected among base clusterings, clusters, and data instances, we
further propose two novel consensus functions, termed weighted evidence
accumulation clustering (WEAC) and graph partitioning with multi-granularity
link analysis (GP-MGLA) respectively. The experiments are conducted on eight
real-world datasets. The experimental results demonstrate the effectiveness and
robustness of the proposed methods.
Comment: The MATLAB source code of this work is available at:
https://www.researchgate.net/publication/28197031
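The evidence accumulation idea behind a weighted consensus function can be sketched as a weighted co-association matrix: each entry is the weighted share of base clusterings that place two instances in the same cluster. This is only an illustration, not the authors' WEAC implementation. The weights are supplied by the caller because the NCAI formula is not given in the abstract, and the name `weighted_coassociation` is hypothetical.

```python
import numpy as np

def weighted_coassociation(labelings, weights=None):
    """Weighted co-association matrix from several base clusterings.

    labelings: shape (n_clusterings, n_samples), one label vector per clustering
    weights:   one non-negative weight per clustering (uniform if omitted);
               in the paper these would come from NCAI, not computed here.
    """
    L = np.asarray(labelings)
    w = np.ones(len(L)) if weights is None else np.asarray(weights, dtype=float)
    w = w / w.sum()                                   # normalise weights to sum to 1
    same = L[:, :, None] == L[:, None, :]             # (n_clusterings, n, n) booleans
    return np.tensordot(w, same.astype(float), axes=1)  # weighted sum over clusterings
```

The resulting matrix can serve as a similarity matrix for a final clustering step, for example agglomerative clustering on `1 - C`.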