Supervised cross-modal factor analysis for multiple modal data classification
In this paper we study the problem of learning from multiple modal data for
the purpose of document classification. In this problem, each document is
composed of two different modals of data, i.e., an image and a text. Cross-modal factor
analysis (CFA) has been proposed to project the two different modals of data to
a shared data space, so that the classification of an image or a text can be
performed directly in this space. A disadvantage of CFA is that it ignores
the supervision information. In this paper, we improve CFA by incorporating the
supervision information to represent and classify both image and text modals of
documents. We project both image and text data to a shared data space by factor
analysis, and then train a class label predictor in the shared space to use the
class label information. The factor analysis parameter and the predictor
parameter are learned jointly by solving one single objective function. With
this objective function, we minimize the distance between the projections of
image and text of the same document, and the classification error of the
projection measured by the hinge loss function. The objective function is
optimized by an alternating optimization strategy in an iterative algorithm.
Experiments on two different multiple modal document data sets show the
advantage of the proposed algorithm over other CFA methods.
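The joint objective described in this abstract can be sketched as follows. This is only an illustration under stated assumptions, not the authors' formulation: the linear projections `U` and `V`, the squared Euclidean distance, the binary labels in {-1, +1}, the linear predictor `(w, b)`, and the `tradeoff` weight are all hypothetical choices that the abstract does not specify.

```python
from typing import List, Sequence

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]  # rows = input features, columns = shared dims


def project(x: Vector, proj: Matrix) -> List[float]:
    """Linear projection of x into the shared space (assumed form: x @ proj)."""
    dims = len(proj[0])
    return [sum(x[i] * proj[i][d] for i in range(len(x))) for d in range(dims)]


def hinge(label: int, score: float) -> float:
    """Hinge loss for a label in {-1, +1} (binary labels are an assumption)."""
    return max(0.0, 1.0 - label * score)


def joint_objective(images, texts, labels, U, V, w, b, tradeoff=1.0) -> float:
    """Distance between paired projections plus weighted hinge loss."""
    total = 0.0
    for x, t, y in zip(images, texts, labels):
        px, pt = project(x, U), project(t, V)
        # Distance term: squared Euclidean gap between the two projections
        # of the same document.
        total += sum((a - c) ** 2 for a, c in zip(px, pt))
        # Classification term: hinge loss of a linear predictor applied to
        # each projection (applying it to both is an assumption).
        for p in (px, pt):
            score = sum(wi * pi for wi, pi in zip(w, p)) + b
            total += tradeoff * hinge(y, score)
    return total
```

An alternating scheme would hold `(w, b)` fixed while updating `U` and `V`, then hold the projections fixed while updating the predictor, re-evaluating this value after each step.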
CUSBoost: Cluster-based Under-sampling with Boosting for Imbalanced Classification
Class imbalance classification is a challenging research problem in data
mining and machine learning, as most real-life datasets are imbalanced in
nature. Existing learning algorithms maximise the classification accuracy by
correctly classifying the majority class, but misclassify the minority class.
However, in real-life applications the minority class instances usually
represent the concept of greater interest than the majority class instances.
Recently, several techniques based on sampling methods
(under-sampling of the majority class and over-sampling of the minority class),
cost-sensitive learning methods, and ensemble learning have been used in the
literature for classifying imbalanced datasets. In this paper, we introduce a
new clustering-based under-sampling approach with boosting (AdaBoost)
algorithm, called CUSBoost, for effective imbalanced classification. The
proposed algorithm provides an alternative to RUSBoost (random under-sampling
with AdaBoost) and SMOTEBoost (synthetic minority over-sampling with AdaBoost)
algorithms. We compared the performance of the CUSBoost algorithm with
state-of-the-art ensemble learning methods such as AdaBoost, RUSBoost, and
SMOTEBoost on 13 imbalanced binary and multi-class datasets with various
imbalance ratios. The experimental results show that CUSBoost is a
promising and effective approach for dealing with highly imbalanced datasets.
Comment: CSITSS-201
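The cluster-based under-sampling idea can be illustrated with a small standard-library sketch. The abstract does not name the clustering algorithm or the sampling rule, so the plain k-means routine, the per-cluster `fraction`, and the function names below are assumptions made for illustration only.

```python
import math
import random
from collections import defaultdict


def kmeans_labels(points, k, iters=20, seed=0):
    """Tiny k-means (assumed clustering step); returns a cluster id per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Move each centroid to the mean of its members; empty clusters stay put.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_undersample(majority, k, fraction, seed=0):
    """Keep a random `fraction` of the majority points from every cluster."""
    rng = random.Random(seed)
    labels = kmeans_labels(majority, k, seed=seed)
    clusters = defaultdict(list)
    for point, lab in zip(majority, labels):
        clusters[lab].append(point)
    kept = []
    for members in clusters.values():
        # At least one point per cluster, so every region stays represented.
        n_keep = max(1, math.ceil(fraction * len(members)))
        kept.extend(rng.sample(members, n_keep))
    return kept
```

The kept majority points would then be combined with the minority instances to train the boosted learner; how CUSBoost schedules this step inside AdaBoost is not stated in the abstract.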
Early hospital mortality prediction using vital signals
Early hospital mortality prediction is critical as intensivists strive to
make efficient medical decisions about severely ill patients staying in
intensive care units. As a result, various methods have been developed to
address this problem based on clinical records. However, some laboratory
test results are time-consuming to obtain and need to be processed. In this paper, we
propose a novel method to predict mortality using features extracted from the
heart signals of patients within the first hour of ICU admission. In order to
predict the risk, quantitative features have been computed based on the heart
rate signals of ICU patients. Each signal is described in terms of 12
statistical and signal-based features. The extracted features are fed into
eight classifiers: decision tree, linear discriminant, logistic regression,
support vector machine (SVM), random forest, boosted trees, Gaussian SVM, and
K-nearest neighbors (K-NN). To derive insight into the performance of the
proposed method, several experiments have been conducted using the well-known
clinical dataset named Medical Information Mart for Intensive Care III
(MIMIC-III). The experimental results demonstrate the capability of the
proposed method in terms of precision, recall, F1-score, and area under the
receiver operating characteristic curve (AUC). The decision tree classifier
balances accuracy and interpretability better than the other classifiers,
producing an F1-score and AUC of 0.91 and 0.93, respectively. This
indicates that heart rate signals can be used for predicting mortality in
patients in the ICU, achieving performance comparable with existing
predictions that rely on high-dimensional features from clinical records, which
need to be processed and may contain missing information.
Comment: 11 pages, 5 figures, preprint of accepted paper in IEEE&ACM CHASE
2018 and published in Smart Health journal
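As a rough illustration of describing a heart rate signal by summary features, the sketch below computes a handful of statistics from a list of samples. The abstract does not list the 12 features used in the paper, so the feature set and the function name `heart_rate_features` here are hypothetical.

```python
import statistics


def heart_rate_features(signal):
    """Summarise a heart rate series (beats per minute) as a feature dict.

    The chosen statistics are illustrative, not the paper's feature set.
    """
    # First differences capture beat-to-beat variation in the series.
    diffs = [b - a for a, b in zip(signal, signal[1:])]
    return {
        "mean": statistics.fmean(signal),
        "std": statistics.pstdev(signal),
        "min": min(signal),
        "max": max(signal),
        "range": max(signal) - min(signal),
        "median": statistics.median(signal),
        "mean_abs_diff": statistics.fmean([abs(d) for d in diffs]),
    }
```

Each patient's first-hour signal would map to one such feature vector, which a classifier such as a decision tree could then consume.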
Local Rule-Based Explanations of Black Box Decision Systems
Recent years have witnessed the rise of accurate but obscure decision
systems which hide the logic of their internal decision processes from users.
The lack of explanations for the decisions of black box systems is a key
ethical issue, and a limitation to the adoption of machine learning components
in socially sensitive and safety-critical contexts. Therefore, we need
explanations that reveal the reasons why a predictor takes a certain decision.
In this paper we focus on the problem of black box outcome explanation, i.e.,
explaining the reasons for the decision taken on a specific instance. We propose
LORE, an agnostic method able to provide interpretable and faithful
explanations. LORE first learns a local interpretable predictor on a synthetic
neighborhood generated by a genetic algorithm. Then it derives from the logic
of the local interpretable predictor a meaningful explanation consisting of: a
decision rule, which explains the reasons for the decision; and a set of
counterfactual rules, suggesting the changes in the instance's features that
lead to a different outcome. Extensive experiments show that LORE outperforms
existing methods and baselines both in the quality of explanations and in the
accuracy in mimicking the black box.
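To make the notions of a decision rule and counterfactual rules concrete, the sketch below reads both from a small decision tree given as nested dictionaries. It is not LORE itself: the neighborhood generation and tree learning are omitted, the tree encoding is hypothetical, and choosing counterfactuals as the differently labeled paths with the fewest violated conditions is an assumption.

```python
def decision_rule(tree, instance):
    """Follow the instance down the tree; return (conditions, predicted label)."""
    conditions = []
    node = tree
    while "label" not in node:
        feature, threshold = node["feature"], node["threshold"]
        if instance[feature] <= threshold:
            conditions.append((feature, "<=", threshold))
            node = node["left"]
        else:
            conditions.append((feature, ">", threshold))
            node = node["right"]
    return conditions, node["label"]


def all_paths(node, prefix=()):
    """Yield (conditions, label) for every root-to-leaf path."""
    if "label" in node:
        yield list(prefix), node["label"]
        return
    feature, threshold = node["feature"], node["threshold"]
    yield from all_paths(node["left"], prefix + ((feature, "<=", threshold),))
    yield from all_paths(node["right"], prefix + ((feature, ">", threshold),))


def holds(condition, instance):
    """Check whether the instance satisfies a single split condition."""
    feature, op, threshold = condition
    value = instance[feature]
    return value <= threshold if op == "<=" else value > threshold


def counterfactual_rules(tree, instance):
    """Paths with a different label that the instance violates least (assumed rule)."""
    _, label = decision_rule(tree, instance)
    candidates = []
    for conditions, leaf_label in all_paths(tree):
        if leaf_label != label:
            violated = [c for c in conditions if not holds(c, instance)]
            candidates.append((len(violated), conditions))
    if not candidates:
        return []
    fewest = min(count for count, _ in candidates)
    return [conditions for count, conditions in candidates if count == fewest]
```

For a toy tree that splits on `age` and then `income`, an instance denied because `age <= 30` would get the counterfactual path requiring `age > 30` and `income > 50`, which points at the feature change that flips the outcome.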
ANTIDS: Self-Organized Ant-based Clustering Model for Intrusion Detection System
The security of computers and the networks that connect them is of
increasing significance. Computer security is defined as the protection
of computing systems against threats to confidentiality, integrity, and
availability. There are two types of intruders: the external intruders who are
unauthorized users of the machines they attack, and internal intruders, who
have permission to access the system with some restrictions. Because it is
increasingly unlikely that a system administrator can recognize an attack and
intervene manually to stop it, there is growing recognition that intrusion
detection (ID) systems have much to gain from basing their principles on the
behavior of complex natural systems, in particular self-organization, which
allows for a truly distributed and collective perception of this phenomenon.
With that aim in mind, the present work presents a self-organized ant colony
based intrusion detection system (ANTIDS) to detect intrusions in a network
infrastructure. Its performance is compared with conventional soft computing
paradigms such as Decision Trees, Support Vector Machines and Linear Genetic
Programming for modeling fast, online and efficient intrusion detection systems.
Comment: 13 pages, 3 figures, Swarm Intelligence and Patterns (SIP)- special
track at WSTST 2005, Muroran, Japan