Search CORE

4 research outputs found

A Vertical and Horizontal Intelligent Dataset Reduction Approach for Cyber-Physical Power Aware Intrusion Detection Systems

Authors: Erradi Abdelkarim; Kholidy Hisham A.
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2019

Cyber-Physical Power Systems (CPPS) have become vital targets for intruders because of the large volume of high-speed heterogeneous data provided by Wide Area Measurement Systems (WAMS). The Nonnested Generalized Exemplars (NNGE) algorithm is one of the most accurate classification techniques that can work with such CPPS data. However, the NNGE algorithm tends to produce rules that test a large number of input features. This poses problems for large data volumes and hinders the scalability of any detection system. In this paper, we introduce VHDRA, a Vertical and Horizontal Data Reduction Approach, to improve the classification accuracy and speed of the NNGE algorithm and reduce its computational resource consumption. VHDRA provides the following functionalities: (1) it vertically reduces the dataset features by selecting the most significant features and by reducing the NNGE's hyperrectangles; (2) it horizontally reduces the size of the data while preserving the original key events and patterns within the datasets, using an approach called STEM, the State Tracking and Extraction Method. The experiments show that VHDRA, using both the vertical and the horizontal reduction, reduces the NNGE hyperrectangles by 29.06%, 37.34%, and 26.76% and improves the accuracy of the NNGE by 8.57%, 4.19%, and 3.78% on the Multi-, Binary, and Triple class datasets, respectively. This work was made possible by NPRP Grant # NPRP9-005-1-002 from the Qatar National Research Fund (a member of Qatar Foundation).

Qatar University Institutional Repository
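
The distinction between vertical reduction (fewer feature columns) and horizontal reduction (fewer rows) in the abstract above can be illustrated with a minimal sketch. This is not VHDRA or STEM: the function names, the rounding-based notion of a "state", and the sample readings are all hypothetical, chosen only to show the two directions of reduction.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Row = Dict[str, float]


def vertical_reduce(rows: List[Row], keep: Sequence[str]) -> List[Row]:
    """Keep only the selected feature columns (assumed to be chosen elsewhere)."""
    return [{name: row[name] for name in keep} for row in rows]


def horizontal_reduce(rows: List[Row], decimals: int = 1) -> List[Row]:
    """Drop consecutive rows whose rounded values repeat the last kept state.

    Rounding is a stand-in for state tracking; it is an assumption of this
    sketch, not the method described in the paper.
    """
    reduced: List[Row] = []
    last_state: Optional[Tuple[float, ...]] = None
    for row in rows:
        state = tuple(round(row[name], decimals) for name in sorted(row))
        if state != last_state:
            reduced.append(row)
            last_state = state
    return reduced


# Hypothetical measurement stream with one uninformative column.
readings = [
    {"v": 1.00, "i": 0.50, "noise": 3.0},
    {"v": 1.01, "i": 0.52, "noise": 7.0},
    {"v": 0.60, "i": 0.90, "noise": 1.0},
    {"v": 0.61, "i": 0.91, "noise": 2.0},
]
fewer_columns = vertical_reduce(readings, keep=["v", "i"])
fewer_rows = horizontal_reduce(fewer_columns)
```

Applying the vertical step first means the horizontal step compares rows on the retained features only, so changes in a discarded column do not count as a new state.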

Fair Causal Feature Selection

Authors: Ling Zhaolong; Wu Jingxuan; Wu Xindong; Wu Xingyu; Yu Kui; Zhang Yiwen; Zhou Peng
Publication date: 17/06/2023

Causal feature selection has recently received increasing attention in machine learning. Existing causal feature selection algorithms select unique causal features of a class variable as the optimal feature subset. However, a class variable usually has multiple states, and it is unfair to select the same causal features for different states of a class variable. To address this problem, we employ class-specific mutual information to evaluate the causal information carried by each state of the class attribute, and theoretically analyze the unique relationship between each state and the causal features. Based on this, a Fair Causal Feature Selection algorithm (FairCFS) is proposed to fairly identify the causal features for each state of the class variable. Specifically, FairCFS uses pairwise comparisons of class-specific mutual information and the magnitude of the class-specific mutual information values from the perspective of each state, and follows a divide-and-conquer framework to find causal features. The correctness and application condition of FairCFS are theoretically proved, and extensive experiments are conducted to demonstrate the efficiency and superiority of FairCFS compared to state-of-the-art approaches.

arXiv.org e-Print Archive
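
To make the idea of a per-state information measure concrete, here is a minimal sketch that scores a discrete feature against one state of the class variable. It assumes one common definition, the Kullback-Leibler divergence between p(x | y = state) and p(x), whose average over states equals the ordinary mutual information; the paper's exact definition may differ, and the function names and toy data are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def class_specific_mi(xs: Sequence[Hashable], ys: Sequence[Hashable],
                      state: Hashable) -> float:
    """KL divergence (in bits) between p(x | y = state) and p(x), from counts."""
    n = len(xs)
    marginal = Counter(xs)
    conditional = Counter(x for x, y in zip(xs, ys) if y == state)
    m = sum(conditional.values())
    if m == 0:
        raise ValueError(f"state {state!r} does not occur in ys")
    return sum(
        (count / m) * math.log2((count / m) / (marginal[x] / n))
        for x, count in conditional.items()
    )


def mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """Average the per-state scores, weighted by how often each state occurs."""
    n = len(ys)
    return sum(
        (count / n) * class_specific_mi(xs, ys, state)
        for state, count in Counter(ys).items()
    )


# Toy data: the feature says more about state "a" than about state "b".
feature = [0, 0, 0, 1]
labels = ["a", "a", "b", "b"]
score_a = class_specific_mi(feature, labels, "a")
score_b = class_specific_mi(feature, labels, "b")
total = mutual_information(feature, labels)
```

On the toy data the two states receive different scores for the same feature, which is the situation the abstract describes as a reason to select features per state.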

Identifying diagnosis-specific genotype–phenotype associations via joint multitask sparse canonical correlation analysis and classification

Authors: Du Lei; Guo Lei; Han Junwei; Liu Fang; Liu Kefei; Risacher Shannon L; Saykin Andrew J; Shen Li; Yao Xiaohui
Publication venue: 'Oxford University Press (OUP)'
Publication date: 13/07/2020

Motivation: Brain imaging genetics studies the complex associations between genotypic data such as single nucleotide polymorphisms (SNPs) and imaging quantitative traits (QTs). Neurodegenerative disorders usually exhibit diversity and heterogeneity, so different diagnostic groups might carry distinct imaging QTs, SNPs and interactions between them. Sparse canonical correlation analysis (SCCA) is widely used to identify bi-multivariate genotype–phenotype associations. However, most existing SCCA methods are unsupervised and therefore unable to identify diagnosis-specific genotype–phenotype associations. Results: In this article, we propose a new joint multitask learning method, named MT–SCCALR, which combines the merits of SCCA and logistic regression. MT–SCCALR learns genotype–phenotype associations of multiple tasks jointly, with each task focusing on identifying one diagnosis-specific genotype–phenotype pattern. Meanwhile, MT–SCCALR can not only select relevant SNPs and imaging QTs for each diagnostic group alone, but also allows the selection of those shared by multiple diagnostic groups. We derive an efficient optimization algorithm whose convergence to a local optimum is guaranteed. Compared with two state-of-the-art methods, MT–SCCALR yields better or similar canonical correlation coefficients and classification performance. In addition, it produces much more discriminative canonical weight patterns of great interest than its competitors. This demonstrates the power and capability of MT–SCCALR in identifying diagnostically heterogeneous genotype–phenotype patterns, which would be helpful for understanding the pathophysiology of brain disorders.

IUPUIScholarWorks

Class-Specific Feature Sets in Classification

Author: Paul Baggenstoss
Publication date: 01/01/1998

The commonly used feature-based classifier implements the maximum a posteriori probability (MAP) of the data class given the features. This requires the joint probability density function (PDF) of the features under each of the class hypotheses. Unfortunately, these PDFs are rarely known and must be estimated from training data. Poor performance results if the amount of training data is insufficient to estimate the high-dimensional feature PDFs. The class-specific theorem is presented, in which the MAP decision rule is rewritten as a function of low-dimensional PDFs that may be estimated in practice from far smaller data sets. Necessary conditions include (a) that there exists a low-dimensional feature subset for each class that is a sufficient statistic for the underlying random parameters of each data class, and (b) that there exists at least one point in each parameter space that corresponds to a common PDF. We provide a proof of the theorem supported by an example using synthetic signals. Two orders of magnitude fewer training samples are required by the class-specific approach.

CiteSeerX
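
The rewriting of the MAP rule described in the abstract above can be sketched as follows. The notation is an assumption made here for illustration and is not taken from the listing: $\mathbf{z}$ is the full feature vector, $\mathbf{z}_j$ is the low-dimensional feature subset for class $j$, $H_j$ is the hypothesis for class $j$, and $H_0$ is a common reference hypothesis standing in for the "common PDF" of condition (b).

```latex
% Conventional MAP rule: needs the high-dimensional PDF of z under every class.
\hat{\jmath} = \arg\max_{j} \; p(\mathbf{z} \mid H_j)\, P(H_j)

% Class-specific form: each class is scored with its own low-dimensional
% feature set z_j, normalised by the same features under the reference H_0.
\hat{\jmath} = \arg\max_{j} \; \frac{p(\mathbf{z}_j \mid H_j)}{p(\mathbf{z}_j \mid H_0)}\, P(H_j)
```

Under condition (a), the likelihood ratio computed on $\mathbf{z}_j$ is meant to carry the same information as the ratio on the raw data, which is why only low-dimensional PDFs need to be estimated from training samples.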