Structure-Aware Classification using Supervised Dictionary Learning
In this paper, we propose a supervised dictionary learning algorithm that
aims to preserve the local geometry in both dimensions of the data. A
graph-based regularization explicitly takes into account the local manifold
structure of the data points. A second graph regularization gives similar
treatment to the feature domain and helps in learning a more robust dictionary.
Both graphs can be constructed from the training data or learned and adapted
along the dictionary learning process. The combination of these two terms
promotes the discriminative power of the learned sparse representations and
leads to improved classification accuracy. The proposed method was evaluated on
several different datasets, representing both single-label and multi-label
classification problems, and demonstrated better performance compared with
other dictionary-based approaches.
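The abstract does not state the objective, so the following is only a minimal sketch of the standard graph Laplacian smoothness penalty that graph-regularized methods commonly use, not the authors' formulation. The function name `laplacian_penalty` and the toy codes and weights are hypothetical. The penalty is small when samples joined by a heavy graph edge have similar sparse codes.

```python
def laplacian_penalty(codes, weights):
    """Return 0.5 * sum_ij W[i][j] * ||codes[i] - codes[j]||^2.

    codes   : one sparse-code vector per sample (hypothetical toy data)
    weights : symmetric adjacency weights of the sample graph (assumed given)
    """
    n = len(codes)
    total = 0.0
    for i in range(n):
        for j in range(n):
            w = weights[i][j]
            if w:
                # squared Euclidean distance between the two code vectors
                total += w * sum((a - b) ** 2 for a, b in zip(codes[i], codes[j]))
    return 0.5 * total


# Samples 0 and 1 are neighbours in the graph; sample 2 is isolated.
W = [[0, 1, 0],
     [1, 0, 0],
     [0, 0, 0]]
codes_smooth = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # neighbours share a code
codes_rough = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]   # neighbours differ

print(laplacian_penalty(codes_smooth, W))  # 0.0
print(laplacian_penalty(codes_rough, W))   # 2.0
```

The same function applied to a feature graph and the transposed data would give the second, feature-domain penalty described above, again as an assumption about the general technique.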
A Multimodal Graph Neural Network Framework of Cancer Molecular Subtype Classification
The recent development of high-throughput sequencing creates a large
collection of multi-omics data, which enables researchers to better investigate
cancer molecular profiles and cancer taxonomy based on molecular subtypes.
Integrating multi-omics data has been proven to be effective for building more
precise classification models. Current multi-omics integrative models mainly
use early fusion by concatenation or late fusion based on deep neural networks.
Due to the nature of biological systems, graphs are a better representation of
bio-medical data. Although a few graph neural network (GNN) based multi-omics
integrative methods have been proposed, they suffer from three common
disadvantages. First, most of them use only one type of connection, either
inter-omics or intra-omic; second, they consider only one kind of
GNN layer, either graph convolution network (GCN) or graph attention network
(GAT); and third, most of these methods lack testing on a more complex cancer
classification task. We propose a novel end-to-end multi-omics GNN framework
for accurate and robust cancer subtype classification. The proposed model
utilizes multi-omics data in the form of heterogeneous multi-layer graphs that
combine both inter-omics and intra-omic connections from established
biological knowledge. The proposed model incorporates learned graph features
and global genome features for accurate classification. We test the proposed
model on TCGA Pan-cancer dataset and TCGA breast cancer dataset for molecular
subtype and cancer subtype classification, respectively. The proposed model
outperforms four current state-of-the-art baseline models in multiple
evaluation metrics. The comparative analysis of GAT-based models and GCN-based
models reveals that GAT-based models are preferred for smaller graphs with less
information and GCN-based models are preferred for larger graphs with extra
information.
Comment: 18 pages, 4 figures
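For readers unfamiliar with the GCN layer mentioned above, here is a minimal sketch of its neighbourhood aggregation step with symmetric normalization and self-loops. It omits the learned weight matrix and nonlinearity, and it is not the framework proposed in the paper. The name `gcn_propagate` and the two-node graph are hypothetical. A GAT layer would replace the fixed normalization weights with learned attention coefficients.

```python
import math


def gcn_propagate(adj, feats):
    """Aggregate features with D^-1/2 (A + I) D^-1/2, as in a GCN layer.

    adj   : dense adjacency matrix (list of lists), assumed symmetric
    feats : one feature vector per node
    """
    n = len(adj)
    # add self-loops so each node keeps part of its own signal
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    out = []
    for i in range(n):
        row = [0.0] * len(feats[0])
        for j in range(n):
            if a_hat[i][j]:
                w = a_hat[i][j] / math.sqrt(deg[i] * deg[j])
                for k, v in enumerate(feats[j]):
                    row[k] += w * v
        out.append(row)
    return out


# Two connected nodes: each output is the average of both inputs.
adj = [[0, 1],
       [1, 0]]
feats = [[1.0], [3.0]]
print(gcn_propagate(adj, feats))
```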
Malware Classification based on Call Graph Clustering
Each day, anti-virus companies receive tens of thousands of samples of
potentially harmful executables. Many of the malicious samples are variations
of previously encountered malware, created by their authors to evade
pattern-based detection. Dealing with these large amounts of data requires
robust, automatic detection approaches. This paper studies malware
classification based on call graph clustering. By representing malware samples
as call graphs, it is possible to abstract certain variations away, and enable
the detection of structural similarities between samples. The ability to
cluster similar samples together will make more generic detection techniques
possible, thereby targeting the commonalities of the samples within a cluster.
To compare call graphs mutually, we compute pairwise graph similarity scores
via graph matchings which approximately minimize the graph edit distance. Next,
to facilitate the discovery of similar malware samples, we employ several
clustering algorithms, including k-medoids and DBSCAN. Clustering experiments
are conducted on a collection of real malware samples, and the results are
evaluated against manual classifications provided by human malware analysts.
Experiments show that it is indeed possible to accurately detect malware
families via call graph clustering. We anticipate that in the future, call
graphs can be used to analyse the emergence of new malware families, and
ultimately to automate implementation of generic detection schemes.
Comment: This research has been supported by TEKES - the Finnish Funding
Agency for Technology and Innovation as part of its ICT SHOK Future Internet
research programme, grant 40212/0
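The clustering step described above can be sketched as k-medoids on a precomputed pairwise distance matrix. This is a minimal alternating variant, assuming the distances (standing in for approximate graph edit distances) are already available. The function name, the initial medoids, and the toy distances are hypothetical, and this is not the authors' implementation.

```python
def k_medoids(dist, medoids, max_iter=100):
    """Cluster items given a full pairwise distance matrix.

    dist    : symmetric matrix, dist[i][j] = distance between samples i and j
    medoids : initial medoid indices (an arbitrary choice here)
    """
    n = len(dist)
    medoids = list(medoids)

    def assign(meds):
        # each sample joins the cluster of its nearest medoid
        return [min(range(len(meds)), key=lambda c: dist[i][meds[c]])
                for i in range(n)]

    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for c, old in enumerate(medoids):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                new_medoids.append(old)  # keep the medoid of an empty cluster
                continue
            # the new medoid minimizes total distance to its cluster members
            new_medoids.append(
                min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(medoids)


# Toy distances for four samples forming two tight pairs.
positions = [0, 1, 10, 11]
dist = [[abs(a - b) for b in positions] for a in positions]
medoids, labels = k_medoids(dist, medoids=[0, 1])
print(medoids, labels)  # [0, 2] [0, 0, 1, 1]
```

Working from a distance matrix rather than feature vectors is what makes medoid-based and density-based methods a natural fit here, since call graphs have no obvious mean.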
Graph Scaling Cut with L1-Norm for Classification of Hyperspectral Images
In this paper, we propose an L1-normalized graph-based dimensionality
reduction method for hyperspectral images, called L1-Scaling Cut (L1-SC).
The underlying idea of this method is to generate the optimal projection matrix
by retaining the original distribution of the data. Though the L2-norm is generally
preferred for computation, it is sensitive to noise and outliers, whereas the
L1-norm is robust to them. Therefore, we obtain the optimal projection matrix
by maximizing the ratio of between-class dispersion to within-class dispersion
using L1-norm. Furthermore, an iterative algorithm is described to solve the
optimization problem. The experimental results of the HSI classification
confirm the effectiveness of the proposed L1-SC method on both noisy and
noiseless data.
Comment: European Signal Processing Conference 201
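To illustrate the ratio of between-class to within-class dispersion under different norms, here is a simplified one-dimensional sketch. It uses class means rather than the pairwise formulation of a scaling cut, so it is an assumption-laden illustration and not the L1-SC objective. The function name and data are hypothetical.

```python
def dispersion_ratio(values, labels, p):
    """Between-class over within-class dispersion for 1-D projected data.

    p = 1 uses absolute deviations (L1), p = 2 uses squared deviations (L2).
    """
    overall = sum(values) / len(values)
    between = 0.0
    within = 0.0
    for c in set(labels):
        members = [v for v, l in zip(values, labels) if l == c]
        mean_c = sum(members) / len(members)
        between += len(members) * abs(mean_c - overall) ** p
        within += sum(abs(v - mean_c) ** p for v in members)
    return between / within


clean = [1.0, 3.0, 7.0, 9.0]
clean_labels = [0, 0, 1, 1]
print(dispersion_ratio(clean, clean_labels, p=1))  # 3.0
print(dispersion_ratio(clean, clean_labels, p=2))  # 9.0

# One outlier added to class 1: squaring makes large deviations count for
# more, so the p=2 ratio is expected to react more strongly than p=1.
noisy = [1.0, 3.0, 7.0, 9.0, 26.0]
noisy_labels = [0, 0, 1, 1, 1]
print(dispersion_ratio(noisy, noisy_labels, p=1))
print(dispersion_ratio(noisy, noisy_labels, p=2))
```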
Robust Mid-Pass Filtering Graph Convolutional Networks
Graph convolutional networks (GCNs) are currently the most promising paradigm
for dealing with graph-structure data, while recent studies have also shown
that GCNs are vulnerable to adversarial attacks. Thus, developing GCN models
that are robust to such attacks has become a hot research topic. However,
defense GCN methods based on structural purification learning or robustness
constraints are usually designed for specific data or attacks, and introduce
an additional objective that is not for classification. Extra training overhead is
also required in their design. To address these challenges, we conduct in-depth
explorations on mid-frequency signals on graphs and propose a simple yet
effective Mid-pass filter GCN (Mid-GCN). Theoretical analyses guarantee the
robustness of signals through the mid-pass filter, and we also shed light on
the properties of different frequency signals under adversarial attacks.
Extensive experiments on six benchmark graph datasets further verify the
effectiveness of our designed Mid-GCN in node classification accuracy compared
to state-of-the-art GCNs under various adversarial attack strategies.
Comment: Accepted by WWW'2
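The abstract does not give the filter itself, so the following only sketches what "mid-pass" means for graph signals, using a generic polynomial filter chosen for illustration. It assumes the response h(λ) = λ(2 − λ) on the eigenvalues of the normalized Laplacian L, which lie in [0, 2], so the filter is L(2I − L). This is not claimed to be the Mid-GCN filter. All names are hypothetical, and the graph is assumed to have no isolated nodes.

```python
import math


def normalized_laplacian(adj):
    """L = I - D^-1/2 A D^-1/2 for a dense symmetric adjacency matrix."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    return [[(1.0 if i == j else 0.0)
             - (adj[i][j] / math.sqrt(deg[i] * deg[j]) if adj[i][j] else 0.0)
             for j in range(n)] for i in range(n)]


def matvec(mat, vec):
    return [sum(m * v for m, v in zip(row, vec)) for row in mat]


def mid_pass(adj, signal):
    """Apply L(2I - L): suppresses frequencies near 0 and 2, keeps those near 1."""
    lap = normalized_laplacian(adj)
    lx = matvec(lap, signal)
    llx = matvec(lap, lx)
    return [2.0 * a - b for a, b in zip(lx, llx)]


# Path graph 0 - 1 - 2; its normalized Laplacian has eigenvalues 0, 1, 2.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
low = [1.0, math.sqrt(2.0), 1.0]  # eigenvector for eigenvalue 0 (low frequency)
mid = [1.0, 0.0, -1.0]            # eigenvector for eigenvalue 1 (mid frequency)

print(mid_pass(adj, low))  # approximately all zeros
print(mid_pass(adj, mid))  # approximately [1, 0, -1]
```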