260 research outputs found
Image classification by visual bag-of-words refinement and reduction
This paper presents a new framework for visual bag-of-words (BOW) refinement
and reduction to overcome the drawbacks associated with the visual BOW model
which has been widely used for image classification. Although very influential
in the literature, the traditional visual BOW model has two distinct drawbacks.
Firstly, for efficiency purposes, the visual vocabulary is commonly constructed
by directly clustering the low-level visual feature vectors extracted from
local keypoints, without considering the high-level semantics of images. That
is, the visual BOW model still suffers from the semantic gap, and thus may lead
to significant performance degradation in more challenging tasks (e.g. social
image classification). Secondly, typically thousands of visual words are
generated to obtain better performance on a relatively large image dataset. Due
to such large vocabulary size, the subsequent image classification may take
sheer amount of time. To overcome the first drawback, we develop a graph-based
method for visual BOW refinement by exploiting the tags (easy to access
although noisy) of social images. More notably, for efficient image
classification, we further reduce the refined visual BOW model to a much
smaller size through semantic spectral clustering. Extensive experimental
results show the promising performance of the proposed framework for visual BOW
refinement and reduction
Global Geometric Affinity for Revealing High Fidelity Protein Interaction Network
Protein-protein interaction (PPI) network analysis presents an essential role in understanding the functional relationship among proteins in a living biological system. Despite the success of current approaches for understanding the PPI network, the large fraction of missing and spurious PPIs and a low coverage of complete PPI network are the sources of major concern. In this paper, based on the diffusion process, we propose a new concept of global geometric affinity and an accompanying computational scheme to filter the uncertain PPIs, namely, reduce the spurious PPIs and recover the missing PPIs in the network. The main concept defines a diffusion process in which all proteins simultaneously participate to define a similarity metric (global geometric affinity (GGA)) to robustly reflect the internal connectivity among proteins. The robustness of the GGA is attributed to propagating the local connectivity to a global representation of similarity among proteins in a diffusion process. The propagation process is extremely fast as only simple matrix products are required in this computation process and thus our method is geared toward applications in high-throughput PPI networks. Furthermore, we proposed two new approaches that determine the optimal geometric scale of the PPI network and the optimal threshold for assigning the PPI from the GGA matrix. Our approach is tested with three protein-protein interaction networks and performs well with significant random noises of deletions and insertions in true PPIs. Our approach has the potential to benefit biological experiments, to better characterize network data sets, and to drive new discoveries
Novel modelling of clustering for enhanced classification performance on gene expression data
Gene expression data is popularized for its capability to disclose various disease conditions. However, the conventional procedure to extract gene expression data itself incorporates various artifacts that offer challenges in diagnosis a complex disease indication and classification like cancer. Review of existing research approaches indicates that classification approaches are few to proven to be standard with respect to higher accuracy and applicable to gene expression data apart from unaddresed problems of computational complexity. Therefore, the proposed manuscript introduces a novel and simplified model capable using Graph Fourier Transform, Eigen Value and vector for offering better classification performance considering case study of microarray database, which is one typical example of gene expression data. The study outcome shows that proposed system offers comparatively better accuracy and reduced computational complexity with the existing clustering approaches
Construction of embedded fMRI resting state functional connectivity networks using manifold learning
We construct embedded functional connectivity networks (FCN) from benchmark
resting-state functional magnetic resonance imaging (rsfMRI) data acquired from
patients with schizophrenia and healthy controls based on linear and nonlinear
manifold learning algorithms, namely, Multidimensional Scaling (MDS), Isometric
Feature Mapping (ISOMAP) and Diffusion Maps. Furthermore, based on key global
graph-theoretical properties of the embedded FCN, we compare their
classification potential using machine learning techniques. We also assess the
performance of two metrics that are widely used for the construction of FCN
from fMRI, namely the Euclidean distance and the lagged cross-correlation
metric. We show that the FCN constructed with Diffusion Maps and the lagged
cross-correlation metric outperform the other combinations
EPAK: A Computational Intelligence Model for 2-level Prediction of Stock Indices
This paper proposes a new computational intelligence model for predicting univariate time series, called EPAK, and a complex prediction model for stock market index synthesizing all the sector index predictions using EPAK as a kernel. The EPAK model uses a complex nonlinear feature extraction procedure integrating a forward rolling Empirical Mode Decomposition (EMD) for financial time series signal analysis and Principal Component Analysis (PCA) for dimension reduction to generate information-rich features as input to a new two-layer K-Nearest Neighbor (KNN) with Affinity Propagation (AP) clustering for prediction via regression. The EPAK model is then used as a kernel for predicting each of all the sector indices of the stock market. The sector indices predictions are then synthesized via weighted average to generate the prediction of the stock market index, yielding a complex prediction model for the stock market index. The EPAK model and the complex prediction model for stock index are tested on real historical financial time series in Chinese stock index including CSI 300 and ten sector indices, with results confirming the effectiveness of the proposed models
Multi-Source Data Fusion for Cyberattack Detection in Power Systems
Cyberattacks can cause a severe impact on power systems unless detected
early. However, accurate and timely detection in critical infrastructure
systems presents challenges, e.g., due to zero-day vulnerability exploitations
and the cyber-physical nature of the system coupled with the need for high
reliability and resilience of the physical system. Conventional rule-based and
anomaly-based intrusion detection system (IDS) tools are insufficient for
detecting zero-day cyber intrusions in the industrial control system (ICS)
networks. Hence, in this work, we show that fusing information from multiple
data sources can help identify cyber-induced incidents and reduce false
positives. Specifically, we present how to recognize and address the barriers
that can prevent the accurate use of multiple data sources for fusion-based
detection. We perform multi-source data fusion for training IDS in a
cyber-physical power system testbed where we collect cyber and physical side
data from multiple sensors emulating real-world data sources that would be
found in a utility and synthesizes these into features for algorithms to detect
intrusions. Results are presented using the proposed data fusion application to
infer False Data and Command injection-based Man-in- The-Middle (MiTM) attacks.
Post collection, the data fusion application uses time-synchronized merge and
extracts features followed by pre-processing such as imputation and encoding
before training supervised, semi-supervised, and unsupervised learning models
to evaluate the performance of the IDS. A major finding is the improvement of
detection accuracy by fusion of features from cyber, security, and physical
domains. Additionally, we observed the co-training technique performs at par
with supervised learning methods when fed with our features
- โฆ