260 research outputs found

    Image classification by visual bag-of-words refinement and reduction

    Full text link
    This paper presents a new framework for visual bag-of-words (BOW) refinement and reduction to overcome the drawbacks associated with the visual BOW model which has been widely used for image classification. Although very influential in the literature, the traditional visual BOW model has two distinct drawbacks. Firstly, for efficiency purposes, the visual vocabulary is commonly constructed by directly clustering the low-level visual feature vectors extracted from local keypoints, without considering the high-level semantics of images. That is, the visual BOW model still suffers from the semantic gap, and thus may lead to significant performance degradation in more challenging tasks (e.g. social image classification). Secondly, typically thousands of visual words are generated to obtain better performance on a relatively large image dataset. Due to such large vocabulary size, the subsequent image classification may take sheer amount of time. To overcome the first drawback, we develop a graph-based method for visual BOW refinement by exploiting the tags (easy to access although noisy) of social images. More notably, for efficient image classification, we further reduce the refined visual BOW model to a much smaller size through semantic spectral clustering. Extensive experimental results show the promising performance of the proposed framework for visual BOW refinement and reduction

    Global Geometric Affinity for Revealing High Fidelity Protein Interaction Network

    Get PDF
    Protein-protein interaction (PPI) network analysis presents an essential role in understanding the functional relationship among proteins in a living biological system. Despite the success of current approaches for understanding the PPI network, the large fraction of missing and spurious PPIs and a low coverage of complete PPI network are the sources of major concern. In this paper, based on the diffusion process, we propose a new concept of global geometric affinity and an accompanying computational scheme to filter the uncertain PPIs, namely, reduce the spurious PPIs and recover the missing PPIs in the network. The main concept defines a diffusion process in which all proteins simultaneously participate to define a similarity metric (global geometric affinity (GGA)) to robustly reflect the internal connectivity among proteins. The robustness of the GGA is attributed to propagating the local connectivity to a global representation of similarity among proteins in a diffusion process. The propagation process is extremely fast as only simple matrix products are required in this computation process and thus our method is geared toward applications in high-throughput PPI networks. Furthermore, we proposed two new approaches that determine the optimal geometric scale of the PPI network and the optimal threshold for assigning the PPI from the GGA matrix. Our approach is tested with three protein-protein interaction networks and performs well with significant random noises of deletions and insertions in true PPIs. Our approach has the potential to benefit biological experiments, to better characterize network data sets, and to drive new discoveries

    Novel modelling of clustering for enhanced classification performance on gene expression data

    Get PDF
    Gene expression data is popularized for its capability to disclose various disease conditions. However, the conventional procedure to extract gene expression data itself incorporates various artifacts that offer challenges in diagnosis a complex disease indication and classification like cancer. Review of existing research approaches indicates that classification approaches are few to proven to be standard with respect to higher accuracy and applicable to gene expression data apart from unaddresed problems of computational complexity. Therefore, the proposed manuscript introduces a novel and simplified model capable using Graph Fourier Transform, Eigen Value and vector for offering better classification performance considering case study of microarray database, which is one typical example of gene expression data. The study outcome shows that proposed system offers comparatively better accuracy and reduced computational complexity with the existing clustering approaches

    Construction of embedded fMRI resting state functional connectivity networks using manifold learning

    Full text link
    We construct embedded functional connectivity networks (FCN) from benchmark resting-state functional magnetic resonance imaging (rsfMRI) data acquired from patients with schizophrenia and healthy controls based on linear and nonlinear manifold learning algorithms, namely, Multidimensional Scaling (MDS), Isometric Feature Mapping (ISOMAP) and Diffusion Maps. Furthermore, based on key global graph-theoretical properties of the embedded FCN, we compare their classification potential using machine learning techniques. We also assess the performance of two metrics that are widely used for the construction of FCN from fMRI, namely the Euclidean distance and the lagged cross-correlation metric. We show that the FCN constructed with Diffusion Maps and the lagged cross-correlation metric outperform the other combinations

    EPAK: A Computational Intelligence Model for 2-level Prediction of Stock Indices

    Get PDF
    This paper proposes a new computational intelligence model for predicting univariate time series, called EPAK, and a complex prediction model for stock market index synthesizing all the sector index predictions using EPAK as a kernel. The EPAK model uses a complex nonlinear feature extraction procedure integrating a forward rolling Empirical Mode Decomposition (EMD) for financial time series signal analysis and Principal Component Analysis (PCA) for dimension reduction to generate information-rich features as input to a new two-layer K-Nearest Neighbor (KNN) with Affinity Propagation (AP) clustering for prediction via regression. The EPAK model is then used as a kernel for predicting each of all the sector indices of the stock market. The sector indices predictions are then synthesized via weighted average to generate the prediction of the stock market index, yielding a complex prediction model for the stock market index. The EPAK model and the complex prediction model for stock index are tested on real historical financial time series in Chinese stock index including CSI 300 and ten sector indices, with results confirming the effectiveness of the proposed models

    Multi-Source Data Fusion for Cyberattack Detection in Power Systems

    Full text link
    Cyberattacks can cause a severe impact on power systems unless detected early. However, accurate and timely detection in critical infrastructure systems presents challenges, e.g., due to zero-day vulnerability exploitations and the cyber-physical nature of the system coupled with the need for high reliability and resilience of the physical system. Conventional rule-based and anomaly-based intrusion detection system (IDS) tools are insufficient for detecting zero-day cyber intrusions in the industrial control system (ICS) networks. Hence, in this work, we show that fusing information from multiple data sources can help identify cyber-induced incidents and reduce false positives. Specifically, we present how to recognize and address the barriers that can prevent the accurate use of multiple data sources for fusion-based detection. We perform multi-source data fusion for training IDS in a cyber-physical power system testbed where we collect cyber and physical side data from multiple sensors emulating real-world data sources that would be found in a utility and synthesizes these into features for algorithms to detect intrusions. Results are presented using the proposed data fusion application to infer False Data and Command injection-based Man-in- The-Middle (MiTM) attacks. Post collection, the data fusion application uses time-synchronized merge and extracts features followed by pre-processing such as imputation and encoding before training supervised, semi-supervised, and unsupervised learning models to evaluate the performance of the IDS. A major finding is the improvement of detection accuracy by fusion of features from cyber, security, and physical domains. Additionally, we observed the co-training technique performs at par with supervised learning methods when fed with our features
    • โ€ฆ
    corecore