
1,402 research outputs found

Density based pruning for identification of differentially expressed genes from microarray data

Author: Hu Jianjun
Xu Jia
Publication venue: BioMed Central
Publication date: 01/01/2010

Abstract. Motivation: Identification of differentially expressed genes from microarray datasets is one of the most important analyses in microarray data mining. Popular algorithms such as the statistical t-test rank genes based on a single statistic. The false positive rate of these methods can be reduced by considering other features of differentially expressed genes. Results: We propose a pattern recognition strategy for identifying differentially expressed genes. Genes are mapped to a two-dimensional feature space composed of the average difference of gene expression and the average expression level. A density based pruning algorithm (DB pruning) is developed to screen for potential differentially expressed genes, which are usually located in the sparse boundary region. Biases of popular algorithms for identifying differentially expressed genes are visually characterized. Experiments on 17 datasets from the Gene Expression Omnibus (GEO) database with experimentally verified differentially expressed genes showed that DB pruning can significantly improve the prediction accuracy of popular identification algorithms such as t-test, rank product, and fold change. Conclusions: Density based pruning of non-differentially expressed genes is an effective method for enhancing statistical testing based algorithms for identifying differentially expressed genes. It increases the number of true differentially expressed genes identified by t-test, rank product, and fold change by 11% to 50%. The source code of DB pruning is freely available at http://mleg.cse.sc.edu/degprune
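The idea of mapping genes to a two-dimensional feature space and keeping those in the sparse region can be illustrated with a minimal sketch. This is not the authors' DB pruning implementation: it assumes a simple k-nearest-neighbour distance as the density measure, and the function names, the parameter `k`, and the `keep_fraction` cutoff are all hypothetical.

```python
import math
from statistics import mean

def gene_features(control, treated):
    """Map one gene to (average expression level, average difference)."""
    avg_level = mean(control + treated)
    avg_diff = mean(treated) - mean(control)
    return (avg_level, avg_diff)

def knn_distance(point, points, k):
    """Distance from `point` to its k-th nearest neighbour.

    `points` is assumed to contain `point` itself, so index 0 of the
    sorted distances is 0.0 and index k is the k-th other point.
    """
    dists = sorted(math.dist(point, other) for other in points)
    return dists[k]

def sparse_region_genes(features, k=2, keep_fraction=0.2):
    """Return indices of the genes lying in the sparsest part of the space.

    A large k-NN distance means low local density (assumed density proxy).
    """
    scores = [knn_distance(p, features, k) for p in features]
    n_keep = max(1, round(keep_fraction * len(features)))
    ranked = sorted(range(len(features)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_keep])

# Hypothetical feature points: five genes in a dense cluster, one far away.
features = [(5.0, 0.0), (5.1, 0.1), (4.9, -0.1), (5.2, 0.0), (5.0, 0.1), (6.0, 3.0)]
candidates = sparse_region_genes(features, k=2, keep_fraction=0.2)
```

In a pipeline of the kind the abstract describes, the retained candidates would then be ranked by a test such as t-test, rank product, or fold change.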

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Scholar Commons - Institutional Repository of the University of South Carolina

A State-Space Modeling Approach and Subspace Identification Method for Predictive Control of Multi-Zone Buildings with Mixed-Mode Cooling

Author: Hu Jianjun
Karava Panagiota
Publication venue: Purdue University (bepress)
Publication date: 01/01/2014

The paper presents a control-oriented modeling approach for multi-zone buildings with mixed-mode (MM) cooling that incorporates their mode-switching behavior. A forward state-space representation with time-varying system matrices is presented and used to establish a detailed prediction model of a multi-zone MM building. The linear time-variant state-space (LTV-SS) model, which is treated as a true representation of the building, is used to develop data-driven linear time-invariant state-space models based on the subspace identification algorithm. The simplified black-box model successfully captures the switching behavior of the MM building, with an RMSE of 0.64 °C.
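As a rough illustration of the quantities named in this abstract, the sketch below simulates a discrete linear time-invariant state-space model, x[k+1] = A x[k] + B u[k], y[k] = C x[k], and computes an RMSE between two output sequences. The matrices, inputs, and function names are hypothetical and are not taken from the paper; the subspace identification step itself is not shown.

```python
import math

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def simulate(A, B, C, x0, inputs):
    """Simulate x[k+1] = A x[k] + B u[k], y[k] = C x[k] over `inputs`."""
    x = list(x0)
    outputs = []
    for u in inputs:
        outputs.append(matvec(C, x))
        x = [ax + bu for ax, bu in zip(matvec(A, x), matvec(B, u))]
    return outputs

def rmse(predicted, actual):
    """Root-mean-square error between two equal-length sequences."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical one-state, one-input model driven by a constant input.
outputs = simulate(A=[[0.9]], B=[[0.1]], C=[[1.0]], x0=[0.0], inputs=[[1.0]] * 3)
```

With a reference trajectory from a detailed model, `rmse` would give the kind of single-number fit measure the abstract reports.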

Purdue E-Pubs

SeqNLS: Nuclear Localization Signal Prediction Based on Frequent Pattern Mining and Linear Motif Scoring

Author: Hu Jianjun
Lin J.-R.
Publication venue: Scholar Commons
Publication date: 01/01/2013

Nuclear localization signals (NLSs) are stretches of residues in proteins that mediate their import into the nucleus. NLSs are known to have diverse patterns, of which only a limited number are covered by currently known NLS motifs. Here we propose a sequential pattern mining algorithm, SeqNLS, to effectively identify potential NLS patterns without being constrained by the limits of current knowledge of NLSs. The extracted frequent sequential patterns are used to predict NLS candidates, which are then filtered by a linear motif-scoring scheme based on predicted sequence disorder and by the relatively local conservation (IRLC) based masking. Experimental results on the newly curated Yeast and Hybrid datasets show that SeqNLS is effective in detecting potential NLSs. A comparison of SeqNLS with and without linear motif scoring shows that linear motif features are highly complementary to sequence features in discerning NLSs. For the two independent datasets, SeqNLS not only consistently finds over 50% of NLSs with a prediction precision of at least 0.7, but also outperforms other state-of-the-art NLS prediction methods in terms of F1 score or prediction precision at similar or higher recall rates. The web server of the SeqNLS algorithm is available at http://mleg.cse.sc.edu/seqNLS
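To make the reported metrics concrete, the following sketch computes precision, recall, and F1 from prediction counts. The counts are hypothetical, chosen only so that precision is 0.7 and recall is 0.5, the boundary values mentioned in the abstract; they are not results from the paper.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Return (precision, recall) from raw prediction counts."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return precision, recall

def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts: 7 correct predictions, 3 false alarms, 7 missed NLSs.
precision, recall = precision_recall(true_pos=7, false_pos=3, false_neg=7)
f1 = f1_score(precision, recall)  # roughly 0.58 for precision 0.7, recall 0.5
```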

Scholar Commons - Institutional Repository of the University of South Carolina

ASPIE: A Framework for Active Sensing and Processing of Complex Events in the Internet of Manufacturing Things

Author: Chen Weixing
Hu Jianjun
Hu Jie
Li Shaobo
Publication venue: Scholar Commons
Publication date: 01/03/2018

Rapid perception and processing of critical monitoring events are essential to ensure the healthy operation of Internet of Manufacturing Things (IoMT)-based manufacturing processes. In this paper, we propose a framework, the active sensing and processing architecture (ASPIE), for active sensing and processing of critical events in IoMT-based manufacturing, based on the characteristics of the IoMT architecture and its perception model. A relation model of complex events in manufacturing processes, together with related operators and unified XML-based semantic definitions, is developed to effectively process complex event big data. A template-based processing method for complex events is further introduced to perform complex event matching using the Apriori frequent itemset mining algorithm. To evaluate the proposed models and methods, we developed a software platform based on ASPIE for a local chili sauce manufacturing company, which demonstrated the feasibility and effectiveness of the proposed methods for active perception and processing of complex events in IoMT-based manufacturing.
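The Apriori algorithm named in this abstract can be sketched in a few lines. This is a generic, minimal version of level-wise frequent itemset mining, not the ASPIE template-matching method; the event names and the `min_support` value are hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for all itemsets with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item of `itemset`.
        return sum(itemset <= t for t in transactions) / n

    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into size-k candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        current = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
            and support(c) >= min_support
        }
    return frequent

# Hypothetical monitoring events observed together in four time windows.
windows = [
    {"temp_high", "vibration"},
    {"temp_high", "vibration", "pressure_low"},
    {"temp_high"},
    {"vibration", "pressure_low"},
]
frequent_events = apriori(windows, min_support=0.5)
```

The prune step is what distinguishes Apriori from brute-force counting: a candidate is discarded without scanning the data if any of its subsets is already known to be infrequent.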

Multidisciplinary Digital Publishing Institute

Directory of Open Access Journals

Scholar Commons - Institutional Repository of the University of South Carolina

A Hierarchical Feature Extraction Model for Multi-Label Mechanical Patent Classification

Author: Hu Jianjun
Hu Jie
Li Shaobo
Yang Guanci
Publication venue: Scholar Commons
Publication date: 01/01/2018

Various studies have focused on feature extraction methods for automatic patent classification in recent years. However, most of these approaches rely on knowledge from experts in related domains. Here we propose a hierarchical feature extraction model (HFEM) for multi-label mechanical patent classification, which is able to capture local features of phrases as well as global and temporal semantics. First, an n-gram feature extractor based on convolutional neural networks (CNNs) is designed to extract salient local lexical-level features. Next, a long dependency feature extraction model based on the bidirectional long–short-term memory (BiLSTM) neural network is proposed to capture sequential correlations from higher-level sequence representations. Then the HFEM algorithm and its hierarchical feature extraction architecture are detailed. We establish training, validation, and test datasets containing 72,532, 18,133, and 2,679 mechanical patent documents, respectively, and then evaluate the performance of HFEM. Finally, we compare the results of the proposed HFEM with those of three single neural network models, namely CNN, long–short-term memory (LSTM), and BiLSTM. The experimental results indicate that the proposed HFEM outperforms the compared models in both precision and recall.

Multidisciplinary Digital Publishing Institute

Directory of Open Access Journals

Scholar Commons - Institutional Repository of the University of South Carolina

Improving Rolling Bearing Fault Diagnosis by DS Evidence Theory Based Fusion Model

Author: Hu Jianjun
Li Shaobo
Yao Xuemei
Publication venue: Scholar Commons
Publication date: 01/01/2017

Rolling bearings play an important role in rotating machinery, and their working condition directly affects equipment efficiency. While dozens of methods have been proposed for real-time bearing fault diagnosis and monitoring, the fault classification accuracy of existing algorithms is still not satisfactory. This work presents a novel algorithm fusion model based on principal component analysis and Dempster-Shafer evidence theory for rolling bearing fault diagnosis. It combines the advantages of the learning vector quantization (LVQ) neural network model and the decision tree model. Experiments under three different bearing rotation speeds and two different crack sizes show that the fusion model, through synergistic prediction from both types of models, achieves better performance and higher accuracy than either base classification model.
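The fusion step rests on Dempster's rule of combination, which can be sketched as follows. This is a generic implementation of the rule, not the authors' fusion model: the fault labels and the mass values assigned to the two classifiers are hypothetical, and how the paper derives masses from the LVQ and decision tree outputs is not shown here.

```python
def combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of hypotheses to its belief mass.
    Mass falling on empty intersections is treated as conflict and the
    remaining mass is renormalized.
    """
    combined = {}
    conflict = 0.0
    for a, mass_a in m1.items():
        for b, mass_b in m2.items():
            common = a & b
            if common:
                combined[common] = combined.get(common, 0.0) + mass_a * mass_b
            else:
                conflict += mass_a * mass_b
    if 1.0 - conflict <= 1e-12:
        raise ValueError("sources are in total conflict; rule is undefined")
    return {h: mass / (1.0 - conflict) for h, mass in combined.items()}

# Hypothetical frame of discernment and classifier outputs.
INNER = frozenset({"inner_race"})
OUTER = frozenset({"outer_race"})
ANY = frozenset({"normal", "inner_race", "outer_race"})  # "don't know"

lvq_mass = {INNER: 0.6, OUTER: 0.3, ANY: 0.1}
tree_mass = {INNER: 0.7, OUTER: 0.2, ANY: 0.1}

fused = combine(lvq_mass, tree_mass)
decision = max(fused, key=fused.get)  # hypothesis set with the largest mass
```

When both sources lean toward the same hypothesis, the fused mass on it exceeds either input mass, which is one way agreement between two classifiers can raise confidence in a fused decision.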

Directory of Open Access Journals

Scholar Commons - Institutional Repository of the University of South Carolina