A robust model structure selection method for small sample size and multiple datasets problems
In model identification, uncertainty normally has a negative impact on the accuracy and performance of the identified models, especially when the available data set is small. With a small data set, least squares estimates are biased, and the resulting models may not be reliable for further analysis and future use. This study introduces a novel robust model structure selection method for model identification. The proposed method can successfully reduce model structure uncertainty and thereby improve model performance. Case studies on simulated data and real data are presented to illustrate how the proposed metric works for robust model identification.
Superpixel-based Two-view Deterministic Fitting for Multiple-structure Data
This paper proposes a two-view deterministic geometric model fitting method,
termed Superpixel-based Deterministic Fitting (SDF), for multiple-structure
data. SDF starts from superpixel segmentation, which effectively captures prior
information of feature appearances. The feature appearances are beneficial to
reduce the computational complexity for deterministic fitting methods. SDF also
includes two original elements, i.e., a deterministic sampling algorithm and a
novel model selection algorithm. The two algorithms are tightly coupled to
boost the performance of SDF in both speed and accuracy. Specifically, the
proposed sampling algorithm leverages the grouping cues of superpixels to
generate reliable and consistent hypotheses. The proposed model selection
algorithm further makes use of desirable properties of the generated
hypotheses, to improve the conventional fit-and-remove framework for more
efficient and effective performance. The key characteristic of SDF is that it
can efficiently and deterministically estimate the parameters of model
instances in multi-structure data. Experimental results demonstrate that the
proposed SDF shows superiority over several state-of-the-art fitting methods
for real images with single-structure and multiple-structure data.
Comment: Accepted by the European Conference on Computer Vision (ECCV)
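The conventional fit-and-remove framework that SDF improves upon can be illustrated with a minimal sketch. This is not the authors' SDF (which adds superpixel-guided deterministic sampling and a new model selection algorithm); it is the generic sequential baseline, shown here as hypothetical 2-D line fitting with plain RANSAC:

```python
import numpy as np

def fit_line_ransac(points, iters=200, thresh=0.05, rng=None):
    """Fit one 2-D line (ax + by + c = 0, a^2 + b^2 = 1) by RANSAC.

    Returns the line parameters and a boolean inlier mask."""
    rng = rng or np.random.default_rng(0)
    best_inliers = np.zeros(len(points), dtype=bool)
    best_line = None
    for _ in range(iters):
        p1, p2 = points[rng.choice(len(points), 2, replace=False)]
        d = p2 - p1
        norm = np.hypot(d[0], d[1])
        if norm == 0:
            continue
        # unit normal of the line through p1 and p2
        n = np.array([-d[1], d[0]]) / norm
        c = -n @ p1
        dist = np.abs(points @ n + c)   # point-to-line distances
        inliers = dist < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers, best_line = inliers, (n[0], n[1], c)
    return best_line, best_inliers

def fit_and_remove(points, n_structures=2, **kw):
    """Conventional fit-and-remove: estimate one structure, delete its
    inliers from the data, and repeat for the next structure."""
    models = []
    remaining = points.copy()
    for _ in range(n_structures):
        line, inliers = fit_line_ransac(remaining, **kw)
        models.append(line)
        remaining = remaining[~inliers]  # remove explained points
    return models
```

The weakness of this baseline, which motivates SDF's coupled sampling and selection, is that an error in an early fit removes the wrong points and corrupts every later fit.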
Improved model identification for non-linear systems using a random subsampling and multifold modelling (RSMM) approach
In non-linear system identification, the available observed data are conventionally partitioned into two parts: the training data, used for model identification, and the test data, used for model performance testing. This sort of 'hold-out' or 'split-sample' data partitioning is convenient, and the associated model identification procedure is in general easy to implement. The resultant model obtained from such a once-partitioned single training dataset, however, may lack robustness and generalisation to future unseen data, because the performance of the identified model may depend heavily on how the data partition is made. To overcome this drawback of the hold-out partitioning method, this study presents a new random subsampling and multifold modelling (RSMM) approach to produce less biased or, preferably, unbiased models. The basic idea and the associated procedure are as follows. First, generate K training datasets (and K validation datasets) using a K-fold random subsampling method. Second, detect significant model terms and identify a common model structure that fits all K datasets, using a newly proposed common model selection approach called the multiple orthogonal search algorithm. Finally, estimate and refine the parameters of the identified common-structured model using a multifold parameter estimation method. The proposed method can produce robust models with better generalisation performance.
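The first two steps of the procedure can be sketched minimally. The subsampling below matches the description in the abstract; the term-selection function is only a crude stand-in for the multiple orthogonal search algorithm (here, terms are ranked by marginal correlation on each fold and the common structure is taken as the intersection across folds):

```python
import numpy as np

def random_subsample(n, K, train_frac=0.7, seed=0):
    """Step 1 of RSMM: K independent random train/validation splits."""
    rng = np.random.default_rng(seed)
    splits = []
    for _ in range(K):
        perm = rng.permutation(n)
        cut = int(train_frac * n)
        splits.append((perm[:cut], perm[cut:]))
    return splits

def common_terms(X, y, splits, top_m=2):
    """Simplified stand-in for common-structure selection: rank candidate
    terms by |correlation with y| on each training set and keep only the
    terms ranked in the top_m on every fold."""
    chosen = None
    for train, _ in splits:
        Xc = X[train] - X[train].mean(0)
        yc = y[train] - y[train].mean()
        corr = np.abs(Xc.T @ yc) / (
            np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc) + 1e-12)
        top = set(np.argsort(corr)[-top_m:])
        chosen = top if chosen is None else chosen & top
    return sorted(chosen)
```

The final RSMM step, not shown, would then estimate the parameters of this common structure on each fold and combine the estimates.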
Robust variable screening for regression using factor profiling
Sure Independence Screening is a fast procedure for variable selection in
ultra-high dimensional regression analysis. Unfortunately, its performance
greatly deteriorates with increasing dependence among the predictors. To solve
this issue, Factor Profiled Sure Independence Screening (FPSIS) models the
correlation structure of the predictor variables, assuming that it can be
represented by a few latent factors. The correlations can then be profiled out
by projecting the data onto the orthogonal complement of the subspace spanned
by these factors. However, neither of these methods can handle the presence of
outliers in the data. Therefore, we propose a robust screening method which
uses a least trimmed squares method to estimate the latent factors and the
factor profiled variables. Variable screening is then performed on factor
profiled variables by using regression MM-estimators. Different types of
outliers in this model and their roles in variable screening are studied. Both
simulation studies and a real data analysis show that the proposed robust
procedure has good performance on clean data and outperforms the two nonrobust
methods on contaminated data.
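The factor-profiling-then-screening pipeline can be sketched in its classical (non-robust) form. This replaces the paper's least trimmed squares factor estimation with an ordinary SVD and its MM-estimator screening with marginal correlations, so it illustrates the structure of FPSIS rather than the robust procedure itself:

```python
import numpy as np

def factor_profile(X, n_factors=1):
    """Profile out latent factors: project the centered data onto the
    orthogonal complement of the span of the leading left singular
    vectors (classical analogue of the robust LTS factor step)."""
    Xc = X - X.mean(0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    F = U[:, :n_factors]                 # estimated factor scores
    return Xc - F @ (F.T @ Xc)           # residuals after removing factors

def screen(X, y, keep):
    """Sure-independence-style screening: rank predictors by the absolute
    marginal correlation with the response and keep the top ones."""
    Xc = X - X.mean(0)
    yc = y - y.mean()
    corr = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return np.argsort(corr)[::-1][:keep]
```

With correlated predictors, screening the raw columns can be misled by the shared factor; screening the profiled columns recovers the truly relevant variables.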
Inverse Projection Representation and Category Contribution Rate for Robust Tumor Recognition
Sparse representation based classification (SRC) methods have achieved
remarkable results. SRC, however, still suffers from the need for sufficient
training samples, insufficient use of test samples, and instability of representation. In
this paper, a stable inverse projection representation based classification
(IPRC) is presented to tackle these problems by effectively using test samples.
An IPR is first proposed, and its feasibility and stability are analyzed. A
classification criterion named category contribution rate is constructed to
match the IPR and complete classification. Moreover, a statistical measure is
introduced to quantify the stability of representation-based classification
methods. Based on the IPRC technique, a robust tumor recognition framework is
presented by interpreting microarray gene expression data, where a two-stage
hybrid gene selection method is introduced to select informative genes.
Finally, a functional analysis of candidate pathogenicity-related genes is
given. Extensive experiments on six public tumor microarray gene expression
datasets demonstrate the proposed technique is competitive with
state-of-the-art methods.
Comment: 14 pages, 19 figures, 10 tables
Block-diagonal covariance selection for high-dimensional Gaussian graphical models
Gaussian graphical models are widely utilized to infer and visualize networks
of dependencies between continuous variables. However, inferring the graph is
difficult when the sample size is small compared to the number of variables. To
reduce the number of parameters to estimate in the model, we propose a
non-asymptotic model selection procedure supported by strong theoretical
guarantees based on an oracle inequality and a minimax lower bound. The
covariance matrix of the model is approximated by a block-diagonal matrix. The
structure of this matrix is detected by thresholding the sample covariance
matrix, where the threshold is selected using the slope heuristic. Based on the
block-diagonal structure of the covariance matrix, the estimation problem is
divided into several independent problems: subsequently, the network of
dependencies between variables is inferred using the graphical lasso algorithm
in each block. The performance of the procedure is illustrated on simulated
data. An application to a real gene expression dataset with a limited sample
size is also presented: the dimension reduction allows attention to be
objectively focused on interactions among smaller subsets of genes, leading to
a more parsimonious and interpretable modular network.
Comment: Accepted in JAS
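The procedure described above can be sketched with standard tools: threshold the sample correlation matrix, take connected components as blocks, and run the graphical lasso independently in each block. A fixed threshold stands in for the slope-heuristic selection used in the paper, and sklearn's GraphicalLasso stands in for whatever solver the authors used:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components
from sklearn.covariance import GraphicalLasso

def block_glasso(X, corr_thresh=0.3, alpha=0.05):
    """Block-diagonal strategy: detect blocks by thresholding the sample
    correlation matrix, then estimate a sparse precision matrix inside
    each block with the graphical lasso."""
    p = X.shape[1]
    C = np.corrcoef(X, rowvar=False)
    adj = (np.abs(C) > corr_thresh).astype(int)
    n_blocks, labels = connected_components(csr_matrix(adj), directed=False)
    precision = np.zeros((p, p))
    for b in range(n_blocks):
        idx = np.flatnonzero(labels == b)
        if len(idx) == 1:                # singleton block: scalar precision
            precision[idx[0], idx[0]] = 1.0 / X[:, idx[0]].var()
            continue
        gl = GraphicalLasso(alpha=alpha).fit(X[:, idx])
        precision[np.ix_(idx, idx)] = gl.precision_
    return precision, labels
```

Because each block is estimated independently, the estimated precision matrix is exactly zero between blocks, which is what yields the modular, interpretable network described in the abstract.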