
    Feature Selection and Classification Pairwise Combinations for High-dimensional Tumour Biomedical Datasets

    This paper concerns the classification of high-dimensional, small-sample-size biomedical data and feature selection aimed at reducing the dimensionality of microarray data. The research presents a comparison of pairwise combinations of six classification strategies (decision trees, logistic model trees, Bayes network, Naïve Bayes, k-nearest neighbours, and the sequential minimal optimization (SMO) algorithm for training support vector machines) and seven attribute selection methods: Correlation-based Feature Selection, chi-squared, information gain, gain ratio, symmetrical uncertainty, ReliefF and SVM-RFE (Support Vector Machine-Recursive Feature Elimination). In this paper, the SVM-RFE feature selection technique combined with the SMO classifier demonstrated its potential to classify both binary and multiclass high-dimensional sets of tumour specimens accurately and efficiently.
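The design described in this abstract pairs every attribute selection method with every classifier. A minimal sketch of how that grid could be enumerated is given here; the list names and the `all_pairs` helper are hypothetical and not taken from the paper, and the method labels are simply those listed in the abstract.

```python
from itertools import product

# Method labels as listed in the abstract; variable names are illustrative.
CLASSIFIERS = [
    "decision tree",
    "logistic model tree",
    "Bayes network",
    "naive Bayes",
    "k-nearest neighbours",
    "SMO (SVM)",
]
SELECTORS = [
    "CFS",
    "chi-squared",
    "information gain",
    "gain ratio",
    "symmetrical uncertainty",
    "ReliefF",
    "SVM-RFE",
]


def all_pairs():
    """Return every (selector, classifier) combination to be compared."""
    return list(product(SELECTORS, CLASSIFIERS))


pairs = all_pairs()
print(len(pairs))  # 7 selectors x 6 classifiers = 42
```

Each pair would then be evaluated on the same data splits so that the combinations can be compared on equal terms.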

    Structured variable selection in support vector machines

    When applying the support vector machine (SVM) to high-dimensional classification problems, we often impose a sparse structure in the SVM to eliminate the influence of irrelevant predictors. The lasso and other variable selection techniques have been successfully used in the SVM to perform automatic variable selection. In some problems, there is a natural hierarchical structure among the variables. Thus, in order to have an interpretable SVM classifier, it is important to respect the heredity principle when enforcing sparsity in the SVM. Many variable selection methods, however, do not respect the heredity principle. In this paper we enforce both sparsity and the heredity principle in the SVM by using the so-called structured variable selection (SVS) framework originally proposed in Yuan, Joseph and Zou (2007). We minimize the empirical hinge loss under a set of linear inequality constraints and a lasso-type penalty. The solution always obeys the desired heredity principle and enjoys sparsity. The new SVM classifier can be efficiently fitted, because the optimization problem is a linear program. Another contribution of this work is to present a nonparametric extension of the SVS framework, and we propose nonparametric heredity SVMs. Simulated and real data are used to illustrate the merits of the proposed method. Comment: Published in the Electronic Journal of Statistics (http://www.i-journals.org/ejs/) at http://dx.doi.org/10.1214/07-EJS125 by the Institute of Mathematical Statistics (http://www.imstat.org)
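For concreteness, a lasso-penalized hinge-loss problem with linear heredity constraints can be written as a linear program along the following lines. This is a sketch under stated assumptions: the symbols ($\beta_j^{\pm}$, $\xi_i$, $\lambda$) and the particular form of the constraint on an interaction coefficient $\beta_{jk}$ are chosen for illustration and are not quoted from the paper. The index $j$ ranges over all terms, main effects and interactions alike.

```latex
\begin{aligned}
\min_{\beta_0,\,\beta^{+},\,\beta^{-},\,\xi}\quad
  & \sum_{i=1}^{n} \xi_i + \lambda \sum_{j} \left(\beta_j^{+} + \beta_j^{-}\right) \\
\text{subject to}\quad
  & y_i\Bigl(\beta_0 + \sum_{j} x_{ij}\,(\beta_j^{+} - \beta_j^{-})\Bigr) \ge 1 - \xi_i,
    \qquad \xi_i \ge 0, \\
  & \beta_j^{+} \ge 0, \qquad \beta_j^{-} \ge 0, \\
  & \beta_{jk}^{+} + \beta_{jk}^{-} \le \beta_j^{+} + \beta_j^{-}, \qquad
    \beta_{jk}^{+} + \beta_{jk}^{-} \le \beta_k^{+} + \beta_k^{-}.
\end{aligned}
```

At the optimum each slack $\xi_i$ equals the hinge loss $\max\{0,\, 1 - y_i f(x_i)\}$, and the objective and all constraints are linear in the unknowns. The last line forces an interaction coefficient to zero whenever either of its parent main effects is zero.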

    Modeling Financial Time Series with Artificial Neural Networks

    Financial time series convey the decisions and actions of a population of human actors over time. Econometric and regressive models have been developed in the past decades for analyzing these time series. More recently, biologically inspired artificial neural network models have been shown to overcome some of the main challenges of traditional techniques by better exploiting the non-linear, non-stationary, and oscillatory nature of noisy, chaotic human interactions. This review paper explores the options, benefits, and weaknesses of the various forms of artificial neural networks as compared with regression techniques in the field of financial time series analysis. Funding: CELEST, a National Science Foundation Science of Learning Center (SBE-0354378); SyNAPSE program of the Defense Advanced Research Projects Agency (HR001109-03-0001).

    European exchange trading funds trading with locally weighted support vector regression

    In this paper, two different Locally Weighted Support Vector Regression (wSVR) algorithms are generated and applied to the task of forecasting and trading five European Exchange Traded Funds. The trading application covers the recent European Monetary Union debt crisis. The performance of the proposed models is benchmarked against traditional Support Vector Regression (SVR) models. The Radial Basis Function, the Wavelet and the Mahalanobis kernel are explored and tested as SVR kernels. Finally, a novel statistical SVR input selection procedure is introduced based on a principal component analysis and the Hansen, Lunde, and Nason (2011) model confidence test. The results demonstrate the superiority of the wSVR models over the traditional SVRs and of the ν-SVR over the ε-SVR algorithms. We note that the performance of all models varies and deteriorates considerably at the peak of the debt crisis. In terms of the kernels, our results do not confirm the belief that the Radial Basis Function is the optimum choice for financial series.
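The ε-SVR variant mentioned in this abstract is built on the ε-insensitive loss, which ignores residuals smaller than ε. A minimal sketch follows; the function name and the default `epsilon` are illustrative choices, not values from the paper.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Loss is zero inside the epsilon tube and grows linearly outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


print(epsilon_insensitive_loss(1.0, 1.05))  # residual inside the tube
print(epsilon_insensitive_loss(1.0, 1.5))   # residual outside the tube
```

In ν-SVR the tube width is not fixed in advance; the parameter ν controls it indirectly by bounding the fraction of training points allowed to fall outside the tube.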

    Variable selection and statistical learning for censored data

    This dissertation focuses on (1) developing an efficient variable selection method for a class of general transformation models; (2) developing a support vector based method for predicting failure times allowing the coarsening at random assumption for the censoring distribution; (3) developing a statistical learning method for predicting recurrent events. In the first topic, we propose a computationally simple method for variable selection in a general class of transformation models with right-censored survival data. The proposed algorithm reduces to maximizing a weighted partial likelihood function within an adaptive lasso framework. We establish the asymptotic properties for the proposed method, including selection consistency and semiparametric efficiency of parameter estimators. We conduct simulation studies to investigate the small-sample performance. We apply the method to data sets from a primary biliary cirrhosis study and the Atherosclerosis Risk in Communities (ARIC) Study, and demonstrate its superior prediction performance as compared to existing risk scores. In the second topic, we develop a novel support vector hazard regression approach for predicting survival outcomes. Our method adapts support vector machines to predict dichotomous outcomes of the counting processes among subjects at risk, and allows censoring times to depend on covariates without modeling the censoring distribution. The formulation can be solved conveniently using any convex quadratic programming package. Theoretically, we show that the decision rule is equivalent to maximizing the discrimination power based on hazard functions, and establish the consistency and learning rate of the predicted risk. Numerical experiments demonstrate a superior performance of the proposed method to existing learning methods. Real data examples from a study of Huntington's disease and the ARIC Study are used to illustrate the proposed method. 
In the third topic, we adapt support vector machines in the context of the counting process to handle time-varying covariates and predict recurrent events. We conduct extensive simulation studies to compare the performance of the proposed method with that of the Andersen and Gill proportional intensity model for the prediction of multiple recurrences. The extension of theoretical properties is described. We illustrate the proposed method by analyzing the data set from a bladder cancer study. Degree: Doctor of Philosophy.

    Modelling tourism demand to Spain with machine learning techniques. The impact of forecast horizon on model selection

    This study assesses the influence of the forecast horizon on the forecasting performance of several machine learning techniques. We compare the forecast accuracy of Support Vector Regression (SVR) to Neural Network (NN) models, using a linear model as a benchmark. We focus on international tourism demand to all seventeen regions of Spain. The SVR with a Gaussian radial basis function kernel outperforms the rest of the models for the longest forecast horizons. We also find that machine learning methods improve their forecasting accuracy with respect to linear models as forecast horizons increase. This result shows the suitability of SVR for medium- and long-term forecasting. Peer Reviewed. Postprint (published version).
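The Gaussian radial basis function kernel named in this abstract measures similarity as a decaying function of squared Euclidean distance. A minimal sketch is given here; the function name and the `gamma` default are assumptions for illustration, not settings reported by the study.

```python
import math


def rbf_kernel(x, z, gamma=1.0):
    """Gaussian RBF kernel: exp(-gamma * ||x - z||^2)."""
    if len(x) != len(z):
        raise ValueError("x and z must have the same length")
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


# Identical inputs give the maximum similarity; distant inputs approach zero.
print(rbf_kernel([1.0, 2.0], [1.0, 2.0]))
print(rbf_kernel([1.0, 2.0], [3.0, 5.0]))
```

A larger `gamma` makes the kernel more local, so each training point influences predictions only in its immediate neighbourhood.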

    Clustering of the AKARI NEP deep field 24 μm selected galaxies

    Aims. We present a method for selecting 24 μm galaxies from the AKARI north ecliptic pole (NEP) deep field down to 150 μJy, together with measurements of their two-point correlation function. We aim to associate various 24 μm selected galaxy populations with present-day galaxies and to investigate the impact of their environment on the direction of their subsequent evolution. Methods. We discuss the use of a Support Vector Machine (SVM) algorithm applied to infrared photometric data to perform star-galaxy separation, in which we achieve an accuracy higher than 80%. The photometric redshift information, obtained through the CIGALE code, is used to explore the redshift dependence of the correlation function parameter ($r_0$) as well as the evolution of the linear bias. The linear bias relates the galaxy distribution to that of the underlying dark matter. We connect the investigated sources to their potential local descendants through a simplified model of the clustering evolution without interactions. Results. We observe two different populations of star-forming galaxies, at $z_{\mathrm{med}} \sim 0.25$ and $z_{\mathrm{med}} \sim 0.9$. Measurements of total infrared luminosities ($L_{\mathrm{TIR}}$) show that the sample at $z_{\mathrm{med}} \sim 0.25$ is composed mostly of local star-forming galaxies, while the sample at $z_{\mathrm{med}} \sim 0.9$ is composed of luminous infrared galaxies (LIRGs) with $L_{\mathrm{TIR}} \sim 10^{11.62}\,L_\odot$. We find that dark halo mass is not necessarily correlated with $L_{\mathrm{TIR}}$: for subsamples with $L_{\mathrm{TIR}} = 10^{11.15}\,L_\odot$ at $z_{\mathrm{med}} \sim 0.7$ we observe a higher clustering length ($r_0 = 6.21 \pm 0.78\ h^{-1}\,\mathrm{Mpc}$) than for a subsample with mean $L_{\mathrm{TIR}} = 10^{11.84}\,L_\odot$ at $z_{\mathrm{med}} \sim 1.1$ ($r_0 = 5.86 \pm 0.69\ h^{-1}\,\mathrm{Mpc}$). We find that galaxies at $z_{\mathrm{med}} \sim 0.9$ can be ancestors of present-day $L_*$ early-type galaxies, which exhibit a very high $r_0 \sim 8\ h^{-1}\,\mathrm{Mpc}$.
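The clustering length r0 quoted in this abstract is conventionally defined through a power-law model of the two-point correlation function, xi(r) = (r / r0)^(-gamma), so that xi(r0) = 1. The sketch below evaluates that model; the slope `gamma=1.8` is a commonly assumed value and the separation of 5.0 is arbitrary, neither being taken from the abstract.

```python
def xi_power_law(r, r0, gamma=1.8):
    """Power-law two-point correlation function xi(r) = (r / r0) ** (-gamma)."""
    if r <= 0 or r0 <= 0:
        raise ValueError("r and r0 must be positive")
    return (r / r0) ** (-gamma)


# r0 values (in h^-1 Mpc) quoted in the abstract for the two subsamples.
for r0 in (6.21, 5.86):
    print(r0, xi_power_law(5.0, r0))
```

At a fixed separation, the subsample with the larger r0 has the larger correlation amplitude, which is what "a higher clustering length" expresses.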