76,347 research outputs found
Regularized Maximum Likelihood Estimation and Feature Selection in Mixtures-of-Experts Models
Mixture of Experts (MoE) are successful models for modeling heterogeneous
data in many statistical learning problems including regression, clustering and
classification. Generally fitted by maximum likelihood estimation via the
well-known EM algorithm, their application to high-dimensional problems is
still therefore challenging. We consider the problem of fitting and feature
selection in MoE models, and propose a regularized maximum likelihood
estimation approach that encourages sparse solutions for heterogeneous
regression data models with potentially high-dimensional predictors. Unlike
state-of-the art regularized MLE for MoE, the proposed modelings do not require
an approximate of the penalty function. We develop two hybrid EM algorithms: an
Expectation-Majorization-Maximization (EM/MM) algorithm, and an EM algorithm
with coordinate ascent algorithm. The proposed algorithms allow to
automatically obtaining sparse solutions without thresholding, and avoid matrix
inversion by allowing univariate parameter updates. An experimental study shows
the good performance of the algorithms in terms of recovering the actual sparse
solutions, parameter estimation, and clustering of heterogeneous regression
data
COMBINATION OF LOGISTIC REGRESSION AND SVM ALGORITHM WITH HYBRID PSO AND GA BASED SELECTION FEATURE IN CORONARY HEART DISEASE CLASSIFICATION
The world's high death rate from heart disease requires early prevention by medical doctors to diagnose heart disease early. The machine learning approach makes it possible to predict the risk of developing heart disease by examining certain values at a low cost. This study will contribute to the development of a combination of Logistic Regression and SVM models that integrate SVM and Logistic Regression algorithms by implementing selection features using hybrid PSO and GA methods. The combination concept of Logistic Regression SVM (LRSVM) applied to this study is to reduce the risk of SVM output errors by interpreting and modifying the output of SVM classifiers by the results of Logistic Regression analysis. The test results showed that LRSVM with pso-GA hybrid-based selection feature achieved better performance for coronary heart disease classification with 99.66% accuracy compared to classification accuracy with SVM algorithm without selection featur
A comparison of four data selection methods for artificial neural networks and support vector machines
The performance of data-driven models such as Artificial Neural Networks and Support Vector Machines relies to a good extent on selecting proper data throughout the design phase. This paper addresses a comparison of four unsupervised data selection methods including random, convex hull based, entropy based and a hybrid data selection method. These methods were evaluated on eight benchmarks in classification and regression problems. For classification, Support Vector Machines were used, while for the regression problems, Multi-Layer Perceptrons were employed. Additionally, for each problem type, a non-dominated set of Radial Basis Functions Neural Networks were designed, benefiting from a Multi Objective Genetic Algorithm. The simulation results showed that the convex hull based method and the hybrid method involving convex hull and entropy, obtain better performance than the other methods, and that MOGA designed RBFNNs always perform better than the other models. (C) 2017, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.FCT through IDMEC, under LAETA grant [UID/EMS/50022/2013]info:eu-repo/semantics/publishedVersio
Software defect prediction framework based on hybrid metaheuristic optimization methods
A software defect is an error, failure, or fault in a software that produces an incorrect or unexpected result. Software defects are expensive in quality and cost. The accurate prediction of defect‐prone software modules certainly assist testing effort, reduce costs and improve the quality of software. The classification algorithm is a popular machine learning approach for software defect prediction. Unfortunately, software defect prediction remains a largely unsolved problem. As the first problem, the comparison and benchmarking results of the defect prediction using machine learning classifiers indicate that, the poor accuracy level is dominant and no particular classifiers perform best for all the datasets. There are two
main problems that affect classification performance in software defect prediction: noisy attributes and imbalanced class distribution of datasets, and difficulty of selecting optimal parameters of the classifiers. In this study, a software defect prediction framework that combines metaheuristic optimization methods for feature selection and parameter optimization, with meta learning methods for solving imbalanced class problem on datasets, which aims to improve the accuracy of classification models has been proposed. The proposed framework and models that are are considered to be the specific research contributions of this thesis are: 1) a comparison framework of classification models for software defect prediction known as CF-SDP, 2) a hybrid genetic algorithm based feature
selection and bagging technique for software defect prediction known as GAFS+B, 3) a hybrid particle swarm optimization based feature selection and bagging technique for software defect prediction known as PSOFS+B, and 4) a hybrid genetic algorithm based neural network parameter optimization and bagging technique for software defect prediction, known as NN-GAPO+B. For the purpose of this study, ten classification algorithms have been selected. The selection aims at achieving a balance between established classification algorithms used in software defect prediction. The proposed framework and methods are
evaluated using the state-of-the-art datasets from the NASA metric data repository. The results indicated that the proposed methods (GAFS+B, PSOFS+B and NN-GAPO+B) makes
an impressive improvement in the performance of software defect prediction. GAFS+B and PSOFS+B significantly affected on the performance of the class imbalance suffered
classifiers, such as C4.5 and CART. GAFS+B and PSOFS+B also outperformed the existing software defect prediction frameworks in most datasets. Based on the conducted
experiments, logistic regression performs best in most of the NASA MDP datasets, without or with feature selection method. The proposed methods also generated the selected relevant features in software defect prediction. The top ten most relevant features in software defect prediction include branch count metrics, decision density, halstead level metric of a module, number of operands contained in a module, maintenance severity, number of blank LOC, halstead volume, number of unique operands contained in a module, total number of LOC and design density
Intelligent Financial Fraud Detection Practices: An Investigation
Financial fraud is an issue with far reaching consequences in the finance
industry, government, corporate sectors, and for ordinary consumers. Increasing
dependence on new technologies such as cloud and mobile computing in recent
years has compounded the problem. Traditional methods of detection involve
extensive use of auditing, where a trained individual manually observes reports
or transactions in an attempt to discover fraudulent behaviour. This method is
not only time consuming, expensive and inaccurate, but in the age of big data
it is also impractical. Not surprisingly, financial institutions have turned to
automated processes using statistical and computational methods. This paper
presents a comprehensive investigation on financial fraud detection practices
using such data mining methods, with a particular focus on computational
intelligence-based techniques. Classification of the practices based on key
aspects such as detection algorithm used, fraud type investigated, and success
rate have been covered. Issues and challenges associated with the current
practices and potential future direction of research have also been identified.Comment: Proceedings of the 10th International Conference on Security and
Privacy in Communication Networks (SecureComm 2014
Modeling, forecasting and trading the EUR exchange rates with hybrid rolling genetic algorithms: support vector regression forecast combinations
The motivation of this paper is to introduce a hybrid Rolling Genetic Algorithm-Support Vector Regression (RG-SVR) model for optimal parameter selection and feature subset combination. The algorithm is applied to the task of forecasting and trading the EUR/USD, EUR/GBP and EUR/JPY exchange rates. The proposed methodology genetically searches over a feature space (pool of individual forecasts) and then combines the optimal feature subsets (SVR forecast combinations) for each exchange rate. This is achieved by applying a fitness function specialized for financial purposes and adopting a sliding window approach. The individual forecasts are derived from several linear and non-linear models. RG-SVR is benchmarked against genetically and non-genetically optimized SVRs and SVMs models that are dominating the relevant literature, along with the robust ARBF-PSO neural network. The statistical and trading performance of all models is investigated during the period of 1999–2012. As it turns out, RG-SVR presents the best performance in terms of statistical accuracy and trading efficiency for all the exchange rates under study. This superiority confirms the success of the implemented fitness function and training procedure, while it validates the benefits of the proposed algorithm
Using multiple classifiers for predicting the risk of endovascular aortic aneurysm repair re-intervention through hybrid feature selection.
Feature selection is essential in medical area; however, its process becomes complicated with the presence of censoring which is the unique character of survival analysis. Most survival feature selection methods are based on Cox's proportional hazard model, though machine learning classifiers are preferred. They are less employed in survival analysis due to censoring which prevents them from directly being used to survival data. Among the few work that employed machine learning classifiers, partial logistic artificial neural network with auto-relevance determination is a well-known method that deals with censoring and perform feature selection for survival data. However, it depends on data replication to handle censoring which leads to unbalanced and biased prediction results especially in highly censored data. Other methods cannot deal with high censoring. Therefore, in this article, a new hybrid feature selection method is proposed which presents a solution to high level censoring. It combines support vector machine, neural network, and K-nearest neighbor classifiers using simple majority voting and a new weighted majority voting method based on survival metric to construct a multiple classifier system. The new hybrid feature selection process uses multiple classifier system as a wrapper method and merges it with iterated feature ranking filter method to further reduce features. Two endovascular aortic repair datasets containing 91% censored patients collected from two centers were used to construct a multicenter study to evaluate the performance of the proposed approach. The results showed the proposed technique outperformed individual classifiers and variable selection methods based on Cox's model such as Akaike and Bayesian information criterions and least absolute shrinkage and selector operator in p values of the log-rank test, sensitivity, and concordance index. This indicates that the proposed classifier is more powerful in correctly predicting the risk of re-intervention enabling doctor in selecting patients' future follow-up plan
- …