
    An Exponential Lower Bound on the Complexity of Regularization Paths

    For a variety of regularized optimization problems in machine learning, algorithms that compute the entire solution path have been developed recently. Most of these methods solve quadratic programs parameterized by a single regularization parameter, the Support Vector Machine (SVM) being a prominent example. Solution-path algorithms compute not only the solution for one particular value of the regularization parameter but the entire path of solutions, making the selection of an optimal parameter much easier. It has been assumed that these piecewise linear solution paths have only linear complexity, i.e. linearly many bends. We prove that for the support vector machine this complexity can be exponential in the number of training points in the worst case. More strongly, we construct a single instance of n input points in d dimensions for an SVM such that at least \Theta(2^{n/2}) = \Theta(2^d) many distinct subsets of support vectors occur as the regularization parameter changes. Comment: Journal version, 28 pages, 5 figures.
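As background for why these paths are piecewise linear at all, consider the smallest possible case: for two training points with opposite labels, the SVM dual collapses to a scalar problem whose solution is piecewise linear in the regularization parameter C, with a single bend. The paper's construction shows the number of such bends can grow like 2^{n/2}; the sketch below is only our toy illustration of a single bend, not the paper's construction.

```python
# Toy illustration (an assumption of this note, not the paper's construction):
# for two points x1, x2 with labels +1/-1, the SVM dual reduces to maximizing
# 2*alpha - 0.5 * alpha^2 * ||x1 - x2||^2 subject to 0 <= alpha <= C, so
# alpha(C) = min(C, 2 / ||x1 - x2||^2): piecewise linear with one bend.

def alpha_path(x1, x2, C):
    """Shared dual variable alpha as a function of the regularization C."""
    sq = sum((a - b) ** 2 for a, b in zip(x1, x2))  # squared distance
    return min(C, 2.0 / sq)

x1, x2 = (0.0, 0.0), (2.0, 0.0)   # squared distance 4 -> bend at C = 0.5
for C in (0.1, 0.3, 0.5, 1.0, 2.0):
    print(C, alpha_path(x1, x2, C))
```

Below the bend the path grows linearly with C (the box constraint is active); above it, the unconstrained optimum takes over and the path is flat.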

    Solution Path Algorithm for Twin Multi-class Support Vector Machine

    The twin support vector machine and its extensions have achieved much in binary classification; however, they still face difficulties such as model selection and quickly solving multi-class problems. This paper is devoted to a fast regularization-parameter tuning algorithm for the twin multi-class support vector machine. A new sample-dataset division method is adopted, and the Lagrangian multipliers are proved to be piecewise linear with respect to the regularization parameters by combining linear equations with block-matrix theory. Eight kinds of events are defined to locate the starting event, and the solution path algorithm is then designed, which greatly reduces the computational cost. In addition, only a few points are needed to complete the initialization, and the Lagrangian multipliers are proved to equal 1 as the regularization parameter tends to infinity. Simulation results on UCI datasets show that the proposed method achieves good classification performance while reducing the computational cost of the grid-search method from exponential to constant level.
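One practical payoff of a piecewise-linear multiplier path (a general property of such paths, not this paper's specific algorithm) is that once the breakpoints are known, the multiplier at any regularization value follows by linear interpolation, so no grid over the parameter is needed. A minimal sketch with made-up breakpoints:

```python
import bisect

def path_value(breaks, values, c):
    """Piecewise-linear path lookup.
    breaks: sorted regularization values where the path bends;
    values: multiplier value at each breakpoint."""
    if c <= breaks[0]:
        return values[0]
    if c >= breaks[-1]:
        return values[-1]   # path flattens; here the multiplier tends to 1
    i = bisect.bisect_right(breaks, c) - 1
    t = (c - breaks[i]) / (breaks[i + 1] - breaks[i])
    return values[i] + t * (values[i + 1] - values[i])

breaks = [0.0, 1.0, 4.0, 10.0]      # hypothetical event locations
values = [0.0, 0.4, 0.9, 1.0]       # toy path approaching 1 as C grows
print(path_value(breaks, values, 2.5))
```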

    Hybridizing PSO With SA for Optimizing SVR Applied to Software Effort Estimation

    This study investigates hybridizing Particle Swarm Optimization (PSO) with Simulated Annealing (SA) to optimize Support Vector Regression (SVR). The optimized SVR is used for software effort estimation. The optimization of SVR consists of two sub-problems that must be solved simultaneously: the first is input feature selection, which influences method accuracy and computing time; the second is finding optimal SVR parameters, each of which significantly impacts method performance. To deal with the huge number of candidate solutions, a powerful approach is required. The proposed approach takes advantage of the good solution quality of both PSO and SA. We introduce an SA-based acceptance rule for accepting new positions in PSO, along with an SA parameter-selection step to improve quality, since a stochastic algorithm is sensitive to its parameters. Comparative experiments against plain PSO were conducted on solution quality and computing time. According to the results, the proposed model outperforms PSO-SVR in solution quality.
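The SA-based acceptance rule can be illustrated with a minimal PSO loop: improvements are always kept, while worse moves are accepted with probability exp(-Δ/T) under a cooling temperature T. Everything below (the sphere objective standing in for SVR validation error, the inertia and acceleration coefficients, the cooling schedule) is an illustrative assumption, not the paper's configuration.

```python
import math
import random

def sphere(x):  # toy objective standing in for SVR validation error
    return sum(v * v for v in x)

def pso_sa(obj, dim=2, n_particles=10, iters=50, T0=1.0, cool=0.95, seed=0):
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    gbest = min(pbest, key=obj)[:]
    T = T0
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.5 * r1 * (pbest[i][d] - pos[i][d])
                             + 1.5 * r2 * (gbest[d] - pos[i][d]))
            cand = [pos[i][d] + vel[i][d] for d in range(dim)]
            delta = obj(cand) - obj(pos[i])
            # SA-style acceptance: always take improvements, sometimes
            # accept worse moves with probability exp(-delta / T)
            if delta <= 0 or rng.random() < math.exp(-delta / T):
                pos[i] = cand
            if obj(pos[i]) < obj(pbest[i]):
                pbest[i] = pos[i][:]
            if obj(pos[i]) < obj(gbest):
                gbest = pos[i][:]
        T *= cool   # cooling schedule
    return gbest, obj(gbest)

best, fbest = pso_sa(sphere)
```

As T decreases, the rule degenerates toward plain greedy PSO, so early iterations explore while late iterations exploit.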

    Using a multi-objective genetic algorithm for SVM construction

    Support Vector Machines are kernel machines useful for classification and regression problems. In this paper, they are used for non-linear regression of environmental data. From a structural point of view, Support Vector Machines are particular Artificial Neural Networks, and their training paradigm has some positive implications. In fact, the original training approach helps overcome the curse of dimensionality and overly strict assumptions on the statistics of the errors in the data. Support Vector Machines and Radial Basis Function Regularised Networks are presented within a common structural framework for non-linear regression in order to emphasise the training strategy for support vector machines and to better explain the multi-objective approach to support vector machine construction. A support vector machine's performance depends on the kernel parameter, input selection and the optimal dimension of the ε-tube. These are used as decision variables for an evolutionary strategy based on a Genetic Algorithm, whose objective functions are the number of support vectors (the capacity of the machine) and the fitness on a validation subset (the model's accuracy in mapping the underlying physical phenomena). The strategy is tested on a case study dealing with groundwater modelling, based on time series of past measured rainfalls and levels, for level predictions at variable time horizons.
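The two objectives (few support vectors for capacity control, good validation fitness for accuracy) make this a Pareto problem: a configuration survives only if no other configuration is at least as good on both counts and strictly better on one. A minimal dominance check, with hypothetical (support-vector count, validation error) pairs:

```python
def dominates(a, b):
    """a dominates b if it is no worse in every objective and strictly
    better in at least one (both objectives are minimized here)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """Keep the configurations no other configuration dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]

# hypothetical candidates: (number of support vectors, validation error)
cands = [(10, 0.20), (15, 0.10), (12, 0.15), (20, 0.10), (11, 0.25)]
print(pareto_front(cands))
```

A genetic algorithm over such candidates would use dominance (or a scalarization of the two objectives) when selecting parents for the next generation.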

    An Evolutionary Optimization Algorithm for Automated Classical Machine Learning

    Machine learning is an evolving branch of computational algorithms that allows computers to learn from experience, make predictions, and solve different problems without being explicitly programmed. However, building a useful machine learning model is a challenging process, requiring human expertise to perform various tasks properly and ensure that machine learning's primary objective, determining the best and most predictive model, is achieved. These tasks include pre-processing, feature selection, and model selection. Many machine learning models developed by experts are designed manually and by trial and error; in other words, even experts need time and resources to create good predictive models. The idea of automated machine learning (AutoML) is to automate the machine learning pipeline to relieve the burden of substantial development costs and manual processes. The algorithms leveraged in these systems have different hyper-parameters, and different input datasets have various features; in both cases, the final performance of the model is closely related to the final selected configuration of features and hyper-parameters, which is why these are considered crucial tasks in AutoML. The computationally expensive nature of tuning hyper-parameters and optimally selecting features creates significant opportunities for filling research gaps in the AutoML field. This dissertation explores how to select features and tune the hyper-parameters of conventional machine learning algorithms efficiently and automatically. To address these challenges, novel algorithms for hyper-parameter tuning and feature selection are proposed. The hyper-parameter tuning algorithm aims to provide the optimal set of hyper-parameters for three conventional machine learning models (Random Forest, XGBoost and Support Vector Machine) to obtain the best performance scores.
    The feature selection algorithm, in turn, looks for the optimal subset of features to achieve the highest performance. Afterward, a hybrid framework is designed for both hyper-parameter tuning and feature selection; it can discover a configuration of features and hyper-parameters close to the optimum. The framework includes (1) an automatic feature selection component based on artificial bee colony algorithms and machine learning training, and (2) an automatic hyper-parameter tuning component based on artificial bee colony algorithms and machine learning training, for faster training and convergence of the learning models. The whole framework has been evaluated on four real-world datasets from different applications. It is an attempt to alleviate the challenges of hyper-parameter tuning and feature selection using efficient algorithms; however, distributed processing, distributed learning, parallel computing, and other big-data solutions are not taken into consideration in this framework.
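The artificial-bee-colony component can be sketched as follows. This is a generic, compressed ABC loop on a toy objective standing in for cross-validated model error; the dissertation's actual operators, parameters, and evaluation procedure are not reproduced here.

```python
import random

def abc_tune(obj, bounds, n_food=8, iters=40, limit=10, seed=1):
    """Minimal artificial bee colony minimizer over box-bounded parameters."""
    rng = random.Random(seed)
    dim = len(bounds)
    def rand_sol():
        return [rng.uniform(lo, hi) for lo, hi in bounds]
    food = [rand_sol() for _ in range(n_food)]      # candidate configurations
    fit = [obj(s) for s in food]
    trials = [0] * n_food                           # stagnation counters
    best_f, best_x = min(zip(fit, food), key=lambda t: t[0])
    for _ in range(iters):
        for i in range(n_food):
            # employed/onlooker phases collapsed: perturb one dimension of
            # each food source toward a random neighbour, keep improvements
            k = rng.randrange(n_food)
            d = rng.randrange(dim)
            cand = food[i][:]
            cand[d] += rng.uniform(-1, 1) * (food[i][d] - food[k][d])
            lo, hi = bounds[d]
            cand[d] = min(max(cand[d], lo), hi)
            f = obj(cand)
            if f < fit[i]:
                food[i], fit[i], trials[i] = cand, f, 0
            else:
                trials[i] += 1
            # scout phase: abandon a source that stopped improving
            if trials[i] > limit:
                food[i] = rand_sol()
                fit[i] = obj(food[i])
                trials[i] = 0
            if fit[i] < best_f:
                best_f, best_x = fit[i], food[i][:]
    return best_x, best_f

# toy "validation error" with optimum at (1, -2); names are illustrative
obj = lambda x: (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2
best_x, best_f = abc_tune(obj, [(-5.0, 5.0), (-5.0, 5.0)])
```

In an AutoML setting, `obj` would be replaced by a cross-validation score of the model under the encoded hyper-parameters or feature mask.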

    Parameter Optimization for Support Vector Machines Using a Taguchi Method Approach for High-Dimensional Data

    Support vector machine (SVM) is a leading machine learning method with strong results in classification and prediction. The principle of SVM is as follows: a given set of classified data is trained by an algorithm to obtain a classification model that can help predict the category of new data. SVM has many advantages in classification, but some problems remain, one of which is selecting the optimal SVM parameters. Optimal parameters can improve classification accuracy, so parameter-selection methods such as grid search and the Taguchi approach are needed. Another problem is that a large number of features makes the computing process less efficient, so selecting the best features is also necessary. In this research, the Taguchi method is used to select the optimal parameters, while FCBF is used for feature selection, both applied to high-dimensional data. The results show that optimal parameter selection using the Taguchi approach significantly increases the accuracy rate and makes the computing process more efficient compared with the grid-search method.
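The Taguchi approach's efficiency over grid search comes from orthogonal arrays: an L9 array screens four 3-level factors in 9 runs where a full grid needs 3^4 = 81, and each factor's best level is picked from its mean response. In the sketch below, only the L9 array itself is standard; the hyper-parameter names, levels, and the synthetic response standing in for cross-validated accuracy are illustrative assumptions.

```python
# Standard L9 orthogonal array: 9 runs covering 4 factors at 3 levels,
# with every pair of factors seeing all 9 level combinations exactly once.
L9 = [(0, 0, 0, 0), (0, 1, 1, 1), (0, 2, 2, 2),
      (1, 0, 1, 2), (1, 1, 2, 0), (1, 2, 0, 1),
      (2, 0, 2, 1), (2, 1, 0, 2), (2, 2, 1, 0)]

# hypothetical SVM/SVR hyper-parameters and levels (assumptions)
levels = {"C": [0.1, 1.0, 10.0], "gamma": [0.01, 0.1, 1.0],
          "epsilon": [0.01, 0.1, 0.5], "degree": [2, 3, 4]}
names = list(levels)

def response(cfg):
    """Synthetic stand-in for cross-validated accuracy (higher is better),
    peaking at C=1, gamma=0.1 and small epsilon/degree."""
    c, g, e, d = cfg
    return -((c - 1.0) ** 2 + (g - 0.1) ** 2 + e + 0.1 * d)

best = {}
for f, name in enumerate(names):
    means = []
    for lv in range(3):
        vals = [response(tuple(levels[names[i]][row[i]] for i in range(4)))
                for row in L9 if row[f] == lv]
        means.append(sum(vals) / len(vals))   # mean response at this level
    best[name] = levels[name][max(range(3), key=lambda lv: means[lv])]
print(best)
```

Because the array is orthogonal, each level of a factor is averaged over a balanced mix of the other factors' levels, which is what makes the per-factor means comparable.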

    Mass Classification in Mammogram Images Using a Combination of F-Score Feature Selection and LS-SVM

    Breast cancer is the most common disease suffered by women in many countries. Breast cancer screening can be done on mammogram images with a Computer-Aided Detection (CAD) system. The CAD analysis developed here consists of GLCM feature extraction, feature reduction/selection, and SVM. In both SVM (Support Vector Machine) and LS-SVM (Least Squares Support Vector Machine), three problems arise: how to choose the kernel function, how many input features are optimal, and how to determine the best kernel parameters. The number of features and the required kernel parameter values affect each other, so feature selection is needed to build the classification system. This study aims to classify masses in mammogram images into two classes: benign and malignant. Features are extracted using the Gray Level Co-occurrence Matrix (GLCM) and then selected using the F-Score method. The F-Score is obtained by computing the discriminant value of the extracted features between the two classes in the training data. The F-Score values of the features are sorted in descending order, and the sorted results are used to build feature combinations, which serve as input to the LS-SVM. The experiments show that using feature-selection combinations strongly affects accuracy. The best accuracy, 97.5%, is obtained with LS-SVM RBF and SVM RBF, both with and without feature-selection combinations. Feature selection also reduces computation time. Keywords: breast cancer, F-Score, GLCM, LS-SVM
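The F-Score used for selection is the standard Fisher-style ratio of between-class separation to within-class spread, computed per feature and then sorted in descending order. A minimal sketch on made-up feature values (not the paper's mammogram data):

```python
def f_score(pos, neg):
    """Fisher-style F-score of one feature from its values in the positive
    and negative class: between-class separation over within-class variance."""
    n_p, n_n = len(pos), len(neg)
    m_p, m_n = sum(pos) / n_p, sum(neg) / n_n
    m = (sum(pos) + sum(neg)) / (n_p + n_n)        # overall mean
    num = (m_p - m) ** 2 + (m_n - m) ** 2
    den = (sum((v - m_p) ** 2 for v in pos) / (n_p - 1)
           + sum((v - m_n) ** 2 for v in neg) / (n_n - 1))
    return num / den

# two toy features: "f1" separates the classes, "f2" is noise (assumptions)
benign    = {"f1": [1.0, 1.2, 0.9, 1.1], "f2": [0.5, 1.5, 1.0, 2.0]}
malignant = {"f1": [3.0, 2.8, 3.2, 3.1], "f2": [1.4, 0.6, 1.9, 0.9]}
scores = {k: f_score(benign[k], malignant[k]) for k in benign}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)   # the discriminative feature comes first
```

Feature combinations are then built from the top of this ranking and fed to the classifier.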