11 research outputs found

Ensembles of wrappers for automated feature selection in fish age classification

Author: Bermejo Sánchez Sergi
Publication venue: Elsevier BV
Publication date: 01/01/2017

In feature selection, the most important features must be chosen so as to reduce their number while retaining their discriminatory information. Within this context, a novel feature selection method based on an ensemble of wrappers is proposed and applied to automatically select features in fish age classification. The effectiveness of this procedure was tested on an Atlantic cod database with several powerful statistical learning classifiers. The selected subsets, which contain only a few features (e.g. otolith weight and fish weight), are particularly noteworthy given current biological findings and practices in fishery research, and the classification results obtained with them outperform those of previous studies in which feature selection was performed manually. Peer reviewed. Postprint (author's final draft).

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC
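The abstract above does not say how the individual wrappers are combined. As a minimal sketch, assuming a simple frequency vote, the function below keeps every feature that at least a given fraction of the wrappers selected. The function name `aggregate_subsets`, the `min_fraction` parameter, and the feature names `length` and `sex` are hypothetical; only otolith weight and fish weight are mentioned in the abstract.

```python
from collections import Counter

def aggregate_subsets(subsets, min_fraction=0.5):
    """Keep features selected by at least min_fraction of the wrappers.

    Frequency voting is an assumption made for illustration; the paper's
    actual combination rule is not given in the abstract.
    """
    # Count each feature once per wrapper, even if a wrapper lists it twice.
    counts = Counter(feature for subset in subsets for feature in set(subset))
    needed = min_fraction * len(subsets)
    return sorted(f for f, c in counts.items() if c >= needed)

# Hypothetical outputs of three wrapper runs.
runs = [
    ["otolith_weight", "fish_weight"],
    ["otolith_weight", "length"],
    ["otolith_weight", "fish_weight", "sex"],
]
selected = aggregate_subsets(runs, min_fraction=0.5)
```

With these inputs, only the features chosen by at least two of the three runs survive the vote.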

A low variance error boosting algorithm

Author: Hunter Andrew; Wang Ching-Wei
Publication venue: Springer Netherlands
Publication date: 21/02/2009

This paper introduces a robust variant of AdaBoost, cw-AdaBoost, that uses weight perturbation to reduce variance error and is particularly effective on data sets, such as microarray data, that have large numbers of features and small numbers of instances. The algorithm is compared with AdaBoost, Arcing and MultiBoost on twelve gene expression datasets with 10-fold cross-validation. The new algorithm consistently achieves higher classification accuracy across all of these datasets. In contrast to other AdaBoost variants, the algorithm is not susceptible to problems when a zero-error base classifier is encountered.

University of Lincoln Institutional Repository

CiteSeerX
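The AdaBoost abstract above mentions weight perturbation but does not describe the scheme. The sketch below is not cw-AdaBoost; it is ordinary AdaBoost on decision stumps with one assumed extra step, namely multiplying each instance weight by uniform noise before renormalising. The names `perturbed_adaboost` and `noise`, and the clamping of the error to avoid a division by zero when a stump is perfect, are illustrative assumptions.

```python
import math
import random

def train_stump(X, y, w):
    """Return (weighted_error, feature, threshold, polarity) of the best stump."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for pol in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (pol if row[j] >= t else -pol) != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, pol)
    return best

def stump_predict(stump, row):
    _, j, t, pol = stump
    return pol if row[j] >= t else -pol

def perturbed_adaboost(X, y, rounds=10, noise=0.1, seed=0):
    """AdaBoost with labels in {-1, +1} plus an assumed weight perturbation."""
    rng = random.Random(seed)
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        # Clamp the error so a zero-error stump still gets a finite weight
        # (an assumption here, not the paper's mechanism).
        err = min(max(stump[0], 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, stump))
        # Standard AdaBoost reweighting.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for row, yi, wi in zip(X, y, w)]
        # Hypothetical perturbation: multiplicative uniform noise.
        w = [wi * (1 + rng.uniform(-noise, noise)) for wi in w]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def predict(model, row):
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in model)
    return 1 if score >= 0 else -1

X = [[0.0], [1.0], [2.0], [3.0]]
y = [-1, -1, 1, 1]
model = perturbed_adaboost(X, y, rounds=5, noise=0.2, seed=1)
```

Keeping `noise` below 1 guarantees that every perturbed weight stays positive, so the renormalisation is always well defined.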

SUPERVISED QR ALGORITHM TO ESTIMATE THE FINANCIAL PRODUCT DATA PROCESS TIME ANALYSIS BASED ON ONLINE PURCHASE DETAILS

Author: Dhananachezhiyan R
Publication venue: JConsort
Publication date: 28/10/2019

The Supervised QR Algorithm is used to estimate the financial production data recipe and time fraction based on online purchase details. The trading algorithm method used previously has a very low rating, and its recipe does not clearly explain the financial and time-related processes. For that reason, the QR method is used in this area. This method is very helpful for easy analysis of financial and time-related information. The financial sector is one of the most important in an industry. Moreover, analysis is very important for finding problems in it, and analysis can correct various errors in practice. It shows how to raise funds and improve funding in online purchases. Modern variations in generating and increasing data and in creating a parallel account make many changes simultaneously. These are very helpful in adjusting the financial position of the industry to suit the financial situation.

Scholar

3-level confidence voting strategy for dynamic fusion-selection of classifier ensembles

Author: Főző Csaba; Papanek-Gáspár Csaba
Publication date: 01/01/2009

There are two different stages to consider when constructing multiple classifier systems (MCSs): the Meta-Classifier Stage, which is responsible for the combination logic and basically treats the ensemble members as black boxes, and the Classifier Stage, where the functionality of the members is in focus. Furthermore, at the upper stage, also called the voting strategy stage, members can be combined by fusion and by selection of classifiers. In this paper, we propose a novel procedure for building the meta-classifier stage of MCSs, using an oracle based on a three-level voting strategy. This is a dynamic, half-fusion, half-selection method for ensemble member combination, which is midway between the extremes of fusion and selection. The MCS members are weighted and combined with the help of the oracle, which is founded on a voting strategy of three levels: (1) the Local Implicit Confidence (LIC), (2) the Global Explicit Confidence (GEC), and (3) the Local Explicit Confidence (LEC). The first confidence segment depends on classifier construction, via the implicit knowledge gathered simultaneously with training. Since this strongly depends on the internal operation of the classifier, it cannot always be obtained, for example when using some particularly complex classification methods. We used several well-known classifier algorithms (Decision Trees, Neural Networks, Logistic Regression, SVM) for which it is possible to extract this information. The second goodness index is calculated on the validation partition of the labeled training data. It is used to obtain the general accuracy of a single classifier on a data set independent of the training partition. Finally, the third part of the confidence triplet also depends on the unlabeled objects yet to be classified. Because of this, it can only be calculated at classification time.

Crossref

University of Szeged
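The voting abstract above names three confidence values per ensemble member (LIC, GEC, LEC) but does not give the combination formula. A minimal sketch, assuming the three values lie in [0, 1] and are simply multiplied, might mix selection and fusion as follows. The function `combine_votes`, the product rule, and the `threshold` parameter are hypothetical and are not taken from the paper.

```python
from collections import defaultdict

def combine_votes(votes, threshold=0.5):
    """Combine member votes given as (label, lic, gec, lec) tuples.

    Assumed rule: weight = lic * gec * lec. Members below the threshold are
    dropped (selection), and the rest cast weighted votes (fusion).
    """
    weighted = [(label, lic * gec * lec) for label, lic, gec, lec in votes]
    # Selection step: keep only sufficiently confident members.
    selected = [(label, c) for label, c in weighted if c >= threshold]
    if not selected:
        # Fall back to the single most confident member.
        selected = [max(weighted, key=lambda lc: lc[1])]
    # Fusion step: confidence-weighted vote over the selected members.
    totals = defaultdict(float)
    for label, c in selected:
        totals[label] += c
    return max(totals, key=totals.get)

votes = [("a", 0.9, 0.9, 0.9), ("b", 0.6, 0.6, 0.6), ("b", 0.5, 0.5, 0.5)]
winner = combine_votes(votes, threshold=0.2)
```

Raising `threshold` moves the rule toward pure selection, while a threshold of zero makes it a pure weighted fusion.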

Robustification of Naïve Bayes Classifier and Its Application for Microarray Gene Expression Data Analysis

Publication venue: Hindawi Limited
Publication date: 01/01/2017

Crossref

An investigation into the use of gaussian processes for the analysis of microarray data

Author: SIAH KENG BOON
Publication date: 01/09/2004

Master's, Master of Engineering

ScholarBank@NUS

Feature selection and classification for high-dimensional biological data under cross-validation framework

Author: Zhong Yi
Publication venue: Paleontological Institute at The University of Kansas
Publication date: 01/01/2018

This research focuses on applying statistical learning methods to high-dimensional biological data analysis. In our implementation, we primarily use statistical learning methods to select important predictors and to build predictive classification models. Traditionally, cross-validation has been used to determine the tuning or threshold parameter for feature selection. We propose improvements over these methods by adding repeated and nested cross-validation techniques. Several machine learning methods, such as the lasso, support vector machines and random forests, have been used in many previous studies, and each has its own merits and demerits. We also propose ensemble feature selection from the results of the three machine learning methods, capturing their strengths in order to find a more stable feature subset and to optimize prediction accuracy. We use DNA microarray gene expression datasets to describe our methods. We have summarized our work in the following order: (1) the structure of high-dimensional biological datasets and the statistical methods to analyze such data; (2) several statistical and machine learning algorithms to analyze high-dimensional biological datasets; (3) improved cross-validation and ensemble learning methods to achieve better prediction accuracy; and (4) examples using the DNA microarray data to describe our methods.

KU ScholarWorks
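Nested cross-validation, mentioned in the abstract above, can be sketched generically: an inner loop picks the tuning parameter using only the outer training fold, and the outer loop scores that choice on data the inner loop never saw. The code below is an illustrative skeleton, not the thesis implementation. The callback `fit_score(param, train_idx, test_idx)` is a hypothetical user-supplied function that returns a score where higher is better, and shuffling and repetition are omitted for brevity.

```python
def k_fold(indices, k):
    """Yield k (train, test) index splits with near-equal test folds."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

def nested_cv(n_samples, candidates, fit_score, outer_k=5, inner_k=3):
    """Return the mean outer-fold score after inner-loop parameter tuning."""
    outer_scores = []
    for outer_train, outer_test in k_fold(list(range(n_samples)), outer_k):
        def inner_mean(param):
            # Tune on splits of the outer training fold only.
            scores = [fit_score(param, tr, te)
                      for tr, te in k_fold(outer_train, inner_k)]
            return sum(scores) / len(scores)
        best = max(candidates, key=inner_mean)
        # Evaluate the chosen parameter on the held-out outer fold.
        outer_scores.append(fit_score(best, outer_train, outer_test))
    return sum(outer_scores) / len(outer_scores)

# Toy scoring function (an assumption): the score peaks at param == 2.
estimate = nested_cv(10, [1, 2, 3], lambda p, tr, te: -abs(p - 2))
```

Because the outer test fold plays no part in choosing the parameter, the averaged outer score is not biased by the tuning step.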

Robust learning and segmentation for secure understanding

Author: Martin Ian Stefan
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2005

Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2005. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Includes bibliographical references (p. 85-91).

This thesis demonstrates methods useful in learning to understand images from only a few examples, but they are by no means limited to this application. Boosting techniques are popular because they learn effective classification functions and identify the most relevant features at the same time. However, in general, they overfit and perform poorly on data sets that contain many features but few examples. A novel stochastic regularization technique is presented, based on enhancing data sets with corrupted copies of the examples to produce a more robust classifier. This regularization technique enables the gentle boosting algorithm to work well with only a few examples. It is tested on a variety of data sets from various domains, including object recognition and bioinformatics, with convincing results. In the second part of this work, a novel technique for extracting texture edges is introduced, based on the combination of a patch-based approach and non-parametric tests of distributions. This technique can reliably detect texture edges using only local information, making it a useful preprocessing step prior to segmentation. Combined with a parametric deformable model, this technique provides smooth boundaries and globally salient structures. By Ian Stefan Martin. M.Eng.

DSpace@MIT
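The thesis abstract above describes regularization by adding corrupted copies of the training examples, without stating the kind of corruption. As a minimal sketch, assuming additive Gaussian noise on each feature, the augmentation step could look like this. The function name and the `copies` and `sigma` parameters are hypothetical.

```python
import random

def augment_with_corrupted_copies(X, y, copies=3, sigma=0.1, seed=0):
    """Return the data set extended with noisy copies of every example.

    Gaussian noise is an assumption made for illustration; the thesis may
    use a different corruption process.
    """
    rng = random.Random(seed)
    X_aug = [list(row) for row in X]  # keep the original examples first
    y_aug = list(y)
    for row, label in zip(X, y):
        for _ in range(copies):
            X_aug.append([v + rng.gauss(0.0, sigma) for v in row])
            y_aug.append(label)  # a corrupted copy keeps its label
    return X_aug, y_aug

X = [[1.0, 2.0], [3.0, 4.0]]
y = [0, 1]
X_aug, y_aug = augment_with_corrupted_copies(X, y, copies=2, sigma=0.1)
```

A classifier trained on `X_aug` sees each example several times with small variations, which is one common way to discourage it from fitting individual feature values too closely.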