8 research outputs found

A comparative evaluation of medium- and large-scale feature selectors for pattern classifiers

Authors: Kudo Mineichi; Sklansky Jack
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/1998

The need for feature selection in medium- and large-scale problems is increasing in many fields, including medicine and image processing. Previous comparative studies of feature selection algorithms are unsatisfactory in both problem size and choice of criterion function. In addition, no method has been shown for comparing algorithms with different objectives. In this study, we propose a unified way to compare a large variety of algorithms. Our results show that the sequential floating algorithms are promising for problems up to medium scale, and genetic algorithms for medium- and large-scale problems
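
For readers unfamiliar with the sequential floating algorithms named in this abstract, the following is a minimal sketch of sequential floating forward selection. It is not the authors' experimental code. The function name `sffs`, its parameters, and the toy criterion in the usage note are assumptions made for illustration; the criterion is any user-supplied function that scores a feature subset (higher is better).

```python
from typing import Callable, Dict, FrozenSet, Tuple

Criterion = Callable[[FrozenSet[int]], float]


def sffs(n_features: int, target_size: int, criterion: Criterion) -> FrozenSet[int]:
    """Sequential floating forward selection (illustrative sketch)."""
    current: FrozenSet[int] = frozenset()
    # Best subset and score seen so far for each subset size.
    best: Dict[int, Tuple[FrozenSet[int], float]] = {0: (current, float("-inf"))}

    while len(current) < target_size:
        # Forward step: add the single feature that maximizes the criterion.
        candidates = [current | {f} for f in range(n_features) if f not in current]
        current = max(candidates, key=criterion)
        score = criterion(current)
        size = len(current)
        if size not in best or score > best[size][1]:
            best[size] = (current, score)

        # Floating step: drop features while that strictly beats the best
        # subset previously recorded at the smaller size.
        while len(current) > 2:
            reduced = max((current - {f} for f in current), key=criterion)
            reduced_score = criterion(reduced)
            if reduced_score > best[len(reduced)][1]:
                current = reduced
                best[len(reduced)] = (reduced, reduced_score)
            else:
                break

    return best[target_size][0]
```

As a usage example under the same assumptions, with a toy additive criterion `lambda s: sum([1, 9, 5, 7][f] for f in s)`, the call `sffs(4, 2, criterion)` selects features 1 and 3. In practice the criterion would typically be a classifier's cross-validated accuracy or a class-separability measure.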

Institute of Mathematics AS CR, v. v. i.

A niching memetic algorithm for simultaneous clustering and feature selection

Authors: Fairhurst M; Liu X; Sheng W
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/07/2008

Clustering is inherently a difficult task, and is made even more difficult when the selection of relevant features is also an issue. In this paper we propose an approach for simultaneous clustering and feature selection using a niching memetic algorithm. Our approach (which we call NMA_CFS) makes feature selection an integral part of the global clustering search procedure and attempts to overcome the problem of identifying less promising locally optimal solutions in both clustering and feature selection, without making any a priori assumption about the number of clusters. Within the NMA_CFS procedure, a variable composite representation is devised to encode both feature selection and cluster centers with different numbers of clusters. Further, local search operations are introduced to refine feature selection and cluster centers encoded in the chromosomes. Finally, a niching method is integrated to preserve the population diversity and prevent premature convergence. In an experimental evaluation we demonstrate the effectiveness of the proposed approach and compare it with other related approaches, using both synthetic and real data

Brunel University Research Archive

IVFS: Simple and Efficient Feature Selection for High Dimensional Topology Preservation

Authors: Li Ping; Li Xiaoyun; Wu Chengxi
Publication date: 02/04/2020

Feature selection is an important tool for dealing with high-dimensional data. In the unsupervised case, many popular algorithms aim to maintain the structure of the original data. In this paper, we propose a simple and effective feature selection algorithm that enhances sample similarity preservation through a new perspective, topology preservation, which is represented by persistence diagrams from computational topology. The method is built on a unified feature selection framework called IVFS, which is inspired by the random subset method. The scheme is flexible and can handle cases where the problem is analytically intractable. The proposed algorithm preserves well both the pairwise distances and the topological patterns of the full data. We demonstrate that our algorithm can provide satisfactory performance under a sharp sub-sampling rate, which supports efficient implementation of the proposed method on large-scale datasets. Extensive experiments validate the effectiveness of the proposed feature selection scheme
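
The random-subset idea mentioned in this abstract can be illustrated with a rough sketch. This is not the authors' IVFS implementation. It assumes, for illustration only, that each feature is scored by the average loss of the random feature subsets that contain it, and it swaps the persistence-diagram loss for a plain pairwise-distance distortion. It also omits the sample sub-sampling the abstract mentions. All names (`random_subset_select`, `distortion`, `n_rounds`, `seed`) are hypothetical.

```python
import math
import random
from itertools import combinations


def pairwise_distances(points, features):
    """Euclidean distances between all point pairs, using only `features`."""
    return [
        math.sqrt(sum((p[f] - q[f]) ** 2 for f in features))
        for p, q in combinations(points, 2)
    ]


def distortion(points, features, full_distances):
    """Mean absolute gap between full-space and subset-space distances."""
    sub = pairwise_distances(points, features)
    return sum(abs(a - b) for a, b in zip(full_distances, sub)) / len(full_distances)


def random_subset_select(points, k, subset_size, n_rounds=200, seed=0):
    """Keep the k features whose random subsets distort distances least."""
    rng = random.Random(seed)
    n_features = len(points[0])
    full = pairwise_distances(points, range(n_features))
    totals = [0.0] * n_features
    counts = [0] * n_features
    for _ in range(n_rounds):
        subset = rng.sample(range(n_features), subset_size)
        loss = distortion(points, subset, full)
        for f in subset:  # credit the loss to every feature in the subset
            totals[f] += loss
            counts[f] += 1
    # Features never drawn get an infinite score so they are ranked last.
    scores = [totals[f] / counts[f] if counts[f] else math.inf
              for f in range(n_features)]
    ranked = sorted(range(n_features), key=lambda f: scores[f])
    return sorted(ranked[:k])
```

On a toy dataset whose last two coordinates are constant, for example the four corners of the unit square padded with two zero columns, the sketch keeps the two informative coordinates, since subsets containing them distort pairwise distances the least on average.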

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Methods, algorithms and software for extending the functions of CAD systems for models of real objects in augmented reality mode

Author: Циганок Євген
Publication date: 07/12/2020

Object of research: the process of demonstrating a three-dimensional model in augmented reality mode. Subject of research: methods for demonstrating a model created in the Autodesk Inventor CAD system in augmented reality mode. Aim of the master's thesis: to increase the efficiency of the Autodesk Inventor CAD system for demonstrating three-dimensional objects of any complexity in augmented reality mode. Research methods: in solving the stated tasks, the literature on the initial premises of the research was analyzed and synthesized. The scientific novelty of the thesis results lies in the fact that, for the first time, a system has been developed that allows three-dimensional models created in the Autodesk Inventor CAD system to be viewed. The practical value is that the application developed in this work allows the created models to be viewed in augmented reality mode and allows the product under development to be demonstrated at every design stage without the need to manufacture it, thereby reducing the costs of manufacturing and transporting the product

eLibrary National Mining University

A new approach of top-down induction of decision trees for knowledge discovery

Author: Lee Jun-Youl
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/2008

Top-down induction of decision trees is the most popular technique for classification in the field of data mining and knowledge discovery. Quinlan developed the basic induction algorithm for decision trees, ID3 (1984), and extended it to C4.5 (1993). Much research has dealt with single-attribute decision-making nodes (so-called first-order decisions) in decision trees. Murphy and Pazzani (1991) addressed multiple-attribute conditions at decision-making nodes. They show that higher-order decision-making generates smaller decision trees and better accuracy. However, there always exist NP-complete combinations of multiple-attribute decision-makings.
We develop a new algorithm for second-order decision-tree induction (SODI) for nominal attributes. The induction rules of first-order decision trees are combined by 'AND' logic only, but those of SODI consist of 'AND', 'OR', and 'OTHERWISE' logic. SODI generates more accurate results and smaller decision trees than any first-order decision-tree induction.
Quinlan used information gains via VC-dimension (Vapnik-Chervonenkis; Vapnik, 1995) for clustering the experimental values of each numerical attribute. However, many researchers have discovered weaknesses in the use of VC-dimension analysis. Bennett (1997) applies support vector machines (SVM) to decision tree induction in a sophisticated way. We suggest a heuristic algorithm (SVMM; SVM for Multi-category) that combines a TDIDT scheme with SVM. This thesis also addresses how to solve multiclass classification problems.
Our final goal for this thesis is IDSS (Induction of Decision Trees using SODI and SVMM). We address how to combine SODI and SVMM for the construction of top-down induction of decision trees in order to minimize the generalized penalty cost
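
The first-order, single-attribute split that this abstract contrasts with SODI can be illustrated with the information-gain measure commonly associated with ID3. The sketch below is a generic textbook formulation, not the thesis's SODI or SVMM algorithms. The function names and the dictionary-based row format are assumptions made for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction from splitting on one nominal attribute.

    `rows` is a list of dicts mapping attribute names to nominal values.
    """
    total = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    # Weighted average entropy of the partitions produced by the split.
    remainder = sum(len(part) / total * entropy(part) for part in partitions.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels, attributes):
    """Attribute a first-order tree would test at this node."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))
```

A first-order tree builder would call `best_attribute` at each node, split the rows by that attribute's values, and recurse on each partition. A second-order scheme would instead consider conditions over pairs of attributes, which this sketch does not attempt.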

Digital Repository @ Iowa State University (ISU)

ProQuest OAI Repository

Prediction and feature selection through local reliability analysis for the stock market, and its extension to classification and regression problems

Author: Martín Manso Ricardo
Publication date: 01/01/2017

This thesis belongs to Machine Learning, an area of Artificial Intelligence (AI). In the course of the work, new attribute selection and classification techniques were designed and validated experimentally. The motivation for developing these techniques is the desire to implement adequate tools for feature selection and classification problems in an area of particular difficulty: the stock market.
The work starts from the hypothesis that the factors that make correct data classification difficult are often an unfavorable ratio of information to noise, high dimensionality, a scarcity of training patterns, and class imbalance. Once these factors were identified, techniques robust to them were designed, specifically a feature selection algorithm (with different variants) and a classification algorithm. These techniques were validated on an exhaustive set of artificially generated problems and on real stock market problems. Finally, the possibility of using the new feature selection techniques in conventional problems was explored. To this end, they were validated on a set of real-world domains commonly used in Machine Learning, for both classification and regression.
Official Doctoral Program in Computer Science and Technology. President: Pedro Isasi Viñuela.- Secretary: David Camacho Fernández.- Member: Sonia Schulenbur

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Universidad Carlos III de Madrid e-Archivo

A new feature selection algorithm: application to the automatic reading of handwriting

Author: Grandidier Frédéric
Publication venue: École de technologie supérieure

This thesis addresses off-line handwriting recognition, with automatic mail sorting as its industrial application. The Service de Recherche Technique de La Poste (France) commissioned us to improve its handwriting recognition system. An in-depth analysis of the existing system identified one main research direction: improving the representation of the information supplied to the recognition system. This representation is characterized by two finite sets of primitives, which are combined by means of a Cartesian product before being integrated into the system. Improving the representation of information requires extracting new primitives. To this end, three new representation spaces were developed. A vector quantization algorithm is used to build several sets of primitives. To increase their discriminative power, several strategies were evaluated: linear discriminant analysis, the zoning technique, and, in association with the latter, a zone-weighting strategy. Combining the representation spaces with the improvement strategies led to several recognition systems that outperform the baseline system. The technique used to combine the sets of primitives in the baseline system cannot be used. A new algorithm was therefore developed to integrate new sets of primitives. The basic idea is to replace the least discriminative primitives of an initial set with new ones. A strategy that groups non-discriminative primitives decomposes the overall recognition task into subproblems. The definition and dynamic selection of new primitives is then guided by this decomposition.
Applying the algorithm yields an improved representation of information characterized by a hierarchy of primitives. Because it runs automatically, it adapts quickly to new data or to the availability of a new representation space. The baseline system, which combines two sets of primitives, achieves a performance of 89.5% with a lexicon of size 1,000. Improving one of the two sets leads to a performance of 94.3%, while reducing the number of primitives used by 20%

Espace ÉTS