    Data Set Editing by Ordered Projection

    This paper presents a new approach to data set editing. The algorithm, EOP (Editing by Ordered Projection), has three notable characteristics: a substantial reduction in the number of examples in the database; a lower computational cost, O(mn log n), than other typical algorithms, because it performs no distance calculations; and conservation of the decision boundaries, especially from the point of view of axis-parallel classifiers. The performance of EOP is analysed in two ways: percentage of reduction and classification accuracy. EOP is compared with IB2, ENN and SHRINK in terms of percentage of reduction and computational cost. In addition, we analyse the accuracy of k-NN and C4.5 after applying the reduction techniques. An extensive empirical study using databases with continuous attributes from the UCI repository shows that EOP is a valuable preprocessing method for the later application of any axis-parallel learning algorithm.
    Comisión Interministerial de Ciencia y Tecnología TIC2001-1143-C03-0

    A Measure for Data Set Editing by Ordered Projections

    In this paper we study a measure, named the weakness of an example, which lets us establish how important an example is for finding representative patterns in the data set editing problem. Our approach consists of reducing the database size without losing information, using the algorithm Patterns by Ordered Projections. The idea is to relax the reduction factor with a new parameter, λ, removing every example in the database whose weakness satisfies a condition on λ. We study how to establish this new parameter. Our experiments have been carried out using all databases from the UCI Repository, and they show that a size reduction is possible in complex databases without a notable increase in the error rate.
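
    The abstract does not define weakness or the condition on λ, so the following is only a minimal sketch under stated assumptions: an example is taken to be "interior" along an attribute when, after sorting on that attribute, both of its neighbours carry the same class label; its weakness is the number of attributes along which it is interior; and an example is removed when its weakness is at least λ. The names `weakness`, `edit` and `lam` are hypothetical, and tied attribute values are not treated specially. The sketch does show why an ordered-projection approach needs one sort per attribute and no distance calculations.

```python
from typing import List, Sequence


def weakness(X: Sequence[Sequence[float]], y: Sequence[int]) -> List[int]:
    """Per example, count the attributes along which it is 'interior'.

    ASSUMPTION: 'interior' means both neighbours in the sorted projection
    share the example's class label. The first and last example of each
    projection are always treated as borders.
    """
    n = len(X)
    m = len(X[0]) if n else 0
    w = [0] * n
    for j in range(m):
        # One sort per attribute: O(n log n) each, O(mn log n) overall.
        order = sorted(range(n), key=lambda i: X[i][j])
        for pos in range(1, n - 1):
            i = order[pos]
            if y[order[pos - 1]] == y[i] == y[order[pos + 1]]:
                w[i] += 1
    return w


def edit(X: Sequence[Sequence[float]], y: Sequence[int], lam: int) -> List[int]:
    """Return indices of the examples kept.

    ASSUMPTION: an example is removed when its weakness is >= lam.
    """
    w = weakness(X, y)
    return [i for i in range(len(X)) if w[i] < lam]


if __name__ == "__main__":
    X = [[1, 1], [2, 2], [3, 3], [4, 4], [5, 5], [6, 6]]
    y = [0, 0, 0, 1, 1, 1]
    print(weakness(X, y))
    print(edit(X, y, lam=2))
```

    Under these assumptions, setting `lam` to the number of attributes removes only examples that are interior in every projection, and lowering it removes more.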

    Facing-up Challenges of Multiobjective Clustering Based on Evolutionary Algorithms: Representations, Scalability and Retrieval Solutions

    This thesis focuses on multiobjective clustering algorithms, which optimize several objectives simultaneously and return a collection of potential solutions with different trade-offs among the objectives. The goal of the thesis is to design and implement a new multiobjective clustering technique based on evolutionary algorithms in order to address three current challenges related to these techniques. The first challenge is to define properly the area of possible solutions that is explored in search of the best solution, which depends on the knowledge representation. The second challenge is to scale up the system by splitting the original data set into several subsets, so that the clustering process works with less data. The third challenge is to retrieve the most suitable solution, according to the quality and shape of the clusters, from the most interesting region of the collection of solutions returned by the algorithm.
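
    The "collection of potential solutions with different trade-offs" can be read as a set of non-dominated solutions. The sketch below is not the thesis's algorithm: it only illustrates that idea, assuming every objective is to be minimised and that each candidate clustering has already been scored as a tuple of objective values. The function names and the sample scores are hypothetical.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b on every objective and strictly
    better on at least one (all objectives minimised)."""
    return all(x <= z for x, z in zip(a, b)) and any(x < z for x, z in zip(a, b))


def pareto_front(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep the points that no other point dominates (O(n^2) comparisons)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    # Hypothetical (objective_1, objective_2) scores for five candidate clusterings.
    scores = [(1, 5), (2, 2), (3, 3), (5, 1), (4, 4)]
    print(pareto_front(scores))
```

    Choosing one solution from such a front is the retrieval problem that the third challenge describes.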