
    New Hybrid Learning Models for Multi-Label Classification and Ranking

    In the last decade, multi-label learning has become an important area of research due to the large number of real-world problems that contain multi-label data. This doctoral thesis focuses on the multi-label learning paradigm and studies two problems: first, improving the performance of algorithms on complex multi-label data, and second, improving performance by exploiting unlabeled data. The first problem was addressed by means of feature estimation methods. The effectiveness of the proposed feature estimation methods was evaluated by improving the performance of multi-label lazy algorithms: parametrizing the distance functions with a weight vector allowed examples with label sets relevant for classification to be retrieved. The effectiveness of the feature estimation methods was also demonstrated in the feature selection task. In addition, a lazy algorithm based on a data gravitation model was proposed; it achieves a good trade-off between effectiveness and efficiency on complex multi-label data. The second problem was addressed by means of active learning techniques, which reduce the cost of labeling data and of training an accurate model. Two active learning strategies were proposed. The first strategy solves the multi-label active learning problem effectively and efficiently by defining and combining two measures that represent the utility of an unlabeled example. The second strategy addresses the batch-mode multi-label active learning problem, where the aim is to select a batch of unlabeled examples that are informative while carrying minimal redundant information; this was formulated as a multi-objective optimization problem over three measures and solved with an evolutionary algorithm. As complementary results, this thesis produced a computational framework that eases the implementation of active learning methods and experimentation in this area, and it proposed an evaluation methodology based on non-parametric tests that assesses active learning performance in a more adequate and robust way than is common in the literature. All proposed methods were evaluated in an adequate experimental framework: numerous multi-label datasets from different domains were used, and the methods were compared against the most significant state-of-the-art algorithms. The results, validated with non-parametric statistical tests, show the effectiveness of the proposed methods and confirm the hypotheses formulated in this thesis.
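
    As a rough illustration of one idea in the abstract, the sketch below shows how a weight vector produced by a feature estimation method can parametrize the distance function of a multi-label lazy (k-nearest-neighbour) learner. It is a minimal sketch, not the thesis's algorithms: the function name, the 0.5 decision threshold and the toy data are assumptions made for illustration.

```python
import numpy as np

def weighted_knn_multilabel(X_train, Y_train, x, w, k=5, threshold=0.5):
    """Multi-label k-NN with a feature-weighted Euclidean distance.

    X_train: (n, d) features; Y_train: (n, q) binary label matrix; x: (d,) query;
    w: (d,) non-negative feature weights (e.g. from a feature estimation method).
    The 0.5 threshold is an illustrative choice, not a prescribed value.
    """
    dist = np.sqrt(((X_train - x) ** 2 * w).sum(axis=1))  # weighted distances
    nearest = np.argsort(dist)[:k]                        # k closest examples
    scores = Y_train[nearest].mean(axis=0)                # per-label neighbour vote
    return (scores >= threshold).astype(int), scores

# Toy usage: 4 examples, 3 features, 2 labels.
X = np.array([[0.1, 0.2, 0.9], [0.0, 0.3, 0.8], [0.9, 0.8, 0.1], [1.0, 0.7, 0.2]])
Y = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
w = np.array([1.0, 0.5, 2.0])  # hypothetical weights from a feature estimation step
print(weighted_knn_multilabel(X, Y, np.array([0.05, 0.25, 0.85]), w, k=2))
```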

    An Optimisation-Driven Prediction Method for Automated Diagnosis and Prognosis

    This article presents a novel hybrid classification paradigm for predicting medical diagnoses and prognoses. The core mechanism of the proposed method relies on a centroid classification algorithm whose logic is exploited to formulate the classification task as a real-valued optimisation problem. A novel metaheuristic combining the algorithmic structure of Swarm Intelligence optimisers with the probabilistic search models of Estimation of Distribution Algorithms is designed to optimise this problem, thus leading to high-accuracy predictions. The method is tested on 11 medical datasets and compared against 14 carefully selected classification algorithms. Results show that the proposed approach is competitive with, and on several occasions superior to, the state of the art.
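
    To make the core mechanism concrete, the sketch below treats the class centroids of a nearest-centroid classifier as one real-valued vector and minimises the training misclassification rate over it. SciPy's differential evolution is used here purely as a stand-in optimiser for the article's bespoke swarm/EDA hybrid, and the toy data, bounds and settings are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import differential_evolution

def make_objective(X, y, n_classes):
    """Objective: misclassification rate of a nearest-centroid rule whose
    centroids are decoded from a flat real-valued vector."""
    d = X.shape[1]
    def objective(flat_centroids):
        C = flat_centroids.reshape(n_classes, d)              # one centroid per class
        dists = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        preds = dists.argmin(axis=1)                          # nearest centroid wins
        return (preds != y).mean()
    return objective

# Toy usage: two Gaussian blobs, two classes, 2-D features.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(4, 1, (30, 2))])
y = np.array([0] * 30 + [1] * 30)
bounds = [(-2.0, 6.0)] * (2 * 2)  # n_classes * n_features coordinates
result = differential_evolution(make_objective(X, y, 2), bounds, seed=0)
print(result.x.reshape(2, 2), result.fun)
```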

    Passively mode-locked laser using an entirely centred erbium-doped fiber

    This paper describes the setup and experimental results for an entirely centred erbium-doped fiber laser with passively mode-locked output. The gain medium of the ring laser cavity configuration comprises a 3 m length of two-core optical fiber, wherein an undoped outer core region of 9.38 μm diameter surrounds a 4.00 μm diameter central core region doped with erbium ions at 400 ppm concentration. The generated stable soliton mode-locking output has a central wavelength of 1533 nm and pulses that yield an average output power of 0.33 mW with a pulse energy of 31.8 pJ. The pulse duration is 0.7 ps and the measured output repetition rate of 10.37 MHz corresponds to a 96.4 ns pulse spacing in the pulse train.
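
    The reported pulse energy and pulse spacing follow directly from the quoted average power and repetition rate; the short check below recomputes them from the figures given in the abstract.

```python
# Consistency check of the reported mode-locked laser figures.
P_avg = 0.33e-3   # average output power, W
f_rep = 10.37e6   # repetition rate, Hz

pulse_energy = P_avg / f_rep   # energy per pulse, J
pulse_spacing = 1.0 / f_rep    # time between pulses, s

print(f"pulse energy  ~ {pulse_energy * 1e12:.1f} pJ")  # ~31.8 pJ, as reported
print(f"pulse spacing ~ {pulse_spacing * 1e9:.1f} ns")  # ~96.4 ns, as reported
```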

    Scalable Multi-label Classification

    Multi-label classification is relevant to many domains, such as text, image and other media, and bioinformatics. Researchers have already noticed that in multi-label data correlations exist between labels, and a variety of approaches, drawing inspiration from many spheres of machine learning, have been able to model these correlations. However, data sources from the real world are growing ever larger, and the multi-label task is particularly sensitive to this due to the complexity associated with multiple labels and the correlations between them. Consequently, many methods do not scale up to large problems. This thesis deals with scalable multi-label classification: methods which exhibit high predictive performance but are also able to scale up to larger problems. The first major contribution is the pruned sets method, which models label correlations directly for high predictive performance, but reduces overfitting and complexity relative to related methods by pruning and subsampling label sets, and can thus scale up to larger datasets. The second major contribution is the classifier chains method, which models correlations with a chain of binary classifiers; the use of binary models allows scalability to even larger datasets. Pruned sets and classifier chains are robust with respect to both the variety and the scale of data they can deal with, and can be incorporated into other methods. In an ensemble scheme, these methods compete with state-of-the-art methods in terms of predictive performance and scale up to large datasets of hundreds of thousands of training examples. This thesis also puts a special emphasis on multi-label evaluation, introducing a new evaluation measure and studying threshold calibration. With one of the largest and most varied collections of multi-label datasets in the literature, extensive experimental evaluation shows the advantage of these methods in terms of predictive performance as well as computational efficiency and scalability.
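
    The classifier-chains idea described above can be sketched in a few lines: one binary model per label, each trained on the original features augmented with the labels earlier in the chain. The sketch below uses scikit-learn's LogisticRegression as the base binary learner purely as an assumption for illustration; the pruning, ensembling and threshold-calibration machinery of the thesis is omitted.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

class SimpleClassifierChain:
    """Minimal classifier-chain sketch (not the thesis's implementation)."""

    def __init__(self):
        self.models = []

    def fit(self, X, Y):
        augmented = X
        for j in range(Y.shape[1]):
            clf = LogisticRegression(max_iter=1000)
            clf.fit(augmented, Y[:, j])            # binary model for label j
            self.models.append(clf)
            # The true label becomes an extra feature for the next link.
            augmented = np.hstack([augmented, Y[:, j:j + 1]])
        return self

    def predict(self, X):
        augmented = X
        preds = []
        for clf in self.models:
            p = clf.predict(augmented).reshape(-1, 1)
            preds.append(p)
            augmented = np.hstack([augmented, p])  # propagate predictions down the chain
        return np.hstack(preds)

# Toy usage: 4 examples, 2 features, 2 labels.
X = np.array([[0.0, 1.0], [0.1, 0.9], [1.0, 0.1], [0.9, 0.0]])
Y = np.array([[1, 0], [1, 1], [0, 1], [0, 0]])
print(SimpleClassifierChain().fit(X, Y).predict(X))
```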

    Machine learning applied to crime prediction

    Machine Learning is a cornerstone of artificial intelligence and big data analysis. It provides powerful algorithms that are capable of recognizing patterns, classifying data and, essentially, learning by themselves to perform a specific task. The field has grown enormously in popularity in recent years, yet it remains unknown to the majority of people and even to many professionals. This project aims to provide an understandable explanation of what it is, what types of machine learning exist and what it can be used for, and to solve a real data classification problem (classifying San Francisco crimes) using different algorithms, such as K-Nearest Neighbours, Parzen windows and Neural Networks, as an introduction to the field.
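
    As a concrete taste of one of the listed algorithms, the sketch below implements a Gaussian Parzen-window classifier on synthetic data: each class is scored by the summed kernel density its training points place on a test point. The bandwidth, the two-class toy data and the feature space are assumptions for illustration, not the project's actual San Francisco crime setup.

```python
import numpy as np

def parzen_window_classify(X_train, y_train, X_test, h=0.5):
    """Gaussian Parzen-window classifier sketch; h is the kernel bandwidth."""
    classes = np.unique(y_train)
    scores = []
    for c in classes:
        Xc = X_train[y_train == c]
        # Squared distances from each test point to the training points of class c.
        d2 = ((X_test[:, None, :] - Xc[None, :, :]) ** 2).sum(axis=2)
        scores.append(np.exp(-d2 / (2 * h ** 2)).sum(axis=1))
    return classes[np.argmax(np.stack(scores, axis=1), axis=1)]

# Toy usage with two synthetic classes in a 2-D feature space.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
print(parzen_window_classify(X, y, np.array([[0.2, -0.1], [2.8, 3.1]])))
```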

    A triple-random ensemble classification method for mining multi-label data

    This paper presents a triple-random ensemble learning method for handling multi-label classification problems. The proposed method integrates and develops the concepts of random subspace, bagging and random k-label sets ensemble learning to form an approach for classifying multi-label data. It applies the random subspace method to the feature space, the label space and the instance space. The devised subset selection procedure is executed iteratively, and each multi-label classifier is trained using the randomly selected subsets. At the end of the iterations, optimal parameters are selected and the ensemble of multi-label classifiers is constructed. The proposed method is implemented and its performance is compared against that of popular multi-label classification methods. The experimental results reveal that the proposed method outperforms the examined counterparts on most occasions when tested on six small to large multi-label datasets from different domains, demonstrating that the developed method possesses general applicability to a variety of multi-label classification problems.
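
    The triple-random idea, i.e. randomising over instances, features and labels at once, can be illustrated with a single subset draw. The sketch below shows one such draw; the sampling fractions are assumptions, and the base multi-label classifier trained on each draw and the vote aggregation across the ensemble are omitted.

```python
import numpy as np

def triple_random_draw(X, Y, rng, inst_frac=0.7, feat_frac=0.5, label_frac=0.5):
    """One 'triple-random' draw: random subsets of instances, features and labels."""
    n, d = X.shape
    q = Y.shape[1]
    inst = rng.choice(n, size=max(1, int(inst_frac * n)), replace=True)      # bagging
    feats = rng.choice(d, size=max(1, int(feat_frac * d)), replace=False)    # random subspace
    labels = rng.choice(q, size=max(1, int(label_frac * q)), replace=False)  # random label subset
    return X[np.ix_(inst, feats)], Y[np.ix_(inst, labels)], feats, labels

# Toy usage: an ensemble would repeat the draw, train one base multi-label
# classifier per draw, and aggregate the label votes at prediction time.
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 6))
Y = rng.integers(0, 2, size=(10, 4))
Xs, Ys, feats, labels = triple_random_draw(X, Y, rng)
print(Xs.shape, Ys.shape, feats, labels)
```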

    Multi-Instance Multi-Label Learning

    In this paper, we propose the MIML (Multi-Instance Multi-Label learning) framework, where an example is described by multiple instances and associated with multiple class labels. Compared to traditional learning frameworks, the MIML framework is more convenient and natural for representing complicated objects that have multiple semantic meanings. To learn from MIML examples, we propose the MimlBoost and MimlSvm algorithms based on a simple degeneration strategy, and experiments show that solving problems involving complicated objects with multiple semantic meanings in the MIML framework can lead to good performance. Since the degeneration process may lose information, we propose the D-MimlSvm algorithm, which tackles MIML problems directly in a regularization framework. Moreover, we show that MIML is useful even when we do not have access to the real objects and thus cannot capture more information from them through the MIML representation. We propose the InsDif and SubCod algorithms: InsDif works by transforming single instances into the MIML representation for learning, while SubCod works by transforming single-label examples into the MIML representation for learning. Experiments show that in some tasks they achieve better performance than learning from the single-instance or single-label examples directly.
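
    The degeneration strategy mentioned above can be pictured as follows: an MIML dataset (each example is a bag of instances with a binary label vector) is split into one single-label problem per label, and each bag is collapsed to a single feature vector. The sketch below uses mean pooling for that last step as a deliberate simplification; it is not the paper's MimlBoost or MimlSvm procedure (MimlSvm, for instance, uses constructive clustering rather than mean pooling).

```python
import numpy as np

def degenerate_miml(bags, Y):
    """Toy degeneration of an MIML dataset: mean-pool each bag into one vector,
    then emit one binary single-instance task per label column."""
    X = np.vstack([bag.mean(axis=0) for bag in bags])     # crude bag-level representation
    return [(X, Y[:, j]) for j in range(Y.shape[1])]      # one (X, y) task per label

# Toy MIML data: 3 bags with varying numbers of 2-D instances, 2 labels.
bags = [np.array([[0.1, 0.9], [0.2, 0.8]]),
        np.array([[0.9, 0.1]]),
        np.array([[0.5, 0.5], [0.6, 0.4], [0.4, 0.6]])]
Y = np.array([[1, 0], [0, 1], [1, 1]])
for j, (X, y) in enumerate(degenerate_miml(bags, Y)):
    print(f"label {j}: X shape {X.shape}, y = {y}")
```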