    KFHE-HOMER: A multi-label ensemble classification algorithm exploiting sensor fusion properties of the Kalman filter

    Multi-label classification allows a datapoint to be labelled with more than one class at the same time. Although ensemble methods have been successful in multi-class classification, ensemble methods based on approaches other than bagging have not been widely explored for multi-label classification. The Kalman Filter-based Heuristic Ensemble (KFHE) is a recent ensemble method that exploits the sensor fusion properties of the Kalman filter to combine several classifier models, and it has been shown to be very effective. This article proposes KFHE-HOMER, an extension of the KFHE ensemble approach to the multi-label domain. KFHE-HOMER sequentially trains multiple HOMER multi-label classifiers and aggregates their outputs using the sensor fusion properties of the Kalman filter. Experiments described in this article show that KFHE-HOMER performs consistently better than existing multi-label methods, including existing ensemble-based approaches. Comment: The paper is under consideration at Pattern Recognition Letters, Elsevier.
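The sensor-fusion idea can be illustrated with a generic scalar Kalman measurement update. This is a minimal sketch under stated assumptions, not KFHE-HOMER itself: it assumes each model's per-label score vector is a noisy measurement of the true scores, with a known (hypothetical) noise variance per model, and it omits how the published method trains models sequentially or estimates those variances. The function name `kalman_fuse` is illustrative.

```python
from typing import List, Tuple


def kalman_fuse(measurements: List[List[float]],
                variances: List[float]) -> Tuple[List[float], float]:
    """Fuse per-label score vectors from several models, one at a time.

    Illustrative assumption: each model's output is a noisy measurement of
    the true label-score vector, with one scalar noise variance per model.
    The state is static (no process model), so only measurement updates run.
    """
    if not measurements or len(measurements) != len(variances):
        raise ValueError("need at least one measurement and one variance per measurement")
    # Initialise the estimate with the first model's scores and variance.
    state = list(measurements[0])
    p = variances[0]
    for z, r in zip(measurements[1:], variances[1:]):
        k = p / (p + r)  # Kalman gain: how much to trust the new measurement
        state = [x + k * (zi - x) for x, zi in zip(state, z)]
        p = (1.0 - k) * p  # posterior variance shrinks after each update
    return state, p
```

With a static state and no process noise, these sequential updates yield the inverse-variance weighted mean of the measurements, so lower-variance (more trusted) models pull the estimate harder; with equal variances it reduces to a plain average. Thresholding the fused scores (for example at an assumed cut-off of 0.5) would then give a predicted label set.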

    A Majority Vote Based Classifier Ensemble for Web Service Classification

    Service-oriented architecture (SOA) is the glue that allows web applications to work together, and it has become a driving force for the service-oriented computing (SOC) paradigm. In heterogeneous environments, the SOC paradigm uses web services as the basic building block to support low-cost, easy, and rapid composition of distributed applications. A web service exposes its interfaces using the Web Service Description Language (WSDL). Service providers publish and register their web services in a central repository called Universal Description, Discovery and Integration (UDDI), and web service consumers use UDDI registries to locate the web services they require and their associated metadata. Manually analyzing WSDL documents is the best approach, but also the most expensive. Various approaches to automating the classification of web services have been explored; however, previous research has focused on using a single classification technique. This paper focuses on classifying web services with a majority-vote-based classifier ensemble. The ensemble model overcomes the limitations of conventional techniques by combining three heterogeneous classifiers: Naïve Bayes, decision tree (J48), and Support Vector Machines. We applied tenfold cross-validation to evaluate the model on a publicly available dataset of 3738 real-world web services categorized into 5 fields, obtaining an average accuracy of 92%. The high accuracy is attributed to two main factors: enhanced pre-processing with focused feature selection, and majority-vote-based ensemble classification.
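Hard majority voting over heterogeneous classifiers can be sketched in a few lines. The three classifiers are represented here by hypothetical callables standing in for Naïve Bayes, J48, and an SVM; training them, and the pre-processing and feature selection the abstract credits for the accuracy, are outside this sketch.

```python
from collections import Counter
from typing import Callable, Hashable, List, Sequence

# Hypothetical interface: a trained classifier maps a feature vector to a label.
Classifier = Callable[[Sequence[float]], Hashable]


def majority_vote(classifiers: List[Classifier], x: Sequence[float]) -> Hashable:
    """Return the label predicted by the most classifiers for input x."""
    votes = Counter(clf(x) for clf in classifiers)
    # most_common(1) returns [(label, count)]; among equal counts,
    # the label encountered first wins.
    label, _ = votes.most_common(1)[0]
    return label
```

With three voters, a tie is possible only when all three disagree (which requires at least three classes); `Counter.most_common` then returns the label encountered first, so a deliberate tie-breaking rule (for example, preferring the classifier with the best validation accuracy) is worth choosing explicitly.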

    Multi-label learning by extended multi-tier stacked ensemble method with label correlated feature subset augmentation

    Classification is one of the most basic and important operations in data science and machine learning applications. Multi-label classification extends the multi-class problem: in a multi-class problem a single class label is associated with each instance, whereas in multi-label classification a set of class labels is associated with each instance. Many stacked ensemble methods have been proposed, but because of the complexity of multi-label problems there is still considerable scope for improving prediction accuracy. In this paper, we propose a novel extended multi-tier stacked ensemble (EMSTE) method that selects label-correlated feature subsets and then augments the intermediate dataset with those feature subsets when it is constructed, in order to improve prediction accuracy in the generalization phase of stacking. The performance of the proposed method was compared with that of existing methods, and the results show that it outperforms them.
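The step of building the intermediate (level-1) dataset in stacking can be sketched as below. The abstract does not specify how EMSTE selects label-correlated feature subsets or how its tiers are arranged, so this sketch simply assumes the selected subset is given as a list of column indices (`feature_subset`, a hypothetical parameter) and concatenates those features with each base model's 0/1 label predictions.

```python
from typing import Callable, List, Sequence

# Hypothetical interface: a base multi-label model returns a 0/1 vector over labels.
BaseModel = Callable[[Sequence[float]], List[int]]


def build_meta_dataset(X: List[List[float]],
                       base_models: List[BaseModel],
                       feature_subset: Sequence[int]) -> List[List[float]]:
    """Build level-1 rows: selected original features + base-model predictions."""
    meta = []
    for row in X:
        # Keep only the (assumed, pre-selected) feature columns.
        augmented = [row[j] for j in feature_subset]
        # Append every base model's label predictions as extra features.
        for model in base_models:
            augmented.extend(float(p) for p in model(row))
        meta.append(augmented)
    return meta
```

In standard stacking, the base-model predictions used to build the level-1 training set are usually produced out-of-fold (via cross-validation), so that the meta-learner is not trained on predictions for data the base models have already seen; this sketch leaves that choice to the caller.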

    Multi-label classification models for heterogeneous data: an ensemble-based approach (Modelos de clasificación multi-etiqueta para datos heterogéneos: un enfoque basado en ensembles)

    In recent years, the multi-label classification task has gained the attention of the scientific community given its ability to solve real-world problems in which each instance of the dataset may be associated with several class labels simultaneously. For example, in medical problems each patient may be affected by several diseases at the same time, and in multimedia categorization problems each item might be related to different tags or topics. Given the nature of these problems, treating them as traditional classification problems, in which just one class label is assigned to each instance, would lead to a loss of information. However, having more than one label associated with each instance raises new classification challenges that must be addressed, such as modeling the compound dependencies among labels, the imbalance of the label space, and the high dimensionality of the output space. A large number of methods for multi-label classification have been proposed in the literature, including several ensemble-based methods. Ensemble learning combines the outputs of many diverse base models in order to outperform each of the individual members. In multi-label classification, ensemble methods are those that combine the predictions of several multi-label classifiers, and these methods have been shown to outperform simpler multi-label classifiers. Therefore, given their strong performance, we focused our research on ensemble-based methods for multi-label classification. The first objective of this dissertation is to perform a thorough review of the state-of-the-art ensembles of multi-label classifiers.
Its aim is twofold: (I) to study the different ensembles of multi-label classifiers proposed in the literature and categorize them according to their characteristics, proposing a novel taxonomy; and (II) to perform an experimental study to find the method or family of methods that performs best depending on the characteristics of the data, and then to provide guidelines for selecting the best method according to the characteristics of a given problem. Since most ensemble methods for multi-label classification create diverse members by randomly selecting instances, input features, or labels, our second and main objective is to propose novel ensemble methods for multi-label classification that take the characteristics of the data into account. For this purpose, we first propose an evolutionary algorithm able to build an ensemble of multi-label classifiers, in which each individual of the population is an entire ensemble. This approach models the relationships among labels while keeping both the complexity and the imbalance of the output space relatively low, and it also uses these characteristics to guide the learning process. Furthermore, it searches for an optimal ensemble structure considering not only its predictive performance but also the number of times each label appears in it. In this way, all labels are expected to appear a similar number of times in the ensemble, so that none is neglected regardless of its frequency. We then develop a second evolutionary algorithm able to build ensembles of multi-label classifiers, but in this case each individual of the population is a candidate member of the ensemble rather than the entire ensemble. Evolving the ensemble members separately makes the algorithm less computationally complex and allows the quality of each member to be determined separately. However, a method for selecting the ensemble members must then be defined.
This process selects classifiers that are both accurate and diverse to form the ensemble, while also ensuring that all labels appear a similar number of times in the final ensemble. In all experimental studies, the methods are compared using rigorous experimental setups and statistical tests over many evaluation metrics and reference datasets in multi-label classification. The experiments confirm that the proposed methods obtain significantly better and more consistent performance than state-of-the-art methods in multi-label classification. Furthermore, the second proposal is shown to be more efficient than the first, because its individuals are separate classifiers rather than entire ensembles.
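Two label-balance ideas in this abstract, counting how often each label appears across ensemble members and combining members that each predict only a subset of labels, can be illustrated with a generic per-label voting scheme similar in spirit to RAkEL. This is an assumption-laden sketch, not the evolutionary algorithms proposed in the dissertation: each member is assumed to output 0/1 votes for its own label subset, and a label is predicted when the average vote among the members covering it reaches a threshold (0.5 here, an assumed default). Both function names are hypothetical.

```python
from typing import List, Sequence, Tuple

# One member's output for one instance: (label indices it covers, 0/1 votes for them).
Member = Tuple[Sequence[int], Sequence[int]]


def label_coverage(member_labels: List[Sequence[int]], n_labels: int) -> List[int]:
    """Count how many ensemble members include each label."""
    counts = [0] * n_labels
    for labels in member_labels:
        for j in labels:
            counts[j] += 1
    return counts


def combine_votes(member_outputs: List[Member], n_labels: int,
                  threshold: float = 0.5) -> List[int]:
    """Predict each label by averaging the votes of the members that cover it."""
    votes = [0] * n_labels
    counts = [0] * n_labels
    for labels, preds in member_outputs:
        for j, p in zip(labels, preds):
            votes[j] += p
            counts[j] += 1
    # A label covered by no member gets no votes and defaults to 0 (negative).
    return [1 if c and votes[j] / c >= threshold else 0
            for j, c in enumerate(counts)]
```

`label_coverage` makes the balance criterion concrete: an ensemble in which every label has a similar count meets the stated goal, whereas a count of zero means that label can never be predicted.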

    Fast solar image classification using deep learning and its importance for automation in solar physics

    The volume of data being collected in solar physics has increased exponentially over the past decade, and with the introduction of the Daniel K. Inouye Solar Telescope (DKIST) we will be entering the age of petabyte-scale solar data. Automated feature detection will be an invaluable tool for post-processing solar images to create catalogues of data ready for researchers to use. We propose a deep learning model to accomplish this: a deep convolutional neural network, which is adept at extracting features and processing images quickly. We train our network using Hinode/Solar Optical Telescope (SOT) Hα images of a small subset of solar features with different geometries: filaments, prominences, flare ribbons, sunspots, and the quiet Sun (i.e. the absence of any of the other four features). We achieve near-perfect performance (≈99.9%) classifying unseen images from SOT, in 4.66 seconds. We also explore, for the first time, transfer learning in a solar context. Transfer learning uses pre-trained deep neural networks to help train new deep learning models, i.e. an existing trained model helps teach a new one. We show that our network is robust to changes in resolution: degrading images from SOT resolution (≈0.33′′ at λ = 6563 Å) to Solar Dynamics Observatory/Atmospheric Imaging Assembly (SDO/AIA) resolution (≈1.2′′) does not change its performance. However, we also observe cases where the network fails to generalise: to sunspots in the SDO/AIA 1600/1700 Å bands, owing to small-scale brightenings around the sunspots, and to prominences in SDO/AIA 304 Å, owing to coronal emission.
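The resolution-degradation test can be approximated by block averaging. The abstract does not state how the images were degraded, so this is an assumption: the ratio 1.2′′/0.33′′ ≈ 3.6 is rounded to an integer factor of 4, and non-overlapping 4 × 4 pixel blocks are averaged. A faithful simulation would also model the target instrument's point-spread function, which this sketch ignores; `block_average` is a hypothetical helper.

```python
from typing import List


def block_average(image: List[List[float]], factor: int) -> List[List[float]]:
    """Downsample a 2D image by averaging non-overlapping factor x factor blocks.

    Trailing rows/columns that do not fill a whole block are dropped.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    h = len(image) // factor
    w = len(image[0]) // factor if image else 0
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            block = [image[i * factor + di][j * factor + dj]
                     for di in range(factor) for dj in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


# Approximate SOT -> SDO/AIA plate-scale ratio, rounded to an integer factor.
FACTOR = round(1.2 / 0.33)
```

Here `FACTOR` evaluates to 4; applying `block_average(image, FACTOR)` to an SOT-resolution array gives a coarser image that could then be fed to the trained network to check whether its predictions change.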