125 research outputs found

    Semi-supervised Predictive Clustering Trees for (Hierarchical) Multi-label Classification

    Semi-supervised learning (SSL) is a common approach to learning predictive models using not only labeled examples, but also unlabeled examples. While SSL for the simple tasks of classification and regression has received a lot of attention from the research community, it has not been properly investigated for complex prediction tasks with structurally dependent variables. This is the case for multi-label classification and hierarchical multi-label classification tasks, which may require additional information, possibly coming from the underlying distribution in the descriptive space provided by unlabeled examples, to better face the challenging task of simultaneously predicting multiple class labels. In this paper, we investigate this aspect and propose a (hierarchical) multi-label classification method based on semi-supervised learning of predictive clustering trees. We also extend the method towards ensemble learning and propose a method based on the random forest approach. Extensive experimental evaluation conducted on 23 datasets shows significant advantages of the proposed method and its extension with respect to their supervised counterparts. Moreover, the method preserves interpretability and reduces the time complexity of classical tree-based models.

    Disentangling diatom species complexes: does morphometry suffice?

    Accurate taxonomic resolution in light microscopy analyses of microalgae is essential to achieve high quality, comparable results in both floristic analyses and biomonitoring studies. A number of closely related diatom taxa have been detected to date co-occurring within benthic diatom assemblages, sharing many morphological, morphometrical and ecological characteristics. In this contribution, we tested the hypothesis that, where a large sample size (number of individuals) is available, common morphometrical parameters (valve length, width and stria density) are sufficient to achieve a correct identification to the species level. We focused on some common diatom taxa belonging to the genus Gomphonema. More than 400 valves and frustules were photographed in valve view and measured using Fiji software. Several statistical tools (mixture and discriminant analysis, k-means clustering, classification trees, etc.) were explored to test whether mere morphometry, independently of other valve features, leads to correct identifications when compared to identifications made by experts. In view of the results obtained, morphometry-based determination in diatom taxonomy is discouraged. This work was supported by the Spanish Government under the Aqualitas-retos project (grant number CTM2014-51907-C2-2-R-MINECO).

    Random forests with random projections of the output space for high dimensional multi-label classification

    We adapt the idea of random projections applied to the output space, so as to enhance tree-based ensemble methods in the context of multi-label classification. We show how learning time complexity can be reduced without affecting computational complexity and accuracy of predictions. We also show that random output space projections may be used in order to reach different bias-variance tradeoffs, over a broad panel of benchmark problems, and that this may lead to improved accuracy while significantly reducing the computational burden of the learning stage.
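    As a rough illustration of what projecting the output space means, the sketch below maps a binary label matrix with d labels onto m < d random Gaussian directions, so that a learner would fit m targets instead of d. The function name `random_output_projection`, the Gaussian choice and the 1/sqrt(m) scaling are assumptions made for this example; the abstract does not specify the projection used, and the tree ensemble itself is omitted.

```python
import random


def random_output_projection(Y, m, seed=0):
    """Project an n x d label matrix Y onto m random Gaussian directions.

    Hypothetical helper: returns (Z, P) where Z is the n x m projected
    target matrix and P is the d x m projection matrix.
    """
    rng = random.Random(seed)
    d = len(Y[0])
    # Entries drawn from N(0, 1) and scaled by 1/sqrt(m) (an assumed convention).
    P = [[rng.gauss(0.0, 1.0) / m ** 0.5 for _ in range(m)] for _ in range(d)]
    # Z = Y P, computed row by row.
    Z = [[sum(row[j] * P[j][k] for j in range(d)) for k in range(m)] for row in Y]
    return Z, P


# Two samples, four labels, compressed to two real-valued targets.
Y = [[1, 0, 1, 0],
     [0, 1, 0, 0]]
Z, P = random_output_projection(Y, m=2)
```

    A sample carrying a single label is mapped to the corresponding row of P, which shows that each label contributes one random direction to the compressed target.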

    Hierarchical Classification Using Evolutionary Strategy

    Hierarchical classification is a problem with applications in many areas, such as protein function prediction, where the data are hierarchically structured. Therefore, the development of algorithms able to induce hierarchical classification models is necessary. This paper presents experiments using an algorithm for hierarchical classification called Hierarchical Classification using Evolutionary Strategy (HC-ES). It was tested on eight G-Protein-Coupled Receptor (GPCR) and Enzyme Commission Codes (EC) datasets. The results are compared with those of other hierarchical classifiers using the distance and hF-measure.
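    The hF-measure mentioned above is commonly computed by extending each true and predicted class with its ancestors before measuring overlap. The sketch below follows that common definition for single-path predictions; the abstract does not state which variant the authors use, and the `parent` mapping and the toy class codes are made up for the example.

```python
def with_ancestors(node, parent):
    """Return the set containing node and all of its ancestors."""
    out = set()
    while node is not None:
        out.add(node)
        node = parent.get(node)
    return out


def hierarchical_f_measure(true_labels, pred_labels, parent):
    """Micro-averaged hierarchical F-measure over ancestor-augmented label sets."""
    overlap = pred_total = true_total = 0
    for t, p in zip(true_labels, pred_labels):
        t_aug = with_ancestors(t, parent)
        p_aug = with_ancestors(p, parent)
        overlap += len(t_aug & p_aug)
        pred_total += len(p_aug)
        true_total += len(t_aug)
    if overlap == 0:
        return 0.0
    h_precision = overlap / pred_total
    h_recall = overlap / true_total
    return 2 * h_precision * h_recall / (h_precision + h_recall)


# Toy hierarchy: top-level classes have no parent.
parent = {"1": None, "1.1": "1", "1.2": "1", "2": None, "2.1": "2"}
score = hierarchical_f_measure(["1.1", "2.1"], ["1.2", "2.1"], parent)
```

    In this toy case the first prediction is wrong at the leaf but right at the top level, so it still earns partial credit, which is the point of a hierarchical measure.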

    Performance Improvement in Multi-class Classification via Automated Hierarchy Generation and Exploitation through Extended LCPN Schemes

    Hierarchical classification (HC) plays a pivotal role in multi-class classification tasks, where objects are organized into a hierarchical structure. This study explores the performance of HC through a comprehensive analysis that encompasses both hierarchy generation and hierarchy exploitation. This analysis is particularly relevant in scenarios where a predefined hierarchy structure is not readily accessible. Notably, two novel hierarchy exploitation schemes, LCPN+ and LCPN+F, which extend the capabilities of LCPN and combine the strengths of global and local classification, are introduced and evaluated alongside existing methods. The findings reveal the consistent superiority of LCPN+F, which outperforms other schemes across various datasets and scenarios. Moreover, this research emphasizes not only effectiveness but also efficiency, as LCPN+ and LCPN+F maintain runtime performance comparable to Flat Classification (FC). Additionally, this study underscores the importance of selecting the right hierarchy exploitation scheme to maximize classification performance. This work extends our understanding of HC and establishes a benchmark for future research, fostering advancements in multi-class classification methodologies.

    Diatom identification including life cycle stages through morphological and texture descriptors

    Diatoms are unicellular algae present almost wherever there is water. Diatom identification has many applications in different fields of study, such as ecology and forensic science. In environmental studies, algae can be used as a natural water quality indicator. The diatom life cycle consists of the set of stages that successive generations of each species pass through, from the initial to the senescent cells. Life cycle modeling is a complex process since, in general, the distribution of the parameter vectors that represent the variations occurring in this process is non-linear and high-dimensional. In this paper, we propose to characterize the diatom life cycle by the main features that change during the algae life cycle, mainly the contour shape and the texture. Elliptical Fourier Descriptors (EFD) are used to describe the diatom contour, while phase congruency and Gabor filters describe the inner ornamentation of the algae. The proposed method has been tested with a small algae dataset (eight different classes and more than 50 samples per type) using supervised and non-supervised classification techniques, obtaining accuracy results of up to 99% and 98% respectively.

    TLMCM Network for Medical Image Hierarchical Multi-Label Classification

    Medical Image Hierarchical Multi-Label Classification (MI-HMC) is of paramount importance in modern healthcare, presenting two significant challenges: data imbalance and hierarchy constraint. Existing solutions involve complex model architecture design or domain-specific preprocessing, demanding considerable expertise or effort in implementation. To address these limitations, this paper proposes Transfer Learning with Maximum Constraint Module (TLMCM) network for the MI-HMC task. The TLMCM network offers a novel approach to overcome the aforementioned challenges, outperforming existing methods based on the Area Under the Average Precision and Recall Curve ($AU\overline{(PRC)}$) metric. In addition, this research proposes two novel accuracy metrics, EMR and HammingAccuracy, which have not been extensively explored in the context of the MI-HMC task. Experimental results demonstrate that the TLMCM network achieves high multi-label prediction accuracy (80%-90%) for MI-HMC tasks, making it a valuable contribution to healthcare domain applications.

    Coherent Hierarchical Multi-Label Classification Networks

    Hierarchical multi-label classification (HMC) is a challenging classification task extending standard multi-label classification problems by imposing a hierarchy constraint on the classes. In this paper, we propose C-HMCNN(h), a novel approach for HMC problems, which, given a network h for the underlying multi-label classification problem, exploits the hierarchy information in order to produce predictions coherent with the constraint and improve performance. We conduct an extensive experimental analysis showing the superior performance of C-HMCNN(h) when compared to state-of-the-art models. Comment: Neural Information Processing Systems 202
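    To make the hierarchy constraint concrete: predictions are coherent when no class scores lower than any of its subclasses. One simple way to enforce this on the raw per-class scores of some network h is to replace each class score with the maximum over the class and its descendants. The sketch below illustrates that idea only; the abstract does not describe the mechanism C-HMCNN(h) uses, and the function name, the `children` mapping and the toy scores are assumptions for the example.

```python
def coherent_scores(scores, children):
    """Return scores where each class is at least as high as all its descendants.

    scores:   dict mapping class name -> raw score in [0, 1]
    children: dict mapping class name -> list of direct subclasses
              (assumed acyclic)
    """
    out = {}

    def visit(cls):
        # Memoised post-order traversal: a class takes the max of its own
        # score and the already-coherent scores of its children.
        if cls not in out:
            child_scores = [visit(ch) for ch in children.get(cls, [])]
            out[cls] = max([scores[cls]] + child_scores)
        return out[cls]

    for cls in scores:
        visit(cls)
    return out


# Raw scores violate the constraint: subclass B outranks its parent A.
raw = {"A": 0.2, "B": 0.7, "C": 0.1}
fixed = coherent_scores(raw, {"A": ["B", "C"]})
```

    After the adjustment, thresholding the scores at any value yields a label set that respects the hierarchy, since a subclass can never be predicted without its parent.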