125 research outputs found
Semi-supervised Predictive Clustering Trees for (Hierarchical) Multi-label Classification
Semi-supervised learning (SSL) is a common approach to learning predictive
models using not only labeled examples, but also unlabeled examples. While SSL
for the simple tasks of classification and regression has received a lot of
attention from the research community, it has not been properly investigated for
complex prediction tasks with structurally dependent variables. This is the
case of multi-label classification and hierarchical multi-label classification
tasks, which may require additional information, possibly coming from the
underlying distribution in the descriptive space provided by unlabeled
examples, to better address the challenging task of simultaneously predicting
multiple class labels.
In this paper, we investigate this aspect and propose a (hierarchical)
multi-label classification method based on semi-supervised learning of
predictive clustering trees. We also extend the method towards ensemble
learning and propose a method based on the random forest approach. Extensive
experimental evaluation conducted on 23 datasets shows significant advantages
of the proposed method and its extension with respect to their supervised
counterparts. Moreover, the method preserves interpretability and reduces the
time complexity of classical tree-based models.
Disentangling diatom species complexes: does morphometry suffice?
Accurate taxonomic resolution in light microscopy analyses of microalgae is essential to achieve high-quality, comparable results in both floristic analyses and biomonitoring studies. A number of closely related diatom taxa have been detected to date co-occurring within benthic diatom assemblages, sharing many morphological, morphometrical and ecological characteristics. In this contribution, we tested the hypothesis that, where a large sample size (number of individuals) is available, common morphometrical parameters (valve length, width and stria density) are sufficient to achieve a correct identification to the species level. We focused on some common diatom taxa belonging to the genus Gomphonema. More than 400 valves and frustules were photographed in valve view and measured using Fiji software. Several statistical tools (mixture and discriminant analysis, k-means clustering, classification trees, etc.) were explored to test whether morphometry alone, independently of other valve features, leads to correct identifications when compared to identifications made by experts. In view of the results obtained, morphometry-based determination in diatom taxonomy is discouraged. This work was supported by the Spanish Government under the Aqualitas-retos project (grant number CTM2014-51907-C2-2-R-MINECO).
Random forests with random projections of the output space for high dimensional multi-label classification
We adapt the idea of random projections applied to the output space, so as to
enhance tree-based ensemble methods in the context of multi-label
classification. We show how learning time complexity can be reduced without
affecting computational complexity and accuracy of predictions. We also show
that random output space projections may be used in order to reach different
bias-variance tradeoffs, over a broad panel of benchmark problems, and that
this may lead to improved accuracy while significantly reducing the
computational burden of the learning stage.
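As a rough illustration of the idea in the abstract above, the sketch below projects a high-dimensional binary label matrix onto a few random components, so that any per-split computation over the outputs touches fewer columns. The abstract does not say which projection family is used; the Rademacher (random sign) matrix and the names `rademacher_matrix` and `project_labels` are assumptions made for this example only.

```python
import math
import random


def rademacher_matrix(n_labels, n_components, seed=0):
    """Build an n_labels x n_components matrix with entries +/- 1/sqrt(n_components).

    Assumption: a random-sign projection; other families would work similarly.
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(n_components)
    return [
        [rng.choice((-scale, scale)) for _ in range(n_components)]
        for _ in range(n_labels)
    ]


def project_labels(label_rows, projection):
    """Project each label vector (length n_labels) down to n_components values."""
    n_components = len(projection[0])
    return [
        [
            sum(row[j] * projection[j][k] for j in range(len(row)))
            for k in range(n_components)
        ]
        for row in label_rows
    ]


# Toy data: 3 examples, 50 binary labels each, projected to 8 components.
data_rng = random.Random(1)
Y = [[data_rng.randint(0, 1) for _ in range(50)] for _ in range(3)]
P = rademacher_matrix(n_labels=50, n_components=8)
Z = project_labels(Y, P)
```

In a tree ensemble, a projected matrix such as `Z` would stand in for `Y` when scoring candidate splits, while predictions would still be made in the original label space.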
Hierarchical Classification Using Evolutionary Strategy
Hierarchical classification is a problem with applications in many areas, such as protein function prediction, where the data are hierarchically structured. It is therefore necessary to develop algorithms able to induce hierarchical classification models. This paper presents experiments using the hierarchical classification algorithm called Hierarchical Classification using Evolutionary Strategy (HC-ES). It was tested on eight datasets of G-Protein-Coupled Receptors (GPCR) and Enzyme Commission Codes (EC). The results are compared with those of other hierarchical classifiers using distance and hF-Measure as evaluation measures.
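The hF-Measure mentioned above is commonly computed by extending the true and predicted label sets with all their ancestors and then taking the harmonic mean of the resulting hierarchical precision and recall. The sketch below follows that common definition; the toy hierarchy `PARENT` and the function names are hypothetical, and the paper may use a variant.

```python
def with_ancestors(labels, parent):
    """Return the label set extended with every ancestor of each label."""
    extended = set()
    for label in labels:
        node = label
        while node is not None:
            extended.add(node)
            node = parent.get(node)
    return extended


def hierarchical_f_measure(true_sets, pred_sets, parent):
    """Micro-averaged hF over examples, using ancestor-extended label sets."""
    overlap = pred_total = true_total = 0
    for true_labels, pred_labels in zip(true_sets, pred_sets):
        t = with_ancestors(true_labels, parent)
        p = with_ancestors(pred_labels, parent)
        overlap += len(t & p)
        pred_total += len(p)
        true_total += len(t)
    if overlap == 0:
        return 0.0
    h_precision = overlap / pred_total
    h_recall = overlap / true_total
    return 2 * h_precision * h_recall / (h_precision + h_recall)


# Hypothetical class hierarchy: each class maps to its parent (None = top level).
PARENT = {"1": None, "1.1": "1", "1.2": "1", "1.1.1": "1.1"}

# Predicting the parent "1.1" of the true class "1.1.1" earns partial credit.
partial = hierarchical_f_measure([{"1.1.1"}], [{"1.1"}], PARENT)
# Predicting the sibling branch "1.2" only shares the top-level class "1".
sibling = hierarchical_f_measure([{"1.1.1"}], [{"1.2"}], PARENT)
```

Unlike flat accuracy, this measure rewards predictions that are close to the true class in the hierarchy, which is why `partial` is higher than `sibling`.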
Performance Improvement in Multi-class Classification via Automated Hierarchy Generation and Exploitation through Extended LCPN Schemes
Hierarchical classification (HC) plays a pivotal role in multi-class
classification tasks, where objects are organized into a hierarchical
structure. This study explores the performance of HC through a comprehensive
analysis that encompasses both hierarchy generation and hierarchy exploitation.
This analysis is particularly relevant in scenarios where a predefined
hierarchy structure is not readily accessible. Notably, two novel hierarchy
exploitation schemes, LCPN+ and LCPN+F, which extend the capabilities of LCPN
and combine the strengths of global and local classification, have been
introduced and evaluated alongside existing methods. The findings reveal the
consistent superiority of LCPN+F, which outperforms other schemes across
various datasets and scenarios. Moreover, this research emphasizes not only
effectiveness but also efficiency, as LCPN+ and LCPN+F maintain runtime
performance comparable to Flat Classification (FC). Additionally, this study
underscores the importance of selecting the right hierarchy exploitation scheme
to maximize classification performance. This work extends our understanding of
HC and establishes a benchmark for future research, fostering advancements in
multi-class classification methodologies.
Diatom identification including life cycle stages through morphological and texture descriptors
Diatoms are unicellular algae present almost wherever there is water. Diatom identification has many applications in different fields of study, such as ecology, forensic science, etc. In environmental studies, algae can be used as a natural water quality indicator. The diatom life cycle consists of the set of stages through which the successive generations of each species pass, from the initial to the senescent cells. Life cycle modeling is a complex process since, in general, the distribution of the parameter vectors that represent the variations occurring in this process is non-linear and high-dimensional. In this paper, we propose to characterize the diatom life cycle by the main features that change during the algae life cycle, mainly the contour shape and the texture. Elliptical Fourier Descriptors (EFD) are used to describe the diatom contour, while phase congruency and Gabor filters describe the inner ornamentation of the algae. The proposed method has been tested with a small algae dataset (eight different classes and more than 50 samples per type) using supervised and non-supervised classification techniques, obtaining accuracies of up to 99% and 98%, respectively.
TLMCM Network for Medical Image Hierarchical Multi-Label Classification
Medical Image Hierarchical Multi-Label Classification (MI-HMC) is of
paramount importance in modern healthcare, presenting two significant
challenges: data imbalance and hierarchy constraint. Existing
solutions involve complex model architecture design or domain-specific
preprocessing, demanding considerable expertise or effort in implementation. To
address these limitations, this paper proposes Transfer Learning with Maximum
Constraint Module (TLMCM) network for the MI-HMC task. The TLMCM network offers
a novel approach to overcome the aforementioned challenges, outperforming
existing methods based on the Area Under the Average Precision and Recall
Curve metric. In addition, this research proposes two
novel accuracy metrics, which have not been
extensively explored in the context of the MI-HMC task. Experimental results
demonstrate that the TLMCM network achieves high multi-label prediction
accuracy for MI-HMC tasks, making it a valuable contribution to
healthcare domain applications.
Coherent Hierarchical Multi-Label Classification Networks
Hierarchical multi-label classification (HMC) is a challenging classification
task extending standard multi-label classification problems by imposing a
hierarchy constraint on the classes. In this paper, we propose C-HMCNN(h), a
novel approach for HMC problems, which, given a network h for the underlying
multi-label classification problem, exploits the hierarchy information in order
to produce predictions coherent with the constraint and improve performance. We
conduct an extensive experimental analysis showing the superior performance of
C-HMCNN(h) when compared to state-of-the-art models. Comment: Neural Information Processing Systems 202
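To make the hierarchy constraint in the abstract above concrete, the sketch below shows one simple way to turn arbitrary per-class scores into coherent ones: each class takes the maximum score found among itself and its descendants, so a parent never scores lower than a child. This is an illustrative post-processing step under that assumption, not necessarily the mechanism used inside C-HMCNN(h); the class names and the function `make_coherent` are hypothetical.

```python
def make_coherent(scores, children):
    """Raise each class score to the maximum over the class and its descendants."""
    coherent = {}

    def resolve(node):
        # Memoised depth-first pass: a node's score is the max over its subtree.
        if node not in coherent:
            best = scores[node]
            for child in children.get(node, ()):
                best = max(best, resolve(child))
            coherent[node] = best
        return coherent[node]

    for node in scores:
        resolve(node)
    return coherent


# Hypothetical raw scores that violate the hierarchy (child above parent).
raw = {"animal": 0.2, "dog": 0.7, "cat": 0.1, "puppy": 0.9}
tree = {"animal": ["dog", "cat"], "dog": ["puppy"]}
fixed = make_coherent(raw, tree)
```

After this step, thresholding the scores at any value yields a label set in which every predicted class also has all of its ancestors predicted.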