Distribution Matching: Semi-Supervised Feature Selection for Biased Labelled Data
In data science and machine learning, feature selection is a widely used technique for reducing the dimensionality of a dataset. It is commonly used to improve model accuracy by reducing data redundancy and over-fitting, but can also be beneficial in applications such as data compression. The majority of feature selection techniques rely on labelled data. In many real-world scenarios, however, data is only partially labelled and thus requires so-called semi-supervised techniques, which can utilise both labelled and unlabelled data. While unlabelled data is often obtainable in abundance, labelled datasets are smaller and potentially biased. This thesis presents a method called distribution matching, which performs feature selection in a semi-supervised setup. Distribution matching is a wrapper method: it trains models in order to select the features that most improve model accuracy. It addresses the problem of biased labelled data directly by incorporating unlabelled data into a cost function which approximates the expected loss on unseen data. In experiments on a synthetic dataset, the method is shown to successfully and transparently minimise the expected loss. Additionally, a comparison with related methods is performed on the more complex EMNIST dataset.
Explainable cardiac pathology classification on cine MRI with motion characterization by semi-supervised learning of apparent flow
We propose a method to classify cardiac pathology based on a novel approach
for extracting image-derived features that characterize the shape and motion of the
heart. An original semi-supervised learning procedure, which makes efficient
use of a large amount of non-segmented images and a small amount of images
segmented manually by experts, is developed to generate pixel-wise apparent
flow between two time points of a 2D+t cine MRI image sequence. Combining the
apparent flow maps and cardiac segmentation masks, we obtain a local apparent
flow corresponding to the 2D motion of the myocardium and ventricular cavities.
This leads to the generation of time series of the radius and thickness of
myocardial segments to represent cardiac motion. These time series of motion
features are reliable and explainable characteristics of pathological cardiac
motion. Furthermore, they are combined with shape-related features to classify
cardiac pathologies. Using only nine feature values as input, we propose an
explainable, simple and flexible model for pathology classification. On the ACDC
training and testing sets, the model achieves classification accuracies of 95%
and 94%, respectively. Its performance is hence comparable to that of the
state of the art. Comparison with various other models is performed to outline
some advantages of our model.