1,770 research outputs found
Pattern classification approaches for breast cancer identification via MRI: state‐of‐the‐art and vision for the future
Mining algorithms for Dynamic Contrast Enhanced Magnetic Resonance Imaging (DCEMRI)
of breast tissue are discussed. The algorithms are based on recent advances in multidimensional
signal processing and aim to advance current state‐of‐the‐art computer‐aided detection
and analysis of breast tumours when these are observed at various states of development. The topics
discussed include image feature extraction, information fusion using radiomics, multi‐parametric
computer‐aided classification and diagnosis using information fusion of tensorial datasets as well
as Clifford algebra based classification approaches and convolutional neural network deep learning
methodologies. The discussion also extends to semi‐supervised deep learning and self‐supervised
strategies as well as generative adversarial networks and algorithms using generated
confrontational learning approaches. In order to address the problem of weakly labelled tumour
images, generative adversarial deep learning strategies are considered for the classification of
different tumour types. The proposed data fusion approaches provide a novel Artificial Intelligence
(AI) based framework for more robust image registration that can potentially advance the early
identification of heterogeneous tumour types, even when the associated imaged organs are
registered as separate entities embedded in more complex geometric spaces. Finally, the general
structure of a high‐dimensional medical imaging analysis platform that is based on multi‐task
detection and learning is proposed as a way forward. The proposed algorithm makes use of novel
loss functions that form the building blocks for a generated confrontation learning methodology
that can be used for tensorial DCE‐MRI. Since some of the approaches discussed are also based on
time‐lapse imaging, conclusions on the rate of proliferation of the disease can be made possible. The
proposed framework can potentially reduce the costs associated with the interpretation of medical
images by providing automated, faster and more consistent diagnosis
Unsupervised learning of generative topic saliency for person re-identification
(c) 2014. The copyright of this document resides with its authors.
It may be distributed unchanged freely in print or electronic forms.© 2014. The copyright of this document resides with its authors. Existing approaches to person re-identification (re-id) are dominated by supervised learning based methods which focus on learning optimal similarity distance metrics. However, supervised learning based models require a large number of manually labelled pairs of person images across every pair of camera views. This thus limits their ability to scale to large camera networks. To overcome this problem, this paper proposes a novel unsupervised re-id modelling approach by exploring generative probabilistic topic modelling. Given abundant unlabelled data, our topic model learns to simultaneously both (1) discover localised person foreground appearance saliency (salient image patches) that are more informative for re-id matching, and (2) remove busy background clutters surrounding a person. Extensive experiments are carried out to demonstrate that the proposed model outperforms existing unsupervised learning re-id methods with significantly simplified model complexity. In the meantime, it still retains comparable re-id accuracy when compared to the state-of-the-art supervised re-id methods but without any need for pair-wise labelled training data
Local wavelet features for statistical object classification and localisation
This article presents a system for texture-based
probabilistic classification and localisation of 3D objects in 2D digital images and discusses selected applications. The objects are described by local feature vectors computed using the wavelet transform. In the training phase, object features are statistically modelled as normal density functions. In the recognition phase, a maximisation algorithm compares the learned density functions
with the feature vectors extracted from a real scene and yields the classes and poses of objects found in it. Experiments carried out on a real dataset of over 40000 images demonstrate the robustness of the system in terms of classification and localisation accuracy. Finally, two important application scenarios are discussed, namely classification of museum artefacts and classification of
metallography images
Advances in deep learning methods for pavement surface crack detection and identification with visible light visual images
Compared to NDT and health monitoring method for cracks in engineering
structures, surface crack detection or identification based on visible light
images is non-contact, with the advantages of fast speed, low cost and high
precision. Firstly, typical pavement (concrete also) crack public data sets
were collected, and the characteristics of sample images as well as the random
variable factors, including environmental, noise and interference etc., were
summarized. Subsequently, the advantages and disadvantages of three main crack
identification methods (i.e., hand-crafted feature engineering, machine
learning, deep learning) were compared. Finally, from the aspects of model
architecture, testing performance and predicting effectiveness, the development
and progress of typical deep learning models, including self-built CNN,
transfer learning(TL) and encoder-decoder(ED), which can be easily deployed on
embedded platform, were reviewed. The benchmark test shows that: 1) It has been
able to realize real-time pixel-level crack identification on embedded
platform: the entire crack detection average time cost of an image sample is
less than 100ms, either using the ED method (i.e., FPCNet) or the TL method
based on InceptionV3. It can be reduced to less than 10ms with TL method based
on MobileNet (a lightweight backbone base network). 2) In terms of accuracy, it
can reach over 99.8% on CCIC which is easily identified by human eyes. On
SDNET2018, some samples of which are difficult to be identified, FPCNet can
reach 97.5%, while TL method is close to 96.1%.
To the best of our knowledge, this paper for the first time comprehensively
summarizes the pavement crack public data sets, and the performance and
effectiveness of surface crack detection and identification deep learning
methods for embedded platform, are reviewed and evaluated.Comment: 15 pages, 14 figures, 11 table
Robust Head-Pose Estimation Based on Partially-Latent Mixture of Linear Regressions
Head-pose estimation has many applications, such as social event analysis,
human-robot and human-computer interaction, driving assistance, and so forth.
Head-pose estimation is challenging because it must cope with changing
illumination conditions, variabilities in face orientation and in appearance,
partial occlusions of facial landmarks, as well as bounding-box-to-face
alignment errors. We propose tu use a mixture of linear regressions with
partially-latent output. This regression method learns to map high-dimensional
feature vectors (extracted from bounding boxes of faces) onto the joint space
of head-pose angles and bounding-box shifts, such that they are robustly
predicted in the presence of unobservable phenomena. We describe in detail the
mapping method that combines the merits of unsupervised manifold learning
techniques and of mixtures of regressions. We validate our method with three
publicly available datasets and we thoroughly benchmark four variants of the
proposed algorithm with several state-of-the-art head-pose estimation methods.Comment: 12 pages, 5 figures, 3 table
Machine learning for automatic analysis of affective behaviour
The automated analysis of affect has been gaining rapidly increasing attention by researchers over the past two decades, as it constitutes a fundamental step towards achieving next-generation computing technologies and integrating them into everyday life (e.g. via affect-aware, user-adaptive interfaces, medical imaging, health assessment, ambient intelligence etc.). The work presented in this thesis focuses on several fundamental problems manifesting in the course towards the achievement of reliable, accurate and robust affect sensing systems. In more detail, the motivation behind this work lies in recent developments in the field, namely (i) the creation of large, audiovisual databases for affect analysis in the so-called ''Big-Data`` era, along with (ii) the need to deploy systems under demanding, real-world conditions. These developments led to the requirement for the analysis of emotion expressions continuously in time, instead of merely processing static images, thus unveiling the wide range of temporal dynamics related to human behaviour to researchers. The latter entails another deviation from the traditional line of research in the field: instead of focusing on predicting posed, discrete basic emotions (happiness, surprise etc.), it became necessary to focus on spontaneous, naturalistic expressions captured under settings more proximal to real-world conditions, utilising more expressive emotion descriptions than a set of discrete labels. To this end, the main motivation of this thesis is to deal with challenges arising from the adoption of continuous dimensional emotion descriptions under naturalistic scenarios, considered to capture a much wider spectrum of expressive variability than basic emotions, and most importantly model emotional states which are commonly expressed by humans in their everyday life. In the first part of this thesis, we attempt to demystify the quite unexplored problem of predicting continuous emotional dimensions. This work is amongst the first to explore the problem of predicting emotion dimensions via multi-modal fusion, utilising facial expressions, auditory cues and shoulder gestures. A major contribution of the work presented in this thesis lies in proposing the utilisation of various relationships exhibited by emotion dimensions in order to improve the prediction accuracy of machine learning methods - an idea which has been taken on by other researchers in the field since. In order to experimentally evaluate this, we extend methods such as the Long Short-Term Memory Neural Networks (LSTM), the Relevance Vector Machine (RVM) and Canonical Correlation Analysis (CCA) in order to exploit output relationships in learning. As it is shown, this increases the accuracy of machine learning models applied to this task.
The annotation of continuous dimensional emotions is a tedious task, highly prone to the influence of various types of noise. Performed real-time by several annotators (usually experts), the annotation process can be heavily biased by factors such as subjective interpretations of the emotional states observed, the inherent ambiguity of labels related to human behaviour, the varying reaction lags exhibited by each annotator as well as other factors such as input device noise and annotation errors. In effect, the annotations manifest a strong spatio-temporal annotator-specific bias. Failing to properly deal with annotation bias and noise leads to an inaccurate ground truth, and therefore to ill-generalisable machine learning models. This deems the proper fusion of multiple annotations, and the inference of a clean, corrected version of the ``ground truth'' as one of the most significant challenges in the area. A highly important contribution of this thesis lies in the introduction of Dynamic Probabilistic Canonical Correlation Analysis (DPCCA), a method aimed at fusing noisy continuous annotations. By adopting a private-shared space model, we isolate the individual characteristics that are annotator-specific and not shared, while most importantly we model the common, underlying annotation which is shared by annotators (i.e., the derived ground truth). By further learning temporal dynamics and incorporating a time-warping process, we are able to derive a clean version of the ground truth given multiple annotations, eliminating temporal discrepancies and other nuisances.
The integration of the temporal alignment process within the proposed private-shared space model deems DPCCA suitable for the problem of temporally aligning human behaviour; that is, given temporally unsynchronised sequences (e.g., videos of two persons smiling), the goal is to generate the temporally synchronised sequences (e.g., the smile apex should co-occur in the videos). Temporal alignment is an important problem for many applications where multiple datasets need to be aligned in time. Furthermore, it is particularly suitable for the analysis of facial expressions, where the activation of facial muscles (Action Units) typically follows a set of predefined temporal phases. A highly challenging scenario is when the observations are perturbed by gross, non-Gaussian noise (e.g., occlusions), as is often the case when analysing data acquired under real-world conditions. To account for non-Gaussian noise, a robust variant of Canonical Correlation Analysis (RCCA) for robust fusion and temporal alignment is proposed. The model captures the shared, low-rank subspace of the observations, isolating the gross noise in a sparse noise term. RCCA is amongst the first robust variants of CCA proposed in literature, and as we show in related experiments outperforms other, state-of-the-art methods for related tasks such as the fusion of multiple modalities under gross noise.
Beyond private-shared space models, Component Analysis (CA) is an integral component of most computer vision systems, particularly in terms of reducing the usually high-dimensional input spaces in a meaningful manner pertaining to the task-at-hand (e.g., prediction, clustering). A final, significant contribution of this thesis lies in proposing the first unifying framework for probabilistic component analysis. The proposed framework covers most well-known CA methods, such as Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), Locality Preserving Projections (LPP) and Slow Feature Analysis (SFA), providing further theoretical insights into the workings of CA. Moreover, the proposed framework is highly flexible, enabling novel CA methods to be generated by simply manipulating the connectivity of latent variables (i.e. the latent neighbourhood). As shown experimentally, methods derived via the proposed framework outperform other equivalents in several problems related to affect sensing and facial expression analysis, while providing advantages such as reduced complexity and explicit variance modelling.Open Acces
- …