Why Deep Surgical Models Fail?: Revisiting Surgical Action Triplet Recognition through the Lens of Robustness
Surgical action triplet recognition provides a better understanding of the
surgical scene. This task is of high relevance as it provides the surgeon
with context-aware support and safety. The current go-to strategy for improving
performance is the development of new network mechanisms. However, the
performance of current state-of-the-art techniques is substantially lower than
on other surgical tasks. Why is this happening? This is the question that we
address in this work. We present the first study to understand the failure of
existing deep learning models through the lens of robustness and explainability.
First, we study existing models under weak and strong
perturbations via an adversarial optimisation scheme. We then identify the
failure modes via feature-based explanations. Our study reveals that the key to
improving performance and increasing reliability lies in the core and spurious
attributes. Our work opens the door to more trustworthy and reliable
deep learning models in surgical science.
Data-efficient deep representation learning
Current deep learning methods succeed in many data-intensive applications, but they still fail to deliver robust performance when training samples are scarce. To investigate how to improve the performance of deep learning paradigms when training samples are limited, data-efficient deep representation learning (DDRL) is proposed in this study. DDRL, as a sub-area of representation learning, mainly addresses the following problem: how can the performance of a deep learning method be maintained when the number of training samples is significantly reduced? This is vital for many applications where collecting data is highly costly, such as medical image analysis. Incorporating a certain kind of prior knowledge into the learning paradigm is key to achieving data efficiency.
Deep learning as a sub-area of machine learning can be divided into three parts (locations) in its learning process, namely Data, Optimisation and Model. Integrating prior knowledge into these three locations is expected to bring data efficiency into a learning paradigm, which can dramatically increase the model performance under the condition of limited training data.
In this thesis, we aim to develop novel deep learning methods for achieving data-efficient training, each of which integrates a certain kind of prior knowledge into one of these three locations. We make the following contributions. First, we propose an iterative solution based on deep learning for medical image segmentation tasks, where dynamical systems are integrated into the segmentation labels in order to improve both performance and data efficiency. The proposed method not only shows superior performance and better data efficiency compared to the state-of-the-art methods, but also has better interpretability and rotational invariance, which are desirable for medical imaging applications. Second, we propose a novel training framework which adaptively selects more informative samples for training during the optimisation process. The adaptive selection or sampling is performed based on a hardness-aware strategy in the latent space constructed by a generative model.
We show that the proposed framework outperforms a random sampling method, which demonstrates its effectiveness. Third, we propose a deep neural network model which produces the segmentation maps in a coarse-to-fine manner. The proposed architecture is a sequence of computational blocks containing a number of convolutional layers, in which each block provides its successive block with a coarser segmentation map as a reference. Such mechanisms enable us to train the network with limited training samples and produce more interpretable results.
Example-based explanations with adversarial attacks for respiratory sound analysis
Respiratory sound classification is an important tool for remote screening of respiratory-related diseases such as pneumonia, asthma, and COVID-19. To facilitate the interpretability of classification results, especially ones based on deep learning, many explanation methods have been proposed using prototypes. However, existing explanation techniques often assume that the data is unbiased and that the prediction results can be explained by a set of prototypical examples. In this work, we develop a unified example-based explanation method for selecting both representative data (prototypes) and outliers (criticisms). In particular, we propose a novel application of adversarial attacks to generate an explanation spectrum of data instances via an iterative fast gradient sign method. Such a unified explanation can avoid over-generalisation and bias by allowing human experts to assess the model's mistakes case by case. We performed a wide range of quantitative and qualitative evaluations to show that our approach generates effective and understandable explanations and is robust across many deep learning models.
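As an illustration of the iterative fast gradient sign method mentioned in this abstract, the following is a minimal sketch on a toy logistic-regression classifier. The model, its weights, the step size `alpha`, the budget `epsilon` and all function names are assumptions made for illustration; the abstract does not describe the authors' network or how the attack trajectory is turned into an explanation spectrum.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    """Probability of class 1 under a toy logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def loss_grad_wrt_input(w, b, x, y):
    """Gradient of the binary cross-entropy loss with respect to the input x."""
    p = predict(w, b, x)
    return [(p - y) * wi for wi in w]

def sign(v):
    return (v > 0) - (v < 0)

def iterative_fgsm(w, b, x, y, epsilon=0.5, alpha=0.1, steps=10):
    """Return the list of perturbed inputs, starting with the clean input."""
    x_adv = list(x)
    trajectory = [list(x_adv)]
    for _ in range(steps):
        grad = loss_grad_wrt_input(w, b, x_adv, y)
        # Step in the direction that increases the loss.
        x_adv = [xi + alpha * sign(g) for xi, g in zip(x_adv, grad)]
        # Project back into the L-infinity ball of radius epsilon around x.
        x_adv = [min(max(xa, x0 - epsilon), x0 + epsilon)
                 for xa, x0 in zip(x_adv, x)]
        trajectory.append(list(x_adv))
    return trajectory

# Hypothetical weights and input, chosen only to make the example concrete.
w, b = [2.0, -1.0], 0.0
x, y = [1.0, 0.5], 1
trajectory = iterative_fgsm(w, b, x, y)
confidences = [predict(w, b, xa) for xa in trajectory]
```

Here `confidences` records how the model's confidence in the true class falls as the perturbation grows. A quantity of this kind, such as how quickly an instance can be pushed across the decision boundary, is one plausible way to order instances along a spectrum, though that reading is an assumption and not a claim about the paper's method.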
Data Analytics for Uncovering Fraudulent Behaviour in Elite Sports
Sports officials around the world are facing societal challenges due to the unfair nature of fraudulent practices performed by unscrupulous athletes. Recently, sample swapping has been raised as a potential practice where some athletes exchange their doped sample with a clean one to evade a positive test. The current detection methods for such cases include laboratory testing such as DNA analysis. However, these methods are costly and time-consuming, and exceed the budgetary limits of anti-doping organisations. Therefore, there is a need to explore alternative methods to improve decision-making. We present a data analytical methodology that supports anti-doping decision-makers in the task of athlete disambiguation. Our proposed model helps identify the swapped sample and outperforms the current state-of-the-art method and different baseline models. The evaluation on real-world sample swapping cases shows promising results that help advance research on the application of data analytics in the context of anti-doping analysis.
Learning video embedding space with Natural Language Supervision
The recent success of the CLIP model has shown its potential to be applied to
a wide range of vision and language tasks. However, this only establishes an
embedding-space relationship between language and images, not the video domain. In
this paper, we propose a novel approach to map the video embedding space to natural
language. Our two-stage approach first extracts visual features
from each frame of a video using a pre-trained CNN, and then uses the CLIP
model to encode the visual features for the video domain, along with the
corresponding text descriptions. We evaluate our method on two benchmark
datasets, UCF101 and HMDB51, and achieve state-of-the-art performance on
both.
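To make the idea of matching a video against text descriptions in a shared embedding space concrete, here is a small sketch. It assumes that per-frame embeddings are aggregated by mean pooling and compared with text embeddings by cosine similarity; the pooling choice, the toy three-dimensional vectors and the function names are all hypothetical, since the abstract does not specify how frame features are combined.

```python
import math

def mean_pool(frame_embeddings):
    """Average per-frame vectors into one video-level vector (assumed aggregation)."""
    n = len(frame_embeddings)
    dim = len(frame_embeddings[0])
    return [sum(frame[d] for frame in frame_embeddings) / n for d in range(dim)]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def classify_video(frame_embeddings, text_embeddings):
    """Pick the text label whose embedding is closest to the pooled video embedding."""
    video = mean_pool(frame_embeddings)
    scores = {label: cosine_similarity(video, emb)
              for label, emb in text_embeddings.items()}
    return max(scores, key=scores.get), scores

# Toy embeddings standing in for real encoder outputs.
frames = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [1.0, 0.0, 0.2]]
texts = {"playing guitar": [1.0, 0.0, 0.0], "riding a horse": [0.0, 1.0, 0.0]}
label, scores = classify_video(frames, texts)
```

In a real pipeline the vectors would come from the image and text encoders rather than being written by hand, and a learned temporal model could replace the mean.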
Towards markerless orthopaedic navigation with intuitive Optical See-through Head-mounted displays
The potential of image-guided orthopaedic navigation to improve surgical outcomes has been well-recognised during the last two decades. According to the tracked pose of target bone, the anatomical information and preoperative plans are updated and displayed to surgeons, so that they can follow the guidance to reach the goal with higher accuracy, efficiency and reproducibility. Despite their success, current orthopaedic navigation systems have two main limitations: for target tracking, artificial markers have to be drilled into the bone and calibrated manually to the bone, which introduces the risk of additional harm to patients and increases operating complexity; for guidance visualisation, surgeons have to shift their attention from the patient to an external 2D monitor, which is disruptive and can be mentally stressful.
Motivated by these limitations, this thesis explores the development of an intuitive, compact and reliable navigation system for orthopaedic surgery. To this end, conventional marker-based tracking is replaced by a novel markerless tracking algorithm, and the 2D display is replaced by a 3D holographic optical see-through (OST) head-mounted display (HMD) precisely calibrated to a user's perspective.
Our markerless tracking, facilitated by a commercial RGBD camera, is achieved through deep learning-based bone segmentation followed by real-time pose registration. For robust segmentation, a new network is designed and efficiently augmented by a synthetic dataset. Our segmentation network outperforms the state-of-the-art regarding occlusion-robustness, device-agnostic behaviour, and target generalisability. For reliable pose registration, a novel Bounded Iterative Closest Point (BICP) workflow is proposed. The improved markerless tracking can achieve a clinically acceptable error of 0.95 deg and 2.17 mm according to a phantom test.
OST displays allow ubiquitous enrichment of the perceived real world with contextually blended virtual aids through semi-transparent glasses. They have been recognised as a suitable visual tool for surgical assistance, since they do not hinder the surgeon's natural eyesight and require no attention shift or perspective conversion. OST calibration is crucial to ensure locationally coherent surgical guidance.
Current calibration methods are either prone to human error or hardly applicable to commercial devices. To address this, we propose an offline camera-based calibration method that is highly accurate yet easy to implement in commercial products, and an online alignment-based refinement that is user-centric and robust against user error. The proposed methods are shown to be superior to similar state-of-the-art (SOTA) methods regarding calibration convenience and display accuracy.
Motivated by the ambition to develop the world's first markerless OST navigation system, we integrated the developed markerless tracking and calibration scheme into a complete navigation workflow designed for femur drilling tasks during knee replacement surgery. We verify the usability of our designed OST system with an experienced orthopaedic surgeon in a cadaver study. Our test validates the potential of the proposed markerless navigation system for surgical assistance, although further improvement is required for clinical acceptance.
BridgeHand2Vec Bridge Hand Representation
Contract bridge is a game characterized by incomplete information, posing an
exciting challenge for artificial intelligence methods. This paper proposes the
BridgeHand2Vec approach, which leverages a neural network to embed a bridge
player's hand (consisting of 13 cards) into a vector space. The resulting
representation reflects the strength of the hand in the game and enables
interpretable distances to be determined between different hands. This
representation is derived by training a neural network to estimate the number
of tricks that a pair of players can take. In the remainder of this paper, we
analyze the properties of the resulting vector space and provide examples of
its application in reinforcement learning and opening bid classification.
Although this was not our main goal, the neural network used for the
vectorization achieves SOTA results on the DDBP2 problem (estimating the number
of tricks for two given hands).
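Before a hand can be embedded by a neural network, it has to be presented as a numeric input. The sketch below shows one simple possibility, a 52-dimensional binary vector with one slot per card. This encoding, the dotted spades.hearts.diamonds.clubs hand notation and the function names are assumptions for illustration; the abstract does not state which input representation BridgeHand2Vec uses.

```python
SUITS = "SHDC"             # spades, hearts, diamonds, clubs
RANKS = "AKQJT98765432"    # ace high, T stands for ten

def parse_hand(text):
    """Parse a dotted hand such as 'AKQ.JT9.8765.432' into card codes like 'SA'."""
    holdings = text.split(".")
    if len(holdings) != 4:
        raise ValueError("expected four suit holdings separated by dots")
    return [suit + rank for suit, holding in zip(SUITS, holdings) for rank in holding]

def card_index(card):
    """Map a card code to a position in 0..51 (suit-major order)."""
    return SUITS.index(card[0]) * 13 + RANKS.index(card[1])

def hand_to_vector(cards):
    """Encode a 13-card hand as a 52-dimensional 0/1 vector."""
    if len(cards) != 13 or len(set(cards)) != 13:
        raise ValueError("a bridge hand must contain 13 distinct cards")
    vector = [0] * 52
    for card in cards:
        vector[card_index(card)] = 1
    return vector

cards = parse_hand("AKQ.JT9.8765.432")
vector = hand_to_vector(cards)
```

A vector like this could be fed to a network whose hidden layer serves as the learned embedding; the binary encoding itself carries no notion of hand strength, which is what the trained representation is meant to add.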