
    Gesture Recognition in Robotic Surgery: a Review

    OBJECTIVE: Surgical activity recognition is a fundamental step in computer-assisted interventions. This paper reviews the state of the art in methods for automatic recognition of fine-grained gestures in robotic surgery, focusing on recent data-driven approaches, and outlines the open questions and future research directions. METHODS: An article search was performed on 5 bibliographic databases with combinations of the following search terms: robotic, robot-assisted, JIGSAWS, surgery, surgical, gesture, fine-grained, surgeme, action, trajectory, segmentation, recognition, parsing. Selected articles were classified based on the level of supervision required for training and divided into different groups representing major frameworks for time series analysis and data modelling. RESULTS: A total of 52 articles were reviewed. The research field is showing rapid expansion, with the majority of articles published in the last 4 years. Deep-learning-based temporal models with discriminative feature extraction and multi-modal data integration have demonstrated promising results on small surgical datasets. Currently, unsupervised methods perform significantly less well than the supervised approaches. CONCLUSION: The development of large and diverse open-source datasets of annotated demonstrations is essential for the development and validation of robust solutions for surgical gesture recognition. While new strategies for discriminative feature extraction and knowledge transfer, or unsupervised and semi-supervised approaches, can mitigate the need for data and labels, they have not yet been demonstrated to achieve comparable performance. Important future research directions include detection and forecast of gesture-specific errors and anomalies. SIGNIFICANCE: This paper is a comprehensive and structured analysis of surgical gesture recognition methods, aiming to summarize the status of this rapidly evolving field.
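
    The common framing these methods share can be illustrated with a deliberately simple stand-in for the deep temporal models the review discusses: segment a kinematic time series into sliding windows and classify each window. The nearest-centroid classifier below is purely illustrative, not any reviewed system.

    ```python
    import numpy as np

    def sliding_windows(signal, width, stride):
        """Split a (T, D) kinematic time series into fixed-width windows."""
        starts = range(0, len(signal) - width + 1, stride)
        return np.stack([signal[s:s + width] for s in starts])

    def nearest_centroid_labels(windows, centroids):
        """Label each window with the index of the closest gesture centroid."""
        feats = windows.reshape(len(windows), -1)
        dists = np.linalg.norm(feats[:, None, :] - centroids[None, :, :], axis=2)
        return dists.argmin(axis=1)

    # Toy stream: 20 frames of "gesture 0" followed by 20 frames of "gesture 1"
    stream = np.concatenate([np.zeros((20, 2)), np.ones((20, 2))])
    windows = sliding_windows(stream, width=5, stride=5)
    centroids = np.stack([np.zeros(10), np.ones(10)])  # flattened 5x2 templates
    labels = nearest_centroid_labels(windows, centroids)
    ```

    Real systems replace the centroid matching with a learned temporal model (e.g. a recurrent or temporal convolutional network), but the windowed decoding of a gesture stream is the same.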

    Automatic Segmentation of Cells of Different Types in Fluorescence Microscopy Images

    Recognition of different cell compartments, types of cells, and their interactions is a critical aspect of quantitative cell biology. It provides valuable insight into cellular and subcellular interactions and the mechanisms of biological processes, such as cancer cell dissemination, organ development and wound healing. Quantitative analysis of cell images is also the mainstay of numerous clinical diagnostic and grading procedures, for example in cancer, immunological, infectious, heart and lung disease. Automating the quantification of cellular biological samples requires segmenting different cellular and sub-cellular structures in microscopy images. However, automating this problem has proven to be non-trivial, and requires solving multi-class image segmentation tasks that are challenging owing to the high similarity of objects from different classes and irregularly shaped structures. This thesis focuses on the development and application of probabilistic graphical models to multi-class cell segmentation. Graphical models can improve segmentation accuracy through their ability to exploit prior knowledge and model inter-class dependencies. Directed acyclic graphs, such as trees, have been widely used to model top-down statistical dependencies as a prior for improved image segmentation. However, trees can capture only a limited set of inter-class constraints. To overcome this limitation, this thesis proposes polytree graphical models, which capture label proximity relations more naturally than tree-based approaches. Polytrees can effectively impose prior knowledge on the inclusion of different classes by capturing both same-level and across-level dependencies. A novel recursive mechanism based on two-pass message passing is developed to efficiently calculate closed-form posteriors of graph nodes on polytrees.
Furthermore, since an accurate and sufficiently large ground truth is not always available for training segmentation algorithms, a weakly supervised framework is developed that employs polytrees for multi-class segmentation and reduces the need for training data by modeling prior knowledge during segmentation. A hierarchical graph is generated over the superpixels in the image, node labels are inferred through a novel efficient message-passing algorithm, and the model parameters are optimized with Expectation Maximization (EM). Evaluation on the segmentation of simulated data and multiple publicly available fluorescence microscopy datasets indicates that the proposed method outperforms the state of the art. The proposed method has also been assessed in predicting possible segmentation errors, where it again outperforms tree-based models. This can pave the way to calculating uncertainty measures on the resulting segmentation and guiding subsequent segmentation refinement, which can be useful in the development of an interactive segmentation framework.
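
The thesis's polytree algorithm is not reproduced here, but the flavour of two-pass message passing can be shown on the simplest possible graph, a chain, where one forward and one backward sweep yield exact node posteriors. This is a simplified sketch; the polytree case additionally handles nodes with multiple parents.

```python
import numpy as np

def chain_posteriors(unary, pairwise):
    """Exact marginals on a chain via two-pass message passing.

    unary:    (N, K) per-node potentials
    pairwise: (K, K) symmetric transition potentials
    """
    n, k = unary.shape
    fwd = np.ones((n, k))
    bwd = np.ones((n, k))
    for i in range(1, n):                      # forward sweep
        fwd[i] = (fwd[i - 1] * unary[i - 1]) @ pairwise
        fwd[i] /= fwd[i].sum()                 # normalize for stability
    for i in range(n - 2, -1, -1):             # backward sweep
        bwd[i] = pairwise @ (bwd[i + 1] * unary[i + 1])
        bwd[i] /= bwd[i].sum()
    post = fwd * bwd * unary                   # combine incoming messages
    return post / post.sum(axis=1, keepdims=True)
```

With a smoothness-favouring pairwise potential, an ambiguous middle node inherits evidence from its confident neighbours, which is exactly the role the polytree prior plays for superpixel labels.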

    Computer-Assisted Algorithms for Ultrasound Imaging Systems

    Ultrasound imaging works by transmitting ultrasound waves into the body and reconstructing images of internal organs from the strength of the returning echoes. Ultrasound imaging is considered safe and economical and can image organs in real time, which makes it a widely used diagnostic imaging modality in health care. It covers a broad spectrum of medical diagnostics, including the kidney, liver, and pancreas, as well as fetal monitoring. Currently, diagnosis through ultrasound scanning is clinic-centered: patients who need an ultrasound scan have to visit a hospital to be diagnosed. Ultrasound services are therefore constrained to hospitals and have not reached their potential in remote health-care and point-of-care diagnostics, owing to the large form factor of the equipment, a shortage of sonographers, low signal-to-noise ratio, high diagnostic subjectivity, etc. In this thesis, we address these issues with the objective of making ultrasound imaging more reliable for point-of-care and remote health-care applications. To achieve this goal, we propose (i) computer-assisted algorithms to improve diagnostic accuracy and assist semi-skilled persons in scanning, (ii) speckle suppression algorithms to improve the diagnostic quality of ultrasound images, (iii) a reliable telesonography framework to address the shortage of sonographers, and (iv) a programmable portable ultrasound scanner to operate in point-of-care and remote health-care applications.
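
    One classical speckle-suppression approach, shown here only as an illustration (the thesis's own algorithms may differ), is the Lee filter: adaptive local averaging whose smoothing strength depends on how much the local variance exceeds an assumed noise variance, so flat regions are smoothed while edges are preserved.

    ```python
    import numpy as np

    def lee_filter(img, win=3, noise_var=0.05):
        """Lee filter: blend each pixel toward its local mean,
        keeping more of the original value where local variance is high."""
        pad = win // 2
        padded = np.pad(img, pad, mode="reflect")
        out = np.empty_like(img, dtype=float)
        for i in range(img.shape[0]):
            for j in range(img.shape[1]):
                patch = padded[i:i + win, j:j + win]
                mean, var = patch.mean(), patch.var()
                gain = max(var - noise_var, 0.0) / (var + 1e-12)
                out[i, j] = mean + gain * (img[i, j] - mean)
        return out

    # Synthetic flat region corrupted by noise with variance ~0.04
    rng = np.random.default_rng(0)
    noisy = 1.0 + 0.2 * rng.standard_normal((16, 16))
    smoothed = lee_filter(noisy, win=3, noise_var=0.04)
    ```

    In a homogeneous region the gain is near zero and the output collapses to the local mean; near a strong edge the gain approaches one and the pixel is left largely untouched.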

    When Deep Learning Meets Data Alignment: A Review on Deep Registration Networks (DRNs)

    Registration is the process of computing the transformation that aligns sets of data. Commonly, a registration process can be divided into four main steps: target selection, feature extraction, feature matching, and transform computation for the alignment. The accuracy of the result depends on multiple factors, the most significant being the quantity of input data; the presence of noise, outliers and occlusions; the quality of the extracted features; real-time requirements; and the type of transformation, especially transformations defined by many parameters, such as non-rigid deformations. Recent advancements in machine learning could be a turning point for these issues, particularly the development of deep learning (DL) techniques, which are helping to improve multiple computer vision problems through an abstract understanding of the input data. In this paper, a review of deep-learning-based registration methods is presented. We classify the papers using a framework derived from the traditional registration pipeline in order to analyse the strengths of the new learning-based proposals. Deep Registration Networks (DRNs) try to solve the alignment task either by replacing part of the traditional pipeline with a network or by solving the full registration problem end to end. The main conclusions are: 1) learning-based registration techniques cannot always be mapped cleanly onto the traditional pipeline; 2) these approaches admit more complex inputs, such as conceptual models, in addition to traditional 3D datasets; 3) despite the generality of learning, current proposals are still ad hoc solutions; and 4) this is a young topic that still requires substantial effort to reach general solutions able to cope with the problems that affect traditional approaches. Comment: Submitted to Pattern Recognition.
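
    The final "transform computation" step has a closed form in the rigid case with known correspondences: the Kabsch/Procrustes solution via SVD, which many learning-based pipelines still use as a back end once a network has produced the matches. A minimal sketch:

    ```python
    import numpy as np

    def rigid_align(src, dst):
        """Least-squares rigid transform (R, t) mapping src points onto dst
        (Kabsch algorithm), given one-to-one correspondences."""
        mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
        H = (src - mu_s).T @ (dst - mu_d)          # cross-covariance
        U, _, Vt = np.linalg.svd(H)
        D = np.eye(src.shape[1])
        D[-1, -1] = np.sign(np.linalg.det(Vt.T @ U.T))  # avoid reflections
        R = Vt.T @ D @ U.T
        t = mu_d - R @ mu_s
        return R, t

    # Recover a known 2D rotation-plus-translation from point pairs
    rng = np.random.default_rng(1)
    src = rng.random((10, 2))
    theta = 0.5
    R_true = np.array([[np.cos(theta), -np.sin(theta)],
                       [np.sin(theta),  np.cos(theta)]])
    dst = src @ R_true.T + np.array([1.0, -2.0])
    R, t = rigid_align(src, dst)
    ```

    Non-rigid deformations, which the review highlights as the hard case, have no such closed form, which is one motivation for learning-based registration.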

    Advanced Human Activity Recognition through Data Augmentation and Feature Concatenation of Micro-Doppler Signatures

    Developing accurate classification models for radar-based Human Activity Recognition (HAR), capable of solving real-world problems, depends heavily on the amount of available data. In this paper, we propose a simple, effective, and generalizable data augmentation strategy along with preprocessing for micro-Doppler signatures to enhance recognition performance. By leveraging the decomposition properties of the Discrete Wavelet Transform (DWT), new samples are generated with distinct characteristics that do not overlap with those of the original samples. The micro-Doppler signatures are projected onto the DWT space for the decomposition process using the Haar wavelet. The returned decomposition components are used in different configurations to generate new data. Three new samples are obtained from a single spectrogram, which increases the amount of training data without creating duplicates. Next, the augmented samples are processed using the Sobel filter. This step allows each sample to be expanded into three representations, including the gradient in the x-direction (Dx), y-direction (Dy), and both x- and y-directions (Dxy). These representations are used as input for training a three-input convolutional neural network-long short-term memory support vector machine (CNN-LSTM-SVM) model. We have assessed the feasibility of our solution by evaluating it on three datasets containing micro-Doppler signatures of human activities, including Frequency Modulated Continuous Wave (FMCW) 77 GHz, FMCW 24 GHz, and Impulse Radio Ultra-Wide Band (IR-UWB) 10 GHz datasets. Several experiments have been carried out to evaluate the model's performance with the inclusion of additional samples. The model was trained from scratch only on the augmented samples and tested on the original samples. Our augmentation approach has been thoroughly evaluated using various metrics, including accuracy, precision, recall, and F1-score.
The results demonstrate a substantial improvement in the recognition rate and show that the approach effectively alleviates overfitting. Accuracies of 96.47%, 94.27%, and 98.18% are obtained for the FMCW 77 GHz, FMCW 24 GHz, and IR-UWB 10 GHz datasets, respectively. The findings of the study demonstrate the utility of the DWT for enriching micro-Doppler training samples to improve HAR performance. Furthermore, the Sobel preprocessing step was found to be effective in enhancing classification accuracy, achieving 96.78%, 96.32%, and 100% for the FMCW 77 GHz, FMCW 24 GHz, and IR-UWB 10 GHz datasets, respectively.
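
The decomposition idea can be sketched with a single-level 2D Haar DWT implemented directly in NumPy (the paper's exact sub-band configurations and the subsequent Sobel step are not reproduced; the recombination in `augment` below is a hypothetical choice for illustration).

```python
import numpy as np

def haar2d(x):
    """Single-level 2D Haar DWT: returns (LL, LH, HL, HH) sub-bands.
    Assumes an even-sized input array."""
    a = (x[0::2] + x[1::2]) / 2.0          # vertical average
    d = (x[0::2] - x[1::2]) / 2.0          # vertical detail
    ll = (a[:, 0::2] + a[:, 1::2]) / 2.0   # approximation
    lh = (a[:, 0::2] - a[:, 1::2]) / 2.0   # horizontal detail
    hl = (d[:, 0::2] + d[:, 1::2]) / 2.0   # vertical detail
    hh = (d[:, 0::2] - d[:, 1::2]) / 2.0   # diagonal detail
    return ll, lh, hl, hh

def augment(spectrogram):
    """Hypothetical augmentation: derive three extra training samples
    by recombining DWT sub-bands of one micro-Doppler spectrogram."""
    ll, lh, hl, hh = haar2d(spectrogram)
    return [ll, ll + lh, ll + hl]
```

Each derived sample is half-resolution and carries a different mix of approximation and detail energy, so none duplicates the original spectrogram, which matches the paper's stated goal of increasing training data without creating duplicates.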

    Application of Analogical Reasoning for Use in Visual Knowledge Extraction

    There is a continual push to make Artificial Intelligence (AI) as human-like as possible; however, this is a difficult task because of AI's inability to learn beyond its current comprehension. Analogical reasoning (AR) has been proposed as one method to achieve this goal. The current literature lacks a technical comparison of psychologically inspired and natural-language-processing-based AR algorithms using consistent metrics on multiple-choice, word-based analogy problems. Assessment is based on “correctness” and “goodness” metrics. No one-size-fits-all algorithm exists for all textual problems. As a contribution to visual AR, a convolutional neural network (CNN) is integrated with the AR vector space model Global Vectors (GloVe) in the proposed Image Recognition Through Analogical Reasoning Algorithm (IRTARA). Given images outside of the CNN’s training data, IRTARA produces contextual information by leveraging semantic information from GloVe. The quality of IRTARA’s results is measured by definition-based, AR-based, and human-factors evaluation methods, which showed consistency at the extreme ends. The research shows the potential for AR to facilitate a more human-like AI through its ability to understand concepts beyond its foundational knowledge in both textual and visual problem spaces.
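
    Vector-space analogical reasoning of the GloVe variety reduces to vector arithmetic plus a nearest-neighbour search. The toy embeddings below are hypothetical two-dimensional vectors chosen to make the example transparent, not real GloVe vectors.

    ```python
    import numpy as np

    def solve_analogy(a, b, c, vocab):
        """'a is to b as c is to ?' via the vector offset b - a + c,
        scored by cosine similarity over a small embedding table."""
        target = vocab[b] - vocab[a] + vocab[c]
        def cos(u, v):
            return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
        candidates = {w: v for w, v in vocab.items() if w not in (a, b, c)}
        return max(candidates, key=lambda w: cos(candidates[w], target))

    # Toy embeddings: dimension 0 ~ "royalty", dimension 1 ~ "gender"
    vocab = {
        "king":  np.array([1.0,  1.0]),
        "queen": np.array([1.0, -1.0]),
        "man":   np.array([0.0,  1.0]),
        "woman": np.array([0.0, -1.0]),
    }
    ```

    With real pretrained GloVe vectors the search runs over the full vocabulary, but the offset-and-rank mechanism is the same one IRTARA leverages for semantic context.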