
    Surgical Phase Recognition of Short Video Shots Based on Temporal Modeling of Deep Features

    Recognizing the phases of a laparoscopic surgery (LS) operation from its video constitutes a fundamental step for efficient content representation, indexing and retrieval in surgical video databases. In the literature, most techniques focus on phase segmentation of the entire LS video using hand-crafted visual features, instrument usage signals, and recently convolutional neural networks (CNNs). In this paper we address the problem of phase recognition of short video shots (10 s) of the operation, without utilizing information about the preceding/forthcoming video frames, their phase labels or the instruments used. We investigate four state-of-the-art CNN architectures (AlexNet, VGG19, GoogLeNet, and ResNet101) for feature extraction via transfer learning. Visual saliency was employed for selecting the most informative region of the image as input to the CNN. Video shot representation was based on two temporal pooling mechanisms. Most importantly, we investigate the role of 'elapsed time' (from the beginning of the operation), and we show that inclusion of this feature can increase performance dramatically (from 69% to 75% mean accuracy). Finally, a long short-term memory (LSTM) network was trained for video shot classification based on the fusion of CNN features with 'elapsed time', increasing the accuracy to 86%. Our results highlight the prominent role of visual saliency, long-range temporal recursion and 'elapsed time' (a feature so far ignored) for surgical phase recognition. Comment: 6 pages, 4 figures, 6 tables
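
    As an illustration of the pipeline this abstract describes, the sketch below shows one plausible arrangement of its ingredients: frozen per-frame features from a pretrained ResNet-101, fusion with a normalized 'elapsed time' scalar, and an LSTM over the shot. It is a minimal PyTorch sketch, not the authors' code; the number of phases (7), the frame count per shot and the hidden size are illustrative assumptions.

```python
# Minimal sketch (not the authors' code): frozen per-frame ResNet-101 features,
# fused with a normalized 'elapsed time' scalar and classified by an LSTM.
import torch
import torch.nn as nn
from torchvision import models

class ShotPhaseClassifier(nn.Module):
    def __init__(self, num_phases=7, hidden=256):  # 7 phases is an illustrative assumption
        super().__init__()
        backbone = models.resnet101(weights=models.ResNet101_Weights.DEFAULT)
        self.cnn = nn.Sequential(*list(backbone.children())[:-1])   # 2048-d pooled features
        for p in self.cnn.parameters():
            p.requires_grad = False                                 # transfer learning: frozen
        self.lstm = nn.LSTM(input_size=2048 + 1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_phases)

    def forward(self, frames, elapsed):
        # frames: (B, T, 3, 224, 224) sampled from a 10 s shot
        # elapsed: (B,) fraction of the operation elapsed so far, in [0, 1]
        b, t = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1)).flatten(1).view(b, t, -1)
        time_feat = elapsed.view(b, 1, 1).expand(-1, t, 1)          # repeat the scalar per frame
        _, (h, _) = self.lstm(torch.cat([feats, time_feat], dim=-1))
        return self.head(h[-1])                                     # phase logits

model = ShotPhaseClassifier()
logits = model(torch.randn(2, 8, 3, 224, 224), torch.tensor([0.1, 0.6]))
print(logits.shape)  # torch.Size([2, 7])
```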

    Saliency-based approaches for multidimensional explainability of deep networks

    In deep learning, visualization techniques extract the salient patterns exploited by deep networks to perform a task (e.g. image classification), focusing on single images. These methods allow a better understanding of these complex models, empowering the identification of the most informative parts of the input data. Beyond deep network understanding, visual saliency is useful for many quantitative purposes and applications, in both the 2D and 3D domains, such as analysing the generalization capabilities of a classifier and autonomous navigation. In this thesis, we describe an approach to cope with the interpretability problem of a convolutional neural network and propose our ideas on how to exploit visualization for applications like image classification and active object recognition. After a brief overview of common visualization methods producing attention/saliency maps, we will address two separate points. Firstly, we will describe how visual saliency can be effectively used in the 2D domain (e.g. RGB images) to boost image classification performance: visual summaries, i.e. compact representations of an ensemble of saliency maps, can be used to improve the classification accuracy of a network through summary-driven specializations. Then, we will present a 3D active recognition system that considers different views of a target object, overcoming the single-view hypothesis of classical object recognition and making the classification problem much easier in principle. Here we adopt such attention maps in a quantitative fashion, building a 3D dense saliency volume that fuses together saliency maps obtained from different viewpoints, yielding a continuous proxy for which parts of an object are more discriminative for a given classifier. Finally, we will show how to inject this representation into a real-world application, so that an agent (e.g. a robot) can move knowing the capabilities of its classifier.
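
    As a concrete example of the kind of attention/saliency map the thesis builds on, the sketch below computes a plain gradient saliency map for a pretrained classifier. It is illustrative only and does not reproduce the thesis methods; the choice of ResNet-18 and the input size are arbitrary. Maps like these, computed from several viewpoints, are what a 3D dense saliency volume would fuse.

```python
# Illustrative only: a plain gradient saliency map for a pretrained classifier.
# The gradient of the top-class score w.r.t. the input highlights the pixels
# the network relies on; maps from several viewpoints could then be fused.
import torch
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()

def saliency_map(image):
    # image: (1, 3, H, W), normalized as the backbone expects
    image = image.clone().requires_grad_(True)
    score = model(image).max(dim=1).values.sum()     # score of the predicted class
    score.backward()
    return image.grad.abs().max(dim=1).values        # per-pixel saliency, shape (1, H, W)

sal = saliency_map(torch.randn(1, 3, 224, 224))
print(sal.shape)  # torch.Size([1, 224, 224])
```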

    Towards Greener Solutions for Steering Angle Prediction

    In this paper, we investigate the two most popular families of deep neural architectures (i.e., ResNets and Inception nets) for the autonomous driving task of steering angle prediction. This work provides preliminary evidence that Inception architectures can perform as well as or better than ResNet architectures with less complexity for this task. The primary motivation is to support further research into smaller, more efficient neural network architectures that can not only accomplish complex tasks such as steering angle prediction, but also produce less carbon emissions, or, more succinctly, neural networks that are more environmentally friendly. We compare various sizes of ResNet and InceptionNet models. Our derived models can achieve state-of-the-art results in terms of steering angle MSE.
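
    A minimal sketch of the setup this abstract implies: a torchvision ResNet or Inception backbone with a single-output regression head trained under an MSE loss, with the parameter count as a rough complexity measure. The specific backbones (resnet18, inception_v3), input sizes and head are assumptions, not the paper's configuration.

```python
# Assumed setup, not the paper's code: a torchvision ResNet or Inception backbone
# with a single-output regression head, trained with MSE on steering angles.
import torch
import torch.nn as nn
from torchvision import models

def steering_model(family="inception"):
    if family == "resnet":
        m = models.resnet18(weights=None)
        m.fc = nn.Linear(m.fc.in_features, 1)         # one regressed steering angle
    else:
        # Inception v3 expects 299x299 inputs
        m = models.inception_v3(weights=None, aux_logits=False, init_weights=True)
        m.fc = nn.Linear(m.fc.in_features, 1)
    return m

model = steering_model("resnet")
criterion = nn.MSELoss()
images, angles = torch.randn(4, 3, 224, 224), torch.randn(4, 1)
loss = criterion(model(images), angles)
n_params = sum(p.numel() for p in model.parameters())
print(f"MSE={loss.item():.4f}  params={n_params / 1e6:.1f}M")  # accuracy vs. complexity
```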

    Machine Learning for Multimedia Communications

    Machine learning is revolutionizing the way multimedia information is processed and transmitted to users. After intensive and powerful training, impressive efficiency/accuracy improvements have been achieved all along the transmission pipeline. For example, the high model capacity of learning-based architectures enables us to model image and video behavior accurately enough that tremendous compression gains can be achieved. Similarly, error concealment, streaming strategies and even user perception modeling have widely benefited from recent learning-oriented developments. However, learning-based algorithms often imply drastic changes to the way data are represented or consumed, meaning that the overall pipeline can be affected even when only a subpart of it is optimized. In this paper, we review the recent major advances that have been proposed all across the transmission chain, and we discuss their potential impact and the research challenges that they raise.
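
    Purely as an illustration of the learning-based compression this review refers to, the sketch below is a tiny convolutional autoencoder: an encoder producing a compact latent and a decoder reconstructing the image, trained on a distortion loss. Practical learned codecs add quantization, entropy coding and a rate term; none of the layer sizes here come from the paper.

```python
# Toy illustration (not from the paper): a small convolutional autoencoder of the
# kind underlying learned image compression -- encoder to a compact latent,
# decoder back to pixels, trained on a distortion loss.
import torch
import torch.nn as nn

class TinyCodec(nn.Module):
    def __init__(self, latent_ch=16):
        super().__init__()
        self.encode = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, latent_ch, 5, stride=2, padding=2),   # 4x spatially downsampled latent
        )
        self.decode = nn.Sequential(
            nn.ConvTranspose2d(latent_ch, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1),
        )

    def forward(self, x):
        return self.decode(self.encode(x))

codec = TinyCodec()
x = torch.rand(1, 3, 128, 128)
x_hat = codec(x)
print(x_hat.shape, nn.functional.mse_loss(x_hat, x).item())     # the "D" of rate-distortion
```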

    U-Capkidnets++-: A Novel Hybrid Capsule Networks with Optimized Deep Feed Forward Networks for an Effective Classification of Kidney Tumours Using CT Kidney Images

    Chronic Kidney Disease (CKD) has become one of the worldwide health crises and requires concerted efforts to prevent complete organ damage. Considerable research effort has been devoted to the effective separation and classification of kidney tumors from kidney CT images. Emerging machine learning and deep learning algorithms have paved novel paths for tumor detection, but these methods are laborious and their success rate depends heavily on prior experience. To achieve better classification and segmentation of tumors, this paper proposes a hybrid ensemble of visual capsule networks in a U-Net deep learning architecture with deep feed-forward extreme learning machines. The proposed framework incorporates data preprocessing, powerful data augmentation and saliency tumor segmentation (STS), followed by a classification phase. The classification stage is built on feed-forward extreme learning machines (FFELM) to enhance the effectiveness of the suggested model. Extensive experiments were conducted to evaluate the efficacy of the recommended structure and to compare it with other prevailing hybrid deep learning models. The experiments demonstrate that the suggested model outperforms the other models, exhibiting a Dice coefficient for kidney tumors as high as 0.96 and an accuracy of 97.5%, respectively.
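
    To make the FFELM classification stage concrete, here is a minimal sketch of a feed-forward extreme learning machine: a random, fixed hidden layer whose output weights are solved in closed form with a pseudo-inverse. The feature dimension, hidden width and two-class setup are assumptions for illustration and do not reflect the authors' implementation.

```python
# Minimal sketch (an assumption about the pipeline, not the authors' code):
# a feed-forward extreme learning machine -- random fixed hidden layer,
# output weights solved in closed form via a pseudo-inverse.
import numpy as np

class FeedForwardELM:
    def __init__(self, n_hidden=512, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y, n_classes):
        # random, untrained hidden layer
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = np.tanh(X @ self.W + self.b)                 # hidden activations
        T = np.eye(n_classes)[y]                         # one-hot targets
        self.beta = np.linalg.pinv(H) @ T                # closed-form output weights
        return self

    def predict(self, X):
        return np.argmax(np.tanh(X @ self.W + self.b) @ self.beta, axis=1)

# toy usage on features that would come from the segmentation/CNN stage
X = np.random.rand(200, 64)
y = np.random.randint(0, 2, size=200)                    # tumor vs. normal (illustrative)
clf = FeedForwardELM().fit(X, y, n_classes=2)
print((clf.predict(X) == y).mean())                      # training accuracy
```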

    ICKSC: An Efficient Methodology for Predicting Kidney Stone From CT Kidney Image Dataset using Convolutional Neural Networks

    Chronic Kidney Disease (CKD) has become one of the worldwide health crises and requires concerted efforts to prevent complete organ damage. Considerable research effort has been devoted to the effective separation and classification of kidney stones from kidney CT images. Emerging machine learning and deep learning algorithms have paved novel paths for kidney stone detection, but these methods are laborious and their success rate depends heavily on prior experience. To achieve better classification of kidney stones, this paper proposes a novel Intelligent CNN-based Kidney Stone Classification (ICKSC) system, which is based on a transfer learning mechanism and incorporates an 8-layered CNN, densenet169_model, mobilenetv2_model, vgg19_model and xception_model. Extensive experiments were conducted to evaluate the efficacy of the recommended structure and to compare it with other prevailing hybrid deep learning models. The experiments demonstrate that the suggested model outperforms the other models and exhibits better performance in terms of training loss, accuracy, recall and precision.
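
    A minimal transfer-learning sketch in the spirit of the ICKSC description, using the tf.keras.applications backbones named above with a frozen convolutional base and a small classifier head. The head layout, input size and two-class output are assumptions, not the paper's configuration.

```python
# Minimal sketch (illustrative; the ICKSC details are not reproduced here):
# transfer learning with one of the pretrained backbones named above.
import tensorflow as tf

def build_classifier(backbone_name="densenet169", n_classes=2, input_shape=(224, 224, 3)):
    backbones = {
        "densenet169": tf.keras.applications.DenseNet169,
        "mobilenetv2": tf.keras.applications.MobileNetV2,
        "vgg19": tf.keras.applications.VGG19,
        "xception": tf.keras.applications.Xception,
    }
    base = backbones[backbone_name](include_top=False, weights="imagenet",
                                    input_shape=input_shape, pooling="avg")
    base.trainable = False                               # transfer learning: freeze the base
    model = tf.keras.Sequential([
        base,
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_classifier("mobilenetv2")
model.summary()
```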