1,883 research outputs found

    Downstream Task Self-Supervised Learning for Object Recognition and Tracking

    Get PDF
    This dissertation addresses three limitations of deep learning methods in image and video understanding-based machine vision applications. Firstly, although deep convolutional neural networks (CNNs) are efficient for image recognition applications such as object detection and segmentation, they perform poorly under perspective distortions. In real-world applications, the camera perspective is a common problem that we can address by annotating large amounts of data, thus limiting the applicability of the deep learning models. Secondly, the typical approach for single-camera tracking problems is to use separate motion and appearance models, which are expensive in terms of computations and training data requirements. Finally, conventional multi-camera video understanding techniques use supervised learning algorithms to determine temporal relationships among objects. In large-scale applications, these methods are also limited by the requirement of extensive manually annotated data and computational resources.To address these limitations, we develop an uncertainty-aware self-supervised learning (SSL) technique that captures a model\u27s instance or semantic segmentation uncertainty from overhead images and guides the model to learn the impact of the new perspective on object appearance. The test-time data augmentation-based pseudo-label refinement technique continuously trains a model until convergence on new perspective images. The proposed method can be applied for both self-supervision and semi-supervision, thus increasing the effectiveness of a deep pre-trained model in new domains. Extensive experiments demonstrate the effectiveness of the SSL technique in both object detection and semantic segmentation problems. In video understanding applications, we introduce simultaneous segmentation and tracking as an unsupervised spatio-temporal latent feature clustering problem. The jointly learned multi-task features leverage the task-dependent uncertainty to generate discriminative features in multi-object videos. Experiments have shown that the proposed tracker outperforms several state-of-the-art supervised methods. Finally, we proposed an unsupervised multi-camera tracklet association (MCTA) algorithm to track multiple objects in real-time. MCTA leverages the self-supervised detector model for single-camera tracking and solves the multi-camera tracking problem using multiple pair-wise camera associations modeled as a connected graph. The graph optimization method generates a global solution for partially or fully overlapping camera networks

    MotionBEV: Attention-Aware Online LiDAR Moving Object Segmentation with Bird's Eye View based Appearance and Motion Features

    Full text link
    Identifying moving objects is an essential capability for autonomous systems, as it provides critical information for pose estimation, navigation, collision avoidance, and static map construction. In this paper, we present MotionBEV, a fast and accurate framework for LiDAR moving object segmentation, which segments moving objects with appearance and motion features in the bird's eye view (BEV) domain. Our approach converts 3D LiDAR scans into a 2D polar BEV representation to improve computational efficiency. Specifically, we learn appearance features with a simplified PointNet and compute motion features through the height differences of consecutive frames of point clouds projected onto vertical columns in the polar BEV coordinate system. We employ a dual-branch network bridged by the Appearance-Motion Co-attention Module (AMCM) to adaptively fuse the spatio-temporal information from appearance and motion features. Our approach achieves state-of-the-art performance on the SemanticKITTI-MOS benchmark. Furthermore, to demonstrate the practical effectiveness of our method, we provide a LiDAR-MOS dataset recorded by a solid-state LiDAR, which features non-repetitive scanning patterns and a small field of view

    Using selfsupervised algorithms for video analysis and scene detection

    Get PDF
    With the increasing available audiovisual content, well-ordered and effective management of video is desired, and therefore, automatic, and accurate solutions for video indexing and retrieval are needed. Self-supervised learning algorithms with 3D convolutional neural networks are a promising solution for these tasks, thanks to its independence from human-annotations and its suitability to identify spatio-temporal features. This work presents a self-supervised algorithm for the analysis of video shots, accomplished by a two-stage implementation: 1- An algorithm that generates pseudo-labels for 20-frame samples with different automatically generated shot transitions (Hardcuts/Cropcuts, Dissolves, Fades in/out, Wipes) and 2- A fully convolutional 3D trained network with an overall achieved accuracy greater than 97% in the testing set. The model implemented is based in [5], improving the detection of large smooth transitions by implementing a larger temporal context. The transitions detected occur centered in the 10th and 11th frames of a 20-frame input window

    Learning spatio-temporal representations with a dual-stream 3-D residual network for nondriving activity recognition

    Get PDF
    Accurate recognition of non-driving activity (NDA) is important for the design of intelligent Human Machine Interface to achieve a smooth and safe control transition in the conditionally automated driving vehicle. However, some characteristics of such activities like limited-extent movement and similar background pose a challenge to the existing 3D convolutional neural network (CNN) based action recognition methods. In this paper, we propose a dual-stream 3D residual network, named D3D ResNet, to enhance the learning of spatio-temporal representation and improve the activity recognition performance. Specifically, a parallel 2-stream structure is introduced to focus on the learning of short-time spatial representation and small-region temporal representation. A 2-feed driver behaviour monitoring framework is further build to classify 4 types of NDAs and 2 types of driving behaviour based on the drivers head and hand movement. A novel NDA dataset has been constructed for the evaluation, where the proposed D3D ResNet achieves 83.35% average accuracy, at least 5% above three selected state-of-the-art methods. Furthermore, this study investigates the spatio-temporal features learned in the hidden layer through the saliency map, which explains the superiority of the proposed model on the selected NDAs

    Automated assessment of echocardiographic image quality using deep convolutional neural networks

    Get PDF
    Myocardial ischemia tops the list of causes of death around the globe, but its diagnosis and early detection thrives on clinical echocardiography. Although echocardiography presents a huge advantage of a non-intrusive, low-cost point of care diagnosis, its image quality is inherently subjective with strong dependence on operators’ experience level and acquisition skill. In some countries, echo specialists are mandated to supplementary years of training to achieve ‘gold standard’ free-hand acquisition skill without which exacerbates the reliability of echocardiogram and increases possibility for misdiagnosis. These drawbacks pose significant challenges to adopting echocardiography as authoritative modalities for cardiac diagnosis. However, the prevailing and currently adopted solution is to manually carry out quality evaluation where an echocardiography specialist visually inspects several acquired images to make clinical decisions of its perceived quality and prognosis. This is a lengthening process and laced with variability of opinion consequently affection diagnostic responses. The goal of the research is to provide a multi-discipline, state-of-the-art solution that allows objective quality assessment of echocardiogram and to guarantee the reliability of clinical quantification processes. Computer graphic processing unit simulations, medical imaging analysis and deep convolutional neural network models were employed to achieve this goal. From a finite pool of echocardiographic patient datasets, 1650 random samples of echocardiogram cine-loops from different patients with age ranges from 17 and 85 years, who had undergone echocardiography between 2010 and 2020 were evaluated. We defined a set of pathological and anatomical criteria of image quality by which apical-four and parasternal long axis frames can be evaluated with feasibility for real-time optimization. The selected samples were annotated for multivariate model developments and validation of predicted quality score per frame. The outcome presents a robust artificial intelligence algorithm that indicate frames’ quality rating, real-time visualisation of element of quality and updates quality optimization in real-time. A prediction errors of 0.052, 0.062, 0.069, 0.056 for visibility, clarity, depth-gain, and foreshortening attributes were achieved, respectively. The model achieved a combined error rate of 3.6% with average prediction speed of 4.24 ms per frame. The novel method established a superior approach to two-dimensional image quality estimation, assessment, and clinical adequacy on acquisition of echocardiogram prior to quantification and diagnosis of myocardial infarction

    Geometric Model Checking of Continuous Space

    Get PDF
    Topological Spatial Model Checking is a recent paradigm where model checking techniques are developed for the topological interpretation of Modal Logic. The Spatial Logic of Closure Spaces, SLCS, extends Modal Logic with reachability connectives that, in turn, can be used for expressing interesting spatial properties, such as "being near to" or "being surrounded by". SLCS constitutes the kernel of a solid logical framework for reasoning about discrete space, such as graphs and digital images, interpreted as quasi discrete closure spaces. Following a recently developed geometric semantics of Modal Logic, we propose an interpretation of SLCS in continuous space, admitting a geometric spatial model checking procedure, by resorting to models based on polyhedra. Such representations of space are increasingly relevant in many domains of application, due to recent developments of 3D scanning and visualisation techniques that exploit mesh processing. We introduce PolyLogicA, a geometric spatial model checker for SLCS formulas on polyhedra and demonstrate feasibility of our approach on two 3D polyhedral models of realistic size. Finally, we introduce a geometric definition of bisimilarity, proving that it characterises logical equivalence

    A Novel Method to Verify Multilevel Computational Models of Biological Systems Using Multiscale Spatio-Temporal Meta Model Checking

    Get PDF
    Insights gained from multilevel computational models of biological systems can be translated into real-life applications only if the model correctness has been verified first. One of the most frequently employed in silico techniques for computational model verification is model checking. Traditional model checking approaches only consider the evolution of numeric values, such as concentrations, over time and are appropriate for computational models of small scale systems (e.g. intracellular networks). However for gaining a systems level understanding of how biological organisms function it is essential to consider more complex large scale biological systems (e.g. organs). Verifying computational models of such systems requires capturing both how numeric values and properties of (emergent) spatial structures (e.g. area of multicellular population) change over time and across multiple levels of organization, which are not considered by existing model checking approaches. To address this limitation we have developed a novel approximate probabilistic multiscale spatio-temporal meta model checking methodology for verifying multilevel computational models relative to specifications describing the desired/expected system behaviour. The methodology is generic and supports computational models encoded using various high-level modelling formalisms because it is defined relative to time series data and not the models used to generate it. In addition, the methodology can be automatically adapted to case study specific types of spatial structures and properties using the spatio-temporal meta model checking concept. To automate the computational model verification process we have implemented the model checking approach in the software tool Mule (http://mule.modelchecking.org). Its applicability is illustrated against four systems biology computational models previously published in the literature encoding the rat cardiovascular system dynamics, the uterine contractions of labour, the Xenopus laevis cell cycle and the acute inflammation of the gut and lung. Our methodology and software will enable computational biologists to efficiently develop reliable multilevel computational models of biological systems
    • …
    corecore