
    Sparse optical flow regularisation for real-time visual tracking

    Optical flow can greatly improve the robustness of visual tracking algorithms. While dense optical flow algorithms have various applications, they cannot be used for real-time solutions without resorting to GPU computation. Furthermore, most optical flow algorithms fail in challenging lighting environments because the brightness constancy constraint is violated. We propose a simple but effective iterative regularisation scheme for real-time, sparse optical flow algorithms that is shown to be robust to sudden illumination changes and can handle large displacements. The algorithm outperforms well-known techniques on real-life video sequences while being much faster to compute. Our solution increases the robustness of a real-time particle-filter-based tracking application while consuming only a fraction of the available CPU power. Furthermore, a new and realistic optical flow dataset with annotated ground truth is created and made freely available for research purposes.
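
    The abstract does not spell out the regularisation scheme itself, so the following is only a generic sketch: OpenCV's pyramidal Lucas-Kanade provides the sparse flow estimate, and a simple iterative median-based outlier rejection stands in for the paper's regularisation. The `regularised_sparse_flow` helper and its parameters are illustrative assumptions.

```python
# A minimal sketch of sparse optical flow with an iterative regularisation
# step. The median-based outlier suppression below is a generic stand-in,
# not the paper's exact scheme.
import cv2
import numpy as np

def regularised_sparse_flow(prev_gray, next_gray, n_iters=3):
    # Detect corner features to track (Shi-Tomasi).
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=400,
                                  qualityLevel=0.01, minDistance=7)
    # Pyramidal Lucas-Kanade gives an initial sparse flow estimate.
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, next_gray, pts, None)
    flow = (nxt - pts).reshape(-1, 2)
    good = status.reshape(-1) == 1
    for _ in range(n_iters):
        # Pull outlier vectors toward the global median flow, suppressing
        # spurious matches (e.g. those caused by illumination changes).
        med = np.median(flow[good], axis=0)
        dist = np.linalg.norm(flow - med, axis=1)
        outliers = dist > 3.0 * (np.median(dist[good]) + 1e-6)
        flow[outliers] = med
    return pts.reshape(-1, 2), flow, good
```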

    Neighborhood Structure-Based Model for Multilingual Arbitrarily-Oriented Text Localization in Images/Videos

    Text in an image or a video provides important clues and semantic information about the particular event in the actual scene. Text localization remains an interesting and challenging research problem in image processing due to irregular alignment, brightness variation, degradation, and complex backgrounds. Multilingual textual information comes in different geometrical shapes, which makes locating the text even more complex. In this work, an effective model is presented to locate multilingual, arbitrarily oriented text. The proposed method develops a neighborhood structure model to locate the text region. Initially, max-min clustering is applied along with a 3×3 sliding window to sharpen the text region. The neighborhood structure then creates a boundary for every component using the normal deviation calculated from the sharpened image. Finally, a double-stroke structure model is employed to locate the text region accurately. The presented model is evaluated on five standard datasets, namely NUS, arbitrarily oriented text, Hua's, MRRC, and a real-time video dataset, with performance metrics including recall, precision, and F-measure.
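
    A minimal sketch of the 3×3 sliding-window sharpening step, assuming the max-min operation computes local contrast (window maximum minus window minimum), which is high along text strokes. The clustering and boundary details are not given in the abstract, so the thresholding rule at the end is purely illustrative.

```python
# Local-contrast (max minus min) sharpening over a 3x3 window.
import numpy as np
from scipy.ndimage import maximum_filter, minimum_filter

def max_min_sharpen(gray: np.ndarray) -> np.ndarray:
    """Return a local-contrast map emphasising candidate text pixels."""
    g = gray.astype(np.float32)
    local_max = maximum_filter(g, size=3)
    local_min = minimum_filter(g, size=3)
    return local_max - local_min  # high where strokes meet background

# Illustrative candidate-text mask: pixels whose local contrast exceeds
# one standard deviation above the mean.
# contrast = max_min_sharpen(gray)
# mask = contrast > contrast.mean() + contrast.std()
```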

    CASPNet++: Joint Multi-Agent Motion Prediction

    The prediction of road users' future motion is a critical task in supporting advanced driver-assistance systems (ADAS). It plays an even more crucial role for autonomous driving (AD) in enabling the planning and execution of safe driving maneuvers. Based on our previous work, the Context-Aware Scene Prediction Network (CASPNet), an improved system, CASPNet++, is proposed. In this work, we focus on further enhancing interaction modeling and scene understanding to support the joint prediction of all road users in a scene, using spatiotemporal grids to model future occupancy. Moreover, an instance-based output head is introduced to provide multi-modal trajectories for agents of interest. In extensive quantitative and qualitative analysis, we demonstrate the scalability of CASPNet++ in utilizing and fusing diverse environmental input sources such as HD maps, radar detections, and lidar segmentation. Tested on nuScenes, an urban-focused prediction dataset, CASPNet++ reaches state-of-the-art performance. The model has been deployed in a testing vehicle, running in real time with moderate computational resources.
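
    The spatiotemporal occupancy grids mentioned above can be illustrated with a small rasterisation sketch. The grid extent, cell size, horizon, and the `occupancy_grid` helper are assumptions for illustration, not CASPNet++'s actual implementation.

```python
# Rasterise predicted agent positions into a (horizon, H, W) occupancy grid.
import numpy as np

def occupancy_grid(trajectories, extent=50.0, cell=0.5, horizon=12):
    """trajectories: (num_agents, horizon, 2) array in metres, ego-centred.
    Returns a binary grid of future occupancy per prediction step."""
    size = int(2 * extent / cell)
    grid = np.zeros((horizon, size, size), dtype=np.float32)
    for t in range(horizon):
        xy = trajectories[:, t, :]
        ij = np.floor((xy + extent) / cell).astype(int)  # metres -> cells
        ok = (ij >= 0).all(axis=1) & (ij < size).all(axis=1)
        grid[t, ij[ok, 1], ij[ok, 0]] = 1.0  # row = y, col = x
    return grid

# Example: one agent moving forward 2 m per step, one moving laterally.
# traj = np.stack([np.stack([np.arange(12) * 2.0, np.zeros(12)], -1),
#                  np.stack([np.zeros(12), np.arange(12) * 1.0], -1)])
# g = occupancy_grid(traj)  # g.shape == (12, 200, 200)
```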

    Video foreground extraction for mobile camera platforms

    Foreground object detection is a fundamental task in computer vision with many applications in areas such as object tracking, event identification, and behavior analysis. Most conventional foreground object detection methods work only in stable illumination environments using fixed cameras. In real-world applications, however, the algorithm often needs to operate under challenging conditions: drastic lighting changes, complex object shapes, moving cameras, low frame capture rates, and low-resolution images. This thesis presents four novel approaches for foreground object detection on real-world datasets using cameras deployed on moving vehicles.

    The first problem addresses passenger detection and tracking for public transport buses, investigating the problems of changing illumination conditions and low frame capture rates. Our approach integrates a stable SIFT (Scale Invariant Feature Transform) background seat modelling method with a human shape model into a weighted Bayesian framework to detect passengers. To deal with the problem of tracking multiple targets, we employ the Reversible Jump Markov Chain Monte Carlo tracking algorithm. Appearance transformation models, built with an SVM classifier, capture changes in the appearance of foreground objects across two consecutive frames under low frame rate conditions.

    In the second problem, we present a system for pedestrian detection in scenes captured by a mobile bus surveillance system. It integrates scene localization, foreground-background separation, and pedestrian detection modules into a unified detection framework. The scene localization module performs a two-stage clustering of the video data. In the first stage, SIFT homography is applied to cluster frames in terms of their structural similarity, and the second stage further clusters these aligned frames according to consistency in illumination. This produces clusters of images that are consistent in viewpoint and lighting within each cluster. A kernel density estimation (KDE) technique for colour and gradient is then used to construct background models for each image cluster, which are in turn used to detect candidate foreground pixels. Finally, pedestrians are detected using a hierarchical template matching approach.

    In addition to the second problem, we present three direct pedestrian detection methods that extend the HOG (Histogram of Oriented Gradient) technique (Dalal and Triggs, 2005), and provide a comparative evaluation of these approaches. The three approaches are: a) a new histogram feature formed by the weighted sum of both the gradient magnitude and the filter responses from a set of elongated Gaussian filters (Leung and Malik, 2001) corresponding to the quantised orientation, which we refer to as the Histogram of Oriented Gradient Banks (HOGB) approach; b) the codebook-based HOG feature with the branch-and-bound (efficient subwindow search) algorithm (Lampert et al., 2008); and c) the codebook-based HOGB approach.

    In the third problem, a unified framework that combines 3D and 2D background modelling is proposed to detect scene changes using a camera mounted on a moving vehicle. The 3D scene is first reconstructed from a set of videos taken at different times. The 3D background modelling identifies inconsistent scene structures as foreground objects. For the 2D approach, foreground objects are detected using a spatio-temporal MRF algorithm. Finally, the 3D and 2D results are combined using morphological operations.

    The significance of this research is that it provides basic frameworks for automatic large-scale mobile surveillance applications and facilitates many higher-level applications such as object tracking and behaviour analysis.
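
    As a point of reference for the HOG baseline that the thesis extends, here is a minimal pedestrian detection sketch using OpenCV's stock HOG descriptor with its default people detector. The HOGB and codebook variants described above are not reproduced, and the confidence threshold is an illustrative choice.

```python
# Baseline HOG pedestrian detection (Dalal and Triggs, 2005) via OpenCV.
import cv2

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

def detect_pedestrians(frame_bgr):
    """Return bounding boxes (x, y, w, h) of detected pedestrians."""
    boxes, weights = hog.detectMultiScale(frame_bgr, winStride=(8, 8),
                                          padding=(8, 8), scale=1.05)
    # Keep only reasonably confident detections (threshold is illustrative).
    return [tuple(b) for b, w in zip(boxes, weights) if w > 0.5]
```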

    ESKNet-An enhanced adaptive selection kernel convolution for breast tumors segmentation

    Breast cancer is one of the common cancers that endanger the health of women globally. Accurate target lesion segmentation is essential for early clinical intervention and postoperative follow-up. Recently, many convolutional neural networks (CNNs) have been proposed to segment breast tumors from ultrasound images. However, complex ultrasound patterns and variable tumor shapes and sizes make accurate segmentation of the breast lesion challenging. Motivated by selective kernel convolution, we introduce an enhanced selective kernel convolution for breast tumor segmentation, which integrates multiple feature map region representations and adaptively recalibrates the weights of these feature map regions along the channel and spatial dimensions. This region recalibration strategy enables the network to focus on high-contributing region features and mitigate the perturbation of less useful regions. Finally, the enhanced selective kernel convolution is integrated into U-Net with deep supervision constraints to adaptively capture a robust representation of breast tumors. Extensive experiments against twelve state-of-the-art deep learning segmentation methods on three public breast ultrasound datasets demonstrate that our method achieves more competitive segmentation performance on breast ultrasound images.
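
    A minimal PyTorch sketch of the selective kernel convolution the paper builds on (Li et al., 2019): two branches with different receptive fields are fused by channel-wise soft attention. ESKNet's enhancement additionally recalibrates spatial regions, which is omitted here; the `SKConv` layer below is illustrative, not the paper's exact block.

```python
# Selective kernel convolution: channel-wise soft selection between branches.
import torch
import torch.nn as nn

class SKConv(nn.Module):
    def __init__(self, channels, reduction=4):
        super().__init__()
        # Two branches with different receptive fields (3x3 and dilated 3x3).
        self.conv3 = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True))
        self.conv5 = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=2, dilation=2, bias=False),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True))
        mid = max(channels // reduction, 8)
        self.fc = nn.Sequential(nn.Linear(channels, mid), nn.ReLU(inplace=True))
        self.select = nn.Linear(mid, channels * 2)  # one weight set per branch

    def forward(self, x):
        u3, u5 = self.conv3(x), self.conv5(x)
        s = (u3 + u5).mean(dim=(2, 3))                 # global average pooling
        z = self.select(self.fc(s))                    # (B, 2*C)
        a = z.view(-1, 2, u3.size(1)).softmax(dim=1)   # softmax over branches
        w3, w5 = a[:, 0, :, None, None], a[:, 1, :, None, None]
        return w3 * u3 + w5 * u5                       # soft kernel selection
```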

    Spectral-Spatial Graph Reasoning Network for Hyperspectral Image Classification

    In this paper, we propose a spectral-spatial graph reasoning network (SSGRN) for hyperspectral image (HSI) classification. Concretely, this network contains two parts, a spatial graph reasoning subnetwork (SAGRN) and a spectral graph reasoning subnetwork (SEGRN), which capture spatial and spectral graph contexts, respectively. Unlike previous approaches that implement superpixel segmentation on the original image or attempt to obtain category features under the guidance of the label image, we perform superpixel segmentation on intermediate features of the network to adaptively produce homogeneous regions and obtain effective descriptors. We then adopt a similar idea in the spectral part, aggregating channels to generate spectral descriptors for capturing spectral graph contexts. All graph reasoning procedures in SAGRN and SEGRN are achieved through graph convolution. To guarantee the global perception ability of the proposed method, all adjacency matrices in the graph reasoning are obtained with the help of a non-local self-attention mechanism. Finally, by combining the extracted spatial and spectral graph contexts, we obtain the SSGRN and achieve highly accurate classification. Extensive quantitative and qualitative experiments on three public HSI benchmarks demonstrate the competitiveness of the proposed method compared with other state-of-the-art approaches.
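
    The pairing of graph convolution with a non-local self-attention adjacency matrix can be sketched as follows. The `GraphReasoning` module and its (B, N, C) descriptor input are assumptions for illustration; superpixel pooling and the rest of the SSGRN pipeline are omitted.

```python
# One graph-reasoning step over region descriptors, with the adjacency
# matrix computed by non-local self-attention (every node attends to all).
import torch
import torch.nn as nn

class GraphReasoning(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.query = nn.Linear(channels, channels)
        self.key = nn.Linear(channels, channels)
        self.gconv = nn.Linear(channels, channels)  # graph-convolution weights

    def forward(self, desc):                # desc: (B, N, C) region descriptors
        q, k = self.query(desc), self.key(desc)
        # Non-local self-attention adjacency, normalised row-wise by softmax.
        adj = torch.softmax(q @ k.transpose(1, 2) / q.size(-1) ** 0.5, dim=-1)
        # Graph convolution: aggregate neighbours, then transform.
        return torch.relu(self.gconv(adj @ desc))
```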

    Understanding High Resolution Aerial Imagery Using Computer Vision Techniques

    Computer vision can make important contributions to the analysis of remote sensing satellite or aerial imagery. However, the resolution of early satellite imagery was not sufficient to provide useful spatial features. The situation is changing with the advent of very-high-spatial-resolution (VHR) imaging sensors, which make it possible to use computer vision techniques to analyse man-made structures. Meanwhile, the development of multi-view imaging techniques allows the generation of accurate point clouds as ancillary knowledge. This dissertation aims at developing computer vision and machine learning algorithms for high resolution aerial imagery analysis in the context of application problems including debris detection, building detection, and roof condition assessment. High resolution aerial imagery and point clouds were provided by Pictometry International for this study.

    Debris detection after natural disasters such as tornadoes, hurricanes, or tsunamis is needed for effective debris removal and allocation of limited resources. Significant advances in aerial image acquisition have greatly enabled rapid and automated detection of debris. In this dissertation, a robust debris detection algorithm is proposed: large-scale aerial images are partitioned into homogeneous regions by interactive segmentation, and debris areas are identified based on extracted texture features.

    Robust building detection is another important part of high resolution aerial imagery understanding. This dissertation develops a 3D scene classification algorithm for building detection using point clouds derived from multi-view imagery. Point clouds are divided into point clusters using Euclidean clustering, and individual point clusters are identified based on extracted spectral and 3D structural features.

    The inspection of roof condition is an important step in damage claim processing in the insurance industry, and automated roof condition assessment from remotely sensed images is proposed in this dissertation. Initially, texture classification and a bag-of-words model were applied to assess roof condition using features derived from the whole rooftop. However, considering the complexity of residential rooftops, a more sophisticated method is proposed that divides the task into two stages: 1) roof segmentation, followed by 2) classification of segmented roof regions. Deep learning techniques are investigated for both segmentation and classification. A deep-learned feature is proposed and applied in a region merging segmentation algorithm, and a fine-tuned deep network is adopted for roof segment classification and found to achieve higher accuracy than traditional methods using hand-crafted features.

    Contributions of this study include the development of algorithms for debris detection using 2D images and building detection using 3D point clouds. For roof condition assessment, solutions are explored in two directions: features derived from the whole rooftop and features extracted from each roof segment. Through our research, roof segmentation followed by segment classification was found to be the more promising method, and the workflow was developed and tested. Deep learning techniques are investigated for both roof segmentation and segment classification. More unsupervised feature extraction techniques using deep learning can be explored in future work.
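
    The Euclidean clustering step used for building detection can be approximated with a density-based clusterer. The sketch below uses scikit-learn's DBSCAN as a close stand-in (points within `eps` metres of a neighbour join the same cluster); the `eps` and `min_points` values are illustrative, not the dissertation's settings.

```python
# Euclidean clustering of a 3D point cloud, approximated with DBSCAN.
import numpy as np
from sklearn.cluster import DBSCAN

def euclidean_clusters(points: np.ndarray, eps=0.5, min_points=30):
    """points: (N, 3) array in metres. Returns a list of point clusters;
    DBSCAN label -1 (noise) is discarded."""
    labels = DBSCAN(eps=eps, min_samples=min_points).fit_predict(points)
    return [points[labels == k] for k in range(labels.max() + 1)]

# Each cluster would then be classified from its spectral and 3D structural
# features (e.g. height distribution, planarity) to identify buildings.
```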