
    An integrated background model for video surveillance based on primal sketch and 3D scene geometry

    This paper presents a novel integrated background model for video surveillance. Our model uses a primal sketch representation for image appearance and 3D scene geometry to capture the ground plane and major surfaces in the scene. The primal sketch model divides the background image into three types of regions: flat, sketchable, and textured, which are modeled respectively by mixtures of Gaussians, image primitives, and LBP histograms. We calibrate the camera and recover important planes in the 3D scene, such as the ground, horizontal surfaces, walls, and stairs, and use this geometric information to predict the sizes and locations of foreground blobs, further reducing false alarms. Compared with state-of-the-art background modeling methods, our approach is more effective, especially for indoor scenes where shadows, highlights, reflections of moving objects, and camera exposure adjustment usually cause problems. Experimental results demonstrate that our approach improves the performance of background/foreground separation at the pixel level, and of the integrated video surveillance system at the object and trajectory levels.
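
    A minimal sketch of two of the per-region cues named above, assuming OpenCV and scikit-image: a mixture-of-Gaussians subtractor (MOG2) for flat regions and LBP histograms for textured regions. This is an illustration, not the authors' implementation; the primal sketch partitioning, image primitives, and 3D geometry checks are omitted, and the video filename and thresholds are hypothetical.

    import cv2
    import numpy as np
    from skimage.feature import local_binary_pattern

    # Per-pixel mixture-of-Gaussians model, used here for the flat regions.
    mog = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16,
                                             detectShadows=True)

    def lbp_hist(gray_patch, P=8, R=1.0):
        # Uniform-LBP histogram of a grayscale patch, L1-normalized.
        codes = local_binary_pattern(gray_patch, P, R, method="uniform")
        hist, _ = np.histogram(codes, bins=P + 2, range=(0, P + 2))
        return hist / max(hist.sum(), 1)

    def textured_patch_changed(bg_patch, cur_patch, threshold=0.35):
        # A textured patch is flagged as foreground when its LBP histogram
        # drifts too far (L1 distance) from the stored background histogram.
        return np.abs(lbp_hist(bg_patch) - lbp_hist(cur_patch)).sum() > threshold

    cap = cv2.VideoCapture("surveillance.mp4")  # hypothetical input video
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        fg_mask = mog.apply(frame)  # MoG foreground mask for this frame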

    Fast-ParC: Position Aware Global Kernel for ConvNets and ViTs

    Transformer models have made tremendous progress in various fields in recent years. In the field of computer vision, vision transformers (ViTs) have also become strong alternatives to convolutional neural networks (ConvNets), yet they have not replaced ConvNets, since both have their own merits. For instance, ViTs are good at extracting global features with attention mechanisms, while ConvNets are more efficient at modeling local relationships due to their strong inductive bias. A natural idea is to combine the strengths of ConvNets and ViTs to design new structures. In this paper, we propose a new basic neural network operator named position-aware circular convolution (ParC) and its accelerated version, Fast-ParC. The ParC operator captures global features by using a global kernel and circular convolution while keeping location sensitivity by employing position embeddings. Fast-ParC further reduces the O(n^2) time complexity of ParC to O(n log n) using the Fast Fourier Transform. This acceleration makes it possible to use global convolution in the early stages of models with large feature maps, while keeping the overall computational cost comparable to that of 3x3 or 7x7 kernels. The proposed operator can be used in a plug-and-play manner to 1) convert ViTs to pure-ConvNet architectures, which enjoy wider hardware support and achieve higher inference speed, and 2) replace traditional convolutions in the deep stages of ConvNets to improve accuracy by enlarging the effective receptive field. Experimental results show that our ParC operator can effectively enlarge the receptive field of traditional ConvNets, and that adopting the proposed operator benefits both ViTs and ConvNet models on three popular vision tasks: image classification, object detection, and semantic segmentation.
    Comment: 19 pages, 8 figures, 11 tables. A preliminary version of this paper was published in ECCV 2022 and can be found at arXiv:2203.0395
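
    The speedup rests on the convolution theorem: a length-n circular convolution becomes a pointwise product in the Fourier domain. The sketch below is a 1-D toy example of that identity, not the authors' 2-D ParC operator (position embeddings and the network context are omitted); it checks the O(n log n) FFT route against the O(n^2) definition.

    import numpy as np

    def circular_conv_direct(x, k):
        # The O(n^2) definition: y[i] = sum_j x[j] * k[(i - j) mod n].
        n = len(x)
        return np.array([sum(x[j] * k[(i - j) % n] for j in range(n))
                         for i in range(n)])

    def circular_conv_fft(x, k):
        # Convolution theorem: circular convolution is a pointwise product
        # in the Fourier domain, giving O(n log n) overall.
        return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(k)))

    rng = np.random.default_rng(0)
    x, k = rng.standard_normal(256), rng.standard_normal(256)
    assert np.allclose(circular_conv_direct(x, k), circular_conv_fft(x, k))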

    Million-scale Object Detection with Large Vision Model

    Over the past few years, there has been growing interest in developing a broad, universal, and general-purpose computer vision system. Such a system would have the potential to solve a wide range of vision tasks simultaneously, without being restricted to a specific problem or data domain. This is crucial for practical, real-world computer vision applications. In this study, we focus on the million-scale multi-domain universal object detection problem, which presents several challenges, including cross-dataset category label duplication, label conflicts, and the need to handle hierarchical taxonomies. Furthermore, there is an ongoing challenge in the field to find a resource-efficient way to leverage large pre-trained vision models for million-scale cross-dataset object detection. To address these challenges, we introduce our approach to label handling, hierarchy-aware loss design, and resource-efficient model training using a pre-trained large model. Our method ranked second in the object detection track of the Robust Vision Challenge 2022 (RVC 2022). We hope that our detailed study will serve as a useful reference and alternative approach for similar problems in the computer vision community. The code is available at https://github.com/linfeng93/Large-UniDet.
    Comment: This paper was revised by ChatGPT.
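
    The abstract does not spell out the hierarchy-aware loss, so the sketch below shows one common form such a loss can take, as an assumption rather than the paper's design: given a toy class taxonomy (all class names hypothetical), the ground-truth label and its ancestors count as positives, descendants of the label are ignored, and the remaining classes are negatives in a per-class sigmoid BCE.

    import numpy as np

    PARENT = {"sedan": "car", "car": "vehicle", "vehicle": None,
              "truck": "vehicle"}   # hypothetical toy taxonomy
    CLASSES = list(PARENT)          # ["sedan", "car", "vehicle", "truck"]

    def ancestors(label):
        out = []
        while label is not None:
            out.append(label)
            label = PARENT[label]
        return out

    def hierarchy_bce(logits, gt_label):
        # Sigmoid BCE where the ground-truth label and all its ancestors are
        # positives, descendants of the label are ignored, and the remaining
        # classes are negatives.
        pos = set(ancestors(gt_label))
        probs = 1.0 / (1.0 + np.exp(-logits))
        loss, count = 0.0, 0
        for i, c in enumerate(CLASSES):
            if c in pos:
                loss -= np.log(probs[i] + 1e-9)
                count += 1
            elif gt_label not in ancestors(c):  # skip descendants of the label
                loss -= np.log(1.0 - probs[i] + 1e-9)
                count += 1
        return loss / max(count, 1)

    # "sedan" makes "car" and "vehicle" positive too; "truck" stays negative.
    print(hierarchy_bce(np.array([2.0, 1.5, 1.0, -1.0]), "sedan"))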

    Integrating 3D and 2D Representations for View Invariant Object Recognition

    This thesis presents representations and corresponding algorithms that learn models to recognize objects across the full continuous view space. In particular, we propose to integrate 3D object-centered representations with 2D viewer-centered representations, filling the representation gap between sparse, simple 3D shapes and their view-variant appearances observed as image pixels. Toward this goal, the thesis studies the following models and corresponding algorithms:
    1. A mixed model and a pursuit algorithm that integrate 3D object primitives and 2D image primitives according to their information contributions, measured as information gains. This measure is used consistently in the subsequent models, and also provides a numerical answer to the debate between object-centered and viewer-centered representations.
    2. A 2D compositional image model and a sum-max data structure that group the 2D image primitives to represent middle-level image structures, such as line segments, curves, and corners. This middle-level image model can be used to find sparse representations of natural images, and connects the low-level 2D image representations to 3D object representations.
    3. A 3D hierarchical compositional object model and an AND-OR tree structure that represent a huge number of possible 3D object templates using a limited number of nodes. This AND-OR tree hierarchically quantizes the infinite, continuous space of object geometry and appearance, and decomposes the 3D object representation into 3D panels, whose appearances in images are further decomposed into active curves and 2D primitives. Despite the multiple hierarchies, learning and inference can be done efficiently by dynamic programming, which is essentially composed of layers of sum and max operations; a toy sketch of this sum-max computation follows below.
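
    The sum-max dynamic program in item 3 can be illustrated with a toy AND-OR tree (node names and scores below are hypothetical, not the thesis's templates): AND nodes compose parts, so their children's scores are summed; OR nodes select among alternatives, so the maximum child score is taken. A single bottom-up pass scores the best template.

    def score(node, leaf_scores):
        # Bottom-up dynamic program: AND nodes sum their children's scores
        # (parts compose), OR nodes take the max (one alternative is chosen).
        kind = node["type"]
        if kind == "LEAF":
            return leaf_scores[node["name"]]   # data term from the image
        child_scores = [score(c, leaf_scores) for c in node["children"]]
        return sum(child_scores) if kind == "AND" else max(child_scores)

    # A tiny tree: the object is front-view OR side-view; each view ANDs
    # two parts. Names and scores are made up for illustration.
    tree = {"type": "OR", "children": [
        {"type": "AND", "children": [{"type": "LEAF", "name": "front_edge"},
                                     {"type": "LEAF", "name": "front_texture"}]},
        {"type": "AND", "children": [{"type": "LEAF", "name": "side_edge"},
                                     {"type": "LEAF", "name": "side_texture"}]}]}
    print(score(tree, {"front_edge": 2.0, "front_texture": 1.5,
                       "side_edge": 0.5, "side_texture": 0.8}))  # prints 3.5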

    Influence of the carbon atom numbers and the structure of collectors on the coal fines flotation

    This paper measures the contact angle, wetting heat, and IR spectrum before and after interactions between eight alkanes and coal, to investigate the influence of alkane structure and carbon atom number on coal surface hydrophobicity. Coal fines flotation tests were conducted with the alkanes as collectors. The results show that, after the different alkanes acted on the coal samples, isoparaffins produced higher contact angles and wetting heats than normal alkanes with the same carbon number, and both quantities reached their maximum values at a carbon number of 12. Under the same flotation conditions, the clean coal yield, combustible material recovery, and flotation perfection index of isododecane were higher than those of dodecane by 3.62%, 4.08%, and 0.96%, respectively; the ash content of the clean coal was also higher than that of dodecane, by 0.91%. This study shows that the collecting ability of isoparaffins is superior to that of normal alkanes, while the flotation selectivity of normal alkanes is better than that of isoparaffins.

    Learning a Probabilistic Model Mixing 3D and 2D Primitives for View Invariant Object Recognition

    This paper presents a method for learning mixed templates for view-invariant object recognition. The template is composed of 3D and 2D primitives, which are stick-like elements defined in 3D and 2D space respectively. The primitives are allowed to perturb within a local range to account for…

    Learning 3D Object Templates by Quantizing Geometry and Appearance Spaces
