An integrated background model for video surveillance based on primal sketch and 3D scene geometry
This paper presents a novel integrated background model for video surveillance. Our model uses a primal sketch representation for image appearance and 3D scene geometry to capture the ground plane and major surfaces in the scene. The primal sketch model divides the background image into three types of regions — flat, sketchable and textured — which are modeled respectively by mixtures of Gaussians, image primitives and LBP histograms. We calibrate the camera and recover important planes such as the ground, horizontal surfaces, walls and stairs in the 3D scene, and use this geometric information to predict the sizes and locations of foreground blobs to further reduce false alarms. Compared with state-of-the-art background modeling methods, our approach is more effective, especially for indoor scenes where shadows, highlights and reflections of moving objects and camera exposure adjustments usually cause problems. Experimental results demonstrate that our approach improves the performance of background/foreground separation at the pixel level, and of the integrated video surveillance system at the object and trajectory levels.
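As a rough illustration of the per-pixel statistical modeling used for the flat regions, here is a minimal running-Gaussian background subtractor. This is a simplified single-Gaussian stand-in for the paper's mixture of Gaussians, not the authors' implementation; the learning rate `alpha` and threshold `k` are assumed values.

```python
import numpy as np

class GaussianBackground:
    """Minimal per-pixel running-Gaussian background model.

    A simplified stand-in for a mixture-of-Gaussians model:
    each pixel keeps a running mean and variance, and a pixel is
    flagged as foreground when it deviates by more than k sigma.
    alpha (learning rate) and k (threshold) are assumed values.
    """

    def __init__(self, first_frame, alpha=0.05, k=2.5):
        self.mu = first_frame.astype(float)
        self.var = np.full(first_frame.shape, 15.0 ** 2)  # assumed initial variance
        self.alpha, self.k = alpha, k

    def apply(self, frame):
        frame = frame.astype(float)
        d2 = (frame - self.mu) ** 2
        fg = d2 > (self.k ** 2) * self.var  # foreground mask
        # Update the model only where the pixel matches the background,
        # so foreground objects do not get absorbed immediately.
        bg = ~fg
        self.mu[bg] += self.alpha * (frame[bg] - self.mu[bg])
        self.var[bg] += self.alpha * (d2[bg] - self.var[bg])
        return fg
```

A full system along the paper's lines would run such a model only on flat regions, with image primitives and LBP histograms covering the sketchable and textured regions.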
Fast-ParC: Position Aware Global Kernel for ConvNets and ViTs
Transformer models have made tremendous progress in various fields in recent
years. In the field of computer vision, vision transformers (ViTs) have also
become strong alternatives to convolutional neural networks (ConvNets), yet
they have not been able to replace ConvNets, since each has its own merits. For
instance, ViTs are good at extracting global features with attention mechanisms
while ConvNets are more efficient in modeling local relationships due to their
strong inductive bias. A natural idea that arises is to combine the strengths
of both ConvNets and ViTs to design new structures. In this paper, we propose a
new basic neural network operator named position-aware circular convolution
(ParC) and its accelerated version Fast-ParC. The ParC operator can capture
global features by using a global kernel and circular convolution while keeping
location sensitiveness by employing position embeddings. Our Fast-ParC further
reduces the O(n²) time complexity of ParC to O(n log n) using the Fast Fourier
Transform. This acceleration makes it possible to use global convolution in the
early stages of models with large feature maps, while keeping the overall
computational cost comparable to that of 3x3 or 7x7 kernels. The proposed
operation can be used in a plug-and-play manner to 1) convert ViTs to
pure-ConvNet architecture to enjoy wider hardware support and achieve higher
inference speed; 2) replace traditional convolutions in the deep stages of
ConvNets to improve accuracy by enlarging the effective receptive field.
Experimental results show that our ParC op can effectively enlarge the receptive
field of traditional ConvNets, and adopting the proposed op benefits both ViTs
and ConvNet models on all three popular vision tasks: image classification,
object detection, and semantic segmentation.

Comment: 19 pages, 8 figures, 11 tables. A preliminary version of this paper
has been published in ECCV 2022 and can be found at arXiv:2203.0395
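The O(n log n) speed-up rests on the convolution theorem: a circular convolution equals the inverse FFT of the elementwise product of FFTs. A minimal one-dimensional NumPy sketch of the identity (illustrative only, not the authors' implementation):

```python
import numpy as np

def circular_conv_direct(x, w):
    # Direct circular convolution: O(n^2) multiply-adds.
    n = len(x)
    return np.array([sum(x[(i - j) % n] * w[j] for j in range(n))
                     for i in range(n)])

def circular_conv_fft(x, w):
    # Same result via the convolution theorem: O(n log n).
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(w)))

rng = np.random.default_rng(0)
x = rng.standard_normal(8)   # toy "feature map" row
w = rng.standard_normal(8)   # toy global kernel
```

In the actual operator the kernel spans the full feature-map dimension and position embeddings preserve location sensitivity; the sketch only shows why the FFT route keeps the cost comparable to small-kernel convolutions.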
Million-scale Object Detection with Large Vision Model
Over the past few years, there has been growing interest in developing a
broad, universal, and general-purpose computer vision system. Such a system
would have the potential to solve a wide range of vision tasks simultaneously,
without being restricted to a specific problem or data domain. This is crucial
for practical, real-world computer vision applications. In this study, we focus
on the million-scale multi-domain universal object detection problem, which
presents several challenges, including cross-dataset category label
duplication, label conflicts, and the need to handle hierarchical taxonomies.
Furthermore, there is an ongoing challenge in the field to find a
resource-efficient way to leverage large pre-trained vision models for
million-scale cross-dataset object detection. To address these challenges, we
introduce our approach to label handling, hierarchy-aware loss design, and
resource-efficient model training using a pre-trained large model. Our method
was ranked second in the object detection track of the Robust Vision Challenge
2022 (RVC 2022). We hope that our detailed study will serve as a useful
reference and alternative approach for similar problems in the computer vision
community. The code is available at https://github.com/linfeng93/Large-UniDet.

Comment: This paper is revised by ChatGPT
Integrating 3D and 2D Representations for View Invariant Object Recognition
This thesis presents representations and corresponding algorithms which learn models to recognize objects in the full continuous view space. In particular, we propose to integrate 3D object-centered representations with 2D viewer-centered representations, filling the representation gap between sparse, simple 3D shapes and their view-variant appearances observed as image pixels. Towards this goal, the thesis studies the following models and corresponding algorithms:

1. A mixed model and a pursuit algorithm that integrate 3D object primitives and 2D image primitives according to their information contributions, measured as information gains. This measure is used consistently in the subsequent models, and also provides a numerical answer to the debate over object-centered versus viewer-centered representation.

2. A 2D compositional image model and a sum-max data structure which group the 2D image primitives to represent middle-level image structures such as line segments, curves and corners. This middle-level image model can be used to find sparse representations of natural images, and connects the low-level 2D image representations to the 3D object representations.

3. A 3D hierarchical compositional object model and an AND-OR tree structure which represent a huge number of possible 3D object templates using a limited number of nodes. This AND-OR tree hierarchically quantizes the infinite, continuous space of object geometry and appearance, and decomposes the 3D object representation into 3D panels, whose appearances in images are further decomposed into active curves and the 2D primitives. Despite the multiple hierarchies, learning and inference can be done efficiently by dynamic programming, which is essentially composed of layers of sum and max operations.
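The sum-max dynamic programming described above can be sketched as a small recursion: AND nodes sum their children's scores (composing parts), while OR nodes take the maximum (selecting one alternative). The tree shape and leaf-score table below are hypothetical toys illustrating only the control structure, not the thesis's actual templates.

```python
def score(node, leaf_scores):
    """Recursive sum-max inference on an AND-OR tree.

    AND nodes sum child scores; OR nodes take the max over
    alternatives; leaves look up a local match score.
    """
    kind = node["type"]
    if kind == "leaf":
        return leaf_scores[node["name"]]
    child_scores = [score(c, leaf_scores) for c in node["children"]]
    return sum(child_scores) if kind == "and" else max(child_scores)

# Hypothetical tree: either the composition (a AND b), or c alone.
tree = {"type": "or", "children": [
    {"type": "and", "children": [
        {"type": "leaf", "name": "a"},
        {"type": "leaf", "name": "b"}]},
    {"type": "leaf", "name": "c"}]}
leaf_scores = {"a": 1.0, "b": 2.0, "c": 2.5}
```

Because each node's best score depends only on its children, the whole tree is evaluated in one bottom-up pass, which is what makes inference over the huge template space tractable.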
Influence of the carbon atom numbers and the structure of collectors on the coal fines flotation
This paper measured the contact angle, the wetting heat and the IR spectra before and after the interaction of eight alkanes with coal, to investigate the influence of the alkane structure and the number of carbon atoms on coal surface hydrophobicity. Coal fines flotation tests were conducted with the alkanes as collectors. The results show that, after the different alkanes acted on the coal samples, both the contact angle and the wetting heat were higher for the isoparaffins than for the normal alkanes of the same carbon number. When the number of carbon atoms was 12, the contact angle and the wetting heat reached their maximum values after the alkanes interacted with the coal. Under the same flotation conditions, the clean-coal yield, the combustible material recovery and the flotation perfection index of isododecane were higher than those of dodecane by 3.62, 4.08 and 0.96%, respectively; the ash content of the clean coal was also higher than that of dodecane by 0.91%. This study shows that the collecting ability of the isoparaffins was superior to that of the normal alkanes, while the flotation selectivity of the normal alkanes was better than that of the isoparaffins.
Learning a Probabilistic Model Mixing 3D and 2D Primitives for View Invariant Object Recognition
This paper presents a method learning mixed templates for view invariant object recognition. The template is composed of 3D and 2D primitives which are stick-like elements defined in 3D and 2D spaces respectively. The primitives are allowed to perturb within a local range to account fo