Socially Constrained Structural Learning for Groups Detection in Crowd
Modern crowd theories agree that collective behavior is the result of the
underlying interactions among small groups of individuals. In this work, we
propose a novel algorithm for detecting social groups in crowds by means of a
Correlation Clustering procedure on people trajectories. The affinity between
crowd members is learned through an online formulation of the Structural SVM
framework and a set of specifically designed features characterizing both their
physical and social identity, inspired by Proxemic theory, Granger causality,
DTW and Heat-maps. To adhere to sociological observations, we introduce a loss
function (G-MITRE) able to deal with the complexity of evaluating group
detection performance. We show that our algorithm achieves state-of-the-art
results both when relying on ground-truth trajectories and when using tracklets
previously extracted by available detector/tracker systems.
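As a toy illustration of the clustering step, here is a minimal greedy correlation-clustering sketch over a hand-set pairwise affinity matrix; the matrix stands in for the affinities learned by the Structural SVM, and all values and names are illustrative, not the paper's implementation:

```python
import numpy as np

def correlation_clustering(affinity, threshold=0.0):
    """Greedy correlation clustering: merge pairs whose affinity
    exceeds the threshold, using union-find over crowd members."""
    n = affinity.shape[0]
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri

    # Visit pairs from strongest to weakest affinity.
    pairs = [(affinity[i, j], i, j) for i in range(n) for j in range(i + 1, n)]
    for a, i, j in sorted(pairs, reverse=True):
        if a > threshold:
            union(i, j)
    labels = [find(i) for i in range(n)]
    # Relabel groups as 0..k-1 in order of first appearance.
    remap = {}
    return [remap.setdefault(l, len(remap)) for l in labels]

# Toy affinity: members 0-1 and 2-3 walk together, other pairs repel.
A = np.array([[ 0.,  2., -1., -1.],
              [ 2.,  0., -1., -1.],
              [-1., -1.,  0.,  3.],
              [-1., -1.,  3.,  0.]])
print(correlation_clustering(A))  # → [0, 0, 1, 1]
```

A real system would populate the affinity matrix from the learned trajectory features rather than constants.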
On affine rigidity
We define the notion of affine rigidity of a hypergraph and prove a variety
of fundamental results for this notion. First, we show that affine rigidity can
be determined by the rank of a specific matrix which implies that affine
rigidity is a generic property of the hypergraph. Then we prove that if a graph
is (d+1)-vertex-connected, then it must be "generically neighborhood
affinely rigid" in d-dimensional space. This implies that if a graph is
(d+1)-vertex-connected then any generic framework of its squared graph must
be universally rigid.
Our results, and affine rigidity more generally, have natural applications in
point registration and localization, as well as connections to manifold
learning.
Comment: Updated abstract
PolarMOT: How Far Can Geometric Relations Take Us in 3D Multi-Object Tracking?
Most (3D) multi-object tracking methods rely on appearance-based cues for
data association. By contrast, we investigate how far we can get by only
encoding geometric relationships between objects in 3D space as cues for
data-driven data association. We encode 3D detections as nodes in a graph,
where spatial and temporal pairwise relations among objects are encoded via
localized polar coordinates on graph edges. This representation makes our
geometric relations invariant to global transformations and smooth trajectory
changes, especially under non-holonomic motion. This allows our graph neural
network to learn to effectively encode temporal and spatial interactions and
fully leverage contextual and motion cues to obtain final scene interpretation
by posing data association as edge classification. We establish a new
state-of-the-art on the nuScenes dataset and, more importantly, show that our
method, PolarMOT, generalizes remarkably well across different locations
(Boston, Singapore, Karlsruhe) and datasets (nuScenes and KITTI).
Comment: ECCV 2022, 17 pages, 5 pages of supplementary, 3 figures
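A minimal sketch of the localized polar edge-encoding idea (illustrative only, not the paper's exact parameterization): the relative range, bearing, and heading of one detection expressed in another's local frame are invariant to a global rigid transformation of the scene:

```python
import math

def polar_edge_feature(src, dst):
    """Relative polar coordinates of `dst` in the local frame of `src`,
    where each node is (x, y, heading). Invariant to global rigid motion."""
    sx, sy, sh = src
    dx, dy, dh = dst
    rel_x, rel_y = dx - sx, dy - sy
    rng = math.hypot(rel_x, rel_y)
    bearing = math.atan2(rel_y, rel_x) - sh   # direction of dst seen from src
    rel_heading = dh - sh                     # heading difference
    wrap = lambda a: math.atan2(math.sin(a), math.cos(a))  # wrap to (-pi, pi]
    return rng, wrap(bearing), wrap(rel_heading)

def transform(p, theta, tx, ty):
    """Apply a global rotation by theta plus a translation (tx, ty)."""
    x, y, h = p
    c, s = math.cos(theta), math.sin(theta)
    return (c*x - s*y + tx, s*x + c*y + ty, h + theta)

a = (0.0, 0.0, 0.0)
b = (3.0, 4.0, math.pi / 2)
f1 = polar_edge_feature(a, b)
f2 = polar_edge_feature(transform(a, 1.2, 5, -7), transform(b, 1.2, 5, -7))
print(all(abs(u - v) < 1e-9 for u, v in zip(f1, f2)))  # → True
```

In the paper these relative features label graph edges and are consumed by a graph neural network; here they are simply computed and checked for invariance.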
Universal Denoising Networks: A Novel CNN Architecture for Image Denoising
We design a novel network architecture for learning discriminative image
models that are employed to efficiently tackle the problem of grayscale and
color image denoising. Based on the proposed architecture, we introduce two
different variants. The first network involves convolutional layers as a core
component, while the second one relies instead on non-local filtering layers
and thus it is able to exploit the inherent non-local self-similarity property
of natural images. As opposed to most of the existing deep network approaches,
which require the training of a specific model for each considered noise level,
the proposed models are able to handle a wide range of noise levels using a
single set of learned parameters, while they are very robust when the noise
degrading the latent image does not match the statistics of the noise used
during training. The latter argument is supported by results that we report on
publicly available images corrupted by unknown noise and which we compare
against solutions obtained by competing methods. At the same time the
introduced networks achieve excellent results under additive white Gaussian
noise (AWGN), which are comparable to those of the current state-of-the-art
network, while they rely on a much shallower architecture whose number of
trained parameters is one order of magnitude smaller. These properties make
the proposed networks ideal candidates to serve as sub-solvers on restoration
methods that deal with general inverse imaging problems such as deblurring,
demosaicking, super-resolution, etc.
Comment: Camera-ready paper to appear in the Proceedings of CVPR 201
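The evaluation protocol described above, one parameter set measured across a range of noise levels, can be sketched as follows; a simple box filter stands in for the learned network, and all values are illustrative:

```python
import numpy as np

def awgn(img, sigma, rng):
    """Corrupt an image with additive white Gaussian noise of std sigma."""
    return img + rng.normal(0.0, sigma, img.shape)

def psnr(clean, est, peak=1.0):
    """Peak signal-to-noise ratio in dB."""
    mse = np.mean((clean - est) ** 2)
    return 10 * np.log10(peak ** 2 / mse)

def box_filter(img):
    """3x3 mean filter, a crude stand-in for the learned denoiser."""
    padded = np.pad(img, 1, mode="edge")
    out = np.zeros_like(img)
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            out += padded[1+di:1+di+img.shape[0], 1+dj:1+dj+img.shape[1]]
    return out / 9.0

rng = np.random.default_rng(0)
clean = np.tile(np.linspace(0, 1, 32), (32, 1))   # smooth test image
# A single "model" (here the box filter) is applied at every noise level,
# mirroring the single-parameter-set evaluation in the abstract.
for sigma in (0.05, 0.1, 0.2):
    noisy = awgn(clean, sigma, rng)
    print(f"sigma={sigma}: noisy {psnr(clean, noisy):.1f} dB -> "
          f"denoised {psnr(clean, box_filter(noisy)):.1f} dB")
```

The proposed networks, unlike the fixed filter here, learn parameters that remain effective across the whole noise range.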
Learning to Divide and Conquer for Online Multi-Target Tracking
Online Multiple Target Tracking (MTT) is often addressed within the
tracking-by-detection paradigm. Detections are first extracted
independently in each frame, and object trajectories are then built by
maximizing specifically designed coherence functions. Nevertheless, ambiguities
arise in the presence of occlusions or detection errors. In this paper we claim
that the ambiguities in tracking could be solved by a selective use of the
features, by working with more reliable features if possible and exploiting a
deeper representation of the target only if necessary. To this end, we propose
an online divide-and-conquer tracker for static-camera scenes, which partitions
the assignment problem into local subproblems and solves them by selectively
choosing and combining the best features. The complete framework is cast as a
structural learning task that unifies these phases and learns tracker
parameters from examples. Experiments on two different datasets highlight a
significant improvement in tracking performance (MOTA +10%) over the state of
the art.
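As a rough sketch of one local association subproblem (a deliberate simplification of the paper's learned, feature-selective solvers), here is a gated greedy nearest-neighbour assignment between tracks and detections; all names and values are illustrative:

```python
import numpy as np

def gated_greedy_assignment(tracks, detections, gate=2.0):
    """Greedy nearest-neighbour association: each track claims the
    closest unclaimed detection inside the distance gate."""
    cost = np.linalg.norm(tracks[:, None, :] - detections[None, :, :], axis=2)
    matches, used = {}, set()
    # Visit candidate (track, detection) pairs from cheapest to dearest.
    order = np.dstack(np.unravel_index(np.argsort(cost, axis=None),
                                       cost.shape))[0]
    for t, d in order:
        t, d = int(t), int(d)
        if t in matches or d in used or cost[t, d] > gate:
            continue
        matches[t] = d
        used.add(d)
    return matches

tracks = np.array([[0.0, 0.0], [5.0, 5.0]])
dets = np.array([[5.2, 5.1], [0.1, -0.1]])
print(gated_greedy_assignment(tracks, dets))  # → {0: 1, 1: 0}
```

In the paper, each local subproblem would instead be solved by combining the most reliable learned features, escalating to deeper target representations only when the cheap cues are ambiguous.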
Least squares optimization: From theory to practice
Nowadays, Nonlinear Least-Squares embodies the foundation of many Robotics and Computer Vision systems. The research community has investigated this topic deeply in the last few years, resulting in the development of several open-source solvers that approach constantly increasing classes of problems. In this work, we propose a unified methodology to design and develop efficient Least-Squares Optimization algorithms, focusing on the structures and patterns of each specific domain. Furthermore, we present a novel open-source optimization system that transparently addresses problems with different structures and is designed to be easy to extend. The system is written in modern C++ and runs efficiently on embedded systems. We validated our approach by conducting comparative experiments on several problems using standard datasets. The results show that our system achieves state-of-the-art performance in all tested scenarios.
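A minimal Gauss-Newton sketch of the nonlinear least-squares iteration such solvers are built around, applied to an illustrative toy curve-fitting problem (not the paper's system):

```python
import numpy as np

def gauss_newton(residual_fn, jacobian_fn, x0, iters=50):
    """Plain Gauss-Newton: linearize the residuals, solve the normal
    equations for the step, and update the estimate."""
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        r = residual_fn(x)
        J = jacobian_fn(x)
        dx = np.linalg.solve(J.T @ J, -J.T @ r)
        x += dx
        if np.linalg.norm(dx) < 1e-12:
            break
    return x

# Toy problem: fit y = a * exp(b * t) to noiseless samples.
t = np.linspace(0, 1, 10)
y = 2.0 * np.exp(-1.5 * t)
res = lambda p: p[0] * np.exp(p[1] * t) - y
jac = lambda p: np.stack([np.exp(p[1] * t),
                          p[0] * t * np.exp(p[1] * t)], axis=1)
print(gauss_newton(res, jac, [1.0, 0.0]))  # converges to a=2, b=-1.5
```

Production solvers add the domain-specific structure the abstract emphasizes: sparse block factorization, robust kernels, and manifold-aware updates.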
Integrated Inference and Learning of Neural Factors in Structural Support Vector Machines
Tackling pattern recognition problems in areas such as computer vision,
bioinformatics, speech or text recognition is often done best by taking into
account task-specific statistical relations between output variables. In
structured prediction, this internal structure is used to predict multiple
outputs simultaneously, leading to more accurate and coherent predictions.
Structural support vector machines (SSVMs) are nonprobabilistic models that
optimize a joint input-output function through margin-based learning. Because
SSVMs generally disregard the interplay between unary and interaction factors
during the training phase, final parameters are suboptimal. Moreover, their
factors are often restricted to linear combinations of input features, limiting
their generalization power. To improve prediction accuracy, this paper proposes:
(i) Joint inference and learning by integration of back-propagation and
loss-augmented inference in SSVM subgradient descent; (ii) Extending SSVM
factors to neural networks that form highly nonlinear functions of input
features. Image segmentation benchmark results demonstrate improvements over
conventional SSVM training methods in terms of accuracy, highlighting the
feasibility of end-to-end SSVM training with neural factors.
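The loss-augmented inference step inside SSVM subgradient descent can be sketched in its simplest structured setting, multiclass classification with linear factors; the paper's neural factors and back-propagation are omitted, and all values here are illustrative:

```python
import numpy as np

def ssvm_subgradient(X, Y, n_classes, epochs=50, lr=0.1, lam=0.01, seed=0):
    """Margin-rescaled structured SVM trained by subgradient descent.
    The joint feature map phi(x, y) stacks x into the block of class y,
    so the weight matrix W has one row per class."""
    rng = np.random.default_rng(seed)
    W = np.zeros((n_classes, X.shape[1]))
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            x, y = X[i], Y[i]
            scores = W @ x
            # Loss-augmented inference: argmax_y' score(y') + Delta(y, y')
            # with 0/1 loss Delta; this is the step (i) in the abstract
            # combines with back-propagation for neural factors.
            aug = scores + (np.arange(n_classes) != y)
            y_hat = int(np.argmax(aug))
            W *= (1 - lr * lam)          # regularizer part of the subgradient
            if y_hat != y:
                W[y] += lr * x           # hinge part: pull up the true label,
                W[y_hat] -= lr * x       # push down the loss-augmented rival
    return W

# Linearly separable toy data.
X = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
Y = np.array([0, 0, 1, 1])
W = ssvm_subgradient(X, Y, n_classes=2)
print((W @ X.T).argmax(axis=0))  # → [0 0 1 1]
```

In the paper, the linear scores above are replaced by neural-network factors, and the same loss-augmented argmax supplies the error signal that back-propagation pushes into the factor weights.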