During the last decade, visual sensors have become ubiquitous. One or more cameras
can be found in devices ranging from smartphones to unmanned aerial vehicles and
autonomous cars. Over the same period, we have witnessed the emergence of
large-scale networks ranging from sensor networks to robotic swarms.
Assume multiple visual sensors perceive the same scene from different viewpoints. To
achieve consistent perception, the problem of establishing correspondences between
observed features must first be solved. Then, it is often necessary to perform distributed
localization, i.e. to estimate the pose of each agent with respect to a global reference
frame. With everything expressed in the same coordinate system and carrying the
same meaning for all agents, the agents can operate and interpret the jointly
observed scene.
The questions we address in this thesis are the following: first, can a group of visual
sensors agree on what they see, in a decentralized fashion? This is the problem of
collaborative data association. Then, based on what they see, can the visual sensors
agree on where they are, in a decentralized fashion as well? This is the problem of
cooperative localization.
The contributions of this work are five-fold. We are the first to address the problem
of consistent multiway matching in a decentralized setting. Secondly, we propose
an efficient decentralized dynamical-systems approach, with global convergence
guarantees, for computing any number of the smallest eigenvalues and the associated
eigenvectors of a weighted graph; it applies directly to group synchronization
problems, e.g. permutation or rotation synchronization. Thirdly, we propose a
state-of-the-art framework for decentralized collaborative localization of mobile
agents in the presence of unknown cross-correlations, which solves a minimax
optimization problem to account for the missing information. Fourthly, we are the first to present an
approach to the 3-D rotation localization of a camera sensor network from relative
bearing measurements. Lastly, we focus on the case of a group of three visual sensors.
We propose a novel Riemannian geometric representation of the trifocal tensor, which
relates projections of points and lines in three overlapping views. This
representation enables the use of state-of-the-art optimization methods on
Riemannian manifolds and of robust averaging techniques for estimating the
trifocal tensor.