Compassionately Conservative Normalized Cuts for Image Segmentation
Image segmentation is a process used in computer vision to partition an image into regions with similar characteristics. One category of image segmentation algorithms is graph-based, where pixels in an image are represented by vertices in a graph and the similarity between pixels is represented by weighted edges. A segmentation of the image can be found by cutting edges between dissimilar groups of pixels in the graph, leaving different clusters or partitions of the data.
A popular graph-based method for segmenting images is the Normalized Cuts (NCuts) algorithm, which quantifies the cost of a graph partition in a way that gives balanced clusters or segments lower values than unbalanced partitionings. This bias is so strong, however, that the NCuts algorithm avoids any singleton partitions, even when vertices are weakly connected to the rest of the graph. For this reason, we propose the Compassionately Conservative Normalized Cut (CCNCut) objective function, which strikes a better compromise between the desire to avoid too many singleton partitions and the notion that all partitions should be balanced.
We demonstrate how CCNCut minimization can be relaxed into the problem of computing Piecewise Flat Embeddings (PFE) and provide an overview of, as well as two efficiency improvements to, the Splitting Orthogonality Constraint (SOC) algorithm previously used to approximate PFE. We then present a new algorithm for computing PFE based on iteratively minimizing a sequence of reweighted Rayleigh quotients (IRRQ) and run a series of experiments to compare CCNCut-based image segmentation via SOC and IRRQ to NCut-based image segmentation on the BSDS500 dataset. Our results indicate that CCNCut-based image segmentation yields more accurate results with respect to ground truth than NCut-based segmentation, and that IRRQ is less sensitive to initialization than SOC.
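The singleton-avoidance behaviour described in the abstract above can be seen in a small worked example. The sketch below uses the standard two-way normalized cut value, cut(A,B)/assoc(A,V) + cut(A,B)/assoc(B,V); it does not implement the CCNCut objective, which the abstract does not define. The toy graph, the edge weights, and the helper names `build_graph` and `ncut_value` are illustrative assumptions.

```python
def build_graph(n, edges):
    """Return a symmetric n x n weight matrix from (i, j, weight) triples."""
    w = [[0.0] * n for _ in range(n)]
    for i, j, weight in edges:
        w[i][j] = weight
        w[j][i] = weight
    return w


def ncut_value(weights, part_a):
    """Two-way normalized cut value of the bipartition (part_a, rest)."""
    n = len(weights)
    part_a = set(part_a)
    part_b = set(range(n)) - part_a
    if not part_a or not part_b:
        raise ValueError("both sides of the partition must be non-empty")
    # Total weight of edges crossing the partition.
    cut = sum(weights[i][j] for i in part_a for j in part_b)
    # Total weight of edges from each side to all vertices.
    assoc_a = sum(weights[i][j] for i in part_a for j in range(n))
    assoc_b = sum(weights[i][j] for i in part_b for j in range(n))
    if assoc_a == 0 or assoc_b == 0:
        raise ValueError("each side must have at least one incident edge")
    return cut / assoc_a + cut / assoc_b


# Hypothetical toy graph: two tight triangles joined by a weak edge,
# plus vertex 6, which is only weakly attached to vertex 5.
edges = [
    (0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
    (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0),
    (2, 3, 0.1),
    (5, 6, 0.05),
]
W = build_graph(7, edges)

balanced_score = ncut_value(W, {0, 1, 2})
singleton_score = ncut_value(W, {6})
print("balanced split:", balanced_score)
print("singleton split:", singleton_score)
```

For a singleton without a self-loop, the cut weight equals the vertex's total association, so the first term is exactly 1 regardless of how weak the connection is. In this toy graph the singleton split therefore scores slightly above 1, while the balanced split scores roughly 0.03, so minimizing the value never isolates vertex 6.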
Steklov Spectral Geometry for Extrinsic Shape Analysis
We propose using the Dirichlet-to-Neumann operator as an extrinsic
alternative to the Laplacian for spectral geometry processing and shape
analysis. Intrinsic approaches, usually based on the Laplace-Beltrami operator,
cannot capture the spatial embedding of a shape up to rigid motion, and many
previous extrinsic methods lack theoretical justification. Instead, we consider
the Steklov eigenvalue problem, computing the spectrum of the
Dirichlet-to-Neumann operator of a surface bounding a volume. A remarkable
property of this operator is that it completely encodes volumetric geometry. We
use the boundary element method (BEM) to discretize the operator, accelerated
by hierarchical numerical schemes and preconditioning; this pipeline allows us
to solve eigenvalue and linear problems on large-scale meshes despite the
density of the Dirichlet-to-Neumann discretization. We further demonstrate that
our operators naturally fit into existing frameworks for geometry processing,
making a shift from intrinsic to extrinsic geometry as simple as substituting
the Laplace-Beltrami operator with the Dirichlet-to-Neumann operator.
Comment: Additional experiments added
Mathematical Imaging and Surface Processing
Within the last decade, image and geometry processing have become increasingly rigorous, with solid foundations in mathematics. Both areas are research fields at the intersection of different mathematical disciplines, ranging from geometry and calculus of variations to PDE analysis and numerical analysis. The workshop brought together scientists from all these areas, and a fruitful interplay took place. There was a lively exchange of ideas between geometry and image processing application areas, characterized in a number of ways in this workshop. For example, optimal transport, first applied in computer vision, is now used to define a distance measure between 3D shapes, and spectral analysis as a tool in image processing can be applied in surface classification and matching. We have also seen the use of Riemannian geometry as a powerful tool to improve the analysis of multivalued images.
This volume collects the abstracts for all the presentations covering this wide spectrum of tools and application domains.
Differential geometric regularization for supervised learning of classifiers
We study the problem of supervised learning for both binary and multiclass classification from a unified geometric perspective. In particular, we propose a geometric regularization technique to find the submanifold corresponding to an estimator of the class probability P(y|\vec x). The regularization term measures the volume of this submanifold, based on the intuition that overfitting produces rapid local oscillations and hence large volume of the estimator. This technique can be applied to regularize any classification function that satisfies two requirements: firstly, an estimator of the class probability can be obtained; secondly, first and second derivatives of the class probability estimator can be calculated. In experiments, we apply our regularization technique to standard loss functions for classification; our RBF-based implementation compares favorably to widely used regularization methods for both binary and multiclass classification.
Published version: http://proceedings.mlr.press/v48/baia16.pdf
Geometry-Aware Neighborhood Search for Learning Local Models for Image Reconstruction
Local learning of sparse image models has proven to be very effective to
solve inverse problems in many computer vision applications. To learn such
models, the data samples are often clustered using the K-means algorithm with
the Euclidean distance as a dissimilarity metric. However, the Euclidean
distance may not always be a good dissimilarity measure for comparing data
samples lying on a manifold. In this paper, we propose two algorithms for
determining a local subset of training samples from which a good local model
can be computed for reconstructing a given input test sample, where we take
into account the underlying geometry of the data. The first algorithm, called
Adaptive Geometry-driven Nearest Neighbor search (AGNN), is an adaptive scheme
which can be seen as an out-of-sample extension of the replicator graph
clustering method for local model learning. The second method, called
Geometry-driven Overlapping Clusters (GOC), is a less complex nonadaptive
alternative for training subset selection. The proposed AGNN and GOC methods
are evaluated in image super-resolution, deblurring and denoising applications
and shown to outperform spectral clustering, soft clustering, and geodesic
distance based subset selection in most settings.
Comment: 15 pages, 10 figures and 5 tables
Efficient 3D Semantic Segmentation with Superpoint Transformer
We introduce a novel superpoint-based transformer architecture for efficient
semantic segmentation of large-scale 3D scenes. Our method incorporates a fast
algorithm to partition point clouds into a hierarchical superpoint structure,
which makes our preprocessing 7 times faster than existing superpoint-based
approaches. Additionally, we leverage a self-attention mechanism to capture the
relationships between superpoints at multiple scales, leading to
state-of-the-art performance on three challenging benchmark datasets: S3DIS
(76.0% mIoU 6-fold validation), KITTI-360 (63.5% on Val), and DALES (79.6%).
With only 212k parameters, our approach is up to 200 times more compact than
other state-of-the-art models while maintaining similar performance.
Furthermore, our model can be trained on a single GPU in 3 hours for a fold of
the S3DIS dataset, which is 7x to 70x fewer GPU-hours than the best-performing
methods. Our code and models are accessible at
github.com/drprojects/superpoint_transformer.
Comment: Accepted at ICCV 2023. Camera-ready version with Appendix. Code
available at github.com/drprojects/superpoint_transformer
Recent advances in directional statistics
Mainstream statistical methodology is generally applicable to data observed
in Euclidean space. There are, however, numerous contexts of considerable
scientific interest in which the natural supports for the data under
consideration are Riemannian manifolds like the unit circle, torus, sphere and
their extensions. Typically, such data can be represented using one or more
directions, and directional statistics is the branch of statistics that deals
with their analysis. In this paper we provide a review of the many recent
developments in the field since the publication of Mardia and Jupp (1999),
still the most comprehensive text on directional statistics. Many of those
developments have been stimulated by interesting applications in fields as
diverse as astronomy, medicine, genetics, neurology, aeronautics, acoustics,
image analysis, text mining, environmetrics, and machine learning. We begin by
considering developments for the exploratory analysis of directional data
before progressing to distributional models, general approaches to inference,
hypothesis testing, regression, nonparametric curve estimation, methods for
dimension reduction, classification and clustering, and the modelling of time
series, spatial and spatio-temporal data. An overview of currently available
software for analysing directional data is also provided, and potential future
developments discussed.
Comment: 61 pages
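A short example illustrates why methods designed for Euclidean data can fail on directions. The sketch below is not taken from the review; it computes the standard circular mean, the angle of the summed unit vectors, and contrasts it with the arithmetic mean. The function name `circular_mean` and the sample angles are illustrative assumptions.

```python
import math


def circular_mean(angles_rad):
    """Mean direction of angles in radians, via the summed unit vectors."""
    s = sum(math.sin(a) for a in angles_rad)
    c = sum(math.cos(a) for a in angles_rad)
    if math.hypot(s, c) < 1e-12:
        # The unit vectors cancel out, so no mean direction is defined.
        raise ValueError("mean direction is undefined for these angles")
    return math.atan2(s, c)


# Two compass bearings on either side of north.
bearings_deg = [350.0, 10.0]
bearings_rad = [math.radians(d) for d in bearings_deg]

arithmetic_deg = sum(bearings_deg) / len(bearings_deg)
circular_deg = math.degrees(circular_mean(bearings_rad))

print("arithmetic mean (degrees):", arithmetic_deg)
print("circular mean (degrees):", circular_deg)
```

The arithmetic mean of 350 and 10 degrees is 180 degrees, which points in the opposite direction to both observations, whereas the circular mean is 0 degrees up to floating-point error.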
Combining Features and Semantics for Low-level Computer Vision
Visual perception of depth and motion plays a significant role in understanding and navigating the environment.
Reconstructing outdoor scenes in 3D and estimating the motion from video cameras are of utmost importance for applications like autonomous driving.
The corresponding problems in computer vision have witnessed tremendous progress over the last decades, yet some aspects still remain challenging today. Striking examples are reflecting and textureless surfaces or large motions which cannot be easily recovered using traditional local methods. Further challenges include occlusions, large distortions and difficult lighting conditions. In this thesis, we propose to overcome these challenges by modeling non-local interactions leveraging semantics and contextual information.
Firstly, for binocular stereo estimation, we propose to regularize over larger areas on the image using object-category specific disparity proposals which we sample using inverse graphics techniques based on a sparse disparity estimate and a semantic segmentation of the image. The disparity proposals encode the fact that objects of certain categories are not arbitrarily shaped but typically exhibit regular structures. We integrate them as a non-local regularizer for the challenging object class 'car' into a superpixel-based graphical model and demonstrate its benefits, especially in reflective regions.
Secondly, for 3D reconstruction, we leverage the fact that the larger the reconstructed area, the more likely objects of similar type and shape will occur in the scene. This is particularly true for outdoor scenes where buildings and vehicles often suffer from missing texture or reflections, but share similarity in 3D shape. We take advantage of this shape similarity by localizing objects using detectors and jointly reconstructing them while learning a volumetric model of their shape. This allows us to reduce noise while completing missing surfaces, as objects of similar shape benefit from all observations for the respective category. Evaluations with respect to LIDAR ground-truth on a novel challenging suburban dataset show the advantages of modeling structural dependencies between objects.
Finally, motivated by the success of deep learning techniques in matching problems, we present a method for learning context-aware features for solving optical flow using discrete optimization. Towards this goal, we present an efficient way of training a context network with a large receptive field size on top of a local network using dilated convolutions on patches. We perform feature matching by comparing each pixel in the reference image to every pixel in the target image, utilizing fast GPU matrix multiplication. The matching cost volume from the network's output forms the data term for discrete MAP inference in a pairwise Markov random field. Extensive evaluations reveal the importance of context for feature matching.
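The all-pairs feature matching mentioned in this abstract can be sketched as a single matrix product between per-pixel feature vectors. The example below is a minimal illustration in plain Python, not the thesis implementation: the features are hand-written rather than learned, the names `matching_scores` and `best_matches` are hypothetical, and the per-pixel argmax stands in for the MAP inference in a Markov random field that the abstract describes.

```python
def matching_scores(ref_feats, tgt_feats):
    """All-pairs similarity: scores[i][j] = dot(ref_feats[i], tgt_feats[j]).

    This is the matrix product of ref_feats with the transpose of tgt_feats,
    which is the operation a GPU matrix multiplication would accelerate.
    """
    return [
        [sum(r * t for r, t in zip(ref, tgt)) for tgt in tgt_feats]
        for ref in ref_feats
    ]


def best_matches(scores):
    """Index of the highest-scoring target pixel for each reference pixel."""
    return [max(range(len(row)), key=row.__getitem__) for row in scores]


# Hypothetical unit-length feature vectors, one per pixel.
ref_feats = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
# The same features in a different order, as if the pixels had moved.
tgt_feats = [[0.0, 1.0], [0.6, 0.8], [1.0, 0.0]]

scores = matching_scores(ref_feats, tgt_feats)
matches = best_matches(scores)
print(matches)
```

Each row of `scores` plays the role of one pixel's slice of the matching cost volume (with similarity in place of cost). In this toy case the argmax recovers the permutation, matching reference pixels 0, 1, 2 to target pixels 2, 0, 1.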