A Method Based on Total Variation for Network Modularity Optimization using the MBO Scheme
The study of network structure is pervasive in sociology, biology, computer
science, and many other disciplines. One of the most important areas of network
science is the algorithmic detection of cohesive groups of nodes called
"communities". One popular approach to find communities is to maximize a
quality function known as modularity to achieve some sort of optimal
clustering of nodes. In this paper, we interpret the modularity function from a
novel perspective: we reformulate modularity optimization as a minimization
problem of an energy functional that consists of a total variation term and a
balance term. By employing numerical techniques from image processing
and compressive sensing -- such as convex splitting and the
Merriman-Bence-Osher (MBO) scheme -- we develop a variational algorithm for the
minimization problem. We present our computational results using both synthetic
benchmark networks and real data. Comment: 23 pages
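For orientation, the quality function that the abstract proposes to optimize can be written down in a few lines. The sketch below assumes the standard Newman-Girvan definition of modularity for an undirected, unweighted graph; the abstract itself does not state the formula, and the function name `modularity` and the toy graph are illustrative only. It does not reproduce the paper's total variation reformulation or the MBO scheme.

```python
def modularity(edges, labels):
    """Newman-Girvan modularity Q for an undirected, unweighted graph.

    edges  : iterable of (i, j) node-index pairs, each edge listed once
    labels : labels[i] is the community assigned to node i
    """
    edges = list(edges)
    m = len(edges)
    degree = [0] * len(labels)
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1

    # Fraction of edges whose endpoints share a community.
    internal = sum(1 for i, j in edges if labels[i] == labels[j]) / m

    # Expected fraction under the degree-preserving null model:
    # sum over communities of (total degree of community / 2m)^2.
    degree_by_community = {}
    for node, community in enumerate(labels):
        degree_by_community[community] = (
            degree_by_community.get(community, 0) + degree[node]
        )
    expected = sum((d / (2 * m)) ** 2 for d in degree_by_community.values())

    return internal - expected


# Toy graph (an assumption for illustration): two triangles joined by one edge.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]

q_split = modularity(edges, [0, 0, 0, 1, 1, 1])   # one community per triangle
q_single = modularity(edges, [0, 0, 0, 0, 0, 0])  # everything in one community
```

On this toy graph the two-triangle partition scores higher than the single-community partition, which is the behaviour a modularity maximizer exploits.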
Early Vision Optimization: Parametric Models, Parallelization and Curvature
Early vision is the process occurring before any semantic interpretation of an image takes place. Motion estimation, object segmentation and detection are all parts of early vision, but recognition is not. Many of these tasks are formulated as optimization problems and one of the key factors for the success of recent methods is that they seek to compute globally optimal solutions. This thesis is concerned with improving the efficiency and extending the applicability of the current state of the art. This is achieved by introducing new methods of computing solutions to image segmentation and other problems of early vision. The first part studies parametric problems where model parameters are estimated in addition to an image segmentation. For a small number of parameters these problems can still be solved optimally. In the second part the focus is shifted toward curvature regularization, i.e. when the commonly used length and area regularization is replaced by curvature in two and three dimensions. These problems can be discretized over a mesh and special attention is given to the mesh geometry. Specifically, hexagonal meshes are compared to square ones and a method for generating adaptive meshes is introduced and evaluated. The framework is then extended to curvature regularization of surfaces. Thirdly, fast methods for finding minimal graph cuts and solving related problems on modern parallel hardware are developed and extensively evaluated. Finally, the thesis is concluded with two applications to early vision problems: heart segmentation and image registration.
Variational methods and its applications to computer vision
Many computer vision applications, such as image segmentation, can be formulated in a "variational" way as energy minimization problems. Unfortunately, minimizing these energies is usually difficult: it generally involves nonconvex functions in a space with thousands of dimensions, and the associated combinatorial problems are often NP-hard. Furthermore, these are ill-posed inverse problems and are therefore extremely sensitive to perturbations (e.g. noise). For this reason, computing a physically reliable approximation from noisy data requires incorporating appropriate regularizations into the mathematical model, and these regularizations require complex computations.
The main aim of this work is to describe variational segmentation methods that are particularly effective for curvilinear structures. Because of the complex geometry of such structures, classical regularization techniques cannot be adopted: they lead to the loss of most low-contrast details. In contrast, the proposed method not only better preserves curvilinear structures but also reconnects parts that may have been disconnected by noise. Moreover, it is easily extensible to graphs and can be successfully applied to different types of data, such as medical imagery (e.g. vessels, heart coronaries), material samples (e.g. concrete) and satellite signals (e.g. streets, rivers). In particular, we show results and the performance of an implementation targeting a new generation of High Performance Computing (HPC) architectures in which different types of coprocessors cooperate. The dataset consists of approximately 200 images of cracks, captured in three different tunnels by a robotic machine designed for the European ROBO-SPECT project.
A Two-Stage Image Segmentation Method Using a Convex Variant of the Mumford--Shah Model and Thresholding
The Mumford–Shah model is one of the most important image segmentation models and has been studied extensively in the last twenty years. In this paper, we propose a two-stage segmentation method based on the Mumford–Shah model. The first stage of our method is to find a smooth solution g to a convex variant of the Mumford–Shah model. Once g is obtained, then in the second stage the segmentation is done by thresholding g into different phases. The thresholds can be given by the users or can be obtained automatically using any clustering method. Because of the convexity of the model, g can be solved efficiently by techniques like the split-Bregman algorithm or the Chambolle–Pock method. We prove that our method is convergent and that the solution g is always unique. In our method, there is no need to specify the number of segments K (K ≥ 2) before finding g. We can obtain any K-phase segmentation by choosing (K − 1) thresholds after g is found in the first stage, and in the second stage there is no need to recompute g if the thresholds are changed to reveal different segmentation features in the image. Experimental results show that our two-stage method performs better than many standard two-phase or multiphase segmentation methods for very general images, including antimass, tubular, MRI, noisy, and blurry images.
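The second stage described above is simple enough to sketch. The example below assumes that the smooth solution g is already available as a 2D list of values (here a hypothetical hand-written array, not the output of a real solver) and shows how (K − 1) thresholds yield a K-phase labeling without recomputing g. The function name `threshold_segmentation` is illustrative; the first stage (the convex solve by split-Bregman or Chambolle–Pock) is not shown.

```python
from bisect import bisect_right


def threshold_segmentation(g, thresholds):
    """Label each pixel of g by the threshold interval its value falls in.

    K - 1 thresholds produce K phases labeled 0 .. K-1.
    """
    cuts = sorted(thresholds)
    # bisect_right counts how many thresholds are <= the pixel value,
    # which is exactly the index of the phase the pixel belongs to.
    return [[bisect_right(cuts, value) for value in row] for row in g]


# Hypothetical stand-in for the stage-one solution g.
g = [[0.05, 0.20, 0.55],
     [0.10, 0.60, 0.95]]

# The same g is reused for different numbers of phases.
two_phase = threshold_segmentation(g, [0.5])
three_phase = threshold_segmentation(g, [0.3, 0.8])
```

Changing the thresholds only reruns the cheap labeling step, which mirrors the abstract's point that g need not be recomputed.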
Combinatorial Solutions for Shape Optimization in Computer Vision
This thesis aims at solving so-called shape optimization problems, i.e. problems where the shape of some real-world entity is sought, by applying combinatorial algorithms. I present several advances in this field, all of them based on energy minimization. The addressed problems will become more intricate in the course of the thesis, starting from problems that are solved globally, then turning to problems where so far no global solutions are known. The first two chapters treat segmentation problems where the considered grouping criterion is directly derived from the image data. That is, the respective data terms do not involve any parameters to estimate. These problems will be solved globally. The first of these chapters treats the problem of unsupervised image segmentation where apart from the image there is no other user input. Here I will focus on a contour-based method and show how to integrate curvature regularity into a ratio-based optimization framework. The arising optimization problem is reduced to optimizing over the cycles in a product graph. This problem can be solved globally in polynomial, effectively linear time. As a consequence, the method does not depend on initialization and translational invariance is achieved. This is joint work with Daniel Cremers and Simon Masnou. I will then proceed to the integration of shape knowledge into the framework, while keeping translational invariance. This problem is again reduced to cycle-finding in a product graph. Being based on the alignment of shape points, the method actually uses a more sophisticated shape measure than most local approaches and still provides global optima. It readily extends to tracking problems and allows some of them to be solved in real time. I will present an extension to highly deformable shape models which can be included in the global optimization framework. This method simultaneously allows a shape to be decomposed into a set of deformable parts, based only on the input images.
This is joint work with Daniel Cremers. In the second part segmentation is combined with so-called correspondence problems, i.e. the underlying grouping criterion is now based on correspondences that have to be inferred simultaneously. That is, in addition to inferring the shapes of objects, one now also tries to put into correspondence the points in several images. The arising problems become more intricate and are no longer optimized globally. This part is divided into two chapters. The first chapter treats the topic of real-time motion segmentation where objects are identified based on the observation that the respective points in the video will move coherently. Rather than pre-estimating motion, a single energy functional is minimized via alternating optimization. The main novelty lies in the real-time capability, which is achieved by exploiting a fast combinatorial segmentation algorithm. The results are furthermore improved by employing a probabilistic data term. This is joint work with Daniel Cremers. The final chapter presents a method for high resolution motion layer decomposition and was developed in collaboration with Daniel Cremers and Thomas Pock. Layer decomposition methods support the notion of a scene model, which makes it possible to model occlusion and enforce temporal consistency. The contributions are twofold: from a practical point of view the proposed method allows fine-detailed layer images to be recovered by minimizing a single energy. This is achieved by integrating a super-resolution method into the layer decomposition framework. From a theoretical viewpoint the proposed method introduces layer-based regularity terms as well as a graph cut-based scheme to solve for the layer domains. The latter is combined with powerful continuous convex optimization techniques into an alternating minimization scheme.
Lastly, I want to mention that a significant part of this thesis is devoted to the recent trend of exploiting parallel architectures, in particular graphics cards: many combinatorial algorithms are easily parallelized. In Chapter 3 we will see a case where the standard algorithm is hard to parallelize, but easy for the respective problem instances.
Visual object discovery and understanding
Learning to recognize objects is a fundamental and essential step in human perception and understanding of the world. Accordingly, research on object discovery across diverse modalities plays a pivotal role in computer vision. This field not only contributes significantly to enhancing our understanding of visual information but also offers a plethora of potential applications, like augmented reality, e-commerce, and robotics, particularly in industrial manipulation scenarios.
We first address the task of discovering objects from still images regardless of any predefined categories. We introduce a novel variational relaxation approach tailored to the task. By framing it as an optimization problem for piecewise-constant segmentation, this technique enables direct training of a fully convolutional network (FCN) for predicting object labels on each pixel. Applying our approach to the instance segmentation task achieved results almost as good as Mask R-CNN without depending on a two-stage framework. Note that the training of the network does not depend on the category label, enabling our approach to discover objects unbounded by predefined categories.
Next, we extend our exploration to video sequences, focusing on the task of unsupervised video object segmentation. Here, we aim to discover and track objects within videos. Noticing that single-frame object proposals are often poor because of motion blur, occlusion, and other factors, our approach refines key frame proposals using a multi-proposal graph constructed from proposals initially generated in nearby frames and then propagated to the key frame. We then compute the maximal cliques within this graph, each of which contains proposals that represent the same object. Pixel-level voting is performed within each clique to generate key frame proposals that can be better than any of the single-frame proposals. A semi-supervised VOS algorithm subsequently tracks these key frame proposals across the entire video, showcasing the potential for precise and robust object tracking in dynamic visual environments.
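The clique-and-vote step in the paragraph above can be illustrated with a small sketch. It assumes the multi-proposal graph is given as an adjacency dictionary (how edges are decided, for example by mask overlap, is not stated in the abstract and is left out), enumerates maximal cliques with the basic Bron-Kerbosch recursion, and fuses the masks of one clique by strict majority vote. The majority rule, the function names, and the toy proposals are all assumptions for illustration, not the thesis's actual procedure.

```python
def maximal_cliques(adjacency):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion.

    adjacency maps each node to the set of its neighbours.
    """
    cliques = []

    def expand(clique, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(sorted(clique))
            return
        for node in list(candidates):
            expand(clique | {node},
                   candidates & adjacency[node],
                   excluded & adjacency[node])
            # The node has been fully explored: move it to the excluded set.
            candidates = candidates - {node}
            excluded = excluded | {node}

    expand(set(), set(adjacency), set())
    return sorted(cliques)


def vote_mask(masks):
    """Keep a pixel if more than half of the binary masks mark it (assumed rule)."""
    height, width = len(masks[0]), len(masks[0][0])
    return [[1 if 2 * sum(m[y][x] for m in masks) > len(masks) else 0
             for x in range(width)]
            for y in range(height)]


# Hypothetical proposals propagated to one key frame: a, b, c agree with
# each other, while d is linked only to c.
graph = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
masks = {"a": [[1, 1, 0, 0]], "b": [[0, 1, 1, 0]],
         "c": [[0, 1, 1, 1]], "d": [[0, 0, 1, 1]]}

cliques = maximal_cliques(graph)
fused = vote_mask([masks[name] for name in cliques[0]])
```

Here the clique of a, b and c yields one fused mask that keeps only the pixels most of its members agree on.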
We further explore the domain of Vision-Language, where we seek to identify objects associated with a specific textual context. In this multifaceted context, we tackle the intricate challenge of content moderation (CM), which assesses multimodal user-generated content to detect material that is illegal, harmful, or insulting. We present a novel CM model to address the asymmetry in semantics between vision and language. Our model features an innovative asymmetric fusion architecture that not only fuses the common knowledge in both modalities but also leverages the unique information present in each modality. Additionally, we introduce a novel cross-modality contrastive loss to capture knowledge that arises exclusively in a multimodal context, which is crucial for addressing harmful intent that may emerge at the intersection of these modalities.