7 research outputs found

    Real-Time Salient Closed Boundary Tracking via Line Segments Perceptual Grouping

    This paper presents a novel real-time method for tracking salient closed boundaries in video sequences. The method operates on a set of straight line segments produced by line detection. The tracking scheme is coherently integrated into a perceptual grouping framework in which the visual tracking problem is tackled by identifying a subset of these line segments and connecting them sequentially to form a closed boundary with the largest saliency and a certain similarity to the previous one. Specifically, we define a new tracking criterion that combines a grouping cost and an area similarity constraint, which makes the resulting boundary tracking more robust to local minima. To achieve real-time performance, we use Delaunay triangulation to build a graph model over the detected line segments and then reduce the tracking problem to finding the optimal cycle in this graph. This is solved by our newly proposed closed-boundary candidate search algorithm, "Bidirectional Shortest Path (BDSP)". The efficiency and robustness of the proposed method are tested on real video sequences as well as during a robot-arm pouring experiment.
    Comment: 7 pages, 8 figures; submitted to the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2017), submission ID 103
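    A rough sketch of the pipeline described above (triangulate the detected segments, build a graph, close a boundary with a shortest-path query) is given below. It is a hypothetical illustration: the midpoint graph, gap-length weights, and single-query cycle closure are simplified stand-ins for the paper's saliency and area-similarity terms and for the full BDSP search.

```python
# Hypothetical sketch of the tracking pipeline: line-segment midpoints are
# Delaunay-triangulated, the triangulation edges form a graph, and a cycle
# is closed through a seed edge by one shortest-path query. Weights and the
# seed choice are illustrative, not the paper's exact formulation.
import numpy as np
import networkx as nx
from scipy.spatial import Delaunay

def build_segment_graph(midpoints):
    """Connect line-segment midpoints via Delaunay triangulation."""
    tri = Delaunay(midpoints)
    g = nx.Graph()
    for a, b, c in tri.simplices:
        for u, v in ((a, b), (b, c), (c, a)):
            gap = np.linalg.norm(midpoints[u] - midpoints[v])
            g.add_edge(int(u), int(v), weight=gap)  # grouping cost ~ gap length
    return g

def closed_boundary_through(g, seed_edge):
    """Close a cycle through one seed edge: remove it, then take the
    shortest remaining path between its endpoints."""
    u, v = seed_edge
    w = g[u][v]["weight"]
    g.remove_edge(u, v)
    try:
        path = nx.shortest_path(g, u, v, weight="weight")
    finally:
        g.add_edge(u, v, weight=w)  # restore the graph
    return path + [u]  # vertex sequence of the candidate closed boundary

midpoints = np.random.rand(30, 2) * 100  # stand-in for detected segments
graph = build_segment_graph(midpoints)
print(closed_boundary_through(graph, next(iter(graph.edges))))
```

    In the paper's setting, repeating such queries over candidate edges, while scoring each cycle for saliency and similarity to the previous boundary, would select the tracked boundary for the current frame.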

    Automatic salient object segmentation based on context and shape prior

    Superedge grouping for object localization by combining appearance and shape information

    Processing boundary and region features for perception

    A fundamental task for any visual system is the accurate detection of objects from background information, for example, distinguishing fruit from foliage or a predator in a forest. This is commonly referred to as figure-ground segregation, which occurs when the visual system locates differences in visual features across an image, such as colour or texture. Combinations of feature contrast define an object from its surrounds, though the exact nature of that combination is still debated. Two processes are likely to contribute to object conspicuity: the pooling of features within an object's bounds relative to those in the background ('region' contrast) and the detection of feature contrast at the boundary itself ('boundary' contrast). Investigations of the relative contributions of these two processes to perception have produced sometimes contradictory findings, some of which can be explained by the methodology adopted in those studies. For example, several studies adopting search-based methodologies have advocated a nonlinear interaction of the boundary and region processes, whereas more subjective methods have indicated a linear combination. This thesis aims to compare search and subjective methodologies to determine how visual features (region and boundary) interact, highlight limitations of these metrics, and then unpack the contributions of boundary and region processes in greater detail.
    The first and second experiments investigated the relative contributions of boundary strength, regional orientation, and regional spatial frequency to object conspicuity. This was achieved via a comparison of search and subjective methodologies, which, as mentioned, have previously produced conflicting results in this domain. The results advocated a relatively strong contribution of boundary features compared to region-based features and replicated the apparent incongruence between findings from search-based and subjective metrics: results from the search task suggested a nonlinear interaction, while those from the subjective task suggested a linear one. A unifying model that reconciles these seemingly contradictory findings (and those in the literature) is then presented, which considers the effect of metric sensitivity and performance ceilings in the paradigms employed.
    In light of the findings from the first and second experiments, which suggest a stronger contribution of boundary information to object conspicuity, the third and fourth experiments investigated boundary features in more detail. Anecdotal reports from observers in the earlier experiments suggested that the conspicuity of boundaries is modulated by information in the background, regardless of boundary structure. As such, the relative contributions of boundary-background contrast and boundary composition were investigated using a novel stimulus generation technique that enables their effective isolation. A novel metric for boundary composition that correlates well with perception is also outlined. The results of those experiments suggested a significant contribution from both sources of boundary information, though they advocate a critical role for boundary-background contrast.
    The final experiment explored the contribution of region-based information to object conspicuity in more detail, specifically how higher-order image structure, such as the components of complex texture, contributes to conspicuity. A state-of-the-art texture synthesis model, which reproduces textures via mechanisms that mimic processes in the human visual system, is evaluated with respect to its perceptual applicability. Previous evaluations of this synthesis model are extended via a novel approach that enables the isolation of the model's parameters (which simulate physiological mechanisms) for independent examination. An alternative metric for the efficacy of the model is also presented.
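    Where the distinction matters above, the two candidate combination rules can be written as a toy model, sketched below. The weights and exponent are illustrative placeholders rather than the thesis's fitted parameters; B and R stand for normalized boundary- and region-contrast strengths.

```python
# Toy comparison of the two cue-combination rules under debate. The
# weights (wb, wr) and exponent p are hypothetical, not fitted values.
def conspicuity_linear(B, R, wb=0.7, wr=0.3):
    # linear summation: each cue contributes independently
    return wb * B + wr * R

def conspicuity_nonlinear(B, R, p=3.0):
    # Minkowski (power) summation: the stronger cue dominates as p grows
    return (B ** p + R ** p) ** (1.0 / p)

print(conspicuity_linear(0.8, 0.2))     # 0.62
print(conspicuity_nonlinear(0.8, 0.2))  # ~0.804
```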

    Green Function and Electromagnetic Potential for Computer Vision and Convolutional Neural Network Applications

    For advanced computer vision (CV) tasks such as classification, scene segmentation, and salient object detection, extracting features from images is mandatory. One of the most used tools for feature extraction is the convolutional kernel, with each kernel being specialized for a specific kind of feature. In recent years, the convolutional neural network (CNN) became the standard method of feature detection, since it allows thousands of kernels to be optimized at the same time. However, a limitation of the CNN is that all the kernels are small (usually between 3x3 and 7x7), which limits the receptive field. Another limitation is that feature merging is done via weighted additions and pooling, which cannot merge spatial-domain features with gradient-domain features since they are not located at the same pixel coordinates.
    The objective of this thesis is to develop electromagnetic (EM) convolutions and Green's function (GF) convolutions for use in computer vision and convolutional neural networks (CNN). These new kernels do not have the limitations of standard CNN kernels: by using kernels bigger than the image, they allow an unlimited receptive field and interaction between any pixels in the image. They allow merging spatial-domain features with gradient-domain features by integrating any vector field. Additionally, they can transform any vector field of features into its least-error conservative field, meaning that the field of features becomes smooth, irrotational, and conservative (line-integrable). We first developed different symmetrical and asymmetrical convolutional kernels based on EM and GF that are both resolution- and rotation-invariant. We then developed the first method of determining the probability of being inside partial edges, which allows extrapolating thin edge features into the full 2D space. Furthermore, this thesis proves that GF kernels are the least-error gradient and Laplacian solvers, and they are empirically demonstrated to be faster than the fastest competing method and easier to implement. Consequently, using the fast gradient solver, we developed the first method that directly combines edges with saliency maps in the gradient domain, then solves the gradient to return to the saliency domain. In terms of F-measure, the improvement of the resulting saliency maps is on average 6.6 times that of the nearest competing algorithm on a selected dataset. Then, to improve the saliency maps further, we developed the DSS-GIS model, which combines edges with salient regions deep inside the network.
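    As a rough illustration of the gradient-domain step described above, the sketch below recovers the least-squares scalar field from a vector field by solving the Poisson equation laplacian(u) = div(g) with a Fourier-domain kernel. It is a generic FFT Poisson solver under assumed periodic boundaries, not the thesis's specific EM/GF kernels.

```python
# Generic FFT Poisson solver: given a (possibly non-conservative) field
# (gx, gy), return the scalar field u whose gradient is closest to it in
# the least-squares sense, i.e. solve laplacian(u) = div(gx, gy).
import numpy as np

def solve_gradient_field(gx, gy):
    h, w = gx.shape
    # divergence via backward differences (matches forward-difference gradients)
    div = (gx - np.roll(gx, 1, axis=1)) + (gy - np.roll(gy, 1, axis=0))
    # eigenvalues of the periodic discrete Laplacian
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    denom = (2 * np.cos(2 * np.pi * fx) - 2) + (2 * np.cos(2 * np.pi * fy) - 2)
    denom[0, 0] = 1.0                 # avoid dividing the zero mode
    u_hat = np.fft.fft2(div) / denom
    u_hat[0, 0] = 0.0                 # the mean of u is arbitrary
    return np.real(np.fft.ifft2(u_hat))

# round-trip check: differentiate a smooth field, then solve back
u = np.sin(np.linspace(0, 2 * np.pi, 64, endpoint=False))[None, :] * np.ones((64, 1))
gx = np.roll(u, -1, axis=1) - u       # forward-difference gradient
gy = np.roll(u, -1, axis=0) - u
rec = solve_gradient_field(gx, gy)
print(np.abs((rec - rec.mean()) - (u - u.mean())).max())  # ~1e-13
```

    Because the solve is one global Fourier operation, any non-integrable component of the input field is projected out, which is the sense in which the recovered field is the closest conservative (line-integrable) field.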

    Edge Grouping Combining Boundary and Region Information

    This paper introduces a new edge-grouping method to detect perceptually salient structures in noisy images. Specifically, we define a new grouping cost function in ratio form, where the numerator measures the boundary proximity of the resulting structure and the denominator measures its area. The area term introduces a preference towards detecting larger structures and therefore makes the resulting edge grouping more robust to image noise. To find the optimal edge grouping with the minimum grouping cost, we develop a special graph model with two different kinds of edges and then reduce the grouping problem to finding a special kind of cycle in this graph with minimum ratio-form cost. This optimal-cycle problem can be solved in polynomial time by a previously developed graph algorithm. We implement this edge-grouping method, test it on both synthetic data and real images, and compare its performance against several available edge-grouping and edge-linking methods. Furthermore, we discuss several extensions of the proposed method, including the incorporation of the well-known grouping cues of continuity and intensity homogeneity, the introduction of a factor to balance the contributions of boundary and region information, and the prevention of self-intersecting boundaries.
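    To make the ratio criterion concrete, here is a hypothetical evaluation of the grouping cost for one candidate closed boundary, with the numerator taken as total gap length and the denominator as the enclosed (shoelace) area; both terms are schematic stand-ins for the paper's exact definitions.

```python
# Ratio-form grouping cost for one candidate cycle: gap length / enclosed
# area. Variable names and the two terms are illustrative stand-ins.
import numpy as np

def ratio_grouping_cost(vertices, gap_mask):
    """vertices: (n, 2) cycle points in order; gap_mask: True where the
    link to the next vertex is a gap-filling segment, not a detected edge."""
    nxt = np.roll(vertices, -1, axis=0)
    seg_len = np.linalg.norm(nxt - vertices, axis=1)
    gap_length = seg_len[gap_mask].sum()   # numerator: boundary proximity
    x, y = vertices[:, 0], vertices[:, 1]  # shoelace formula for the area
    area = 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))
    return gap_length / area               # smaller is better

square = np.array([[0, 0], [10, 0], [10, 10], [0, 10]], float)
print(ratio_grouping_cost(square, np.array([True, False, False, False])))  # 0.1
```

    Minimizing such a ratio over all cycles is commonly handled by a parametric reduction: search for a cycle with negative total weight under the edge cost gap - lambda * area while adjusting lambda, which is one way a polynomial-time graph algorithm of the kind cited above can be applied.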