5 research outputs found

    Quadtree-based eigendecomposition for pose estimation in the presence of occlusion and background clutter

    Includes bibliographical references (pages 29-30). Eigendecomposition-based techniques are popular for a number of computer vision problems, e.g., object and pose estimation, because they are purely appearance based and require few on-line computations. Unfortunately, they also typically require an unobstructed view of the object whose pose is being estimated. The presence of occlusion and background clutter precludes the use of the normalizations that are typically applied and significantly alters the appearance of the object under detection. This work presents an algorithm that applies eigendecomposition to a quadtree representation of the image dataset used to describe the appearance of an object. This allows decisions concerning the pose of an object to be based on only those portions of the image in which the algorithm has determined that the object is not occluded. The accuracy and computational efficiency of the proposed approach are evaluated on 16 different objects with up to 50% of the object occluded, and on images of ships in a dockyard.

    Three-dimensional scene recovery for measuring sighting distances of rail track assets from monocular forward facing videos

    Rail track asset sighting distance must be checked regularly to ensure the continued and safe operation of rolling stock. Methods currently used to check asset line-of-sight involve manual labour or laser systems. Video cameras and computer vision techniques provide one possible route to cheaper, automated systems. Three categories of computer vision method are identified for possible application: two-dimensional object recognition, two-dimensional object tracking and three-dimensional scene recovery. However, the experiments presented show that recognition and tracking methods produce less accurate asset line-of-sight results as asset-camera distance increases. Regarding three-dimensional scene recovery, evidence is presented suggesting a relationship between image features and recovered scene information. A novel framework which learns these relationships is proposed. Relationships learnt from recovered image features probabilistically limit the search space of future features, improving efficiency. This framework is applied to several scene recovery methods and is shown (on average) to decrease computation by two-thirds for a possible, small decrease in the accuracy of recovered scenes. Asset line-of-sight results computed from recovered three-dimensional terrain data are shown to be more accurate than those of two-dimensional methods and are not affected by increasing asset-camera distance. Finally, the analysis of terrain in terms of its effect on asset line-of-sight is considered. Terrain elements, segmented using semantic information, are ranked with a metric combining a minimum line-of-sight blocking distance and the growth required to achieve this minimum distance. Since this ranking measure is relative, it is shown how an approximation of the terrain data can be applied, decreasing computation time. Further efficiency increases are found by decomposing the problem into a set of two-dimensional problems and applying binary search techniques.
The combination of the research elements presented in this thesis provides efficient methods for automatically analysing asset line-of-sight and the impact of the surrounding terrain from captured monocular video.
EThOS - Electronic Theses Online Service (United Kingdom)
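The abstract above mentions decomposing the terrain analysis into two-dimensional problems and applying binary search, but does not say what the search ranges over. The following is only a minimal sketch under stated assumptions, not the thesis's method: it assumes a single 2D height profile between camera and asset, assumes growth raises every terrain sample uniformly, and binary-searches the smallest growth that blocks the sight line. The names `sight_blocked`, `min_blocking_growth`, `max_growth` and `tol` are hypothetical.

```python
from typing import List, Optional, Tuple

# (distance along the track in metres, height in metres); an assumed representation
Point = Tuple[float, float]


def sight_blocked(camera: Point, asset: Point,
                  terrain: List[Point], growth: float) -> bool:
    """Return True if any terrain sample, raised by `growth`, reaches the sight line."""
    (x0, z0), (x1, z1) = camera, asset
    for x, z in terrain:
        if not (x0 < x < x1):
            continue  # only samples strictly between camera and asset can block
        t = (x - x0) / (x1 - x0)
        line_z = z0 + t * (z1 - z0)  # height of the sight line above this sample
        if z + growth >= line_z:
            return True
    return False


def min_blocking_growth(camera: Point, asset: Point, terrain: List[Point],
                        max_growth: float = 10.0,
                        tol: float = 1e-3) -> Optional[float]:
    """Binary-search the smallest uniform growth that blocks the sight line.

    Blocking is monotone in growth (more growth never unblocks), which is
    what makes binary search valid here. Returns None if even `max_growth`
    does not block.
    """
    if sight_blocked(camera, asset, terrain, 0.0):
        return 0.0
    if not sight_blocked(camera, asset, terrain, max_growth):
        return None
    lo, hi = 0.0, max_growth  # invariant: lo does not block, hi blocks
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if sight_blocked(camera, asset, terrain, mid):
            hi = mid
        else:
            lo = mid
    return hi
```

For example, with a camera at `(0.0, 2.0)`, an asset at `(100.0, 4.0)` and terrain samples `[(50.0, 1.0), (25.0, 2.0)]`, the sample at 25 m sits 0.5 m below the sight line, so the search converges to a growth of about 0.5 m.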

    Fouille de graphes pour le suivi d’objets dans les vidéos (Graph mining for object tracking in videos)

    Detecting and following the main objects of a video is necessary to describe its content in order to, for example, allow search engines to index multimedia content in a relevant way. Current object tracking approaches either require the user to select the targets to follow, or rely on pre-trained classifiers to detect particular classes of objects such as pedestrians or cars. Since those methods rely on user intervention or prior knowledge of the content to process, they cannot be applied automatically to amateur videos such as the ones found on YouTube. To solve this problem, we build upon the hypothesis that, in videos with a moving background, the main objects should appear more frequently than the background. Moreover, in a video, the topology of the visual elements composing an object is assumed to be consistent from one frame to another. We represent each image of a video with a plane graph modeling its topology. Then, we search for substructures appearing frequently in the database of plane graphs thus created to represent each video. This approach lets us detect and track the main objects of a video in an unsupervised way, relying only on the frequency of the patterns. Our contributions cover both graph mining and object tracking. In the first field, our first contribution is an efficient plane graph mining algorithm named PLAGRAM. This algorithm exploits the planarity of the graphs and a new strategy to extend the patterns. The next contributions introduce spatio-temporal constraints into the mining process to exploit the fact that, in a video, the motion of objects is small from one frame to another. Thus, we constrain the occurrences of the same pattern to be close in space and time by limiting the number of frames and the spatial distance separating them. We present two new algorithms: DYPLAGRAM, which uses the temporal constraint to limit the number of extracted patterns, and DYPLAGRAM_ST, which efficiently mines frequent spatio-temporal patterns from the datasets representing the videos.
In the field of object tracking, our contributions consist of two approaches that use the spatio-temporal patterns to track the main objects in videos. The first is based on a search for the shortest path in a graph connecting the spatio-temporal patterns, while the second uses a clustering approach to regroup them in order to follow the objects for a longer period of time. We also present two industrial applications of our method.
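The spatio-temporal constraint described in this abstract (occurrences of the same pattern must be close in both frame index and image position) can be illustrated with a small sketch. This is not DYPLAGRAM or DYPLAGRAM_ST, whose details the abstract does not give; it only shows, under assumptions, how occurrences of one already-mined pattern might be chained greedily. The occurrence representation `(frame, x, y)` and the names `link_occurrences`, `frequent_chains`, `max_frame_gap`, `max_dist` and `min_length` are hypothetical.

```python
from math import hypot
from typing import List, Tuple

# (frame index, x, y) of one occurrence of a pattern; an assumed representation
Occurrence = Tuple[int, float, float]


def link_occurrences(occs: List[Occurrence], max_frame_gap: int,
                     max_dist: float) -> List[List[Occurrence]]:
    """Greedily chain occurrences that are close in time and in space."""
    chains: List[List[Occurrence]] = []
    for occ in sorted(occs):  # process in frame order
        frame, x, y = occ
        for chain in chains:
            last_frame, lx, ly = chain[-1]
            close_in_time = 0 < frame - last_frame <= max_frame_gap
            close_in_space = hypot(x - lx, y - ly) <= max_dist
            if close_in_time and close_in_space:
                chain.append(occ)  # extend the first compatible chain
                break
        else:
            chains.append([occ])  # no compatible chain: start a new one
    return chains


def frequent_chains(chains: List[List[Occurrence]],
                    min_length: int) -> List[List[Occurrence]]:
    """Keep only chains with at least `min_length` occurrences."""
    return [c for c in chains if len(c) >= min_length]
```

With occurrences at frames 0, 1 and 2 a few pixels apart, one distant occurrence at frame 1 and one late occurrence at frame 9, a frame gap of 2 and a distance of 5 yield three chains, of which only the first reaches length 3. A greedy first-fit assignment is the simplest choice here; it can mis-assign occurrences when two chains pass close to each other.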