117 research outputs found

    Robust Object Detection with Real-Time Fusion of Multiview Foreground Silhouettes


    Robust moving object detection by information fusion from multiple cameras

    Moving object detection is an essential step before tracking and event recognition can take place in video surveillance. To monitor a wider field of view and avoid occlusions in pedestrian tracking, multiple cameras are usually used, and homography can be employed to associate the camera views. Foreground regions detected in each camera view are projected into a virtual top view according to the homography for a plane. The intersections of the foreground projections indicate the locations of moving objects on that plane. Homography mapping for a set of parallel planes at different heights can increase the robustness of the detection. However, homography mapping is very time consuming, and intersections of non-corresponding foreground regions can cause false-positive detections. In this thesis, a real-time moving object detection algorithm using multiple cameras is proposed. Unlike pixelwise homography mapping, which projects binary foreground images, the approach taken in this thesis approximates the contour of each foreground region with a polygon and transmits and projects only the polygon vertices. The foreground projections are then rebuilt from the projected polygons in the reference view. Experimental results show that this method runs in real time and generates results similar to those obtained with full foreground images. To identify false-positive detections, both geometric information and colour cues are utilized. The former is a height matching algorithm based on the geometry between the camera views; the latter is a colour matching algorithm based on the Mahalanobis distance between the colour distributions of two foreground regions. Since height matching is unreliable in scenarios with adjacent pedestrians, and colour matching cannot handle occluded pedestrians, the two algorithms are combined to improve the robustness of the foreground intersection classification. The robustness of the proposed algorithm is demonstrated on real-world image sequences.
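    To make the polygon-projection idea above concrete, here is a minimal sketch, assuming an OpenCV 4.x pipeline in Python; the homography H, the binary foreground mask, and the approximation tolerance epsilon are illustrative placeholders, not the thesis's exact parameters. Instead of warping a full binary image pixel by pixel, only the few vertices of a polygonal contour approximation are mapped through the plane homography, and the foreground projection is rebuilt from the projected polygons in the reference view.

        import numpy as np
        import cv2

        def project_foreground_polygons(mask, H, epsilon=2.0):
            """Approximate each foreground region with a polygon and map only
            its vertices into the reference (top) view via homography H."""
            # OpenCV 4.x: findContours returns (contours, hierarchy)
            contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                           cv2.CHAIN_APPROX_SIMPLE)
            projected = []
            for c in contours:
                poly = cv2.approxPolyDP(c, epsilon, True)   # few vertices per region
                pts = poly.reshape(-1, 1, 2).astype(np.float32)
                projected.append(cv2.perspectiveTransform(pts, H))
            return projected  # polygons to rasterize in the reference view

    Rasterizing each returned polygon (e.g. with cv2.fillPoly) reproduces the foreground projection at a fraction of the cost of warping every pixel.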

    Pose estimation system based on monocular cameras

    Our world is full of wonders. It is filled with mysteries and challenges which, through the ages, have inspired and driven human civilization to grow, both philosophically and sociologically. In time, humans reached their physical limitations; nevertheless, we created technology to help us overcome them. Like the undiscovered lands of old, we are drawn to the discovery and innovation of our time. All of this is possible thanks to a very human characteristic: our imagination. The world that surrounds us is mostly discovered, but with the power of computer vision (CV) and augmented reality (AR) we can live in multiple hidden universes alongside our own. With the increasing performance and capabilities of current mobile devices, AR can be what we dream it to be. Many obstacles remain, but this future is already our reality, and with evolving technologies closing the gap between the real and the virtual world, it will soon be possible to surround ourselves with other dimensions, or fuse them with our own. This thesis focuses on the development of a system that estimates the camera's pose in the real world relative to the virtual-world axes. The work was developed as a sub-module of the M5SAR project: Mobile Five Senses Augmented Reality System for Museums, which aims at a more immersive experience through the total or partial replacement of the environment's surroundings. It mainly targets man-made building interiors and their typical rectangular-cuboid shape. Knowing the direction of the user's camera, we can superimpose dynamic AR content, inviting the user to explore the hidden worlds. The M5SAR project introduced a new way to explore existing historical museums by engaging the five human senses: hearing, smell, taste, touch, and vision. With this innovative technology, users can enhance their visit and immerse themselves in a virtual world blended with our reality. A mobile application was built around an innovative framework, MIRAR (Mobile Image Recognition based Augmented Reality), comprising object recognition, navigation, and the projection of additional AR information to enrich the user's visit, providing intuitive and compelling information about the available artworks and exploring the senses of hearing and vision. A specially designed device was built to explore the remaining three senses: smell, taste, and touch. Attached to a mobile device, either smartphone or tablet, it pairs with it and reacts automatically in step with the narrative of the artwork, immersing the user in a sensorial experience. As mentioned above, the work presented in this thesis concerns a sub-module of MIRAR responsible for environment detection and the superimposition of AR content. With the main goal being the full replacement of the walls' contents, while optionally keeping the artwork visible, an additional challenge arose from the limitation of using only monocular cameras. Without depth information, a 2D image of an environment does not, to a computer, convey the three-dimensional layout of the real world. Nevertheless, man-made buildings tend to follow a rectangular approach to room construction, which makes it possible to predict where the vanishing point in any image of the environment lies, allowing the reconstruction of the environment's layout from a single 2D image.
    Furthermore, combining this information with an initial localization, obtained through improved image recognition, to retrieve the camera's spatial position relative to the real-world and virtual-world coordinates (that is, pose estimation) made it possible to superimpose localized AR content over the user's mobile device frame, immersing, for instance, a museum visitor in another era correlated with the historical period of the artworks on display. The work developed for this thesis also presents improved rectification and retrieval of planar surfaces in space, a hybrid and scalable multiple-image matching system, more stable outlier filtering applied to the camera's axes, and a continuous tracking system that works with uncalibrated cameras and maintains the surface superimposition even at particularly obtuse viewing angles. Furthermore, a novel method using deep learning models for semantic segmentation was introduced for indoor layout estimation from monocular images. Contrary to the previously developed methods, there is no need to perform geometric calculations to achieve near state-of-the-art performance with a fraction of the parameters required by similar methods. Contrary to the earlier work presented in this thesis, this method performs well even in unseen and cluttered rooms, provided they follow the Manhattan assumption. An additional lightweight application that retrieves the camera pose estimation using the proposed method is also presented.
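    As a hedged illustration of what monocular pose estimation for a planar, rectangular surface involves (not the thesis's exact pipeline, which works with uncalibrated cameras), the sketch below recovers the camera's rotation and position from the four corners of a wall whose real-world dimensions are known; the wall size, pixel coordinates, and intrinsic matrix K are all assumed values.

        import numpy as np
        import cv2

        # Hypothetical 3 m x 2 m wall, corners in world coordinates (metres).
        wall_3d = np.array([[0, 0, 0], [3, 0, 0], [3, 2, 0], [0, 2, 0]], np.float32)
        # Matching corner detections in the image (illustrative pixel values).
        wall_2d = np.array([[410, 620], [1380, 590], [1395, 160], [420, 130]], np.float32)
        # Assumed intrinsics for a 1920x1080 camera (focal length 1000 px).
        K = np.array([[1000, 0, 960], [0, 1000, 540], [0, 0, 1]], np.float32)

        ok, rvec, tvec = cv2.solvePnP(wall_3d, wall_2d, K, None)  # None: no distortion model
        R, _ = cv2.Rodrigues(rvec)    # camera orientation as a rotation matrix
        camera_pos = -R.T @ tvec      # camera position in world coordinates

    With the pose known, AR content placed on the wall plane can be rendered with the matching perspective and superimposed on the device frame.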

    Algorithms for trajectory integration in multiple views

    This thesis addresses the problem of deriving a coherent and accurate localization of moving objects from partial visual information when data are generated by cameras placed at different viewing angles with respect to the scene. The framework is built around applications of scene monitoring with multiple cameras. Firstly, we demonstrate how a geometry-based solution exploits the relationships between corresponding feature points across views and improves accuracy in object localization. Then, we improve the estimation of object locations with geometric transformations that account for lens distortion. Additionally, we study the integration of the partial visual information generated by each individual sensor and its combination into one single frame of observation that considers object association and data fusion. Our approach is fully image-based, relies only on 2D constructs, and does not require any complex computation in 3D space. We exploit the continuity and coherence of objects' motion when crossing cameras' fields of view. Additionally, we work under the assumptions of a planar ground plane and a wide baseline (i.e. cameras' viewpoints are far apart). The main contributions are: i) the development of a framework for distributed visual sensing that accounts for inaccuracies in the geometry of multiple views; ii) the reduction of trajectory mapping errors using a statistics-based homography estimation; iii) the integration of a polynomial method for correcting inaccuracies caused by the cameras' lens distortion; iv) a global trajectory reconstruction algorithm that associates and integrates fragments of trajectories generated by each camera.
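    Contribution ii) centres on mapping trajectory fragments between views with a robustly estimated homography. A minimal sketch of that step, assuming OpenCV in Python, is shown below; the corresponding ground-plane points and the RANSAC reprojection threshold of 3 pixels are illustrative choices, not the thesis's statistical estimator.

        import numpy as np
        import cv2

        def map_trajectory(traj, pts_src, pts_dst):
            """Map an (N, 2) image-plane trajectory into the reference view using
            a homography fitted robustly to corresponding ground-plane points."""
            H, inliers = cv2.findHomography(pts_src, pts_dst, cv2.RANSAC, 3.0)
            pts = traj.reshape(-1, 1, 2).astype(np.float32)
            return cv2.perspectiveTransform(pts, H).reshape(-1, 2)

    Fragments mapped into the common frame can then be associated and fused into global trajectories, as in contribution iv).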

    Geometric uncertainty models for correspondence problems in digital image processing

    Many recent advances in technology rely heavily on the correct interpretation of an enormous amount of visual information. All available sources of visual data (e.g. cameras in surveillance networks, smartphones, game consoles) must be adequately processed to retrieve the most interesting user information. Computer vision and image processing techniques therefore attract significant interest at the moment and will continue to do so in the near future. Most commonly applied image processing algorithms require a reliable solution to correspondence problems. The solution involves, first, the localization of corresponding points (visualizing the same 3D point in the observed scene) in the different images from distinct sources, and second, the computation of consistent geometric transformations relating correspondences on scene objects. This PhD thesis presents a theoretical framework for solving correspondence problems with geometric features (such as points and straight lines) representing rigid objects in image sequences of complex scenes with static and dynamic cameras. The research focuses on localization uncertainty due to errors in feature detection and measurement, and its effect on each step in the solution of a correspondence problem. Whereas most other recent methods apply statistics-based models for spatial localization uncertainty, this work considers a novel geometric approach. Localization uncertainty is modeled as a convex polygonal region in image space. This model can be propagated efficiently throughout the correspondence-finding procedure. It allows for an easy extension toward transformation uncertainty models and for inferring confidence measures to verify the reliability of the outcome of the correspondence framework. Our procedure aims at finding reliable consistent transformations in sets of few and ill-localized features, possibly containing a large fraction of false candidate correspondences. The evaluation of the proposed procedure on practical correspondence problems shows that correct consistent correspondence sets are returned in over 95% of the experiments for small sets of 10-40 features contaminated with up to 400% false positives and 40% false negatives. The presented techniques prove to be beneficial in typical image processing applications, such as image registration and rigid object tracking.
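    The following sketch illustrates the core geometric idea under stated assumptions: each feature's localization uncertainty is a small convex polygon around its measured position, propagated through a transformation by mapping its vertices, and two features are accepted as consistent only if the propagated region of one overlaps the uncertainty region of the other. The use of a homography and of shapely for the overlap test is an illustrative choice, not the thesis's implementation.

        import numpy as np
        from shapely.geometry import Polygon

        def propagate_polygon(poly_pts, H):
            """Map convex-polygon vertices (N x 2) through a 3x3 homography H.
            The image remains a polygon (convex if it does not cross the horizon)."""
            p = np.hstack([poly_pts, np.ones((len(poly_pts), 1))]) @ H.T
            return Polygon(p[:, :2] / p[:, 2:3])

        def consistent(poly_a_pts, poly_b_pts, H):
            """True if A's uncertainty region, propagated by H, overlaps B's."""
            return propagate_polygon(poly_a_pts, H).intersects(Polygon(poly_b_pts))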

    Fast human behavior analysis for scene understanding

    Human behavior analysis has become an active topic of great interest and relevance for a number of applications and areas of research. Research in recent years has been driven considerably by the growing level of criminal behavior in large urban areas and the increase in terrorist actions. Accurate behavior studies have also been applied to sports analysis systems and are emerging in healthcare. Compared to conventional action recognition used in security applications, human behavior analysis techniques designed for embedded applications should satisfy the following technical requirements: (1) behavior analysis should provide scalable and robust results; (2) high processing efficiency to achieve (near) real-time operation with low-cost hardware; (3) extensibility to multiple-camera setups, including 3-D modeling, to facilitate human behavior understanding and description of various events. The key to our problem statement is that we intend to improve behavior analysis performance while preserving the efficiency of the designed techniques, to allow implementation in embedded environments. More specifically, we look into (1) fast multi-level algorithms incorporating specific domain knowledge, and (2) 3-D configuration techniques for overall enhanced performance. Where possible, we explore how current behavior-analysis techniques can be improved in accuracy and scalability. To fulfill the above technical requirements and tackle the research problems, we propose a flexible behavior-analysis framework consisting of three processing layers: (1) pixel-based processing (background modeling with pixel labeling), (2) object-based modeling (human detection, tracking and posture analysis), and (3) event-based analysis (semantic event understanding). In Chapter 3, we specifically contribute to the analysis of individual human behavior. A novel body representation is proposed for posture classification based on a silhouette feature. Only pure binary shape information is used for posture classification, without texture/color or any explicit body models. To this end, we have studied an efficient HV-PCA shape-based descriptor with temporal modeling, which achieves a posture-recognition accuracy of about 86% and outperforms other existing proposals. As our human motion scheme is efficient and fast (6-8 frames/second), it enables a fast surveillance system or further analysis of human behavior. In addition, a body-part detection approach is presented. Color and body ratio are combined to provide clues for human body detection and classification. The conventional assumption of an upright body posture is not required. Afterwards, we design and construct a specific framework for fast algorithms and apply it in two applications: tennis sports analysis and surveillance. Chapter 4 deals with tennis sports analysis and presents an automatic real-time system for multi-level analysis of tennis video sequences. First, we employ a 3-D camera model to bridge the pixel level, object level and scene level of tennis sports analysis. Second, a weighted linear model combining visual cues in the real-world domain is proposed to identify various events. The experimentally found event extraction rate of the system is about 90%. Audio signals are also combined to enhance the scene analysis performance.
    The complete proposed application is efficient enough to obtain real-time or near real-time performance (2-3 frames/second at 720×576 resolution, and 5-7 frames/second at 320×240 resolution, on a P-IV PC running at 3 GHz). Chapter 5 addresses surveillance and presents a full real-time behavior-analysis framework, featuring layers at the pixel, object, event and visualization levels. More specifically, this framework captures human motion, classifies posture, infers semantic events by exploiting interaction modeling, and performs 3-D scene reconstruction. We have introduced our system design based on a specific software architecture, employing the well-known "4+1" view model. In addition, the human behavior analysis algorithms are designed directly for real-time operation and embedded in an experimental runtime AV content-analysis architecture. This executable system is designed to be generic for multiple streaming applications with component-based architectures. To evaluate the performance, we have applied this networked system in a single-camera setup. The experimental platform operates with two Pentium Quad-core engines (2.33 GHz) and 4 GB of memory. Performance evaluations have shown that this networked framework is efficient and fast (13-15 frames/second) for monocular video sequences. Moreover, a dual-camera setup is tested within the behavior-analysis framework. After automatic camera calibration is conducted, 3-D reconstruction and communication among the different cameras are achieved. The extra view in the multi-camera setup improves human tracking and event detection in cases of occlusion. This multiple-view fusion extension improves the accuracy of the event-based semantic analysis by 8.3-16.7%. The detailed studies of two experimental intelligent applications, tennis sports analysis and surveillance, have proven their value in several extensive tests within the framework of the European Candela and Cantata ITEA research programs, where our proposed system has demonstrated competitive performance with respect to accuracy and efficiency.
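    As a hedged illustration of the pixel-based processing layer (background modeling with pixel labeling), the sketch below uses OpenCV's MOG2 background subtractor as a stand-in for the thesis's specific background model; the video path and post-processing parameters are placeholders.

        import cv2

        subtractor = cv2.createBackgroundSubtractorMOG2(history=500, detectShadows=True)
        cap = cv2.VideoCapture("surveillance.avi")  # placeholder input
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            fg = subtractor.apply(frame)   # per-pixel labels: 0 bg, 127 shadow, 255 fg
            fg = cv2.medianBlur(fg, 5)     # suppress salt-and-pepper noise
            _, fg = cv2.threshold(fg, 200, 255, cv2.THRESH_BINARY)  # drop shadow label
            # the object-based layer (detection, tracking, posture) consumes `fg` here
        cap.release()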