183 research outputs found

    Radar-based Application of Pedestrian and Cyclist Micro-Doppler Signatures for Automotive Safety Systems

    Sensor-based detection of the near field in the context of highly automated driving is experiencing a noticeable trend toward the integration of radar sensor technology. Advances in microelectronics allow the use of high-resolution radar sensors whose measurement accuracy continuously increases through efficient processing in angle, range, and Doppler. This opens up novel possibilities for determining the geometric and kinematic properties of extended targets in the vehicle environment, which can be exploited in the development of automotive safety systems. In this work, vulnerable road users such as pedestrians and cyclists are analyzed with a high-resolution automotive radar. The focus is on the micro-Doppler effect, caused by the objects' many kinematic degrees of freedom. The characteristic radar signatures produced by the micro-Doppler effect allow a more detailed perception of the objects and can be related directly to their current states of motion. Novel methods are presented that account for the geometric and kinematic extent of the objects and realize real-time approaches to classification and behavior indication.
When a radar sensor detects an extended target (e.g., a cyclist), fundamental properties of its motion state can be captured from its micro-Doppler signature within a single measurement cycle. The velocity distributions of the spinning wheels allow an adaptive delimitation of the pedaling motion, whose behavior exhibits essential features for anticipatory accident prediction. Furthermore, extended radar targets are subject to an orientation dependence that directly affects their geometric and kinematic profiles. This can degrade both the classification performance and the usability of parameters that indicate the radar target's intention. Using the cyclist as an example, a method is presented that normalizes the orientation-dependent parameters in range and Doppler and compensates for the measured ambiguities. Furthermore, this work presents a methodology that estimates a pedestrian's leg motion over time (tracking) based on the pedestrian's micro-Doppler profile and reveals valuable object information about the motion behavior. To this end, a motion model is developed that approximates the nonlinear locomotion of the leg and captures its high degree of biomechanical variability. By incorporating probabilistic data association, the radar detections are assigned to the sources that evoke them (left and right leg), realizing a separation of the limbs. In contrast to previous tracking methods, the presented methodology increases the accuracy of the object information.
It thus represents a decisive advantage for future driver assistance systems, enabling significantly faster reactions to critical traffic situations.

Table of contents:
1 Introduction
  1.1 Automotive environmental perception
  1.2 Contributions of this work
  1.3 Thesis overview
2 Automotive radar
  2.1 Physical fundamentals
    2.1.1 Radar cross section
    2.1.2 Radar equation
    2.1.3 Micro-Doppler effect
  2.2 Radar measurement model
    2.2.1 FMCW radar
    2.2.2 Chirp sequence modulation
    2.2.3 Direction-of-arrival estimation
  2.3 Signal processing
    2.3.1 Target properties
    2.3.2 Target extraction (power detection; clustering)
    2.3.3 Real radar data example
  2.4 Conclusion
3 Micro-Doppler applications of a cyclist
  3.1 Physical fundamentals
    3.1.1 Micro-Doppler signatures of a cyclist
    3.1.2 Orientation dependence
  3.2 Cyclist feature extraction
    3.2.1 Adaptive pedaling extraction (ellipticity constraints; ellipse fitting algorithm)
    3.2.2 Experimental results
  3.3 Normalization of the orientation dependence
    3.3.1 Geometric correction
    3.3.2 Kinematic correction
    3.3.3 Experimental results
  3.4 Conclusion
  3.5 Discussion and outlook
4 Micro-Doppler applications of a pedestrian
  4.1 Pedestrian detection
    4.1.1 Human kinematics
    4.1.2 Micro-Doppler signatures of a pedestrian
    4.1.3 Experimental results (radially moving pedestrian; crossing pedestrian)
  4.2 Pedestrian feature extraction
    4.2.1 Frequency-based limb separation
    4.2.2 Extraction of body parts
    4.2.3 Experimental results
  4.3 Pedestrian tracking
    4.3.1 Probabilistic state estimation
    4.3.2 Gaussian filters
    4.3.3 The Kalman filter
    4.3.4 The extended Kalman filter
    4.3.5 Multiple-object tracking
    4.3.6 Data association
    4.3.7 Joint probabilistic data association
  4.4 Kinematic-based pedestrian tracking
    4.4.1 Kinematic modeling
    4.4.2 Tracking motion model
    4.4.3 4-D radar point cloud
    4.4.4 Tracking implementation
    4.4.5 Experimental results (longitudinal trajectory; crossing trajectory with sudden turn)
  4.5 Conclusion
  4.6 Discussion and outlook
5 Summary and outlook
  5.1 Developed algorithms
    5.1.1 Adaptive pedaling extraction
    5.1.2 Normalization of the orientation dependence
    5.1.3 Model-based pedestrian tracking
  5.2 Outlook
Bibliography
List of Acronyms
List of Figures
List of Tables
Appendix
  A Derivation of the rotation matrix (2.26)
  B Derivation of the mixed radar signal (2.52)
  C Calculation of the marginal association probabilities (4.51)
Curriculum Vitae
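
The micro-Doppler signatures central to this thesis can be illustrated with a small simulation: a point target with a constant torso Doppler shift plus one limb whose Doppler oscillates with the gait cycle, analyzed with a short-time Fourier transform. All parameters below are invented for illustration and have no connection to the actual sensor or data used in the thesis.

```python
import numpy as np
from scipy.signal import stft

# Hypothetical slow-time radar signal: a torso at constant Doppler shift
# plus one limb whose Doppler oscillates with the gait cycle.
fs = 2000.0                         # slow-time sample rate [Hz]
t = np.arange(0, 2.0, 1 / fs)
f_torso = 200.0                     # torso Doppler shift [Hz]
f_gait = 2.0                        # gait-cycle frequency [Hz]
f_dev = 120.0                       # peak Doppler deviation of the limb [Hz]

# Limb phase = 2*pi * integral of (f_torso + f_dev*sin(2*pi*f_gait*t)) dt
phase_limb = 2 * np.pi * (
    f_torso * t - f_dev / (2 * np.pi * f_gait) * np.cos(2 * np.pi * f_gait * t)
)
sig = np.exp(2j * np.pi * f_torso * t) + 0.5 * np.exp(1j * phase_limb)

# The STFT magnitude is the micro-Doppler spectrogram: a strong constant
# line for the torso with a sinusoidal limb trace oscillating around it.
f, tau, Z = stft(sig, fs=fs, nperseg=256, return_onesided=False)
spec = np.abs(Z)
peak_freq = f[np.argmax(spec, axis=0)]  # dominant Doppler per time slice
```

The sinusoidal limb trace around the dominant torso line is exactly the structure the thesis exploits for classification, pedaling extraction, and leg tracking.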

    Robust computational intelligence techniques for visual information processing

    This Ph.D. thesis is about image processing by computational intelligence techniques. First, a general overview is given, describing the motivation, hypotheses, objectives, and methodology; the use and analysis of different mathematical norms is our goal. The state of the art of the image processing applications addressed is then presented. In addition, the fundamentals of the image modalities, with particular attention to magnetic resonance, and the learning techniques used in this research, mainly based on neural networks, are summarized. Finally, the mathematical framework on which this work is based, the ℓp-norms, is defined. Three parts devoted to image processing techniques follow. The first non-introductory part of this book collects the developments on image segmentation. Two of them are applications for video surveillance tasks and model the background of a scenario observed by a specific camera. The other work is centered on the medical field, where the goal is to segment diabetic wounds in a very heterogeneous dataset. The second part is focused on the optimization and implementation of new models for curve and surface fitting in two and three dimensions, respectively. The first work presents a parabola fitting algorithm based on measuring the distances of interior and exterior points to the focus and the directrix. The second work turns to the ellipse and combines the information of multiple fitting methods into an ensemble. Last, the ellipsoid problem is addressed in a way similar to the parabola. The third part is exclusively dedicated to the super-resolution of magnetic resonance images. In one of these works, an algorithm based on the random shifting technique is developed; in addition, noise removal and resolution enhancement are studied simultaneously. Finally, the cost function of deep networks is modified with different combinations of norms in order to improve their training. The thesis closes with general conclusions and possible future research lines that can build on the results obtained
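
The norm-based cost functions mentioned in this abstract can be sketched generically: an ℓp training loss in which the exponent p interpolates between outlier-robust (p near 1) and outlier-sensitive (p = 2) behaviour. This is only an illustration of the idea, not the exact combinations of norms used in the thesis.

```python
import numpy as np

def lp_loss(y_true, y_pred, p=1.5):
    """Generic l_p cost: mean(|error|^p). p=2 gives the usual MSE-style
    loss, p=1 the more outlier-robust MAE; intermediate p trades off both."""
    err = np.abs(np.asarray(y_true, float) - np.asarray(y_pred, float))
    return float(np.mean(err ** p))

y = np.array([0.0, 1.0, 2.0])
pred = np.array([0.0, 1.5, 1.0])
mse_like = lp_loss(y, pred, p=2.0)   # equals np.mean((y - pred)**2)
mae_like = lp_loss(y, pred, p=1.0)   # equals np.mean(np.abs(y - pred))
```

Plugging such a loss into a deep network's training objective is the kind of modification the thesis studies; the choice of p changes how strongly large residuals dominate the gradient.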

    Geometric uncertainty models for correspondence problems in digital image processing

    Many recent advances in technology rely heavily on the correct interpretation of an enormous amount of visual information. All available sources of visual data (e.g. cameras in surveillance networks, smartphones, game consoles) must be adequately processed to retrieve the most interesting user information. Therefore, computer vision and image processing techniques currently attract significant interest, and will continue to do so in the near future. Most commonly applied image processing algorithms require a reliable solution for correspondence problems. The solution involves, first, the localization of corresponding points (visualizing the same 3D point in the observed scene) in the different images of distinct sources, and second, the computation of consistent geometric transformations relating correspondences on scene objects. This PhD thesis presents a theoretical framework for solving correspondence problems with geometric features (such as points and straight lines) representing rigid objects in image sequences of complex scenes with static and dynamic cameras. The research focuses on localization uncertainty due to errors in feature detection and measurement, and on its effect on each step in the solution of a correspondence problem. Whereas most other recent methods apply statistics-based models for spatial localization uncertainty, this work considers a novel geometric approach. Localization uncertainty is modeled as a convex polygonal region in the image space. This model can be efficiently propagated throughout the correspondence-finding procedure. It allows for an easy extension toward transformation uncertainty models, and for inferring confidence measures to verify the reliability of the outcome of the correspondence framework. Our procedure aims at finding reliable consistent transformations in sets of few and ill-localized features, possibly containing a large fraction of false candidate correspondences.
The evaluation of the proposed procedure in practical correspondence problems shows that correct consistent correspondence sets are returned in over 95% of the experiments for small sets of 10-40 features contaminated with up to 400% of false positives and 40% of false negatives. The presented techniques prove to be beneficial in typical image processing applications, such as image registration and rigid object tracking
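
The polygonal uncertainty model can be illustrated by the basic consistency test it enables: a candidate correspondence is kept only if the (transformed) point falls inside the convex polygonal uncertainty region of its partner. The half-plane test and the example region below are a generic sketch, not the thesis's actual propagation machinery.

```python
import numpy as np

def in_convex_polygon(pt, verts):
    """Check whether pt lies inside a convex polygon given by vertices in
    counter-clockwise order, using the sign of the cross product per edge."""
    v = np.asarray(verts, float)
    e = np.roll(v, -1, axis=0) - v          # edge vectors
    w = np.asarray(pt, float) - v           # vertex-to-point vectors
    cross = e[:, 0] * w[:, 1] - e[:, 1] * w[:, 0]
    return bool(np.all(cross >= 0))         # inside iff pt is left of all edges

# Hypothetical localization-uncertainty region of a detected feature:
region = [(0, 0), (2, 0), (2, 2), (0, 2)]
```

Because the region stays a convex polygon under the transformations considered, such tests can be chained through the whole correspondence-finding procedure and turned into confidence measures.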

    Motion planning of mobile robot in dynamic environment using potential field and roadmap based planner

    Mobile robots are increasingly being used to perform tasks in unknown environments. The potential of robots to undertake such tasks lies in their ability to intelligently and efficiently locate and interact with objects in their environment. My research focuses on developing algorithms to plan paths for mobile robots in a partially known environment observed by an overhead camera. The environment consists of dynamic obstacles and targets. A new methodology, the Extrapolated Artificial Potential Field, is proposed for real-time robot path planning. An algorithm for probabilistic collision detection and avoidance is used to enhance the planner, the aim being for the robot to select maneuvers that avoid the dynamic obstacles. The navigation of a mobile robot in a real-world dynamic environment is a complex and daunting task. Consider the case of a mobile robot working in an office environment: it has to avoid static obstacles such as desks, chairs and cupboards, and it also has to consider dynamic obstacles such as humans, whose motion it must predict. Humans inherently use an intuitive motion-prediction scheme when planning a path in a crowded environment. A technique has been developed which predicts the possible future positions of obstacles; coupled with the generalized Voronoi diagram, it enables the robot to safely navigate in a given environment
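
For context, the classical artificial potential field that the Extrapolated APF builds on can be sketched in a few lines: an attractive pull toward the goal plus a Khatib-style repulsive push from obstacles within an influence distance. This is the textbook form with invented parameters, not the thesis's extrapolated variant, which additionally predicts obstacle motion.

```python
import numpy as np

def apf_step(pos, goal, obstacles, k_att=1.0, k_rep=10.0, d0=2.0, step=0.05):
    """One step of a classical artificial-potential-field planner:
    attractive pull toward the goal, repulsive push from obstacles
    within influence distance d0. Gains are illustrative only."""
    pos = np.asarray(pos, float)
    goal = np.asarray(goal, float)
    force = k_att * (goal - pos)                      # attractive term
    for ob in obstacles:
        diff = pos - np.asarray(ob, float)
        d = np.linalg.norm(diff)
        if 1e-9 < d < d0:
            # Repulsion grows sharply as the robot approaches the obstacle.
            force += k_rep * (1.0 / d - 1.0 / d0) / d ** 2 * (diff / d)
    # Move a fixed small step along the normalized net force direction.
    return pos + step * force / (np.linalg.norm(force) + 1e-9)

# Drive a point robot from the origin to the goal past one static obstacle.
pos, goal = np.zeros(2), np.array([5.0, 5.0])
obstacles = [np.array([2.0, 3.0])]
for _ in range(2000):
    pos = apf_step(pos, goal, obstacles)
```

The extrapolated variant would evaluate the repulsive term at the obstacle's predicted future position rather than its current one, which is what makes the planner viable among moving obstacles.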

    Human-Centric Machine Vision

    Recently, algorithms for processing visual information have greatly evolved, providing efficient and effective solutions to cope with the variability and complexity of real-world environments. These achievements have led to the development of Machine Vision systems that go beyond typical industrial applications, where environments are controlled and tasks are very specific, toward innovative solutions that address people's everyday needs. Human-Centric Machine Vision can help solve the problems raised by the needs of our society, e.g. security and safety, health care, medical imaging, and human-machine interfaces. Such applications must handle changing, unpredictable and complex situations, and must account for the presence of humans

    Robust convex optimisation techniques for autonomous vehicle vision-based navigation

    This thesis investigates new convex optimisation techniques for motion and pose estimation. Numerous computer vision problems can be formulated as optimisation problems. These are generally solved either via linear techniques using the singular value decomposition, or via iterative methods under an L2 norm minimisation. Linear techniques have the advantage of offering a closed-form solution that is simple to implement. The quantity being minimised is, however, not geometrically or statistically meaningful. Conversely, L2 algorithms rely on iterative estimation, where a cost function is minimised using algorithms such as Levenberg-Marquardt, Gauss-Newton, gradient descent or conjugate gradient. The cost functions involved are geometrically interpretable and can be statistically optimal under an assumption of Gaussian noise. However, in addition to their sensitivity to initial conditions, these algorithms are often slow and bear a high probability of getting trapped in a local minimum or producing infeasible solutions, even for small noise levels. In light of the above, this thesis focuses on developing new techniques for finding solutions, via a convex optimisation framework, that are globally optimal. Convex optimisation techniques have recently revealed enormous advantages in motion estimation: convex optimisation guarantees a global minimum, and the cost function is geometrically meaningful. Moreover, robust optimisation is a recent approach to optimisation under uncertain data. In recent years the need to cope with uncertain data has become especially acute, particularly where real-world applications are concerned. In such circumstances, robust optimisation aims to recover an optimal solution whose feasibility is guaranteed for any realisation of the uncertain data.
Although many researchers avoid uncertainty, owing to the added complexity of constructing a robust optimisation model and to a lack of knowledge about the nature of these uncertainties, and especially their propagation, in this thesis robust convex optimisation that estimates the uncertainties at every step is investigated for the motion estimation problem. First, a solution using convex optimisation coupled with the recursive least squares (RLS) algorithm and the robust H∞ filter is developed for motion estimation. In another solution, uncertainties and their propagation are incorporated in a robust L∞ convex optimisation framework for monocular visual motion estimation. In this solution, robust least squares is combined with a second-order cone program (SOCP). A technique to improve the accuracy and robustness of the fundamental matrix is also investigated in this thesis. This technique uses the covariance intersection approach to fuse feature-location uncertainties, which leads to more consistent motion estimates. Loop-closure detection is crucial in improving the robustness of navigation algorithms. In practice, after long navigation in an unknown environment, detecting that a vehicle is in a previously visited location gives the opportunity to increase the accuracy and consistency of the estimate. In this context, we have developed an efficient appearance-based method for visual loop-closure detection based on the combination of a Gaussian mixture model with the KD-tree data structure. Deploying this technique for loop-closure detection, a robust L∞ convex pose-graph optimisation solution for unmanned aerial vehicle (UAV) monocular motion estimation is introduced as well. In the literature, most proposed solutions formulate the pose-graph optimisation as a least-squares problem, minimising a cost function using iterative methods.
In this work, robust convex optimisation under the L∞ norm is adopted, which efficiently corrects the UAV's pose after loop-closure detection. To round out the work in this thesis, a system for cooperative monocular visual motion estimation with multiple aerial vehicles is proposed. The cooperative motion estimation employs state-of-the-art approaches for optimisation, individual motion estimation and registration. Three-view geometry algorithms in a convex optimisation framework are deployed on board the monocular vision system of each vehicle. In addition, vehicle-to-vehicle relative pose estimation is performed with a novel robust registration solution in a global optimisation framework. In parallel, and as a complementary solution for the relative pose, a robust non-linear H∞ solution is designed to fuse measurements from the UAVs' on-board inertial sensors with the visual estimates. The suggested contributions have been exhaustively evaluated in a number of real-image data experiments in the laboratory, using monocular vision systems and range imaging devices. In this thesis, we propose several solutions towards the goal of robust visual motion estimation using convex optimisation. We show that the convex optimisation framework may be extended to include uncertainty information in order to achieve robust and optimal solutions, and we observe that convex optimisation is a practical and very appealing alternative to linear techniques and iterative methods
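
The L∞-norm problems this abstract refers to are convex, and in the simplest linear case reduce to a linear program: minimising the maximum residual of an overdetermined system (a Chebyshev fit). The sketch below shows that generic formulation only; it is not the thesis's full SOCP or pose-graph pipeline.

```python
import numpy as np
from scipy.optimize import linprog

def linf_fit(A, b):
    """Solve min_x ||Ax - b||_inf as a linear program.
    Variables are (x, t); constraints enforce -t <= (Ax - b)_i <= t,
    and the objective minimises the bound t."""
    m, n = A.shape
    c = np.zeros(n + 1)
    c[-1] = 1.0                                   # minimise t
    ones = np.ones((m, 1))
    A_ub = np.vstack([np.hstack([A, -ones]),      #  (Ax - b) <= t
                      np.hstack([-A, -ones])])    # -(Ax - b) <= t
    b_ub = np.concatenate([b, -b])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * (n + 1))
    return res.x[:n], res.x[-1]

# Fit a constant to {0, 1, 2}: the Chebyshev solution is the midpoint 1,
# with worst-case residual 1.
x, t = linf_fit(np.ones((3, 1)), np.array([0.0, 1.0, 2.0]))
```

Because the feasible set and objective are convex, the solver returns the global optimum, which is exactly the property the thesis exploits in place of iterative least-squares refinement.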

    Sensor resource management with evolutionary algorithms applied to indoor positioning

    Extraordinary Doctoral Award of the UAH in the academic year 2016-2017. This thesis aims to contribute to improving resource management in sensor systems applied to indoor localization. Such management involves two issues, the placement of the sensors and their optimal use once placed; the thesis focuses on the first. Throughout the thesis, an indoor positioning system based on infrared signals with phase-difference measurements is considered. These phase measurements are subsequently transformed into distances, so our problem is hyperbolic trilateration using range-difference measurements. Although a model for the range-difference error of the infrared link is described, we can abstract away from it and simply consider that we use range-difference measurements that are normally distributed with a variance given by the model used. In fact, the work presented in this thesis could be applied to any other system for which a measurement-error model is available, including systems employing spherical trilateration or angulation. The vast majority of works that improve the accuracy of a positioning system by placing sensors optimize cost functions based on the Cramér-Rao lower bound, an approach we also adopt in this work. The state-of-the-art chapter of the thesis reviews the different existing proposals and concludes by explaining what we intend to add to the contributions in the scientific literature. In summary, current proposals can be grouped into three classes. The first tries to determine an optimal configuration for localizing a target, usually using the determinant of the Fisher information matrix or the dilution of precision.
These methods can obtain analytical expressions that explain how the sensors' characteristics and placement affect the accuracy obtained. However, they lack applicability in real situations. The second type of proposal employs numerical methods to optimize sensor placement considering several targets or an entire area. The methods proposed in this thesis fall into this category. Finally, there are methods that use sensor-selection techniques to obtain optimal configurations. Among the various proposals we find several shortcomings, such as simplifying the measurement-error model to obtain easily tractable expressions, considering a single localization-accuracy criterion, placing a fixed, predetermined number of sensors, or deploying them in simple areas without occlusion problems. Our first contribution addresses the use of a single accuracy criterion, which is usually the determinant or the trace of the covariance or information matrix of the estimate. Each metric obtained from these matrices has a different practical meaning, and considering only one of them can lead to solutions that are deficient in the others, such as very elongated error ellipses. Our proposal involves multi-objective evolutionary algorithms that optimize several of these metrics, such as the mean squared error over the whole area, the isotropy of the solution, and the maximum deviation that can appear. This yields a set of solutions on a Pareto front, allowing the manager of the sensor network to visualize the possible solutions and choose among them according to the needs. It also makes it possible to obtain placements that improve the convergence of some estimators.
The second contribution of the thesis deals with sensor placement in more complex areas, where obstacles cause occlusions for some sensors. This introduces the problem of trying to cover as many points of the space as possible with the minimum number of sensors needed to compute a target's position. That number influences the percentage of area covered and the accuracy obtained, besides increasing the cost of the system. For this reason, it is also an objective to be optimized along with the coverage and the uncertainty of the estimated position. To carry out this optimization, an improvement over the algorithm used in the previous contribution is proposed, based on the use of subpopulations and on adding genetic operators that modify the number of sensors according to the coverage and density at the different points of the area to be covered. Each chapter devoted to the contributions described contains results and conclusions confirming the good performance of the proposed methods. Finally, the thesis concludes with a list of proposals to be studied in the future
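
The Cramér-Rao-based cost functions described above can be sketched for the simpler case of plain range (rather than range-difference) measurements: the Fisher information of the 2-D target position is a sum of outer products of unit vectors toward the sensors, and the trace of its inverse lower-bounds the achievable mean squared position error. Geometry and noise values below are invented for illustration.

```python
import numpy as np

def crlb_trace(sensors, target, sigma=0.05):
    """Trace of the CRLB for a 2-D position estimated from range
    measurements with i.i.d. Gaussian noise of std `sigma` [m].
    (Simplified: plain ranges, not the thesis's range differences.)"""
    target = np.asarray(target, float)
    J = np.zeros((2, 2))                    # Fisher information matrix
    for s in sensors:
        d = target - np.asarray(s, float)
        u = d / np.linalg.norm(d)           # unit vector sensor -> target
        J += np.outer(u, u) / sigma ** 2    # each range adds rank-1 info
    return float(np.trace(np.linalg.inv(J)))

# Well-spread geometry (sensors surround the target) vs. a clustered one:
good = crlb_trace([(0, 0), (10, 0), (0, 10), (10, 10)], (5, 5))
bad = crlb_trace([(0, 0), (1, 0), (0, 1), (1, 1)], (5, 5))
```

An evolutionary placement algorithm of the kind described above would use quantities like this trace (and related metrics such as isotropy) as the fitness functions to minimise over candidate sensor layouts.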

    Object Tracking

    Object tracking consists in estimating the trajectories of moving objects in a sequence of images. Automating computer-based object tracking is a difficult task: the dynamics of the many changing parameters that represent the objects' features and motion, as well as temporary partial or full occlusion of the tracked objects, have to be considered. This monograph presents the development of object tracking algorithms, methods and systems. Both the state of the art of object tracking methods and the new trends in research are described in this book. Fourteen chapters are split into two sections: Section 1 presents new theoretical ideas, whereas Section 2 presents real-life applications. Despite the variety of topics contained in this monograph, it constitutes a consistent body of knowledge in the field of computer object tracking. The editor's intention was to follow up the very quick progress in the development of methods as well as the extension of their applications

    Automatic video segmentation employing object/camera modeling techniques

    Practically established video compression and storage techniques still process video sequences as rectangular images without further semantic structure. However, humans watching a video sequence immediately recognize acting objects as semantic units. This semantic object separation is currently not reflected in the technical system, making it difficult to manipulate the video at the object level. The realization of object-based manipulation will introduce many new possibilities for working with videos, like composing new scenes from pre-existing video objects or enabling user interaction with the scene. Moreover, object-based video compression, as defined in the MPEG-4 standard, can provide high compression ratios because the foreground objects can be sent independently from the background. In the case that the scene background is static, the background views can even be combined into a large panoramic sprite image, from which the current camera view is extracted. This results in a higher compression ratio since the sprite image for each scene only has to be sent once. A prerequisite for employing object-based video processing is automatic (or at least user-assisted semi-automatic) segmentation of the input video into semantic units, the video objects. This segmentation is a difficult problem because the computer does not have the vast amount of pre-knowledge that humans subconsciously use for object detection. Thus, even the simple definition of the desired output of a segmentation system is difficult. The subject of this thesis is to provide algorithms for segmentation that are applicable to common video material and that are computationally efficient. The thesis is conceptually separated into three parts. In Part I, an automatic segmentation system for general video content is described in detail. Part II introduces object models as a tool to incorporate user-defined knowledge about the objects to be extracted into the segmentation process.
Part III concentrates on the modeling of camera motion in order to relate the observed camera motion to real-world camera parameters. The segmentation system that is described in Part I is based on a background-subtraction technique. The pure background image that is required for this technique is synthesized from the input video itself. Sequences that contain rotational camera motion can also be processed since the camera motion is estimated and the input images are aligned into a panoramic scene background. This approach is fully compatible with the MPEG-4 video-encoding framework, so that the segmentation system can be easily combined with an object-based MPEG-4 video codec. After an introduction to the theory of projective geometry in Chapter 2, which is required for the derivation of camera-motion models, the estimation of camera motion is discussed in Chapters 3 and 4. It is important that the camera-motion estimation is not influenced by foreground object motion. At the same time, the estimation should provide accurate motion parameters such that all input frames can be combined seamlessly into a background image. The core motion estimation is based on a feature-based approach, where the motion parameters are determined with a robust-estimation algorithm (RANSAC) in order to distinguish the camera motion from simultaneously visible object motion. Our experiments showed that the robustness of the original RANSAC algorithm in practice does not reach the theoretically predicted performance. An analysis of the problem revealed that this is caused by numerical instabilities that can be significantly reduced by a modification that we describe in Chapter 4. The synthesis of static-background images is discussed in Chapter 5. In particular, we present a new algorithm for the removal of the foreground objects from the background image such that a pure scene background remains.
The proposed algorithm is optimized to synthesize the background even for difficult scenes in which the background is visible only for short periods of time. The problem is solved by clustering the image content of each region over time, such that each cluster comprises static content. Furthermore, the algorithm exploits the observation that the time intervals during which foreground objects cover an image region are similar to those of neighboring image areas.

The reconstructed background could be used directly as the sprite image in an MPEG-4 video coder. However, we have discovered that the counterintuitive approach of splitting the background into several independent parts can reduce the overall amount of data. For general camera motion, the construction of a single sprite image is even impossible. In Chapter 6, a multi-sprite partitioning algorithm is presented that separates the video sequence into a number of segments, for which independent sprites are synthesized. The partitioning is computed in such a way that the total area of the resulting sprites is minimized while additional constraints are satisfied. These include a limited sprite-buffer size at the decoder and the restriction that the image resolution in the sprite should never fall below the input-image resolution. The described multi-sprite approach is fully compatible with the MPEG-4 standard, but provides three advantages. First, arbitrary rotational camera motion can be processed. Second, the coding cost for transmitting the sprite images is lower. Finally, the quality of the decoded sprite images is better than with previously proposed sprite-generation algorithms.

Segmentation masks for the foreground objects are computed with a change-detection algorithm that compares the pure background image with the input images. A special effect that occurs in the change detection is the problem of image misregistration.
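The misregistration effect, and the risk-map remedy developed in Chapter 7, can be demonstrated with a small sketch. The gradient-based risk criterion, the thresholds, and the function names are illustrative assumptions, not the thesis implementation:

```python
import numpy as np

def change_mask(background, frame, thresh=25.0, risk_map=None):
    """Foreground mask by background subtraction. Where risk_map is set,
    the difference values are disregarded, because a small misregistration
    would produce large differences there even without a foreground object."""
    diff = np.abs(frame.astype(float) - background.astype(float))
    if risk_map is not None:
        diff[risk_map] = 0.0
    return diff > thresh

def gradient_risk_map(background, grad_thresh=30.0):
    """A simple risk estimate: pixels with strong local gradients (edges,
    fine texture) are the ones most sensitive to misregistration."""
    gy, gx = np.gradient(background.astype(float))
    return np.hypot(gx, gy) > grad_thresh
```

A one-pixel registration error along a sharp background edge then no longer produces a false detection, while genuine foreground objects in smooth areas are still found.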
Since the change detection compares co-located image pixels in the camera-motion-compensated images, a small error in the motion estimation can introduce segmentation errors, because non-corresponding pixels are compared. We approach this problem in Chapter 7 by integrating risk maps into the segmentation algorithm, which identify pixels for which misregistration would probably result in errors. For these image areas, the change-detection algorithm is modified to disregard the difference values of the pixels marked in the risk map. This modification significantly reduces the number of false object detections in finely textured image areas.

The algorithmic building blocks described above can be combined into a segmentation system in various ways, depending on whether camera motion has to be considered or whether real-time execution is required. These different systems and example applications are discussed in Chapter 8.

Part II of the thesis extends the described segmentation system to consider object models in the analysis. Object models allow the user to specify which objects should be extracted from the video. In Chapters 9 and 10, a graph-based object model is presented in which the features of the main object regions are summarized in the graph nodes, and the spatial relations between these regions are expressed by the graph edges. The segmentation algorithm is extended by an object-detection algorithm that searches the input image for the user-defined object model. We provide two object-detection algorithms: the first is specific to cartoon sequences and uses an efficient sub-graph matching algorithm, whereas the second processes natural video sequences. With the object-model extension, the segmentation system can be directed to extract individual objects, even if the input sequence contains many objects.

Chapter 11 proposes an alternative approach to incorporating object models into a segmentation algorithm.
The chapter describes a semi-automatic segmentation algorithm in which the user coarsely marks the object and the computer refines this marking to the exact object boundary. Afterwards, the object is tracked automatically through the sequence. In this algorithm, the object model is defined as the texture along the object contour. This texture is extracted in the first frame and then used during the object tracking to localize the original object. The core of the algorithm uses a graph representation of the image and a newly developed algorithm for computing shortest circular paths in planar graphs. The proposed algorithm is faster than the currently known algorithms for this problem, and it can also be applied to many related problems, such as shape matching.

Part III of the thesis elaborates on different techniques for deriving information about the physical 3-D world from the camera motion. The segmentation system employs camera-motion estimation, but the obtained parameters have no direct physical meaning. Chapter 12 discusses an extension of the camera-motion estimation that factorizes the motion parameters into physically meaningful parameters (rotation angles, focal length) using camera autocalibration techniques. A special feature of the algorithm is that it can process camera motion spanning several sprites by employing the multi-sprite technique described above. Consequently, the algorithm can be applied to arbitrary rotational camera motion.

For the analysis of video sequences, it is often required to determine and follow the position of the objects. Clearly, the object position in image coordinates provides little information if the viewing direction of the camera is not known. Chapter 13 provides a new algorithm to deduce the transformation between image coordinates and real-world coordinates for the special application of sport-video analysis. In sport videos, the camera view can be derived from markings on the playing field.
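A drastically simplified sketch conveys the flavor of such a combinatorial correspondence search. Here both the field model and the detections are reduced to 1-D line positions related by an unknown scale and offset; the real Chapter 13 algorithm matches 2-D line arrangements and estimates a full camera view. All names, numbers, and the 1-D simplification are illustrative assumptions:

```python
import itertools
import numpy as np

def match_field_lines(model, detected, tol=2.0):
    """Toy combinatorial search for correspondences between model line
    positions and detected line positions (both 1-D).

    Every pair of model lines mapped onto every pair of detected lines
    fixes a candidate scale/offset hypothesis; the hypothesis that
    explains the most detected lines wins. Occluded (missing) model
    lines simply remain unmatched, which makes the search robust."""
    best = (0, None)
    for i, j in itertools.combinations(range(len(model)), 2):
        for p, q in itertools.permutations(range(len(detected)), 2):
            dm = model[j] - model[i]
            dd = detected[q] - detected[p]
            if dm == 0 or dd == 0:
                continue
            scale = dd / dm
            if scale <= 0:          # assume an orientation-preserving view
                continue
            offset = detected[p] - scale * model[i]
            proj = scale * np.asarray(model) + offset
            inliers = sum(min(abs(pk - d) for d in detected) < tol
                          for pk in proj)
            if inliers > best[0]:
                best = (inliers, (scale, offset))
    return best
```

Because the winning hypothesis is chosen by consensus, a spurious detection or an occluded field line does not derail the registration, mirroring the robustness properties claimed for the full algorithm.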
For this reason, we employ a model of the playing field that describes the arrangement of lines. After detecting significant lines in the input image, a combinatorial search is carried out to establish correspondences between the lines in the input image and those in the model. The algorithm requires no information about the specific color of the playing field, and it is very robust to occlusions and poor lighting conditions. Moreover, the algorithm is generic in the sense that it can be applied to any type of sport by simply exchanging the model of the playing field.

In Chapter 14, we again consider panoramic background images, focusing in particular on their visualization. Apart from the planar background sprites discussed previously, a frequently used visualization technique for panoramic images is the projection onto a cylinder surface, which is then unwrapped into a rectangular image. The disadvantage of this approach, however, is that the viewer has no good orientation in the panoramic image, since it shows all viewing directions at the same time. In order to provide a more intuitive presentation of wide-angle views, we have developed a visualization technique specialized for indoor environments. We present an algorithm to determine the 3-D shape of the room in which the image was captured or, more generally, to compute a complete floor plan if several panoramic images, captured in each of the rooms, are provided. Based on the obtained 3-D geometry, a graphical model of the rooms is constructed in which the walls are displayed with textures extracted from the panoramic images. This representation enables virtual walk-throughs of the reconstructed rooms and therefore gives the user a better orientation.

Summarizing, we can conclude that all segmentation techniques employ some definition of foreground objects.
These definitions are either explicit, using object models as in Part II of this thesis, or implicit, as in the background synthesis of Part I. The results of this thesis show that implicit descriptions, which extract their definition from the video content, work well when the sequence is long enough to extract this information reliably. However, high-level semantics are difficult to integrate into segmentation approaches that are based on implicit models; instead, such semantics should be added in post-processing steps. Explicit object models, on the other hand, apply semantic pre-knowledge at early stages of the segmentation. Moreover, they can be applied to short video sequences or even still pictures, since no background model has to be extracted from the video. The definition of a general object-modeling technique that is widely applicable and that also enables accurate segmentation remains an important yet challenging problem for further research.