
    Uncertainty Minimization in Robotic 3D Mapping Systems Operating in Dynamic Large-Scale Environments

    This dissertation research is motivated by the potential and promise of 3D sensing technologies in safety and security applications. With specific focus on unmanned robotic mapping to aid clean-up of hazardous environments, under-vehicle inspection, automatic runway/pavement inspection, and modeling of urban environments, we develop modular, multi-sensor, multi-modality robotic 3D imaging prototypes using localization/navigation hardware, laser range scanners, and video cameras. While deploying our multi-modality complementary approach to pose and structure recovery in dynamic real-world operating conditions, we observe several data fusion issues that state-of-the-art methodologies are not able to handle. Different bounds on the noise models of heterogeneous sensors, the dynamism of the operating conditions, and the interaction of the sensing mechanisms with the environment introduce situations where sensors can intermittently degrade to accuracy levels below their design specifications. This observation necessitates methods that integrate multi-sensor data while accounting for sensor conflict, performance degradation, and potential failure during operation. This dissertation contributes to the data fusion literature a fault-diagnosis framework inspired by information complexity theory. We implement the framework as opportunistic sensing intelligence that evolves a belief policy on the sensors within the multi-agent 3D mapping systems, allowing them to survive and counter failure in challenging operating conditions. In addition to eliminating failed or non-functional sensors and avoiding catastrophic fusion, the information-theoretic framework minimizes uncertainty during autonomous operation by adaptively deciding whether to fuse sensors or to choose the believable ones. We demonstrate our framework through experiments in multi-sensor robot state localization in large-scale dynamic environments and in vision-based 3D inference. Our modular hardware and software design of the robotic imaging prototypes, together with the opportunistic sensing intelligence, provides significant progress toward autonomous, accurate, photo-realistic 3D mapping and remote visualization of scenes for the motivating applications.
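    To make the fuse-or-choose behaviour concrete, the sketch below maintains a belief weight per sensor, decays it when a reading conflicts with a robust consensus, and fuses only the believed sensors by inverse-variance weighting. The conflict test, update factors, and thresholds are illustrative assumptions, a minimal stand-in for the dissertation's information-complexity criterion rather than its actual implementation.

```python
import numpy as np

def fuse(readings, variances, beliefs,
         conflict_sigma=3.0, min_belief=0.6):
    """One fusion step: returns (fused estimate, updated beliefs)."""
    readings = np.asarray(readings, dtype=float)
    variances = np.asarray(variances, dtype=float)
    beliefs = np.asarray(beliefs, dtype=float)
    # Robust consensus over the raw readings.
    consensus = np.median(readings)
    # Flag sensors whose deviation from consensus exceeds the bound
    # implied by their own noise model.
    conflict = np.abs(readings - consensus) > conflict_sigma * np.sqrt(variances)
    # Decay belief on conflict, slowly restore it otherwise.
    beliefs = np.where(conflict, beliefs * 0.5, np.minimum(1.0, beliefs * 1.1))
    trusted = beliefs >= min_belief
    if not trusted.any():
        trusted = beliefs == beliefs.max()  # fall back to the most-believed sensor
    # Inverse-variance (maximum-likelihood) fusion over trusted sensors only.
    weights = trusted / variances
    estimate = np.sum(weights * readings) / np.sum(weights)
    return estimate, beliefs

# Three range sensors, the third intermittently faulty:
est, b = fuse(readings=[10.1, 9.9, 14.0],
              variances=[0.04, 0.04, 0.04],
              beliefs=[1.0, 1.0, 1.0])
print(est, b)  # estimate ~10.0; the conflicting sensor's belief is halved and excluded
```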

    Deep Learning for 3D Visual Perception

    3D visual perception refers to the set of problems that involve gathering information through a visual sensor and estimating the three-dimensional position and structure of the objects and formations around the sensor. Capabilities such as ego-motion estimation or map construction are essential for higher-level tasks such as autonomous driving or augmented reality. This thesis tackles several challenges in 3D perception, all of them useful from the perspective of SLAM (Simultaneous Localization and Mapping), which is itself a 3D perception problem.

    Simultaneous Localization and Mapping (SLAM) seeks to track the position of a device (for example a robot, a phone, or virtual-reality glasses) with respect to a map that it builds simultaneously while the platform explores the environment. SLAM is a highly relevant technology for applications such as virtual reality, augmented reality, and autonomous driving. Visual SLAM is the term used for the SLAM problem solved using only visual sensors. Many of the pieces of the ideal SLAM system are, today, well understood, mature, and in many cases present in applications. However, other pieces still pose significant research challenges. In particular, those addressed in this thesis are the estimation of the 3D structure around a camera from a single image, recognition of previously visited places under drastic appearance changes, high-level reconstruction, and SLAM in dynamic environments, all of them using deep neural networks.

    Monocular depth estimation is the task of perceiving the distance from the camera to each pixel of the image, using only the information obtained from a single image. This is an ill-posed problem, so it is very difficult to infer the exact depth of points from a single image; it requires knowledge both of what is seen and of the sensor used. For example, if we know that a certain car model has a certain height and we also know the type of camera used (focal length, pixel size, ...), then if that car spans a certain height in the image, say 50 pixels, we can say it is at a certain distance from the camera. We present the first single-view depth estimation work able to achieve reasonable performance with multiple camera types, such as a phone camera or a video camera.

    We also present how to estimate, from a single image, the structure or layout of a room. For this second work, we exploit spherical images taken by a panoramic camera using an equirectangular representation. Using these images we recover the room layout; our goal is to recognize the cues in the image that define the structure of a room. We focus on recovering the simplest version, the lines separating floor, walls, and ceiling.

    Long-term localization and mapping requires dealing with appearance changes in the environment; the effect on an image of being taken in winter versus summer can be very large. We introduce a multi-view, appearance-invariant model that solves the place recognition problem robustly. Visual place recognition attempts to identify a previously visited place by associating visual cues seen in two images, the one taken in the past and the one taken in the present. Ideally it should be invariant to changes in viewpoint, illumination, dynamic objects, and long-term appearance changes such as day and night, the seasons, or the weather.

    For long-term operation we also present DynaSLAM, a SLAM system that distinguishes the static and dynamic parts of the scene. It estimates its position based only on the static parts and reconstructs a map of the static parts alone, so that if we revisit a scene, our map is not affected by the presence of new dynamic objects or the disappearance of previous ones.

    In summary, in this thesis we contribute to different 3D perception problems, all of which address problems of Visual SLAM.
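    The known-height reasoning above has a simple closed form under the pinhole model: the apparent height h (in pixels) of an object of physical height H at distance Z satisfies h = f·H/Z, so Z = f·H/h. A minimal sketch, with all numbers hypothetical rather than taken from the thesis:

```python
# Illustrative pinhole-camera calculation: depth of an object of known
# physical height from its apparent height in the image.

def depth_from_known_height(focal_length_mm: float,
                            pixel_size_mm: float,
                            object_height_m: float,
                            image_height_px: float) -> float:
    """Return the camera-to-object distance in meters.

    Pinhole model: h_image = f * H / Z, hence Z = f * H / h_image,
    with f and h_image expressed in the same units (pixels here).
    """
    focal_length_px = focal_length_mm / pixel_size_mm
    return focal_length_px * object_height_m / image_height_px

# A car ~1.5 m tall spanning 50 pixels, seen by a hypothetical camera
# with a 4 mm lens and 2-micron pixels (focal length of 2000 px):
print(depth_from_known_height(4.0, 0.002, 1.5, 50))  # -> 60.0 m
```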

    Body image distortion in photography

    This thesis investigates the theory that photography is, in terms of body image perception, an intrinsically distorting and often fattening medium. In the professional practice of photography, film, and television, there is a widely held belief that the camera "adds 10 lbs" to the portrayed weight of actors and presenters. The primary questions addressed here concern the true extent of the fattening effect, the perceptual mechanisms to which it can be ascribed, and whether it can be counteracted in common practice. Current theories of the perception of photographic images rarely, if ever, discuss the medium's perceptual accuracy in recording the original scene. Many users assume that most photographs convey essentially the same information viewers would have seen had they been present when the photographs were taken. Further, it is generally accepted that photographs are an accurate, veridical, and scientific method of record whose content should be trusted unless there is evidence of technical failure, editing, or deliberate tampering. This thesis investigates whether this level of trust is appropriate, specifically by examining the reliability of photography in reproducing the face and form of human subjects. Body Image Distortion (B.I.D.) is a term normally used to describe the primary diagnostic symptom of the eating disorder anorexia nervosa. However, it is demonstrated here that people viewing 2D photographic portraits often make very significant overestimations of size compared with otherwise identical stereoscopic images. The conclusion is that losing stereoscopic information in conventional 2D photography causes distortions of perceived body image, often seen as a distinct flattening and fattening effect. A second fattening effect was identified in the use of telephoto lenses. It is demonstrated, using psychophysical experiments and geometry, that these 2D images cannot convey the same spatial or volumetric information that normal human orthostereoscopic perception provides. The evidence gathered suggests that the Human Visual System requires images to be orthostereoscopic, captured using two cameras that mimic as closely as possible the natural vergences, angle of view, depth of field, magnification, brightness, contrast, and colour, in order to reproduce scenes as accurately as possible. The experiments reported use three different size-estimation methodologies: stereoscopic versus monocular comparisons of human and virtual targets, bodyweight estimations in portraits taken at differing camera-to-subject distances, and synoptic versus direct viewing comparisons. These three techniques were used because photographic images are typically made without disparity and accommodation/vergence information, but with magnifications greater than those found in direct viewing of a target. By separately analysing the effects of disparity, magnification, and accommodation/vergence, the reported experiments show how changes in each condition can affect size estimation in photographs. The data suggest that photographs made without orthostereoscopic information lead to predictably distorted perception, and that conventional 2D imaging will almost always cause a significant flattening and fattening effect.
    In addition, it is argued that the conveyed jaw size in relation to neck width is an important factor in bodyweight perception, leading to sexually dimorphic perception: disproportionately larger estimations of bodyweight are made for female faces than for male faces under the same photographic conditions.
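    The perspective component of the camera-distance effect can be illustrated with pinhole geometry: features nearer the camera are magnified relative to features farther away in proportion to their inverse distances, and this exaggeration collapses as the camera retreats (with a longer lens restoring the framing). A minimal sketch, with the head dimensions as rough assumptions:

```python
# Illustrative pinhole geometry: relative magnification of near versus
# far features of a face as a function of camera-to-subject distance.

def relative_magnification(camera_distance_m: float, depth_offset_m: float) -> float:
    """Ratio of the projected size of a feature depth_offset in front of
    the head's mid-plane to one the same offset behind it. Pinhole
    projection scales each feature as 1/Z."""
    near = camera_distance_m - depth_offset_m
    far = camera_distance_m + depth_offset_m
    return far / near

# Nose tip versus ears, roughly 0.1 m from the mid-plane of the head:
print(relative_magnification(0.5, 0.1))  # close portrait: 1.5x exaggeration
print(relative_magnification(5.0, 0.1))  # telephoto distance: ~1.04x, "flattened"
```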

    Occlusion handling in video surveillance systems


    A graph-theory-based C-space path planner for mobile robotic manipulators in close-proximity environments

    In this thesis a novel guidance method for a 3-degree-of-freedom robotic manipulator arm in three dimensions has been developed for Improvised Explosive Device (IED) disposal. The work combines existing methods into a technique that delivers advantages drawn from several other guidance approaches, features that are necessary for the IED disposal application. It includes kinematic and dynamic modelling of robotic manipulators, task-space (T-space) to configuration-space (C-space) conversion, and path generation using graph theory, producing a guidance technique that can plan a safe path through a complex unknown environment. The method improves upon other techniques in that it produces a suitable path in three dimensions, in close-proximity environments, in real time, and with no a priori knowledge of the environment, a necessary precursor to applying the technique to IED disposal missions. To solve the path-planning problem, the thesis derives the kinematics and dynamics of a robotic arm in order to convert the Euclidean coordinates of measured environment data into C-space, where each dimension is one control input of the arm. The Euclidean start and end locations of the manipulator end effector are translated into C-space, and a three-dimensional path is generated between them using Dijkstra's Algorithm. The technique allows a single path to be generated that guides the entire arm through the environment, rather than multiple paths guiding each component separately. The robotic arm is modelled as a quasi-linear parameter-varying system, which requires gain-scheduled control to compensate for the non-linearities in the system. A Genetic Algorithm is applied to tune a set of PID controllers for the dynamic model of the manipulator arm so that the generated path can be followed using a conventional path-following algorithm. The technique proposed in this thesis is validated using numerical simulations in order to determine its advantages and limitations.
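    As an illustration of the path-generation step, the sketch below runs Dijkstra's algorithm over a discretized 3-DOF configuration space, with one grid axis per joint and occupied cells marking configurations in collision. The grid resolution, unit edge costs, and 6-connectivity are illustrative choices, not the discretization used in the thesis:

```python
import heapq
import numpy as np

def dijkstra_cspace(occupied: np.ndarray, start, goal):
    """occupied: boolean (N1, N2, N3) grid over joint angles; start/goal:
    index triples. Returns the list of grid cells on the shortest
    collision-free path, or None if the goal is unreachable."""
    moves = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == goal:
            path = [u]                    # reconstruct by walking predecessors
            while u in prev:
                u = prev[u]
                path.append(u)
            return path[::-1]
        if d > dist.get(u, float("inf")):
            continue                      # stale heap entry
        for m in moves:
            v = (u[0] + m[0], u[1] + m[1], u[2] + m[2])
            if any(c < 0 or c >= s for c, s in zip(v, occupied.shape)):
                continue                  # outside joint limits
            if occupied[v]:
                continue                  # configuration in collision
            nd = d + 1.0                  # unit cost per grid step
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return None

# Hypothetical 20x20x20 C-space with a wall of in-collision configurations:
grid = np.zeros((20, 20, 20), dtype=bool)
grid[5:15, 10, :] = True
path = dijkstra_cspace(grid, start=(2, 2, 2), goal=(18, 18, 18))
print(len(path))  # number of joint-space waypoints on the planned path
```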

    Deep Learning Localization for Self-driving Cars

    Identifying the location of an autonomous car with the help of visual sensors can be a good alternative to traditional approaches like the Global Positioning System (GPS), which is often inaccurate or unavailable due to insufficient coverage. Recent research in deep learning has produced excellent results in different domains, leading to the proposition of this thesis, which uses deep learning to solve the problem of localization in smart cars with visual data. Deep Convolutional Neural Networks (CNNs) were trained on visual data corresponding to unique locations throughout a geographic area. To evaluate the performance of these models, multiple datasets were created from Google Street View as well as manually, by driving a golf cart around the campus while collecting GPS-tagged frames. The efficacy of the CNN models was also investigated across different weather and light conditions. Validation accuracies as high as 98% were obtained from some of these models, showing that this novel method has the potential to act as an alternative or aid to traditional GPS-based localization for cars. The root-mean-square (RMS) precision of Google Maps is often between 2 and 10 m, whereas the precision required for the navigation of self-driving cars is between 2 and 10 cm. Empirically, this precision has been achieved with the help of different error-correction systems applied to GPS feedback. The proposed method was able to achieve an approximate localization precision of 25 cm without the help of any external error-correction system.
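    A minimal sketch of this kind of CNN place classifier, fine-tuning a standard backbone over a set of discrete GPS-tagged locations; the backbone, class count, and optimizer settings are assumptions for illustration, not the thesis's exact setup:

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_LOCATIONS = 50  # hypothetical number of discrete places along the route

# ImageNet-pretrained backbone with its classification head replaced by
# one output per location class.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, NUM_LOCATIONS)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(images: torch.Tensor, location_ids: torch.Tensor) -> float:
    """One optimization step over a batch of frames and location labels."""
    optimizer.zero_grad()
    loss = criterion(model(images), location_ids)
    loss.backward()
    optimizer.step()
    return loss.item()

# One hypothetical batch: 8 RGB frames at 224x224 with location labels.
frames = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, NUM_LOCATIONS, (8,))
print(train_step(frames, labels))
```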

    Influence of Object Recognition on the Neural Processes Controlling Grasping Movements

    Goal-directed grasping forms an essential basis of human interaction with our environment and thus of independent living. At the same time, this process places high demands on central nervous processing: the desired target must be selected among many possible alternatives, its size and spatial position determined from the visual information, and this fed into the construction of a motor program. A healthy person nevertheless achieves a fluid movement with adequate hand shaping, object-related grip scaling, and precisely dosed application of force. The investigation of the neural foundations of visually guided grasping in healthy subjects therefore forms an indispensable basis for understanding neurological disorders that impair it. One of these disorders is optic ataxia. Patients with optic ataxia show deficits in the goal-directed grasping of objects while retaining the ability to recognize and describe those objects (Jakobson, Archibald, Carey, & Goodale, 1991; Jeannerod, 1986). A common model (M. A. Goodale & Milner, 1992) explains this behavior by a dual dissociation of visual information processing into two largely independent processing streams: the ventral occipitotemporal stream serves the recognition of objects, while the dorsal occipitoparietal stream serves the control and planning of motor grasping based solely on visually determinable physical parameters such as the size and distance of objects. A case study (Jeannerod, Decety, & Michel, 1994) showed, however, that a patient with optic ataxia could grasp familiar objects, such as a lipstick, more precisely than abstract cylindrical objects. An obvious conclusion is that the identification of familiar objects must feed substantially into the cerebral processes controlling grasping movements. In our fMRI study, healthy young subjects grasped meaningful everyday objects, such as a highlighter or a matchbox, and association-free, single-colored wooden blocks during functional magnetic resonance imaging. The complete movement sequence was recorded with two MR-compatible cameras and analyzed for basic kinematic parameters such as reaction time, movement time, etc. In a methodological investigation we found a clear effect of including these basic parameters in the whole-brain functional analysis and identified a suitable strategy for integrating them into the fMRI analysis. The subsequent comparative analysis of visually guided grasping revealed higher signal differences in the lateral occipital cortex (LOC), the anterior intraparietal sulcus (aIPS), and the ventral premotor cortex (PMv) when grasping meaningful everyday objects compared with grasping single-colored wooden blocks matched to the everyday objects in their physical dimensions. When the two object categories were merely viewed attentively, we found stronger signals for the everyday objects compared with the wooden blocks only in the LOC; in the aIPS and PMv regions, no significant signal differences were found during attentive viewing.
    Thus, as expected, we identified the LOC as an area substantially involved in object recognition both during attentive viewing and during visually guided grasping. Meanwhile, contrary to what the model of Goodale and Milner (1992) would predict, aIPS and PMv emerged as grasp-relevant areas that contribute to integrating the experience-based information derived from object recognition into the motor planning of grasping.