22,983 research outputs found

    Learned perception systems for self-driving vehicles

    Get PDF
    2022 Spring.Includes bibliographical references.Building self-driving vehicles is one of the most impactful technological challenges of modern artificial intelligence. Self-driving vehicles are widely anticipated to revolutionize the way people and freight move. In this dissertation, we present a collection of work that aims to improve the capability of the perception module, an essential module for safe and reliable autonomous driving. Specifically, it focuses on two perception topics: 1) Geo-localization (mapping) of spatially-compact static objects, and 2) Multi-target object detection and tracking of moving objects in the scene. Accurately estimating the position of static objects, such as traffic lights, from the moving camera of a self-driving car is a challenging problem. In this dissertation, we present a system that improves the localization of static objects by jointly optimizing the components of the system via learning. Our system is comprised of networks that perform: 1) 5DoF object pose estimation from a single image, 2) association of objects between pairs of frames, and 3) multi-object tracking to produce the final geo-localization of the static objects within the scene. We evaluate our approach using a publicly available data set, focusing on traffic lights due to data availability. For each component, we compare against contemporary alternatives and show significantly improved performance. We also show that the end-to-end system performance is further improved via joint training of the constituent models. Next, we propose an efficient joint detection and tracking model named DEFT, or "Detection Embeddings for Tracking." The proposed approach relies on an appearance-based object matching network jointly learned with an underlying object detection network. An LSTM is also added to capture motion constraints. DEFT has comparable accuracy and speed to the top methods on 2D online tracking leaderboards while having significant advantages in robustness when applied to more challenging tracking data. DEFT raises the bar on the nuScenes monocular 3D tracking challenge, more than doubling the performance of the previous top method (3.8x on AMOTA, 2.1x on MOTAR). We analyze the difference in performance between DEFT and the next best-published method on nuScenes and find that DEFT is more robust to occlusions and large inter-frame displacements, making it a superior choice for many use-cases. Third, we present an end-to-end model to solve the tasks of detection, tracking, and sequence modeling from raw sensor data, called Attention-based DEFT. Attention-based DEFT extends the original DEFT by adding an attentional encoder module that uses attention to compute tracklet embedding that 1) jointly reasons about the tracklet dependencies and interaction with other objects present in the scene and 2) captures the context and temporal information of the tracklet's past observations. The experimental results show that Attention-based DEFT performs favorably against or comparable to state-of-the-art trackers. Reasoning about the interactions between the actors in the scene allows Attention-based DEFT to boost the model tracking performance in heavily crowded and complex interactive scenes. We validate the sequence modeling effectiveness of the proposed approach by showing its superiority for velocity estimation task over other baseline methods on both simple and complex scenes. The experiments demonstrate the effectiveness of Attention-based DEFT for capturing spatio-temporal interaction of the crowd for velocity estimation task, which helps it to be more robust to handle complexities in densely crowded scenes. The experimental results show that all the joint models in this dissertation perform better than solving each problem independently

    Beyond Geo-localization: Fine-grained Orientation of Street-view Images by Cross-view Matching with Satellite Imagery

    Full text link
    Street-view imagery provides us with novel experiences to explore different places remotely. Carefully calibrated street-view images (e.g. Google Street View) can be used for different downstream tasks, e.g. navigation, map features extraction. As personal high-quality cameras have become much more affordable and portable, an enormous amount of crowdsourced street-view images are uploaded to the internet, but commonly with missing or noisy sensor information. To prepare this hidden treasure for "ready-to-use" status, determining missing location information and camera orientation angles are two equally important tasks. Recent methods have achieved high performance on geo-localization of street-view images by cross-view matching with a pool of geo-referenced satellite imagery. However, most of the existing works focus more on geo-localization than estimating the image orientation. In this work, we re-state the importance of finding fine-grained orientation for street-view images, formally define the problem and provide a set of evaluation metrics to assess the quality of the orientation estimation. We propose two methods to improve the granularity of the orientation estimation, achieving 82.4% and 72.3% accuracy for images with estimated angle errors below 2 degrees for CVUSA and CVACT datasets, corresponding to 34.9% and 28.2% absolute improvement compared to previous works. Integrating fine-grained orientation estimation in training also improves the performance on geo-localization, giving top 1 recall 95.5%/85.5% and 86.8%/80.4% for orientation known/unknown tests on the two datasets.Comment: This paper has been accepted by ACM Multimedia 2022. The version contains additional supplementary material

    Engineering data compendium. Human perception and performance. User's guide

    Get PDF
    The concept underlying the Engineering Data Compendium was the product of a research and development program (Integrated Perceptual Information for Designers project) aimed at facilitating the application of basic research findings in human performance to the design and military crew systems. The principal objective was to develop a workable strategy for: (1) identifying and distilling information of potential value to system design from the existing research literature, and (2) presenting this technical information in a way that would aid its accessibility, interpretability, and applicability by systems designers. The present four volumes of the Engineering Data Compendium represent the first implementation of this strategy. This is the first volume, the User's Guide, containing a description of the program and instructions for its use

    A pilot study on discriminative power of features of superficial venous pattern in the hand

    Get PDF
    The goal of the project is to develop an automatic way to identify, represent the superficial vasculature of the back hand and investigate its discriminative power as biometric feature. A prototype of a system that extracts the superficial venous pattern of infrared images of back hands will be described. Enhancement algorithms are used to solve the lack of contrast of the infrared images. To trace the veins, a vessel tracking technique is applied, obtaining binary masks of the superficial venous tree. Successively, a method to estimate the blood vessels calibre, length, the location and angles of vessel junctions, will be presented. The discriminative power of these features will be studied, independently and simultaneously, considering two features vector. Pattern matching of two vasculature maps will be performed, to investigate the uniqueness of the vessel network / L’obiettivo del progetto è di sviluppare un metodo automatico per identificare e rappresentare la rete vascolare superficiale presente nel dorso della mano ed investigare sul suo potere discriminativo come caratteristica biometrica. Un prototipo di sistema che estrae l’albero superficiale delle vene da immagini infrarosse del dorso della mano sarà descritto. Algoritmi per il miglioramento del contrasto delle immagini infrarosse saranno applicati. Per tracciare le vene, una tecnica di tracking verrà utilizzata per ottenere una maschera binaria della rete vascolare. Successivamente, un metodo per stimare il calibro e la lunghezza dei vasi sanguigni, la posizione e gli angoli delle giunzioni sarà trattato. Il potere discriminativo delle precedenti caratteristiche verrà studiato ed una tecnica di pattern matching di due modelli vascolari sarà presentata per verificare l’unicità di quest

    Doctor of Philosophy

    Get PDF
    dissertationX-ray computed tomography (CT) is a widely popular medical imaging technique that allows for viewing of in vivo anatomy and physiology. In order to produce high-quality images and provide reliable treatment, CT imaging requires the precise knowledge of t

    Computational intelligence approaches to robotics, automation, and control [Volume guest editors]

    Get PDF
    No abstract available

    Large-area visually augmented navigation for autonomous underwater vehicles

    Get PDF
    Submitted to the Joint Program in Applied Ocean Science & Engineering in partial fulfillment of the requirements for the degree of Doctor of Philosophy at the Massachusetts Institute of Technology and the Woods Hole Oceanographic Institution June 2005This thesis describes a vision-based, large-area, simultaneous localization and mapping (SLAM) algorithm that respects the low-overlap imagery constraints typical of autonomous underwater vehicles (AUVs) while exploiting the inertial sensor information that is routinely available on such platforms. We adopt a systems-level approach exploiting the complementary aspects of inertial sensing and visual perception from a calibrated pose-instrumented platform. This systems-level strategy yields a robust solution to underwater imaging that overcomes many of the unique challenges of a marine environment (e.g., unstructured terrain, low-overlap imagery, moving light source). Our large-area SLAM algorithm recursively incorporates relative-pose constraints using a view-based representation that exploits exact sparsity in the Gaussian canonical form. This sparsity allows for efficient O(n) update complexity in the number of images composing the view-based map by utilizing recent multilevel relaxation techniques. We show that our algorithmic formulation is inherently sparse unlike other feature-based canonical SLAM algorithms, which impose sparseness via pruning approximations. In particular, we investigate the sparsification methodology employed by sparse extended information filters (SEIFs) and offer new insight as to why, and how, its approximation can lead to inconsistencies in the estimated state errors. Lastly, we present a novel algorithm for efficiently extracting consistent marginal covariances useful for data association from the information matrix. In summary, this thesis advances the current state-of-the-art in underwater visual navigation by demonstrating end-to-end automatic processing of the largest visually navigated dataset to date using data collected from a survey of the RMS Titanic (path length over 3 km and 3100 m2 of mapped area). This accomplishment embodies the summed contributions of this thesis to several current SLAM research issues including scalability, 6 degree of freedom motion, unstructured environments, and visual perception.This work was funded in part by the CenSSIS ERC of the National Science Foundation under grant EEC-9986821, in part by the Woods Hole Oceanographic Institution through a grant from the Penzance Foundation, and in part by a NDSEG Fellowship awarded through the Department of Defense

    Multi-level Feedback Joint Representation Learning Network Based on Adaptive Area Elimination for Cross-view Geo-localization

    Get PDF
    Cross-view geo-localization refers to the task of matching the same geographic target using images obtained from different platforms, such as drone-view and satellite-view. However, the view angle of images obtained through different platforms will vary greatly, which can bring great challenges to the cross-view geo-localization task. Therefore, we propose a multi-level feedback joint representation learning network based on adaptive area elimination to solve the cross-view geo-localization problem. In our network model, we first process the extracted global features to obtain part-level and patch-level features. We then utilize these features as feedback to the global features to extract the contextual information in the global features and improve the robustness of the extracted features. In addition, as images obtained from different platforms differ, there will always be some interference when matching images. Therefore, we introduce an adaptive area elimination strategy to erase the interference information in the global features and assist the model in obtaining crucial information. On this basis, the feature correlation loss function is designed to constrain learning when using global feature information, thereby eliminating the possible interference, which can improve the network model performance. Finally, a series of experiments is carried out using two well-known benchmarks, namely University-1652 and SUES-200, and the experimental results show that the proposed network model achieves competitive results, thereby demonstrating the effectiveness of proposed model.</p
    • …
    corecore