
    Data Acquisition and Processing Pipeline for E-Scooter Tracking Using 3D Lidar and Multi-Camera Setup

    Indiana University-Purdue University Indianapolis (IUPUI)
    Analyzing the behaviors of objects on the road is a complex task that requires data from various sensors, fused to recreate the movement of objects with a high degree of accuracy. A data collection and processing system is therefore needed to track objects accurately and to build a clear map of their trajectories relative to the coordinate frame(s) of interest. Detection and tracking of moving objects (DATMO) and simultaneous localization and mapping (SLAM) are tasks that need to be solved in conjunction to create a clear map of the road comprising both moving and static objects. These computational problems are commonly solved to aid scenario reconstruction for the objects of interest. Objects can be tracked in various ways, using sensors such as monocular or stereo cameras, Light Detection and Ranging (LIDAR) sensors, and inertial navigation systems (INS). One relatively common approach to DATMO and SLAM combines a 3D LIDAR and multiple monocular cameras with an inertial measurement unit (IMU); the resulting redundancy maintains object classification and tracking through sensor fusion in cases where traditional sensor-specific algorithms prove ineffectual because a single sensor falls short of its limitations. Using an IMU together with sensor fusion methods largely eliminates the need for an expensive INS rig. Fusing these sensors enables more effective tracking, exploiting the full potential of each sensor while improving perceptual accuracy. The focus of this thesis is the dockless e-scooter, and the primary goal is to track its movements effectively and accurately with respect to cars on the road and to the world. Since cars are far more commonly observed on the road than e-scooters, we propose a data collection system that can be built on top of an e-scooter, together with an offline processing pipeline, to collect data for understanding the behavior of the e-scooters themselves. In this thesis, we explore a data collection system comprising a 3D LIDAR sensor, multiple monocular cameras, and an IMU mounted on an e-scooter, as well as an offline method for processing the collected data to aid scenario reconstruction.
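    The fusion step described above can be illustrated with a minimal sketch: a linear Kalman filter with a constant-velocity model, in which IMU accelerations drive the prediction and LIDAR/camera detections supply position corrections. This is not the thesis's actual pipeline; the state layout, noise levels, and measurement values below are illustrative assumptions.

```python
# Minimal IMU + LIDAR/camera fusion sketch (assumed parameters, not from the thesis).
import numpy as np

class FusionKF:
    def __init__(self, dt=0.1):
        self.x = np.zeros(4)                  # state: [px, py, vx, vy]
        self.P = np.eye(4)                    # state covariance
        self.F = np.eye(4)
        self.F[0, 2] = self.F[1, 3] = dt      # constant-velocity transition
        self.B = np.array([[0.5 * dt**2, 0], [0, 0.5 * dt**2],
                           [dt, 0], [0, dt]]) # IMU acceleration input
        self.Q = 0.01 * np.eye(4)             # process noise (assumed)
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]])     # we observe position only
        self.R = 0.25 * np.eye(2)             # measurement noise (assumed)

    def predict(self, accel_xy):
        """Propagate the state using an IMU acceleration sample."""
        self.x = self.F @ self.x + self.B @ accel_xy
        self.P = self.F @ self.P @ self.F.T + self.Q

    def update(self, pos_xy):
        """Correct with a LIDAR- or camera-derived position fix."""
        y = pos_xy - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P

kf = FusionKF()
kf.predict(np.array([0.2, 0.0]))    # IMU step
kf.update(np.array([0.03, -0.01]))  # LIDAR/camera detection
print(kf.x)
```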

    Improving the Geotagging Accuracy of Street-level Images

    Integrating images taken at street level with satellite imagery is becoming increasingly valuable in decision-making processes, not only for individuals but also in business and governmental sectors. To perform this integration, street-level images need to be accurately georeferenced. This georeference information can be derived from a global positioning system (GPS). However, GPS data is prone to errors of up to 15 meters and needs to be corrected for georeferencing purposes. In this thesis, an automatic method is proposed for correcting the georeference information obtained from GPS data, based on image registration techniques. The proposed method uses an optimization technique to find locally optimal solutions by matching high-level features and their relative locations. A global optimization method is then employed over all of the local solutions by applying a geometric constraint. The main contribution of this thesis is a new direction for correcting GPS data that is more economical and more consistent than the existing manual method. Beyond its high cost (labor and management), the main concern with manual correction is the low degree of consistency between different human operators. Our proposed automatic software-based method addresses these drawbacks. Other contributions are: (1) a modified Chamfer matching (CM) cost function that improves the accuracy of standard CM for images with various misleading or disturbing edges; (2) a Monte-Carlo-inspired statistical analysis that makes it possible to quantify the overall performance of the proposed algorithm; (3) a novel similarity measure for applying the normalized cross correlation (NCC) technique to multi-level thresholded images, which compares multi-modal images more accurately than the standard application of NCC to raw images; and (4) casting the problem of selecting an optimal global solution among a set of local minima as the problem of finding an optimal path in a graph using Dijkstra's algorithm. We used our algorithm to correct the georeference information of 20 chains containing more than 7,000 fisheye images, and our experimental results show that the proposed algorithm achieves an average error of 2 meters, which is acceptable for most applications.
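    Contribution (4) above, choosing one local matching solution per image via a shortest path, can be sketched roughly as follows. Each layer holds candidate corrected positions (local minima) with their matching costs for one image; edge weights add a geometric smoothness penalty between consecutive images, and Dijkstra's algorithm (here with a lazy priority queue) returns the cheapest chain. The candidate data, expected spacing, and penalty weight are illustrative assumptions, not values from the thesis.

```python
# Shortest-path selection over per-image candidate corrections (assumed data).
import heapq, math

def smoothness(p, q, expected_step=5.0, weight=1.0):
    """Penalize consecutive positions that deviate from the expected spacing."""
    return weight * abs(math.dist(p, q) - expected_step)

def best_path(layers):
    """layers[i] = list of (position, matching_cost) candidates for image i."""
    heap = [(c, (0, j), [j]) for j, (_, c) in enumerate(layers[0])]
    heapq.heapify(heap)
    seen = set()
    while heap:
        cost, (i, j), path = heapq.heappop(heap)
        if (i, j) in seen:
            continue                          # stale entry: already settled
        seen.add((i, j))
        if i == len(layers) - 1:
            return cost, path                 # optimal chain of candidates
        p = layers[i][j][0]
        for k, (q, c) in enumerate(layers[i + 1]):
            heapq.heappush(heap, (cost + c + smoothness(p, q),
                                  (i + 1, k), path + [k]))
    return None

layers = [[((0, 0), 0.2), ((1, 3), 0.1)],     # image 0: two local minima
          [((5, 0), 0.3), ((4, 4), 0.5)],     # image 1
          [((10, 0), 0.1), ((9, 5), 0.4)]]    # image 2
print(best_path(layers))                      # -> (0.6, [0, 0, 0])
```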

    3D Modeling in Urban Areas Using Visual-Inertial Sensor Integration (Görsel-ataletsel duyaç tümleştirme kullanılarak şehirlerde 3b modelleme)

    In this dissertation, a real-time, autonomous, and geo-registered approach is presented to tackle the large-scale 3D urban modeling problem using a camera and inertial sensors. The proposed approach exploits the special structures of urban areas and visual-inertial sensor fusion. The buildings in urban areas are assumed to have planar facades that are perpendicular to the local level. A sparse 3D point cloud of the imaged scene is obtained from visual feature matches using camera pose estimates, and planar patches are obtained by an iterative Hough Transform on the 2D projection of the sparse 3D point cloud in the direction of gravity. The result is a compact and dense depth map of the building facades in terms of planar patches. The plane extraction is performed on sequential frames, and a complete model is obtained by plane fusion. Inertial sensor integration helps to improve the camera pose estimation, 3D reconstruction, and planar modeling stages. For camera pose estimation, the visual measurements are integrated with the inertial sensors by means of an indirect feedback Kalman filter. This integration helps to obtain reliable and geo-referenced camera pose estimates in the absence of GPS. The inertial sensors are also used to filter out spurious visual feature matches in the 3D reconstruction stage, find the direction of gravity in the plane-search stage, and eliminate out-of-scope objects from the model using elevation data. The visual-inertial sensor fusion and the use of urban heuristics are shown to outperform classical approaches to large-scale urban modeling in terms of consistency and real-time applicability.
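    The plane-search stage described above, a Hough Transform on the gravity-aligned 2D projection of the point cloud, can be sketched roughly as follows: after projecting out the gravity axis, vertical facades appear as lines, and a standard (theta, rho) Hough vote recovers the dominant one. The bin sizes and the synthetic point cloud are illustrative assumptions.

```python
# Hough vote for a dominant vertical facade after projecting along gravity
# (simplified sketch; bin sizes and data are assumptions).
import numpy as np

def hough_lines(points_xy, n_theta=180, rho_res=0.5):
    thetas = np.linspace(0, np.pi, n_theta, endpoint=False)
    rhos = points_xy @ np.vstack([np.cos(thetas), np.sin(thetas)])  # (N, n_theta)
    rho_max = np.abs(rhos).max()
    n_rho = int(2 * rho_max / rho_res) + 1
    acc = np.zeros((n_theta, n_rho), dtype=int)       # (theta, rho) accumulator
    idx = ((rhos + rho_max) / rho_res).astype(int)
    for t in range(n_theta):
        np.add.at(acc[t], idx[:, t], 1)               # cast one vote per point
    t, r = np.unravel_index(acc.argmax(), acc.shape)
    return thetas[t], r * rho_res - rho_max           # dominant facade line

# Synthetic points near the vertical wall x = 3, after dropping the gravity (z) axis.
np.random.seed(0)
pts = np.column_stack([np.full(200, 3.0) + 0.05 * np.random.randn(200),
                       np.random.uniform(-10, 10, 200)])
theta, rho = hough_lines(pts)
print(f"facade: theta={theta:.2f} rad, rho={rho:.2f} m")  # ~ theta=0, rho=3
```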

    Image-based recognition, 3D localization, and retro-reflectivity evaluation of high-quantity low-cost roadway assets for enhanced condition assessment

    Systematic condition assessment of high-quantity low-cost roadway assets such as traffic signs, guardrails, and pavement markings requires frequent reporting on the location and up-to-date status of these assets. Today, most Departments of Transportation (DOTs) in the US collect data using camera-mounted vehicles to filter, annotate, organize, and present the data necessary for these assessments. However, the cost and complexity of collecting, analyzing, and reporting as-is conditions result in sparse and infrequent monitoring. Thus, some of the gains in efficiency are consumed by monitoring costs. This dissertation proposes to improve the frequency, detail, and applicability of image-based condition assessment by automating the detection, classification, and 3D localization of multiple types of high-quantity low-cost roadway assets, using both images collected by the DOTs and online databases such as Google Street View images. To address the new requirements of the US Federal Highway Administration (FHWA), a new method is also developed that simulates nighttime visibility of traffic signs from images taken during daytime and measures their retro-reflectivity condition. To initiate detection and classification of high-quantity low-cost roadway assets from street-level images, a number of algorithms are proposed that automatically segment and localize high-level asset categories in 3D. The first set of algorithms focuses on the task of detecting and segmenting assets at high-level categories. More specifically, a method based on Semantic Texton Forest classifiers segments each geo-registered 2D video frame at the pixel level based on shape, texture, and color. A Structure from Motion (SfM) procedure reconstructs the road and its assets in 3D. Next, a voting scheme assigns the most observed asset category to each point in 3D. The experimental results from applying this method are promising; nevertheless, because it relies on supervised ground-truth pixel labels for training, scaling it to various types of assets is challenging. To address this issue, a non-parametric image parsing method is proposed that leverages a lazy learning scheme for segmentation and recognition of roadway assets. The semi-supervised technique used in the proposed method does not need training and provides ground-truth data in a more efficient manner. It is easily scalable to the thousands of video frames captured during data collection. Once the high-level asset categories are detected, specific techniques need to be exploited to detect and classify the assets at a higher level of granularity. To this end, the performance of three computer vision algorithms is evaluated for classification of traffic signs in the presence of cluttered backgrounds and static and dynamic occlusions. Without making any prior assumptions about the location of traffic signs in 2D, the best performing method uses histograms of oriented gradients and color together with multiple one-vs-all Support Vector Machines, and classifies these assets into warning, regulatory, stop, and yield sign categories. To minimize the reliance on visual data collected by the DOTs and improve the frequency and applicability of condition assessment, a new end-to-end procedure is presented that applies the above algorithms and creates a comprehensive inventory of traffic signs using Google Street View images.
By processing images extracted using the Google Street View API and the discriminative classification scores from all images that see a sign, the most probable 3D location of each traffic sign is derived and shown on Google Earth using a dynamic heat map. A data card containing information about the location, type, and condition of each detected traffic sign is also created. Finally, a computer vision-based algorithm is proposed that measures the retro-reflectivity of traffic signs during daytime using a vehicle-mounted device. The algorithm simulates nighttime visibility of traffic signs from images taken during daytime and measures their retro-reflectivity. The technique is faster, cheaper, and safer than the state of the art, as it requires neither nighttime operation nor manual sign inspection. It also satisfies the measurement guidelines set forth by the FHWA, both in terms of granularity and accuracy. To validate the techniques, new detailed video datasets and their ground truth were generated from a 2.2-mile smart road research facility and two interstate highways in the US. The comprehensive dataset contains over 11,000 annotated U.S. traffic sign images and exhibits large variations in sign pose, scale, background, illumination, and occlusion conditions. The performance of all algorithms was examined using these datasets. For retro-reflectivity measurement of traffic signs, experiments were conducted at different times of day and at different distances. Results were compared with a method recommended by ASTM standards. The experimental results show promise in the scalability of these methods for reducing the time and effort required to develop road inventories, especially for assets such as guardrails and traffic lights that are not typically considered in 2D asset recognition methods, as well as for multiple categories of traffic signs. The applicability of Google Street View images for inventory management and the technique for daytime retro-reflectivity measurement demonstrate strong potential for lowering inspection costs and improving safety in practical applications.
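The voting scheme mentioned above, which assigns each reconstructed 3D point the asset category it is most often observed as across the geo-registered frames, amounts to a per-point majority vote. A minimal sketch follows; the point IDs and labels are illustrative assumptions.

```python
# Majority vote over per-frame pixel labels for each reconstructed 3D point
# (illustrative data, not from the dissertation's datasets).
from collections import Counter

def vote_categories(observations):
    """observations: dict mapping 3D point id -> labels from all frames seeing it."""
    return {pid: Counter(labels).most_common(1)[0][0]
            for pid, labels in observations.items()}

obs = {
    101: ["traffic_sign", "traffic_sign", "pavement"],          # seen in 3 frames
    102: ["guardrail", "guardrail"],
    103: ["pavement_marking", "traffic_sign", "pavement_marking"],
}
print(vote_categories(obs))
# {101: 'traffic_sign', 102: 'guardrail', 103: 'pavement_marking'}
```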

    Exploring the Visual Landscape: Advances in Physiognomic Landscape Research in the Netherlands

    Exploring the Visual Landscape is about the combination of landscape research and planning, visual perception, and Geographic Information Science. It showcases possible ways of getting a grip on themes such as landscape openness, cluttering of the rural landscape, high-rise buildings in relation to the cityscape, historic landscapes, and motorway panoramas. It offers clues for the visual landscape assessment of spaces in cities, parks, and rural areas. In that respect, it extends the long Dutch tradition of physiognomic landscape research and shows the current state of the art. Exploring the Visual Landscape offers important clues for theory, methodology, and application in the research and development of landscapes all over the world, from a specifically Dutch academic context. It provides a wide range of insights into the psychological background of landscape perception, the technical considerations of geomatics, and methodology in landscape architecture, urban planning, and design. Furthermore, it presents some experiences worth considering, which demonstrate how this research can be applied in the practice of landscape policy making.

    Anchoring digital maps as rough guides : a practice-orientated digital sociology of map use

    This thesis provides a theoretical contribution towards understanding how, and to what extent, people’s engagements with digital maps feature in the constitution of their social practices. Existing theory tends not to focus on people as active interpreters that engage with digital maps across a variety of contexts, or on the influence of their map use on wider sets of social practices. Addressing this, the thesis draws on practice theory, media studies, and internet studies to develop a conceptual framework, applying it to empirical findings to address three research questions: (1) How do people engage with digital maps; (2) How do people engage with the web-based affordances of digital maps, such as those for collaboration, sharing, and end-user amendment/generation of content; and (3) What influence does people’s engagement with digital maps have on the way they perform wider sets of social practices? The research provides insights from three contexts, each operating at a different temporal scale: home choice covers longer-term processes of selecting and viewing properties before buying or renting; countryside leisure-walking covers mid-term processes of route-planning and assessment; University orientation covers shorter-term processes of navigation and gaining orientation around campus. Those insights are gathered through: a scoping survey (N=260) to identify relevant contexts; 32 semi-structured interviews to initiate data analysis; and 3 focus groups to gather participant feedback (member validation) on the emerging analysis. The approach to data analysis borrows heavily from constructivist grounded theory (albeit sensitised by practice theory ontology) to generate seven concepts. Together, the concepts constitute a practice-theory-oriented digital sociology of map use. Overall, this thesis argues that digital maps are engaged with as mundane technologies that partially anchor people’s senses of place and security (physical and ontological), their performance of practices and social positions, and more broadly, the movement and distribution of bodies in space.

    Fusing Multimedia Data Into Dynamic Virtual Environments

    In spite of the dramatic growth of virtual and augmented reality (VR and AR) technology, content creation for immersive and dynamic virtual environments remains a significant challenge. In this dissertation, we present our research in fusing multimedia data, including text, photos, panoramas, and multi-view videos, to create rich and compelling virtual environments. First, we present Social Street View, which renders geo-tagged social media in the natural geo-spatial context provided by 360° panoramas. Our system takes into account visual saliency and uses maximal Poisson-disc placement with spatiotemporal filters to render social multimedia in an immersive setting. We also present a novel GPU-driven pipeline for saliency computation in 360° panoramas using spherical harmonics (SH). Our spherical residual model can be applied to virtual cinematography in 360° videos. We further present Geollery, a mixed-reality platform that renders an interactive mirrored world in real time with three-dimensional (3D) buildings, user-generated content, and geo-tagged social media. Our user study has identified several use cases for these systems, including immersive social storytelling, experiencing culture, and crowd-sourced tourism. We next present Video Fields, a web-based interactive system to create, calibrate, and render dynamic videos overlaid on 3D scenes. Our system renders dynamic entities from multiple videos, using early and deferred texture sampling. Video Fields can be used for immersive surveillance in virtual environments. Furthermore, we present the VRSurus and ARCrypt projects, which explore the applications of gesture recognition, haptic feedback, and visual cryptography for virtual and augmented reality. Finally, we present our work on Montage4D, a real-time system for seamlessly fusing multi-view video textures with dynamic meshes. We use geodesics on meshes with view-dependent rendering to mitigate spatial occlusion seams while maintaining temporal consistency. Our experiments show significant enhancement in rendering quality, especially for salient regions such as faces. We believe that Social Street View, Geollery, Video Fields, and Montage4D will greatly facilitate applications such as virtual tourism, immersive telepresence, and remote education.
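    The maximal Poisson-disc placement used by Social Street View can be approximated with a simple dart-throwing sketch: candidate anchors for social media items are accepted only if they keep a minimum spacing from those already placed. A faithful version would use spherical (angular) distance on the 360° panorama and saliency weighting; the plain 2D pixel distance and the parameters here are simplifying assumptions.

```python
# Dart-throwing approximation of Poisson-disc placement on an equirectangular
# panorama (simplified: 2D pixel distance instead of spherical distance).
import math, random

def poisson_disc(width, height, radius, max_tries=3000, seed=1):
    random.seed(seed)
    placed = []
    for _ in range(max_tries):
        p = (random.uniform(0, width), random.uniform(0, height))
        # Accept the dart only if it keeps the minimum spacing.
        if all(math.dist(p, q) >= radius for q in placed):
            placed.append(p)
    return placed   # near-maximal: after many tries, few further darts fit

anchors = poisson_disc(4096, 2048, radius=300)
print(len(anchors), "placement slots on the panorama")
```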

    Cinematic assemblage: Sinofuturist worldbuilding and the smart city

    New forms of digital surveillance have given rise to a data-driven urban condition, one where machine vision increasingly determines mobility and navigation. The ‘cinematic assemblage’, as I term it, refers to a machinic agent that contains many of the sensory, recording, and representational components required to create cinema, but is itself an object of cinematic interest. Framed through surveillance studies, theories of digital cinema, and critical legal frameworks, I investigate how filmmaking practice conducted entirely within a video game engine can embody the logic of two interrelated forms of cinematic assemblage: the smart city and the self-driving car. The resulting feature-length animated film, Death Drive, draws from liberatory practices in non-Western Futurism to formulate a legal fiction about the emergence of electronic personhood within contemporary China. Cinematic assemblage operates through posthuman approaches to distributed agency and embodied vision. This enables an analysis of the smart city and the self-driving car as co-constitutive, with both continually monitoring each other in an enmeshed system of sensing and control. To understand the hierarchy of sensorial regimes in this larger assemblage, I present a particular approach to image production. My practice explores the creation of a virtual cinematographic apparatus in video game engines, using filmmaking to embody the active and agential characteristics of digital surveillance systems. Based on existing self-driving car imagery, the rendered footage used to compose the film is constructed entirely within the game engine, but also references the coordinates and language of existing data and systems. This builds upon Harun Farocki’s notion of operational images to explore how a reflexive approach to filmmaking can address how surveillance functions in the smart city. In the process of developing the film, I ask how to situate my research without perpetuating either Chinese exceptionalism or Western coloniality. I look to Futurist practices that interrogate the privileged position of the human, reconfiguring narratives from the perspective of the Other. Accordingly, I treat both the smart city and the self-driving car as nonhuman protagonists. Set in SimBeijing, a fictional research city on the China-Russia border, Death Drive examines the unique conditions of Chinese technological development, as noted by Yuk Hui and Anna Greenspan among others, to speculate on how the social and legal implications of digital surveillance may manifest within a Sinofuturist context. The narrative couples the nonhuman, a key figure in Sinofuturism, with the legal fiction of electronic personhood. This is grounded through problem-centred interviews conducted with legal experts and forensic researchers, drawing together the frameworks of criminal investigation and the detective story. By formulating a hypothetical crime involving a self-driving car, the film circumscribes the nonhuman within the sphere of criminality and liability. This approach challenges humanist conceptions of AI as a disembodied mind, envisioning the electronic Other as a political subject whose legal personhood emerges from the consequences of its corporeal action.

    Exploring Sparse, Unstructured Video Collections of Places

    The abundance of mobile devices and digital cameras with video capture makes it easy to obtain large collections of video clips that contain the same location, environment, or event. However, such unstructured collections are difficult to comprehend and explore. We propose a system that analyses collections of unstructured but related video data to create a Videoscape: a data structure that enables interactive exploration of video collections by visually navigating, spatially and/or temporally, between different clips. We automatically identify transition opportunities, or portals. From these portals, we construct the Videoscape, a graph whose edges are video clips and whose nodes are portals between clips. Once structured, the videos can be interactively explored by walking the graph or via a geographic map. Given this system, we gauge preference for different video transition styles in a user study, and generate heuristics that automatically choose an appropriate transition style. We evaluate our system using three further user studies, which allow us to conclude that Videoscapes provide significant benefits over related methods. Our system leads to previously unseen ways of interactive spatio-temporal exploration of casually captured videos, and we demonstrate this on several video collections.
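    The Videoscape described above, a graph whose edges are video clips and whose nodes are portals, might be represented roughly as follows; the portal names and clip files are illustrative assumptions, and a real system would choose among transitions rather than taking the first.

```python
# Minimal Videoscape-style graph: portals as nodes, clips as directed edges
# (illustrative structure, not the paper's implementation).
from collections import defaultdict

class Videoscape:
    def __init__(self):
        self.edges = defaultdict(list)    # portal -> [(clip, next_portal)]

    def add_clip(self, portal_in, clip, portal_out):
        self.edges[portal_in].append((clip, portal_out))

    def walk(self, portal, max_hops=3):
        """Follow the first available clip at each portal (a simple tour)."""
        tour = []
        for _ in range(max_hops):
            if not self.edges[portal]:
                break
            clip, portal = self.edges[portal][0]
            tour.append(clip)
        return tour

vs = Videoscape()
vs.add_clip("fountain", "clip_A.mp4", "market")
vs.add_clip("market", "clip_B.mp4", "bridge")
vs.add_clip("bridge", "clip_C.mp4", "fountain")
print(vs.walk("fountain"))   # ['clip_A.mp4', 'clip_B.mp4', 'clip_C.mp4']
```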

    August 31, 2016 (Wednesday) Daily Journal
