215 research outputs found

    On Deep Learning Enhanced Multi-Sensor Odometry and Depth Estimation

    In this thesis, we systematically study the integration of deep learning and simultaneous localization and mapping (SLAM) and advance the research frontier by making the following contributions. (1) We devise a unified information-theoretic framework for end-to-end learning methods aimed at odometry estimation, which not only improves accuracy empirically but also provides an elegant theoretical tool for performance evaluation and understanding in information-theoretic language. (2) For the integration of learning and geometry, we focus on the scale ambiguity problem in monocular SLAM and odometry systems. To this end, we first propose VRVO (Virtual-to-Real Visual Odometry), which retrieves the absolute scale from virtual data, adapts the learnt features between real and virtual domains, and establishes a mutual reinforcement pipeline between learning and optimization to further leverage the complementary information. The depth maps carry the scale information and are integrated with classical SLAM systems by providing initialization values and dense virtual stereo objectives. (3) Since modern sensor suites usually contain multiple sensors, including a camera and an IMU, we further propose DynaDepth, an unsupervised monocular depth estimation method that integrates IMU motion dynamics. A differentiable camera-centric extended Kalman filter (EKF) framework is derived to exploit the complementary information from both camera and IMU sensors, which also provides an uncertainty measure for the ego-motion predictions. The proposed depth network not only learns the absolute scale but also exhibits better generalization ability and robustness against vision degradation. The resulting depth predictions can be integrated into classical SLAM systems in a similar way to VRVO to achieve a scale-aware monocular SLAM system during inference

    Efficient Semantic Segmentation for Resource-Constrained Applications with Lightweight Neural Networks

    This thesis focuses on developing lightweight semantic segmentation models tailored for resource-constrained applications, effectively balancing accuracy and computational efficiency. It introduces several novel concepts, including knowledge sharing, dense bottleneck, and feature re-usability, which enhance the feature hierarchy by capturing fine-grained details, long-range dependencies, and diverse geometrical objects within the scene. To achieve precise object localization and improved semantic representations in real-time environments, the thesis introduces multi-stage feature aggregation, feature scaling, and hybrid-path attention methods

    Leveraging Metadata for Computer Vision on Unmanned Aerial Vehicles

    The integration of computer vision technology into Unmanned Aerial Vehicles (UAVs) has become increasingly crucial in various aerial vision-based applications. Despite the significant success of generic computer vision methods, a considerable performance drop is observed when they are applied to the UAV domain. This is due to large variations in imaging conditions, such as varying altitudes, dynamically changing viewing angles, and varying capture times resulting in vast changes in lighting conditions. Furthermore, the need for real-time algorithms and the hardware constraints pose specific problems that require special attention in the development of computer vision algorithms for UAVs. In this dissertation, we demonstrate that domain knowledge in the form of metadata is a valuable source of information and thus propose domain-aware computer vision methods by using freely accessible sensor data. The pipeline for computer vision systems on UAVs is discussed, from data mission planning, data acquisition, labeling and curation, to the construction of publicly available benchmarks and leaderboards and the establishment of a wide range of baseline algorithms. Throughout, the focus is on a holistic view of the problems and opportunities in UAV-based computer vision, and the aim is to bridge the gap between purely software-based computer vision algorithms and environmentally aware robotic platforms. The results demonstrate that incorporating metadata obtained from onboard sensors, such as GPS, barometers, and inertial measurement units, can significantly improve the robustness and interpretability of computer vision models in the UAV domain. This leads to more trustworthy models that can overcome challenges such as domain bias, altitude variance, and synthetic data inefficiency, and that enhance perception through environmental awareness in temporal scenarios, such as video object detection, tracking and video anomaly detection. 
The proposed methods and benchmarks provide a foundation for future research in this area, and the results suggest promising directions for developing environmentally aware robotic platforms. Overall, this work highlights the potential of combining computer vision and robotics to tackle real-world challenges and opens up new avenues for interdisciplinary research

    Localization of Autonomous Vehicles in Urban Environments

    The future of applications such as last-mile delivery, infrastructure inspection and surveillance bets big on employing small autonomous drones and ground robots in cluttered urban settings where precise positioning is critical. However, when navigating close to buildings, GPS-based localisation of robotic platforms is noisy due to obscured reception and multi-path reflection. Localisation methods using introspective sensors like monocular and stereo cameras mounted on the platforms offer a better alternative as they are suitable for both indoor and outdoor operations. However, the inherent drift in the estimated trajectory is often evident in the 7 degrees of freedom that capture scaling, rotation and translation motion, and needs to be corrected. The theme of the thesis is to use a pre-existing 3D model to supplement the pose estimation from a visual navigation system, reducing incremental drift and thereby improving localisation accuracy. The novel framework developed for the monocular camera first extracts the geometric relationship between the pixels of the calibrated camera and the 3D points on the model. These geometric constraints, when used in addition to the relative pose constraints typically used in Simultaneous Localisation and Mapping (SLAM) algorithms, provide superior trajectory estimation. Further, scale drift correction is proposed using a novel $SIM_3$ optimisation procedure and successfully demonstrated using a unique dataset that embodies many urban localisation challenges. Techniques developed for stereo camera localisation align the textured 3D stereo scans with respect to a 3D model and estimate the associated camera pose. The idea is to solve the image registration problem between the projection of the 3D scan and images whose poses are accurately known with respect to the 3D model. The 2D motion parameters are then mapped to the 3D space for camera pose estimation. 
Novel image registration techniques are developed which use image edge information combined with traditional approaches to show successful results

    State of the Art of Audio- and Video-Based Solutions for AAL

    It is a matter of fact that Europe is facing more and more crucial challenges regarding health and social care due to the demographic change and the current economic context. The recent COVID-19 pandemic has stressed this situation even further, thus highlighting the need for taking action. Active and Assisted Living (AAL) technologies come as a viable approach to help face these challenges, thanks to the high potential they have in enabling remote care and support. Broadly speaking, AAL can be referred to as the use of innovative and advanced Information and Communication Technologies to create supportive, inclusive and empowering applications and environments that enable older, impaired or frail people to live independently and stay active longer in society. AAL capitalizes on the growing pervasiveness and effectiveness of sensing and computing facilities to supply the persons in need with smart assistance, by responding to their necessities of autonomy, independence, comfort, security and safety. The application scenarios addressed by AAL are complex, due to the inherent heterogeneity of the end-user population, their living arrangements, and their physical conditions or impairment. Despite aiming at diverse goals, AAL systems should share some common characteristics. They are designed to provide support in daily life in an invisible, unobtrusive and user-friendly manner. Moreover, they are conceived to be intelligent, to be able to learn and adapt to the requirements and requests of the assisted people, and to synchronise with their specific needs. Nevertheless, to ensure the uptake of AAL in society, potential users must be willing to use AAL applications and to integrate them in their daily environments and lives. In this respect, video- and audio-based AAL applications have several advantages, in terms of unobtrusiveness and information richness. 
Indeed, cameras and microphones are far less obtrusive than wearable sensors, which may hinder one’s activities. In addition, a single camera placed in a room can record most of the activities performed in the room, thus replacing many other non-visual sensors. Currently, video-based applications are effective in recognising and monitoring the activities, the movements, and the overall conditions of the assisted individuals as well as in assessing their vital parameters. Similarly, audio sensors have the potential to become one of the most important modalities for interaction with AAL systems, as they can have a large range of sensing, do not require physical presence at a particular location and are physically intangible. Moreover, relevant information about individuals’ activities and health status can derive from processing audio signals. Nevertheless, as the other side of the coin, cameras and microphones are often perceived as the most intrusive technologies from the viewpoint of the privacy of the monitored individuals. This is due to the richness of the information these technologies convey and the intimate setting where they may be deployed. Solutions able to ensure privacy preservation by context and by design, as well as to ensure high legal and ethical standards, are in high demand. After the review of the current state of play and the discussion in GoodBrother, we may claim that the first solutions in this direction are starting to appear in the literature. A multidisciplinary debate among experts and stakeholders is paving the way towards AAL ensuring ergonomics, usability, acceptance and privacy preservation. The DIANA, PAAL, and VisuAAL projects are examples of this fresh approach. This report provides the reader with a review of the most recent advances in audio- and video-based monitoring technologies for AAL. 
It has been drafted as a collective effort of WG3 to supply an introduction to AAL, its evolution over time and its main functional and technological underpinnings. In this respect, the report contributes to the field with the outline of a new generation of ethically aware AAL technologies and a proposal for a novel comprehensive taxonomy of AAL systems and applications. Moreover, the report allows non-technical readers to gather an overview of the main components of an AAL system and how these function and interact with the end-users. The report illustrates the state of the art of the most successful AAL applications and functions based on audio and video data, namely lifelogging and self-monitoring, remote monitoring of vital signs, emotional state recognition, food intake monitoring, activity and behaviour recognition, activity and personal assistance, gesture recognition, fall detection and prevention, mobility assessment and frailty recognition, and cognitive and motor rehabilitation. For these application scenarios, the report illustrates the state of play in terms of scientific advances, available products and research projects. The open challenges are also highlighted. The report ends with an overview of the challenges, the hindrances and the opportunities posed by the uptake of AAL technologies in real-world settings. In this respect, the report illustrates the current procedural and technological approaches to cope with acceptability, usability and trust in the AAL technology, by surveying strategies and approaches to co-design, to privacy preservation in video and audio data, to transparency and explainability in data processing, and to data transmission and communication. User acceptance and ethical considerations are also debated. Finally, the potential of the silver economy is reviewed

    Future Transportation

    Greenhouse gas (GHG) emissions associated with transportation activities account for approximately 20 percent of all carbon dioxide (CO2) emissions globally, making the transportation sector a major contributor to global warming. This book focuses on the latest advances in technologies aimed at sustainable future transportation of people and goods. A reduction in fossil fuel burning and technological transitions are the main approaches toward sustainable future transportation. Particular attention is given to automobile technological transitions, bike sharing systems, supply chain digitalization, and transport performance monitoring and optimization, among others

    Deep Neural Networks and Data for Automated Driving

    This open access book brings together the latest developments from industry and research on automated driving and artificial intelligence. Environment perception for highly automated driving heavily employs deep neural networks, facing many challenges. How much data do we need for training and testing? How to use synthetic data to save labeling costs for training? How do we increase robustness and decrease memory usage? For inevitably poor conditions: How do we know that the network is uncertain about its decisions? Can we understand a bit more about what actually happens inside neural networks? This leads to a very practical problem particularly for DNNs employed in automated driving: What are useful validation techniques and how about safety? This book unites the views from both academia and industry, where computer vision and machine learning meet environment perception for highly automated driving. Naturally, aspects of data, robustness, uncertainty quantification, and, last but not least, safety are at the core of it. This book is unique: In its first part, an extended survey of all the relevant aspects is provided. The second part contains the detailed technical elaboration of the various questions mentioned above