
    Image Filtering Techniques for Object Recognition in Autonomous Vehicles

    The deployment of autonomous vehicles has the potential to significantly reduce a variety of current harmful externalities (such as accidents, traffic congestion, security risks, and environmental degradation), making autonomous vehicles an emerging topic of research. In this paper, a literature review of autonomous vehicle development has been conducted, with the notable finding that autonomous vehicles will inevitably become an indispensable, greener future solution. Subsequently, five deep learning models, YOLOv5s, EfficientNet-B7, Xception, MobileNetV3, and InceptionV4, have been built and analyzed for 2-D object recognition in the navigation system. When tested on the BDD100K dataset, YOLOv5s and EfficientNet-B7 emerge as the two best models. Finally, this study proposes Hessian, Laplacian, and Hessian-based Ridge Detection filtering techniques to optimize the performance and sustainability of these two models. The results demonstrate that these filters can increase the mean average precision by up to 11.81%, reduce detection time by up to 43.98%, and significantly reduce energy consumption by up to 50.69% when applied to the YOLOv5s and EfficientNet-B7 models. Overall, the experimental results are promising and could be extended to other domains for semantic understanding of the environment. Additionally, various filtering algorithms for multiple object detection and classification could be applied to other areas. Recommendations and future work are clearly defined in this study.
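
    The abstract does not give the filter implementations themselves; as a rough illustration of the idea, the sketch below applies a Laplacian pre-filter to a frame before handing it to a detector. The function name, kernel size, and blending weight are assumptions for illustration, not values from the paper.

        import cv2
        import numpy as np

        def laplacian_prefilter(image_bgr, weight=0.5, ksize=3):
            # Sharpen edges by blending the Laplacian response back into the image.
            # `weight` and `ksize` are illustrative defaults, not values from the paper.
            gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
            lap = cv2.Laplacian(gray, cv2.CV_32F, ksize=ksize)
            lap = cv2.convertScaleAbs(lap)                    # rescale back to uint8
            lap_bgr = cv2.cvtColor(lap, cv2.COLOR_GRAY2BGR)
            return cv2.addWeighted(image_bgr, 1.0, lap_bgr, weight, 0)

        # Usage: filter a frame before running a detector such as YOLOv5s (model loading omitted)
        frame = np.zeros((192, 640, 3), dtype=np.uint8)       # stand-in for a BDD100K frame
        filtered = laplacian_prefilter(frame)
        # detections = yolov5s_model(filtered)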

    A Survey on Imitation Learning Techniques for End-to-End Autonomous Vehicles

    Funding: Jaguar Land Rover (10.13039/100016335) and the U.K. Engineering and Physical Sciences Research Council (EPSRC, 10.13039/501100000266; Grant Number: EP/N01300X/1), under the jointly funded Towards Autonomy: Smart and Connected Control (TASCC) Program. Peer reviewed. Postprint.

    A Survey on Socially Aware Robot Navigation: Taxonomy and Future Challenges

    Socially aware robot navigation is gaining popularity with the increase in delivery and assistive robots. The research is further fueled by the need for socially aware navigation skills in autonomous vehicles, so that they can move safely and appropriately in spaces shared with humans. Although most of these platforms are ground robots, drones are also entering the field. In this paper, we present a literature survey of the work on socially aware robot navigation over the past 10 years. We propose four faceted taxonomies to navigate the literature and examine the field from four different perspectives. Through the taxonomic review, we discuss current research directions and the expanding scope of applications across various domains. Further, we put forward a list of current research opportunities and discuss possible future challenges that are likely to emerge in the field.

    RGB-Only Reconstruction of Tabletop Scenes for Collision-Free Manipulator Control

    We present a system for collision-free control of a robot manipulator that uses only RGB views of the world. Perceptual input of a tabletop scene is provided by multiple images from an RGB camera (without depth) that is either handheld or mounted on the robot end effector. A NeRF-like process is used to reconstruct the 3D geometry of the scene, from which the Euclidean full signed distance function (ESDF) is computed. A model predictive control algorithm is then used to control the manipulator to reach a desired pose while avoiding obstacles in the ESDF. We show results on a real dataset collected and annotated in our lab. Comment: ICRA 2023. Project page at https://ngp-mpc.github.io
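
    The abstract does not detail how the ESDF enters the controller; the sketch below shows one way a model predictive controller can trade off goal progress against obstacle clearance read from an ESDF. The `esdf_lookup` callable, the safety margin, and the weights are assumptions for illustration, not the paper's formulation.

        import numpy as np

        def collision_cost(positions, esdf_lookup, safety_margin=0.05, weight=100.0):
            # Penalize predicted positions whose signed distance to the nearest obstacle
            # (queried from the ESDF) falls below a safety margin.
            # `esdf_lookup`, `safety_margin`, and `weight` are illustrative assumptions.
            cost = 0.0
            for p in positions:                  # predicted positions along the MPC horizon
                d = esdf_lookup(p)               # signed distance to the nearest surface (metres)
                cost += weight * max(0.0, safety_margin - d) ** 2
            return cost

        def mpc_objective(trajectory, goal, esdf_lookup):
            # Goal-reaching term plus the obstacle-clearance penalty over the horizon.
            goal_cost = float(np.linalg.norm(trajectory[-1] - goal)) ** 2
            return goal_cost + collision_cost(trajectory, esdf_lookup)

        # Usage: a planner would minimise mpc_objective over candidate trajectories,
        # e.g. mpc_objective(np.array(candidate), np.array(goal_xyz), esdf_lookup)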

    RT-MonoDepth: Real-time Monocular Depth Estimation on Embedded Systems

    Depth sensing is a crucial function of unmanned aerial vehicles and autonomous vehicles. Due to the small size and simple structure of monocular cameras, there has been growing interest in depth estimation from a single RGB image. However, state-of-the-art monocular CNN-based depth estimation methods that rely on fairly complex deep neural networks are too slow for real-time inference on embedded platforms. This paper addresses the problem of real-time depth estimation on embedded systems. We propose two efficient and lightweight encoder-decoder network architectures, RT-MonoDepth and RT-MonoDepth-S, to reduce computational complexity and latency. Our methodologies demonstrate that it is possible to achieve accuracy similar to prior state-of-the-art work on depth estimation at a faster inference speed. Our proposed networks, RT-MonoDepth and RT-MonoDepth-S, run at 18.4 and 30.5 FPS on the NVIDIA Jetson Nano and at 253.0 and 364.1 FPS on the NVIDIA Jetson AGX Orin on a single RGB image of resolution 640×192, and achieve relative state-of-the-art accuracy on the KITTI dataset. To the best of the authors' knowledge, this paper achieves the best accuracy and fastest inference speed among existing fast monocular depth estimation methods. Comment: 8 pages, 5 figures
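
    The exact RT-MonoDepth architecture is not described in the abstract; the sketch below only illustrates the general lightweight encoder-decoder pattern for single-image depth in PyTorch. The layer widths and the sigmoid disparity head are assumptions, not the published design.

        import torch
        import torch.nn as nn

        class TinyDepthNet(nn.Module):
            # Minimal encoder-decoder for single-image depth; not the RT-MonoDepth design.
            def __init__(self):
                super().__init__()
                self.encoder = nn.Sequential(          # downsample 640x192 -> 80x24
                    nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
                    nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
                    nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
                )
                self.decoder = nn.Sequential(          # upsample back to the input resolution
                    nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
                    nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True),
                    nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Sigmoid(),  # disparity in (0, 1)
                )

            def forward(self, x):
                return self.decoder(self.encoder(x))

        # Usage: one forward pass at the 640x192 resolution mentioned in the abstract
        depth = TinyDepthNet()(torch.randn(1, 3, 192, 640))
        print(depth.shape)  # torch.Size([1, 1, 192, 640])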

    Reconstruction and Synthesis of Human-Scene Interaction

    In this thesis, we argue that the 3D scene is vital for understanding, reconstructing, and synthesizing human motion. We present several approaches that take the scene into consideration when reconstructing and synthesizing Human-Scene Interaction (HSI). We first observe that state-of-the-art pose estimation methods ignore the 3D scene and hence reconstruct poses that are inconsistent with it. We address this by proposing a pose estimation method that takes the 3D scene explicitly into account; we call our method PROX, for Proximal Relationships with Object eXclusion. We leverage the data generated using PROX and build a method to automatically place 3D scans of clothed people in scenes. The core novelty of our method is encoding the proximal relationships between the human and the scene in a novel HSI model, called POSA, for Pose with prOximitieS and contActs. POSA is, however, limited to static HSI. We therefore propose a real-time method for synthesizing dynamic HSI, which we call SAMP, for Scene-Aware Motion Prediction. SAMP enables virtual humans to navigate cluttered indoor scenes and naturally interact with objects. Data-driven kinematic models like SAMP can produce high-quality motion when applied in environments similar to those seen in the training data. However, when applied to new scenarios, kinematic models can struggle to generate realistic behaviors that respect scene constraints. In contrast, we present InterPhys, which uses adversarial imitation learning and reinforcement learning to train physically simulated characters that perform scene-interaction tasks in a physical and life-like manner.
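
    As a rough illustration of taking the 3D scene into account during pose reconstruction, the sketch below adds a scene-penetration penalty, evaluated through a signed distance function of the scene, to a pose-fitting objective. The `scene_sdf` callable, the weight, and the formulation are assumptions for illustration; they are not the actual PROX or POSA terms.

        import numpy as np

        def penetration_penalty(body_vertices, scene_sdf, weight=10.0):
            # Penalize body-mesh vertices that end up inside scene geometry.
            # `scene_sdf(p)` is assumed to return the signed distance of point p to the
            # scene surface (negative inside objects); the weight is illustrative.
            distances = np.array([scene_sdf(v) for v in body_vertices])
            penetration = np.minimum(distances, 0.0)       # keep only penetrating vertices
            return weight * float(np.sum(penetration ** 2))

        # Usage: add penetration_penalty(vertices, scene_sdf) to the pose-fitting
        # objective alongside the usual image-reprojection and pose-prior terms.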

    Characterisation and State Estimation of Magnetic Soft Continuum Robots

    Minimally invasive surgery has become more popular as it leads to less bleeding, scarring, and pain, and to shorter recovery times. However, this has come with counter-intuitive devices and steep surgeon learning curves. Magnetically actuated Soft Continuum Robots (SCRs) have the potential to replace these devices, providing high dexterity together with the ability to conform to complex environments and interact safely with humans, without the cognitive burden for the clinician. Despite considerable progress in the past decade, several challenges still plague SCRs, hindering their full realisation. This thesis aims at improving magnetically actuated SCRs by addressing some of these challenges, such as material characterisation and modelling, and sensing feedback and localisation. Material characterisation for SCRs is essential for understanding their behaviour and designing effective modelling and simulation strategies. In this work, the properties of materials commonly employed in magnetically actuated SCRs, such as the elastic modulus, hyper-elastic model parameters, and magnetic moment, were determined. Additionally, the effect these parameters have on modelling and simulating these devices was investigated. Due to the nature of magnetic actuation, localisation is of utmost importance to ensure accurate control and delivery of functionality. As such, two localisation strategies for magnetically actuated SCRs were developed: one capable of estimating the full 6-degree-of-freedom (DOF) pose without any prior pose information, and another capable of accurately tracking the full 6-DOF pose in real time with positional errors lower than 4 mm. These will contribute to the development of autonomous navigation and closed-loop control of magnetically actuated SCRs.

    Object detection and localization: an application inspired by RobotAtFactory using machine learning

    Dual-degree Master's dissertation with UTFPR - Universidade Tecnológica Federal do Paraná. The evolution of artificial intelligence and digital cameras has made the transformation of the real world into its digital image counterpart more accessible and widely used, so that information can be analysed with algorithms. The detection and localization of objects is a crucial task in several applications, such as surveillance, autonomous robotics, and intelligent transportation systems. Based on this, this work aims to implement a system that can find objects and estimate their location (distance and angle) through the acquisition and analysis of images. The motivation is the possible problems that may be introduced in future versions of the RobotAtFactory Lite robotics competition: for example, the obstruction of the path marked by the printed lines, requiring the robot to deviate, and/or the positioning of the boxes in places other than the initial warehouses, so that the robot does not know their prior location and must find them somehow. For this, different machine-learning-based methods were analysed for object detection using feature extraction and neural networks, as well as for object localization based on the pinhole model and triangulation. By combining these techniques through Python programming on a module based on a Raspberry Pi Model B and a Raspi Cam Rev 1.3, the goal of the work is achieved. It was thus possible to find the objects and obtain an estimate of their relative position. In a possible future implementation together with a robot, this data can be used to find objects and perform tasks.
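
    The abstract does not give the localization equations; the sketch below shows one common way to estimate distance and bearing from a detected bounding box with the pinhole model. The focal length, the known object height, and the function names are assumptions based on typical camera calibration, not values from the dissertation.

        import math

        def estimate_distance_and_angle(bbox, image_width, focal_px, real_height_m):
            # bbox = (x_min, y_min, x_max, y_max) in pixels. `focal_px` (focal length in
            # pixels) and `real_height_m` (known physical object height) are assumed to
            # come from camera calibration and the competition rules respectively.
            x_min, y_min, x_max, y_max = bbox
            pixel_height = y_max - y_min
            # Pinhole relation: real_height / distance = pixel_height / focal_length
            distance = real_height_m * focal_px / pixel_height
            # Horizontal offset of the box centre from the optical axis gives the bearing
            cx = (x_min + x_max) / 2.0
            angle = math.atan2(cx - image_width / 2.0, focal_px)
            return distance, math.degrees(angle)

        # Usage with an assumed 640-px-wide image and illustrative calibration values
        print(estimate_distance_and_angle((300, 120, 340, 200), 640, 600.0, 0.05))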

    Toward Efficient and Robust Computer Vision for Large-Scale Edge Applications

    The past decade has witnessed remarkable advancements in computer vision and deep learning algorithms, ushering in a transformative wave of large-scale edge applications across various industries. These image processing methods, however, still encounter numerous challenges when it comes to meeting real-world demands, especially in terms of accuracy and latency at scale. Indeed, striking a balance among efficiency, robustness, and scalability remains a common obstacle. This dissertation investigates these issues in the context of different computer vision tasks, including image classification, semantic segmentation, depth estimation, and object detection. We introduce novel solutions, focusing on adjustable neural networks, joint multi-task architecture search, and generalized supervision interpolation. The first obstacle revolves around the ability to trade off between speed and accuracy in convolutional neural networks (CNNs) during inference on resource-constrained platforms. Despite their progress, CNNs are typically monolithic at runtime, which can present practical difficulties since computational budgets may vary over time. To address this, we introduce the Any-Width Network, an adjustable-width CNN architecture that utilizes a novel Triangular Convolution module to enable fine-grained control over speed and accuracy during inference. The second challenge concerns the computationally demanding nature of dense prediction tasks such as semantic segmentation and depth estimation. This issue becomes especially problematic for edge platforms with limited resources. To tackle this, we propose a novel and scalable framework named EDNAS. EDNAS leverages the synergistic relationship between Multi-Task Learning and hardware-aware Neural Architecture Search to significantly enhance the on-device speed and accuracy of dense predictions. Finally, to improve the robustness of object detection, we introduce a novel data mixing augmentation. While mixing techniques such as Mixup have proven successful in image classification, their application to object detection is non-trivial due to spatial misalignment, foreground/background distinction, and instance multiplicity. To address these issues, we propose a generalized data mixing principle, Supervision Interpolation, and its simple yet effective implementation, LossMix. By addressing these challenges, this dissertation aims to facilitate better efficiency, accuracy, and scalability of computer vision and deep learning algorithms and to contribute to the advancement of large-scale edge applications across different domains. Doctor of Philosophy.
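
    As a rough illustration of loss-level supervision interpolation for detection-style training, where mixing labels directly is awkward, the sketch below interpolates the losses computed against each set of original targets with the input mixing ratio. The Beta prior and this exact formulation are illustrative assumptions, not the published LossMix recipe.

        import torch

        def lossmix_step(model, criterion, images_a, targets_a, images_b, targets_b, alpha=0.2):
            # One training step with loss-level interpolation of mixed inputs.
            # Instead of mixing the labels, the losses computed against each set of
            # original targets are interpolated with the mixing ratio `lam`.
            lam = torch.distributions.Beta(alpha, alpha).sample().item()
            mixed = lam * images_a + (1.0 - lam) * images_b    # pixel-level Mixup of the inputs
            outputs = model(mixed)
            loss = lam * criterion(outputs, targets_a) + (1.0 - lam) * criterion(outputs, targets_b)
            return loss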