
    Depth from Monocular Images using a Semi-Parallel Deep Neural Network (SPDNN) Hybrid Architecture

    Deep neural networks have been applied to a wide range of problems in recent years. In this work, a Convolutional Neural Network (CNN) is applied to the problem of estimating depth from a single camera image (monocular depth). Eight different networks are designed to perform depth estimation, each suited to a particular feature level, with the feature level determined by the network's pooling size. After a set of networks has been designed, the models can be combined into a single network topology using graph optimization techniques. This "Semi Parallel Deep Neural Network (SPDNN)" eliminates duplicated common network layers and can be further optimized by retraining, yielding an improved model compared to the individual topologies. In this study, four SPDNN models are trained and evaluated in two stages on the KITTI dataset. In the first stage, the ground truth images are provided by the benchmark; in the second stage, the ground truth images are depth maps produced by a state-of-the-art stereo matching method. The results demonstrate that using post-processing techniques to refine the network's target increases the accuracy of depth estimation on individual mono images. The second evaluation shows that using segmentation data alongside the original data as input improves depth estimation to the point where performance is comparable with stereo depth estimation. The computational time is also discussed in this study.Comment: 44 pages, 25 figures
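
    As an illustration of the semi-parallel idea described above, the following PyTorch sketch (a hypothetical toy model, not the authors' implementation; layer sizes, channel counts, and pooling sizes are assumptions) merges several branches that differ only in pooling size behind a single shared stem, so the duplicated early layers are computed once.

```python
import torch
import torch.nn as nn

class SemiParallelDepthNet(nn.Module):
    """Toy semi-parallel topology: a shared stem feeds branches that differ
    only in pooling size, and their outputs are fused into a single depth map.
    Channel counts and depths are illustrative, not taken from the paper."""

    def __init__(self, pool_sizes=(2, 4, 8)):
        super().__init__()
        # Layers common to all original topologies are kept once
        # (the "duplicate elimination" step of the merged graph).
        self.stem = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
        )
        # One branch per feature level, distinguished by its pooling size.
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.AvgPool2d(p),
                nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
                nn.Upsample(scale_factor=p, mode="bilinear", align_corners=False),
            )
            for p in pool_sizes
        ])
        # Fuse the parallel feature levels into one depth prediction.
        self.head = nn.Conv2d(32 * len(pool_sizes), 1, 1)

    def forward(self, x):
        shared = self.stem(x)
        feats = [branch(shared) for branch in self.branches]
        return self.head(torch.cat(feats, dim=1))

model = SemiParallelDepthNet()
depth = model(torch.randn(1, 3, 128, 416))  # KITTI-like aspect ratio
print(depth.shape)  # torch.Size([1, 1, 128, 416])
```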

    Human robot interaction in a crowded environment

    Human-Robot Interaction (HRI) is the primary means of establishing natural and affective communication between humans and robots. HRI enables robots to act in a way similar to humans in order to assist in activities that are considered laborious, unsafe, or repetitive. Vision-based human-robot interaction is a major component of HRI, in which visual information is used to interpret how human interaction takes place. Common tasks of HRI include finding pre-trained static or dynamic gestures in an image, which involves localising key parts of the human body such as the face and hands. This information is subsequently used to extract different gestures. After the initial detection process, the robot is required to comprehend the underlying meaning of these gestures [3]. Thus far, most gesture recognition systems can only detect gestures and identify a person in relatively static environments. This is not realistic for practical applications, as difficulties may arise from people's movements and changing illumination conditions. Another issue to consider is that of identifying the commanding person in a crowded scene, which is important for interpreting navigation commands. To this end, it is necessary to associate the gesture with the correct person, and automatic reasoning is required to extract the most probable location of the person who has initiated the gesture. In this thesis, we propose a practical framework for addressing the above issues. It attempts to achieve a coarse-level understanding of a given environment before engaging in active communication. This includes recognizing human-robot interaction, where a person has the intention to communicate with the robot. In this regard, it is necessary to differentiate whether the people present are engaged with each other or with their surrounding environment. The basic task is to detect and reason about the environmental context and different interactions so as to respond accordingly. For example, if individuals are engaged in conversation, the robot should realize it is best not to disturb them; if an individual is receptive to the robot's interaction, it may approach the person. Finally, if the user is moving in the environment, the robot can analyse further to understand whether any help can be offered in assisting this user. The method proposed in this thesis combines multiple visual cues in a Bayesian framework to identify people in a scene and determine their potential intentions. To improve system performance, contextual feedback is used, which allows the Bayesian network to evolve and adjust itself according to the surrounding environment. The results achieved demonstrate the effectiveness of the technique in dealing with human-robot interaction in a relatively crowded environment [7]
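
    The following sketch illustrates the general idea of fusing independent visual cues in a Bayesian fashion to score a person's engagement with the robot; the cue names, likelihood values, and the naive-Bayes independence assumption are illustrative stand-ins, not the thesis' actual Bayesian network.

```python
# Hypothetical per-person visual cues; each entry gives
# P(cue observed | engaged with robot) and P(cue observed | not engaged).
CUE_LIKELIHOODS = {
    "facing_robot":    (0.85, 0.30),
    "raised_hand":     (0.60, 0.05),
    "talking_to_peer": (0.10, 0.40),
}

def engagement_posterior(observed_cues, prior=0.2):
    """Naive-Bayes fusion of binary cues into P(engaged | cues)."""
    p_engaged, p_not = prior, 1.0 - prior
    for cue, present in observed_cues.items():
        p_e, p_n = CUE_LIKELIHOODS[cue]
        if not present:              # use the complement likelihoods
            p_e, p_n = 1.0 - p_e, 1.0 - p_n
        p_engaged *= p_e
        p_not *= p_n
    return p_engaged / (p_engaged + p_not)

# Example: a person faces the robot with a raised hand and is not talking to a peer.
print(engagement_posterior({"facing_robot": True,
                            "raised_hand": True,
                            "talking_to_peer": False}))
```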

    SAMPro3D: Locating SAM Prompts in 3D for Zero-Shot Scene Segmentation

    We introduce SAMPro3D for zero-shot 3D indoor scene segmentation. Given the 3D point cloud and multiple posed 2D frames of a 3D scene, our approach segments the scene by applying the pretrained Segment Anything Model (SAM) to the 2D frames. Our key idea is to locate 3D points in the scene as natural 3D prompts and align their projected pixel prompts across frames, ensuring frame-consistency in both the pixel prompts and their SAM-predicted masks. Moreover, we suggest filtering out low-quality 3D prompts based on feedback from all 2D frames to enhance segmentation quality. We also propose to consolidate different 3D prompts when they segment the same object, yielding more comprehensive segmentation. Notably, our method does not require any additional training on domain-specific data, enabling us to preserve the zero-shot power of SAM. Extensive qualitative and quantitative results show that our method consistently achieves higher-quality and more diverse segmentation than previous zero-shot or fully supervised approaches, and in many cases even surpasses human-level annotations. The project page can be accessed at https://mutianxu.github.io/sampro3d/.Comment: Project page: https://mutianxu.github.io/sampro3d
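
    The projection step implied above can be sketched in a few lines of numpy: a 3D prompt point is projected into each posed frame through the camera intrinsics and world-to-camera pose, giving frame-consistent pixel prompts. The calibration values below are placeholders for illustration, not values from the paper.

```python
import numpy as np

def project_prompt(point_world, K, T_world_to_cam, image_hw):
    """Project one 3D prompt point into a posed frame.
    Returns (u, v) pixel coordinates, or None if the point is behind the
    camera or falls outside the image."""
    p = T_world_to_cam @ np.append(point_world, 1.0)   # homogeneous transform
    if p[2] <= 0:                                       # behind the camera
        return None
    uvw = K @ p[:3]
    u, v = uvw[0] / uvw[2], uvw[1] / uvw[2]
    h, w = image_hw
    if 0 <= u < w and 0 <= v < h:
        return float(u), float(v)
    return None

# Toy example: identity pose, a 640x480 pinhole camera, a point 2 m in front.
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])
T = np.eye(4)
print(project_prompt(np.array([0.1, -0.05, 2.0]), K, T, (480, 640)))
```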

    Machine Learning in Robotic Ultrasound Imaging: Challenges and Perspectives

    This article reviews the recent advances in intelligent robotic ultrasound (US) imaging systems. We commence by presenting the robotic mechanisms and control techniques commonly employed in robotic US imaging, along with their clinical applications. Subsequently, we focus on the deployment of machine learning techniques in the development of robotic sonographers, emphasizing crucial developments aimed at enhancing the intelligence of these systems. The methods for achieving autonomous action reasoning are categorized into two sets of approaches: those relying on implicit environmental data interpretation and those using explicit interpretation. Throughout this exploration, we also discuss practical challenges, including the scarcity of medical data, the need for a deeper understanding of the physical aspects involved, and effective data representation approaches. Finally, we conclude by highlighting the open problems in the field and analyzing different possible perspectives on how the community could move forward in this research area.Comment: Accepted by Annual Review of Control, Robotics, and Autonomous Systems

    Computing fast search heuristics for physics-based mobile robot motion planning

    Mobile robots are increasingly being employed to assist responders in search and rescue missions. Robots have to navigate dangerous areas such as collapsed buildings and hazardous sites, which can be inaccessible to humans. Tele-operating the robots can be stressful for the human operators, who are also overloaded with mission tasks and coordination overhead, so it is important to provide the robot with some degree of autonomy, both to lighten the load on the human operator and to ensure the robot's safety. Moving robots around requires reasoning, including interpretation of the environment, spatial reasoning, planning of actions (motion), and execution. This is particularly challenging when the environment is unstructured and the terrain is "harsh", i.e. not flat and cluttered with obstacles. Approaches that reduce the problem to 2D path planning fall short, and many of those that reason about the problem in 3D do not do so in a complete and exhaustive manner. The approach proposed in this thesis is to use rigid body simulation to obtain a more truthful model of reality, i.e. of the interaction between the robot and the environment. Such a simulation obeys the laws of physics and takes into account the geometry of the environment, the geometry of the robot, and any dynamic constraints that may be in place. Physics-based motion planning by itself is also highly intractable, due to the computational load required to perform state propagation combined with the exponential blow-up of planning; additionally, further technical limitations prevent the use of techniques such as state sampling or state steering, which are known to be effective in simpler domains. The proposed solution to this problem is to compute heuristics that bias the search towards the goal, so that it converges quickly towards a solution. With such a model, the search space is a rich space that contains only states which are physically reachable by the robot, and it also provides enough information about the safety of the robot itself. The overall result is that, using this framework, the robot engineer has a simpler job of encoding the domain knowledge, which now consists only of providing the robot's geometric model plus any constraints
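
    A schematic sketch of this planning loop is given below; the state representation, propagation function, and heuristic are toy placeholders (a grid and Manhattan distance) rather than the thesis' planner, but they show how a heuristic can bias a best-first search whose successors come from simulated, physics-checked propagation instead of analytic steering.

```python
import heapq

def plan(start, goal, propagate, heuristic, actions, max_expansions=10000):
    """Greedy best-first search over physically propagated states.
    `propagate(state, action)` stands in for a rigid-body simulation step and
    may return None when the action is unsafe (e.g., the robot tips over)."""
    frontier = [(heuristic(start, goal), start, [])]
    visited = {start}
    for _ in range(max_expansions):
        if not frontier:
            break
        _, state, path = heapq.heappop(frontier)
        if heuristic(state, goal) < 1e-6:       # reached the goal
            return path
        for action in actions:
            nxt = propagate(state, action)      # simulated, physics-checked successor
            if nxt is None or nxt in visited:
                continue
            visited.add(nxt)
            heapq.heappush(frontier, (heuristic(nxt, goal), nxt, path + [action]))
    return None

# Toy grid stand-in: states are (x, y) cells, the heuristic is Manhattan distance.
def toy_propagate(state, action):
    x, y = state
    dx, dy = action
    nxt = (x + dx, y + dy)
    return nxt if 0 <= nxt[0] < 10 and 0 <= nxt[1] < 10 else None

manhattan = lambda s, g: abs(s[0] - g[0]) + abs(s[1] - g[1])
print(plan((0, 0), (4, 3), toy_propagate, manhattan,
           [(1, 0), (-1, 0), (0, 1), (0, -1)]))
```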

    Robust semi-automated path extraction for visualising stenosis of the coronary arteries

    Computed tomography angiography (CTA) is useful for diagnosing and planning treatment of heart disease. However, contrast agent in surrounding structures (such as the aorta and left ventricle) makes 3-D visualisation of the coronary arteries difficult. This paper presents a composite method employing segmentation and volume rendering to overcome this issue. A key contribution is a novel Fast Marching minimal path cost function for vessel centreline extraction. The resulting centreline is used to compute a measure of vessel lumen, which indicates the degree of stenosis (narrowing of a vessel). Two volume visualisation techniques are presented that utilise the segmented arteries and the lumen measure. The system is evaluated and demonstrated using synthetic and clinically obtained datasets
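
    The centreline-extraction idea can be illustrated with a discrete stand-in: Dijkstra's algorithm over a 2D cost image plays the role of the Fast Marching minimal path used in the paper, with a made-up cost map where low cost means "vessel-like".

```python
import heapq
import numpy as np

def minimal_cost_path(cost, start, end):
    """Dijkstra shortest path over a 2D cost image: a discrete stand-in for a
    Fast Marching minimal path. Lower cost should mean 'more vessel-like'."""
    h, w = cost.shape
    dist = np.full((h, w), np.inf)
    prev = {}
    dist[start] = cost[start]
    frontier = [(cost[start], start)]
    while frontier:
        d, (y, x) = heapq.heappop(frontier)
        if (y, x) == end:
            break
        if d > dist[y, x]:                       # stale queue entry
            continue
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and d + cost[ny, nx] < dist[ny, nx]:
                dist[ny, nx] = d + cost[ny, nx]
                prev[(ny, nx)] = (y, x)
                heapq.heappush(frontier, (dist[ny, nx], (ny, nx)))
    # Walk back from the end point to recover the centreline.
    path, node = [end], end
    while node != start:
        node = prev[node]
        path.append(node)
    return path[::-1]

# Toy cost image: cheap along row 2 (the 'vessel'), expensive elsewhere.
cost = np.full((5, 8), 10.0)
cost[2, :] = 0.1
print(minimal_cost_path(cost, (2, 0), (2, 7)))
```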

    Panoramic Panoptic Segmentation: Insights Into Surrounding Parsing for Mobile Agents via Unsupervised Contrastive Learning

    In this work, we introduce panoramic panoptic segmentation as the most holistic form of scene understanding, in terms of both Field of View (FoV) and image-level understanding, for standard camera-based input. A complete understanding of the surroundings provides a maximum of information to a mobile agent. This is essential information for any intelligent vehicle making informed decisions in a safety-critical dynamic environment such as real-world traffic. To overcome the lack of annotated panoramic images, we propose a framework which allows model training on standard pinhole images and transfers the learned features to the panoramic domain in a cost-minimizing way. The domain shift from pinhole to panoramic images is non-trivial, as large objects and surfaces are heavily distorted close to the image border regions and look different across the two domains. Using our proposed method with dense contrastive learning, we achieve significant improvements over a non-adapted approach. Depending on the efficient panoptic segmentation architecture, we improve Panoptic Quality (PQ) by 3.5-6.5% over non-adapted models on our established Wild Panoramic Panoptic Segmentation (WildPPS) dataset. Furthermore, our efficient framework does not need access to images of the target domain, making it a feasible domain generalization approach suitable for limited hardware settings. As additional contributions, we publish WildPPS, the first panoramic panoptic image dataset, to foster progress in surrounding perception, and we explore a novel training procedure combining supervised and contrastive training.Comment: Accepted to IEEE Transactions on Intelligent Transportation Systems (T-ITS). Extended version of arXiv:2103.00868. The project is at https://github.com/alexanderjaus/PP
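
    The dense contrastive component can be sketched with a pixel-level InfoNCE loss; the shapes, temperature, and the way positives are paired below are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def dense_info_nce(feat_a, feat_b, temperature=0.07):
    """Pixel-wise InfoNCE: feature maps from two views of the same image are
    flattened to (N, C); position i in view A is the positive for position i
    in view B, and every other position serves as a negative."""
    n, c, h, w = feat_a.shape
    a = F.normalize(feat_a.permute(0, 2, 3, 1).reshape(-1, c), dim=1)  # (N*H*W, C)
    b = F.normalize(feat_b.permute(0, 2, 3, 1).reshape(-1, c), dim=1)
    logits = a @ b.t() / temperature          # similarity of every pair of positions
    targets = torch.arange(a.shape[0])        # matching positions are the positives
    return F.cross_entropy(logits, targets)

# Toy usage: two augmented views encoded into 16x16 feature maps with 64 channels.
fa, fb = torch.randn(1, 64, 16, 16), torch.randn(1, 64, 16, 16)
print(dense_info_nce(fa, fb).item())
```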

    Reliable Navigational Scene Perception for Autonomous Ships in Maritime Environment

    Due to significant advances in robotics and transportation, research on autonomous ships has attracted considerable attention. The most critical task is to make the ships capable of accurately, reliably, and intelligently detecting their surroundings to achieve high levels of autonomy. Three deep learning-based models are constructed in this thesis to perform complex perceptual tasks such as identifying ships, analysing encounter situations, and recognising water surface objects. In this thesis, sensors including the Automatic Identification System (AIS) and cameras provide critical information for scene perception. Specifically, the AIS enables mid-range and long-range detection, assisting the decision-making system in taking suitable and decisive action. A Convolutional Neural Network-Ship Movement Modes Classification (CNN-SMMC) model is used to detect ships or objects. Following that, a Semi-Supervised Convolutional Encoder-Decoder Network (SCEDN) is developed to classify ship encounter situations and produce a collision avoidance plan for the moving ships or objects. Additionally, cameras are used to detect short-range objects, a supplementary solution for ships or objects not equipped with an AIS. A Water Obstacle Detection Network based on Image Segmentation (WODIS) is developed to find potential threat targets. A series of quantifiable experiments has demonstrated that these models can provide reliable scene perception for autonomous ships