J-MOD: Joint Monocular Obstacle Detection and Depth Estimation
In this work, we propose an end-to-end deep architecture that jointly learns
to detect obstacles and estimate their depth for MAV flight applications. Most
of the existing approaches either rely on Visual SLAM systems or on depth
estimation models to build 3D maps and detect obstacles. However, for the task
of avoiding obstacles this level of complexity is not required. Recent works have proposed multi-task architectures that perform both scene understanding and depth estimation. We follow this line and propose a specific architecture to
jointly estimate depth and obstacles, without the need to compute a global map,
but maintaining compatibility with a global SLAM system if needed. The network architecture is devised to exploit the joint information of the obstacle detection task, which produces more reliable bounding boxes, and the depth estimation task, increasing the robustness of both to scenario changes. We call
this architecture J-MOD. We test the effectiveness of our approach with
experiments on sequences with different appearance and focal lengths and
compare it to SotA multi-task methods that jointly perform semantic
segmentation and depth estimation. In addition, we show the integration in a
full system using a set of simulated navigation experiments where a MAV
explores an unknown scenario and plans safe trajectories by using our detection
model.
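As an illustration of how joint obstacle detections and per-pixel depth predictions can feed an avoidance module (a generic sketch, not the J-MOD implementation itself), one might summarise the predicted depth inside each detected bounding box:

```python
import numpy as np

def obstacle_distances(depth_map, boxes):
    """For each detected obstacle box (x0, y0, x1, y1), summarise its
    distance from the per-pixel depth prediction. The median is robust
    to background pixels leaking into the box."""
    distances = []
    for (x0, y0, x1, y1) in boxes:
        patch = depth_map[y0:y1, x0:x1]
        distances.append(float(np.median(patch)))
    return distances

# Toy depth map: a 6x6 scene at 10 m with a 2x2 obstacle at 3 m.
depth = np.full((6, 6), 10.0)
depth[2:4, 2:4] = 3.0
print(obstacle_distances(depth, [(2, 2, 4, 4)]))  # [3.0]
```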
AgriSORT: A Simple Online Real-time Tracking-by-Detection framework for robotics in precision agriculture
The problem of multi-object tracking (MOT) consists in detecting and tracking
all the objects in a video sequence while keeping a unique identifier for each
object. It is a challenging and fundamental problem for robotics. In precision
agriculture the challenge of achieving a satisfactory solution is amplified by
extreme camera motion, sudden illumination changes, and strong occlusions. Most
modern trackers rely on the appearance of objects rather than motion for
association, which can be ineffective when most targets are static objects with
the same appearance, as in the agricultural case. For this reason, following SORT [5], we propose AgriSORT, a simple, online, real-time
tracking-by-detection pipeline for precision agriculture based only on motion
information that allows for accurate and fast propagation of tracks between
frames. The main focuses of AgriSORT are efficiency, flexibility, minimal
dependencies, and ease of deployment on robotic platforms. We test the proposed
pipeline on a novel MOT benchmark specifically tailored for the agricultural
context, based on video sequences taken in a table grape vineyard, particularly
challenging due to strong self-similarity and density of the instances. Both
the code and the dataset are available for future comparisons.
Comment: 8 pages, 5 figures, submitted to the International Conference on Robotics and Automation (ICRA) 2024.
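A motion-only tracking-by-detection pipeline of this kind typically propagates each track between frames with a constant-velocity Kalman filter. The sketch below is a generic illustration with made-up noise parameters, not AgriSORT's actual filter:

```python
import numpy as np

# Constant-velocity Kalman track over image coordinates.
# State: [x, y, vx, vy]; only the position (x, y) is observed.
F = np.array([[1, 0, 1, 0],   # x += vx
              [0, 1, 0, 1],   # y += vy
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)
Q = np.eye(4) * 1e-2  # process noise (illustrative)
R = np.eye(2) * 1e-1  # measurement noise (illustrative)

def predict(x, P):
    """Propagate the track to the next frame using motion only."""
    return F @ x, F @ P @ F.T + Q

def update(x, P, z):
    """Correct the track with an associated detection z = [x, y]."""
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (z - H @ x)
    P = (np.eye(4) - K @ H) @ P
    return x, P

# A target moving one pixel per frame along x.
x, P = np.array([0., 0., 1., 0.]), np.eye(4)
for t in range(1, 4):
    x, P = predict(x, P)
    x, P = update(x, P, np.array([float(t), 0.0]))
print(np.round(x[:2], 2))  # -> [3. 0.]
```

In a full pipeline the predicted boxes would then be associated to new detections, e.g. by IoU, before the update step.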
Transferring knowledge across robots: A risk sensitive approach
One of the most impressive characteristics of human perception is its domain adaptation capability. Humans can recognize objects and places simply by transferring knowledge from their past experience. Inspired by that, current research in robotics is addressing a great challenge: building robots able to sense and interpret the surrounding world by reusing information previously collected, gathered by other robots or obtained from the web. But how can a robot automatically understand what is useful among a large amount of information and perform knowledge transfer? In this paper we address the domain adaptation problem in the context of visual place recognition. We consider the scenario where a robot equipped with a monocular camera explores a new environment. In this situation traditional approaches based on supervised learning perform poorly, as no annotated data are provided in the new environment and the models learned from data collected in other places are inappropriate due to the large variability of visual information. To overcome these problems we introduce a novel transfer learning approach. With our algorithm the robot is given only some training data (annotated images collected in different environments by other robots) and is able to decide whether, and how much, this knowledge is useful in the current scenario. At the core of our approach is a transfer risk measure that quantifies the similarity between the given and the new visual data. To improve performance, we also extend our framework to take into account multiple visual cues. Our experiments on three publicly available datasets demonstrate the effectiveness of the proposed approach.
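To make the idea of a transfer risk measure concrete, here is a toy score based on the distance between source and target feature histograms. The paper's actual measure is not reproduced here; this histogram-based total-variation score is purely illustrative:

```python
import numpy as np

def transfer_risk(source_feats, target_feats, bins=10):
    """Toy transfer-risk score in [0, 1]: the more the source and
    target feature distributions differ, the riskier the transfer.
    Measured as total variation between normalised histograms."""
    lo = min(source_feats.min(), target_feats.min())
    hi = max(source_feats.max(), target_feats.max())
    hs, _ = np.histogram(source_feats, bins=bins, range=(lo, hi))
    ht, _ = np.histogram(target_feats, bins=bins, range=(lo, hi))
    hs = hs / hs.sum()
    ht = ht / ht.sum()
    return 0.5 * np.abs(hs - ht).sum()

rng = np.random.default_rng(0)
same = transfer_risk(rng.normal(0, 1, 1000), rng.normal(0, 1, 1000))
shifted = transfer_risk(rng.normal(0, 1, 1000), rng.normal(3, 1, 1000))
print(same < shifted)  # True: similar domains -> lower risk
```

A robot could then down-weight (or discard) source training data whose risk with respect to the current environment exceeds a threshold.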
Fast robust monocular depth estimation for Obstacle Detection with fully convolutional networks
Obstacle detection is a central problem for any robotic system, and critical for autonomous systems that travel at high speeds in unpredictable environments. This is often achieved through scene depth estimation, by various means. When fast motion is considered, the detection range must be long enough to allow for safe avoidance and path planning. Current solutions often make assumptions about the motion of the vehicle that limit their applicability, or work at very limited ranges due to intrinsic constraints. We propose a novel appearance-based obstacle detection system that is able to detect obstacles at very long range and at a very high speed (∼300 Hz), without making assumptions about the type of motion. We achieve these results using a Deep Neural Network approach trained on real and synthetic images, trading some depth accuracy for fast, robust and consistent operation. We show how photo-realistic synthetic images are able to solve the problems of training-set size and variety typical of machine learning approaches, and how our system is robust to massive blurring of test images.
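As a rough illustration of how a fast, coarse depth estimate can be turned into an obstacle alarm (independent of the authors' actual network), one can simply threshold the predicted depth against a safety range:

```python
import numpy as np

def obstacle_mask(depth_map, safety_range=20.0):
    """Flag pixels whose predicted depth falls inside the safety
    range. At high frame rates even a coarse depth estimate is
    enough to trigger avoidance (threshold value is illustrative)."""
    return depth_map < safety_range

depth = np.full((4, 4), 50.0)   # mostly far scene
depth[1:3, 1:3] = 8.0           # near obstacle
print(int(obstacle_mask(depth).sum()))  # 4 pixels flagged
```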
Weakly Supervised Fruit Counting for Yield Estimation Using Spatial Consistency
Fruit counting is a fundamental component for yield estimation applications. Most of the existing approaches address this problem by relying on fruit models (i.e., by using object detectors) or by explicitly learning to count. Despite the impressive results achieved by these approaches, all of them need strong supervision information during the training phase. In agricultural applications, manual labeling may require a huge effort or, in some cases, it could be impossible to acquire fine-grained ground truth labels. In this letter, we tackle this problem by proposing a weakly supervised framework that learns to count fruits without the need for task-specific supervision labels. In particular, we devise a novel convolutional neural network architecture that requires only a simple image-level binary classifier to detect whether the image contains instances of the fruits or not, and combines this information with image spatial consistency constraints. The result is an architecture that learns to count without task-specific labels (e.g., object bounding boxes or the multiplicity of fruit instances in the image). The experiments on three different varieties of fruits (i.e., olives, almonds, and apples) show that our approach achieves performance comparable with SotA approaches based on the supervised paradigm.
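The core idea, a binary presence classifier combined with spatial structure, can be caricatured as counting the image tiles the classifier fires on. This toy sketch is far simpler than the paper's architecture, and the "classifier" below is a stand-in:

```python
import numpy as np

def weak_count(image, classifier, tile=32):
    """Toy weakly supervised count: slide a binary presence
    classifier over non-overlapping tiles and sum the positives.
    Here spatial consistency is reduced to the tiling grid; the
    paper's constraint is considerably richer."""
    h, w = image.shape[:2]
    count = 0
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            count += int(classifier(image[y:y + tile, x:x + tile]))
    return count

# Stand-in "classifier": a tile is positive if it has bright pixels.
clf = lambda patch: patch.max() > 0.5
img = np.zeros((64, 64))
img[5, 5] = 1.0      # one fruit in the top-left tile
img[40, 40] = 1.0    # one fruit in the bottom-right tile
print(weak_count(img, clf))  # 2
```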
Visual-inertial Tracking on Android for Augmented Reality Applications
Augmented Reality (AR) aims to enhance a person's vision of the real world with useful information about the surrounding environment. Amongst all the possible applications, AR systems can be very useful as visualization tools for structural and environmental monitoring. While the large majority of AR systems run on a laptop or on a head-mounted device, the advent of smartphones has created new opportunities. One of the most important functionalities of an AR system is the ability of the device to self-localize. This can be achieved through visual odometry, a very challenging task for smartphones. Indeed, in most available smartphone AR applications, self-localization is achieved through GPS and/or inertial sensors. Moreover, developing an AR system on a mobile phone poses new challenges due to the limited amount of computational resources. In this paper we describe the development of an egomotion estimation algorithm for an Android smartphone. We also present an approach based on an Extended Kalman Filter for improving localization accuracy by integrating information from inertial sensors. The implemented solution achieves a localization accuracy comparable to the PC implementation while running on an Android device.
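A minimal one-dimensional version of such an inertial/visual fusion scheme (illustrative noise values and rates, not the paper's filter) predicts the state from accelerometer readings and corrects it whenever a visual position fix arrives:

```python
import numpy as np

dt = 0.01                         # IMU period (illustrative: 100 Hz)
F = np.array([[1, dt], [0, 1]])   # state [position, velocity]
B = np.array([0.5 * dt**2, dt])   # accelerometer enters as control input
H = np.array([[1.0, 0.0]])        # visual odometry observes position
Q = np.eye(2) * 1e-4              # process noise (illustrative)
R = np.array([[1e-2]])            # visual noise (illustrative)

def step(x, P, accel, visual_pos=None):
    # Inertial prediction at IMU rate ...
    x = F @ x + B * accel
    P = F @ P @ F.T + Q
    # ... corrected whenever a (slower) visual position fix arrives.
    if visual_pos is not None:
        K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
        x = x + K @ (visual_pos - H @ x)
        P = (np.eye(2) - K @ H) @ P
    return x, P

# 1 s of constant 1 m/s^2 acceleration; a visual fix every 10 IMU steps.
x, P = np.zeros(2), np.eye(2)
for i in range(100):
    t = (i + 1) * dt
    vis = 0.5 * t**2 if (i + 1) % 10 == 0 else None
    x, P = step(x, P, 1.0, vis)
print(np.round(x, 2))  # position ~0.5 m, velocity ~1.0 m/s
```

The full visual-inertial problem is nonlinear (hence the Extended Kalman Filter); this linear 1-D case only shows the predict/update structure.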
Modelling and simulation of a quadrotor in V-tail configuration
Standard quad-rotors are the most common and versatile unmanned aerial vehicles (UAVs) thanks to their simple control and mechanics. However, the common coplanar rotor configurations are designed to maximise hovering and loitering performance, not fast and aggressive manoeuvring. Since the expanding field of application of micro aerial vehicles (MAVs) requires ever-increasing speed and agility, the question arises whether there are better configurations for aggressive flight. In this work, we address this question by studying the energetics and dynamics of fixed tilted-rotor configurations compared to the standard quad-rotor. To do so we chose a specific configuration, called V-tail, which is as mechanically simple as the standard X-4 quad-rotor but has its back rotors tilted by a known fixed angle, and developed its dynamical model to test its properties both through software simulation and actual experiments. Mathematical modelling and field experiments suggest that this configuration is able to achieve better performance in manoeuvring control, while losing some power in hovering owing to reduced vertical thrust. In addition, these gains are obtained with the same attitude control as the standard quad-rotor, making this configuration very easy to set up.
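The hovering penalty follows directly from the thrust geometry: tilted back rotors contribute only the cosine component of their thrust to lift. A simplified planar model (illustrative tilt angle, equal thrust per rotor assumed, drag and rotor interactions ignored):

```python
import math

def hover_thrust_per_rotor(mass, tilt_deg, g=9.81):
    """Per-rotor thrust needed to hover when the two back rotors are
    tilted by `tilt_deg`. Force balance with equal thrust T per rotor:
        2*T + 2*T*cos(tilt) = m*g
    (simplified planar model, not the paper's full dynamics)."""
    tilt = math.radians(tilt_deg)
    return mass * g / (2 + 2 * math.cos(tilt))

flat = hover_thrust_per_rotor(1.0, 0.0)    # standard coplanar X-4
vtail = hover_thrust_per_rotor(1.0, 20.0)  # e.g. a 20-degree V-tail
print(vtail > flat)  # True: the tilt costs extra hover thrust
```

The same tilted component that is "wasted" in hover becomes directly usable lateral thrust, which is the source of the manoeuvring advantage.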