27 research outputs found
leave a trace - A People Tracking System Meets Anomaly Detection
Video surveillance has always had a negative connotation, among other reasons
because of the loss of privacy and because it may not automatically increase
public safety. If it were able to detect atypical (i.e. dangerous) situations
in real time, autonomously and anonymously, this could change. A prerequisite
for this is the reliable automatic detection of potentially dangerous
situations from video data. Classically, this is done by object extraction and
tracking. From the derived trajectories, we then want to determine dangerous
situations by detecting atypical trajectories. For ethical reasons, however,
it is better to develop such a system on data in which no people are
threatened or harmed, and in which they know that such a tracking system is
installed. Another important point is that these situations occur only rarely
in real, public CCTV areas and may be captured properly even less often. In
the artistic project leave a trace, the tracked objects, people in the atrium
of an institutional building, become actors and thus part of the installation.
Real-time visualisation allows these actors to interact with the installation,
which in turn creates many atypical interaction situations on which we can
develop our situation detection. The data set has evolved over three years
and is therefore large. In this article we describe the tracking system and
several approaches for the detection of atypical trajectories.
Vision-based safe autonomous UAV landing with panoramic sensors
The remarkable growth of unmanned aerial vehicles (UAVs) has also raised concerns about safety during their missions. To advance towards safer autonomous aerial robots, this thesis develops a safe autonomous UAV landing solution, a vital part of every UAV operation. The project proposes a vision-based framework for monitoring the landing area that leverages the omnidirectional view of a single upward-pointing panoramic camera to detect and localize any person within the landing zone. It then sends this information to approaching UAVs so that they either hover and wait or adaptively search for a better position in which to land. We utilize and fine-tune the YOLOv7 object detection model, an XGBoost model for localizing nearby people, and the open-source ROS and PX4 frameworks for communication and drone control. We present both simulation and real-world indoor experimental results to demonstrate the capability of our methods.
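The monitoring logic this thesis describes, detecting people in the landing zone and telling an approaching UAV to wait or proceed, can be sketched as a simple decision function. The names, the distance threshold, and the plain-input detections below are illustrative assumptions, not the thesis's actual API; in the real system the detections would come from YOLOv7 and the metric positions from the XGBoost localizer.

```python
from dataclasses import dataclass
from math import hypot

@dataclass
class Detection:
    """A localized person, in metres relative to the landing-pad centre."""
    x: float
    y: float

def landing_decision(people: list[Detection], safe_radius_m: float = 5.0) -> str:
    """Return 'land' if the zone is clear, otherwise 'wait'.

    safe_radius_m is a hypothetical safety margin, not a value from the thesis.
    """
    if any(hypot(p.x, p.y) < safe_radius_m for p in people):
        return "wait"   # a person is inside the landing zone: hover and wait
    return "land"       # zone clear: proceed with landing

# Example: a person 2 m from the pad blocks the landing; one 8.5 m away does not.
print(landing_decision([Detection(2.0, 0.0)]))   # wait
print(landing_decision([Detection(8.0, 3.0)]))   # land
```

The adaptive search for a better landing spot would extend this decision with candidate positions scored against the detected people, but the core clear-zone test is the safety-critical part.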
PL-EVIO: Robust Monocular Event-based Visual Inertial Odometry with Point and Line Features
Event cameras are motion-activated sensors that capture pixel-level
illumination changes instead of intensity images at a fixed frame rate.
Compared with standard cameras, they can provide reliable visual perception
during high-speed motion and in high-dynamic-range scenarios. However, event
cameras output little information, or even only noise, when the relative
motion between the camera and the scene is limited, for example when the
camera is nearly still, whereas standard cameras provide rich perception
information in most scenarios, especially in good lighting conditions. The two
camera types are therefore complementary. In this paper, we propose a robust,
highly accurate, real-time optimization-based monocular event-based
visual-inertial odometry (VIO) method with event-corner features, line-based
event features, and point-based image features. The proposed method leverages
point-based features in natural scenes and line-based features in man-made
scenes to provide additional structural constraints through well-designed
feature management. Experiments on public benchmark datasets show that our
method achieves superior performance compared with state-of-the-art
image-based and event-based VIO. Finally, we use our method to demonstrate
onboard closed-loop autonomous quadrotor flight and large-scale outdoor
experiments. Videos of the evaluations are presented on our project website:
https://b23.tv/OE3QM6
3D Multiple Object Tracking on Autonomous Driving: A Literature Review
3D multi-object tracking (3D MOT) stands as a pivotal domain within
autonomous driving, experiencing a surge in scholarly interest and commercial
promise over recent years. Despite its paramount significance, 3D MOT confronts
a myriad of formidable challenges, encompassing abrupt alterations in object
appearances, pervasive occlusion, the presence of diminutive targets, data
sparsity, missed detections, and the unpredictable initiation and termination
of object motion trajectories. Countless methodologies have emerged to grapple
with these issues, yet 3D MOT endures as a formidable problem that warrants
further exploration. This paper undertakes a comprehensive examination,
assessment, and synthesis of the research landscape in this domain, remaining
attuned to the latest developments in 3D MOT while suggesting prospective
avenues for future investigation. Our exploration commences with a systematic
exposition of key facets of 3D MOT and its associated domains, including
problem delineation, classification, methodological approaches, fundamental
principles, and empirical investigations. Subsequently, we categorize these
methodologies into distinct groups, dissecting each group meticulously with
regard to its challenges, underlying rationale, progress, merits, and demerits.
Furthermore, we present a concise recapitulation of experimental metrics and
offer an overview of prevalent datasets, facilitating a quantitative comparison
for a more intuitive assessment. Lastly, our deliberations culminate in a
discussion of the prevailing research landscape, highlighting extant challenges
and charting possible directions for 3D MOT research. We present a structured
and lucid roadmap to guide forthcoming endeavors in this field.
Comment: 24 pages, 6 figures, 2 tables
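As a concrete instance of the experimental metrics such a survey recapitulates, the widely used MOTA score aggregates missed detections, false positives, and identity switches into a single number. This is the standard definition from the MOT literature, not a formula specific to this paper; the example numbers are hypothetical.

```python
def mota(false_negatives: int, false_positives: int,
         id_switches: int, num_gt: int) -> float:
    """Multiple Object Tracking Accuracy: 1 - (FN + FP + IDSW) / GT.

    num_gt is the total number of ground-truth objects summed over all
    frames. MOTA can be negative when errors outnumber ground-truth objects.
    """
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt

# Example: 10 misses, 5 false alarms, and 2 ID switches over 100 GT boxes.
print(mota(10, 5, 2, 100))  # 0.83
```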
3D Video Object Detection with Learnable Object-Centric Global Optimization
We explore long-term temporal visual correspondence-based optimization for 3D
video object detection in this work. Visual correspondence refers to one-to-one
mappings for pixels across multiple images. Correspondence-based optimization
is the cornerstone for 3D scene reconstruction but is less studied in 3D video
object detection, because moving objects violate multi-view geometry
constraints and are treated as outliers during scene reconstruction. We address
this issue by treating objects as first-class citizens during
correspondence-based optimization. In this work, we propose BA-Det, an
end-to-end optimizable object detector with object-centric temporal
correspondence learning and featuremetric object bundle adjustment.
Empirically, we verify the effectiveness and efficiency of BA-Det for multiple
baseline 3D detectors under various setups. Our BA-Det achieves SOTA
performance on the large-scale Waymo Open Dataset (WOD) with only marginal
computation cost. Our code is available at
https://github.com/jiaweihe1996/BA-Det
Comment: CVPR202
The always best positioned paradigm for mobile indoor applications
In this dissertation, methods for personal positioning in outdoor and indoor environments are investigated. The Always Best Positioned paradigm, whose goal is to provide positioning that is as seamless as possible, is defined. Furthermore, the localization toolkit LOCATO is presented, which makes it easy to realize positioning systems that follow the paradigm. New algorithms were developed that specifically address the robustness of positioning systems with respect to the Always Best Positioned paradigm. With the help of this toolkit, three example positioning systems were implemented, each designed for different applications and requirements: a low-cost system, which can be used in conjunction with user-adaptive public displays; a so-called opportunistic system, which enables positioning with room-level accuracy in any building that provides a WiFi infrastructure; and a high-accuracy system for instrumented environments, which works with active RFID tags and infrared beacons. Furthermore, a new and unique evaluation method for positioning systems is presented, which uses step-accurate natural walking traces as ground truth. Finally, six location-based services are presented, which were realized either with the tools provided by LOCATO or with one of the example positioning systems.
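Room-level positioning over an existing WiFi infrastructure, as provided by the opportunistic system above, is commonly done by fingerprinting: the currently observed access-point signal strengths are matched against reference scans recorded per room. The sketch below is a generic nearest-neighbour fingerprint matcher assumed for illustration; it is not LOCATO's actual algorithm, and all names and RSSI values are hypothetical.

```python
def match_room(scan: dict[str, float],
               fingerprints: dict[str, dict[str, float]],
               missing_rssi: float = -100.0) -> str:
    """Return the room whose reference fingerprint is closest to `scan`.

    `scan` and each fingerprint map access-point ID -> RSSI in dBm.
    Access points seen on only one side are penalized with `missing_rssi`.
    """
    def distance(fp: dict[str, float]) -> float:
        aps = set(scan) | set(fp)
        return sum((scan.get(ap, missing_rssi) - fp.get(ap, missing_rssi)) ** 2
                   for ap in aps)
    return min(fingerprints, key=lambda room: distance(fingerprints[room]))

# Hypothetical reference scans for two rooms.
fingerprints = {
    "office":  {"ap1": -40.0, "ap2": -70.0},
    "hallway": {"ap1": -65.0, "ap2": -50.0},
}
print(match_room({"ap1": -42.0, "ap2": -72.0}, fingerprints))  # office
```

A robust system would average several scans and handle access points that appear or disappear between the survey and the live measurement, which is one aspect of the robustness the dissertation addresses.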
Virtual road signs : a thesis presented in partial fulfilment of the requirements for the degree of Master of Technology in Computer Systems Engineering at Massey University, Palmerston North, New Zealand
Conventional road signs are subject to a number of problems and limitations. They are unable to disseminate dynamic information to road users, their visibility is heavily dependent on environmental conditions, they are expensive to maintain, and they are frequently the target of vandals and thieves. Virtual road signs (VRS) differ from conventional signs in that they exist only in an information database - no physical signs exist on the roadside. By projecting virtual signs into a driver's field of view at the correct time, virtual road signs attempt to mimic conventional road signs. In addition, their visibility is independent of weather and traffic conditions, they can be tailored to specific driver and vehicle needs (such as those of truck drivers), and they cannot be vandalised like physical signs. This thesis examines many of the major technical design decisions that must be made in implementing a virtual road sign system. A software prototype was designed and written to implement an experimental VRS system. The prototype served as a testbed to assess the technical feasibility of a VRS system and to investigate alternative VRS designs. One limitation of the project was the lack of a suitable display device that could display virtual signs inside a vehicle in real time; this project therefore examined only the proof of concept. A test world was created around a university campus in which virtual signs were "erected" to target a visitor to the campus. The prototype used a handheld GPS receiver to track a vehicle as it was driven around the campus. A Kalman filter was implemented to filter the GPS data and predict the motion of the vehicle when GPS data was unavailable. A laptop PC provided onboard processing capability inside the test vehicle. The prototype shows that technical implementation of virtual road signs is potentially feasible, subject to limitations in current display devices such as heads-up displays.
Potential applications include signs custom-designed for tourists to indicate places of interest, bilingual signage, and aiding co-drivers in rally driving. Before large-scale implementation can be considered, however, much research is needed, particularly with respect to system acceptability to the public and road authorities.
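The GPS smoothing and outage prediction described above is the classic role of a constant-velocity Kalman filter: fuse noisy position fixes when they arrive, and coast on the motion model when they do not. The sketch below is a minimal one-axis version; the time step, noise levels, and initial state are illustrative assumptions, not the thesis's actual tuning.

```python
import numpy as np

# Constant-velocity Kalman filter along one axis: state = [position, velocity].
dt = 1.0                                    # assumed GPS update interval (s)
F = np.array([[1.0, dt], [0.0, 1.0]])       # state transition (motion model)
H = np.array([[1.0, 0.0]])                  # GPS observes position only
Q = 0.01 * np.eye(2)                        # process noise (assumed)
R = np.array([[25.0]])                      # GPS noise, ~5 m std dev (assumed)

x = np.array([[0.0], [0.0]])                # initial state estimate
P = 100.0 * np.eye(2)                       # initial uncertainty

def step(x, P, z=None):
    """One filter cycle: predict, then update if a GPS fix `z` arrived.

    Passing z=None models a GPS outage: the state is propagated by the
    motion model alone, which is exactly the prediction role above.
    """
    x = F @ x                                # predict state
    P = F @ P @ F.T + Q                      # predict covariance
    if z is not None:
        y = np.array([[z]]) - H @ x          # innovation (measurement residual)
        S = H @ P @ H.T + R                  # innovation covariance
        K = P @ H.T @ np.linalg.inv(S)       # Kalman gain
        x = x + K @ y                        # corrected state
        P = (np.eye(2) - K @ H) @ P          # corrected covariance
    return x, P

# Three noisy fixes roughly 1 m apart, then two dropped fixes (outage).
for z in [1.0, 2.1, 2.9, None, None]:
    x, P = step(x, P, z)
print(float(x[0, 0]))                        # position coasted through the outage
```

During the outage the position estimate keeps advancing at the learned velocity of roughly one metre per step, which is what lets the prototype keep placing virtual signs between GPS fixes.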
End-to-End Urban Driving by Imitating a Reinforcement Learning Coach
End-to-end approaches to autonomous driving commonly rely on expert demonstrations. Although humans are good drivers, they are not good coaches for end-to-end algorithms that demand dense on-policy supervision. On the contrary, automated experts that leverage privileged information can efficiently generate large-scale on-policy and off-policy demonstrations. However, existing automated experts for urban driving make heavy use of hand-crafted rules and perform suboptimally even on driving simulators, where ground-truth information is available. To address these issues, we train a reinforcement learning expert that maps bird's-eye view images to continuous low-level actions. While setting a new performance upper bound on CARLA, our expert is also a better coach that provides informative supervision signals for imitation learning agents to learn from. Supervised by our reinforcement learning coach, a baseline end-to-end agent with monocular camera input achieves expert-level performance. Our end-to-end agent achieves a 78% success rate while generalizing to a new town and new weather on the NoCrash-dense benchmark, and state-of-the-art performance on the more challenging CARLA LeaderBoard.
Understanding a Dynamic World: Dynamic Motion Estimation for Autonomous Driving Using LIDAR
In a society that is heavily reliant on personal transportation, autonomous vehicles present an increasingly intriguing technology. They have the potential to save lives, promote efficiency, and enable mobility. However, before this vision becomes a reality, there are a number of challenges that must be solved. One key challenge involves problems in dynamic motion estimation, as it is critical for an autonomous vehicle to have an understanding of the dynamics in its environment for it to operate safely on the road. Accordingly, this thesis presents several algorithms for dynamic motion estimation for autonomous vehicles. We focus on methods using light detection and ranging (LIDAR), a prevalent sensing modality used by autonomous vehicle platforms, due to its advantages over other sensors, such as cameras, including lighting invariance and fidelity of 3D geometric data.
First, we propose a dynamic object tracking algorithm. The proposed method takes as input a stream of LIDAR data from a moving object collected by a multi-sensor platform. It generates an estimate of its trajectory over time and a point cloud model of its shape. We formulate the problem similarly to simultaneous localization and mapping (SLAM), allowing us to leverage existing techniques. Unlike prior work, we properly handle a stream of sensor measurements observed over time by deriving our algorithm using a continuous-time estimation framework. We evaluate our proposed method on a real-world dataset that we collect.
Second, we present a method for scene flow estimation from a stream of LIDAR data. Inspired by optical flow and scene flow from the computer vision community, our framework can estimate dynamic motion in the scene without relying on segmentation and data association while still rivaling the results of state-of-the-art object tracking methods. We design our algorithms to exploit a graphics processing unit (GPU), enabling real-time performance.
Third, we leverage deep learning tools to build a feature learning framework that allows us to train an encoding network to estimate features from a LIDAR occupancy grid. The learned feature space describes the geometric and semantic structure of any location observed by the LIDAR data. We formulate the training process so that distances in this learned feature space are meaningful in comparing the similarity of different locations. Accordingly, we demonstrate that using this feature space improves our estimate of the dynamic motion in the environment over time.
In summary, this thesis presents three methods to aid in understanding a dynamic world for autonomous vehicle applications with LIDAR. These methods include a novel object tracking algorithm, a real-time scene flow estimation method, and a feature learning framework to aid in dynamic motion estimation. Furthermore, we demonstrate the performance of all our proposed methods on a collection of real-world datasets.
PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
https://deepblue.lib.umich.edu/bitstream/2027.42/147587/1/aushani_1.pd