43 research outputs found

    Online Domain Adaptation for Multi-Object Tracking

    Automatically detecting, labeling, and tracking objects in videos depends first and foremost on accurate category-level object detectors. These might, however, not always be available in practice, as acquiring high-quality large scale labeled training datasets is either too costly or impractical for all possible real-world application scenarios. A scalable solution consists in re-using object detectors pre-trained on generic datasets. This work is the first to investigate the problem of on-line domain adaptation of object detectors for causal multi-object tracking (MOT). We propose to alleviate the dataset bias by adapting detectors from category to instances, and back: (i) we jointly learn all target models by adapting them from the pre-trained one, and (ii) we also adapt the pre-trained model on-line. We introduce an on-line multi-task learning algorithm to efficiently share parameters and reduce drift, while gradually improving recall. Our approach is applicable to any linear object detector, and we evaluate both cheap "mini-Fisher Vectors" and expensive "off-the-shelf" ConvNet features. We quantitatively measure the benefit of our domain adaptation strategy on the KITTI tracking benchmark and on a new dataset (PASCAL-to-KITTI) we introduce to study the domain mismatch problem in MOT. Comment: To appear at BMVC 201

    An Approach to Counting Vehicles from Pre-Recorded Video Using Computer Algorithms

    One of the fundamental sources of data for traffic analysis is vehicle counts, which can be conducted either by the traditional manual method or by automated means. Different agencies have guidelines for manual counting, but they are typically prepared for particular conditions. In the case of automated counting, different methods have been applied, but You Only Look Once (YOLO), a recently developed object detection model, presents new potential in automated vehicle counting. The first objective of this study was to formulate general guidelines for manual counting based on experience gained in the field. Another goal of this study was to develop a computer program for vehicle counting from pre-recorded video applying the YOLO model. The documented general guidelines provided in this project can be useful in acquiring the required standard and minimizing the cost of a manual counting project. The accuracy of the automated counting program was found to be about 90 percent for total daily counts, although most of that error was a consistent undercounting by automated counting

    Point Cloud Processing for Environmental Analysis in Autonomous Driving using Deep Learning

    One of the main objectives of leading automotive companies is autonomous self-driving cars. They need a very precise perception system of their environment, working for every conceivable scenario. Therefore, different kinds of sensor types are in use. Besides cameras, lidar scanners have become very important. The development in that field is significant for future applications and system integration because lidar offers a more accurate depth representation, independent of environmental illumination. Algorithms and machine learning approaches, including Deep Learning and Artificial Intelligence based on raw laser scanner data, are especially important due to the long range and three-dimensional resolution of the measured point clouds. Consequently, a broad field of research with many challenges and unsolved tasks has been established. This thesis aims to address this deficit and contribute highly efficient algorithms for 3D object recognition to the scientific community. It provides a Deep Neural Network with specific layers and a novel loss to safely localize and estimate the orientation of objects from point clouds originating from lidar sensors. First, a single-shot 3D object detector is developed that outputs dense predictions in only one forward pass. Next, this detector is refined by fusing complementary semantic features from cameras and joint probabilistic tracking to stabilize predictions and filter outliers. In the last part, a concept for deployment into an existing test vehicle focuses on the semi-automated generation of a suitable dataset. Subsequently, an evaluation of data from automotive-grade lidar scanners is presented. A Generative Adversarial Network is also being developed as an alternative for target-specific artificial data generation. Experiments on the acquired application-specific and benchmark datasets show that the presented methods compete with a variety of state-of-the-art algorithms while being trimmed down to efficiency for use in self-driving cars. Furthermore, they include an extensive set of standard evaluation metrics and results to form a solid baseline for future research.

    Multispectral image analysis in laparoscopy – A machine learning approach to live perfusion monitoring

    Modern visceral surgery is often performed through small incisions. Compared to open surgery, these minimally invasive interventions result in smaller scars, fewer complications and a quicker recovery. While this is to the patient's benefit, it has the drawback of limiting the physician's perception largely to that of visual feedback through a camera mounted on a rod lens: the laparoscope. Conventional laparoscopes are limited by “imitating” the human eye. Multispectral cameras remove this arbitrary restriction of recording only red, green and blue colors. Instead, they capture many specific bands of light. Although these could help characterize important indications such as ischemia and early stage adenoma, the lack of powerful digital image processing prevents realizing the technique's full potential. The primary objective of this thesis was to pioneer fluent functional multispectral imaging (MSI) in laparoscopy. The main technical obstacles were: (1) The lack of image analysis concepts that provide both high accuracy and speed. (2) Multispectral image recording is slow, typically ranging from seconds to minutes. (3) Obtaining a quantitative ground truth for the measurements is hard or even impossible. To overcome these hurdles and enable functional laparoscopy, for the first time in this field physical models are combined with powerful machine learning techniques. The physical model is employed to create highly accurate simulations, which in turn teach the algorithm to rapidly relate multispectral pixels to underlying functional changes. To reduce the domain shift introduced by learning from simulations, a novel transfer learning approach automatically adapts generic simulations to match almost arbitrary recordings of visceral tissue. In combination with the only available video-rate capable multispectral sensor, the method pioneers fluent perfusion monitoring with MSI. 
This system was carefully tested in a multistage process, involving in silico quantitative evaluations, tissue phantoms and a porcine study. Clinical applicability was ensured through in-patient recordings in the context of partial nephrectomy; in these, the novel system characterized ischemia live during the intervention. Verified against a fluorescence reference, the results indicate that fluent, non-invasive ischemia detection and monitoring is now possible. In conclusion, this thesis presents the first multispectral laparoscope capable of video-rate functional analysis. The system was successfully evaluated in in-patient trials, and future work should be directed towards evaluation of the system in a larger study. Due to the broad applicability and the large potential clinical benefit of the presented functional estimation approach, I am confident the descendants of this system are an integral part of the next generation OR

    Traffic Scene Perception for Automated Driving with Top-View Grid Maps

    An automated vehicle must make safe, sensible, and fast decisions based on its environment. This requires an accurate and computationally efficient model of the traffic environment. This environment model is meant to fuse and filter measurements from different sensors and provide them to downstream subsystems as compact yet informative data. This thesis addresses the modeling of the traffic scene based on top-view grid maps. Compared with other environment models, they allow an early fusion of distance measurements from different sources at low computational cost, as well as an explicit modeling of free space. After presenting a method for ground surface estimation, which forms the basis of top-view modeling, the thesis covers methods for occupancy and elevation mapping for grid maps based on multiple noisy, partially contradictory, or missing distance measurements. On the resulting sensor-independent representation, models for detecting traffic participants and for estimating scene flow, odometry, and tracking features are then investigated. Experiments on publicly available datasets and on a real vehicle show that top-view grid maps can be estimated with on-board LiDAR sensors and that safety-critical environment information such as observability and drivability can be derived reliably. Finally, traffic participants are determined as oriented bounding boxes with semantic classes, velocities, and tracking features from a joint model for object detection and flow estimation based on the top-view grid maps

    Recent Trends in Computational Intelligence

    Traditional models struggle to cope with complexity, noise, and changing environments, while Computational Intelligence (CI) offers solutions to complicated problems as well as reverse problems. The main feature of CI is adaptability, and it spans the fields of machine learning and computational neuroscience. CI also comprises biologically inspired technologies such as swarm intelligence as part of evolutionary computation, and it encompasses wider areas such as image processing, data collection, and natural language processing. This book discusses the use of CI for solving a variety of applications, demonstrating its wide reach and relevance. Combining optimization methods with data mining strategies yields a strong and reliable prediction tool for handling real-life applications

    People detection and tracking in crowded scenes

    People are often a central element of visual scenes, particularly in real-world street scenes. Thus it has been a long-standing goal in Computer Vision to develop methods aiming at analyzing humans in visual data. Due to the complexity of real-world scenes, visual understanding of people remains challenging for machine perception. In this thesis we focus on advancing the techniques for people detection and tracking in crowded street scenes. We also propose new models for human pose estimation and motion segmentation in realistic images and videos. First, we propose detection models that are jointly trained to detect a single person as well as pairs of people under varying degrees of occlusion. The learning algorithm of our joint detector facilitates a tight integration of tracking and detection, because it is designed to address common failure cases during tracking due to long-term inter-object occlusions. Second, we propose novel multi-person tracking models that formulate tracking as a graph partitioning problem. Our models jointly cluster detection hypotheses in space and time, eliminating the need for a heuristic non-maximum suppression. Furthermore, for crowded scenes, our tracking model encodes long-range person re-identification information into the detection clustering process in a unified and rigorous manner. Third, we explore the visual tracking task at different granularities. We present a tracking model that simultaneously clusters object bounding boxes and pixel-level trajectories over time. This approach provides a rich understanding of the motion of objects in the scene. Last, we extend our tracking model to the multi-person pose estimation task. We introduce a joint subset partitioning and labelling model where we simultaneously estimate the poses of all the people in the scene. In summary, this thesis addresses a number of diverse tasks that aim to enable vision systems to analyze people in realistic images and videos.
In particular, the thesis proposes several novel ideas and rigorous mathematical formulations, pushes the boundary of the state of the art, and achieves superior performance.

    Deep Neural Networks and Data for Automated Driving

    This open access book brings together the latest developments from industry and research on automated driving and artificial intelligence. Environment perception for highly automated driving heavily employs deep neural networks, facing many challenges. How much data do we need for training and testing? How to use synthetic data to save labeling costs for training? How do we increase robustness and decrease memory usage? For inevitably poor conditions: How do we know that the network is uncertain about its decisions? Can we understand a bit more about what actually happens inside neural networks? This leads to a very practical problem particularly for DNNs employed in automated driving: What are useful validation techniques and how about safety? This book unites the views from both academia and industry, where computer vision and machine learning meet environment perception for highly automated driving. Naturally, aspects of data, robustness, uncertainty quantification, and, last but not least, safety are at the core of it. This book is unique: In its first part, an extended survey of all the relevant aspects is provided. The second part contains the detailed technical elaboration of the various questions mentioned above