3,660 research outputs found
Multi-task near-field perception for autonomous driving using surround-view fisheye cameras
Die Bildung der Augen führte zum Urknall der Evolution. Die Dynamik änderte sich von einem primitiven Organismus, der auf den Kontakt mit der Nahrung wartete, zu einem Organismus, der durch visuelle Sensoren gesucht wurde. Das menschliche Auge ist eine der raffiniertesten Entwicklungen der Evolution, aber es hat immer noch Mängel. Der Mensch hat über Millionen von Jahren einen biologischen Wahrnehmungsalgorithmus entwickelt, der in der Lage ist, Autos zu fahren, Maschinen zu bedienen, Flugzeuge zu steuern und Schiffe zu navigieren. Die Automatisierung dieser Fähigkeiten für Computer ist entscheidend für verschiedene Anwendungen, darunter selbstfahrende Autos, Augmented Realität und architektonische Vermessung. Die visuelle Nahfeldwahrnehmung im Kontext von selbstfahrenden Autos kann die Umgebung in einem Bereich von 0 - 10 Metern und 360° Abdeckung um das Fahrzeug herum wahrnehmen. Sie ist eine entscheidende Entscheidungskomponente bei der Entwicklung eines sichereren automatisierten Fahrens. Jüngste Fortschritte im Bereich Computer Vision und Deep Learning in Verbindung mit hochwertigen Sensoren wie Kameras und LiDARs haben ausgereifte Lösungen für die visuelle Wahrnehmung hervorgebracht. Bisher stand die Fernfeldwahrnehmung im Vordergrund. Ein weiteres wichtiges Problem ist die begrenzte Rechenleistung, die für die Entwicklung von Echtzeit-Anwendungen zur Verfügung steht. Aufgrund dieses Engpasses kommt es häufig zu einem Kompromiss zwischen Leistung und Laufzeiteffizienz. Wir konzentrieren uns auf die folgenden Themen, um diese anzugehen: 1) Entwicklung von Nahfeld-Wahrnehmungsalgorithmen mit hoher Leistung und geringer Rechenkomplexität für verschiedene visuelle Wahrnehmungsaufgaben wie geometrische und semantische Aufgaben unter Verwendung von faltbaren neuronalen Netzen. 2) Verwendung von Multi-Task-Learning zur Überwindung von Rechenengpässen durch die gemeinsame Nutzung von initialen Faltungsschichten zwischen den Aufgaben und die Entwicklung von Optimierungsstrategien, die die Aufgaben ausbalancieren.The formation of eyes led to the big bang of evolution. The dynamics changed from a primitive organism waiting for the food to come into contact for eating food being sought after by visual sensors. The human eye is one of the most sophisticated developments of evolution, but it still has defects. Humans have evolved a biological perception algorithm capable of driving cars, operating machinery, piloting aircraft, and navigating ships over millions of years. Automating these capabilities for computers is critical for various applications, including self-driving cars, augmented reality, and architectural surveying. Near-field visual perception in the context of self-driving cars can perceive the environment in a range of 0 - 10 meters and 360° coverage around the vehicle. It is a critical decision-making component in the development of safer automated driving. Recent advances in computer vision and deep learning, in conjunction with high-quality sensors such as cameras and LiDARs, have fueled mature visual perception solutions. Until now, far-field perception has been the primary focus. Another significant issue is the limited processing power available for developing real-time applications. Because of this bottleneck, there is frequently a trade-off between performance and run-time efficiency. We concentrate on the following issues in order to address them: 1) Developing near-field perception algorithms with high performance and low computational complexity for various visual perception tasks such as geometric and semantic tasks using convolutional neural networks. 2) Using Multi-Task Learning to overcome computational bottlenecks by sharing initial convolutional layers between tasks and developing optimization strategies that balance tasks
3D Adversarial Augmentations for Robust Out-of-Domain Predictions
Since real-world training datasets cannot properly sample the long tail of
the underlying data distribution, corner cases and rare out-of-domain samples
can severely hinder the performance of state-of-the-art models. This problem
becomes even more severe for dense tasks, such as 3D semantic segmentation,
where points of non-standard objects can be confidently associated to the wrong
class. In this work, we focus on improving the generalization to out-of-domain
data. We achieve this by augmenting the training set with adversarial examples.
First, we learn a set of vectors that deform the objects in an adversarial
fashion. To prevent the adversarial examples from being too far from the
existing data distribution, we preserve their plausibility through a series of
constraints, ensuring sensor-awareness and shapes smoothness. Then, we perform
adversarial augmentation by applying the learned sample-independent vectors to
the available objects when training a model. We conduct extensive experiments
across a variety of scenarios on data from KITTI, Waymo, and CrashD for 3D
object detection, and on data from SemanticKITTI, Waymo, and nuScenes for 3D
semantic segmentation. Despite training on a standard single dataset, our
approach substantially improves the robustness and generalization of both 3D
object detection and 3D semantic segmentation methods to out-of-domain data.Comment: 37 pages, 12 figure
Machine Learning in Adversarial Environments
Machine Learning, especially Deep Neural Nets (DNNs), has achieved great success in a variety of applications. Unlike classical algorithms that could be formally analyzed, there is less understanding of neural network-based learning algorithms. This lack of understanding through either formal methods or empirical observations results in potential vulnerabilities that could be exploited by adversaries. This also hinders the deployment and adoption of learning methods in security-critical systems.
Recent works have demonstrated that DNNs are vulnerable to carefully crafted adversarial perturbations. We refer to data instances with added adversarial perturbations as “adversarial examples”. Such adversarial examples can mislead DNNs to produce adversary-selected results. Furthermore, it can cause a DNN system to misbehavior in unexpected and potentially dangerous ways. In this context, in this thesis, we focus on studying the security problem of current DNNs from the viewpoints of both attack and defense.
First, we explore the space of attacks against DNNs during the test time. We revisit the integrity of Lp regime and propose a new and rigorous threat model of adversarial examples. Based on this new threat model, we present the technique to generate adversarial examples in the digital space.
Second, we study the physical consequence of adversarial examples in the 3D and physical spaces. We first study the vulnerabilities of various vision systems by simulating the photo0taken process by using the physical renderer. To further explore the physical consequence in the real world, we select the safety-critical application of autonomous driving as the target system and study the vulnerability of the LiDAR-perceptual module. These studies show the potentially severe consequences of adversarial examples and raise awareness on its risks.
Last but not least, we develop solutions to defend against adversarial examples. We propose a consistency-check based method to detect adversarial examples by leveraging property of either the learning model or the data. We show two examples in the segmentation task (leveraging learning model) and video data (leveraging the data), respectively.PHDComputer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/162944/1/xiaocw_1.pd
Deep Learning for Face Anti-Spoofing: A Survey
Face anti-spoofing (FAS) has lately attracted increasing attention due to its
vital role in securing face recognition systems from presentation attacks
(PAs). As more and more realistic PAs with novel types spring up, traditional
FAS methods based on handcrafted features become unreliable due to their
limited representation capacity. With the emergence of large-scale academic
datasets in the recent decade, deep learning based FAS achieves remarkable
performance and dominates this area. However, existing reviews in this field
mainly focus on the handcrafted features, which are outdated and uninspiring
for the progress of FAS community. In this paper, to stimulate future research,
we present the first comprehensive review of recent advances in deep learning
based FAS. It covers several novel and insightful components: 1) besides
supervision with binary label (e.g., '0' for bonafide vs. '1' for PAs), we also
investigate recent methods with pixel-wise supervision (e.g., pseudo depth
map); 2) in addition to traditional intra-dataset evaluation, we collect and
analyze the latest methods specially designed for domain generalization and
open-set FAS; and 3) besides commercial RGB camera, we summarize the deep
learning applications under multi-modal (e.g., depth and infrared) or
specialized (e.g., light field and flash) sensors. We conclude this survey by
emphasizing current open issues and highlighting potential prospects.Comment: IEEE Transactions on Pattern Analysis and Machine Intelligence
(TPAMI
Deep Learning-Based Human Pose Estimation: A Survey
Human pose estimation aims to locate the human body parts and build human
body representation (e.g., body skeleton) from input data such as images and
videos. It has drawn increasing attention during the past decade and has been
utilized in a wide range of applications including human-computer interaction,
motion analysis, augmented reality, and virtual reality. Although the recently
developed deep learning-based solutions have achieved high performance in human
pose estimation, there still remain challenges due to insufficient training
data, depth ambiguities, and occlusion. The goal of this survey paper is to
provide a comprehensive review of recent deep learning-based solutions for both
2D and 3D pose estimation via a systematic analysis and comparison of these
solutions based on their input data and inference procedures. More than 240
research papers since 2014 are covered in this survey. Furthermore, 2D and 3D
human pose estimation datasets and evaluation metrics are included.
Quantitative performance comparisons of the reviewed methods on popular
datasets are summarized and discussed. Finally, the challenges involved,
applications, and future research directions are concluded. We also provide a
regularly updated project page: \url{https://github.com/zczcwh/DL-HPE
Contextual Social Networking
The thesis centers around the multi-faceted research question of how contexts may
be detected and derived that can be used for new context aware Social Networking
services and for improving the usefulness of existing Social Networking services, giving
rise to the notion of Contextual Social Networking. In a first foundational part,
we characterize the closely related fields of Contextual-, Mobile-, and Decentralized
Social Networking using different methods and focusing on different detailed
aspects. A second part focuses on the question of how short-term and long-term
social contexts as especially interesting forms of context for Social Networking may
be derived. We focus on NLP based methods for the characterization of social relations
as a typical form of long-term social contexts and on Mobile Social Signal
Processing methods for deriving short-term social contexts on the basis of geometry
of interaction and audio. We furthermore investigate, how personal social agents
may combine such social context elements on various levels of abstraction. The third
part discusses new and improved context aware Social Networking service concepts.
We investigate special forms of awareness services, new forms of social information
retrieval, social recommender systems, context aware privacy concepts and services
and platforms supporting Open Innovation and creative processes.
This version of the thesis does not contain the included publications because of
copyrights of the journals etc. Contact in terms of the version with all included
publications: Georg Groh, [email protected] zentrale Gegenstand der vorliegenden Arbeit ist die vielschichtige Frage, wie Kontexte detektiert und abgeleitet werden können, die dazu dienen können, neuartige kontextbewusste Social Networking Dienste zu schaffen und bestehende Dienste in ihrem Nutzwert zu verbessern. Die (noch nicht abgeschlossene) erfolgreiche Umsetzung dieses Programmes führt auf ein Konzept, das man als Contextual Social Networking bezeichnen kann. In einem grundlegenden ersten Teil werden die eng zusammenhängenden Gebiete Contextual Social Networking, Mobile Social Networking und Decentralized Social Networking mit verschiedenen Methoden und unter Fokussierung auf verschiedene Detail-Aspekte näher beleuchtet und in Zusammenhang gesetzt. Ein zweiter Teil behandelt die Frage, wie soziale Kurzzeit- und Langzeit-Kontexte als für das Social Networking besonders interessante Formen von Kontext gemessen und abgeleitet werden können. Ein Fokus liegt hierbei auf NLP Methoden zur Charakterisierung sozialer Beziehungen als einer typischen Form von sozialem Langzeit-Kontext. Ein weiterer Schwerpunkt liegt auf Methoden aus dem Mobile Social Signal Processing zur Ableitung sinnvoller sozialer Kurzzeit-Kontexte auf der Basis von Interaktionsgeometrien und Audio-Daten. Es wird ferner untersucht, wie persönliche soziale Agenten Kontext-Elemente verschiedener Abstraktionsgrade miteinander kombinieren können. Der dritte Teil behandelt neuartige und verbesserte Konzepte für kontextbewusste Social Networking Dienste. Es werden spezielle Formen von Awareness Diensten, neue Formen von sozialem Information Retrieval, Konzepte für kontextbewusstes Privacy Management und Dienste und Plattformen zur Unterstützung von Open Innovation und Kreativität untersucht und vorgestellt. Diese Version der Habilitationsschrift enthält die inkludierten Publikationen zurVermeidung von Copyright-Verletzungen auf Seiten der Journals u.a. nicht. Kontakt in Bezug auf die Version mit allen inkludierten Publikationen: Georg Groh, [email protected]
- …