52 research outputs found

    ParGANDA: Making Synthetic Pedestrians A Reality For Object Detection

    Object detection is a key technique for a number of computer vision applications, but it often requires large amounts of annotated data to achieve decent results. Moreover, for pedestrian detection specifically, the collected data might contain personally identifiable information (PII), which is highly restricted in many countries. This label-intensive and privacy-sensitive task has recently led to an increasing interest in training detection models on synthetically generated pedestrian datasets collected with a photo-realistic video game engine. The engine is able to generate unlimited amounts of data with precise and consistent annotations, which offers the potential for significant gains in real-world applications. However, the use of synthetic data for training introduces a synthetic-to-real domain shift that degrades the final performance. To close the gap between the real and synthetic data, we propose to use a Generative Adversarial Network (GAN), which performs parameterized unpaired image-to-image translation to generate more realistic images. The key benefit of using the GAN is its intrinsic preference for low-level appearance changes over geometric ones, which means the annotations of a given synthetic image remain accurate even after domain translation is performed, thus eliminating the need for labeling real data. We experimented extensively with the proposed method, training on the MOTSynth dataset and testing on the MOT17 and MOT20 detection datasets, with experimental results demonstrating the effectiveness of this method. Our approach not only produces visually plausible samples but also does not require any labels from the real domain, making it applicable to a variety of downstream tasks.
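
    The following is a minimal sketch of the general idea, not the paper's architecture: a translation generator (here a placeholder TinyGenerator standing in for the trained GAN generator) restyles synthetic frames while the original bounding-box annotations are reused unchanged, since unpaired image-to-image translation alters appearance rather than geometry.

    import torch
    import torch.nn as nn

    class TinyGenerator(nn.Module):
        """Stand-in for a GAN generator trained for synthetic-to-real translation."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                nn.Conv2d(16, 3, 3, padding=1), nn.Tanh(),
            )

        def forward(self, x):
            return self.net(x)

    generator = TinyGenerator().eval()            # in practice, the trained generator G
    synthetic_batch = torch.rand(4, 3, 256, 256)  # synthetic frames scaled to [0, 1]
    # Per-image pedestrian boxes (x1, y1, x2, y2) as produced by the game engine.
    boxes = [[(12, 40, 60, 120)], [], [(5, 5, 30, 90)], [(100, 20, 140, 110)]]

    with torch.no_grad():
        translated = generator(synthetic_batch * 2 - 1)  # appearance changes only

    # Geometry is untouched, so the engine annotations transfer unchanged.
    training_pairs = list(zip(translated, boxes))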

    ADVERSARY AWARE CONTINUAL LEARNING

    Continual learning approaches are useful as they help the model to learn new information (classes) sequentially, while also retaining the previously acquired information (classes). However, these approaches are adversary agnostic, i.e., they do not consider the possibility of malicious attacks. In this dissertation, we have demonstrated that continual learning approaches are extremely vulnerable to adversarial backdoor attacks, where an intelligent adversary can introduce a small amount of misinformation to the model in the form of an imperceptible backdoor pattern during training to cause deliberate forgetting of a specific class at test time. We then propose a novel defensive framework to counter such an insidious attack, in which we turn the attacker's primary strength, hiding the backdoor pattern by making it imperceptible to humans, against it: we propose to learn a perceptible (stronger) pattern (also during training) that can overpower the attacker's imperceptible (weaker) pattern. We demonstrate the effectiveness of the proposed defensive mechanism through various commonly used replay-based continual learning algorithms (both generative and exact replay) using the CIFAR-10, CIFAR-100, and MNIST benchmark datasets. Most notably, we show that our proposed defensive framework considerably improves the robustness of continual learning algorithms with ZERO knowledge of the attacker's target task, the attacker's target class, or the shape, size, and location of the attacker's pattern. The proposed defensive framework also does not depend on the underlying continual learning algorithm. We term our proposed defensive framework Adversary Aware Continual Learning (AACL).
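
    As a toy illustration of the defensive idea (not the dissertation's exact procedure), a strong, perceptible pattern can be blended into training samples so that it dominates a weak, imperceptible trigger; the checkerboard pattern, blending weight, and patch size below are arbitrary choices made for illustration.

    import numpy as np

    def apply_defensive_pattern(image, alpha=0.3, patch_size=6):
        """Blend a visible checkerboard patch into the top-left image corner."""
        defended = image.copy()
        patch = np.indices((patch_size, patch_size)).sum(axis=0) % 2   # 0/1 checkerboard
        defended[:patch_size, :patch_size] = (
            (1 - alpha) * defended[:patch_size, :patch_size] + alpha * patch
        )
        return defended

    batch = np.random.rand(8, 32, 32)                                  # grayscale stand-in images
    defended_batch = np.stack([apply_defensive_pattern(img) for img in batch])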

    Compositional Semantic Mix for Domain Adaptation in Point Cloud Segmentation

    Deep-learning models for 3D point cloud semantic segmentation exhibit limited generalization capabilities when trained and tested on data captured with different sensors or in varying environments due to domain shift. Domain adaptation methods can be employed to mitigate this domain shift, for instance, by simulating sensor noise, developing domain-agnostic generators, or training point cloud completion networks. Often, these methods are tailored for range view maps or necessitate multi-modal input. In contrast, domain adaptation in the image domain can be executed through sample mixing, which emphasizes input data manipulation rather than employing distinct adaptation modules. In this study, we introduce compositional semantic mixing for point cloud domain adaptation, representing the first unsupervised domain adaptation technique for point cloud segmentation based on semantic and geometric sample mixing. We present a two-branch symmetric network architecture capable of concurrently processing point clouds from a source domain (e.g. synthetic) and point clouds from a target domain (e.g. real-world). Each branch operates within one domain by integrating selected data fragments from the other domain and utilizing semantic information derived from source labels and target (pseudo) labels. Additionally, our method can leverage a limited number of human point-level annotations (semi-supervised) to further enhance performance. We assess our approach in both synthetic-to-real and real-to-real scenarios using LiDAR datasets and demonstrate that it significantly outperforms state-of-the-art methods in both unsupervised and semi-supervised settings. Comment: TPAMI. arXiv admin note: text overlap with arXiv:2207.0977
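
    A hedged sketch of the sample-mixing idea, heavily simplified from the two-branch setup described above (the function name and class indices are illustrative, not the paper's implementation): points of selected classes are taken from one domain's cloud and concatenated into a cloud from the other domain, together with their labels or pseudo-labels.

    import numpy as np

    def semantic_mix(points_a, labels_a, points_b, labels_b, classes_to_mix):
        """Insert points of the selected classes from cloud A into cloud B."""
        mask = np.isin(labels_a, classes_to_mix)
        mixed_points = np.concatenate([points_b, points_a[mask]], axis=0)
        mixed_labels = np.concatenate([labels_b, labels_a[mask]], axis=0)
        return mixed_points, mixed_labels

    # Source (e.g. synthetic) cloud with ground-truth labels and target (e.g. real-world)
    # cloud with pseudo-labels; coordinates and labels are random placeholders.
    src_pts, src_lbl = np.random.rand(1000, 3), np.random.randint(0, 5, 1000)
    tgt_pts, tgt_lbl = np.random.rand(800, 3), np.random.randint(0, 5, 800)
    mix_pts, mix_lbl = semantic_mix(src_pts, src_lbl, tgt_pts, tgt_lbl, classes_to_mix=[1, 3])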

    An Outlook into the Future of Egocentric Vision

    What will the future be? We wonder! In this survey, we explore the gap between current research in egocentric vision and the ever-anticipated future, where wearable computing, with outward-facing cameras and digital overlays, is expected to be integrated into our everyday lives. To understand this gap, the article starts by envisaging the future through character-based stories, showcasing through examples the limitations of current technology. We then provide a mapping between this future and previously defined research tasks. For each task, we survey its seminal works, current state-of-the-art methodologies and available datasets, then reflect on shortcomings that limit its applicability to future research. Note that this survey focuses on software models for egocentric vision, independent of any specific hardware. The paper concludes with recommendations for areas of immediate exploration so as to unlock our path to the future of always-on, personalised and life-enhancing egocentric vision. Comment: We invite comments, suggestions and corrections here: https://openreview.net/forum?id=V3974SUk1

    Systematic Adaptation of Communication-focused Machine Learning Models from Real to Virtual Environments for Human-Robot Collaboration

    Virtual reality has proved to be useful in applications in several fields, ranging from gaming, medicine, and training to the development of interfaces that enable human-robot collaboration. It empowers designers to explore applications outside of the constraints posed by the real-world environment and to develop innovative solutions and experiences. Hand gesture recognition, which has been a topic of much research and subsequent commercialization in the real world, has been possible because of the creation of large, labelled datasets. In order to utilize the power of natural and intuitive hand gestures in the virtual domain for enabling embodied teleoperation of collaborative robots, similarly large datasets must be created so as to keep the working interface easy to learn and flexible enough to add more gestures. Depending on the application, this may be computationally or economically prohibitive. Thus, the adaptation of trained deep learning models that perform well in the real environment to the virtual one may be a solution to this challenge. This paper presents a systematic framework for real-to-virtual adaptation using a limited-size virtual dataset, along with guidelines for creating a curated dataset. Finally, while hand gestures have been considered as the communication mode, the guidelines and recommendations presented are generic. They are applicable to other modes, such as body poses and facial expressions, which have large datasets available in the real domain that must be adapted to the virtual one.
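
    One common way to adapt a real-domain model with only a small virtual dataset is to freeze the pretrained feature extractor and fine-tune the task head. The sketch below illustrates that general strategy under assumed input shapes and class counts; it is not the paper's specific framework, and the placeholder backbone stands in for a network pretrained on real-world gesture data.

    import torch
    import torch.nn as nn

    backbone = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 128), nn.ReLU())  # placeholder pretrained backbone
    head = nn.Linear(128, 10)                      # 10 gesture classes (assumed)

    for p in backbone.parameters():                # keep the real-domain features fixed
        p.requires_grad = False

    optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    virtual_images = torch.rand(32, 1, 64, 64)     # small curated virtual-domain batch
    virtual_labels = torch.randint(0, 10, (32,))

    logits = head(backbone(virtual_images))        # one fine-tuning step on the head only
    loss = loss_fn(logits, virtual_labels)
    loss.backward()
    optimizer.step()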

    A Comprehensive Survey on Test-Time Adaptation under Distribution Shifts

    Machine learning methods strive to acquire a robust model during training that can generalize well to test samples, even under distribution shifts. However, these methods often suffer from a performance drop due to unknown test distributions. Test-time adaptation (TTA), an emerging paradigm, has the potential to adapt a pre-trained model to unlabeled data during testing, before making predictions. Recent progress in this paradigm highlights the significant benefits of utilizing unlabeled data for training self-adapted models prior to inference. In this survey, we divide TTA into several distinct categories, namely, test-time (source-free) domain adaptation, test-time batch adaptation, online test-time adaptation, and test-time prior adaptation. For each category, we provide a comprehensive taxonomy of advanced algorithms, followed by a discussion of different learning scenarios. Furthermore, we analyze relevant applications of TTA and discuss open challenges and promising areas for future research. A comprehensive list of TTA methods can be found at \url{https://github.com/tim-learn/awesome-test-time-adaptation}. Comment: Discussions, comments, and questions are all welcome at \url{https://github.com/tim-learn/awesome-test-time-adaptation}
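
    As one concrete example from the online test-time adaptation category, the widely used entropy-minimization strategy (in the style of TENT) updates only the normalization-layer parameters on unlabeled test batches. The sketch below is a simplified illustration with a toy model, not code from the survey.

    import torch
    import torch.nn as nn

    # Toy pre-trained classifier; in practice this would be the deployed model.
    model = nn.Sequential(nn.Linear(16, 32), nn.BatchNorm1d(32), nn.ReLU(), nn.Linear(32, 4))
    model.train()                                  # let batch statistics update at test time

    # Adapt only the affine parameters of the normalization layers.
    adapt_params = [p for m in model.modules() if isinstance(m, nn.BatchNorm1d)
                    for p in m.parameters()]
    optimizer = torch.optim.SGD(adapt_params, lr=1e-3)

    test_batch = torch.randn(64, 16)               # unlabeled test-time data
    probs = torch.softmax(model(test_batch), dim=1)
    entropy = -(probs * torch.log(probs + 1e-8)).sum(dim=1).mean()
    entropy.backward()
    optimizer.step()                               # model adapted before making predictions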

    Simultaneous Localization and Mapping (SLAM) for Autonomous Driving: Concept and Analysis

    The Simultaneous Localization and Mapping (SLAM) technique has achieved astonishing progress over the last few decades and has generated considerable interest in the autonomous driving community. With its conceptual roots in navigation and mapping, SLAM outperforms some traditional positioning and localization techniques since it can support more reliable and robust localization, planning, and control, meeting some key criteria for autonomous driving. In this study, the authors first give an overview of the different SLAM implementation approaches and then discuss the applications of SLAM for autonomous driving with respect to different driving scenarios, vehicle system components and the characteristics of the SLAM approaches. The authors then discuss some challenging issues and current solutions when applying SLAM to autonomous driving. Quantitative quality-analysis methods for evaluating the characteristics and performance of SLAM systems and for monitoring the risk in SLAM estimation are also reviewed. In addition, this study describes a real-world road test to demonstrate a multi-sensor-based modernized SLAM procedure for autonomous driving. The numerical results show that a high-precision 3D point cloud map can be generated by the SLAM procedure with the integration of Lidar and GNSS/INS. An online localization solution with four to five cm accuracy can be achieved based on this pre-generated map, online Lidar scan matching, and a tightly fused inertial system.
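
    A toy sketch of the map-based localization idea mentioned above: a GNSS/INS pose prediction is fused with a pose estimate obtained by matching the online Lidar scan against the pre-generated map. The information-form weighting is a simplified stand-in for a tightly coupled filter, and all numbers are made up for illustration.

    import numpy as np

    def fuse_poses(pose_ins, cov_ins, pose_scan, cov_scan):
        """Information-form fusion of two independent 2D pose estimates (x, y, yaw)."""
        info_ins, info_scan = np.linalg.inv(cov_ins), np.linalg.inv(cov_scan)
        cov_fused = np.linalg.inv(info_ins + info_scan)
        # Linear fusion of yaw is only valid for small angular differences.
        pose_fused = cov_fused @ (info_ins @ pose_ins + info_scan @ pose_scan)
        return pose_fused, cov_fused

    pose_ins = np.array([10.02, 5.01, 0.031])      # pose predicted by GNSS/INS integration
    pose_scan = np.array([10.05, 4.98, 0.029])     # pose from Lidar scan matching on the map
    cov_ins = np.diag([0.09, 0.09, 0.001])
    cov_scan = np.diag([0.0025, 0.0025, 0.0004])   # few-centimetre scan-matching accuracy
    pose, cov = fuse_poses(pose_ins, cov_ins, pose_scan, cov_scan)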

    Advanced traffic video analytics for robust traffic accident detection

    Automatic traffic accident detection is an important task in traffic video analysis due to its key applications in developing intelligent transportation systems. Reducing the time delay between the occurrence of an accident and the dispatch of the first responders to the scene may help lower the mortality rate and save lives. Since 1980, many approaches have been presented for the automatic detection of incidents in traffic videos. In this dissertation, some challenging problems for accident detection in traffic videos are discussed and a new framework is presented in order to automatically detect single-vehicle and intersection traffic accidents in real time. First, a new foreground detection method is applied in order to detect the moving vehicles and subtract the ever-changing background in the traffic video frames captured by static or non-stationary cameras. For traffic videos captured during the daytime, cast shadows degrade the performance of foreground detection and road segmentation. A novel cast shadow detection method is therefore presented to detect and remove the shadows cast by moving vehicles, as well as the shadows cast by static objects on the road. Second, a new method is presented to detect the region of interest (ROI), which uses the locations of the moving vehicles and the initial road samples and extracts discriminating features to segment the road region. After detecting the ROI, the moving direction of the traffic is estimated, based on the rationale that crashed vehicles often make a rapid change of direction. Lastly, single-vehicle traffic accidents and trajectory conflicts are detected using a first-order logic decision-making system. The experimental results using publicly available videos and a dataset provided by the New Jersey Department of Transportation (NJDOT) demonstrate the feasibility of the proposed methods. Additionally, the main challenges and future directions are discussed regarding (i) improving the performance of the foreground segmentation, (ii) reducing the computational complexity, and (iii) detecting other types of traffic accidents.
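
    As a generic illustration of the foreground and cast-shadow step (not the dissertation's specific detectors), OpenCV's MOG2 background subtractor labels shadow pixels separately from moving foreground, so they can be discarded before the vehicles are extracted; the random frame below stands in for real video input.

    import cv2
    import numpy as np

    subtractor = cv2.createBackgroundSubtractorMOG2(detectShadows=True)

    # A random frame stands in for a real traffic video frame.
    frame = np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8)
    mask = subtractor.apply(frame)                 # 255 = foreground, 127 = shadow, 0 = background
    vehicle_mask = np.where(mask == 255, 255, 0).astype(np.uint8)   # discard shadow pixels

    # Moving vehicles would then be extracted as connected components of the mask.
    num_components, component_labels = cv2.connectedComponents(vehicle_mask)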

    LiDAR Domain Adaptation - Automotive 3D Scene Understanding

    Environment perception and scene understanding play an essential role in autonomous vehicles. A vehicle must be aware of the geometry and semantics of its surroundings in order to predict the behavior of other road users and to localize itself within the drivable space so that it can navigate correctly. Today, virtually all modern perception systems for automated driving use deep neural networks. Training them requires enormous amounts of data with suitable annotations. Acquiring the data is relatively inexpensive, since a vehicle equipped with the right sensors only has to be driven around. Creating annotations, however, is a very time-consuming and expensive process. To make matters worse, autonomous vehicles must be deployed practically everywhere (e.g. in Europe and Asia, in the countryside and in the city) and at any time (e.g. day and night, summer and winter, rain and fog). This requires the data to cover an even larger number of different scenarios and domains. It is not practical to collect and annotate data for such a variety of domains. However, training only with data from a single domain leads to poor performance in another target domain due to differences in the data. For a safety-critical application this is not acceptable. The field of so-called domain adaptation introduces methods that help to close these domain gaps without using annotations from the target domain, thereby working towards the development of scalable perception systems. The majority of work on domain adaptation focuses on two-dimensional camera perception. In autonomous vehicles, however, a three-dimensional understanding of the scene is essential, for which LiDAR sensors are commonly used today. This dissertation addresses domain adaptation for LiDAR perception from several angles. First, a set of techniques is presented that improves the performance and runtime of semantic segmentation systems. The insights gained are integrated into the perception model used in this dissertation to evaluate the effectiveness of the proposed domain adaptation approaches. Second, existing approaches are discussed and research gaps are identified by formulating open research questions. To answer some of these questions, this dissertation introduces a novel quantitative metric that makes it possible to estimate the realism of LiDAR data, which is decisive for the performance of a perception system. The metric is used to assess the quality of LiDAR point clouds generated for the purpose of domain mapping, in which data are transferred from one domain to another. This allows annotations from a source domain to be reused in the target domain. In another area of domain adaptation, this dissertation proposes a novel method that exploits the geometry of the scene to learn domain-invariant features. The geometric information helps to improve the domain adaptation capabilities of the segmentation model and to achieve the best performance without additional overhead at inference time. Finally, a novel method is proposed for generating semantically meaningful object shapes from continuous descriptions, which, with additional work, can be used to augment scenes in order to improve the recognition capabilities of the models. In summary, this dissertation presents a comprehensive system for domain adaptation and semantic segmentation of LiDAR point clouds in the context of autonomous driving.

    Deep Feature Learning and Adaptation for Computer Vision

    We are living in times when a revolution in deep learning is taking place. In general, deep learning models have a backbone that extracts features from the input data, followed by task-specific layers, e.g. for classification. This dissertation proposes various deep feature extraction and adaptation methods to improve task-specific learning, such as visual re-identification, tracking, and domain adaptation. The vehicle re-identification (VRID) task requires identifying a given vehicle among a set of vehicles under variations in viewpoint, illumination, partial occlusion, and background clutter. We propose a novel local graph aggregation module for feature extraction to improve VRID performance. We also utilize a class-balanced loss to compensate for the unbalanced class distribution in the training dataset. Overall, our framework achieves state-of-the-art (SOTA) performance on multiple VRID benchmarks. We further extend our VRID method to visual object tracking under occlusion conditions. We motivate visual object tracking from aerial platforms by benchmarking tracking methods on aerial datasets. Our study reveals that current techniques have limited capabilities to re-identify objects when fully occluded or out of view. Siamese network based trackers perform well compared to others in overall tracking performance. We utilize our VRID work in visual object tracking and propose Siam-ReID, a novel tracking method using a Siamese network and the VRID technique. In another approach, we propose SiamGauss, a novel Siamese network with a Gaussian Head for improved confuser suppression and real-time performance. Our approach achieves SOTA performance on aerial visual object tracking datasets. A related area of research is developing deep learning based domain adaptation techniques. We propose continual unsupervised domain adaptation, a novel paradigm for domain adaptation in data-constrained environments. We show that existing works fail to generalize when the target domain data are acquired in small batches. We propose to use a buffer to store samples previously seen by the network, together with a novel loss function, to improve the performance of continual domain adaptation. We further extend our continual unsupervised domain adaptation research to gradually varying domains. Our method outperforms several SOTA methods even though they have the entire domain data available during adaptation.
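
    A simplified sketch of the buffer idea for continual unsupervised domain adaptation (the novel loss itself is not reproduced here, and the buffer capacity and batch shapes are arbitrary): target samples arriving in small batches are kept in a bounded reservoir and mixed with the current batch so that adaptation does not overfit to the most recent data.

    import random
    import torch

    class ReplayBuffer:
        """Bounded reservoir of previously seen target-domain samples."""
        def __init__(self, capacity=512):
            self.capacity, self.samples, self.seen = capacity, [], 0

        def add(self, batch):
            for x in batch:                        # reservoir sampling keeps a uniform subset
                self.seen += 1
                if len(self.samples) < self.capacity:
                    self.samples.append(x)
                else:
                    j = random.randrange(self.seen)
                    if j < self.capacity:
                        self.samples[j] = x

        def sample(self, n):
            return torch.stack(random.sample(self.samples, min(n, len(self.samples))))

    buffer = ReplayBuffer()
    incoming = torch.randn(16, 3, 32, 32)          # small unlabeled target-domain batch
    buffer.add(incoming)
    adaptation_batch = torch.cat([incoming, buffer.sample(16)], dim=0)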