
    Toward Efficient and Robust Computer Vision for Large-Scale Edge Applications

    The past decade has witnessed remarkable advancements in computer vision and deep learning algorithms, ushering in a transformative wave of large-scale edge applications across various industries. These image processing methods, however, still encounter numerous challenges in meeting real-world demands, especially in terms of accuracy and latency at scale. Indeed, striking a balance among efficiency, robustness, and scalability remains a common obstacle. This dissertation investigates these issues in the context of different computer vision tasks, including image classification, semantic segmentation, depth estimation, and object detection. We introduce novel solutions focusing on adjustable neural networks, joint multi-task architecture search, and generalized supervision interpolation. The first obstacle revolves around the ability to trade off speed and accuracy in convolutional neural networks (CNNs) during inference on resource-constrained platforms. Despite their progress, CNNs are typically monolithic at runtime, which can present practical difficulties since computational budgets may vary over time. To address this, we introduce the Any-Width Network, an adjustable-width CNN architecture that utilizes a novel Triangular Convolution module to enable fine-grained control over the speed-accuracy trade-off during inference. The second challenge concerns the computationally demanding nature of dense prediction tasks such as semantic segmentation and depth estimation. This issue becomes especially problematic for edge platforms with limited resources. To tackle this, we propose a novel and scalable framework named EDNAS, which leverages the synergistic relationship between Multi-Task Learning and hardware-aware Neural Architecture Search to significantly enhance the on-device speed and accuracy of dense predictions. Finally, to improve the robustness of object detection, we introduce a novel data mixing augmentation. While mixing techniques such as Mixup have proven successful in image classification, their application to object detection is non-trivial due to spatial misalignment, foreground/background distinction, and instance multiplicity. To address these issues, we propose a generalized data mixing principle, Supervision Interpolation, and its simple yet effective implementation, LossMix. By addressing these challenges, this dissertation aims to improve the efficiency, accuracy, and scalability of computer vision and deep learning algorithms and to contribute to the advancement of large-scale edge applications across different domains.
    Doctor of Philosophy
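
    The width-adjustable inference described above can be pictured as a convolution whose filters are sliced down to the first k channels at run time. The following is a minimal sketch of that general idea in PyTorch; the dissertation's Triangular Convolution module is more refined, and the class and parameter names here are purely illustrative.

```python
# Minimal sketch of an adjustable-width convolution (illustrative only;
# the Any-Width Network's Triangular Convolution differs in detail).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SlimmableConv2d(nn.Conv2d):
    """Conv layer that can run with only the first k input/output channels,
    trading accuracy for speed at inference time."""

    def forward(self, x, width_mult=1.0):
        out_ch = max(1, int(self.out_channels * width_mult))
        in_ch = x.shape[1]
        weight = self.weight[:out_ch, :in_ch]          # slice the filter bank
        bias = self.bias[:out_ch] if self.bias is not None else None
        return F.conv2d(x, weight, bias, self.stride,
                        self.padding, self.dilation, self.groups)

# Usage: one trained layer serves multiple compute budgets.
layer = SlimmableConv2d(64, 128, kernel_size=3, padding=1)
x = torch.randn(1, 64, 32, 32)
full = layer(x, width_mult=1.0)    # 128 output channels
slim = layer(x, width_mult=0.25)   # 32 output channels, ~4x fewer filters
```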

    CoLo-CAM: Class Activation Mapping for Object Co-Localization in Weakly-Labeled Unconstrained Videos

    Weakly supervised video object localization (WSVOL) methods often rely only on visual and motion cues, making them susceptible to inaccurate localization. Recently, discriminative models have been explored using a temporal class activation mapping (CAM) method. Although their results are promising, objects are assumed to move little from frame to frame, which degrades performance on relatively long-term dependencies. In this paper, a novel CoLo-CAM method for WSVOL is proposed that leverages spatiotemporal information in activation maps during training without making assumptions about object position. Given a sequence of frames, localization is learned jointly and explicitly from color cues across the corresponding maps, under the assumption that an object keeps a similar color across adjacent frames. CAM activations are constrained to respond similarly over pixels with similar colors, achieving co-localization. This joint learning creates direct communication among pixels across all image locations and over all frames, allowing for transfer, aggregation, and correction of learned localization, leading to better localization performance. This is achieved by minimizing the color term of a conditional random field (CRF) loss over a sequence of frames/CAMs. Empirical experiments on two challenging YouTube-Objects datasets of unconstrained videos show the merits of our method and its robustness to long-term dependencies, leading to new state-of-the-art performance for WSVOL.
    Comment: 16 pages, 8 figures
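
    For intuition, the color term described above can be approximated as a pairwise consistency penalty: pixel pairs with similar colors should receive similar CAM activations. The sketch below is a hedged, simplified stand-in (random pixel pairs and a Gaussian color kernel), not the authors' CRF implementation; all names and default values are illustrative.

```python
# Simplified color-affinity consistency term in the spirit of the CRF
# color loss described above (not the CoLo-CAM implementation).
import torch

def color_affinity_loss(cams, images, num_pairs=4096, sigma=0.1):
    """Encourage CAM activations to agree on pixel pairs of similar color.

    cams:   (B, H*W) activation maps, values in [0, 1]
    images: (B, H*W, 3) RGB values in [0, 1]
    """
    B, N = cams.shape
    i = torch.randint(0, N, (B, num_pairs), device=cams.device)
    j = torch.randint(0, N, (B, num_pairs), device=cams.device)
    ci = torch.gather(images, 1, i.unsqueeze(-1).expand(-1, -1, 3))
    cj = torch.gather(images, 1, j.unsqueeze(-1).expand(-1, -1, 3))
    # Color affinity: near 1 for similar colors, near 0 otherwise.
    w = torch.exp(-((ci - cj) ** 2).sum(-1) / (2 * sigma ** 2))
    ai = torch.gather(cams, 1, i)
    aj = torch.gather(cams, 1, j)
    # Penalize activation disagreement only where colors match.
    return (w * (ai - aj) ** 2).mean()
```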

    Deep Learning for Time-Series Analysis of Optical Satellite Imagery

    In this cumulative thesis, I cover four papers on time-series analysis of optical satellite imagery. The contribution is split into two parts. The first introduces DENETHOR and DynamicEarthNet, two landmark datasets with high-quality ground truth data for agricultural monitoring and change detection. The second introduces SiROC and SemiSiROC, two methodological contributions to label-efficient change detection.

    Exploring Hyperspectral Imaging and 3D Convolutional Neural Network for Stress Classification in Plants

    Hyperspectral imaging (HSI) has emerged as a transformative imaging technology, characterized by its ability to capture a wide spectrum of light, including wavelengths beyond the visible range. This approach differs significantly from traditional imaging methods such as RGB imaging, which uses three color channels, and multispectral imaging, which captures several discrete spectral bands. Through this approach, HSI offers detailed spectral signatures for each pixel, facilitating a more nuanced analysis of the imaged subjects. This capability is particularly beneficial in applications such as agriculture, where it can detect changes in the physiological and structural characteristics of crops. Moreover, the ability of HSI to monitor these changes over time is advantageous for observing how subjects respond to different environmental conditions or treatments. However, the high-dimensional nature of hyperspectral data presents challenges in data processing and feature extraction. Traditional machine learning algorithms often struggle to handle such complexity. This is where 3D convolutional neural networks (CNNs) become valuable. Unlike 1D-CNNs, which extract features from the spectral dimension, and 2D-CNNs, which focus on spatial dimensions, 3D-CNNs can process data across both spectral and spatial dimensions, making them adept at extracting complex features from hyperspectral data. In this thesis, we explored the potential of HSI combined with 3D-CNNs in the agricultural domain, where plant health and vitality are paramount. To evaluate this, we subjected lettuce plants to varying stress levels and assessed how well the method classifies stressed lettuce at early stages of growth into the respective stress-level groups. For this study, we created a dataset comprising 88 hyperspectral image samples of stressed lettuce. Utilizing Bayesian optimization, we developed 350 distinct 3D-CNN models to assess the method. The top-performing model achieved a test accuracy of 75.00%. Additionally, we addressed the challenge of generating valid 3D-CNN models in the Keras Tuner library through meticulous hyperparameter configuration. Our investigation also extends to the role of individual channels and channel groups within the color and near-infrared spectrum in predicting the results for each stress-level group. We observed that the red and green spectra have a higher influence on the prediction results. Furthermore, we conducted a comprehensive review of 3D-CNN-based classification techniques for diseased and defective crops using non-UAV-based hyperspectral images.
    MITACS
    Master of Science in Applied Computer Science
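
    A 3D-CNN of the kind explored here convolves jointly over the spectral and spatial dimensions of a hyperspectral patch. Below is a minimal, hypothetical PyTorch sketch of that idea; the thesis tuned its architectures with Bayesian optimization in the Keras Tuner library, so the layer choices and sizes here are purely illustrative.

```python
# Minimal 3D-CNN sketch for hyperspectral patch classification
# (illustrative; not one of the thesis's tuned architectures).
import torch
import torch.nn as nn

class HSI3DCNN(nn.Module):
    def __init__(self, num_classes=3):
        super().__init__()
        self.features = nn.Sequential(
            # Input: (B, 1, bands, H, W); kernels span spectral AND spatial dims.
            nn.Conv3d(1, 8, kernel_size=(7, 3, 3), padding=(3, 1, 1)),
            nn.ReLU(),
            nn.MaxPool3d((2, 2, 2)),
            nn.Conv3d(8, 16, kernel_size=(5, 3, 3), padding=(2, 1, 1)),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        self.classifier = nn.Linear(16, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

# E.g. a 64x64 patch with 224 spectral bands, three stress-level groups.
model = HSI3DCNN(num_classes=3)
logits = model(torch.randn(2, 1, 224, 64, 64))  # -> (2, 3)
```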

    Lifelong Learning in the Clinical Open World

    Despite mounting evidence that data drift causes deep learning models to deteriorate over time, the majority of medical imaging research is developed for, and evaluated on, static closed-world environments. There have been exciting advances in the automatic detection and segmentation of diagnostically relevant findings. Yet the few studies that attempt to validate their performance in actual clinics are met with disappointing results and little utility as perceived by healthcare professionals. This is largely due to the many factors that introduce shifts in medical image data distribution, from changes in acquisition practices to naturally occurring variations in the patient population and disease manifestation. If we truly wish to leverage deep learning technologies to alleviate the workload of clinicians and drive forward the democratization of health care, we must move away from closed-world assumptions and start designing systems for the dynamic open world. This entails, first, the establishment of reliable quality assurance mechanisms with methods from the fields of uncertainty estimation, out-of-distribution detection, and domain-aware prediction appraisal. The first part of the thesis summarizes my contributions to this area. I first propose two approaches that identify outliers by monitoring a self-supervised objective or by quantifying the distance to training samples in a low-dimensional latent space. I then explore how to maximize the diversity among members of a deep ensemble for improved calibration and robustness, and present a lightweight method to detect low-quality lung lesion segmentation masks using domain knowledge. Of course, detecting failures is only the first step. We ideally want to train models that are reliable in the open world for a large portion of the data. Out-of-distribution generalization and domain adaptation may increase robustness, but only to a certain extent. As time goes on, models can only maintain acceptable performance if they continue learning with newly acquired cases that reflect changes in the data distribution. The goal of continual learning is to adapt to changes in the environment without forgetting previous knowledge. One practical strategy is expansion, whereby multiple parametrizations of the model are trained and the most appropriate one is selected during inference. In the second part of the thesis, I present two expansion-based methods that do not rely on information about when or how the data distribution changes. Even when appropriate mechanisms are in place to fail safely and accumulate knowledge over time, this will only translate to clinical usage insofar as the regulatory framework allows it. Current regulations in the USA and the European Union only authorize locked systems that do not learn post-deployment. Fortunately, regulatory bodies are noting the need for a modern lifecycle regulatory approach. I review these efforts, along with other practical aspects of developing systems that learn throughout their lifecycle, in the third part of the thesis. We are finally at a stage where healthcare professionals and regulators are embracing deep learning. The number of commercially available diagnostic radiology systems is also quickly rising. This opens up our chance, and responsibility, to show that these systems can be safe and effective throughout their lifespan.
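
    One of the strategies summarized above, scoring outliers by their distance to training samples in a low-dimensional latent space, can be sketched as a k-nearest-neighbor distance in embedding space. The snippet below is a generic illustration under that assumption, not the thesis's exact method; `encoder`, the loader format, and the threshold calibration are placeholders.

```python
# Generic latent-distance OOD scoring sketch (illustrative, not the
# thesis's implementation).
import torch

@torch.no_grad()
def ood_score(encoder, train_loader, x, k=5):
    """Score x by the mean distance to its k nearest training embeddings;
    larger scores suggest out-of-distribution inputs."""
    bank = torch.cat([encoder(batch) for batch, _ in train_loader])  # (N, d)
    z = encoder(x)                                                   # (B, d)
    dists = torch.cdist(z, bank)                                     # (B, N)
    return dists.topk(k, largest=False).values.mean(dim=1)           # (B,)

# A case would be flagged for review when its score exceeds a threshold
# calibrated on held-out in-distribution data.
```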

    Prompting classes: Exploring the Power of Prompt Class Learning in Weakly Supervised Semantic Segmentation

    Recently, CLIP-based approaches have exhibited remarkable performance on generalization and few-shot learning tasks, fueled by the power of contrastive language-vision pre-training. In particular, prompt tuning has emerged as an effective strategy to adapt pre-trained language-vision models to downstream tasks by employing task-related textual tokens. Motivated by this progress, in this work we question whether other fundamental problems, such as weakly supervised semantic segmentation (WSSS), can benefit from prompt tuning. Our findings reveal two interesting observations that shed light on the impact of prompt tuning on WSSS. First, modifying only the class token of the text prompt has a greater impact on the Class Activation Map (CAM) than arguably more complex strategies that optimize the context. Second, the class token associated with the image ground truth does not necessarily correspond to the category that yields the best CAM. Motivated by these observations, we introduce a novel approach based on a PrOmpt cLass lEarning (POLE) strategy. Through extensive experiments we demonstrate that our simple yet efficient approach achieves SOTA performance on a well-known WSSS benchmark. These results highlight not only the benefits of language-vision models in WSSS but also the potential of prompt learning for this problem. The code is available at https://github.com/rB080/WSS_POLE.
    Comment: Under review
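
    The second observation suggests scanning candidate class tokens and keeping the one whose CAM scores best. The sketch below illustrates that selection loop only; `clip_cam` and the coverage score against a coarse `seed_mask` are hypothetical stand-ins, not the POLE implementation.

```python
# Hypothetical class-token selection loop motivated by the observation
# above (not the authors' POLE code).
def best_class_token(image, candidates, clip_cam, seed_mask):
    """Pick the candidate class token whose CAM best matches a coarse
    foreground estimate.

    candidates: class-token strings (e.g. the label and its synonyms)
    clip_cam(image, prompt) -> (H, W) activation map in [0, 1]
    seed_mask: coarse (H, W) foreground estimate, used only for scoring
    """
    best, best_score = None, float("-inf")
    for token in candidates:
        cam = clip_cam(image, f"a photo of a {token}")
        # Precision-style proxy: how much CAM mass falls inside the seed.
        score = float((cam * seed_mask).sum() / (cam.sum() + 1e-6))
        if score > best_score:
            best, best_score = token, score
    return best
```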

    Pre-Trained Driving in Localized Surroundings with Semantic Radar Information and Machine Learning

    Along the signal processing chain from radar detections to vehicle control, this work discusses a semantic radar segmentation, a radar SLAM built on top of it, and an autonomous parking function realized by their combination. Semantic segmentation of the (static) radar environment is achieved with a radar-specific neural network, RadarNet. This segmentation enables the development of the semantic radar graph-SLAM SERALOC. Based on the semantic radar SLAM map, an exemplary autonomous parking functionality is implemented in a real test vehicle. Along a recorded reference path, the function parks solely on the basis of radar perception, with previously unmatched positioning accuracy. In a first step, a dataset of 8.2 · 10^6 point-wise semantically labeled radar point clouds is generated over a route of 2507.35 m. No comparable datasets of this annotation level and radar specification are publicly available. Supervised training of the semantic segmentation network RadarNet achieves 28.97% mIoU on six classes. In addition, an automated radar labeling framework, SeRaLF, is presented, which supports radar labeling multimodally by means of reference cameras and LiDAR. For coherent mapping, a radar signal pre-filter based on an activation map is designed, which suppresses noise and other dynamic multipath reflections. A graph-SLAM front end specifically adapted to radar, with radar odometry edges between submaps and semantically separate NDT registration, assembles the pre-filtered semantic radar scans into a consistent metric map. Mapping accuracy and data association are thereby improved, and the first semantic radar graph-SLAM for arbitrary static environments is realized. Integrated into a real test vehicle, the interplay of the live RadarNet segmentation and the semantic radar graph-SLAM is evaluated by means of a purely radar-based autonomous parking functionality. Averaged over 42 autonomous parking maneuvers (∅ 3.73 km/h) with an average maneuver length of ∅ 172.75 m, a median absolute pose error of 0.235 m and an end-pose error of 0.2443 m are achieved, surpassing comparable radar localization results by ≈ 50%. The map accuracy for changed, newly mapped locations over a mapping distance of ∅ 165 m yields a map consistency of ≈ 56% at a deviation of ∅ 0.163 m. For autonomous parking, an existing trajectory planner and controller approach was used.

    Critical Points ++: An Agile Point Cloud Importance Measure for Robust Classification, Adversarial Defense and Explainable AI

    The ability to cope accurately and quickly with out-of-distribution (OOD) samples is crucial in safety-demanding real-world applications. In this work we first study the interplay between critical points of 3D point clouds and OOD samples. Our findings are that common corruptions and outliers are often interpreted as critical points. We generalize the notion of critical points into importance measures. We show that training a classification network based only on less important points dramatically improves robustness, at the cost of a minor performance loss on the clean set. We observe that normalized entropy is highly informative for corruption analysis. An adaptive threshold based on normalized entropy is suggested for selecting the set of uncritical points. Our proposed importance measure is extremely fast to compute. We show it can be used for a variety of applications, such as Explainable AI (XAI), outlier removal, uncertainty estimation, robust classification, and adversarial defense. We reach SOTA results on the latter two tasks. Code is available at: https://github.com/yossilevii100/critical_points
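
    A simple way to picture such an importance measure, in the spirit of PointNet-style critical points, is to count how many feature channels each point "wins" under global max pooling, then keep the least important points using a threshold driven by normalized entropy. The sketch below follows that reading; the paper's actual measure and threshold differ in detail.

```python
# Hedged sketch of a critical-point-style importance measure and an
# entropy-driven selection of uncritical points (illustrative only).
import torch

def point_importance(features):
    """features: (N, C) per-point features before global max pooling.
    Returns (N,) importance = fraction of channels each point maximizes."""
    winners = features.argmax(dim=0)                      # (C,) winning point ids
    counts = torch.bincount(winners, minlength=features.shape[0]).float()
    return counts / counts.sum()

def uncritical_mask(importance, eps=1e-12):
    """Adaptive selection driven by the normalized entropy of importance."""
    p = importance + eps
    h = -(p * p.log()).sum() / torch.log(torch.tensor(float(len(p))))
    # Keep a larger share of points when importance is spread out (h near 1).
    keep = int(h.item() * len(p))
    order = importance.argsort()                          # ascending importance
    mask = torch.zeros_like(importance, dtype=torch.bool)
    mask[order[:keep]] = True                             # least important points
    return mask
```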

    Remote sensing traffic scene retrieval based on learning control algorithm for robot multimodal sensing information fusion and human-machine interaction and collaboration

    In light of advancing socio-economic development and urban infrastructure, urban traffic congestion and accidents have become pressing issues. High-resolution remote sensing images are crucial for supporting urban geographic information systems (GIS), road planning, and vehicle navigation. Additionally, the emergence of robotics presents new possibilities for traffic management and road safety. This study introduces an approach that combines attention mechanisms and robotic multimodal information fusion for retrieving traffic scenes from remote sensing images. Attention mechanisms focus on specific road and traffic features, reducing computation and enhancing detail capture, while graph neural algorithms improve scene retrieval accuracy. To achieve efficient traffic scene retrieval, a robot equipped with advanced sensing technology autonomously navigates urban environments, capturing high-accuracy, wide-coverage images. This facilitates comprehensive traffic databases and real-time traffic information retrieval for precise traffic management. Extensive experiments on large-scale remote sensing datasets demonstrate the feasibility and effectiveness of the approach. In conclusion, this interdisciplinary combination of attention mechanisms, graph neural algorithms, and robotic technology represents significant progress in traffic scene retrieval from remote sensing images, with potential applications in traffic management, road safety, and urban planning.
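
    The abstract does not specify the attention module, but a generic spatial-attention block of the kind alluded to can be sketched as follows; all names are illustrative, not the paper's architecture.

```python
# Generic spatial-attention sketch (illustrative stand-in for the
# unspecified attention module described above).
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Reweights feature-map locations so that salient structures
    (e.g. roads, vehicles) dominate the pooled scene descriptor."""

    def __init__(self, channels):
        super().__init__()
        self.score = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, x):                       # x: (B, C, H, W)
        attn = torch.sigmoid(self.score(x))     # (B, 1, H, W) saliency map
        return x * attn                         # emphasized features

feats = torch.randn(2, 256, 32, 32)
out = SpatialAttention(256)(feats)              # same shape, reweighted
```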