546 research outputs found
Self-supervised learning for transferable representations
Machine learning has undeniably achieved remarkable advances thanks to large labelled datasets and supervised learning. However, this progress is constrained by the labour-intensive annotation process. It is not feasible to generate extensive labelled datasets for every problem we aim to address. Consequently, there has been a notable shift in recent times toward approaches that solely leverage raw data. Among these, self-supervised learning has emerged as a particularly powerful approach, offering scalability to massive datasets and showcasing considerable potential for effective knowledge transfer. This thesis investigates self-supervised representation learning with a strong focus on computer vision applications. We provide a comprehensive survey of self-supervised methods across various modalities, introducing a taxonomy that categorises them into four distinct families while also highlighting practical considerations for real-world implementation. Our focus thenceforth is on the computer vision modality, where we perform a comprehensive benchmark evaluation of state-of-the-art self supervised models against many diverse downstream transfer tasks. Our findings reveal that self-supervised models often outperform supervised learning across a spectrum of tasks, albeit with correlations weakening as tasks transition beyond classification, particularly for datasets with distribution shifts. Digging deeper, we investigate the influence of data augmentation on the transferability of contrastive learners, uncovering a trade-off between spatial and appearance-based invariances that generalise to real-world transformations. This begins to explain the differing empirical performances achieved by self-supervised learners on different downstream tasks, and it showcases the advantages of specialised representations produced with tailored augmentation. Finally, we introduce a novel self-supervised pre-training algorithm for object detection, aligning pre-training with downstream architecture and objectives, leading to reduced localisation errors and improved label efficiency. In conclusion, this thesis contributes a comprehensive understanding of self-supervised representation learning and its role in enabling effective transfer across computer vision tasks
Reconstruction and Synthesis of Human-Scene Interaction
In this thesis, we argue that the 3D scene is vital for understanding, reconstructing, and synthesizing human motion. We present several approaches which take the scene into consideration in reconstructing and synthesizing Human-Scene Interaction (HSI). We first observe that state-of-the-art pose estimation methods ignore the 3D scene and hence reconstruct poses that are inconsistent with the scene. We address this by proposing a pose estimation method that takes the 3D scene explicitly into account. We call our method PROX for Proximal Relationships with Object eXclusion. We leverage the data generated using PROX and build a method to automatically place 3D scans of people with clothing in scenes. The core novelty of our method is encoding the proximal relationships between the human and the scene in a novel HSI model, called POSA for Pose with prOximitieS and contActs. POSA is limited to static HSI, however. We propose a real-time method for synthesizing dynamic HSI, which we call SAMP for Scene-Aware Motion Prediction. SAMP enables virtual humans to navigate cluttered indoor scenes and naturally interact with objects. Data-driven kinematic models, like SAMP, can produce high-quality motion when applied in environments similar to those shown in the dataset. However, when applied to new scenarios, kinematic models can struggle to generate realistic behaviors that respect scene constraints. In contrast, we present InterPhys which uses adversarial imitation learning and reinforcement learning to train physically-simulated characters that perform scene interaction tasks in a physical and life-like manner
PanopticNeRF-360: Panoramic 3D-to-2D Label Transfer in Urban Scenes
Training perception systems for self-driving cars requires substantial
annotations. However, manual labeling in 2D images is highly labor-intensive.
While existing datasets provide rich annotations for pre-recorded sequences,
they fall short in labeling rarely encountered viewpoints, potentially
hampering the generalization ability for perception models. In this paper, we
present PanopticNeRF-360, a novel approach that combines coarse 3D annotations
with noisy 2D semantic cues to generate consistent panoptic labels and
high-quality images from any viewpoint. Our key insight lies in exploiting the
complementarity of 3D and 2D priors to mutually enhance geometry and semantics.
Specifically, we propose to leverage noisy semantic and instance labels in both
3D and 2D spaces to guide geometry optimization. Simultaneously, the improved
geometry assists in filtering noise present in the 3D and 2D annotations by
merging them in 3D space via a learned semantic field. To further enhance
appearance, we combine MLP and hash grids to yield hybrid scene features,
striking a balance between high-frequency appearance and predominantly
contiguous semantics. Our experiments demonstrate PanopticNeRF-360's
state-of-the-art performance over existing label transfer methods on the
challenging urban scenes of the KITTI-360 dataset. Moreover, PanopticNeRF-360
enables omnidirectional rendering of high-fidelity, multi-view and
spatiotemporally consistent appearance, semantic and instance labels. We make
our code and data available at https://github.com/fuxiao0719/PanopticNeRFComment: Project page: http://fuxiao0719.github.io/projects/panopticnerf360/.
arXiv admin note: text overlap with arXiv:2203.1522
A review of technical factors to consider when designing neural networks for semantic segmentation of Earth Observation imagery
Semantic segmentation (classification) of Earth Observation imagery is a
crucial task in remote sensing. This paper presents a comprehensive review of
technical factors to consider when designing neural networks for this purpose.
The review focuses on Convolutional Neural Networks (CNNs), Recurrent Neural
Networks (RNNs), Generative Adversarial Networks (GANs), and transformer
models, discussing prominent design patterns for these ANN families and their
implications for semantic segmentation. Common pre-processing techniques for
ensuring optimal data preparation are also covered. These include methods for
image normalization and chipping, as well as strategies for addressing data
imbalance in training samples, and techniques for overcoming limited data,
including augmentation techniques, transfer learning, and domain adaptation. By
encompassing both the technical aspects of neural network design and the
data-related considerations, this review provides researchers and practitioners
with a comprehensive and up-to-date understanding of the factors involved in
designing effective neural networks for semantic segmentation of Earth
Observation imagery.Comment: 145 pages with 32 figure
Digital agriculture: research, development and innovation in production chains.
Digital transformation in the field towards sustainable and smart agriculture. Digital agriculture: definitions and technologies. Agroenvironmental modeling and the digital transformation of agriculture. Geotechnologies in digital agriculture. Scientific computing in agriculture. Computer vision applied to agriculture. Technologies developed in precision agriculture. Information engineering: contributions to digital agriculture. DIPN: a dictionary of the internal proteins nanoenvironments and their potential for transformation into agricultural assets. Applications of bioinformatics in agriculture. Genomics applied to climate change: biotechnology for digital agriculture. Innovation ecosystem in agriculture: Embrapa?s evolution and contributions. The law related to the digitization of agriculture. Innovating communication in the age of digital agriculture. Driving forces for Brazilian agriculture in the next decade: implications for digital agriculture. Challenges, trends and opportunities in digital agriculture in Brazil
Pre-Trained Driving in Localized Surroundings with Semantic Radar Information and Machine Learning
Entlang der Signalverarbeitungskette von Radar Detektionen bis zur Fahrzeugansteuerung, diskutiert diese Arbeit eine semantischen Radar Segmentierung, einen darauf aufbauenden Radar SLAM, sowie eine im Verbund realisierte autonome Parkfunktion. Die Radarsegmentierung der (statischen) Umgebung wird durch ein Radar-spezifisches neuronales Netzwerk RadarNet erreicht. Diese Segmentierung ermöglicht die Entwicklung des semantischen Radar Graph-SLAM SERALOC. Auf der Grundlage der semantischen Radar SLAM Karte wird eine beispielhafte autonome Parkfunktionalität in einem realen Versuchsträger umgesetzt.
Entlang eines aufgezeichneten Referenzfades parkt die Funktion ausschließlich auf Basis der Radar Wahrnehmung mit bisher unerreichter Positioniergenauigkeit.
Im ersten Schritt wird ein Datensatz von 8.2 · 10^6 punktweise semantisch gelabelten Radarpunktwolken über eine Strecke von 2507.35m generiert. Es sind keine vergleichbaren Datensätze dieser Annotationsebene und Radarspezifikation öffentlich verfügbar. Das überwachte
Training der semantischen Segmentierung RadarNet erreicht 28.97% mIoU auf sechs Klassen.
Außerdem wird ein automatisiertes Radar-Labeling-Framework SeRaLF vorgestellt, welches das Radarlabeling multimodal mittels Referenzkameras und LiDAR unterstützt.
Für die kohärente Kartierung wird ein Radarsignal-Vorfilter auf der Grundlage einer Aktivierungskarte entworfen, welcher Rauschen und andere dynamische Mehrwegreflektionen unterdrückt. Ein speziell für Radar angepasstes Graph-SLAM-Frontend mit Radar-Odometrie
Kanten zwischen Teil-Karten und semantisch separater NDT Registrierung setzt die vorgefilterten semantischen Radarscans zu einer konsistenten metrischen Karte zusammen. Die Kartierungsgenauigkeit und die Datenassoziation werden somit erhöht und der erste semantische Radar Graph-SLAM für beliebige statische Umgebungen realisiert.
Integriert in ein reales Testfahrzeug, wird das Zusammenspiel der live RadarNet Segmentierung und des semantischen Radar Graph-SLAM anhand einer rein Radar-basierten autonomen Parkfunktionalität evaluiert. Im Durchschnitt über 42 autonome Parkmanöver
(∅3.73 km/h) bei durchschnittlicher Manöverlänge von ∅172.75m wird ein Median absoluter Posenfehler von 0.235m und End-Posenfehler von 0.2443m erreicht, der vergleichbare
Radar-Lokalisierungsergebnisse um ≈ 50% übertrifft. Die Kartengenauigkeit von veränderlichen, neukartierten Orten über eine Kartierungsdistanz von ∅165m ergibt eine ≈ 56%-ige Kartenkonsistenz bei einer Abweichung von ∅0.163m. Für das autonome Parken wurde ein gegebener Trajektorienplaner und Regleransatz verwendet
A semi-automated approach to model architectural elements in Scan-to-BIM processes
In the last years, the AEC (Architecture, Engineering and Construction) domain has exponentially increased the use of BIM and HBIM models for several applications, such as planning renovation and restoration, building maintenance, cost managing, or structural/energetic retrofit design. However, obtaining detailed as-built BIM models is a demanding and time-consuming process. Especially in historical contexts, many different and complex architectural elements need to be carefully and manually modelled. Meshes or surfaces and NURBS or polylines, derived from 3D reality-based data, are recently used as a reference for the HBIM accurate modelling. This work proposes a comprehensive and novel semi-automated approach to reconstruct architectural elements through the Visual Programming Language (VPL) Dynamo software and a Boundary-Representation method (B-rep), starting from
3D surveying data and point clouds classification. A wide package of scripts provides solutions for modelling complex shapes and transferring the obtained 3D models into BIM Authoring tools for a complete reconstruction phase. The presented procedure, useful for different BIM or HBIM applications, proved to reduce the modelling time significantly
Synthetic Datasets for Autonomous Driving: A Survey
Autonomous driving techniques have been flourishing in recent years while
thirsting for huge amounts of high-quality data. However, it is difficult for
real-world datasets to keep up with the pace of changing requirements due to
their expensive and time-consuming experimental and labeling costs. Therefore,
more and more researchers are turning to synthetic datasets to easily generate
rich and changeable data as an effective complement to the real world and to
improve the performance of algorithms. In this paper, we summarize the
evolution of synthetic dataset generation methods and review the work to date
in synthetic datasets related to single and multi-task categories for to
autonomous driving study. We also discuss the role that synthetic dataset plays
the evaluation, gap test, and positive effect in autonomous driving related
algorithm testing, especially on trustworthiness and safety aspects. Finally,
we discuss general trends and possible development directions. To the best of
our knowledge, this is the first survey focusing on the application of
synthetic datasets in autonomous driving. This survey also raises awareness of
the problems of real-world deployment of autonomous driving technology and
provides researchers with a possible solution.Comment: 19 pages, 5 figure
Industrial Segment Anything -- a Case Study in Aircraft Manufacturing, Intralogistics, Maintenance, Repair, and Overhaul
Deploying deep learning-based applications in specialized domains like the
aircraft production industry typically suffers from the training data
availability problem. Only a few datasets represent non-everyday objects,
situations, and tasks. Recent advantages in research around Vision Foundation
Models (VFM) opened a new area of tasks and models with high generalization
capabilities in non-semantic and semantic predictions. As recently demonstrated
by the Segment Anything Project, exploiting VFM's zero-shot capabilities is a
promising direction in tackling the boundaries spanned by data, context, and
sensor variety. Although, investigating its application within specific domains
is subject to ongoing research. This paper contributes here by surveying
applications of the SAM in aircraft production-specific use cases. We include
manufacturing, intralogistics, as well as maintenance, repair, and overhaul
processes, also representing a variety of other neighboring industrial domains.
Besides presenting the various use cases, we further discuss the injection of
domain knowledge
Pathway to Future Symbiotic Creativity
This report presents a comprehensive view of our vision on the development
path of the human-machine symbiotic art creation. We propose a classification
of the creative system with a hierarchy of 5 classes, showing the pathway of
creativity evolving from a mimic-human artist (Turing Artists) to a Machine
artist in its own right. We begin with an overview of the limitations of the
Turing Artists then focus on the top two-level systems, Machine Artists,
emphasizing machine-human communication in art creation. In art creation, it is
necessary for machines to understand humans' mental states, including desires,
appreciation, and emotions, humans also need to understand machines' creative
capabilities and limitations. The rapid development of immersive environment
and further evolution into the new concept of metaverse enable symbiotic art
creation through unprecedented flexibility of bi-directional communication
between artists and art manifestation environments. By examining the latest
sensor and XR technologies, we illustrate the novel way for art data collection
to constitute the base of a new form of human-machine bidirectional
communication and understanding in art creation. Based on such communication
and understanding mechanisms, we propose a novel framework for building future
Machine artists, which comes with the philosophy that a human-compatible AI
system should be based on the "human-in-the-loop" principle rather than the
traditional "end-to-end" dogma. By proposing a new form of inverse
reinforcement learning model, we outline the platform design of machine
artists, demonstrate its functions and showcase some examples of technologies
we have developed. We also provide a systematic exposition of the ecosystem for
AI-based symbiotic art form and community with an economic model built on NFT
technology. Ethical issues for the development of machine artists are also
discussed
- …