Challenges for Monocular 6D Object Pose Estimation in Robotics
Object pose estimation is a core perception task that enables, for example,
object grasping and scene understanding. The widely available, inexpensive and
high-resolution RGB sensors and CNNs that allow for fast inference based on
this modality make monocular approaches especially well suited for robotics
applications. We observe that previous surveys on object pose estimation
establish the state of the art for varying modalities, single- and multi-view
settings, and datasets and metrics that consider a multitude of applications.
We argue, however, that those works' broad scope hinders the identification of
open challenges that are specific to monocular approaches and the derivation of
promising future challenges for their application in robotics. By providing a
unified view on recent publications from both robotics and computer vision, we
find that occlusion handling, novel pose representations, and formalizing and
improving category-level pose estimation are still fundamental challenges that
are highly relevant for robotics. Moreover, to further improve robotic
performance, large object sets, novel objects, refractive materials, and
uncertainty estimates are central, largely unsolved open challenges. In order
to address them, ontological reasoning, deformability handling, scene-level
reasoning, realistic datasets, and the ecological footprint of algorithms need
to be improved.
Comment: arXiv admin note: substantial text overlap with arXiv:2302.1182
Blending the Material and Digital World for Hybrid Interfaces
The development of digital technologies in the 21st century is progressing continuously and new device classes such as tablets, smartphones or smartwatches are finding their way into our everyday lives. However, this development also poses problems, as these prevailing touch and gestural interfaces often lack tangibility, take little account of haptic qualities and therefore require full attention from their users. Compared to traditional tools and analog interfaces, the human skills to experience and manipulate material in its natural environment and context remain unexploited. To combine the best of both, a key question is how it is possible to blend the material world and digital world to design and realize novel hybrid interfaces in a meaningful way. Research on Tangible User Interfaces (TUIs) investigates the coupling between physical objects and virtual data. In contrast, hybrid interfaces, which specifically aim to digitally enrich analog artifacts of everyday work, have not yet been sufficiently researched and systematically discussed.
Therefore, this doctoral thesis rethinks how user interfaces can provide useful digital functionality while maintaining their physical properties and familiar patterns of use in the real world. However, the development of such hybrid interfaces raises overarching research questions about the design: Which kinds of physical interfaces are worth exploring? What type of digital enhancement will improve existing interfaces? How can hybrid interfaces retain their physical properties while enabling new digital functions? What are suitable methods to explore different designs? And how can technology enthusiasts be supported in prototyping?
For a systematic investigation, the thesis builds on a design-oriented, exploratory and iterative development process using digital fabrication methods and novel materials. As a main contribution, four specific research projects are presented that apply and discuss different visual and interactive augmentation principles along real-world applications. The applications range from digitally enhanced paper and interactive cords through visual watch strap extensions to novel prototyping tools for smart garments. While almost all of them integrate visual feedback and haptic input, none of them are built on rigid, rectangular pixel screens or use standard input modalities, as they all aim to reveal new design approaches. The dissertation shows how valuable it can be to rethink familiar, analog applications while thoughtfully extending them digitally. Finally, this thesis’ extensive work of engineering versatile research platforms is accompanied by overarching conceptual work, user evaluations and technical experiments, as well as literature reviews.
The spread of digital technologies in the 21st century is advancing steadily, and new device classes such as tablets, smartphones and smartwatches are conquering our everyday lives. This development also brings problems, however, because the prevailing touch-sensitive surfaces take little account of haptic qualities and therefore demand the full attention of their users. Compared to traditional tools and analog interfaces, the human ability to grasp and perceive the environment with all senses remains unused. To combine the best of both worlds, the question therefore arises how novel hybrid interfaces can be meaningfully designed and realized in order to merge the material and the digital world. Research on Tangible User Interfaces (TUIs) investigates the connection between physical objects and virtual data. In contrast, hybrid interfaces that specifically aim to digitally extend everyday physical objects, and to investigate them systematically using suitable design parameters and design spaces, have not yet been sufficiently researched.
This dissertation therefore investigates how materiality and digitality can merge seamlessly. It explores how future user interfaces can provide useful digital functions without losing their physical properties and familiar patterns of use in the real world. The development of such hybrid approaches, however, raises overarching research questions about design: Which kinds of physical interfaces are worth considering? What kind of digital extension improves what already exists? How can hybrid concepts retain their physical properties while enabling new digital functions? What are suitable methods for exploring different designs? How can technology enthusiasts be supported in building prototypes?
For a systematic investigation, the work relies on a design-oriented, exploratory and iterative development process using digital fabrication methods and novel materials. The main part presents four research projects that discuss different visual and interactive principles along real-world applications. The scenarios range from digitally enriched paper and interactive cords, through visual extensions of watch straps, to novel prototyping tools for smart garments. To reveal new design approaches, almost all of them integrate visual feedback and haptic input in order to create alternatives to standard input modalities on rigid pixel screens. The dissertation has shown how valuable it can be to rethink familiar, analog applications while thoughtfully extending them digitally. The work comprises realized technical research platforms as well as overarching conceptual work, user studies and technical experiments, and an analysis of existing research.
Synthetic Aperture Radar (SAR) Meets Deep Learning
This reprint focuses on applications that combine synthetic aperture radar with deep learning technology. It aims to further promote the development of intelligent SAR image interpretation technology. A synthetic aperture radar (SAR) is an important active microwave imaging sensor, whose all-day and all-weather working capacity gives it an important place in the remote sensing community. Since the United States launched the first SAR satellite, SAR has received much attention in the remote sensing community, e.g., in geological exploration, topographic mapping, disaster forecasting, and traffic monitoring. It is valuable and meaningful, therefore, to study SAR-based remote sensing applications. In recent years, deep learning represented by convolutional neural networks has promoted significant progress in the computer vision community, e.g., in face recognition, the driverless field and the Internet of things (IoT). Deep learning can enable computational models with multiple processing layers to learn data representations with multiple levels of abstraction. This can greatly improve the performance of various applications. This reprint provides a platform for researchers to handle the above significant challenges and present their innovative and cutting-edge research results when applying deep learning to SAR in various manuscript types, e.g., articles, letters, reviews and technical reports.
A Survey on Deep Multi-modal Learning for Body Language Recognition and Generation
Body language (BL) refers to the non-verbal communication expressed through
physical movements, gestures, facial expressions, and postures. It is a form of
communication that conveys information, emotions, attitudes, and intentions
without the use of spoken or written words. It plays a crucial role in
interpersonal interactions and can complement or even override verbal
communication. Deep multi-modal learning techniques have shown promise in
understanding and analyzing these diverse aspects of BL. The survey emphasizes
their applications to BL generation and recognition. Several common BLs are
considered, i.e., Sign Language (SL), Cued Speech (CS), Co-speech (CoS), and
Talking Head (TH), and we have conducted an analysis and established the
connections among these four BLs for the first time. Their generation and
recognition often involve multi-modal approaches. Benchmark datasets for BL
research are well collected and organized, along with the evaluation of SOTA
methods on these datasets. The survey highlights challenges such as limited
labeled data, multi-modal learning, and the need for domain adaptation to
generalize models to unseen speakers or languages. Future research directions
are presented, including exploring self-supervised learning techniques,
integrating contextual information from other modalities, and exploiting
large-scale pre-trained multi-modal models. In summary, this survey paper
provides a comprehensive understanding of deep multi-modal learning for various
BL generation and recognition for the first time. By analyzing advancements,
challenges, and future directions, it serves as a valuable resource for
researchers and practitioners in advancing this field. In addition, we maintain
a continuously updated paper list for deep multi-modal learning for BL
recognition and generation: https://github.com/wentaoL86/awesome-body-language
A review of technical factors to consider when designing neural networks for semantic segmentation of Earth Observation imagery
Semantic segmentation (classification) of Earth Observation imagery is a
crucial task in remote sensing. This paper presents a comprehensive review of
technical factors to consider when designing neural networks for this purpose.
The review focuses on Convolutional Neural Networks (CNNs), Recurrent Neural
Networks (RNNs), Generative Adversarial Networks (GANs), and transformer
models, discussing prominent design patterns for these ANN families and their
implications for semantic segmentation. Common pre-processing techniques for
ensuring optimal data preparation are also covered. These include methods for
image normalization and chipping, as well as strategies for addressing data
imbalance in training samples, and techniques for overcoming limited data,
including augmentation techniques, transfer learning, and domain adaptation. By
encompassing both the technical aspects of neural network design and the
data-related considerations, this review provides researchers and practitioners
with a comprehensive and up-to-date understanding of the factors involved in
designing effective neural networks for semantic segmentation of Earth
Observation imagery.
Comment: 145 pages with 32 figures
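The normalization and chipping steps named in the abstract above can be illustrated with a minimal sketch. It assumes a single-band image stored as a nested list, min-max scaling, and non-overlapping square chips that drop ragged edges; the function names `normalize` and `chip` are hypothetical and are not taken from the review, which may discuss other schemes (per-band statistics, overlapping tiles, padding).

```python
def normalize(band):
    """Min-max scale a 2D list of pixel values to the range [0, 1]."""
    flat = [v for row in band for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1  # constant image: avoid division by zero
    return [[(v - lo) / span for v in row] for row in band]


def chip(band, size):
    """Split a 2D list into non-overlapping size x size chips.

    Rows and columns that do not fill a whole chip are dropped
    (an assumption of this sketch; padding is a common alternative).
    """
    rows, cols = len(band), len(band[0])
    chips = []
    for r in range(0, rows - size + 1, size):
        for c in range(0, cols - size + 1, size):
            chips.append([row[c:c + size] for row in band[r:r + size]])
    return chips


# A 4 x 4 toy "image" with pixel values 0..15.
band = [[r * 4 + c for c in range(4)] for r in range(4)]
chips = chip(normalize(band), 2)
```

Normalizing before chipping, as here, uses statistics of the whole scene; normalizing each chip separately is another option and changes what the network sees.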
Generalized Planning as Heuristic Search: A new planning search-space that leverages pointers over objects
Planning as heuristic search is one of the most successful approaches to
classical planning but unfortunately, it does not extend trivially to
Generalized Planning (GP). GP aims to compute algorithmic solutions that are
valid for a set of classical planning instances from a given domain, even if
these instances differ in the number of objects, the number of state variables,
their domain size, or their initial and goal configuration. The generalization
requirements of GP make it impractical to perform the state-space search that
is usually implemented by heuristic planners. This paper adapts the planning as
heuristic search paradigm to the generalization requirements of GP, and
presents the first native heuristic search approach to GP. First, the paper
introduces a new pointer-based solution space for GP that is independent of the
number of classical planning instances in a GP problem and the size of those
instances (i.e. the number of objects, state variables and their domain sizes).
Second, the paper defines a set of evaluation and heuristic functions for
guiding a combinatorial search in our new GP solution space. The computation of
these evaluation and heuristic functions does not require grounding states or
actions in advance. Therefore our GP as heuristic search approach can handle
large sets of state variables with large numerical domains, e.g., integers.
Lastly, the paper defines an upgraded version of our novel algorithm for GP
called Best-First Generalized Planning (BFGP), that implements a best-first
search in our pointer-based solution space, and that is guided by our
evaluation/heuristic functions for GP.
Comment: Under review in the Artificial Intelligence Journal (AIJ)
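To make the idea of an algorithmic solution with a pointer over objects concrete, here is a minimal sketch. The three-instruction language (`clear`, `inc`, `goto_if_in_range`) and the `run` interpreter are hypothetical illustrations, not the paper's planning-program formalism or the BFGP algorithm. The point is only that one fixed program solves instances with any number of objects, which is why the program itself does not grow with instance size.

```python
def run(program, vec, max_steps=1000):
    """Execute a tiny pointer program on a non-empty list and return the result.

    The single pointer i indexes the objects of the instance; pc is the
    program counter. max_steps guards against non-terminating programs.
    """
    vec = list(vec)  # do not mutate the caller's instance
    i, pc, steps = 0, 0, 0
    while pc < len(program) and steps < max_steps:
        op = program[pc]
        if op[0] == "clear":               # act on the object under the pointer
            vec[i] = 0
            pc += 1
        elif op[0] == "inc":               # move the pointer to the next object
            i += 1
            pc += 1
        elif op[0] == "goto_if_in_range":  # loop while the pointer is valid
            pc = op[1] if i < len(vec) else pc + 1
        steps += 1
    return vec


# One program, valid for every instance size (goal: all entries are zero).
PROGRAM = [("clear",), ("inc",), ("goto_if_in_range", 0)]

small = run(PROGRAM, [3, 1, 4])
large = run(PROGRAM, [7] * 10)
```

A search over such programs could score each candidate by how many of the given instances it solves, which is the role the abstract assigns to its evaluation and heuristic functions.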
Ditransitives in Germanic languages. Synchronic and diachronic aspects
This volume brings together twelve empirical studies on ditransitive constructions in Germanic languages and their varieties, past and present. Specifically, the volume includes contributions on a wide variety of Germanic languages, including English, Dutch, and German, but also Danish, Swedish, and Norwegian, as well as lesser-studied ones such as Faroese. While the first part of the volume focuses on diachronic aspects, the second part showcases a variety of synchronic aspects relating to ditransitive patterns. Methodologically, the volume covers both experimental and corpus-based studies. The papers address, among other questions, the cross-linguistic pervasiveness and cognitive reality of factors involved in the choice between different ditransitive constructions, and differences and similarities in the diachronic development of ditransitives. The volume’s broad scope and comparative perspective offer comprehensive insights into well-known phenomena and further our understanding of variation across languages of the same family.
Markov field models of molecular kinetics
Computer simulations such as molecular dynamics (MD) provide a possible means to understand protein dynamics and mechanisms on an atomistic scale. The resulting simulation data can be analyzed with Markov state models (MSMs), yielding a quantitative kinetic model that, e.g., encodes state populations and transition rates. However, the larger an investigated system, the more data is required to estimate a valid kinetic model. In this work, we show that this scaling problem can be escaped when decomposing a system into smaller ones, leveraging weak couplings between local domains. Our approach, termed independent Markov decomposition (IMD), is a first-order approximation neglecting couplings, i.e., it represents a decomposition of the underlying global dynamics into a set of independent local ones. We demonstrate that for truly independent systems, IMD can reduce the sampling by three orders of magnitude. IMD is applied to two biomolecular systems. First, synaptotagmin-1 is analyzed, a rapid calcium switch from the neurotransmitter release machinery. Within its C2A domain, local conformational switches are identified and modeled with independent MSMs, shedding light on the mechanism of its calcium-mediated activation. Second, the catalytic site of the serine protease TMPRSS2 is analyzed with a local drug-binding model. Equilibrium populations of different drug-binding modes are derived for three inhibitors, mirroring experimentally determined drug efficiencies. IMD is subsequently extended to an end-to-end deep learning framework called iVAMPnets, which learns a domain decomposition from simulation data and simultaneously models the kinetics in the local domains. We finally classify IMD and iVAMPnets as Markov field models (MFM), which we define as a class of models that describe dynamics by decomposing systems into local domains. 
Overall, this thesis introduces a local approach to Markov modeling that enables quantitative assessment of the kinetics of large macromolecular complexes, opening up possibilities to tackle current and future computational molecular biology questions.
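The independence assumption behind such a decomposition can be sketched in a few lines. For two subsystems that evolve independently, the transition matrix of the joint system is the Kronecker product of the local transition matrices, and the joint stationary distribution is the product of the local ones. The matrices `T_A` and `T_B` below are made-up numbers and the helper names are hypothetical; this is not the thesis's IMD estimator, only the algebraic fact it builds on.

```python
def kron(a, b):
    """Kronecker product of two matrices given as nested lists."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def stationary_2state(t):
    """Stationary distribution of a 2-state row-stochastic matrix."""
    p12, p21 = t[0][1], t[1][0]
    return [p21 / (p12 + p21), p12 / (p12 + p21)]


def propagate(pi, t):
    """One step of the chain: the row vector pi multiplied by t."""
    n = len(t)
    return [sum(pi[i] * t[i][j] for i in range(n)) for j in range(n)]


def state_counts(n_domains, states_per_domain=2):
    """Number of global states versus total number of local states."""
    return states_per_domain ** n_domains, n_domains * states_per_domain


# Two hypothetical local 2-state models (rows sum to 1).
T_A = [[0.9, 0.1], [0.2, 0.8]]
T_B = [[0.7, 0.3], [0.4, 0.6]]

# Joint 4-state model of the two independent domains.
T_global = kron(T_A, T_B)
pi_global = [a * b for a in stationary_2state(T_A) for b in stationary_2state(T_B)]
```

`state_counts` shows why the decomposition helps with sampling: the number of global states grows exponentially with the number of domains, while the number of local states to be estimated grows only linearly.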