Neural Semantic Surface Maps
We present an automated technique for computing a map between two genus-zero
shapes, which matches semantically corresponding regions to one another. Lack
of annotated data prohibits direct inference of 3D semantic priors; instead,
current state-of-the-art methods predominantly optimize geometric properties or
require varying amounts of manual annotation. To overcome the lack of annotated
training data, we distill semantic matches from pre-trained vision models: our
method renders the pair of 3D shapes from multiple viewpoints; the resulting
renders are then fed into an off-the-shelf image-matching method which
leverages a pretrained visual model to produce feature points. This yields
semantic correspondences, which can be projected back to the 3D shapes,
producing a raw matching that is inaccurate and inconsistent between different
viewpoints. These correspondences are refined and distilled into an
inter-surface map by a dedicated optimization scheme, which promotes
bijectivity and continuity of the output map. We illustrate that our approach
can generate semantic surface-to-surface maps, eliminating the need for manual annotation or any 3D training data. Furthermore, it proves effective in scenarios with high semantic complexity, where objects are non-isometrically related, as well as in situations where they are nearly isometric.
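As a rough illustration of the render-and-match stage, the sketch below lifts 2D feature matches back to the 3D surfaces using rendered depth and known camera poses; every name in it (views, matcher, unproject) is a hypothetical stand-in rather than the paper's pipeline:

```python
# Hypothetical sketch of the render-match-project stage described above;
# the rendering, matcher, and view bookkeeping are assumed stand-ins,
# not the authors' actual pipeline.
import numpy as np

def unproject(pixel, depth, K_inv, cam_to_world):
    """Lift a 2D pixel with its rendered depth back onto the 3D surface."""
    u, v = pixel
    ray = K_inv @ np.array([u, v, 1.0])          # camera-space ray direction
    point_cam = ray * depth                      # scale by the rendered depth
    return (cam_to_world @ np.append(point_cam, 1.0))[:3]

def raw_correspondences(views_a, views_b, matcher):
    """Aggregate noisy 3D matches over all rendered viewpoint pairs."""
    pairs = []
    for va, vb in zip(views_a, views_b):
        for pa, pb in matcher(va.image, vb.image):   # 2D semantic matches
            xa = unproject(pa, va.depth[pa[1], pa[0]], va.K_inv, va.pose)
            xb = unproject(pb, vb.depth[pb[1], pb[0]], vb.K_inv, vb.pose)
            pairs.append((xa, xb))
    # Inaccurate and view-inconsistent, as noted above; the dedicated
    # optimization distills these into a bijective, continuous map.
    return pairs
```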
The role of facial movements in emotion recognition
Most past research on emotion recognition has used photographs of posed expressions intended to depict the apex of the emotional display. Although these studies have provided important insights into how emotions are perceived in the face, they necessarily leave out any role of dynamic information. In this Review, we synthesize evidence from vision science, affective science and neuroscience to ask when, how and why dynamic information contributes to emotion recognition, beyond the information conveyed in static images. Dynamic displays offer distinctive temporal information such as the direction, quality and speed of movement, which recruit higher-level cognitive processes and support social and emotional inferences that enhance judgements of facial affect. The positive influence of dynamic information on emotion recognition is most evident in suboptimal conditions when observers are impaired and/or facial expressions are degraded or subtle. Dynamic displays further recruit early attentional and motivational resources in the perceiver, facilitating the prompt detection and prediction of others' emotional states, with benefits for social interaction. Finally, because emotions can be expressed in various modalities, we examine the multimodal integration of dynamic and static cues across different channels, and conclude with suggestions for future research.
XVoxel-Based Parametric Design Optimization of Feature Models
Parametric optimization is an important product design technique, especially
in the context of the modern parametric feature-based CAD paradigm. Realizing
its full potential, however, requires a closed loop between CAD and CAE (i.e.,
CAD/CAE integration) with automatic design modifications and simulation
updates. Conventionally, model conversion is employed to close this loop, but that workflow is hard to automate and requires manual input. As a result, the overall optimization process is too laborious to be practical. To address this issue, a new method for parametric optimization is
introduced in this paper, based on a unified model representation scheme called
eXtended Voxels (XVoxels). This scheme hybridizes feature models and voxel
models into a new concept of semantic voxels, where the voxel part is
responsible for FEM solving, and the semantic part is responsible for
high-level information to capture both design and simulation intents. As such,
it can establish a direct mapping between design models and analysis models,
which in turn enables automatic updates on simulation results for design
modifications, and vice versa -- effectively a closed loop between CAD and CAE.
In addition, robust and efficient geometric algorithms for manipulating XVoxel
models and efficient numerical methods (based on the recent finite cell method)
for simulating XVoxel models are provided. The presented method has been
validated by a series of case studies of increasing complexity to demonstrate
its effectiveness. In particular, a computational efficiency improvement of up to 55.8 times over the existing FCM method has been observed.
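To make the semantic-voxel idea more tangible, here is a hedged sketch of one possible XVoxel record and the localized update it enables; the fields and helper functions are illustrative assumptions, not the paper's data structures:

```python
# Illustrative sketch of a "semantic voxel": a finite-cell occupancy flag
# coupled with a link back to the CAD feature it samples. Field and function
# names are assumptions for illustration, not the paper's API.
from dataclasses import dataclass, field

@dataclass
class XVoxel:
    inside: bool                  # occupancy consumed by the finite cell method
    feature_id: int | None        # CAD feature this voxel samples, if any
    intents: dict = field(default_factory=dict)   # loads, constraints, etc.

def update_after_edit(grid, edited_feature_id, revoxelize):
    """Re-voxelize only the cells tagged with the edited feature, keeping
    the CAD/CAE loop closed without a full model conversion."""
    for idx, voxel in grid.items():
        if voxel.feature_id == edited_feature_id:
            grid[idx] = revoxelize(idx)
    return grid
```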
Toward Mesh-Invariant 3D Generative Deep Learning with Geometric Measures
3D generative modeling is accelerating as technologies for capturing geometric data mature. However, the acquired data is often
inconsistent, resulting in unregistered meshes or point clouds. Many generative
learning algorithms require correspondence between each point when comparing
the predicted shape and the target shape. We propose an architecture able to
cope with different parameterizations, even during the training phase. In
particular, our loss function is built upon a kernel-based metric over a
representation of meshes using geometric measures such as currents and
varifolds. The latter makes it possible to implement an efficient dissimilarity measure with many desirable properties, such as robustness to resampling of the mesh or point cloud. We demonstrate the efficiency and resilience of our model on a generative learning task of human faces.
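For concreteness, a minimal sketch of a varifold-style mesh dissimilarity follows, assuming the commonly used Gaussian position kernel weighted by squared normal alignment; the paper's exact kernel choices may differ:

```python
# Hedged sketch of a varifold-style mesh dissimilarity, assuming a Gaussian
# position kernel times a squared normal-alignment term.
import numpy as np

def mesh_to_varifold(verts, faces):
    """Represent a mesh as (face centers, unit normals, areas)."""
    tri = verts[faces]                               # (F, 3, 3)
    centers = tri.mean(axis=1)
    n = np.cross(tri[:, 1] - tri[:, 0], tri[:, 2] - tri[:, 0])
    areas = 0.5 * np.linalg.norm(n, axis=1)          # |cross| = 2 * area
    normals = n / (2 * areas[:, None] + 1e-12)
    return centers, normals, areas

def varifold_inner(a, b, sigma=0.1):
    (ca, na, wa), (cb, nb, wb) = a, b
    d2 = ((ca[:, None, :] - cb[None, :, :]) ** 2).sum(-1)
    k = np.exp(-d2 / sigma**2) * (na @ nb.T) ** 2    # position * orientation
    return wa @ k @ wb

def varifold_distance2(mesh_a, mesh_b):
    """Squared kernel distance; robust to remeshing and resampling."""
    a, b = mesh_to_varifold(*mesh_a), mesh_to_varifold(*mesh_b)
    return varifold_inner(a, a) + varifold_inner(b, b) - 2 * varifold_inner(a, b)
```

Because the representation depends only on face centers, normals, and areas, the distance is insensitive to how the surface is parameterized or resampled, which is exactly the property the loss exploits.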
Neural function approximation on graphs: shape modelling, graph discrimination & compression
Graphs serve as a versatile mathematical abstraction of real-world phenomena in numerous scientific disciplines. This thesis is part of the Geometric Deep Learning subject area, a family of learning paradigms that capitalise on the increasing volume of non-Euclidean data so as to solve real-world tasks in a data-driven manner. In particular, we focus on the topic of graph function approximation using neural networks, which lies at the heart of many relevant methods. In the first part of the thesis, we contribute to the understanding and design of Graph Neural Networks (GNNs). Initially, we investigate the problem of learning on signals supported on a fixed graph. We show that treating graph signals as elements of general graph spaces is restrictive and that conventional GNNs have limited expressivity. Instead, we expose a more enlightening perspective by drawing parallels between graph signals and signals on Euclidean grids, such as images and audio. Accordingly, we propose a permutation-sensitive GNN based on an operator analogous to shifts in grids and instantiate it on 3D meshes for shape modelling (Spiral Convolutions). Next, we focus on learning on general graph spaces, and in particular on functions that are invariant to graph isomorphism. We identify a fundamental trade-off between invariance, expressivity and computational complexity, which we address with a symmetry-breaking mechanism based on substructure encodings (Graph Substructure Networks). Substructures are shown to be a powerful tool that provably improves expressivity while controlling computational complexity, and a useful inductive bias in network science and chemistry. In the second part of the thesis, we discuss the problem of graph compression, where we analyse the information-theoretic principles and the connections with graph generative models. We show that another inevitable trade-off surfaces, now between computational complexity and compression quality, due to graph isomorphism. We propose a substructure-based dictionary coder, Partition and Code (PnC), with theoretical guarantees that can be adapted to different graph distributions by estimating its parameters from observations. Additionally, contrary to the majority of neural compressors, PnC is parameter- and sample-efficient and is therefore of wide practical relevance. Finally, within this framework, substructures are further illustrated as a decisive archetype for learning problems on graph spaces.
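As an aside on the Spiral Convolutions mentioned above, a minimal sketch of such a shift-like, permutation-sensitive operator might look as follows (assuming the ordered spiral neighbour indices are precomputed; this is not the thesis code):

```python
# Minimal sketch of a spiral convolution: each vertex gathers features along
# a fixed-length, ordered spiral of neighbours and applies one shared linear
# map, making the operator permutation-sensitive like a shift on a grid.
import numpy as np

def spiral_conv(x, spirals, W, b):
    """x: (V, C_in) vertex features; spirals: (V, L) ordered neighbour
    indices per vertex; W: (L*C_in, C_out); b: (C_out,)."""
    V = x.shape[0]
    gathered = x[spirals].reshape(V, -1)   # (V, L*C_in), fixed ordering
    return gathered @ W + b                # shared weights across vertices
```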
Data-Driven Classification Methods for Craniosynostosis Using 3D Surface Scans
This thesis addresses the radiation-free classification of craniosynostosis, with an additional focus on data augmentation and on the use of synthetic data as a substitute for clinical data.
Motivation: Craniosynostosis is a condition affecting infants that leads to head deformities. Diagnosis using radiation-free 3D surface scans is a promising alternative to traditional computed tomography imaging. Due to the low prevalence of the condition and the difficulty of anonymizing the data, clinical data are scarce. This work addresses these challenges by proposing new classification algorithms, creating synthetic data for the research community, and showing that clinical data can be replaced entirely by synthetic data without compromising classification performance.
Methods: A statistical shape model (SSM) of craniosynostosis patients is built and made publicly available. A 3D-to-2D conversion from 3D mesh geometry to a 2D image is proposed, which enables the use of convolutional neural networks (CNNs) and image-based data augmentation. Three classification approaches (based on cephalometric measurements, on the SSM, and on the 2D images with a CNN) for distinguishing between three pathologies and a control group are proposed and evaluated. Finally, the clinical training data are replaced entirely by synthetic data from an SSM and a generative adversarial network (GAN).
Results: The proposed CNN classification outperformed competing approaches on a clinical dataset of 496 subjects, achieving an F1 score of 0.964. Data augmentation increased the F1 score to 0.975. Attributions of the classification decisions showed high amplitudes on the parts of the head associated with craniosynostosis. Replacing the clinical data with synthetic data generated by an SSM and a GAN still yielded an F1 score above 0.95 without the model having seen a single clinical subject.
Conclusion: The proposed conversion of 3D geometry into a 2D-encoded image improved the performance of existing classifiers and enabled data augmentation during training. Using an SSM and a GAN, clinical training data could be replaced by synthetic data. This work improves existing diagnostic approaches on radiation-free imaging and demonstrates the usability of synthetic data, which can make clinical applications more objective, more interpretable, and less costly.
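One plausible instance of the proposed 3D-to-2D conversion is sketched below as a spherical distance map that standard CNNs and image augmentation can consume; the exact encoding used in the thesis may differ:

```python
# Hedged sketch of a 3D-to-2D encoding: sample a centered head scan in
# spherical coordinates and store the radius as pixel intensity.
import numpy as np

def surface_to_image(points, h=64, w=128):
    """Rasterize centered 3D surface points into an (h, w) distance map."""
    x, y, z = points.T
    r = np.sqrt(x**2 + y**2 + z**2)
    az = np.arctan2(y, x)                                     # [-pi, pi)
    el = np.arcsin(np.clip(z / np.maximum(r, 1e-9), -1, 1))   # [-pi/2, pi/2]
    i = ((el + np.pi / 2) / np.pi * (h - 1)).astype(int)
    j = ((az + np.pi) / (2 * np.pi) * (w - 1)).astype(int)
    img = np.zeros((h, w))
    np.maximum.at(img, (i, j), r)   # keep the outermost surface per pixel
    return img
```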
Implicit Identity Representation Conditioned Memory Compensation Network for Talking Head Video Generation
Talking head video generation aims to animate a human face in a still image
with dynamic poses and expressions using motion information derived from a
target-driving video, while maintaining the person's identity in the source
image. However, dramatic and complex motions in the driving video cause
ambiguous generation, because the still source image cannot provide sufficient
appearance information for occluded regions or delicate expression variations,
which produces severe artifacts and significantly degrades the generation
quality. To tackle this problem, we propose to learn a global facial
representation space, and design a novel implicit identity representation
conditioned memory compensation network, coined as MCNet, for high-fidelity
talking head generation. Specifically, we devise a network module to learn a
unified spatial facial meta-memory bank from all training samples, which can
provide rich facial structure and appearance priors to compensate warped source
facial features for the generation. Furthermore, we propose an effective query
mechanism based on implicit identity representations learned from the discrete
keypoints of the source image. It can greatly facilitate the retrieval of more
correlated information from the memory bank for the compensation. Extensive
experiments demonstrate that MCNet can learn representative and complementary
facial memory, and can clearly outperform previous state-of-the-art talking
head generation methods on the VoxCeleb1 and CelebV datasets. Project page: https://github.com/harlanhong/ICCV2023-MCNET
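The memory-query mechanism can be approximated as a generic cross-attention read from a learned bank, as in the hedged sketch below; layer sizes, the identity encoder, and all names are illustrative assumptions, not the released MCNet code:

```python
# Sketch of the memory-query idea: an identity-conditioned query attends over
# a learned meta-memory bank, and the retrieved features compensate the
# motion-warped source features. Sizes are arbitrary illustrative choices.
import torch
import torch.nn as nn

class MemoryCompensation(nn.Module):
    def __init__(self, dim=256, slots=1024):
        super().__init__()
        self.memory = nn.Parameter(torch.randn(slots, dim))  # meta-memory bank
        self.to_q = nn.Linear(dim, dim)      # query from identity representation
        self.to_kv = nn.Linear(dim, 2 * dim)

    def forward(self, identity_repr, warped_feat):
        """identity_repr: (B, dim) from source keypoints;
        warped_feat: (B, dim) motion-warped source feature to compensate."""
        q = self.to_q(identity_repr).unsqueeze(1)            # (B, 1, dim)
        k, v = self.to_kv(self.memory).chunk(2, dim=-1)      # (slots, dim) each
        attn = torch.softmax(q @ k.T / k.shape[-1] ** 0.5, dim=-1)
        retrieved = (attn @ v).squeeze(1)                    # (B, dim)
        return warped_feat + retrieved   # compensate occluded/ambiguous regions
```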
Cost-effective 3D scanning and printing technologies for outer ear reconstruction: Current status
Current 3D scanning and printing technologies offer not only state-of-the-art developments in the field of medical imaging and bio-engineering, but also cost- and time-effective solutions for surgical reconstruction procedures. Besides tissue engineering, where living cells are used, bio-compatible polymers or synthetic resin can be applied. The combination of 3D handheld scanning devices or volumetric imaging, (open-source) image processing packages, and 3D printers forms a complete workflow chain capable of effective rapid prototyping of outer ear replicas. This paper reviews current possibilities and the latest use cases for 3D scanning, data processing, and printing of outer ear replicas, with a focus on low-cost solutions for rehabilitation engineering.
Learning Neural Parametric Head Models
We propose a novel 3D morphable model for complete human heads based on hybrid neural fields. At the core of our model lies a neural parametric representation that disentangles identity and expressions in disjoint latent spaces. To this end, we capture a person's identity in a canonical space as a signed distance field (SDF), and model facial expressions with a neural deformation field. In addition, our representation achieves high-fidelity local detail by introducing an ensemble of local fields centered around facial anchor points. To facilitate generalization, we train our model on a newly-captured dataset of over 3700 head scans from 203 different identities using a custom high-end 3D scanning setup. Our dataset significantly exceeds comparable existing datasets, both with respect to quality and completeness of geometry, averaging around 3.5M mesh faces per scan. We will publicly release our dataset along with a public benchmark for both neural head avatar construction and evaluation on a hidden test set for inference-time fitting. Finally, we demonstrate that our approach outperforms state-of-the-art methods in terms of fitting error and reconstruction quality.
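A minimal sketch of the described hybrid-field decomposition, composing an expression deformation field with a canonical identity SDF, might look as follows (network sizes are arbitrary and the local-field ensemble is omitted; this is not the authors' implementation):

```python
# Sketch of a neural parametric head model: query points are warped by an
# expression-conditioned deformation field into a canonical space, where an
# identity-conditioned SDF is evaluated. All names are illustrative.
import torch
import torch.nn as nn

def mlp(i, o, h=128):
    return nn.Sequential(nn.Linear(i, h), nn.ReLU(), nn.Linear(h, o))

class ParametricHead(nn.Module):
    def __init__(self, id_dim=64, ex_dim=64):
        super().__init__()
        self.deform = mlp(3 + id_dim + ex_dim, 3)   # expression deformation field
        self.sdf = mlp(3 + id_dim, 1)               # canonical identity SDF

    def forward(self, x, z_id, z_ex):
        """x: (N, 3) query points; z_id/z_ex: (N, latent) codes."""
        x_canon = x + self.deform(torch.cat([x, z_id, z_ex], -1))
        return self.sdf(torch.cat([x_canon, z_id], -1))     # signed distance
```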
- …