nerf2nerf: Pairwise Registration of Neural Radiance Fields
We introduce a technique for pairwise registration of neural fields that
extends classical optimization-based local registration (e.g. ICP) to operate
on Neural Radiance Fields (NeRF) -- neural 3D scene representations trained
from collections of calibrated images. NeRF does not decompose illumination and
color, so to make registration invariant to illumination, we introduce the
concept of a "surface field" -- a field distilled from a pre-trained NeRF
model that measures the likelihood of a point being on the surface of an
object. We then cast nerf2nerf registration as a robust optimization that
iteratively seeks a rigid transformation that aligns the surface fields of the
two scenes. We evaluate the effectiveness of our technique by introducing a
dataset of pre-trained NeRF scenes -- our synthetic scenes enable quantitative
evaluations and comparisons to classical registration techniques, while our
real scenes demonstrate the validity of our technique in real-world scenarios.
Additional results available at: https://nerf2nerf.github.i
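The robust alignment step described above can be sketched at toy scale. The following is a hypothetical illustration, not the authors' implementation: two analytic scalar fields stand in for distilled surface fields (scene B is scene A under a known translation), and SciPy's robust least squares searches over a 6-DoF rigid transform with a Cauchy loss.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

# Stand-ins for two distilled surface fields: smooth scalar fields that
# peak on an (anisotropic) object. Scene B is scene A under a known
# translation, so the correct alignment is known in advance.
TRUE_SHIFT = np.array([0.3, -0.2, 0.1])

def field_a(pts):
    return np.exp(-np.sum((pts * np.array([1.0, 2.0, 3.0])) ** 2, axis=1))

def field_b(pts):
    return field_a(pts - TRUE_SHIFT)

# Alignment residual: after applying the candidate rigid transform
# (axis-angle rotation + translation), the fields should agree pointwise.
def residuals(params, samples):
    rot = Rotation.from_rotvec(params[:3])
    return field_a(samples) - field_b(rot.apply(samples) + params[3:])

rng = np.random.default_rng(0)
samples = rng.uniform(-1.0, 1.0, size=(500, 3))

# Robust iterative optimization over the 6-DoF transform; the Cauchy loss
# downweights samples where the two fields disagree strongly (outliers).
fit = least_squares(residuals, x0=np.zeros(6), args=(samples,),
                    loss="cauchy", f_scale=0.1)
print(fit.x[3:])   # recovered translation, close to TRUE_SHIFT
```

In the paper's setting the fields come from distilled NeRF models rather than closed-form functions, but the structure of the optimization (sample points, compare field values, robustly refine a rigid transform) is the same.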
Critically Examining the Claimed Value of Convolutions over User-Item Embedding Maps for Recommender Systems
In recent years, algorithm research in the area of recommender systems has
shifted from matrix factorization techniques and their latent factor models to
neural approaches. However, given the proven power of latent factor models,
some newer neural approaches incorporate them within more complex network
architectures. One specific idea, recently put forward by several researchers,
is to consider potential correlations between the latent factors, i.e.,
embeddings, by applying convolutions over the user-item interaction map.
However, contrary to what is claimed in these articles, such interaction maps
do not share the properties of images where Convolutional Neural Networks
(CNNs) are particularly useful. In this work, we show through analytical
considerations and empirical evaluations that the claimed gains reported in the
literature cannot be attributed to the ability of CNNs to model embedding
correlations, as argued in the original papers. Moreover, additional
performance evaluations show that all of the examined recent CNN-based models
are outperformed by existing non-neural machine learning techniques or
traditional nearest-neighbor approaches. On a more general level, our work
points to major methodological issues in recommender systems research.
Source code available here:
https://github.com/MaurizioFD/RecSys2019_DeepLearning_Evaluatio
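For context, the construction under scrutiny can be sketched as follows. This is a minimal NumPy illustration of the general idea (not any specific paper's code): the "interaction map" is the outer product of a user and an item embedding, over which a 2D convolution is then applied.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                                 # embedding dimension (illustrative)
user_emb = rng.normal(size=d)
item_emb = rng.normal(size=d)

# The "interaction map" these models convolve over: the d x d outer
# product of a user embedding and an item embedding.
interaction_map = np.outer(user_emb, item_emb)

# One 3x3 convolution pass (single filter, valid padding), written out
# directly to make the local-window assumption explicit.
kernel = rng.normal(size=(3, 3))
out = np.empty((d - 2, d - 2))
for i in range(d - 2):
    for j in range(d - 2):
        out[i, j] = np.sum(interaction_map[i:i + 3, j:j + 3] * kernel)

print(out.shape)   # (6, 6)
```

Unlike image pixels, adjacent cells of this map correspond to arbitrary latent dimensions with no inherent spatial ordering, so the local-correlation assumption that motivates convolutions does not obviously hold -- the point the abstract above examines.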
Dynamic Context-guided Capsule Network for Multimodal Machine Translation
Multimodal machine translation (MMT), which mainly focuses on enhancing
text-only translation with visual features, has attracted considerable
attention from both computer vision and natural language processing
communities. Most current MMT models resort to attention mechanism, global
context modeling or multimodal joint representation learning to utilize visual
features. However, the attention mechanism lacks sufficient semantic
interactions between modalities, while the other two provide a fixed visual
context, which is unsuitable for modeling the variability observed when
generating translations. To address the above issues, in this paper we propose
a novel Dynamic Context-guided Capsule Network (DCCN) for MMT. Specifically, at
each timestep of decoding, we first employ the conventional source-target
attention to produce a timestep-specific source-side context vector. Next, DCCN
takes this vector as input and uses it to guide the iterative extraction of
related visual features via a context-guided dynamic routing mechanism.
Particularly, since we represent the input image with both global and regional
visual features, we introduce two parallel DCCNs to model multimodal context
vectors with visual features at different granularities. Finally, we obtain two
multimodal context vectors, which are fused and incorporated into the decoder
for the prediction of the target word. Experimental results on the Multi30K
dataset of English-to-German and English-to-French translation demonstrate the
superiority of DCCN. Our code is available on
https://github.com/DeepLearnXMU/MM-DCCN
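The per-timestep flow described above can be sketched at a toy scale. The NumPy code below is a loose illustration with made-up dimensions: it reduces capsule routing to its agreement-reweighting core rather than reproducing DCCN, and all array shapes are arbitrary assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
d = 16
src_states = rng.normal(size=(10, d))   # encoder states for 10 source tokens
dec_state  = rng.normal(size=d)         # decoder state at one timestep
vis_feats  = rng.normal(size=(5, d))    # e.g. 5 regional visual features

# Step 1: conventional source-target attention yields a timestep-specific
# source-side context vector.
attn = softmax(src_states @ dec_state / np.sqrt(d))
ctx = attn @ src_states

# Step 2: context-guided routing, stripped to its core idea -- iteratively
# reweight visual features by their agreement with the textual context.
logits = vis_feats @ ctx / np.sqrt(d)   # context guides the initial coupling
for _ in range(3):                      # routing iterations
    coupling = softmax(logits)
    vis_ctx = coupling @ vis_feats      # multimodal context vector
    logits += vis_feats @ vis_ctx       # agreement update

# Step 3: fuse textual and visual contexts for target-word prediction.
fused = np.concatenate([ctx, vis_ctx])
print(fused.shape)   # (32,)
```

In the actual model this runs once per decoding timestep and in two parallel copies (global and regional visual features), so the visual context changes dynamically as the translation is generated.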
Classifying News Media Coverage for Corruption Risks Management with Deep Learning and Web Intelligence
A substantial number of international corporations have been affected by corruption. The research presented in this paper introduces the Integrity Risks Monitor, an analytics dashboard that applies Web Intelligence and Deep Learning to English- and German-language documents for the tasks of (i) tracking and visualizing past corruption management gaps and their respective impacts, (ii) understanding present and past integrity issues, and (iii) supporting companies in analyzing news media to identify and mitigate integrity risks. Afterwards, we discuss the design, implementation, training and evaluation of classification components capable of identifying English documents covering the integrity topic of corruption. Domain experts created a gold standard dataset, compiled from Anglo-American media coverage of corruption cases, that was used for training and evaluating the classifiers. The experiments performed to evaluate the classifiers draw upon popular text classification algorithms such as Naïve Bayes, Support Vector Machines (SVM) and Deep Learning architectures (LSTM, BiLSTM, CNN) built on different word embeddings and document representations. They also demonstrate that although classical machine learning approaches such as Naïve Bayes struggle with the diversity of the media coverage on corruption, state-of-the-art Deep Learning models perform sufficiently well in the project's context.
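As a point of reference for the classical baseline mentioned above, here is a from-scratch multinomial Naïve Bayes with add-one smoothing on a made-up six-headline corpus. This is purely illustrative -- the headlines, vocabulary, and labels are invented, not the project's gold-standard data.

```python
import numpy as np

# Toy corpus: does a headline cover corruption? (1 = yes, 0 = no)
docs = [
    ("executive charged with bribery and fraud", 1),
    ("prosecutors probe kickback scheme at firm", 1),
    ("officials accused of embezzlement scandal", 1),
    ("company reports strong quarterly earnings", 0),
    ("new product launch boosts market share", 0),
    ("firm announces quarterly dividend increase", 0),
]

vocab = sorted({w for text, _ in docs for w in text.split()})
index = {w: i for i, w in enumerate(vocab)}

def vectorize(text):
    """Bag-of-words count vector; out-of-vocabulary words are ignored."""
    v = np.zeros(len(vocab))
    for w in text.split():
        if w in index:
            v[index[w]] += 1
    return v

X = np.array([vectorize(t) for t, _ in docs])
y = np.array([label for _, label in docs])

# Multinomial Naive Bayes with Laplace (add-one) smoothing.
priors = np.array([np.mean(y == c) for c in (0, 1)])
counts = np.array([X[y == c].sum(axis=0) + 1 for c in (0, 1)])
log_likelihood = np.log(counts / counts.sum(axis=1, keepdims=True))

def predict(text):
    scores = np.log(priors) + log_likelihood @ vectorize(text)
    return int(scores.argmax())

print(predict("regulator opens bribery probe"))        # 1: corruption-related
print(predict("earnings and dividend announcement"))   # 0: not corruption-related
```

A word-count model like this keys on surface vocabulary, which hints at why the abstract finds Naïve Bayes struggling with diverse real-world corruption coverage, where the relevant signal is often not carried by a handful of telltale terms.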