
49 research outputs found

nerf2nerf: Pairwise Registration of Neural Radiance Fields

Author: Garg Animesh
Goli Lily
Rebain Daniel
Sabour Sara
Tagliasacchi Andrea
Publication date: 03/11/2022

We introduce a technique for pairwise registration of neural fields that extends classical optimization-based local registration (i.e. ICP) to operate on Neural Radiance Fields (NeRF) -- neural 3D scene representations trained from collections of calibrated images. NeRF does not decompose illumination and color, so to make registration invariant to illumination, we introduce the concept of a "surface field" -- a field distilled from a pre-trained NeRF model that measures the likelihood of a point being on the surface of an object. We then cast nerf2nerf registration as a robust optimization that iteratively seeks a rigid transformation that aligns the surface fields of the two scenes. We evaluate the effectiveness of our technique by introducing a dataset of pre-trained NeRF scenes -- our synthetic scenes enable quantitative evaluations and comparisons to classical registration techniques, while our real scenes demonstrate the validity of our technique in real-world scenarios. Additional results available at: https://nerf2nerf.github.i

arXiv.org e-Print Archive

Critically Examining the Claimed Value of Convolutions over User-Item Embedding Maps for Recommender Systems

Author: Bennett James
Defferrard Michaël
den Oord Aaron Van
Hernández-Lobato José Miguel
Johnson Christopher C
Krizhevsky Alex
Levy Mark
Lin Jimmy
Rendle Steffen
Sabour Sara
Zachary
Zhang Shuai
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2020

In recent years, algorithm research in the area of recommender systems has shifted from matrix factorization techniques and their latent factor models to neural approaches. However, given the proven power of latent factor models, some newer neural approaches incorporate them within more complex network architectures. One specific idea, recently put forward by several researchers, is to consider potential correlations between the latent factors, i.e., embeddings, by applying convolutions over the user-item interaction map. However, contrary to what is claimed in these articles, such interaction maps do not share the properties of images where Convolutional Neural Networks (CNNs) are particularly useful. In this work, we show through analytical considerations and empirical evaluations that the claimed gains reported in the literature cannot be attributed to the ability of CNNs to model embedding correlations, as argued in the original papers. Moreover, additional performance evaluations show that all of the examined recent CNN-based models are outperformed by existing non-neural machine learning techniques or traditional nearest-neighbor approaches. On a more general level, our work points to major methodological issues in recommender systems research. Comment: Source code available here: https://github.com/MaurizioFD/RecSys2019_DeepLearning_Evaluatio

arXiv.org e-Print Archive

Crossref

Archivio istituzionale della ricerca - Politecnico di Milano

Dynamic Context-guided Capsule Network for Multimodal Machine Translation

Author: Anderson Peter
Bahdanau Dzmitry
Caglayan Ozan
Desmond
He Kaiming
Jaiswal Ayush
Klein Guillaume
Michael
Papineni Kishore
Sabour Sara
Singh Maneet
Stig-Arne
Su Jinsong
Vaswani Ashish
Wang Mingxuan
Wu Qi
Xinyi Zhang
Yang Zhengxin
Zhang Xiangwen
Zheng Zaixiang
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 04/09/2020

Multimodal machine translation (MMT), which mainly focuses on enhancing text-only translation with visual features, has attracted considerable attention from both the computer vision and natural language processing communities. Most current MMT models resort to attention mechanisms, global context modeling, or multimodal joint representation learning to utilize visual features. However, the attention mechanism lacks sufficient semantic interactions between modalities, while the other two provide fixed visual context, which is unsuitable for modeling the observed variability when generating translation. To address the above issues, in this paper, we propose a novel Dynamic Context-guided Capsule Network (DCCN) for MMT. Specifically, at each timestep of decoding, we first employ the conventional source-target attention to produce a timestep-specific source-side context vector. Next, DCCN takes this vector as input and uses it to guide the iterative extraction of related visual features via a context-guided dynamic routing mechanism. In particular, we represent the input image with global and regional visual features and introduce two parallel DCCNs to model multimodal context vectors with visual features at different granularities. Finally, we obtain two multimodal context vectors, which are fused and incorporated into the decoder for the prediction of the target word. Experimental results on the Multi30K dataset of English-to-German and English-to-French translation demonstrate the superiority of DCCN. Our code is available at https://github.com/DeepLearnXMU/MM-DCCN

arXiv.org e-Print Archive

Crossref

Classifying News Media Coverage for Corruption Risks Management with Deep Learning and Web Intelligence

Author: Devlin Jacob
Diakopoulos Nicholas
Hinton Geoffrey E.
Lan Zhenzhong
Marcus Adam
Mikolov Tomas
Nixon Lyndon
Sabour Sara
Thomas
Vaswani Ashish
Publication venue: 'American College of Medical Physics (ACMP)'
Publication date: 30/06/2020

A substantial number of international corporations have been affected by corruption. The research presented in this paper introduces the Integrity Risks Monitor, an analytics dashboard that applies Web Intelligence and Deep Learning to English- and German-language documents for the tasks of (i) tracking and visualizing past corruption management gaps and their respective impacts, (ii) understanding present and past integrity issues, and (iii) supporting companies in analyzing news media for identifying and mitigating integrity risks. Afterwards, we discuss the design, implementation, training and evaluation of classification components capable of identifying English documents covering the integrity topic of corruption. Domain experts created a gold standard dataset compiled from Anglo-American media coverage on corruption cases that has been used for training and evaluating the classifier. The experiments performed to evaluate the classifiers draw upon popular algorithms used for text classification such as Naïve Bayes, Support Vector Machines (SVM) and Deep Learning architectures (LSTM, BiLSTM, CNN) that draw upon different word embeddings and document representations. They also demonstrate that although classical machine learning approaches such as Naïve Bayes struggle with the diversity of the media coverage on corruption, state-of-the-art Deep Learning models perform sufficiently well in the project's context.

Crossref

webLyzard technology gmbh