
Pix2Vox: Context-aware 3D Reconstruction from Single and Multi-view Images

Author: Sun Xiaoshuai
Xie Haozhe
Yao Hongxun
Zhang Shengping
Zhou Shangchen
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 28/07/2019

Recovering the 3D representation of an object from single-view or multi-view RGB images with deep neural networks has attracted increasing attention in the past few years. Several mainstream works (e.g., 3D-R2N2) use recurrent neural networks (RNNs) to sequentially fuse multiple feature maps extracted from the input images. However, when given the same set of input images in different orders, RNN-based approaches are unable to produce consistent reconstruction results. Moreover, due to long-term memory loss, RNNs cannot fully exploit the input images to refine reconstruction results. To solve these problems, we propose a novel framework for single-view and multi-view 3D reconstruction, named Pix2Vox. Using a well-designed encoder-decoder, it generates a coarse 3D volume from each input image. Then, a context-aware fusion module adaptively selects high-quality reconstructions for each part (e.g., table legs) from the different coarse 3D volumes to obtain a fused 3D volume. Finally, a refiner further refines the fused 3D volume to generate the final output. Experimental results on the ShapeNet and Pix3D benchmarks indicate that the proposed Pix2Vox outperforms state-of-the-art methods by a large margin. Furthermore, the proposed method is 24 times faster than 3D-R2N2 in terms of backward inference time. Experiments on unseen 3D categories of ShapeNet show the superior generalization ability of our method.

Comment: ICCV 201
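The fusion step described in the abstract can be illustrated with a minimal sketch. It assumes that each view comes with a per-voxel score and that the scores are normalized across views with a softmax before a weighted sum; the abstract does not spell out this mechanism, so treat it as one plausible reading and not as the authors' implementation. The function name `fuse_volumes`, the flattened-list layout, and the hand-written scores are hypothetical (in a real model the scores would come from a learned network).

```python
import math


def fuse_volumes(volumes, scores):
    """Fuse per-view coarse volumes into one volume.

    volumes[i][j] is the occupancy of voxel j predicted from view i.
    scores[i][j] is a (hypothetical) quality score for that prediction.
    Scores are softmax-normalized across views for each voxel, then used
    as weights in a per-voxel weighted sum.
    """
    n_views = len(volumes)
    n_voxels = len(volumes[0])
    fused = []
    for j in range(n_voxels):
        column = [scores[i][j] for i in range(n_views)]
        peak = max(column)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in column]
        total = math.fsum(exps)
        fused.append(
            math.fsum(e / total * volumes[i][j] for i, e in enumerate(exps))
        )
    return fused


# Three views of a four-voxel "volume" (flattened), with made-up scores.
coarse = [
    [0.9, 0.1, 0.4, 0.7],
    [0.8, 0.3, 0.9, 0.2],
    [0.7, 0.2, 0.5, 0.6],
]
quality = [
    [2.0, 0.0, -1.0, 1.0],
    [0.5, 1.0, 3.0, -2.0],
    [0.0, 0.5, 0.0, 0.5],
]

print(fuse_volumes(coarse, quality))
# Reordering the views leaves the result unchanged (up to rounding),
# unlike a sequential RNN-style fusion.
print(fuse_volumes(coarse[::-1], quality[::-1]))
```

Because the weights depend only on the set of views and not on their order, this kind of fusion sidesteps the order sensitivity that the abstract attributes to RNN-based approaches.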

arXiv.org e-Print Archive

Crossref

Atypical audiovisual speech integration in infants at risk for autism

Author: A Klin
A Pickles
Andrew Whitehouse
B de Gelder
B Dodd
BS Abrahams
C Koning
CA Binnie
DW Massaro
E Kushnerenko
EA Mongillo
EG Smith
EJ Gibson
Elena Kushnerenko
G Iarocci
H Gervais
H McGurk
H Tager-Flusberg
Helena Ribeiro
J Townsend
JA Guiraud
Jeanne A. Guiraud
JHG Williams
JR Irwin
K Sekiyama
K Tiippana
KA Loveland
KA Pelphrey
Kim Davies
KM Dalton
M Elsabbagh
M Legerstee
M Paré
M Rutter
Mark H. Johnson
Mayada Elsabbagh
ML Patterson
ML Spezio
O Megnin
P Hindley
P Howlin
P Tomalski
PD Zelazo
Przemyslaw Tomalski
R Goodman
RN Desjardins
RP Hobson
S Ozonoff
T Teinonen
TL Lewis
Tony Charman
TS Andersen
V Hus
WBA Jones
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2012

The language difficulties often seen in individuals with autism might stem from an inability to integrate audiovisual information, a skill important for language development. We investigated whether 9-month-old siblings of older children with autism, who are at an increased risk of developing autism, are able to integrate audiovisual speech cues. We used an eye-tracker to record where infants looked when shown a screen displaying two faces of the same model, where one face is articulating /ba/ and the other /ga/, with one face congruent with the syllable sound being presented simultaneously and the other face incongruent. This method was successful in showing that infants at low risk can integrate audiovisual speech: they looked for the same amount of time at the mouths in both the fusible visual /ga/ − audio /ba/ and the congruent visual /ba/ − audio /ba/ displays, indicating that the auditory and visual streams fuse into a McGurk-type syllabic percept in the incongruent condition. It also showed that low-risk infants could perceive a mismatch between auditory and visual cues: they looked longer at the mouth in the mismatched, non-fusible visual /ba/ − audio /ga/ display compared with the congruent visual /ga/ − audio /ga/ display, demonstrating that they perceive an uncommon, and therefore interesting, speech-like percept when looking at the incongruent mouth (repeated-measures ANOVA, displays × fusion/mismatch conditions interaction: F(1,16) = 17.153, p = 0.001). The looking behaviour of high-risk infants did not differ according to the type of display, suggesting difficulties in matching auditory and visual information (repeated-measures ANOVA, displays × conditions interaction: F(1,25) = 0.09, p = 0.767), in contrast to low-risk infants (repeated-measures ANOVA, displays × conditions × low/high-risk groups interaction: F(1,41) = 4.466, p = 0.041). In some cases this reduced ability might lead to the poor communication skills characteristic of autism.
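As a reading aid, the reported p-values can be sanity-checked from the F statistics alone. The sketch below relies on the standard identity that an F statistic with one numerator degree of freedom is the square of a t statistic, so its p-value equals the two-sided p-value of a t-test with the same error degrees of freedom. The function names and the use of Simpson's rule are choices made here for illustration; this is not the authors' analysis code.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (
        math.sqrt(df * math.pi) * math.gamma(df / 2)
    )
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def p_value_f_1_df(f_stat, df_error, steps=2000):
    """Upper-tail p-value for F(1, df_error), using F = t**2.

    The integral of the t density over [0, sqrt(F)] is approximated with
    composite Simpson's rule (steps must be even); the two-sided tail
    probability is then 1 - 2 * area.
    """
    t = math.sqrt(f_stat)
    h = t / steps
    total = t_pdf(0.0, df_error) + t_pdf(t, df_error)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df_error)
    area = total * h / 3
    return 1.0 - 2.0 * area


# (F statistic, error df, p-value as reported in the abstract)
reported = [
    (17.153, 16, 0.001),
    (0.09, 25, 0.767),
    (4.466, 41, 0.041),
]

for f_stat, df_error, p_reported in reported:
    p = p_value_f_1_df(f_stat, df_error)
    print(f"F(1,{df_error}) = {f_stat}: computed p = {p:.4f}, reported p = {p_reported}")
```

The computed values should land close to the reported ones after rounding to three decimals; small differences are expected from rounding in the published F statistics.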

Crossref

Directory of Open Access Journals

PubMed Central

Birkbeck Institutional Research Online

The University of Manchester - Institutional Repository

King's Research Portal