
    Audio-Visual Learning for Scene Understanding

    Multimodal deep learning aims to combine the complementary information of different modalities. Among all modalities, audio and video are the predominant ones humans use to explore the world. In this thesis, we therefore focus on audio-visual deep learning, so that our networks mimic how humans perceive the world. Our research involves images, audio signals, and acoustic images. The latter provide spatial audio information and are obtained from a planar array of microphones by combining the raw microphone signals with a beamforming algorithm. They mimic the human auditory system more closely than a single microphone, which alone cannot provide spatial sound cues. However, since microphone arrays are not widespread, we also study how to handle the missing spatialized audio modality at test time. As a solution, we propose to distill the content of acoustic images into audio features during training, so that their absence at test time can be handled. We do this for supervised audio classification using the generalized distillation framework, which we also extend to self-supervised learning. Next, we devise a method for reconstructing acoustic images given only a single microphone and an RGB frame. Thus, when only a standard video is available, we can synthesize spatial audio, which is useful for many audio-visual tasks, including sound localization. Lastly, as another example of restoring one modality from the available ones, we inpaint degraded images using audio features, reconstructing the missing region so that it is not only visually plausible but also semantically consistent with the accompanying sound. This extends to cross-modal generation in the limit case of a completely missing or hidden visual modality: our method handles it naturally, generating images from sound alone. In summary, we show how audio can help visual learning and vice versa, by transferring knowledge between the two modalities at training time in order to distill, reconstruct, or restore the missing modality at test time.
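    The acoustic images above are commonly formed by delay-and-sum beamforming: each microphone channel is phase-aligned toward a candidate direction and the aligned channels are summed, so every steering direction yields one pixel of output power. The following is a minimal NumPy sketch of that idea; the far-field geometry, steering grid, and sign conventions are illustrative assumptions, not the thesis's actual configuration.

        import numpy as np

        def delay_and_sum(signals, mic_xy, directions, fs, c=343.0):
            """Beamform multichannel audio toward a grid of directions.

            signals    : (n_mics, n_samples) raw microphone waveforms
            mic_xy     : (n_mics, 2) microphone positions on the planar array [m]
            directions : (n_dirs, 2) in-plane direction cosines per steering direction
            fs         : sampling rate [Hz]
            c          : speed of sound [m/s]
            Returns an (n_dirs,) map of beamformed output power, i.e. one
            pixel of the acoustic image per steering direction.
            """
            spectra = np.fft.rfft(signals, axis=1)               # per-mic spectra
            freqs = np.fft.rfftfreq(signals.shape[1], d=1.0 / fs)
            power = np.empty(len(directions))
            for k, d in enumerate(directions):
                # Far-field arrival-time advance of each mic w.r.t. the array origin.
                delays = mic_xy @ d / c                          # (n_mics,)
                # Undo each mic's advance so all channels align, then sum.
                phase = np.exp(-2j * np.pi * freqs[None, :] * delays[:, None])
                aligned = (spectra * phase).sum(axis=0) / len(mic_xy)
                power[k] = np.mean(np.abs(aligned) ** 2)
            return power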

    Memory for sounds: novel technological solutions for the evaluation of mnestic skills

    Working memory (WM) plays a crucial role in helping individuals perform everyday activities. The neural structures underlying this system continue to develop during infancy and reach maturity only late in development. Despite useful insights into visual memory mechanisms, audio-spatial memory has not been thoroughly investigated, especially in children and congenitally blind individuals. The main scientific objective of this thesis was to increase knowledge of spatial WM and imagery abilities in the auditory modality. We focused on how these skills change during typical development and on the consequences of early visual deprivation. Our first hypothesis was that the changes in WM functionality and spatial skills occurring in the early years of life influence the ability to remember and associate spatialized sounds or to explore and learn acoustic spatial layouts. Since vision plays a crucial role in spatial cognition (Thinus-Blanc and Gaunet, 1997), we expected blind individuals to encounter specific difficulties when asked to process and manipulate spatial information retained in memory, as already observed in the haptic modality (Cattaneo et al., 2008; Vecchi, 1998). Although some studies demonstrated superior performance of the blind in various verbal-memory tasks (Amedi et al., 2003; Požár, 1982; Röder et al., 2001), very little is known about how they remember and manipulate acoustic spatial information. The investigation of auditory cognition often requires specially adapted hardware and software solutions rarely available on the market. For example, studying cognitive functions that involve auditory spatial information requires multiple acoustic spatial locations, such as numerous speakers or dedicated virtual acoustics. Thus, for the aims of this thesis, we took advantage of novel technological solutions developed explicitly for delivering non-visual spatialized stimuli. We worked on the software development of a vertical array of speakers (ARENA2D) and an audio-tactile tablet (Audiobrush), and we designed a system based on an acoustic virtual reality (VR) simulation. These novel solutions were used to adapt validated clinical procedures (the Corsi-Block test) and games (the card game Memory) to the auditory domain, so that they can also be performed by visually impaired individuals. Thanks to the technologies developed over these years, we investigated these topics and observed that audio-spatial memory abilities are strongly affected by developmental stage and by the lack of visual experience.
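    As an illustration of how a validated clinical procedure can be moved to the auditory domain, the following is a minimal sketch of a Corsi-style span task driven by a speaker array: a sequence of spatialized sounds is played from numbered speaker positions and the participant reproduces the order. The speaker interface and the two-failure stopping rule are hypothetical stand-ins, not the ARENA2D or Audiobrush APIs.

        import random

        def play_from_speaker(index: int) -> None:
            # Placeholder for driving one loudspeaker of the array.
            print(f"[sound from speaker {index}]")

        def auditory_span_task(n_speakers: int = 9, max_span: int = 9) -> int:
            """Lengthen the sequence until the participant fails twice at a
            given span; return the longest span reproduced correctly."""
            best = 0
            for span in range(2, max_span + 1):
                failures = 0
                while failures < 2:
                    sequence = random.sample(range(n_speakers), span)
                    for idx in sequence:
                        play_from_speaker(idx)
                    answer = input(f"Repeat the {span} positions (space-separated): ")
                    if [int(x) for x in answer.split()] == sequence:
                        best = span
                        break        # success: move on to the next span
                    failures += 1
                else:
                    break            # two failures at this span: stop
            return best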

    Learning with Contexts

    Ph.D. (Doctor of Philosophy)

    Guidelines for the development of immersive virtual reality software for cognitive neuroscience and neuropsychology:The development of Virtual Reality Everyday Assessment Lab (VR-EAL), A neuropsychological test battery in immersive virtual reality

    Virtual reality (VR) head-mounted displays (HMDs) appear to be effective research tools that may address the problem of ecological validity in neuropsychological testing. However, their widespread implementation is hindered by VR-induced symptoms and effects (VRISE) and by a lack of skills in VR software development. This study offers guidelines for the development of VR software in cognitive neuroscience and neuropsychology by describing and discussing the stages of development of the Virtual Reality Everyday Assessment Lab (VR-EAL), the first neuropsychological battery in immersive VR. Techniques for evaluating cognitive functions within a realistic storyline are discussed. The utility of various assets in Unity, software development kits, and other software is described so that cognitive scientists can overcome challenges pertinent to VRISE and the quality of VR software. In addition, this pilot study attempts to evaluate VR-EAL in accordance with the necessary criteria for VR research software. The VR neuroscience questionnaire (VRNQ; Kourtesis et al., 2019b) was implemented to appraise the quality of the three versions of VR-EAL in terms of user experience, game mechanics, in-game assistance, and VRISE. Twenty-five participants aged between 20 and 45 years, with 12–16 years of full-time education, evaluated the various versions of VR-EAL. The final version of VR-EAL achieved high scores on every VRNQ sub-score and exceeded its parsimonious cut-offs. It also appeared to have better in-game assistance and game mechanics, while its improved graphics substantially increased the quality of the user experience and almost eradicated VRISE. The results substantially support the feasibility of developing effective VR research and clinical software that does not induce VRISE during a 60-min VR session.
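    The evaluation step can be sketched as follows: each VRNQ domain (user experience, game mechanics, in-game assistance, VRISE) yields a sub-score that is compared against a parsimonious cut-off, here as a median across participants. The cut-off value and the sample responses are illustrative assumptions, not the published VRNQ thresholds (see Kourtesis et al., 2019b) or this study's data.

        from statistics import median

        DOMAINS = ("user_experience", "game_mechanics", "in_game_assistance", "vrise")
        PARSIMONIOUS_CUTOFF = 25  # assumed threshold; the real cut-offs are in the VRNQ paper

        def evaluate_vrnq(responses):
            """responses: one dict per participant mapping domain -> sub-score.
            Returns, per domain, whether the median sub-score exceeds the cut-off."""
            return {d: median(r[d] for r in responses) > PARSIMONIOUS_CUTOFF
                    for d in DOMAINS}

        # Illustrative inputs only (not the study's data).
        print(evaluate_vrnq([
            {"user_experience": 31, "game_mechanics": 30, "in_game_assistance": 28, "vrise": 33},
            {"user_experience": 29, "game_mechanics": 27, "in_game_assistance": 26, "vrise": 34},
            {"user_experience": 33, "game_mechanics": 31, "in_game_assistance": 29, "vrise": 35},
        ]))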

    Audio-visual foreground extraction for event characterization

    This paper presents a new method for integrating audio and visual information for scene analysis in a typical surveillance scenario, using only one camera and one monaural microphone. Visual information is analyzed by a standard visual background/foreground (BG/FG) modelling module, enhanced with a novelty detection stage and coupled with an audio BG/FG modelling scheme. The audio-visual association is performed online by exploiting the concept of synchrony. Experiments on the classification and clustering of events demonstrate the potential of the proposed approach, also in comparison with results obtained using each modality alone.
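    The synchrony cue used for the audio-visual association can be sketched as a windowed correlation between a per-frame audio foreground feature (e.g. foreground energy) and a per-frame visual foreground feature (e.g. foreground pixel count). The feature choices and window length below are illustrative assumptions; the paper's exact BG/FG features are not specified in this abstract.

        import numpy as np

        def av_synchrony(audio_energy, fg_pixels, win=25):
            """audio_energy, fg_pixels: (n_frames,) per-video-frame features.
            Returns an (n_frames,) Pearson correlation over a sliding window
            of `win` frames (1 s at 25 fps); values near 1 indicate that the
            audio and visual foregrounds co-vary, i.e. likely one event."""
            a = np.asarray(audio_energy, dtype=float)
            v = np.asarray(fg_pixels, dtype=float)
            sync = np.zeros(len(a))
            for t in range(win, len(a)):
                aw = a[t - win:t] - a[t - win:t].mean()
                vw = v[t - win:t] - v[t - win:t].mean()
                denom = np.linalg.norm(aw) * np.linalg.norm(vw)
                sync[t] = (aw @ vw) / denom if denom > 0 else 0.0
            return sync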

    A 4D information system for the exploration of multitemporal images and maps using photogrammetry, web technologies and VR/AR

    This contribution shows the comparison, investigation, and implementation of different access strategies for multimodal data. The first part of the research is theoretical: it contrasts and explains the terms conventional access, virtual archival access, and virtual museum, and references related work. In particular, it points out issues that still persist in repositories, such as ambiguous or missing metadata. The second part explains the practical implementation of a workflow from a large image repository to various four-dimensional (4D) applications, mainly the filtering of images and, subsequently, their orientation. The selection of relevant images is done partly manually, but also with deep convolutional neural networks for image classification. Photogrammetric methods are then used to find the relative orientation between image pairs in a projective frame. For this purpose, an adapted Structure from Motion (SfM) workflow is presented in which the feature detection and matching step is replaced by the Radiation-Invariant Feature Transform (RIFT) and Matching On Demand with View Synthesis (MODS). Both methods were evaluated on a benchmark dataset and outperformed other approaches. Subsequently, the oriented images are placed interactively, and in the future automatically, in a 4D browser application showing images, maps, and building models. Further usage scenarios are presented in several Virtual Reality (VR) and Augmented Reality (AR) applications. The new representation of the archival data enables spatial and temporal browsing of repositories, allowing research from innovative perspectives and the uncovering of historical details.
    Highlights:
    - Strategies for a completely automated workflow from image repositories to four-dimensional (4D) access approaches.
    - The orientation of historical images using adapted and evaluated feature matching methods.
    - 4D access methods for historical images and 3D models using web technologies and Virtual Reality (VR)/Augmented Reality (AR).
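    The pair-wise relative orientation step can be sketched with off-the-shelf tools: detect and match local features, then estimate the fundamental matrix with RANSAC, which encodes the relative orientation of the image pair in a projective frame. OpenCV's SIFT stands in below for RIFT and MODS, which do not ship with OpenCV; the rest is a generic sketch, not the paper's implementation.

        import cv2
        import numpy as np

        def relative_orientation(img1_path, img2_path):
            img1 = cv2.imread(img1_path, cv2.IMREAD_GRAYSCALE)
            img2 = cv2.imread(img2_path, cv2.IMREAD_GRAYSCALE)
            sift = cv2.SIFT_create()
            kp1, des1 = sift.detectAndCompute(img1, None)
            kp2, des2 = sift.detectAndCompute(img2, None)

            # Lowe's ratio test on 2-NN matches keeps distinctive correspondences.
            matcher = cv2.BFMatcher(cv2.NORM_L2)
            good = [m for m, n in matcher.knnMatch(des1, des2, k=2)
                    if m.distance < 0.75 * n.distance]

            pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
            pts2 = np.float32([kp2[m.trainIdx].pt for m in good])

            # The fundamental matrix is the relative orientation of the pair
            # in a projective frame; RANSAC rejects mismatched features.
            F, inlier_mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 3.0)
            n_inliers = int(inlier_mask.sum()) if inlier_mask is not None else 0
            return F, n_inliers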