Multisensory Motion Perception in 3–4 Month-Old Infants
Human infants begin very early in life to take advantage of multisensory information by extracting the invariant amodal information that is conveyed redundantly by multiple senses. Here we addressed the question of whether infants can bind multisensory moving stimuli, and whether this occurs even if the motion produced by the stimuli is only illusory. Three- to 4-month-old infants were presented with two bimodal pairings: visuo-tactile and audio-visual. Visuo-tactile pairings consisted of apparently vertically moving bars (the Barber Pole illusion) moving in either the same or opposite direction as a concurrent tactile stimulus consisting of strokes given on the infant's back. Audio-visual pairings consisted of the Barber Pole illusion in its visual and auditory versions, the latter giving the impression of a continuously rising or descending pitch. We found that infants were able to discriminate congruently moving (same direction) vs. incongruently moving (opposite direction) pairs irrespective of modality (Experiment 1). Importantly, we also found that congruently moving visuo-tactile and audio-visual stimuli were preferred over incongruently moving bimodal stimuli (Experiment 2). Our findings suggest that very young infants are able to extract motion as an amodal component and use it to match stimuli that only apparently move in the same direction.
Audio-visual Self-Supervised Representation Learning in-the-wild
National Technical University of Athens -- Master's thesis. Interdisciplinary-Interdepartmental Postgraduate Programme (D.P.M.S.) "Data Science and Machine Learning"
Reflections on the Preferential Liberalization of Services Trade
This paper takes stock of the forces that lie behind the recent rise of preferential agreements in services trade. Its initial focus is on a number of distinguishing features of services trade that set it apart from trade in goods and shape trade liberalization and rule-making approaches in the services field. The paper then documents the nature and the modal and sectoral incidence of the trade and investment preferences spawned by PTAs in services. It does so with a view to addressing the question of how "preferential" the preferential treatment of services trade really is. Finally, the paper addresses a number of considerations arising from attempts to multilateralize preferential access and rule-making in services trade.
Keywords: Services, trade in services, preferential trade agreements, General Agreement on Trade in Services, multilateral trading system
Contrastive representation learning: a framework and review
Contrastive Learning has recently received interest due to its success in self-supervised representation learning in the computer vision domain. However, the origins of Contrastive Learning date as far back as the 1990s, and its development has spanned many fields and domains, including Metric Learning and natural language processing. In this paper, we provide a comprehensive literature review and propose a general Contrastive Representation Learning framework that simplifies and unifies many different contrastive learning methods. We also provide a taxonomy for each of the components of contrastive learning in order to summarise it and distinguish it from other forms of machine learning. We then discuss the inductive biases which are present in any contrastive learning system and analyse our framework under different views from various sub-fields of Machine Learning. Examples of how contrastive learning has been applied in computer vision, natural language processing, audio processing, and others, as well as in Reinforcement Learning, are also presented. Finally, we discuss the challenges and some of the most promising future research directions ahead.
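The common objective that such a framework unifies can be sketched as an InfoNCE-style loss, shown here as a minimal NumPy illustration (the function name and the temperature value are ours, not taken from the paper):

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.1):
    """InfoNCE loss: each anchor's positive is the row-aligned sample in
    `positives`; all other rows in the batch act as negatives."""
    # L2-normalize so dot products are cosine similarities
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature               # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # the correct (anchor, positive) pairings lie on the diagonal
    return -np.mean(np.diag(log_prob))
```

Aligned pairs should yield a lower loss than mismatched ones, which is the signal most contrastive methods in the review optimize in one form or another.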
The audiovisual structure of onomatopoeias: An intrusion of real-world physics in lexical creation
Sound-symbolic word classes are found in different cultures and languages worldwide. These words are continuously produced to code complex information about events. Here we explore the capacity of creative language to transport complex multisensory information in a controlled experiment, where our participants improvised onomatopoeias from noisy moving objects in audio, visual and audiovisual formats. We found that consonants communicate movement types (slide, hit or ring) mainly through the manner of articulation in the vocal tract. Vowels communicate shapes in visual stimuli (spiky or rounded) and sound frequencies in auditory stimuli through the configuration of the lips and tongue. A machine learning model was trained to classify movement types and used to validate generalizations of our results across formats. We tested the classifier with a list of cross-linguistic onomatopoeias: simple actions were correctly classified, while different aspects were selected to build onomatopoeias of complex actions. These results show how the different aspects of complex sensory information are coded and how they interact in the creation of novel onomatopoeias.
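The abstract does not specify the classifier's architecture; as an illustration of the movement-type classification task it describes, here is a minimal nearest-centroid stand-in over hypothetical articulatory-feature counts (the feature set and the example pseudo-words are invented for illustration, not data from the study):

```python
import numpy as np

# Hypothetical phoneme-feature vectors for improvised onomatopoeias: each
# row counts (fricatives, plosives, nasals, high_vowels) -- a stand-in for
# the articulatory features the study links to movement types.
X_train = np.array([
    [3, 0, 0, 1],  # "ssshh" -> slide
    [2, 1, 0, 0],  # "fsss"  -> slide
    [0, 3, 0, 0],  # "tok"   -> hit
    [0, 2, 0, 1],  # "pik"   -> hit
    [0, 0, 3, 2],  # "niing" -> ring
    [0, 1, 2, 2],  # "diin"  -> ring
], dtype=float)
y_train = np.array(["slide", "slide", "hit", "hit", "ring", "ring"])

def nearest_centroid_predict(X_train, y_train, x):
    """Classify x by its closest class centroid -- a minimal stand-in for
    the paper's (unspecified) movement-type classifier."""
    labels = np.unique(y_train)
    centroids = np.array([X_train[y_train == c].mean(axis=0) for c in labels])
    dists = np.linalg.norm(centroids - x, axis=1)
    return labels[np.argmin(dists)]
```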
Modality-Aware Contrastive Instance Learning with Self-Distillation for Weakly-Supervised Audio-Visual Violence Detection
Weakly-supervised audio-visual violence detection aims to identify snippets containing multimodal violence events using only video-level labels. Many prior works perform audio-visual integration and interaction in an early or intermediate manner, thereby overlooking modality heterogeneity under the weakly-supervised setting. In this paper, we analyze the phenomena of modality asynchrony and undifferentiated instances in the multiple instance learning (MIL) procedure, and further investigate their negative impact on weakly-supervised audio-visual learning. To address these issues, we propose a modality-aware contrastive instance learning with self-distillation (MACIL-SD) strategy. Specifically, we leverage a lightweight two-stream network to generate audio and visual bags, in which unimodal background, violent, and normal instances are clustered into semi-bags in an unsupervised way. Then audio and visual violent semi-bag representations are assembled as positive pairs, and violent semi-bags are combined with background and normal instances in the opposite modality as contrastive negative pairs. Furthermore, a self-distillation module is applied to transfer unimodal visual knowledge to the audio-visual model, which alleviates noise and narrows the semantic gap between unimodal and multimodal features. Experiments show that our framework outperforms previous methods with lower complexity on the large-scale XD-Violence dataset. Results also demonstrate that the proposed approach can be used as a plug-in module to enhance other networks. Code is available at https://github.com/JustinYuu/MACIL_SD.
Comment: ACM MM 202
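The semi-bag pairing scheme described above can be sketched as follows (an illustrative NumPy reconstruction, not the released MACIL-SD code; the cluster labels here stand in for the paper's unsupervised semi-bag assignments):

```python
import numpy as np

def semi_bag_pairs(audio_feats, audio_labels, visual_feats, visual_labels):
    """Assemble contrastive pairs from clustered instances. Positive pair:
    the audio-violent and visual-violent semi-bag means. Negatives: each
    violent semi-bag paired with the OPPOSITE modality's background and
    normal semi-bags, as in the pairing scheme described above."""
    def semi_bag(feats, labels, name):
        # mean feature of all instances clustered under `name`
        return feats[labels == name].mean(axis=0)

    a_viol = semi_bag(audio_feats, audio_labels, "violent")
    v_viol = semi_bag(visual_feats, visual_labels, "violent")
    positive = (a_viol, v_viol)

    negatives = []
    for name in ("background", "normal"):
        negatives.append((a_viol, semi_bag(visual_feats, visual_labels, name)))
        negatives.append((v_viol, semi_bag(audio_feats, audio_labels, name)))
    return positive, negatives
```

The cross-modal negatives are what make the scheme modality-aware: a violent semi-bag is contrasted against non-violent content in the other modality rather than within its own.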
Learning Audio-Visual Source Localization via False Negative Aware Contrastive Learning
Self-supervised audio-visual source localization aims to locate sound-source objects in video frames without extra annotations. Recent methods often approach this goal with the help of contrastive learning, which assumes that only the audio and visual contents from the same video are positive samples for each other. However, this assumption suffers from false negative samples in real-world training. For example, for an audio sample, treating the frames from the same audio class as negative samples may mislead the model and therefore harm the learned representations (e.g., the audio of a siren wailing may reasonably correspond to the ambulances in multiple images). Based on this observation, we propose a new learning strategy named False Negative Aware Contrastive (FNAC) learning to mitigate the problem of misleading the training with such false negative samples. Specifically, we utilize intra-modal similarities to identify potentially similar samples and construct corresponding adjacency matrices to guide contrastive learning. Further, we propose to strengthen the role of true negative samples by explicitly leveraging the visual features of sound sources to facilitate the differentiation of authentic sounding source regions. FNAC achieves state-of-the-art performance on Flickr-SoundNet, VGG-Sound, and AVSBench, demonstrating the effectiveness of our method in mitigating the false negative issue. The code is available at https://github.com/OpenNLPLab/FNAC_AVL.
Comment: CVPR202
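The adjacency-guided idea can be sketched as follows (an illustrative reconstruction, not the released FNAC code; the similarity threshold and the hard masking are simplifying assumptions on our part, the paper uses the adjacency matrix as a soft guide):

```python
import numpy as np

def false_negative_mask(audio_feats, threshold=0.9):
    """Flag off-diagonal sample pairs whose intra-modal cosine similarity
    exceeds `threshold` as likely false negatives (assumed hyperparameter)."""
    a = audio_feats / np.linalg.norm(audio_feats, axis=1, keepdims=True)
    sim = a @ a.T                               # intra-modal adjacency matrix
    return (sim > threshold) & ~np.eye(len(a), dtype=bool)

def masked_infonce(audio, visual, fn_mask, temperature=0.07):
    """Cross-modal InfoNCE where flagged false negatives are dropped from
    the softmax denominator instead of being pushed apart."""
    def normalize(x):
        return x / np.linalg.norm(x, axis=1, keepdims=True)

    logits = normalize(audio) @ normalize(visual).T / temperature
    logits = np.where(fn_mask, -np.inf, logits)   # exclude false negatives
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))            # diagonal = true pairs
```

Two clips with near-identical audio (the siren example above) would be flagged by `false_negative_mask` and thus no longer repelled from each other's frames.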