473 research outputs found

    Rescaling Egocentric Vision: Collection, Pipeline and Challenges for EPIC-KITCHENS-100

    This paper introduces the pipeline used to extend the largest dataset in egocentric vision, EPIC-KITCHENS. The effort culminates in EPIC-KITCHENS-100: a collection of 100 hours, 20M frames, and 90K actions across 700 variable-length videos, capturing long-term unscripted activities in 45 environments using head-mounted cameras. Compared to its previous version (Damen et al., Scaling Egocentric Vision, ECCV 2018), EPIC-KITCHENS-100 has been annotated using a novel pipeline that allows denser (54% more actions per minute) and more complete annotations of fine-grained actions (+128% action segments). This collection enables new challenges such as action detection and evaluating the "test of time", i.e. whether models trained on data collected in 2018 can generalise to new footage collected two years later. The dataset is aligned with six challenges: action recognition (full and weak supervision), action detection, action anticipation, cross-modal retrieval (from captions), and unsupervised domain adaptation for action recognition. For each challenge, we define the task and provide baselines and evaluation metrics.
    Published version. Research at Bristol is supported by the Engineering and Physical Sciences Research Council (EPSRC) Doctoral Training Program (DTP) and EPSRC Fellowship UMPIRE (EP/T004991/1). Research at Catania is sponsored by Piano della Ricerca 2016-2018 linea di Intervento 2 of DMI, by MISE - PON I&C 2014-2020, ENIGMA project (CUP: B61B19000520008), and by MIUR AIM - Attrazione e Mobilita Internazionale Linea 1 - AIM1893589 - CUP E64118002540007.

    More than just perception-action recalibration: walking through a virtual environment causes rescaling of perceived space.

    Egocentric distances in virtual environments are commonly underperceived by up to 50% of the intended distance. However, a brief period of interaction in which participants walk through the virtual environment while receiving visual feedback can dramatically improve distance judgments. Two experiments were designed to explore whether the increase in postinteraction distance judgments is due to perception–action recalibration or to a rescaling of perceived space. Perception–action recalibration as a result of walking interaction should only affect action-specific distance judgments, whereas rescaling of perceived space should affect all distance judgments based on the rescaled percept. Participants made blind-walking distance judgments and verbal size judgments in response to objects in a virtual environment before and after interacting with the environment through either walking (Experiment 1) or reaching (Experiment 2). Size judgments were used to infer perceived distance under the assumption of size–distance invariance, and these served as an implicit measure of perceived distance. Preinteraction walking and size-based distance judgments indicated an underperception of egocentric distance, whereas postinteraction walking and size-based distance judgments both increased as a result of the walking interaction, indicating that walking through the virtual environment with continuous visual feedback caused a rescaling of the perceived space. However, interaction with the virtual environment through reaching had no effect on either type of distance judgment, indicating that physical translation through the virtual environment may be necessary for a rescaling of perceived space. Furthermore, the size-based distance and walking distance judgments were highly correlated, even across changes in perceived distance, providing support for the size–distance invariance hypothesis.
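The size–distance invariance assumption used above to infer perceived distance from size judgments can be written in its standard textbook form (a generic formulation, not quoted from the paper):

```latex
% Size–distance invariance: for an object subtending visual angle \theta,
% perceived size S and perceived distance D are linked by
S = D \tan\theta
% so a size judgment \hat{S} implies an inferred perceived distance of
\hat{D} = \frac{\hat{S}}{\tan\theta}
```

If walking feedback rescales perceived space as a whole, both the blind-walking estimate of D and the size-derived estimate \hat{S}/\tan\theta should shift together, which is consistent with the correlated judgments the experiments report.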

    Angular Scale Expansion Theory and the Misperception of Egocentric Distance in Locomotor Space

    Perception is crucial for the control of action, but perception need not be scaled accurately to produce accurate actions. This paper reviews evidence for an elegant new theory of locomotor space perception based on the dense coding of angular declination, so that action control may be guided by richer feedback. The theory accounts for why so much direct-estimation data suggests that egocentric distance is underestimated even though action measures have been interpreted as indicating accurate perception. Actions are calibrated to the perceived scale of space, and thus action measures are typically unable to distinguish systematic (e.g., linearly scaled) misperception from accurate perception. Whereas subjective reports of the scaling of linear extent are difficult to evaluate in absolute terms, the study of the scaling of perceived angles (which exist on a known scale, delimited by vertical and horizontal) provides new evidence regarding the perceptual scaling of locomotor space.

    Alice in Wonderland syndrome: a clinical and pathophysiological review

    Alice in Wonderland Syndrome (AIWS) is a perceptual disorder, principally involving visual and somesthetic integration, first reported by Todd and named after the strange experiences described by Lewis Carroll in the Alice in Wonderland books. Symptoms may include, among others, aschematia and dysmetropsia. The syndrome has many different etiologies; Epstein-Barr virus (EBV) infection is the most common cause in children, while migraine more commonly affects adults. Considerable evidence supports a close relationship between migraine and AIWS, which in many patients could be considered an aura or a migraine equivalent, particularly in children. Nevertheless, AIWS appears to have anatomical correlates: according to neuroimaging, the temporoparietal-occipital carrefour (TPO-C) is a key region in the development of many AIWS symptoms. The final part of this review aims to relate the AIWS symptoms to one another by presenting a pathophysiological model. In brief, AIWS symptoms depend on an alteration of the TPO-C, where visual-spatial and somatosensory information is integrated. Alterations in these brain regions may cause the co-occurrence of dysmetropsia and disorders of body schema. In our opinion, the association of the other symptoms reported in the literature could vary depending on the different etiologies and on the lack of clear diagnostic criteria.

    Students taught by multimodal teachers are superior action recognizers

    The focal point of egocentric video understanding is modelling hand-object interactions. Standard models (CNNs, Vision Transformers, etc.) that receive RGB frames as input perform well; however, their performance improves further when additional modalities such as object detections, optical flow, and audio are employed as input. The added complexity of the required modality-specific modules, on the other hand, makes these models impractical for deployment. The goal of this work is to retain the performance of such multimodal approaches while using only RGB images as input at inference time. Our approach is based on multimodal knowledge distillation, featuring a multimodal teacher (in the current experiments trained using only object detections, optical flow and RGB frames) and a unimodal student (using only RGB frames as input). We present preliminary results demonstrating that the resulting model, distilled from a multimodal teacher, significantly outperforms the baseline RGB model (trained without knowledge distillation), as well as an omnivorous version of itself (trained on all modalities jointly), in both standard and compositional action recognition.
    Comment: Extended abstract accepted at the 2nd Ego4D Workshop @ ECCV 202
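The teacher–student setup described above boils down to a training loss that mixes cross-entropy on the ground-truth action labels with a term pulling the RGB student's predictions toward the frozen multimodal teacher's softened predictions. A minimal NumPy sketch, assuming standard Hinton-style distillation; the function names, temperature, and mixing weight are illustrative choices, not taken from the paper:

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax over the last axis (numerically stable)."""
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Blend of (a) cross-entropy against ground-truth labels and
    (b) KL divergence from the frozen multimodal teacher's softened
    class distribution to the student's, scaled by T^2."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    # KL(teacher || student) per sample; epsilon guards against log(0)
    kl = np.sum(
        p_teacher * (np.log(p_teacher + 1e-12) - np.log(p_student + 1e-12)),
        axis=-1,
    )
    # Standard cross-entropy on the hard labels (temperature 1)
    ce = -np.log(softmax(student_logits)[np.arange(len(labels)), labels] + 1e-12)
    return float(np.mean(alpha * ce + (1.0 - alpha) * (T ** 2) * kl))
```

At inference time only the student's RGB forward pass is needed; the teacher (and its object-detection and optical-flow inputs) exists only during training, which is how the approach keeps multimodal-level accuracy without the deployment cost of modality-specific modules.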