31 research outputs found

    Fusion-GRU: A Deep Learning Model for Future Bounding Box Prediction of Traffic Agents in Risky Driving Videos

    Full text link
    To ensure the safe and efficient navigation of autonomous vehicles and advanced driving assistance systems in complex traffic scenarios, predicting the future bounding boxes of surrounding traffic agents is crucial. However, simultaneously predicting the future location and scale of target traffic agents from the egocentric view is challenging because the vehicle's ego-motion causes considerable field-of-view changes. Moreover, in anomalous or risky situations, tracking loss or abrupt motion changes limit the available observation time, requiring cues to be learned within a short time window. Existing methods typically combine different cues with a simple concatenation operation, overlooking their dynamics over time. To address this, the paper introduces the Fusion-Gated Recurrent Unit (Fusion-GRU) network, a novel encoder-decoder architecture for future bounding box localization. Unlike traditional GRUs, Fusion-GRU accounts for mutual and complex interactions among input features. An intermediary estimator coupled with a self-attention aggregation layer is also introduced to learn sequential dependencies for long-range prediction. Finally, a GRU decoder predicts the future bounding boxes. The method is evaluated on two publicly available datasets, ROL and HEV-I, and the experimental results demonstrate its effectiveness in predicting future bounding boxes of traffic agents.
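
    The abstract names the pipeline's stages but not its equations or layer sizes, so the following is only a minimal sketch of the described shape: per-cue encoders with a learned fusion gate (instead of plain concatenation), a recurrent encoder, self-attention aggregation over the encoded sequence, and a GRU decoder rolled out over future steps. All dimensions, the gating rule, and the cue choice (boxes plus a motion cue) are illustrative assumptions, not the paper's specification.

        # Minimal Fusion-GRU-style sketch (illustrative assumptions throughout).
        import torch
        import torch.nn as nn

        class FusionGRUSketch(nn.Module):
            def __init__(self, box_dim=4, flow_dim=2, hidden=128, horizon=10):
                super().__init__()
                self.horizon = horizon
                # Separate per-cue encoders rather than simple concatenation.
                self.box_enc = nn.Linear(box_dim, hidden)
                self.flow_enc = nn.Linear(flow_dim, hidden)
                # Learned gate modeling the interaction between the two cues.
                self.gate = nn.Linear(2 * hidden, hidden)
                self.encoder = nn.GRU(hidden, hidden, batch_first=True)
                # Self-attention aggregation over encoder states (one head here).
                self.attn = nn.MultiheadAttention(hidden, num_heads=1, batch_first=True)
                self.decoder = nn.GRUCell(hidden, hidden)
                self.head = nn.Linear(hidden, box_dim)  # future box parameters

            def forward(self, boxes, flow):
                # boxes: (B, T, 4) past bounding boxes; flow: (B, T, 2) motion cue.
                b = torch.relu(self.box_enc(boxes))
                f = torch.relu(self.flow_enc(flow))
                g = torch.sigmoid(self.gate(torch.cat([b, f], dim=-1)))
                fused = g * b + (1 - g) * f          # gated fusion of the cues
                enc_out, _ = self.encoder(fused)
                ctx, _ = self.attn(enc_out, enc_out, enc_out)
                x = ctx.mean(dim=1)                  # aggregated sequence context
                preds = []
                for _ in range(self.horizon):        # roll out future boxes;
                    x = self.decoder(x, x)           # feeding x as input and state
                    preds.append(self.head(x))       # is a deliberate simplification
                return torch.stack(preds, dim=1)     # (B, horizon, 4)

        model = FusionGRUSketch()
        out = model(torch.randn(2, 8, 4), torch.randn(2, 8, 2))
        print(out.shape)  # torch.Size([2, 10, 4])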

    Indoor future person localization from an egocentric wearable camera

    Get PDF
    Accurate prediction of a person's future location and movement trajectory from an egocentric wearable camera can benefit a wide range of applications, such as assisting visually impaired people with navigation and developing mobility assistance for people with disabilities. In this work, a new egocentric dataset was constructed using a wearable camera, with 8,250 short clips of a targeted person either walking 1) toward, 2) away from, or 3) across the camera wearer in indoor environments, or 4) staying still in the scene; 13,817 person bounding boxes were manually labelled. Apart from the bounding boxes, the dataset also contains the estimated pose of the targeted person and the IMU signal of the wearable camera at each time point. An LSTM-based encoder-decoder framework was designed to predict the future location and movement trajectory of the targeted person in this egocentric setting. Extensive experiments on the new dataset show that the proposed method reliably predicts future person location and trajectory in egocentric videos captured by the wearable camera, outperforming three baselines.
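
    The abstract describes the inputs (boxes, pose, IMU) and the model family (LSTM encoder-decoder) but no concrete architecture, so here is a hedged sketch of one plausible wiring: concatenate the three cues per time step, encode with an LSTM, and decode future boxes autoregressively. The dimensions (e.g., 34 = 17 keypoints x 2 for pose, 6-axis IMU) and all names are assumptions for illustration only.

        # Illustrative sketch, not the paper's code.
        import torch
        import torch.nn as nn

        class EgoPersonForecaster(nn.Module):
            def __init__(self, box_dim=4, pose_dim=34, imu_dim=6,
                         hidden=64, horizon=15):
                super().__init__()
                self.horizon = horizon
                self.encoder = nn.LSTM(box_dim + pose_dim + imu_dim, hidden,
                                       batch_first=True)
                self.decoder = nn.LSTM(box_dim, hidden, batch_first=True)
                self.head = nn.Linear(hidden, box_dim)

            def forward(self, boxes, pose, imu):
                # boxes: (B, T, 4), pose: (B, T, 34), imu: (B, T, 6)
                _, state = self.encoder(torch.cat([boxes, pose, imu], dim=-1))
                inp = boxes[:, -1:, :]               # seed with last observed box
                preds = []
                for _ in range(self.horizon):        # autoregressive rollout
                    out, state = self.decoder(inp, state)
                    inp = self.head(out)
                    preds.append(inp)
                return torch.cat(preds, dim=1)       # (B, horizon, 4)

        model = EgoPersonForecaster()
        future = model(torch.randn(2, 10, 4), torch.randn(2, 10, 34),
                       torch.randn(2, 10, 6))
        print(future.shape)  # torch.Size([2, 15, 4])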

    Coaching Imagery to Athletes with Aphantasia

    Get PDF
    We administered the Plymouth Sensory Imagery Questionnaire (Psi-Q), which tests multi-sensory imagery, to athletes (n=329) from 9 different sports to locate poor/aphantasic imagers (baseline scores <4.2/10), with the aim of subsequently enhancing imagery ability. The low-imagery sample (n=27) was randomly split into two groups who received the intervention, Functional Imagery Training (FIT), either immediately or delayed by one month, at which point the delayed group was tested again on the Psi-Q. All participants were tested after FIT delivery and six months post intervention. The delayed group showed no significant change between baseline and the start of FIT delivery, but both groups' imagery scores improved significantly (p=0.001) after the intervention, and the improvement was maintained six months post intervention. This indicates that imagery can be trained, even in those who identify as having aphantasia (although one participant did not improve on visual scores), and that improvements are maintained in poor imagers. Follow-up interviews (n=22) on sporting application revealed that the majority now use imagery daily on process goals. Recommendations are given for ways to assess and train imagery in an applied sport setting.

    Scalable methods for single and multi camera trajectory forecasting

    Get PDF
    Predicting the future trajectory of objects in video is a critical task within computer vision with numerous application domains. For example, reliable anticipation of pedestrian trajectory is imperative for the operation of intelligent vehicles and can significantly enhance the functionality of advanced driver assistance systems. Trajectory forecasting can also enable more accurate tracking of objects in video, particularly when objects are not always visible, such as during occlusion or when entering a blind spot in a non-overlapping multi-camera network. However, due to the considerable human labour required to manually annotate data amenable to trajectory forecasting, the scale and variety of existing datasets used to study the problem are limited. In this thesis, we propose a set of strategies for pedestrian trajectory forecasting. We address the lack of training data by introducing a scalable machine annotation scheme that enables models to be trained on a large Single-Camera Trajectory Forecasting (SCTF) dataset without human annotation. Using newly collected datasets annotated with our proposed methods, we develop two models for SCTF. The first model, Dynamic Trajectory Predictor (DTP), forecasts pedestrian trajectory from on board a moving vehicle up to one second into the future. DTP is trained using both human- and machine-annotated data and anticipates dynamic motion that linear models do not capture. Our second model, Spatio-Temporal Encoder-Decoder (STED), predicts full object bounding boxes in addition to trajectory. STED combines visual and temporal features to model both object motion and ego-motion. In addition to our SCTF contributions, we also introduce a new task: Multi-Camera Trajectory Forecasting (MCTF), in which the future trajectory of an object is predicted across a network of cameras. Prior works consider forecasting trajectories in a single camera view; ours is the first to consider the challenging scenario of forecasting across multiple non-overlapping camera views, which has wide applicability in tasks such as re-identification and multi-target multi-camera tracking. To facilitate research in this new area, we collect a unique dataset of multi-camera pedestrian trajectories from a network of 15 synchronized cameras and develop a semi-automated annotation method to accurately label this large dataset containing 600 hours of video footage. We introduce an MCTF framework that simultaneously uses all estimated relative object locations from several camera viewpoints and predicts the object's future location in all possible camera viewpoints. Our framework follows a Which-When-Where approach that predicts in which camera(s) the object will appear, and when and where within those views it will appear. Experimental results demonstrate the effectiveness of our MCTF model, which outperforms existing SCTF approaches adapted to the MCTF framework.
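
    The Which-When-Where decomposition lends itself to a three-headed prediction model. Below is a hedged sketch of that idea only: a recurrent encoding of the observed track feeds three output heads, one per question. The 15-camera count comes from the abstract's dataset; everything else (feature dimension, hidden size, head forms, names) is an assumption, not the thesis's actual framework.

        # Hedged sketch of a Which-When-Where predictor (assumed names/dims).
        import torch
        import torch.nn as nn

        class WhichWhenWhereSketch(nn.Module):
            def __init__(self, feat_dim=32, hidden=64, num_cameras=15):
                super().__init__()
                self.num_cameras = num_cameras
                self.encoder = nn.GRU(feat_dim, hidden, batch_first=True)
                self.which = nn.Linear(hidden, num_cameras)       # appearance logits
                self.when = nn.Linear(hidden, num_cameras)        # time offsets
                self.where = nn.Linear(hidden, num_cameras * 2)   # (x, y) per view

            def forward(self, track):
                # track: (B, T, feat_dim) relative locations over observed views
                _, h = self.encoder(track)                        # h: (1, B, hidden)
                h = h.squeeze(0)
                which = torch.sigmoid(self.which(h))              # P(appears in cam i)
                when = torch.relu(self.when(h))                   # non-negative delay
                where = self.where(h).view(-1, self.num_cameras, 2)
                return which, when, where

        model = WhichWhenWhereSketch()
        p, t, xy = model(torch.randn(4, 20, 32))
        print(p.shape, t.shape, xy.shape)  # (4, 15) (4, 15) (4, 15, 2)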

    Change blindness: eradication of gestalt strategies

    Get PDF
    Arrays of eight texture-defined rectangles were used as stimuli in a one-shot change blindness (CB) task in which there was a 50% chance that one rectangle would change orientation between two successive presentations separated by an interval. CB was eliminated by cueing the target rectangle in the first stimulus, reduced by cueing in the interval, and unaffected by cueing in the second presentation. This supports the idea that a representation was formed that persisted through the interval before being 'overwritten' by the second presentation (Landman et al., 2003, Vision Research 43, 149–164). Another possibility is that participants used some kind of grouping or Gestalt strategy. To test this we changed the spatial positions of the rectangles in the second presentation by shifting them along imaginary spokes (by ±1 degree) emanating from the central fixation point. There was no significant difference in performance between this and the standard task [F(1,4)=2.565, p=0.185]. This may suggest two things: (i) Gestalt grouping is not used as a strategy in these tasks, and (ii) it gives further weight to the argument that objects may be stored in and retrieved from a pre-attentional store during this task.

    Cognitive Maps

    Get PDF

    In the Beginning was the Deed: From Sensorimotor Interactions to Integrative Spatial Encodings

    Get PDF
    Goal-oriented behavior requires reliable predictions regarding action outcomes. The theory of event segmentation and the free energy principle allow hypotheses to be derived regarding the formation and maintenance of predictive models and their representational format. According to the free energy principle, cognitive systems constantly try to infer the causes of perceived sensations. This results in the formation of predictive models based on sensorimotor experience. Although there is an ongoing debate regarding the representational format of these models, an integrative spatial code, which integrates different modalities in an abstract representation, seems plausible. The integration process is assumed to be biased towards behaviorally relevant modalities. Moreover, a striving for consistency is assumed to maintain unambiguous states. Besides the representational format, the prediction process itself is of central interest. According to event segmentation theory, cognitive systems segment the stream of sensorimotor information along significant changes, so-called event boundaries. Hence, it seems likely that predictions are carried out as a simulation of the next, desired event boundary within the proposed integrative spatial code. The spatial code might support mental simulation in general, providing sensorimotor grounding to higher cognitive functions, as proposed by theories of embodied cognition. The proposed properties of the integrative spatial code were investigated in four studies concerning the questions of (i) whether multisensory integration is biased towards action-relevant modalities, (ii) how representations are kept consistent across frames of reference in cases of multisensory conflict, (iii) whether predictive models provide an anticipatory, event-like structure in the service of behavior control, and (iv) how different modalities are combined through a spatial code in the service of predictive simulations. The obtained results confirm the assumptions regarding the proposed integrative spatial code. The combination of the free energy principle and the theory of event segmentation appears to be a viable approach to account for the emergence of a predictive, integrative spatial code from sensorimotor interactions. The results allow the derivation of design principles for an artificial spatial reasoning system, and the developed experimental paradigms allow further investigation of the causal role of spatial models in higher cognitive functions.
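
    The core free-energy claim here, that perception infers the hidden causes of sensations, can be illustrated with a toy computation. The sketch below is my own illustration, not the thesis's model: it treats inference as gradient descent on squared prediction error (a crude free-energy proxy with flat priors), adjusting a latent estimate until a generative model reproduces the sensed input.

        # Toy predictive-inference illustration (not from the thesis).
        import torch

        def infer_cause(sensation, g, steps=100, lr=0.1):
            """Infer a latent cause mu such that the generative model g(mu)
            matches the sensation, by descending the prediction error."""
            mu = torch.zeros_like(sensation, requires_grad=True)
            for _ in range(steps):
                error = ((sensation - g(mu)) ** 2).sum()  # prediction error
                error.backward()
                with torch.no_grad():
                    mu -= lr * mu.grad
                    mu.grad.zero_()
            return mu.detach()

        # Example: the "world" squashes causes through tanh; recover the cause.
        true_cause = torch.tensor([0.8, -0.3])
        sensed = torch.tanh(true_cause)
        print(infer_cause(sensed, torch.tanh))  # ~ tensor([0.8000, -0.3000])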

    Reference Frames in Human Sensory, Motor, and Cognitive Processing

    Get PDF
    Reference frames, or coordinate systems, are used to express properties and relationships of objects in the environment. While the use of reference frames is well understood in the physical sciences, how the brain uses reference frames remains a fundamental question. The goal of this dissertation is to reach a better understanding of reference frames in human perceptual, motor, and cognitive processing. In the first project, we study reference frames in perception and develop a model to explain the transition from egocentric (based on the observer) to exocentric (based outside the observer) reference frames to account for the perception of relative motion. In a second project, we focus on motor behavior, more specifically on goal-directed reaching. We develop a model that explains how egocentric perceptual and motor reference frames can be coordinated through exocentric reference frames. Finally, in a third project, we study how the cognitive system can store and recognize objects by using a sensorimotor schema that allows mental rotation within an exocentric reference frame.
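
    The egocentric-to-exocentric transition the dissertation models is, at its geometric core, a change of coordinates. As a minimal illustration (mine, not the dissertation's model), a 2-D observer-centred point maps to world coordinates by rotating through the observer's heading and translating by the observer's world position.

        # Minimal egocentric-to-exocentric coordinate transform (illustrative).
        import numpy as np

        def ego_to_exo(point_ego, observer_pos, heading_rad):
            """Map a 2-D point from observer-centred to world coordinates."""
            c, s = np.cos(heading_rad), np.sin(heading_rad)
            rotation = np.array([[c, -s],
                                 [s,  c]])
            return rotation @ np.asarray(point_ego) + np.asarray(observer_pos)

        # An object 2 m straight ahead of an observer standing at (5, 3) and
        # facing 90 degrees (north) lies at (5, 5) in world coordinates.
        print(ego_to_exo([2.0, 0.0], [5.0, 3.0], np.pi / 2))  # ~ [5. 5.]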