947 research outputs found

    Audio-visual foreground extraction for event characterization

    Get PDF
    This paper presents a new method able to integrate audio and visual information for scene analysis in a typical surveillance scenario, using only one camera and one monaural microphone. Visual information is analyzed by a standard visual background/foreground (BG/FG) modelling module, enhanced with a novelty detection stage, and coupled with an audio BG/FG modelling scheme. The audiovisual association is performed on-line, by exploiting the concept of synchrony. Experimental tests carrying out classification and clustering of events show all the potentialities of the proposed approach, also in comparison with the results obtained by using the single modalities

    Unveiling the multimedia unconscious: implicit cognitive processes and multimedia content analysis

    Get PDF
    One of the main findings of cognitive sciences is that automatic processes of which we are unaware shape, to a significant extent, our perception of the environment. The phenomenon applies not only to the real world, but also to multimedia data we consume every day. Whenever we look at pictures, watch a video or listen to audio recordings, our conscious attention efforts focus on the observable content, but our cognition spontaneously perceives intentions, beliefs, values, attitudes and other constructs that, while being outside of our conscious awareness, still shape our reactions and behavior. So far, multimedia technologies have neglected such a phenomenon to a large extent. This paper argues that taking into account cognitive effects is possible and it can also improve multimedia approaches. As a supporting proof-of-concept, the paper shows not only that there are visual patterns correlated with the personality traits of 300 Flickr users to a statistically significant extent, but also that the personality traits (both self-assessed and attributed by others) of those users can be inferred from the images these latter post as "favourite"

    SIMCO: SIMilarity-based object COunting

    Full text link
    We present SIMCO, the first agnostic multi-class object counting approach. SIMCO starts by detecting foreground objects through a novel Mask RCNN-based architecture trained beforehand (just once) on a brand-new synthetic 2D shape dataset, InShape; the idea is to highlight every object resembling a primitive 2D shape (circle, square, rectangle, etc.). Each object detected is described by a low-dimensional embedding, obtained from a novel similarity-based head branch; this latter implements a triplet loss, encouraging similar objects (same 2D shape + color and scale) to map close. Subsequently, SIMCO uses this embedding for clustering, so that different types of objects can emerge and be counted, making SIMCO the very first multi-class unsupervised counter. Experiments show that SIMCO provides state-of-the-art scores on counting benchmarks and that it can also help in many challenging image understanding tasks

    The pictures we like are our image: continuous mapping of favorite pictures into self-assessed and attributed personality traits

    Get PDF
    Flickr allows its users to tag the pictures they like as “favorite”. As a result, many users of the popular photo-sharing platform produce galleries of favorite pictures. This article proposes new approaches, based on Computational Aesthetics, capable to infer the personality traits of Flickr users from the galleries above. In particular, the approaches map low-level features extracted from the pictures into numerical scores corresponding to the Big-Five Traits, both self-assessed and attributed. The experiments were performed over 60,000 pictures tagged as favorite by 300 users (the PsychoFlickr Corpus). The results show that it is possible to predict beyond chance both self-assessed and attributed traits. In line with the state-of-the art of Personality Computing, these latter are predicted with higher effectiveness (correlation up to 0.68 between actual and predicted traits)

    Transformer Networks for Trajectory Forecasting

    Full text link
    Most recent successes on forecasting the people motion are based on LSTM models and all most recent progress has been achieved by modelling the social interaction among people and the people interaction with the scene. We question the use of the LSTM models and propose the novel use of Transformer Networks for trajectory forecasting. This is a fundamental switch from the sequential step-by-step processing of LSTMs to the only-attention-based memory mechanisms of Transformers. In particular, we consider both the original Transformer Network (TF) and the larger Bidirectional Transformer (BERT), state-of-the-art on all natural language processing tasks. Our proposed Transformers predict the trajectories of the individual people in the scene. These are "simple" model because each person is modelled separately without any complex human-human nor scene interaction terms. In particular, the TF model without bells and whistles yields the best score on the largest and most challenging trajectory forecasting benchmark of TrajNet. Additionally, its extension which predicts multiple plausible future trajectories performs on par with more engineered techniques on the 5 datasets of ETH + UCY. Finally, we show that Transformers may deal with missing observations, as it may be the case with real sensor data. Code is available at https://github.com/FGiuliari/Trajectory-Transformer.Comment: 18 pages, 3 figure

    The Complexity of Reasoning about Spatial Congruence

    Get PDF
    In the recent literature of Artificial Intelligence, an intensive research effort has been spent, for various algebras of qualitative relations used in the representation of temporal and spatial knowledge, on the problem of classifying the computational complexity of reasoning problems for subsets of algebras. The main purpose of these researches is to describe a restricted set of maximal tractable subalgebras, ideally in an exhaustive fashion with respect to the hosting algebras. In this paper we introduce a novel algebra for reasoning about Spatial Congruence, show that the satisfiability problem in the spatial algebra MC-4 is NP-complete, and present a complete classification of tractability in the algebra, based on the individuation of three maximal tractable subclasses, one containing the basic relations. The three algebras are formed by 14, 10 and 9 relations out of 16 which form the full algebra
    • …
    corecore