23,331 research outputs found

    Towards automatic extraction of expressive elements from motion pictures : tempo

    This paper proposes a computational approach to extracting expressive elements from motion pictures in order to derive high-level semantics of the stories they portray, thus enabling better video annotation and interpretation systems. The approach is motivated and directed by existing cinematic conventions known as film grammar and, as a first step towards demonstrating its effectiveness, uses the attributes of motion and shot length to define and compute a novel measure of the tempo of a movie. Tempo flow plots are derived for four full-length movies, and edge analysis of these plots extracts dramatic story sections and events signaled by their distinctive tempo. The results confirm tempo as a useful attribute in its own right and a promising component of semantic constructs such as the tone or mood of a film.
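    To make the tempo construction concrete, here is a minimal sketch of how such a measure could be computed. It assumes shot boundaries and per-shot motion magnitudes are already available; the weights alpha and beta, the smoothing width, and the edge threshold are illustrative choices, not the paper's exact formulation. Shorter shots and higher motion both raise tempo, and sharp changes in the smoothed tempo flow mark candidate dramatic events.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def tempo_flow(shot_lengths, shot_motion, alpha=1.0, beta=1.0, sigma=2.0):
    """Compute a per-shot tempo measure and smooth it into a tempo flow curve.

    Short shots and high motion both raise tempo, so shot length enters
    with a negative sign after z-normalization. alpha/beta/sigma are
    illustrative parameters, not values from the paper.
    """
    s = np.asarray(shot_lengths, dtype=float)
    m = np.asarray(shot_motion, dtype=float)
    tempo = alpha * (s.mean() - s) / s.std() + beta * (m - m.mean()) / m.std()
    return gaussian_filter1d(tempo, sigma=sigma)  # smooth into a flow plot

def tempo_edges(flow, threshold=0.5):
    """Flag shots where the tempo flow changes sharply (candidate events)."""
    grad = np.gradient(flow)
    return np.flatnonzero(np.abs(grad) > threshold)
```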

    Detecção de eventos complexos em vídeos baseada em ritmos visuais (Detection of Complex Events in Videos Based on Visual Rhythms)

    Advisor: Hélio Pedrini. Master's dissertation, Universidade Estadual de Campinas, Instituto de Computação. Abstract: The recognition of complex events in videos currently has several important applications, particularly due to the wide availability of digital cameras in environments such as airports, train and bus stations, shopping centers, stadiums, hospitals, schools, buildings, and roads, among others. Moreover, advances in digital technology have enhanced the capability to detect video events through the development of devices with high resolution, small physical size, and high sampling rates. Many works in the literature have explored the subject from different perspectives. This work presents and evaluates a methodology for extracting a feature descriptor from the visual rhythms of video sequences in order to address the video event detection problem. A visual rhythm can be seen as the projection of a video onto an image, such that the video analysis task is reduced to an image analysis problem, benefiting from its low processing cost in terms of time and complexity. To demonstrate the potential of the visual rhythm in the analysis of complex videos, three computer vision problems are selected: abnormal event detection, human action classification, and gesture recognition.
    For the first problem, a normalcy model is learned from the traces that people leave when they walk, whereas for the other two problems representative patterns are extracted from the actions. Our hypothesis is that similar videos produce similar patterns, so the action classification problem can be reduced to an image classification task. Experiments conducted on well-known public datasets demonstrate that the method produces promising results at high processing rates, making real-time operation possible. Although the visual rhythm features are mainly extracted as histograms of gradients, some attempts to add optical flow features are discussed, as well as strategies for obtaining alternative visual rhythms. (Master's degree in Computer Science; CAPES grants 1570507, 1406910, 1374943.)
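    As an illustration of the central idea, the sketch below builds a visual rhythm by stacking one pixel line per frame (here the central column; the dissertation also discusses alternative sampling strategies) and describes it with a histogram of oriented gradients. This is a plausible reading of the pipeline, not the author's exact implementation; the HOG parameters are illustrative, and the resulting descriptor would be fed to an ordinary image classifier.

```python
import cv2
import numpy as np
from skimage.feature import hog

def visual_rhythm(video_path):
    """Project a video onto a single image: one pixel line per frame.

    Each frame contributes its central column, so a T-frame video
    becomes an H x T grayscale image and video analysis reduces to
    image analysis.
    """
    cap = cv2.VideoCapture(video_path)
    columns = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        columns.append(gray[:, gray.shape[1] // 2])  # central column
    cap.release()
    return np.stack(columns, axis=1)  # shape: height x num_frames

def rhythm_descriptor(rhythm_image):
    """Histogram-of-gradients descriptor over the visual rhythm image."""
    return hog(rhythm_image, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), feature_vector=True)
```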

    Advanced content-based semantic scene analysis and information retrieval: the SCHEMA project

    The aim of the SCHEMA Network of Excellence is to bring together a critical mass of universities, research centers, industrial partners, and end users in order to design a reference system for content-based semantic scene analysis, interpretation, and understanding. Relevant research areas include content-based multimedia analysis and automatic annotation of semantic multimedia content, combined textual and multimedia information retrieval, the Semantic Web, the MPEG-7 and MPEG-21 standards, user interfaces, and human factors. In this paper, recent advances in content-based analysis, indexing, and retrieval of digital media within the SCHEMA Network are presented. These advances will be integrated into the SCHEMA module-based, expandable reference system.

    Transformation of context-dependent sensory dynamics into motor behavior

    Latorre R, Levi R, Varona P (2013) Transformation of Context-dependent Sensory Dynamics into Motor Behavior. PLoS Comput Biol 9(2): e1002908. doi:10.1371/journal.pcbi.1002908
    The intrinsic dynamics of sensory networks play an important role in the sensory-motor transformation. In this paper we use conductance-based models and electrophysiological recordings to study the dual role of a sensory network in organizing two behavioral context-dependent motor programs in the mollusk Clione limacina. We show that: (i) winner-take-all dynamics in the gravimetric sensory network model drive the typical repetitive rhythm in the wing central pattern generator (CPG) during routine swimming; (ii) winnerless competition dynamics in the same sensory network organize the irregular pattern observed in the wing CPG during hunting behavior. Our model also shows that although the timing of the activity is irregular, the sequence of switching among the sensory cells is preserved whenever the same set of neurons is activated in a given time window. These activation phase locks in the sensory signals are transformed into specific events in the motor activity and can play an important role in motor coordination driven by the intrinsic dynamics of a multifunctional sensory organ. This work was supported by MINECO grants TIN2012-30883 and IPT-2011-0727-020000.
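    The paper itself works with conductance-based neuron models, but the two competition regimes contrasted in the abstract can be illustrated with a much simpler rate-based generalized Lotka-Volterra network, a standard minimal model of winnerless competition. In the sketch below, the inhibition matrices rho_wlc and rho_wta are illustrative assumptions: asymmetric cyclic inhibition yields sequential switching among units (winnerless competition), while strong symmetric inhibition lets a single unit suppress the others (winner-take-all).

```python
import numpy as np
from scipy.integrate import solve_ivp

def lotka_volterra(t, a, rho, sigma=1.0):
    """Generalized Lotka-Volterra rates: da_i/dt = a_i * (sigma - (rho @ a)_i)."""
    return a * (sigma - rho @ a)

n = 3
# Asymmetric cyclic inhibition -> winnerless competition: activity visits
# the units in a fixed switching sequence with non-periodic timing.
rho_wlc = np.array([[1.0, 0.5, 2.0],
                    [2.0, 1.0, 0.5],
                    [0.5, 2.0, 1.0]])
# Strong symmetric inhibition -> winner-take-all: one unit suppresses the rest.
rho_wta = np.where(np.eye(n, dtype=bool), 1.0, 2.0)

a0 = 0.1 + 0.01 * np.random.rand(n)  # small random initial activities
sol = solve_ivp(lotka_volterra, (0, 200), a0, args=(rho_wlc,), dense_output=True)
# The argmax over time gives the switching sequence among the "sensory" units.
sequence = np.argmax(sol.sol(np.linspace(0, 200, 2000)), axis=0)
```

    Re-running the same integration with rho_wta instead of rho_wlc drives all but one unit to zero, reproducing the repetitive winner-take-all regime.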

    Emerging Linguistic Functions in Early Infancy

    This paper presents results from experimental studies on early language acquisition in infants and attempts to interpret the experimental results within the framework of the Ecological Theory of Language Acquisition (ETLA) recently proposed by Lacerda et al. (2004a). From this perspective, the infant's first steps in the acquisition of the ambient language are seen as a consequence of the infant's general capacity to represent sensory input and the infant's interaction with other actors in its immediate ecological environment. On the basis of available experimental evidence, it is argued that ETLA offers a productive alternative to traditional descriptive views of the language acquisition process by presenting an operative model of how early linguistic function may emerge through interaction.

    Multisensory Motion Perception in 3–4 Month-Old Infants

    Human infants begin very early in life to take advantage of multisensory information by extracting the invariant amodal information that is conveyed redundantly by multiple senses. Here we addressed the question of whether infants can bind multisensory moving stimuli, and whether this occurs even if the motion produced by the stimuli is only illusory. Three- to 4-month-old infants were presented with two bimodal pairings: visuo-tactile and audio-visual. Visuo-tactile pairings consisted of apparently vertically moving bars (the Barber Pole illusion) moving in either the same or the opposite direction as a concurrent tactile stimulus consisting of strokes given on the infant's back. Audio-visual pairings consisted of the Barber Pole illusion in its visual and auditory versions, the latter giving the impression of a continuously ascending or descending pitch. We found that infants were able to discriminate congruently moving (same direction) vs. incongruently moving (opposite direction) pairs irrespective of modality (Experiment 1). Importantly, we also found that congruently moving visuo-tactile and audio-visual stimuli were preferred over incongruently moving bimodal stimuli (Experiment 2). Our findings suggest that very young infants are able to extract motion as an amodal component and use it to match stimuli that only apparently move in the same direction.

    Scene extraction in motion pictures

    This paper addresses the challenge of bridging the semantic gap between the rich meaning users desire when they query to locate and browse media and the shallowness of the media descriptions that can be computed in today's content management systems. To facilitate high-level semantics-based content annotation and interpretation, we tackle the problem of automatically decomposing motion pictures into meaningful story units, namely scenes. Since a scene is a complicated and subjective concept, we first propose guidelines from film production to determine when a scene change occurs. We then investigate different rules and conventions followed as part of Film Grammar that can guide and shape an algorithmic solution for determining a scene. Two different techniques using intershot analysis are proposed as solutions in this paper. In addition, we present different refinement mechanisms, such as film-punctuation detection founded on Film Grammar, to further improve the results. These refinement techniques demonstrate significant improvements in overall performance. Furthermore, we analyze errors in the context of film-production techniques, which offers useful insights into the limitations of our method.
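    Although the paper's two intershot-analysis techniques are not spelled out in the abstract, a common baseline in this family compares shot-level color histograms across a sliding window and declares a scene change where no shot before a candidate boundary resembles any shot after it. The sketch below follows that idea; the histogram-intersection similarity, window size, and threshold are illustrative assumptions rather than the paper's method.

```python
import numpy as np

def shot_histogram(frames, bins=16):
    """Average color histogram over a shot's frames (frames: list of HxWx3 uint8)."""
    hists = [np.histogramdd(f.reshape(-1, 3), bins=bins,
                            range=[(0, 256)] * 3)[0].ravel() for f in frames]
    h = np.mean(hists, axis=0)
    return h / h.sum()  # normalize so histogram intersection lies in [0, 1]

def scene_boundaries(shot_hists, window=3, threshold=0.6):
    """Mark a scene change before shot i when no shot in the look-back window
    resembles any shot in the look-ahead window (histogram intersection)."""
    sim = lambda p, q: np.minimum(p, q).sum()  # 1.0 = identical histograms
    boundaries = []
    for i in range(window, len(shot_hists) - window):
        past = shot_hists[i - window:i]
        future = shot_hists[i:i + window]
        if max(sim(p, q) for p in past for q in future) < threshold:
            boundaries.append(i)
    return boundaries
```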