4,905 research outputs found

    Show and Tell: A Neural Image Caption Generator

    Automatically describing the content of an image is a fundamental problem in artificial intelligence that connects computer vision and natural language processing. In this paper, we present a generative model based on a deep recurrent architecture that combines recent advances in computer vision and machine translation and that can be used to generate natural sentences describing an image. The model is trained to maximize the likelihood of the target description sentence given the training image. Experiments on several datasets show the accuracy of the model and the fluency of the language it learns solely from image descriptions. Our model is often quite accurate, which we verify both qualitatively and quantitatively. For instance, while the current state-of-the-art BLEU-1 score (the higher the better) on the Pascal dataset is 25, our approach yields 59, to be compared to human performance around 69. We also show BLEU-1 score improvements on Flickr30k, from 56 to 66, and on SBU, from 19 to 28. Lastly, on the newly released COCO dataset, we achieve a BLEU-4 of 27.7, which is the current state-of-the-art.
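As a reading aid for the BLEU-1 figures quoted in this abstract, here is a minimal sketch of sentence-level BLEU-1 (clipped unigram precision multiplied by a brevity penalty). The function name `bleu1`, the lowercase whitespace tokenization, and the example sentences are assumptions made for illustration; this is not the evaluation code used in the paper, whose scores are also reported on a 0 to 100 scale rather than 0 to 1.

```python
from collections import Counter
import math


def bleu1(candidate, references):
    """Sentence-level BLEU-1 on a 0..1 scale (illustrative sketch only)."""
    # Assumption: lowercase whitespace tokenization; real evaluations may differ.
    cand = candidate.lower().split()
    refs = [r.lower().split() for r in references]
    if not cand or not refs:
        return 0.0

    # For each word, the maximum number of times it occurs in any single reference.
    max_ref = Counter()
    for ref in refs:
        for word, count in Counter(ref).items():
            max_ref[word] = max(max_ref[word], count)

    # Clip each candidate word count by its reference maximum.
    clipped = sum(min(count, max_ref[word]) for word, count in Counter(cand).items())
    precision = clipped / len(cand)

    # Brevity penalty uses the reference length closest to the candidate length
    # (ties resolved toward the shorter reference).
    ref_len = min((len(r) for r in refs), key=lambda n: (abs(n - len(cand)), n))
    bp = 1.0 if len(cand) > ref_len else math.exp(1 - ref_len / len(cand))
    return bp * precision


if __name__ == "__main__":
    refs = ["the cat is on the mat", "there is a cat on the mat"]
    print(bleu1("the cat sat on the mat", refs))
```

Multiplying the returned value by 100 gives a number on the same scale as the scores quoted in the abstract.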

    Detect the unexpected: a science for surveillance

    Purpose – The purpose of this paper is to outline a strategy for research development focused on addressing the neglected role of visual perception in real-life tasks such as policing surveillance and command and control settings. Approach – The scale of the surveillance task in the modern control room is expanding as technology increases input capacity at an accelerating rate. The authors review recent literature highlighting the difficulties that apply to modern surveillance and give examples of how poor detection of the unexpected can be, and how surprising this deficit can be. Perceptual phenomena such as change blindness are linked to the perceptual processes undertaken by law-enforcement personnel. Findings – A scientific programme is outlined for how detection deficits can best be addressed in the context of a multidisciplinary collaborative agenda between researchers and practitioners. The development of a cognitive research field specifically examining the occurrence of perceptual “failures” provides an opportunity for policing agencies to relate laboratory findings in psychology to their own fields of day-to-day enquiry. Originality/value – The paper shows, with examples, where interdisciplinary research may best be focussed on evaluating practical solutions and on generating useable guidelines on procedure and practice. It also argues that these processes should be investigated in real and simulated context-specific studies to confirm the validity of the findings in these new applied scenarios.

    The Image from the Road: Towards Mapping the Phenomenological

    An area of focus, used in early and contemporary forms of cognitive geography research, is the ‘cognitive map’, a concept that suggests “that people hold a map-like database in their minds to which they can add and use to tackle geographical tasks”. Kevin Lynch, an urban planner in the 1960s, was an early adopter of the cognitive map approach to reveal spatial cognition, what or how people see their environment, specifically cognition of the urban environment. Lynch’s research aimed to develop empirical methods to identify how people make spatial relationships. Contemporary tools like machine learning are now considered relevant for such tasks. The proposed methods outline steps for categorizing a neural network image knowledge base grounded in perception theory. Categorizations and cartographic representations are made using GIS and locally weighted regression of the experiential phenomenon of structural density along roadways in Fayetteville, Arkansas. An alternative method of characterizing the city, one that accounts for the phenomenological as experienced from a human field of view during travel, is offered.

    Interaction Analysis in Smart Work Environments through Fuzzy Temporal Logic

    Interaction analysis is defined as the generation of situation descriptions from machine perception. World models created through machine perception are used by a reasoning engine based on fuzzy metric temporal logic and situation graph trees, with optional parameter learning and clustering as preprocessing, to deduce knowledge about the observed scene. The system is evaluated in a case study on automatic behavior report generation for staff training purposes in crisis response control rooms.

    DFKI publications : the first four years ; 1990 - 1993


    Proceedings of the 1st Doctoral Consortium at the European Conference on Artificial Intelligence (DC-ECAI 2020)

    1st Doctoral Consortium at the European Conference on Artificial Intelligence (DC-ECAI 2020), 29-30 August 2020, Santiago de Compostela, Spain. The DC-ECAI 2020 provides a unique opportunity for PhD students who are close to finishing their doctoral research to interact with experienced researchers in the field. Senior members of the community are assigned as mentors for each group of students based on the students’ research or similarity of research interests. The DC-ECAI 2020, which is held virtually this year, allows students from all over the world to present their research, to discuss their ongoing research and career plans with their mentor, to network with other participants, and to receive training and mentoring about career planning and career options.

    A Survey on Visual Surveillance of Object Motion and Behaviors


    Implementation, integration, and optimization of a fuzzy foreground segmentation system

    Foreground segmentation is often an important preliminary step for various video processing systems. Improving the accuracy of the foreground segmentation process can improve the overall performance of a video processing system. This work introduces a Fuzzy Foreground Segmentation System (FFSS) that uses Mamdani-type Fuzzy Inference Systems (FIS) to control pixel-level accumulated statistics. The error of the FFSS is quantified by comparing its output with hand-segmented ground-truth images from a set of image sequences that specifically model canonical problems of foreground segmentation. Optimization of the FFSS parameters is achieved using a Real-Coded Genetic Algorithm (RCGA). Additionally, multiple central composite designed experiments are used to analyze the performance of the RCGA under selected schemes and their respective parameters. The RCGA schemes and parameters are chosen so as to reduce variation and execution time for a set of known multi-dimensional test functions. The selected multi-dimensional test functions represent assorted function landscapes. To demonstrate the accuracy of the FFSS and underline the importance of the foreground segmentation process, the system is applied to real-time human detection from a single-camera security system. The Human Detection System (HDS) is composed of an IP camera networked to multiple heterogeneous computers for distributed parallel processing. The implementation of the HDS adheres to a System of Systems (SoS) architecture, which standardizes data/communication, reduces overall complexity, and maintains a high level of interoperability.