812 research outputs found

    Cortical 3D Face Recognition Framework

    Empirical studies concerning face recognition suggest that faces may be stored in memory by a few canonical representations. Cortical area V1 contains double-opponent colour blobs as well as simple, complex and end-stopped cells, which provide input for a multiscale line/edge representation, keypoints for dynamic routing and saliency maps for Focus-of-Attention. Combined, these allow us to segregate faces. Events of different facial views are stored in memory and combined in order to identify the view and recognise the face, including facial expression. In this paper we show that with five 2D views and their cortical representations it is possible to determine the left-right and frontal-lateral-profile views and to achieve view-invariant recognition of 3D faces.

    A dynamic neural field approach to the covert and overt deployment of spatial attention

    The visual exploration of a scene involves the interplay of several competing processes (for example, to select the next saccade or to keep fixation) and the integration of bottom-up (e.g. contrast) and top-down information (the target of a visual search task). Identifying the neural mechanisms involved in these processes and in the integration of this information remains a challenging question. Visual attention refers to all these processes, both when the eyes remain fixed (covert attention) and when they are moving (overt attention). Popular computational models of visual attention assume that the visual information remains fixed while attention is deployed, whereas primates execute around three saccadic eye movements per second, each of which abruptly changes this information. We present in this paper a model relying on neural fields, a paradigm for distributed, asynchronous and numerical computations, and show that covert and overt attention can emerge from such a substratum. We identify and propose a possible interaction of four elementary mechanisms for selecting the next locus of attention, memorizing the previously attended locations, anticipating the consequences of eye movements and integrating bottom-up and top-down information in order to perform a visual search task with saccadic eye movements.
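
The abstract does not give the model's equations, so the following is only a minimal sketch of the general neural-field idea it refers to: a one-dimensional ring of units with local excitation and global inhibition, integrated with Euler steps. The kernel shape, all parameter values and the names `simulate_field` and `ring_distance` are assumptions made for illustration, not the authors' model.

```python
import math

def gaussian(d, sigma):
    """Unnormalised Gaussian of a distance d."""
    return math.exp(-(d * d) / (2.0 * sigma * sigma))

def ring_distance(i, j, n):
    """Distance between two positions on a ring of n units."""
    d = abs(i - j)
    return min(d, n - d)

def simulate_field(stimulus, steps=200, dt=0.1, tau=1.0,
                   excite=1.5, sigma=2.0, inhibit=0.5):
    """Euler integration of tau * du/dt = -u + sum_j w(i, j) * f(u_j) + I_i.

    w is local Gaussian excitation minus a constant global inhibition
    (assumed values); f clips activity to the range [0, 1].
    """
    n = len(stimulus)
    kernel = [[excite * gaussian(ring_distance(i, j, n), sigma) - inhibit
               for j in range(n)] for i in range(n)]
    u = [0.0] * n
    for _ in range(steps):
        rate = [min(1.0, max(0.0, v)) for v in u]
        lateral = [sum(kernel[i][j] * rate[j] for j in range(n))
                   for i in range(n)]
        u = [v + dt / tau * (-v + lateral[i] + stimulus[i])
             for i, v in enumerate(u)]
    return u

# Two competing stimuli: a stronger one at position 10, a weaker one at 30.
n = 40
stimulus = [1.0 * gaussian(ring_distance(i, 10, n), 2.0)
            + 0.6 * gaussian(ring_distance(i, 30, n), 2.0)
            for i in range(n)]
u = simulate_field(stimulus)
winner = max(range(n), key=lambda i: u[i])
print("most active position:", winner)
```

In this toy setting the field's activity peaks at the stronger stimulus, which is the kind of competition a neural field can use to select the next locus of attention. Memory of attended locations and anticipation of eye movements are not modelled here.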

    Convolutional Neural Networks for Named Entity Recognition in Images of Documents

    This work researches named entity recognition (NER) with respect to images of documents with a domain-specific layout, by means of Convolutional Neural Networks (CNNs). Examples of such documents are receipts, invoices, forms and scientific papers, the latter of which are used in this work. An NER task is first performed statically, where a static number of entity classes is extracted per document. Networks based on the deep VGG-16 network are used for this task. Here, experimental evaluation shows that framing the task as a classification task, where the network classifies each bounding box coordinate separately, leads to the best network performance. Also, a multi-headed architecture is introduced, where the network has an independent fully-connected classification head per entity. VGG-16 achieves better performance with the multi-headed architecture than with its default, single-headed architecture. Additionally, it is shown that transfer learning does not improve performance of these networks. Analysis suggests that the networks trained for the static NER task learn to recognise document templates, rather than the entities themselves, and therefore do not generalize well to new, unseen templates. For a dynamic NER task, where the type and number of entity classes vary per document, experimental evaluation shows that, on large entities in the document, the Faster R-CNN object detection framework achieves comparable performance to the networks trained on the static task. Analysis suggests that Faster R-CNN generalizes better to new templates than the networks trained for the static task, as Faster R-CNN is trained on local features rather than the full document template. Finally, analysis shows that Faster R-CNN performs poorly on small entities in the image and suggestions are made to improve its performance
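
One way to read "classifies each bounding box coordinate separately" is that each of the four box coordinates is discretised into bins and predicted as a class label, with one label per coordinate. The sketch below shows only that label encoding and its inverse. The number of bins, the page size and the function names `encode_box` and `decode_box` are assumptions for illustration; the abstract does not specify the scheme actually used.

```python
def encode_box(box, width, height, bins=32):
    """Map a pixel box (x0, y0, x1, y1) to four class indices.

    Each coordinate is scaled by the page size and assigned to one of
    `bins` equal-width bins, giving one classification target per coordinate.
    """
    sizes = (width, height, width, height)
    return tuple(min(bins - 1, int(v / s * bins)) for v, s in zip(box, sizes))

def decode_box(classes, width, height, bins=32):
    """Map four class indices back to pixel coordinates (bin centres)."""
    sizes = (width, height, width, height)
    return tuple((c + 0.5) * s / bins for c, s in zip(classes, sizes))

# Hypothetical entity box on a 640 x 480 page image.
box = (100, 50, 300, 80)
classes = encode_box(box, width=640, height=480)
approx = decode_box(classes, width=640, height=480)
print(classes)  # (5, 3, 15, 5)
print(approx)   # (110.0, 52.5, 310.0, 82.5)
```

The decoded box differs from the original by at most half a bin width per coordinate, so the number of bins trades label-space size against localisation precision.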

    Goal Directed Visual Search Based on Color Cues: Co-operative Effects of Top-Down & Bottom-Up Visual Attention

    Focus of Attention plays an important role in perception of the visual environment. Certain objects stand out in the scene irrespective of observers' goals. This form of attention capture, in which stimulus feature saliency captures our attention, is of a bottom-up nature. Often prior knowledge about objects and scenes can influence our attention. This form of attention capture, which is influenced by higher-level knowledge about the objects, is called top-down attention. Top-down attention acts as a feedback mechanism for the feed-forward bottom-up attention. Visual search is the result of a combined effort of the top-down (cognitive cue) system and the bottom-up (low-level feature saliency) system. In my thesis I investigate the process of goal-directed visual search based on a color cue, that is, the process of searching for objects of a certain color. The computational model generates saliency maps that predict the locations of interest during a visual search. The model-generated saliency maps were compared with the results of psychophysical human eye-tracking experiments. The analysis provides a measure of how well the human eye movements correspond with the locations predicted by the saliency maps. Eye-tracking equipment in the Visual Perceptual Laboratory in the Center for Imaging Science was used to conduct the experiments.
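
The abstract does not describe how the thesis computes its saliency maps. As a rough illustration of combining the two systems, the sketch below scores each pixel by a bottom-up term (color contrast against the image's mean color) and a top-down term (similarity to the searched-for color) and takes a weighted sum. The contrast measure, the weighting and all function names are assumptions made for this example only.

```python
import math

def color_distance(a, b):
    """Euclidean distance between two RGB triples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def normalise(grid):
    """Scale a non-negative map so that its maximum is 1."""
    peak = max(max(row) for row in grid)
    return [[v / peak if peak > 0 else 0.0 for v in row] for row in grid]

def bottom_up(image):
    """Contrast of each pixel against the mean image color (assumed measure)."""
    pixels = [p for row in image for p in row]
    mean = tuple(sum(p[c] for p in pixels) / len(pixels) for c in range(3))
    return normalise([[color_distance(p, mean) for p in row] for row in image])

def top_down(image, target):
    """Similarity of each pixel to the target color, in [0, 1]."""
    max_dist = color_distance((0, 0, 0), (255, 255, 255))
    return [[1.0 - color_distance(p, target) / max_dist for p in row]
            for row in image]

def combined_saliency(image, target, weight=0.5):
    """Weighted sum of the top-down and bottom-up maps."""
    bu, td = bottom_up(image), top_down(image, target)
    return [[weight * t + (1.0 - weight) * b for b, t in zip(brow, trow)]
            for brow, trow in zip(bu, td)]

def most_salient(grid):
    """Return the (row, column) of the largest value in the map."""
    cells = ((r, c) for r in range(len(grid)) for c in range(len(grid[0])))
    return max(cells, key=lambda rc: grid[rc[0]][rc[1]])

# Toy scene: gray background with one red and one blue pixel.
GRAY, RED, BLUE = (128, 128, 128), (255, 0, 0), (0, 0, 255)
image = [[GRAY, GRAY, RED],
         [GRAY, GRAY, GRAY],
         [BLUE, GRAY, GRAY]]
saliency = combined_saliency(image, target=RED)
print(most_salient(saliency))  # (0, 2), the red pixel
```

Here the red and blue pixels are about equally conspicuous in the bottom-up map, and the color cue is what separates them, which mirrors the cooperative effect the abstract describes.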

    Object Recognition

    Vision-based object recognition tasks are very familiar in our everyday activities, such as keeping a car in the correct lane while driving. We do these tasks effortlessly and in real time. In recent decades, with the advancement of computer technology, researchers and application developers have been trying to mimic the human capability for visual recognition. Such a capability would allow machines to free humans from boring or dangerous jobs.

    Project SEMACODE : a scale-invariant object recognition system for content-based queries in image databases

    For the efficient management of large image databases, the automated characterization of images and the use of that characterization for searching and ordering tasks are highly desirable. The purpose of the project SEMACODE is to combine the still unsolved problem of content-oriented characterization of images with scale-invariant object recognition and model-based compression methods. To achieve this goal, existing techniques as well as new concepts related to pattern matching, image encoding, and image compression are examined. The resulting methods are integrated in a common framework with the aid of a content-oriented conception. The required operations are developed for the target application, an image database at the library of the University of Frankfurt/Main (StUB; about 60000 images). The search and query interfaces are defined in close cooperation with the StUB project “Digitized Colonial Picture Library”. This report describes the fundamentals and first results of the image encoding and object recognition algorithms developed within the scope of the project.

    The Role of Early Recurrence in Improving Visual Representations

    This dissertation proposes a computational model of early vision with recurrence, termed early recurrence. The idea is motivated by research on primate vision. Specifically, the proposed model relies on the following four observations. 1) The primate visual system includes two main visual pathways: the dorsal pathway and the ventral pathway; 2) The two pathways respond to different visual features; 3) The neurons of the dorsal pathway conduct visual information faster than the neurons of the ventral pathway; 4) There are lower-level feedback connections from the dorsal pathway to the ventral pathway. As such, the primate visual system may implement a recurrent mechanism to improve the visual representations of the ventral pathway. Our work starts from a comprehensive review of the literature, based on which a conceptualization of early recurrence is proposed. Early recurrence manifests itself as a form of surround suppression. We propose that early recurrence is capable of refining ventral processing using the results of dorsal processing. Our work further defines a set of computational components to formalize early recurrence. Although we do not intend to model the true nature of the biology, to verify that the proposed computation is biologically consistent, we have applied the model to simulate a neurophysiological experiment with a bar-and-checkerboard stimulus and a psychological experiment involving a moving contour illusion. Simulation results indicated that the proposed computation behaviourally reproduces the original observations. The ultimate goal of this work is to investigate whether the proposal is capable of improving computer vision applications. To do this, we have applied the model to a variety of applications, including visual saliency and contour detection. Based on comparisons against the state of the art, we conclude that the proposed model of early recurrence sheds light on a generally applicable yet lightweight approach to boosting real-life application performance.