12,302 research outputs found

    Change blindness: eradication of gestalt strategies

    Get PDF
    Arrays of eight, texture-defined rectangles were used as stimuli in a one-shot change blindness (CB) task in which there was a 50% chance that one rectangle would change orientation between two successive presentations separated by an interval. CB was eliminated by cueing the target rectangle in the first stimulus, reduced by cueing in the interval, and unaffected by cueing in the second presentation. This supports the idea that a representation was formed that persisted through the interval before being 'overwritten' by the second presentation (Landman et al., 2003, Vision Research 43, 149–164). Another possibility is that participants used some kind of grouping or Gestalt strategy. To test this we changed the spatial positions of the rectangles in the second presentation by shifting them along imaginary spokes (by ±1 degree) emanating from the central fixation point. There was no significant difference in performance between this and the standard task [F(1,4)=2.565, p=0.185]. This may suggest two things: (i) Gestalt grouping is not used as a strategy in these tasks, and (ii) it gives further weight to the argument that objects may be stored in, and retrieved from, a pre-attentional store during this task.
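
    The spoke manipulation is simple to state computationally. The sketch below shows one way the displaced second-presentation positions could be generated; the coordinate convention (degrees of visual angle centred on fixation), the ring radius, and all names are illustrative assumptions rather than details taken from the study.

```python
import numpy as np

def shift_along_spokes(xy, shift_deg=1.0, rng=None):
    """Displace each item along the imaginary spoke joining it to central
    fixation (the origin), by +1 or -1 degree of visual angle at random."""
    rng = np.random.default_rng() if rng is None else rng
    xy = np.asarray(xy, dtype=float)                   # (n_items, 2), in degrees
    r = np.linalg.norm(xy, axis=1, keepdims=True)      # eccentricity (assumed > 0)
    spoke = xy / r                                     # unit vector along the spoke
    sign = rng.choice([-1.0, 1.0], size=(len(xy), 1))  # move inward or outward
    return xy + sign * shift_deg * spoke

# Eight items evenly spaced on an assumed ring 4 degrees from fixation.
angles = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
first = np.stack([4.0 * np.cos(angles), 4.0 * np.sin(angles)], axis=1)
second = shift_along_spokes(first)                     # positions for display two
```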

    Working memory revived in older adults by synchronizing rhythmic brain circuits

    Full text link
    Published in final edited form as: Nat Neurosci. 2019 May; 22(5): 820–827. doi:10.1038/s41593-019-0371-x. Understanding normal brain aging and developing methods to maintain or improve cognition in older adults are major goals of fundamental and translational neuroscience. Here we show that a core feature of cognitive decline—working-memory deficits—emerges from disconnected local and long-range circuits instantiated by theta–gamma phase–amplitude coupling in temporal cortex and theta phase synchronization across frontotemporal cortex. We developed a noninvasive stimulation procedure for modulating long-range theta interactions in adults aged 60–76 years. After 25 min of stimulation, frequency-tuned to individual brain network dynamics, we observed a preferential increase in neural synchronization patterns and the return of sender–receiver relationships of information flow within and between frontotemporal regions. The result was a rapid improvement in working-memory performance that outlasted a 50 min post-stimulation period. These results provide insight into the physiological foundations of age-related cognitive impairment and contribute to the groundwork for future non-pharmacological interventions targeting aspects of cognitive decline.
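
    The coupling measure at the heart of this work, theta–gamma phase–amplitude coupling, is commonly quantified with a Tort-style modulation index. The following is a minimal sketch of that standard measure, not the authors' analysis code; the frequency bands, filter settings, and function names are assumptions for illustration.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def bandpass(x, lo, hi, fs, order=4):
    """Zero-phase Butterworth band-pass filter."""
    b, a = butter(order, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    return filtfilt(b, a, x)

def modulation_index(x, fs, phase_band=(4.0, 8.0), amp_band=(30.0, 50.0), n_bins=18):
    """Tort-style phase-amplitude coupling: how strongly the amplitude of the
    fast band is modulated by the phase of the slow band (0 = no coupling)."""
    phase = np.angle(hilbert(bandpass(x, *phase_band, fs)))   # theta phase
    amp = np.abs(hilbert(bandpass(x, *amp_band, fs)))         # gamma amplitude
    edges = np.linspace(-np.pi, np.pi, n_bins + 1)
    mean_amp = np.array([amp[(phase >= edges[i]) & (phase < edges[i + 1])].mean()
                         for i in range(n_bins)])
    p = mean_amp / mean_amp.sum()            # amplitude distribution over phase bins
    # Normalised KL divergence from the uniform distribution over phase bins.
    return (np.log(n_bins) + np.sum(p * np.log(p))) / np.log(n_bins)
```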

    Emergence of Shape Bias in Convolutional Neural Networks through Activation Sparsity

    Full text link
    Current deep-learning models for object recognition are known to be heavily biased toward texture. In contrast, human visual systems are known to be biased toward shape and structure. What could be the design principles in human visual systems that led to this difference? How could we introduce more shape bias into deep learning models? In this paper, we report that sparse coding, a ubiquitous principle in the brain, can in itself introduce shape bias into a network. We found that enforcing the sparse coding constraint using a non-differentiable Top-K operation can lead to the emergence of structural encoding in the neurons of convolutional neural networks, resulting in a smooth decomposition of objects into parts and subparts and endowing the networks with shape bias. We demonstrated this emergence of shape bias and its functional benefits for different network structures on various datasets. For object recognition with convolutional neural networks, the shape bias yields greater robustness against distracting changes of style and pattern. For image synthesis with generative adversarial networks, the emergent shape bias leads to more coherent and decomposable structures in the synthesized images. Ablation studies suggest that sparse codes tend to encode structure, whereas more distributed codes tend to favor texture. Our code is hosted at the GitHub repository \url{https://github.com/Crazy-Jack/nips2023_shape_vs_texture}. Comment: Published at NeurIPS 2023 (Oral).
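
    The mechanism named here, a hard Top-K selection applied to activations, is small enough to sketch. Below is a minimal PyTorch illustration that keeps the k strongest channel activations at each spatial location and zeroes the rest; it is a reconstruction of the general technique, not the authors' implementation (which is available at the linked repository), and the value of k and the placement of the layer are assumptions.

```python
import torch
import torch.nn as nn

class TopKSparsity(nn.Module):
    """Hard sparsity: at each spatial location, keep the k largest channel
    activations and zero the rest (a non-differentiable Top-K selection)."""
    def __init__(self, k):
        super().__init__()
        self.k = k

    def forward(self, x):                      # x: (batch, channels, H, W)
        _, idx = x.topk(self.k, dim=1)         # indices of the k largest channels
        mask = torch.zeros_like(x).scatter_(1, idx, 1.0)
        return x * mask                        # gradients flow only through kept units

# Illustrative placement after a convolution:
layer = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(), TopKSparsity(k=16))
```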

    Bridging the gap between reconstruction and synthesis

    Get PDF
    3D reconstruction and image synthesis are two of the main pillars in computer vision. Early works focused on simple tasks such as multi-view reconstruction and texture synthesis. With the rise of Deep Learning, the field has progressed rapidly, making it possible to tackle more complex, higher-level tasks. For example, 3D reconstructions that traditionally required multi-view approaches can now be obtained with single-view methods. Similarly, early pattern-based texture synthesis work has led to techniques that can generate novel high-resolution images. In this thesis we have developed a hierarchy of tools that covers this whole range of problems, lying at the intersection of computer vision, graphics and machine learning. We tackle the problem of 3D reconstruction and synthesis in the wild. Importantly, we advocate for a paradigm in which not everything should be learned. Instead of applying Deep Learning naively, we propose novel representations, layers and architectures that directly embed prior 3D geometric knowledge for the tasks of 3D reconstruction and synthesis. We apply these techniques to problems including scene and person reconstruction and photo-realistic rendering. We first address methods to reconstruct a scene and the clothed people in it while estimating the camera position. Then, we tackle image and video synthesis for clothed people in the wild. Finally, we bridge the gap between reconstruction and synthesis under the umbrella of a unified novel formulation. Extensive experiments conducted throughout this thesis show that the proposed techniques improve the performance of Deep Learning models in terms of the quality of the reconstructed 3D shapes and synthesised images, while reducing the amount of supervision and training data required to train them. In summary, we provide a variety of low-, mid- and high-level algorithms that can be used to incorporate prior knowledge into different stages of the Deep Learning pipeline and improve performance in tasks of 3D reconstruction and image synthesis.

    Modelling visual search for surface defects

    Get PDF
    Much work has been done on developing algorithms for automated surface defect detection. However, comparisons between these models and human perception are rarely carried out. This thesis aims to investigate how well human observers can find defects in textured surfaces, over a wide range of task difficulties. Stimuli for experiments will be generated using texture synthesis methods and human search strategies will be captured by use of an eye tracker. Two different modelling approaches will be explored. A computational LNL-based model will be developed and compared to human performance in terms of the number of fixations required to find the target. Secondly, a stochastic simulation, based on empirical distributions of saccades, will be compared to human search strategies.
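
    As a sketch of the second, stochastic approach: saccade amplitudes can be resampled from an empirical distribution and fixations counted until gaze lands near the target. Everything below (field size, acceptance radius, uniform saccade directions, all names) is an assumed simplification for illustration, not the thesis's model.

```python
import numpy as np

def fixations_to_find(amplitudes, target, field=20.0, radius=1.0,
                      max_fixations=200, rng=None):
    """Simulate a search scan path: saccade amplitudes are resampled from an
    empirical distribution, directions drawn uniformly; return the number of
    fixations made before gaze falls within `radius` degrees of the target."""
    rng = np.random.default_rng() if rng is None else rng
    gaze = np.array([field / 2.0, field / 2.0])        # start at display centre
    target = np.asarray(target, dtype=float)
    for n in range(1, max_fixations + 1):
        if np.linalg.norm(gaze - target) < radius:
            return n                                   # target fixated
        amp = rng.choice(amplitudes)                   # empirical saccade size
        direction = rng.uniform(0.0, 2.0 * np.pi)
        step = amp * np.array([np.cos(direction), np.sin(direction)])
        gaze = np.clip(gaze + step, 0.0, field)        # stay on the display
    return max_fixations                               # give up: target not found
```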

    Multi modal multi-semantic image retrieval

    Get PDF
    The rapid growth in the volume of visual information, e.g. images and video, can overwhelm users' ability to find and access the specific visual information of interest to them. In recent years, ontology knowledge-based (KB) image retrieval techniques have been adopted in an attempt to extract knowledge from these images and enhance retrieval performance. A KB framework is presented to support semi-automatic annotation and semantic image retrieval using multimodal cues (visual features and text captions). In addition, a hierarchical structure for the KB allows metadata to be shared and supports multiple semantics (polysemy) for concepts. The framework builds an effective knowledge base for a domain-specific image collection, e.g. sports, and is able to disambiguate and assign high-level semantics to 'unannotated' images. Local feature analysis of visual content, namely using Scale Invariant Feature Transform (SIFT) descriptors, has been deployed in the 'Bag of Visual Words' model (BVW) as an effective method to represent visual content and to enhance its classification and retrieval. Local features are more useful than global features, e.g. colour, shape or texture, as they are invariant to image scale, orientation and camera angle. An innovative approach is proposed for the representation, annotation and retrieval of visual content using a hybrid technique based on unstructured visual words combined with a (structured) hierarchical ontology KB model. The structured model facilitates the disambiguation of unstructured visual words and, compared to a vector space model, a more effective classification of visual content, by exploiting local conceptual structures and their relationships. The key contributions of this framework in using local features for image representation are: first, a method to generate visual words using the semantic local adaptive clustering (SLAC) algorithm, which takes term weights and the spatial locations of keypoints into account, so that semantic information is preserved. Second, a technique to detect domain-specific 'non-informative visual words', which are ineffective at representing the content of visual data and degrade its categorisation. Third, a method to combine an ontology model with a visual word model to resolve synonym (visual heterogeneity) and polysemy problems. Experimental results show that this approach can discover semantically meaningful descriptions of visual content and efficiently recognise specific events, e.g. sports events, depicted in images.

    Since discovering the semantics of an image is an extremely challenging problem, one promising approach to enhancing visual content interpretation is to use any textual information that accompanies an image as a cue to predicting its meaning, by transforming this textual information into a structured annotation, e.g. using XML, RDF, OWL or MPEG-7. Although text and image are distinct types of information representation and modality, there are strong, invariant, implicit connections between images and any accompanying text. Semantic analysis of image captions can be used by image retrieval systems to retrieve selected images more precisely. To do this, Natural Language Processing (NLP) is first exploited to extract concepts from image captions. Next, an ontology-based knowledge model is deployed to resolve natural language ambiguities. To handle the accompanying textual information, two methods to extract knowledge from it are proposed. First, metadata can be extracted automatically from text captions and restructured with respect to a semantic model. Second, the use of LSI in combination with a domain-specific ontology-based knowledge model enables the framework to tolerate ambiguities and variations (incompleteness) in metadata. The ontology-based knowledge model allows the system to find indirectly relevant concepts in image captions and thus use them to represent the semantics of images at a higher level. Experimental results show that the proposed framework significantly enhances image retrieval and narrows the semantic gap between low-level machine-derived and higher-level human-understandable conceptualisations.
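
    The Bag of Visual Words pipeline that the framework builds on can be sketched briefly: SIFT descriptors are clustered into a vocabulary, and each image becomes a histogram over those visual words. The sketch below shows only this standard baseline; the thesis's SLAC algorithm additionally weights terms and keypoint locations, which is not reproduced here. The vocabulary size and function names are illustrative.

```python
import cv2
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def build_vocabulary(image_paths, n_words=500):
    """Cluster SIFT descriptors pooled over a training set into visual words."""
    sift = cv2.SIFT_create()
    descriptors = []
    for path in image_paths:
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        _, desc = sift.detectAndCompute(img, None)
        if desc is not None:
            descriptors.append(desc)
    return MiniBatchKMeans(n_clusters=n_words).fit(np.vstack(descriptors))

def bovw_histogram(image_path, vocabulary):
    """Represent one image as a normalised histogram over the visual words."""
    sift = cv2.SIFT_create()
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    _, desc = sift.detectAndCompute(img, None)
    words = vocabulary.predict(desc)                 # nearest word per keypoint
    hist = np.bincount(words, minlength=vocabulary.n_clusters).astype(float)
    return hist / hist.sum()
```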

    Increasing the performance and realism of procedurally generated buildings

    Get PDF
    As multimedia such as games and movies grow, so does the need for content. Textures, 3D models, expansive terrain, sound effects, and other data must be generated to support and enrich these multimedia productions. As this need for content continues to grow, two critical problems emerge: the cost of hiring artists to create the content becomes extremely large, as does the amount of memory needed to store and manipulate the content.

    To combat these issues, procedural content generation, or content generated algorithmically rather than by an artist, has been introduced. Algorithmically generating content allows for the rapid creation of large amounts of certain classes of content with little human effort; further, this content can be represented extremely compactly, often by exposing only a handful of parameters.

    In the realm of 3D building generation, split grammars have proven useful for generating a wide variety of buildings while remaining relatively intuitive. These split grammars have been used to generate entire cities full of detailed buildings from a fairly small number of rules.

    Split grammars have two important areas that can be improved upon. First, writing an appropriate grammar can require a significant amount of work and knowledge, especially when the grammar must follow a certain building style while providing a high degree of variation. Second, applying these grammars to produce a building can be slow, often requiring an offline pregeneration phase that undermines the size benefits of the grammar's compactness.

    For the first problem, we propose a data-mining approach to refining preexisting grammars, wherein users specify buildings they prefer, and from these preferences a set of rules is generated to guide future building generation. We show that the generated rules predict whether a user will like or dislike a building with high accuracy, often above 90%.

    For the second problem, we provide two improvements: a preprocessing step that parses a split grammar to make it easier and more efficient to apply without loss of generality, and a scheme that executes a grammar entirely within a geometry shader on a modern graphics processing unit (GPU), so that building generation can exploit the parallelism of modern graphics cards. We show that this second improvement provides a speedup of between 3 and 10 times over a purely CPU-based approach, with further gains possible depending on the nature of the grammars.
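
    The split-grammar mechanism itself is compact enough to illustrate. Below is a toy Python sketch in which rules recursively split a labelled region into sub-regions until only terminals remain; real facade grammars (and the GPU geometry-shader execution described above) are far richer, and all labels and split ratios here are invented for illustration.

```python
import random

# A region is (label, x, y, width, height); rules split a labelled region
# into sub-regions until only terminal labels remain.

def split_x(region, parts):
    """Split a region into vertical slices with the given relative widths."""
    label, x, y, w, h = region
    total = sum(weight for weight, _ in parts)
    pieces, cursor = [], x
    for weight, new_label in parts:
        piece_w = w * weight / total
        pieces.append((new_label, cursor, y, piece_w, h))
        cursor += piece_w
    return pieces

RULES = {
    "facade": lambda r: split_x(r, [(1, "wall"), (3, "window_row"), (1, "wall")]),
    "window_row": lambda r: split_x(r, [(1, "window")] * random.randint(2, 4)),
}

def derive(region):
    """Recursively apply the grammar; labels without a rule are terminals."""
    rule = RULES.get(region[0])
    if rule is None:
        return [region]
    return [leaf for sub in rule(region) for leaf in derive(sub)]

building = derive(("facade", 0.0, 0.0, 10.0, 6.0))   # list of walls and windows
```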

    The Chemical Senses

    Get PDF
    Long-standing neglect of the chemical senses in the philosophy of perception is due, mostly, to their being regarded as ‘lower’ senses. Smell, taste, and chemically irritated touch are thought to produce mere bodily sensations. However, empirically informed theories of perception can show how these senses lead to perception of objective properties, and why they cannot be treated as special cases of perception modelled on vision. The senses of taste, touch, and smell also combine to create unified perceptions of flavour. The nature of these multimodal experiences and the character of our awareness of them put pressure on the traditional idea that each episode of perception occurs through one or other of the five senses. Thus, the chemical senses, far from being peripheral to the concerns of the philosophy of perception, may hold important clues to the multisensory nature of perception in general.