106,645 research outputs found

    Bio-Inspired Stereo Vision Calibration for Dynamic Vision Sensors

    Get PDF
    Many advances have been made in the field of computer vision. Several recent research trends have focused on mimicking human vision by using stereo vision systems. In multi-camera systems, a calibration process is usually implemented to improve the accuracy of the results. However, these systems generate a large amount of data to be processed; a powerful computer is therefore required and, in many cases, the processing cannot be done in real time. Neuromorphic engineering attempts to create bio-inspired systems that mimic the information processing that takes place in the human brain. The information is encoded using pulses (or spikes), and the resulting systems are much simpler in computational operations and resources, which allows them to perform similar tasks with much lower power consumption; these processes can therefore be implemented on specialized hardware with real-time processing. In this work, a bio-inspired stereo vision system is presented, and a calibration mechanism for this system is implemented and evaluated using several tests. The result is a novel calibration technique for a neuromorphic stereo vision system, implemented on specialized hardware (FPGA, Field-Programmable Gate Array), which achieves reduced latencies in a stand-alone hardware implementation and works in real time. Funding: Ministerio de Economía y Competitividad TEC2016-77785-P; Ministerio de Economía y Competitividad TIN2016-80644-
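    As a rough illustration of what such a calibration produces, the sketch below applies an offline-built lookup table to a stream of Address-Event-Representation (AER) events; the event fields and the `remap` interface are assumptions for illustration, not the paper's implementation.

```python
# Minimal sketch (not the paper's design): applying a calibration lookup table
# to a stream of AER events from a dynamic vision sensor.
from dataclasses import dataclass

@dataclass
class Event:
    x: int          # pixel column reported by the sensor
    y: int          # pixel row reported by the sensor
    timestamp: int  # microseconds
    polarity: int   # +1 brightness increase, -1 decrease

def remap(event, lut):
    """Replace the raw pixel address with its calibrated (rectified) address.

    `lut` is a dict built offline by a calibration procedure, mapping (x, y)
    on the raw sensor grid to (x', y') on the rectified grid."""
    new_x, new_y = lut[(event.x, event.y)]
    return Event(new_x, new_y, event.timestamp, event.polarity)

# Toy 4x4 table that shifts events one column right, standing in for a real
# calibration result.
lut = {(x, y): (min(x + 1, 3), y) for x in range(4) for y in range(4)}
print(remap(Event(x=2, y=1, timestamp=10, polarity=1), lut))
```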

    Seeing Beyond the Brain: Conditional Diffusion Model with Sparse Masked Modeling for Vision Decoding

    Full text link
    Decoding visual stimuli from brain recordings aims to deepen our understanding of the human visual system and build a solid foundation for bridging human and computer vision through the Brain-Computer Interface. However, reconstructing high-quality images with correct semantics from brain recordings is a challenging problem, due to the complex underlying representations of brain signals and the scarcity of data annotations. In this work, we present MinD-Vis: Sparse Masked Brain Modeling with Double-Conditioned Latent Diffusion Model for Human Vision Decoding. First, we learn an effective self-supervised representation of fMRI data using masked modeling in a large latent space, inspired by the sparse coding of information in the primary visual cortex. Then, by augmenting a latent diffusion model with double conditioning, we show that MinD-Vis can reconstruct highly plausible images with semantically matching details from brain recordings using very few paired annotations. We benchmarked our model qualitatively and quantitatively; the experimental results indicate that our method outperformed the state of the art in both semantic mapping (100-way semantic classification) and generation quality (FID) by 66% and 41%, respectively. An exhaustive ablation study was also conducted to analyze our framework. Comment: 8 pages, 9 figures, 2 tables, accepted by CVPR 2023; see https://mind-vis.github.io/ for more information
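    To make the masked-modeling idea concrete, the sketch below trains a toy encoder/decoder to reconstruct randomly masked fMRI voxel values; the shapes, masking ratio, and model sizes are illustrative assumptions and are not taken from the MinD-Vis code.

```python
# Minimal sketch of masked-modeling pretraining on fMRI-like vectors.
import torch
import torch.nn as nn

n_voxels, latent_dim, mask_ratio = 512, 128, 0.75

encoder = nn.Sequential(nn.Linear(n_voxels, latent_dim), nn.GELU())
decoder = nn.Sequential(nn.Linear(latent_dim, n_voxels))
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

fmri = torch.randn(32, n_voxels)            # stand-in for a batch of fMRI scans
mask = torch.rand_like(fmri) < mask_ratio   # True where a voxel is hidden

masked_input = fmri.masked_fill(mask, 0.0)
recon = decoder(encoder(masked_input))
loss = ((recon - fmri)[mask] ** 2).mean()   # reconstruct only the masked voxels
loss.backward()
opt.step()

# The pretrained encoder output would then condition a generative image model.
```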

    IEEE Access Special Section Editorial: Biologically Inspired Image Processing Challenges and Future Directions

    Get PDF
    Humankind is exposed to large amounts of data. According to statistics, more than 80% of the information received by humans comes through the visual system. Therefore, image information processing is not only an important research topic but also a challenging task. The unique information processing mechanism of the human visual system gives it fast, accurate, and efficient image processing capabilities. At present, many advanced image analysis and processing techniques are widely used in image communication, geographic information systems, medical image analysis, and virtual reality. However, there is still a large gap between these technologies and the human visual system. Therefore, building an image system research mechanism based on the biological vision system is an attractive but difficult target. Although it is a challenge, it can also be considered an opportunity to exploit biologically inspired ideas. Meanwhile, through the integration of neurobiology, biological perception mechanisms, computer science, and mathematics, related research can bridge biological vision and computer vision. Finally, biologically inspired image analysis and processing systems are expected to be built on the basis of further consideration of the learning mechanisms of the human brain.

    On the AER Stereo-Vision Processing: A Spike Approach to Epipolar Matching

    Get PDF
    Image processing in digital computer systems usually considers visual information as a sequence of frames. These frames come from cameras that capture reality for a short period of time and are renewed and transmitted at a rate of 25-30 fps (a typical real-time scenario). Digital video processing has to process each frame in order to detect a feature in the input. In stereo vision, existing algorithms use frames from two digital cameras and process them pixel by pixel until they find a pattern match in a section of both stereo frames. To process stereo vision information, an image matching process is essential, but it carries a very high computational cost. Moreover, the more information is processed, the more time the matching algorithm spends and the more inefficient it becomes. Spike-based processing is a relatively new approach that implements processing by manipulating spikes one by one at the time they are transmitted, as the human brain does. The mammalian nervous system is able to solve much more complex problems, such as visual recognition, by manipulating neuron spikes. The spike-based philosophy for visual information processing based on the neuro-inspired Address-Event-Representation (AER) is nowadays achieving very high performance. The aim of this work is to study the viability of a matching mechanism in a stereo vision system using AER codification. This kind of mechanism had not been applied to an AER system before. To do so, the basis of epipolar geometry applied to AER systems is studied, and several tests are run using recorded data and a computer. The results and the average error are shown (an error of less than 2 pixels per point), and the viability is proven.
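    The sketch below illustrates the epipolar idea for AER events in a rectified stereo rig: corresponding events must lie on the same row, so candidate matches only need the same y, the same polarity, and a nearby timestamp. The event tuples and the time window are assumptions for illustration, not the paper's algorithm.

```python
# Minimal sketch of epipolar matching of AER events in a rectified stereo pair.
def match_events(left_events, right_events, max_dt_us=1000):
    """Events are (x, y, timestamp_us, polarity). Returns (left, right, disparity)."""
    matches = []
    for lx, ly, lt, lp in left_events:
        candidates = [
            (rx, ry, rt, rp)
            for rx, ry, rt, rp in right_events
            if ry == ly and rp == lp and abs(rt - lt) <= max_dt_us
        ]
        if candidates:
            best = min(candidates, key=lambda e: abs(e[2] - lt))
            matches.append(((lx, ly), (best[0], best[1]), lx - best[0]))
    return matches

left = [(10, 5, 100, 1), (20, 7, 150, -1)]
right = [(7, 5, 120, 1), (18, 7, 160, -1)]
print(match_events(left, right))   # disparities 3 and 2
```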

    Stereo Matching in Address-Event-Representation (AER) Bio-Inspired Binocular Systems in a Field-Programmable Gate Array (FPGA)

    Get PDF
    In stereo vision processing, the image matching step is essential for the results, although it involves a very high computational cost. Moreover, the more information is processed, the more time the matching algorithm spends, and the more inefficient it becomes. Spike-based processing is a relatively new approach that implements processing methods by manipulating spikes one by one at the time they are transmitted, as the human brain does. The mammalian nervous system can solve much more complex problems, such as visual recognition, by manipulating neuron spikes. The spike-based philosophy for visual information processing based on the neuro-inspired address-event-representation (AER) is currently achieving very high performance. The aim of this work was to study the viability of a matching mechanism in stereo vision systems, using AER codification and its implementation in a field-programmable gate array (FPGA). Some studies had been done before in an AER system with monitored data using a computer; however, this kind of mechanism had not been implemented directly on hardware. To this end, the basis of epipolar geometry applied to AER systems was studied and implemented, along with other restrictions, in order to achieve good results in a real-time scenario. The results and conclusions are shown, and the viability of the implementation is proven. Funding: Ministerio de Economía y Competitividad TEC2016-77785-
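    For a real-time, hardware-oriented flavor of the same matching, the sketch below keeps recent right-camera events in small per-row buffers so each incoming left-camera event is matched against a bounded window, roughly as a streaming pipeline would. Buffer depth and window length are assumptions; this is not the paper's FPGA design.

```python
# Minimal sketch of a streaming, bounded-memory AER coincidence matcher.
from collections import defaultdict, deque

class StreamingMatcher:
    def __init__(self, depth=8, max_dt_us=1000):
        self.right_rows = defaultdict(lambda: deque(maxlen=depth))  # y -> recent events
        self.max_dt_us = max_dt_us

    def push_right(self, x, y, t, pol):
        self.right_rows[y].append((x, t, pol))

    def push_left(self, x, y, t, pol):
        """Return (disparity, dt) of the best coincident right event, or None."""
        best = None
        for rx, rt, rp in self.right_rows[y]:
            dt = abs(rt - t)
            if rp == pol and dt <= self.max_dt_us and (best is None or dt < best[1]):
                best = (x - rx, dt)
        return best

m = StreamingMatcher()
m.push_right(7, 5, 120, 1)
print(m.push_left(10, 5, 130, 1))  # (3, 10)
```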

    Machine Vision System to Induct Binocular Wide-Angle Foveated Information into Both the Human and Computers - Feature Generation Algorithm based on DFT for Binocular Fixation

    Get PDF
    This paper introduces a machine vision system suitable for cooperative work between humans and computers. The system provides images captured by a stereo camera head not only to the processor but also to the user's sight as binocular wide-angle foveated (WAF) information, so it is applicable to Virtual Reality (VR) systems such as tele-existence or expert training. The stereo camera head acquires the required input images, foveated by special wide-angle optics, under camera view-direction control, and a 3D head-mounted display (HMD) presents fused 3D images to the user. Moreover, an analog video signal processing device, strongly inspired by the structure of the human visual system, realizes a unique way to provide WAF information to multiple processors and to the user. The developed vision system is therefore also expected to be applicable to human brain and vision research, because its design concept is to mimic the human visual system. Further, an algorithm that generates features using the Discrete Fourier Transform (DFT) for binocular fixation, in order to provide well-fused 3D images to the 3D HMD, is proposed. This paper examines the influence of applying this algorithm to space-variant images such as WAF images, based on experimental results.
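    As one concrete example of DFT-based features for aligning the two views, the sketch below estimates the shift between left and right patches with phase correlation, a standard Fourier-domain technique; it is not necessarily the feature-generation algorithm of the paper, and the toy patches are assumptions.

```python
# Minimal sketch: DFT phase correlation to estimate the shift between two patches.
import numpy as np

def phase_correlation_shift(left, right):
    """Estimate the (dy, dx) translation that maps `right` onto `left`."""
    cross_power = np.fft.fft2(left) * np.conj(np.fft.fft2(right))
    cross_power /= np.abs(cross_power) + 1e-12            # keep only the phase
    corr = np.fft.ifft2(cross_power).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Wrap peaks in the upper half of each axis back to negative shifts.
    return tuple(int(p - s) if p > s // 2 else int(p) for p, s in zip(peak, corr.shape))

rng = np.random.default_rng(0)
left = rng.random((64, 64))
right = np.roll(left, shift=-3, axis=1)                   # right view rolled 3 px left
print(phase_correlation_shift(left, right))               # (0, 3): shift right 3 px to realign
```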

    Ordinal Shape Coding and Correlation for Orientation-invariant 2D Shape Matching

    Get PDF
    The human brain and visual system are highly robust and efficient at recognising objects. Although biologically inspired approaches within the field of Computer Vision are often considered state of the art, a complete understanding of how the brain and visual system work has not yet been unlocked. The benefits of such an understanding are twofold with respect to Computer Vision: first, a more robust object recognition system could be produced, and second, a computer architecture as efficient as the brain and visual system would significantly reduce power requirements. It is therefore worthwhile to pursue and evaluate biologically inspired theories of object recognition. This engineering doctorate thesis provides an implementation and evaluation of a biologically inspired theory of object recognition called Ordinal Shape Coding and Correlation (OSCC). The theory is underpinned by relative coding and correlation within the human brain and visual system. A derivation of the theory is illustrated with respect to an implementation, alongside proposed extensions. As a result, a hierarchical sequence alignment method is proposed for the correlation of multi-dimensional ordinal shape descriptors in the context of orientation-invariant 2D shape descriptor matching. Orientation-invariant 2D shape descriptor matching evaluations are presented, covering both synthetic data and the public MNIST handwritten digits dataset. The synthetic data evaluations show that the proposed OSCC method can be used as a discriminative orientation-invariant 2D shape descriptor. Furthermore, it is shown that the close competitor Shape Context (SC) method outperforms the OSCC method when applied to the MNIST handwritten digits dataset. However, OSCC outperforms the SC method when appearance and bending energy costs are removed from the SC method in order to compare pure shape descriptors. Future work proposes that bending energy and appearance costs be integrated into the OSCC pipeline for further OCR evaluations.
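    To give a feel for ordinal (relative) shape coding, the sketch below ranks the centroid distances of contour samples and compares two descriptors over all cyclic shifts, which makes the comparison tolerant to rotation. The sampling scheme and scoring are assumptions inspired by the idea, not a reproduction of the OSCC method.

```python
# Minimal sketch of an ordinal, rotation-tolerant shape descriptor.
import numpy as np

def ordinal_descriptor(points):
    """points: (N, 2) contour samples; returns the rank of each centroid distance."""
    dists = np.linalg.norm(points - points.mean(axis=0), axis=1)
    return np.argsort(np.argsort(dists))                   # ordinal ranks 0..N-1

def cyclic_distance(a, b):
    """Smallest mean absolute rank difference over all cyclic shifts of b."""
    return min(np.abs(a - np.roll(b, s)).mean() for s in range(len(b)))

theta = np.linspace(0, 2 * np.pi, 64, endpoint=False)
ellipse = np.c_[2 * np.cos(theta), np.sin(theta)]
rot90 = np.array([[0.0, -1.0], [1.0, 0.0]])
rotated = np.roll(ellipse @ rot90, 10, axis=0)              # same shape, rotated, new start point
circle = np.c_[np.cos(theta), np.sin(theta)]

d_same = cyclic_distance(ordinal_descriptor(ellipse), ordinal_descriptor(rotated))
d_diff = cyclic_distance(ordinal_descriptor(ellipse), ordinal_descriptor(circle))
print(d_same < d_diff)                                      # True: rotation barely changes the ranks
```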

    A Self Organizing Maps Approach to Segmenting Tumors in Computed Tomography (CAT) and Magnetic Resonance Imaging (MRI) Scans

    Get PDF
    Studies and explorations of human visual perception have been the main source of inspiration for computer vision algorithms. Understanding how the human brain represents basic attributes of objects helps in developing computer vision algorithms for automatic object interpretation and understanding. Human visual perception is based on the neural coding of fundamental features such as object boundaries, color, orientation, and shape. Thus, finding the contours and boundaries of objects provides the first step toward object recognition and interpretation. From here, this research was inspired to introduce an automatic boundary detection technique, based on active contours, designed to detect the contours of abnormalities in X-ray and MRI imagery. Our research aims to help healthcare professionals sort and analyze large amounts of imagery more effectively. Our segmentation algorithm incorporates prior information within the segmentation framework to enhance the extraction of object regions and boundaries of defective tissue regions in medical imagery. We use a Self-Organizing Map (SOM), an unsupervised neural network, to learn this prior information. One reason to prefer SOMs over other neural network models is their specific ability to learn intensity information through their topology-preservation property. In addition, SOMs have several characteristics that make them similar to the way the human brain works. A dual self-organizing map approach is used to learn the object of interest and the background independently, in order to guide the active contour to extract the target region. The segmentation is achieved through the construction of a level-set cost function in which the dynamic variables are the Best Matching Units (BMUs) coming from the SOM maps. We evaluate our algorithm by comparing our detection results to images manually segmented by health professionals.
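    The sketch below illustrates the dual-SOM idea in miniature: one tiny 1-D SOM learns foreground (lesion) intensities, another learns background intensities, and a pixel is assigned to whichever map holds the closer best matching unit. The SOM sizes, learning schedule, and toy data are assumptions; in the thesis the BMU distances would instead drive a level-set/active-contour term.

```python
# Minimal sketch of dual 1-D SOMs over pixel intensities.
import numpy as np

def train_som(samples, n_units=8, epochs=20, lr=0.3, sigma=1.5):
    """Train a 1-D SOM over scalar intensity samples; returns the unit weights."""
    rng = np.random.default_rng(0)
    weights = rng.choice(samples, n_units).astype(float)
    idx = np.arange(n_units)
    for _ in range(epochs):
        for x in rng.permutation(samples):
            bmu = np.argmin(np.abs(weights - x))                # best matching unit
            h = np.exp(-((idx - bmu) ** 2) / (2 * sigma ** 2))  # neighborhood kernel
            weights += lr * h * (x - weights)
    return weights

rng = np.random.default_rng(1)
tumor_px = rng.normal(0.8, 0.05, 200)       # bright lesion intensities (toy data)
background_px = rng.normal(0.3, 0.05, 200)  # darker healthy-tissue intensities

som_fg, som_bg = train_som(tumor_px), train_som(background_px)

def is_tumor(pixel):
    return np.min(np.abs(som_fg - pixel)) < np.min(np.abs(som_bg - pixel))

print(is_tumor(0.75), is_tumor(0.35))       # True False
```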

    Multilevel Large Language Models for Everyone

    Full text link
    Large language models have made significant progress in the past few years. However, they are either generic or field-specific, splitting the community into different groups. In this paper, we unify these large language models into a larger map in which the generic and specific models are linked together and can improve each other, based on the user's personal input and information from the internet. The idea of linking several large language models together is inspired by the functionality of the human brain: specific regions of the cerebral cortex are specialized for certain low-level functions, and these regions can work together to achieve more complex, high-level functionality. This behavior of the human cortex suggests a design for multilevel large language models that contain global-level, field-level, and user-level models. The user-level models run on local machines to achieve efficient responses and to protect the user's privacy. Such multilevel models reduce redundancy and perform better than single-level models. The proposed multilevel idea can be applied in various applications, such as natural language processing, computer vision tasks, professional assistants, business, and healthcare.
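    A purely illustrative sketch of the multilevel routing idea follows: a query is answered by the local user-level model when possible, escalated to a field-level model for domain questions, and otherwise sent to the global model. The keyword-based router, the "UNSURE" convention, and the placeholder models are assumptions, not the paper's design.

```python
# Minimal sketch of a user/field/global routing layer.
from typing import Callable

class MultilevelAssistant:
    def __init__(self, user_model: Callable[[str], str],
                 field_models: dict[str, Callable[[str], str]],
                 global_model: Callable[[str], str]):
        self.user_model = user_model        # runs on the local machine (privacy)
        self.field_models = field_models    # domain-specialised models
        self.global_model = global_model    # generic large model

    def answer(self, query: str) -> str:
        local = self.user_model(query)
        if local != "UNSURE":               # the local model handled it
            return local
        for domain, model in self.field_models.items():
            if domain in query.lower():     # toy domain detection
                return model(query)
        return self.global_model(query)

assistant = MultilevelAssistant(
    user_model=lambda q: "meeting at 10am" if "calendar" in q else "UNSURE",
    field_models={"radiology": lambda q: "[field-level medical model answer]"},
    global_model=lambda q: "[global model answer]",
)
print(assistant.answer("check my calendar"))
print(assistant.answer("summarise this radiology report"))
```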