Bio-Inspired Stereo Vision Calibration for Dynamic Vision Sensors
Many advances have been made in the field of computer vision. Several recent research trends
have focused on mimicking human vision by using stereo vision systems. In multi-camera systems, a
calibration process is usually implemented to improve the accuracy of the results. However, these systems generate
a large amount of data to be processed; therefore, a powerful computer is required and, in many cases,
this cannot be done in real time. Neuromorphic Engineering attempts to create bio-inspired systems that
mimic the information processing that takes place in the human brain. This information is encoded using
pulses (or spikes) and the generated systems are much simpler (in computational operations and resources),
which allows them to perform similar tasks with much lower power consumption; as a result, these processes
can be implemented on specialized hardware with real-time processing. In this work, a bio-inspired stereo vision
system is presented, and a calibration mechanism for this system is implemented and evaluated
using several tests. The result is a novel calibration technique for a neuromorphic stereo vision system,
implemented on specialized hardware (an FPGA, Field-Programmable Gate Array), which achieves
reduced latency in a stand-alone hardware implementation and works in real time.
Ministerio de Economía y Competitividad TEC2016-77785-P; Ministerio de Economía y Competitividad TIN2016-80644-
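As a frame-based (non-spiking) illustration of what such a calibration must accomplish, the sketch below fits an affine correction between matched calibration points seen by two sensors; the function, the toy data, and the affine model are assumptions for illustration, not the paper's FPGA mechanism.

```python
import numpy as np

def fit_affine_calibration(pts_left, pts_right):
    """Least-squares affine map taking right-sensor pixels onto the left sensor.

    pts_left, pts_right: (N, 2) arrays of matched calibration points.
    Returns a (2, 3) matrix A such that left ~= A @ [x, y, 1].
    """
    n = len(pts_right)
    homog = np.hstack([pts_right, np.ones((n, 1))])   # (N, 3) homogeneous coords
    # Solve homog @ A.T ~= pts_left in the least-squares sense.
    A_t, *_ = np.linalg.lstsq(homog, pts_left, rcond=None)
    return A_t.T

# Toy example: the right sensor is offset by (5, -3) pixels.
left = np.array([[10., 20.], [40., 60.], [80., 30.], [25., 75.]])
right = left + np.array([5., -3.])
A = fit_affine_calibration(left, right)
corrected = (A @ np.hstack([right, np.ones((4, 1))]).T).T
assert np.allclose(corrected, left)   # calibration removes the offset
```

Once fitted, such a map can be applied to every incoming right-sensor address before matching, which is the kind of per-event correction that maps naturally onto hardware.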
Seeing Beyond the Brain: Conditional Diffusion Model with Sparse Masked Modeling for Vision Decoding
Decoding visual stimuli from brain recordings aims to deepen our
understanding of the human visual system and build a solid foundation for
bridging human and computer vision through the Brain-Computer Interface.
However, reconstructing high-quality images with correct semantics from brain
recordings is a challenging problem due to the complex underlying
representations of brain signals and the scarcity of data annotations. In this
work, we present MinD-Vis: Sparse Masked Brain Modeling with Double-Conditioned
Latent Diffusion Model for Human Vision Decoding. Firstly, we learn an
effective self-supervised representation of fMRI data using mask modeling in a
large latent space inspired by the sparse coding of information in the primary
visual cortex. Then by augmenting a latent diffusion model with
double-conditioning, we show that MinD-Vis can reconstruct highly plausible
images with semantically matching details from brain recordings using very few
paired annotations. We benchmarked our model qualitatively and quantitatively;
the experimental results indicate that our method outperformed the state of the art
in both semantic mapping (100-way semantic classification) and generation
quality (FID), by 66% and 41%, respectively. An exhaustive ablation study was
also conducted to analyze our framework.
Comment: 8 pages, 9 figures, 2 tables, accepted by CVPR 2023; see https://mind-vis.github.io/ for more information
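The sparse masked-modeling step described above — hide most voxels and learn to reconstruct them — can be sketched as follows; the masking ratio, array shapes, and zero-fill convention are illustrative assumptions, not MinD-Vis's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_mask(x, mask_ratio=0.75):
    """Mask a fraction of voxels; a model is then trained to reconstruct them.

    x: (batch, voxels) fMRI signal. Returns the masked input and a boolean
    mask (True where a voxel was hidden), as in masked-autoencoder pretraining.
    """
    mask = rng.random(x.shape) < mask_ratio
    x_masked = np.where(mask, 0.0, x)   # hidden voxels replaced by zeros
    return x_masked, mask

x = rng.standard_normal((2, 1000))
x_masked, mask = random_mask(x)
assert np.allclose(x_masked[~mask], x[~mask])   # visible voxels untouched
assert np.all(x_masked[mask] == 0.0)            # hidden voxels zeroed
assert 0.7 < mask.mean() < 0.8                  # roughly 75% masked
```

The training loss is then computed only on the hidden positions, which forces the encoder to learn a representation that exploits the redundancy (sparse coding) of the signal.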
IEEE Access Special Section Editorial: Biologically Inspired Image Processing Challenges and Future Directions
Humankind is exposed to large amounts of data. According to statistics, more than 80% of the information received by humans comes from the visual system. Therefore, image information processing is not only an important research topic but also a challenging task. The unique information processing mechanism of the human visual system provides it with fast, accurate, and efficient image processing capabilities. At present, many advanced image analysis and processing techniques have been widely used in image communication, geographic information systems, medical image analysis, and virtual reality. However, there is still a large gap between these technologies and the human visual system. Therefore, building an image system research mechanism based on the biological vision system is an attractive but difficult target. Although it is a challenge, it can also be considered an opportunity to utilize biologically inspired ideas. Meanwhile, through the integration of neurobiology, biological perception mechanisms, computer science, and mathematical science, related research can bridge biological vision and computer vision. Finally, a biologically inspired image analysis and processing system is expected to be built on the basis of further consideration of the learning mechanisms of the human brain.
On the AER Stereo-Vision Processing: A Spike Approach to Epipolar Matching
Image processing in digital computer systems usually considers
visual information as a sequence of frames. These frames are from cameras that
capture reality for a short period of time. They are renewed and transmitted at a
rate of 25-30 fps (typical real-time scenario). Digital video processing has to
process each frame in order to detect a feature on the input. In stereo vision,
existing algorithms use frames from two digital cameras and process them pixel
by pixel until a pattern match is found in a section of both stereo frames. To
process stereo vision information, an image matching process is essential, but it
incurs a very high computational cost. Moreover, the more information is
processed, the more time the matching algorithm takes and the less efficient
it is. Spike-based processing is a relatively new approach that implements
processing by manipulating spikes one by one at the time they are transmitted,
like a human brain does. The mammalian nervous system is able to solve much more
complex problems, such as visual recognition, by manipulating neuronal spikes.
The spike-based philosophy for visual information processing based on the
neuro-inspired Address-Event-Representation (AER) is nowadays achieving
very high performance. The aim of this work is to study the viability of a
matching mechanism in a stereo-vision system, using AER codification. This
kind of mechanism has not been applied to an AER system before. To do that,
the basics of epipolar geometry applied to AER systems are studied, and several tests
are run, using recorded data and a computer. The results and the average error
are shown (error less than 2 pixels per point), and the viability is proven.
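The epipolar test that underlies such matching can be sketched without frames: given a fundamental matrix F relating the two sensors, a left event x and a right event x' are match candidates only if the residual |x'ᵀFx| is small. The F below is the ideal rectified-stereo case (epipolar lines are horizontal rows), assumed purely for illustration.

```python
import numpy as np

# Fundamental matrix of an ideally rectified stereo pair (illustrative
# assumption): corresponding points lie on the same image row.
F = np.array([[0., 0.,  0.],
              [0., 0., -1.],
              [0., 1.,  0.]])

def epipolar_candidates(left_event, right_events, tol=2.0):
    """Keep right-sensor events within `tol` pixels of the epipolar line.

    Events are (x, y) pixel addresses, as decoded from AER packets.
    """
    xl = np.array([*left_event, 1.0])
    keep = []
    for ev in right_events:
        xr = np.array([*ev, 1.0])
        if abs(xr @ F @ xl) <= tol:   # |x'^T F x| is the epipolar residual
            keep.append(ev)
    return keep

right = [(40, 50), (40, 51.5), (40, 80)]
# Only events on (or near) row 50 survive the test for a left event on row 50.
assert epipolar_candidates((10, 50), right) == [(40, 50), (40, 51.5)]
```

Because the test is a single dot product per event pair, it can be applied as each spike arrives, which is what makes it attractive for AER streams.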
Stereo Matching in Address-Event-Representation (AER) Bio-Inspired Binocular Systems in a Field-Programmable Gate Array (FPGA)
In stereo-vision processing, the image-matching step is essential for results, although it
involves a very high computational cost. Moreover, the more information is processed, the more time
is spent by the matching algorithm, and the more inefficient it is. Spike-based processing is a relatively
new approach that implements processing methods by manipulating spikes one by one at the time
they are transmitted, as a human brain does. The mammalian nervous system can solve much more complex
problems, such as visual recognition, by manipulating neuron spikes. The spike-based philosophy
for visual information processing based on the neuro-inspired address-event-representation (AER)
is currently achieving very high performance. The aim of this work was to study the viability of a
matching mechanism in stereo-vision systems, using AER codification and its implementation in
a field-programmable gate array (FPGA). Some studies have been done before in an AER system
with monitored data using a computer; however, this kind of mechanism has not been implemented
directly on hardware. To this end, the basics of epipolar geometry applied to AER systems were studied
and implemented, along with other restrictions, in order to achieve good results in a real-time scenario.
The results and conclusions are shown, and the viability of its implementation is proven.
Ministerio de Economía y Competitividad TEC2016-77785-
Machine Vision System to Induct Binocular Wide-Angle Foveated Information into Both the Human and Computers - Feature Generation Algorithm based on DFT for Binocular Fixation
This paper introduces a machine vision system which is suitable for cooperative work between the human and the computer. This system provides images input from a stereo camera head not only to the processor but also to the user's sight as binocular wide-angle foveated (WAF) information, so it is applicable to Virtual Reality (VR) systems such as tele-existence or expert training. The stereo camera head serves to capture the required input images, foveated by special wide-angle optics under camera view direction control, and a 3D head-mounted display (HMD) presents fused 3D images to the user. Moreover, an analog video signal processing device, strongly inspired by the structure of the human visual system, realizes a unique way of providing WAF information to multiple processors and the user. Therefore, this vision system is also expected to be applicable to human brain and vision research, because its design concept is to mimic the human visual system. Further, an algorithm to generate features using the Discrete Fourier Transform (DFT) for binocular fixation, in order to provide well-fused 3D images to the 3D HMD, is proposed. This paper examines the influence of applying this algorithm to space-variant images, such as WAF images, based on experimental results.
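A common DFT-based way to obtain a fixation (disparity) cue between two views is phase correlation; the sketch below is a generic illustration of that idea on uniformly sampled images, not the paper's feature-generation algorithm for space-variant WAF images.

```python
import numpy as np

def phase_correlation_shift(a, b):
    """Estimate the integer (dy, dx) such that a ~= np.roll(b, (dy, dx)),
    via the DFT phase-correlation theorem."""
    Fa, Fb = np.fft.fft2(a), np.fft.fft2(b)
    cross = Fa * np.conj(Fb)
    cross /= np.abs(cross) + 1e-12            # keep only the phase difference
    corr = np.fft.ifft2(cross).real           # impulse at the translation
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # Map the peak location to signed shifts.
    if dy > a.shape[0] // 2:
        dy -= a.shape[0]
    if dx > a.shape[1] // 2:
        dx -= a.shape[1]
    return dy, dx

rng = np.random.default_rng(1)
img = rng.random((64, 64))
shifted = np.roll(img, shift=(0, 7), axis=(0, 1))  # 7-pixel horizontal disparity
assert phase_correlation_shift(shifted, img) == (0, 7)
```

For binocular fixation, the recovered horizontal shift between the left and right views is the quantity that drives the vergence control; on space-variant images the same DFT machinery applies, but the interpretation of the shift changes, which is exactly what the paper investigates.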
Ordinal Shape Coding and Correlation for Orientation-invariant 2D Shape Matching
The human brain and visual system are highly robust and efficient at recognising objects.
Although biologically inspired approaches within the field of Computer Vision are
often considered state of the art, a complete understanding of how the brain and
visual system work has not yet been achieved. The benefits of such an understanding are
twofold with respect to Computer Vision: firstly, a more robust object recognition
system could be produced, and secondly, a computer architecture as efficient as the
brain and visual system would significantly reduce power requirements. It is therefore
worthwhile to pursue and evaluate biologically inspired theories of object recognition.
This engineering doctorate thesis provides an implementation and evaluation of a
biologically inspired theory of object recognition called Ordinal Shape Coding and
Correlation (OSCC). The theory is underpinned by relative coding and correlation
within the human brain and visual system. A derivation of the theory is illustrated
with respect to an implementation alongside proposed extensions. As a result, a
hierarchical sequence alignment method is proposed for the correlation of multi-
dimensional ordinal shape descriptors for the context of orientation-invariant 2D shape
descriptor matching.
Orientation-invariant 2D shape descriptor matching evaluations are presented
which cover both synthetic data and the public MNIST handwritten digits dataset.
Synthetic data evaluations show that the proposed OSCC method can be used as a
discriminative orientation-invariant 2D shape descriptor. Furthermore, it is shown that
the close competitor Shape Context (SC) method outperforms the OSCC method when
applied to the MNIST handwritten digits dataset. However, it is shown that OSCC
outperforms the SC method when appearance and bending energy costs are removed
from the SC method to compare pure shape descriptors. Future work proposes that
bending energy and appearance costs be integrated into the OSCC pipeline for further
OCR evaluations.
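The flavour of ordinal (rank-based) shape coding with alignment-based matching can be sketched as follows; the radial-rank descriptor and the simple circular L1 alignment are illustrative stand-ins, not the thesis's OSCC method or its hierarchical sequence alignment.

```python
import numpy as np

def ordinal_descriptor(contour):
    """Rank-order (ordinal) code of a closed contour's radial profile.

    contour: (N, 2) points. Using ranks instead of raw distances makes the
    code invariant to uniform scaling; circular alignment below absorbs a
    rotation, which appears as a shift of the starting point.
    """
    centre = contour.mean(axis=0)
    radii = np.linalg.norm(contour - centre, axis=1)
    return np.argsort(np.argsort(radii))      # ordinal ranks 0..N-1

def match_cost(d1, d2):
    """Best circular-alignment L1 cost between two ordinal descriptors."""
    return min(np.abs(np.roll(d1, k) - d2).sum() for k in range(len(d1)))

rng = np.random.default_rng(2)
theta = np.linspace(0, 2 * np.pi, 16, endpoint=False)
radii = 1.0 + 0.1 * rng.permutation(16)           # distinct radial profile
contour = np.c_[radii * np.cos(theta), radii * np.sin(theta)]
# Same shape: re-indexed (different starting point) and uniformly rescaled.
variant = np.roll(contour, 5, axis=0) * 3.0
d1, d2 = ordinal_descriptor(contour), ordinal_descriptor(variant)
assert match_cost(d1, d2) == 0                    # identical shape, zero cost
```

A dissimilar shape would leave a nonzero residual at every alignment, which is what makes the descriptor usable for discrimination.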
A Self Organizing Maps Approach to Segmenting Tumors in Computed Tomography (CAT) and Magnetic Resonance Imaging (MRI) Scans
Studies and explorations of human visual perception have been the main source of inspiration for computer vision algorithms. Understanding how the human brain represents basic attributes of objects helps in developing computer vision algorithms for automatic object interpretation and understanding. Human visual perception is based on the neural coding of fundamental features, such as object boundaries, color, orientation, shape, etc. Thus, finding the contours and boundaries of objects provides the first step for object recognition and interpretation. From here arose the idea of this research: to introduce an automatic boundary detection technique, based on active contours, designed to detect the contours of abnormalities in X-ray and MRI imagery. Our research aims to aid healthcare professionals in sorting and analyzing large amounts of imagery more effectively. Our segmentation algorithm incorporates prior information within the segmentation framework to enhance the performance of object region and boundary extraction for defective tissue regions in medical imagery. We exploit the Self-Organizing Map (SOM), an unsupervised neural network, to learn our prior information. One reason to prefer SOMs to other neural network models is their specific ability to learn intensity information via their topology preservation property. In addition, SOMs have several characteristics that make them quite similar to the way the human brain works. A dual self-organizing map approach is used to learn the object of interest and the background independently, in order to guide the active contour to extract the target region. The segmentation process is achieved by the construction of a level set cost function in which the dynamic variables are the best matching units (BMUs) coming from the SOM maps.
We evaluate our algorithm by comparing our detection results to those segmented manually by health professionals.
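The core SOM operations used here — find the best matching unit (BMU) for an input and pull its topological neighbours toward it — can be sketched on scalar intensities; the map size, learning rate, and toy intensity data are assumptions for illustration, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(3)

def train_som(samples, n_units=8, lr=0.5, sigma=1.5, epochs=20):
    """1-D self-organizing map over scalar intensity samples.

    Each step finds the best matching unit (BMU) and moves it and its
    topological neighbours toward the input; the neighbourhood function
    is what preserves intensity order on the map.
    """
    weights = rng.random(n_units)
    idx = np.arange(n_units)
    for _ in range(epochs):
        for x in samples:
            bmu = np.argmin(np.abs(weights - x))           # best matching unit
            neigh = np.exp(-((idx - bmu) ** 2) / (2 * sigma ** 2))
            weights += lr * neigh * (x - weights)          # pull toward input
    return weights

# Two intensity populations: "object" (bright) vs "background" (dark).
obj = rng.normal(0.8, 0.05, 200).clip(0, 1)
bg = rng.normal(0.2, 0.05, 200).clip(0, 1)
w_obj, w_bg = train_som(obj), train_som(bg)
# Dual-SOM idea: a pixel is assigned to whichever map has the closer BMU.
pixel = 0.75
assert np.min(np.abs(w_obj - pixel)) < np.min(np.abs(w_bg - pixel))
```

In the full method, those per-map BMU distances become the dynamic variables of the level set cost function that drives the active contour.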
Multilevel Large Language Models for Everyone
Large language models have made significant progress in the past few years.
However, they are either generic or field-specific, splitting the
community into different groups. In this paper, we unify these large language
models into a larger map, where the generic and specific models are
linked together and can improve each other, based on the user's personal input
and information from the internet. The idea of linking several large language
models together is inspired by the functionality of the human brain. Specific
regions of the brain cortex are specialized for certain low-level functions,
and these regions can work jointly to achieve more complex, high-level
functionality. This behavior of the human brain cortex sheds light on the design of
multilevel large language models that contain global-level, field-level, and
user-level models. The user-level models run on local machines to achieve
efficient responses and protect the user's privacy. Such multilevel models
reduce redundancy and perform better than single-level models. The
proposed multilevel idea can be applied in various applications, such as
natural language processing, computer vision tasks, professional assistance,
business, and healthcare.
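The global/field/user hierarchy can be sketched as a fallback chain in which a query escalates from the local user-level model upward; every name and behavior below is hypothetical, a sketch of the routing idea rather than the paper's system.

```python
from typing import Callable, Optional

# Each level is a function that either answers or returns None to escalate.
Level = Callable[[str], Optional[str]]

def user_level(query: str) -> Optional[str]:
    # Runs locally: fast and private, but only knows the user's own data.
    notes = {"my wifi password": "stored in the local keychain"}
    return notes.get(query.lower())

def field_level(query: str) -> Optional[str]:
    # Domain-specific model, e.g. for medical queries.
    return "field-specific answer" if "diagnosis" in query else None

def global_level(query: str) -> Optional[str]:
    # Generic model: always produces some answer.
    return f"generic answer to: {query}"

def route(query: str, levels: list[Level]) -> str:
    """Escalate through the levels until one of them answers."""
    for level in levels:
        answer = level(query)
        if answer is not None:
            return answer
    raise RuntimeError("no level answered")

chain = [user_level, field_level, global_level]
assert route("My wifi password", chain) == "stored in the local keychain"
assert route("suggest a diagnosis", chain) == "field-specific answer"
assert route("weather?", chain) == "generic answer to: weather?"
```

The design choice mirrors the abstract's claim: queries that can be answered locally never leave the user's machine, which both reduces latency and preserves privacy.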