How does the primate brain combine generative and discriminative computations in vision?
Vision is widely understood as an inference problem. However, two contrasting
conceptions of the inference process have each been influential in research on
biological vision as well as the engineering of machine vision. The first
emphasizes bottom-up signal flow, describing vision as a largely feedforward,
discriminative inference process that filters and transforms the visual
information to remove irrelevant variation and represent behaviorally relevant
information in a format suitable for downstream functions of cognition and
behavioral control. In this conception, vision is driven by the sensory data,
and perception is direct because the processing proceeds from the data to the
latent variables of interest. The notion of "inference" in this conception is
that of the engineering literature on neural networks, where feedforward
convolutional neural networks processing images are said to perform inference.
The alternative conception is that of vision as an inference process in
Helmholtz's sense, where the sensory evidence is evaluated in the context of a
generative model of the causal processes giving rise to it. In this conception,
vision inverts a generative model through an interrogation of the evidence in a
process often thought to involve top-down predictions of sensory data to
evaluate the likelihood of alternative hypotheses. The authors include, in
roughly equal numbers, scientists rooted in each of the two conceptions,
motivated to overcome what may be a false dichotomy between them and to
engage the other perspective in theory and experiment. The primate brain
employs an unknown algorithm that may combine the advantages of both
conceptions. We explain and clarify the terminology, review the key empirical
evidence, and propose an empirical research program that transcends the
dichotomy and sets the stage for revealing the mysterious hybrid algorithm of
primate vision.
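The contrast between the two conceptions can be illustrated on a toy inference problem (a hedged sketch; the one-dimensional "image", the Gaussian generative model, and the closed-form discriminative mapping are all invented for illustration, not taken from the article):

```python
import numpy as np

# Toy world: a binary latent cause z and a noisy 1-D "image" x (illustrative).
prior = np.array([0.5, 0.5])   # p(z)
means = np.array([-1.0, 1.0])  # generative model: p(x|z) = N(means[z], 1)

def likelihood(x, z):
    return np.exp(-0.5 * (x - means[z]) ** 2) / np.sqrt(2 * np.pi)

def generative_posterior(x):
    # "Analysis by synthesis": evaluate each hypothesis z under the
    # generative model and normalize (Bayes' rule).
    joint = np.array([likelihood(x, z) * prior[z] for z in (0, 1)])
    return joint / joint.sum()

def discriminative_estimate(x):
    # Direct bottom-up mapping from data to latent variable; here the
    # closed form (a sigmoid of x) stands in for a trained feedforward net.
    return 1.0 / (1.0 + np.exp(-2.0 * x))  # equals p(z=1|x) for this model

x_obs = 0.7
assert abs(generative_posterior(x_obs)[1] - discriminative_estimate(x_obs)) < 1e-12
```

For this toy model the two routes agree exactly on the ideal posterior; the article's point is that in realistic vision they trade off differently in cost, flexibility, and robustness.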
3D Scene Geometry Estimation from 360 Imagery: A Survey
This paper provides a comprehensive survey of pioneering and state-of-the-art
3D scene geometry estimation methodologies based on single, two, or multiple
images captured with omnidirectional optics. We first revisit the basic
concepts of the spherical camera model, and review the most common acquisition
technologies and representation formats suitable for omnidirectional (also
called 360, spherical or panoramic) images and videos. We then survey
monocular layout and depth inference approaches, highlighting the recent
advances in learning-based solutions suited for spherical data. Classical
stereo matching is then revisited in the spherical domain, where methodologies
for detecting and describing sparse and dense features become crucial. The
stereo matching concepts are then extrapolated for multiple view camera setups,
categorizing them among light fields, multi-view stereo, and structure from
motion (or visual simultaneous localization and mapping). We also compile and
discuss commonly adopted datasets and figures of merit indicated for each
purpose and list recent results for completeness. We conclude this paper by
pointing out current and future trends. (Published in ACM Computing Surveys.)
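The spherical camera model mentioned above can be sketched as a pixel-to-ray mapping for an equirectangular panorama (a hedged illustration; axis conventions and angle origins vary across the surveyed literature):

```python
import numpy as np

def pixel_to_ray(u, v, width, height):
    # Equirectangular convention (one common choice): columns span longitude,
    # rows span latitude, and each pixel center maps to a unit ray.
    lon = (u + 0.5) / width * 2 * np.pi - np.pi   # [-pi, pi)
    lat = np.pi / 2 - (v + 0.5) / height * np.pi  # [pi/2, -pi/2] top to bottom
    return np.array([np.cos(lat) * np.sin(lon),   # x: right
                     np.sin(lat),                 # y: up
                     np.cos(lat) * np.cos(lon)])  # z: forward

ray = pixel_to_ray(512, 256, 1024, 512)       # pixel near the image center
assert abs(np.linalg.norm(ray) - 1.0) < 1e-9  # rays lie on the unit sphere
```

Depth estimation on such imagery then amounts to assigning a distance along each of these rays, which is why the spherical model underlies both the layout and stereo methods the survey covers.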
Computer Vision from Spatial-Multiplexing Cameras at Low Measurement Rates
In applications such as UAVs and parking-lot surveillance, it is typical to first collect an enormous number of pixels using conventional imagers, then apply expensive compression methods that discard redundant data before transmitting the result to a ground station. The past decade has seen the emergence of novel imagers called spatial-multiplexing cameras, which offer compression at the sensing level itself by providing arbitrary linear measurements of the scene instead of pixel-based sampling. In this dissertation, I discuss various approaches for effective information extraction from spatial-multiplexing measurements and present the trade-offs between reliability of performance and the computational/storage load of the system.

In the first part, I present a reconstruction-free approach to high-level inference in computer vision: considering the specific case of activity analysis, I show that correlation filters enable effective action recognition and localization directly from a class of spatial-multiplexing cameras, called compressive cameras, even at measurement rates as low as 1%. In the second part, I outline a deep-learning-based, non-iterative, real-time algorithm to reconstruct images from compressively sensed (CS) measurements, which outperforms traditional iterative CS reconstruction algorithms in reconstruction quality and time complexity, especially at low measurement rates. Because compressive cameras operate with random measurements not tuned to any particular task, in the third part of the dissertation I propose a method to design spatial-multiplexing measurements that facilitate the extraction of features useful in computer vision tasks such as object tracking.
The work presented in this dissertation provides sufficient evidence that high-level inference in computer vision is feasible at extremely low measurement rates, and hence invites a rethinking of current computer vision systems. (Doctoral Dissertation, Electrical Engineering, 201)
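The sensing model behind spatial-multiplexing cameras can be sketched as follows (a hedged illustration; the scene, the random matrix, and the sizes are invented, with only the 1% measurement rate taken from the text):

```python
import numpy as np

rng = np.random.default_rng(0)

# A spatial-multiplexing camera records y = Phi @ x, a handful of linear
# combinations of the scene, instead of all N pixels.
n_pixels = 32 * 32                 # scene size N (illustrative)
rate = 0.01                        # 1% measurement rate from the text
m = max(1, int(rate * n_pixels))   # number of measurements M

scene = rng.random(n_pixels)               # stand-in "image"
phi = rng.standard_normal((m, n_pixels))   # random measurement matrix
y = phi @ scene                            # compression happens at sensing

assert y.shape == (m,)  # 10 measurements instead of 1024 pixels
```

The dissertation's theme is then what can be inferred from `y` alone: recognition without reconstruction, fast learned reconstruction, or matrices `phi` designed for a downstream task rather than drawn at random.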
FReSCO: Flow Reconstruction and Segmentation for low-latency Cardiac Output monitoring using deep artifact suppression and segmentation
Purpose: Real-time monitoring of cardiac output (CO) requires low-latency reconstruction and segmentation of real-time phase-contrast MR, which has previously been difficult to perform. Here we propose a deep learning framework, FReSCO (Flow Reconstruction and Segmentation for low-latency Cardiac Output monitoring).
Methods: Deep artifact suppression and segmentation U-Nets were independently trained. Breath-hold spiral phase-contrast MR data (N = 516) were synthetically undersampled using a variable-density spiral sampling pattern and gridded to create aliased data for training of the artifact-suppression U-Net. A subset of the data (N = 96) was segmented and used to train the segmentation U-Net. Real-time spiral phase-contrast MR was prospectively acquired and then reconstructed and segmented using the trained models (FReSCO) at low latency at the scanner in 10 healthy subjects during rest, exercise, and recovery periods. Cardiac output obtained via FReSCO was compared with a reference rest CO and with rest and exercise compressed-sensing CO.
Results: The FReSCO framework was demonstrated prospectively at the scanner. Beat-to-beat heart rate, stroke volume, and CO could be visualized with a mean latency of 622 ms. No significant differences were noted when compared with the reference at rest (bias = −0.21 ± 0.50 L/min, p = 0.246) or compressed sensing at peak exercise (bias = 0.12 ± 0.48 L/min, p = 0.458).
Conclusions: The FReSCO framework was successfully demonstrated for real-time monitoring of CO during exercise and could provide a convenient tool for assessment of the hemodynamic response to a range of stressors.
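The synthetic-undersampling step that creates the training pairs can be sketched as follows (a hedged illustration on random data, with a Cartesian variable-density mask standing in for the paper's variable-density spiral pattern; the U-Nets themselves are not shown):

```python
import numpy as np

rng = np.random.default_rng(1)

# Retrospective undersampling: fully sampled data -> keep a variable-density
# subset of k-space -> aliased image. (input, target) pairs like this train
# the artifact-suppression network.
img = rng.random((64, 64))        # stand-in for a fully sampled image
kspace = np.fft.fft2(img)

ky = np.fft.fftfreq(64)           # normalized k-space coordinate per line
keep_prob = np.clip(1.2 * np.exp(-8 * np.abs(ky)), 0.05, 1.0)  # dense center
mask = rng.random(64) < keep_prob            # sample whole phase-encode lines
aliased = np.fft.ifft2(kspace * mask[:, None]).real

# (aliased, img) is one training pair; the network learns to undo aliasing.
assert aliased.shape == img.shape
```

Sampling the low-frequency center densely and the periphery sparsely mirrors the variable-density idea: most image energy is preserved while acquisition is accelerated.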
Convolutional Sparse Support Estimator Network (CSEN) From energy efficient support estimation to learning-aided Compressive Sensing
Support estimation (SE) of a sparse signal refers to finding the location
indices of the non-zero elements in a sparse representation. Most of the
traditional approaches dealing with SE problem are iterative algorithms based
on greedy methods or optimization techniques. Indeed, a vast majority of them
use sparse signal recovery techniques to obtain support sets instead of
directly mapping the non-zero locations from denser measurements (e.g.,
Compressively Sensed Measurements). This study proposes a novel approach for
learning such a mapping from a training set. To accomplish this objective, the
Convolutional Support Estimator Networks (CSENs), each with a compact
configuration, are designed. The proposed CSEN can be a crucial tool for the
following scenarios: (i) Real-time and low-cost support estimation can be
applied in any mobile and low-power edge device for anomaly localization,
simultaneous face recognition, etc. (ii) CSEN's output can directly be used as
"prior information" which improves the performance of sparse signal recovery
algorithms. The results over the benchmark datasets show that state-of-the-art
performance levels can be achieved by the proposed approach with a
significantly reduced computational complexity.
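The support-estimation setting described above can be sketched with a classical proxy-and-threshold baseline (a hedged illustration on synthetic data; CSEN replaces the thresholding with a learned convolutional mapping, which is not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(2)

# Goal: recover the *locations* of the non-zeros of a sparse x from
# compressive measurements y = Phi @ x, without full signal recovery.
n, m, k = 100, 40, 3
x = np.zeros(n)
support = rng.choice(n, size=k, replace=False)
x[support] = rng.uniform(1.0, 3.0, k) * rng.choice([-1, 1], k)

phi = rng.standard_normal((m, n)) / np.sqrt(m)
y = phi @ x

proxy = np.linalg.pinv(phi) @ y   # coarse, dense estimate of x from y
# Keep the k largest magnitudes; at these sizes they usually coincide with
# the true support. CSEN instead feeds such a proxy through a compact
# convolutional network and learns the proxy -> support mapping directly.
est_support = np.argsort(np.abs(proxy))[-k:]

assert proxy.shape == (n,) and est_support.shape == (k,)
```

The learned mapping is what makes the approach non-iterative: one forward pass replaces the greedy or optimization-based loops of traditional support estimators.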
Convolutional Sparse Support Estimator Based Covid-19 Recognition from X-ray Images
Coronavirus disease (Covid-19) has been the main agenda of the whole world
since it emerged in December 2019. It has already caused thousands of
casualties and infected several million worldwide. Any technological tool
that can be provided to healthcare practitioners to save time, effort, and
possibly lives has crucial importance. The main tools practitioners currently
use to diagnose Covid-19 are Reverse Transcription-Polymerase Chain Reaction
(RT-PCR) and Computed Tomography (CT), which require significant time,
resources, and acknowledged experts. X-ray imaging is a common and easily
accessible tool that has great potential for Covid-19 diagnosis. In this study,
we propose a novel approach for Covid-19 recognition from chest X-ray images.
Despite the importance of the problem, recent studies in this domain have
produced unsatisfactory results due to the limited datasets available for
training. While Deep Learning techniques can generally provide
state-of-the-art performance in many classification tasks when trained
properly over large datasets, such data scarcity is a crucial obstacle when
using them for Covid-19 detection. Alternative approaches such as
representation-based
classification (collaborative or sparse representation) might provide
satisfactory performance with limited size datasets, but they generally fall
short in performance or speed compared to Machine Learning methods. To address
this deficiency, Convolution Support Estimation Network (CSEN) has recently
been proposed as a bridge between model-based and Deep Learning approaches by
providing a non-iterative, real-time mapping from the query sample to the
ideally sparse representation coefficients' support, which is critical
information for the class decision in representation-based techniques.
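The representation-based classification family referred to above can be sketched with a minimal collaborative-representation classifier on synthetic data (a hedged illustration; the class shapes, sizes, and regularization weight are invented, and CSEN's learned support estimator is not shown):

```python
import numpy as np

rng = np.random.default_rng(3)

# Represent a query as a linear combination of ALL training samples, then
# pick the class whose samples explain it with the smallest residual.
d, per_class, classes = 20, 15, 2
centers = rng.standard_normal((classes, d)) * 3
train = np.vstack([centers[c] + 0.3 * rng.standard_normal((per_class, d))
                   for c in range(classes)])
labels = np.repeat(np.arange(classes), per_class)
D = train.T                               # dictionary: d x (total samples)

query = centers[1] + 0.3 * rng.standard_normal(d)   # drawn from class 1

lam = 0.1                                 # ridge weight (illustrative)
alpha = np.linalg.solve(D.T @ D + lam * np.eye(D.shape[1]), D.T @ query)

# Class-wise residuals using only that class's coefficients.
residuals = [np.linalg.norm(query - D[:, labels == c] @ alpha[labels == c])
             for c in range(classes)]
pred = int(np.argmin(residuals))   # expected to be 1 given the wide margin
```

Such classifiers cope well with small datasets because they need no large-scale training; CSEN's contribution is to make the sparse-support step a fast learned mapping rather than an iterative solve.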