Explicit Edge Inconsistency Evaluation Model for Color-Guided Depth Map Enhancement
© 2016 IEEE. Color-guided depth enhancement refines depth maps under the assumption that the depth edges and the color edges at corresponding locations are consistent. Among methods for such low-level vision tasks, the Markov random field (MRF), including its variants, is one of the major approaches and has dominated this area for several years. However, the assumption above is not always true. To tackle the problem, state-of-the-art solutions adjust the weighting coefficient inside the smoothness term of the MRF model. These methods lack an explicit evaluation model to quantitatively measure the inconsistency between the depth edge map and the color edge map, so they cannot adaptively control the strength of the guidance from the color image, leading to defects such as texture-copy artifacts and blurred depth edges. In this paper, we propose a quantitative measurement of such inconsistency and explicitly embed it into the smoothness term. The proposed method demonstrates promising experimental results compared with benchmark and state-of-the-art methods on the Middlebury, ToF-Mark, and NYU data sets.
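The abstract does not spell the model out, but the underlying idea can be sketched compactly. Below is a minimal, hypothetical Python illustration of an MRF-style pairwise smoothness weight in which a scalar inconsistency score modulates the color guidance; the function and the specific inconsistency formula are our own stand-ins, not the authors' model.

```python
import numpy as np

def pairwise_weight(color, depth_init, p, q, sigma_c=10.0):
    """Illustrative smoothness weight between neighboring pixels p and q.

    Assumes `color` is an (H, W, 3) uint8 image and `depth_init` an (H, W)
    depth map normalized to [0, 1]. The `inconsistency` score is a stand-in
    for the paper's explicit evaluation model: it grows when a color edge
    appears without a matching depth edge (or vice versa).
    """
    color_grad = np.linalg.norm(color[p].astype(float) - color[q].astype(float)) / 255.0
    depth_grad = abs(depth_init[p] - depth_init[q])
    inconsistency = np.clip(abs(color_grad - depth_grad), 0.0, 1.0)
    guidance = np.exp(-(255.0 * color_grad) ** 2 / (2 * sigma_c ** 2))
    # Blend toward a neutral weight as inconsistency rises, so texture edges
    # in the color image stop forcing spurious discontinuities in the depth.
    return (1.0 - inconsistency) * guidance + inconsistency * 0.5
```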
SC-Fuse: A Feature Fusion Approach for Unpaved Road Detection from Remotely Sensed Images
Road network extraction from remote sensing imagery is crucial for numerous applications, ranging from autonomous navigation to urban and rural planning. A particularly challenging aspect is the detection of unpaved roads, often underrepresented in research and data. These roads display variability in texture, width, shape, and surroundings, making their detection quite complex. This thesis addresses these challenges by creating a specialized dataset and introducing the SC-Fuse model.
Our custom dataset comprises high-resolution remote sensing imagery that primarily targets unpaved roads of the American Midwest. To capture seasonal variation and its impact, the dataset includes images from different times of the year, covering various weather conditions and offering a comprehensive view of these changing environments.
To detect roads in our custom dataset, we developed the SC-Fuse model, a novel deep learning architecture designed to extract unpaved road networks from satellite imagery. The model leverages the strengths of dual feature extractors, a Swin Transformer and a residual CNN; by combining their features, SC-Fuse captures both the local and the global context of the images. The fusion is performed by a feature fusion module that uses a linear attention mechanism to keep the computation efficient, and a LinkNet-based decoder ensures precise road network reconstruction. SC-Fuse is evaluated with various metrics, including qualitative visual assessments, to test its effectiveness in unpaved road detection.
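The thesis abstract describes the fusion only at a high level; the following PyTorch sketch shows one plausible form of a linear-attention fusion module of the kind described, with cost linear in the number of spatial tokens. All class and variable names are illustrative assumptions, not the actual SC-Fuse code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LinearAttentionFusion(nn.Module):
    """Fuses CNN and Swin features with linear (kernelized) attention,
    costing O(N) in the number of spatial tokens instead of O(N^2)."""
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)        # queries from the CNN branch
        self.kv = nn.Linear(dim, 2 * dim)   # keys/values from the Swin branch

    def forward(self, cnn_feat, swin_feat):
        # cnn_feat, swin_feat: (B, N, C) token sequences from the two branches
        q = F.elu(self.q(cnn_feat)) + 1
        k, v = self.kv(swin_feat).chunk(2, dim=-1)
        k = F.elu(k) + 1
        kv = torch.einsum('bnc,bnd->bcd', k, v)          # (B, C, C) summary
        z = 1.0 / (torch.einsum('bnc,bc->bn', q, k.sum(dim=1)) + 1e-6)
        out = torch.einsum('bnc,bcd,bn->bnd', q, kv, z)  # attended fusion
        return out + cnn_feat                            # residual connection
```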
Advisors: Ashok Samal and Cody Stoll
Rich probabilistic models for semantic labeling
The goal of this monograph is to explore the methods and applications of semantic labeling. Our contributions to this rapidly evolving topic concern specific aspects of modeling and inference in probabilistic models and their applications in the interdisciplinary areas of computer vision, medical image processing, and remote sensing.
On the Synergies between Machine Learning and Binocular Stereo for Depth Estimation from Images: a Survey
Stereo matching is one of the longest-standing problems in computer vision
with close to 40 years of studies and research. Throughout the years the
paradigm has shifted from local, pixel-level decisions to various forms of
discrete and continuous optimization to data-driven, learning-based methods.
Recently, the rise of machine learning and the rapid proliferation of deep
learning enhanced stereo matching with new exciting trends and applications
unthinkable until a few years ago. Interestingly, the relationship between
these two worlds is two-way. While machine, and especially deep, learning
advanced the state-of-the-art in stereo matching, stereo itself enabled new
ground-breaking methodologies such as self-supervised monocular depth
estimation based on deep networks. In this paper, we review recent research in
the field of learning-based depth estimation from single and binocular images,
highlighting the synergies, the successes achieved so far and the open
challenges the community is going to face in the immediate future.
Comment: Accepted to TPAMI. Paper version of our CVPR 2019 tutorial:
"Learning-based depth estimation from stereo and monocular images: successes,
limitations and future challenges"
(https://sites.google.com/view/cvpr-2019-depth-from-image/home)
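One of the synergies this survey highlights, self-supervised monocular depth estimation trained from stereo pairs, rests on a simple reconstruction objective. Below is a minimal PyTorch sketch of that photometric loss, a simplified version of the general idea rather than any specific surveyed method.

```python
import torch
import torch.nn.functional as F

def photometric_loss(left, right, disparity):
    """Self-supervision signal used in stereo-trained monocular depth methods:
    warp the right image to the left view with the predicted disparity and
    penalize the reconstruction error. No ground-truth depth is required.

    left, right: (B, 3, H, W) images; disparity: (B, 1, H, W) in pixels.
    """
    b, _, h, w = left.shape
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h), torch.linspace(-1, 1, w),
                            indexing='ij')
    grid = torch.stack((xs, ys), dim=-1).to(left.device).unsqueeze(0).repeat(b, 1, 1, 1)
    # Shift the x sampling coordinate by the disparity, in normalized units.
    grid[..., 0] = grid[..., 0] - 2.0 * disparity.squeeze(1) / w
    reconstruction = F.grid_sample(right, grid, align_corners=True)
    return (left - reconstruction).abs().mean()
```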
SM-Net: Joint Learning of Semantic Segmentation and Stereo Matching for Autonomous Driving
Semantic segmentation and stereo matching are two essential components of 3D
environmental perception systems for autonomous driving. Nevertheless,
conventional approaches often address these two problems independently,
employing separate models for each task. This approach poses practical
limitations in real-world scenarios, particularly when computational resources
are scarce or real-time performance is imperative. Hence, in this article, we
introduce SM-Net, a novel joint learning framework developed to perform
semantic segmentation and stereo matching simultaneously. Specifically,
SM-Net shares the features extracted from RGB images between both tasks,
resulting in an improved overall scene understanding capability. This feature
sharing process is realized using a feature fusion adaption (FFA) module, which
effectively transforms the shared features into semantic space and subsequently
fuses them with the encoded disparity features. The entire joint learning
framework is trained by minimizing a novel semantic consistency-guided (SCG)
loss, which places emphasis on the structural consistency in both tasks.
Extensive experimental results conducted on the vKITTI2 and KITTI datasets
demonstrate the effectiveness of our proposed joint learning framework and its
superior performance compared to other state-of-the-art single-task networks.
Our project webpage is accessible at mias.group/S3M-Net.
Comment: accepted to IEEE Trans. on Intelligent Vehicles (T-IV)
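The FFA module is only named in the abstract; a schematic PyTorch stand-in under our own assumptions, showing shared RGB features being projected into semantic space and then fused with encoded disparity features, might look like this:

```python
import torch
import torch.nn as nn

class FeatureFusionAdaption(nn.Module):
    """Schematic stand-in for the FFA module described in the abstract:
    transform shared RGB features into the semantic space, then fuse them
    with the encoded disparity features. Names are illustrative."""
    def __init__(self, channels):
        super().__init__()
        self.to_semantic = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        self.fuse = nn.Conv2d(2 * channels, channels, 1)  # 1x1 fusion conv

    def forward(self, shared_feat, disparity_feat):
        semantic_feat = self.to_semantic(shared_feat)
        fused = self.fuse(torch.cat([semantic_feat, disparity_feat], dim=1))
        return semantic_feat, fused  # inputs to the two task heads
```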
A Multiscale Pyramid Transform for Graph Signals
Multiscale transforms designed to process analog and discrete-time signals
and images cannot be directly applied to analyze high-dimensional data residing
on the vertices of a weighted graph, as they do not capture the intrinsic
geometric structure of the underlying graph data domain. In this paper, we
adapt the Laplacian pyramid transform for signals on Euclidean domains so that
it can be used to analyze high-dimensional data residing on the vertices of a
weighted graph. Our approach is to study existing methods and develop new
methods for the four fundamental operations of graph downsampling, graph
reduction, and filtering and interpolation of signals on graphs. Equipped with
appropriate notions of these operations, we leverage the basic multiscale
constructs and intuitions from classical signal processing to generate a
transform that yields both a multiresolution of graphs and an associated
multiresolution of a graph signal on the underlying sequence of graphs.
Comment: 16 pages, 13 figures
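To make the four operations concrete, here is a deliberately naive numpy sketch of one analysis level of such a pyramid, using simple stand-ins (polarity-based downsampling, one-step Laplacian smoothing, subgraph extraction, weighted-neighbor interpolation) for the paper's more careful constructions:

```python
import numpy as np

def pyramid_level(W, x):
    """One analysis level of a Laplacian pyramid for a graph signal.

    W: (n, n) symmetric adjacency matrix; x: (n,) signal on the vertices.
    Returns a reduced graph, a coarse approximation, and a detail signal.
    """
    d = W.sum(axis=1)
    L = np.diag(d) - W                          # combinatorial Laplacian
    evals, evecs = np.linalg.eigh(L)
    keep = evecs[:, -1] >= 0                    # downsample: one polarity set
    h = x - 0.5 * (L @ x) / max(evals[-1], 1e-12)   # crude low-pass filter
    coarse = h[keep]                            # coarse approximation
    # Interpolate the coarse signal back to all vertices and keep the
    # prediction error so the original signal can be recovered.
    pred = np.zeros_like(x)
    pred[keep] = coarse
    pred[~keep] = W[~keep][:, keep] @ coarse / np.maximum(
        W[~keep][:, keep].sum(axis=1), 1e-12)
    detail = x - pred
    W_coarse = W[keep][:, keep]                 # naive graph reduction
    return W_coarse, coarse, detail
```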
Depth Super-Resolution with Hybrid Camera System
An important field of research in computer vision is the 3D analysis and reconstruction of objects and scenes. Currently, among all the techniques for 3D acquisition, stereo vision systems are the most common. More recently, Time-of-Flight (ToF) range cameras have been introduced. The focus of this thesis is to combine the information from the ToF camera with one or two standard cameras, in order to obtain a high-resolution depth image.
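A common building block in ToF-plus-camera fusion pipelines of this kind is joint bilateral upsampling, in which the high-resolution color image guides the interpolation of the low-resolution ToF depth. A minimal, unoptimized Python sketch follows; it is a generic illustration, not code from the thesis.

```python
import numpy as np

def joint_bilateral_upsample(depth_lr, color_hr, scale,
                             sigma_s=2.0, sigma_r=20.0, radius=2):
    """Upsample a low-res ToF depth map guided by a high-res color image.

    Each high-res pixel averages nearby low-res depth samples, weighted by
    spatial distance and by color similarity in the high-res guide image.
    """
    H, W = color_hr.shape[:2]
    out = np.zeros((H, W))
    for y in range(H):
        for x in range(W):
            yl, xl = int(round(y / scale)), int(round(x / scale))
            acc, norm = 0.0, 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ys, xs = yl + dy, xl + dx
                    if not (0 <= ys < depth_lr.shape[0] and 0 <= xs < depth_lr.shape[1]):
                        continue
                    gy = min(int(ys * scale), H - 1)   # guide pixel for sample
                    gx = min(int(xs * scale), W - 1)
                    ws = np.exp(-(dy * dy + dx * dx) / (2 * sigma_s ** 2))
                    dc = np.linalg.norm(color_hr[y, x].astype(float)
                                        - color_hr[gy, gx].astype(float))
                    wr = np.exp(-dc * dc / (2 * sigma_r ** 2))
                    acc += ws * wr * depth_lr[ys, xs]
                    norm += ws * wr
            out[y, x] = acc / max(norm, 1e-12)
    return out
```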
Automatic channel detection using deep learning
Picking 3D channel geobodies in seismic volumes is an important objective in seismic interpretation for hydrocarbon exploration, but manual detection of channel geobodies is a time-consuming and subjective process. The interpreter can calculate different seismic attributes, such as coherence, to aid in the manual detection of channel geobodies; however, these attributes still do not directly identify 3D channel geobodies.
Machine learning and deep learning are data-driven techniques that have recently received growing attention in fields such as medical imaging and computer vision. With large volumes of available data of different types and the development of powerful computational resources, geophysics is a promising field for applying machine learning and deep learning. Many seismic interpretation steps are analogous to problems in computer vision that have been solved successfully with deep learning; channel detection in seismic volumes, in particular, is analogous to image segmentation. Applying deep learning to seismic interpretation, specifically to automatic channel detection in 3D seismic volumes, can make the process faster and the workflow less subjective. Decision-making based on interpretations is uncertain, so uncertainties in the interpretation results are important, and deep learning can also help interpreters quantify this uncertainty.
Geological Science
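As one concrete example of the uncertainty quantification mentioned above, Monte Carlo dropout keeps dropout active at test time and reads the spread of repeated predictions as per-pixel uncertainty. A short PyTorch sketch, with an arbitrary 2D segmentation model applied to a seismic slice (all names illustrative, not from this work):

```python
import torch

def mc_dropout_uncertainty(model, seismic_slice, n_samples=20):
    """Run a segmentation network n_samples times with dropout enabled and
    return the mean channel-probability map plus a per-pixel uncertainty.

    model: any 2D segmentation network containing dropout layers;
    seismic_slice: (B, 1, H, W) tensor of seismic amplitudes.
    """
    model.eval()
    for m in model.modules():
        if isinstance(m, (torch.nn.Dropout, torch.nn.Dropout2d)):
            m.train()  # keep only the dropout layers stochastic at inference
    with torch.no_grad():
        probs = torch.stack([torch.sigmoid(model(seismic_slice))
                             for _ in range(n_samples)])
    channel_prob = probs.mean(dim=0)   # mean channel probability map
    uncertainty = probs.std(dim=0)     # high std = uncertain interpretation
    return channel_prob, uncertainty
```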