594 research outputs found

    Multi-focus image fusion based on non-negative sparse representation and patch-level consistency rectification

    Get PDF
    Most existing sparse representation-based (SR) fusion methods consider the local information of each image patch independently during fusion. Some spatial artifacts are easily introduced to the fused image. A sliding window technology is often employed by these methods to overcome this issue. However, this comes at the cost of high computational complexity. Alternatively, we come up with a novel multi-focus image fusion method that takes full consideration of the strong correlations among spatially adjacent image patches with NO need for a sliding window. To this end, a non-negative SR model with local consistency constraint (CNNSR) on the representation coefficients is first constructed to encode each image patch. Then a patch-level consistency rectification strategy is presented to merge the input image patches, by which the spatial artifacts in the fused images are greatly reduced. As well, a compact non-negative dictionary is constructed for the CNNSR model. Experimental results demonstrate that the proposed fusion method outperforms some state-of-the art methods. Moreover, the proposed method is computationally efficient, thereby facilitating real-world applications

    A New Robust Multi focus image fusion Method

    Get PDF
    In today's digital era, multi focus picture fusion is a critical problem in the field of computational image processing. In the field of fusion information, multi-focus picture fusion has emerged as a significant research subject. The primary objective of multi focus image fusion is to merge graphical information from several images with various focus points into a single image with no information loss. We provide a robust image fusion method that can combine two or more degraded input photos into a single clear resulting output image with additional detailed information about the fused input images. The targeted item from each of the input photographs is combined to create a secondary image output. The action level quantities and the fusion rule are two key components of picture fusion, as is widely acknowledged. The activity level values are essentially implemented in either the "spatial domain" or the "transform domain" in most common fusion methods, such as wavelet. The brightness information computed from various source photos is compared to the laws developed to produce brightness / focus maps by using local filters to extract high-frequency characteristics. As a result, the focus map provides integrated clarity information, which is useful for a variety of Multi focus picture fusion problems. Image fusion with several modalities, for example. Completing these two jobs, on the other hand. As a consequence, we offer a strategy for achieving good fusion performance in this study paper. A Convolutional Neural Network (CNN) was trained on both high-quality and blurred picture patches to represent the mapping. The main advantage of this idea is that it can create a CNN model that can provide both the Activity level Measurement" and the Fusion rule, overcoming the limitations of previous fusion procedures. Multi focus image fusion is demonstrated using microscopic images, medical imaging, computer visualization, and Image information improvement is also a benefit of multi-focus image fusion. Greater precision is necessary in terms of target detection and identification. Face recognition" and a more compact work load, as well as enhanced system consistency, are among the new features

    Combining Features and Semantics for Low-level Computer Vision

    Get PDF
    Visual perception of depth and motion plays a significant role in understanding and navigating the environment. Reconstructing outdoor scenes in 3D and estimating the motion from video cameras are of utmost importance for applications like autonomous driving. The corresponding problems in computer vision have witnessed tremendous progress over the last decades, yet some aspects still remain challenging today. Striking examples are reflecting and textureless surfaces or large motions which cannot be easily recovered using traditional local methods. Further challenges include occlusions, large distortions and difficult lighting conditions. In this thesis, we propose to overcome these challenges by modeling non-local interactions leveraging semantics and contextual information. Firstly, for binocular stereo estimation, we propose to regularize over larger areas on the image using object-category specific disparity proposals which we sample using inverse graphics techniques based on a sparse disparity estimate and a semantic segmentation of the image. The disparity proposals encode the fact that objects of certain categories are not arbitrarily shaped but typically exhibit regular structures. We integrate them as non-local regularizer for the challenging object class 'car' into a superpixel-based graphical model and demonstrate its benefits especially in reflective regions. Secondly, for 3D reconstruction, we leverage the fact that the larger the reconstructed area, the more likely objects of similar type and shape will occur in the scene. This is particularly true for outdoor scenes where buildings and vehicles often suffer from missing texture or reflections, but share similarity in 3D shape. We take advantage of this shape similarity by localizing objects using detectors and jointly reconstructing them while learning a volumetric model of their shape. This allows to reduce noise while completing missing surfaces as objects of similar shape benefit from all observations for the respective category. Evaluations with respect to LIDAR ground-truth on a novel challenging suburban dataset show the advantages of modeling structural dependencies between objects. Finally, motivated by the success of deep learning techniques in matching problems, we present a method for learning context-aware features for solving optical flow using discrete optimization. Towards this goal, we present an efficient way of training a context network with a large receptive field size on top of a local network using dilated convolutions on patches. We perform feature matching by comparing each pixel in the reference image to every pixel in the target image, utilizing fast GPU matrix multiplication. The matching cost volume from the network's output forms the data term for discrete MAP inference in a pairwise Markov random field. Extensive evaluations reveal the importance of context for feature matching.Die visuelle Wahrnehmung von Tiefe und Bewegung spielt eine wichtige Rolle bei dem VerstĂ€ndnis und der Navigation in unserer Umwelt. Die 3D Rekonstruktion von Szenen im Freien und die SchĂ€tzung der Bewegung von Videokameras sind von grĂ¶ĂŸter Bedeutung fĂŒr Anwendungen, wie das autonome Fahren. Die Erforschung der entsprechenden Probleme des maschinellen Sehens hat in den letzten Jahrzehnten enorme Fortschritte gemacht, jedoch bleiben einige Aspekte heute noch ungelöst. Beispiele hierfĂŒr sind reflektierende und texturlose OberflĂ€chen oder große Bewegungen, bei denen herkömmliche lokale Methoden hĂ€ufig scheitern. Weitere Herausforderungen sind niedrige Bildraten, Verdeckungen, große Verzerrungen und schwierige LichtverhĂ€ltnisse. In dieser Arbeit schlagen wir vor nicht-lokale Interaktionen zu modellieren, die semantische und kontextbezogene Informationen nutzen, um diese Herausforderungen zu meistern. FĂŒr die binokulare Stereo SchĂ€tzung schlagen wir zuallererst vor zusammenhĂ€ngende Bereiche mit objektklassen-spezifischen DisparitĂ€ts VorschlĂ€gen zu regularisieren, die wir mit inversen Grafik Techniken auf der Grundlage einer spĂ€rlichen DisparitĂ€tsschĂ€tzung und semantischen Segmentierung des Bildes erhalten. Die DisparitĂ€ts VorschlĂ€ge kodieren die Tatsache, dass die GegenstĂ€nde bestimmter Kategorien nicht willkĂŒrlich geformt sind, sondern typischerweise regelmĂ€ĂŸige Strukturen aufweisen. Wir integrieren sie fĂŒr die komplexe Objektklasse 'Auto' in Form eines nicht-lokalen Regularisierungsterm in ein Superpixel-basiertes grafisches Modell und zeigen die Vorteile vor allem in reflektierenden Bereichen. Zweitens nutzen wir fĂŒr die 3D-Rekonstruktion die Tatsache, dass mit der GrĂ¶ĂŸe der rekonstruierten FlĂ€che auch die Wahrscheinlichkeit steigt, Objekte von Ă€hnlicher Art und Form in der Szene zu enthalten. Dies gilt besonders fĂŒr Szenen im Freien, in denen GebĂ€ude und Fahrzeuge oft vorkommen, die unter fehlender Textur oder Reflexionen leiden aber Ă€hnlichkeit in der Form aufweisen. Wir nutzen diese Ă€hnlichkeiten zur Lokalisierung von Objekten mit Detektoren und zur gemeinsamen Rekonstruktion indem ein volumetrisches Modell ihrer Form erlernt wird. Dies ermöglicht auftretendes Rauschen zu reduzieren, wĂ€hrend fehlende FlĂ€chen vervollstĂ€ndigt werden, da Objekte Ă€hnlicher Form von allen Beobachtungen der jeweiligen Kategorie profitieren. Die Evaluierung auf einem neuen, herausfordernden vorstĂ€dtischen Datensatz in Anbetracht von LIDAR-Entfernungsdaten zeigt die Vorteile der Modellierung von strukturellen AbhĂ€ngigkeiten zwischen Objekten. Zuletzt, motiviert durch den Erfolg von Deep Learning Techniken bei der Mustererkennung, prĂ€sentieren wir eine Methode zum Erlernen von kontextbezogenen Merkmalen zur Lösung des optischen Flusses mittels diskreter Optimierung. Dazu stellen wir eine effiziente Methode vor um zusĂ€tzlich zu einem Lokalen Netzwerk ein Kontext-Netzwerk zu erlernen, das mit Hilfe von erweiterter Faltung auf Patches ein großes rezeptives Feld besitzt. FĂŒr das Feature Matching vergleichen wir mit schnellen GPU-Matrixmultiplikation jedes Pixel im Referenzbild mit jedem Pixel im Zielbild. Das aus dem Netzwerk resultierende Matching Kostenvolumen bildet den Datenterm fĂŒr eine diskrete MAP Inferenz in einem paarweisen Markov Random Field. Eine umfangreiche Evaluierung zeigt die Relevanz des Kontextes fĂŒr das Feature Matching

    CNN Features off-the-shelf: an Astounding Baseline for Recognition

    Full text link
    Recent results indicate that the generic descriptors extracted from the convolutional neural networks are very powerful. This paper adds to the mounting evidence that this is indeed the case. We report on a series of experiments conducted for different recognition tasks using the publicly available code and model of the \overfeat network which was trained to perform object classification on ILSVRC13. We use features extracted from the \overfeat network as a generic image representation to tackle the diverse range of recognition tasks of object image classification, scene recognition, fine grained recognition, attribute detection and image retrieval applied to a diverse set of datasets. We selected these tasks and datasets as they gradually move further away from the original task and data the \overfeat network was trained to solve. Astonishingly, we report consistent superior results compared to the highly tuned state-of-the-art systems in all the visual classification tasks on various datasets. For instance retrieval it consistently outperforms low memory footprint methods except for sculptures dataset. The results are achieved using a linear SVM classifier (or L2L2 distance in case of retrieval) applied to a feature representation of size 4096 extracted from a layer in the net. The representations are further modified using simple augmentation techniques e.g. jittering. The results strongly suggest that features obtained from deep learning with convolutional nets should be the primary candidate in most visual recognition tasks.Comment: version 3 revisions: 1)Added results using feature processing and data augmentation 2)Referring to most recent efforts of using CNN for different visual recognition tasks 3) updated text/captio

    THE ROLE OF TEXTURE IN INDOOR SCENE RECOGNITION

    Get PDF
    Ph.DDOCTOR OF PHILOSOPH

    Multi-task near-field perception for autonomous driving using surround-view fisheye cameras

    Get PDF
    Die Bildung der Augen fĂŒhrte zum Urknall der Evolution. Die Dynamik Ă€nderte sich von einem primitiven Organismus, der auf den Kontakt mit der Nahrung wartete, zu einem Organismus, der durch visuelle Sensoren gesucht wurde. Das menschliche Auge ist eine der raffiniertesten Entwicklungen der Evolution, aber es hat immer noch MĂ€ngel. Der Mensch hat ĂŒber Millionen von Jahren einen biologischen Wahrnehmungsalgorithmus entwickelt, der in der Lage ist, Autos zu fahren, Maschinen zu bedienen, Flugzeuge zu steuern und Schiffe zu navigieren. Die Automatisierung dieser FĂ€higkeiten fĂŒr Computer ist entscheidend fĂŒr verschiedene Anwendungen, darunter selbstfahrende Autos, Augmented RealitĂ€t und architektonische Vermessung. Die visuelle Nahfeldwahrnehmung im Kontext von selbstfahrenden Autos kann die Umgebung in einem Bereich von 0 - 10 Metern und 360° Abdeckung um das Fahrzeug herum wahrnehmen. Sie ist eine entscheidende Entscheidungskomponente bei der Entwicklung eines sichereren automatisierten Fahrens. JĂŒngste Fortschritte im Bereich Computer Vision und Deep Learning in Verbindung mit hochwertigen Sensoren wie Kameras und LiDARs haben ausgereifte Lösungen fĂŒr die visuelle Wahrnehmung hervorgebracht. Bisher stand die Fernfeldwahrnehmung im Vordergrund. Ein weiteres wichtiges Problem ist die begrenzte Rechenleistung, die fĂŒr die Entwicklung von Echtzeit-Anwendungen zur VerfĂŒgung steht. Aufgrund dieses Engpasses kommt es hĂ€ufig zu einem Kompromiss zwischen Leistung und Laufzeiteffizienz. Wir konzentrieren uns auf die folgenden Themen, um diese anzugehen: 1) Entwicklung von Nahfeld-Wahrnehmungsalgorithmen mit hoher Leistung und geringer RechenkomplexitĂ€t fĂŒr verschiedene visuelle Wahrnehmungsaufgaben wie geometrische und semantische Aufgaben unter Verwendung von faltbaren neuronalen Netzen. 2) Verwendung von Multi-Task-Learning zur Überwindung von RechenengpĂ€ssen durch die gemeinsame Nutzung von initialen Faltungsschichten zwischen den Aufgaben und die Entwicklung von Optimierungsstrategien, die die Aufgaben ausbalancieren.The formation of eyes led to the big bang of evolution. The dynamics changed from a primitive organism waiting for the food to come into contact for eating food being sought after by visual sensors. The human eye is one of the most sophisticated developments of evolution, but it still has defects. Humans have evolved a biological perception algorithm capable of driving cars, operating machinery, piloting aircraft, and navigating ships over millions of years. Automating these capabilities for computers is critical for various applications, including self-driving cars, augmented reality, and architectural surveying. Near-field visual perception in the context of self-driving cars can perceive the environment in a range of 0 - 10 meters and 360° coverage around the vehicle. It is a critical decision-making component in the development of safer automated driving. Recent advances in computer vision and deep learning, in conjunction with high-quality sensors such as cameras and LiDARs, have fueled mature visual perception solutions. Until now, far-field perception has been the primary focus. Another significant issue is the limited processing power available for developing real-time applications. Because of this bottleneck, there is frequently a trade-off between performance and run-time efficiency. We concentrate on the following issues in order to address them: 1) Developing near-field perception algorithms with high performance and low computational complexity for various visual perception tasks such as geometric and semantic tasks using convolutional neural networks. 2) Using Multi-Task Learning to overcome computational bottlenecks by sharing initial convolutional layers between tasks and developing optimization strategies that balance tasks
    • 

    corecore