
    Implementation of Depth Map Filtering on GPU

    The thesis work was part of the Mobile 3DTV project, which studied the capture, coding and transmission of 3D video representation formats in mobile delivery scenarios. The main focus of the study was to determine whether it is practical to transmit and view 3D videos on mobile devices. The chosen approach for virtual view synthesis was Depth Image Based Rendering (DIBR). The computed depth is often inaccurate, noisy, low in resolution or even inconsistent over a video sequence; the sensed depth map therefore has to be post-processed and refined through proper filtering. A bilateral filter was used for the iterative refinement, drawing on information from one of the associated high-quality texture (color) images (left or right view). The primary objective of this thesis was to perform the filtering operation in real time, so we ported the algorithm to a GPU. As the programming platform we chose OpenCL from the Khronos Group, because it targets heterogeneous parallel computing environments and is therefore platform-, vendor- and hardware-independent. The filtering algorithm proved well suited to GPU implementation: although every pixel uses information from its neighborhood window, the processing of one pixel has no dependency on the results of its surrounding pixels. Thus, once the data for a neighborhood was loaded into the local memory of the multiprocessor, the device could process several pixels simultaneously. The results of our experiments were quite encouraging. We executed the MEX implementation on a Core2Duo CPU with 2 GB of RAM, and used an NVIDIA GeForce 240 as the GPU device, with 96 cores, a 550 MHz graphics clock, a 1340 MHz processor clock and 512 MB of memory. Processing speed improved significantly, and the quality of the depth maps was on par with the same algorithm running on a CPU. To test the effect of our filtering algorithm on degraded depth maps, we introduced artifacts by compressing the depth map with an H.264 encoder, controlling the level of degradation through the quantization parameter. The blocky depth map was filtered separately with our GPU implementation and with the CPU implementation. The results showed speed-ups of up to 30 times, while the refined depth maps had quality measures similar to those produced by the CPU implementation.
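
    The thesis itself targets OpenCL; purely as an illustration of the data-parallel structure described above (each output pixel depends only on an input neighborhood, never on other outputs), here is a minimal CUDA sketch of one iteration of a joint (cross) bilateral filter, where the spatial weight comes from pixel distance and the range weight from the guidance (texture) image. The kernel name, parameters and single-channel guidance are assumptions for illustration, not the thesis code.

```cuda
#include <cuda_runtime.h>
#include <math.h>

// One iteration of a joint (cross) bilateral filter on a depth map: the
// range weight is taken from the guidance (texture) image so that depth
// edges stay aligned with color edges. Each output pixel depends only on
// an input neighborhood, so all pixels can be processed independently.
__global__ void jointBilateral(const float* depth, const float* guide,
                               float* out, int w, int h,
                               int radius, float sigmaS, float sigmaR)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;

    float gc = guide[y * w + x];       // guidance value at the center pixel
    float sum = 0.0f, wsum = 0.0f;

    for (int dy = -radius; dy <= radius; ++dy) {
        for (int dx = -radius; dx <= radius; ++dx) {
            int nx = min(max(x + dx, 0), w - 1);   // clamp at image borders
            int ny = min(max(y + dy, 0), h - 1);
            float gn = guide[ny * w + nx];
            // spatial weight: falls off with distance in the image plane
            float ws = expf(-(float)(dx * dx + dy * dy) / (2.0f * sigmaS * sigmaS));
            // range weight: falls off with dissimilarity in the guidance image
            float wr = expf(-(gn - gc) * (gn - gc) / (2.0f * sigmaR * sigmaR));
            sum  += ws * wr * depth[ny * w + nx];
            wsum += ws * wr;
        }
    }
    out[y * w + x] = sum / wsum;   // wsum > 0 since the center weight is 1
}
```

    In a tuned version, as the abstract notes, the neighborhood would first be staged into on-chip local/shared memory so that one tile serves many threads.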

    Introducing SLAMBench, a performance and accuracy benchmarking methodology for SLAM

    Real-time dense computer vision and SLAM offer great potential for a new level of scene modelling, tracking and real environmental interaction for many types of robot, but their high computational requirements mean that use on mass-market embedded platforms is challenging. Meanwhile, trends in low-cost, low-power processing are towards massive parallelism and heterogeneity, making it difficult for robotics and vision researchers to implement their algorithms in a performance-portable way. In this paper we introduce SLAMBench, a publicly available software framework which represents a starting point for quantitative, comparable and validatable experimental research to investigate trade-offs in performance, accuracy and energy consumption of a dense RGB-D SLAM system. SLAMBench provides a KinectFusion implementation in C++, OpenMP, OpenCL and CUDA, and harnesses the ICL-NUIM dataset of synthetic RGB-D sequences, with trajectory and scene ground truth, for reliable accuracy comparison of different implementations and algorithms. We present an analysis and breakdown of the constituent algorithmic elements of KinectFusion, and experimentally investigate their execution time on a variety of multicore and GPU-accelerated platforms. For a popular embedded platform, we also present an analysis of energy efficiency for different configuration alternatives. (8 pages; ICRA 2015 conference paper.)
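
    SLAMBench scores accuracy against the ICL-NUIM ground-truth trajectories; as a rough sketch of the kind of metric involved, the code below computes an absolute trajectory error (ATE) RMSE over camera positions. The function name and the assumption that the two trajectories are already time-associated and expressed in a common frame are illustrative, not SLAMBench's actual API.

```cuda
#include <algorithm>
#include <cmath>
#include <vector>

struct Vec3 { float x, y, z; };

// Absolute trajectory error (ATE) RMSE between an estimated camera
// trajectory and ground truth. Assumes the two trajectories are already
// time-associated and aligned in a common frame; this is a generic
// sketch, not SLAMBench's implementation.
float ateRmse(const std::vector<Vec3>& est, const std::vector<Vec3>& gt)
{
    size_t n = std::min(est.size(), gt.size());
    float sumSq = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        float dx = est[i].x - gt[i].x;
        float dy = est[i].y - gt[i].y;
        float dz = est[i].z - gt[i].z;
        sumSq += dx * dx + dy * dy + dz * dz;   // squared position error
    }
    return n ? std::sqrt(sumSq / n) : 0.0f;
}
```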

    Reducing 3D video coding complexity through more efficient disparity estimation

    3D video coding for transmission exploits Disparity Estimation (DE) to remove the inter-view redundancies present within both the texture and the depth map multi-view videos. Good estimation accuracy can be achieved by partitioning the macro-block into smaller sub-block partitions. However, the DE process must be performed on each individual sub-block to determine the optimal mode and the corresponding disparity vectors in terms of rate-distortion efficiency. This vector estimation process is heavy on computational resources; the coding cost is thus proportional to the number of search points and the inter-view modes tested during rate-distortion optimization. In this paper, a solution that exploits the available depth map data, together with the multi-view geometry, is proposed to identify a better DE search area, allowing a reduction in the number of search points. It also exploits the number of distinct depth levels present within the current macro-block to determine which modes need to be tested for DE, further reducing its computations. Simulation results demonstrate that this can save up to 95% of the encoding time, with little effect on the coding efficiency of the texture and depth map multi-view video coding. This makes 3D video coding more practical for consumer devices, which tend to have limited computational power.
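
    The underlying geometric relation is standard: for a rectified camera pair, depth Z predicts disparity as d = f·B/Z (f: focal length in pixels, B: baseline). The sketch below centers a small search window on that prediction instead of searching the full disparity range; the function and parameter names are illustrative assumptions, not the paper's implementation.

```cuda
#include <cstdio>

// For a rectified multi-view setup, depth predicts disparity:
//   d = f * B / Z   (f: focal length in pixels, B: camera baseline).
// Centering a small search window on this prediction replaces the
// full-range disparity search. Illustrative sketch, not the paper's code.
struct SearchWindow { int center, lo, hi; };

SearchWindow predictSearchWindow(float depthZ, float focalPx,
                                 float baseline, int halfRange,
                                 int maxDisparity)
{
    int d = (int)(focalPx * baseline / depthZ + 0.5f); // predicted disparity
    SearchWindow w;
    w.center = d;
    w.lo = d - halfRange < 0 ? 0 : d - halfRange;
    w.hi = d + halfRange > maxDisparity ? maxDisparity : d + halfRange;
    return w;
}

int main()
{
    // e.g. f = 1000 px, B = 0.1 m, Z = 5 m -> predicted disparity 20 px;
    // search only 20 +/- 4 instead of the full 0..128 range.
    SearchWindow w = predictSearchWindow(5.0f, 1000.0f, 0.1f, 4, 128);
    std::printf("search [%d, %d] around %d\n", w.lo, w.hi, w.center);
    return 0;
}
```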

    Real-time rendering of large surface-scanned range data natively on a GPU

    This thesis presents research carried out for the visualisation of surface anatomy data stored as large range images such as those produced by stereo-photogrammetric, and other triangulation-based capture devices. As part of this research, I explored the use of points as a rendering primitive as opposed to polygons, and the use of range images as the native data representation. Using points as a display primitive as opposed to polygons required the creation of a pipeline that solved problems associated with point-based rendering. The problems inves tigated were scattered-data interpolation (a common problem with point-based rendering), multi-view rendering, multi-resolution representations, anti-aliasing, and hidden-point re- moval. In addition, an efficient real-time implementation on the GPU was carried out
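
    As a generic illustration of point-based rendering with depth-based occlusion (one simple form of hidden-point removal), the CUDA sketch below projects each point with a pinhole model and keeps the nearest depth per pixel. It is a sketch under assumed names and a simplified camera model, not the thesis pipeline; splat footprints and the scattered-data interpolation stage are omitted.

```cuda
#include <cuda_runtime.h>

struct Point3 { float x, y, z; };

// Minimal point-splatting sketch: each thread projects one point with a
// pinhole camera model and keeps only the nearest depth per pixel via
// atomicMin on a quantized z-buffer -- a simple depth-test style form of
// hidden-point removal. The z-buffer must be initialized to 0xFFFFFFFF
// before launch; splat footprints and hole filling are omitted.
__global__ void splatPoints(const Point3* pts, int n,
                            unsigned int* zbuf, int w, int h,
                            float fx, float fy, float cx, float cy)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    Point3 p = pts[i];
    if (p.z <= 0.0f) return;                    // point behind the camera
    int u = (int)(fx * p.x / p.z + cx + 0.5f);  // pinhole projection
    int v = (int)(fy * p.y / p.z + cy + 0.5f);
    if (u < 0 || u >= w || v < 0 || v >= h) return;

    // quantize depth to millimeters so nearer points win the atomicMin
    unsigned int zq = (unsigned int)(p.z * 1000.0f);
    atomicMin(&zbuf[v * w + u], zq);
}
```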

    GPU data structures for graphics and vision

    Graphics hardware has in recent years become increasingly programmable, and its programming APIs use the stream processor model to expose massive parallelization to the programmer. Unfortunately, the inherent restrictions of the stream processor model, which the GPU relies on to maintain high performance, often pose a problem when porting CPU algorithms for both video and volume processing to graphics hardware: serial data dependencies that accelerate CPU processing are counterproductive on the data-parallel GPU. This thesis demonstrates new ways of tackling well-known problems of large-scale video/volume analysis. In some instances, we enable processing on the restricted hardware model by re-introducing algorithms from early computer graphics research. On other occasions, we use newly developed hierarchical data structures to circumvent the random-access-read/fixed-write restriction that had previously kept sophisticated analysis algorithms from running solely on graphics hardware. For 3D processing, we apply known game-graphics concepts such as mip-maps, projective texturing, and dependent texture lookups to show how video/volume processing can benefit algorithmically from being implemented in a graphics API. The novel GPU data structures provide drastically increased processing speed and lift processing-heavy operations to real-time performance levels, paving the way for new, interactive vision/graphics applications.
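
    The common building block behind mip-maps and hierarchical pyramids is a repeated 2x2 reduction of the finer level into the coarser one. The CUDA sketch below shows one such step with a max reduction (min or average work identically); kernel and parameter names are illustrative assumptions, not the thesis code.

```cuda
#include <cuda_runtime.h>

// One step of a mip-map-style pyramid: each thread reduces a 2x2 block
// of the finer level into one coarser texel (max here; min/avg work the
// same way). Applied log2(N) times, this builds the hierarchy that lets
// analysis algorithms skip uninteresting regions. Assumes the fine level
// has even dimensions (fineW = 2 * coarseW).
__global__ void mipReduceMax(const float* fine, float* coarse,
                             int coarseW, int coarseH, int fineW)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= coarseW || y >= coarseH) return;

    const float* row0 = fine + (2 * y)     * fineW + 2 * x;
    const float* row1 = fine + (2 * y + 1) * fineW + 2 * x;
    float m = fmaxf(fmaxf(row0[0], row0[1]), fmaxf(row1[0], row1[1]));
    coarse[y * coarseW + x] = m;
}
```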

    Interactive Space-Time Reconstruction in Computer Graphics

    High-quality dense spatial and/or temporal reconstructions and correspondence maps from camera images, be it optical flow, stereo or scene flow, are an essential prerequisite for a multitude of computer vision and graphics tasks, e.g. scene editing or view interpolation in visual media production. Due to the ill-posed nature of the estimation problem in typical setups (i.e. a limited number of cameras and a limited frame rate), automated estimation approaches are prone to erroneous correspondences and subsequent quality degradation in many non-trivial cases such as occlusions, ambiguous movements, long displacements, or low texture. While improving the estimation algorithms is one obvious direction, this thesis complementarily concerns itself with creating intuitive, high-level user interactions that lead to improved correspondence maps and scene reconstructions. Where visually convincing results are essential, rendering artifacts resulting from estimation errors are usually repaired by hand with image editing tools, which is time-consuming and therefore costly. My new user interactions, which integrate human scene-recognition capabilities to guide a semi-automatic correspondence or scene reconstruction algorithm, save considerable effort and enable faster and more efficient production of visually convincing rendered images.
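
    The thesis describes these interactions at a higher level; purely as one hypothetical way a user hint could influence a correspondence estimate, the sketch below diffuses sparse user-specified correspondences into a dense flow field with distance-based weighting. Everything here (names, the weighting scheme) is an assumption for illustration; a real system would instead feed such constraints into the estimation energy itself.

```cuda
#include <cmath>
#include <vector>

struct FlowConstraint { float x, y, u, v; }; // user: pixel (x,y) moves by (u,v)

// Hypothetical sketch: blend sparse user-provided correspondences into a
// dense flow field (flowU, flowV) so that pixels near a hint follow it.
// The influence of each hint decays with distance and vanishes beyond
// `radius`. Illustration only, not the thesis algorithm.
void applyUserHints(std::vector<float>& flowU, std::vector<float>& flowV,
                    int w, int h, const std::vector<FlowConstraint>& hints,
                    float radius)
{
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            for (const FlowConstraint& c : hints) {
                float dx = x - c.x, dy = y - c.y;
                float d2 = dx * dx + dy * dy;
                if (d2 > radius * radius) continue;       // out of reach
                float a = std::exp(-d2 / (radius * radius)); // hint weight
                size_t i = (size_t)y * w + x;
                flowU[i] = (1.0f - a) * flowU[i] + a * c.u;
                flowV[i] = (1.0f - a) * flowV[i] + a * c.v;
            }
}
```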

    Recovering Scene Geometry from Plenoptic Images Using Local Techniques

    In recent years, notable advances have been made in image capture and processing aimed at recovering the three-dimensional information of the scene in front of the camera. These build on the concept of the light field, a four-dimensional function that records the intensity of the light rays passing through every point of empty space. This function can be captured with a variety of devices. Once captured, the light field allows images to be synthesized that are refocused at different distances or rendered from different viewpoints. Moreover, because the angular information is preserved, these data can be used to recover the three-dimensional structure of the scene. This doctoral thesis has focused on processing the light field to obtain distance estimates using local methods.
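
    To make the refocusing idea concrete, here is a minimal shift-and-sum sketch over a light field stored as a U x V grid of W x H views: each view is shifted in proportion to its angular offset and the results are averaged, focusing the image at the depth corresponding to the slope `alpha`. Local depth estimation can then pick, per pixel, the alpha maximizing a focus or consistency measure. The memory layout and names are assumptions for illustration, not the thesis code.

```cuda
#include <cmath>
#include <vector>

// Shift-and-sum refocusing of a 4D light field L(u,v,x,y), stored as a
// U x V grid of W x H views in a single flat array. Each view is shifted
// by its angular offset times `alpha` and the shifted views are averaged;
// varying alpha sweeps the synthetic focal plane through the scene.
std::vector<float> refocus(const std::vector<float>& lf,
                           int U, int V, int W, int H, float alpha)
{
    std::vector<float> out((size_t)W * H, 0.0f);
    for (int y = 0; y < H; ++y)
        for (int x = 0; x < W; ++x) {
            float acc = 0.0f; int cnt = 0;
            for (int v = 0; v < V; ++v)
                for (int u = 0; u < U; ++u) {
                    // shift each view by its offset from the central view
                    int xs = x + (int)std::lround(alpha * (u - U / 2));
                    int ys = y + (int)std::lround(alpha * (v - V / 2));
                    if (xs < 0 || xs >= W || ys < 0 || ys >= H) continue;
                    size_t idx = (((size_t)v * U + u) * H + ys) * W + xs;
                    acc += lf[idx];
                    ++cnt;
                }
            out[(size_t)y * W + x] = cnt ? acc / cnt : 0.0f;
        }
    return out;
}
```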