23 research outputs found

    Towards Highly-Integrated Stereovideoscopy for \u3ci\u3ein vivo\u3c/i\u3e Surgical Robots

    Get PDF
    When compared to traditional surgery, laparoscopic procedures result in better patient outcomes: shorter recovery, reduced post-operative pain, and less trauma to incisioned tissue. Unfortunately, laparoscopic procedures require specialized training for surgeons, as these minimally-invasive procedures provide an operating environment that has limited dexterity and limited vision. Advanced surgical robotics platforms can make minimally-invasive techniques safer and easier for the surgeon to complete successfully. The most common type of surgical robotics platforms -- the laparoscopic robots -- accomplish this with multi-degree-of-freedom manipulators that are capable of a diversified set of movements when compared to traditional laparoscopic instruments. Also, these laparoscopic robots allow for advanced kinematic translation techniques that allow the surgeon to focus on the surgical site, while the robot calculates the best possible joint positions to complete any surgical motion. An important component of these systems is the endoscopic system used to transmit a live view of the surgical environment to the surgeon. Coupled with 3D high-definition endoscopic cameras, the entirety of the platform, in effect, eliminates the peculiarities associated with laparoscopic procedures, which allows less-skilled surgeons to complete minimally-invasive surgical procedures quickly and accurately. A much newer approach to performing minimally-invasive surgery is the idea of using in-vivo surgical robots -- small robots that are inserted directly into the patient through a single, small incision; once inside, an in-vivo robot can perform surgery at arbitrary positions, with a much wider range of motion. While laparoscopic robots can harness traditional endoscopic video solutions, these in-vivo robots require a fundamentally different video solution that is as flexible as possible and free of bulky cables or fiber optics. This requires a miniaturized videoscopy system that incorporates an image sensor with a transceiver; because of severe size constraints, this system should be deeply embedded into the robotics platform. Here, early results are presented from the integration of a miniature stereoscopic camera into an in-vivo surgical robotics platform. A 26mm X 24mm stereo camera was designed and manufactured. The proposed device features USB connectivity and 1280 X 720 resolution at 30 fps. Resolution testing indicates the device performs much better than similarly-priced analog cameras. Suitability of the platform for 3D computer vision tasks -- including stereo reconstruction -- is examined. The platform was also tested in a living porcine model at the University of Nebraska Medical Center. Results from this experiment suggest that while the platform performs well in controlled, static environments, further work is required to obtain usable results in true surgeries. Concluding, several ideas for improvement are presented, along with a discussion of core challenges associated with the platform. Adviser: Lance C. PĂ©rez [Document = 28 Mb

    Efficient Object Detection in Mobile and Embedded Devices with Deep Neural Networks

    Get PDF
    [EN] Neural networks have become the standard for high accuracy computer vision. These algorithms can be built with arbitrarily large architectures to handle an ever growing complexity in the data they process. State of the art neural network architectures are primarily concerned with increasing the recognition accuracy when performing inference on an image, which creates an insatiable demand for energy and compute power. These models are primarily targeted to run on dense compute units such as GPUs. In recent years, demand to allow these models to execute in limited capacity environments such as smartphones, however even the most compact variations of these state of the art networks constantly push the boundaries of the power envelop under which they run. With the emergence of the Internet of Things, it is becoming a priority to enable mobile systems to perform image recognition at the edge, but with small energy requirements. This thesis focuses on the design and implementation of an object detection neural network that attempts to solve this problem, providing reasonable accuracy rates with extremely low compute power requirements. This is achieved by re-imagining the meta architecture of traditional object detection models and discovering a mechanism to classify and localize objects through a set of neural network based algorithms that are better aimed to mobile and embedded devices. The main contributions of this thesis are: (i) provide a better image processing algorithm that is more suitable at preparing data for consumption by taking advantage of the characteristics of the ISP available in these devices; (ii) provide a neural network architecture that maintains acceptable accuracy targets with minimal computational requirements by making efficient use of basic neural algorithms; and (iii) provide a programming framework for how these systems can be most efficiently implemented in a manner that is optimized for the underlying hardware units available in these devices by taking into account memory and computation restrictions

    Modulaarisen graafipohjaisen kuvankÀsittelyjÀrjestelmÀn verifiointi

    Get PDF
    Electronic devices today have become complex. Any non-trivial device consists of both hardware and software. Tightening time to market and cost requirements put pressure on the development process of the devices. Software and hardware needs to be developed concurrently and must be verified in an early phase of product development. This thesis introduces a graph based image processing system. Image processing system is a complex system that usually consists of software, firmware and hardware. The possibilities and methods of graph verification are investigated in this thesis. Graphs can be used to handle the complexity of the system by encapsulating the functionality of the underlying implementations. Graphs provide modularity and configurability that can be utilized in the development and verification of the system. Reuse of software is increased due to the consistent and defined nature of graphs and their vertices. Software development shift left can be enabled by performing graph vertex verification in isolation by using pre-silicon development platforms. In this thesis, image processing system graphs were also used in a real life product development project. Graph verification was initiated early in the product development. Shift left was exercised by utilizing the graph verification in several pre-silicon platforms. Functional, performance and stability testing was implemented. Both complete graphs and their vertices were verified in isolation. Graph verification provided many benefits to the product development. Implementations could be tested in several different environments in isolation using only a light test framework. Issues could be found and fixed early. Performance bottlenecks could be pinpointed and acted upon. With the foundations laid in this project, it would be possible in the future to take more advantage of graphs. More advanced automated image quality testing would allow efficient verification. Finer granularity graphs would allow more configurability and more focused testing. Shift left could be further increased by adapting the development of the algorithms to use graphs. This would lower the gap between algorithms and actual vertex implementations and also introduce the available test infrastructure to algorithm development

    Multi-frame reconstruction using super-resolution, inpainting, segmentation and codecs

    Get PDF
    In this thesis, different aspects of video and light field reconstruction are considered such as super-resolution, inpainting, segmentation and codecs. For this purpose, each of these strategies are analyzed based on a specific goal and a specific database. Accordingly, databases which are relevant to film industry, sport videos, light fields and hyperspectral videos are used for the sake of improvement. This thesis is constructed around six related manuscripts, in which several approaches are proposed for multi-frame reconstruction. Initially, a novel multi-frame reconstruction strategy is proposed for lightfield super-resolution in which graph-based regularization is applied along with edge preserving filtering for improving the spatio-angular quality of lightfield. Second, a novel video reconstruction is proposed which is built based on compressive sensing (CS), Gaussian mixture models (GMM) and sparse 3D transform-domain block matching. The motivation of the proposed technique is the improvement in visual quality performance of the video frames and decreasing the reconstruction error in comparison with the former video reconstruction methods. In the next approach, student-t mixture models and edge preserving filtering are applied for the purpose of video super-resolution. Student-t mixture model has a heavy tail which makes it robust and suitable as a video frame patch prior and rich in terms of log likelihood for information retrieval. In another approach, a hyperspectral video database is considered, and a Bayesian dictionary learning process is used for hyperspectral video super-resolution. To that end, Beta process is used in Bayesian dictionary learning and a sparse coding is generated regarding the hyperspectral video super-resolution. The spatial super-resolution is followed by a spectral video restoration strategy, and the whole process leveraged two different dictionary learnings, in which the first one is trained for spatial super-resolution and the second one is trained for the spectral restoration. Furthermore, in another approach, a novel framework is proposed for replacing advertisement contents in soccer videos in an automatic way by using deep learning strategies. For this purpose, a UNET architecture is applied (an image segmentation convolutional neural network technique) for content segmentation and detection. Subsequently, after reconstructing the segmented content in the video frames (considering the apparent loss in detection), the unwanted content is replaced by new one using a homography mapping procedure. In addition, in another research work, a novel video compression framework is presented using autoencoder networks that encode and decode videos by using less chroma information than luma information. For this purpose, instead of converting Y'CbCr 4:2:2/4:2:0 videos to and from RGB 4:4:4, the video is kept in Y'CbCr 4:2:2/4:2:0 and merged the luma and chroma channels after the luma is downsampled to match the chroma size. An inverse function is performed for the decoder. The performance of these models is evaluated by using CPSNR, MS-SSIM, and VMAF metrics. The experiments reveal that, as compared to video compression involving conversion to and from RGB 4:4:4, the proposed method increases the video quality by about 5.5% for Y'CbCr 4:2:2 and 8.3% for Y'CbCr 4:2:0 while reducing the amount of computation by nearly 37% for Y'CbCr 4:2:2 and 40% for Y'CbCr 4:2:0. The thread that ties these approaches together is reconstruction of the video and light field frames based on different aspects of problems such as having loss of information, blur in the frames, existing noise after reconstruction, existing unpleasant content, excessive size of information and high computational overhead. In three of the proposed approaches, we have used Plug-and-Play ADMM model for the first time regarding reconstruction of videos and light fields in order to address both information retrieval in the frames and tackling noise/blur at the same time. In two of the proposed models, we applied sparse dictionary learning to reduce the data dimension and demonstrate them as an efficient linear combination of basis frame patches. Two of the proposed approaches are developed in collaboration with industry, in which deep learning frameworks are used to handle large set of features and to learn high-level features from the data

    Encoding high dynamic range and wide color gamut imagery

    Get PDF
    In dieser Dissertation wird ein szenischer Bewegtbilddatensatz mit erweitertem Dynamikumfang (High Dynamic Range, HDR) und großem Farbumfang (Wide Color Gamut, WCG) eingefĂŒhrt und es werden Modelle zur Kodierung von HDR und WCG Bildern vorgestellt. Die objektive und visuelle Evaluation neuer HDR und WCG Bildverarbeitungsalgorithmen, Kompressionsverfahren und BildwiedergabegerĂ€te erfordert einen Referenzdatensatz hoher QualitĂ€t. Daher wird ein neuer HDR- und WCG-Video-Datensatz mit einem Dynamikumfang von bis zu 18 fotografischen Blenden eingefĂŒhrt. Er enthĂ€lt inszenierte und dokumentarische Szenen. Die einzelnen Szenen sind konzipiert um eine Herausforderung fĂŒr Tone Mapping Operatoren, Gamut Mapping Algorithmen, Kompressionscodecs und HDR und WCG BildanzeigegerĂ€te darzustellen. Die Szenen sind mit professionellem Licht, Maske und Filmausstattung aufgenommen. Um einen cinematischen Bildeindruck zu erhalten, werden digitale Filmkameras mit ‘Super-35 mm’ SensorgrĂ¶ĂŸe verwendet. Der zusĂ€tzliche Informationsgehalt von HDR- und WCG-Videosignalen erfordert im Vergleich zu Signalen mit herkömmlichem Dynamikumfang eine neue und effizientere Signalkodierung. Ein Farbraum fĂŒr HDR und WCG Video sollte nicht nur effizient quantisieren, sondern wegen der unterschiedlichen Monitoreigenschaften auf der EmpfĂ€ngerseite auch fĂŒr die Dynamik- und Farbumfangsanpassung geeignet sein. Bisher wurden Methoden fĂŒr die Quantisierung von HDR Luminanzsignalen vorgeschlagen. Es fehlt jedoch noch ein entsprechendes Modell fĂŒr Farbdifferenzsignale. Es werden daher zwei neue FarbrĂ€ume eingefĂŒhrt, die sich sowohl fĂŒr die effiziente Kodierung von HDR und WCG Signalen als auch fĂŒr die Dynamik- und Farbumfangsanpassung eignen. Diese FarbrĂ€ume werden mit existierenden HDR und WCG Farbsignalkodierungen des aktuellen Stands der Technik verglichen. Die vorgestellten Kodierungsschemata erlauben es, HDR- und WCG-Video mittels drei FarbkanĂ€len mit 12 Bits tonaler Auflösung zu quantisieren, ohne dass Quantisierungsartefakte sichtbar werden. WĂ€hrend die Speicherung und Übertragung von HDR und WCG Video mit 12-Bit Farbtiefe pro Kanal angestrebt wird, unterstĂŒtzen aktuell verbreitete Dateiformate, Videoschnittstellen und Kompressionscodecs oft nur niedrigere Bittiefen. Um diese existierende Infrastruktur fĂŒr die HDR VideoĂŒbertragung und -speicherung nutzen zu können, wird ein neues bildinhaltsabhĂ€ngiges Quantisierungsschema eingefĂŒhrt. Diese Quantisierungsmethode nutzt Bildeigenschaften wie Rauschen und Textur um die benötigte tonale Auflösung fĂŒr die visuell verlustlose Quantisierung zu schĂ€tzen. Die vorgestellte Methode erlaubt es HDR Video mit einer Bittiefe von 10 Bits ohne sichtbare Unterschiede zum Original zu quantisieren und kommt mit weniger Rechenkraft im Vergleich zu aktuellen HDR Bilddifferenzmetriken aus

    AUTOMATED ESTIMATION, REDUCTION, AND QUALITY ASSESSMENT OF VIDEO NOISE FROM DIFFERENT SOURCES

    Get PDF
    Estimating and removing noise from video signals is important to increase either the visual quality of video signals or the performance of video processing algorithms such as compression or segmentation where noise estimation or reduction is a pre-processing step. To estimate and remove noise, effective methods use both spatial and temporal information to increase the reliability of signal extraction from noise. The objective of this thesis is to introduce a video system having three novel techniques to estimate and reduce video noise from different sources, both effectively and efficiently and assess video quality without considering a reference non-noisy video. The first (intensity-variances based homogeneity classification) technique estimates visual noise of different types in images and video signals. The noise can be white Gaussian noise, mixed Poissonian- Gaussian (signal-dependent white) noise, or processed (frequency-dependent) noise. The method is based on the classification of intensity-variances of signal patches in order to find homogeneous regions that best represent the noise signal in the input signal. The method assumes that noise is signal-independent in each intensity class. To find homogeneous regions, the method works on the downsampled input image and divides it into patches. Each patch is assigned to an intensity class, whereas outlier patches are rejected. Then the most homogeneous cluster is selected and its noise variance is considered as the peak of noise variance. To account for processed noise, we estimate the degree of spatial correlation. To account for temporal noise variations a stabilization process is proposed. We show that the proposed method competes related state-of-the-art in noise estimation. The second technique provides solutions to remove real-world camera noise such as signal-independent, signal-dependent noise, and frequency-dependent noise. Firstly, we propose a noise equalization method in intensity and frequency domain which enables a white Gaussian noise filter to handle real noise. Our experiments confirm the quality improvement under real noise while white Gaussian noise filter is used with our equalization method. Secondly, we propose a band-limited time-space video denoiser which reduces video noise of different types. This denoiser consists of: 1) intensity-domain noise equalization to account for signal dependency, 2) band-limited anti-blocking time-domain filtering of current frame using motion-compensated previous and subsequent frames, 3) spatial filtering combined with noise frequency equalizer to remove residual noise left from temporal filtering, and 4) intensity de-equalization to invert the first step. To decrease the chance of motion blur, temporal weights are calculated using two levels of error estimation; coarse (blocklevel) and fine (pixel-level). We correct the erroneous motion vectors by creating a homography from reliable motion vectors. To eliminate blockiness in block-based temporal filter, we propose three ideas: interpolation of block-level error, a band-limited filtering by subtracting the back-signal beforehand, and two-band motion compensation. The proposed time-space filter is parallelizable to be significantly accelerated by GPU. We show that the proposed method competes related state-ofthe- art in video denoising. The third (sparsity and dominant orientation quality index) technique is a new method to assess the quality of the denoised video frames without a reference (clean frames). In many image and video applications, a quantitative measure of image content, noise, and blur is required to facilitate quality assessment, when the ground-truth is not available. We propose a fast method to find the dominant orientation of image patches, which is used to decompose them into singular values. Combining singular values with the sparsity of the patch in the transform domain, we measure the possible image content and noise of the patches and of the whole image. To measure the effect of noise accurately, our method takes both low and high textured patches into account. Before analyzing the patches, we apply a shrinkage in the transform domain to increase the contrast of genuine image structure. We show that the proposed method is useful to select parameters of denoising algorithms automatically in different noise scenarios such as white Gaussian and real noise. Our objective and subjective results confirm the correspondence between the measured quality and the ground-truth and proposed method rivals related state-of-the-art approaches
    corecore