855 research outputs found

    Underwater Imaging Using Underwater Vehicle for Subsea Surveillance

    This Final Year Project (FYP) focuses on improving images captured by the built-in underwater camera of the HydroView MAX, a remotely operated vehicle (ROV) used to perform inspections in subsea environments. Images captured underwater are always degraded by effects such as light scattering and colour changes. Image-processing algorithms are applied to the degraded images so that the enhanced images are closer to their true colours for further analysis. These qualities are required so that the degree of corrosion of underwater pipelines can be estimated with considerable reliability. The degree of corrosion is estimated from the percentage of the pipeline surface that is corroded, based on the generated binary image.
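    The corrosion estimate described above reduces to a pixel ratio over a binary image. The following is a minimal sketch of that calculation, not the project's actual code: the function name `corrosion_percentage`, the optional `pipe_mask` argument, and the convention that 1 marks a corroded pixel are all assumptions made for illustration.

```python
def corrosion_percentage(binary_image, pipe_mask=None):
    """Return the percentage of pipeline pixels flagged as corroded.

    binary_image: list of rows, where a truthy value marks a corroded pixel
                  (assumed convention, not taken from the abstract).
    pipe_mask:    optional list of rows of the same shape; truthy values mark
                  pixels that belong to the pipeline surface (hypothetical).
    """
    corroded = 0
    total = 0
    for r, row in enumerate(binary_image):
        for c, value in enumerate(row):
            # Skip pixels outside the pipeline when a mask is supplied.
            if pipe_mask is not None and not pipe_mask[r][c]:
                continue
            total += 1
            if value:
                corroded += 1
    if total == 0:
        raise ValueError("no pipeline pixels to evaluate")
    return 100.0 * corroded / total


binary = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
mask = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
]
whole_frame = corrosion_percentage(binary)        # 3 of 12 pixels
pipe_only = corrosion_percentage(binary, mask)    # 3 of 6 pixels
```

    Restricting the ratio to a pipeline mask matters because the abstract defines the measure relative to the pipeline surface, not the whole frame.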

    Automatic road network extraction in suburban areas from aerial images

    [no abstract]

    Low-Resolution Vision for Autonomous Mobile Robots

    The goal of this research is to develop algorithms that use low-resolution images to perceive and understand a typical indoor environment and thereby enable a mobile robot to navigate such an environment autonomously. We present techniques for three problems: autonomous exploration, corridor classification, and minimalistic geometric representation of an indoor environment for navigation. First, we present a technique for mobile robot exploration in unknown indoor environments using only a single forward-facing camera. Rather than processing all the data, the method intermittently examines only small 32×24 downsampled grayscale images. We show that for the task of indoor exploration the visual information is highly redundant, allowing successful navigation using only a small fraction (0.02%) of the available data. The method keeps the robot centered in the corridor by estimating two state parameters: the orientation within the corridor and the distance to the end of the corridor. The orientation is determined by combining the results of five complementary measures, while the estimated distance to the end combines the results of three complementary measures. These measures, which are predominantly information-theoretic, are analyzed independently, and the combined system is tested in several unknown corridor buildings exhibiting a wide variety of appearances, showing the sufficiency of low-resolution visual information for mobile robot exploration. Because the algorithm discards such a large percentage (99.98%) of the information both spatially and temporally, processing occurs at an average of 1000 frames per second, or equivalently takes a small fraction of the CPU. Second, we present an algorithm that uses image entropy to detect and classify corridor junctions from low-resolution images. Because entropy can be used to perceive depth, it can be used to detect an open corridor in a set of images recorded while rotating the robot through 360 degrees at a junction.
Our algorithm detects peaks in continuously measured entropy values and uses the angular distance between the detected peaks to determine the type of junction that was recorded (either middle, L-junction, T-junction, dead-end, or cross junction). We show that the same algorithm can be used to detect open corridors from both monocular and omnidirectional images. Third, we propose a minimalistic corridor representation consisting of the orientation line (center) and the wall-floor boundaries (lateral limit). The representation is extracted from low-resolution images using a novel combination of information-theoretic measures and gradient cues. Our study investigates the impact of image resolution upon the accuracy of extracting such a geometry, showing that the centerline and wall-floor boundaries can be estimated with reasonable accuracy even in texture-poor environments with low-resolution images. In a database of 7 unique corridor sequences for orientation measurements, less than 2% additional error was observed as the resolution of the image decreased by 99.9%.
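    The entropy-based junction step can be sketched as follows. This is an illustrative outline under stated assumptions, not the authors' algorithm: it uses the Shannon entropy of the gray-level histogram as the entropy measure, a simple strict-local-maximum rule with a hypothetical `min_height` threshold for peak detection, and it stops at the angular gaps between peaks because the abstract does not give the rule that maps gaps to junction types.

```python
import math
from collections import Counter


def image_entropy(gray_pixels):
    """Shannon entropy (in bits) of the gray-level histogram of a flat pixel list."""
    counts = Counter(gray_pixels)
    n = len(gray_pixels)
    return -sum((k / n) * math.log2(k / n) for k in counts.values())


def circular_peaks(values, min_height):
    """Indices of strict local maxima at or above min_height, treating the
    list as circular (the robot turns a full 360 degrees)."""
    n = len(values)
    return [
        i for i in range(n)
        if values[i] >= min_height
        and values[i] > values[(i - 1) % n]
        and values[i] > values[(i + 1) % n]
    ]


def peak_gaps_degrees(peak_indices, n_samples):
    """Angular gap from each peak to the next one around the circle."""
    step = 360.0 / n_samples
    angles = [i * step for i in peak_indices]
    gaps = []
    for j in range(len(angles)):
        gap = (angles[(j + 1) % len(angles)] - angles[j]) % 360.0
        gaps.append(gap if gap else 360.0)  # a lone peak spans the full turn
    return gaps


# Entropy of two tiny synthetic "images" (flat gray-level lists).
flat_wall = image_entropy([7] * 16)              # one gray level only
two_tone = image_entropy([0] * 8 + [255] * 8)    # two equally likely levels

# Hypothetical entropy profile sampled every 45 degrees during a full turn.
profile = [1.0, 3.2, 1.1, 1.0, 1.2, 3.0, 1.1, 0.9]
peaks = circular_peaks(profile, min_height=2.5)
gaps = peak_gaps_degrees(peaks, len(profile))
```

    In this made-up profile the two peaks lie opposite each other, which is the pattern one would expect when the robot stands in the middle of a straight corridor; mapping gap patterns to the junction types listed above is left open here.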

    Description Logic for Scene Understanding at the Example of Urban Road Intersections

    Understanding a natural scene on the basis of external sensors is a task yet to be solved by computer algorithms. The present thesis investigates the suitability for this task of a particular family of explicit, formal representation and reasoning formalisms, which are subsumed under the term Description Logic.

    Monocular depth estimation in images and sequences using occlusion cues

    When humans observe a scene, they are able to perfectly distinguish the different parts composing it. Moreover, humans can easily reconstruct the spatial position of these parts and conceive a consistent structure. The mechanisms involved in visual perception have been studied since the beginning of neuroscience but, still today, not all the processes composing it are known. In usual situations, humans can make use of three different methods to estimate the scene structure. The first one is the so-called divergence, and it makes use of both eyes. When objects lie in front of the observer at a distance of up to a hundred meters, subtle differences in the image formation in each eye can be used to determine depth. When objects are not in the field of view of both eyes, other mechanisms must be used. In these cases, both visual cues and prior learned information can be used to determine depth. Even if these mechanisms are less accurate than divergence, humans can almost always infer the correct depth structure when using them. As examples of visual cues, occlusion, perspective and object size provide a lot of information about the structure of the scene. A priori information depends on each observer, but it is normally used subconsciously by humans to detect commonly known regions such as the sky, the ground or different types of objects. In recent years, since technology has been able to handle the processing burden of vision systems, many efforts have been devoted to designing automated scene-interpreting systems. In this thesis we address the problem of depth estimation using only one point of view and using only occlusion depth cues. The objective of the thesis is to detect occlusions present in the scene and combine them with a segmentation system so as to generate a relative depth-order map for a scene. We explore both static and dynamic situations such as single images, frames inside sequences or full video sequences.
In the case where a full image sequence is available, a system exploiting motion information to recover depth structure is also designed. Results are promising and competitive with respect to the state-of-the-art literature, but there is still much room for improvement when compared to human depth perception performance.

    Object detection and activity recognition in digital image and video libraries

    This thesis is a comprehensive study of object-based image and video retrieval, specifically for car and human detection and activity recognition purposes. The thesis focuses on the problem of connecting low-level features to high-level semantics by developing relational object and activity representations. With the rapid growth of multimedia information in the form of digital image and video libraries, there is an increasing need for intelligent database management tools. Traditional text-based query systems that rely on a manual annotation process are impractical for today's large libraries, which require an efficient information retrieval system. For this purpose, a hierarchical information retrieval system is proposed in which shape, color and motion characteristics of objects of interest are captured in the compressed and uncompressed domains. The proposed retrieval method provides object detection and activity recognition at different resolution levels, from low complexity to low false rates. The thesis first examines the extraction of low-level features from images and videos using the intensity, color and motion of pixels and blocks. Local consistency based on these features and the geometrical characteristics of the regions is used to group object parts. The problem of managing the segmentation process is solved by a new approach that uses object-based knowledge to group the regions according to a global consistency. A new model-based segmentation algorithm is introduced that uses feedback from the relational representation of the object. The selected unary and binary attributes are further extended for application-specific algorithms. Object detection is achieved by matching the relational graphs of objects with the reference model. The major advantages of the algorithm can be summarized as improving object extraction by reducing the dependence on the low-level segmentation process and combining the boundary and region properties.
The thesis then addresses the problem of object detection and activity recognition in the compressed domain in order to reduce computational complexity. New algorithms for object detection and activity recognition in JPEG images and MPEG videos are developed. It is shown that significant information can be obtained from the compressed domain in order to connect to high-level semantics. Since our aim is to retrieve information from images and videos compressed using standard algorithms such as JPEG and MPEG, our approach differs from previous compressed-domain object detection techniques in which the compression algorithms are governed by the characteristics of the object of interest to be retrieved. An algorithm is developed that uses principal component analysis of MPEG motion vectors to detect human activities, namely walking, running, and kicking. Object detection in JPEG-compressed still images and MPEG I-frames is achieved by using the DC-DCT coefficients of the luminance and chrominance values in the graph-based object detection algorithm. The thesis finally addresses the problem of object detection in lower-resolution and monochrome images. Specifically, it is demonstrated that the structural information of human silhouettes can be captured from AC-DCT coefficients.
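    To make concrete what the DC-DCT coefficients carry, the sketch below computes them directly from pixel values rather than from a compressed bitstream. It is an illustration only and not the thesis's method: the function name `dc_coefficients` is hypothetical, and the code relies on the fact that, for the orthonormal N×N DCT used in JPEG, the DC term equals the sum of the level-shifted block samples divided by N (8 times the block mean for 8×8 blocks). A compressed-domain system would read these values from the entropy-decoded stream without running the inverse transform.

```python
def dc_coefficients(pixels, block=8, level_shift=128):
    """Return one DC coefficient per block x block tile of a grayscale image.

    pixels:      list of rows of integer samples (for example 0..255 luminance).
    level_shift: value subtracted from each sample before the transform,
                 128 for 8-bit JPEG samples.
    """
    rows, cols = len(pixels), len(pixels[0])
    if rows % block or cols % block:
        raise ValueError("image dimensions must be multiples of the block size")
    dc = []
    for r0 in range(0, rows, block):
        dc_row = []
        for c0 in range(0, cols, block):
            total = sum(
                pixels[r][c] - level_shift
                for r in range(r0, r0 + block)
                for c in range(c0, c0 + block)
            )
            # Orthonormal 2-D DCT-II: F(0,0) = sum / N for an N x N block.
            dc_row.append(total / block)
        dc.append(dc_row)
    return dc


# An 8 x 16 test image: a mid-gray block next to a slightly brighter block.
image = [[128] * 8 + [136] * 8 for _ in range(8)]
dc_image = dc_coefficients(image)
```

    The resulting `dc_image` is effectively a thumbnail of the frame at one eighth of the resolution in each direction, which is why coarse object detection on DC terms is much cheaper than working on fully decoded pixels.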