175 research outputs found

    Mesh and Pyramid Algorithms for Iconic Indexing

    Get PDF
    In this paper parallel algorithms on meshes and pyramids for iconic indexing are presented. Our algorithms are asymptotically superior to previously known parallel algorithms

    A multiple-SIMD architecture for image and tracking analysis

    Get PDF
    The computational requirements for real-time image based applications are such as to warrant the use of a parallel architecture. Commonly used parallel architectures conform to the classifications of Single Instruction Multiple Data (SIMD), or Multiple Instruction Multiple Data (MIMD). Each class of architecture has its advantages and dis-advantages. For example, SIMD architectures can be used on data-parallel problems, such as the processing of an image. Whereas MIMD architectures are more flexible and better suited to general purpose computing. Both types of processing are typically required for the analysis of the contents of an image. This thesis describes a novel massively parallel heterogeneous architecture, implemented as the Warwick Pyramid Machine. Both SIMD and MIMD processor types are combined within this architecture. Furthermore, the SIMD array is partitioned, into smaller SIMD sub-arrays, forming a Multiple-SIMD array. Thus, local data parallel, global data parallel, and control parallel processing are supported. After describing the present options available in the design of massively parallel machines and the nature of the image analysis problem, the architecture of the Warwick Pyramid Machine is described in some detail. The performance of this architecture is then analysed, both in terms of peak available computational power and in terms of representative applications in image analysis and numerical computation. Two tracking applications are also analysed to show the performance of this architecture. In addition, they illustrate the possible partitioning of applications between the SIMD and MIMD processor arrays. Load-balancing techniques are then described which have the potential to increase the utilisation of the Warwick Pyramid Machine at run-time. These include mapping techniques for image regions across the Multiple-SIMD arrays, and for the compression of sparse data. It is envisaged that these techniques may be found useful in other parallel systems

    Computer vision algorithms on reconfigurable logic arrays

    Full text link

    Enabling Neural Radiance Fields (NeRF) for Large-scale Aerial Images -- A Multi-tiling Approach and the Geometry Assessment of NeRF

    Full text link
    Neural Radiance Fields (NeRF) offer the potential to benefit 3D reconstruction tasks, including aerial photogrammetry. However, the scalability and accuracy of the inferred geometry are not well-documented for large-scale aerial assets,since such datasets usually result in very high memory consumption and slow convergence.. In this paper, we aim to scale the NeRF on large-scael aerial datasets and provide a thorough geometry assessment of NeRF. Specifically, we introduce a location-specific sampling technique as well as a multi-camera tiling (MCT) strategy to reduce memory consumption during image loading for RAM, representation training for GPU memory, and increase the convergence rate within tiles. MCT decomposes a large-frame image into multiple tiled images with different camera models, allowing these small-frame images to be fed into the training process as needed for specific locations without a loss of accuracy. We implement our method on a representative approach, Mip-NeRF, and compare its geometry performance with threephotgrammetric MVS pipelines on two typical aerial datasets against LiDAR reference data. Both qualitative and quantitative results suggest that the proposed NeRF approach produces better completeness and object details than traditional approaches, although as of now, it still falls short in terms of accuracy.Comment: 9 Figur

    Concrete Surface Crack Detection with Convolutional-based Deep Learning Models

    Full text link
    Effective crack detection is pivotal for the structural health monitoring and inspection of buildings. This task presents a formidable challenge to computer vision techniques due to the inherently subtle nature of cracks, which often exhibit low-level features that can be easily confounded with background textures, foreign objects, or irregularities in construction. Furthermore, the presence of issues like non-uniform lighting and construction irregularities poses significant hurdles for autonomous crack detection during building inspection and monitoring. Convolutional neural networks (CNNs) have emerged as a promising framework for crack detection, offering high levels of accuracy and precision. Additionally, the ability to adapt pre-trained networks through transfer learning provides a valuable tool for users, eliminating the need for an in-depth understanding of algorithm intricacies. Nevertheless, it is imperative to acknowledge the limitations and considerations when deploying CNNs, particularly in contexts where the outcomes carry immense significance, such as crack detection in buildings. In this paper, our approach to surface crack detection involves the utilization of various deep-learning models. Specifically, we employ fine-tuning techniques on pre-trained deep learning architectures: VGG19, ResNet50, Inception V3, and EfficientNetV2. These models are chosen for their established performance and versatility in image analysis tasks. We compare deep learning models using precision, recall, and F1 scores.Comment: 11 pages, 3 figures, Journal pape

    Variable Resolution & Dimensional Mapping For 3d Model Optimization

    Get PDF
    Three-dimensional computer models, especially geospatial architectural data sets, can be visualized in the same way humans experience the world, providing a realistic, interactive experience. Scene familiarization, architectural analysis, scientific visualization, and many other applications would benefit from finely detailed, high resolution, 3D models. Automated methods to construct these 3D models traditionally has produced data sets that are often low fidelity or inaccurate; otherwise, they are initially highly detailed, but are very labor and time intensive to construct. Such data sets are often not practical for common real-time usage and are not easily updated. This thesis proposes Variable Resolution & Dimensional Mapping (VRDM), a methodology that has been developed to address some of the limitations of existing approaches to model construction from images. Key components of VRDM are texture palettes, which enable variable and ultra-high resolution images to be easily composited; texture features, which allow image features to integrated as image or geometry, and have the ability to modify the geometric model structure to add detail. These components support a primary VRDM objective of facilitating model refinement with additional data. This can be done until the desired fidelity is achieved as practical limits of infinite detail are approached. Texture Levels, the third component, enable real-time interaction with a very detailed model, along with the flexibility of having alternate pixel data for a given area of the model and this is achieved through extra dimensions. Together these techniques have been used to construct models that can contain GBs of imagery data

    A unified programming system for a multi-paradigm parallel architecture

    Get PDF
    Real time image understanding and image generation require very large amounts of computing power. A possible way to meet these requirements is to make use of the power available from parallel computing systems. However parallel machines exhibit performance which is highly dependent on the algorithms being executed. Both image understanding and image generation involve the use of a wide variety of algorithms. A parallel machine suited to some of these algorithms may be unsuited to others. This thesis describes a novel heterogeneous parallel architecture optimised for image based applications. It achieves its performance by combining two different forms of parallel architecture, namely fine grain SIMD and course grain MIMD, into a single architecture. In this way it is possible to match the most appropriate computing resource to each algorithm in a given application. As important as the architecture itself is a method for programming it. This thesis describes a novel multi-paradigm programming language based on C++, which allows programs which make use of both control and data parallelism to be expressed in a single coherent framework, based on object oriented programming. To demonstrate the utility of both the architecture and the programming system, two applications, one from the field of image understanding the other image generation are examined. These applications combine some novel algorithms with other novel implementation approaches to provide the most effective mapping onto this architecture

    Mathematical modeling for partial object detection.

    Get PDF
    From a computer vision point of view, the image is a scene consisting of objects of interest and a background represented by everything else in the image. The relations and interactions among these objects are the key factors for scene understanding. In this dissertation, a mathematical model is designed for the detection of partially occluded faces captured in unconstrained real life conditions. The proposed model novelty comes from explicitly considering certain objects that are common to occlude faces and embedding them in the face model. This enables the detection of faces in difficult settings and provides more information to subsequent analysis in addition to the bounding box of the face. In the proposed Selective Part Models (SPM), the face is modelled as a collection of parts that can be selected from the visible regular facial parts and some of the occluding objects which commonly interact with faces such as sunglasses, caps, hands, shoulders, and other faces. With the face detection being the first step in the face recognition pipeline, the proposed model does not only detect partially occluded faces efficiently but it also suggests the occluded parts to be excluded from the subsequent recognition step. The model was tested on several recent face detection databases and benchmarks and achieved state of the art performance. In addition, detailed analysis for the performance with respect to different types of occlusion were provided. Moreover, a new database was collected for evaluating face detectors focusing on the partial occlusion problem. This dissertation highlights the importance of explicitly handling the partial occlusion problem in face detection and shows its efficiency in enhancing both the face detection performance and the subsequent recognition performance of partially occluded faces. The broader impact of the proposed detector exceeds the common security applications by using it for human robot interaction. The humanoid robot Nao is used to help in teaching children with autism and the proposed detector is used to achieve natural interaction between the robot and the children by detecting their faces which can be used for recognition or more interestingly for adaptive interaction by analyzing their expressions

    Contributions to the Content-Based Image Retrieval Using Pictorial Queris

    Get PDF
    L'accés massiu a les càmeres digitals, els ordinadors personals i a Internet, ha propiciat la creació de grans volums de dades en format digital. En aquest context, cada vegada adquireixen major rellevància totes aquelles eines dissenyades per organitzar la informació i facilitar la seva cerca.Les imatges són un cas particular de dades que requereixen tècniques específiques de descripció i indexació. L'àrea de la visió per computador encarregada de l'estudi d'aquestes tècniques rep el nom de Recuperació d'Imatges per Contingut, en anglès Content-Based Image Retrieval (CBIR). Els sistemes de CBIR no utilitzen descripcions basades en text sinó que es basen en característiques extretes de les pròpies imatges. En contrast a les més de 6000 llengües parlades en el món, les descripcions basades en característiques visuals representen una via d'expressió universal.La intensa recerca en el camp dels sistemes de CBIR s'ha aplicat en àrees de coneixement molt diverses. Així doncs s'han desenvolupat aplicacions de CBIR relacionades amb la medicina, la protecció de la propietat intel·lectual, el periodisme, el disseny gràfic, la cerca d'informació en Internet, la preservació dels patrimoni cultural, etc. Un dels punts importants d'una aplicació de CBIR resideix en el disseny de les funcions de l'usuari. L'usuari és l'encarregat de formular les consultes a partir de les quals es fa la cerca de les imatges. Nosaltres hem centrat l'atenció en aquells sistemes en què la consulta es formula a partir d'una representació pictòrica. Hem plantejat una taxonomia dels sistemes de consulta en composada per quatre paradigmes diferents: Consulta-segons-Selecció, Consulta-segons-Composició-Icònica, Consulta-segons-Esboç i Consulta-segons-Il·lustració. Cada paradigma incorpora un nivell diferent en el potencial expressiu de l'usuari. Des de la simple selecció d'una imatge, fins a la creació d'una il·lustració en color, l'usuari és qui pren el control de les dades d'entrada del sistema. Al llarg dels capítols d'aquesta tesi hem analitzat la influència que cada paradigma de consulta exerceix en els processos interns d'un sistema de CBIR. D'aquesta manera també hem proposat un conjunt de contribucions que hem exemplificat des d'un punt de vista pràctic mitjançant una aplicació final
    corecore