
    3D Object Reconstruction from Imperfect Depth Data Using Extended YOLOv3 Network

    Versatile state-of-the-art intelligent applications drive the use of full 3D, depth-based streams, especially in intelligent remote control and communications, where virtual and augmented reality will soon become outdated and are forecast to be replaced by point cloud streams that provide explorable 3D environments for communication and industrial data. One of the most novel approaches in modern object reconstruction methods is to use a priori knowledge of the objects being reconstructed. Our approach differs in that we strive to reconstruct a 3D object in much more difficult scenarios of limited data availability. The data stream is often limited by insufficient depth camera coverage; as a result, objects are occluded and data is lost. Our proposed hybrid artificial neural network modifications improved the reconstruction results by 8.53, allowing much more precise filling of occluded object sides and reducing noise during the process. Furthermore, adding object segmentation masks and classifying individual object instances is a leap forward towards general-purpose scene reconstruction, as opposed to single-object reconstruction, because overlapping object instances can be masked out and only the masked object area is used in the reconstruction process.
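
As a rough illustration of the masking idea described above (not the authors' pipeline), the sketch below back-projects only the depth pixels that fall inside one instance segmentation mask, so that overlapping instances are excluded before reconstruction. The pinhole camera model and the intrinsics `fx`, `fy`, `cx`, `cy` are assumptions, and the function name `masked_depth_to_points` is hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def masked_depth_to_points(depth: List[List[float]], mask: List[List[bool]],
                           fx: float, fy: float, cx: float, cy: float) -> List[Point]:
    """Back-project only the pixels inside one instance mask (assumed pinhole model)."""
    points: List[Point] = []
    for v, (depth_row, mask_row) in enumerate(zip(depth, mask)):
        for u, (z, keep) in enumerate(zip(depth_row, mask_row)):
            # Skip pixels outside this instance's mask and pixels with no depth reading.
            if not keep or z <= 0.0:
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points
```

Only the returned points of the selected instance would then be handed to a reconstruction network; pixels belonging to an occluding neighbor never enter the input.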

    3DGen: Triplane Latent Diffusion for Textured Mesh Generation

    Latent diffusion models for image generation have crossed a quality threshold that enabled them to achieve mass adoption. Recently, a series of works have made advances towards replicating this success in the 3D domain, introducing techniques such as point cloud VAEs, triplane representations, neural implicit surfaces and differentiable-rendering-based training. We take another step in this direction, combining these developments in a two-step pipeline consisting of 1) a triplane VAE that learns latent representations of textured meshes and 2) a conditional diffusion model that generates the triplane features. For the first time, this architecture allows conditional and unconditional generation of high-quality textured or untextured 3D meshes across multiple diverse categories in a few seconds on a single GPU. It substantially outperforms previous work in both image-conditioned and unconditional generation, in terms of mesh quality as well as texture generation. Furthermore, we demonstrate the scalability of our model to large datasets for increased quality and diversity. We will release our code and trained models.
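
The triplane representation mentioned above stores features on three axis-aligned 2D planes. The following is a minimal sketch of how a 3D point's feature is commonly read from such planes: project the point onto the XY, XZ and YZ planes, bilinearly sample each plane, and sum the results. Summation (rather than concatenation), the [-1, 1] coordinate convention, the nested-list plane layout and the helper names are all assumptions; this is not 3DGen's code, and the decoder that turns these features into geometry and color is omitted.

```python
from typing import List, Sequence

# plane[row][col] is a feature vector of length C (hypothetical layout).
Plane = List[List[List[float]]]


def bilinear(plane: Plane, u: float, v: float) -> List[float]:
    """Bilinearly sample a feature plane at normalized coordinates u, v in [-1, 1]."""
    h, w = len(plane), len(plane[0])
    # Clamp to the plane's extent, then map [-1, 1] to pixel coordinates.
    u, v = max(-1.0, min(1.0, u)), max(-1.0, min(1.0, v))
    x = (u + 1.0) * 0.5 * (w - 1)
    y = (v + 1.0) * 0.5 * (h - 1)
    x0, y0 = int(x), int(y)  # floor, since x and y are non-negative
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    out = []
    for c in range(len(plane[0][0])):
        top = (1 - ax) * plane[y0][x0][c] + ax * plane[y0][x1][c]
        bottom = (1 - ax) * plane[y1][x0][c] + ax * plane[y1][x1][c]
        out.append((1 - ay) * top + ay * bottom)
    return out


def query_triplane(planes: Sequence[Plane], point: Sequence[float]) -> List[float]:
    """Sum (an assumed aggregation) the features sampled from the XY, XZ and YZ planes."""
    x, y, z = point
    feats = [
        bilinear(planes[0], x, y),  # XY plane
        bilinear(planes[1], x, z),  # XZ plane
        bilinear(planes[2], y, z),  # YZ plane
    ]
    return [sum(vals) for vals in zip(*feats)]
```

Because each plane is only 2D, memory grows with the square of the resolution rather than its cube, which is one reason triplanes are attractive as a latent space for diffusion.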

    Signature Verification Using Siamese Convolutional Neural Networks

    This research describes the process of building a Siamese neural network for signature verification. The network, which uses two identical base networks as its underlying architecture, was built, trained and evaluated in this project. The base networks were two convolutional neural networks that shared the same weights during training. This architecture, commonly known as a Siamese network, reduced the amount of training data needed and thus increased the model's efficiency by 13%. Each convolutional network consisted of three convolutional layers, three pooling layers and one fully connected layer, whose outputs were passed to the contrastive loss function for comparison. A threshold function determined whether the signatures were forged. An initial accuracy of 78% led to further tuning and improvement of the model, which then achieved a prediction accuracy of 93%.
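
The contrastive loss and threshold decision described above can be sketched as follows, using the common margin-based formulation of contrastive loss. The embeddings are assumed to come from the shared convolutional branch (omitted here); the margin of 1.0, the decision threshold of 0.5 and all function names are hypothetical values chosen for illustration, not the ones used in this work.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Distance between two signature embeddings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(distance: float, same_writer: bool, margin: float = 1.0) -> float:
    """Margin-based contrastive loss for one pair of embeddings."""
    if same_writer:
        # Genuine pairs are penalized for being far apart.
        return 0.5 * distance ** 2
    # Forged pairs are penalized only while they are closer than the margin.
    return 0.5 * max(0.0, margin - distance) ** 2


def is_genuine(reference: Sequence[float], query: Sequence[float],
               threshold: float = 0.5) -> bool:
    """Threshold decision: accept the query signature if its embedding is close enough."""
    return euclidean(reference, query) < threshold
```

Since both branches share weights, the loss only shapes a single embedding function; at inference time, one reference signature per writer is enough to make the threshold decision.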

    Deep-Learning-Based 3-D Surface Reconstruction—A Survey

    In the last decade, deep learning (DL) has significantly impacted industry and science. While initially motivated largely by computer vision tasks in 2-D imagery, DL research has since shifted its focus toward 3-D data analysis. In particular, 3-D surface reconstruction, i.e., reconstructing a 3-D shape from sparse input, is of great interest to a large variety of application fields. DL-based approaches show promising quantitative and qualitative surface reconstruction performance compared to traditional computer vision and geometric algorithms. This survey provides a comprehensive overview of these DL-based methods for 3-D surface reconstruction. To this end, we first discuss input data modalities, such as volumetric data, point clouds, and RGB, single-view, multiview, and depth images, along with the corresponding acquisition technologies and common benchmark datasets. For practical purposes, we also discuss evaluation metrics for judging the reconstruction performance of different methods. The main part of the survey introduces a methodological taxonomy ranging from point- and mesh-based techniques to volumetric and implicit neural approaches. Recent research trends, both methodological and application-oriented, are highlighted, pointing toward future developments.

    Explain what you see: argumentation-based learning and robotic vision

    In this thesis, we have introduced new techniques for open-ended learning, online incremental learning, and explainable learning. These methods have applications in the classification of tabular data, 3D object category recognition, and 3D object part segmentation. We have used argumentation theory and probability theory to develop these methods. The first proposed open-ended online incremental learning approach is Argumentation-Based online incremental Learning (ABL). ABL works with tabular data and can learn from a small number of learning instances using an abstract argumentation framework and a bipolar argumentation framework. It learns faster than state-of-the-art online incremental techniques; however, it has high computational complexity. We have addressed this problem by introducing Accelerated Argumentation-Based Learning (AABL). AABL uses only an abstract argumentation framework and applies two strategies to accelerate the learning process and reduce complexity. The second proposed open-ended online incremental learning approach is the Local Hierarchical Dirichlet Process (Local-HDP). Local-HDP addresses two problems: open-ended category recognition of 3D objects and segmentation of 3D object parts. We have combined Local-HDP for object part segmentation with AABL to obtain an interpretable model that explains why a certain 3D object belongs to a certain category. The model's explanations tell the user that an object has specific parts resembling the typical parts of certain categories. Moreover, integrating AABL and Local-HDP yields a model that can handle a high degree of occlusion.
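
For readers unfamiliar with the abstract argumentation frameworks mentioned above: such a framework is a set of arguments plus an attack relation between them. The sketch below computes the grounded extension (the skeptically accepted arguments) by iterating Dung's characteristic function from the empty set until it reaches a fixed point. Which acceptability semantics ABL or AABL actually use is not stated in this abstract, so grounded semantics here is an assumption, and the function name is hypothetical.

```python
from typing import Set, Tuple


def grounded_extension(args: Set[str], attacks: Set[Tuple[str, str]]) -> Set[str]:
    """Least fixed point of the characteristic function of an abstract argumentation framework."""
    # attackers[a] holds every argument x with an attack (x, a).
    attackers = {a: {x for (x, y) in attacks if y == a} for a in args}
    accepted: Set[str] = set()
    while True:
        # An argument is defended if each of its attackers is attacked by an accepted argument.
        defended = {
            a for a in args
            if all(any((d, b) in attacks for d in accepted) for b in attackers[a])
        }
        if defended == accepted:
            return accepted
        accepted = defended
```

For example, with attacks a→b and b→c, the unattacked argument a is accepted first, which in turn defends c, so the grounded extension is {a, c}; a mutual attack between two arguments leaves both undecided.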

    On the Automation and Diagnosis of Visual Intelligence

    One of the ultimate goals of computer vision is to equip machines with visual intelligence: the ability to understand a scene at a level indistinguishable from that of a human. This requires not only detecting the 2D or 3D locations of objects, but also recognizing their semantic categories, or even higher-level interactions. Thanks to decades of vision research as well as recent developments in deep learning, we are closer to this goal than ever. But to keep closing the gap, more research is needed on two themes. First, current models are still far from perfect, so we need a mechanism for continually proposing new, better models that improve performance. Second, while pushing for performance, it is also important to carefully analyze and diagnose existing models to make sure we are indeed moving in the right direction. In this dissertation, I study each of these two research themes for various steps in the visual intelligence pipeline. The first part of the dissertation focuses on category-level understanding of 2D images, which is arguably the most critical step in the visual intelligence pipeline because it bridges vision and language. Its theme is automating the process of model improvement, in particular the design of neural network architectures. The second part extends the visual intelligence pipeline along the language side and focuses on the more challenging language-level understanding of 2D images. The theme also shifts to diagnosis: examining existing models, proposing interpretable models, and building diagnostic datasets. The third part continues the diagnosis theme, this time extending along the vision side and focusing on how incorporating 3D scene knowledge may facilitate the evaluation of image recognition models.

    Reconstruction of 3D Object Shape Using Hybrid Modular Neural Network Architecture Trained on 3D Models from ShapeNetCore Dataset

    Depth-based reconstruction of the three-dimensional (3D) shape of objects is one of the core problems in computer vision, with many commercial applications. However, 3D scanning for point cloud-based video streaming is expensive and generally unattainable for an average user because it requires a setup of multiple depth sensors. We propose a novel hybrid modular artificial neural network (ANN) architecture that can reconstruct smooth polygonal meshes from a single depth frame using a priori knowledge. The network architecture consists of separate nodes for object type recognition and for reconstruction, which allows easy retraining and extension to new object types. We performed recognition of nine real-world objects using the neural network trained on the ShapeNetCore model dataset. The results, evaluated quantitatively using the Intersection-over-Union (IoU), Completeness, Correctness and Quality metrics and qualitatively by visual inspection, demonstrate the robustness of the proposed architecture with respect to different viewing angles and illumination conditions.
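
The four quantitative metrics named above can be sketched on occupied-voxel sets as follows, using the usual definitions Completeness = TP/(TP+FN), Correctness = TP/(TP+FP) and Quality = TP/(TP+FP+FN). This sketch assumes all four are computed on the same voxel sets, in which case Quality coincides with IoU; the paper may compute them differently (for example on mesh points with a distance tolerance), and the function name `overlap_metrics` is hypothetical.

```python
from typing import Dict, Set, Tuple

Voxel = Tuple[int, int, int]


def overlap_metrics(pred: Set[Voxel], truth: Set[Voxel]) -> Dict[str, float]:
    """IoU, Completeness, Correctness and Quality for two sets of occupied voxels."""
    tp = len(pred & truth)  # occupied in both reconstruction and ground truth
    fp = len(pred - truth)  # reconstructed but absent from the ground truth
    fn = len(truth - pred)  # ground-truth voxels the reconstruction missed
    union = tp + fp + fn
    # Two empty shapes are treated as a perfect match.
    return {
        "iou": tp / union if union else 1.0,
        "completeness": tp / (tp + fn) if tp + fn else 1.0,
        "correctness": tp / (tp + fp) if tp + fp else 1.0,
        "quality": tp / union if union else 1.0,
    }
```

Completeness rewards filling occluded regions, while Correctness penalizes spurious geometry; Quality (and IoU) balances both in a single number.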