3,853 research outputs found

    What's the Situation with Intelligent Mesh Generation: A Survey and Perspectives

    Full text link
    Intelligent Mesh Generation (IMG) represents a novel and promising field of research, utilizing machine learning techniques to generate meshes. Despite its relative infancy, IMG has significantly broadened the adaptability and practicality of mesh generation techniques, delivering numerous breakthroughs and unveiling potential future pathways. However, a noticeable void exists in the contemporary literature concerning comprehensive surveys of IMG methods. This paper endeavors to fill this gap by providing a systematic and thorough survey of the current IMG landscape. With a focus on 113 preliminary IMG methods, we undertake a meticulous analysis from various angles, encompassing core algorithm techniques and their application scope, agent learning objectives, data types, targeted challenges, as well as advantages and limitations. We have curated and categorized the literature, proposing three unique taxonomies based on key techniques, output mesh unit elements, and relevant input data types. This paper also underscores several promising future research directions and challenges in IMG. To augment reader accessibility, a dedicated IMG project page is available at \url{https://github.com/xzb030/IMG_Survey}

    Adapting Computer Vision Models To Limitations On Input Dimensionality And Model Complexity

    Get PDF
    When considering instances of distributed systems where visual sensors communicate with remote predictive models, data traffic is limited to the capacity of communication channels, and hardware limits the processing of collected data prior to transmission. We study novel methods of adapting visual inference to limitations on complexity and data availability at test time, wherever the aforementioned limitations exist. Our contributions detailed in this thesis consider both task-specific and task-generic approaches to reducing the data requirement for inference, and evaluate our proposed methods on a wide range of computer vision tasks. This thesis makes four distinct contributions: (i) We investigate multi-class action classification via two-stream convolutional neural networks that directly ingest information extracted from compressed video bitstreams. We show that selective access to macroblock motion vector information provides a good low-dimensional approximation of the underlying optical flow in visual sequences. (ii) We devise a bitstream cropping method by which AVC/H.264 and H.265 bitstreams are reduced to the minimum amount of necessary elements for optical flow extraction, while maintaining compliance with codec standards. We additionally study the effect of codec rate-quality control on the sparsity and noise incurred on optical flow derived from resulting bitstreams, and do so for multiple coding standards. (iii) We demonstrate degrees of variability in the amount of data required for action classification, and leverage this to reduce the dimensionality of input volumes by inferring the required temporal extent for accurate classification prior to processing via learnable machines. (iv) We extend the Mixtures-of-Experts (MoE) paradigm to adapt the data cost of inference for any set of constituent experts. We postulate that the minimum acceptable data cost of inference varies for different input space partitions, and consider mixtures where each expert is designed to meet a different set of constraints on input dimensionality. To take advantage of the flexibility of such mixtures in processing different input representations and modalities, we train biased gating functions such that experts requiring less information to make their inferences are favoured to others. We finally note that, our proposed data utility optimization solutions include a learnable component which considers specified priorities on the amount of information to be used prior to inference, and can be realized for any combination of tasks, modalities, and constraints on available data

    Learning, Categorization, Rule Formation, and Prediction by Fuzzy Neural Networks

    Full text link
    National Science Foundation (IRI 94-01659); Office of Naval Research (N00014-91-J-4100, N00014-92-J-4015) Air Force Office of Scientific Research (90-0083, N00014-92-J-4015

    Towards deep unsupervised inverse graphics

    Full text link
    Un objectif de longue date dans le domaine de la vision par ordinateur est de déduire le contenu 3D d’une scène à partir d’une seule photo, une tâche connue sous le nom d’inverse graphics. L’apprentissage automatique a, dans les dernières années, permis à de nombreuses approches de faire de grands progrès vers la résolution de ce problème. Cependant, la plupart de ces approches requièrent des données de supervision 3D qui sont coûteuses et parfois impossible à obtenir, ce qui limite les capacités d’apprentissage de telles œuvres. Dans ce travail, nous explorons l’architecture des méthodes d’inverse graphics non-supervisées et proposons deux méthodes basées sur des représentations 3D et algorithmes de rendus différentiables distincts: les surfels ainsi qu’une nouvelle représentation basée sur Voronoï. Dans la première méthode basée sur les surfels, nous montrons que, bien qu’efficace pour maintenir la cohérence visuelle, la production de surfels à l’aide d’une carte de profondeur apprise entraîne des ambiguïtés car la relation entre la carte de profondeur et le rendu n’est pas bijective. Dans notre deuxième méthode, nous introduisons une nouvelle représentation 3D basée sur les diagrammes de Voronoï qui modélise des objets/scènes à la fois explicitement et implicitement, combinant ainsi les avantages des deux approches. Nous montrons comment cette représentation peut être utilisée à la fois dans un contexte supervisé et non-supervisé et discutons de ses avantages par rapport aux représentations 3D traditionnellesA long standing goal of computer vision is to infer the underlying 3D content in a scene from a single photograph, a task known as inverse graphics. Machine learning has, in recent years, enabled many approaches to make great progress towards solving this problem. However, most approaches rely on 3D supervision data which is expensive and sometimes impossible to obtain and therefore limits the learning capabilities of such work. In this work, we explore the deep unsupervised inverse graphics training pipeline and propose two methods based on distinct 3D representations and associated differentiable rendering algorithms: namely surfels and a novel Voronoi-based representation. In the first method based on surfels, we show that, while effective at maintaining view-consistency, producing view-dependent surfels using a learned depth map results in ambiguities as the mapping between depth map and rendering is non-bijective. In our second method, we introduce a novel 3D representation based on Voronoi diagrams which models objects/scenes both explicitly and implicitly simultaneously, thereby combining the benefits of both. We show how this representation can be used in both a supervised and unsupervised context and discuss its advantages compared to traditional 3D representations

    High-Capacity Directional Graph Networks

    Get PDF
    Deep Neural Networks (DNN) have proven themselves to be a useful tool in many computer vision problems. One of the most popular forms of the DNN is the Convolutional Neural Network (CNN). The CNN effectively learns features on images by learning a weighted sum of local neighborhoods of pixels, creating filtered versions of the image. Point cloud analysis seems like it would benefit from this useful model. However, point clouds are much less structured than images. Many analogues to CNNs for point clouds have been proposed in the literature, but they are often much more constrained networks than the typical CNN. This is a matter of necessity: common point cloud benchmark datasets are fairly small and thus require strong regularization to mitigate overfitting. In this dissertation we propose two point cloud network models based on graph structures that achieve the high-capacity modeling capability of CNNs. In addition to showing their effectiveness on point cloud classification and segmentation in typical benchmark scenarios, we also propose two novel point cloud problems: ATLAS Detector segmentation and Computational Fluid Dynamics (CFD) surrogate modeling. We show that our networks are much more effective than others on these new problems because they benefit from deeper networks and extra capacity that other researchers have not pursued. These novel networks and datasets pave the way for future development of deeper, more sophisticated point cloud networks

    3D Representation Learning for Shape Reconstruction and Understanding

    Get PDF
    The real world we are living in is inherently composed of multiple 3D objects. However, most of the existing works in computer vision traditionally either focus on images or videos where the 3D information inevitably gets lost due to the camera projection. Traditional methods typically rely on hand-crafted algorithms and features with many constraints and geometric priors to understand the real world. However, following the trend of deep learning, there has been an exponential growth in the number of research works based on deep neural networks to learn 3D representations for complex shapes and scenes, which lead to many cutting-edged applications in augmented reality (AR), virtual reality (VR) and robotics as one of the most important directions for computer vision and computer graphics. This thesis aims to build an intelligent system with dynamic 3D representations that can change over time to understand and recover the real world with semantic, instance and geometric information and eventually bridge the gap between the real world and the digital world. As the first step towards the challenges, this thesis explores both explicit representations and implicit representations by explicitly addressing the existing open problems in these areas. This thesis starts from neural implicit representation learning on 3D scene representation learning and understanding and moves to a parametric model based explicit 3D reconstruction method. Extensive experimentation over various benchmarks on various domains demonstrates the superiority of our method against previous state-of-the-art approaches, enabling many applications in the real world. Based on the proposed methods and current observations of open problems, this thesis finally presents a comprehensive conclusion with potential future research directions
    • …
    corecore