
    PREF: Predictability Regularized Neural Motion Fields

    Knowing the 3D motions in a dynamic scene is essential to many vision applications. Recent progress has mainly focused on estimating the activity of specific elements such as humans. In this paper, we leverage a neural motion field to estimate the motion of all points in a multiview setting. Modeling the motion of a dynamic scene from multiview data is challenging due to ambiguities between points of similar color and points with time-varying color. We propose to regularize the estimated motion to be predictable: if the motion from previous frames is known, the motion in the near future should be predictable. We therefore introduce a predictability regularization by first conditioning the estimated motion on latent embeddings, then adopting a predictor network to enforce predictability on the embeddings. The proposed framework, PREF (Predictability REgularized Fields), achieves on-par or better results than state-of-the-art neural motion field-based dynamic scene representation methods, while requiring no prior knowledge of the scene.
    Comment: Accepted at ECCV 2022 (oral). Paper + supplementary material
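
    The regularizer lends itself to a compact illustration. Below is a minimal, hypothetical PyTorch-style sketch of a predictability penalty over per-frame latent embeddings; the module and dimension names are assumptions, not the authors' implementation.

    import torch
    import torch.nn as nn

    class PredictabilityRegularizer(nn.Module):
        """Penalize latent motion embeddings that a predictor cannot forecast."""

        def __init__(self, embed_dim, history=3):
            super().__init__()
            self.history = history
            # Predictor maps a concatenated history of embeddings to the next one.
            self.predictor = nn.Sequential(
                nn.Linear(history * embed_dim, 256),
                nn.ReLU(),
                nn.Linear(256, embed_dim),
            )

        def forward(self, embeddings):
            # embeddings: (T, D) per-frame latents conditioning the motion field.
            past = torch.stack([
                embeddings[t : t + self.history].flatten()
                for t in range(len(embeddings) - self.history)
            ])
            future = embeddings[self.history:]
            # Regularization: squared error between predicted and actual latents.
            return ((self.predictor(past) - future) ** 2).mean()

    Added to the reconstruction loss with a weight, such a term would push the embeddings, and hence the motion they condition, toward temporally predictable trajectories.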

    Progressive Multi-view Human Mesh Recovery with Self-Supervision

    To date, little attention has been paid to multi-view 3D human mesh estimation, despite its real-life applicability (e.g., motion capture, sport analysis) and robustness to single-view ambiguities. Existing solutions typically suffer from poor generalization to new settings, largely due to the limited diversity of image-mesh pairs in multi-view training data. To address this shortcoming, prior work has explored the use of synthetic images. However, beyond the usual visual gap between rendered and target data, synthetic-data-driven multi-view estimators also overfit to the camera viewpoint distribution sampled during training, which usually differs from real-world distributions. Tackling both challenges, we propose a novel simulation-based training pipeline for multi-view human mesh recovery, which (a) relies on intermediate 2D representations that are more robust to the synthetic-to-real domain gap; (b) leverages learnable calibration and triangulation to adapt to more diversified camera setups; and (c) progressively aggregates multi-view information in a canonical 3D space to remove ambiguities in 2D representations. Through extensive benchmarking, we demonstrate the superiority of the proposed solution, especially in unseen in-the-wild scenarios.
    Comment: Accepted by AAAI 2023
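
    To make the triangulation step concrete, the sketch below shows a standard direct linear transform (DLT) that lifts per-view 2D keypoints into 3D given camera projection matrices. This is generic multi-view geometry under assumed input shapes, not the paper's learnable variant.

    import numpy as np

    def triangulate_dlt(points_2d, proj_mats):
        """Triangulate one 3D point from its 2D observations in several views.

        points_2d: (V, 2) pixel coordinates; proj_mats: (V, 3, 4) projections.
        """
        rows = []
        for (x, y), P in zip(points_2d, proj_mats):
            # Each view contributes two linear constraints on the homogeneous point.
            rows.append(x * P[2] - P[0])
            rows.append(y * P[2] - P[1])
        A = np.stack(rows)
        # Least-squares solution: right singular vector of the smallest singular value.
        _, _, vt = np.linalg.svd(A)
        X = vt[-1]
        return X[:3] / X[3]  # back to inhomogeneous coordinates

    In the paper's pipeline, the calibration feeding such constraints is learned rather than fixed, which is what allows adaptation to diverse camera setups.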

    Topographic divergence of atypical cortical asymmetry and atrophy patterns in temporal lobe epilepsy

    Temporal lobe epilepsy, a common drug-resistant epilepsy in adults, is primarily a limbic network disorder associated with predominantly unilateral hippocampal pathology. Structural MRI has provided an in vivo window into whole-brain grey matter alterations in temporal lobe epilepsy relative to controls, by mapping either (i) atypical inter-hemispheric asymmetry or (ii) regional atrophy. However, the similarities and differences between atypical asymmetry and regional atrophy measures have not been systematically investigated. Here, we addressed this gap using the multisite ENIGMA-Epilepsy dataset, comprising MRI brain morphological measures in 732 temporal lobe epilepsy patients and 1418 healthy controls. We compared the spatial distributions of grey matter asymmetry and atrophy in temporal lobe epilepsy; contextualized their topographies relative to spatial gradients in cortical microstructure and functional connectivity, calculated using 207 healthy controls from the Human Connectome Project and an independent dataset containing 23 temporal lobe epilepsy patients and 53 healthy controls; and examined clinical associations using machine learning. We identified a marked divergence in the spatial distributions of atypical inter-hemispheric asymmetry and regional atrophy: the former revealed a temporo-limbic disease signature, while the latter showed diffuse and bilateral patterns. Our findings were robust across individual sites and patients. Cortical atrophy was significantly correlated with disease duration and age at seizure onset, whereas degrees of asymmetry showed no significant relationship to these clinical variables. Our findings highlight that mapping atypical inter-hemispheric asymmetry and regional atrophy taps into two complementary aspects of temporal lobe epilepsy-related pathology, with the former revealing primary substrates in ipsilateral limbic circuits and the latter capturing bilateral disease effects. These findings refine our notion of the neuropathology of temporal lobe epilepsy and may inform the future discovery and validation of complementary MRI biomarkers.
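
    For intuition about the two measures being contrasted, inter-hemispheric asymmetry and regional atrophy are commonly quantified along the following lines; this is a schematic sketch with assumed array layouts, not the ENIGMA-Epilepsy pipeline.

    import numpy as np

    def asymmetry_index(left, right):
        """Per-region normalized left-right difference: 2 * (L - R) / (L + R)."""
        return 2.0 * (left - right) / (left + right)

    def atrophy_z(patients, controls):
        """Regional atrophy as patient z-scores against the control distribution.

        patients: (n_patients, n_regions); controls: (n_controls, n_regions).
        """
        mu = controls.mean(axis=0)
        sigma = controls.std(axis=0, ddof=1)
        return (patients - mu) / sigma

    The key point of the study is that these two maps need not agree: asymmetry isolates lateralized, within-subject differences between hemispheres, whereas atrophy z-scores capture deviations from controls in either hemisphere.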

    Grid-based local feature bundling for efficient object search and localization

    We propose a new grid-based image representation for discriminative visual object search, with the goal of efficiently locating the query object in a large image collection. After extracting local invariant features, we partition the image into non-overlapping rectangular grid cells. Each grid cell bundles the local features within it and is characterized by a histogram of visual words. Given both positive and negative queries, each grid cell is assigned a mutual information score to match and locate the query object. This new image representation brings two major benefits for efficient object search: 1) as each grid cell bundles local features, the spatial contextual information enhances discriminative matching; and 2) it enables faster object localization by searching for the object at the grid level. To evaluate our approach, we perform experiments on the very challenging logo database BelgaLogos [1] of 10,000 images. Comparison with state-of-the-art methods highlights the effectiveness of our approach in both accuracy and speed.
    Accepted version
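
    A toy sketch of the bundling step is given below, with assumed data layouts; the paper's mutual-information scoring of each cell against positive and negative queries is left abstract.

    import numpy as np

    def grid_histograms(keypoints, words, img_w, img_h, nx=8, ny=8, vocab=1000):
        """Bundle local features into per-cell visual-word histograms.

        keypoints: (N, 2) feature locations; words: (N,) visual-word ids.
        Returns an (ny, nx, vocab) array: one histogram per grid cell.
        """
        hists = np.zeros((ny, nx, vocab))
        gx = np.clip((keypoints[:, 0] / img_w * nx).astype(int), 0, nx - 1)
        gy = np.clip((keypoints[:, 1] / img_h * ny).astype(int), 0, ny - 1)
        for cx, cy, w in zip(gx, gy, words):
            hists[cy, cx, w] += 1
        return hists

    Matching then operates on these cell-level histograms rather than on individual features, which is where the claimed gains in discriminability and localization speed come from.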

    Distributed Composite Quantization

    Approximate nearest neighbor (ANN) search is a fundamental problem in computer vision, machine learning and information retrieval. Recently, quantization-based methods have drawn a lot of attention due to their superior accuracy and comparable efficiency relative to traditional hashing techniques. However, despite the prosperity of quantization techniques, they are all designed for the centralized setting, i.e., quantization is performed on data residing on a single machine. This makes it difficult to scale these techniques to large-scale datasets. Building upon Composite Quantization, we propose a novel quantization algorithm for data distributed across the nodes of an arbitrary network. The proposed Distributed Composite Quantization (DCQ) decomposes Composite Quantization into a set of decentralized sub-problems such that each node solves its own sub-problem on its local data, while still attaining consistent quantizers thanks to a consensus constraint. Since no training data is exchanged across nodes during learning, the communication cost of our method is low. Extensive experiments on ANN search and image retrieval tasks validate that the proposed DCQ significantly improves Composite Quantization in both efficiency and scale, while maintaining competitive accuracy.
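
    The consensus idea can be pictured as each node alternating local codebook updates with averaging over its network neighbors. The following is a simplified decentralized averaging step under assumed array shapes, not the paper's actual optimization.

    import numpy as np

    def consensus_step(local_codebooks, adjacency):
        """One round of neighbor averaging that pulls per-node codebooks together.

        local_codebooks: (n_nodes, M, K, d) array, i.e., M codebooks of K
        d-dimensional centers per node; adjacency: (n_nodes, n_nodes) 0/1
        matrix, assumed symmetric with self-loops.
        """
        # Row-normalize the adjacency to obtain row-stochastic mixing weights.
        weights = adjacency / adjacency.sum(axis=1, keepdims=True)
        # Each node's new codebooks are a weighted average of its neighbors'.
        return np.einsum("ij,jmkd->imkd", weights, local_codebooks)

    Only codebook parameters cross the network in such a scheme, never training vectors, which is consistent with the low communication cost claimed above.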

    Randomized visual phrases for object search

    Accurate matching of local features plays an essential role in visual object search. Instead of matching individual features separately, using spatial context, e.g., bundling a group of co-located features into a visual phrase, has been shown to enable more discriminative matching. Despite previous work, extracting appropriate spatial context for matching remains a challenging problem. We propose a randomized approach to deriving visual phrases, in the form of spatial random partitions. By averaging the matching scores over multiple randomized visual phrases, our approach offers three benefits: 1) aggregating the matching scores over a collection of visual phrases of varying sizes and shapes provides robust local matching; 2) object localization is achieved by simple thresholding of the voting map, which is more efficient than subimage search; and 3) our algorithm lends itself to easy parallelization and allows a flexible trade-off between accuracy and speed by adjusting the number of partitions. Both theoretical studies and experimental comparisons with state-of-the-art methods validate the advantages of our approach.
    Accepted version
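
    A compact sketch of the voting scheme follows, with assumed inputs and a deliberately simple per-cell score (summed feature matching scores); the paper's exact scoring may differ.

    import numpy as np

    def voting_map(match_xy, match_scores, img_w, img_h, n_partitions=20, seed=0):
        """Average matched-feature votes over multiple random grid partitions.

        match_xy: (N, 2) locations of features matched to the query;
        match_scores: (N,) their matching scores. Returns an (img_h, img_w)
        map that can be thresholded to localize the object.
        """
        rng = np.random.default_rng(seed)
        vmap = np.zeros((img_h, img_w))
        for _ in range(n_partitions):
            # Random cell sizes emulate visual phrases of varying size and shape.
            cw = int(rng.integers(img_w // 8, img_w // 2))
            ch = int(rng.integers(img_h // 8, img_h // 2))
            cell_score = {}
            for (x, y), s in zip(match_xy, match_scores):
                key = (int(x // cw), int(y // ch))
                cell_score[key] = cell_score.get(key, 0.0) + s
            # Every pixel votes with the score of the cell it falls into.
            for (cx, cy), s in cell_score.items():
                vmap[cy * ch:(cy + 1) * ch, cx * cw:(cx + 1) * cw] += s
        return vmap / n_partitions

    Thresholding the resulting map yields candidate object regions directly, avoiding an explicit search over subimages, and n_partitions gives the accuracy/speed knob mentioned above.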

    Tensorized Projection for High-Dimensional Binary Embedding

    Embedding high-dimensional visual features (d-dimensional) into binary codes (b-dimensional) has shown advantages in various vision tasks such as object recognition and image retrieval. Meanwhile, recent works have demonstrated that to fully utilize the representation power of high-dimensional features, it is critical to encode them into long binary codes rather than short ones, i.e., b ~ O(d). However, generating long binary codes involves a large projection matrix and high-dimensional matrix-vector multiplication, and is thus memory- and computation-intensive. To tackle these problems, we propose Tensorized Projection (TP), which decomposes the projection matrix using the Tensor-Train (TT) format, a chain-like representation that allows tensors to be operated on efficiently. As a result, TP drastically reduces the computational complexity and memory cost. Moreover, by using the TT format, TP can regularize the projection matrix against the risk of over-fitting, consequently leading to better performance than either a dense projection matrix (as in ITQ) or a sparse projection matrix. Experimental comparisons with state-of-the-art methods over various visual tasks demonstrate both the efficiency and performance advantages of the proposed TP, especially when generating high-dimensional binary codes, i.e., when b ≥ d.
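
    The computational trick is easiest to see in code. Below is a minimal sketch, under assumed core shapes, of applying a TT-factored projection to a feature vector core by core, so the dense b x d matrix is never materialized; it illustrates the TT format generally, not the authors' implementation.

    import numpy as np

    def tt_matvec(cores, x):
        """Apply a TT-format matrix to a vector without forming the matrix.

        cores[k] has shape (r_k, m_k, n_k, r_{k+1}) with r_0 = r_d = 1; the
        implicit matrix is (prod m_k) x (prod n_k), e.g., b x d.
        """
        v = x.reshape(1, 1, -1)  # (tt_rank, output_done, input_remaining)
        for G in cores:
            r, m, n, s = G.shape
            v = v.reshape(r, v.shape[1], n, -1)     # peel off next input mode
            v = np.einsum("rmns,ronj->somj", G, v)  # contract rank + input mode
            v = v.reshape(s, -1, v.shape[-1])       # fold new output mode in
        return v.ravel()

    # A binary code would then be, e.g., np.sign(tt_matvec(cores, feature)).

    With small TT ranks, the cost grows roughly with the number of modes and the rank, rather than with the full b x d product, which is the source of the memory and compute savings.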