
    Generalized Label-Efficient 3D Scene Parsing via Hierarchical Feature Aligned Pre-Training and Region-Aware Fine-tuning

    Deep neural network models have achieved remarkable progress in 3D scene understanding when trained in the closed-set setting with full labels. However, the major bottleneck for current 3D recognition approaches is that they cannot recognize unseen novel classes beyond the training categories, which limits them in diverse real-world applications. Moreover, current state-of-the-art 3D scene understanding approaches primarily require high-quality labels to train neural networks and perform well only in a fully supervised manner. This work presents a generalized and simple framework for 3D scene understanding when labeled scenes are quite limited. To obtain knowledge about novel categories, we propose a hierarchical feature-aligned pre-training and knowledge distillation strategy that extracts and distills meaningful information from large-scale pre-trained vision-language models, which benefits open-vocabulary scene understanding tasks. To leverage boundary information, we propose a novel energy-based loss with boundary awareness that benefits from region-level boundary predictions. To encourage latent instance discrimination and to guarantee efficiency, we propose an unsupervised region-level semantic contrastive learning scheme for point clouds, which uses confident predictions of the neural network to discriminate the intermediate feature embeddings at multiple stages. Extensive experiments on both indoor and outdoor scenes demonstrate the effectiveness of our approach in both data-efficient learning and open-world few-shot learning. All code, models, and data are made publicly available at: https://drive.google.com/drive/folders/1M58V-PtR8DBEwD296zJkNg_m2qq-MTAP?usp=sharing.
    Comment: IEEE Transactions on Pattern Analysis and Machine Intelligence, Manuscript Info: 22 Pages, 16 Figures, and 8 Tables
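The abstract above describes contrastive learning over region embeddings that relies only on confident network predictions. A minimal sketch of that general idea follows. It is not the authors' implementation: the function name `confident_contrastive_loss`, the `threshold` and `temperature` values, the cosine similarity, and the InfoNCE-style form of the loss are all assumptions made for illustration, and the sketch covers a single feature stage rather than multiple stages.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def confident_contrastive_loss(
    embeddings: List[Sequence[float]],   # one embedding per region (assumed non-zero)
    probs: List[Sequence[float]],        # predicted class probabilities per region
    threshold: float = 0.9,              # hypothetical confidence cut-off
    temperature: float = 0.1,            # hypothetical softmax temperature
) -> float:
    # Keep only regions whose top predicted probability is confident enough.
    keep = [i for i, p in enumerate(probs) if max(p) >= threshold]
    # Use the argmax class of each kept region as its pseudo-label.
    pseudo = {i: max(range(len(probs[i])), key=probs[i].__getitem__) for i in keep}

    losses = []
    for a in keep:
        others = [j for j in keep if j != a]
        positives = [j for j in others if pseudo[j] == pseudo[a]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        logits = {j: cosine(embeddings[a], embeddings[j]) / temperature for j in others}
        log_denom = math.log(sum(math.exp(v) for v in logits.values()))
        # Average negative log-probability of picking a same-label region.
        losses.append(-sum(logits[p] - log_denom for p in positives) / len(positives))
    return sum(losses) / len(losses) if losses else 0.0


# Toy data: regions 0-1 and 2-3 form two clusters; region 4 is unconfident.
emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.5, 0.5]]
aligned_probs = [[0.95, 0.05], [0.97, 0.03], [0.02, 0.98], [0.04, 0.96], [0.55, 0.45]]
mixed_probs = [[0.95, 0.05], [0.02, 0.98], [0.97, 0.03], [0.04, 0.96], [0.55, 0.45]]

aligned_loss = confident_contrastive_loss(emb, aligned_probs)
mixed_loss = confident_contrastive_loss(emb, mixed_probs)
```

In this toy setup the loss is small when pseudo-labels agree with the embedding clusters and larger when they do not, and the unconfident region never enters the loss.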

    Local feature selection for multiple instance learning with applications.

    Feature selection is a data processing approach that has been used successfully in developing machine learning algorithms for various applications. It has been shown to reduce the dimensionality of the data and to increase the accuracy and interpretability of machine learning algorithms. Conventional feature selection algorithms assume that there is an optimal global subset of features for the whole sample space. Thus, only one global subset of relevant features is learned. An alternative approach is based on the concept of Local Feature Selection (LFS), where each training sample can have its own subset of relevant features. Multiple Instance Learning (MIL) is a variation of traditional supervised learning, which is also known as single instance learning. In MIL, each object is represented by a set of instances, or a bag. While bags are labeled, the labels of their instances are unknown. The ambiguity of the instance labels makes feature selection for MIL challenging. Although feature selection in traditional supervised learning has been researched extensively, there are only a few methods for the MIL framework. Moreover, localized feature selection for MIL has not been researched. This dissertation focuses on developing a local feature selection method for the MIL framework. Our algorithm, called Multiple Instance Local Salient Feature Selection (MI-LSFS), searches the feature space to find the relevant features within each bag. We also propose a new multiple instance classification algorithm, called MILES-LFS, that integrates information learned by MI-LSFS during the feature selection process to identify a reduced subset of representative bags and instances. We show that using a more focused subset of prototypes can improve performance while significantly reducing computational complexity.
    Other applications of the proposed MI-LSFS include: a new method that uses the algorithm to explore and investigate the features learned by a Convolutional Neural Network (CNN) model; a visualization method for CNN models, called Gradient-weighted Sample Activation Map (Grad-SAM), that uses the locally learned features of each sample to highlight its relevant and salient parts; and a novel explanation method, called Classifier Explanation by Local Feature Selection (CE-LFS), that explains the decisions of trained models. The proposed MI-LSFS and its applications are validated using several synthetic and real data sets. We report and compare quantitative measures such as Rand Index, Area Under Curve (AUC), and accuracy. We also provide qualitative measures by visualizing and interpreting the selected features and their effects.
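The MIL setting described above (labeled bags, unlabeled instances, and a feature subset chosen per bag rather than globally) can be illustrated with a small sketch. This is not MI-LSFS: the `Bag` class and `select_local_features` are hypothetical names, and within-bag variance is only a placeholder selection criterion chosen to keep the example short.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Bag:
    """A bag of instances; only the bag carries a label."""
    instances: List[List[float]]  # each instance is a feature vector
    label: int                    # bag-level label; instance labels are unknown


def feature_variances(bag: Bag) -> List[float]:
    """Population variance of each feature across the instances of one bag."""
    n = len(bag.instances)
    dims = len(bag.instances[0])
    variances = []
    for d in range(dims):
        column = [inst[d] for inst in bag.instances]
        mean = sum(column) / n
        variances.append(sum((v - mean) ** 2 for v in column) / n)
    return variances


def select_local_features(bag: Bag, k: int) -> List[int]:
    """Placeholder local selector: indices of the k highest-variance features in this bag."""
    var = feature_variances(bag)
    ranked = sorted(range(len(var)), key=lambda d: var[d], reverse=True)
    return sorted(ranked[:k])


bags = [
    Bag(instances=[[0.0, 1.0, 5.0], [0.0, 3.0, 5.0]], label=1),
    Bag(instances=[[2.0, 1.0, 5.0], [8.0, 1.0, 5.0]], label=0),
]

# Each bag gets its own feature subset instead of one global subset.
local_subsets = [select_local_features(b, k=1) for b in bags]
```

Here the first bag selects feature 1 and the second selects feature 0, which shows how a local scheme can disagree with any single global subset.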

    Lifelong Learning of Spatiotemporal Representations with Dual-Memory Recurrent Self-Organization

    Artificial autonomous agents and robots interacting in complex environments are required to continually acquire and fine-tune knowledge over sustained periods of time. The ability to learn from continuous streams of information is referred to as lifelong learning and represents a long-standing challenge for neural network models due to catastrophic forgetting. Computational models of lifelong learning typically alleviate catastrophic forgetting in experimental scenarios with given datasets of static images and limited complexity, thereby differing significantly from the conditions artificial agents are exposed to. In more natural settings, sequential information may become progressively available over time and access to previous experience may be restricted. In this paper, we propose a dual-memory self-organizing architecture for lifelong learning scenarios. The architecture comprises two growing recurrent networks with the complementary tasks of learning object instances (episodic memory) and categories (semantic memory). Both growing networks can expand in response to novel sensory experience: the episodic memory learns fine-grained spatiotemporal representations of object instances in an unsupervised fashion, while the semantic memory uses task-relevant signals to regulate structural plasticity levels and develop more compact representations from episodic experience. For the consolidation of knowledge in the absence of external sensory input, the episodic memory periodically replays trajectories of neural reactivations. We evaluate the proposed model on the CORe50 benchmark dataset for continuous object recognition, showing that we significantly outperform current methods of lifelong learning in three different incremental learning scenarios.