155,926 research outputs found
Generalized Label-Efficient 3D Scene Parsing via Hierarchical Feature Aligned Pre-Training and Region-Aware Fine-tuning
Deep neural network models have achieved remarkable progress in 3D scene
understanding while trained in the closed-set setting and with full labels.
However, the major bottleneck for current 3D recognition approaches is that
they do not have the capacity to recognize any unseen novel classes beyond the
training categories in diverse kinds of real-world applications. In the
meantime, current state-of-the-art 3D scene understanding approaches primarily
require high-quality labels to train neural networks, which merely perform well
in a fully supervised manner. This work presents a generalized and simple
framework for dealing with 3D scene understanding when the labeled scenes are
quite limited. To extract knowledge for novel categories from the pre-trained
vision-language models, we propose a hierarchical feature-aligned pre-training
and knowledge distillation strategy to extract and distill meaningful
information from large-scale vision-language models, which helps benefit the
open-vocabulary scene understanding tasks. To leverage the boundary
information, we propose a novel energy-based loss with boundary awareness
benefiting from the region-level boundary predictions. To encourage latent
instance discrimination and to guarantee efficiency, we propose the
unsupervised region-level semantic contrastive learning scheme for point
clouds, using confident predictions of the neural network to discriminate the
intermediate feature embeddings at multiple stages. Extensive experiments with
both indoor and outdoor scenes demonstrated the effectiveness of our approach
in both data-efficient learning and open-world few-shot learning. All codes,
models, and data are made publicly available at:
https://drive.google.com/drive/folders/1M58V-PtR8DBEwD296zJkNg_m2qq-MTAP?usp=sharing.Comment: IEEE Transactions on Pattern Analysis and Machine Intelligence,
Manuscript Info: 22 Pages, 16 Figures, and 8 Table
Local feature selection for multiple instance learning with applications.
Feature selection is a data processing approach that has been successfully and effectively used in developing machine learning algorithms for various applications. It has been proven to effectively reduce the dimensionality of the data and increase the accuracy and interpretability of machine learning algorithms. Conventional feature selection algorithms assume that there is an optimal global subset of features for the whole sample space. Thus, only one global subset of relevant features is learned. An alternative approach is based on the concept of Local Feature Selection (LFS), where each training sample can have its own subset of relevant features. Multiple Instance Learning (MIL) is a variation of traditional supervised learning, also known as single instance learning. In MIL, each object is represented by a set of instances, or a bag. While bags are labeled, the labels of their instances are unknown. The ambiguity of the instance labels makes the feature selection for MIL challenging. Although feature selection in traditional supervised learning has been researched extensively, there are only a few methods for the MIL framework. Moreover, localized feature selection for MIL has not been researched. This dissertation focuses on developing a local feature selection method for the MIL framework. Our algorithm, called Multiple Instance Local Salient Feature Selection (MI-LSFS), searches the feature space to find the relevant features within each bag. We also propose a new multiple instance classification algorithm, called MILES-LFS, that integrates information learned by MI-LSFS during the feature selection process to identify a reduced subset of representative bags and instances. We show that using a more focused subset of prototypes can improve the performance while significantly reducing the computational complexity. Other applications of the proposed MI-LSFS include a new method that uses our MI-LSFS algorithm to explore and investigate the features learned by a Convolutional Neural Network (CNN) model; a visualization method for CNN models, called Gradient-weighted Sample Activation Map (Grad-SAM), that uses the locally learned features of each sample to highlight their relevant and salient parts, and a novel explanation method, called Classifier Explanation by Local Feature Selection (CE-LFS), to explain the decisions of trained models. The proposed MI-LSFS and its applications are validated using several synthetic and real data sets. We report and compare quantitative measures such as Rand Index, Area Under Curve (AUC), and accuracy. We also provide qualitative measures by visualizing and interpreting the selected features and their effects
Lifelong Learning of Spatiotemporal Representations with Dual-Memory Recurrent Self-Organization
Artificial autonomous agents and robots interacting in complex environments
are required to continually acquire and fine-tune knowledge over sustained
periods of time. The ability to learn from continuous streams of information is
referred to as lifelong learning and represents a long-standing challenge for
neural network models due to catastrophic forgetting. Computational models of
lifelong learning typically alleviate catastrophic forgetting in experimental
scenarios with given datasets of static images and limited complexity, thereby
differing significantly from the conditions artificial agents are exposed to.
In more natural settings, sequential information may become progressively
available over time and access to previous experience may be restricted. In
this paper, we propose a dual-memory self-organizing architecture for lifelong
learning scenarios. The architecture comprises two growing recurrent networks
with the complementary tasks of learning object instances (episodic memory) and
categories (semantic memory). Both growing networks can expand in response to
novel sensory experience: the episodic memory learns fine-grained
spatiotemporal representations of object instances in an unsupervised fashion
while the semantic memory uses task-relevant signals to regulate structural
plasticity levels and develop more compact representations from episodic
experience. For the consolidation of knowledge in the absence of external
sensory input, the episodic memory periodically replays trajectories of neural
reactivations. We evaluate the proposed model on the CORe50 benchmark dataset
for continuous object recognition, showing that we significantly outperform
current methods of lifelong learning in three different incremental learning
scenario
- …