A Survey on Continual Semantic Segmentation: Theory, Challenge, Method and Application
Continual learning, also known as incremental learning or life-long learning,
stands at the forefront of deep learning and AI systems. It breaks through the
obstacle of one-way training on closed sets and enables continuous adaptive
learning under open-set conditions. In the past decade, continual learning has
been explored and applied in multiple fields, especially in computer vision,
covering classification, detection and segmentation tasks. Continual semantic
segmentation (CSS) is a challenging, intricate and burgeoning task, owing to
its dense-prediction nature. In this paper, we present a review
of CSS, aiming to build a comprehensive survey of problem formulations,
primary challenges, widely used datasets, emerging theories and diverse
applications. Concretely, we begin by elucidating the problem definitions and
primary challenges. Based on an in-depth investigation of relevant approaches,
we categorize current CSS models into two main branches,
\textit{data-replay} and \textit{data-free} methods. In each branch, the
corresponding approaches are clustered by similarity and thoroughly
analyzed, followed by qualitative comparisons and quantitative reproductions on
relevant datasets. We also introduce four CSS specialties with
diverse application scenarios and development trends. Furthermore, we
develop a benchmark for CSS encompassing representative references, evaluation
results and reproductions, which is available
at~\url{https://github.com/YBIO/SurveyCSS}. We hope this survey can serve as a
reference-worthy and stimulating contribution to the advancement of the
life-long learning field, while also providing valuable perspectives for
related fields.
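To make the \textit{data-free} branch concrete, here is a minimal sketch of knowledge distillation, a representative strategy in that family: new classes are supervised with ground truth while the frozen previous model's predictions constrain the old classes. The interfaces, loss weighting and function names below are illustrative assumptions, not the survey's notation.

import torch
import torch.nn.functional as F

def data_free_css_loss(model, old_model, images, masks, new_class_ids, alpha=1.0):
    # Current model predicts over all classes seen so far plus the new ones.
    logits = model(images)                       # (B, C_old + C_new, H, W)
    with torch.no_grad():
        old_logits = old_model(images)           # (B, C_old, H, W), frozen

    # Cross-entropy only on pixels annotated with the newly introduced classes.
    new_ids = torch.tensor(new_class_ids, device=masks.device)
    new_pixels = torch.isin(masks, new_ids)
    ce = F.cross_entropy(logits, masks.clamp(min=0), reduction="none")
    ce = (ce * new_pixels).sum() / new_pixels.sum().clamp(min=1)

    # Distill old-class predictions from the previous model to curb forgetting.
    n_old = old_logits.shape[1]
    kd = F.kl_div(F.log_softmax(logits[:, :n_old], dim=1),
                  F.softmax(old_logits, dim=1), reduction="batchmean")
    return ce + alpha * kd

A data-replay method would instead add a rehearsal term computed on stored or generated samples of old classes; the distillation term here is the usual data-free substitute for that memory.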
CAFE: Learning to Condense Dataset by Aligning Features
Dataset condensation aims to reduce network training effort by
condensing a cumbersome training set into a compact synthetic one.
State-of-the-art approaches largely rely on learning the synthetic data by
matching the gradients between the real and synthetic data batches. Despite the
intuitive motivation and promising results, such gradient-based methods, by
nature, easily overfit to a biased set of samples that produce dominant
gradients, and thus lack global supervision of data distribution. In this
paper, we propose a novel scheme to Condense dataset by Aligning FEatures
(CAFE), which explicitly attempts to preserve the real-feature distribution as
well as the discriminant power of the resulting synthetic set, lending itself
to strong generalization capability to various architectures. At the heart of
our approach is an effective strategy to align features from the real and
synthetic data across various scales, while accounting for the classification
of real samples. Our scheme is further backed up by a novel dynamic bi-level
optimization, which adaptively adjusts parameter updates to prevent
over-/under-fitting. We validate the proposed CAFE across various datasets, and
demonstrate that it generally outperforms the state of the art: on the SVHN
dataset, for example, the performance gain is up to 11%. Extensive experiments
and analyses verify the effectiveness and necessity of the proposed designs.
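The core idea, matching layer-wise feature statistics between real and synthetic batches rather than gradients, can be sketched roughly as follows. The network interface (a hypothetical net.features returning per-layer feature maps) and the simple mean-matching objective are assumptions for illustration; the paper's full method also accounts for classifying real samples and wraps this in a dynamic bi-level optimization.

import torch

def feature_alignment_loss(net, real_images, syn_images):
    # Assumes net.features(x) returns a list of feature maps, one per layer.
    with torch.no_grad():
        real_feats = net.features(real_images)
    syn_feats = net.features(syn_images)
    loss = syn_images.new_zeros(())
    for fr, fs in zip(real_feats, syn_feats):
        # Align channel-wise feature means over batch and spatial dimensions,
        # so every scale of the network sees matching statistics.
        loss = loss + ((fr.mean(dim=(0, 2, 3)) - fs.mean(dim=(0, 2, 3))) ** 2).sum()
    return loss

# The synthetic images themselves are the learnable parameters:
# syn_images = torch.nn.Parameter(torch.randn(n_syn, 3, 32, 32))
# feature_alignment_loss(net, real_batch, syn_images).backward()

Matching distributional statistics at every scale, instead of a single batch gradient, is what gives the method its global view of the data distribution and reduces overfitting to gradient-dominant samples.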
On-line learning with minimal degradation in feedforward networks
Dealing with non-stationary processes requires quick adaptation while at the same time avoiding catastrophic forgetting. A neural learning technique that satisfies these requirements, without sacrificing the benefits of distributed representations, is presented. It relies on a formalization of the problem as the minimization of the error over the previously learned input-output (i-o) patterns, subject to the constraint of perfect encoding of the new pattern. This constrained optimization problem is then transformed into an unconstrained one with hidden-unit activations as variables. The new formulation naturally leads to an algorithm for solving the problem, which we call Learning with Minimal Degradation (LMD). Experimental comparisons of LMD with back-propagation are provided which, besides showing the advantages of using LMD, reveal the dependence of forgetting on the learning rate in back-propagation. We also explain why overtraining affects forgetting and fault-tolerance, which are seen as related problems.
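Read literally, the abstract describes the following constrained program (the notation here is ours, not the paper's):
\[
\min_{W}\ \sum_{p \in \mathcal{P}_{\mathrm{old}}} \bigl\lVert f_{W}(x_{p}) - y_{p} \bigr\rVert^{2}
\quad \text{subject to} \quad f_{W}(x_{\mathrm{new}}) = y_{\mathrm{new}},
\]
where $f_{W}$ is the network, $\mathcal{P}_{\mathrm{old}}$ the set of previously learned i-o patterns, and $(x_{\mathrm{new}}, y_{\mathrm{new}})$ the new pattern. LMD then takes hidden-unit activations rather than $W$ as the free variables, which absorbs the equality constraint and yields an unconstrained problem.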
THE PSYCHOMETRIC PROPERTIES OF A SOCIAL-EMOTIONAL LEARNING MEASURE
Each year many students take college admissions exams (i.e., SAT® and ACT®), hoping to demonstrate their ability to perform at a collegiate level and gain admission to desired universities. However, a growing movement encourages colleges and universities to abandon this practice in their admissions protocol and instead consider alternative factors, such as social-emotional learning skills, to identify promising applicants. As such, this study examined the psychometric properties of a novel social-emotional learning measure, ACT® Tessera®, which conceptualizes social-emotional traits through the Five-Factor Model lens using different measurement methods (Self Report Likert, Situational Judgement Tests, Forced Choice). Using data obtained from an undergraduate student sample at a metropolitan university, reliability and validity analyses revealed promising evidence for the scale's ability to measure social-emotional skills. However, recommendations for future scale iterations are made to improve the scales' psychometric properties. Then, ACT® Tessera® social-emotional trait measures were assessed alongside traditional college achievement predictors (intelligence, cognitive ability, standardized test scores) to determine their ability to predict undergraduate success. Preliminary evidence provided by this study suggests that considering social-emotional traits in conjunction with high school GPA may provide useful predictions of university success, without standardized test scores. Suggestions for future research and implications for school psychologists are discussed.
TRACE: A Comprehensive Benchmark for Continual Learning in Large Language Models
Aligned large language models (LLMs) demonstrate exceptional capabilities in
task-solving, following instructions, and ensuring safety. However, the
continual learning aspect of these aligned LLMs has been largely overlooked.
Existing continual learning benchmarks lack sufficient challenge for leading
aligned LLMs, owing to both their simplicity and the models' potential exposure
during instruction tuning. In this paper, we introduce TRACE, a novel benchmark
designed to evaluate continual learning in LLMs. TRACE consists of 8 distinct
datasets spanning challenging tasks including domain-specific tasks,
multilingual capabilities, code generation, and mathematical reasoning. All
datasets are standardized into a unified format, allowing for effortless
automatic evaluation of LLMs. Our experiments show that after training on
TRACE, aligned LLMs exhibit significant declines in both general ability and
instruction-following capabilities. For example, the accuracy of llama2-chat
13B on the gsm8k dataset declined precipitously from 28.8\% to 2\% after training
on our datasets. This highlights the challenge of finding a suitable tradeoff
between achieving strong performance on specific tasks and preserving the original
prowess of LLMs. Empirical findings suggest that tasks inherently equipped with
reasoning paths contribute significantly to preserving certain capabilities of
LLMs against potential declines. Motivated by this, we introduce the
Reasoning-augmented Continual Learning (RCL) approach. RCL integrates
task-specific cues with meta-rationales, effectively reducing catastrophic
forgetting in LLMs while expediting convergence on novel tasks.
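The abstract leaves RCL's mechanics open; one plausible reading, sketched below, pairs each training example with a task-specific cue and a rationale so that fine-tuning rehearses the model's reasoning path rather than bare labels. The field names and prompt layout are illustrative assumptions, not TRACE's actual format.

def build_rcl_example(task_name, question, rationale, answer):
    # Hypothetical RCL-style formatting: a task-specific cue plus a
    # meta-rationale precede the answer, so fine-tuning rehearses the
    # model's reasoning rather than bare input-output pairs.
    prompt = (f"[TASK: {task_name}]\n"
              f"Question: {question}\n"
              "Let's reason step by step.\n")
    completion = f"{rationale}\nAnswer: {answer}"
    return {"prompt": prompt, "completion": completion}

# Example: a gsm8k-style item keeps its reasoning path during fine-tuning.
example = build_rcl_example(
    "gsm8k",
    "Tom has 3 boxes with 4 apples each. How many apples in total?",
    "3 boxes times 4 apples per box gives 12 apples.",
    "12",
)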