Achieving a Better Stability-Plasticity Trade-off via Auxiliary Networks in Continual Learning
In contrast to the natural capabilities of humans to learn new tasks in a
sequential fashion, neural networks are known to suffer from catastrophic
forgetting, where the model's performance on old tasks drops dramatically after
being optimized for a new task. To address this, the continual learning (CL)
community has proposed several solutions aiming to equip the neural network
with the ability to learn the current task (plasticity) while still achieving
high accuracy on the previous tasks (stability). Despite remarkable
improvements, the plasticity-stability trade-off is still far from being solved
and its underlying mechanism is poorly understood. In this work, we propose
Auxiliary Network Continual Learning (ANCL), a novel method that attaches an
additional auxiliary network, which promotes plasticity, to the continually
learned model, which mainly focuses on stability. More concretely, the proposed
framework materializes in a regularizer that naturally interpolates between
plasticity and stability, surpassing strong baselines in task-incremental and
class-incremental scenarios. Through extensive analyses of ANCL solutions, we
identify some essential principles underlying the stability-plasticity trade-off.
Comment: CVPR 2023
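To make the interpolation concrete, here is a minimal PyTorch sketch of an ANCL-style objective. The quadratic parameter-space penalties and the weights lam/lam_a are illustrative assumptions, not the paper's exact regularizer: one term pulls the model toward the previous-task weights (stability), the other toward an auxiliary network trained freely on the new task (plasticity).

```python
import torch
import torch.nn.functional as F

def ancl_style_loss(model, old_model, aux_model, x, y, lam=1.0, lam_a=1.0):
    """Task loss plus two pulls: toward the frozen old model (stability)
    and toward the plastic auxiliary model (plasticity). Illustrative only."""
    task_loss = F.cross_entropy(model(x), y)
    stability = sum(((p - p_old.detach()) ** 2).sum()
                    for p, p_old in zip(model.parameters(), old_model.parameters()))
    plasticity = sum(((p - p_aux.detach()) ** 2).sum()
                     for p, p_aux in zip(model.parameters(), aux_model.parameters()))
    # lam and lam_a set where the solution lands on the stability-plasticity axis.
    return task_loss + lam * stability + lam_a * plasticity
```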
Depthwise Hyperparameter Transfer in Residual Networks: Dynamics and Scaling Limit
The cost of hyperparameter tuning in deep learning has been rising with model
sizes, prompting practitioners to tune hyperparameters on smaller proxy
networks. One such proposal uses μP-parameterized networks, where the optimal
hyperparameters for small-width networks transfer to networks with arbitrarily
large width. However, in this scheme, hyperparameters do not transfer across
depths. As a remedy, we study residual networks with a residual branch scale of
1/√depth in combination with the μP parameterization. We provide experiments
demonstrating that residual
architectures, including convolutional ResNets and Vision Transformers, trained
with this parameterization exhibit transfer of optimal hyperparameters across
width and depth on CIFAR-10 and ImageNet. Furthermore, our empirical findings
are supported and motivated by theory. Using recent developments in the
dynamical mean field theory (DMFT) description of neural network learning
dynamics, we show that this parameterization of ResNets admits a well-defined
feature learning joint infinite-width and infinite-depth limit, and we show
convergence of finite-size network dynamics towards this limit.
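As a concrete illustration of the depth scaling, here is a toy residual MLP in PyTorch with each residual branch multiplied by 1/√depth; the μP width-side scaling and the paper's actual ResNet/ViT architectures are omitted, so this is a sketch of the principle rather than the experimental setup.

```python
import torch
import torch.nn as nn

class DepthScaledResMLP(nn.Module):
    """Toy residual network with the 1/sqrt(depth) branch scale discussed above."""
    def __init__(self, width: int, depth: int):
        super().__init__()
        self.scale = depth ** -0.5  # residual branch scale: 1/sqrt(L)
        self.blocks = nn.ModuleList(
            [nn.Sequential(nn.Linear(width, width), nn.ReLU()) for _ in range(depth)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for block in self.blocks:
            x = x + self.scale * block(x)  # h_{l+1} = h_l + L^{-1/2} f(h_l)
        return x
```

Because the branch contribution shrinks as depth grows, making the network deeper does not change the scale of the layer-to-layer updates, which is what lets a hyperparameter choice tuned at small depth remain near-optimal at large depth.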
Dynamic Context Pruning for Efficient and Interpretable Autoregressive Transformers
Autoregressive Transformers adopted in Large Language Models (LLMs) are hard
to scale to long sequences. Despite several works trying to reduce their
computational cost, most LLMs still compute attention between all pairs
of tokens in the sequence, thus incurring a quadratic cost. In this study, we
present a novel approach that dynamically prunes contextual information while
preserving the model's expressiveness, resulting in reduced memory and
computational requirements during inference. Our method employs a learnable
mechanism that determines which uninformative tokens can be dropped from the
context at any point across the generation process. By doing so, our approach
not only addresses performance concerns but also enhances interpretability,
providing valuable insight into the model's decision-making process. Our
technique can be applied to existing pre-trained models through a
straightforward fine-tuning process, and the pruning strength can be specified
by a sparsity parameter. Notably, our empirical findings demonstrate that we
can effectively prune up to 80% of the context without significant performance
degradation on downstream tasks, offering a valuable tool for mitigating
inference costs. Our reference implementation achieves up to a 2× increase
in inference throughput and even greater memory savings.
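The mechanism can be sketched as follows; this is an assumed, simplified form of a learnable pruning gate, not the paper's reference implementation. A small scorer ranks past tokens, and a sparsity parameter fixes the fraction of the context that is dropped before attention is computed.

```python
import torch
import torch.nn as nn

class ContextPruner(nn.Module):
    """Illustrative token-dropping gate: keeps only the highest-scoring
    fraction of past tokens (hard top-k shown for clarity; training would
    require a soft, differentiable relaxation)."""
    def __init__(self, d_model: int, sparsity: float = 0.8):
        super().__init__()
        self.scorer = nn.Linear(d_model, 1)
        self.sparsity = sparsity  # fraction of context tokens to drop

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq_len, d_model) -> boolean keep-mask (batch, seq_len)
        scores = self.scorer(hidden).squeeze(-1)
        n_keep = max(1, int(hidden.size(1) * (1.0 - self.sparsity)))
        keep = scores.topk(n_keep, dim=-1).indices
        mask = torch.zeros_like(scores, dtype=torch.bool).scatter_(1, keep, True)
        return mask  # attention is then restricted to tokens where mask is True
```

Dropping a token removes its key/value entries for all subsequent generation steps, which is where the memory and throughput savings come from.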
How Tempering Fixes Data Augmentation in Bayesian Neural Networks
While Bayesian neural networks (BNNs) provide a sound and principled
alternative to standard neural networks, an artificial sharpening of the
posterior usually needs to be applied to reach comparable performance. This is
in stark contrast to theory, dictating that given an adequate prior and a
well-specified model, the untempered Bayesian posterior should achieve optimal
performance. Despite the community's extensive efforts, the observed gains in
performance remain disputed, with several plausible causes proposed as their
origin. While data augmentation has been empirically recognized as one of the
main drivers of this effect, a theoretical account of its role, on the other
hand, is largely missing. In this work we identify two interlaced factors
concurrently influencing the strength of the cold posterior effect, namely the
correlated nature of augmentations and the degree of invariance of the employed
model to such transformations. By theoretically analyzing simplified settings,
we prove that tempering implicitly reduces the misspecification arising from
modeling augmentations as i.i.d. data. The temperature mimics the role of the
effective sample size, reflecting the gain in information provided by the
augmentations. We corroborate our theoretical findings with extensive empirical
evaluations, scaling to realistic BNNs. By relying on the framework of group
convolutions, we experiment with models of varying inherent degree of
invariance, confirming its hypothesized relationship with the optimal
temperature.
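In equations, using the standard likelihood-tempering formulation (consistent with, though not copied from, the paper's analysis), the cold posterior at temperature T < 1 behaves as if each observation were seen 1/T times:

```latex
% Tempered ("cold", T < 1) posterior: the likelihood is raised to the power 1/T
p_T(\theta \mid \mathcal{D})
  \;\propto\; p(\mathcal{D} \mid \theta)^{1/T}\, p(\theta)
  \;=\; \Big[ \prod_{i=1}^{N} p(x_i \mid \theta) \Big]^{1/T} p(\theta).
% Each of the N (assumed i.i.d.) observations is effectively counted 1/T times,
% so the effective sample size is N/T.
```

This is the effective-sample-size reading invoked above: correlated augmentations of the same example carry less information than i.i.d. data points, and tempering rescales the likelihood to compensate for treating them as if they were independent.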
Signal Propagation in Transformers: Theoretical Perspectives and the Role of Rank Collapse
Transformers have achieved remarkable success in several domains, ranging
from natural language processing to computer vision. Nevertheless, it has been
recently shown that stacking self-attention layers - the distinctive
architectural component of Transformers - can result in rank collapse of the
tokens' representations at initialization. The question of whether and how rank
collapse affects training is still largely unanswered, and its investigation is
necessary for a more comprehensive understanding of this architecture. In this
work, we shed new light on the causes and the effects of this phenomenon.
First, we show that rank collapse of the tokens' representations hinders
training by causing the gradients of the queries and keys to vanish at
initialization. Furthermore, we provide a thorough description of the origin of
rank collapse and discuss how to prevent it via an appropriate depth-dependent
scaling of the residual branches. Finally, our analysis unveils that specific
architectural hyperparameters affect the gradients of queries and values
differently, leading to disproportionate gradient norms. This suggests an
explanation for the widespread use of adaptive methods for Transformers'
optimization.
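Rank collapse at initialization can be checked numerically; the following small diagnostic (our own sketch, not code from the paper) measures how close the token representations are to a rank-one matrix with all rows equal, which is the collapsed configuration described above.

```python
import torch

def rank_collapse_residual(X: torch.Tensor) -> float:
    """Relative norm of X minus its mean token, for X of shape
    (seq_len, d_model). Values near 0 indicate collapsed (rank-one)
    token representations."""
    residual = X - X.mean(dim=0, keepdim=True)  # remove the shared component
    return (residual.norm() / X.norm()).item()
```

Tracking this ratio across layers at initialization makes the decay with depth, and the effect of depth-dependent residual-branch scaling, directly visible.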
Over-Expression of the LH Receptor Increases Distant Metastases in an Endometrial Cancer Mouse Model
Objective: The aim of the present study was to define the role of luteinizing hormone receptor (LH-R) expression in endometrial cancer (EC), using preclinical mouse models, to further transfer these data to the clinical setting. Materials and Methods: The role of LH-R over-expression was studied using EC cells (Hec1A, i.e., cells with low endogenous LH-R expression) transfected with the LH-R (Hec1A-LH-R). In vitro cell proliferation was measured through the WST-1 assay, whereas cell invasion was measured through the Matrigel assay. The effects of LH-R over-expression in vivo were analyzed in an appropriately developed preclinical mouse model of EC, which mimicked postmenopausal conditions. The model consisted of an orthotopic xenograft of Hec1A cells into immunodeficient mice treated daily with recombinant LH to ensure high levels of LH. Results: In vitro data indicated that LH-R over-expression increased Hec1A invasiveness. In vivo results showed that tumors arising from the injection of Hec1A-LH-R cells displayed higher local invasion and a higher number of distant metastases, mainly in the lung, compared to tumors obtained from the injection of Hec1A cells. LH withdrawal strongly inhibited the local and distant metastatic spread of tumors, especially those arising from Hec1A-LH-R cells. Conclusion: Over-expression of the LH-R increases the ability of EC cells to undergo local invasion and metastatic spread. This occurs in the presence of high LH serum concentrations.
Clinical Practice Guideline for the Care of People with Neoplastic Ulcers
Additional material: the clinical practice guideline is complemented by the quick-reference guide "Guía rápida de consulta para el cuidado de personas con úlceras neoplásicas" and by the guide for patients and caregivers "Aprendiendo a conocer y mejorar sus cuidados. Versión para personas que padecen úlceras neoplásicas". This clinical practice guideline for the care of people with neoplastic ulcers is the fifth in a series published by the Servicio Andaluz de Salud with the aim of unifying criteria, standardizing care, and offering citizens the best possible care (the previous guidelines addressed pressure ulcers, arterial ulcers, epidermolysis bullosa, and burns). This document, the first of its kind devoted to the care of this type of ulcer across the whole Sistema Nacional de Salud, seeks to provide the population with excellent care by giving professionals a tool for approaching this health problem.
Neoplastic ulcers are a serious health problem with a major impact on patients' quality of life (pain, bleeding, malodor, self-esteem, and social isolation). They are generally caused by very advanced, recurrent, or metastatic tumors in which the infiltrating tissue presses on the skin until cutaneous integrity breaks down. It is known that around 5% of cancers involve the skin, but the proportion that goes on to develop neoplastic ulcers is not known.
The manual was written by nursing professionals from the Hospital Universitario Reina Sofía and the Complejo Hospitalario Torrecárdenas with extensive clinical, teaching, and research experience in the care of people with neoplastic ulcers and in the preparation of protocols and related documents.
The main objectives of this guideline, set out in the Plan Integral de Oncología de Andalucía, are to improve health and the quality of the care these patients require, and to reduce variability and uncertainty in clinical practice in the management of neoplastic ulcers. Its preparation incorporated state-of-the-art methodological elements, such as the GRADE classification of evidence and nursing taxonomies, and the AGREE instrument was used to assess its methodological quality.