
    Achieving a Better Stability-Plasticity Trade-off via Auxiliary Networks in Continual Learning

    Full text link
    In contrast to humans' natural ability to learn new tasks sequentially, neural networks are known to suffer from catastrophic forgetting, where the model's performance on old tasks drops dramatically after it is optimized for a new task. In response, the continual learning (CL) community has proposed several solutions aiming to equip the neural network with the ability to learn the current task (plasticity) while still achieving high accuracy on previous tasks (stability). Despite remarkable improvements, the plasticity-stability trade-off is still far from being solved and its underlying mechanism is poorly understood. In this work, we propose Auxiliary Network Continual Learning (ANCL), a novel method that applies an additional auxiliary network, which promotes plasticity, to the continually learned model, which mainly focuses on stability. More concretely, the proposed framework materializes in a regularizer that naturally interpolates between plasticity and stability, surpassing strong baselines in task-incremental and class-incremental scenarios. Through extensive analyses of ANCL solutions, we identify some essential principles beneath the stability-plasticity trade-off.
    Comment: CVPR 2023
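
    The interpolating regularizer can be pictured with a short sketch. The following is a minimal PyTorch illustration of the idea, not the paper's exact formulation: the weight lam, the distillation temperature, and the KL form are illustrative assumptions; old_model stands for the frozen model from previous tasks, and aux_model for a hypothetical network trained only on the current task.

```python
import torch
import torch.nn.functional as F

def ancl_style_loss(model, old_model, aux_model, x, y, lam=0.5, temp=2.0):
    """Task loss plus a regularizer interpolating between stability
    (match the frozen old model) and plasticity (match an auxiliary
    network fit only to the new task). Illustrative form only."""
    logits = model(x)
    task_loss = F.cross_entropy(logits, y)

    with torch.no_grad():
        old_logits = old_model(x)   # frozen copy from previous tasks
        aux_logits = aux_model(x)   # auxiliary net trained on the new task

    log_p = F.log_softmax(logits / temp, dim=-1)
    stability = F.kl_div(log_p, F.softmax(old_logits / temp, dim=-1),
                         reduction="batchmean")
    plasticity = F.kl_div(log_p, F.softmax(aux_logits / temp, dim=-1),
                          reduction="batchmean")

    # lam = 1 recovers a pure stability regularizer; lam = 0 pulls the
    # model entirely toward the plastic auxiliary solution.
    return task_loss + lam * stability + (1.0 - lam) * plasticity
```

    Sweeping lam between 0 and 1 traces out the plasticity-stability interpolation the abstract refers to.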

    Depthwise Hyperparameter Transfer in Residual Networks: Dynamics and Scaling Limit

    Full text link
    The cost of hyperparameter tuning in deep learning has been rising with model size, prompting practitioners to seek tuning methods that use smaller networks as proxies. One such proposal uses μP-parameterized networks, where the optimal hyperparameters for small-width networks transfer to networks with arbitrarily large width. However, in this scheme, hyperparameters do not transfer across depths. As a remedy, we study residual networks with a residual branch scale of 1/√depth in combination with the μP parameterization. We provide experiments demonstrating that residual architectures, including convolutional ResNets and Vision Transformers, trained with this parameterization exhibit transfer of optimal hyperparameters across width and depth on CIFAR-10 and ImageNet. Furthermore, our empirical findings are supported and motivated by theory. Using recent developments in the dynamical mean-field theory (DMFT) description of neural network learning dynamics, we show that this parameterization of ResNets admits a well-defined feature-learning joint infinite-width and infinite-depth limit, and we show convergence of finite-size network dynamics towards this limit.
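
    A minimal sketch of this depth scaling, assuming a simple MLP residual branch; the full μP width parameterization (per-layer initialization and learning-rate scalings) is omitted, so only the 1/√depth branch scale is shown.

```python
import math
import torch
import torch.nn as nn

class ScaledResidualBlock(nn.Module):
    """Residual block whose branch output is scaled by 1/sqrt(depth).
    Only the depth scaling from the paper is modeled; the muP width
    scalings are omitted for brevity."""
    def __init__(self, width: int, depth: int):
        super().__init__()
        self.branch = nn.Sequential(
            nn.Linear(width, width), nn.ReLU(), nn.Linear(width, width)
        )
        self.scale = 1.0 / math.sqrt(depth)

    def forward(self, x):
        return x + self.scale * self.branch(x)

depth, width = 64, 256
net = nn.Sequential(*[ScaledResidualBlock(width, depth) for _ in range(depth)])
x = torch.randn(8, width)
print(net(x).std())  # activation scale stays O(1) as depth grows
```

    Because each block contributes only an O(1/√depth) update, stacking more blocks does not blow up the forward pass, which is what makes the joint infinite-width-and-depth limit well defined.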

    Dynamic Context Pruning for Efficient and Interpretable Autoregressive Transformers

    Full text link
    Autoregressive Transformers adopted in Large Language Models (LLMs) are hard to scale to long sequences. Despite several works trying to reduce their computational cost, most LLMs still adopt attention layers between all pairs of tokens in the sequence, thus incurring a quadratic cost. In this study, we present a novel approach that dynamically prunes contextual information while preserving the model's expressiveness, resulting in reduced memory and computational requirements during inference. Our method employs a learnable mechanism that determines which uninformative tokens can be dropped from the context at any point in the generation process. By doing so, our approach not only addresses performance concerns but also enhances interpretability, providing valuable insight into the model's decision-making process. Our technique can be applied to existing pre-trained models through a straightforward fine-tuning process, and the pruning strength can be specified by a sparsity parameter. Notably, our empirical findings demonstrate that we can effectively prune up to 80% of the context without significant performance degradation on downstream tasks, offering a valuable tool for mitigating inference costs. Our reference implementation achieves up to a 2× increase in inference throughput and even greater memory savings.
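
    The pruning idea can be sketched as follows. This is a schematic illustration rather than the paper's actual mechanism: the linear scorer, the top-k selection, and the sparsity knob are simplifying assumptions standing in for the learnable drop decision and sparsity parameter described above.

```python
import torch
import torch.nn as nn

class ContextPruner(nn.Module):
    """Schematic context pruning: a learned score decides which past
    tokens may be dropped from the attention context. The paper's
    mechanism and training objective differ in detail."""
    def __init__(self, d_model: int, sparsity: float = 0.8):
        super().__init__()
        self.scorer = nn.Linear(d_model, 1)  # learnable keep/drop score
        self.sparsity = sparsity             # fraction of context to prune

    def forward(self, h):                      # h: (batch, seq, d_model)
        scores = self.scorer(h).squeeze(-1)    # (batch, seq)
        k = max(1, int(h.size(1) * (1.0 - self.sparsity)))
        keep = scores.topk(k, dim=-1).indices  # indices of retained tokens
        mask = torch.zeros_like(scores, dtype=torch.bool)
        mask.scatter_(1, keep, True)
        return mask  # use as an attention mask; pruned KV entries can be freed
```

    Freeing a token's key/value cache entry once it is masked is what turns the sparsity into actual memory and throughput savings at inference time.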

    How Tempering Fixes Data Augmentation in Bayesian Neural Networks

    Full text link
    While Bayesian neural networks (BNNs) provide a sound and principled alternative to standard neural networks, an artificial sharpening of the posterior usually needs to be applied to reach comparable performance. This is in stark contrast to theory, which dictates that, given an adequate prior and a well-specified model, the untempered Bayesian posterior should achieve optimal performance. Despite the community's extensive efforts, the origin of the observed performance gains remains disputed, with several plausible causes proposed. While data augmentation has been empirically recognized as one of the main drivers of this effect, a theoretical account of its role is largely missing. In this work we identify two interlaced factors concurrently influencing the strength of the cold posterior effect: the correlated nature of augmentations and the degree of invariance of the employed model to such transformations. By theoretically analyzing simplified settings, we prove that tempering implicitly reduces the misspecification arising from modeling augmentations as i.i.d. data. The temperature mimics the role of the effective sample size, reflecting the gain in information provided by the augmentations. We corroborate our theoretical findings with extensive empirical evaluations, scaling to realistic BNNs. Relying on the framework of group convolutions, we experiment with models of varying inherent degrees of invariance, confirming the hypothesized relationship with the optimal temperature.
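
    To make the effective-sample-size reading concrete, here is a minimal sketch of likelihood tempering; the exact objects analyzed in the paper may differ.

```latex
% Likelihood-tempered ("cold") posterior at temperature T:
\begin{align*}
  p_T(\theta \mid \mathcal{D}) &\propto p(\mathcal{D} \mid \theta)^{1/T}\, p(\theta), \\
  \frac{1}{T} \sum_{i=1}^{n} \log p(x_i \mid \theta)
    &= n_{\mathrm{eff}} \cdot \frac{1}{n} \sum_{i=1}^{n} \log p(x_i \mid \theta),
  \qquad n_{\mathrm{eff}} = \frac{n}{T}.
\end{align*}
% Rescaling the log-likelihood by 1/T is equivalent to pretending the
% sample size is n/T: correlated augmentations carry less information
% than the same number of i.i.d. points, and T can absorb that gap.
```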

    Signal Propagation in Transformers: Theoretical Perspectives and the Role of Rank Collapse

    Full text link
    Transformers have achieved remarkable success in several domains, ranging from natural language processing to computer vision. Nevertheless, it has recently been shown that stacking self-attention layers - the distinctive architectural component of Transformers - can result in rank collapse of the tokens' representations at initialization. Whether and how rank collapse affects training is still largely an open question, and its investigation is necessary for a more comprehensive understanding of this architecture. In this work, we shed new light on the causes and effects of this phenomenon. First, we show that rank collapse of the tokens' representations hinders training by causing the gradients of the queries and keys to vanish at initialization. Furthermore, we provide a thorough description of the origin of rank collapse and discuss how to prevent it via an appropriate depth-dependent scaling of the residual branches. Finally, our analysis unveils that specific architectural hyperparameters affect the gradients of queries and values differently, leading to disproportionate gradient norms. This suggests an explanation for the widespread use of adaptive methods for Transformers' optimization.
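
    A toy experiment, not the paper's construction, illustrating rank collapse in stacked attention: the shared random weights, the absence of residual branches and MLPs, and the distance-to-rank-one measure are all simplifying assumptions.

```python
import torch

def rank_one_residual(X):
    """Relative distance of the token matrix X (seq, d) from the
    rank-one matrix whose rows all equal the mean token."""
    return (X - X.mean(dim=0, keepdim=True)).norm() / X.norm()

torch.manual_seed(0)
seq, d, depth = 32, 64, 24
Wq, Wk, Wv = (torch.randn(d, d) / d ** 0.5 for _ in range(3))  # shared toy weights
X = torch.randn(seq, d)
for layer in range(depth):
    A = torch.softmax((X @ Wq) @ (X @ Wk).T / d ** 0.5, dim=-1)
    X = A @ (X @ Wv)   # pure self-attention, no residual branch
    X = X / X.norm()   # renormalize to isolate the directional collapse
    if (layer + 1) % 8 == 0:
        print(layer + 1, rank_one_residual(X).item())  # typically decays toward 0
```

    Since each attention matrix is row-stochastic with positive entries, repeated application pulls all token representations toward a common direction; reintroducing residual branches with a depth-dependent scale, as the paper discusses, counteracts this.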

    Over-Expression of the LH Receptor Increases Distant Metastases in an Endometrial Cancer Mouse Model

    Get PDF
    Objective: The aim of the present study was to define the role of luteinizing hormone receptor (LH-R) expression in endometrial cancer (EC), using preclinical mouse models, in order to transfer these data to the clinical setting. Materials and Methods: The role of LH-R over-expression was studied using EC cells (Hec1A, i.e., cells with low endogenous LH-R expression) transfected with the LH-R (Hec1A-LH-R). In vitro cell proliferation was measured with the WST-1 assay, whereas cell invasion was measured with the Matrigel assay. The effects of LH-R over-expression in vivo were analyzed in a purpose-built preclinical mouse model of EC that mimicked postmenopausal conditions. The model consisted of an orthotopic xenograft of Hec1A cells into immunodeficient mice treated daily with recombinant LH to ensure high LH levels. Results: In vitro data indicated that LH-R over-expression increased Hec1A invasiveness. In vivo results showed that tumors arising from the injection of Hec1A-LH-R cells displayed higher local invasion and a higher number of distant metastases, mainly in the lung, compared with tumors obtained from the injection of Hec1A cells. LH withdrawal strongly inhibited local and distant metastatic spread of tumors, especially those arising from Hec1A-LH-R cells. Conclusion: Over-expression of the LH-R increases the ability of EC cells to undergo local invasion and metastatic spread. This occurs in the presence of high LH serum concentrations.

    Guía de práctica clínica para el cuidado de personas con úlceras neoplásicas

    No full text
    Additional material: the clinical practice guideline is complemented by the quick reference guide "Guía rápida de consulta para el cuidado de personas con úlceras neoplásicas" and by the guide for patients and caregivers "Aprendiendo a conocer y mejorar sus cuidados. Versión para personas que padecen úlceras neoplásicas". This clinical practice guideline for the care of people with neoplastic ulcers is the fifth in a series published by the Servicio Andaluz de Salud with the aim of unifying criteria, standardizing care, and offering citizens the best possible care (the previous guidelines addressed pressure ulcers, arterial ulcers, epidermolysis bullosa, and burns). This document, the first of its kind devoted to the care of this type of ulcer within the Spanish National Health System as a whole, seeks to offer the population excellent care by providing professionals with a tool for approaching this health problem. Neoplastic ulcers represent a major health problem with serious consequences and a strong impact on patients' quality of life (pain, bleeding, malodor, self-esteem, and social isolation). They are generally caused by very advanced, recurrent, or metastatic tumors in which the degree of infiltration presses on the skin, breaking its integrity. It is known that close to 5% of cancers show cutaneous involvement, but the proportion that goes on to develop neoplastic ulcers is unknown. The manual was written by nursing professionals from the Hospital Universitario Reina Sofía and the Complejo Hospitalario Torrecárdenas with extensive clinical, teaching, and research experience in the care of people with neoplastic ulcers and in the development of related protocols and documents. The main objectives of this guideline, set out in the Plan Integral de Oncología de Andalucía, are to improve health and the quality of the care these patients require, and to reduce variability and uncertainty in clinical practice in the management of neoplastic ulcers. Its development incorporated state-of-the-art methodological approaches, such as the GRADE evidence classification and nursing taxonomies, and used the AGREE instrument to assess its methodological quality.