16,254 research outputs found

    On information captured by neural networks: connections with memorization and generalization

    Despite the popularity and success of deep learning, there is limited understanding of when, how, and why neural networks generalize to unseen examples. Since learning can be seen as extracting information from data, we formally study the information captured by neural networks during training. Specifically, we start by viewing learning in the presence of noisy labels from an information-theoretic perspective and derive a learning algorithm that limits label noise information in weights. We then define a notion of the unique information that an individual sample provides to the training of a deep network, shedding some light on the behavior of neural networks on examples that are atypical, ambiguous, or belong to underrepresented subpopulations. We relate example informativeness to generalization by deriving nonvacuous generalization gap bounds. Finally, by studying knowledge distillation, we highlight the important role of data and label complexity in generalization. Overall, our findings contribute to a deeper understanding of the mechanisms underlying neural network generalization. Comment: PhD thesis
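
    The notion of per-example unique information suggests a simple empirical proxy: compare a model's confidence on an example when it is and is not in the training set. The sketch below is a hypothetical leave-one-out illustration of that idea on a toy scikit-learn classifier; the function name, data, and scoring rule are ours, not the thesis's information-theoretic definitions.

        # Hypothetical leave-one-out proxy for example informativeness
        # (illustrative only; not the thesis's formal definition).
        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.linear_model import LogisticRegression

        def loo_informativeness(X, y, i):
            full = LogisticRegression(max_iter=1000).fit(X, y)
            mask = np.arange(len(y)) != i
            loo = LogisticRegression(max_iter=1000).fit(X[mask], y[mask])
            p_with = full.predict_proba(X[i:i+1])[0, y[i]]
            p_without = loo.predict_proba(X[i:i+1])[0, y[i]]
            return p_with - p_without  # large gap ~ atypical or memorized

        X, y = make_classification(n_samples=200, n_features=10, random_state=0)
        print(np.round([loo_informativeness(X, y, i) for i in range(20)], 3))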

    Convergence of Dynamics on Inductive Systems of Banach Spaces

    Many features of physical systems, both qualitative and quantitative, become sharply defined or tractable only in some limiting situation. Examples are phase transitions in the thermodynamic limit, the emergence of classical mechanics from quantum theory at large action, and continuum quantum field theory arising from renormalization group fixed points. It would seem that few methods could be useful across such diverse applications. Here, however, we present a flexible modeling tool for the limit of theories: soft inductive limits, a generalization of inductive limits of Banach spaces. In this context we formulate general criteria for the convergence of dynamics and show that these criteria apply in the situations mentioned above and more. Comment: Comments welcome
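
    For orientation, the standard (strict) inductive limit of Banach spaces that the paper's "soft" notion generalizes can be sketched as follows; the notation is generic, not the paper's.

        % Standard inductive limit of Banach spaces (generic notation).
        Let $(X_n)_{n \in \mathbb{N}}$ be Banach spaces with isometric linear maps
        $\iota_{m,n} \colon X_n \to X_m$ for $n \le m$ such that
        $\iota_{k,m} \circ \iota_{m,n} = \iota_{k,n}$. The inductive limit is
        \[
          X_\infty = \overline{\bigcup_{n} \iota_{\infty,n}(X_n)},
        \]
        the completion of the increasing union under the compatible norms.
        Dynamics $T^{(n)}_t \colon X_n \to X_n$ converge to a limit dynamics
        $T^{(\infty)}_t$ on $X_\infty$ when
        \[
          \lim_{n \to \infty} \bigl\| \iota_{\infty,n} T^{(n)}_t x_n
              - T^{(\infty)}_t \iota_{\infty,n} x_n \bigr\| = 0
        \]
        along suitable sequences $x_n \in X_n$, uniformly for $t$ in compact sets.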

    Implicit Loss of Surjectivity and Facial Reduction: Theory and Applications

    Facial reduction, pioneered by Borwein and Wolkowicz, is a preprocessing method that is commonly used to obtain strict feasibility in the reformulated, reduced constraint system. The importance of strict feasibility is often addressed in the context of convergence results for interior point methods. Beyond the theoretical properties that facial reduction conveys, we show that facial reduction, which is not limited to interior point methods, leads to strong numerical performance across different classes of algorithms. In this thesis we study various consequences and the broad applicability of facial reduction. The thesis is organized in two parts. In the first part, we show the instabilities that accompany the absence of strict feasibility through the lens of facially reduced systems. In particular, we exploit the implicit redundancies, revealed by each nontrivial facial reduction step, that result in the implicit loss of surjectivity. This leads to a two-step facial reduction and two novel related notions of singularity. For semidefinite programming, we use these singularities to strengthen a known bound on the solution rank, the Barvinok-Pataki bound. For linear programming, we reveal degeneracies caused by the implicit redundancies. Furthermore, we propose a preprocessing tool that uses the simplex method. In the second part of the thesis, we turn to semidefinite programs that do not have strictly feasible points. We focus on the doubly-nonnegative relaxation of the binary quadratic program and on a semidefinite program with a nonlinear objective function. We work closely with two classes of algorithms, the splitting method and the Gauss-Newton interior point method, and we elaborate on the advantages of building models from facial reduction. Moreover, we develop algorithms for real-world problems including the quadratic assignment problem, the protein side-chain positioning problem, and key rate computation for quantum key distribution. Facial reduction continues to play an important role in providing robust reformulated models, both theoretically and practically, resulting in strong numerical performance.
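
    Concretely, one facial reduction step for a semidefinite feasibility set can be sketched as follows (the standard formulation from this literature, not necessarily the thesis's exact notation):

        % One facial reduction step for F = { X >= 0 : A(X) = b }.
        If $\mathcal{F} = \{\, X \succeq 0 : \mathcal{A}(X) = b \,\}$ is nonempty but has
        no strictly feasible point, there is an exposing vector $y$ with
        \[
          0 \neq Z := \mathcal{A}^{*}(y) \succeq 0, \qquad \langle b, y \rangle = 0 .
        \]
        Every feasible $X$ then satisfies
        $\langle Z, X \rangle = \langle y, \mathcal{A}(X) \rangle = \langle y, b \rangle = 0$,
        which for positive semidefinite matrices forces $Z X = 0$. Hence
        $X = V R V^{\top}$ with the columns of $V$ spanning $\operatorname{null}(Z)$
        and $R \succeq 0$ of smaller order: the facially reduced system, on which
        strict feasibility can be sought again.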

    The CHiME-7 DASR Challenge: Distant Meeting Transcription with Multiple Devices in Diverse Scenarios

    The CHiME challenges have played a significant role in the development and evaluation of robust automatic speech recognition (ASR) systems. We introduce the CHiME-7 distant ASR (DASR) task within the 7th CHiME challenge. This task comprises joint ASR and diarization in far-field settings with multiple, possibly heterogeneous, recording devices. Unlike previous challenges, we evaluate systems on three diverse scenarios: CHiME-6, DiPCo, and Mixer 6. The goal is for participants to devise a single system that can generalize across different array geometries and use cases with no a priori information. Another departure from earlier CHiME iterations is that participants are allowed to use open-source pre-trained models and datasets. In this paper, we describe the challenge design, motivation, and fundamental research questions in detail. We also present the baseline system, which is fully array-topology agnostic and features multi-channel diarization, channel selection, guided source separation, and a robust ASR model that leverages self-supervised speech representations (SSLR).
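
    The baseline's processing order can be illustrated with the skeleton below; every function body is a stub and all names are hypothetical placeholders, not the actual challenge code.

        # Illustrative order of the baseline stages: multi-channel diarization,
        # channel selection, guided source separation, then SSLR-based ASR.
        def diarize(channels):                        # who spoke when
            return [("spk1", 0.0, 2.5)]

        def select_channels(channels, segment):       # keep informative mics
            return channels[:1]

        def guided_source_separation(mics, segment):  # isolate target speaker
            return mics[0]

        def asr_decode(audio):                        # ASR on SSLR features
            return "hello world"

        def transcribe_meeting(channels):
            out = []
            for spk, start, end in diarize(channels):
                mics = select_channels(channels, (start, end))
                enhanced = guided_source_separation(mics, (start, end))
                out.append({"speaker": spk, "start": start,
                            "end": end, "text": asr_decode(enhanced)})
            return out

        print(transcribe_meeting(["array_1_ch1", "array_2_ch1"]))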

    Lifting Elementary Abelian Covers of Curves

    Given a Galois cover of curves $f$ over a field of characteristic $p$, the lifting problem asks whether there exists a Galois cover over a complete mixed-characteristic discrete valuation ring whose reduction is $f$. In this paper, we consider the case where the Galois groups are elementary abelian $p$-groups. We prove a combinatorial criterion for lifting an elementary abelian $p$-cover, dependent on the branch loci of lifts of its $p$-cyclic subcovers. We also study how branch points of a lift coalesce on the special fiber. Finally, we analyze lifts for several families of $(\mathbb{Z}/2)^3$-covers of various conductor types, with both equidistant and non-equidistant branch locus geometry, including the first known lifts of elementary abelian covers with non-equidistant geometry beyond $(\mathbb{Z}/p)^2$-covers. Comment: 19 pages, 5 figures
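
    In symbols, the lifting problem can be stated as follows (a standard formulation, given here for orientation):

        % The lifting problem for a G-Galois cover in characteristic p.
        Let $f \colon Y \to X$ be a $G$-Galois cover of smooth projective curves
        over an algebraically closed field $k$ of characteristic $p$. A lift of
        $f$ is a $G$-Galois cover $\mathcal{Y} \to \mathcal{X}$ of relative curves
        over a complete discrete valuation ring $R$ of mixed characteristic
        $(0, p)$ with residue field $k$ whose special fiber recovers $f$, i.e.
        \[
          (\mathcal{Y} \to \mathcal{X}) \times_{R} k \;\cong\; (Y \xrightarrow{\,f\,} X).
        \]
        Here $G \cong (\mathbb{Z}/p)^{n}$ is elementary abelian, and the criterion
        is phrased via the branch loci of lifts of the $p$-cyclic subcovers
        $Y/H \to X$ for index-$p$ subgroups $H \le G$.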

    Accelerated Benders Decomposition for Variable-Height Transport Packaging Optimisation

    This paper tackles the problem of finding optimal variable-height transport packaging. The goal is to reduce the empty space left in a box when shipping goods to customers, thereby saving on filler and reducing waste. We cast this problem as a large-scale mixed integer program (with over seven billion variables) and demonstrate various acceleration techniques that solve it in about three hours on a laptop. We present a KD-tree algorithm that avoids exhaustive grid evaluation of the 3D bin packing, provide analytical transformations that accelerate the Benders decomposition, and give an efficient implementation of the Benders subproblem that yields significant memory savings and a three-orders-of-magnitude runtime speedup.
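
    For reference, the generic Benders structure that such accelerations specialize can be written as follows (a textbook sketch in generic notation, not the paper's model):

        % Split min { c'y + d'x : Hy + Ax >= h, y integer, x >= 0 } by fixing y.
        For fixed $\hat{y}$, the subproblem
        $\min_x \{\, d^{\top} x : A x \ge h - H \hat{y},\; x \ge 0 \,\}$
        returns a dual optimal point $u_k$ or a dual ray $r_j$, which generate
        cuts for the master problem
        \[
          \min_{y,\,\theta}\; c^{\top} y + \theta
          \quad \text{s.t.} \quad
          \theta \ge u_k^{\top} (h - H y), \qquad
          0 \ge r_j^{\top} (h - H y)
        \]
        (optimality and feasibility cuts, respectively); master and subproblem
        alternate until the lower and upper bounds meet.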

    Knowledge Distillation and Continual Learning for Optimized Deep Neural Networks

    Over the past few years, deep learning (DL) has achieved state-of-the-art performance on various human tasks such as speech generation, language translation, image segmentation, and object detection. While traditional machine learning models require hand-crafted features, deep learning algorithms can automatically extract discriminative features and learn complex knowledge from large datasets. This powerful learning ability makes deep learning models attractive to both academia and big corporations. Despite their popularity, deep learning methods still have two main limitations: large memory consumption and catastrophic knowledge forgetting. First, DL algorithms use very deep neural networks (DNNs) with many billions of parameters, which have a large model size and a slow inference speed. This restricts the application of DNNs to resource-constrained devices such as mobile phones and autonomous vehicles. Second, DNNs are known to suffer from catastrophic forgetting: when incrementally learning new tasks, the model's performance on old tasks drops significantly. The ability to accommodate new knowledge while retaining previously learned knowledge is called continual learning. Since the real-world environments in which the model operates are always evolving, a robust neural network needs this continual learning ability to adapt to new changes.
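
    The memory limitation is commonly attacked with knowledge distillation; a minimal Hinton-style distillation loss (soft teacher targets at temperature T blended with hard-label cross-entropy) looks like the sketch below. This is a generic PyTorch recipe, not the thesis's specific training setup.

        # Minimal knowledge-distillation loss: KL to temperature-softened
        # teacher outputs, blended with ordinary cross-entropy.
        import torch
        import torch.nn.functional as F

        def distillation_loss(student_logits, teacher_logits, labels,
                              T=4.0, alpha=0.7):
            soft = F.kl_div(
                F.log_softmax(student_logits / T, dim=-1),
                F.softmax(teacher_logits / T, dim=-1),
                reduction="batchmean",
            ) * (T * T)  # rescale so gradients match the hard-label term
            hard = F.cross_entropy(student_logits, labels)
            return alpha * soft + (1 - alpha) * hard

        s = torch.randn(8, 10)          # student logits: batch of 8, 10 classes
        t = torch.randn(8, 10)          # teacher logits
        y = torch.randint(0, 10, (8,))  # ground-truth labels
        print(distillation_loss(s, t, y).item())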

    Locally-symplectic neural networks for learning volume-preserving dynamics

    We propose locally-symplectic neural networks, LocSympNets, for learning the flow of phase volume-preserving dynamics. The construction of LocSympNets stems from the local Hamiltonian description of divergence-free vector fields and from splitting methods based on symplectic integrators. Symplectic gradient modules of the recently proposed symplecticity-preserving neural networks SympNets are used to construct invertible locally-symplectic modules. To further preserve properties of the flow of a dynamical system, LocSympNets are extended to symmetric locally-symplectic neural networks, SymLocSympNets, such that the inverse of a SymLocSympNet equals its feed-forward propagation with a negated time step, a general property of the flow of a dynamical system. LocSympNets and SymLocSympNets are studied numerically on linear and nonlinear volume-preserving dynamics. We demonstrate learning of linear traveling-wave solutions of the semi-discretized advection equation, periodic trajectories of the Euler equations for the motion of a free rigid body, and quasi-periodic solutions of charged-particle motion in an electromagnetic field. LocSympNets and SymLocSympNets can learn linear and nonlinear dynamics to a high degree of accuracy even when random noise is added to the training data. When learning a single trajectory of the rigid-body dynamics, locally-symplectic neural networks can learn both quadratic invariants of the system with absolute relative errors below 1%. In addition, SymLocSympNets produce qualitatively good long-time predictions when the whole system is learned from randomly sampled data. Both networks can produce accurate short-time predictions of quasi-periodic solutions, as illustrated by the example of charged-particle motion in an electromagnetic field.
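
    The volume-preservation mechanism can be sketched in the notation of the SympNets gradient modules (details may differ from the paper):

        % A gradient module acting on a coordinate pair (p, q).
        \[
          \begin{pmatrix} p \\ q \end{pmatrix} \;\mapsto\;
          \begin{pmatrix} p \\ q + K^{\top} \mathrm{diag}(a)\, \sigma(K p + b) \end{pmatrix},
        \]
        with $\sigma$ applied elementwise. This map is the exact time-one flow of
        a Hamiltonian depending only on $p$, hence symplectic in the pair
        $(p, q)$ with unit Jacobian determinant. Applying such modules locally to
        pairs of coordinates, as suggested by the local Hamiltonian description
        of a divergence-free vector field, and composing them yields a network
        whose Jacobian determinant is a product of ones, i.e. volume-preserving.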

    Large deviations for the interchange process on the interval and incompressible flows

    We use the framework of permuton processes to show that large deviations of the interchange process are controlled by the Dirichlet energy. This establishes a rigorous connection between processes of permutations and the one-dimensional incompressible Euler equations. While our large deviation upper bound is valid in general, the lower bound applies to processes corresponding to incompressible flows, studied in this context by Brenier. These results imply the Archimedean limit for relaxed sorting networks and allow us to asymptotically count such networks. Comment: 68 pages, journal version
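
    Schematically, the rate function has the following flavor (the precise scaling and constants in the paper may differ):

        % Dirichlet energy of a permuton process (X_t): random particle
        % trajectories with uniform marginals.
        \[
          \mathcal{E}(X) \;=\; \tfrac{1}{2} \int_{0}^{T}
              \mathbb{E}\!\left[\, |\dot{X}_t|^{2} \,\right] dt ,
        \]
        so that paths of the rescaled interchange process concentrate on energy
        minimizers; under the endpoint constraints these minimizers correspond
        to incompressible flows in Brenier's relaxed sense, which is the link to
        the one-dimensional Euler equations.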

    Phase-specific signatures of wound fibroblasts and matrix patterns define cancer-associated fibroblast subtypes

    Healing wounds and cancers present remarkable cellular and molecular parallels, but the specific roles of the healing phases are largely unknown. We developed a bioinformatics pipeline to identify genes and pathways that define distinct phases across the time course of healing. Comparing these to cancer transcriptomes revealed that a resolution-phase wound signature is associated with increased severity in skin cancer and is enriched for extracellular-matrix-related pathways. Comparisons of transcriptomes of early- and late-phase wound fibroblasts with skin cancer-associated fibroblasts (CAFs) identified an "early wound" CAF subtype, which localizes to the inner tumor stroma and expresses collagen-related genes controlled by the RUNX2 transcription factor. A "late wound" CAF subtype localizes to the outer tumor stroma and expresses elastin-related genes. Matrix imaging of primary melanoma tissue microarrays validated these matrix signatures and identified collagen- vs. elastin-rich niches within the tumor microenvironment, whose spatial organization predicts survival and recurrence. These results identify wound-regulated genes and matrix patterns with prognostic potential in skin cancer.
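
    One generic way to compare such signatures against tumor transcriptomes is a mean z-score over the signature genes; the sketch below is a hypothetical illustration, not the paper's pipeline, and the tiny random expression matrix stands in for real data.

        # Hypothetical signature score: mean per-sample z-score of signature
        # genes (gene list shortened for illustration).
        import numpy as np

        def signature_score(expr, genes, signature):
            z = (expr - expr.mean(axis=0)) / expr.std(axis=0)
            idx = [genes.index(g) for g in signature if g in genes]
            return z[:, idx].mean(axis=1)  # one score per sample

        genes = ["COL1A1", "COL3A1", "ELN", "RUNX2", "ACTB"]
        expr = np.random.default_rng(0).lognormal(size=(6, 5))  # 6 samples
        print(signature_score(expr, genes, ["COL1A1", "COL3A1", "RUNX2"]))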