On information captured by neural networks: connections with memorization and generalization
Despite the popularity and success of deep learning, there is limited
understanding of when, how, and why neural networks generalize to unseen
examples. Since learning can be seen as extracting information from data, we
formally study information captured by neural networks during training.
Specifically, we start with viewing learning in presence of noisy labels from
an information-theoretic perspective and derive a learning algorithm that
limits label noise information in weights. We then define a notion of unique
information that an individual sample provides to the training of a deep
network, shedding some light on the behavior of neural networks on examples
that are atypical, ambiguous, or belong to underrepresented subpopulations. We
relate example informativeness to generalization by deriving nonvacuous
generalization gap bounds. Finally, by studying knowledge distillation, we
highlight the important role of data and label complexity in generalization.
Overall, our findings contribute to a deeper understanding of the mechanisms
underlying neural network generalization.
Comment: PhD thesis
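The "unique information an individual sample provides" can be illustrated with a simple leave-one-out proxy: how much a model's predictions change when one training example is removed. The sketch below is purely illustrative — the 1-nearest-neighbor "model" and the probe-set fraction are stand-ins chosen for brevity, not the thesis's information-theoretic definitions.

```python
def predict_1nn(train, x):
    """Label of the nearest training point (toy stand-in for a deep net)."""
    return min(train, key=lambda p: abs(p[0] - x))[1]

def unique_information(train, idx, probe):
    """Leave-one-out proxy for the unique information sample idx provides:
    the fraction of probe points whose prediction changes when that sample
    is removed. Atypical or underrepresented samples score higher."""
    loo = train[:idx] + train[idx + 1:]
    changed = sum(predict_1nn(train, x) != predict_1nn(loo, x) for x in probe)
    return changed / len(probe)

train = [(0.0, 'a'), (1.0, 'a'), (5.0, 'b')]  # the point at 5.0 is atypical
probe = [0.2, 0.8, 4.5, 5.5]
print(unique_information(train, 2, probe))  # 0.5 -- removing it flips half the probes
print(unique_information(train, 0, probe))  # 0.0 -- a redundant typical point
```

The atypical point carries information no other sample can substitute for, which is the intuition behind relating informativeness to memorization.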
Convergence of Dynamics on Inductive Systems of Banach Spaces
Many features of physical systems, both qualitative and quantitative, become
sharply defined or tractable only in some limiting situation. Examples are
phase transitions in the thermodynamic limit, the emergence of classical
mechanics from quantum theory at large action, and continuum quantum field
theory arising from renormalization group fixed points. It would seem that few
methods can be useful in such diverse applications. However, we here present a
flexible modeling tool for the limit of theories: soft inductive limits
constituting a generalization of inductive limits of Banach spaces. In this
context, general criteria for the convergence of dynamics will be formulated,
and these criteria will be shown to apply in the situations mentioned and more.
Comment: Comments welcome
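For reference, the ordinary notion being generalized can be written down in a few lines (standard definitions; notation is assumed, and the "soft" relaxation itself is defined in the paper, not here):

```latex
% An inductive system of Banach spaces over a directed index set I:
% spaces X_i together with connecting maps
\varphi_{ji} : X_i \longrightarrow X_j \quad (i \le j), \qquad
\varphi_{ii} = \mathrm{id}_{X_i}, \qquad
\varphi_{kj} \circ \varphi_{ji} = \varphi_{ki} \quad (i \le j \le k).
% The soft inductive limits of the abstract relax these exact compatibility
% conditions, allowing limits of theories where strict embeddings fail.
```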
Implicit Loss of Surjectivity and Facial Reduction: Theory and Applications
Facial reduction, pioneered by Borwein and Wolkowicz, is a preprocessing method that is commonly used to obtain strict feasibility in the reformulated, reduced constraint system.
The importance of strict feasibility is often addressed in the context of the convergence results for interior point methods.
Beyond the theoretical properties that facial reduction conveys, we show that facial reduction, not limited to interior point methods, leads to strong numerical performance in different classes of algorithms.
In this thesis we study various consequences and the broad applicability of facial reduction.
The thesis is organized in two parts.
In the first part, we show the instabilities that accompany the absence
of strict feasibility through the lens of facially reduced systems.
In particular, we exploit the implicit redundancies revealed by each nontrivial facial reduction step, which result in the implicit loss of surjectivity.
This leads to the two-step facial reduction and two novel related notions of singularity.
For the area of semidefinite programming, we use these singularities to strengthen a known bound on the solution rank, the Barvinok-Pataki bound.
For the area of linear programming, we reveal degeneracies caused by the implicit redundancies.
Furthermore, we propose a preprocessing tool that uses the simplex method.
In the second part of this thesis, we continue with the semidefinite programs that do not have strictly feasible points.
We focus on the doubly-nonnegative relaxation of the binary quadratic program and a semidefinite program with a nonlinear objective function.
We closely work with two classes of algorithms, the splitting method and the Gauss-Newton interior point method.
We elaborate on the advantages of building models via facial reduction. Moreover, we develop algorithms for real-world problems including the quadratic assignment problem, the protein side-chain positioning problem, and the key rate computation for quantum key distribution.
Facial reduction continues to play an important role in providing robust reformulated models, in both theoretical and practical aspects, resulting in strong numerical performance.
The CHiME-7 DASR Challenge: Distant Meeting Transcription with Multiple Devices in Diverse Scenarios
The CHiME challenges have played a significant role in the development and
evaluation of robust automatic speech recognition (ASR) systems. We introduce
the CHiME-7 distant ASR (DASR) task, within the 7th CHiME challenge. This task
comprises joint ASR and diarization in far-field settings with multiple, and
possibly heterogeneous, recording devices. Different from previous challenges,
we evaluate systems on 3 diverse scenarios: CHiME-6, DiPCo, and Mixer 6. The
goal is for participants to devise a single system that can generalize across
different array geometries and use cases with no a priori information. Another
departure from earlier CHiME iterations is that participants are allowed to use
open-source pre-trained models and datasets. In this paper, we describe the
challenge design, motivation, and fundamental research questions in detail. We
also present the baseline system, which is fully array-topology agnostic and
features multi-channel diarization, channel selection, guided source separation
and a robust ASR model that leverages self-supervised speech representations
(SSLR).
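Array-topology-agnostic channel selection can be illustrated with a crude per-channel activity score. This is only an illustrative proxy invented for this sketch — the baseline's actual selection operates on real multi-microphone audio with more refined statistics.

```python
def envelope_variance(signal):
    """Variance of the magnitude envelope -- a crude speech-activity score."""
    env = [abs(s) for s in signal]
    mean = sum(env) / len(env)
    return sum((e - mean) ** 2 for e in env) / len(env)

def select_channels(channels, keep=1):
    """Keep the channels whose envelopes fluctuate most; works for any
    number of microphones, hence no array geometry is assumed."""
    ranked = sorted(range(len(channels)),
                    key=lambda i: envelope_variance(channels[i]),
                    reverse=True)
    return sorted(ranked[:keep])

noisy_floor = [0.01] * 8                             # nearly silent distant mic
speechy = [0.0, 0.9, 0.1, 0.8, 0.0, 0.7, 0.2, 0.9]  # fluctuating, speech-like
print(select_channels([noisy_floor, speechy]))       # [1]
```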
Lifting Elementary Abelian Covers of Curves
Given a Galois cover of curves over a field of characteristic p, the
lifting problem asks whether there exists a Galois cover over a complete mixed
characteristic discrete valuation ring whose reduction is the given cover. In
this paper, we consider the case where the Galois groups are elementary abelian
p-groups. We prove a combinatorial criterion for lifting an elementary abelian
p-cover, dependent on the branch loci of lifts of its p-cyclic subcovers. We
also study how branch points of a lift coalesce on the special fiber. Finally,
we analyze lifts for several families of elementary abelian covers of various
conductor types, both with equidistant branch locus geometry and
non-equidistant branch locus geometry, including the first known lifts for
elementary abelian covers with non-equidistant geometry beyond the previously
known families.
Comment: 19 pages, 5 figures
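The lifting problem referenced above has a standard formulation (notation assumed here, not taken from the abstract):

```latex
% Let k be an algebraically closed field of characteristic p and R a complete
% discrete valuation ring of mixed characteristic (0, p) with residue field k.
% Given a G-Galois cover of smooth curves over k,
f : Y \to X,
% the lifting problem asks for a G-Galois cover of smooth relative R-curves
F : \mathcal{Y} \to \mathcal{X}
\quad\text{with}\quad
F \otimes_R k \;\cong\; f.
```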
Accelerated Benders Decomposition for Variable-Height Transport Packaging Optimisation
This paper tackles the problem of finding optimal variable-height transport
packaging. The goal is to reduce the empty space left in a box when shipping
goods to customers, thereby saving on filler and reducing waste. We cast this
problem as a large-scale mixed integer problem (with over seven billion
variables) and demonstrate various acceleration techniques to solve it
efficiently in about three hours on a laptop. We present a KD-Tree algorithm to
avoid exhaustive grid evaluation of the 3D bin packing, provide analytical
transformations to accelerate the Benders decomposition, and give an efficient
implementation of the Benders subproblem for significant memory savings and a
three-order-of-magnitude runtime speedup.
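The underlying optimisation can be stated in miniature: choose k allowed box sizes so that assigning each shipment to the smallest size that fits wastes the least space. The brute-force sketch below (one dimension, enumeration instead of a MIP) is a toy stand-in for the paper's billions-of-variables formulation and its accelerated Benders decomposition.

```python
from itertools import combinations

def wasted_space(heights, chosen):
    """Total empty height if each item gets the smallest chosen size that
    fits it; None if the tallest item fits nowhere."""
    total = 0
    for h in heights:
        fits = [c for c in chosen if c >= h]
        if not fits:
            return None
        total += min(fits) - h
    return total

def best_heights(heights, k):
    """Brute-force the k box heights minimizing total empty space.
    Restricting candidates to the item heights themselves loses nothing
    for this objective, since lowering a chosen height to the tallest
    item using it never increases waste."""
    best, best_cost = None, None
    for chosen in combinations(sorted(set(heights)), k):
        cost = wasted_space(heights, chosen)
        if cost is not None and (best_cost is None or cost < best_cost):
            best, best_cost = chosen, cost
    return best, best_cost

print(best_heights([3, 4, 4, 7, 9], k=2))  # ((4, 9), 3)
```

Enumeration is exponential in k, which is exactly why the paper resorts to decomposition and a KD-tree to prune the search.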
Knowledge Distillation and Continual Learning for Optimized Deep Neural Networks
Over the past few years, deep learning (DL) has been achieving state-of-the-art performance on various human tasks such as speech generation, language translation, image segmentation, and object detection. While traditional machine learning models require hand-crafted features, deep learning algorithms can automatically extract discriminative features and learn complex knowledge from large datasets. This powerful learning ability makes deep learning models attractive to both academia and big corporations.
Despite their popularity, deep learning methods still have two main limitations: large memory consumption and catastrophic knowledge forgetting. First, DL algorithms use very deep neural networks (DNNs) with many billions of parameters, which have a large model size and a slow inference speed. This restricts the application of DNNs in resource-constrained devices such as mobile phones and autonomous vehicles. Second, DNNs are known to suffer from catastrophic forgetting: when incrementally learning new tasks, the model's performance on old tasks drops significantly. The ability to accommodate new knowledge while retaining previously learned knowledge is called continual learning. Since the real-world environments in which a model operates are always evolving, a robust neural network needs this continual learning ability to adapt to new changes.
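Knowledge distillation, the usual remedy for the model-size limitation, is commonly implemented as a KL divergence between temperature-softened teacher and student outputs (following Hinton et al.). A minimal sketch on raw logits:

```python
import math

def softmax(logits, T=1.0):
    """Temperature-softened softmax; larger T flattens the distribution."""
    exps = [math.exp(z / T) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on T-softened distributions, scaled by T^2
    so gradients keep a comparable magnitude across temperatures."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return T * T * kl

# A student matching its teacher pays no loss; a disagreeing one does.
print(distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 0.0
print(distillation_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0]) > 0)  # True
```

In training, this term is typically mixed with the ordinary cross-entropy on hard labels.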
Locally-symplectic neural networks for learning volume-preserving dynamics
We propose locally-symplectic neural networks LocSympNets for learning the
flow of phase volume-preserving dynamics. The construction of LocSympNets stems
from the theorem of the local Hamiltonian description of the divergence-free
vector field and the splitting methods based on symplectic integrators.
Symplectic gradient modules of the recently proposed symplecticity-preserving
neural networks SympNets are used to construct invertible locally-symplectic
modules. To further preserve properties of the flow of a dynamical system,
LocSympNets are extended to symmetric locally-symplectic neural networks
SymLocSympNets, such that the inverse of SymLocSympNets is equal to the
feed-forward propagation of SymLocSympNets with the negative time step, which
is a general property of the flow of a dynamical system. LocSympNets and
SymLocSympNets are studied numerically considering learning linear and
nonlinear volume-preserving dynamics. We demonstrate learning of linear
traveling wave solutions to the semi-discretized advection equation, periodic
trajectories of the Euler equations of the motion of a free rigid body, and
quasi-periodic solutions of the charged particle motion in an electromagnetic
field. LocSympNets and SymLocSympNets can learn linear and nonlinear dynamics
to a high degree of accuracy even when random noise is added to the training
data. When learning a single trajectory of the rigid body dynamics,
locally-symplectic neural networks can learn both quadratic invariants of the
system with absolute relative errors below 1%. In addition, SymLocSympNets
produce qualitatively good long-time predictions, when the learning of the
whole system from randomly sampled data is considered. LocSympNets and
SymLocSympNets can produce accurate short-time predictions of quasi-periodic
solutions, which is illustrated in the example of the charged particle motion
in an electromagnetic field.
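The key structural property — modules whose Jacobian determinant is exactly one, so that composing them preserves phase volume — can be checked numerically on a toy layer. The shear below is a simplified analogue of a SympNet-style gradient module, not the paper's exact architecture.

```python
import math

def shear_module(x, a=0.7):
    """Volume-preserving shear: update the first coordinate by a nonlinear
    function of the others. Its Jacobian is unit upper triangular, so
    det J = 1, and the inverse simply subtracts the same term."""
    x1, x2, x3 = x
    return [x1 + a * math.sin(x2) * math.cos(x3), x2, x3]

def jacobian_det(f, x, eps=1e-6):
    """Determinant of the 3x3 Jacobian of f at x via central differences."""
    n = len(x)
    J = [[0.0] * n for _ in range(n)]
    for j in range(n):
        xp, xm = list(x), list(x)
        xp[j] += eps
        xm[j] -= eps
        fp, fm = f(xp), f(xm)
        for i in range(n):
            J[i][j] = (fp[i] - fm[i]) / (2 * eps)
    a, b, c = J[0]
    d, e, g = J[1]
    h, i, k = J[2]
    return a * (e * k - g * i) - b * (d * k - g * h) + c * (d * i - e * h)

print(jacobian_det(shear_module, [0.3, 1.1, -0.4]))  # ~ 1.0 up to FD error
```

Stacking such modules over different coordinate splittings, as LocSympNets do, keeps the overall map volume-preserving by the chain rule.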
Large deviations for the interchange process on the interval and incompressible flows
We use the framework of permuton processes to show that large deviations of
the interchange process are controlled by the Dirichlet energy. This
establishes a rigorous connection between processes of permutations and
one-dimensional incompressible Euler equations. While our large deviation upper
bound is valid in general, the lower bound applies to processes corresponding
to incompressible flows, studied in this context by Brenier. These results
imply the Archimedean limit for relaxed sorting networks and allow us to
asymptotically count such networks.
Comment: 68 pages, journal version
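The interchange process itself is elementary to simulate: repeatedly swap the contents of a uniformly random adjacent pair of sites. The sketch below only illustrates the process; the large-deviation analysis relating its paths to the Dirichlet energy is analytic.

```python
import random

def interchange(n, steps, rng=random):
    """Interchange process on n sites {0, ..., n-1}: each step swaps a
    uniformly chosen adjacent pair, yielding a random walk on permutations."""
    perm = list(range(n))
    for _ in range(steps):
        i = rng.randrange(n - 1)
        perm[i], perm[i + 1] = perm[i + 1], perm[i]
    return perm

print(interchange(6, 100, random.Random(0)))
```

Tracking each label's trajectory over time gives exactly the permuton-process viewpoint used in the paper.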
Phase-specific signatures of wound fibroblasts and matrix patterns define cancer-associated fibroblast subtypes
Healing wounds and cancers present remarkable cellular and molecular parallels, but the specific roles of the healing phases are largely unknown. We developed a bioinformatics pipeline to identify genes and pathways that define distinct phases across the time-course of healing. Their comparison to cancer transcriptomes revealed that a resolution-phase wound signature is associated with increased severity in skin cancer and enriches for extracellular matrix-related pathways. Comparisons of transcriptomes of early- and late-phase wound fibroblasts versus skin cancer-associated fibroblasts (CAFs) identified an "early wound" CAF subtype, which localizes to the inner tumor stroma and expresses collagen-related genes that are controlled by the RUNX2 transcription factor. A "late wound" CAF subtype localizes to the outer tumor stroma and expresses elastin-related genes. Matrix imaging of primary melanoma tissue microarrays validated these matrix signatures and identified collagen- versus elastin-rich niches within the tumor microenvironment, whose spatial organization predicts survival and recurrence. These results identify wound-regulated genes and matrix patterns with prognostic potential in skin cancer.
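Scoring a phase signature against tumor transcriptomes is often done with a simple per-sample mean z-score over the signature's genes. The sketch below uses that generic method with invented gene names and values; it is not the study's pipeline.

```python
import statistics

def signature_scores(expr, signature):
    """Per-sample mean z-score of a gene signature.
    expr: {gene: [expression value per sample]}; signature: list of genes.
    Each gene is z-scored across samples, then averaged over the signature."""
    n = len(next(iter(expr.values())))
    z = {}
    for g in signature:
        vals = expr[g]
        mu, sd = statistics.mean(vals), statistics.pstdev(vals)
        z[g] = [(v - mu) / sd if sd else 0.0 for v in vals]
    return [sum(z[g][j] for g in signature) / len(signature) for j in range(n)]

# Hypothetical values for three samples; gene names chosen to echo the
# collagen/RUNX2 theme of the abstract, values entirely made up.
expr = {
    "COL1A1": [1.0, 5.0, 9.0],
    "RUNX2":  [2.0, 4.0, 6.0],
    "ELN":    [7.0, 7.0, 7.0],
}
scores = signature_scores(expr, ["COL1A1", "RUNX2"])
print(scores)  # sample 3 scores highest on the collagen-related signature
```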