On information captured by neural networks: connections with memorization and generalization
Despite the popularity and success of deep learning, there is limited
understanding of when, how, and why neural networks generalize to unseen
examples. Since learning can be seen as extracting information from data, we
formally study information captured by neural networks during training.
Specifically, we start by viewing learning in the presence of noisy labels from
an information-theoretic perspective and derive a learning algorithm that
limits label noise information in weights. We then define a notion of unique
information that an individual sample provides to the training of a deep
network, shedding some light on the behavior of neural networks on examples
that are atypical, ambiguous, or belong to underrepresented subpopulations. We
relate example informativeness to generalization by deriving nonvacuous
generalization gap bounds. Finally, by studying knowledge distillation, we
highlight the important role of data and label complexity in generalization.
Overall, our findings contribute to a deeper understanding of the mechanisms
underlying neural network generalization. Comment: PhD thesis
Neutron scattering studies of heterogeneous catalysis
Understanding the structural dynamics/evolution of catalysts and the related surface chemistry is essential for establishing structure–catalysis relationships, where spectroscopic and scattering tools play a crucial role. Among many such tools, neutron scattering, though less well known, has a unique power for investigating catalytic phenomena. Since neutrons interact with the nuclei of matter, the neutron–nucleon interaction provides unique information on light elements (mainly hydrogen), neighboring elements, and isotopes, which is complementary to X-ray and photon-based techniques. Neutron vibrational spectroscopy has been the most utilized neutron scattering approach for heterogeneous catalysis research, providing chemical information on surface/bulk species (mostly H-containing) and reaction chemistry. Neutron diffraction and quasielastic neutron scattering can also supply important information on catalyst structures and dynamics of surface species. Other neutron approaches, such as small angle neutron scattering and neutron imaging, have been much less used but still give distinctive catalytic information. This review provides a comprehensive overview of recent advances in neutron scattering investigations of heterogeneous catalysis, focusing on surface adsorbates, reaction mechanisms, and catalyst structural changes revealed by neutron spectroscopy, diffraction, quasielastic neutron scattering, and other neutron techniques. Perspectives are also provided on the challenges and future opportunities in neutron scattering studies of heterogeneous catalysis.
Beam scanning by liquid-crystal biasing in a modified SIW structure
A fixed-frequency beam-scanning 1D antenna based on Liquid Crystals (LCs) is designed for application in 2D scanning with lateral alignment. The 2D array environment imposes full decoupling of adjacent 1D antennas, which often conflicts with the LC requirement of DC biasing: the proposed design accommodates both. The LC medium is placed inside a Substrate Integrated Waveguide (SIW) modified to work as a Groove Gap Waveguide, with radiating slots etched on the upper broad wall, that radiates as a Leaky-Wave Antenna (LWA). This allows effective application of the DC bias voltage needed for tuning the LCs. At the same time, the RF field remains laterally confined, making it possible to lay several antennas in parallel and achieve 2D beam scanning. The design is validated by simulation employing the actual properties of a commercial LC medium.
Reinforcement learning in large state action spaces
Reinforcement learning (RL) is a promising framework for training intelligent agents which learn to optimize long term utility by directly interacting with the environment. Creating RL methods which scale to large state-action spaces is a critical problem towards ensuring real world deployment of RL systems. However, several challenges limit the applicability of RL to large scale settings. These include difficulties with exploration, low sample efficiency, computational intractability, task constraints like decentralization and lack of guarantees about important properties like performance, generalization and robustness in potentially unseen scenarios.
This thesis is motivated by bridging the aforementioned gap. We propose several principled algorithms and frameworks for studying and addressing the above challenges in RL. The proposed methods cover a wide range of RL settings (single and multi-agent systems (MAS) with all the variations in the latter, prediction and control, model-based and model-free methods, value-based and policy-based methods). In this work we propose the first results on several different problems: e.g. tensorization of the Bellman equation, which allows exponential sample efficiency gains (Chapter 4), provable suboptimality arising from structural constraints in MAS (Chapter 3), combinatorial generalization results in cooperative MAS (Chapter 5), generalization results on observation shifts (Chapter 7), and learning deterministic policies in a probabilistic RL framework (Chapter 6). Our algorithms exhibit provably enhanced performance and sample efficiency along with better scalability. Additionally, we shed light on generalization aspects of the agents under different frameworks. These properties have been driven by the use of several advanced tools (e.g. statistical machine learning, state abstraction, variational inference, tensor theory).
In summary, the contributions in this thesis significantly advance progress towards making RL agents ready for large scale, real world applications.
Machine learning approach towards predicting turbulent fluid flow using convolutional neural networks
Using convolutional neural networks, we present in this thesis a novel method for predicting turbulent fluid flow through an array of obstacles. In recent years, machine learning has exploded in popularity due to its ability to create accurate data-driven models and the abundance of available data. In an attempt to understand the characteristics of turbulent fluid flow, we utilise a novel convolutional autoencoder neural network to predict the first ten proper orthogonal decomposition (POD) modes of turbulent fluid flow. We find that the model predicts the first two POD modes well, although it predicts the remaining eight POD modes with less accuracy. In addition, we find that the ML-predicted POD modes are accurate enough to be used to reconstruct turbulent flow that adequately captures the large-scale details of the original simulation.
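For readers unfamiliar with POD modes, the following is a minimal sketch of how they can be extracted from flow snapshots. It is not the thesis's method (which predicts the modes with a convolutional autoencoder); it only illustrates what a POD mode is. The function name `pod_modes`, the toy snapshot data, and the choice of power iteration with deflation are assumptions made for illustration.

```python
import math


def pod_modes(snapshots, n_modes, iters=500):
    """Return (mean, modes, energies) for a list of equally sized snapshots.

    Hypothetical helper: each snapshot is a list of field values on the same
    grid. Modes are the leading eigenvectors of the spatial correlation matrix
    of the mean-subtracted snapshots, found by power iteration with deflation.
    """
    m, n = len(snapshots), len(snapshots[0])
    mean = [sum(s[i] for s in snapshots) / m for i in range(n)]
    fluct = [[s[i] - mean[i] for i in range(n)] for s in snapshots]
    # Spatial correlation matrix C = (1/m) * sum_k u_k u_k^T
    C = [[sum(f[a] * f[b] for f in fluct) / m for b in range(n)] for a in range(n)]

    modes, energies = [], []
    for _ in range(n_modes):
        # Deterministic, normalized starting vector
        v = [1.0 / (i + 1) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(C[a][b] * v[b] for b in range(n)) for a in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # no variance left to capture
            v = [x / norm for x in w]
        # Rayleigh quotient: the variance ("energy") captured by this mode
        lam = sum(v[a] * sum(C[a][b] * v[b] for b in range(n)) for a in range(n))
        modes.append(v)
        energies.append(lam)
        # Deflate so the next pass converges to the next mode
        for a in range(n):
            for b in range(n):
                C[a][b] -= lam * v[a] * v[b]
    return mean, modes, energies


# Toy data: four snapshots built from two orthogonal spatial patterns
pattern_a = [0.5, 0.5, -0.5, -0.5]
pattern_b = [0.5, -0.5, 0.5, -0.5]
toy = [[3 * x for x in pattern_a], [-3 * x for x in pattern_a],
       pattern_b[:], [-x for x in pattern_b]]
_, toy_modes, toy_energies = pod_modes(toy, 2)
print(toy_energies)
```

A flow field can then be approximated as the mean plus a weighted sum of the leading modes, which is the sense in which a few accurate modes can capture large-scale structure. For realistic grid sizes one would normally use an SVD routine from a numerical library instead of this pure-Python loop.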
Application of multi-scale computational techniques to complex materials systems
The applications of computational materials science are ever-increasing, connecting fields far beyond traditional subfields in materials science. This dissertation demonstrates the broad scope of multi-scale computational techniques by investigating multiple unrelated complex material systems, namely scandate thermionic cathodes and the metallic foam component of micrometeoroid and orbital debris (MMOD) shielding. Sc-containing scandate cathodes have been widely reported to exhibit superior properties compared to previous thermionic cathodes; however, knowledge of their precise operating mechanism remains elusive. Here, quantum mechanical calculations were utilized to map the phase space of stable, highly faceted and chemically complex W nanoparticles, accounting for both finite temperature and chemical environment. The precise processing conditions required to form the characteristic W nanoparticle observed experimentally were then distilled. Metallic foams, a central component of MMOD shielding, also represent a highly complex materials system, albeit at a far larger length scale than W nanoparticles. The non-periodic, randomly oriented constituent ligaments of metallic foams and similar materials create a significant variability in properties that is generally difficult to model. Rather than homogenizing the material such that its unique characteristic structural features are neglected, here, a stochastic modeling approach is applied that integrates complex geometric structure and utilizes continuum calculations to predict the resulting probabilistic distributions of elastic properties. Though different in many aspects, scandate cathodes and metallic foams are united by complexity that is impractical, even dangerous, to ignore and well-suited to exploration with multi-scale computational methods.
A Statistical View of Column Subset Selection
We consider the problem of selecting a small subset of representative
variables from a large dataset. In the computer science literature, this
dimensionality reduction problem is typically formalized as Column Subset
Selection (CSS). Meanwhile, the typical statistical formalization is to find an
information-maximizing set of Principal Variables. This paper shows that these
two approaches are equivalent, and moreover, both can be viewed as maximum
likelihood estimation within a certain semi-parametric model. Using these
connections, we show how to efficiently (1) perform CSS using only summary
statistics from the original dataset; (2) perform CSS in the presence of
missing and/or censored data; and (3) select the subset size for CSS in a
hypothesis testing framework.
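To make point (1) concrete, here is a minimal sketch of selecting columns using only a covariance matrix, with no access to the raw data. It uses a simple greedy heuristic, which is an assumption for illustration and not necessarily the paper's algorithm. The function name `greedy_css` and the toy matrix are hypothetical.

```python
def greedy_css(cov, k, tol=1e-12):
    """Greedily pick k column indices from a covariance (or Gram) matrix.

    Hypothetical helper: at each step, choose the variable whose inclusion
    most reduces the total residual variance, then replace the covariance by
    the residual covariance after regressing every variable on the choice.
    Returns the chosen indices and the remaining unexplained variance.
    """
    p = len(cov)
    resid = [row[:] for row in cov]  # working copy; the input is not modified
    chosen = []
    for _ in range(k):
        best_j, best_gain = None, 0.0
        for j in range(p):
            if j in chosen or resid[j][j] <= tol:
                continue
            # Drop in trace(resid) if variable j is added to the subset
            gain = sum(resid[i][j] ** 2 for i in range(p)) / resid[j][j]
            if gain > best_gain:
                best_j, best_gain = j, gain
        if best_j is None:
            break  # nothing left to explain
        col = [resid[i][best_j] for i in range(p)]
        pivot = col[best_j]
        # Schur-complement update: residual covariance given the chosen variable
        for a in range(p):
            for b in range(p):
                resid[a][b] -= col[a] * col[b] / pivot
        chosen.append(best_j)
    unexplained = sum(resid[i][i] for i in range(p))
    return chosen, unexplained


# Toy covariance: variables 0 and 1 are strongly correlated, 2 is independent
toy_cov = [[1.0, 0.9, 0.0],
           [0.9, 1.0, 0.0],
           [0.0, 0.0, 0.5]]
print(greedy_css(toy_cov, 2))
```

On the toy matrix, the sketch selects one of the two correlated variables and then the independent one, because a second correlated variable would add little new information. Greedy selection is not guaranteed to find the optimal subset.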
Path properties of KPZ models
In this thesis we investigate large deviation and path properties of a few models within the Kardar-Parisi-Zhang (KPZ) universality class.
The KPZ equation is the central object in the KPZ universality class. It is a stochastic PDE describing various objects in statistical mechanics such as random interface growth, directed polymers, and interacting particle systems. In the first project we study one-point upper tail large deviations of the KPZ equation H(t,x) started from narrow wedge initial data. We obtain a precise expression for the upper tail large deviation principle (LDP) in the long time regime for the KPZ equation. We then extend our techniques and methods to obtain the upper tail LDP for the asymmetric exclusion process model, which is a prelimit of the KPZ equation.
In the next direction, we investigate temporal path properties of the KPZ equation. We show that the upper and lower laws of the iterated logarithm for the rescaled KPZ temporal process occur at scales (log log t)^{2/3} and (log log t)^{1/3} respectively. We also compute the exact Hausdorff dimension of the upper level sets of the solution, i.e., the set of times when the rescaled solution exceeds (log log t)^{2/3}. This has relevance from the point of view of fractal geometry of the KPZ equation.
We next study superdiffusivity and localization features of the (1+1)-dimensional continuum directed random polymer (CDRP) whose free energy is given by the KPZ equation. We show that for a point-to-point polymer of length t and any p ∈ (0,1), the point on the path which is at distance pt from the origin stays within an O(1) stochastic window around a random point M_{p,t} that depends on the environment. This provides an affirmative case of the folklore 'favorite region' conjecture. Furthermore, the quenched density of the point, when centered around M_{p,t}, converges in law to an explicit random density function as t → ∞ without any scaling. The limiting random density is proportional to e^{-R(x)}, where R(x) is a two-sided 3D Bessel process with diffusion coefficient 2. Our proof techniques also allow us to prove properties of the KPZ equation such as ergodicity and limiting Bessel behaviors around the maximum. In a follow-up project, we show that the annealed law of the polymer of length t, upon t^{2/3} superdiffusive scaling, is tight (as t → ∞) in the space of C([0,1])-valued random variables. On the other hand, as t → 0, under diffusive scaling, we show that the annealed law of the polymer converges to a Brownian bridge.
In the final part of this thesis, we focus on an integrable discrete half-space variant of the CDRP, called the half-space log-gamma polymer. We consider the point-to-point log-gamma polymer of length 2N in a half-space with i.i.d. Gamma^{-1}(2θ) distributed bulk weights and i.i.d. Gamma^{-1}(α+θ) distributed boundary weights for θ > 0 and α > -θ. We establish the KPZ exponents (1/3 fluctuation and 2/3 transversal) for this model when α ≥ 0. In particular, in this regime, we show that after appropriate centering, the free energy process with spatial coordinate scaled by N^{2/3} and fluctuations scaled by N^{1/3} is tight.
The primary technical contribution of our work is to construct the half-space log-gamma Gibbsian line ensemble and develop a toolbox for extracting tightness and absolute continuity results from minimal information about the top curve of such half-space line ensembles. This is the first study of half-space line ensembles. The α ≥ 0 regime corresponds to a polymer measure which is not pinned at the boundary. In a companion work, we investigate the α < 0 setting. We show that in this case, the endpoint of the point-to-line polymer stays within an O(1) window of the diagonal. We also show that the limiting quenched endpoint distribution of the polymer around the diagonal is given by a random probability mass function proportional to the exponential of a random walk with log-gamma type increments.
Understanding Data Manipulation and How to Leverage it To Improve Generalization
Augmentations and other transformations of data, either in the input or latent space, are a critical component of modern machine learning systems. While these techniques are widely used in practice and known to provide improved generalization in many cases, it is still unclear how data manipulation impacts learning and generalization. To take a step toward addressing the problem, this thesis focuses on understanding and leveraging data augmentation and alignment for improving machine learning performance and transfer. In the first part of the thesis, we establish a novel theoretical framework to understand how data augmentation (DA) impacts learning in linear regression and classification tasks. The results demonstrate how the augmented transformed data spectrum plays a key role in characterizing the behavior of different augmentation strategies, especially in the overparameterized regime. The tools developed in this aim provide simple guidelines to build new augmentation strategies and a simple framework for comparing the generalization of different types of DA. In the second part of the thesis, we demonstrate how latent data alignment can be used to tackle the domain transfer problem, where training and testing datasets vary in distribution. Our algorithm builds upon joint clustering and data-matching through optimal transport, and outperforms the pure matching algorithm baselines on both synthetic and real datasets. Extensions of the generalization analysis and algorithm design for data augmentation and alignment to nonlinear models such as artificial neural networks and random feature models are discussed. This thesis provides tools and analyses for better data manipulation design, which benefit both supervised and unsupervised learning schemes. Ph.D.
Modified Theories of Gravity and Cosmological Applications
This reprint focuses on recent aspects of gravitational theory and cosmology. It contains subjects of particular interest for modified gravity theories and applications to cosmology. Special attention is given to Einstein–Gauss–Bonnet gravity, f(R)-gravity, anisotropic inflation, extra dimension theories of gravity, black holes, dark energy, Palatini gravity, anisotropic spacetime, Einstein–Finsler gravity, off-diagonal cosmological solutions, Hawking temperature, and scalar-tensor-vector theories.