Novel Architectures and Optimization Algorithms for Training Neural Networks and Applications
The two main areas of Deep Learning are Unsupervised and Supervised Learning. Unsupervised Learning studies a class of data processing problems in which only descriptions of objects are available, without label information. Generative Adversarial Networks (GANs) have become one of the most widely used unsupervised neural network models. A GAN combines two neural networks, a generator and a discriminator, that are trained simultaneously. We introduce a new family of discriminator loss functions that adopt a weighted sum of the real and fake parts, which we call adaptive weighted loss functions. Using gradient information, we can adaptively choose the weights to train the discriminator in a direction that benefits the GAN's stability. We also propose several improvements to GAN training schemes. One is a self-correcting optimization scheme for training a GAN discriminator on Speech Enhancement tasks, which helps avoid "harmful" training directions for parts of the discriminator loss. The other is a consistency loss, which targets the inconsistency between the time and time-frequency domains caused by Fourier Transforms.

Contrary to Unsupervised Learning, Supervised Learning uses a label for each object, and the goal is to find the relationship between objects and labels. Building computational methods to automatically interpret and represent human language is known as Natural Language Processing, which includes tasks such as word prediction and machine translation. In this area, we propose a novel Neumann-Cayley Gated Recurrent Unit (NC-GRU) architecture based on a Neumann-series-based Scaled Cayley transformation. The NC-GRU uses orthogonal matrices to prevent exploding-gradient problems and to enhance long-term memory on various prediction tasks. In addition, we propose using the newly introduced NC-GRU unit inside neural network models to create neural molecular fingerprints.
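The abstract does not give the NC-GRU construction in detail, but the underlying idea of producing an orthogonal matrix from a Cayley transform with a Neumann-series approximation of the inverse can be illustrated in a minimal sketch. The truncation depth and the scaling of the skew-symmetric matrix below are assumptions for illustration, not the thesis's implementation:

```python
import numpy as np

def neumann_cayley(A, terms=6):
    """Approximate the Cayley transform W = (I + A)^(-1) (I - A) of a
    skew-symmetric A, replacing the exact inverse with the truncated
    Neumann series (I + A)^(-1) ~ sum_{k=0}^{terms-1} (-A)^k.
    Illustrative sketch only; convergence requires ||A|| < 1."""
    n = A.shape[0]
    inv_approx = np.zeros_like(A)
    term = np.eye(n)
    for _ in range(terms):
        inv_approx += term
        term = term @ (-A)          # next power of (-A)
    return inv_approx @ (np.eye(n) - A)

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4)) * 0.1   # small entries so the series converges
A = (M - M.T) / 2                        # skew-symmetric part
W = neumann_cayley(A)
print(np.max(np.abs(W.T @ W - np.eye(4))))  # small: W is near-orthogonal
```

For skew-symmetric A, the exact Cayley transform is exactly orthogonal; the truncated series trades a small orthogonality error for avoiding an explicit matrix inverse, which is the appeal of a Neumann-series-based formulation.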
Integrating the novel NC-GRU fingerprints with Multi-Task Deep Neural Network architectures helps improve performance on several molecular tasks. We also introduce a new normalization method, Assorted-Time Normalization, which preserves information from multiple consecutive time steps and normalizes over them in recurrent architectures. Finally, we propose a Symmetry Structured Convolutional Neural Network (SCNN), an architecture with 2D structured symmetric features over the spatial dimensions, which generates and preserves the symmetry structure in the network's convolutional layers.
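The adaptive weighted discriminator loss described earlier in the abstract — a weighted sum of the real and fake parts with weights chosen from gradient information — can be sketched as a toy scalar example. The specific weighting rule below (down-weighting the part whose gradient is already dominant) is a hypothetical stand-in, since the abstract does not spell out the rule:

```python
import numpy as np

def bce_real(logit):
    """Binary cross-entropy for a real sample: -log(sigmoid(logit))."""
    return np.log1p(np.exp(-logit))

def bce_fake(logit):
    """Binary cross-entropy for a fake sample: -log(1 - sigmoid(logit))."""
    return np.log1p(np.exp(logit))

def adaptive_weights(grad_real, grad_fake, eps=1e-8):
    """Toy adaptive rule: give each loss part a weight proportional to the
    OTHER part's gradient magnitude, so neither part dominates training.
    Illustrative assumption, not the thesis's exact scheme."""
    g_r = abs(grad_real) + eps
    g_f = abs(grad_fake) + eps
    total = g_r + g_f
    return g_f / total, g_r / total       # weights sum to 1

# Example with scalar discriminator logits.
logit_real, logit_fake = 0.5, -0.2
grad_r = -1.0 / (1.0 + np.exp(logit_real))   # d bce_real / d logit
grad_f = 1.0 / (1.0 + np.exp(-logit_fake))   # d bce_fake / d logit
a, b = adaptive_weights(grad_r, grad_f)
loss = a * bce_real(logit_real) + b * bce_fake(logit_fake)
```

In a full GAN, `logit_real` and `logit_fake` would be batch discriminator outputs and the gradients would be taken with respect to the discriminator parameters; the scalar version only shows the shape of the weighted combination.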
Using data-derived charge densities in electronic structure methods
For supplementary information, see: https://www.repository.cam.ac.uk/handle/1810/294380

In Condensed Matter Physics, the computational expense to evaluate the total potential
energy of a collection of atoms using standard ab initio methods is typically large. This limits
the scale of phenomena that can be studied in both length and time. Data-driven techniques
have established a pragmatic extension to ab initio calculations, balancing reductions in the
calculation time with potential losses of accuracy in the properties of interest. Both paradigms
complement one another and, when used appropriately, are valuable tools that enable and
stimulate research in Materials Science. Unlike traditional efforts, modern techniques to
include data employ flexible functional forms, extending the applicability of such methods
to a diverse range of physical quantities. Recently, interest in utilising data in total energy
calculations has turned towards the electron density. With an electron density that is close to
the ground state, data-derived kinetic energy functionals in orbital-free density functional
theory can be applied to evaluate the total energy without using gradients of the functional
with respect to the electron density. For this purpose, a number of approaches to calculate
data-derived densities have been proposed in recent years.
In this thesis, we begin by reviewing several fixed-form expressions to approximate the
potential energy of hexagonal layered crystals and show how a flexible form is essential to
fully utilise the available data. We then focus on developing new approaches to approximate
ground state electron densities and on novel applications that help to further unify data-driven
and ab initio techniques within electronic structure. By calculating reliable uncertainty
estimates, we show that data-derived densities can be incorporated into density functional
theory in a “safe” manner. We also show that with accurate initial densities and for systems
that otherwise have a poor initial estimate, we can reduce the number of self-consistent
field iterations that are necessary to reach self-consistency in Kohn-Sham density functional
theory. We hope that the work in this thesis will contribute to improving initial states
in density functional theory, support the application of data-derived orbital-free kinetic
energy functionals and encourage an ever closer and mutually beneficial cohesion between
data-driven and ab initio techniques throughout the Natural Sciences.

EPSRC Centre for Doctoral Training in Computational Methods for Materials Science, grant number EP/L015552/1
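The claim that a better initial density reduces the number of self-consistent field iterations can be illustrated with a toy fixed-point loop. The update map and mixing parameter below are stand-ins for a real Kohn-Sham density update, chosen only so the loop converges:

```python
import numpy as np

def scf(density0, mix=0.5, tol=1e-8, max_iter=200):
    """Toy self-consistent-field loop: iterate rho -> F(rho) with linear
    mixing until the density stops changing.  F is a hypothetical
    contraction standing in for a Kohn-Sham density update."""
    def F(rho):
        return np.tanh(rho) + 0.1
    rho = np.asarray(density0, dtype=float)
    for it in range(1, max_iter + 1):
        new = F(rho)
        if np.max(np.abs(new - rho)) < tol:
            return rho, it                    # converged
        rho = (1 - mix) * rho + mix * new     # linear mixing step
    return rho, max_iter

poor_guess = np.full(4, 5.0)                  # far from self-consistency
rho_star, n_poor = scf(poor_guess)
_, n_good = scf(rho_star + 1e-3)              # start near the fixed point
print(n_poor, n_good)                          # the better start needs fewer iterations
```

The same qualitative behaviour — iteration count falling as the initial density approaches the self-consistent one — is what makes data-derived initial densities attractive in Kohn-Sham calculations, though real SCF convergence is far less well-behaved than this contraction.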