Gaussian Error Linear Units (GELUs)
We propose the Gaussian Error Linear Unit (GELU), a high-performing neural
network activation function. The GELU activation function is $x\Phi(x)$, where
$\Phi(x)$ is the standard Gaussian cumulative distribution function. The GELU
nonlinearity weights inputs by their value, rather than gates inputs by their
sign as in ReLUs ($x\mathbf{1}_{x>0}$). We perform an empirical evaluation of
the GELU nonlinearity against the ReLU and ELU activations and find performance
improvements across all considered computer vision, natural language
processing, and speech tasks.
Comment: Trimmed version of 2016 draft; add exact formula.
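In code, the exact formula is a one-liner; the following minimal Python sketch (function names are just for illustration) contrasts the two behaviors:

    import math

    def gelu(x: float) -> float:
        # Exact GELU: x * Phi(x), where Phi is the standard normal CDF,
        # written via the error function: Phi(x) = 0.5 * (1 + erf(x / sqrt(2))).
        return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

    def relu(x: float) -> float:
        # ReLU gates by sign: x * 1[x > 0].
        return max(x, 0.0)

    # GELU weights inputs by value: small negative inputs are damped, not zeroed.
    print(gelu(-0.5), relu(-0.5))  # approx -0.154 vs 0.0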
Scalable neural networks for the efficient learning of disordered quantum systems
Supervised machine learning is emerging as a powerful computational tool to
predict the properties of complex quantum systems at a limited computational
cost. In this article, we quantify how accurately deep neural networks can
learn the properties of disordered quantum systems as a function of the system
size. We implement a scalable convolutional network that can address arbitrary
system sizes. This network is compared with a recently introduced extensive
convolutional architecture [K. Mills et al., Chem. Sci. 10, 4129 (2019)] and
with conventional dense networks with all-to-all connectivity. The networks are
trained to predict the exact ground-state energies of various disordered
systems, namely a continuous-space single-particle Hamiltonian for cold atoms
in speckle disorder, and different setups of a quantum Ising chain with random
couplings, including one with only short-range interactions and one augmented
with a long-range term. In all testbeds we consider, the scalable network
retains high accuracy as the system size increases. Furthermore, we demonstrate
that the network scalability enables a transfer-learning protocol, whereby a
pre-training performed on small systems drastically accelerates the learning of
large-system properties, allowing one to reach high accuracy with small training
sets. In fact, with the scalable network one can even extrapolate to sizes
larger than those included in the training set, accurately reproducing the
results of state-of-the-art quantum Monte Carlo simulations.
Comment: 12 pages, 11 figures
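To illustrate what "scalable" means here, consider a sketch of a network built only from convolutions and a final sum over sites, so the same weights apply to chains of any length and the output is an extensive scalar. The layer sizes are hypothetical and this is the general idea, not the authors' exact architecture:

    import torch
    import torch.nn as nn

    class ScalableEnergyCNN(nn.Module):
        # Only convolutions plus a global sum-pool, so the same weights
        # apply to input chains of any length L.
        def __init__(self, channels: int = 32):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv1d(1, channels, kernel_size=5, padding=2),
                nn.ReLU(),
                nn.Conv1d(channels, channels, kernel_size=5, padding=2),
                nn.ReLU(),
            )
            self.head = nn.Conv1d(channels, 1, kernel_size=1)

        def forward(self, couplings: torch.Tensor) -> torch.Tensor:
            # couplings: (batch, 1, L), e.g. random Ising couplings.
            h = self.features(couplings)
            # Sum over sites -> an extensive estimate of the ground-state energy.
            return self.head(h).sum(dim=(1, 2))

    model = ScalableEnergyCNN()
    e_small = model(torch.randn(4, 1, 16))   # pre-train on small chains...
    e_large = model(torch.randn(4, 1, 128))  # ...then evaluate on larger ones

Because nothing in the network depends on L, weights pre-trained on small systems can be reused directly on larger ones, which is what enables the transfer-learning and extrapolation protocol described above.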
Language Modeling with Deep Transformers
We explore deep autoregressive Transformer models in language modeling for
speech recognition. We focus on two aspects. First, we revisit Transformer
model configurations specifically for language modeling. We show that
well-configured Transformer models outperform our baseline models based on a
shallow stack of LSTM recurrent neural network layers. We carry out experiments
on the open-source LibriSpeech 960hr task, for both 200K vocabulary word-level
and 10K byte-pair encoding subword-level language modeling. We apply our
word-level models to conventional hybrid speech recognition by lattice
rescoring, and the subword-level models to attention-based encoder-decoder
models by shallow fusion. Second, we show that deep Transformer language models
do not require positional encoding. Positional encoding is considered an
essential augmentation for the self-attention mechanism, which is otherwise
invariant to sequence ordering. However, in the autoregressive setup, as is the
case for language modeling, the amount of available context grows along the
position dimension, which is a positional signal in its own right. The analysis of attention weights
shows that deep autoregressive self-attention models can automatically make use
of such positional information. We find that removing the positional encoding
even slightly improves the performance of these models.
Comment: To appear in the proceedings of INTERSPEECH 2019
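As a concrete illustration of the decoder-side LM integration, here is a minimal shallow-fusion sketch; the tensor shapes and the LM weight of 0.3 are assumptions for illustration, not values from the paper:

    import torch

    def shallow_fusion_step(dec_log_probs: torch.Tensor,
                            lm_log_probs: torch.Tensor,
                            lm_weight: float = 0.3) -> torch.Tensor:
        # dec_log_probs, lm_log_probs: (beam, vocab) log-probs for one
        # decoding step. The external LM score is added in log space;
        # lm_weight is a hyperparameter tuned on held-out data.
        return dec_log_probs + lm_weight * lm_log_probs

    # Beam search then expands hypotheses using the fused scores:
    fused = shallow_fusion_step(torch.log_softmax(torch.randn(4, 10000), -1),
                                torch.log_softmax(torch.randn(4, 10000), -1))
    next_tokens = fused.topk(k=4, dim=-1).indices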
Blended alkali-activated aluminosilicate binders
The first part of this work summarizes the current knowledge of two-component blended alkali-activated binders based on blast-furnace slag, fly ash, metakaolin and ground brick, with emphasis on their workability, microstructure, mechanical properties and durability. In the experimental part, alkali-activated binders with different ratios of fly ash to metakaolin were prepared and their workability and mechanical properties were compared. The binder structure was evaluated by means of scanning electron microscopy (SEM) and mercury intrusion porosimetry. From the literature review and the author's own results it can be concluded that combining the individual precursors has a positive effect on the majority of the assessed properties of the material.
