
    Gaussian Error Linear Units (GELUs)

    We propose the Gaussian Error Linear Unit (GELU), a high-performing neural network activation function. The GELU activation function is $x\Phi(x)$, where $\Phi(x)$ is the standard Gaussian cumulative distribution function. The GELU nonlinearity weights inputs by their value, rather than gating inputs by their sign as in ReLUs ($x\mathbf{1}_{x>0}$). We perform an empirical evaluation of the GELU nonlinearity against the ReLU and ELU activations and find performance improvements across all considered computer vision, natural language processing, and speech tasks.
    Comment: Trimmed version of 2016 draft; adds the exact formula
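    A minimal NumPy/SciPy sketch, written here for illustration (not code from the paper), evaluating the exact GELU $x\Phi(x)$ next to the ReLU $x\mathbf{1}_{x>0}$:

        import numpy as np
        from scipy.stats import norm

        def gelu(x):
            # Exact GELU: weight each input by the standard Gaussian CDF, x * Phi(x).
            return x * norm.cdf(x)

        def relu(x):
            # ReLU gates inputs by their sign: x * 1_{x > 0}.
            return x * (x > 0)

        if __name__ == "__main__":
            xs = np.linspace(-3.0, 3.0, 7)
            print("x    :", xs)
            print("GELU :", np.round(gelu(xs), 4))
            print("ReLU :", relu(xs))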

    Scalable neural networks for the efficient learning of disordered quantum systems

    Supervised machine learning is emerging as a powerful computational tool to predict the properties of complex quantum systems at a limited computational cost. In this article, we quantify how accurately deep neural networks can learn the properties of disordered quantum systems as a function of the system size. We implement a scalable convolutional network that can address arbitrary system sizes. This network is compared with a recently introduced extensive convolutional architecture [K. Mills et al., Chem. Sci. 10, 4129 (2019)] and with conventional dense networks with all-to-all connectivity. The networks are trained to predict the exact ground-state energies of various disordered systems, namely a continuous-space single-particle Hamiltonian for cold atoms in speckle disorder, and different setups of a quantum Ising chain with random couplings, including one with only short-range interactions and one augmented with a long-range term. In all the testbeds we consider, the scalable network retains high accuracy as the system size increases. Furthermore, we demonstrate that the network's scalability enables a transfer-learning protocol, whereby pre-training on small systems drastically accelerates the learning of large-system properties, allowing high accuracy to be reached with small training sets. In fact, with the scalable network one can even extrapolate to sizes larger than those included in the training set, accurately reproducing the results of state-of-the-art quantum Monte Carlo simulations.
    Comment: 12 pages, 11 figures
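    As a hedged illustration of how a network can address arbitrary system sizes, the following PyTorch sketch is our own (the class name, layer sizes, and pooling choice are assumptions, not the authors' architecture): a fully convolutional 1D model whose weights are independent of the chain length, with per-site contributions summed into one extensive energy prediction, so parameters pre-trained on small systems can be reused on larger ones.

        import torch
        import torch.nn as nn

        class ScalableEnergyNet(nn.Module):
            """Illustrative sketch only: a size-agnostic 1D convolutional network.
            Per-site contributions are summed, so the same weights apply to chains
            of any length and the output scales extensively with system size."""

            def __init__(self, channels=32):
                super().__init__()
                self.features = nn.Sequential(
                    nn.Conv1d(1, channels, kernel_size=3, padding=1),
                    nn.ReLU(),
                    nn.Conv1d(channels, channels, kernel_size=3, padding=1),
                    nn.ReLU(),
                    nn.Conv1d(channels, 1, kernel_size=1),  # per-site energy contribution
                )

            def forward(self, x):
                # x: (batch, 1, L) disorder values for a chain of length L.
                return self.features(x).sum(dim=-1).squeeze(-1)

        if __name__ == "__main__":
            model = ScalableEnergyNet()
            small = torch.randn(4, 1, 16)   # pre-train on small systems...
            large = torch.randn(4, 1, 128)  # ...then evaluate or fine-tune on larger ones
            print(model(small).shape, model(large).shape)  # torch.Size([4]) torch.Size([4])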

    Language Modeling with Deep Transformers

    We explore deep autoregressive Transformer models in language modeling for speech recognition. We focus on two aspects. First, we revisit Transformer model configurations specifically for language modeling. We show that well-configured Transformer models outperform our baseline models based on a shallow stack of LSTM recurrent neural network layers. We carry out experiments on the open-source LibriSpeech 960hr task, for both 200K vocabulary word-level and 10K byte-pair encoding subword-level language modeling. We apply our word-level models to conventional hybrid speech recognition by lattice rescoring, and the subword-level models to attention-based encoder-decoder models by shallow fusion. Second, we show that deep Transformer language models do not require positional encoding. The positional encoding is an essential augmentation for the self-attention mechanism, which is otherwise invariant to sequence ordering. However, in the autoregressive setup, as is the case for language modeling, the amount of information increases along the position dimension, which is itself a positional signal. The analysis of attention weights shows that deep autoregressive self-attention models can automatically make use of such positional information. We find that removing the positional encoding even slightly improves the performance of these models.
    Comment: To appear in the proceedings of INTERSPEECH 2019
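    To make the second point concrete, here is a minimal PyTorch sketch of our own (the model name and hyperparameters are assumptions, not the paper's configuration): an autoregressive Transformer language model with no positional encoding at all, where the causal mask alone determines how much context each position sees, which is the implicit positional signal described above.

        import torch
        import torch.nn as nn

        class DecoderOnlyLM(nn.Module):
            """Illustrative sketch: a causal Transformer LM WITHOUT positional encoding.
            Position i attends to i+1 tokens, so the growing amount of visible context
            itself acts as a positional signal."""

            def __init__(self, vocab_size=10000, d_model=256, n_heads=4, n_layers=6):
                super().__init__()
                self.embed = nn.Embedding(vocab_size, d_model)  # note: no positional embedding added
                layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                                   dim_feedforward=4 * d_model,
                                                   batch_first=True)
                self.blocks = nn.TransformerEncoder(layer, num_layers=n_layers)
                self.lm_head = nn.Linear(d_model, vocab_size)

            def forward(self, tokens):
                # tokens: (batch, seq_len) integer ids
                seq_len = tokens.size(1)
                # Additive causal mask: -inf above the diagonal blocks attention to future tokens.
                causal = torch.triu(torch.full((seq_len, seq_len), float("-inf")), diagonal=1)
                h = self.blocks(self.embed(tokens), mask=causal)
                return self.lm_head(h)  # next-token logits, (batch, seq_len, vocab)

        if __name__ == "__main__":
            model = DecoderOnlyLM()
            logits = model(torch.randint(0, 10000, (2, 32)))
            print(logits.shape)  # torch.Size([2, 32, 10000])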

    Blended alkali-activated aluminosilicate binders

    The first part of the work summarizes the current knowledge of two-component blended alkali-activated binders based on blast furnace slag, fly ash, metakaolin and ground brick, with emphasis on their workability, microstructure, mechanical properties and durability. In the experimental part, alkali-activated binders with different ratios of fly ash to metakaolin were prepared, and their workability and mechanical properties were compared. The binder structure was evaluated by means of scanning electron microscopy (SEM) and mercury intrusion porosimetry. From the literature review and the results obtained, it can be concluded that combining the individual precursors has a positive effect on the majority of the assessed properties of the material.