
    Reducing the Need for Backpropagation and Discovering Better Optima With Explicit Optimizations of Neural Networks

    Iterative differential approximation methods that rely upon backpropagation have enabled the optimization of neural networks; however, at present, they remain computationally expensive, especially when training models at scale. In this paper, we propose a computationally efficient alternative for optimizing neural networks that can both reduce the costs of scaling neural networks and provide high-efficiency optimizations for low-resource applications. We derive an explicit solution to a simple feed-forward language model (LM) by mathematically analyzing its gradients. This solution generalizes from single-layer LMs to the class of all single-layer feed-forward softmax-activated neural models trained on positive-valued features, as we demonstrate by extending the solution to MNIST digit classification. For both the LM and the digit classifiers, we find computationally that the explicit solutions perform near-optimally: experiments show that 1) iterative optimization only marginally improves the explicit-solution parameters and 2) randomly initialized parameters iteratively optimize towards the explicit solution. We also preliminarily apply the explicit solution locally, layer by layer, in multi-layer networks and discuss how the solution's computational savings increase with model complexity. For both single- and multi-layer applications of the explicit solution, we emphasize that the optima achieved cannot be reached by backpropagation alone, i.e., better optima appear discoverable only after explicit solutions are applied. Finally, we discuss the solution's computational savings alongside its impact on model interpretability and suggest future directions for the derivation of explicit solutions to complex and multi-layer architectures.
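
    The abstract does not give the closed form, so the following is a rough, hedged sketch only: one plausible shape of an explicit (backpropagation-free) solution for a single-layer softmax model trained on positive-valued features, here assumed to be log-smoothed conditional co-occurrence probabilities. The function names and the exact form are assumptions, not the paper's derivation.

```python
# Hypothetical sketch: a closed-form weight assignment for a single softmax
# layer, assumed to be the log of smoothed conditional co-occurrence
# probabilities (the paper's exact derivation may differ).
import numpy as np

def explicit_softmax_weights(cooc, alpha=1e-3):
    """cooc[i, j]: count of target class j co-occurring with input feature i."""
    smoothed = cooc + alpha                        # avoid log(0)
    cond = smoothed / smoothed.sum(axis=1, keepdims=True)
    return np.log(cond)                            # rows serve as softmax logits

def softmax_predict(W, features):
    """features: positive-valued input vector; returns a class distribution."""
    logits = features @ W
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

# Toy check: with a one-hot context, the prediction recovers the empirical
# conditional distribution, i.e. the cross-entropy optimum for that context.
cooc = np.array([[8.0, 2.0], [1.0, 9.0]])
W = explicit_softmax_weights(cooc)
print(softmax_predict(W, np.array([1.0, 0.0])))    # approximately [0.8, 0.2]
```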

    Explicit Foundation Model Optimization with Self-Attentive Feed-Forward Neural Units

    Iterative approximation methods using backpropagation enable the optimization of neural networks, but they remain computationally expensive, especially when used at scale. This paper presents an efficient alternative for optimizing neural networks that reduces the costs of scaling neural networks and provides high-efficiency optimizations for low-resource applications. We discuss a general result about feed-forward neural networks and then extend this solution to compositional (multi-layer) networks, which we apply to a simplified transformer block containing feed-forward and self-attention layers. These models are used to train highly specified and complex multi-layer neural architectures that we refer to as self-attentive feed-forward unit (SAFFU) layers, which we use to develop a transformer that appears to generalize well over small, cognitively feasible volumes of data. Testing demonstrates that explicit solutions outperform models optimized by backpropagation alone. Moreover, further application of backpropagation after explicit solutions leads to better optima from smaller scales of data; that is, explicit-solution warm starts enable effective models to be trained from much less data. We then carry out ablation experiments, training a roadmap of about 250 transformer models over 1 million tokens to determine ideal settings. We find that multiple different architectural variants produce highly performant models, and we discover from this ablation that some of the best are not the most parameterized. This appears to indicate that well-generalized models could be reached using less data by applying explicit solutions, and that architectural exploration using explicit solutions pays dividends in guiding the search for efficient variants with fewer parameters, which could be incorporated into low-resource hardware where AI might be embodied.
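
    As a rough illustration of the warm-start claim (not the paper's SAFFU architecture or its explicit solution), the sketch below initializes a stand-in linear model from a closed-form least-squares fit and then refines it with a few gradient steps, mirroring the "explicit solution first, backpropagation after" pattern the abstract describes. All names and settings are illustrative assumptions.

```python
# Minimal sketch of an explicit-solution warm start followed by iterative
# refinement; a linear least-squares model stands in for the paper's layers.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 8))
w_true = rng.normal(size=8)
y = X @ w_true + 0.1 * rng.normal(size=256)

def refine(w, steps=50, lr=0.01):
    """A few backpropagation-style gradient steps on mean squared error."""
    for _ in range(steps):
        w = w - lr * (2.0 / len(y)) * (X.T @ (X @ w - y))
    return w

w_warm = np.linalg.lstsq(X, y, rcond=None)[0]   # explicit (closed-form) solution
w_cold = np.zeros(8)                            # uninformed initialization

for name, w0 in [("warm start", w_warm), ("cold start", w_cold)]:
    w = refine(w0)
    print(name, "MSE:", round(float(np.mean((X @ w - y) ** 2)), 4))
```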

    Bit Cipher -- A Simple yet Powerful Word Representation System that Integrates Efficiently with Language Models

    While Large Language Models (LLMs) become ever more dominant, classic pre-trained word embeddings sustain their relevance through computational efficiency and nuanced linguistic interpretation. Drawing from recent studies demonstrating that the optimizations of GloVe and word2vec both tend towards variants of the log-co-occurrence matrix, we construct a novel word representation system called Bit-cipher that eliminates the need for backpropagation while leveraging contextual information and hyper-efficient dimensionality reduction techniques based on unigram frequency, providing strong interpretability alongside efficiency. We use the bit-cipher algorithm to train word vectors via a two-step process that critically relies on a hyperparameter -- bits -- that controls the vector dimension. While the first step trains the bit-cipher, the second utilizes it under two different aggregation modes -- summation or concatenation -- to produce contextually rich representations from word co-occurrences. We extend our investigation into bit-cipher's efficacy by performing probing experiments on part-of-speech (POS) tagging and named entity recognition (NER) to assess its competitiveness with classic embeddings like word2vec and GloVe. Additionally, we explore its applicability in LM training and fine-tuning. By replacing embedding layers with cipher embeddings, our experiments illustrate the notable efficiency of the cipher in accelerating the training process and attaining better optima compared to conventional training paradigms. Experiments on the integration of bit-cipher embedding layers with RoBERTa, T5, and OPT, prior to or as a substitute for fine-tuning, showcase a promising enhancement to transfer learning, allowing rapid model convergence while preserving competitive performance.
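
    The paper's exact Bit-cipher construction is not reproduced in the abstract, so the following is a hypothetical sketch of the two-step pattern it describes: derive a bits-dimensional code for each word from its unigram-frequency rank, then aggregate the codes of co-occurring words (summation mode shown). The encoding scheme, window size, and function names are illustrative assumptions, not the published algorithm.

```python
# Hypothetical sketch of a frequency-based cipher embedding: each word gets a
# `bits`-dimensional code from its unigram-frequency rank, and a word's
# contextual vector is the sum of its neighbours' codes.
from collections import Counter
import numpy as np

def train_cipher(corpus, bits=8):
    freq = Counter(w for sent in corpus for w in sent)
    codes = {}
    for rank, (w, _) in enumerate(freq.most_common()):
        # binary expansion of the frequency rank as a crude positive-valued code
        codes[w] = np.array([(rank >> b) & 1 for b in range(bits)], dtype=float)
    return codes

def context_vectors(corpus, codes, window=2):
    bits = len(next(iter(codes.values())))
    vecs = {w: np.zeros(bits) for w in codes}
    for sent in corpus:
        for i, w in enumerate(sent):
            for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                if j != i:
                    vecs[w] += codes[sent[j]]      # "summation" aggregation mode
    return vecs

corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
codes = train_cipher(corpus, bits=4)
print(context_vectors(corpus, codes)["sat"])
```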

    Boosting Continuous Control with Consistency Policy

    Due to its training stability and strong expressiveness, the diffusion model has attracted considerable attention in offline reinforcement learning. However, several challenges come with it: 1) the demand for a large number of diffusion steps makes diffusion-model-based methods time-inefficient and limits their applications in real-time control; 2) how to achieve policy improvement with accurate guidance for a diffusion-model-based policy is still an open problem. Inspired by the consistency model, we propose a novel time-efficient method named Consistency Policy with Q-Learning (CPQL), which derives actions from noise in a single step. By establishing a mapping from the reverse diffusion trajectories to the desired policy, we simultaneously address the issues of time efficiency and inaccurate guidance when updating the diffusion-model-based policy with the learned Q-function. We demonstrate that CPQL can achieve policy improvement with accurate guidance for offline reinforcement learning and can be seamlessly extended to online RL tasks. Experimental results indicate that CPQL achieves new state-of-the-art performance on 11 offline and 21 online tasks, significantly improving inference speed by nearly 45 times compared to Diffusion-QL. We will release our code later.
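
    A minimal, hedged sketch of the single-step idea behind a consistency policy: the policy maps a state and a noise sample directly to an action in one forward pass and is updated against a learned Q-function. The networks, loss, and hyperparameters below are illustrative stand-ins, not CPQL's actual objective.

```python
# Illustrative single-step action generation with a Q-guided policy update.
import torch
import torch.nn as nn

state_dim, action_dim = 4, 2

policy = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
                       nn.Linear(64, action_dim))
q_net = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
                      nn.Linear(64, 1))
opt = torch.optim.Adam(policy.parameters(), lr=3e-4)

def act(state):
    # one forward pass from noise to action, instead of many diffusion steps
    noise = torch.randn(state.shape[0], action_dim)
    return torch.tanh(policy(torch.cat([state, noise], dim=-1)))

states = torch.randn(32, state_dim)
actions = act(states)
policy_loss = -q_net(torch.cat([states, actions], dim=-1)).mean()  # Q-guided
opt.zero_grad()
policy_loss.backward()
opt.step()
```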

    Identifiability and Consistent Estimation for Gaussian Chain Graph Models

    The chain graph model admits both undirected and directed edges in one graph, where symmetric conditional dependencies are encoded via undirected edges and asymmetric causal relations are encoded via directed edges. Though frequently encountered in practice, the chain graph model has been largely under-investigated in the literature, possibly due to the lack of identifiability conditions between undirected and directed edges. In this paper, we first establish a set of novel identifiability conditions for the Gaussian chain graph model, exploiting a low-rank plus sparse decomposition of the precision matrix. Further, an efficient learning algorithm is built upon the identifiability conditions to fully recover the chain graph structure. A theoretical analysis of the proposed method is conducted, assuring its asymptotic consistency in recovering the exact chain graph structure. The advantage of the proposed method is also supported by numerical experiments on both simulated examples and a real application to the Standard & Poor's 500 index data.
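
    As a generic illustration of a low-rank plus sparse split of a precision matrix (not the paper's estimator or its identifiability conditions), the sketch below alternates soft-thresholding of the sparse part with a truncated SVD of the low-rank part.

```python
# Generic alternating heuristic for a sparse + low-rank decomposition.
import numpy as np

def sparse_plus_low_rank(theta, rank=1, lam=0.1, iters=50):
    low = np.zeros_like(theta)
    for _ in range(iters):
        # sparse part: soft-threshold the residual
        resid = theta - low
        sparse = np.sign(resid) * np.maximum(np.abs(resid) - lam, 0.0)
        # low-rank part: truncated SVD of the remaining residual
        u, s, vt = np.linalg.svd(theta - sparse, full_matrices=False)
        low = (u[:, :rank] * s[:rank]) @ vt[:rank]
    return sparse, low

# toy precision matrix: sparse diagonal part plus a rank-one component
theta = np.diag([2.0, 2.0, 2.0]) + 0.5 * np.outer([1, 1, 1], [1, 1, 1])
S, L = sparse_plus_low_rank(theta, rank=1, lam=0.05)
print(np.round(S, 2), np.round(L, 2), sep="\n")
```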

    The effect of bioactive glass particle size on viscosity, stickiness and packability of resin composites

    This study investigated the impact of particle size variation in un-silanized bioactive glass 45S5 (BAG) on the viscosity, stickiness and packability of experimental resin composites, while altering the ratio of 45S5 to inert glass. A series of 25 groups of experimental resin composites with a constant total filler loading (65 vol%) was formulated. The filler system contained silanized barium borosilicate (BB, D50 [µm]: 0.7) and un-silanized BAG (45S5). Different ratios of BB and 45S5 were achieved by substituting the 65 vol% BB with 0, 5, 10, 15, 20, 25 or 30 vol% of 45S5 (D50 [µm]: 7.0, 10.6, 17.4 and 43.6). The morphology of the filler particles and of the formulated experimental resin composites was examined using a scanning electron microscope (SEM). Viscosity measurements were conducted on both experimental and commercial resin composites using an AR-G2 rheometer. A texture analyser instrument was used to measure the stickiness and packability of the resin composites. Two commercial resin composites served as control groups for comparison. Statistical analysis, including one-way ANOVA tests, Tukey post-hoc tests (p < 0.05), and Kruskal-Wallis tests (p < 0.05) for homogeneity of variables, was performed using SPSS software. Results indicated that the filler loading and particle size of 45S5 significantly influenced the viscosity, stickiness, and packability of the resin composites. Resin composites containing smaller 45S5 particles at lower ratios exhibited characteristics akin to packable resin composites, whereas those with larger 45S5 particles tended to resemble commercial flowable resin composites, contingent upon the particle size and ratio of 45S5.
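
    For readers wanting to reproduce the style of analysis described (one-way ANOVA, Tukey post-hoc, and Kruskal-Wallis tests), a small Python sketch with made-up viscosity readings is shown below; the group labels and data are purely illustrative, and the study itself used SPSS.

```python
# Illustrative group comparisons on fabricated viscosity readings.
import numpy as np
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(1)
groups = {"BB_only": rng.normal(100, 5, 10),
          "BAG_7um_10pct": rng.normal(90, 5, 10),
          "BAG_43um_10pct": rng.normal(70, 5, 10)}

f_stat, p_anova = stats.f_oneway(*groups.values())
h_stat, p_kw = stats.kruskal(*groups.values())
print(f"ANOVA p={p_anova:.4f}, Kruskal-Wallis p={p_kw:.4f}")

values = np.concatenate(list(groups.values()))
labels = np.repeat(list(groups.keys()), [len(v) for v in groups.values()])
print(pairwise_tukeyhsd(values, labels, alpha=0.05))
```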

    GenFormer: A Deep-Learning-Based Approach for Generating Multivariate Stochastic Processes

    Stochastic generators are essential for producing synthetic realizations that preserve target statistical properties. We propose GenFormer, a stochastic generator for spatio-temporal multivariate stochastic processes. It is constructed using a Transformer-based deep learning model that learns a mapping between a Markov state sequence and time series values. The synthetic data generated by the GenFormer model preserves the target marginal distributions and approximately captures other desired statistical properties, even in challenging applications involving a large number of spatial locations and a long simulation horizon. The GenFormer model is applied to simulate synthetic wind speed data at various stations in Florida in order to calculate exceedance probabilities for risk management.
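
    A hedged sketch of the two-stage generation the abstract describes: estimate a discrete Markov chain over quantized states, sample a synthetic state sequence, and then map states back to values. A simple per-state empirical sampler stands in for GenFormer's Transformer mapping; all details below are illustrative.

```python
# Illustrative Markov-state-then-values generation on a toy univariate series.
import numpy as np

rng = np.random.default_rng(0)
series = np.cumsum(rng.normal(size=500))          # toy time series

n_states = 8
edges = np.quantile(series, np.linspace(0, 1, n_states + 1))
states = np.clip(np.searchsorted(edges, series, side="right") - 1, 0, n_states - 1)

# first-order Markov transition matrix estimated from the state sequence
trans = np.ones((n_states, n_states))             # add-one smoothing
for a, b in zip(states[:-1], states[1:]):
    trans[a, b] += 1
trans /= trans.sum(axis=1, keepdims=True)

# synthetic realization: sample states first, then a value within each state
s = states[0]
synthetic = []
for _ in range(500):
    s = rng.choice(n_states, p=trans[s])
    synthetic.append(rng.choice(series[states == s]))
print(np.mean(series), np.mean(synthetic))
```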

    Dual Grid Voltage Modulated Direct Power Control of Grid-Connected Voltage Source Converter under Unbalanced Network Condition
