
    Reducing the Need for Backpropagation and Discovering Better Optima With Explicit Optimizations of Neural Networks

    Iterative differential approximation methods that rely upon backpropagation have enabled the optimization of neural networks; however, at present, they remain computationally expensive, especially when training models at scale. In this paper, we propose a computationally efficient alternative for optimizing neural networks that can both reduce the costs of scaling neural networks and provide high-efficiency optimizations for low-resource applications. We derive an explicit solution to a simple feed-forward language model (LM) by mathematically analyzing its gradients. This solution generalizes from single-layer LMs to the class of all single-layer feed-forward softmax-activated neural models trained on positive-valued features, as demonstrated by our extension of the solution to MNIST digit classification. For both LM and digit classifiers, we find computationally that explicit solutions perform near-optimally, in experiments showing that 1) iterative optimization only marginally improves the explicit solution's parameters and 2) randomly initialized parameters iteratively optimize towards the explicit solution. We also preliminarily apply the explicit solution locally, layer by layer, in multi-layer networks and discuss how the solution's computational savings increase with model complexity -- for both single- and multi-layer applications of the explicit solution, we emphasize that the optima achieved cannot be reached by backpropagation alone, i.e., better optima appear discoverable only after explicit solutions are applied. Finally, we discuss the solution's computational savings alongside its impact on model interpretability and suggest future directions for the derivation of explicit solutions to complex, multi-layer architectures.
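    To make the idea concrete, below is a minimal sketch of a closed-form fit for a single-layer softmax classifier over positive-valued features. The specific rule shown (weights set to smoothed log conditional co-occurrence probabilities) is an illustrative assumption; the paper's actual derivation is not reproduced in the abstract.

```python
import numpy as np

# Hypothetical sketch: closed-form weights for a single-layer softmax model
# over non-negative features. The log-conditional-probability rule below is
# an illustrative assumption, not the paper's published derivation.

def explicit_softmax_weights(X, Y, eps=1e-8):
    """X: (n_samples, n_features) non-negative features;
    Y: (n_samples, n_classes) one-hot targets; returns (n_features, n_classes)."""
    cooc = X.T @ Y                                         # feature-class co-occurrence mass
    cond = cooc / (cooc.sum(axis=1, keepdims=True) + eps)  # ~ P(class | feature)
    return np.log(cond + eps)                              # explicit log-probability weights

def softmax_predict(X, W):
    z = X @ W
    z -= z.max(axis=1, keepdims=True)                      # numerical stability
    p = np.exp(z)
    return p / p.sum(axis=1, keepdims=True)
```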

    Bit Cipher -- A Simple yet Powerful Word Representation System that Integrates Efficiently with Language Models

    While Large Language Models (LLMs) become ever more dominant, classic pre-trained word embeddings sustain their relevance through computational efficiency and nuanced linguistic interpretation. Drawing from recent studies demonstrating that GloVe and word2vec optimizations both converge towards variants of the log-co-occurrence matrix, we construct a novel word representation system called Bit-cipher that eliminates the need for backpropagation while leveraging contextual information and hyper-efficient dimensionality reduction techniques based on unigram frequency, providing strong interpretability alongside efficiency. We use the bit-cipher algorithm to train word vectors via a two-step process that critically relies on a hyperparameter -- bits -- that controls the vector dimension. While the first step trains the bit-cipher, the second utilizes it under two different aggregation modes -- summation or concatenation -- to produce contextually rich representations from word co-occurrences. We extend our investigation into bit-cipher's efficacy, performing probing experiments on part-of-speech (POS) tagging and named entity recognition (NER) to assess its competitiveness with classic embeddings like word2vec and GloVe. Additionally, we explore its applicability in LM training and fine-tuning. By replacing embedding layers with cipher embeddings, our experiments illustrate the notable efficiency of bit-cipher in accelerating the training process and attaining better optima compared to conventional training paradigms. Experiments on the integration of bit-cipher embedding layers with RoBERTa, T5, and OPT, prior to or as a substitute for fine-tuning, showcase a promising enhancement to transfer learning, allowing rapid model convergence while preserving competitive performance.
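    As a rough illustration of the two-step process described above, here is a minimal sketch under stated assumptions: the cipher-assignment rule (mapping frequency-ranked words to distinct binary codes of length `bits`) is hypothetical, since the abstract specifies only that a `bits` hyperparameter controls dimension and that context vectors are aggregated by summation or concatenation.

```python
from collections import Counter
import numpy as np

# Hypothetical sketch of the two-step bit-cipher pipeline. The rank-to-binary
# code assignment is an illustrative assumption; only the `bits` hyperparameter
# and the sum/concatenate aggregation modes come from the abstract.

def train_cipher(tokens, bits=8):
    """Step 1: assign each word a length-`bits` code vector by frequency rank."""
    ranked = [w for w, _ in Counter(tokens).most_common(2 ** bits)]
    return {w: np.array([(r >> b) & 1 for b in range(bits)], dtype=float)
            for r, w in enumerate(ranked)}

def embed_sum(tokens, codes, bits=8, window=2):
    """Step 2 (summation mode): sum the codes of co-occurring context words."""
    vecs = []
    for i in range(len(tokens)):
        ctx = tokens[max(0, i - window):i] + tokens[i + 1:i + window + 1]
        v = np.zeros(bits)
        for c in ctx:
            if c in codes:
                v += codes[c]          # concatenation mode would stack instead
        vecs.append(v)
    return vecs
```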

    Explicit Foundation Model Optimization with Self-Attentive Feed-Forward Neural Units

    Iterative approximation methods using backpropagation enable the optimization of neural networks, but they remain computationally expensive, especially when used at scale. This paper presents an efficient alternative for optimizing neural networks that reduces the costs of scaling neural networks and provides high-efficiency optimizations for low-resource applications. We discuss a general result about feed-forward neural networks and then extend this solution to compositional (multi-layer) networks, which we apply to a simplified transformer block containing feed-forward and self-attention layers. These models are used to train highly-specified and complex multi-layer neural architectures that we refer to as self-attentive feed-forward unit (SAFFU) layers, which we use to develop a transformer that appears to generalize well over small, cognitively-feasible volumes of data. Testing demonstrates that explicit solutions outperform models optimized by backpropagation alone. Moreover, further application of backpropagation after explicit solutions leads to better optima from smaller scales of data; that is, explicit-solution warm starts enable effective models to be trained from much less data. We then carry out ablation experiments, training a roadmap of about 250 transformer models over 1 million tokens to determine ideal settings. We find that multiple different architectural variants produce highly-performant models, and discover from this ablation that some of the best are not the most parameterized. This appears to indicate that well-generalized models could be reached using less data with explicit solutions, and that architectural exploration using explicit solutions pays dividends in guiding the search for efficient variants with fewer parameters, which could be incorporated into low-resource hardware where AI might be embodied.
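    The warm-start effect described above can be sketched as follows: initialize a layer from an explicit solution, then refine with a few gradient steps. The closed-form initializer here is an illustrative assumption; the abstract states only that backpropagation applied after explicit solutions reaches better optima than backpropagation alone.

```python
import numpy as np

# Hypothetical warm-start sketch: explicit initialization followed by a short
# gradient-descent refinement of a single softmax layer. The initializer is an
# illustrative assumption, not the paper's published SAFFU solution.

def warm_start_then_sgd(X, Y, steps=100, lr=0.1, eps=1e-8):
    cooc = X.T @ Y
    W = np.log(cooc / (cooc.sum(axis=1, keepdims=True) + eps) + eps)  # explicit warm start
    for _ in range(steps):                                            # backprop refinement
        z = X @ W
        z -= z.max(axis=1, keepdims=True)
        p = np.exp(z)
        p /= p.sum(axis=1, keepdims=True)
        W -= lr * (X.T @ (p - Y)) / len(X)   # softmax cross-entropy gradient
    return W
```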

    Reconstruction of the Adaptable Deployable Entry and Placement Technology Sounding Rocket One Flight Test

    The Adaptable Deployable Entry and Placement Technology Sounding Rocket One flight test is a demonstration experiment for deployable atmospheric decelerator technologies. The suborbital flight test occurred on 12 September 2018, at the White Sands Missile Range. Data from on-board and ground-based sensors were collected, from which the as-flown trajectory was reconstructed using an iterative extended Kalman filter-smoother. This paper describes the methodology, test vehicle instrumentation, and data analysis results from the flight test trajectory reconstruction.
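    For readers unfamiliar with the filter-smoother pattern named above, the sketch below shows its structure for a linear system: a forward Kalman filter pass followed by a Rauch-Tung-Striebel backward smoothing pass. This is a simplified stand-in; the flight reconstruction uses an iterative extended variant with nonlinear vehicle dynamics and measurement models not given in the abstract.

```python
import numpy as np

# Minimal linear Kalman filter + RTS smoother sketch (illustrative stand-in
# for the iterative extended Kalman filter-smoother used in the paper).

def kalman_rts(zs, F, H, Q, R, x0, P0):
    xs, Ps, preds = [], [], []
    x, P = x0, P0
    for z in zs:                               # forward filter pass
        xp = F @ x                             # predict state
        Pp = F @ P @ F.T + Q                   # predict covariance
        K = Pp @ H.T @ np.linalg.inv(H @ Pp @ H.T + R)   # Kalman gain
        x = xp + K @ (z - H @ xp)              # measurement update
        P = (np.eye(len(x0)) - K @ H) @ Pp
        xs.append(x); Ps.append(P); preds.append((xp, Pp))
    for k in range(len(zs) - 2, -1, -1):       # backward RTS smoothing pass
        xp, Pp = preds[k + 1]
        C = Ps[k] @ F.T @ np.linalg.inv(Pp)    # smoother gain
        xs[k] = xs[k] + C @ (xs[k + 1] - xp)
        Ps[k] = Ps[k] + C @ (Ps[k + 1] - Pp) @ C.T
    return xs, Ps
```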

    Which Tweets 'Deserve' to be Included in News Stories? Chronemics of Tweet Embedding

    The use and selection of user-generated social media content, specifically tweets, as a news source has become an integral part of news production practice. Yet a mapping of the nature and extent of the practices by which news outlets integrate social media content is still lacking. This study focuses on the pressures of immediacy on media ecosystems, i.e., on the organizational practices of news outlets as they make choices about social media content integration. By analyzing a large corpus of news stories with embedded tweets, this study examines tweet embedding practices, specifically focusing on the concept of chronemics, conceptualized here as the time needed to embed tweets. Temporal constraints are particularly pressing for journalistic practices, given the continuous pressures of the 24/7 news cycle. We ask two main questions: which types of outlets are quicker to embed tweets, and which types of users’ tweets are more likely to be embedded quickly?
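    A minimal sketch of the chronemic measure implied here (the latency between a tweet's creation and the publication of the story that embeds it) might look like the following; the field names and ISO 8601 inputs are illustrative assumptions, not the paper's operationalization.

```python
from datetime import datetime, timezone

# Hypothetical sketch of a time-to-embed measure: hours elapsed from a tweet's
# creation to the publication of the article embedding it. Inputs are assumed
# to be ISO 8601 timestamps with timezone offsets.

def embed_latency_hours(tweet_created_at: str, article_published_at: str) -> float:
    t = datetime.fromisoformat(tweet_created_at).astimezone(timezone.utc)
    a = datetime.fromisoformat(article_published_at).astimezone(timezone.utc)
    return (a - t).total_seconds() / 3600.0

# Example: a tweet posted at 09:00 UTC embedded in a story published at
# 15:30 UTC the same day yields a latency of 6.5 hours.
print(embed_latency_hours("2018-09-12T09:00:00+00:00", "2018-09-12T15:30:00+00:00"))
```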

    Mobile Course Feedback System for Improving Student Engagement

    This study focuses on the development of a prototype, named KlassBase, aimed at improving generally low student engagement and compensating for the shortcomings of currently-employed engagement methods. The prototype is a smartphone application designed to incentivize honest, frequent sharing of feedback between students and professors, and to provide insight into the areas of a course that need improvement. We tested our assumptions about which features of the prototype would positively impact engagement, first with interviews to refine our approach, then with online surveys to measure the performance of our prototype against one currently utilized method for enabling students to provide feedback -- traditional end-of-course evaluations. The results of the survey indicate that participants generally believed KlassBase would have a greater impact on a course’s instruction, and more importantly, that it would make them more engaged and active in the classroom.