891 research outputs found

    Incremental construction of LSTM recurrent neural network

    Get PDF
    Long Short-Term Memory (LSTM) is a recurrent neural network that uses structures called memory blocks to let the network remember significant events far back in the input sequence, solving long time-lag tasks where other RNN approaches fail. Throughout this work we have performed experiments using LSTM networks extended with growing abilities, which we call GLSTM. Four methods of training growing LSTMs have been compared. These methods include cascade and fully connected hidden layers, as well as two different levels of freezing the previous weights in the cascade case. GLSTM has been applied to a forecasting problem in a biomedical domain, where the input/output behaviour of five controllers of the Central Nervous System has to be modelled. We have compared growing-LSTM results against other neural network approaches and against our previous work applying conventional LSTM to the task at hand.
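
    As an illustration only (a minimal PyTorch sketch with assumed layer sizes, not the authors' GLSTM implementation), a cascade-style growing LSTM can be expressed as a stack that adds one hidden layer at a time and optionally freezes the weights of the earlier layers:

        import torch
        import torch.nn as nn

        class GrowingLSTM(nn.Module):
            """Stack of LSTM layers that grows one layer at a time (cascade style)."""
            def __init__(self, input_size, hidden_size, output_size):
                super().__init__()
                self.hidden_size = hidden_size
                self.layers = nn.ModuleList(
                    [nn.LSTM(input_size, hidden_size, batch_first=True)])
                self.readout = nn.Linear(hidden_size, output_size)

            def grow(self, freeze_previous=True):
                # Optionally freeze all existing LSTM layers before adding a new one.
                if freeze_previous:
                    for layer in self.layers:
                        for p in layer.parameters():
                            p.requires_grad = False
                # The new layer takes the previous layer's hidden sequence as input.
                self.layers.append(
                    nn.LSTM(self.hidden_size, self.hidden_size, batch_first=True))

            def forward(self, x):
                out = x
                for layer in self.layers:
                    out, _ = layer(out)
                return self.readout(out)

        # Hypothetical usage: train, grow, then retrain only the unfrozen parameters.
        model = GrowingLSTM(input_size=5, hidden_size=32, output_size=5)
        model.grow(freeze_previous=True)
        optimizer = torch.optim.Adam(
            [p for p in model.parameters() if p.requires_grad], lr=1e-3)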

    Transformers Learn Shortcuts to Automata

    Full text link
    Algorithmic reasoning requires capabilities which are most naturally understood through recurrent models of computation, like the Turing machine. However, Transformer models, while lacking recurrence, are able to perform such reasoning using far fewer layers than the number of reasoning steps. This raises the question: what solutions are learned by these shallow and non-recurrent models? We find that a low-depth Transformer can represent the computations of any finite-state automaton (thus, any bounded-memory algorithm) by hierarchically reparameterizing its recurrent dynamics. Our theoretical results characterize shortcut solutions, whereby a Transformer with o(T) layers can exactly replicate the computation of an automaton on an input sequence of length T. We find that polynomial-sized O(log T)-depth solutions always exist; furthermore, O(1)-depth simulators are surprisingly common, and can be understood using tools from Krohn-Rhodes theory and circuit complexity. Empirically, we perform synthetic experiments by training Transformers to simulate a wide variety of automata, and show that shortcut solutions can be learned via standard training. We further investigate the brittleness of these solutions and propose potential mitigations.
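
    As a toy illustration (a hypothetical two-state parity automaton, not the paper's experimental setup), the synthetic task amounts to predicting an automaton's state sequence from its input sequence:

        import numpy as np

        def parity_automaton(bits):
            """Run a 2-state parity automaton; return the state after each input bit."""
            state, states = 0, []
            for b in bits:
                state ^= b              # transition: flip the state on a 1
                states.append(state)
            return states

        def make_dataset(num_sequences, length, seed=0):
            """Input bit sequences and the state sequences a model is trained to predict."""
            rng = np.random.default_rng(seed)
            X = rng.integers(0, 2, size=(num_sequences, length))
            Y = np.array([parity_automaton(x) for x in X])
            return X, Y

        X, Y = make_dataset(num_sequences=1000, length=64)
        # A shallow Transformer can compute Y from X in parallel by composing the
        # per-step transition functions associatively (a prefix product), rather than
        # unrolling the recurrence over all 64 steps.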

    Radical Artificial Intelligence: A Postmodern Approach

    Get PDF
    The dynamic response of end-clamped monolithic beams and sandwich beams has been measured by loading the beams at mid-span with metal foam projectiles. The AISI 304 stainless-steel sandwich beams comprise two identical face sheets and either prismatic Y-frame or corrugated cores. The resistance to shock loading is quantified by the permanent transverse deflection at mid-span of the beams as a function of projectile momentum. The prismatic cores are aligned either longitudinally along the beam length or transversely. It is found that the sandwich beams with a longitudinal core orientation have a higher shock resistance than monolithic beams of equal mass. In contrast, the performance of the sandwich beams with a transverse core orientation is very similar to that of the monolithic beams. Three-dimensional finite element (FE) simulations are in good agreement with the measured responses. The FE calculations indicate that strain concentrations in the sandwich beams occur at the joints within the cores and between the cores and face sheets; the level of maximum strain is similar for the Y-frame and corrugated-core beams for a given value of projectile momentum. Taken together, the experimental and FE results reveal that Y-frame and corrugated-core sandwich beams of equal mass have similar dynamic performance in terms of rear-face deflection, degree of core compression and level of strain within the beam.

    Computational mechanics: from theory to practice

    Get PDF
    In the last fifty years, computational mechanics has gained the attention of a large number of disciplines, ranging from physics and mathematics to biology, involving all the disciplines that deal with complex systems or processes. With ϵ-machines, computational mechanics provides powerful models that can help characterize these systems. To date, an increasing number of studies use such methodologies; nevertheless, an attempt to make the approach more accessible in practice is still lacking. Starting from this point, this thesis investigates a more practical approach to computational mechanics so as to make it suitable for applications in a wide spectrum of domains. ϵ-machines are analyzed in a robotics setting, to understand whether they can be exploited in contexts with typically complex dynamics such as swarms; experiments are conducted with a random-walk behaviour and an aggregation task. Statistical complexity is first studied and tested on the logistic map and then exploited, as a more applicative case, in the analysis of electroencephalograms, where it serves as a classification parameter and discriminates between patients with different sleep disorders and healthy subjects. The number of applications that may benefit from this technique is large; hopefully, this work broadens the prospects for its practical use.
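
    A rough sketch of the practical recipe (a naive first-order ϵ-machine reconstruction from a binarized logistic-map series; parameter values here are illustrative, not those used in the thesis):

        import numpy as np
        from collections import defaultdict

        def logistic_series(r=4.0, n=20000, x0=0.4):
            """Iterate the logistic map and binarize around 0.5."""
            x, symbols = x0, []
            for _ in range(n):
                x = r * x * (1 - x)
                symbols.append(1 if x > 0.5 else 0)
            return symbols

        def statistical_complexity(symbols, k=3, tol=0.05):
            """Group length-k histories whose next-symbol distributions agree within
            `tol` into causal states, then return the Shannon entropy (in bits) of
            the causal-state distribution."""
            counts = defaultdict(lambda: np.zeros(2))
            for i in range(len(symbols) - k):
                counts[tuple(symbols[i:i + k])][symbols[i + k]] += 1
            total = sum(c.sum() for c in counts.values())
            states = []          # each state: [probability mass, predictive distribution]
            for c in counts.values():
                p_hist, p_next = c.sum() / total, c / c.sum()
                for s in states:
                    if np.abs(s[1] - p_next).max() < tol:
                        s[0] += p_hist       # merge into an existing causal state
                        break
                else:
                    states.append([p_hist, p_next])
            probs = np.array([s[0] for s in states])
            return float(-(probs * np.log2(probs)).sum())

        print(statistical_complexity(logistic_series()))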

    Bits from Biology for Computational Intelligence

    Get PDF
    Computational intelligence is broadly defined as biologically inspired computing. Usually, inspiration is drawn from neural systems. This article shows how to analyze neural systems using information theory in order to obtain constraints that help identify the algorithms run by such systems and the information they represent. Algorithms and representations identified information-theoretically may then guide the design of biologically inspired computing systems (BICS). The material covered includes the necessary introduction to information theory and the estimation of information-theoretic quantities from neural data. We then show how to analyze the information encoded in a system about its environment, and also discuss recent methodological developments on the question of how much information each agent carries about the environment either uniquely, redundantly, or synergistically together with others. Finally, we introduce the framework of local information dynamics, in which information processing is decomposed into the component processes of information storage, transfer, and modification, locally in space and time. We close by discussing example applications of these measures to neural data and other complex systems.
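
    As a minimal illustration (a plug-in estimator on discretized toy data; the article itself discusses more careful estimators and their local variants), transfer entropy, one of the information-transfer measures mentioned above, can be computed as a conditional mutual information:

        import numpy as np
        from collections import Counter

        def transfer_entropy(source, target, k=1):
            """Plug-in estimate of TE(source -> target) in bits with history length k,
            i.e. I(target_next ; source_past | target_past)."""
            triples = [(target[t], tuple(target[t - k:t]), tuple(source[t - k:t]))
                       for t in range(k, len(target))]
            n = len(triples)
            p_xyz = Counter(triples)
            p_xy = Counter((x, y) for x, y, _ in triples)
            p_yz = Counter((y, z) for _, y, z in triples)
            p_y = Counter(y for _, y, _ in triples)
            te = 0.0
            for (x, y, z), c in p_xyz.items():
                p = c / n
                te += p * np.log2(p * (p_y[y] / n) /
                                  ((p_xy[(x, y)] / n) * (p_yz[(y, z)] / n)))
            return te

        rng = np.random.default_rng(1)
        src = rng.integers(0, 2, 5000)
        tgt = np.roll(src, 1)                  # target copies the source with a one-step lag
        print(transfer_entropy(src, tgt))      # close to 1 bit for this toy coupling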