Incremental construction of LSTM recurrent neural network
Long Short-Term Memory (LSTM) is a recurrent neural network that
uses structures called memory blocks to allow the net to remember
significant events distant in the past of the input sequence, in
order to solve long-time-lag tasks where other RNN approaches fail.
Throughout this work we have performed experiments using LSTM
networks extended with growing abilities, which we call GLSTM.
Four methods of training growing LSTM have been compared. These
methods include cascade and fully connected hidden layers, as well
as two different levels of freezing previous weights in the
cascade case. GLSTM has been applied to a forecasting problem in a biomedical domain, where the input/output behavior of five
controllers of the Central Nervous System has to be
modelled. We have compared growing LSTM results against other
neural network approaches and against our previous work applying
conventional LSTM to the task at hand.
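The memory-block mechanism the abstract refers to can be sketched as the standard LSTM cell update. This is a minimal illustration in NumPy, not the paper's GLSTM training code; the weight layout and dimensions are assumptions chosen for clarity.

```python
# Minimal sketch of one LSTM memory-block step (standard LSTM cell),
# assuming gates stacked as [input, forget, cell, output] in the weights.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One step of an LSTM cell.

    W: (4*H, D) input weights, U: (4*H, H) recurrent weights, b: (4*H,) bias.
    """
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:H])        # input gate
    f = sigmoid(z[H:2*H])      # forget gate: lets the cell keep distant events
    g = np.tanh(z[2*H:3*H])    # candidate cell update
    o = sigmoid(z[3*H:4*H])    # output gate
    c = f * c_prev + i * g     # memory cell carries long-range state
    h = o * np.tanh(c)
    return h, c

# tiny usage example with random weights
rng = np.random.default_rng(0)
D, H = 3, 4
W = rng.normal(size=(4 * H, D))
U = rng.normal(size=(4 * H, H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for t in range(5):
    h, c = lstm_step(rng.normal(size=D), h, c, W, U, b)
```

The additive cell update `c = f * c_prev + i * g` is what allows gradients, and hence information about distant inputs, to survive long time lags.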
Transformers Learn Shortcuts to Automata
Algorithmic reasoning requires capabilities which are most naturally
understood through recurrent models of computation, like the Turing machine.
However, Transformer models, while lacking recurrence, are able to perform such
reasoning using far fewer layers than the number of reasoning steps. This
raises the question: what solutions are learned by these shallow and
non-recurrent models? We find that a low-depth Transformer can represent the
computations of any finite-state automaton (thus, any bounded-memory
algorithm), by hierarchically reparameterizing its recurrent dynamics. Our
theoretical results characterize shortcut solutions, whereby a Transformer with
o(T) layers can exactly replicate the computation of an automaton on an input
sequence of length T. We find that polynomial-sized O(log T)-depth
solutions always exist; furthermore, O(1)-depth simulators are surprisingly
common, and can be understood using tools from Krohn-Rhodes theory and circuit
complexity. Empirically, we perform synthetic experiments by training
Transformers to simulate a wide variety of automata, and show that shortcut
solutions can be learned via standard training. We further investigate the
brittleness of these solutions and propose potential mitigations.
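The core reason log-depth shortcuts exist is that composing an automaton's per-symbol transition functions is associative, so the sequential recurrence can be replaced by a parallel prefix scan. A hedged sketch of that idea (the parity automaton is my illustrative choice, not an example taken from the paper):

```python
# Simulate a finite-state automaton on all prefixes of an input in
# O(log T) doubling rounds, by prefix-scanning its transition maps.
# A transition map is a tuple: map[s] = next state on that symbol.

def compose(f, g):
    # apply f's transition, then g's (left-to-right composition)
    return tuple(g[f[s]] for s in range(len(f)))

def scan_transitions(maps):
    """All-prefix composition via Hillis-Steele doubling: O(log T) rounds."""
    maps = list(maps)
    T, step = len(maps), 1
    while step < T:
        new = list(maps)
        for i in range(step, T):
            new[i] = compose(maps[i - step], maps[i])
        maps = new
        step *= 2
    return maps

# Example automaton: parity of the bits seen so far.
# Symbol 0 keeps the state, symbol 1 flips it.
delta = {0: (0, 1), 1: (1, 0)}
bits = [1, 0, 1, 1, 0, 1]
prefix = scan_transitions([delta[b] for b in bits])
states = [f[0] for f in prefix]  # state after each prefix, from start state 0
# states == [1, 1, 0, 1, 1, 0], the running XOR of the bits
```

Each doubling round corresponds roughly to one layer's worth of parallel work, which is why depth logarithmic in T (rather than T recurrent steps) suffices.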
Radical Artificial Intelligence: A Postmodern Approach
The dynamic response of end-clamped monolithic beams and sandwich beams has been measured by loading the beams at mid-span using metal foam projectiles. The AISI 304 stainless-steel sandwich beams comprise two identical face sheets and either prismatic Y-frame or corrugated cores. The resistance to shock loading is quantified by the permanent transverse deflection at mid-span of the beams as a function of projectile momentum. The prismatic cores are aligned either longitudinally along the beam length or transversely. It is found that the sandwich beams with a longitudinal core orientation have a higher shock resistance than the monolithic beams of equal mass. In contrast, the performance of the sandwich beams with a transverse core orientation is very similar to that of the monolithic beams. Three-dimensional finite element (FE) simulations are in good agreement with the measured responses. The FE calculations indicate that strain concentrations in the sandwich beams occur at joints within the cores and between the core and face sheets; the level of maximum strain is similar for the Y-frame and corrugated core beams for a given value of projectile momentum. The experimental and FE results taken together reveal that Y-frame and corrugated core sandwich beams of equal mass have similar dynamic performance in terms of rear-face deflection, degree of core compression and level of strain within the beam.
Computational mechanics: from theory to practice
In the last fifty years, computational mechanics has gained the attention of a large number of disciplines, ranging from physics and mathematics to biology, involving all the disciplines that deal with complex systems or processes. With ϵ-machines, computational mechanics provides powerful models that can help characterize these systems. To date, an increasing number of studies concern the use of such methodologies; nevertheless, an attempt to make this approach more accessible in practice is still lacking. Starting from this point, this thesis aims at investigating a more practical approach to computational mechanics so as to make it suitable for applications in a wide spectrum of domains. ϵ-machines are then analyzed in a robotics setting, to understand whether they can be exploited in contexts with typically complex dynamics, such as swarms. Experiments are conducted with random walk behavior and the aggregation task. Statistical complexity is first studied and tested on the logistic map and then exploited, as a more applicative case, in the analysis of electroencephalograms as a classification parameter, resulting in the discrimination between patients (with different sleep disorders) and healthy subjects.
The number of applications that may benefit from the use of such a technique is enormous. Hopefully, this work has broadened the prospects for a more application-oriented interest in computational mechanics.
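The kind of symbolic analysis the thesis applies to the logistic map can be illustrated with a minimal sketch: iterate the map, binarize the orbit at a threshold, and estimate the entropy rate of the symbol stream from block entropies. This is a simplified precursor to ϵ-machine reconstruction, not the thesis's actual pipeline; parameter values here are illustrative assumptions.

```python
# Binarize a logistic-map orbit and estimate its block-entropy rate.
import math
from collections import Counter

def logistic_orbit(r=4.0, x0=0.4, n=20000, burn=1000):
    """Iterate x -> r*x*(1-x), discard a transient, binarize at 0.5."""
    x, seq = x0, []
    for i in range(n + burn):
        x = r * x * (1.0 - x)
        if i >= burn:
            seq.append(1 if x > 0.5 else 0)
    return seq

def block_entropy(seq, L):
    """Plug-in Shannon entropy (bits) of length-L blocks."""
    counts = Counter(tuple(seq[i:i + L]) for i in range(len(seq) - L + 1))
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

sym = logistic_orbit()
# entropy-rate estimate: h_mu ~= H(L) - H(L-1); for r = 4 with the
# threshold-0.5 partition the symbol stream is a fair coin, so h_mu = 1 bit
h_est = block_entropy(sym, 6) - block_entropy(sym, 5)
```

An ϵ-machine reconstruction would go further and group block histories into causal states; the block-entropy rate above is the simplest statistic on the way there.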
Bits from Biology for Computational Intelligence
Computational intelligence is broadly defined as biologically-inspired
computing. Usually, inspiration is drawn from neural systems. This article
shows how to analyze neural systems using information theory to obtain
constraints that help identify the algorithms run by such systems and the
information they represent. Algorithms and representations identified
information-theoretically may then guide the design of biologically inspired
computing systems (BICS). The material covered includes the necessary
introduction to information theory and the estimation of information theoretic
quantities from neural data. We then show how to analyze the information
encoded in a system about its environment, and discuss recent
methodological developments on the question of how much information each agent
carries about the environment uniquely, redundantly, or
synergistically with others. Last, we introduce the framework of local
information dynamics, where information processing is decomposed into component
processes of information storage, transfer, and modification -- locally in
space and time. We close by discussing example applications of these measures
to neural data and other complex systems.
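The simplest instance of the estimation problem the article introduces is a plug-in mutual information estimate between a stimulus and a noisy response, computed from empirical counts. The toy channel below (a binary stimulus whose response flips with probability 0.1) is my illustrative assumption, not an example from the article.

```python
# Plug-in mutual information (bits) between stimulus and response samples.
import math
import random
from collections import Counter

def mutual_information(pairs):
    """MI estimate from a list of (x, y) samples via empirical counts."""
    n = len(pairs)
    pxy = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) )
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi

random.seed(1)
samples = []
for _ in range(50000):
    s = random.getrandbits(1)            # binary "stimulus"
    r = s ^ (random.random() < 0.1)      # "response": flipped 10% of the time
    samples.append((s, int(r)))

mi = mutual_information(samples)
# for a 10% binary symmetric channel, MI approaches 1 - H(0.1) ~= 0.53 bits
```

Plug-in estimates like this are biased for small samples, which is exactly why the article devotes space to estimating information-theoretic quantities from limited neural data.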