Correlation-powered Information Engines and the Thermodynamics of Self-Correction
Information engines can use structured environments as a resource to generate
work by randomizing ordered inputs and leveraging the increased Shannon entropy
to transfer energy from a thermal reservoir to a work reservoir. We give a
broadly applicable expression for the work production of an information engine,
generally modeled as a memoryful channel that communicates inputs to outputs as
it interacts with an evolving environment. The expression establishes that an
information engine must have more than one memory state in order to leverage
input-environment correlations. To highlight this mechanism, we designed an
information engine powered solely by temporal correlations rather than by the
statistical biases employed by previous engines. Key to this is the
engine's ability to synchronize---the engine automatically returns to a desired
dynamical phase when thrown into an unwanted, dissipative phase by corruptions
in the input---that is, by unanticipated environmental fluctuations. This
self-correcting mechanism is robust up to a critical level of corruption,
beyond which the system fails to act as an engine. We give explicit analytical
expressions for both work and critical corruption level and summarize engine
performance via a thermodynamic-function phase diagram over engine control
parameters. The results reveal a new thermodynamic mechanism based on
nonergodicity that underlies error correction as it operates to support
resilient engineered and biological systems.
Comment: 22 pages, 13 figures; http://csc.ucdavis.edu/~cmg/compmech/pubs/tos.ht
Entropy production in systems with unidirectional transitions
Entropy production is one of the most essential features of systems
operating out of equilibrium. The formulation for discrete-state systems goes
back to Schnakenberg's celebrated work and has hitherto been carried out only
when, for each transition between two states, the reverse transition is also allowed.
Nevertheless, several physical systems may exhibit a mixture of both
unidirectional and bidirectional transitions, and how to properly define the
entropy production in this case is still an open question. Here, we present a
solution to such a challenging problem. The average entropy production can be
consistently defined, employing a mapping that preserves the average fluxes,
and its physical interpretation is provided. We describe a class of stochastic
systems composed of unidirectional links forming cycles and detailed-balanced
bidirectional links, showing that they behave in a pseudo-deterministic
fashion. This approach is applied to a system with time-dependent stochastic
resetting. Our framework is consistent with thermodynamics and leads to some
intriguing observations on the relation between the arrow of time and the
average entropy production for resetting events.
Comment: Accepted for publication in Physical Review Research
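For context, the Schnakenberg entropy production rate referenced above takes a standard form that is well defined only when every transition has a nonzero reverse rate (notation is the conventional one, not the paper's):

```latex
% Schnakenberg's average entropy production rate for a Markov jump
% process with transition rates w_{ij} (from state j to state i) and
% stationary probabilities p_i; each term requires w_{ji} > 0:
\dot{\Sigma} \;=\; \frac{1}{2} \sum_{i \neq j}
  \left( w_{ij}\, p_j - w_{ji}\, p_i \right)
  \ln \frac{w_{ij}\, p_j}{w_{ji}\, p_i} \;\ge\; 0
```

Each term diverges when a reverse rate vanishes, which is precisely why unidirectional transitions require the separate treatment developed in this work.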
Small Open Chemical Systems Theory and Its Implications to Darwinian Evolutionary Dynamics, Complex Self-Organization and Beyond
The study of biological cells in terms of mesoscopic, nonequilibrium,
nonlinear, stochastic dynamics of open chemical systems provides a paradigm for
other complex, self-organizing systems with ultra-fast stochastic fluctuations,
short-time deterministic nonlinear dynamics, and long-time evolutionary
behavior with exponentially distributed rare events, discrete jumps among
punctuated equilibria, and catastrophe.
Comment: 15 pages
An Analysis of the Value of Information when Exploring Stochastic, Discrete Multi-Armed Bandits
In this paper, we propose an information-theoretic exploration strategy for
stochastic, discrete multi-armed bandits that achieves optimal regret. Our
strategy is based on the value of information criterion. This criterion
measures the trade-off between policy information and obtainable rewards. High
amounts of policy information are associated with exploration-dominant searches
of the space and yield high rewards. Low amounts of policy information favor
the exploitation of existing knowledge. Information, in this criterion, is
quantified by a parameter that can be varied during search. We demonstrate that
a simulated-annealing-like update of this parameter, with a sufficiently fast
cooling schedule, leads to an optimal regret that is logarithmic with respect
to the number of episodes.
Comment: Entrop
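The annealed exploration strategy described above can be illustrated with a minimal sketch. Assumptions here: a Boltzmann (softmax) policy over empirical arm values and a logarithmic cooling schedule T_t = c / log(t + 2); the paper's value-of-information criterion is more general than this toy, and the function name and parameters are illustrative.

```python
import math
import random

def softmax_bandit(probs, episodes=5000, c=1.0, seed=0):
    """Softmax (Boltzmann) exploration on a Bernoulli bandit with a
    logarithmically cooled temperature T_t = c / log(t + 2).
    Illustrative only: the paper's value-of-information criterion is
    more general than this schedule."""
    rng = random.Random(seed)
    n = len(probs)
    counts = [0] * n
    values = [0.0] * n          # empirical mean reward per arm
    total = 0.0
    for t in range(episodes):
        temp = c / math.log(t + 2)              # annealing schedule
        mx = max(values)                        # stabilize the exponentials
        weights = [math.exp((v - mx) / temp) for v in values]
        s = sum(weights)
        r, acc, arm = rng.random() * s, 0.0, n - 1
        for i, w in enumerate(weights):         # sample arm ~ softmax
            acc += w
            if r <= acc:
                arm = i
                break
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]
        total += reward
    return counts, total

counts, total = softmax_bandit([0.2, 0.5, 0.8])
```

As the temperature cools, the policy shifts from near-uniform exploration to exploitation of the empirically best arm.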
Kullback-Leibler Divergence and Akaike Information Criterion in General Hidden Markov Models
To characterize the Kullback-Leibler divergence and Fisher information in
general parametrized hidden Markov models, in this paper, we first show that
the log likelihood and its derivatives can be represented as an additive
functional of a Markovian iterated function system, and then provide explicit
characterizations of these two quantities through this representation.
Moreover, we show that Kullback-Leibler divergence can be locally approximated
by a quadratic function determined by the Fisher information. Results relating
to the Cram\'{e}r-Rao lower bound and the H\'{a}jek-Le Cam local asymptotic
minimax theorem are also given. As an application of our results, we provide a
theoretical justification for using the Akaike information criterion (AIC) for
model selection in general hidden Markov models. Lastly, we study three concrete
models: a Gaussian vector autoregressive-moving average model of a given order,
recurrent neural networks, and a temporal restricted Boltzmann machine, to
illustrate our theory.
Comment: 26 pages, 1 figure
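As a reminder of the classical finite-dimensional identities that the abstract generalizes to hidden Markov models:

```latex
% Local quadratic approximation of the KL divergence via the
% Fisher information I(\theta_0):
D\!\left(\theta_0 \,\middle\|\, \theta\right)
  = \tfrac{1}{2}\,(\theta - \theta_0)^{\top} I(\theta_0)\,(\theta - \theta_0)
    + o\!\left( \lVert \theta - \theta_0 \rVert^{2} \right),
% and the Akaike information criterion for a model with k free
% parameters and maximized likelihood \hat{L}:
\mathrm{AIC} = -2 \log \hat{L} + 2k .
```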
Probabilistic Models of Motor Production
N. Bernstein defined the ability of the central nervous system (CNS) to control the many degrees of freedom of a physical body, with all its redundancy and flexibility, as the main problem of motor control. He pointed out that man-made mechanisms usually have one, sometimes two, degrees of freedom (DOF); when the number of DOF increases further, it becomes prohibitively hard to control them. The brain, however, seems to perform such control effortlessly. He suggested how the brain might deal with it: when a motor skill is being acquired, the brain artificially limits the degrees of freedom, leaving only one or two. As the skill level increases, the brain gradually "frees" the previously fixed DOF, applying control when needed and in the directions that have to be corrected, eventually arriving at a control scheme in which all the DOF are "free". This approach of reducing the dimensionality of motor control remains relevant even today.
One possible solution to Bernstein's problem is the hypothesis of motor primitives (MPs): small building blocks that constitute complex movements and facilitate motor learning and task completion. Just as in the visual system, having a homogeneous hierarchical architecture built of similar computational elements may be beneficial.
When studying such a complicated object as the brain, it is important to define at which level of detail one works and which questions one aims to answer. David Marr suggested three levels of analysis: 1. computational, analysing which problem the system solves; 2. algorithmic, asking which representation the system uses and which computations it performs; 3. implementational, finding how such computations are performed by neurons in the brain. In this thesis we stay at the first two levels, seeking the basic representation of motor output.
In this work we present a new model of motor primitives that comprises multiple interacting latent dynamical systems, and give it a full Bayesian treatment. Modelling within the Bayesian framework, in my opinion, must become the new standard in hypothesis testing in neuroscience. Only the Bayesian framework gives us guarantees when dealing with the inevitable plethora of hidden variables and uncertainty.
The special type of coupling of dynamical systems we propose, based on the Product of Experts, has many natural interpretations in the Bayesian framework. If the dynamical systems run in parallel, it yields Bayesian cue integration. If they are organized hierarchically via serial coupling, we get hierarchical priors over the dynamics. If one of the dynamical systems represents the sensory state, we arrive at sensory-motor primitives. The compact representation that follows from the variational treatment allows learning of a motor-primitives library. When primitives are learned separately, a combined motion can be represented as a matrix of coupling values.
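The Bayesian cue-integration interpretation of the Product of Experts can be illustrated in its simplest case, the product of two scalar Gaussian experts. This is a toy sketch only: the thesis couples full latent dynamical systems, not scalars, and the function name here is illustrative.

```python
def fuse_gaussians(mu1, var1, mu2, var2):
    """Product of two Gaussian 'experts' (up to normalization) is a
    Gaussian whose precision is the sum of the precisions: the
    standard Bayesian cue-integration result."""
    p1, p2 = 1.0 / var1, 1.0 / var2     # precisions
    var = 1.0 / (p1 + p2)               # fused variance shrinks
    mu = var * (p1 * mu1 + p2 * mu2)    # precision-weighted mean
    return mu, var

# Two equally reliable cues at 0.0 and 2.0 fuse to their midpoint,
# with half the variance of either cue alone.
mu, var = fuse_gaussians(0.0, 1.0, 2.0, 1.0)
```

When the cues are unequally reliable, the fused mean is pulled toward the more precise expert, which is exactly the behavior observed in human cue-integration experiments.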
We performed a set of experiments to compare different models of motor primitives. In a series of two-alternative forced choice (2AFC) experiments, participants discriminated between natural and synthesised movements, thus running a graphics Turing test. When available, the Bayesian model score predicted the naturalness of the perceived movements. For simple movements, like walking, Bayesian model comparison and psychophysics tests indicate that one dynamical system is sufficient to describe the data. For more complex movements, like walking and waving, motion is better represented as a set of coupled dynamical systems. We also experimentally confirmed that Bayesian treatment of model learning on motion data is superior to a simple point estimate of the latent parameters. Experiments with non-periodic movements show that they do not benefit from more complex latent dynamics, despite having high kinematic complexity.
With fully Bayesian models, we could quantitatively disentangle the influence of motion dynamics and pose on the perception of naturalness. We confirmed that rich and correct dynamics are more important than the kinematic representation.
There are numerous further directions of research. In the multi-part models we devised, even though the latent dynamics was factorized into a set of interacting systems, the kinematic parts were completely independent. Thus, interaction between the kinematic parts could be mediated only by interactions in the latent dynamics. A more flexible model would allow dense interaction at the kinematic level too.
Another important problem relates to the representation of time in Markov chains. Discrete-time Markov chains form an approximation to continuous dynamics. Since the time step is assumed to be fixed, we face the problem of time-step selection. Time is also not an explicit parameter in Markov chains, which prohibits explicit optimization of, and reasoning (inference) about, time. For example, in optimal control, boundary conditions are usually set at exact time points; this is not an ecological scenario, in which time is usually itself a parameter of the optimization. Making time an explicit parameter of the dynamics may alleviate this problem.
Hidden Markov Models
Hidden Markov Models (HMMs), although known for decades, have recently seen a surge of applications and are still under active development. This book presents theoretical issues and a variety of HMM applications in speech recognition and synthesis, medicine, neuroscience, computational biology, bioinformatics, seismology, environmental protection and engineering. I hope that readers will find this book useful and helpful for their own research.
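As a minimal reminder of the machinery all such applications build on, here is a textbook forward-algorithm sketch for computing the likelihood of an observation sequence. The two-state model is a toy example; real implementations should work in log space or rescale the forward variables to avoid numerical underflow.

```python
def forward(pi, A, B, obs):
    """Forward algorithm: likelihood P(obs) under an HMM with initial
    distribution pi, transition matrix A (A[i][j] = P(state j | state i))
    and emission matrix B (B[i][o] = P(symbol o | state i))."""
    n = len(pi)
    # Initialization: alpha_1(i) = pi_i * B_i(o_1)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    # Induction: alpha_{t+1}(j) = B_j(o_{t+1}) * sum_i alpha_t(i) A_{ij}
    for o in obs[1:]:
        alpha = [B[j][o] * sum(alpha[i] * A[i][j] for i in range(n))
                 for j in range(n)]
    # Termination: P(obs) = sum_i alpha_T(i)
    return sum(alpha)

# Toy two-state model: likelihood of observing the sequence (0, 1)
pi = [0.6, 0.4]
A  = [[0.7, 0.3], [0.4, 0.6]]
B  = [[0.9, 0.1], [0.2, 0.8]]
p = forward(pi, A, B, [0, 1])
```

The same recursion, with backward counterparts, underlies the Baum-Welch training used in the application domains the book surveys.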
Essays on Financial Applications of Nonlinear Models
In this thesis, we examine the relationship between news and the
stock market. Further, we explore methods and build new nonlinear
models for forecasting stock price movement and portfolio
optimization based on past stock prices and on one type of big
data, news items, which are obtained through the RavenPack News
Analytics Global Equities editions.
The thesis consists of three essays. In Essay 1, we investigate
the relationship between news items and stock prices using the
artificial neural network (ANN) model. First, we use Granger
causality to ascertain how news items affect stock prices. The
results show that news volume is not the Granger cause of stock
price change; rather, news sentiment is. Second, we test the
semi–strong form of the efficient market hypothesis, whereas most
existing research testing the hypothesis focuses on
the weak–form version. Our ANN strategies consistently
outperform the passive buy–and–hold strategy, and this finding
is apparently at odds with the notion of the efficient market
hypothesis. Finally, using news sentiment analytics from
RavenPack Dow Jones News Analytics, we show positive
profitability with out–of–sample prediction using the
proposed ANN strategies for Google Inc. (NASDAQ: GOOG).
In Essay 2, we expand the utility of the information from news
volume and news sentiments to encompass portfolio
diversification. For the Dow Jones Industrial Average (DJIA)
components, we assign different weights to build portfolios
according to their weekly news volumes or news sentiments. Our
results show that news volume contributes to portfolio variance
both in–sample and out–of–sample: positive news sentiment
contributes to the portfolio return in–sample, while negative
news sentiment contributes to the portfolio return
out–of–sample, a consequence of investors overreacting to the
news sentiment.
Further, we propose a novel approach to portfolio diversification
using the k–Nearest Neighbors (kNN) algorithm based on the idea
that news sentiment correlates with stock returns.
Out–of–sample results indicate that such a strategy dominates
the benchmark DJIA index portfolio.
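The kNN idea can be sketched as follows. The essay's actual diversification and weighting scheme is not detailed in the abstract, so the function `k_nearest`, the toy sentiment series, and the Euclidean distance metric here are all illustrative assumptions.

```python
import math

def k_nearest(series, target, k):
    """Return the k assets whose sentiment series are closest, in
    Euclidean distance, to the target asset's series.  Hypothetical
    sketch of a kNN grouping step for sentiment-based
    diversification; not the essay's exact procedure."""
    dists = []
    for name, s in series.items():
        if name == target:
            continue
        d = math.sqrt(sum((a - b) ** 2
                          for a, b in zip(series[target], s)))
        dists.append((d, name))
    return [name for _, name in sorted(dists)[:k]]

sentiment = {                      # toy weekly sentiment scores
    "AAA": [0.1, 0.2, 0.3],
    "BBB": [0.1, 0.2, 0.4],
    "CCC": [-0.5, -0.4, -0.3],
}
nearest = k_nearest(sentiment, "AAA", 1)
```

If sentiment correlates with returns, assets close in sentiment space are candidates for co-movement, so spreading weight across sentiment-distant neighbourhoods is one way to diversify.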
In Essay 3, we propose a new model called the Combined Markov and
Hidden Markov Model (CMHMM), in which the observation is affected
by both a Markov model and an HMM. The three
fundamental questions of the CMHMM are discussed. Further, the
application of the CMHMM, in which the news sentiment is one
observation and the stock return is the other, is discussed. The
empirical results of the trading strategy based on the CMHMM show
the potential applications of the proposed model in finance.
This thesis contributes to the literature in a number of ways.
First, it extends the literature on financial applications of
nonlinear models. We explore the applications of the ANNs and kNN
in the financial market. In addition, the proposed CMHMM
adheres to the nature of the stock market and has better
potential prediction ability. Second, the empirical results from
this dissertation contribute to the understanding of the
relationship between news and the stock market. For instance, our
research found that news volume contributes to the portfolio
return and that investors overreact to news sentiment—a
phenomenon that has been discussed by other scholars from
different angles.
Discriminative, generative, and imitative learning
Thesis (Ph.D.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2002. Includes bibliographical references (leaves 201-212).
I propose a common framework that combines three different paradigms in machine learning: generative, discriminative and imitative learning. A generative probabilistic distribution is a principled way to model many machine learning and machine perception problems. Therein, one provides domain-specific knowledge in terms of structure and parameter priors over the joint space of variables. Bayesian networks and Bayesian statistics provide a rich and flexible language for specifying this knowledge and subsequently refining it with data and observations. The final result is a distribution that is a good generator of novel exemplars. Conversely, discriminative algorithms adjust a possibly non-distributional model to data, optimizing for a specific task such as classification or prediction. This typically leads to superior performance yet compromises the flexibility of generative modeling. I present Maximum Entropy Discrimination (MED) as a framework to combine both discriminative estimation and generative probability densities. Calculations involve distributions over parameters, margins, and priors and are provably and uniquely solvable for the exponential family. Extensions include regression, feature selection, and transduction. SVMs are also naturally subsumed and can be augmented with, for example, feature selection, to obtain substantial improvements. To extend to mixtures of exponential families, I derive a discriminative variant of the Expectation-Maximization (EM) algorithm for latent discriminative learning (or latent MED). While EM and Jensen lower-bound the log-likelihood, a dual upper bound is made possible via a novel reverse-Jensen inequality.
The variational upper bound on the latent log-likelihood has the same form as the EM bounds, is efficiently computable and is globally guaranteed. It permits powerful discriminative learning with a wide range of contemporary probabilistic mixture models (mixtures of Gaussians, mixtures of multinomials and hidden Markov models). We provide empirical results on standardized data sets that demonstrate the viability of the hybrid discriminative-generative approaches of MED and reverse-Jensen bounds over state-of-the-art discriminative techniques or generative approaches. Subsequently, imitative learning is presented as another variation on generative modeling which also learns from exemplars from an observed data source. However, the distinction is that the generative model is an agent interacting in a much more complex surrounding external world. It is not efficient to model the aggregate space in a generative setting. I demonstrate that imitative learning (under appropriate conditions) can be adequately addressed as a discriminative prediction task which outperforms the usual generative approach. This discriminative-imitative learning approach is applied with a generative perceptual system to synthesize a real-time agent that learns to engage in social interactive behavior.
by Tony Jebara. Ph.D.