Prediction of the Atomization Energy of Molecules Using Coulomb Matrix and Atomic Composition in a Bayesian Regularized Neural Network
Exact calculation of electronic properties of molecules is a fundamental step
for intelligent and rational compound and materials design. The intrinsically
graph-like and non-vectorial nature of molecular data poses a unique and
challenging machine learning problem. In this paper we embrace a
learning-from-scratch approach in which the quantum mechanical electronic
properties of molecules are predicted directly from the raw molecular
geometry, similar to some recent works. Unlike those previous endeavors,
however, our study suggests a benefit from combining the molecular geometry,
embedded in the Coulomb matrix, with the atomic composition of molecules.
Using the new combined features in a Bayesian regularized neural network, we
improve on well-known results from the literature for the QM7 dataset,
reducing the mean absolute error from 3.51 kcal/mol to 3.0 kcal/mol.
Comment: Under review ICANN 201
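The Coulomb matrix descriptor the abstract refers to has a standard closed form, and the atomic-composition feature is simply an element-count vector. Below is a minimal NumPy sketch of both; the function names, the row-norm sorting convention, and the QM7-specific padding size and element list are illustrative assumptions, not the authors' code.

```python
import numpy as np

def coulomb_matrix(Z, R, n_max=23):
    """Coulomb matrix for one molecule.

    Z : (n,) nuclear charges; R : (n, 3) Cartesian coordinates (atomic units).
    n_max pads to a fixed size (23 is the largest molecule in QM7).
    """
    n = len(Z)
    C = np.zeros((n_max, n_max))
    for i in range(n):
        for j in range(n):
            if i == j:
                C[i, j] = 0.5 * Z[i] ** 2.4          # diagonal: self-interaction term
            else:
                C[i, j] = Z[i] * Z[j] / np.linalg.norm(R[i] - R[j])
    # sort rows/columns by row norm, a common permutation-invariant ordering
    order = np.argsort(-np.linalg.norm(C, axis=1))
    return C[order][:, order]

def atomic_composition(Z, elements=(1, 6, 7, 8, 16)):
    """Count of each element type (H, C, N, O, S in QM7) as an extra feature vector."""
    return np.array([(np.asarray(Z) == z).sum() for z in elements])
```

Concatenating the flattened sorted matrix with the composition vector gives a combined feature vector of the kind the abstract describes.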
Bayesian regression filter and the issue of priors
We propose a Bayesian framework for regression problems, covering areas usually dealt with by function approximation. An online learning algorithm is derived which solves regression problems with a Kalman filter. Its solution always improves with increasing model complexity, without the risk of over-fitting. In the infinite-dimension limit it approaches the true Bayesian posterior. The issues of prior selection and over-fitting are also discussed, showing that some commonly held beliefs are misleading. The practical implementation is summarised. Simulations using 13 popular, publicly available data sets demonstrate the method and highlight important issues concerning the choice of priors.
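The abstract does not reproduce the update equations, but online Bayesian linear regression by Kalman filtering has a standard recursive form. This sketch shows it under assumed names and hyperparameters (fixed noise variance, isotropic Gaussian prior); it is not the paper's specific formulation.

```python
import numpy as np

class BayesianRegressionFilter:
    """Online Bayesian linear regression in feature space, run as a Kalman filter.

    The posterior over weights w is N(m, S); each observation (phi, y) with
    noise variance sigma2 is absorbed by one rank-one (Kalman) update.
    """

    def __init__(self, dim, prior_var=1.0, sigma2=0.1):
        self.m = np.zeros(dim)            # posterior mean of the weights
        self.S = prior_var * np.eye(dim)  # posterior covariance of the weights
        self.sigma2 = sigma2              # observation noise variance

    def update(self, phi, y):
        Sphi = self.S @ phi
        k = Sphi / (self.sigma2 + phi @ Sphi)      # Kalman gain
        self.m = self.m + k * (y - phi @ self.m)   # correct the mean
        self.S = self.S - np.outer(k, Sphi)        # shrink the covariance

    def predict(self, phi):
        """Predictive mean and variance at features phi."""
        return phi @ self.m, self.sigma2 + phi @ self.S @ phi
```

Each update costs O(dim^2), and for a linear-Gaussian model the posterior (m, S) is exact, so no data need be stored or revisited.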
Nash Codes for Noisy Channels
This paper studies the stability of communication protocols that deal with
transmission errors. We consider a coordination game between an informed sender
and an uninformed decision maker, the receiver, who communicate over a noisy
channel. The sender's strategy, called a code, maps states of nature to
signals. The receiver's best response is to decode the received channel output
as the state with highest expected receiver payoff. Given this decoding, an
equilibrium or "Nash code" results if the sender encodes every state as
prescribed. We show two theorems that give sufficient conditions for Nash
codes. First, a receiver-optimal code defines a Nash code. A second, more
surprising observation holds for communication over a binary channel which is
used independently a number of times, a basic model of information
transmission: Under a minimal "monotonicity" requirement for breaking ties when
decoding, which holds generically, EVERY code is a Nash code.
Comment: More general main Theorem 6.5 with better proof. New examples and introduction
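As a sketch of the decoding step described above (the array names and shapes are assumptions, not from the paper), the receiver's best response to a given code weights each state by its prior and by the channel probability of the observed output:

```python
import numpy as np

def best_response_decoder(prior, code, channel, payoff):
    """Receiver's best response to a sender's code over a noisy channel.

    prior   : (S,) prior over states of nature
    code    : (S,) code[s] = channel input the sender uses for state s
    channel : (X, Y) channel[x, y] = P(output y | input x)
    payoff  : (S, A) receiver payoff for action a in state s

    Returns decode[y] = action maximizing expected receiver payoff given y.
    """
    Y = channel.shape[1]
    decode = np.zeros(Y, dtype=int)
    for y in range(Y):
        # posterior weight of each state given output y
        # (normalization is irrelevant to the argmax)
        w = prior * channel[code, y]
        decode[y] = int(np.argmax(w @ payoff))
    return decode
```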
Calibration with confidence: A principled method for panel assessment
Frequently, a set of objects has to be evaluated by a panel of assessors, but
not every object is assessed by every assessor. A problem facing such panels is
how to take into account different standards amongst panel members and varying
levels of confidence in their scores. Here, a mathematically-based algorithm is
developed to calibrate the scores of such assessors, addressing both of these
issues. The algorithm is based on the connectivity of the graph of assessors
and objects evaluated, incorporating declared confidences as weights on its
edges. If the graph is sufficiently well connected, relative standards can be
inferred by comparing how assessors rate objects they assess in common,
weighted by the levels of confidence of each assessment. By removing these
biases, "true" values are inferred for all the objects. Reliability estimates
for the resulting values are obtained. The algorithm is tested in two case
studies, one by computer simulation and another based on realistic evaluation
data. The process is compared to the simple averaging procedure in widespread
use, and to Fisher's additive incomplete block analysis. It is anticipated that
the algorithm will prove useful in a wide variety of situations such as
evaluation of the quality of research submitted to national assessment
exercises; appraisal of grant proposals submitted to funding panels; ranking of
job applicants; and judgement of performances on degree courses wherein
candidates can choose from lists of options.
Comment: 32 pages including supplementary information; 5 figures
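The paper develops its own estimator; the flavour of the calibration can nonetheless be sketched as a confidence-weighted least-squares fit of an additive model, score = true value + assessor bias, over the assessor-object graph. The model form, names, and the soft zero-sum gauge constraint below are simplifying assumptions.

```python
import numpy as np

def calibrate(scores, weights):
    """Confidence-weighted least-squares calibration of panel scores.

    Fits score[a, o] = value[o] + bias[a] by minimizing
    sum_ao weights[a, o] * (score[a, o] - value[o] - bias[a])**2,
    with assessor biases softly constrained to sum to zero.
    NaN entries in `scores` mark unassessed (assessor, object) pairs.
    """
    A, O = scores.shape
    rows, rhs = [], []
    for a in range(A):
        for o in range(O):
            if np.isnan(scores[a, o]):
                continue
            row = np.zeros(A + O)
            row[a], row[A + o] = 1.0, 1.0          # bias_a + value_o
            rows.append(row * np.sqrt(weights[a, o]))
            rhs.append(scores[a, o] * np.sqrt(weights[a, o]))
    gauge = np.zeros(A + O)                        # gauge: sum of biases = 0
    gauge[:A] = 1.0
    rows.append(gauge)
    rhs.append(0.0)
    sol, *_ = np.linalg.lstsq(np.asarray(rows), np.asarray(rhs), rcond=None)
    return sol[:A], sol[A:]                        # (biases, calibrated values)
```

The solution is well defined only if the assessor-object graph is connected, matching the connectivity requirement the abstract emphasizes.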
Modelling the emergence of rodent filial huddling from physiological huddling
Huddling behaviour in neonatal rodents reduces the metabolic costs of physiological thermoregulation. However, animals continue to huddle into adulthood, at ambient temperatures where they are able to sustain a basal metabolism in isolation from the huddle. This 'filial huddling' in older animals is known to be guided by olfactory rather than thermal cues. The present study aimed to test whether thermally rewarding contacts between young mice, experienced when thermogenesis in brown adipose tissue (BAT) is highest, could give rise to olfactory preferences that persist as filial huddling interactions in adults. To this end, a simple model was constructed to fit existing data on the development of mouse thermal physiology and behaviour. The form of the model that emerged yields a remarkable explanation for filial huddling: associative learning maintains huddling into adulthood via processes that reduce thermodynamic entropy from BAT metabolism and increase information about social ordering amongst littermates.
Fast methods for training Gaussian processes on large data sets
Gaussian process regression (GPR) is a non-parametric Bayesian technique for
interpolating or fitting data. The main barrier to further uptake of this
powerful tool rests in the computational costs associated with the matrices
which arise when dealing with large data sets. Here, we derive some simple
results which we have found useful for speeding up the learning stage in the
GPR algorithm, and especially for performing Bayesian model comparison between
different covariance functions. We apply our techniques to both synthetic and
real data and quantify the speed-up relative to using nested sampling to
numerically evaluate model evidences.
Comment: Fixed missing reference
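For context, the quantity compared across covariance functions is the GP log marginal likelihood (evidence), which has a closed form once the kernel matrix is factorized. This sketch shows the standard Cholesky-based computation that such speed-ups accelerate; it is not the paper's specific method, and the noise level is an assumed hyperparameter.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def log_evidence(K, y, noise_var=1e-2):
    """Log marginal likelihood of a zero-mean Gaussian process.

    K : (n, n) kernel (covariance) matrix; y : (n,) training targets.
    One Cholesky factorization yields both the quadratic form and log|K|,
    avoiding any sampling-based estimate of the evidence.
    """
    n = len(y)
    Ky = K + noise_var * np.eye(n)          # add observation noise to the kernel
    L, lower = cho_factor(Ky, lower=True)
    alpha = cho_solve((L, lower), y)        # alpha = Ky^{-1} y
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (y @ alpha + logdet + n * np.log(2.0 * np.pi))
```

Reusing the factorization across evidence evaluations is one route to the kind of speed-up in Bayesian model comparison that the abstract describes.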
Optimal client recommendation for market makers in illiquid financial products
The process of liquidity provision in financial markets can result in
prolonged exposure to illiquid instruments for market makers. In this case,
where a proprietary position is not desired, pro-actively targeting the right
client who is likely to be interested can be an effective means to offset this
position, rather than relying on commensurate interest arising through natural
demand. In this paper, we consider the inference of a client profile for the
purpose of corporate bond recommendation, based on typical recorded information
available to the market maker. Given a historical record of corporate bond
transactions and bond meta-data, we use a topic-modelling analogy to develop a
probabilistic technique for compiling a curated list of client recommendations
for a particular bond that needs to be traded, ranked by probability of
interest. We show that a model based on Latent Dirichlet Allocation offers
promising performance to deliver relevant recommendations for sales traders.
Comment: 12 pages, 3 figures, 1 table
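One way to realize the topic-modelling analogy (an illustrative framing, not necessarily the paper's exact construction) is to treat each client as a "document" whose "words" are the bonds they have traded, fit LDA, and rank clients by the modelled probability of the target bond:

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

def rank_clients(counts, bond_idx, n_topics=20):
    """Rank clients by modelled interest in one bond.

    counts : (n_clients, n_bonds) matrix of historical trade counts,
    treating each client as a document over a vocabulary of bonds.
    Returns client indices, most likely buyer/seller first.
    """
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    theta = lda.fit_transform(counts)          # per-client topic proportions
    beta = lda.components_ / lda.components_.sum(axis=1, keepdims=True)
    p_bond = theta @ beta[:, bond_idx]         # P(bond | client) under the model
    return np.argsort(-p_bond)
```

Bond meta-data can be folded in by enriching the vocabulary (e.g. issuer or sector tokens), which is one reading of how the abstract's curated list could be made robust for rarely traded bonds.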
Fast and flexible selection with a single switch
Selection methods that require only a single-switch input, such as a button
click or blink, are potentially useful for individuals with motor impairments,
mobile technology users, and individuals wishing to transmit information
securely. We present a single-switch selection method, "Nomon," that is general
and efficient. Existing single-switch selection methods require selectable
options to be arranged in ways that limit potential applications. By contrast,
traditional operating systems, web browsers, and free-form applications (such
as drawing) place options at arbitrary points on the screen. Nomon, however,
has the flexibility to select any point on a screen. Nomon adapts automatically
to an individual's clicking ability; it allows a person who clicks precisely to
make a selection quickly and allows a person who clicks imprecisely more time
to make a selection without error. Nomon reaps gains in information rate by
allowing the specification of beliefs (priors) about option selection
probabilities and by avoiding tree-based selection schemes in favor of direct
(posterior) inference. We have developed both a Nomon-based writing application
and a drawing application. To evaluate Nomon's performance, we compared the
writing application with a popular existing method for single-switch writing
(row-column scanning). Novice users wrote 35% faster with the Nomon interface
than with the scanning interface. An experienced user (author TB, with > 10
hours practice) wrote at speeds of 9.3 words per minute with Nomon, using 1.2
clicks per character and making no errors in the final text.
Comment: 14 pages, 5 figures, 1 table, presented at NIPS 2009 Mini-symposium
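The "direct (posterior) inference" the abstract mentions can be sketched as a Bayesian update over all on-screen options, where each option's rotating clock predicts a click phase and the user's precision sets the likelihood sharpness. The von-Mises-style likelihood and all names below are illustrative assumptions, not Nomon's published model.

```python
import numpy as np

def update_posterior(log_post, click_phase, option_phases, kappa=4.0):
    """One Bayesian update over selectable options after a click.

    log_post      : (n_options,) current log posterior over options
    click_phase   : observed clock phase (radians) at the moment of the click
    option_phases : (n_options,) each option's current clock phase
    kappa         : clicking precision; higher = sharper likelihood

    Precise clickers concentrate the posterior in few clicks; imprecise
    clickers simply need more clicks before any option dominates.
    """
    loglik = kappa * np.cos(click_phase - option_phases)   # von-Mises-style term
    log_post = log_post + loglik
    return log_post - np.logaddexp.reduce(log_post)        # renormalize in log space

# sketch of a selection loop: start from log-priors over options and
# keep clicking until, say, np.exp(log_post.max()) > 0.95
```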