The Loss Rank Principle for Model Selection
We introduce a new principle for model selection in regression and
classification. Many regression models are controlled by some smoothness or
flexibility or complexity parameter c, e.g. the number of neighbors to be
averaged over in k nearest neighbor (kNN) regression or the polynomial degree
in regression with polynomials. Let f_D^c be the (best) regressor of complexity
c on data D. A more flexible regressor can fit more data D' well than a more
rigid one. If something (here small loss) is easy to achieve it's typically
worth less. We define the loss rank of f_D^c as the number of other
(fictitious) data D' that are fitted better by f_D'^c than D is fitted by
f_D^c. We suggest selecting the model complexity c that has minimal loss rank
(LoRP). Unlike most penalized maximum likelihood variants (AIC, BIC, MDL), LoRP
only depends on the regression function and loss function. It works without a
stochastic noise model, and is directly applicable to any non-parametric
regressor, like kNN. In this paper we formalize, discuss, and motivate LoRP,
study it for specific regression problems, in particular linear ones, and
compare it to other model selection schemes. Comment: 16 pages
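A minimal sketch of the counting definition above, assuming binary labels and a data set small enough that every fictitious label vector can be enumerated. The function names, the squared loss, and the tie rule (counting every D' whose loss is no larger than that of D, with D itself included) are illustrative assumptions and are not taken from the paper.

```python
from itertools import product


def knn_fit_predict(x, y, k):
    """Predict each training point as the mean label of its k nearest
    neighbours in x (the point itself counts as a neighbour)."""
    preds = []
    for xi in x:
        # Sort indices by distance to xi; ties are broken by index order.
        nearest = sorted(range(len(x)), key=lambda j: (abs(x[j] - xi), j))[:k]
        preds.append(sum(y[j] for j in nearest) / k)
    return preds


def empirical_loss(x, y, k):
    """Squared training loss of the kNN regressor fitted on (x, y)."""
    preds = knn_fit_predict(x, y, k)
    return sum((yi - pi) ** 2 for yi, pi in zip(y, preds))


def loss_rank(x, y, k, label_values=(0, 1)):
    """Count label vectors y' whose refitted loss is no larger than the
    loss on the actual labels y (y itself is included in the count)."""
    actual = empirical_loss(x, y, k)
    return sum(
        1
        for y_alt in product(label_values, repeat=len(y))
        if empirical_loss(x, y_alt, k) <= actual + 1e-12
    )


def select_k(x, y, candidates):
    """Return the candidate k with the smallest loss rank, plus all ranks."""
    ranks = {k: loss_rank(x, y, k) for k in candidates}
    return min(ranks, key=ranks.get), ranks
```

With distinct inputs, k = 1 reproduces every label vector exactly, so all 2^n label vectors tie at zero loss and the rank is maximal. This is the sense in which a very flexible regressor fits "more data well". Including D in the count shifts every rank by one and does not change which k is selected.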
Multi-Objective Supervised Learning
Copyright © 2008 Springer-Verlag Berlin Heidelberg. The final publication is available at link.springer.com. Book title: Multiobjective Problem Solving from Nature. Extended version of the 2006 workshop paper presented at the Workshop on Multiobjective Problem-Solving from Nature, 9th International Conference on Parallel Problem Solving from Nature (PPSN IX), Reykjavik, Iceland, 9-13 September 2006; see: http://hdl.handle.net/10871/11785
This chapter sets out a number of the popular areas in multiobjective supervised learning. It gives empirical examples of model complexity optimization and competing error terms, and presents the recent advances in multi-class receiver operating characteristic analysis enabled by multiobjective optimization. It concludes by highlighting some specific areas of interest or concern when dealing with multiobjective supervised learning problems, and sets out potential areas of future research.
Reconstruction of Causal Networks by Set Covering
We present a method for the reconstruction of networks, based on the order of
nodes visited by a stochastic branching process. Our algorithm reconstructs a
network of minimal size that ensures consistency with the data. Crucially, we
show that global consistency with the data can be achieved through purely local
considerations, inferring the neighbourhood of each node in turn. The
optimisation problem solved for each individual node can be reduced to a Set
Covering Problem, which is known to be NP-hard but can be approximated well in
practice. We then extend our approach to account for noisy data, based on the
Minimum Description Length principle. We demonstrate our algorithms on
synthetic data, generated by an SIR-like epidemiological model. Comment: Under consideration for the ECML PKDD 2010 conference
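The remark that Set Covering is NP-hard yet approximated well in practice can be illustrated with the standard greedy heuristic. This is a generic sketch, not the authors' algorithm: the abstract does not say how each node's neighbourhood problem maps onto sets, so the function name greedy_set_cover and the dictionary input format are assumptions made for illustration.

```python
def greedy_set_cover(universe, subsets):
    """Greedy approximation to Set Covering.

    universe: iterable of elements that must all be covered.
    subsets:  dict mapping a subset name to a set of elements.
    Returns the list of chosen subset names, in the order they were picked.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Pick the subset that covers the most still-uncovered elements.
        best = max(subsets, key=lambda name: len(subsets[name] & uncovered))
        gained = subsets[best] & uncovered
        if not gained:
            raise ValueError("universe cannot be covered by the given subsets")
        chosen.append(best)
        uncovered -= gained
    return chosen


# Example: five elements, four candidate subsets.
example_subsets = {"a": {1, 2, 3}, "b": {2, 4}, "c": {3, 4}, "d": {4, 5}}
example_cover = greedy_set_cover({1, 2, 3, 4, 5}, example_subsets)
```

The greedy rule does not guarantee a minimum cover, but it is simple and is the usual baseline for this problem.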
A note on the minimum distance of quantum LDPC codes
We provide a new lower bound on the minimum distance of a family of quantum
LDPC codes based on Cayley graphs proposed by MacKay, Mitchison and
Shokrollahi. Our bound is exponential, improving on the quadratic bound of
Couvreur, Delfosse and Zémor. This result is obtained by examining a family
of subsets of the hypercube which locally satisfy some parity conditions.
Statistical mechanics of typical set decoding
The performance of "typical set (pairs) decoding" for ensembles of
Gallager's linear codes is investigated using statistical physics. In this
decoding scheme, an error occurs either when the transmission is corrupted by
atypical noise, or when two or more typical sequences satisfy the parity check
equation provided by the received codeword to which typical noise has been added.
We show that the average error rate for the latter case over a given code
ensemble can be tightly evaluated using the replica method, including the
sensitivity to the message length. Our approach generally improves on the
existing analysis known in the information theory community, which was
reintroduced by MacKay (1999) and is believed to be the most accurate to date.
Comment: 7 pages
Magnetization enumerator of real-valued symmetric channels in Gallager error-correcting codes
Using the magnetization enumerator method, we evaluate the practical and
theoretical limitations of symmetric channels with real outputs. Results are
presented for several regular Gallager code constructions. Comment: 5 pages, 1 figure, to appear as Brief Report in Physical Review
The Entropy of a Binary Hidden Markov Process
The entropy of a binary symmetric Hidden Markov Process is calculated as an
expansion in the noise parameter epsilon. We map the problem onto a
one-dimensional Ising model in a large field of random signs and calculate the
expansion coefficients up to second order in epsilon. Using a conjecture we
extend the calculation to 11th order and discuss the convergence of the
resulting series.
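For orientation, the quantity being expanded can be approximated numerically for short sequences. The sketch below does not reproduce the paper's Ising mapping or its series coefficients. It assumes a binary symmetric Markov chain with flip probability p (a symbol chosen here), observed through a symmetric channel that flips each output with probability eps. It computes exact block entropies by enumeration and uses the standard upper bound H(Y_n | Y_1..Y_{n-1}) on the entropy rate. All function names are illustrative.

```python
import math
from itertools import product


def sequence_probability(ys, p, eps):
    """P(y_1..y_n) via the forward recursion, starting from the uniform
    stationary distribution of the symmetric hidden chain."""
    # alpha[s] = P(y_1..y_t, X_t = s)
    alpha = [0.5 * ((1 - eps) if s == ys[0] else eps) for s in (0, 1)]
    for y in ys[1:]:
        alpha = [
            sum(alpha[r] * ((1 - p) if r == s else p) for r in (0, 1))
            * ((1 - eps) if s == y else eps)
            for s in (0, 1)
        ]
    return sum(alpha)


def block_entropy(n, p, eps):
    """Exact entropy (in nats) of the first n outputs, by enumeration."""
    total = 0.0
    for ys in product((0, 1), repeat=n):
        q = sequence_probability(ys, p, eps)
        if q > 0:
            total -= q * math.log(q)
    return total


def conditional_entropy(n, p, eps):
    """H(Y_n | Y_1..Y_{n-1}) for n >= 2, an upper bound on the entropy rate."""
    return block_entropy(n, p, eps) - block_entropy(n - 1, p, eps)


def binary_entropy(q):
    """Entropy (in nats) of a coin with bias q, for 0 < q < 1."""
    return -q * math.log(q) - (1 - q) * math.log(1 - q)
```

Two limits give a quick sanity check. At eps = 0 the outputs are the Markov chain itself, so the conditional entropy equals the binary entropy of p. At eps = 0.5 the outputs are independent fair bits, so it equals log 2. Evaluating conditional_entropy at a few small values of eps shows how the entropy rises from the noiseless value.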
Impact of Heat Pump Load on Distribution Networks
© The Institution of Engineering and Technology 2014. Heat pumps can provide domestic heating at a cost that is competitive with oil heating in particular. If the electricity supply contains a significant amount of renewable generation, a move from fossil fuel heating to heat pumps can reduce greenhouse gas emissions. The inherent thermal storage of heat pump installations can also provide the electricity supplier with valuable flexibility. The increase in heat pump installations in the UK and Europe in the last few years poses a challenge for low-voltage networks, because induction motors are used to drive the pump compressors. The induction motor load tends to depress voltage, especially on starting. The study includes experimental results, dynamic load modelling, and a comparison of experimental and simulation results for various levels of heat pump deployment. The simulations are based on a generic test network designed to capture the main characteristics of UK distribution system practice. The simulations use DIgSILENT PowerFactory for dynamic studies that focus on starting current, voltage variations, active power, reactive power and switching transients.