On the scatter in the relation between stellar mass and halo mass: random or halo formation time dependent?
The empirical HOD model of Wang et al. 2006 fits, by construction, both the
stellar mass function and correlation function of galaxies in the local
Universe. In contrast, the semi-analytical models of De Lucia & Blaizot 2007
(DLB07) and Guo et al. 2011 (Guo11), built on the same dark matter halo merger
trees as the empirical model, still have difficulties in reproducing these
observational data simultaneously. We compare the relations between the stellar
mass of galaxies and their host halo mass in the three models, and find that
they are different. When the relations are rescaled to have the same median
values and the same scatter as in Wang et al., the rescaled DLB07 model can fit
both the galaxy stellar mass function and the correlation functions measured in
different galaxy stellar mass bins. In contrast, the rescaled Guo11
model still over-predicts the clustering of low-mass galaxies. This indicates
that the details of how galaxies populate the scatter in the stellar mass --
halo mass relation play an important role in determining the correlation
functions of galaxies. While the stellar mass of galaxies in the Wang et al.
model depends only on halo mass and is randomly distributed within the scatter,
galaxy stellar mass depends also on the halo formation time in semi-analytical
models. At a fixed infall mass, galaxies that lie above the median
stellar mass -- halo mass relation reside in haloes that formed earlier, while
galaxies that lie below the median relation reside in haloes that formed later.
This effect is much stronger in Guo11 than in DLB07, which explains the
over-clustering of low-mass galaxies in Guo11. Our results illustrate that the
assumption of random scatter in the relation between stellar and halo mass, as
employed by current HOD and abundance matching models, may be problematic if
significant assembly bias exists in the real Universe.
Comment: 10 pages, 6 figures, published in MNRAS
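As a toy illustration of the distinction drawn above, the sketch below assigns stellar masses to haloes under purely random lognormal scatter versus scatter tied to halo formation time. The median relation, scatter amplitude, and formation-redshift distribution are all illustrative assumptions, not the parameters of the Wang et al., DLB07, or Guo11 models.

```python
import numpy as np

rng = np.random.default_rng(0)

def median_relation(log_mhalo):
    """Toy median stellar mass -- halo mass relation (illustrative)."""
    return 10.0 + 0.5 * (log_mhalo - 12.0)

def stellar_mass_random(log_mhalo, scatter_dex=0.17):
    """HOD-style assignment: median relation plus purely random
    lognormal scatter, independent of halo formation time."""
    return median_relation(log_mhalo) + rng.normal(0.0, scatter_dex, np.shape(log_mhalo))

def stellar_mass_assembly(log_mhalo, z_form, scatter_dex=0.17):
    """Semi-analytic-style assignment: same median relation, but the
    scatter is tied to formation redshift, so earlier-forming haloes
    (high z_form) host galaxies above the median relation."""
    rank = (z_form - z_form.mean()) / z_form.std()
    return median_relation(log_mhalo) + scatter_dex * rank

log_mh = rng.uniform(11.0, 13.0, 10000)   # log10 halo masses (toy sample)
z_form = rng.gamma(2.0, 0.7, 10000)       # toy formation redshifts
ms_rand = stellar_mass_random(log_mh)
ms_asmb = stellar_mass_assembly(log_mh, z_form)
```

In the second scheme, above- and below-median galaxies map onto early- and late-forming haloes respectively, which is the ingredient that couples the scatter to assembly bias.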
Computing medians and means in Hadamard spaces
The geometric median as well as the Frechet mean of points in an Hadamard
space are important in both theory and applications. Surprisingly, no
algorithms for their computation are hitherto known. To address this issue, we
use a split version of the proximal point algorithm for minimizing a sum of
convex functions and prove that this algorithm produces a sequence converging
to a minimizer of the objective function, which extends a recent result of D.
Bertsekas (2001) to Hadamard spaces. The method is quite robust: not only
does it yield algorithms for the median and the mean, but it also applies to
various other optimization problems. We moreover show that another algorithm
for computing the Frechet mean can be derived from the law of large numbers due
to K.-T. Sturm (2002). In applications, computing medians and means is probably
most needed in tree space, which is an instance of an Hadamard space, invented
by Billera, Holmes, and Vogtmann (2001) as a tool for averaging phylogenetic
trees. It turns out, however, that it can also be used to model numerous other
tree-like structures. Since there now exists a polynomial-time algorithm for
computing geodesics in tree space due to M. Owen and S. Provan (2011), we
obtain efficient algorithms for computing medians and means, which can be
directly used in practice.
Comment: Corrected version. Accepted in SIAM Journal on Optimization
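A minimal sketch of the law-of-large-numbers scheme for the Frechet mean, written for Euclidean space (the simplest Hadamard space, where geodesics are straight lines); in tree space the `geodesic` step would instead call the Owen-Provan polynomial-time algorithm. The step sizes 1/(k+1) and the sampling scheme are illustrative choices, not the paper's exact procedure.

```python
import numpy as np

def geodesic(x, y, t):
    """Point at parameter t on the geodesic from x to y.  In Euclidean
    space this is a straight line; in tree space this step would be
    computed with the Owen-Provan algorithm."""
    return (1.0 - t) * x + t * y

def inductive_mean(points, passes=50, seed=0):
    """Inductive (law-of-large-numbers) scheme: repeatedly move the
    current estimate toward a sampled point with shrinking step 1/(k+1)."""
    rng = np.random.default_rng(seed)
    x = points[0].astype(float).copy()
    k = 1
    for _ in range(passes):
        for i in rng.permutation(len(points)):
            x = geodesic(x, points[i], 1.0 / (k + 1))
            k += 1
    return x

pts = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]])
mean = inductive_mean(pts)   # approaches the Frechet mean [1, 1]
```

Because every operation reduces to geodesic evaluations, the same loop runs unchanged in any Hadamard space with a computable geodesic.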
Pruning of genetic programming trees using permutation tests
We present a novel approach based on statistical permutation tests for pruning redundant subtrees from genetic programming (GP) trees that allows us to explore the extent of effective redundancy. We observe that, over a range of regression problems, median tree sizes are reduced by around 20%, largely independent of the test function, and that while some large subtrees are removed, the median pruned subtree comprises just three nodes; most take the form of an exact algebraic simplification. Our statistically based pruning technique has allowed us to explore the hypothesis
that a given subtree can be replaced with a constant if this substitution results in no statistical change to the behavior of the parent tree – what we term approximate simplification. In the event, we infer that more than 95% of the accepted pruning proposals are the result of algebraic simplifications, which provides some practical insight into the scope for removing redundancies in GP trees.
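The statistical core of such a pruning step can be sketched as a two-sample permutation test comparing the parent tree's outputs before and after a subtree is replaced by a constant. The test statistic (difference of means), significance level, and `try_prune` interface are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def permutation_test(a, b, n_perm=2000, seed=0):
    """Two-sample permutation test on the difference of means.
    Returns a p-value for H0: a and b come from the same distribution."""
    rng = np.random.default_rng(seed)
    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        stat = abs(pooled[:len(a)].mean() - pooled[len(a):].mean())
        if stat >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)   # add-one correction

def try_prune(tree_outputs, pruned_outputs, alpha=0.05):
    """Accept the pruning proposal (subtree -> constant) only if the
    parent tree's outputs show no statistically detectable change."""
    return permutation_test(tree_outputs, pruned_outputs) >= alpha
```

A natural choice for the replacement constant is a summary (e.g. the median) of the subtree's outputs over the training cases, so that exact algebraic simplifications are accepted trivially.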
Unreliable point facility location problems on networks
In this paper we study facility location problems on graphs under the most common criteria, such as median, center, and centdian, but we incorporate some reliability aspects into the objective function. Assuming that facilities may become unavailable with a certain probability, the problem consists of locating facilities so as to minimize the overall or the maximum expected service cost in the long run, or a convex combination of the two. We show that
the k-facility problem on general networks is NP-hard. Then, we provide efficient algorithms for these problems for the cases k = 1, 2, both on general networks and on trees. We also explain how our methodology extends to handle a more general class of unreliable point facility location problems related to the ordered median objective function.
Funding: Ministerio de Ciencia y Tecnología; Junta de Andalucía
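To make the objective concrete, here is a sketch of the overall expected service cost for the median-type criterion, under the illustrative assumptions that facilities fail independently with probability p, each client is served by its nearest available open facility, and a client with no available facility pays a fixed penalty.

```python
import itertools

def expected_service_cost(dist, open_sites, p, penalty):
    """Expected total (median-type) service cost when every open facility
    is independently unavailable with probability p.  dist[i][j] is the
    distance from client i to candidate site j; a client is served by its
    nearest available open facility and pays `penalty` if all have failed.
    (The failure model and penalty term are illustrative assumptions.)"""
    total = 0.0
    for row in dist:
        ds = sorted(row[j] for j in open_sites)
        # the (r+1)-th nearest serves iff the r nearer ones all failed
        exp_cost = sum((p ** r) * (1.0 - p) * d for r, d in enumerate(ds))
        exp_cost += (p ** len(ds)) * penalty   # every open facility failed
        total += exp_cost
    return total

# Brute-force 2-facility location over three candidate sites:
dist = [[1.0, 5.0, 3.0], [5.0, 1.0, 3.0]]
best_pair = min(itertools.combinations(range(3), 2),
                key=lambda S: expected_service_cost(dist, S, 0.1, 100.0))
```

Brute force is exponential in k, which is consistent with the NP-hardness result above; the paper's efficient algorithms target the k = 1, 2 cases.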
Sequential Design for Computer Experiments with a Flexible Bayesian Additive Model
In computer experiments, a mathematical model implemented on a computer is
used to represent complex physical phenomena. These models, known as computer
simulators, enable experimental study of a virtual representation of the
complex phenomena. Simulators can be thought of as complex functions that take
many inputs and provide an output. Often these simulators are themselves
expensive to compute, and may be approximated by "surrogate models" such as
statistical regression models. In this paper we consider a new kind of
surrogate model, a Bayesian ensemble of trees (Chipman et al. 2010), with the
specific goal of learning enough about the simulator that a particular feature
of the simulator can be estimated. We focus on identifying the simulator's
global minimum. Utilizing the Bayesian version of the Expected Improvement
criterion (Jones et al. 1998), we show that this ensemble is particularly
effective when the simulator is ill-behaved, exhibiting nonstationarity or
abrupt changes in the response. A number of illustrations of the approach are
given, including a tidal power application.
Comment: 21 pages
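The Expected Improvement criterion of Jones et al. (1998) for minimization has a simple closed form once the surrogate provides a Gaussian predictive mean `mu` and standard deviation `sd` at a candidate input; the sketch below assumes exactly that, with the predictive moments supplied by any surrogate, e.g. a Bayesian tree ensemble.

```python
import math

def expected_improvement(mu, sd, f_min):
    """Expected Improvement for minimization (Jones et al. 1998):
    EI = (f_min - mu) * Phi(z) + sd * phi(z), with z = (f_min - mu) / sd,
    where mu and sd are the surrogate's predictive mean and standard
    deviation at the candidate input and f_min is the best value so far."""
    if sd <= 0.0:
        return max(f_min - mu, 0.0)
    z = (f_min - mu) / sd
    Phi = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))          # standard normal CDF
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)   # standard normal pdf
    return (f_min - mu) * Phi + sd * phi

# The next design point is the candidate with the largest EI:
candidates = [(0.1, 0.5), (-0.2, 0.1), (0.3, 1.0)]   # (mu, sd) pairs
best = max(candidates, key=lambda c: expected_improvement(c[0], c[1], 0.0))
```

In this toy example the high-uncertainty candidate is selected even though its predictive mean is the worst: EI trades off exploitation against exploration, which is what makes the criterion useful on ill-behaved, nonstationary responses.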