Asymmetries arising from the space-filling nature of vascular networks
Cardiovascular networks span the body by branching across many generations of
vessels. The resulting structure delivers blood over long distances to supply
all cells with oxygen via the relatively short-range process of diffusion at
the capillary level. The structural features of the network that accomplish
this density and ubiquity of capillaries are often called space-filling. There
are multiple strategies to fill a space, but some do not lead to
biologically adaptive structures because they require too much construction
material or space, deliver resources too slowly, or use too much power to move blood
through the system. We empirically measure the structure of real networks (18
humans and 1 mouse) and compare these observations with predictions of model
networks that are space-filling and constrained by a few guiding biological
principles. We devise a numerical method that enables the investigation of
space-filling strategies and determination of which biological principles
influence network structure. Optimization for only a single principle creates
unrealistic networks that represent an extreme limit of the possible structures
that could be observed in nature. We first study these extreme limits for two
competing principles, minimal total material and minimal path lengths. We
combine these two principles and enforce various thresholds for balance in the
network hierarchy, which provides a novel approach that highlights the
trade-offs faced by biological networks and yields predictions that better
match our empirical data.
Comment: 17 pages, 15 figures
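The trade-off between minimal total material and minimal path lengths can be illustrated with a toy example. The sketch below is not the authors' numerical method: it assumes a handful of 2D points, takes a star network (every point wired straight to the source) as a stand-in for the short-path extreme and a minimum spanning tree as a stand-in for the low-material extreme, and scores both with a hypothetical weighted cost. All function names and the `weight` parameter are illustrative assumptions.

```python
import math

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def star_tree(points):
    """Connect every point directly to the source (index 0)."""
    return {i: 0 for i in range(1, len(points))}

def minimum_spanning_tree(points):
    """Prim's algorithm; returns a child -> parent map rooted at index 0."""
    parent = {}
    # best[i] = (cheapest known edge length into the tree, tree node it attaches to)
    best = {i: (dist(points[0], points[i]), 0) for i in range(1, len(points))}
    while best:
        i = min(best, key=lambda k: best[k][0])
        parent[i] = best.pop(i)[1]
        for j in best:
            d = dist(points[i], points[j])
            if d < best[j][0]:
                best[j] = (d, i)
    return parent

def total_material(points, parent):
    """Sum of all edge lengths (a proxy for construction material)."""
    return sum(dist(points[c], points[p]) for c, p in parent.items())

def mean_path_length(points, parent):
    """Average distance travelled along the tree from the source to each point."""
    def path(i):
        length = 0.0
        while i != 0:
            length += dist(points[i], points[parent[i]])
            i = parent[i]
        return length
    return sum(path(i) for i in parent) / len(parent)

def combined_cost(points, parent, weight):
    """Hypothetical blend: weight=0 scores material only, weight=1 path length only."""
    return ((1 - weight) * total_material(points, parent)
            + weight * mean_path_length(points, parent))

# Source at the origin plus the other three corners of a unit square.
points = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
star = star_tree(points)
mst = minimum_spanning_tree(points)
```

On this square the spanning tree uses less material while the star gives shorter average paths, so which network scores better under `combined_cost` depends on the chosen `weight`.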
Learning When Training Data are Costly: The Effect of Class Distribution on Tree Induction
For large, real-world inductive learning problems, the number of training
examples often must be limited due to the costs associated with procuring,
preparing, and storing the training examples and/or the computational costs
associated with learning from them. In such circumstances, one question of
practical importance is: if only n training examples can be selected, in what
proportion should the classes be represented? In this article we help to answer
this question by analyzing, for a fixed training-set size, the relationship
between the class distribution of the training data and the performance of
classification trees induced from these data. We study twenty-six data sets
and, for each, determine the best class distribution for learning. The
naturally occurring class distribution is shown to generally perform well when
classifier performance is evaluated using undifferentiated error rate (0/1
loss). However, when the area under the ROC curve is used to evaluate
classifier performance, a balanced distribution is shown to perform well. Since
neither of these choices for class distribution always generates the
best-performing classifier, we introduce a budget-sensitive progressive
sampling algorithm for selecting training examples based on the class
associated with each example. An empirical analysis of this algorithm shows
that the class distribution of the resulting training set yields classifiers
with good (nearly-optimal) classification performance.
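The question of which class proportions to use for a fixed training-set size n can be made concrete with a small sampler. This is a minimal sketch under stated assumptions, not the budget-sensitive progressive sampling algorithm from the abstract: it assumes binary labels (0 and 1), a pool of already-labeled examples, and a hypothetical helper `sample_with_class_distribution` that draws n examples at a requested positive-class fraction.

```python
import random

def sample_with_class_distribution(examples, n, positive_fraction, seed=0):
    """Draw n examples so that about positive_fraction of them have label 1.

    examples: list of (features, label) pairs with label in {0, 1}.
    """
    rng = random.Random(seed)
    positives = [e for e in examples if e[1] == 1]
    negatives = [e for e in examples if e[1] == 0]
    n_pos = round(n * positive_fraction)
    n_neg = n - n_pos
    if n_pos > len(positives) or n_neg > len(negatives):
        raise ValueError("not enough examples of one class for this distribution")
    # Sample without replacement within each class, then mix the two classes.
    sample = rng.sample(positives, n_pos) + rng.sample(negatives, n_neg)
    rng.shuffle(sample)
    return sample

# Illustrative pool: 10% positives, mimicking a skewed natural distribution.
pool = [((i,), 1) for i in range(100)] + [((i,), 0) for i in range(900)]

natural = sample_with_class_distribution(pool, n=100, positive_fraction=0.1)
balanced = sample_with_class_distribution(pool, n=100, positive_fraction=0.5)
```

Training the same learner on `natural` and on `balanced`, then comparing error rate and area under the ROC curve, mirrors the kind of comparison the abstract describes.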
On Optimizing Distributed Tucker Decomposition for Dense Tensors
The Tucker decomposition expresses a given tensor as the product of a small
core tensor and a set of factor matrices. Apart from providing data
compression, the construction is useful in performing analysis such as
principal component analysis (PCA) and finds applications in diverse domains
such as signal processing, computer vision and text analytics. Our objective is
to develop an efficient distributed implementation for the case of dense
tensors. The implementation is based on the HOOI (Higher Order Orthogonal
Iterator) procedure, wherein the tensor-times-matrix product forms the core
routine. Prior work has proposed heuristics for reducing the computational
load and communication volume incurred by the routine. We study the two metrics
in a formal and systematic manner, and design strategies that are optimal under
the two fundamental metrics. Our experimental evaluation on a large benchmark
of tensors shows that the optimal strategies provide significant reduction in
load and volume compared to prior heuristics, and provide up to 7x speed-up in
the overall running time.
Comment: Preliminary version of the paper appears in the proceedings of IPDPS'1
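The tensor-times-matrix (TTM) product named above as the core routine can be sketched on a single machine with NumPy. This is an illustrative, non-distributed sketch and not the paper's implementation; the function names `ttm` and `tucker_reconstruct`, and the small tensor sizes, are assumptions chosen for the example.

```python
import numpy as np

def ttm(tensor, matrix, mode):
    """Multiply `tensor` by `matrix` along axis `mode` (mode-n product)."""
    moved = np.moveaxis(tensor, mode, 0)           # bring the chosen mode to the front
    unfolded = moved.reshape(moved.shape[0], -1)   # unfold into a matrix
    product = matrix @ unfolded                    # the actual matrix multiply
    new_shape = (matrix.shape[0],) + moved.shape[1:]
    return np.moveaxis(product.reshape(new_shape), 0, mode)

def tucker_reconstruct(core, factors):
    """Rebuild the full tensor from a core tensor and one factor matrix per mode."""
    out = core
    for mode, factor in enumerate(factors):
        out = ttm(out, factor, mode)
    return out

rng = np.random.default_rng(0)
core = rng.standard_normal((2, 3, 2))              # small core tensor
factors = [rng.standard_normal((4, 2)),            # one factor matrix per mode
           rng.standard_normal((5, 3)),
           rng.standard_normal((6, 2))]
full = tucker_reconstruct(core, factors)           # shape (4, 5, 6)
```

Each TTM reduces to one unfolding and one matrix multiply, which is why its computational load and, in a distributed setting, its communication volume are the quantities the abstract focuses on.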