
    Asymmetries arising from the space-filling nature of vascular networks

    Cardiovascular networks span the body by branching across many generations of vessels. The resulting structure delivers blood over long distances to supply all cells with oxygen via the relatively short-range process of diffusion at the capillary level. The structural features of the network that accomplish this density and ubiquity of capillaries are often called space-filling. There are multiple strategies to fill a space, but some fail to produce biologically adaptive structures because they require too much construction material or space, deliver resources too slowly, or use too much power to move blood through the system. We empirically measure the structure of real networks (18 humans and 1 mouse) and compare these observations with predictions of model networks that are space-filling and constrained by a few guiding biological principles. We devise a numerical method that enables the investigation of space-filling strategies and determination of which biological principles influence network structure. Optimization for only a single principle creates unrealistic networks that represent an extreme limit of the possible structures that could be observed in nature. We first study these extreme limits for two competing principles, minimal total material and minimal path lengths. We combine these two principles and enforce various thresholds for balance in the network hierarchy, which provides a novel approach that highlights the trade-offs faced by biological networks and yields predictions that better match our empirical data.
    Comment: 17 pages, 15 figures

    Learning When Training Data are Costly: The Effect of Class Distribution on Tree Induction

    For large, real-world inductive learning problems, the number of training examples often must be limited due to the costs associated with procuring, preparing, and storing the training examples and/or the computational costs associated with learning from them. In such circumstances, one question of practical importance is: if only n training examples can be selected, in what proportion should the classes be represented? In this article we help to answer this question by analyzing, for a fixed training-set size, the relationship between the class distribution of the training data and the performance of classification trees induced from these data. We study twenty-six data sets and, for each, determine the best class distribution for learning. The naturally occurring class distribution is shown to generally perform well when classifier performance is evaluated using undifferentiated error rate (0/1 loss). However, when the area under the ROC curve is used to evaluate classifier performance, a balanced distribution is shown to perform well. Since neither of these choices for class distribution always generates the best-performing classifier, we introduce a budget-sensitive progressive sampling algorithm for selecting training examples based on the class associated with each example. An empirical analysis of this algorithm shows that the class distribution of the resulting training set yields classifiers with good (nearly optimal) classification performance.
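    The question posed in this abstract (given a budget of n training examples, in what proportion should the classes be represented?) can be illustrated with a small sketch. This is not the paper's budget-sensitive progressive sampling algorithm; it only shows how a fixed-size training set with a chosen class distribution might be drawn. The function name `sample_with_class_distribution` and its parameters are hypothetical.

```python
import random


def sample_with_class_distribution(pairs, n, minority_label, minority_fraction, seed=0):
    """Draw n (example, label) pairs so that roughly `minority_fraction`
    of them carry `minority_label`.

    Hypothetical helper for illustration only; it assumes a two-class
    problem where every label other than `minority_label` is "majority".
    """
    rng = random.Random(seed)
    minority = [p for p in pairs if p[1] == minority_label]
    majority = [p for p in pairs if p[1] != minority_label]

    n_minority = round(n * minority_fraction)
    n_majority = n - n_minority
    if n_minority > len(minority) or n_majority > len(majority):
        raise ValueError("not enough examples to meet the requested distribution")

    # Sample without replacement from each class, then shuffle the result.
    chosen = rng.sample(minority, n_minority) + rng.sample(majority, n_majority)
    rng.shuffle(chosen)
    return chosen


# A pool of 1000 examples whose natural distribution is 10% positive.
pool = [(i, 1 if i < 100 else 0) for i in range(1000)]

# Same budget (n = 100), two candidate class distributions.
natural = sample_with_class_distribution(pool, 100, minority_label=1, minority_fraction=0.10)
balanced = sample_with_class_distribution(pool, 100, minority_label=1, minority_fraction=0.50)
```

    Training one classifier on `natural` and one on `balanced`, then scoring both by error rate and by area under the ROC curve, would be one way to reproduce the kind of comparison the abstract describes.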

    On Optimizing Distributed Tucker Decomposition for Dense Tensors

    The Tucker decomposition expresses a given tensor as the product of a small core tensor and a set of factor matrices. Apart from providing data compression, the construction is useful in performing analysis such as principal component analysis (PCA) and finds applications in diverse domains such as signal processing, computer vision and text analytics. Our objective is to develop an efficient distributed implementation for the case of dense tensors. The implementation is based on the HOOI (Higher Order Orthogonal Iterator) procedure, wherein the tensor-times-matrix product forms the core routine. Prior work has proposed heuristics for reducing the computational load and communication volume incurred by the routine. We study the two metrics in a formal and systematic manner, and design strategies that are optimal under the two fundamental metrics. Our experimental evaluation on a large benchmark of tensors shows that the optimal strategies provide significant reduction in load and volume compared to prior heuristics, and provide up to 7x speed-up in the overall running time.
    Comment: Preliminary version of the paper appears in the proceedings of IPDPS'1
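    The tensor-times-matrix (TTM) product named in this abstract can be sketched on a single machine with NumPy. This is an illustrative, non-distributed sketch and not the paper's implementation; the helper names `ttm` and `tucker_reconstruct` are assumptions made for the example. It builds a tensor from a core and orthonormal factor matrices, then recovers the core by applying the transposed factors, which is the projection step that HOOI repeats along each mode.

```python
import numpy as np


def ttm(tensor, matrix, mode):
    """Mode-`mode` tensor-times-matrix product.

    Contracts the columns of `matrix` with axis `mode` of `tensor`,
    so that axis changes length from matrix.shape[1] to matrix.shape[0].
    """
    out = np.tensordot(matrix, tensor, axes=(1, mode))
    # tensordot puts the new axis first; move it back to position `mode`.
    return np.moveaxis(out, 0, mode)


def tucker_reconstruct(core, factors):
    """Multiply the core by one factor matrix along every mode."""
    result = core
    for mode, factor in enumerate(factors):
        result = ttm(result, factor, mode)
    return result


rng = np.random.default_rng(0)
core = rng.standard_normal((2, 3, 2))

# Orthonormal factor matrices (columns of Q from a reduced QR factorization).
dims = (4, 5, 6)
factors = [np.linalg.qr(rng.standard_normal((d, r)))[0]
           for d, r in zip(dims, core.shape)]

X = tucker_reconstruct(core, factors)

# Because each factor has orthonormal columns, applying its transpose
# along every mode recovers the core tensor.
recovered = tucker_reconstruct(X, [f.T for f in factors])
```

    In a distributed setting, each `ttm` call is where the computational load and communication volume discussed in the abstract would arise.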