Vectorization and Parallelization of the Adaptive Mesh Refinement N-body Code
In this paper, we describe our vectorized and parallelized adaptive mesh
refinement (AMR) N-body code with shared time steps, and report its performance
on a Fujitsu VPP5000 vector-parallel supercomputer. Our AMR N-body code puts
hierarchical meshes recursively where higher resolution is required, and all
particles share the same time step. The parts that are most difficult
to vectorize are loops that access the mesh data and particle data. We
vectorized such parts by changing the loop structure, so that the innermost
loop steps through the cells instead of the particles in each cell, in other
words, by changing the loop order from the depth-first order to the
breadth-first order. Mass assignment is also vectorizable using this loop order
exchange and splitting the loop into 2^d loops if the cloud-in-cell
scheme is adopted, where d is the number of dimensions. These
vectorization schemes, which eliminate the unvectorized loops, are applicable to
parallelizing loops for shared-memory multiprocessors. We also
parallelized our code for distributed memory machines. The important part of
parallelization is data decomposition. We sorted the hierarchical mesh data by
the Morton order, or the recursive N-shaped order, level by level and split and
allocated the mesh data to the processors. Each particle is allocated to the
processor to which the finest refined cell containing it is
assigned. Our timing analysis using the Λ-dominated cold dark matter
simulations shows that our parallel code speeds up almost ideally up to 32
processors, the largest number of processors in our test.
Comment: 21 pages, 16 figures, to be published in PASJ (Vol. 57, No. 5, Oct. 2005)
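To make the loop-order exchange concrete, here is a minimal NumPy sketch of cloud-in-cell mass assignment on a 2-D uniform mesh, split into 2^d = 4 vectorizable passes (one per cloud corner). It illustrates the idea only, under simplifying assumptions (uniform periodic grid, no mesh hierarchy, invented names); it is not the authors' vector-parallel code.

```python
# Cloud-in-cell (CIC) mass assignment with the 2^d-loop splitting described in
# the abstract above: instead of scattering each particle to its 4 neighbour
# cells inside one loop, we run 4 separate passes (one per cell offset), each
# a single uniform sweep over all particles. The uniform periodic grid and
# all names are assumptions made for brevity; the paper's code works on an
# AMR hierarchy.
import numpy as np

def cic_assign(pos, mass, n):
    """Deposit particle masses onto an n x n periodic mesh with CIC."""
    rho = np.zeros((n, n))
    i = np.floor(pos).astype(int)   # lower-left cell index of each particle
    f = pos - i                     # fractional position inside that cell
    for dx in (0, 1):               # 2^d = 4 offset passes in d = 2 dimensions
        for dy in (0, 1):
            w = (f[:, 0] if dx else 1.0 - f[:, 0]) \
              * (f[:, 1] if dy else 1.0 - f[:, 1])
            # np.add.at accumulates correctly when several particles
            # deposit into the same cell.
            np.add.at(rho, ((i[:, 0] + dx) % n, (i[:, 1] + dy) % n), mass * w)
    return rho

# Usage: 1000 unit-mass particles on a 64^2 mesh; CIC conserves total mass.
rng = np.random.default_rng(0)
rho = cic_assign(rng.uniform(0.0, 64.0, (1000, 2)), np.ones(1000), 64)
assert np.isclose(rho.sum(), 1000.0)
```

Each offset pass touches every particle exactly once with a uniform access pattern, which is what makes it amenable to vectorization and, as the abstract notes, to shared-memory loop parallelism.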
Blockout: Dynamic Model Selection for Hierarchical Deep Networks
Most deep architectures for image classification--even those that are trained
to classify a large number of diverse categories--learn shared image
representations with a single model. Intuitively, however, categories that are
more similar should share more information than those that are very different.
While hierarchical deep networks address this problem by learning separate
features for subsets of related categories, current implementations require
simplified models using fixed architectures specified via heuristic clustering
methods. Instead, we propose Blockout, a method for regularization and model
selection that simultaneously learns both the model architecture and
parameters. A generalization of Dropout, our approach gives a novel
parametrization of hierarchical architectures that allows for structure
learning via back-propagation. To demonstrate its utility, we evaluate Blockout
on the CIFAR and ImageNet datasets, demonstrating improved classification
accuracy, better regularization performance, faster training, and the clear
emergence of hierarchical network structures.
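The sketch below illustrates, for a single linear layer, what a Dropout-style mask structured over blocks of units (rather than individual units) looks like; this is one reading of the abstract, with the layer sizes, block assignments, and gating form all assumed for illustration. In Blockout the block-connectivity probabilities are parameters learned by back-propagation; here they are fixed constants.

```python
# Block-structured dropout mask for one fully connected layer: units are
# grouped into K blocks, and an entire input-block -> output-block connection
# is kept or dropped at once. Sizes, block assignments, and p are assumptions
# for illustration; in Blockout the entries of p would be learned, turning
# mask sampling into architecture selection.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out, K = 64, 64, 4             # layer widths and number of blocks

blocks_in = rng.integers(0, K, n_in)   # block label of each input unit
blocks_out = rng.integers(0, K, n_out) # block label of each output unit
p = np.full((K, K), 0.5)               # keep-probability per block pair

def blockout_mask():
    """Sample a K x K block gate and expand it to a unit-level weight mask."""
    gate = (rng.random((K, K)) < p).astype(float)
    return gate[np.ix_(blocks_out, blocks_in)]   # shape (n_out, n_in)

W = 0.1 * rng.standard_normal((n_out, n_in))
x = rng.standard_normal(n_in)
y = (W * blockout_mask()) @ x          # forward pass with a sampled structure
print(y.shape)                         # (64,)
```

Because the mask is constant within each block pair, learned keep-probabilities describe a soft block-connectivity pattern, which is the kind of hierarchical structure the abstract reports emerging during training.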
Hierarchical models for service-oriented systems
We present our approach to the denotation and representation of hierarchical graphs: a suitable algebra of hierarchical graphs and two domains of interpretation. Each domain of interpretation focuses on a particular perspective of the graph hierarchy: the top view (nested boxes) is based on a notion of embedded graphs, while the side view (tree hierarchy) is based on gs-graphs. Our algebra can be understood as a high-level language for describing such graphical models, which are well suited for defining graphical representations of service-oriented systems, where nesting (e.g. sessions, transactions, locations) and linking (e.g. shared channels, resources, names) are key aspects.
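As a rough, concrete reading of the "top view" described above, the sketch below encodes a hierarchical graph as nested boxes whose edges may attach to names shared with the enclosing graph. The class and example names are assumptions made for illustration; the paper defines this structure through an algebra and two formal interpretation domains, not through such an encoding.

```python
# A hierarchical graph as nested boxes: each graph owns local nodes and edges
# and may embed whole subgraphs (boxes). Edges may mention names of the
# enclosing graph, modelling linking (shared channels, resources, names).
# This encoding is illustrative only, not the paper's algebra.
from dataclasses import dataclass, field

@dataclass
class HGraph:
    name: str
    nodes: set[str] = field(default_factory=set)
    edges: set[tuple[str, str]] = field(default_factory=set)
    boxes: list["HGraph"] = field(default_factory=list)  # nested subgraphs

    def nest(self, sub: "HGraph") -> "HGraph":
        """Embed a subgraph as a box (e.g. a session or transaction scope)."""
        self.boxes.append(sub)
        return self

# A session box whose edges link its local nodes to a channel name shared
# with the parent graph.
system = HGraph("system", nodes={"chan"})
session = HGraph("session", nodes={"client", "server"},
                 edges={("client", "chan"), ("chan", "server")})
system.nest(session)
```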
Algorithms for Hierarchical and Semi-Partitioned Parallel Scheduling
We propose a model for scheduling jobs in a parallel machine setting that takes into account the cost of migrations by assuming that the processing time of a job may depend on the specific set of machines among which the job is migrated. For the makespan minimization objective, the model generalizes classical scheduling problems such as unrelated parallel machine scheduling, as well as novel ones such as semi-partitioned and clustered scheduling. In the case of a hierarchical family of machines, we derive a compact integer linear programming formulation of the problem and leverage its fractional relaxation to obtain a polynomial-time 2-approximation algorithm. Extensions that incorporate memory capacity constraints are also discussed.
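For orientation, the classical unrelated-parallel-machine makespan ILP that this model generalizes can be written as follows, where p_{ij} is the processing time of job j on machine i and x_{ij} indicates assignment; the paper's compact formulation additionally encodes the migration sets arising from the hierarchical machine family and is not reproduced here.

\[
\begin{aligned}
\min\ & C_{\max} \\
\text{s.t.}\ & \sum_{i} x_{ij} = 1 & & \text{for every job } j, \\
& \sum_{j} p_{ij}\, x_{ij} \le C_{\max} & & \text{for every machine } i, \\
& x_{ij} \in \{0, 1\}.
\end{aligned}
\]

Rounding an optimal solution of the fractional relaxation (x_{ij} ∈ [0, 1]) is the standard route from such formulations to constant-factor guarantees, which is the style of argument behind the 2-approximation mentioned above.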