
Analysis of Heuristics for Number Partitioning

Author
Publication venue: 'Wiley'
Publication date
Field of study

Beyond Reuse Distance Analysis: Dynamic Analysis for Characterization of Data Locality Potential

Author: Elango Venmugil
Fauzia Naznin
Pouchet Louis-Noël
Ramanujam J.
Rastello Fabrice
Ravishankar Mahesh
Rountev Atanas
Sadayappan P.
Publication venue
Publication date: 01/12/2013
Field of study

Emerging computer architectures will feature drastically decreased flops/byte (ratio of peak processing rate to memory bandwidth) as highlighted by recent studies on Exascale architectural trends. Further, flops are getting cheaper while the energy cost of data movement is increasingly dominant. The understanding and characterization of data locality properties of computations is critical in order to guide efforts to enhance data locality. Reuse distance analysis of memory address traces is a valuable tool to perform data locality characterization of programs. A single reuse distance analysis can be used to estimate the number of cache misses in a fully associative LRU cache of any size, thereby providing estimates on the minimum bandwidth requirements at different levels of the memory hierarchy to avoid being bandwidth bound. However, such an analysis only holds for the particular execution order that produced the trace. It cannot estimate potential improvement in data locality through dependence preserving transformations that change the execution schedule of the operations in the computation. In this article, we develop a novel dynamic analysis approach to characterize the inherent locality properties of a computation and thereby assess the potential for data locality enhancement via dependence preserving transformations. The execution trace of a code is analyzed to extract a computational directed acyclic graph (CDAG) of the data dependences. The CDAG is then partitioned into convex subsets, and the convex partitioning is used to reorder the operations in the execution trace to enhance data locality. The approach enables us to go beyond reuse distance analysis of a single specific order of execution of the operations of a computation in characterization of its data locality properties. 
It can serve a valuable role in identifying promising code regions for manual transformation, as well as in assessing the effectiveness of compiler transformations for data locality enhancement. We demonstrate the effectiveness of the approach using a number of benchmarks, including case studies where the potential shown by the analysis is exploited to achieve lower data movement costs and better performance. Comment: Transactions on Architecture and Code Optimization (2014)
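As an illustration of the reuse distance idea this abstract builds on, the following is a minimal sketch and not the authors' tool. It assumes the trace is a plain list of addresses and that each address occupies one cache line; the function names `reuse_distances` and `lru_misses` are hypothetical. It shows why one pass over the trace is enough to estimate misses for a fully associative LRU cache of any size.

```python
from collections import OrderedDict


def reuse_distances(trace):
    """Return the LRU stack distance of each access (None for a first access)."""
    stack = OrderedDict()  # keys ordered from least to most recently used
    distances = []
    for addr in trace:
        if addr in stack:
            # Count the distinct addresses touched since the last access to addr.
            keys = list(stack)
            distances.append(len(keys) - 1 - keys.index(addr))
            stack.move_to_end(addr)
        else:
            distances.append(None)
            stack[addr] = True
    return distances


def lru_misses(distances, cache_size):
    """Count misses in a fully associative LRU cache holding cache_size lines."""
    # An access misses if it is a first access or its distance does not fit.
    return sum(1 for d in distances if d is None or d >= cache_size)


if __name__ == "__main__":
    trace = ["a", "b", "c", "a", "b", "d", "a"]
    dists = reuse_distances(trace)
    for size in (1, 2, 3, 4):
        print(size, lru_misses(dists, size))
```

The list scan makes this sketch quadratic in the worst case; it is meant to show the definition, not to handle long traces. As the abstract notes, the distances describe only the execution order that produced the trace.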

arXiv.org e-Print Archive

HAL-ENS-LYON

INRIA a CCSD electronic archive server

Hal-Diderot

The Simulation Model Partitioning Problem: an Adaptive Solution Based on Self-Clustering (Extended Version)

Author: D'Angelo Gabriele
Publication venue: 'Elsevier BV'
Publication date: 04/11/2016
Field of study

This paper is about partitioning in parallel and distributed simulation, that is, decomposing the simulation model into a number of components and properly allocating them to the execution units. An adaptive solution based on self-clustering, which considers both communication reduction and computational load balancing, is proposed. The implementation of the proposed mechanism is tested using a simulation model that is challenging in terms of both structure and dynamicity. Various configurations of the simulation model and the execution environment have been considered. The performance results obtained are analyzed using a reference cost model. The results demonstrate that the proposed approach is promising and that it can reduce the simulation execution time in both parallel and distributed architectures.

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Fast Deterministic Selection

Author: Alexandrescu Andrei
Publication venue
Publication date: 04/08/2016
Field of study

The Median of Medians (also known as BFPRT) algorithm, although a landmark theoretical achievement, is seldom used in practice because it and its variants are slower than simple approaches based on sampling. The main contribution of this paper is a fast linear-time deterministic selection algorithm, QuickselectAdaptive, based on a refined definition of MedianOfMedians. The algorithm's performance brings deterministic selection (along with its desirable properties of reproducible runs, predictable run times, and immunity to pathological inputs) within the range of practicality. We demonstrate results on independent and identically distributed random inputs and on normally distributed inputs. Measurements show that QuickselectAdaptive is faster than state-of-the-art baselines. Comment: Pre-publication draft
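For context, here is a minimal sketch of the classic Median of Medians (BFPRT) selection that the abstract refers to. It is not QuickselectAdaptive, whose refinements the abstract does not describe. The function name `median_of_medians_select` and the list-copying partition are illustrative assumptions chosen for readability over speed.

```python
def median_of_medians_select(items, k):
    """Return the k-th smallest element (0-based) using a median-of-medians pivot."""
    items = list(items)
    if not 0 <= k < len(items):
        raise IndexError("k out of range")
    while True:
        if len(items) <= 5:
            return sorted(items)[k]
        # Take the median of each group of five elements.
        groups = [sorted(items[i:i + 5]) for i in range(0, len(items), 5)]
        medians = [g[len(g) // 2] for g in groups]
        # Recursively pick the median of those medians as the pivot.
        pivot = median_of_medians_select(medians, len(medians) // 2)
        lower = [x for x in items if x < pivot]
        equal = [x for x in items if x == pivot]
        if k < len(lower):
            items = lower
        elif k < len(lower) + len(equal):
            return pivot
        else:
            # Continue in the upper part, shifting k past the discarded elements.
            k -= len(lower) + len(equal)
            items = [x for x in items if x > pivot]


if __name__ == "__main__":
    data = [(i * 37) % 101 for i in range(101)]  # a permutation of 0..100
    print(median_of_medians_select(data, 50))
```

The pivot rule is deterministic, so repeated runs on the same input do the same work, which is the reproducibility property the abstract highlights.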

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Postponing Branching Decisions

Author: Milano Michela
van Hoeve Willem Jan
Publication venue
Publication date: 01/01/2004
Field of study

Solution techniques for Constraint Satisfaction and Optimisation Problems often make use of backtrack search methods, exploiting variable and value ordering heuristics. In this paper, we propose and analyse a very simple method to apply in case the value ordering heuristic produces ties: postponing the branching decision. To this end, we group together values in a tie, branch on this sub-domain, and defer the decision among them to lower levels of the search tree. We show theoretically and experimentally that this simple modification can dramatically improve the efficiency of the search strategy. Although in practice similar methods may have been applied already, to our knowledge, no empirical or theoretical study has been proposed in the literature to identify when and to what extent this strategy should be used. Comment: 11 pages, 3 figures
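The grouping step described in this abstract can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the heuristic is assumed to be a scoring function where lower is better, and the name `branching_subdomains` is hypothetical. Each returned group is a sub-domain to branch on, with the choice among its tied values left to deeper levels of the search.

```python
from itertools import groupby


def branching_subdomains(domain, score):
    """Group values with equal heuristic score into sub-domains, best score first."""
    ranked = sorted(domain, key=score)  # stable sort keeps ties in domain order
    return [list(group) for _, group in groupby(ranked, key=score)]


if __name__ == "__main__":
    # Hypothetical heuristic: prefer values close to 3.
    print(branching_subdomains([1, 2, 3, 4, 5], lambda v: abs(v - 3)))
```

With a tie-free heuristic every group has one value and the scheme reduces to ordinary value-by-value branching.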

arXiv.org e-Print Archive

CiteSeerX

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

International Migration, Integration and Social Cohesion online publications

An efficient algorithm for the parallel solution of high-dimensional differential equations

Author: Benner
Bjorhus
Burrage
Cong Liu
Henzinger
Holmes
Jeong
Leimkuhler
Lumsdaine
Marsan
McMillan
Michael Dellnitz
Molla-Hosseini
Palladino
Ren
Rowley
Stefan Klus
Stoer
Trefethen
Tuhin Sahai
von Luxburg
White
Woodside
Publication venue: 'Elsevier BV'
Publication date: 26/10/2010
Field of study

The study of high-dimensional differential equations is challenging because of their analytical and computational intractability. Here, we improve the speed of waveform relaxation (WR), a method for simulating high-dimensional differential-algebraic equations. This new method, termed adaptive waveform relaxation (AWR), is tested on a communication network example. Further, we propose different heuristics for computing graph partitions tailored to adaptive waveform relaxation. We find that AWR coupled with appropriate graph partitioning methods provides a speedup by a factor between 3 and 16.

arXiv.org e-Print Archive

Crossref

Elsevier - Publisher Connector

Heriot Watt Pure

Fast Shortest Path Distance Estimation in Large Networks

Author: Castillo Carlos
Francesco Bonchi
Gionis Aristides
Potamias Michalis
Publication venue: Boston University Computer Science Department
Publication date: 09/03/2009
Field of study

We study the problem of preprocessing a large graph so that point-to-point shortest-path queries can be answered very fast. Computing shortest paths is a well-studied problem, but exact algorithms do not scale to the huge graphs encountered on the web, in social networks, and in other applications. In this paper we focus on approximate methods for distance estimation, in particular using landmark-based distance indexing. This approach involves selecting a subset of nodes as landmarks and computing (offline) the distances from each node in the graph to those landmarks. At runtime, when the distance between a pair of nodes is needed, we can estimate it quickly by combining the precomputed distances of the two nodes to the landmarks. We prove that selecting the optimal set of landmarks is an NP-hard problem, and thus heuristic solutions need to be employed. Given a budget of memory for the index, which translates directly into a budget of landmarks, different landmark selection strategies can yield dramatically different results in terms of accuracy. A number of simple methods that scale well to large graphs are therefore developed and experimentally compared. The simplest methods choose central nodes of the graph, while the more elaborate ones select central nodes that are also far away from one another. The efficiency of the suggested techniques is tested experimentally using five different real-world graphs with millions of edges; for a given accuracy, they require as much as 250 times less space than the current approach in the literature, which selects landmarks at random. Finally, we study applications of our method in two problems arising naturally in large-scale networks, namely, social search and community detection. Yahoo! Research (internship)
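The offline and runtime phases described in this abstract can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the graph is assumed to be undirected and unweighted, stored as an adjacency dictionary, and the estimate uses the triangle-inequality upper bound (the smallest sum of the two landmark distances), which is one common way of combining them. The names `bfs_distances`, `build_index`, and `estimate_distance` are hypothetical, and landmark selection is left to the caller.

```python
from collections import deque


def bfs_distances(graph, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def build_index(graph, landmarks):
    """Offline phase: one BFS per landmark."""
    return {landmark: bfs_distances(graph, landmark) for landmark in landmarks}


def estimate_distance(index, s, t):
    """Runtime phase: upper bound min over landmarks of d(s, l) + d(l, t)."""
    candidates = [d[s] + d[t] for d in index.values() if s in d and t in d]
    return min(candidates) if candidates else None


if __name__ == "__main__":
    # A path graph 0 - 1 - 2 - 3 - 4 with the middle node as the only landmark.
    path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
    index = build_index(path, [2])
    print(estimate_distance(index, 0, 4), estimate_distance(index, 0, 1))
```

The estimate is exact when a landmark lies on a shortest path between the two nodes and an overestimate otherwise, which is why the choice of landmarks affects accuracy so strongly.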

Boston University Institutional Repository (OpenBU)

Considerations about multistep community detection

Author: A Broder
A Clauset
A Lancichinetti
AL Barabási
BH Good
FD Malliaros
HP Kriegel
J Reichardt
JC Bezdek
L Danon
M Belkin
M Girvan
ME Newman
P Krapivsky
R Kannan
S Fortunato
TF Chan
VD Blondel
W Zhang
Publication venue
Publication date: 27/02/2014
Field of study

The problem and implications of community detection in networks have attracted considerable attention because of their important applications in both the natural and social sciences. A number of algorithms have been developed to solve this problem, addressing either speed or the quality of the partitions calculated. In this paper we propose a multi-step procedure bridging the fastest but less accurate algorithms (coarse clustering) with the slowest, most effective ones (refinement). By adopting a heuristic ranking of the nodes and classifying a fraction of them as 'critical', a refinement step can be restricted to this subset of the network, thus saving computational time. Preliminary numerical results are discussed, showing improvement of the final partition. Comment: 12 pages

arXiv.org e-Print Archive

Crossref

Archivio Istituzionale della Ricerca- Università del Salento