93,938 research outputs found
Efficient Symmetry Reduction and the Use of State Symmetries for Symbolic Model Checking
One technique to reduce the state-space explosion problem in temporal logic
model checking is symmetry reduction. The combination of symmetry reduction and
symbolic model checking by using BDDs suffered a long time from the
prohibitively large BDD for the orbit relation. Dynamic symmetry reduction
calculates representatives of equivalence classes of states dynamically and
thus avoids the construction of the orbit relation. In this paper, we present a
new efficient model checking algorithm based on dynamic symmetry reduction. Our
experiments show that the algorithm is very fast and allows the verification of
larger systems. We additionally implemented the use of state symmetries for
symbolic symmetry reduction. To our knowledge we are the first who investigated
state symmetries in combination with BDD based symbolic model checking
The Odyssey Approach for Optimizing Federated SPARQL Queries
Answering queries over a federation of SPARQL endpoints requires combining
data from more than one data source. Optimizing queries in such scenarios is
particularly challenging not only because of (i) the large variety of possible
query execution plans that correctly answer the query but also because (ii)
there is only limited access to statistics about schema and instance data of
remote sources. To overcome these challenges, most federated query engines rely
on heuristics to reduce the space of possible query execution plans or on
dynamic programming strategies to produce optimal plans. Nevertheless, these
plans may still exhibit a high number of intermediate results or high execution
times because of heuristics and inaccurate cost estimations. In this paper, we
present Odyssey, an approach that uses statistics that allow for a more
accurate cost estimation for federated queries and therefore enables Odyssey to
produce better query execution plans. Our experimental results show that
Odyssey produces query execution plans that are better in terms of data
transfer and execution time than state-of-the-art optimizers. Our experiments
using the FedBench benchmark show execution time gains of at least 25 times on
average.Comment: 16 pages, 10 figure
Shared Memory Parallel Subgraph Enumeration
The subgraph enumeration problem asks us to find all subgraphs of a target
graph that are isomorphic to a given pattern graph. Determining whether even
one such isomorphic subgraph exists is NP-complete---and therefore finding all
such subgraphs (if they exist) is a time-consuming task. Subgraph enumeration
has applications in many fields, including biochemistry and social networks,
and interestingly the fastest algorithms for solving the problem for
biochemical inputs are sequential. Since they depend on depth-first tree
traversal, an efficient parallelization is far from trivial. Nevertheless,
since important applications produce data sets with increasing difficulty,
parallelism seems beneficial.
We thus present here a shared-memory parallelization of the state-of-the-art
subgraph enumeration algorithms RI and RI-DS (a variant of RI for dense graphs)
by Bonnici et al. [BMC Bioinformatics, 2013]. Our strategy uses work stealing
and our implementation demonstrates a significant speedup on real-world
biochemical data---despite a highly irregular data access pattern. We also
improve RI-DS by pruning the search space better; this further improves the
empirical running times compared to the already highly tuned RI-DS.Comment: 18 pages, 12 figures, To appear at the 7th IEEE Workshop on Parallel
/ Distributed Computing and Optimization (PDCO 2017
Property-Driven Fence Insertion using Reorder Bounded Model Checking
Modern architectures provide weaker memory consistency guarantees than
sequential consistency. These weaker guarantees allow programs to exhibit
behaviours where the program statements appear to have executed out of program
order. Fortunately, modern architectures provide memory barriers (fences) to
enforce the program order between a pair of statements if needed. Due to the
intricate semantics of weak memory models, the placement of fences is
challenging even for experienced programmers. Too few fences lead to bugs
whereas overuse of fences results in performance degradation. This motivates
automated placement of fences. Tools that restore sequential consistency in the
program may insert more fences than necessary for the program to be correct.
Therefore, we propose a property-driven technique that introduces
"reorder-bounded exploration" to identify the smallest number of program
locations for fence placement. We implemented our technique on top of CBMC;
however, in principle, our technique is generic enough to be used with any
model checker. Our experimental results show that our technique is faster and
solves more instances of relevant benchmarks as compared to earlier approaches.Comment: 18 pages, 3 figures, 4 algorithms. Version change reason : new set of
results and publication ready version of FM 201
A low-energy rate-adaptive bit-interleaved passive optical network
Energy consumption of customer premises equipment (CPE) has become a serious issue in the new generations of time-division multiplexing passive optical networks, which operate at 10 Gb/s or higher. It is becoming a major factor in global network energy consumption, and it poses problems during emergencies when CPE is battery-operated. In this paper, a low-energy passive optical network (PON) that uses a novel bit-interleaving downstream protocol is proposed. The details about the network architecture, protocol, and the key enabling implementation aspects, including dynamic traffic interleaving, rate-adaptive descrambling of decimated traffic, and the design and implementation of a downsampling clock and data recovery circuit, are described. The proposed concept is shown to reduce the energy consumption for protocol processing by a factor of 30. A detailed analysis of the energy consumption in the CPE shows that the interleaving protocol reduces the total energy consumption of the CPE significantly in comparison to the standard 10 Gb/s PON CPE. Experimental results obtained from measurements on the implemented CPE prototype confirm that the CPE consumes significantly less energy than the standard 10 Gb/s PON CPE
swTVM: Exploring the Automated Compilation for Deep Learning on Sunway Architecture
The flourish of deep learning frameworks and hardware platforms has been
demanding an efficient compiler that can shield the diversity in both software
and hardware in order to provide application portability. Among the exiting
deep learning compilers, TVM is well known for its efficiency in code
generation and optimization across diverse hardware devices. In the meanwhile,
the Sunway many-core processor renders itself as a competitive candidate for
its attractive computational power in both scientific and deep learning
applications. This paper combines the trends in these two directions.
Specifically, we propose swTVM that extends the original TVM to support
ahead-of-time compilation for architecture requiring cross-compilation such as
Sunway. In addition, we leverage the architecture features during the
compilation such as core group for massive parallelism, DMA for high bandwidth
memory transfer and local device memory for data locality, in order to generate
efficient code for deep learning application on Sunway. The experimental
results show the ability of swTVM to automatically generate code for various
deep neural network models on Sunway. The performance of automatically
generated code for AlexNet and VGG-19 by swTVM achieves 6.71x and 2.45x speedup
on average than hand-optimized OpenACC implementations on convolution and fully
connected layers respectively. This work is the first attempt from the compiler
perspective to bridge the gap of deep learning and high performance
architecture particularly with productivity and efficiency in mind. We would
like to open source the implementation so that more people can embrace the
power of deep learning compiler and Sunway many-core processor
Optimizing the double description method for normal surface enumeration
Many key algorithms in 3-manifold topology involve the enumeration of normal
surfaces, which is based upon the double description method for finding the
vertices of a convex polytope. Typically we are only interested in a small
subset of these vertices, thus opening the way for substantial optimization.
Here we give an account of the vertex enumeration problem as it applies to
normal surfaces, and present new optimizations that yield strong improvements
in both running time and memory consumption. The resulting algorithms are
tested using the freely available software package Regina.Comment: 27 pages, 12 figures; v2: Removed the 3^n bound from Section 3.3,
fixed the projective equation in Lemma 4.4, clarified "most triangulations"
in the introduction to section 5; v3: replace -ise with -ize for Mathematics
of Computation (note that this changes the title of the paper
- …