Design, development and use of the finite element machine
Some of the considerations that went into the design of the Finite Element Machine, a research asynchronous parallel computer, are described. The present status of the system is also discussed, along with some indication of the type of results that were obtained.
Algorithms for Stable Matching and Clustering in a Grid
We study a discrete version of a geometric stable marriage problem originally
proposed in a continuous setting by Hoffman, Holroyd, and Peres, in which
points in the plane are stably matched to cluster centers, as prioritized by
their distances, so that each cluster center is apportioned a set of points of
equal area. We show that, for a discretization of the problem to an n x n
grid of pixels with k centers, the problem can be solved in O(n^2 log^5 n) time,
and we experiment with two slower but more practical algorithms and
a hybrid method that switches from one of these algorithms to the other to gain
greater efficiency than either algorithm alone. We also show how to combine
geometric stable matchings with a k-means clustering algorithm, so as to
provide a geometric political-districting algorithm that views distance in
economic terms, and we experiment with weighted versions of stable k-means in
order to improve the connectivity of the resulting clusters.
Comment: 23 pages, 12 figures. To appear (without the appendices) at the 18th
International Workshop on Combinatorial Image Analysis, June 19-21, 2017,
Plovdiv, Bulgaria.
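The greedy principle behind such distance-prioritized stable matchings can be illustrated with a deliberately minimal sketch: process all (pixel, center) pairs in increasing order of distance, which, for mutual distance-based preferences, yields the stable assignment. This is far slower than the algorithms studied in the paper; the function name and the equal quota of ceil(n^2/k) pixels per center are illustrative assumptions.

```python
import math

def stable_grid_matching(n, centers):
    """Stably match each pixel of an n-by-n grid to one of the given
    centers, at most ceil(n*n/len(centers)) pixels per center
    (the equal-area quota), by scanning all (pixel, center) pairs
    in increasing order of distance.  With distance-based preferences
    on both sides, this greedy pass produces the stable matching."""
    cap = -(-n * n // len(centers))           # ceil(n^2 / k)
    pairs = []
    for x in range(n):
        for y in range(n):
            for c, (cx, cy) in enumerate(centers):
                pairs.append((math.hypot(x - cx, y - cy), (x, y), c))
    pairs.sort()                              # closest pairs first
    owner = {}                                # pixel -> center index
    load = [0] * len(centers)                 # pixels assigned per center
    for _, pix, c in pairs:
        if pix not in owner and load[c] < cap:
            owner[pix] = c
            load[c] += 1
    return owner
```

For a 2x2 grid with centers at (0, 0) and (1, 1), each center receives the two pixels closest to it.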
Separation of pulsar signals from noise with supervised machine learning algorithms
We evaluate the performance of four different machine learning (ML)
algorithms: an Artificial Neural Network Multi-Layer Perceptron (ANN MLP),
Adaboost, Gradient Boosting Classifier (GBC), and XGBoost, for the separation of
pulsars from radio frequency interference (RFI) and other sources of noise,
using a dataset obtained from the post-processing of a pulsar search pipeline.
This dataset was previously used for cross-validation of the SPINN-based
machine learning engine, used for the reprocessing of HTRU-S survey data
arXiv:1406.3627. We have used Synthetic Minority Over-sampling Technique
(SMOTE) to deal with high class imbalance in the dataset. We report a variety
of quality scores from all four of these algorithms on both the non-SMOTE and
SMOTE datasets. For all the above ML methods, we report high accuracy and
G-mean in both the non-SMOTE and SMOTE cases. We study feature importances
using Adaboost, GBC, and XGBoost, and also use the minimum Redundancy Maximum
Relevance approach to report an algorithm-agnostic feature ranking. From these
methods, we find the signal-to-noise ratio of the folded profile to be the best
feature. We find that all the ML algorithms report FPRs about an order of
magnitude lower than the corresponding FPRs obtained in arXiv:1406.3627, for
the same recall value.
Comment: 14 pages, 2 figures. Accepted for publication in Astronomy and
Computing.
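The G-mean quoted above is the geometric mean of recall (true-positive rate) and specificity (true-negative rate), a standard score for imbalanced classification; a minimal sketch of its computation (function name is illustrative):

```python
def g_mean(y_true, y_pred):
    """Geometric mean of recall and specificity for binary labels
    (1 = positive, 0 = negative).  Unlike plain accuracy, it stays
    low when either class is classified poorly, which matters for
    heavily imbalanced datasets such as pulsar candidates vs. RFI."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return (recall * specificity) ** 0.5
```

A classifier that misses half the positives but makes no false alarms scores sqrt(0.5) ≈ 0.71, not the 0.75 that accuracy would report on a balanced sample.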
Algorithms for Triangles, Cones & Peaks
Three different geometric objects are at the center of this dissertation: triangles, cones and peaks.
In computational geometry, triangles are the most basic shape for planar subdivisions.
Particularly, Delaunay triangulations are widely used for manifold applications in engineering, geographic information systems, telecommunication networks, etc.
We present two novel parallel algorithms to construct the Delaunay triangulation of a given point set.
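At the heart of virtually every Delaunay construction, sequential or parallel, lies the in-circle predicate: a point may not lie strictly inside the circumcircle of any triangle. A minimal floating-point sketch of that determinant test (not the dissertation's parallel algorithm, and without the robust exact arithmetic a production implementation needs):

```python
def in_circle(a, b, c, d):
    """Sign of the in-circle determinant for 2D points given as (x, y)
    tuples: positive when d lies strictly inside the circumcircle of
    the counter-clockwise triangle (a, b, c), negative when outside,
    zero when cocircular.  Delaunay triangulations contain no triangle
    whose circumcircle strictly contains another input point."""
    rows = []
    for px, py in (a, b, c):
        # Translate so d is the origin; third column is the squared norm.
        rows.append((px - d[0], py - d[1],
                     (px - d[0]) ** 2 + (py - d[1]) ** 2))
    (ax, ay, az), (bx, by, bz), (cx, cy, cz) = rows
    # 3x3 determinant, expanded along the first row.
    return (ax * (by * cz - bz * cy)
            - ay * (bx * cz - bz * cx)
            + az * (bx * cy - by * cx))
```

For the triangle (0,0), (1,0), (0,1), the point (0.25, 0.25) lies inside the circumcircle (positive sign) while (5, 5) lies outside (negative sign).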
Yao graphs are geometric spanners that connect each point of a given set to its nearest neighbor in each of k cones drawn around it.
They are used to aid the construction of Euclidean minimum spanning trees
or in wireless networks for topology control and routing.
We present the first implementation of an optimal O(n log n)-time sweepline algorithm to construct Yao graphs.
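The definition above translates directly into a naive O(k n^2) construction, useful as a correctness reference for the much faster sweepline; the function name is illustrative and degenerate cone-boundary cases are ignored in this sketch:

```python
import math

def yao_graph(points, k=6):
    """Naive Yao graph: around each point, partition the plane into k
    equal angular cones and keep one directed edge to the nearest
    other point inside each cone.  O(k + n) work per point pair."""
    edges = set()
    cone_width = 2 * math.pi / k
    for i, (px, py) in enumerate(points):
        nearest = [None] * k                  # per cone: (distance, j)
        for j, (qx, qy) in enumerate(points):
            if i == j:
                continue
            ang = math.atan2(qy - py, qx - px) % (2 * math.pi)
            cone = int(ang / cone_width)
            d = math.hypot(qx - px, qy - py)
            if nearest[cone] is None or d < nearest[cone][0]:
                nearest[cone] = (d, j)
        for item in nearest:
            if item is not None:
                edges.add((i, item[1]))
    return edges
```

With k = 6 each cone spans 60 degrees, the setting in which the Yao graph is known to contain the Euclidean minimum spanning tree.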
One metric to quantify the importance of a mountain peak is its isolation.
Isolation measures the distance between a peak and the closest point of higher elevation.
Computing this metric from high-resolution digital elevation models (DEMs) requires efficient algorithms.
We present a novel sweep-plane algorithm that can calculate the isolation of all peaks on Earth in mere minutes.
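The definition of isolation admits a trivial brute-force reference implementation over a small raster; this quadratic-per-cell sketch (illustrative names, grid distance instead of geodesic distance) is exactly what the sweep-plane algorithm avoids at global scale:

```python
import math

def isolation(dem):
    """Brute-force isolation of every cell of a small digital elevation
    model (list of rows of heights): the distance to the nearest
    strictly higher cell, and infinity for the global maximum."""
    n, m = len(dem), len(dem[0])
    result = [[math.inf] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for a in range(n):
                for b in range(m):
                    if dem[a][b] > dem[i][j]:
                        d = math.hypot(a - i, b - j)
                        if d < result[i][j]:
                            result[i][j] = d
    return result
```

On a 2x2 toy DEM the highest cell gets infinite isolation and every other cell the distance to its nearest higher neighbor.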
A High-Performance Domain-Specific Language and Code Generator for General N-body Problems
General N-body problems are a set of problems in which an update to a single element in the system depends on every other element. N-body problems are ubiquitous, with applications in various domains ranging from scientific computing simulations in molecular dynamics, astrophysics, acoustics, and fluid dynamics all the way to computer vision, data mining, and machine learning problems. Different N-body algorithms have been designed and implemented in these various fields. However, there is a big gap between the algorithm one designs on paper and the code that runs efficiently on a parallel system. It is time-consuming to write fast, parallel, and scalable code for these problems. On the other hand, the sheer scale and growth of modern scientific datasets necessitate exploiting the power of both parallel and approximation algorithms where there is a potential to trade off accuracy for performance. The main problem that we tackle in this thesis is how to automatically generate asymptotically optimal N-body algorithms from the high-level specification of the problem. We combine the body of work in performance optimizations, compilers, and the domain of N-body problems to build a unified system where domain scientists can write programs at the high level while attaining the performance of code written by an expert at the low level. In order to generate high-performance, scalable code for this group of problems, we take the following steps in this thesis. First, we propose a unified algorithmic framework named PASCAL in order to address the challenge of designing a general algorithmic template to represent the class of N-body problems. PASCAL utilizes space-partitioning trees and user-controlled pruning/approximations to reduce the asymptotic runtime complexity from linear to logarithmic in the number of data points.
In PASCAL, we design an algorithm that automatically generates conditions for pruning or approximation of an N-body problem from the problem's definition. In order to evaluate PASCAL, we developed tree-based algorithms for six well-known problems: k-nearest neighbors, range search, minimum spanning tree, kernel density estimation, expectation maximization, and Hausdorff distance. We show that applying domain-specific optimizations and parallelization to the algorithms written in PASCAL achieves 10x to 230x speedup compared to state-of-the-art libraries on a dual-socket Intel Xeon processor with 16 cores on real-world datasets. Second, we extend the PASCAL framework to build PASCAL-X, which adds support for NUMA-aware parallelization. PASCAL-X also presents insights on the influence of tuning parameters. Tuning parameters such as leaf size (which influences the shape of the tree) and cut-off level (which controls the granularity of tasks) of the space-partitioning trees result in performance improvements of up to 4.6x. A key goal is to generate scalable and high-performance code automatically without sacrificing productivity. That implies minimizing the effort users have to put in to generate the desired high-performance code. Another critical factor is adaptivity: the amount of effort required to extend high-performance code generation to new N-body problems. Finally, we consider these factors and develop a domain-specific language and code generator named Portal, which is built on top of PASCAL-X. Portal's language design is inspired by the mathematical representation of N-body problems, resulting in an intuitive language for rapid implementation of a variety of problems. Portal's back-end is designed and implemented to generate optimized, parallel, and scalable implementations for multi-core systems.
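The space-partitioning-tree pruning that underlies this approach can be illustrated with a deliberately minimal one-dimensional nearest-neighbor sketch: subtrees whose bounding interval cannot possibly beat the best distance found so far are skipped entirely. The class and function names are illustrative assumptions, not PASCAL's generated code:

```python
class Node:
    """A balanced 1D space-partitioning tree; each node also stores
    the bounding interval [lo, hi] of the points in its subtree."""
    def __init__(self, pts):
        pts = sorted(pts)
        mid = len(pts) // 2
        self.point = pts[mid]
        self.lo, self.hi = pts[0], pts[-1]
        self.left = Node(pts[:mid]) if mid else None
        self.right = Node(pts[mid + 1:]) if mid + 1 < len(pts) else None

def nearest(node, q, best=float("inf")):
    """Branch-and-bound nearest-neighbor distance from q: prune a
    subtree whenever the distance from q to its bounding interval
    is already no better than the best distance found so far."""
    if node is None or max(node.lo - q, q - node.hi, 0) >= best:
        return best                                   # subtree pruned
    best = min(best, abs(node.point - q))
    # Descend into the more promising side first to tighten the bound.
    children = (node.left, node.right) if q <= node.point else (node.right, node.left)
    for child in children:
        best = nearest(child, q, best)
    return best
```

For well-distributed points, the pruning test discards one side of most nodes, turning a linear scan into a logarithmic descent, which is the essence of the linear-to-logarithmic reduction described above.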
We demonstrate that the performance achieved by using Portal is comparable to that of expert hand-optimized code while providing productivity for domain scientists. For instance, using Portal for the k-nearest neighbors problem achieves performance similar to the hand-optimized code while reducing the lines of code by 68x. To the best of our knowledge, there are no known libraries or frameworks that implement parallel asymptotically optimal algorithms for the class of general N-body problems, and this thesis primarily aims to fill this gap. Finally, we present a case study of Portal on the real-world problem of face clustering. In this case study, we show that Portal not only provides a fast solution for the face clustering problem with accuracy similar to the state-of-the-art algorithm, but also provides productivity, implementing the face clustering algorithm in only 14 lines of Portal code.
Understanding Disordered Systems Through Numerical Simulation and Algorithm Development
Disordered systems arise in many physical contexts. Not all matter is uniform, and impurities or heterogeneities can be modeled by fixed random disorder. Numerous complex networks also possess fixed disorder, leading to applications in transportation systems [1], telecommunications [2], social networks [3, 4], and epidemic modeling [5], to name a few.
Due to their random nature and power law critical behavior, disordered systems are difficult to study analytically. Numerical simulation can help overcome this hurdle by allowing for the rapid computation of system states. In order to get precise statistics and extrapolate to the thermodynamic limit, large systems must be studied over many realizations. Thus, innovative algorithm development is essential in order to reduce the memory or running time requirements of simulations.
This thesis presents a review of disordered systems, as well as a thorough study of two particular systems through numerical simulation, algorithm development and optimization, and careful statistical analysis of scaling properties.
Chapter 1 provides a thorough overview of disordered systems, the history of their study in the physics community, and the development of techniques used to study them. Topics of quenched disorder, phase transitions, the renormalization group, criticality, and scale invariance are discussed. Several prominent models of disordered systems are also explained. Lastly, analysis techniques used in studying disordered systems are covered.
In Chapter 2, minimal spanning trees on critical percolation clusters are studied, motivated in part by an analytic perturbation expansion by Jackson and Read [6] that I check against numerical calculations. This system has a direct mapping to the ground state of the strongly disordered spin glass [7]. We compute the path length fractal dimension of these trees in dimensions d = {2, 3, 4, 5} and find our results to be compatible with the analytic results suggested by Jackson and Read.
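Minimal spanning trees of this kind are typically built with Kruskal's algorithm over the cluster's bonds; a generic sketch (not the thesis's specific implementation) using union-find with path compression:

```python
def kruskal(n, edges):
    """Kruskal's minimum spanning tree on n nodes: scan edges
    (weight, u, v) in increasing weight order and keep each edge
    that joins two different components, tracked with union-find."""
    parent = list(range(n))

    def find(x):
        # Path-halving: point every other node at its grandparent.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:                 # edge connects two components
            parent[ru] = rv
            tree.append((w, u, v))
    return tree
```

In the strong-disorder limit relevant to the spin-glass mapping, only the rank order of the bond weights matters, which is why the MST (a purely order-based object) captures the ground-state structure.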
In Chapter 3, the random bond Ising ferromagnet is studied, which is especially useful since it serves as a prototype for more complicated disordered systems such as the random field Ising model and spin glasses. We investigate the effect that changing boundary spins has on the locations of domain walls in the interior of the random ferromagnet system. We provide an analytic proof that ground state domain walls in the two dimensional system are decomposable, and we map these domain walls to a shortest paths problem. By implementing a multiple-source shortest paths algorithm developed by Philip Klein [8], we are able to efficiently probe domain wall locations for all possible configurations of boundary spins. We consider lattices with uncorrelated disorder, as well as disorder that is spatially correlated according to a power law. We present numerical results for the scaling exponent governing the probability that a domain wall can be induced that passes through a particular location in the system's interior, and we compare these results to previous results on the directed polymer problem.
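The single-source building block of that shortest-paths mapping is ordinary Dijkstra on a graph whose edge weights play the role of local bond energies; Klein's multiple-source algorithm is considerably more involved, so this is only an illustrative sketch with hypothetical names:

```python
import heapq

def dijkstra(graph, source):
    """Dijkstra's shortest-path distances from `source` on a graph
    given as {node: [(neighbor, weight), ...]} with non-negative
    weights.  In the domain-wall mapping, a minimum-energy domain
    wall corresponds to a shortest path in such a weighted graph."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                      # stale heap entry
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist
```

On a three-node example where the direct edge is heavier than the two-hop route, the algorithm correctly prefers the detour.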
Fractal and Multifractal Scaling of Electrical Conduction in Random Resistor Networks
This article is a mini-review about electrical current flows in networks from
the perspective of statistical physics. We briefly discuss analytical methods
to solve the conductance of an arbitrary resistor network. We then turn to
basic results related to percolation: namely, the conduction properties of a
large random resistor network as the fraction of resistors is varied. We focus
on how the conductance of such a network vanishes as the percolation threshold
is approached from above. We also discuss the more microscopic current
distribution within each resistor of a large network. At the percolation
threshold, this distribution is multifractal in that all moments of this
distribution have independent scaling properties. We will discuss the meaning
of multifractal scaling and its implications for current flows in networks,
especially the largest current in the network. Finally, we discuss the relation
between resistor networks and random walks and show how the classic phenomena
of recurrence and transience of random walks are simply related to the
conductance of a corresponding electrical network.
Comment: 27 pages & 10 figures; review article for the Encyclopedia of
Complexity and System Science (Springer Science).
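The analytical methods the review alludes to reduce, for any finite network, to solving Kirchhoff's current law, a linear system in the weighted graph Laplacian. A small self-contained sketch (illustrative names; plain Gaussian elimination, adequate for these small symmetric positive-definite systems):

```python
def effective_conductance(n, edges, s, t):
    """Two-terminal conductance of a resistor network on nodes 0..n-1,
    with `edges` a list of (i, j, conductance).  Fix V[s] = 1 and
    V[t] = 0, solve Kirchhoff's current law at the interior nodes,
    and return the total current leaving the source."""
    # Weighted Laplacian: L[i][i] = sum of incident conductances,
    # L[i][j] = -g for each resistor (i, j, g).
    L = [[0.0] * n for _ in range(n)]
    for i, j, g in edges:
        L[i][i] += g; L[j][j] += g
        L[i][j] -= g; L[j][i] -= g
    unknown = [v for v in range(n) if v not in (s, t)]
    A = [[L[i][j] for j in unknown] for i in unknown]
    b = [-L[i][s] * 1.0 for i in unknown]   # known V[s] = 1 moved to RHS
    m = len(unknown)
    for col in range(m):                    # forward elimination
        for row in range(col + 1, m):
            f = A[row][col] / A[col][col]
            for k in range(col, m):
                A[row][k] -= f * A[col][k]
            b[row] -= f * b[col]
    x = [0.0] * m
    for row in reversed(range(m)):          # back substitution
        x[row] = (b[row] - sum(A[row][k] * x[k]
                               for k in range(row + 1, m))) / A[row][row]
    V = {s: 1.0, t: 0.0, **dict(zip(unknown, x))}
    return (sum(g * (V[s] - V[j]) for i, j, g in edges if i == s)
            + sum(g * (V[s] - V[i]) for i, j, g in edges if j == s))
```

Two unit resistors in series give conductance 1/2, and two in parallel give 2, matching the textbook formulas.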