Logic Programming approaches for routing fault-free and maximally-parallel Wavelength Routed Optical Networks on Chip (Application paper)
One promising trend in digital system integration consists of boosting
on-chip communication performance by means of silicon photonics, thus
materializing the so-called Optical Networks-on-Chip (ONoCs). Among them,
wavelength routing can be used to route a signal to its destination by uniquely
associating a routing path with the wavelength of the optical carrier. Such
wavelengths should be chosen so as to minimize interference among optical
channels and to avoid routing faults. As a result, physical parameter selection
of such networks requires the solution of complex constrained optimization
problems. In previous work, published in the proceedings of the International
Conference on Computer-Aided Design, we proposed and solved the problem of
computing the maximum parallelism obtainable in the communication between any
two endpoints while avoiding misrouting of optical signals. The underlying
technology, only quickly mentioned in that paper, is Answer Set Programming
(ASP). In this work, we detail the ASP approach we used to solve this problem.
Another important design issue is to select the wavelengths of optical
carriers such that they are spread across the available spectrum, in order to
reduce the likelihood that, due to imperfections in the manufacturing process,
unintended routing faults arise. We show how to address this problem in
Constraint Logic Programming on Finite Domains (CLP(FD)).
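The spectrum-spreading task described above can be illustrated, very loosely, as a small combinatorial optimization problem. The sketch below is not the authors' ASP or CLP(FD) encoding; it is a hypothetical Python brute-force search that picks k carrier wavelengths from a candidate grid so that the minimum pairwise spacing is maximized, which is the objective the abstract describes.

```python
from itertools import combinations

def max_min_spacing(channels, k):
    """Choose k carrier wavelengths from the candidate grid `channels`
    so that the smallest pairwise spacing is as large as possible
    (exhaustive search over all k-subsets)."""
    best, best_gap = None, -1
    for subset in combinations(sorted(channels), k):
        gap = min(b - a for a, b in zip(subset, subset[1:]))
        if gap > best_gap:
            best, best_gap = subset, gap
    return best, best_gap

# Toy candidate grid (wavelengths in nm, hypothetical) and three channels.
sel, gap = max_min_spacing([1550, 1551, 1553, 1556, 1560], 3)
```

A declarative CLP(FD) model would express the same spacing constraints symbolically and let the solver prune the search, rather than enumerating subsets explicitly.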
This paper is under consideration for possible publication in Theory and
Practice of Logic Programming.
Comment: Paper presented at the 33rd International Conference on Logic
Programming (ICLP 2017), Melbourne, Australia, August 28 to September 1,
2017. 16 pages, LaTeX, 5 figures.
Regularized Ordinal Regression and the ordinalNet R Package
Regularization techniques such as the lasso (Tibshirani 1996) and elastic net (Zou and Hastie 2005) can be used to improve regression model coefficient estimation and prediction accuracy, as well as to perform variable selection. Ordinal regression models are widely used in applications where the use of regularization could be beneficial; however, these models are not included in many popular software packages for regularized regression. We propose a coordinate descent algorithm to fit a broad class of ordinal regression models with an elastic net penalty. Furthermore, we demonstrate that each model in this class generalizes to a more flexible form that can be used to model either ordered or unordered categorical response data. We call this the elementwise link multinomial-ordinal class, and it includes widely used models such as multinomial logistic regression (which also has an ordinal form) and ordinal logistic regression (which also has an unordered multinomial form). We introduce an elastic net penalty class that applies to either model form, and additionally, this penalty can be used to shrink a non-ordinal model toward its ordinal counterpart. Finally, we introduce the R package ordinalNet, which implements the algorithm for this model class.
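The core machinery, coordinate descent with an elastic net penalty, is easiest to see on a plain linear model. The sketch below is not the ordinalNet algorithm (which handles ordinal likelihoods in R); it is a simplified Python analogue showing the cyclic soft-thresholding update that coordinate descent performs for each coefficient.

```python
import numpy as np

def elastic_net_cd(X, y, lam=0.1, alpha=0.5, n_iter=200):
    """Cyclic coordinate descent for the elastic-net linear model:
    minimize ||y - Xb||^2/(2n) + lam*(alpha*||b||_1 + (1-alpha)/2*||b||_2^2).
    Each coordinate update is a closed-form soft-thresholding step."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ b + X[:, j] * b[j]            # residual with b_j removed
            rho = X[:, j] @ r / n
            z = np.sign(rho) * max(abs(rho) - lam * alpha, 0.0)  # soft threshold
            b[j] = z / (col_sq[j] + lam * (1 - alpha))
    return b
```

On sparse toy data the L1 part of the penalty shrinks irrelevant coefficients to (near) zero, while the L2 part stabilizes the remaining estimates.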
Novel Parallelization Techniques for Computer Graphics Applications
Increasingly complex and data-intensive algorithms in computer graphics applications require software engineers to find ways of improving performance and scalability to satisfy the requirements of customers and users. Parallelizing and tailoring each algorithm of each specific application is a time-consuming task, and its implementation is domain-specific because it cannot be reused outside the specific problem in which the algorithm is defined. Identifying reusable parallelization patterns that can be extrapolated and applied to other algorithms is essential for providing consistent parallelization improvements and reducing the development time of evolving a sequential algorithm into a parallel one.
This thesis focuses on defining general and efficient parallelization techniques and approaches that can be followed in order to parallelize complex 3D graphic algorithms. These parallelization patterns can be easily applied in order to convert most kinds of sequential complex and data-intensive algorithms to parallel ones obtaining consistent optimization results.
The main idea in the thesis is to use multi-threading techniques to improve the parallelization and core utilization of 3D algorithms. Most 3D algorithms apply similar repetitive independent operations on vast amounts of 3D data, which creates the opportunity to apply multi-threaded parallelization techniques to such applications. The efficiency of the proposed idea is tested on two common computer graphics algorithms: hidden-line removal and collision detection. Both are data-intensive algorithms whose conversion from a sequential to a multithreaded implementation introduces challenges, due to their complexity and the fact that elements in their data have different sizes and complexities, producing work-load imbalances and asymmetries between processing elements.
The results show that the proposed principles and patterns can be easily applied to both algorithms, transforming their sequential implementations into multithreaded ones and obtaining consistent optimization results proportional to the number of processing elements. From the work done in this thesis, it is concluded that the suggested parallelization warrants further study and development in order to extend its usage to heterogeneous platforms such as a Graphics Processing Unit (GPU). OpenCL is the most feasible framework to explore in the future due to its interoperability among different platforms.
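The pattern the abstract describes, applying independent repetitive operations over many 3D elements while absorbing per-element cost imbalance, can be sketched as a chunked parallel map. This is a hypothetical Python illustration, not the thesis implementation; handing out small chunks lets fast workers pick up extra work, which is one simple way to smooth load imbalance (note that in CPython, threads pay off mainly when the per-item work releases the GIL, e.g. native geometry kernels).

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_map(items, work, n_workers=4, chunk=8):
    """Apply `work` to independent items on a thread pool. Small chunks
    let fast workers pick up extra chunks, smoothing the load imbalance
    caused by items of different cost; result order is preserved."""
    chunks = [items[i:i + chunk] for i in range(0, len(items), chunk)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = pool.map(lambda c: [work(x) for x in c], chunks)
        return [r for part in parts for r in part]

# Hypothetical use: 2-D bounding boxes for a collision-detection broad phase.
tris = [((0, 0), (1, 0), (0, i % 5 + 1)) for i in range(100)]
boxes = parallel_map(tris, lambda t: (min(p[0] for p in t), min(p[1] for p in t),
                                      max(p[0] for p in t), max(p[1] for p in t)))
```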
Machine Learning Using Serverless Computing
Machine learning has been trending in the domain of computer science for quite some time, and newer models and techniques are being developed every day. The adoption of cloud computing has only expedited the process of training machine learning models. With its variety of services, cloud computing provides many options for training machine learning models; leveraging these services is up to the user. Serverless computing is an important service offered by cloud service providers. It is useful for short tasks that are event-driven or periodic, and machine learning training can be divided into short tasks or batches to take advantage of this. Due to the nature of serverless computing, there are certain limitations imposed by the cloud service provider, such as execution time and memory. This research proposes standalone solutions to overcome the challenges faced by serverless computing in training machine learning models. The research further combines these individual solutions and proposes a system for leveraging serverless computing to train a machine learning model that incorporates distributed machine learning.
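The batching idea, splitting a long training run into short invocations that each fit a serverless execution-time limit, can be sketched as a checkpoint-and-resume handler. This is a hypothetical Python illustration, not the system proposed in the research; the function name, paths, and the toy 1-D SGD model are all invented for the example.

```python
import pickle
import time

TIME_LIMIT = 8.0   # hypothetical per-invocation budget (seconds)
MARGIN = 2.0       # stop early so checkpointing fits inside the budget

def handler(state_path, batches):
    """One serverless invocation: resume from a checkpoint, run mini-batch
    steps until the time budget is nearly spent, checkpoint, and report
    where the next invocation should resume."""
    start = time.monotonic()
    try:
        with open(state_path, "rb") as f:
            w, next_batch = pickle.load(f)
    except FileNotFoundError:
        w, next_batch = 0.0, 0                     # fresh model
    while next_batch < len(batches):
        if time.monotonic() - start > TIME_LIMIT - MARGIN:
            break                                  # out of time: checkpoint and exit
        x, y = batches[next_batch]
        w -= 0.1 * (w * x - y) * x                 # one SGD step, toy 1-D least squares
        next_batch += 1
    with open(state_path, "wb") as f:
        pickle.dump((w, next_batch), f)
    return {"done": next_batch >= len(batches), "next_batch": next_batch}
```

In a real deployment the checkpoint would live in object storage rather than a local file, and invocations would be chained by an event trigger until the handler reports completion.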
Solution of the Skyrme-Hartree-Fock-Bogolyubov equations in the Cartesian deformed harmonic-oscillator basis. (VII) HFODD (v2.49t): a new version of the program
We describe the new version (v2.49t) of the code HFODD which solves the
nuclear Skyrme Hartree-Fock (HF) or Skyrme Hartree-Fock-Bogolyubov (HFB)
problem by using the Cartesian deformed harmonic-oscillator basis. In the new
version, we have implemented the following physics features: (i) the isospin
mixing and projection, (ii) the finite temperature formalism for the HFB and
HF+BCS methods, (iii) the Lipkin translational energy correction method, (iv)
the calculation of the shell correction. A number of specific numerical methods
have also been implemented in order to deal with large-scale multi-constraint
calculations and hardware limitations: (i) the two-basis method for the HFB
method, (ii) the Augmented Lagrangian Method (ALM) for multi-constraint
calculations, (iii) the linear constraint method based on the approximation of
the RPA matrix for multi-constraint calculations, (iv) an interface with the
axial and parity-conserving Skyrme-HFB code HFBTHO, (v) the mixing of the HF or
HFB matrix elements instead of the HF fields. Special care has been paid to
using the code on massively parallel leadership class computers. For this
purpose, the following features are now available with this version: (i) the
Message Passing Interface (MPI) framework, (ii) scalable input data routines,
(iii) multi-threading via OpenMP pragmas, (iv) parallel diagonalization of the
HFB matrix in the simplex breaking case using the ScaLAPACK library. Finally,
several minor errors of the previously published version were corrected.
Comment: Accepted for publication in Computer Physics Communications. Program
files re-submitted to the Comp. Phys. Comm. Program Library after correction of
several minor bugs.
Parallel For Loops on Heterogeneous Resources
In recent years, Graphics Processing Units (GPUs) have piqued the interest of researchers in scientific computing. Their immense floating point throughput and massive parallelism make them ideal for not just graphical applications, but many general algorithms as well. Load balancing applications and taking advantage of all computational resources in a machine is a difficult challenge, especially when the resources are heterogeneous. This dissertation presents the clUtil library, which vastly simplifies developing OpenCL applications for heterogeneous systems. The core focus of this dissertation lies in clUtil's ParallelFor construct and our novel PINA scheduler, which can efficiently load balance work onto multiple GPUs and CPUs simultaneously.
Parallel processing and expert systems
Whether it be monitoring the thermal subsystem of Space Station Freedom, or controlling the navigation of the autonomous rover on Mars, NASA missions in the 1990s cannot enjoy an increased level of autonomy without the efficient implementation of expert systems. Merely increasing the computational speed of uniprocessors may not be able to guarantee that real-time demands are met for larger systems. Speedup via parallel processing must be pursued alongside the optimization of sequential implementations. Prototypes of parallel expert systems have been built at universities and industrial laboratories in the U.S. and Japan. The state-of-the-art research in progress related to parallel execution of expert systems is surveyed. The survey discusses multiprocessors for expert systems, parallel languages for symbolic computations, and mapping expert systems to multiprocessors. Results to date indicate that the parallelism achieved for these systems is small. The main reasons are (1) the body of knowledge applicable in any given situation and the amount of computation executed by each rule firing are small, (2) dividing the problem solving process into relatively independent partitions is difficult, and (3) implementation decisions that enable expert systems to be incrementally refined hamper compile-time optimization. In order to obtain greater speedups, data parallelism and application parallelism must be exploited.
Towards exascale BEM simulations: hybrid parallelisation strategies for boundary element methods
Many fields of engineering benefit from an accurate and reliable solver for the
Laplace equation. Such an equation is able to model many different phenomena,
and is at the base of several multi-physics solvers. For example, in nautical engineering, since the Navier-Stokes system has an extremely high computational cost, many reduced order models are often used to predict ship performance. Under the assumption of incompressible fluid and irrotational flow it is possible to recover a flow field by simply imposing mass conservation, which simplifies to a Laplace equation.
Moreover, the deep theoretical background that surrounds this equation makes it
ideal as a benchmark to test new numerical software. Over the last decades this equation has often been solved through its boundary
integral formulation, leading to Boundary Element Methods. What makes such
methods appealing with respect to a classical Finite Element Method is the fact
that they only require discretisation of the boundary.
The purpose of the present work is to develop an efficient and optimized BEM for
the Laplace equation, designed around the architecture of modern CPUs.
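The boundary integral formulation mentioned above is, in its standard textbook form, the representation formula for a harmonic function; the version below is the common direct BEM statement and is given here for orientation, not necessarily the exact variant used in the work.

```latex
% Direct boundary integral equation for \Delta u = 0, with the 3D
% free-space Green's function G(x, y) = 1 / (4 \pi |x - y|);
% for x on a smooth boundary \Gamma:
\frac{1}{2}\, u(\mathbf{x})
  = \int_{\Gamma} G(\mathbf{x},\mathbf{y})\,
      \frac{\partial u}{\partial n_{\mathbf{y}}}(\mathbf{y})\,
      \mathrm{d}\Gamma_{\mathbf{y}}
  - \int_{\Gamma} \frac{\partial G}{\partial n_{\mathbf{y}}}(\mathbf{x},\mathbf{y})\,
      u(\mathbf{y})\,\mathrm{d}\Gamma_{\mathbf{y}},
  \qquad \mathbf{x} \in \Gamma .
```

Because only the boundary values of u and its normal derivative appear, discretising this equation requires a mesh of the surface alone, which is the advantage over a volume-based Finite Element Method noted above.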
Concrete resource analysis of the quantum linear system algorithm used to compute the electromagnetic scattering cross section of a 2D target
We provide a detailed estimate for the logical resource requirements of the
quantum linear system algorithm (QLSA) [Phys. Rev. Lett. 103, 150502 (2009)]
including the recently described elaborations [Phys. Rev. Lett. 110, 250504
(2013)]. Our resource estimates are based on the standard quantum-circuit model
of quantum computation; they comprise circuit width, circuit depth, the number
of qubits and ancilla qubits employed, and the overall number of elementary
quantum gate operations as well as more specific gate counts for each
elementary fault-tolerant gate from the standard set {X, Y, Z, H, S, T, CNOT}.
To perform these estimates, we used an approach that combines manual analysis
with automated estimates generated via the Quipper quantum programming language
and compiler. Our estimates pertain to the example problem size N=332,020,680
beyond which, according to a crude big-O complexity comparison, QLSA is
expected to run faster than the best known classical linear-system solving
algorithm. For this problem size, a desired calculation accuracy 0.01 requires
an approximate circuit width 340 and circuit depth of order if oracle
costs are excluded, and a circuit width and depth of order and
, respectively, if oracle costs are included, indicating that the
commonly ignored oracle resources are considerable. In addition to providing
detailed logical resource estimates, it is also the purpose of this paper to
demonstrate explicitly how these impressively large numbers arise with an
actual circuit implementation of a quantum algorithm. While our estimates may
prove to be conservative as more efficient advanced quantum-computation
techniques are developed, they nevertheless provide a valid baseline for
research targeting a reduction of the resource requirements, implying that a
reduction by many orders of magnitude is necessary for the algorithm to become
practical.
Comment: 37 pages, 40 figures.