Logic Programming approaches for routing fault-free and maximally-parallel Wavelength Routed Optical Networks on Chip (Application paper)
One promising trend in digital system integration consists of boosting
on-chip communication performance by means of silicon photonics, thus
materializing the so-called Optical Networks-on-Chip (ONoCs). Among them,
wavelength routing can be used to route a signal to its destination by uniquely
associating a routing path with the wavelength of the optical carrier. Such
wavelengths should be chosen so as to minimize interference among optical
channels and to avoid routing faults. As a result, physical parameter selection
of such networks requires the solution of complex constrained optimization
problems. In previous work, published in the proceedings of the International
Conference on Computer-Aided Design, we proposed and solved the problem of
computing the maximum parallelism obtainable in the communication between any
two endpoints while avoiding misrouting of optical signals. The underlying
technology, only quickly mentioned in that paper, is Answer Set Programming
(ASP). In this work, we detail the ASP approach we used to solve this problem.
Another important design issue is to select the wavelengths of optical
carriers such that they are spread across the available spectrum, in order to
reduce the likelihood that, due to imperfections in the manufacturing process,
unintended routing faults arise. We show how to address this problem in
Constraint Logic Programming on Finite Domains (CLP(FD)).
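The spectrum-spreading task described above can be illustrated, very loosely, as a small combinatorial optimization problem. The sketch below is not the authors' ASP or CLP(FD) encoding; it is a hypothetical Python brute-force search that picks k carrier wavelengths from a candidate grid so that the minimum pairwise spacing is maximized, which is the objective the abstract describes.

```python
from itertools import combinations

def max_min_spacing(channels, k):
    """Choose k carrier wavelengths from the candidate grid `channels`
    so that the smallest pairwise spacing is as large as possible
    (exhaustive search over all k-subsets)."""
    best, best_gap = None, -1
    for subset in combinations(sorted(channels), k):
        gap = min(b - a for a, b in zip(subset, subset[1:]))
        if gap > best_gap:
            best, best_gap = subset, gap
    return best, best_gap

# Toy candidate grid (wavelengths in nm, hypothetical) and three channels.
sel, gap = max_min_spacing([1550, 1551, 1553, 1556, 1560], 3)
```

A declarative CLP(FD) model would express the same spacing constraints symbolically and let the solver prune the search, rather than enumerating subsets explicitly.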
This paper is under consideration for possible publication in Theory and
Practice of Logic Programming.
Comment: Paper presented at the 33rd International Conference on Logic
Programming (ICLP 2017), Melbourne, Australia, August 28 to September 1,
2017. 16 pages, LaTeX, 5 figures.
Regularized Ordinal Regression and the ordinalNet R Package
Regularization techniques such as the lasso (Tibshirani 1996) and elastic net (Zou and Hastie 2005) can be used to improve regression model coefficient estimation and prediction accuracy, as well as to perform variable selection. Ordinal regression models are widely used in applications where the use of regularization could be beneficial; however, these models are not included in many popular software packages for regularized regression. We propose a coordinate descent algorithm to fit a broad class of ordinal regression models with an elastic net penalty. Furthermore, we demonstrate that each model in this class generalizes to a more flexible form that can be used to model either ordered or unordered categorical response data. We call this the elementwise link multinomial-ordinal class, and it includes widely used models such as multinomial logistic regression (which also has an ordinal form) and ordinal logistic regression (which also has an unordered multinomial form). We introduce an elastic net penalty class that applies to either model form, and additionally, this penalty can be used to shrink a non-ordinal model toward its ordinal counterpart. Finally, we introduce the R package ordinalNet, which implements the algorithm for this model class.
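The core machinery, coordinate descent with an elastic net penalty, is easiest to see on a plain linear model. The sketch below is not the ordinalNet algorithm (which handles ordinal likelihoods in R); it is a simplified Python analogue showing the cyclic soft-thresholding update that coordinate descent performs for each coefficient.

```python
import numpy as np

def elastic_net_cd(X, y, lam=0.1, alpha=0.5, n_iter=200):
    """Cyclic coordinate descent for the elastic-net linear model:
    minimize ||y - Xb||^2/(2n) + lam*(alpha*||b||_1 + (1-alpha)/2*||b||_2^2).
    Each coordinate update is a closed-form soft-thresholding step."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ b + X[:, j] * b[j]            # residual with b_j removed
            rho = X[:, j] @ r / n
            z = np.sign(rho) * max(abs(rho) - lam * alpha, 0.0)  # soft threshold
            b[j] = z / (col_sq[j] + lam * (1 - alpha))
    return b
```

On sparse toy data the L1 part of the penalty shrinks irrelevant coefficients to (near) zero, while the L2 part stabilizes the remaining estimates.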
Novel Parallelization Techniques for Computer Graphics Applications
Increasingly complex and data-intensive algorithms in computer graphics applications require software engineers to find ways of improving performance and scalability to satisfy the requirements of customers and users. Parallelizing and tailoring each algorithm of each specific application is a time-consuming task, and its implementation is domain-specific because it cannot be reused outside the specific problem in which the algorithm is defined. Identifying reusable parallelization patterns that can be extrapolated and applied to other algorithms is essential for providing consistent parallelization improvements and reducing the development time of evolving a sequential algorithm into a parallel one.
This thesis focuses on defining general and efficient parallelization techniques and approaches that can be followed in order to parallelize complex 3D graphic algorithms. These parallelization patterns can be easily applied in order to convert most kinds of sequential complex and data-intensive algorithms to parallel ones obtaining consistent optimization results.
The main idea in the thesis is to use multi-threading techniques to improve the parallelization and core utilization of 3D algorithms. Most 3D algorithms apply similar repetitive independent operations on vast amounts of 3D data, which creates the opportunity to apply multi-threaded parallelization techniques to such applications. The efficiency of the proposed idea is tested on two common computer graphics algorithms: hidden-line removal and collision detection. Both are data-intensive algorithms whose conversion from a sequential to a multithreaded implementation introduces challenges, due to their complexity and the fact that elements in their data have different sizes and complexities, producing work-load imbalances and asymmetries between processing elements.
The results show that the proposed principles and patterns can be easily applied to both algorithms, transforming their sequential implementations into multithreaded ones and obtaining consistent optimization results proportional to the number of processing elements. From the work done in this thesis, it is concluded that the suggested parallelization warrants further study and development in order to extend its usage to heterogeneous platforms such as a Graphics Processing Unit (GPU). OpenCL is the most feasible framework to explore in the future due to its interoperability among different platforms.
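The pattern the abstract describes, applying independent repetitive operations over many 3D elements while absorbing per-element cost imbalance, can be sketched as a chunked parallel map. This is a hypothetical Python illustration, not the thesis implementation; handing out small chunks lets fast workers pick up extra work, which is one simple way to smooth load imbalance (note that in CPython, threads pay off mainly when the per-item work releases the GIL, e.g. native geometry kernels).

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_map(items, work, n_workers=4, chunk=8):
    """Apply `work` to independent items on a thread pool. Small chunks
    let fast workers pick up extra chunks, smoothing the load imbalance
    caused by items of different cost; result order is preserved."""
    chunks = [items[i:i + chunk] for i in range(0, len(items), chunk)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = pool.map(lambda c: [work(x) for x in c], chunks)
        return [r for part in parts for r in part]

# Hypothetical use: 2-D bounding boxes for a collision-detection broad phase.
tris = [((0, 0), (1, 0), (0, i % 5 + 1)) for i in range(100)]
boxes = parallel_map(tris, lambda t: (min(p[0] for p in t), min(p[1] for p in t),
                                      max(p[0] for p in t), max(p[1] for p in t)))
```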
Machine Learning Using Serverless Computing
Machine learning has been trending in the domain of computer science for quite some time, and newer models and techniques are being developed every day. The adoption of cloud computing has only expedited the process of training machine learning models. With its variety of services, cloud computing provides many options for training machine learning models; leveraging these services is up to the user. Serverless computing is an important service offered by cloud service providers. It is useful for short tasks that are event-driven or periodic, and machine learning training can be divided into short tasks or batches to take advantage of this. Due to the nature of serverless computing, there are certain limitations imposed by the cloud service provider, such as execution time and memory. This research proposes standalone solutions to overcome the challenges faced by serverless computing in training machine learning models. The research further combines these individual solutions and proposes a system for leveraging serverless computing to train a machine learning model that incorporates distributed machine learning.
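The batching idea, splitting a long training run into short invocations that each fit a serverless execution-time limit, can be sketched as a checkpoint-and-resume handler. This is a hypothetical Python illustration, not the system proposed in the research; the function name, paths, and the toy 1-D SGD model are all invented for the example.

```python
import pickle
import time

TIME_LIMIT = 8.0   # hypothetical per-invocation budget (seconds)
MARGIN = 2.0       # stop early so checkpointing fits inside the budget

def handler(state_path, batches):
    """One serverless invocation: resume from a checkpoint, run mini-batch
    steps until the time budget is nearly spent, checkpoint, and report
    where the next invocation should resume."""
    start = time.monotonic()
    try:
        with open(state_path, "rb") as f:
            w, next_batch = pickle.load(f)
    except FileNotFoundError:
        w, next_batch = 0.0, 0                     # fresh model
    while next_batch < len(batches):
        if time.monotonic() - start > TIME_LIMIT - MARGIN:
            break                                  # out of time: checkpoint and exit
        x, y = batches[next_batch]
        w -= 0.1 * (w * x - y) * x                 # one SGD step, toy 1-D least squares
        next_batch += 1
    with open(state_path, "wb") as f:
        pickle.dump((w, next_batch), f)
    return {"done": next_batch >= len(batches), "next_batch": next_batch}
```

In a real deployment the checkpoint would live in object storage rather than a local file, and invocations would be chained by an event trigger until the handler reports completion.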
Solution of the Skyrme-Hartree-Fock-Bogolyubov equations in the Cartesian deformed harmonic-oscillator basis. (VII) HFODD (v2.49t): a new version of the program
We describe the new version (v2.49t) of the code HFODD which solves the
nuclear Skyrme Hartree-Fock (HF) or Skyrme Hartree-Fock-Bogolyubov (HFB)
problem by using the Cartesian deformed harmonic-oscillator basis. In the new
version, we have implemented the following physics features: (i) the isospin
mixing and projection, (ii) the finite temperature formalism for the HFB and
HF+BCS methods, (iii) the Lipkin translational energy correction method, (iv)
the calculation of the shell correction. A number of specific numerical methods
have also been implemented in order to deal with large-scale multi-constraint
calculations and hardware limitations: (i) the two-basis method for the HFB
method, (ii) the Augmented Lagrangian Method (ALM) for multi-constraint
calculations, (iii) the linear constraint method based on the approximation of
the RPA matrix for multi-constraint calculations, (iv) an interface with the
axial and parity-conserving Skyrme-HFB code HFBTHO, (v) the mixing of the HF or
HFB matrix elements instead of the HF fields. Special care has been paid to
using the code on massively parallel leadership class computers. For this
purpose, the following features are now available with this version: (i) the
Message Passing Interface (MPI) framework, (ii) scalable input data routines,
(iii) multi-threading via OpenMP pragmas, (iv) parallel diagonalization of the
HFB matrix in the simplex breaking case using the ScaLAPACK library. Finally,
several minor errors of the previously published version were corrected.
Comment: Accepted for publication in Computer Physics Communications. Program
files re-submitted to the Comp. Phys. Comm. Program Library after correction of
several minor bugs.
Parallel For Loops on Heterogeneous Resources
In recent years, Graphics Processing Units (GPUs) have piqued the interest of researchers in scientific computing. Their immense floating point throughput and massive parallelism make them ideal for not just graphical applications, but many general algorithms as well. Load balancing applications and taking advantage of all computational resources in a machine is a difficult challenge, especially when the resources are heterogeneous. This dissertation presents the clUtil library, which vastly simplifies developing OpenCL applications for heterogeneous systems. The core focus of this dissertation lies in clUtil's ParallelFor construct and our novel PINA scheduler, which can efficiently load balance work onto multiple GPUs and CPUs simultaneously.
Parallel processing and expert systems
Whether it be monitoring the thermal subsystem of Space Station Freedom, or controlling the navigation of the autonomous rover on Mars, NASA missions in the 1990s cannot enjoy an increased level of autonomy without the efficient implementation of expert systems. Merely increasing the computational speed of uniprocessors may not be able to guarantee that real-time demands are met for larger systems. Speedup via parallel processing must be pursued alongside the optimization of sequential implementations. Prototypes of parallel expert systems have been built at universities and industrial laboratories in the U.S. and Japan. The state-of-the-art research in progress related to parallel execution of expert systems is surveyed. The survey discusses multiprocessors for expert systems, parallel languages for symbolic computations, and mapping expert systems to multiprocessors. Results to date indicate that the parallelism achieved for these systems is small. The main reasons are (1) the body of knowledge applicable in any given situation and the amount of computation executed by each rule firing are small, (2) dividing the problem solving process into relatively independent partitions is difficult, and (3) implementation decisions that enable expert systems to be incrementally refined hamper compile-time optimization. In order to obtain greater speedups, data parallelism and application parallelism must be exploited.
Towards exascale BEM simulations: hybrid parallelisation strategies for boundary element methods
Many fields of engineering benefit from an accurate and reliable solver for the
Laplace equation. Such an equation is able to model many different phenomena,
and is at the base of several multi-physics solvers. For example, in nautical engineering, since the Navier-Stokes system has an extremely high computational cost, many reduced order models are often used to predict ship performance. Under the assumption of incompressible fluid and irrotational flow it is possible to recover a flow field by simply imposing mass conservation, which simplifies to a Laplace equation.
Moreover, the deep theoretical background that surrounds this equation makes it
ideal as a benchmark to test new numerical software. Over the last decades this equation has often been solved through its boundary
integral formulation, leading to Boundary Element Methods. What makes such
methods appealing with respect to a classical Finite Element Method is the fact
that they only require discretisation of the boundary.
The purpose of the present work is to develop an efficient and optimized BEM for
the Laplace equation, designed around the architecture of modern CPUs.
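The boundary integral formulation mentioned above is, in its standard textbook form, the representation formula for a harmonic function; the version below is the common direct BEM statement and is given here for orientation, not necessarily the exact variant used in the work.

```latex
% Direct boundary integral equation for \Delta u = 0, with the 3D
% free-space Green's function G(x, y) = 1 / (4 \pi |x - y|);
% for x on a smooth boundary \Gamma:
\frac{1}{2}\, u(\mathbf{x})
  = \int_{\Gamma} G(\mathbf{x},\mathbf{y})\,
      \frac{\partial u}{\partial n_{\mathbf{y}}}(\mathbf{y})\,
      \mathrm{d}\Gamma_{\mathbf{y}}
  - \int_{\Gamma} \frac{\partial G}{\partial n_{\mathbf{y}}}(\mathbf{x},\mathbf{y})\,
      u(\mathbf{y})\,\mathrm{d}\Gamma_{\mathbf{y}},
  \qquad \mathbf{x} \in \Gamma .
```

Because only the boundary values of u and its normal derivative appear, discretising this equation requires a mesh of the surface alone, which is the advantage over a volume-based Finite Element Method noted above.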
Concrete resource analysis of the quantum linear system algorithm used to compute the electromagnetic scattering cross section of a 2D target
We provide a detailed estimate for the logical resource requirements of the
quantum linear system algorithm (QLSA) [Phys. Rev. Lett. 103, 150502 (2009)]
including the recently described elaborations [Phys. Rev. Lett. 110, 250504
(2013)]. Our resource estimates are based on the standard quantum-circuit model
of quantum computation; they comprise circuit width, circuit depth, the number
of qubits and ancilla qubits employed, and the overall number of elementary
quantum gate operations as well as more specific gate counts for each
elementary fault-tolerant gate from the standard set {X, Y, Z, H, S, T, CNOT}.
To perform these estimates, we used an approach that combines manual analysis
with automated estimates generated via the Quipper quantum programming language
and compiler. Our estimates pertain to the example problem size N=332,020,680
beyond which, according to a crude big-O complexity comparison, QLSA is
expected to run faster than the best known classical linear-system solving
algorithm. For this problem size, a desired calculation accuracy 0.01 requires
an approximate circuit width 340 and circuit depth of order if oracle
costs are excluded, and a circuit width and depth of order and
, respectively, if oracle costs are included, indicating that the
commonly ignored oracle resources are considerable. In addition to providing
detailed logical resource estimates, it is also the purpose of this paper to
demonstrate explicitly how these impressively large numbers arise with an
actual circuit implementation of a quantum algorithm. While our estimates may
prove to be conservative as more efficient advanced quantum-computation
techniques are developed, they nevertheless provide a valid baseline for
research targeting a reduction of the resource requirements, implying that a
reduction by many orders of magnitude is necessary for the algorithm to become
practical.
Comment: 37 pages, 40 figures.