    An Improved Tight Closure Algorithm for Integer Octagonal Constraints

    Integer octagonal constraints (a.k.a. "Unit Two Variables Per Inequality" or "UTVPI integer constraints") constitute an interesting class of constraints for the representation and solution of integer problems in the fields of constraint programming and formal analysis and verification of software and hardware systems, since they couple algorithms having polynomial complexity with a relatively good expressive power. The main algorithms required for the manipulation of such constraints are the satisfiability check and the computation of the inferential closure of a set of constraints. The latter is called "tight" closure to mark the difference with the (incomplete) closure algorithm that does not exploit the integrality of the variables. In this paper we present and fully justify an O(n^3) algorithm to compute the tight closure of a set of UTVPI integer constraints. Comment: 15 pages, 2 figures.
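The abstract does not spell out the steps of the algorithm, so the following is only a minimal sketch of how a cubic-time tight closure is commonly organized: a shortest-path closure, a tightening of the unary bounds to even values, and a final strong-coherence pass. The matrix encoding (node 2k for +x_k, node 2k+1 for -x_k, entry m[i][j] bounding val(j) - val(i)) and all function names are assumptions made for illustration; they are not taken from the paper.

```python
import math

INF = math.inf  # "no constraint"


def bar(i):
    """Index of the node with the opposite sign (2k <-> 2k+1)."""
    return i ^ 1


def new_matrix(num_vars):
    """Unconstrained system over num_vars integer variables."""
    size = 2 * num_vars
    return [[0 if i == j else INF for j in range(size)] for i in range(size)]


def add_constraint(m, a, i, b, j, c):
    """Record a*x_i + b*x_j <= c, with a, b in {+1, -1} and i != j."""
    p = 2 * i + (a < 0)
    q = 2 * j + (b < 0)
    # a*x_i + b*x_j = val(p) - val(bar(q)) = val(q) - val(bar(p))
    m[bar(q)][p] = min(m[bar(q)][p], c)
    m[bar(p)][q] = min(m[bar(p)][q], c)


def add_bound(m, a, i, c):
    """Record a*x_i <= c, stored as val(p) - val(bar(p)) <= 2*c."""
    p = 2 * i + (a < 0)
    m[bar(p)][p] = min(m[bar(p)][p], 2 * c)


def tight_closure(m):
    """Close m in place; return False if no integer solution exists."""
    size = len(m)
    # Phase 1: shortest-path closure (Floyd-Warshall), O(n^3).
    for k in range(size):
        for i in range(size):
            for j in range(size):
                if m[i][k] + m[k][j] < m[i][j]:
                    m[i][j] = m[i][k] + m[k][j]
    if any(m[i][i] < 0 for i in range(size)):
        return False  # unsatisfiable even over the rationals
    # Phase 2: tightening. m[i][bar(i)] bounds -2*val(i), so round down to even.
    for i in range(size):
        if m[i][bar(i)] != INF:
            m[i][bar(i)] = 2 * (m[i][bar(i)] // 2)
    if any(m[i][bar(i)] + m[bar(i)][i] < 0 for i in range(size)):
        return False  # satisfiable over the rationals only
    # Phase 3: strong coherence, combining pairs of unary bounds.
    for i in range(size):
        for j in range(size):
            total = m[i][bar(i)] + m[bar(j)][j]
            if total != INF:
                m[i][j] = min(m[i][j], total // 2)
    return True


# x + y <= 3 and x - y <= 0 give 2x <= 3; over the integers this tightens to 2x <= 2.
m = new_matrix(2)
add_constraint(m, +1, 0, +1, 1, 3)
add_constraint(m, +1, 0, -1, 1, 0)
satisfiable = tight_closure(m)
```

After the call, `m[1][0]` holds the tightened bound on 2x. The tightening step is what separates this from the plain (rational) closure: without it, the bound 2x <= 3 would be left as is.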

    Relational Abstract Domain of Weighted Hexagons

    We propose a new numerical abstract domain for static analysis by abstract interpretation, the domain of Weighted Hexagons. It is capable of expressing interval constraints and relational invariants of the form x ≤ a·y, where x and y are variables and a denotes a non-negative constant. This kind of domain is useful in the analysis of safety for array accesses when multiplication is used (e.g. in guarding formulae or in access expressions). We provide all standard abstract domain operations, including a widening operator, as well as a graph-based algorithm for checking satisfiability and computing the normal form for elements of the domain. All described operations are performed in O(n^3) time. The expressiveness of this domain lies between the Pentagons by Logozzo and Fähndrich and the Two Variables Per Inequality by Simon, King and Howe.

    The Parma Polyhedra Library: Toward a Complete Set of Numerical Abstractions for the Analysis and Verification of Hardware and Software Systems

    Since its inception as a student project in 2001, initially just for the handling (as the name implies) of convex polyhedra, the Parma Polyhedra Library has been continuously improved and extended by joining scrupulous research on the theoretical foundations of (possibly non-convex) numerical abstractions to a total adherence to the best available practices in software development. Even though it is still not fully mature and functionally complete, the Parma Polyhedra Library already offers a combination of functionality, reliability, usability and performance that is not matched by similar, freely available libraries. In this paper, we present the main features of the current version of the library, emphasizing those that distinguish it from other similar libraries and those that are important for applications in the field of analysis and verification of hardware and software systems. Comment: 38 pages, 2 figures, 3 listings, 3 tables.

    X-MAP: A Performance Prediction Tool for Porting Algorithms and Applications to Accelerators

    Most modern high-performance computing systems comprise one or more accelerators with varying architectures in addition to traditional multicore Central Processing Units (CPUs). Examples of these accelerators include Graphics Processing Units (GPUs) and Intel's Many Integrated Core architecture, called Xeon Phi (PHI). These architectures provide massively parallel computation capabilities, which yield substantial performance benefits over traditional CPUs for a variety of scientific applications. Accelerators are not all alike, because each has its own unique architecture. This difference in the underlying architecture plays a crucial role in determining whether a given accelerator will provide a significant speedup over its competition. In addition to the architecture itself, another differentiating factor for these accelerators is the programming language used to program them. For example, Nvidia GPUs can be programmed using Compute Unified Device Architecture (CUDA) and OpenCL, while Intel Xeon PHIs can be programmed using OpenMP and OpenCL. The choice of programming language also plays a critical role in the speedup obtained, depending on how close the language is to the hardware and on the level of optimization. It is thus very difficult for an application developer to choose the ideal accelerator to achieve the best possible speedup. In light of this, we present X-MAP, an easy-to-use Graphical User Interface (GUI) performance prediction tool for porting algorithms and applications to accelerators. It encompasses a Machine Learning based inference model that predicts the performance of an application on a number of well-known accelerators and, at the same time, predicts the best architecture and programming language for the application.
We do this by collecting hardware counters from a given application and predicting run time by providing this data as input to an inference model based on a Neural Network Regressor. We predict the architecture and associated programming language by providing the hardware counters as input to an inference model based on a Random Forest Classification Model. Finally, a mean absolute prediction error of 8.52, together with features such as syntax highlighting for multiple programming languages, a function-wise breakdown of the entire application to understand bottlenecks, and the ability for end users to submit their own prediction models to further improve the system, makes X-MAP a unique tool that has a significant edge over existing performance prediction solutions.

    Exact Join Detection for Convex Polyhedra and Other Numerical Abstractions

    Deciding whether the union of two convex polyhedra is itself a convex polyhedron is a basic problem in polyhedral computations, with important applications in the field of constrained control and in the synthesis, analysis, verification and optimization of hardware and software systems. In such application fields, though, general convex polyhedra are just one among many so-called numerical abstractions, which range from restricted families of (not necessarily closed) convex polyhedra to non-convex geometrical objects. We thus tackle the problem from an abstract point of view: for a wide range of numerical abstractions that can be modeled as bounded join-semilattices (that is, partial orders where any finite set of elements has a least upper bound), we show necessary and sufficient conditions for the equivalence between the lattice-theoretic join and the set-theoretic union. For the case of closed convex polyhedra (which, as far as we know, is the only one already studied in the literature) we improve upon the state of the art by providing a new algorithm with a better worst-case complexity. The results and algorithms presented for the other numerical abstractions are new to this paper. All the algorithms have been implemented, experimentally validated, and made available in the Parma Polyhedra Library. Comment: 36 pages, 4 figures.
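To make the question "is the join equal to the union?" concrete, here is a minimal sketch for one very simple numerical abstraction: non-empty closed boxes over the reals, each given as a list of (lower, upper) pairs. The characterization used (one box contains the other, or the boxes differ in exactly one dimension and their intervals there overlap or touch) is an elementary argument for this special case; it is not claimed to be the algorithm or the conditions given in the paper, and the function names are hypothetical.

```python
def contains(outer, inner):
    """True if every interval of `inner` lies inside the matching one of `outer`."""
    return all(ol <= il and ih <= oh
               for (ol, oh), (il, ih) in zip(outer, inner))


def box_join(b1, b2):
    """Smallest box containing both arguments (the lattice-theoretic join)."""
    return [(min(l1, l2), max(h1, h2))
            for (l1, h1), (l2, h2) in zip(b1, b2)]


def union_is_box(b1, b2):
    """True if the set union of two non-empty closed boxes equals their join."""
    if contains(b1, b2) or contains(b2, b1):
        return True
    differing = [k for k, (i1, i2) in enumerate(zip(b1, b2)) if i1 != i2]
    if len(differing) != 1:
        # Two or more differing dimensions leave a corner of the join uncovered.
        return False
    (l1, h1), (l2, h2) = b1[differing[0]], b2[differing[0]]
    # In the single differing dimension the intervals must overlap or touch.
    return max(l1, l2) <= min(h1, h2)


unit_square = [(0, 1), (0, 1)]
right_neighbour = [(1, 2), (0, 1)]   # shares an edge with unit_square
taller_neighbour = [(1, 2), (0, 2)]  # differs in both dimensions

exact = union_is_box(unit_square, right_neighbour)
inexact = union_is_box(unit_square, taller_neighbour)
```

Here `exact` is true because the two squares glue into the rectangle `box_join(unit_square, right_neighbour)`, while `inexact` is false because the join of the second pair contains points that belong to neither box.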

    A dependency-aware parallel programming model

    Designing parallel codes is hard. One of the most important roadblocks to parallel programming is the presence of data dependencies. These restrict parallelism and, in general, working around them requires complex analysis and leads to convoluted solutions that decrease the quality of the code. This thesis proposes a solution to parallel programming that incorporates data dependencies into the model. The programming model can use that information to dynamically find parallelism that would otherwise be hard to find. This approach improves both programmability and parallelism, and thus performance. While this problem has already been solved in OpenMP 4 at the time of this publication, this research began before the problem was even being considered for OpenMP 3. In fact, some of the contributions of this thesis have had an influence on the approach taken in OpenMP 4. However, the contributions go beyond that and cover aspects that have not yet been considered in OpenMP 4. The approach we propose is based on function-level dependencies across disjoint blocks of contiguous memory. While finding dependencies under those constraints is simple, it is much harder to do so over strided and possibly partially overlapping sets of data. This thesis also proposes a solution to this problem. By doing so, we broaden the range of applicability of the original solution and therefore that of the programming model. OpenMP 4 does not currently cover this aspect. Finally, we present a solution to take advantage of the performance characteristics of Non-Uniform Memory Access (NUMA) architectures. Our proposal is at the programming model level and does not require changes in the code. It automatically distributes the data and does not rely on data migration or replication. Instead, it is based exclusively on scheduling the computations.
While this process is automatic, it can be tuned through minor changes in the code that do not require any change in the programming model. Throughout the thesis, we demonstrate the effectiveness of the proposal through benchmarks that are either hard to program using other paradigms or that have different solutions. In most cases, our solutions perform on par with or better than existing solutions. This includes the implementations available in well-known high-performance parallel libraries.
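The idea of deriving task ordering from declared data accesses can be illustrated with a small sketch. It covers only the simple case mentioned in the abstract (disjoint blocks, here identified by name) and is not the thesis's runtime: the task tuples, the block names and the function `build_dependencies` are all hypothetical. Each task lists the blocks it reads and the blocks it writes, and the function computes which earlier tasks it must wait for.

```python
def build_dependencies(tasks):
    """Map each task name to the set of earlier tasks it must wait for.

    `tasks` is a list of (name, inputs, outputs) tuples in program order,
    where inputs and outputs are lists of block names.
    """
    last_writer = {}          # block -> task that last wrote it
    readers_since_write = {}  # block -> tasks that read it since that write
    deps = {}
    for name, inputs, outputs in tasks:
        wait = set()
        for block in inputs:
            if block in last_writer:
                wait.add(last_writer[block])      # read after write
        for block in outputs:
            if block in last_writer:
                wait.add(last_writer[block])      # write after write
            wait.update(readers_since_write.get(block, ()))  # write after read
        deps[name] = wait
        # Register this task's accesses only after computing its own waits.
        for block in inputs:
            readers_since_write.setdefault(block, set()).add(name)
        for block in outputs:
            last_writer[block] = name
            readers_since_write[block] = set()
    return deps


program = [
    ("init_a", [], ["A"]),
    ("init_b", [], ["B"]),
    ("sum_ab", ["A", "B"], ["C"]),
    ("scale_a", ["A"], ["A"]),
]
deps = build_dependencies(program)
```

In this example `init_a` and `init_b` have no dependencies and may run concurrently, `sum_ab` waits for both, and `scale_a` waits for `init_a` and `sum_ab` because it overwrites a block that `sum_ab` reads. Handling strided or partially overlapping regions, which the thesis addresses, would require comparing regions rather than names.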