629 research outputs found

    Programming models to support data science workflows

    Get PDF
    Data Science workflows have become a must to progress in many scientific areas such as life, health, and earth sciences. In contrast to traditional HPC workflows, they are more heterogeneous; combining binary executions, MPI simulations, multi-threaded applications, custom analysis (possibly written in Java, Python, C/C++ or R), and real-time processing. Furthermore, in the past, field experts were capable of programming and running small simulations. However, nowadays, simulations requiring hundreds or thousands of cores are widely used and, to this point, efficiently programming them becomes a challenge even for computer sciences. Thus, programming languages and models make a considerable effort to ease the programmability while maintaining acceptable performance. This thesis contributes to the adaptation of High-Performance frameworks to support the needs and challenges of Data Science workflows by extending COMPSs, a mature, general-purpose, task-based, distributed programming model. First, we enhance our prototype to orchestrate different frameworks inside a single programming model so that non-expert users can build complex workflows where some steps require highly optimised state of the art frameworks. This extension includes the @binary, @OmpSs, @MPI, @COMPSs, and @MultiNode annotations for both Java and Python workflows. Second, we integrate container technologies to enable developers to easily port, distribute, and scale their applications to distributed computing platforms. This combination provides a straightforward methodology to parallelise applications from sequential codes along with efficient image management and application deployment that ease the packaging and distribution of applications. We distinguish between static, HPC, and dynamic container management and provide representative use cases for each scenario using Docker, Singularity, and Mesos. Third, we design, implement and integrate AutoParallel, a Python module to automatically find an appropriate task-based parallelisation of affine loop nests and execute them in parallel in a distributed computing infrastructure. It is based on sequential programming and requires one single annotation (the @parallel Python decorator) so that anyone with intermediate-level programming skills can scale up an application to hundreds of cores. Finally, we propose a way to extend task-based management systems to support continuous input and output data to enable the combination of task-based workflows and dataflows (Hybrid Workflows) using one single programming model. Hence, developers can build complex Data Science workflows with different approaches depending on the requirements without the effort of combining several frameworks at the same time. Also, to illustrate the capabilities of Hybrid Workflows, we have built a Distributed Stream Library that can be easily integrated with existing task-based frameworks to provide support for dataflows. The library provides a homogeneous, generic, and simple representation of object and file streams in both Java and Python; enabling complex workflows to handle any data type without dealing directly with the streaming back-end.Els fluxos de treball de Data Science s’han convertit en una necessitat per progressar en moltes àrees científiques com les ciències de la vida, la salut i la terra. A diferència dels fluxos de treball tradicionals per a la CAP, els fluxos de Data Science són més heterogenis; combinant l’execució de binaris, simulacions MPI, aplicacions multiprocés, anàlisi personalitzats (possiblement escrits en Java, Python, C / C ++ o R) i computacions en temps real. Mentre que en el passat els experts de cada camp eren capaços de programar i executar petites simulacions, avui dia, aquestes simulacions representen un repte fins i tot per als experts ja que requereixen centenars o milers de nuclis. Per aquesta raó, els llenguatges i models de programació actuals s’esforcen considerablement en incrementar la programabilitat mantenint un rendiment acceptable. Aquesta tesi contribueix a l’adaptació de models de programació per a la CAP per afrontar les necessitats i reptes dels fluxos de Data Science estenent COMPSs, un model de programació distribuïda madur, de propòsit general, i basat en tasques. En primer lloc, millorem el nostre prototip per orquestrar diferent programari per a que els usuaris no experts puguin crear fluxos complexos usant un únic model on alguns passos requereixin tecnologies altament optimitzades. Aquesta extensió inclou les anotacions de @binary, @OmpSs, @MPI, @COMPSs, i @MultiNode per a fluxos en Java i Python. En segon lloc, integrem tecnologies de contenidors per permetre als desenvolupadors portar, distribuir i escalar fàcilment les seves aplicacions en plataformes distribuïdes. A més d’una metodologia senzilla per a paral·lelitzar aplicacions a partir de codis seqüencials, aquesta combinació proporciona una gestió d’imatges i una implementació d’aplicacions eficients que faciliten l’empaquetat i la distribució d’aplicacions. Distingim entre la gestió de contenidors estàtica, CAP i dinàmica i proporcionem casos d’ús representatius per a cada escenari amb Docker, Singularity i Mesos. En tercer lloc, dissenyem, implementem i integrem AutoParallel, un mòdul de Python per determinar automàticament la paral·lelització basada en tasques de nius de bucles afins i executar-los en paral·lel en una infraestructura distribuïda. AutoParallel està basat en programació seqüencial, requereix una sola anotació (el decorador @parallel) i permet a un usuari intermig escalar una aplicació a centenars de nuclis. Finalment, proposem una forma d’estendre els sistemes basats en tasques per admetre dades d’entrada i sortida continus; permetent així la combinació de fluxos de treball i dades (Fluxos Híbrids) en un únic model. Conseqüentment, els desenvolupadors poden crear fluxos complexos seguint diferents patrons sense l’esforç de combinar diversos models al mateix temps. A més, per a il·lustrar les capacitats dels Fluxos Híbrids, hem creat una biblioteca (DistroStreamLib) que s’integra fàcilment amb els models basats en tasques per suportar fluxos de dades. La biblioteca proporciona una representació homogènia, genèrica i simple de seqüències contínues d’objectes i arxius en Java i Python; permetent gestionar qualsevol tipus de dades sense tractar directament amb el back-end de streaming.Los flujos de trabajo de Data Science se han convertido en una necesidad para progresar en muchas áreas científicas como las ciencias de la vida, la salud y la tierra. A diferencia de los flujos de trabajo tradicionales para la CAP, los flujos de Data Science son más heterogéneos; combinando la ejecución de binarios, simulaciones MPI, aplicaciones multiproceso, análisis personalizados (posiblemente escritos en Java, Python, C/C++ o R) y computaciones en tiempo real. Mientras que en el pasado los expertos de cada campo eran capaces de programar y ejecutar pequeñas simulaciones, hoy en día, estas simulaciones representan un desafío incluso para los expertos ya que requieren cientos o miles de núcleos. Por esta razón, los lenguajes y modelos de programación actuales se esfuerzan considerablemente en incrementar la programabilidad manteniendo un rendimiento aceptable. Esta tesis contribuye a la adaptación de modelos de programación para la CAP para afrontar las necesidades y desafíos de los flujos de Data Science extendiendo COMPSs, un modelo de programación distribuida maduro, de propósito general, y basado en tareas. En primer lugar, mejoramos nuestro prototipo para orquestar diferentes software para que los usuarios no expertos puedan crear flujos complejos usando un único modelo donde algunos pasos requieran tecnologías altamente optimizadas. Esta extensión incluye las anotaciones de @binary, @OmpSs, @MPI, @COMPSs, y @MultiNode para flujos en Java y Python. En segundo lugar, integramos tecnologías de contenedores para permitir a los desarrolladores portar, distribuir y escalar fácilmente sus aplicaciones en plataformas distribuidas. Además de una metodología sencilla para paralelizar aplicaciones a partir de códigos secuenciales, esta combinación proporciona una gestión de imágenes y una implementación de aplicaciones eficientes que facilitan el empaquetado y la distribución de aplicaciones. Distinguimos entre gestión de contenedores estática, CAP y dinámica y proporcionamos casos de uso representativos para cada escenario con Docker, Singularity y Mesos. En tercer lugar, diseñamos, implementamos e integramos AutoParallel, un módulo de Python para determinar automáticamente la paralelización basada en tareas de nidos de bucles afines y ejecutarlos en paralelo en una infraestructura distribuida. AutoParallel está basado en programación secuencial, requiere una sola anotación (el decorador @parallel) y permite a un usuario intermedio escalar una aplicación a cientos de núcleos. Finalmente, proponemos una forma de extender los sistemas basados en tareas para admitir datos de entrada y salida continuos; permitiendo así la combinación de flujos de trabajo y datos (Flujos Híbridos) en un único modelo. Consecuentemente, los desarrolladores pueden crear flujos complejos siguiendo diferentes patrones sin el esfuerzo de combinar varios modelos al mismo tiempo. Además, para ilustrar las capacidades de los Flujos Híbridos, hemos creado una biblioteca (DistroStreamLib) que se integra fácilmente a los modelos basados en tareas para soportar flujos de datos. La biblioteca proporciona una representación homogénea, genérica y simple de secuencias continuas de objetos y archivos en Java y Python; permitiendo manejar cualquier tipo de datos sin tratar directamente con el back-end de streaming.Postprint (published version

    Stratified Static Analysis Based on Variable Dependencies

    Get PDF
    In static analysis by abstract interpretation, one often uses widening operators in order to enforce convergence within finite time to an inductive invariant. Certain widening operators, including the classical one over finite polyhedra, exhibit an unintuitive behavior: analyzing the program over a subset of its variables may lead a more precise result than analyzing the original program! In this article, we present simple workarounds for such behavior

    Loop-based Modeling of Parallel Communication Traces

    Get PDF
    This paper describes an algorithm that takes a trace of a distributed program and builds a model of all communications of the program. The model is a set of nested loops representing repeated patterns. Loop bodies collect events representing communication actions performed by the various processes, like sending or receiving messages, and participating in collective operations. The model can be used for compact visualization of full executions, for program understanding and debugging, and also for building statistical analyzes of various quantitative aspects of the program's behavior. The construction of the communication model is performed in two phases. First, a local model is built for each process, capturing local regularities; this phase is incremental and fast, and can be done on-line, during the execution. The second phase is a reduction process that collects, aligns, and finally merges all local models into a global, system-wide model. This global model is a compact representation of all communications of the original program, capturing patterns across groups of processes. It can be visualized directly and, because it takes the form of a sequence of loop nests, can be used to replay the original program's communication actions. Because the model is based on communication events only, it completely ignores other quantitative aspects like timestamps or messages sizes. Including such data would in most case break regularities, reducing the usefulness of trace-based modeling. Instead, the paper shows how one can efficiently access quantitative data kept in the original trace(s), by annotating the model and compiling data scanners automatically.Ce rapport de recherche décrit un algorithme qui prend en entrée la trace d'un programme distribué, et construit un modèle de l'ensemble des communications du programme. Le modèle prend la forme d'un ensemble de boucles imbriquées représentant la répétition de motifs de communication. Le corps des boucles regroupe des événements représentant les actions de communication réalisées par les différents processus impliqués, tels que l'envoi et la réception de messages, ou encore la participation à des opérations collectives. Le modèle peut servir à la visualisation compact d'exécutions complètes, à la compréhension de programme et au debugging, mais également à la construction d'analyses statistiques de divers aspects quantitatifs du comportement du programme. La construction du modèle de communication s'effectue en deux phases. Premièrement, un modèle local est construit au sein de chaque processus, capturant les régularités locales~; cette phase est incrémentale et rapide, et peut être réalisée au cours de l'exécution. La seconde phase est un processus de réduction qui rassemble, aligne, et finalement fusionne tous les modèles locaux en un modèle global décrivant la totalité du système. Ce modèle global est une représentation compacte de toutes les communications du programme original, représentant des motifs de communication entre groupes de processus. Il peut être visualisé directement et, puisqu'il prend la forme d'un ensemble de nids de boucles, peut servir à rejouer les opérations de communication du programme initial. Puisque le modèle construit se base uniquement sur les opérations de communication, il ignore complètement d'autres données quantitatives, telles que les informations chronologiques, ou les tailles de messages. L'inclusion de telles données briserait dans la plupart des cas les régularités topologiques, réduisant l'efficacité de la modélisation par boucles. Nous préférons, dans ce rapport, montrer comment, grâce au modèle construit, il est possible d'accéder efficacement aux données quantitatives si celles-ci sont conservées dans les traces individuelles, en annotant le modèle et en l'utilisant pour compiler automatiquement des programmes d'accès aux données

    Deductive Optimization of Relational Data Storage

    Full text link
    Optimizing the physical data storage and retrieval of data are two key database management problems. In this paper, we propose a language that can express a wide range of physical database layouts, going well beyond the row- and column-based methods that are widely used in database management systems. We use deductive synthesis to turn a high-level relational representation of a database query into a highly optimized low-level implementation which operates on a specialized layout of the dataset. We build a compiler for this language and conduct experiments using a popular database benchmark, which shows that the performance of these specialized queries is competitive with a state-of-the-art in memory compiled database system

    Optimizing the Performance of Directive-based Programming Model for GPGPUs

    Get PDF
    Accelerators have been deployed on most major HPC systems. They are considered to improve the performance of many applications. Accelerators such as GPUs have an immense potential in terms of high compute capacity but programming these devices is a challenge. OpenCL, CUDA and other vendor-specific models for accelerator programming definitely offer high performance, but these are low-level models that demand excellent programming skills; moreover, they are time consuming to write and debug. In order to simplify GPU programming, several directive-based programming models have been proposed, including HMPP, PGI accelerator model and OpenACC. OpenACC has now become established as the de facto standard. We evaluate and compare these models involving several scientific applications. To study the implementation challenges and the principles and techniques of directive- based models, we built an open source OpenACC compiler on top of a main stream compiler framework (OpenUH as a branch of Open64). In this dissertation, we present the required techniques to parallelize and optimize the applications ported with OpenACC programming model. We apply both user-level optimizations in the applications and compiler and runtime-driven optimizations. The compiler optimization focuses on the parallelization of reduction operations inside nested parallel loops. To fully utilize all GPU resources, we also extend the OpenACC model to support multiple GPUs in a single node. Our application porting experience also revealed the challenge of choosing good loop schedules. The default loop schedule chosen by the compiler may not produce the best performance, so the user has to manually try different loop schedules to improve the performance. To solve this issue, we developed a locality-aware auto-tuning framework which is based on the proposed memory access cost model to help the compiler choose optimal loop schedules and guide the user to choose appropriate loop schedules.Computer Science, Department o

    Clickstream for learning analytics to assess students’ behavior with Scratch

    Get PDF
    The construction of knowledge through computational practice requires to teachers a substantial amount of time and effort to evaluate programming skills, to understand and to glimpse the evolution of the students and finally to state a quantitative judgment in learning assessment. The field of learning analytics has been a common practice in research since last years due to their great possibilities in terms of learning improvement. Both, Big and Small data techniques support the analysis cycle of learning analytics and risk of students’ failure prediction. Such possibilities can be a strong positive contribution to the field of computational practice such as programming. Our main objective was to help teachers in their assessments through to make those possibilities effective. Thus, we have developed a functional solution to categorize and understand students’ behavior in programming activities based in Scratch. Through collection and analysis of data generated by students’ clicks in Scratch, we proceed to execute both exploratory and predictive analytics to detect patterns in students’ behavior when developing solutions for assignments. We concluded that resultant taxonomy could help teachers to better support their students by giving real-time quality feedback and act before students deliver incorrectly or at least incomplete tasks.Peer ReviewedPostprint (author's final draft

    Exploring the effectiveness of loop perforation for quality of service profiling

    Get PDF
    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2010.Cataloged from PDF version of thesis.Includes bibliographical references (p. 62-64).Many computations exhibit a trade off between execution time and quality of service. A video encoder, for example, can often encode frames more quickly if it is given the freedom to produce slightly lower quality video. A developer attempting to optimize such computations must navigate a complex trade-off space to find optimizations that appropriately balance quality of service and performance. We present a new quality of service profiler that is designed to help developers identify promising optimization opportunities in such computations. In contrast to standard profilers, which simply identify time-consuming parts of the computation, a quality of service profiler is designed to identify subcomputations that can be replaced with new, potentially less accurate, subcomputations that deliver significantly increased performance in return for acceptably small quality of service losses. Our quality of service profiler uses loop perforation, which transforms loops to perform fewer iterations than the original loop, to obtain implementations that occupy different points in the performance/quality of service trade-off space. The rationale is that optimizable computations often contain loops that perform extra iterations, and that removing iterations, then observing the resulting effect on the quality of service, is an effective way to identify such optimizable subcomputations. Our experimental results from applying our implemented quality of service profiler to a challenging set of benchmark applications show that it can enable developers to identify promising optimization opportunities and deliver successful optimizations that substantially increase the performance with only small quality of service losses.by Sasa Misailovic.S.M
    • …
    corecore