82 research outputs found

    Profiling of parallel programs in a non-strict functional language

    Get PDF
    [Abstract] Purely functional programming languages offer many benefits to parallel programming. The absence of side effects and the provision of higher-level abstractions ease the programming effort. In particular, non-strict functional languages allow further separation of concerns and provide more parallel facilities in the form of semi-implicit parallelism. On the other hand, because the low-level details of the execution are hidden, usually in a runtime system, debugging the performance of parallel applications becomes harder. Currently available parallel profiling tools allow programmers to obtain some information about the execution; however, this information is usually not detailed enough to pinpoint precisely the cause of some performance problems. Often, this is because the cost of obtaining that information would be prohibitive for a complete program execution. In this thesis, we design and implement a parallel profiling framework based on execution replay. This debugging technique makes it possible to simulate recorded executions of a program, ensuring that their behaviour remains unchanged. The novelty of our approach is to adapt this technique to the context of parallel profiling and to take advantage of the characteristics of non-strict purely functional semantics to guarantee minimal overhead in the recording process. Our work makes it possible to build more powerful profiling tools that do not affect the parallel behaviour of the program in a meaningful way. We demonstrate our claims through a series of benchmarks and the study of two use cases.
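    As a concrete illustration of the semi-implicit parallelism the abstract refers to, here is a minimal GHC Haskell sketch. It assumes the `parallel` package (which provides `Control.Parallel`); the function `pfib` is illustrative and not taken from the thesis:

```haskell
import Control.Parallel (par, pseq)

-- Naive Fibonacci with the two recursive calls evaluated in parallel.
-- `l `par` ...` merely *sparks* the evaluation of `l`; the runtime is
-- free to ignore the spark, so the program's result never changes --
-- only its parallel schedule does.
pfib :: Int -> Int
pfib n
  | n < 2     = n
  | otherwise = l `par` (r `pseq` (l + r))
  where
    l = pfib (n - 1)
    r = pfib (n - 2)

main :: IO ()
main = print (pfib 20)  -- 6765
```

    Because `par` only advises the runtime, the actual parallel behaviour varies from run to run, which is precisely why a replay-based profiler that preserves the recorded schedule is valuable.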

    Profiling a parallel domain specific language using off-the-shelf tools

    Get PDF
    Profiling tools are essential for understanding and tuning the performance of both parallel programs and parallel language implementations. Assessing the performance of a program in a language with high-level parallel coordination is often complicated by the layers of abstraction present in the language and its implementation. This thesis investigates whether it is possible to profile parallel Domain Specific Languages (DSLs) using existing host language profiling tools. The key challenge is that the host language tools report the performance of the DSL runtime system (RTS) executing the application rather than the performance of the DSL application itself. The key questions are whether a correct, effective, and efficient profiler can be constructed using host language profiling tools; whether it is possible to profile the DSL implementation effectively; and what capabilities are required of the host language profiling tools. The main contribution of this thesis is the development of an execution profiler, using the host language profiling tools, for the parallel DSL Haskell Distributed Parallel Haskell (HdpH). We show that it is possible to construct a profiler (HdpHProf) to support performance analysis of both DSL applications and the DSL implementation. The implementation uses several new GHC features, including the GHC-Events library and ThreadScope, and develops two new performance analysis tools for the HdpH internals: Spark Pool Contention Analysis and Registry Contention Analysis. We present a critical comparative evaluation of the host language profiling tools that we used (GHC-PPS and ThreadScope) against another recent functional profiler, EdenTV, alongside four important imperative profilers. This is the first report on the performance of functional profilers in comparison with well-established, industry-standard imperative profiling technologies. We systematically compare the profilers for usability and data presentation.
    We found that GHC-PPS performs well in terms of overheads and usability, so using it to profile the DSL is feasible and would not have a significant impact on DSL performance. We validate HdpHProf for functional correctness and measure its performance using six benchmarks. HdpHProf works correctly and can scale to profile HdpH programs running on up to 192 cores of a 32-node Beowulf cluster. We characterise the performance of HdpHProf in terms of profiling data size and profiling runtime overhead. The results show that HdpHProf does not alter the behaviour of GHC-PPS and retains tracing overheads close to those of the studied functional profilers: 18% on average. HdpH trace events also account for a low proportion of the GHC-PPS eventlog, less than 3% on average. We show that HdpHProf is effective and efficient for performance analysis and tuning of DSL applications. We use HdpHProf to identify performance issues and to tune the thread granularity of six HdpH benchmarks with different parallel paradigms, e.g. divide and conquer, flat data parallel, and nested data parallel. This includes identifying problems such as too small or too large thread granularity, a problem size too small for the parallel architecture, and synchronisation bottlenecks. We show that HdpHProf is also effective and efficient for tuning the parallel DSL implementation. We use the Spark Pool Contention Analysis tool to examine how the spark pool implementation performs when accessed concurrently, and found that appropriate thread granularity can significantly reduce both conflict ratios and conflict durations, by more than 90%. We use the Registry Contention Analysis tool to evaluate three alternative registry implementations, and found that the tools give a better understanding of how different implementations of the HdpH RTS perform.
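    To show how a DSL built on GHC can surface its own events through the host language tooling, the following sketch uses GHC's standard user-event API from `Debug.Trace` (in base). The wrapper and event labels are illustrative assumptions, not HdpHProf's actual API:

```haskell
import Debug.Trace (traceEventIO, traceMarkerIO)

-- Bracket a unit of work with user events.  When the program is compiled
-- with `-eventlog` and run with `+RTS -l`, these strings are written to
-- the .eventlog file, where ThreadScope can display them alongside the
-- RTS's own spark and scheduling events.
withTraceEvent :: String -> IO a -> IO a
withTraceEvent label act = do
  traceEventIO ("START " ++ label)
  r <- act
  traceEventIO ("STOP " ++ label)
  pure r

main :: IO ()
main = do
  traceMarkerIO "phase 1"                         -- a one-off marker
  r <- withTraceEvent "sum" (pure (sum [1 .. 1000 :: Int]))
  print r                                          -- prints 500500
```

    Emitting DSL-level events into the same eventlog as the RTS events is what lets a tool correlate application behaviour with runtime behaviour, at the cost of a slightly larger trace.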

    DYNJA: a dynamic resource analyzer for multi-threaded Java

    Get PDF
    [Abstract] We present the concept, usage, and prototype implementation of Dynja, a dynamic resource analyzer for multi-threaded Java. The system receives as input a Java application, initial values for its input parameters, and the cost metrics to be measured among the three metrics currently available: the number of executed bytecode instructions, the number (and type) of objects created, and the number (and name) of methods invoked. Dynja yields as output the resources consumed by each thread according to the selected metric(s). Our dynamic resource analyzer has been implemented using the Java Virtual Machine Tool Interface (JVMTI), a native programming interface that allows inspecting the state and controlling the execution of applications running in a JVM. The main conclusions of this work have been submitted to the conference "Principles and Practice of Programming in Java" (PPPJ'13) and are currently under review. The article can be found in the appendix.

    Granularity in Large-Scale Parallel Functional Programming

    Get PDF
    This thesis demonstrates how to reduce the runtime of large non-strict functional programs using parallel evaluation. The parallelisation of several programs shows the importance of granularity, i.e. the computation costs of program expressions. The aspect of granularity is studied both on a practical level, by presenting and measuring runtime granularity improvement mechanisms, and at a more formal level, by devising a static granularity analysis. By parallelising several large functional programs this thesis demonstrates for the first time the advantages of combining lazy and parallel evaluation on a large scale: laziness aids modularity, while parallelism reduces runtime. One of the parallel programs is the Lolita system, which, with more than 47,000 lines of code, is the largest existing parallel non-strict functional program. Evaluation strategies, a new mechanism for parallel programming to which this thesis contributes, are shown to be useful in this parallelisation. Evaluation strategies simplify parallel programming by separating algorithmic code from code specifying dynamic behaviour. For large programs the abstraction provided by functions is maintained by using a data-oriented style of parallelism, which defines parallelism over intermediate data structures rather than inside the functions. A highly parameterised simulator, GRANSIM, has been constructed collaboratively and is discussed in detail in this thesis. GRANSIM is a tool for architecture-independent parallelisation and a testbed for implementing runtime-system features of the parallel graph reduction model. By providing an idealised as well as an accurate model of the underlying parallel machine, GRANSIM has proven to be an essential part of an integrated parallel software engineering environment. Several parallel runtime-system features, such as granularity improvement mechanisms, have been tested via GRANSIM. It is publicly available and in active use at several universities worldwide.
    In order to provide granularity information this thesis presents an inference-based static granularity analysis. This analysis combines two existing analyses, one for cost and one for size information. It determines an upper bound for the computation costs of evaluating an expression in a simple strict higher-order language. By exposing recurrences during cost reconstruction and using a library of recurrences and their closed forms, it is possible to infer the costs for some recursive functions. The possible performance improvements are assessed by measuring the parallel performance of a hand-analysed and annotated program.
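    The separation of algorithmic code from dynamic behaviour that evaluation strategies provide can be sketched in a few lines. This assumes the `parallel` package's Control.Parallel.Strategies module; the function names are illustrative:

```haskell
import Control.Parallel.Strategies (using, parList, rdeepseq)

-- Algorithmic code: an ordinary map, with no mention of parallelism.
squares :: [Int] -> [Int]
squares xs = map (^ 2) xs

-- Dynamic behaviour, specified separately: evaluate the list's
-- elements in parallel, each to normal form.  Swapping the strategy
-- changes the parallel behaviour without touching the algorithm.
squaresPar :: [Int] -> [Int]
squaresPar xs = squares xs `using` parList rdeepseq

main :: IO ()
main = print (sum (squaresPar [1 .. 100]))  -- 338350
```

    Because `using` is semantically the identity function on its first argument, the strategy can be tuned for granularity (e.g. chunking the list) while the algorithmic code stays unchanged.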

    An evaluation of Lolita and related natural language processing systems

    Get PDF
    This research addresses the question, "how do we evaluate systems like LOLITA?" LOLITA is the Natural Language Processing (NLP) system under development at the University of Durham. It is intended as a platform for building NL applications. We are therefore interested in questions of evaluation for such general NLP systems. The thesis has two parts. The first, and main, part concerns the participation of LOLITA in the Sixth Message Understanding Conference (MUC-6). The MUC-relevant portion of LOLITA is described in detail. The adaptation of LOLITA for MUC-6 is discussed, including work undertaken by the author. Performance on a specimen article is analysed qualitatively, and in detail, with anonymous comparisons to competitors' output. We also examine current LOLITA performance. A template comparison tool was implemented to aid these analyses. The overall scores are then considered. A methodology for analysis is discussed, and a comparison made with current scores. The comparison tool is used to analyse how systems performed relative to each other. One method, Correctness Analysis, was particularly interesting. It provides a characterisation of task difficulty, and indicates how systems approached a task. Finally, MUC-6 is analysed. In particular, we consider the methodology and ways of interpreting the results. Several criticisms of MUC-6 are made, along with suggestions for future MUC-style events. The second part considers evaluation from the point of view of general systems. A literature review shows a lack of serious work on this aspect of evaluation. A first-principles discussion of evaluation, starting from a view of NL systems as a particular kind of software, raises several interesting points for single-task evaluation. No evaluations could be suggested for general systems; their value was seen as primarily economic. That is, we are unable to analyse their linguistic capability directly.

    Naval Postgraduate School Academic Catalog - September 2022

    Get PDF

    Naval Postgraduate School Academic Catalog - September 2021

    Get PDF

    Naval Postgraduate School Academic Catalog - January 2021

    Get PDF

    Naval Postgraduate School Academic Catalog - 09 July 2021

    Get PDF