26 research outputs found

    Effective Performance Analysis and Debugging

    Get PDF
    Performance is once again a first-class concern. Developers can no longer wait for the next generation of processors to automatically optimize their software. Unfortunately, existing techniques for performance analysis and debugging cannot cope with complex modern hardware, concurrent software, or latency-sensitive software services. While processor speeds have remained constant, increasing transistor counts have allowed architects to increase processor complexity. This complexity often improves performance, but the benefits can be brittle; small changes to a program’s code, inputs, or execution environment can dramatically change performance, resulting in unpredictable performance in deployed software and complicating performance evaluation and debugging. Developers seeking to improve performance must resort to manual performance tuning for large performance gains. Software profilers are meant to guide developers to important code, but conventional profilers do not produce actionable information for concurrent applications. These profilers report where a program spends its time, not where optimizations will yield performance improvements. Furthermore, latency is a critical measure of performance for software services and interactive applications, but conventional profilers measure only throughput. Many performance issues appear only when a system is under high load, but generating this load in development is often impossible. Developers need to identify and mitigate scalability issues before deploying software, but existing tools offer developers little or no assistance. In this dissertation, I introduce an empirically-driven approach to performance analysis and debugging. I present three systems for performance analysis and debugging. Stabilizer mitigates the performance variability that is inherent in modern processors, enabling both predictable performance in deployment and statistically sound performance evaluation. Coz conducts performance experiments using virtual speedups to create the effect of an optimization in a running application. This approach accurately predicts the effect of hypothetical optimizations, guiding developers to code where optimizations will have the largest effect. Amp allows developers to evaluate system scalability using load amplification to create the effect of high load in a testing environment. In combination, Amp and Coz allow developers to pinpoint code where manual optimizations will improve the scalability of their software

    Performance Improvement of Multithreaded Java Applications Execution on Multiprocessor Systems

    Get PDF
    El disseny del llenguatge Java, que inclou aspectes importants com són la seva portabilitat i neutralitat envers l'arquitectura, les seves capacitats multithreading, la seva familiaritat (degut a la seva semblança amb C/C++), la seva robustesa, les seves capacitats en seguretat i la seva naturalesa distribuïda, fan que sigui un llenguatge potencialment interessant per ser utilitzat en entorns paral·lels com són els entorns de computació d'altes prestacions (HPC), on les aplicacions poden treure profit del suport que ofereix Java a l'execució multithreaded per realitzar càlculs en paral·lel, o en entorns e-business, on els servidors Java multithreaded (que segueixen l'especificació J2EE) poden treure profit de les capacitats multithreading de Java per atendre de manera concurrent un gran nombre de peticions.No obstant, l'ús de Java per la programació paral·lela ha d'enfrontar-se a una sèrie de problemes que fàcilment poden neutralitzar el guany obtingut amb l'execució en paral·lel. El primer problema és el gran overhead provocat pel suport de threads de la JVM quan s'utilitzen threads per executar feina de gra fi, quan es crea un gran nombre de threads per suportar l'execució d'una aplicació o quan els threads interaccionen estretament mitjançant mecanismes de sincronització. El segon problema és la degradació en el rendiment produïda quan aquestes aplicacions multithreaded s'executen en sistemes paral·lels multiprogramats. La principal causa d'aquest problemes és la manca de comunicació entre l'entorn d'execució i les aplicacions, la qual pot induir a les aplicacions a fer un ús descoordinat dels recursos disponibles.Aquesta tesi contribueix amb la definició d'un entorn per analitzar i comprendre el comportament de les aplicacions Java multithreaded. La contribució principal d'aquest entorn és que la informació de tots els nivells involucrats en l'execució (aplicació, servidor d'aplicacions, JVM i sistema operatiu) està correlada. Aquest fet és molt important per entendre com aquest tipus d'aplicacions es comporten quan s'executen en entorns que inclouen servidors i màquines virtuals, donat que l'origen dels problemes de rendiment es pot trobar en qualsevol d'aquests nivells o en la seva interacció.Addicionalment, i basat en el coneixement adquirit mitjançant l'entorn d'anàlisis proposat, aquesta tesi contribueix amb mecanismes i polítiques de planificació orientats cap a l'execució eficient d'aplicacions Java multithreaded en sistemes multiprocessador considerant les interaccions i la coordinació dels mecanismes i les polítiques de planificació en els diferents nivells involucrats en l'execució. La idea bàsica consisteix en permetre la cooperació entre les aplicacions i l'entorn d'execució en la gestió de recursos establint una comunicació bi-direccional entre les aplicacions i el sistema. Per una banda, les aplicacions demanen a l'entorn d'execució la quantitat de recursos que necessiten. Per altra banda, l'entorn d'execució pot ser inquirit en qualsevol moment per les aplicacions ser informades sobre la seva assignació de recursos. Aquesta tesi proposa que les aplicacions utilitzin la informació proporcionada per l'entorn d'execució per adaptar el seu comportament a la quantitat de recursos que tenen assignats (aplicacions auto-adaptables). Aquesta adaptació s'assoleix en aquesta tesi per entorns HPC per mitjà de la mal·leabilitat de les aplicacions, i per entorns e-business amb una proposta de control de congestió que fa control d'admissió basat en la diferenciació de connexions SSL per prevenir la degradació del rendiment i mantenir la Qualitat de Servei (QoS).Els resultats de l'avaluació demostren que subministrar recursos de manera dinàmica a les aplicacions auto-adaptables en funció de la seva demanda millora el rendiment de les aplicacions Java multithreaded tant en entorns HPC com en entorns e-business. Mentre disposar d'aplicacions auto-adaptables evita la degradació del rendiment, el subministrament dinàmic de recursos permet satisfer els requeriments de les aplicacions en funció de la seva demanda i adaptar-se a la variabilitat de les seves necessitats de recursos. D'aquesta manera s'aconsegueix una millor utilització dels recursos donat que els recursos que no utilitza una aplicació determinada poden ser distribuïts entre les altres aplicacions.The design of the Java language, which includes important aspects such as its portability and architecture neutrality, its multithreading facilities, its familiarity (due to its resemblance with C/C++), its robustness, its security capabilities and its distributed nature, makes it a potentially interesting language to be used in parallel environments such as high performance computing (HPC) environments, where applications can benefit from the Java multithreading support for performing parallel calculations, or e-business environments, where multithreaded Java application servers (i.e. following the J2EE specification) can take profit of Java multithreading facilities to handle concurrently a large number of requests.However, the use of Java for parallel programming has to face a number of problems that can easily offset the gain due to parallel execution. The first problem is the large overhead incurred by the threading support available in the JVM when threads are used to execute fine-grained work, when a large number of threads are created to support the execution of the application or when threads closely interact through synchronization mechanisms. The second problem is the performance degradation occurred when these multithreaded applications are executed in multiprogrammed parallel systems. The main issue that causes these problems is the lack of communication between the execution environment and the applications, which can cause these applications to make an uncoordinated use of the available resources.This thesis contributes with the definition of an environment to analyze and understand the behavior of multithreaded Java applications. The main contribution of this environment is that all levels in the execution (application, application server, JVM and operating system) are correlated. This is very important to understand how this kind of applications behaves when executed on environments that include servers and virtual machines, because the origin of performance problems can reside in any of these levels or in their interaction.In addition, and based on the understanding gathered using the proposed analysis environment, this thesis contributes with scheduling mechanisms and policies oriented towards the efficient execution of multithreaded Java applications on multiprocessor systems considering the interactions and coordination between scheduling mechanisms and policies at the different levels involved in the execution. The basis idea consists of allowing the cooperation between the applications and the execution environment in the resource management by establishing a bi-directional communication path between the applications and the underlying system. On one side, the applications request to the execution environment the amount of resources they need. On the other side, the execution environment can be requested at any time by the applications to inform them about their resource assignments. This thesis proposes that applications use the information provided by the execution environment to adapt their behavior to the amount of resources allocated to them (self-adaptive applications). This adaptation is accomplished in this thesis for HPC environments through the malleability of the applications, and for e-business environments with an overload control approach that performs admission control based on SSL connections differentiation for preventing throughput degradation and maintaining Quality of Service (QoS).The evaluation results demonstrate that providing resources dynamically to self-adaptive applications on demand improves the performance of multithreaded Java applications as in HPC environments as in e-business environments. While having self-adaptive applications avoids performance degradation, dynamic provision of resources allows meeting the requirements of the applications on demand and adapting to their changing resource needs. In this way, better resource utilization is achieved because the resources not used by some application may be distributed among other applications

    Toward least-privilege isolation for software

    Get PDF
    Hackers leverage software vulnerabilities to disclose, tamper with, or destroy sensitive data. To protect sensitive data, programmers can adhere to the principle of least-privilege, which entails giving software the minimal privilege it needs to operate, which ensures that sensitive data is only available to software components on a strictly need-to-know basis. Unfortunately, applying this principle in practice is dif- �cult, as current operating systems tend to provide coarse-grained mechanisms for limiting privilege. Thus, most applications today run with greater-than-necessary privileges. We propose sthreads, a set of operating system primitives that allows �ne-grained isolation of software to approximate the least-privilege ideal. sthreads enforce a default-deny model, where software components have no privileges by default, so all privileges must be explicitly granted by the programmer. Experience introducing sthreads into previously monolithic applications|thus, partitioning them|reveals that enumerating privileges for sthreads is di�cult in practice. To ease the introduction of sthreads into existing code, we include Crowbar, a tool that can be used to learn the privileges required by a compartment. We show that only a few changes are necessary to existing code in order to partition applications with sthreads, and that Crowbar can guide the programmer through these changes. We show that applying sthreads to applications successfully narrows the attack surface by reducing the amount of code that can access sensitive data. Finally, we show that applications using sthreads pay only a small performance overhead. We applied sthreads to a range of applications. Most notably, an SSL web server, where we show that sthreads are powerful enough to protect sensitive data even against a strong adversary that can act as a man-in-the-middle in the network, and also exploit most code in the web server; a threat model not addressed to date

    Integrative bioinformatics applications for complex human disease contexts

    Get PDF
    This thesis presents new methods for the analysis of high-throughput data from modern sources in the context of complex human diseases, at the example of a bioinformatics analysis workflow. New measurement techniques improve the resolution with which cellular and molecular processes can be monitored. While RNA sequencing (RNA-seq) measures mRNA expression, single-cell RNA-seq (scRNA-seq) resolves this on a per-cell basis. Long-read sequencing is increasingly used in genomics. With imaging mass spectrometry (IMS) the protein level in tissues is measured spatially resolved. All these techniques induce specific challenges, which need to be addressed with new computational methods. Collecting knowledge with contextual annotations is important for integrative data analyses. Such knowledge is available through large literature repositories, from which information, such as miRNA-gene interactions, can be extracted using text mining methods. After aggregating this information in new databases, specific questions can be answered with traceable evidence. The combination of experimental data with these databases offers new possibilities for data integrative methods and for answering questions relevant for complex human diseases. Several data sources are made available, such as literature for text mining miRNA-gene interactions (Chapter 2), next- and third-generation sequencing data for genomics and transcriptomics (Chapters 4.1, 5), and IMS for spatially resolved proteomics (Chapter 4.4). For these data sources new methods for information extraction and pre-processing are developed. For instance, third-generation sequencing runs can be monitored and evaluated using the poreSTAT and sequ-into methods. The integrative (down-stream) analyses make use of these (heterogeneous) data sources. The cPred method (Chapter 4.2) for cell type prediction from scRNA-seq data was successfully applied in the context of the SARS-CoV-2 pandemic. The robust differential expression (DE) analysis pipeline RoDE (Chapter 6.1) contains a large set of methods for (differential) data analysis, reporting and visualization of RNA-seq data. Topics of accessibility of bioinformatics software are discussed along practical applications (Chapter 3). The developed miRNA-gene interaction database gives valuable insights into atherosclerosis-relevant processes and serves as regulatory network for the prediction of active miRNA regulators in RoDE (Chapter 6.1). The cPred predictions, RoDE results, scRNA-seq and IMS data are unified as input for the 3D-index Aorta3D (Chapter 6.2), which makes atherosclerosis related datasets browsable. Finally, the scRNA-seq analysis with subsequent cPred cell type prediction, and the robust analysis of bulk-RNA-seq datasets, led to novel insights into COVID-19. Taken all discussed methods together, the integrative analysis methods for complex human disease contexts have been improved at essential positions.Die Dissertation beschreibt Methoden zur Prozessierung von aktuellen Hochdurchsatzdaten, sowie Verfahren zu deren weiterer integrativen Analyse. Diese findet Anwendung vor allem im Kontext von komplexen menschlichen Krankheiten. Neue Messtechniken erlauben eine detailliertere Beobachtung biomedizinischer Prozesse. Mit RNA-Sequenzierung (RNA-seq) wird mRNA-Expression gemessen, mit Hilfe von moderner single-cell-RNA-seq (scRNA-seq) sogar für (sehr viele) einzelne Zellen. Long-Read-Sequenzierung wird zunehmend zur Sequenzierung ganzer Genome eingesetzt. Mittels bildgebender Massenspektrometrie (IMS) können Proteine in Geweben räumlich aufgelöst quantifiziert werden. Diese Techniken bringen spezifische Herausforderungen mit sich, die mit neuen bioinformatischen Methoden angegangen werden müssen. Für die integrative Datenanalyse ist auch die Gewinnung von geeignetem Kontextwissen wichtig. Wissenschaftliche Erkenntnisse werden in Artikeln veröffentlicht, die über große Literaturdatenbanken zugänglich sind. Mittels Textmining können daraus Informationen extrahiert werden, z.B. miRNA-Gen-Interaktionen, die in eigenen Datenbank aggregiert werden um spezifische Fragen mit nachvollziehbaren Belegen zu beantworten. In Kombination mit experimentellen Daten bieten sich so neue Möglichkeiten für integrative Methoden. Durch die Extraktion von Rohdaten und deren Vorprozessierung werden mehrere Datenquellen erschlossen, wie z.B. Literatur für Textmining von miRNA-Gen-Interaktionen (Kapitel 2), Long-Read- und RNA-seq-Daten für Genomics und Transcriptomics (Kapitel 4.2, 5) und IMS für Protein-Messungen (Kapitel 4.4). So dienen z.B. die poreSTAT und sequ-into Methoden der Vorprozessierung und Auswertung von Long-Read-Sequenzierungen. In der integrativen (down-stream) Analyse werden diese (heterogenen) Datenquellen verwendet. Für die Bestimmung von Zelltypen in scRNA-seq-Experimenten wurde die cPred-Methode (Kapitel 4.2) erfolgreich im Kontext der SARS-CoV-2-Pandemie eingesetzt. Auch die robuste Pipeline RoDE fand dort Anwendung, die viele Methoden zur (differentiellen) Datenanalyse, zum Reporting und zur Visualisierung bereitstellt (Kapitel 6.1). Themen der Benutzbarkeit von (bioinformatischer) Software werden an Hand von praktischen Anwendungen diskutiert (Kapitel 3). Die entwickelte miRNA-Gen-Interaktionsdatenbank gibt wertvolle Einblicke in Atherosklerose-relevante Prozesse und dient als regulatorisches Netzwerk für die Vorhersage von aktiven miRNA-Regulatoren in RoDE (Kapitel 6.1). Die cPred-Methode, RoDE-Ergebnisse, scRNA-seq- und IMS-Daten werden im 3D-Index Aorta3D (Kapitel 6.2) zusammengeführt, der relevante Datensätze durchsuchbar macht. Die diskutierten Methoden führen zu erheblichen Verbesserungen für die integrative Datenanalyse in komplexen menschlichen Krankheitskontexten

    Profilage et débogage par prise de traces efficaces d'applications hybrides multi-threadées HPC

    Get PDF
    Supercomputers’ evolution is at the source of both hardware and software challenges. In the quest for the highest computing power, the interdependence in-between simulation components is becoming more and more impacting, requiring new approaches. This thesis is focused on the software development aspect and particularly on the observation of parallel software when being run on several thousand cores. This observation aims at providing developers with the necessary feedback when running a program on an execution substrate which has not been modeled yet because of its complexity. In this purpose, we firstly introduce the development process from a global point of view, before describing developer tools and related work. In a second time, we present our contribution which consists in a trace based profiling and debugging tool and its evolution towards an on-line coupling method which as we will show is more scalable as it overcomes IOs limitations. Our contribution also covers our time-stamp synchronisation algorithm for tracing purposes which relies on a probabilistic approach with quantified error. We also present a tool allowing machine characterisation from the MPI aspect and demonstrate the presence of machine noise for both point to point and collectives, justifying the use of an empirical approach. In summary, this work proposes and motivates an alternative approach to trace based event collection while preserving event granularity and a reduced overheadL’évolution des supercalculateurs est à la source de défis logiciels et architecturaux. Dans la quête de puissance de calcul, l’interdépendance des éléments du processus de simulation devient de plus en plus impactante et requiert de nouvelles approches. Cette thèse se concentre sur le développement logiciel et particulièrement sur l’observation des programmes parallèles s’exécutant sur des milliers de cœurs. Dans ce but, nous décrivons d’abord le processus de développement de manière globale avant de présenter les outils existants et les travaux associés. Dans un second temps, nous détaillons notre contribution qui consiste d’une part en des outils de débogage et profilage par prise de traces, et d’autre part en leur évolution vers un couplage en ligne qui palie les limitations d’entrées–sorties. Notre contribution couvre également la synchronisation des horloges pour la prise de traces avec la présentation d’un algorithme de synchronisation probabiliste dont nous avons quantifié l’erreur. En outre, nous décrivons un outil de caractérisation machine qui couvre l’aspect MPI. Un tel outil met en évidence la présence de bruit aussi bien sur les communications de type point-à-point que de type collective. Enfin, nous proposons et motivons une alternative à la collecte d’événements par prise de traces tout en préservant la granularité des événements et un impact réduit sur les performances, tant sur le volet utilisation CPU que sur les entrées–sortie

    In Situ Visualization of Performance Data in Parallel CFD Applications

    Get PDF
    This thesis summarizes the work of the author on visualization of performance data in parallel Computational Fluid Dynamics (CFD) simulations. Current performance analysis tools are unable to show their data on top of complex simulation geometries (e.g. an aircraft engine). But in CFD simulations, performance is expected to be affected by the computations being carried out, which in turn are tightly related to the underlying computational grid. Therefore it is imperative that performance data is visualized on top of the same computational geometry which they originate from. However, performance tools have no native knowledge of the underlying mesh of the simulation. This scientific gap can be filled by merging the branches of HPC performance analysis and in situ visualization of CFD simulations data, which shall be done by integrating existing, well established state-of-the-art tools from each field. In this threshold, an extension for the open-source performance tool Score-P was designed and developed, which intercepts an arbitrary number of manually selected code regions (mostly functions) and send their respective measurements – amount of executions and cumulative time spent – to the visualization software ParaView – through its in situ library, Catalyst –, as if they were any other flow-related variable. Subsequently the tool was extended with the capacity to also show communication data (messages sent between MPI ranks) on top of the CFD mesh. Testing and evaluation are done with two industry-grade codes: Rolls-Royce’s CFD code, Hydra, and Onera, DLR and Airbus’ CFD code, CODA. On the other hand, it has been also noticed that the current performance tools have limited capacity of displaying their data on top of three-dimensional, framed (i.e. time-stepped) representations of the cluster’s topology. Parallel to that, in order for the approach not to be limited to codes which already have the in situ adapter, it was extended to take the performance data and display it – also in codes without in situ – on a three-dimensional, framed representation of the hardware resources being used by the simulation. Testing is done with the Multi-Grid and Block Tri-diagonal NAS Parallel Benchmarks (NPB), as well as with Hydra and CODA again. The benchmarks are used to explain how the new visualizations work, while real performance analyses are done with the industry-grade CFD codes. The proposed solution is able to provide concrete performance insights, which would not have been reached with the current performance tools and which motivated beneficial changes in the respective source code in real life. Finally, its overhead is discussed and proven to be suitable for usage with CFD codes. The dissertation provides a valuable addition to the state of the art of highly parallel CFD performance analysis and serves as basis for further suggested research directions
    corecore