142 research outputs found
High Availability and Scalability of Mainframe Environments using System z and z/OS as example
Mainframe computers are the backbone of industrial and commercial computing, hosting the most relevant and critical data of businesses. One of the most important mainframe environments is IBM System z with the operating system z/OS. This book introduces the mainframe technology of System z and z/OS with respect to high availability and scalability. It highlights their presence on different levels within the hardware and software stack to satisfy the needs of large IT organizations.
Vector-Processing for Mobile Devices: Benchmark and Analysis
Vector processing has become commonplace in today's CPU microarchitectures.
Vector instructions improve performance and energy efficiency, which is crucial for
resource-constrained mobile devices. The research community currently lacks a
comprehensive benchmark suite to study the benefits of vector processing for
mobile devices. This paper presents Swan, an extensive vector processing
benchmark suite for mobile applications. Swan consists of a diverse set of
data-parallel workloads from four commonly used mobile applications: operating
system, web browser, audio/video messaging application, and PDF rendering
engine. Using the Swan benchmark suite, we conduct a detailed analysis of the
performance, power, and energy consumption of vectorized workloads, and show
that: (a) Vectorized kernels increase the pressure on cache hierarchy due to
the higher rate of memory requests. (b) Vector processing is more beneficial
for workloads with lower precision operations and higher cache hit rates. (c)
Limited Instruction-Level Parallelism and strided memory accesses to
multi-dimensional data structures prevent vector processing benefits from
scaling with more SIMD functional units and wider registers. (d) Despite lower
computation throughput than domain-specific accelerators, such as GPU, vector
processing outperforms these accelerators for kernels with lower operation
counts. Finally, we show five common computation patterns in mobile
data-parallel workloads that dominate the execution time.
Comment: 2023 IEEE International Symposium on Workload Characterization (IISWC)
The Pitfalls of Benchmarking with Applications
Application benchmarking is a widely trusted method of performance evaluation. Compiler developers rely on benchmarks to assess the correctness and performance of their optimizations; computer vendors use them to compare their respective machines; processor architects run them to tune innovative features and, to a lesser extent, to validate their correctness. Benchmarks must reflect actual workloads of interest and return a synthetic measure of “performance”. Often, benchmarks are simply a collection of real-world applications run as black boxes. We identify a number of pitfalls that derive from using applications as benchmarks, and we illustrate them with a popular, freely available benchmark suite. In particular, we argue that correctness should be defined by an expert in the application domain, and that the correctness test should be integrated into the benchmark.
Quantifying and Predicting the Influence of Execution Platform on Software Component Performance
The performance of software components depends on several factors, including the execution platform on which the software components run. To simplify cross-platform performance prediction in relocation and sizing scenarios, a novel approach is introduced in this thesis which separates the application performance profile from the platform performance profile. The approach is evaluated using transparent instrumentation of Java applications and with automated benchmarks for Java Virtual Machines
Experimental evaluation of big data querying tools
In recent years, the term Big Data has become a widely debated topic across many
business areas. One of the main challenges related to this concept is how to handle
the enormous volume and variety of data efficiently. Given the notorious
complexity and volume of data associated with Big Data, efficient querying
mechanisms are needed for data analysis. Motivated by the rapid
development of tools and frameworks for Big Data, there is much discussion about
querying tools and, more specifically, which are the most appropriate for
specific analytical needs. This dissertation describes and compares the main
characteristics and architectures of the following well-known Big Data analytical tools:
Drill, HAWQ, Hive, Impala, Presto, and Spark. To test the performance of these
tools, we also describe the process of preparing, configuring, and
administering a Hadoop cluster on which to install and use them,
providing an environment capable of evaluating their performance and identifying the
scenarios best suited to their use. To carry out this evaluation, we used the TPC-H and
TPC-DS benchmarks. The results showed that in-memory processing tools
such as HAWQ, Impala, and Presto deliver better results and performance on small
and medium-sized datasets. However, the tools with slower execution times,
especially Hive, appear to catch up with the better-performing tools
as the reference datasets grow.
Accessible software frameworks for reproducible image analysis of host-pathogen interactions
To understand the mechanisms behind life-threatening diseases, the underlying interactions between host cells and pathogenic microorganisms must be known. Continuous improvements in imaging techniques and computer technologies enable the application of methods from image-based systems biology, which uses modern computer algorithms to precisely measure the behavior of cells, tissues, or whole organs. To meet the standards of digital research data management, algorithms must comply with the FAIR principles (Findability, Accessibility, Interoperability, and Reusability) and contribute to spreading those principles in the scientific community. This is especially important for interdisciplinary teams of experimentalists and computer scientists, in which computer programs can improve communication and speed up the adoption of new technologies. In this work, software frameworks were therefore developed that help spread the FAIR principles through standardized, reproducible, high-performance, and easily accessible software packages for quantifying interactions in biological systems. In summary, this work shows how software frameworks can contribute to the characterization of interactions between host cells and pathogens by simplifying the design and application of quantitative, FAIR-compliant image analysis programs. These improvements facilitate future collaborations with life scientists and physicians, which, following the principle of image-based systems biology, will lead to the development of new experiments, imaging techniques, algorithms, and computer models.
Algorithms and architectures for decimal transcendental function computation
There is now substantial commercial demand for decimal floating-point (DFP) arithmetic in applications such as financial analysis, tax calculation, currency conversion, Internet-based applications, and e-commerce. This trend drives further development of DFP arithmetic units that can perform accurate computations on exact decimal operands. Reflecting the significance of DFP arithmetic, the IEEE 754-2008 standard for floating-point arithmetic includes it in its specification. Basic decimal arithmetic units, such as the decimal adder, subtracter, multiplier, divider, and square-root unit, form a main part of a decimal microprocessor and are attracting more and more attention from researchers. Recently, decimal-encoded formats and DFP arithmetic units have been implemented in IBM's System z900, POWER6, and z10 microprocessors.
Increasing chip densities and transistor counts give designers more room to add essential application-domain functions to upcoming microprocessors. Decimal transcendental functions, such as the DFP logarithm, antilogarithm, exponential, reciprocal, and trigonometric functions, are useful arithmetic operations in many areas of science and engineering and have been specified as recommended operations in the IEEE 754-2008 standard. Thus, virtually all computing systems that are compliant with the IEEE 754-2008 standard could include a DFP mathematical library providing transcendental function computation. Building on the development of basic decimal arithmetic units, more complex DFP transcendental arithmetic units will be the next building blocks in microprocessors.
In this dissertation, we researched and developed several new decimal algorithms and architectures for DFP transcendental function computation. These designs use several different methods: 1) decimal transcendental function computation based on the table-based first-order polynomial approximation method; 2) DFP logarithmic and antilogarithmic converters based on the decimal digit-recurrence algorithm with selection by rounding; 3) a decimal reciprocal unit using efficient table look-up based on Newton-Raphson iterations; and 4) a first radix-100 division unit based on the non-restoring algorithm with a pre-scaling method. Most of the decimal algorithms and architectures developed in this dissertation are first attempts to analyze and implement DFP transcendental arithmetic that achieves faithful results for DFP operands, as specified in IEEE 754-2008.
To help researchers evaluate the hardware performance of DFP transcendental arithmetic units, the proposed architectures based on the different methods are modeled, verified, and synthesized using FPGAs or CMOS standard-cell libraries for ASICs. Some of the implementation results are compared with those of binary radix-16 logarithmic and exponential converters, a recently developed high-performance decimal CORDIC-based architecture, and Intel's DFP transcendental function computation software library. The comparison shows that the proposed architectures achieve a significant speed-up over these designs in terms of latency. The algorithms and architectures developed in this dissertation provide a useful starting point for future hardware-oriented research on DFP transcendental function computation.
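As a rough software illustration of the table look-up plus Newton-Raphson idea named in the abstract above, the sketch below approximates a decimal reciprocal with the iteration x ← x · (2 − d · x). It is not the dissertation's hardware design: the function name `reciprocal`, the nine-entry `SEED` table, the number of guard digits, and the fixed iteration count are all assumptions chosen for illustration, and the final rounding simply uses Python's `decimal` context.

```python
from decimal import Decimal, localcontext

# Hypothetical seed table indexed by the leading digit k of the scaled
# operand m in [0.1, 1); each entry approximates 1/m by 10 / (k + 0.5).
# A hardware unit would use a larger, carefully tuned table.
SEED = {k: Decimal(10) / (Decimal(k) + Decimal("0.5")) for k in range(1, 10)}


def reciprocal(d: Decimal, digits: int = 16, iterations: int = 7) -> Decimal:
    """Approximate 1/d for a positive finite Decimal via Newton-Raphson."""
    if not d.is_finite() or d <= 0:
        raise ValueError("this sketch handles positive finite operands only")
    with localcontext() as ctx:
        ctx.prec = digits + 6            # assumed number of guard digits
        exp = d.adjusted() + 1           # d = m * 10**exp with m in [0.1, 1)
        m = d.scaleb(-exp)
        x = SEED[d.as_tuple().digits[0]]  # table look-up for the initial guess
        two = Decimal(2)
        for _ in range(iterations):
            # Newton-Raphson step for f(x) = 1/x - m; the relative error
            # is roughly squared on every pass.
            x = x * (two - m * x)
        result = x.scaleb(-exp)          # undo the scaling: 1/d = (1/m) * 10**-exp
        ctx.prec = digits
        return +result                   # unary plus rounds to the target precision


if __name__ == "__main__":
    for value in ("4", "3", "0.07", "250"):
        print(value, reciprocal(Decimal(value)))
```

With a seed whose relative error is at most about one third, seven iterations are far more than 16 digits require; the count is fixed here only to keep the loop simple.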
Performance analysis methods for understanding scaling bottlenecks in multi-threaded applications
In this dissertation we propose three new methods for analyzing the performance of multi-threaded programs. Our first method, criticality stacks, is useful for analyzing imbalance between threads. To construct these stacks we propose a new criticality metric that splits the execution time of an application into one part per thread. The larger this part is for a thread, the more critical that thread is to the application. The second method, bottle graphs, represents each thread of a multi-threaded program as a rectangle in a graph. The height of the rectangle is computed using our criticality metric, and the width represents the parallelism of the thread. Rectangles at the top of the graph, in the neck of the bottle as it were, have limited parallelism, so we regard them as “bottlenecks” for the application. Our third method, speedup stacks, shows the achieved speedup of an application and the various components that limit speedup in a stacked graph. The intuition behind this concept is that reducing the influence of a given component increases the speedup of the application in proportion to the size of that component in the stack.
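The abstract above does not spell out how the criticality metric is computed. The sketch below shows one plausible reading, under the assumption that the time of each interval is shared equally among the threads running during it, so that the per-thread shares add up to the total execution time. The function name `criticality` and the interval format are hypothetical.

```python
from collections import defaultdict


def criticality(intervals):
    """Split execution time into one share per thread.

    intervals: iterable of (duration, running_threads) pairs, where
    running_threads is the set of thread ids doing work in that interval.
    Assumption: each interval's duration is divided equally among the
    threads running in it.
    """
    shares = defaultdict(float)
    for duration, running in intervals:
        if not running:
            continue  # intervals with no running thread are ignored in this sketch
        for thread in running:
            shares[thread] += duration / len(running)
    return dict(shares)


if __name__ == "__main__":
    # Thread A runs the whole time; thread B idles during the middle interval.
    trace = [(4.0, {"A", "B"}), (2.0, {"A"}), (2.0, {"A", "B"})]
    for thread, share in sorted(criticality(trace).items()):
        print(thread, share)
```

Under this assumption, a thread that often runs alone accumulates a larger share, which matches the abstract's statement that a larger part means a more critical thread.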
Getting to the Point. Index Sets and Parallelism-Preserving Autodiff for Pointful Array Programming
We present a novel programming language design that attempts to combine the
clarity and safety of high-level functional languages with the efficiency and
parallelism of low-level numerical languages. We treat arrays as
eagerly-memoized functions on typed index sets, allowing abstract function
manipulations, such as currying, to work on arrays. In contrast to composing
primitive bulk-array operations, we argue for an explicit nested indexing style
that mirrors application of functions to arguments. We also introduce a
fine-grained typed effects system which affords concise and
automatically-parallelized in-place updates. Specifically, an associative
accumulation effect allows reverse-mode automatic differentiation of in-place
updates in a way that preserves parallelism. Empirically, we benchmark against
the Futhark array programming language, and demonstrate that aggressive
inlining and type-driven compilation allow array programs to be written in an
expressive, "pointful" style with little performance penalty.
Comment: 31 pages with appendix, 11 figures. A conference submission is still under review