Optimization of Real-World MapReduce Applications With Flame-MR: Practical Use Cases
[Abstract] Apache Hadoop is a widely used MapReduce framework for storing and processing large amounts of data. However, it presents some performance issues that hinder its use in many practical scenarios. Although existing alternatives like Spark or Hama can outperform Hadoop, they require rewriting the source code of the applications due to API incompatibilities. This paper studies the use of Flame-MR, an in-memory processing architecture for MapReduce applications, to improve the performance of real-world use cases transparently while keeping application compatibility. Flame-MR adapts to the characteristics of the workloads, efficiently managing custom data formats and iterative computations, while also reducing workload imbalance. The experimental evaluation, conducted on high performance clusters and the Microsoft Azure cloud, shows that Flame-MR clearly outperforms Hadoop, reducing execution times by more than half in most cases
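The MapReduce model that Flame-MR accelerates can be sketched with a toy, single-process word count in Python. This is only an illustration of the map/shuffle/reduce stages, not Flame-MR's or Hadoop's actual API:

```python
from collections import defaultdict

def map_phase(records):
    # Emit (word, 1) pairs, as a classic word-count mapper would.
    for line in records:
        for word in line.split():
            yield word, 1

def shuffle(pairs):
    # Group values by key, mimicking the framework's shuffle stage.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Sum the counts emitted for each word.
    return {key: sum(values) for key, values in groups.items()}

records = ["flame mr hadoop", "hadoop mapreduce", "flame hadoop"]
counts = reduce_phase(shuffle(map_phase(records)))
```

Because the application only expresses map and reduce functions, a framework can swap the execution engine underneath (disk-based in Hadoop, in-memory in Flame-MR) without touching this code, which is the compatibility argument the abstract makes.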
Reify Your Collection Queries for Modularity and Speed!
Modularity and efficiency are often contradicting requirements, such that
programmers have to trade one for the other. We analyze this dilemma in the
context of programs operating on collections. Performance-critical code using
collections often needs to be hand-optimized, leading to non-modular, brittle,
and redundant code. In principle, this dilemma could be avoided by automatic
collection-specific optimizations, such as fusion of collection traversals,
usage of indexing, or reordering of filters. Unfortunately, it is not obvious
how to encode such optimizations in terms of ordinary collection APIs, because
the program operating on the collections is not reified and hence cannot be
analyzed.
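The modularity/efficiency trade-off described above can be made concrete with a small Python sketch (illustrative only; the paper's setting is the Scala collections API): the modular version materializes an intermediate collection per step, while the hand-fused version does one traversal but hides the pipeline structure.

```python
# Modular but wasteful: each step materializes an intermediate list.
def modular(xs):
    squared = [x * x for x in xs]
    evens = [x for x in squared if x % 2 == 0]
    return sum(evens)

# Hand-fused: one traversal, no intermediates -- faster, but the
# pipeline structure (map, filter, sum) is no longer visible or reusable.
def fused(xs):
    total = 0
    for x in xs:
        y = x * x
        if y % 2 == 0:
            total += y
    return total

assert modular(range(10)) == fused(range(10))
```

Automatic fusion would let the programmer write the modular version and still get the fused version's performance, which is exactly the optimization that requires the query to be reified.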
We propose SQuOpt, the Scala Query Optimizer--a deep embedding of the Scala
collections API that allows such analyses and optimizations to be defined and
executed within Scala, without relying on external tools or compiler
extensions. SQuOpt provides the same "look and feel" (syntax and static typing
guarantees) as the standard collections API. We evaluate SQuOpt by
re-implementing several code analyses of the FindBugs tool, showing average
speedups of 12x with a maximum of 12800x, and hence demonstrating that
SQuOpt can reconcile modularity and efficiency in real-world applications.
Comment: 20 pages
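The idea of a deep embedding can be sketched in a few lines: queries are built as data (an AST) rather than executed eagerly, so an optimizer can inspect and rewrite them before running. This toy Python version with a filter-reordering pass is an illustration of the technique only, not SQuOpt's Scala API:

```python
# A toy deep embedding: a query is a reified list of filters over a
# source, so an optimizer can rewrite it before execution.
class Filter:
    def __init__(self, pred, cost):
        self.pred, self.cost = pred, cost

class Query:
    def __init__(self, source, filters=()):
        self.source, self.filters = source, list(filters)

    def where(self, pred, cost=1):
        return Query(self.source, self.filters + [Filter(pred, cost)])

    def optimize(self):
        # Reorder filters so cheap predicates run first; since filters
        # commute, the result is unchanged but less work is done.
        return Query(self.source, sorted(self.filters, key=lambda f: f.cost))

    def run(self):
        return [x for x in self.source
                if all(f.pred(x) for f in self.filters)]

q = (Query(range(100))
     .where(lambda x: str(x).startswith("2"), cost=10)   # "expensive"
     .where(lambda x: x % 7 == 0, cost=1))               # cheap
result = q.optimize().run()
```

The `cost` annotations are a hypothetical stand-in for the selectivity/cost information a real optimizer would estimate; the point is that reordering is only possible because the query exists as an inspectable object.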
Energyware engineering: techniques and tools for green software development
Doctoral Thesis in Informatics (MAP-i)
Energy consumption is nowadays one of the most important concerns worldwide. While
hardware is generally seen as the main culprit for a computer’s energy usage, software
too has a tremendous impact on the energy spent, as it can cancel the efficiency introduced
by the hardware. Green Computing is not a new field of study, but the focus has been,
until recently, on hardware. While there have been advancements in Green Software techniques,
there is still not enough support for software developers to make their
code more energy-aware, with various studies arguing there is both a lack of knowledge
and a lack of tools for energy-aware development.
This thesis intends to tackle these two problems and aims at pushing
research on Green Software further forward. The issue of software energy consumption is
approached as a software engineering question. By using systematic, disciplined, and quantifiable
approaches to the development, operation, and maintenance of software, we define several
techniques, methodologies, and tools within this document. These focus on providing
software developers with more knowledge and tools to help with energy-aware software
development, or Energyware Engineering.
Insights are provided on the energy impact of several stages of
a software's development process. We look at the energy efficiency of various popular
programming languages, identifying which are the most appropriate if a developer's
concern is energy consumption. A detailed study on the energy profiles of different
Java data structures is also presented, along with a technique and tool, further providing
more knowledge on the energy-efficient alternatives a developer can choose from. To
help developers with the lack of tools, we defined and implemented a technique to detect
energy-inefficient fragments within the source code of a software system. This technique
and tool have been shown to help developers improve the energy efficiency of their programs,
even outperforming a runtime profiler. Finally, answers are provided to common questions and misconceptions within
this field of research, such as the relationship between time and energy, and how one can
improve their software's energy consumption.
This thesis represents a substantial effort to support both research and education on
this topic, helps green software continue to grow out of its infancy, and contributes to
solving the lack of knowledge and tools that exist for Energyware Engineering.
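One of the thesis's themes, the relationship between time and energy, rests on the common approximation that energy ≈ average power × execution time, so on the same hardware a faster data structure usually spends less energy. The following Python sketch is only an analogy to the thesis's Java data-structure study, comparing membership tests on a list versus a hash set:

```python
import timeit

items = list(range(10_000))
as_list, as_set = items, set(items)

# Membership in a list is O(n); in a hash set it is O(1) on average.
# Looking up the last element is the list's worst case.
t_list = timeit.timeit(lambda: 9_999 in as_list, number=1_000)
t_set = timeit.timeit(lambda: 9_999 in as_set, number=1_000)

# Under the approximation energy ~ power x time, the structure that
# finishes the same work sooner is usually the greener choice here.
assert t_set < t_list
```

Note the hedge the thesis itself explores: time is a useful proxy for energy but not a perfect one, since different operations can draw different power.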
This work is partially funded by FCT – Foundation for Science and Technology, the
Portuguese Ministry of Science, Technology and Higher Education, through national funds,
and co-financed by the European Social Fund (ESF) through the Operacional Programme for
Human Capital (POCH), with scholarship reference SFRH/BD/112733/2015. Additionally,
funding was also provided by the ERDF – European Regional Development Fund – through
the Operational Programmes for Competitiveness and Internationalisation COMPETE and
COMPETE 2020, and by the Portuguese Government through FCT project Green Software
Lab (ref. POCI-01-0145-FEDER-016718), by the project GreenSSCM - Green Software for
Space Missions Control, a project financed by the Innovation Agency, SA, Northern Regional
Operational Programme, Financial Incentive Grant Agreement under the Incentive Research
and Development System, Project No. 38973, and by the Luso-American Foundation in
collaboration with the National Science Foundation with grant FLAD/NSF ref. 300/2015 and
ref. 275/2016.
Handling Massive N-Gram Datasets Efficiently
This paper deals with the two fundamental problems concerning the handling of
large n-gram language models: indexing, that is, compressing the n-gram strings
and associated satellite data without compromising their retrieval speed; and
estimation, that is, computing the probability distribution of the strings from
a large textual source. Regarding the problem of indexing, we describe
compressed, exact and lossless data structures that achieve, at the same time,
high space reductions and no time degradation with respect to state-of-the-art
solutions and related software packages. In particular, we present a compressed
trie data structure in which each word following a context of fixed length k,
i.e., its preceding k words, is encoded as an integer whose value is
proportional to the number of words that follow such context. Since the number
of words following a given context is typically very small in natural
languages, we lower the space of representation to compression levels that were
never achieved before. Despite the significant savings in space, our technique
introduces a negligible penalty at query time. Regarding the problem of
estimation, we present a novel algorithm for estimating modified Kneser-Ney
language models, which have emerged as the de facto choice for language modeling
in both academia and industry thanks to their relatively low
perplexity. Estimating such models from large textual sources poses the
challenge of devising algorithms that make a parsimonious use of the disk. The
state-of-the-art algorithm uses three sorting steps in external memory: we show
an improved construction that requires only one sorting step thanks to
exploiting the properties of the extracted n-gram strings. With an extensive
experimental analysis performed on billions of n-grams, we show an average
improvement of 4.5X on the total running time of the state-of-the-art approach.
Comment: Published in ACM Transactions on Information Systems (TOIS), February 2019, Article No. 2
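The key encoding idea in the indexing part, that each word following a fixed-length context can be remapped to a small local integer bounded by that context's number of distinct followers, can be sketched in Python. This is a simplified illustration of the idea for bigrams (context length k = 1), not the paper's compressed trie:

```python
from collections import defaultdict
from math import ceil, log2

bigrams = [("the", "cat"), ("the", "dog"), ("a", "cat"),
           ("the", "end"), ("a", "dog")]

# For each context, assign every distinct following word a small
# local id: 0, 1, 2, ... in order of first appearance.
followers = defaultdict(dict)
for context, word in bigrams:
    ids = followers[context]
    ids.setdefault(word, len(ids))

def bits_needed(n):
    # Bits required to distinguish n values (at least 1).
    return max(1, ceil(log2(n)))

# Each local id only needs enough bits for that context's follower
# count, which is typically tiny in natural language -- far fewer
# bits than a global vocabulary id would need.
cost = {ctx: bits_needed(len(ids)) for ctx, ids in followers.items()}
```

A global vocabulary of a million words needs about 20 bits per id; here "a" has two followers and its ids fit in a single bit, which is the intuition behind the space savings the abstract reports.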
Software for supporting large scale data processing for High Throughput Screening
High Throughput Screening is a valuable data generation technique for data-driven knowledge discovery. Because the rate of data generation is so great, it is a challenge to cope with the demands of post-experiment data analysis. This thesis presents three software solutions that I implemented in an attempt to alleviate this problem. The first is K-Screen, a Laboratory Information Management System designed to handle and visualize large High Throughput Screening datasets. K-Screen is being successfully used by the University of Kansas High Throughput Screening Laboratory to better organize and visualize their data. The other two solutions are algorithms designed to accelerate search times for chemical similarity searches using 1-dimensional fingerprints. The first algorithm balances information content in bit strings to find more optimal ordering and segmentation patterns for chemical fingerprints. The second algorithm eliminates redundant pruning calculations for large batch chemical similarity searches and shows a 250% improvement over the fastest current fingerprint search algorithm for large batch queries
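Fingerprint similarity search with pruning can be illustrated with the standard Tanimoto coefficient and its well-known popcount bound: Tanimoto(a, b) ≤ min(|a|, |b|) / max(|a|, |b|), so fingerprints whose bit counts differ too much can be rejected without a full comparison. This Python sketch shows the general technique, not the thesis's specific algorithms:

```python
def tanimoto(a, b):
    # a, b are integer bitmasks standing in for 1-D fingerprints.
    inter = bin(a & b).count("1")
    union = bin(a | b).count("1")
    return inter / union if union else 1.0

def search(query, database, threshold):
    # Assumes non-empty fingerprints (at least one bit set).
    nq = bin(query).count("1")
    hits = []
    for fp in database:
        nf = bin(fp).count("1")
        # Popcount bound: Tanimoto(a, b) <= min(na, nb) / max(na, nb),
        # so candidates failing it are pruned without a comparison.
        if min(nq, nf) / max(nq, nf) < threshold:
            continue
        if tanimoto(query, fp) >= threshold:
            hits.append(fp)
    return hits

hits = search(0b1111, [0b1111, 0b0001, 0b1110], threshold=0.7)
```

In a large batch query, many targets share the same bit count, so the same bound is recomputed repeatedly; avoiding that redundant pruning work is, per the abstract, where the thesis's second algorithm gains its speedup.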
Evaluating SOAP for High Performance Business Applications: Real-Time Trading Systems
Web services, with an emphasis on open standards and flexibility, may provide benefits over existing capital markets integration practices. However, web services must first meet certain technical requirements including performance, security and fault-tolerance. This paper presents an experimental evaluation of SOAP performance using realistic business application message content. To get some indication of whether SOAP is appropriate for high performance capital markets systems, the results are compared with a widely used existing protocol. The study finds that, although SOAP performs relatively poorly, the difference is less than in scientific computing environments. Furthermore, we find that in realistic business applications it is possible for text-based wire formats to have performance comparable to binary ones, and that the text-based nature of XML is not sufficient to explain SOAP's inefficiency. This suggests that further work may enable SOAP to become a viable wire format for high performance business applications.
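The text-versus-binary wire format comparison can be made concrete with a small Python sketch encoding a hypothetical trade message both ways (the field names and formats are illustrative, not the paper's benchmark protocol):

```python
import json
import struct

# A hypothetical trade message such as a trading system might send.
trade = {"symbol": "ABCD", "price": 101.25, "qty": 500}

# Text wire format: self-describing, human-readable, larger.
text = json.dumps(trade).encode("utf-8")

# Binary wire format: fixed layout (4-byte symbol, float64 price,
# uint32 quantity, little-endian), compact but schema-dependent.
binary = struct.pack("<4sdI", b"ABCD", 101.25, 500)

assert len(text) > len(binary)
```

The binary form is several times smaller on the wire, yet once parsing, framing, and network costs are included, end-to-end throughput can be closer than the size gap suggests, which is the direction of the paper's finding.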
The Parma Polyhedra Library: Toward a Complete Set of Numerical Abstractions for the Analysis and Verification of Hardware and Software Systems
Since its inception as a student project in 2001, initially just for the
handling (as the name implies) of convex polyhedra, the Parma Polyhedra Library
has been continuously improved and extended by joining scrupulous research on
the theoretical foundations of (possibly non-convex) numerical abstractions to
a total adherence to the best available practices in software development. Even
though it is still not fully mature and functionally complete, the Parma
Polyhedra Library already offers a combination of functionality, reliability,
usability and performance that is not matched by similar, freely available
libraries. In this paper, we present the main features of the current version
of the library, emphasizing those that distinguish it from other similar
libraries and those that are important for applications in the field of
analysis and verification of hardware and software systems.
Comment: 38 pages, 2 figures, 3 listings, 3 tables
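The kind of numerical abstraction such a library provides can be sketched, in a greatly simplified form, with the interval (box) domain, the simplest relative of the convex polyhedra the library handles. This toy Python class is an illustration of the abstract-domain idea only, not the Parma Polyhedra Library's API:

```python
class Interval:
    """A 1-D box abstraction: every concrete value lies in [lo, hi]."""
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

    def join(self, other):
        # Least upper bound: the smallest interval covering both
        # operands (note the over-approximation of non-convex sets).
        return Interval(min(self.lo, other.lo), max(self.hi, other.hi))

    def add(self, other):
        # Abstract addition: a sound over-approximation of x + y.
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def contains(self, x):
        return self.lo <= x <= self.hi

x = Interval(0, 3).join(Interval(5, 8))   # abstracts {0..3} U {5..8}
y = x.add(Interval(1, 1))                 # abstracts x + 1
```

The join of the two disjoint ranges yields [0, 8], losing the gap; richer domains like the library's (possibly non-convex) polyhedral abstractions exist precisely to trade more precision for more cost in such cases.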