BestConfig: Tapping the Performance Potential of Systems via Automatic Configuration Tuning
An ever increasing number of configuration parameters are provided to system
users. But many users have used one configuration setting across different
workloads, leaving untapped the performance potential of systems. A good
configuration setting can greatly improve the performance of a deployed system
under certain workloads. But with tens or hundreds of parameters, it becomes a
highly costly task to decide which configuration setting leads to the best
performance. While such a task requires strong expertise in both the system
and the application, users commonly lack such expertise.
To help users tap the performance potential of systems, we present
BestConfig, a system for automatically finding a best configuration setting
within a resource limit for a deployed system under a given application
workload. BestConfig is designed with an extensible architecture to automate
the configuration tuning for general systems. To tune system configurations
within a resource limit, we propose the divide-and-diverge sampling method and
the recursive bound-and-search algorithm. BestConfig can improve the throughput
of Tomcat by 75%, that of Cassandra by 63%, that of MySQL by 430%, and reduce
the running time of a Hive join job by about 50% and that of a Spark join job by
about 80%, solely by configuration adjustment.
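The abstract names divide-and-diverge sampling without describing it. The following is a minimal sketch of one plausible reading, assuming it behaves like Latin-hypercube sampling: each parameter's range is divided into k intervals, and the k samples are spread so that every interval of every parameter is used exactly once. The function name `divide_and_diverge`, the `bounds` layout, and the parameter names in the usage note are hypothetical; this is not the authors' implementation, and the recursive bound-and-search step is not shown.

```python
import random


def divide_and_diverge(bounds, k, seed=None):
    """Draw k configurations from numeric parameter ranges.

    bounds maps a parameter name to a (low, high) pair. Each range is
    divided into k equal intervals, and each interval is used by exactly
    one of the k returned samples (Latin-hypercube-style assumption).
    """
    rng = random.Random(seed)
    # One independent shuffle of interval indices per parameter, so the
    # combinations of intervals diverge across parameters.
    order = {name: rng.sample(range(k), k) for name in bounds}
    samples = []
    for i in range(k):
        point = {}
        for name, (low, high) in bounds.items():
            width = (high - low) / k
            interval = order[name][i]
            # Pick a random position inside the chosen interval.
            point[name] = low + (interval + rng.random()) * width
        samples.append(point)
    return samples
```

For example, `divide_and_diverge({"max_threads": (0.0, 8.0), "cache_mb": (0.0, 16.0)}, k=4, seed=1)` returns four configurations whose `max_threads` values fall in four different quarters of the range, and likewise for `cache_mb`. Covering every interval with only k tests is what keeps the sampling within a fixed resource limit.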
GMonE: a complete approach to cloud monitoring
The inherent complexity of modern cloud infrastructures has created the need for innovative monitoring approaches, as state-of-the-art solutions used for other large-scale
environments do not address specific cloud features. Although cloud monitoring is nowadays an active research field, a comprehensive study covering all its aspects has
not been presented yet. This paper provides a deep insight into cloud monitoring. It proposes a unified cloud monitoring taxonomy, based on which it defines a layered
cloud monitoring architecture. To illustrate it, we have implemented GMonE, a general-purpose cloud monitoring tool which covers all aspects of cloud monitoring by specifically addressing the needs of modern cloud infrastructures. Furthermore, we have evaluated the performance, scalability and overhead of GMonE with the Yahoo
Cloud Serving Benchmark (YCSB), using the OpenNebula cloud middleware on the Grid’5000 experimental testbed. The results of this evaluation demonstrate the benefits of our approach, which surpasses the monitoring performance and capabilities of the alternatives present in state-of-the-art systems such as Amazon EC2 and OpenNebula.
Which NoSQL Database? A Performance Overview
NoSQL data stores are widely used to store and retrieve possibly large amounts of data, typically in a key-value format. There are many types of NoSQL database with different performance characteristics, and thus it is important to compare them and verify how performance is related to the database type. In this paper, we evaluate five of the most popular NoSQL databases: Cassandra, HBase, MongoDB, OrientDB and Redis. We compare those databases in terms of query performance, based on reads and updates, taking into consideration the typical workloads, as represented by the Yahoo! Cloud Serving Benchmark. This comparison allows users to choose the most appropriate database according to the specific mechanisms and application needs.
Development of a centralized log management system
Logs are a crucial piece of any system and give a helpful insight into what it is
doing as well as what happened in case of failure. Every process running on a system
generates logs in some format. Generally, these logs are written to local storage
resources. As systems evolved, the number of logs to analyze increased, and, as a
consequence of this growth, the need arose for a standardized log format,
minimizing dependencies and making the analysis process easier.
ams is a company that develops and creates sensor solutions. With twenty-two
design centers and three manufacturing locations, the company serves over eight
thousand clients worldwide. One design center is located in Funchal and includes a
team of application engineers who design and develop software applications for clients
inside the company. The application engineers' development process involves
several applications and programs, each having its own logging system.
Log entries generated by different applications are kept in separate storage
systems. If a developer or administrator wants to troubleshoot an issue that spans
several applications, they have to go to different database systems or locations
to collect the logs and correlate them across the several requests. This is a tiresome
process, and if the environment is auto-scaled, troubleshooting such an issue becomes
practically impossible.
This project aimed to solve these problems by creating a Centralized Log
Management System capable of handling logs from a variety of sources and of
providing services that help developers and administrators better understand
the different affected environments.
The deployed solution was developed using a set of different open-source
technologies, such as the Elastic Stack (Elasticsearch, Logstash and Kibana), Node.js,
GraphQL and Cassandra.
The present document describes the process and the decisions taken to arrive at the
solution.
Evaluating Riak Key Value Cluster for Big Data
NoSQL databases have become an important alternative to traditional relational databases. They are designed for the management of large, continuously and variably changing data sets, and are widely used in cloud databases and distributed systems. With NoSQL databases, static schemas and many other restrictions are avoided. In the era of big data, such databases provide scalable, highly available solutions. Their key-value model allows fast retrieval of data and the ability to store large volumes of it. There are many kinds of NoSQL databases with varying performance. Therefore, comparing those different types of databases in terms of performance and verifying the relationship between performance and database type has become very important. In this paper, we test and evaluate the Riak key-value database for big data clusters using benchmark tools, where huge amounts of data of different sizes are stored and retrieved in a distributed database environment. Execution times of the NoSQL database over different types of workloads and different sizes of data are compared. The results show that the Riak key-value database is stable in execution time for both small and large amounts of data, and that throughput increases as the number of threads increases.
Interplaying Cassandra NoSQL Consistency and Performance: A Benchmarking Approach
This experience report analyses the performance of the Cassandra NoSQL database and studies the fundamental trade-off between data consistency and delays in distributed data stores. The primary focus is on investigating the interplay between Cassandra performance (response time) and its consistency settings. The paper reports the results of read and write performance benchmarking for a replicated Cassandra cluster deployed in the Amazon EC2 Cloud. We present quantitative results showing how different consistency settings affect Cassandra performance under different workloads. One of our main findings is that it is possible to minimize Cassandra delays and still guarantee strong data consistency by optimal coordination of consistency settings for both read and write requests. Our experiments show that (i) strong consistency costs up to 25% of performance and (ii) the best setting for strong consistency depends on the ratio of read and write operations. Finally, we generalize our experience by proposing a benchmarking-based methodology for run-time optimization of consistency settings to achieve the maximum Cassandra performance and still guarantee strong data consistency under mixed workloads.
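The coordination of read and write consistency settings mentioned in this abstract can be illustrated with the commonly documented replica-overlap rule: reads see the latest acknowledged write when the replicas contacted on read plus the replicas acknowledging a write exceed the replication factor (R + W > RF). The sketch below assumes that rule; the abstract does not spell out its own method. `ONE`, `QUORUM` and `ALL` are Cassandra consistency levels, while the helper names `required_acks` and `strongly_consistent_pairs` are hypothetical.

```python
from itertools import product


def required_acks(level, rf):
    """Number of replicas that must respond for a consistency level.

    rf is the replication factor; QUORUM is a majority of the replicas.
    """
    return {"ONE": 1, "QUORUM": rf // 2 + 1, "ALL": rf}[level]


def strongly_consistent_pairs(rf, levels=("ONE", "QUORUM", "ALL")):
    """List (read_level, write_level) pairs whose replica sets must overlap.

    A pair qualifies when read acks + write acks > rf, so every read
    touches at least one replica that holds the latest acknowledged write.
    """
    return [
        (read, write)
        for read, write in product(levels, repeat=2)
        if required_acks(read, rf) + required_acks(write, rf) > rf
    ]
```

With a replication factor of 3, `strongly_consistent_pairs(3)` includes `("ONE", "ALL")`, `("QUORUM", "QUORUM")` and `("ALL", "ONE")`. Which of these performs best plausibly depends on the workload: a read-heavy mix favours cheap reads with expensive writes, and a write-heavy mix the reverse, which is consistent with finding (ii) in the abstract.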
Benchmarking Scalability of NoSQL Databases for Geospatial Queries
NoSQL databases provide an edge when it comes to dealing with big unstructured data. The flexibility, agility, and scalability offered by NoSQL databases become increasingly essential when dealing with geospatial data. The proliferation of geospatial applications has tremendously increased the variety, velocity, and volume of data that data stores must manage. Such characteristics of big spatial data surpass the capability and anticipated use cases of relational databases. Because we can choose from an extensive collection of NoSQL databases these days, it becomes vital for organizations to make an informed decision. NoSQL database benchmarks provide system architects, who shoulder a considerable burden of selecting the right technology for their data stores, with a vital starting point and source of information. The major utility of these benchmarks is reproducing experiments on similar experimental data that can verify and optimize the process of selecting an optimum tool for data management needs in the early phases of development. The goal of this research is to develop a benchmark that can compare the performance of NoSQL databases for querying complex geospatial data. We have analyzed throughputs, latencies, and runtime of MongoDB and Couchbase to identify the correct fit for our use case. In this way we have also demonstrated a systematic process that can be followed to make an optimum choice of datastore. This benchmark can be extended easily to any NoSQL database that supports geospatial querying.