
    Don't Repeat Yourself: Seamless Execution and Analysis of Extensive Network Experiments

    This paper presents MACI, the first bespoke framework for managing, scalably executing, and interactively analyzing large numbers of network experiments. Driven by the desire to avoid repeatedly implementing the same few scripts for executing and analyzing experiments, MACI emerged as a generic framework for network experiments that significantly increases efficiency and ensures reproducibility. To this end, MACI incorporates and integrates established simulators and analysis tools to foster rapid but systematic network experimentation. We found MACI indispensable in all phases of the research and development process of various communication systems, such as i) an extensive DASH video streaming study, ii) the systematic development and improvement of Multipath TCP schedulers, and iii) research on a distributed topology graph pattern matching algorithm. With this work, we make MACI publicly available to the research community to advance efficient and reproducible network experiments.
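    The abstract does not describe MACI's interface, so the following is only a generic Python sketch of the kind of repetition MACI is meant to remove: expanding a parameter space into its Cartesian product, running every configuration several times, and collecting uniform result rows. The function names (`expand_sweep`, `run_all`), the parameter names, and the `fake_experiment` stand-in for a simulator are all hypothetical and are not MACI's API.

```python
import itertools
from typing import Any, Callable, Dict, Iterator, List

def expand_sweep(parameters: Dict[str, List[Any]]) -> Iterator[Dict[str, Any]]:
    """Yield one configuration per point of the Cartesian product of parameter values."""
    names = sorted(parameters)  # fixed key order keeps run IDs reproducible
    for values in itertools.product(*(parameters[name] for name in names)):
        yield dict(zip(names, values))

def run_all(parameters: Dict[str, List[Any]],
            run_experiment: Callable[..., Dict[str, Any]],
            repetitions: int = 1) -> List[Dict[str, Any]]:
    """Run each configuration `repetitions` times and collect flat result rows."""
    rows = []
    for run_id, config in enumerate(expand_sweep(parameters)):
        for rep in range(repetitions):
            # The repetition index doubles as a seed so reruns are repeatable.
            result = run_experiment(seed=rep, **config)
            rows.append({"run_id": run_id, "repetition": rep, **config, **result})
    return rows

# Hypothetical parameter space and a stand-in for a real simulator run.
space = {"scheduler": ["minrtt", "roundrobin"], "loss_pct": [0, 1, 5]}

def fake_experiment(seed: int, scheduler: str, loss_pct: int) -> Dict[str, float]:
    penalty = 2 if scheduler == "roundrobin" else 1
    return {"goodput_mbps": 100.0 - loss_pct * penalty}

rows = run_all(space, fake_experiment, repetitions=2)  # 6 configs x 2 reps = 12 rows
```

Because every row carries its full configuration, the result list can be loaded directly into a dataframe or database for interactive analysis.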

    On the performance of SQL scalable systems on Kubernetes: a comparative study

    The popularization of Hadoop as the de facto standard platform for data analytics in the context of Big Data applications has led to the upsurge of SQL-on-Hadoop systems, which provide scalable query execution engines that allow SQL queries to be run on data stored in HDFS. In this context, Kubernetes has emerged as the leading choice for simplifying the deployment and scaling of containerized applications; however, there is a lack of studies on the performance of SQL-on-Hadoop systems deployed on Kubernetes, and this is the gap we intend to fill in this paper. We present an experimental study involving four representative scalable SQL platforms: Apache Drill, Apache Hive, Apache Spark SQL and Trino. Concretely, we use the TPC-H benchmark to analyze the performance of these systems when they are deployed on a Hadoop cluster with Kubernetes. The results of our study can help practitioners and users understand what performance to expect if they plan to take advantage of Kubernetes to deploy applications using the analyzed scalable SQL platforms.
    Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature. Funding for open access charge: Universidad de Málaga / CBUA. This work has been partially funded by the Spanish Ministry of Science and Innovation via Grant PID2020-112540RB-C41 (AEI/FEDER, UE), by the Andalusian PAIDI program with grant P18-RT-2799, and by the project “Evolución y desarrollo de la plataforma DOP de Big Data” (702C2000044) under the Andalusian “Programa de Apoyo a la I+D+i Empresarial”.
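    The abstract does not state how per-query TPC-H timings were aggregated. One common convention, assumed here, is to take the median over repeated runs of each query and summarize each engine with the geometric mean of those medians, similar in spirit to the geometric mean used in TPC-H's own Power metric. The engine names and timings below are made up and are not results from the study.

```python
import math
from statistics import median
from typing import Dict, List

def summarize_runs(runs: Dict[str, List[float]]) -> Dict[str, float]:
    """Median runtime (seconds) per query over repeated executions."""
    return {query: median(times) for query, times in runs.items()}

def geometric_mean(values: List[float]) -> float:
    """Geometric mean; keeps one very long query from dominating the summary."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("runtimes must be positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))

def compare_engines(results: Dict[str, Dict[str, List[float]]]) -> Dict[str, float]:
    """Collapse each engine's per-query medians into one geometric-mean score."""
    return {
        engine: geometric_mean(list(summarize_runs(runs).values()))
        for engine, runs in results.items()
    }

# Hypothetical timings in seconds: three repetitions of two TPC-H queries.
timings = {
    "engine_a": {"Q1": [4.0, 4.2, 3.9], "Q6": [1.0, 1.1, 0.9]},
    "engine_b": {"Q1": [8.0, 7.5, 8.5], "Q6": [0.5, 0.6, 0.4]},
}
scores = compare_engines(timings)  # both engines score about 2.0
```

The tie is the point of the design choice: engine_a is twice as fast on Q1 and engine_b twice as fast on Q6, and the geometric mean weighs those relative differences equally instead of letting the longer query dominate, as an arithmetic mean of seconds would.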

    Business intelligence-centered software as the main driver to migrate from spreadsheet-based analytics

    Internship Report presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business Intelligence.
    Nowadays, companies handle and manage data in ways they did not ten years ago. As a consequence, the data deluge is a constant, day-to-day challenge for them, requiring agile and scalable data solutions to tackle this reality. The main trigger of this project was to support the decision-making process of a customer-centered marketing team (called Customer Voice) at Company X by developing a complete, holistic Business Intelligence solution that goes all the way from ETL processes to data visualizations based on that team's business needs. With this context in mind, the focus of the internship was to use BI and ETL techniques to migrate the team's data out of the spreadsheets in which they performed their data analysis, and to shift the way they see the data toward a more dynamic, sophisticated, and suitable approach that helps them make data-driven strategic decisions. To ensure credibility throughout the development of this project and its resulting solution, an exhaustive literature review was necessary to help me frame the project in a more realistic and logical way. Accordingly, this report draws on scientific literature that explains the evolution of ETL workflows, tools, and limitations across different time periods and generations; how ETL was transformed from manual to real-time data tasks together with data warehouses; the importance of data quality; and, finally, the relevance of ETL process optimization and of new approaches to data integration using modern cloud architectures.
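    The spreadsheet-to-BI migration described above follows the classic extract, transform, load pattern. Below is a minimal sketch using only the Python standard library, assuming the spreadsheet has been exported to CSV and an in-memory SQLite database stands in for the data warehouse. The column names (`channel`, `score`), the cleaning rules, and the sample data are hypothetical, not taken from the report.

```python
import csv
import io
import sqlite3

def extract(csv_text: str):
    """Extract: read rows exported from a spreadsheet (CSV) as dicts."""
    return list(csv.DictReader(io.StringIO(csv_text)))

def transform(rows):
    """Transform: trim and lowercase text, cast scores, drop incomplete rows."""
    clean = []
    for row in rows:
        channel = (row.get("channel") or "").strip().lower()
        score = (row.get("score") or "").strip()
        if not channel or not score:
            continue  # reject rows missing a required field
        clean.append((channel, int(score)))
    return clean

def load(conn: sqlite3.Connection, records):
    """Load: write cleaned records into a table that a BI tool can query."""
    conn.execute("CREATE TABLE IF NOT EXISTS feedback (channel TEXT, score INTEGER)")
    conn.executemany("INSERT INTO feedback VALUES (?, ?)", records)
    conn.commit()

# Hypothetical spreadsheet export with inconsistent casing and a blank cell.
sample = "channel,score\n Email ,4\nPhone,5\nemail,2\n,3\n"
conn = sqlite3.connect(":memory:")
load(conn, transform(extract(sample)))
summary = conn.execute(
    "SELECT channel, AVG(score) FROM feedback GROUP BY channel ORDER BY channel"
).fetchall()  # [("email", 3.0), ("phone", 5.0)]
```

Normalizing values in the transform step is what lets " Email " and "email" land in the same group, a kind of inconsistency that spreadsheet-based analysis tends to hide.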

    Open Source Platforms for Big Data Analytics

    The concept of Big Data has had a great impact on the field of technology, particularly on the management and analysis of huge volumes of information. Nowadays, organizations see Big Data as an opportunity to manage and exploit their data as fully as possible in order to support decisions across their different operational areas. It is therefore necessary to analyze several concepts related to Big Data and Big Data Analytics, including definitions, features, advantages and disadvantages. Business intelligence (BI) tools, together with knowledge generation, are fundamental to decision-making and to the transformation of information. By investigating today's Big Data platforms, current industrial practices and related research trends, it is possible to understand the impact of Big Data Analytics on small organizations. This research proposes solutions for micro, small and medium enterprises (SMEs), which have a great impact on the Portuguese economy since they represent approximately 90% of the companies in Portugal. Open source platforms for Big Data Analytics offer SMEs a great opportunity for innovation. This research work presents a comparative analysis of the features and functionalities of these platforms, together with the steps to be taken for a deeper comparative analysis.
    After the comparative analysis, we present an evaluation and selection of Big Data Analytics (BDA) platforms using an adapted version of the Qualification and Selection of Open Source software (QSOS) method. This evaluation and selection resulted in two platforms being chosen for the empirical experiments and tests. The same testbed and dataset, with the same hardware and software configuration, were used on both open source BDA platforms. In the comparison, the HPCC Systems Platform proved more efficient and reliable than the Hortonworks Data Platform. In particular, Portuguese SMEs should consider BDA platforms an opportunity to gain competitive advantage, improve their processes and, consequently, define an IT and business strategy. Finally, this is a research work on Big Data that is hoped to serve as an invitation and motivation for new research.
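    The abstract says only that QSOS was adapted, not how. As a rough sketch, assuming the QSOS convention of scoring each criterion on a 0 to 2 scale plus user-defined criterion weights, candidate platforms can be ranked by a weighted average score. The criterion names, weights, platform names, and scores below are invented for illustration.

```python
from typing import Dict, List, Tuple

ALLOWED_SCORES = {0, 1, 2}  # assumed QSOS-style scale: 0 = poor/absent, 2 = fully satisfied

def weighted_score(scores: Dict[str, int], weights: Dict[str, float]) -> float:
    """Weighted average of criterion scores; criteria without a score count as 0."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    for criterion, score in scores.items():
        if score not in ALLOWED_SCORES:
            raise ValueError(f"{criterion}: score {score} is outside {sorted(ALLOWED_SCORES)}")
    return sum(w * scores.get(c, 0) for c, w in weights.items()) / total_weight

def rank(platforms: Dict[str, Dict[str, int]],
         weights: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (platform, weighted score) pairs, best first."""
    scored = [(name, weighted_score(s, weights)) for name, s in platforms.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical weights and scores, for illustration only.
weights = {"scalability": 3, "documentation": 1, "community": 2}
platforms = {
    "platform_x": {"scalability": 2, "documentation": 1, "community": 1},
    "platform_y": {"scalability": 1, "documentation": 2, "community": 1},
    "platform_z": {"scalability": 0, "documentation": 2, "community": 2},
}
ranking = rank(platforms, weights)  # platform_x (1.5), platform_y (~1.17), platform_z (1.0)
```

Keeping weights separate from scores mirrors the idea that the same scored evaluation can be reused by organizations with different priorities; only the weights change.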

    A systems thinking approach to business intelligence solutions based on cloud computing

    Thesis (S.M. in System Design and Management), Massachusetts Institute of Technology, Engineering Systems Division, 2010. Cataloged from PDF version of thesis. Includes bibliographical references (p. 73-74).
    Business intelligence is the set of tools, processes, practices and people used to take advantage of information to support decision making in organizations. Cloud computing is a new paradigm for offering computing resources that are available on demand, are scalable, and are charged according to the time they are used. Organizations can save large amounts of money and effort using this approach. This document identifies the main challenges companies encounter when building business intelligence applications in the cloud, such as security, availability, performance, integration, regulatory issues, and network bandwidth constraints. All these challenges are addressed with a systems thinking approach, and several solutions are offered that can be applied according to each organization's needs. An evaluation of the main cloud computing vendors is presented so that business intelligence developers can identify the tools and companies they can rely on to migrate or build applications in the cloud. It is demonstrated how business intelligence applications can increase their availability with a cloud computing approach, by decreasing the mean time to recovery (handled by the cloud service provider) and increasing the mean time to failure (achieved by introducing more hardware redundancy). Innovative mechanisms for improving cloud applications are discussed, such as private, public and hybrid clouds, column-oriented databases, in-memory databases and the Data Warehouse 2.0 architecture. Finally, it is shown how project management for a business intelligence application can be facilitated with a cloud computing approach.
    Design structure matrices are dramatically simplified by avoiding unnecessary iterations while sizing, validating, and testing hardware and software resources.
    by Eumir P. Reyes.
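    The availability argument rests on the standard steady-state relation A = MTTF / (MTTF + MTTR): availability rises when the mean time to recovery falls or the mean time to failure grows. A small Python sketch with hypothetical figures (not taken from the thesis) shows how both levers move availability and yearly downtime.

```python
HOURS_PER_YEAR = 365 * 24

def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the fraction of time the service is up."""
    if mttf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTTF must be positive and MTTR non-negative")
    return mttf_hours / (mttf_hours + mttr_hours)

def downtime_hours_per_year(a: float) -> float:
    """Expected yearly downtime implied by availability `a`."""
    return (1.0 - a) * HOURS_PER_YEAR

# Hypothetical figures: an in-house server versus a cloud deployment with more
# hardware redundancy (higher MTTF) and provider-handled recovery (lower MTTR).
on_premise = availability(mttf_hours=1000, mttr_hours=24)  # 0.9765625
cloud = availability(mttf_hours=2000, mttr_hours=1)        # about 0.9995
```

With these made-up numbers, yearly downtime drops from about 205 hours to under 5 hours, and most of the gain comes from shortening recovery rather than from doubling MTTF.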


    Management support systems (MSS) help managers perform their jobs more efficiently. In-memory technology is a new IT enabler that promises to support managers with benefits ranging from less time spent on MSS data entry and analysis to opening up entirely new topics of analysis. Hence, the present situation is favorable for an MSS redesign using in-memory apps. Such apps are field-tested and ready to use, but from a business perspective they lack impact. Based on findings from a literature review and on results from a workshop with an expert focus group, validated through one-on-one manager interviews, we propose four initial use situations in which in-memory apps contribute to greater MSS acceptance: (1) In-memory apps should accelerate MSS response time both for checking status and for receiving alerts; in doing so, they should focus on information from management accounting. (2) By delivering information in a more timely manner, in-memory apps should contribute to MSS standard reports and financial closing. (3) In-memory apps should accelerate MSS response time for both ad-hoc analysis and drill-down/drill-through analysis. (4) By leveraging in-memory apps, MSS ad-hoc analysis and drill-down/drill-through analysis should become more flexible.
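    To make drill-down and drill-through concrete: starting from a total aggregated at one level of a dimension hierarchy, the user selects a member and the same measure is re-aggregated one level deeper (drill-down), or the detail records behind one aggregated figure are shown (drill-through). The sketch below does this over plain Python lists held in memory. The hierarchy (`region`, `cost_center`) and the figures are hypothetical, and the code illustrates only the analysis pattern, not any particular in-memory database or app studied in the paper.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, object]

def aggregate(rows: List[Row], dims: Sequence[str], measure: str) -> Dict[Tuple, float]:
    """Sum `measure` grouped by the given dimension columns."""
    totals: Dict[Tuple, float] = defaultdict(float)
    for row in rows:
        totals[tuple(row[d] for d in dims)] += float(row[measure])
    return dict(totals)

def drill_through(rows: List[Row], hierarchy: Sequence[str], path: Tuple) -> List[Row]:
    """Return the detail records matching the selected members along the hierarchy."""
    return [r for r in rows
            if all(r[hierarchy[i]] == value for i, value in enumerate(path))]

def drill_down(rows: List[Row], hierarchy: Sequence[str], measure: str,
               path: Tuple = ()) -> Dict[Tuple, float]:
    """Keep rows matching `path`, then aggregate one level deeper in the hierarchy."""
    if len(path) >= len(hierarchy):
        raise ValueError("already at the most detailed level")
    selected = drill_through(rows, hierarchy, path)
    return aggregate(selected, hierarchy[: len(path) + 1], measure)

# Hypothetical management-accounting records held in memory.
costs = [
    {"region": "North", "cost_center": "CC1", "amount": 100},
    {"region": "North", "cost_center": "CC2", "amount": 50},
    {"region": "South", "cost_center": "CC3", "amount": 70},
]
levels = ["region", "cost_center"]
by_region = drill_down(costs, levels, "amount")               # {("North",): 150.0, ("South",): 70.0}
north = drill_down(costs, levels, "amount", path=("North",))  # {("North", "CC1"): 100.0, ("North", "CC2"): 50.0}
```

Each drill step here rescans the raw records instead of relying on precomputed aggregates; keeping the data in memory is what makes that kind of on-the-fly re-aggregation fast enough for interactive use.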