3 research outputs found

    Availability Modeling and Assurance of Map-reduce Computing

    This thesis proposes a new analytical model to evaluate the availability of Map-Reduce computing on a Hadoop platform. Map-Reduce computing is represented by a queueing model in order to trace the flow of Map-Reduce jobs, their arrivals and departures, over the course of computation. The objective of this analytical model is to evaluate the probability that a Map-Reduce computation is available at an instant of time, referred to as availability. The set of variables taken into account in this model includes the number of Map-Reduce jobs and the number of servers engaged (referred to as worker nodes in this thesis), along with a few constants such as the job arrival/completion rates and the worker-node failure/repair rates. The proposed model provides a comprehensive yet fundamental basis to assure and ultimately optimize the design of Map-Reduce computing in terms of availability and performance simultaneously. Parametric simulations have been conducted and have demonstrated the efficacy of the proposed model in assessing the availability, and the cost of achieving that availability, with respect to throughput as well as turnaround time.
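The abstract above does not give the model's equations, but the flavor of such an availability analysis can be sketched with a standard two-state (failure/repair) Markov model per worker node. All function names and the independence assumption below are illustrative, not the thesis's actual model:

```python
from math import comb

def node_availability(fail_rate, repair_rate):
    # Steady-state availability of a single worker node modeled as a
    # two-state Markov chain: up -> down at fail_rate, down -> up at repair_rate.
    return repair_rate / (fail_rate + repair_rate)

def system_availability(n_nodes, k_required, fail_rate, repair_rate):
    # Probability that at least k_required of n_nodes are up at steady state,
    # assuming nodes fail and are repaired independently (a simplification).
    a = node_availability(fail_rate, repair_rate)
    return sum(comb(n_nodes, j) * a**j * (1 - a)**(n_nodes - j)
               for j in range(k_required, n_nodes + 1))
```

For example, with a per-node failure rate of 0.01 and repair rate of 0.5, requiring 8 of 10 worker nodes yields an availability above 0.99; the thesis's queueing model additionally couples this with job arrivals and departures, which the independent-node sketch here omits.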

    Performance and Reliability Evaluation of Apache Kafka Messaging System

    Streaming data is now flowing across various devices and applications around us. This type of data means any unbounded, ever-growing, infinite data set which is continuously generated by all kinds of sources. Examples include sensor data transmitted among different Internet of Things (IoT) devices, user activity records collected on websites, and payment requests sent from mobile devices. In many application scenarios, streaming data needs to be processed in real time because its value can diminish over time. A variety of stream processing systems have been developed in the last decade and are evolving to address rising challenges. A typical stream processing system consists of multiple processing nodes in the topology of a DAG (directed acyclic graph). To build real-time streaming data pipelines across those nodes, message middleware technology is widely applied. As a distributed messaging system with high durability and scalability, Apache Kafka has become very popular among modern companies. It ingests streaming data from upstream applications and stores the data in its distributed cluster, which provides a fault-tolerant data source for stream processors. Therefore, Kafka plays a critical role in ensuring the completeness, correctness and timeliness of streaming data delivery. However, it is impossible to meet all the user requirements in real-time cases with a simple and fixed data delivery strategy. In this thesis, we address the challenge of choosing a proper configuration to guarantee both the performance and the reliability of Kafka for complex streaming application scenarios. We investigate the features that have an impact on the performance and reliability metrics. We propose a queueing-based prediction model to predict the performance metrics of Kafka, including producer throughput and packet latency. We define two reliability metrics: the probability of message loss and the probability of message duplication.
We create an ANN model to predict these metrics given unstable network conditions such as network delay and packet loss rate. To collect sufficient training data, we build a Docker-based Kafka testbed with a fault injection module. We use a new quality-of-service metric, timely throughput, to help choose a proper batch size in Kafka. Based on this metric, we propose a dynamic configuration method which reactively guarantees both the performance and the reliability of Kafka under complex operating conditions.
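The abstract does not spell out the queueing model or the timely-throughput metric, but the general idea can be sketched as follows. The M/M/1 latency formula, the deadline-based timely-throughput definition, and the reactive batch-size rule are illustrative assumptions, not the thesis's actual formulas:

```python
def mm1_latency(arrival_rate, service_rate):
    # Mean sojourn time in an M/M/1 queue: a common first-order stand-in
    # for predicting per-message latency; requires arrival_rate < service_rate.
    if arrival_rate >= service_rate:
        raise ValueError("queue is unstable: arrival rate >= service rate")
    return 1.0 / (service_rate - arrival_rate)

def timely_throughput(latencies, deadline):
    # Fraction of messages delivered within the deadline (one plausible
    # reading of a "timely throughput" quality-of-service metric).
    return sum(1 for t in latencies if t <= deadline) / len(latencies)

def adjust_batch_size(batch, measured_tt, target_tt, step=16):
    # Hypothetical reactive rule: shrink the producer batch when timely
    # throughput falls below target (to cut latency), grow it otherwise
    # (to raise raw throughput).
    if measured_tt < target_tt:
        return max(1, batch - step)
    return batch + step
```

A dynamic configuration loop along these lines would periodically measure timely throughput against a deadline and feed the result to `adjust_batch_size`; the thesis's actual method additionally uses an ANN trained on fault-injected testbed data, which this sketch omits.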

    Information System of an Analytical Department Using Cloud Technologies

    The work is published in accordance with the Rector's order No. 580/од of 29.12.2020 "On placing higher education qualification works in the NAU repository". Project supervisor: Cand. Sc. (Engineering), Associate Professor Stanislava Oleksiivna Kudrenko. Today, no industry can do without the implementation of automation and decision-support systems. Business Intelligence (BI) is a collection of software applications, techniques and business systems that play a key role in the business processes of any corporation. Most companies generate huge amounts of data in the course of their business. To provide access to this data to all departments of the company, a wide arsenal of applications and DBMSs is often used.