2,246 research outputs found

    Challenges in managing real-time data in health information system (HIS)

    Get PDF
    © Springer International Publishing Switzerland 2016. In this paper, we discuss the challenges in handling real-time medical big data collection and storage in a health information system (HIS). Based on these challenges, we propose a model for real-time analysis of medical big data and exemplify the approach with Spark Streaming and Apache Kafka applied to the processing of a health big data stream. Apache Kafka works very well for transporting data among different systems such as relational databases, Apache Hadoop and non-relational databases. However, Apache Kafka cannot analyze the stream itself, whereas the Spark Streaming framework can perform operations on the stream. We have identified the challenges in current real-time systems and propose our solution for coping with medical big data streams
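
    A minimal sketch of the division of labour described above, not the authors' implementation: Kafka transports the raw readings and Spark does the analysis. The broker address, topic name ("vitals") and payload schema are illustrative assumptions, and the spark-sql-kafka connector is assumed to be on the classpath.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("his-stream-sketch").getOrCreate()

# Hypothetical schema of the JSON payload produced by medical devices.
schema = StructType([
    StructField("patient_id", StringType()),
    StructField("heart_rate", DoubleType()),
    StructField("ts", TimestampType()),
])

# Kafka acts purely as the transport layer (assumed broker and topic).
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "localhost:9092")
       .option("subscribe", "vitals")
       .load())

readings = (raw.selectExpr("CAST(value AS STRING) AS json")
            .select(from_json(col("json"), schema).alias("r"))
            .select("r.*"))

# The analysis Kafka does not do: e.g. average heart rate per patient.
avg_hr = readings.groupBy("patient_id").avg("heart_rate")

query = (avg_hr.writeStream
         .outputMode("complete")
         .format("console")
         .start())
query.awaitTermination()
```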

    On Efficiently Partitioning a Topic in Apache Kafka

    Full text link
    Apache Kafka addresses the general problem of delivering extremely high-volume event data to diverse consumers via a publish-subscribe messaging system. It uses partitions to scale a topic across many brokers, both so that producers can write data in parallel and to facilitate parallel reading by consumers. Even though Apache Kafka provides some out-of-the-box optimizations, it does not strictly define how each topic shall be efficiently distributed into partitions. The well-formulated fine-tuning needed to improve the performance of an Apache Kafka cluster is still an open research problem. In this paper, we first model the Apache Kafka topic partitioning process for a given topic. Then, given the set of brokers, constraints and application requirements on throughput, OS load, replication latency and unavailability, we formulate the optimization problem of finding how many partitions are needed and show that it is computationally intractable, being an integer program. Furthermore, we propose two simple yet efficient heuristics to solve the problem: the first tries to minimize and the second to maximize the number of brokers used in the cluster. Finally, we evaluate their performance via large-scale simulations, considering as benchmarks some Apache Kafka cluster configuration recommendations provided by Microsoft and Confluent. We demonstrate that, unlike the recommendations, the proposed heuristics respect the hard constraints on replication latency and perform better w.r.t. unavailability time and OS load, using the system resources in a more prudent way. Comment: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible. This work was funded by the European Union's Horizon 2020 research and innovation programme MARVEL under grant agreement No 95733
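
    A hedged sketch of the flavour of heuristic the abstract describes, not the authors' formulation: the constants, constraint forms and per-partition throughput figure are illustrative assumptions. It picks the smallest partition count that meets a throughput target while respecting a per-broker partition cap standing in for the replication-latency and availability constraints.

```python
import math

def choose_partitions(target_mb_s: float,
                      per_partition_mb_s: float,
                      brokers: int,
                      max_partitions_per_broker: int) -> int:
    """Return a partition count, or raise if no feasible count exists."""
    # Throughput constraint: enough partitions to absorb the producer rate.
    needed = max(math.ceil(target_mb_s / per_partition_mb_s), 1)
    # Latency/availability proxy: stay within the cluster-wide partition budget.
    budget = brokers * max_partitions_per_broker
    if needed > budget:
        raise ValueError(f"infeasible: need {needed} partitions, budget is {budget}")
    return needed

# Example (illustrative numbers): 300 MB/s target, ~25 MB/s per partition,
# 6 brokers, at most 100 partitions per broker.
print(choose_partitions(300, 25, 6, 100))  # -> 12
```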

    Supporting Massive Mobility with stream processing software

    Get PDF
    The goal of this project is to design a solution for massive mobility using the LISP protocol and scalable systems like Apache Kafka. The project consists of three steps: first, understanding the requirements of the massive mobility scenario; second, designing a solution based on stream processing software that integrates with OOR (an open-source LISP implementation); and third, building a prototype with OOR and a stream processing software (or a similar technology) and evaluating its performance. Our objectives are: understand the requirements of an environment for massive mobility; learn and evaluate the architecture of Apache Kafka and similar message brokers to see if these tools can satisfy the requirements; propose an architecture for massive mobility using the LISP protocol and Kafka as the mapping system; and finally, evaluate the performance of Apache Kafka in such an architecture. In chapters 3 and 4 we provide a summary of the LISP protocol, Apache Kafka and other message brokers. In these chapters we describe the components of these tools and how we can use them to achieve our objective. We evaluate the different mechanisms for 1) authenticating users, 2) access control lists, 3) protocols that assure message delivery, 4) integrity and 5) communication patterns. Because we are interested only in the last message of the queue, it is very important that the message broker provide a capability to obtain this message. Regarding the proposed architecture, we show how we adapted Kafka to store the information managed by the mapping system in LISP. An EID in LISP is represented by a topic in Apache Kafka, and the publish-subscribe pattern spreads notifications to all subscribers. xTRs or mobile devices play the role of consumers and publishers on the message broker. Every topic uses only one partition, and every subscriber has its own consumer group to avoid competing to consume the messages. Finally, we evaluate the performance of Apache Kafka. As we will see, Kafka scales linearly in the following cases: number of packets in the network with respect to the number of topics; number of packets in the network with respect to the number of subscribers; number of files opened by the server with respect to the number of topics; and time elapsed between the moment the publisher sends a message and the moment the subscriber receives it, with respect to the number of topics. In the conclusion we explain which objectives were achieved and why there are challenges still to be faced by Kafka, especially on two points: 1) we need only the last location (message) stored in the broker, and Kafka does not provide an out-of-the-box mechanism to obtain such messages, and 2) the number of open files that have to be managed simultaneously by the server. Further study is required to compare the performance of Kafka against other tools
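
    A minimal sketch of the mapping just described, under assumptions: the broker address, EID topic name and RLOC payload are illustrative, and the code uses the kafka-python client rather than anything from the project itself. An xTR publishes its current locator to the topic named after the EID, and a subscriber retrieves only the most recent mapping, which plain Kafka does not expose out of the box.

```python
from kafka import KafkaProducer, KafkaConsumer, TopicPartition

EID_TOPIC = "eid-10.0.0.1"     # one topic per EID, single partition (assumed name)
BROKER = "localhost:9092"      # assumed broker address

# Publisher side: the xTR announces its new RLOC after a handover.
producer = KafkaProducer(bootstrap_servers=BROKER)
producer.send(EID_TOPIC, value=b"rloc=192.0.2.7")
producer.flush()

# Subscriber side: each xTR uses its own consumer group so every subscriber
# receives the notification instead of competing for it.
consumer = KafkaConsumer(bootstrap_servers=BROKER,
                         group_id="xtr-site-a",
                         enable_auto_commit=False)
tp = TopicPartition(EID_TOPIC, 0)
consumer.assign([tp])

# Work around the "last message only" limitation: seek to the final offset.
consumer.seek_to_end(tp)
end = consumer.position(tp)
if end > 0:
    consumer.seek(tp, end - 1)
    latest = next(consumer)
    print(latest.value)        # -> b"rloc=192.0.2.7"
```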

    Real-Time Data Processing With Lambda Architecture

    Get PDF
    Data has evolved immensely in recent years, in type, volume and velocity. There are several frameworks to handle big data applications. The project focuses on the Lambda Architecture proposed by Marz and its application to real-time data processing. The architecture is a solution that unites the benefits of batch and stream processing techniques: data can be processed historically with high precision and involved algorithms, without loss of short-term information, alerts and insights. The Lambda Architecture can serve a wide range of use cases and workloads and withstands hardware and human mistakes. The layered architecture enhances loose coupling and flexibility in the system, a huge benefit that allows understanding the trade-offs and applying various tools and technologies across the layers. The approach to building the LA has advanced thanks to improvements in the underlying tools. The project demonstrates a simplified architecture for the LA that is maintainable
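
    A framework-free, hedged sketch of the core idea of the Lambda Architecture (the data model and function names are illustrative, not the project's implementation): a slow, exact batch view recomputed from the master dataset is merged at query time with a fast, incremental real-time view covering recent events.

```python
from collections import Counter

master_dataset = []            # immutable, append-only log of all events
batch_view = Counter()         # exact view, recomputed periodically (batch layer)
realtime_view = Counter()      # incremental view of recent events (speed layer)

def ingest(key):
    """New events land in the master dataset and are reflected at once in the speed layer."""
    master_dataset.append(key)
    realtime_view[key] += 1

def run_batch():
    """Batch layer: recompute the exact view from the full history and reset the speed layer."""
    global batch_view, realtime_view
    cutoff = len(master_dataset)
    batch_view = Counter(master_dataset[:cutoff])
    realtime_view = Counter(master_dataset[cutoff:])   # empty right after the run

def query(key):
    """Serving layer: merge precomputed batch results with recent increments."""
    return batch_view[key] + realtime_view[key]

ingest("sensor-a"); ingest("sensor-a"); run_batch(); ingest("sensor-a")
print(query("sensor-a"))       # -> 3 (2 from the batch view, 1 from the speed layer)
```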

    Combining Stream Mining and Neural Networks for Short Term Delay Prediction

    Full text link
    Systems monitoring the location of public transport vehicles rely on wireless transmission. The location readings from GPS-based devices are received with some latency caused by periodic data transmission and temporary problems preventing transmission. This negatively affects the identification of delayed vehicles. The primary objective of this work is to propose a short-term hybrid delay prediction method. The method relies on adaptive selection between Hoeffding trees, a stream classification technique, and multilayer perceptrons. In this way, the hybrid method proposed in this study provides anytime predictions and eliminates the need to collect extensive training data before any predictions can be made. Moreover, the use of neural networks increases the accuracy of the predictions compared with the use of Hoeffding trees alone
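
    A hedged sketch of the hybrid idea, not the authors' exact method: a Hoeffding tree gives anytime predictions from the first observation onward, a multilayer perceptron is retrained once a modest window of labelled data exists, and the model with the better running accuracy is selected for each prediction. It assumes the river and scikit-learn packages; the window size and feature encoding are illustrative.

```python
from river import tree
from sklearn.neural_network import MLPClassifier

ht = tree.HoeffdingTreeClassifier()   # incremental stream model, usable immediately
mlp, X_hist, y_hist = None, [], []
ht_hits = mlp_hits = seen = 0
RETRAIN_EVERY = 200                   # assumed window size between MLP retrainings

def predict(x):
    """Anytime prediction: fall back to the Hoeffding tree until the MLP exists."""
    ht_pred = ht.predict_one(x)
    if mlp is None:
        return ht_pred
    mlp_pred = mlp.predict([list(x.values())])[0]
    # Adaptive selection: trust whichever model has been more accurate so far.
    return mlp_pred if mlp_hits >= ht_hits else ht_pred

def learn(x, y):
    """Update both models and the selection statistics with a labelled example."""
    global mlp, ht_hits, mlp_hits, seen
    ht_hits += int(ht.predict_one(x) == y)
    if mlp is not None:
        mlp_hits += int(mlp.predict([list(x.values())])[0] == y)
    ht.learn_one(x, y)
    X_hist.append(list(x.values())); y_hist.append(y); seen += 1
    if seen % RETRAIN_EVERY == 0:
        mlp = MLPClassifier(max_iter=500).fit(X_hist, y_hist)
```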