Challenges in managing real-time data in health information system (HIS)
© Springer International Publishing Switzerland 2016. In this paper, we discuss the challenges in collecting and storing real-time medical big data in a health information system (HIS). Based on these challenges, we propose a model for real-time analysis of medical big data. We exemplify the approach with Spark Streaming and Apache Kafka applied to processing a stream of health big data. Apache Kafka works very well for transporting data among different systems such as relational databases, Apache Hadoop and non-relational databases. However, Apache Kafka cannot itself analyze the stream, whereas the Spark Streaming framework can perform operations on it. We identify the challenges in current real-time systems and propose a solution to cope with medical big data streams.
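The abstract pairs Kafka for transport with Spark Streaming for analysis. As a hedged illustration (no Kafka or Spark cluster is assumed; a plain Python iterable stands in for the stream, and the vital-sign values are invented), a sliding-window average over simulated heart-rate readings shows the kind of per-window operation a streaming framework would apply:

```python
from collections import deque

def windowed_averages(stream, window_size=3):
    """Sliding-window mean over a stream of readings -- a stand-in for
    the per-window aggregation Spark Streaming would apply to a
    Kafka-fed health data stream."""
    window = deque(maxlen=window_size)
    for reading in stream:
        window.append(reading)
        yield sum(window) / len(window)

# Simulated heart-rate stream (hypothetical values, beats per minute).
readings = [72, 75, 78, 90, 88]
print(list(windowed_averages(readings)))
```

In a real deployment, the same windowed aggregation would run inside the stream processor, with Kafka partitions feeding it in parallel.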
On Efficiently Partitioning a Topic in Apache Kafka
Apache Kafka addresses the general problem of delivering extremely high-volume event data to diverse consumers via a publish-subscribe messaging system. It uses partitions to scale a topic across many brokers, allowing producers to write data in parallel and facilitating parallel reads by consumers. Even though Apache Kafka provides some out-of-the-box optimizations, it does not strictly define how each topic should be efficiently distributed into partitions. The well-formulated fine-tuning needed to improve the performance of an Apache Kafka cluster is still an open research problem. In this paper, we first model the Apache Kafka topic partitioning process for a given topic. Then, given the set of brokers, constraints and application requirements on throughput, OS load, replication latency and unavailability, we formulate the optimization problem of finding how many partitions are needed and show that it is computationally intractable, being an integer program. Furthermore, we propose two simple yet efficient heuristics to solve the problem: the first tries to minimize and the second to maximize the number of brokers used in the cluster. Finally, we evaluate their performance via large-scale simulations, taking as benchmarks some Apache Kafka cluster configuration recommendations provided by Microsoft and Confluent. We demonstrate that, unlike the recommendations, the proposed heuristics respect the hard constraints on replication latency and perform better with respect to unavailability time and OS load, using the system resources more prudently.
Comment: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible. This work was funded by the European Union's Horizon 2020 research and innovation programme MARVEL under grant agreement No 95733
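The paper's optimization formulation is not reproduced in the abstract; as a far simpler point of comparison, the widely cited Confluent rule of thumb sizes a topic's partition count from the target throughput and the measured per-partition producer and consumer throughput. The numbers below are purely illustrative:

```python
import math

def min_partitions(target_throughput,
                   producer_tput_per_partition,
                   consumer_tput_per_partition):
    """Confluent-style rule of thumb: choose enough partitions that
    both the producer side and the consumer side can sustain the
    target throughput."""
    return max(
        math.ceil(target_throughput / producer_tput_per_partition),
        math.ceil(target_throughput / consumer_tput_per_partition),
    )

# Illustrative numbers: 100 MB/s target, 10 MB/s per partition on the
# producer side, 20 MB/s per partition on the consumer side.
print(min_partitions(100, 10, 20))  # consumer side is not the bottleneck here
```

This rule ignores exactly the factors the paper optimizes over (OS load, replication latency, unavailability), which is why such recommendations can violate hard constraints that the proposed heuristics respect.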
Supporting Massive Mobility with stream processing software
The goal of this project is to design a solution for massive mobility using the LISP protocol and scalable data systems like Apache Kafka. The project consists of three steps: first, understanding the requirements of the massive mobility scenario; second, designing a solution based on stream processing software that integrates with OOR (an open-source LISP implementation); and third, building a prototype with OOR and a stream processing software (or a similar technology) and evaluating its performance.
Our objectives are: understand the requirements of a massive mobility environment; learn and evaluate the architecture of Apache Kafka and similar message brokers to see whether these tools could satisfy the requirements; propose an architecture for massive mobility using the LISP protocol and Kafka as the mapping system; and finally, evaluate the performance of Apache Kafka in such an architecture.
In chapters 3 and 4 we provide a summary of the LISP protocol, Apache Kafka and other message brokers. In these chapters we describe the components of these tools and how we can use them to achieve our objective. We evaluate the different mechanisms for 1) authenticating users, 2) access control lists, 3) protocols to guarantee message delivery, 4) integrity and 5) communication patterns. Because we are interested only in the last message of the queue, it is very important that the message broker provide a capability to obtain this message.
Regarding the proposed architecture, we show how we adapted Kafka to store the information managed by the LISP mapping system. EIDs in LISP are represented by topics in Apache Kafka, and the publish-subscribe pattern is used to spread notifications to all subscribers. xTRs and mobile devices can play the roles of consumers and producers of the message broker. Every topic uses only one partition, and every subscriber has its own consumer group to avoid competing for the messages.
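The mapping just described (one topic per EID, publish-subscribe fan-out, and the later-mentioned need for only the latest location) can be sketched in plain Python. The broker below is a toy stand-in, not OOR or Kafka, and the EID/RLOC strings are invented:

```python
class LatestValueBroker:
    """Toy pub/sub broker that keeps only the last message per topic,
    mimicking the 'latest location per EID' requirement (Kafka itself
    would need log compaction or a seek-to-latest-offset for this)."""

    def __init__(self):
        self.latest = {}        # topic (EID) -> last published location
        self.subscribers = {}   # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self.subscribers.setdefault(topic, []).append(callback)

    def publish(self, topic, location):
        self.latest[topic] = location
        for cb in self.subscribers.get(topic, []):
            cb(location)

broker = LatestValueBroker()
seen = []
broker.subscribe("EID:10.0.0.1", seen.append)      # xTR acting as consumer
broker.publish("EID:10.0.0.1", "RLOC 192.0.2.7")   # device attaches here...
broker.publish("EID:10.0.0.1", "RLOC 198.51.100.3")  # ...then moves
print(broker.latest["EID:10.0.0.1"])
```

The dictionary of latest values is exactly the capability the conclusion later identifies as missing from stock Kafka.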
Finally, we evaluate the performance of Apache Kafka. As we will see, Kafka scales linearly in the following cases: the number of packets in the network with respect to the number of topics; the number of packets in the network with respect to the number of subscribers; the number of files opened by the server with respect to the number of topics; and the time elapsed between the moment a publisher sends a message and a subscriber receives it, with respect to the number of topics.
In the conclusion we explain which objectives were achieved and why Kafka still faces challenges on two points: 1) we need only the last location (message) stored in the broker, but Kafka does not provide an out-of-the-box mechanism to obtain such messages, and 2) the number of files that have to be managed simultaneously by the server. Further study is required to compare the performance of Kafka against other tools.
Real-Time Data Processing With Lambda Architecture
Data has evolved immensely in recent years in type, volume and velocity. There are several frameworks for handling big data applications. This project focuses on the Lambda Architecture proposed by Marz and its application to real-time data processing. The architecture is a solution that unites the benefits of batch and stream processing techniques. Data can be processed historically with high precision and involved algorithms, without loss of short-term information, alerts and insights. The Lambda Architecture can serve a wide range of use cases and workloads and withstands hardware and human mistakes. The layered architecture enhances loose coupling and flexibility in the system, a huge benefit that allows understanding the trade-offs and applying various tools and technologies across the layers. The approach to building the LA has advanced due to improvements in the underlying tools. The project demonstrates a simplified architecture for the LA that is maintainable.
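The core serving-layer idea of the Lambda Architecture can be shown in a few lines: a query merges a precomputed batch view (historical, high-precision) with a speed-layer view covering only events since the last batch run. The view names and counts below are illustrative, not from the project:

```python
def merged_count(batch_view, speed_view, key):
    """Lambda-style query: the batch view covers historical data, the
    speed-layer view covers events since the last batch recomputation;
    the serving layer answers with their combination."""
    return batch_view.get(key, 0) + speed_view.get(key, 0)

batch_view = {"page_A": 1000}  # e.g. produced by a nightly batch job
speed_view = {"page_A": 7}     # e.g. maintained by a stream processor
print(merged_count(batch_view, speed_view, "page_A"))
```

When the next batch run completes, the speed view is discarded and rebuilt, which is what lets the architecture absorb hardware and human mistakes: the batch layer can always recompute from the immutable master dataset.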
Combining Stream Mining and Neural Networks for Short Term Delay Prediction
Systems monitoring the location of public transport vehicles rely on wireless transmission. Location readings from GPS-based devices are received with some latency, caused by periodic data transmission and temporary problems preventing transmission. This negatively affects the identification of delayed vehicles. The primary objective of this work is to propose a short-term hybrid delay prediction method. The method relies on adaptive selection between Hoeffding trees, a stream classification technique, and multilayer perceptrons. In this way, the hybrid method proposed in this study provides anytime predictions and eliminates the need to collect extensive training data before any predictions can be made. Moreover, the use of neural networks increases the accuracy of the predictions compared with using Hoeffding trees only.