Specialized distributed real-time stream processing systems demand that the underlying system adapt to growing data volumes from heterogeneous data sources. Along with massive computational capability for increasing data velocity, these specialized systems also require the underlying framework to provide highly scalable resources, so that processing logic components achieve massive parallelism across distributed computing nodes in a timely manner and recover quickly from hardware failures; stateless and stateful mechanisms in the processing logic components ensure low-latency streaming. Among state-of-the-art specialized distributed stream processing frameworks, Apache Storm provides a programming model which automatically takes care of data and process distribution to achieve sufficient task parallelism. In more recent forays into low-latency distributed stream processing, Google MillWheel [3] and Apache Heron [4] emerged as successors for processing unbounded streams of continuous data, and they scale transparently to large clusters, a property common to all stream processing engines. Although these systems share similar components, they employ different message-passing mechanisms, such as tuples or buffers, to provide high throughput.
Heron, an emerging real-time distributed stream processing system, is built from a plethora of components, named Spout, Bolt, Topology Master, Stream Manager and Metrics Manager, which interact in complex ways while running on several containers to keep up with high-velocity, high-volume data. These containers are scheduled to run on a heterogeneous selection of multi-core nodes using large-scale storage infrastructures. Heron also provides a framework to seamlessly integrate with existing big data processing components such as the Apache Hadoop Distributed File System, Apache REEF [5], Apache Mesos, Apache Aurora [6], the Simple Linux Utility for Resource Management (SLURM) and Google Kubernetes [7], but this integration simultaneously makes it difficult to understand the performance behavior of the underlying applications and components. Performance complexities in traditional relational database management systems can be resolved using optimizers [8], but how to accurately model and predict performance complexities in a distributed stream processing framework is quite challenging and has not yet been well studied. We address this gap in this paper. These performance complexities arise from huge variance in workloads, elasticity, computation fluctuations and tuple serialization rates, which makes it difficult to predict the behavior of data pipelined across distributed components. Predicting the dynamic performance of data streams provides further insight for a number of data management tasks, including workload optimization [9], scheduling [10] and resource management, and helps reduce unnecessary overprovisioning of resources through efficient prioritization of resource allocation in specialized distributed stream processing systems.
In this paper, we propose a novel, architecture-independent performance prediction framework for text streaming in a distributed stream processing platform running on top of OpenHPC systems. Specifically, we summarize our contributions as follows:
• We identify domain-specific metrics that are most relevant for a streaming platform running on top of a high performance computing architecture, since existing methodologies only address big data processing and distributed database management frameworks.
• We characterize the performance behavior of a streaming platform running on top of a high performance computing architecture.
• We adapt a state-of-the-art automated performance tuning module for distributed database management systems to work for a distributed streaming platform.
• We propose a novel framework, running on top of a streaming platform, that uses linear least squares with L2 regularization to recommend a plausible performance estimate for the stream of an individual topology.
• To validate and evaluate the proposed framework, we implemented it on an emerging stream processing system, Apache Heron.
The remainder of this paper is structured as follows: in the "Background" section, we present adequate background for the entire paper. The "Design and implementation of proposed framework" section presents the methodology, followed by an overview of the proposed framework. The "Experimental evaluation" section presents the evaluation and results, comparing all proposed models. In the "Related work" section, related literature is discussed; finally, the paper concludes in the "Discussions and conclusions" section, addressing some conceivable use cases.
Background
A standard stream processing framework running on a high performance computing cluster is one where every component runs on computing nodes; an exemplary architecture is described in Fig. 1. We model the processing of a continuous progression of tuples, a stream, as a directed acyclic graph (DAG). These acyclic graphs are known as topologies in Heron, and they can process tuples an unbounded number of times, depending on the availability of tuples. Topologies in Heron comprise three basic processing logic units (PLUs), labeled spout, bolt and edging bolt. The source processing component, the spout, reads tuples from an external publish-subscribe system (here, Apache Pulsar [11]) and seeds tuples into the contemporary graph at vertices with in-degree zero. The tuple processing component, the bolt, parses a seeded tuple with user-defined processing logic and then seeds the processed tuple back into the contemporary graph at vertices with in-degree ≥ 1 and out-degree ≥ 1, maintaining the stream processing pipeline. Similarly, the edging bolt, or sink processing component, parses a seeded tuple with user-defined processing logic and then seeds the processed tuple into external storage, at vertices with in-degree ≥ 1 and out-degree = 0. The vertices in the logical plan of a topology represent the nodes of the contemporary graph, and the edges represent the progression of tuples; the whole scenario is elaborated in Fig. 1. Instances of these processing logic components are packed into a containerized process, the Heron Instance, which can execute many parallel tasks on multiple containers hosted on either a single computing node or several. User-defined topologies are distributed to the cluster through one of several scalable mechanisms: the Hadoop File System, the local file system, or the Lustre file system (http://www.lustre.org). Dynamically, the efficiency of contemporary topologies is maintained using the back-pressure mechanism [12] for spouts and bolts respectively, managed through the Topology Master. Tuples in this framework are generally composed of a message with an encoded meta-attribute object. Heron has six ways of grouping tuples among contemporary processing components, described as follows (a short sketch of the spout/bolt degree constraints appears after this list):
• Fields grouping: The progression of tuples is transmitted to those processing logic components that share the same meta-attribute value.
• Global grouping: The progression of tuples is transmitted to the single instance with the lowest encoded meta-attribute value.
• Shuffle grouping: The progression of tuples is randomly distributed to distinct instances while ensuring uniform distribution.
• None grouping: Currently, this has the same functionality as shuffle grouping.
• All grouping: The progression of tuples is distributed to all corresponding processing components.
• Custom grouping: The progression of tuples is distributed to the corresponding processing components as defined by the user.
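To make the spout/bolt/edging-bolt degree constraints above concrete, here is a minimal sketch (ours, not Heron code) that assumes the networkx library and illustrative component names, and classifies the vertices of a topology DAG by their in- and out-degrees:

```python
# Minimal sketch: classify topology components by their DAG degree constraints.
# Component names are illustrative placeholders, not from a real topology.
import networkx as nx

topology = nx.DiGraph()
topology.add_edges_from([
    ("spout", "parse_bolt"),
    ("parse_bolt", "count_bolt"),
    ("count_bolt", "sink_bolt"),
])

# A topology must be a directed acyclic graph.
assert nx.is_directed_acyclic_graph(topology)

for node in topology.nodes:
    indeg, outdeg = topology.in_degree(node), topology.out_degree(node)
    if indeg == 0:
        role = "spout (source, in-degree 0)"
    elif outdeg == 0:
        role = "edging bolt (sink, out-degree 0)"
    else:
        role = "bolt (in-degree >= 1 and out-degree >= 1)"
    print(f"{node}: {role}")
```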
Heron gathers results in the following two ways:
• Sliding window: Tuples in a stream are grouped together to form windows that can overlap, based either on a time duration or on the number of operations performed.
• Tumbling window: Tuples in a stream are grouped together to form non-overlapping windows, based either on a time duration or on the number of operations performed.
A distributed stream data processing system consists of a master node that serves as the topology life cycle management unit and helps transform the logical plan into a physical plan, analogous to a database query plan [13], using a state-of-the-art bin packing algorithm. Inspired by microkernel-based architectures, it shares these plans with the SLURM scheduler, which then assigns tasks to specific compute nodes as per the physical plan of the topology. The scheduling solution leads to an almost even distribution among all containers, assigning tasks to instances in a round-robin manner.
Design and implementation of proposed framework
In this section, we present an overview of the design and implementation details of the proposed framework.
Overview
Our proposed benchmark suite allows us to inject various data loads into the Heron stream data processing system and collect dynamic metrics that help estimate the runtime performance of data streams in a streaming topology. The overall architecture is presented in Fig. 2. The topology implements services running on a high performance stream data processing cluster, where each service handles several types of requests issued by users.
Performance metrics classification
At the time of writing, no single metric exists by which to evaluate the overall performance of a big data system, which is essentially the same problem discussed in [14]. In this section, we classify all existing metrics into seven different categories, which provides deeper visibility into the strengths and weaknesses of the entire big data processing system, as discussed in the following sections.
Memory metrics
Heap memory Running a topology in a containerized environment adds another layer of abstract execution environment on top of hardware virtualization over a physical hosting platform; sharing these hardware resources, including memory, among multiple Java virtual machines (JVMs) results in unpredictable memory demands, as discussed in [15].
To capture such memory behavior, the percentage of Heap Available metric is computed as the total amount of free heap memory divided by the total amount of available heap memory, in megabytes:

$$\text{Heap available}\ (\%) = \frac{\text{Heap}_{\text{free}}}{\text{Heap}_{\text{total}}} \times 100 \qquad (1)$$
Garbage collection time The total accumulated time, in milliseconds per minute, spent by the garbage collector managed bean (MBean) [16] to find and reclaim unreachable objects to free up memory space is known as garbage collection time (GC_Time).
Alternatively, GC_Time is defined as the total accumulated time spent determining the number of reachable objects (α), counting unreachable objects (β) and freeing up memory space (γ) within a millisecond-granularity window frame.
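A plausible formalization of this definition (the additive form is our assumption; α, β and γ are interpreted as the time spent on each phase, following the text above):

$$\mathrm{GC}_{\mathrm{Time}} = \alpha + \beta + \gamma \quad \text{(ms per minute)}$$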
n-Verticals metrics
Thread share The total number of actively running live threads simultaneously making requests for services in the same container at a given instant of time. These accumulated running threads also comprise background supporting tasks fulfilled by daemon threads.
The total number of active non-daemon threads at a given time can be evaluated as the absolute difference between the number of active threads and the number of active daemon threads. The fraction of this count over the total number of active threads is known as the thread share.
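A plausible formalization of this definition (the notation is ours):

$$\text{Thread share} = \frac{\lvert N_{\text{active}} - N_{\text{daemon}} \rvert}{N_{\text{active}}}$$

where $N_{\text{active}}$ is the number of active threads and $N_{\text{daemon}}$ the number of active daemon threads.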
CPU load
The allocated cores actively running a process (p) for a duration x of the most recent observation period determine the CPU load of the containerized process.
Alternatively, the process CPU load is defined as the product of the current per-core process CPU load and the number of cores allocated to the topology. The value of this Java virtual machine variable lies between 0 and 1. Hence, the percentage of idle cores is related to the processing load and is defined as the complement of the process load.
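In equation form (our hedged reading of the definition above, with $\mathrm{load}_{\mathrm{core}}$ the JVM's per-core process CPU load in [0, 1]):

$$\mathrm{CPU}_{\mathrm{load}} = \mathrm{load}_{\mathrm{core}} \times N_{\mathrm{cores}}, \qquad \mathrm{Idle}\ (\%) = (1 - \mathrm{load}_{\mathrm{core}}) \times 100$$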
Communication metrics
Back pressure The total accumulated time spent by an instance under back-pressure.
We measure back-pressure (BP) time [17] in milliseconds per minute, which includes TCP back-pressure ( θ 1 ), spout back-pressure ( θ 2 ) and stage-by-stage back-pressure ( θ 3 ), as Heron internals include back-pressure initiated both by the instance itself and by others.
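Assuming the three sources combine additively (our formalization):

$$\mathrm{BP} = \theta_1 + \theta_2 + \theta_3 \quad \text{(ms per minute)}$$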
Computation metrics
Execute latency The execution latency is the time taken to execute the user-defined logic on the windowed incoming tuples of a topology.
Scheduler metrics
Uptime The total computation time allocated to the process on which the Java virtual machine is running, once selected by the short-term scheduler. In the rest of the paper, we use nanoseconds as the unit of measurement in the metrics pipeline module.
Among all the selected metrics, containerized configuration cost metrics (RAM, CPU, disk usage) and input-output cost metrics (emit count, fail count, acknowledgement count) are some of the most widely selected features in state-of-the-art systems. A data-center system such as IBM Cloud Private [18] reports the performance of worker nodes to the master node in terms of CPU and GPU usage and overall RAM utilization. Moreover, auto-scaling of a running application depends entirely on the consumption of these contemporary components. Poggi et al. [19] also include such system configuration metrics to report per-query resource consumption and provide overall insight into the cluster.
Data streaming performance prediction model
Regression algorithms are the best candidates for predicting the latency of any component. Since this problem deals with densely populated, high-dimensional input data having only continuous attributes, it is appropriate to apply a parametric ridge (L2) regularization regression algorithm. Non-parametric regression algorithms such as support vector regression (ε-SVR, nu-SVR) are also good candidates, as they have less memory overhead than the ridge regularization regression algorithm (labeled MSL in the rest of the paper); however, MSL outperforms SVR with respect to all three validation metrics, as shown in Table 1, for each distinct type of cluster discussed in the "Environmental setup" section. We compare the performance of these regression models using measures such as the mean squared error, the root mean squared error, the mean absolute error and the coefficient of determination, as described in Table 1. We also used cross-validation (k = 10) over various kernel selections, and the results show that the ridge regularization regression algorithm outperforms the linear, polynomial, radial basis function and sigmoid kernels for both ε-SVR and nu-SVR.
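As an illustration of this model selection step, the following is a minimal sketch assuming scikit-learn; the data arrays are random placeholders rather than our metric dataset, and the hyperparameters are library defaults, not tuned values:

```python
# Hedged sketch: compare ridge regression (MSL) against epsilon-SVR and nu-SVR
# with 10-fold cross-validation, mirroring the model selection procedure.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.svm import SVR, NuSVR
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.random((500, 40))   # placeholder: dynamic metrics per tuple window
y = rng.random(500)         # placeholder: observed execute latency

models = {
    "MSL (ridge, L2)": Ridge(alpha=1.0),
    "eps-SVR (rbf)": SVR(kernel="rbf"),
    "nu-SVR (rbf)": NuSVR(kernel="rbf"),
}
for name, model in models.items():
    # scikit-learn reports negated MSE; negate back for display.
    mse = -cross_val_score(model, X, y, cv=10,
                           scoring="neg_mean_squared_error").mean()
    print(f"{name}: mean CV MSE = {mse:.4f}")
```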
To compare the efficiency of the proposed model (labeled DKL in the rest of the paper), we use a well-studied technique for regression problems employed in most performance tuning modules of distributed database management systems. The dimensionality of all the dynamic metrics is reduced using the state-of-the-art dimensionality reduction technique Factor Analysis, which transforms the high-dimensional dynamic metric data of the stream processing system into lower-dimensional data. Based on our experiments, we found that only the initial factors are significant for our prediction framework, owing to the distribution of the most influential metrics. To find the highly influential metrics, we use the k-means clustering algorithm to cluster this lower-dimensional data, using each row as its feature vector, and keep a single metric from each cluster (the one nearest to the cluster centroid). Finally, we use Gaussian process regression to recommend the performance of data streams with the help of the top-k dynamic metrics of the stream data processing system.
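A minimal sketch of this DKL-style pipeline, again assuming scikit-learn; the number of factors, the number of clusters k and the data are illustrative assumptions:

```python
# Hedged sketch of the DKL-style pipeline: Factor Analysis reduces metric
# dimensionality, k-means prunes redundant metrics, and a Gaussian process
# is fit on the surviving metrics. Shapes and k are illustrative choices.
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.cluster import KMeans
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)
metrics = rng.random((500, 40))  # placeholder: rows = samples, cols = metrics
latency = rng.random(500)        # placeholder: observed latency per sample

# 1) Project each metric (column) onto the initial factors.
fa = FactorAnalysis(n_components=5).fit(metrics)
# components_ has shape (n_factors, n_metrics); each column describes a metric.
metric_vectors = fa.components_.T

# 2) Cluster the metrics and keep the one closest to each centroid.
k = 8
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(metric_vectors)
kept = []
for c in range(k):
    members = np.where(km.labels_ == c)[0]
    dists = np.linalg.norm(metric_vectors[members] - km.cluster_centers_[c],
                           axis=1)
    kept.append(members[np.argmin(dists)])

# 3) Fit a Gaussian process on the top-k pruned metrics only.
gpr = GaussianProcessRegressor().fit(metrics[:, kept], latency)
predicted = gpr.predict(metrics[:, kept])
```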
Metric pipeline
The metrics pipeline runs as multiple threads throughout the live compute nodes in the cluster. When a tuple arrives at one component of a topology, all the dynamic metrics are recorded along with an encoded meta-attribute ID and a timestamp. The tuple is then processed and the aggregate metrics are streamed to the adjoining component, while all the dynamic metrics are again recorded by the metrics pipeline along with a timestamp and the tuple ID.
Suppose there are a total of t topologies and, at any given time, at most n nodes are consumed by topology t1. Then, during a time frame, at most t_thread threads are running, as shown in Eq. 6.
Data enrichment
The Data Enrichment step merges all tuple records into a new record containing all dynamic metrics, keyed by the unique tuple ID and timestamp. After concatenation, all missing values are replaced with the mean of the entire column, since the dynamic metric records contain sparse data; the result is kept in a distributed data store termed the Data Grid.
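A minimal sketch of this enrichment step, assuming pandas; the column names and records are illustrative placeholders:

```python
# Hedged sketch: merge per-component metric records on (tuple_id, timestamp)
# and impute missing values with the column mean, as in the enrichment step.
import pandas as pd

spout_metrics = pd.DataFrame({
    "tuple_id": [1, 2], "timestamp": [100, 101], "heap_pct": [62.0, None],
})
bolt_metrics = pd.DataFrame({
    "tuple_id": [1, 2], "timestamp": [100, 101], "gc_time_ms": [None, 4.2],
})

# An outer join keeps a record even when one side is missing a reading.
enriched = spout_metrics.merge(bolt_metrics, on=["tuple_id", "timestamp"],
                               how="outer")
# Mean imputation over each numeric column (the records are sparse).
numeric = enriched.select_dtypes("number").columns
enriched[numeric] = enriched[numeric].fillna(enriched[numeric].mean())
print(enriched)
```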
Experimental evaluation
In this section, we provide a brief overview of the benchmark suite used in the evaluation of the proposed framework, followed by a glimpse of the experimental setups. The benchmark topologies form a data pipeline using the open-source distributed pub-sub messaging system Apache Pulsar [11] to consume text streams generated by a parallel synthetic data load generator. The input streams are tuples generated from the text of Alice's Adventures in Wonderland; the spout consumes the data streams and emits them into the topology through a subscription to a Pulsar topic.
Environmental setup
We perform the evaluation of the proposed performance modeling framework over Apache Incubator Heron 0. Latency, including throughput, is measured during changes in the number of parallel tasks and in system performance.
Benchmark suite

Grep count directed acyclic graph (GC-DAG)
This four-stage topology is a widely known application of MapReduce. Its structure resembles a chain of components comprising three bolts and one spout, as shown in Fig. 3, and it operates at the level of sentences with a maximum length of 119 characters. The spout is connected to the uni-gram bolt, which converts text into unigram tokens. These tokens are fed into the Identify Keyword bolt, which looks for a keyword defined by the user. The Keyword Count bolt counts the presence of the uni-gram in a tuple and stores all results in a single-file database (see footnote 3) using fields grouping. In the experiments, the processing logic components were set to have thirteen stream managers comprising one spout executor, four uni-gram bolt executors, four Identify Keyword executors and four Keyword Count executors.
GEneral matrix to matrix multiplication directed acyclic graph (GEMM-DAG)
This three-stage topology's structure resembles a chain of components comprising three parallel bolts and one spout, as shown in Fig. 4. To evaluate our framework, this micro-benchmark topology operates at the level of sentences with a maximum length of 20 words. The purpose of this topology is to profile the performance of the data processing platform during CPU-intensive computations on the data stream; deep learning model operations and matrix multiplication are the best candidates in this domain. The performance of such operations varies with the size of the matrices and the kernel implementation, along with whether the operation is compute, bandwidth or occupancy bound. Surprisingly, the ways these models are used in practice are diverse, as the optimization space for hardware and software targeting deep learning is large and underspecified [20-22]. The spout is connected to two parallel bolts, named Matrix A and Matrix B, which generate matrices based on the tuples received from the spout. These bolts generate a sparse matrix of a user-defined size based on the tuples they receive and the presence of common words between two tuples. These matrices are fed into the GEMM bolt, which performs the multiplication operation after receiving the matrices from both tuples. The results are stored in a single-file database (see footnote 3) using shuffle grouping and a tumbling window.
In the experiments, the processing logic components were configured to have twelve stream managers comprising one spout executor, six Matrix A executors, six Matrix B executors and ten GEMM executors, each performing an operation on a matrix of size 1048 with BLAS α = 1 and β = 0.
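For reference, with α = 1 and β = 0 the GEMM operation reduces to a plain matrix product, C = αAB + βC = AB; a minimal numpy sketch follows (a stand-in for illustration, not the bolt's actual kernel):

```python
# Hedged sketch of the GEMM bolt's core operation: C = alpha*A@B + beta*C
# with alpha = 1 and beta = 0, on 1048 x 1048 inputs as in the experiments.
import numpy as np

n, alpha, beta = 1048, 1.0, 0.0
A = np.random.rand(n, n)        # placeholder for the Matrix A bolt's output
B = np.random.rand(n, n)        # placeholder for the Matrix B bolt's output
C = np.zeros((n, n))

C = alpha * (A @ B) + beta * C  # GEMM as defined by BLAS
```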
Unique sort directed acyclic graph (US-DAG)
This four-stage topology's structure resembles a chain of components having five bolts, including two adjacent parallel bolts, and two parallel spouts, as shown in Fig. 5. To evaluate our framework, this micro-benchmark topology operates at the level of sentences with a maximum length of 30 words. The purpose of this topology is to profile the performance of the data processing platform during memory-intensive computations on data streams; sorting algorithms are the best candidate for memory-intensive computation. The performance of such operations varies with the size of the inputs and the kernel implementation, along with the stability the algorithm achieves. The spout is connected to a bolt named Unique A, which transforms incoming tuples into tuples containing unique words. These transformed tuples are fed into a bolt named Sort A, which sorts them using the standard merge sort algorithm, which itself has O(n) space complexity. Lastly, the sorted tuples are merged into a single stream and stored in a single-file database (see footnote 3) using shuffle grouping. The experiments were conducted with 16 stream managers comprising one spout executor, twenty-two Unique A executors, twelve Sort A executors and five merge bolt executors.
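As a reminder of the operation at the heart of the Sort A bolt, a minimal merge sort sketch follows (illustrative Python, not the bolt's implementation); the auxiliary merge buffer accounts for the O(n) space complexity noted above:

```python
# Hedged sketch: standard merge sort, as used conceptually by the Sort A bolt.
# The merged output buffer is what gives the algorithm its O(n) extra space.
def merge_sort(words):
    if len(words) <= 1:
        return words
    mid = len(words) // 2
    left, right = merge_sort(words[:mid]), merge_sort(words[mid:])
    merged, i, j = [], 0, 0           # O(n) auxiliary buffer
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]

print(merge_sort(["rabbit", "alice", "queen", "hatter"]))
```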
Speed of light compute directed acyclic graph (SOL-DAG)
This two-stage topology has a chain-like structure with a group of bolts and a spout, as shown in Fig. 6. To evaluate our framework, this micro-benchmark topology operates at the level of sentences with a user-defined word length. The purpose of this topology is to profile the performance of the data processing platform during network-intensive computations on data streams; fast tuple-consumption algorithms are the best candidate for network-intensive computation with varying message sizes. The performance of such algorithms varies with the number of bolts, the message size and the bandwidth of the interconnect between components running on distributed compute nodes. The spout is connected to a user-defined number of bolts, termed Emit 1 through Emit (n − 1). The goal of this topology is to obtain a performance trace of the network, so we keep the computation as minimal as possible. Lastly, the emitted counts are stored in a single-file database (see footnote 3) using shuffle grouping. The experiments were conducted with 19 stream managers comprising one spout executor and sixty Emit n executors.
Speed of light sleep directed acyclic graph (SOLS-DAG)
This three-stage topology's structure resembles a chain of components having two groups of bolts and a spout, as shown in Fig. 7. To evaluate our framework, this micro-benchmark topology operates at the level of sentences with a fixed word length. The purpose of this topology is to profile the performance of the data processing platform during scheduler-intensive computations on data streams; placing a bolt into an idle state for a fixed quantum of time makes it the best candidate for scheduler-intensive computation. Most state-of-the-art big data system schedulers can integrate with the existing version of Heron and provide state-of-the-art computation solutions, but how effectively the system copes with system calls at runtime depends entirely on the scheduler being used. Although a minimum sleep duration is assured, there is no strict assurance that the contemporary thread executes immediately. How these knobs are used in practice can be diverse, since the optimization space is large and underspecified: the assurance depends on thread priorities and the scheduler's decisions. The spout is connected to a user-defined number of bolts, termed Sleep 1 through Sleep (n − 1). Tuples from these bolts are fed into a group of bolts termed Emit 1 through Emit (n − 1). The goal of this topology is to obtain performance traces of the scheduler, which inspires us to keep minimal processing logic in the contemporary components. Lastly, the emitted counts are stored in a single-file database (see footnote 3) using shuffle grouping. In the experiments, the Heron cluster was configured with 16 stream managers comprising one spout, ten sleep bolt executors with a parallelism of eight and ten emit bolts with a parallelism count of one.
Results and inferences
In this section, we describe the experimental results of the five benchmark topologies from the "Experimental evaluation" section. These results are based on three commonly used metrics: the Average Accuracy Rate over a fixed quantum of time (here, 60 s); the Average Error Rate over a fixed quantum of time (here, 120 s), evaluated as (100 − accuracy); and the Overall Execution Latency, comprising the default and estimated execution latency of processing logic units over a 20-minute window frame. In the corresponding Figs. 8, 9, 10, 11 and 12, "PL-SNumber", "SNumber-PNumber", "PNumber-PNumber", "PNumber-Sq" and "Overall" correspond to the predicted accuracy and predicted error rate of tuple transmission latency between the Pulsar processing component and the spout, between spout and bolt, between bolt and bolt, and between bolt and the SQLite edging bolt after stabilization, respectively. From these experimental results, we make the following observations.
GC-DAG topology
The experimental results for this contemporary graph are delineated in Fig. 8. The average prediction accuracy rates of the individual processing logic components are shown in Fig. 8a; they vary from 99.91% (between the source component and the uni-gram component) to 88.45% (at the Keyword Count component) for the MSL model. The performance assessment of the contemporary models is interesting in the presence of unpredictable workload variation. To represent dynamic behavior, we forcefully inject skewness into the processing of the user-defined components by restricting the parallelism count to four. Due to dynamic variations in the processing-unit metrics, the available metrics are far from covering all possible values; this reduces the predictive accuracy for the topology to 90.90%, which is slightly higher than the individual accuracy rates. Surprisingly, for the DKL model, the prediction accuracy of the individual processing logic components varies from 99.65% (between the source component and the uni-gram component) to 4.585% (at the Keyword Count edging component). The presence of sparse attributed metrics reduces the overall prediction accuracy to 29.89%, which is slightly lower than the prediction accuracy of the individual processing components. Based on the experimental results, this accuracy is much lower than the overall accuracy achieved by the MSL performance model. The average prediction error rates vary from 11.54% (between the source and uni-gram components) to 11.96% (at the Keyword Count component) for MSL, and from 0.348% (between the source and uni-gram components) to 95.84% (at the Keyword Count edging component) for DKL, as shown in Fig. 8b. Even though the DKL prediction model achieves an average accuracy of only 29.89%, how it actually performs when its estimated latencies are compared with the default dynamic latencies, alongside the latencies estimated by the MSL model, is shown in Fig. 8c, d over a regular 20-minute time frame for the spout and bolt components respectively. Moreover, the default normalized dynamic execution latencies of the bolts are much lower than the normalized estimated execution latencies of the DKL prediction model, as shown in Fig. 8d. However, as shown in Fig. 8c, there is no significant difference between the default and estimated normalized dynamic execution latencies of the spouts.
GEMM-DAG topology
The experimental results for this contemporary graph are delineated in Fig. 9. The average prediction accuracy rates of the individual processing logic components are shown in Fig. 9a; they vary from 99.74% (between the source component and the matrix components) to 86.10% (at the GEMM component) for the MSL model. The performance assessment of the contemporary models is interesting in the presence of unpredictable workload variation. To represent dynamic behavior, we forcefully inject skewness into the processing of the user-defined components by restricting the parallelism count to six. Due to dynamic variations in the processing-unit metrics, the available metrics are far from covering all possible values; this reduces the predictive accuracy for the topology to 70.41%, which is slightly higher than the individual accuracy rates. Surprisingly, for the DKL model, the prediction accuracy of the individual processing logic components varies from 90.03% (between the source component and the matrix components) to 67.69% (at the GEMM edging component). The presence of sparse attributed metrics reduces the overall prediction accuracy to 66.18%, which is slightly lower than the prediction accuracy of the individual processing components. Based on the experimental results, this accuracy is much lower than the overall accuracy achieved by the MSL performance model. The average prediction error rates vary from 0.25% (between the source and matrix components) to 29.58% (at the GEMM component) for MSL, and from 9.96% (between the source and matrix components) to 32.30% (at the GEMM edging component) for DKL, as shown in Fig. 9b. Even though the DKL prediction model achieves an average accuracy of only 66.18%, how it actually performs when its estimated latencies are compared with the default dynamic latencies, alongside the latencies estimated by the MSL model, is shown in Fig. 9c, d over a regular 20-minute time frame for the spouts and bolts respectively. Moreover, the default normalized dynamic execution latencies of the spouts are much lower than the normalized estimated execution latencies of the DKL prediction model, as shown in Fig. 9c. However, as shown in Fig. 9d, there is no significant difference between the default and estimated normalized dynamic execution latencies of the bolts.
SOL-DAG topology
The experimental results for this contemporary graph are delineated in Fig. 10. The average prediction accuracy rates of the individual processing logic components are shown in Fig. 10a; they vary from 93.85% (between the source component and the emit component) to 91.73% (at the emit component) for the MSL model. The performance assessment of the contemporary models is interesting in the presence of unpredictable workload variation.
To represent dynamic behavior, we forcefully inject skewness into the processing of the user-defined components by restricting the parallelism count to one. Due to dynamic variations in the processing-unit metrics, the available metrics are far from covering all possible values; this reduces the predictive accuracy for the topology to 92.79%, which is slightly lower than the individual accuracy rates. Surprisingly, for the DKL model, the prediction accuracy of the individual processing logic components varies from 99.99% (between the source component and the emit component) to 69.63% (at the emit edging component). The presence of sparse attributed metrics reduces the overall prediction accuracy to 84.81%, which is slightly lower than the prediction accuracy of the individual processing components. The average prediction error rates vary from 6.76% (between the source and emit components) to 9.85% (at the emit component) for MSL, and from 0.01% (between the source and emit components) to 73.81% (at the emit edging component) for DKL, as shown in Fig. 10b. The MSL model on average estimates the default dynamic latencies almost perfectly, but the scenario is very different for the DKL model, as shown in Fig. 10c, d. Moreover, the default normalized dynamic execution latencies of the spout are almost identical to the DKL and MSL prediction model observations; a deviation exists in the 15-20 min range, but it can be considered negligible since the difference is minute, as shown in Fig. 10c. However, as shown in Fig. 10d, there is a significant difference between the default and the DKL-estimated normalized dynamic execution latencies of the bolts over the 20-minute duration.
SOLS-DAG topology
The experimental results for this contemporary graph are delineated in Fig. 11. The average prediction accuracy rates of the individual processing logic components are shown in Fig. 11a. To represent dynamic behavior, we forcefully inject skewness into the processing of the user-defined components by restricting the parallelism counts to eight and one respectively. Due to dynamic variations in the processing-unit metrics, the available metrics are far from covering all possible values; this reduces the predictive accuracy for the topology to 69.46%, which is slightly lower than the individual accuracy rates. Surprisingly, for the DKL model, the prediction accuracy of the individual processing logic components varies from 92.29% (between the source component and the sleep component) to 31.72% (at the emit edging component). The presence of sparse attributed metrics reduces the overall prediction accuracy to 66.12%, which is slightly lower than the prediction accuracy of the individual processing components. The average prediction error rates vary from 1.55% (between the source and sleep components) to 48.05% (at the emit component) for MSL, and from 7.52% (between the source and emit components) to 58.72% (at the emit edging component) for DKL, as shown in Fig. 11b. The MSL model on average estimates the default dynamic latencies almost perfectly, but the scenario is very different for the DKL model, as shown in Fig. 11c, d. Moreover, the default normalized dynamic execution latencies of the spout are almost identical to the DKL and MSL prediction model observations, but the DKL observations show a significant, non-negligible deviation over the 0-20 min range, as shown in Fig. 11c. A similar observation holds for the bolts, but there are huge variations between the DKL and MSL prediction models when compared with the default normalized execution latencies, as shown in Fig. 11d, over the range from 0 to 20 min.
US-DAG topology
The experimental results for this contemporary graph are delineated in Fig. 12. The average prediction accuracy rates of the individual processing logic components are shown in Fig. 12a; they vary from 95.37% (between the source component and the unique component) to 70.30% (at the merge component) for the MSL model. The performance assessment of the contemporary models is interesting in the presence of unpredictable workload variation. To represent dynamic behavior, we forcefully inject skewness into the processing of the user-defined components by restricting the parallelism counts to 12, 22 and 5 respectively. Due to dynamic variations in the processing-unit metrics, the available metrics are far from covering all possible values; this reduces the predictive accuracy for the topology to 79.56%, which is slightly higher than the individual accuracy rates. Surprisingly, for the DKL model, the prediction accuracy of the individual processing logic components varies from 97.35% (between the source component and the unique component) to 65.53% (at the merge edging component). The presence of sparse attributed metrics reduces the overall prediction accuracy to 84.82%, which is slightly lower than the prediction accuracy of the individual processing components. The average prediction error rates vary from 6.34% (between the source and unique components) to 26.60% (at the merge component) for MSL, and from 2.11% (between the source and unique components) to 8.43% (at the merge edging component) for DKL, as shown in Fig. 12b. The DKL model on average estimates the default dynamic latencies almost perfectly, but the scenario is very different for the MSL model, as shown in Fig. 12c, d. Moreover, the default normalized dynamic execution latencies of the spouts are almost identical to the MSL and DKL prediction model observations, though the MSL observations show a small deviation from the default observations, as shown in Fig. 12c. Unfortunately, there are huge variations between the default normalized dynamic latencies and the normalized latencies of the MSL prediction model, as shown in Fig. 12d, over the range from 2.5 to 20 min.
