8 research outputs found

    Execution Time Prediction for Cypher Queries in the Neo4j Database Using a Learning Approach

    As database management systems become more complex, predicting the execution time of graph queries before they are executed has become a challenge for query scheduling, workload management, resource allocation, and progress monitoring. Existing research has addressed this problem for traditional SQL queries by comparing query performance prediction methods, but these methods cannot be directly applied to Cypher queries on the Neo4j database. Additionally, most query performance prediction methods focus on measuring the relationship between correlation coefficients and retrieval performance. Inspired by machine-learning methods and graph query optimization technologies, we used an RBF neural network as the prediction model to train on and predict the execution time of Cypher queries. The corresponding query pattern features, graph data features, and query plan features were fused together and used to train our prediction models. Furthermore, we deployed a monitor node and designed a Cypher query benchmark for the database clusters to obtain the query plan information and native data store. The experimental results of four benchmarks showed that the average mean relative error of the RBF model reached 16.5% on the Northwind dataset, 12% on the FIFA2021 dataset, and 16.25% on the CORD-19 dataset. These results demonstrate the effectiveness of our proposed approach on three real-world datasets.
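
To make the abstract's two technical ingredients concrete, the following is a minimal sketch of a Gaussian RBF regressor and a mean relative error metric. It is not the authors' implementation: the function names, the choice of centers, the `gamma` width, the small `ridge` term, and the definition of mean relative error as the mean of |predicted - actual| / |actual| are all assumptions made for illustration, and the feature vectors stand in for the fused query pattern, graph data, and query plan features.

```python
import numpy as np

def rbf_features(X, centers, gamma):
    """Gaussian RBF activations of each sample against each center."""
    # Squared Euclidean distance between every sample and every center.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)

def fit_rbf(X, y, centers, gamma, ridge=1e-8):
    """Solve for linear output weights by regularized least squares."""
    phi = rbf_features(X, centers, gamma)
    a = phi.T @ phi + ridge * np.eye(centers.shape[0])
    return np.linalg.solve(a, phi.T @ y)

def predict_rbf(X, centers, gamma, weights):
    """Predicted execution times for the feature rows in X."""
    return rbf_features(X, centers, gamma) @ weights

def mean_relative_error(y_true, y_pred):
    """Mean of |predicted - actual| / |actual| (assumed definition)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_pred - y_true) / np.abs(y_true)))
```

Under this assumed definition, a reported error of 12% would mean that predictions deviate from the measured execution time by 12% of that time on average.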

    Flink-ER: An Elastic Resource-Scheduling Strategy for Processing Fluctuating Mobile Stream Data on Flink

    As real-time and immediate feedback becomes increasingly important in tasks related to mobile information, big data stream processing systems are increasingly applied to process massive amounts of mobile data. However, when processing a drastically fluctuating mobile data stream, the lack of an elastic resource-scheduling strategy limits the elasticity and scalability of data stream processing systems. To address this problem, this paper builds a flow-network model, a resource allocation model, and a data redistribution model as the foundation for Flink with an elastic resource-scheduling strategy (Flink-ER), which consists of a capacity detection algorithm, an elastic resource reallocation algorithm, and a data redistribution algorithm. The strategy improves the performance of the platform by dynamically rescaling the cluster and increasing the parallelism of operators based on the processing load. The experimental results show that cluster throughput improved while latency constraints were still met, which verifies the efficiency of the strategy.
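
The idea of setting operator parallelism from the processing load can be illustrated with a minimal sketch. This is a generic capacity-based rule, not the Flink-ER algorithms themselves, which the abstract does not specify; the function name and all parameters are hypothetical and no Flink API is used.

```python
import math

def required_parallelism(input_rate, per_instance_capacity, max_parallelism):
    """Smallest parallelism whose total capacity covers the input rate.

    input_rate and per_instance_capacity are in records per second.
    The result is clamped to the range [1, max_parallelism].
    """
    if per_instance_capacity <= 0:
        raise ValueError("per_instance_capacity must be positive")
    needed = math.ceil(input_rate / per_instance_capacity)
    return min(max(needed, 1), max_parallelism)
```

For example, an operator receiving 2500 records per second whose instances each handle 1000 records per second would need three instances under this rule.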