
    A Survey on Automatic Parameter Tuning for Big Data Processing Systems

    Big data processing systems (e.g., Hadoop, Spark, Storm) contain a vast number of configuration parameters controlling parallelism, I/O behavior, memory settings, and compression. Improper parameter settings can cause significant performance degradation and stability issues. However, regular users and even expert administrators struggle to understand and tune these parameters to achieve good performance. We investigate existing approaches to parameter tuning for both batch and stream data processing systems and classify them into six categories: rule-based, cost modeling, simulation-based, experiment-driven, machine learning, and adaptive tuning. We summarize the pros and cons of each approach and raise some open research problems for automatic parameter tuning.

    An autonomous system for maintenance scheduling data-rich complex infrastructure: Fusing the railways’ condition, planning and cost

    National railways are typically large and complex systems. Their network infrastructure usually includes extended track sections, bridges, stations and other supporting assets. In recent years, railways have also become a data-rich environment. Railway infrastructure assets have a very long life, but inherently degrade. Interventions are necessary but they can cause lateness, damage and hazards. Every day, thousands of discrete maintenance jobs are scheduled according to time and urgency. Service disruption has a direct economic impact. Planning for maintenance can be complex, expensive and uncertain. Autonomous scheduling of maintenance jobs is essential. The design strategy of a novel integrated system for automatic job scheduling is presented, from concept formulation, through the examination of the data-to-information transitional level interface, to the decision-making level. The underlying architecture configures high-level fusion of technical and business drivers, scheduling optimized intervention plans that factor in cost impact and added value. A proof-of-concept demonstrator was developed to validate the system principle and to test algorithm functionality. It employs a dashboard to visualize the system response and to present key information. Real track incident and inspection datasets were analyzed to raise degradation alarms that initiate the automatic scheduling of maintenance tasks. Optimum scheduling was realized through data analytics, job sequencing heuristics and genetic algorithms, taking into account specific cost and value inputs from comprehensive task cost modelling. Formal face validation was conducted with railway infrastructure specialists and stakeholders. The demonstrator structure was found fit for purpose with logical component relationships, offering further scope for research and commercial exploitation.

    Cloud computing resource scheduling and a survey of its evolutionary approaches

    A disruptive technology fundamentally transforming the way that computing services are delivered, cloud computing offers information and communication technology users a new dimension of convenience of resources, as services via the Internet. Because the cloud provides a finite pool of virtualized on-demand resources, optimally scheduling them has become an essential and rewarding topic, where a trend of using Evolutionary Computation (EC) algorithms is emerging rapidly. Through analyzing the cloud computing architecture, this survey first presents a taxonomy at two levels of scheduling cloud resources. It then paints a landscape of the scheduling problem and solutions. According to the taxonomy, a comprehensive survey of state-of-the-art approaches is presented systematically. Looking forward, challenges and potential future research directions are investigated and invited, including real-time scheduling, adaptive dynamic scheduling, large-scale scheduling, multiobjective scheduling, and distributed and parallel scheduling. At the dawn of Industry 4.0, cloud computing scheduling for cyber-physical integration in the presence of big data is also discussed. Research in this area is only in its infancy, but with the rapid fusion of information and data technology, more exciting and agenda-setting topics are likely to emerge on the horizon.