Understanding Performance Inefficiencies In Native And Managed Languages

Author: Su Pengfei
Publication venue: W&M ScholarWorks
Publication date: 01/01/2020

Production software packages have become increasingly complex with millions of lines of code, sophisticated control and data flow, and references to a hierarchy of external libraries. This complexity often introduces performance inefficiencies across software stacks, making it practically impossible for users to pinpoint them manually. Performance profiling tools (a.k.a. profilers) abound in the tools community to aid software developers in understanding program behavior. Classical profiling techniques focus on identifying hotspots. The hotspot analysis is indispensable; however, it can hardly diagnose whether a resource is being used in a productive manner that contributes to the overall efficiency of a program. Consequently, a significant burden is on developers to make a judgment call on whether there is scope to optimize a hotspot. Derived metrics, e.g., cache miss ratio, offer slightly better intuition into hotspots but are still not panaceas. Hence, there is a need for profilers that investigate resource wastage instead of usage. To overcome the critical missing pieces in prior work and complement existing profilers, we propose novel fine- and coarse-grained profilers to pinpoint varieties of performance inefficiencies and provide optimization guidance for a wide range of software covering benchmarks, enterprise applications, and large-scale parallel applications running on supercomputers and data centers. Fine-grained profilers are indispensable to understand performance inefficiencies comprehensively. We propose a whole-program profiler called LoadSpy, which works on binary executables to detect and quantify wasteful memory operations in their context and scope. 
Our observation, which is justified by myriad case studies, is that wasteful memory operations are often an indicator of various forms of performance inefficiencies, such as suboptimal choices of algorithms or data structures, missed compiler optimizations, and developers’ inattention to performance. Guided by LoadSpy, we are able to optimize a large number of well-known benchmarks and real-world applications, yielding significant speedups. Despite deep performance insights offered by fine-grained profilers, the high overhead keeps them away from widespread adoption, particularly in production. By contrast, coarse-grained profilers introduce low overhead at the cost of poor performance insights. Hence, another research topic is how we benefit from both, that is, the combination of deep insights of fine-grained profilers and low overhead of coarse-grained ones. The first effort to do so is proposing a lightweight profiler called JXPerf. It abandons heavyweight instrumentation by combining hardware performance monitoring units and debug registers available in commodity CPUs to detect wasteful memory operations. Compared with LoadSpy, JXPerf reduces the runtime overhead from 10x to 7% on average. The lightweight nature makes it useful in production. Another effort is proposing a lightweight profiler called FVSampler, the first nonintrusive profiler to study function execution variance
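To make the idea of a "wasteful memory operation" concrete, here is a minimal sketch. It assumes one plausible reading of the term that the abstract does not spell out: a load counts as redundant when it reads the same value from the same address as the previous load of that address. The `Access` record, the `redundant_load_fraction` function, and the toy trace are hypothetical illustrations, not part of LoadSpy or JXPerf, which work on binary executables and hardware facilities rather than on a Python list.

```python
from collections import namedtuple

# Hypothetical trace record: kind is "load" or "store".
Access = namedtuple("Access", ["kind", "address", "value"])


def redundant_load_fraction(trace):
    """Return the fraction of loads that re-read an unchanged value.

    Assumed definition: a load is redundant if the previous load from the
    same address observed the same value.
    """
    last_loaded = {}  # address -> value seen by the most recent load
    loads = 0
    redundant = 0
    for access in trace:
        if access.kind != "load":
            continue  # stores matter only through the values later loads see
        loads += 1
        if access.address in last_loaded and last_loaded[access.address] == access.value:
            redundant += 1
        last_loaded[access.address] = access.value
    return redundant / loads if loads else 0.0


# Toy trace: two of the five loads re-read a value that has not changed.
TRACE = [
    Access("load", 0x10, 5),
    Access("load", 0x10, 5),   # redundant: same address, same value
    Access("store", 0x10, 7),
    Access("load", 0x10, 7),   # not redundant: value changed since last load
    Access("load", 0x20, 1),
    Access("load", 0x20, 1),   # redundant
]

if __name__ == "__main__":
    print(redundant_load_fraction(TRACE))
```

A high fraction in a hot loop would hint at the kinds of inefficiency the abstract lists, such as a value that could be kept in a local variable instead of being re-read from memory.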

College of William & Mary: W&M Publish

Multi-objective scheduling of a steelmaking plant integrated with renewable energy sources and energy storage systems: Balancing costs, emissions and make-span

Authors: Su Pengfei, Wu Jianzhong, Zhou Yue
Publication venue: Elsevier
Publication date: 20/11/2023

As an energy-intensive industry, the steel industry grapples with increasing energy costs and decarbonisation pressures. Therefore, multi-objective optimisation is widely applied in the production scheduling of the steelmaking plant. However, the optimal solution prioritising energy savings and emission reductions may lead to impractical or less economically efficient solutions, since the processing time requirement (PTR) of steel production orders in real-world production is neglected. This study fills the research gap by discussing the impact of PTR on the make-span of the steelmaking process and incorporating it into the optimisation model. Considering the variability of PTR, the solving of the multi-objective scheduling problem is transformed into the selection from Pareto solutions with different make-spans. To better leverage the temporal flexibility of the steelmaking process, a what-if-analysis-based strategy coupled with the Normal Boundary Intersection method is proposed to generate a series of evenly distributed Pareto solutions. The energy storage system is integrated to improve the time granularity of the steelmaking plant's flexibility. Our case studies demonstrate that the electricity and emission costs are reduced by 68.5%, indirect emissions are reduced by 83.5%, and the on-site renewable energy self-consumption rate increases by 12.1%. The effectiveness of the proposed method implies that it is of great relevance to the development of a cleaner steel industry in the future
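The step of selecting from Pareto solutions with different make-spans can be sketched as follows. This is a generic dominance filter followed by a make-span cut-off, not the Normal Boundary Intersection method or the what-if-analysis strategy of the paper. The candidate schedules, their numbers, and the function names are hypothetical and chosen only for illustration.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and better in one.

    All objectives are minimised.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points if q is not p)]


def pick_within_makespan(front, limit):
    """Among front members meeting the make-span limit, pick the cheapest.

    Ties on cost are broken by lower emissions. Returns None if no
    schedule satisfies the limit.
    """
    feasible = [p for p in front if p[2] <= limit]
    return min(feasible, key=lambda p: (p[0], p[1])) if feasible else None


# Hypothetical candidate schedules: (cost, emissions, make-span).
SCHEDULES = [
    (100, 50, 10),
    (80, 60, 12),
    (120, 55, 11),  # dominated by (100, 50, 10)
    (70, 40, 16),
    (90, 45, 16),   # dominated by (70, 40, 16)
]

if __name__ == "__main__":
    front = pareto_front(SCHEDULES)
    print(front)
    print(pick_within_makespan(front, limit=12))
```

Tightening the limit from 16 to 12 excludes the cheapest, cleanest schedule, which illustrates why a processing time requirement can change which Pareto solution is practical.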

Online Research @ Cardiff