3 research outputs found

    Modeling High-throughput Applications for in situ Analytics

    Get PDF
    International audienceWith the goal of performing exascale computing, the importance of I/Omanagement becomes more and more critical to maintain system performance.While the computing capacities of machines are getting higher, the I/O capa-bilities of systems do not increase as fast. We are able to generate more databut unable to manage them eciently due to variability of I/O performance.Limiting the requests to the Parallel File System (PFS) becomes necessary. Toaddress this issue, new strategies are being developed such as online in situanalysis. The idea is to overcome the limitations of basic post-mortem dataanalysis where the data have to be stored on PFS rst and processed later.There are several software solutions that allow users to specically dedicatenodes for analysis of data and distribute the computation tasks over dier-ent sets of nodes. Thus far, they rely on a manual resource partitioning andallocation by the user of tasks (simulations, analysis).In this work, we propose a memory-constraint modelization for in situ anal-ysis. We use this model to provide dierent scheduling policies to determineboth the number of resources that should be dedicated to analysis functions,and that schedule eciently these functions. We evaluate them and show theimportance of considering memory constraints in the model. Finally, we discussthe dierent challenges that have to be addressed in order to build automatictools for in situ analytics

    From complex data to clear insights: visualizing molecular dynamics trajectories

    Get PDF
    Advances in simulations, combined with technological developments in high-performance computing, have made it possible to produce a physically accurate dynamic representation of complex biological systems involving millions to billions of atoms over increasingly long simulation times. The analysis of these computed simulations is crucial, involving the interpretation of structural and dynamic data to gain insights into the underlying biological processes. However, this analysis becomes increasingly challenging due to the complexity of the generated systems with a large number of individual runs, ranging from hundreds to thousands of trajectories. This massive increase in raw simulation data creates additional processing and visualization challenges. Effective visualization techniques play a vital role in facilitating the analysis and interpretation of molecular dynamics simulations. In this paper, we focus mainly on the techniques and tools that can be used for visualization of molecular dynamics simulations, among which we highlight the few approaches used specifically for this purpose, discussing their advantages and limitations, and addressing the future challenges of molecular dynamics visualization

    Adaptive Asynchronous Control and Consistency in Distributed Data Exploration Systems

    Get PDF
    Advances in machine learning and streaming systems provide a backbone to transform vast arrays of raw data into valuable information. Leveraging distributed execution, analysis engines can process this information effectively within an iterative data exploration workflow to solve problems at unprecedented rates. However, with increased input dimensionality, a desire to simultaneously share and isolate information, as well as overlapping and dependent tasks, this process is becoming increasingly difficult to maintain. User interaction derails exploratory progress due to manual oversight on lower level tasks such as tuning parameters, adjusting filters, and monitoring queries. We identify human-in-the-loop management of data generation and distributed analysis as an inhibiting problem precluding efficient online, iterative data exploration which causes delays in knowledge discovery and decision making. The flexible and scalable systems implementing the exploration workflow require semi-autonomous methods integrated as architectural support to reduce human involvement. We, thus, argue that an abstraction layer providing adaptive asynchronous control and consistency management over a series of individual tasks coordinated to achieve a global objective can significantly improve data exploration effectiveness and efficiency. This thesis introduces methodologies which autonomously coordinate distributed execution at a lower level in order to synchronize multiple efforts as part of a common goal. We demonstrate the impact on data exploration through serverless simulation ensemble management and multi-model machine learning by showing improved performance and reduced resource utilization enabling a more productive semi-autonomous exploration workflow. We focus on the specific genres of molecular dynamics and personalized healthcare, however, the contributions are applicable to a wide variety of domains
    corecore