148 research outputs found

    Towards Exascale Scientific Metadata Management

    Full text link
    Advances in technology and computing hardware are enabling scientists from all areas of science to produce massive amounts of data using large-scale simulations or observational facilities. In this era of data deluge, effective coordination between the data production and the analysis phases hinges on the availability of metadata that describe the scientific datasets. Existing workflow engines have been capturing a limited form of metadata to provide provenance information about the identity and lineage of the data. However, much of the data produced by simulations, experiments, and analyses still need to be annotated manually in an ad hoc manner by domain scientists. Systematic and transparent acquisition of rich metadata becomes a crucial prerequisite to sustain and accelerate the pace of scientific innovation. Yet, ubiquitous and domain-agnostic metadata management infrastructure that can meet the demands of extreme-scale science is notable by its absence. To address this gap in scientific data management research and practice, we present our vision for an integrated approach that (1) automatically captures and manipulates information-rich metadata while the data is being produced or analyzed and (2) stores metadata within each dataset to permeate metadata-oblivious processes and to query metadata through established and standardized data access interfaces. We motivate the need for the proposed integrated approach using applications from plasma physics, climate modeling and neuroscience, and then discuss research challenges and possible solutions

    Autonomic visualisation.

    Get PDF
    This thesis introduces the concept of autonomic visualisation, where principles of autonomic systems are brought to the field of visualisation infrastructure. Problems in visualisation have a specific set of requirements which are not always met by existing systems. The first half of this thesis explores a specific problem for large scale visualisation; that of data management. Visualisation algorithms have somewhat different requirements to other external memory problems, due to the fact that they often require access to all, or a large subset, of the data in a way that is highly dependent on the view. This thesis proposes a knowledge-based approach to pre-fetching in this context, and presents evidence that such an approach yields good performance. The knowledge based approach is incorporated into a five-layer model, which provides a systematic way of categorising and designing out-of-core, or external memory, systems. This model is demonstrated with two example implementations, on in the local and one in the remote context. The second half explores autonomic visualisation in the more general case. A simulation tool, created for the purpose of designing autonomic visualisation infrastructure is presented. This tool, SimEAC, provides a way of facilitating the development of techniques for managing large-scale visualisation systems. The abstract design of the simulation system, as well as details of the implementation are presented. The architecture of the simulator is explored, and then the system is evaluated in a number of case studies indicating some of the ways in which it can be used. The simulator provides a framework for experimentation and rapid prototyping of large scale autonomic systems

    Multiresolution Techniques for Real–Time Visualization of Urban Environments and Terrains

    Get PDF
    In recent times we are witnessing a steep increase in the availability of data coming from real–life environments. Nowadays, virtually everyone connected to the Internet may have instant access to a tremendous amount of data coming from satellite elevation maps, airborne time-of-flight scanners and digital cameras, street–level photographs and even cadastral maps. As for other, more traditional types of media such as pictures and videos, users of digital exploration softwares expect commodity hardware to exhibit good performance for interactive purposes, regardless of the dataset size. In this thesis we propose novel solutions to the problem of rendering large terrain and urban models on commodity platforms, both for local and remote exploration. Our solutions build on the concept of multiresolution representation, where alternative representations of the same data with different accuracy are used to selectively distribute the computational power, and consequently the visual accuracy, where it is more needed on the base of the user’s point of view. In particular, we will introduce an efficient multiresolution data compression technique for planar and spherical surfaces applied to terrain datasets which is able to handle huge amount of information at a planetary scale. We will also describe a novel data structure for compact storage and rendering of urban entities such as buildings to allow real–time exploration of cityscapes from a remote online repository. Moreover, we will show how recent technologies can be exploited to transparently integrate virtual exploration and general computer graphics techniques with web applications

    Performance Optimization Strategies for Transactional Memory Applications

    Get PDF
    This thesis presents tools for Transactional Memory (TM) applications that cover multiple TM systems (Software, Hardware, and hybrid TM) and use information of all different layers of the TM software stack. Therefore, this thesis addresses a number of challenges to extract static information, information about the run time behavior, and expert-level knowledge to develop these new methods and strategies for the optimization of TM applications

    Building Efficient Software to Support Content Delivery Services

    Get PDF
    Many content delivery services use key components such as web servers, databases, and key-value stores to serve content over the Internet. These services, and their component systems, face unique modern challenges. Services now operate at massive scale, serving large files to wide user-bases. Additionally, resource contention is more prevalent than ever due to large file sizes, cloud-hosted and collocated services, and the use of resource-intensive features like content encryption. Existing systems have difficulty adapting to these challenges while still performing efficiently. For instance, streaming video web servers work well with small data, but struggle to service large, concurrent requests from disk. Our goal is to demonstrate how software can be augmented or replaced to help improve the performance and efficiency of select components of content delivery services. We first introduce Libception, a system designed to help improve disk throughput for web servers that process numerous concurrent disk requests for large content. By using serialization and aggressive prefetching, Libception improves the throughput of the Apache and nginx web servers by a factor of 2 on FreeBSD and 2.5 on Linux when serving HTTP streaming video content. Notably, this improvement is achieved without changing the source code of either web server. We additionally show that Libception's benefits translate into performance gains for other workloads, reducing the runtime of a microbenchmark using the diff utility by 50% (again without modifying the application's source code). We next implement Nessie, a distributed, RDMA-based, in-memory key-value store. Nessie decouples data from indexing metadata, and its protocol only consumes CPU on servers that initiate operations. This design makes Nessie resilient against CPU interference, allows it to perform well with large data values, and conserves energy during periods of non-peak load. We find that Nessie doubles throughput versus other approaches when CPU contention is introduced, and has 70% higher throughput when managing large data in write-oriented workloads. It also provides 41% power savings (over idle power consumption) versus other approaches when system load is at 20% of peak throughput. Finally, we develop RocketStreams, a framework which facilitates the dissemination of live streaming video. RocketStreams exposes an easy-to-use API to applications, obviating the need for services to manually implement complicated data management and networking code. RocketStreams' TCP-based dissemination compares favourably to an alternative solution, reducing CPU utilization on delivery nodes by 54% and increasing viewer throughput by 27% versus the Redis data store. Additionally, when RDMA-enabled hardware is available, RocketStreams provides RDMA-based dissemination which further increases overall performance, decreasing CPU utilization by 95% and increasing concurrent viewer throughput by 55% versus Redis

    Scalable Real-Time Rendering for Extremely Complex 3D Environments Using Multiple GPUs

    Get PDF
    In 3D visualization, real-time rendering of high-quality meshes in complex 3D environments is still one of the major challenges in computer graphics. New data acquisition techniques like 3D modeling and scanning have drastically increased the requirement for more complex models and the demand for higher display resolutions in recent years. Most of the existing acceleration techniques using a single GPU for rendering suffer from the limited GPU memory budget, the time-consuming sequential executions, and the finite display resolution. Recently, people have started building commodity workstations with multiple GPUs and multiple displays. As a result, more GPU memory is available across a distributed cluster of GPUs, more computational power is provided throughout the combination of multiple GPUs, and a higher display resolution can be achieved by connecting each GPU to a display monitor (resulting in a tiled large display configuration). However, using a multi-GPU workstation may not always give the desired rendering performance due to the imbalanced rendering workloads among GPUs and overheads caused by inter-GPU communication. In this dissertation, I contribute a multi-GPU multi-display parallel rendering approach for complex 3D environments. The approach has the capability to support a high-performance and high-quality rendering of static and dynamic 3D environments. A novel parallel load balancing algorithm is developed based on a screen partitioning strategy to dynamically balance the number of vertices and triangles rendered by each GPU. The overhead of inter-GPU communication is minimized by transferring only a small amount of image pixels rather than chunks of 3D primitives with a novel frame exchanging algorithm. The state-of-the-art parallel mesh simplification and GPU out-of-core techniques are integrated into the multi-GPU multi-display system to accelerate the rendering process
    • …
    corecore