121 research outputs found

    Extending OLAP Querying to External Object


    A New Approach in Advance Network Reservation and Provisioning for High-Performance Scientific Data Transfers

    Scientific applications already generate many terabytes and even petabytes of data from supercomputer runs and large-scale experiments. The need to transfer data chunks of ever-increasing size across the network shows no sign of abating. Hence, we need high-bandwidth, high-speed networks such as ESnet (Energy Sciences Network). Network reservation systems such as ESnet's OSCARS (On-demand Secure Circuits and Advance Reservation System) establish secure virtual circuits with guaranteed bandwidth, starting at a specified time and lasting for a specified length of time. OSCARS checks network availability and capacity for the specified period and allocates the requested bandwidth to the user if it is available. If the requested reservation cannot be granted, no alternative is suggested to the user. Further, the user has no way to make an optimal choice. We report a new algorithm in which the user specifies the total volume to be transferred, the maximum bandwidth that can be used, and a desired time period within which the transfer should be done. The algorithm can find alternative allocation possibilities, including the earliest completion time or the shortest transfer duration, leaving the choice to the user. We present a novel approach to path finding in time-dependent networks, and a new polynomial algorithm to find possible reservation options according to given constraints. We have implemented our algorithm for testing and incorporation into a future version of ESnet's OSCARS. Our approach provides a basis for provisioning end-to-end high-performance data transfers over storage and network resources.
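The kind of answer such a reservation search returns can be illustrated with a minimal sketch. It is not the algorithm from the abstract: it assumes a single link whose available bandwidth is piecewise constant over time slots, lets the transfer rate vary from slot to slot, and only tries start times aligned to slot boundaries. It does no path finding in a time-dependent network. The names `Slot`, `finish_time`, and `reservation_options` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical availability record: (start_s, end_s, available_bandwidth in MB/s).
Slot = Tuple[float, float, float]


def finish_time(slots: List[Slot], volume: float, max_bw: float,
                start: float, deadline: float) -> Optional[float]:
    """Greedily send `volume` MB from `start`; return the finish time or None."""
    remaining = volume
    for slot_start, slot_end, available in slots:
        lo, hi = max(slot_start, start), min(slot_end, deadline)
        if hi <= lo:
            continue  # slot lies outside the usable window
        rate = min(available, max_bw)  # the user cannot exceed max_bw
        if rate <= 0:
            continue
        capacity = rate * (hi - lo)
        if capacity >= remaining:
            return lo + remaining / rate
        remaining -= capacity
    return None  # the volume does not fit before the deadline


def reservation_options(slots: List[Slot], volume: float, max_bw: float,
                        window_start: float, window_end: float
                        ) -> Optional[Dict[str, Tuple[float, float]]]:
    """Return (start, finish) pairs for the earliest completion and the
    shortest duration among start times aligned to slot boundaries."""
    starts = sorted({window_start} |
                    {s for s, _, _ in slots if window_start <= s < window_end})
    candidates = []
    for start in starts:
        finish = finish_time(slots, volume, max_bw, start, window_end)
        if finish is not None:
            candidates.append((start, finish))
    if not candidates:
        return None
    return {
        "earliest_completion": min(candidates, key=lambda c: c[1]),
        "shortest_duration": min(candidates, key=lambda c: c[1] - c[0]),
    }
```

With slots `[(0, 10, 10), (10, 20, 100), (20, 30, 50)]`, a 500 MB volume, and an 80 MB/s cap, starting immediately finishes earliest, while waiting for the second slot gives the shortest transfer. These are the two options the abstract says are left for the user to choose between.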

    Grid collector: an event catalog with automated file management

    High Energy Nuclear Physics (HENP) experiments such as STAR at BNL and ATLAS at CERN produce large amounts of data that are stored as files on mass storage systems in computer centers. In these files, the basic unit of data is an event. Analysis is typically performed on a selected set of events. The files containing these events have to be located, copied from mass storage systems to disks before analysis, and removed when no longer needed. These file management tasks are tedious and time-consuming. Typically, all events contained in the files are read into memory before a selection is made. Since the time to read the events dominates the overall execution time, reading unwanted events needlessly increases the analysis time. The Grid Collector is a set of software modules that work together to address these two issues. It automates the file management tasks and provides "direct" access to the selected events for analyses. It is currently integrated with the STAR analysis framework. Users can select events based on tags, such as "production date between March 10 and 20, and the number of charged tracks > 100." The Grid Collector locates the files containing relevant events, transfers the files across the Grid if necessary, and delivers the events to the analysis code through the familiar iterators. Although there have been some research efforts to address the file management issues, the Grid Collector is unique in that it addresses the event access issue together with the file management issues. This makes it useful to a wider variety of users.
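The tag-based selection and iterator delivery described above can be sketched as follows. This is an illustrative toy, not the Grid Collector interface: `EventTag`, `select_events`, `iterate_events`, and the `fetch_file` callback are all hypothetical names, and the catalog is a plain in-memory list.

```python
from dataclasses import dataclass
from datetime import date
from typing import Any, Callable, Dict, Iterable, Iterator, List


@dataclass(frozen=True)
class EventTag:
    """Hypothetical per-event catalog entry."""
    event_id: int
    file_name: str
    production_date: date
    charged_tracks: int


def select_events(catalog: Iterable[EventTag],
                  predicate: Callable[[EventTag], bool]) -> Dict[str, List[int]]:
    """Map each file holding at least one selected event to its selected event ids."""
    wanted: Dict[str, List[int]] = {}
    for tag in catalog:
        if predicate(tag):
            wanted.setdefault(tag.file_name, []).append(tag.event_id)
    return wanted


def iterate_events(wanted: Dict[str, List[int]],
                   fetch_file: Callable[[str], Dict[int, Any]]) -> Iterator[Any]:
    """Fetch only the files that are needed and yield only the selected events.

    `fetch_file` stands in for locating and staging a file; it is assumed to
    return a mapping from event id to event payload.
    """
    for file_name, event_ids in wanted.items():
        events = fetch_file(file_name)
        for event_id in event_ids:
            yield events[event_id]
```

A predicate such as `lambda t: date(2004, 3, 10) <= t.production_date <= date(2004, 3, 20) and t.charged_tracks > 100` (the year is made up) expresses the example query. Files that contain no matching event are never fetched.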

    Bulk Data Movement for Climate Dataset: Efficient Data Transfer Management with Dynamic Transfer Adjustment

    Many scientific applications and experiments, such as high energy and nuclear physics, astrophysics, climate observation and modeling, combustion, nano-scale material sciences, and computational biology, generate extreme volumes of data with a large number of files. These data sources are distributed among national and international data repositories, and are shared by large numbers of geographically distributed scientists. A large portion of the data is frequently accessed, and a large volume of data is moved from one place to another for analysis and storage. One challenging issue in such efforts is the limited network capacity available for moving the large datasets that need to be explored and managed. The Bulk Data Mover (BDM), a data transfer management tool in the Earth System Grid (ESG) community, has been managing massive dataset transfers efficiently with pre-configured transfer properties in environments where the network bandwidth is limited. Dynamic transfer adjustment was studied to enhance the BDM so that it can handle significant end-to-end performance changes in a dynamic network environment and control data transfers to reach the desired transfer performance. We describe the results from the BDM transfer management for the climate datasets. We also describe the transfer estimation model and the results from the dynamic transfer adjustment.

    Parallel in situ indexing for data-intensive computing

    As computing power increases exponentially, vast amounts of data are created by many scientific research activities. However, the bandwidth for storing the data to disks and reading the data from disks has been improving at a much slower pace. These two trends produce an ever-widening data access gap. Our work brings together two distinct technologies to address this data access issue: indexing and in situ processing. From decades of database research literature, we know that indexing is an effective way to address the data access issue, particularly for accessing a relatively small fraction of data records. As data sets increase in size, more and more analysts need to use selective data access, which makes indexing even more important for improving data access. The challenge is that most implementations of indexing technology are embedded in large database management systems (DBMS), but most scientific datasets are not managed by any DBMS. In this work, we choose to include indexes with the scientific data instead of requiring the data to be loaded into a DBMS. We use compressed bitmap indexes from the FastBit software, which are known to be highly effective for query-intensive workloads common to scientific data analysis. To use the indexes, we need to build them first. The index building procedure needs to access the whole data set and may also require a significant amount of compute time. In this work, we adapt the in situ processing technology to generate the indexes, thus removing the need to read data from disks and allowing the indexes to be built in parallel. The in situ data processing system used is ADIOS, a middleware for high-performance I/O. Our experimental results show that the indexes can improve the data access time by up to 200 times, depending on the fraction of data selected, and that the in situ technique can reduce the time needed to create the indexes by up to 10 times when using identical parallel settings.
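A minimal sketch of a bitmap index shows why it suits selective access. This is not FastBit's API or its compressed format: it keeps one uncompressed bitmap per distinct value of a low-cardinality integer column, stored as a Python integer, and all function names are hypothetical.

```python
from typing import Dict, List, Sequence


def build_bitmap_index(values: Sequence[int]) -> Dict[int, int]:
    """Build one bitmap per distinct value; bit i is set when row i holds that value."""
    index: Dict[int, int] = {}
    for row, value in enumerate(values):
        index[value] = index.get(value, 0) | (1 << row)
    return index


def range_query(index: Dict[int, int], lo: int, hi: int) -> int:
    """Return the bitmap of rows with lo <= value <= hi by OR-ing the matching bitmaps."""
    result = 0
    for value, bitmap in index.items():
        if lo <= value <= hi:
            result |= bitmap
    return result


def rows_of(bitmap: int) -> List[int]:
    """Expand a bitmap into a sorted list of row numbers."""
    rows = []
    row = 0
    while bitmap:
        if bitmap & 1:
            rows.append(row)
        bitmap >>= 1
        row += 1
    return rows
```

Conditions on several columns combine with a bitwise AND of their result bitmaps, for example `range_query(energy_index, 5, 7) & range_query(tracks_index, 2, 2)`, so only the rows that satisfy every condition need to be read from the data file.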

    Finding regions of interest on toroidal meshes

    Fusion promises to provide clean and safe energy, and a considerable amount of research effort is underway to turn this aspiration into reality. This work focuses on a building block for analyzing data produced from the simulation of microturbulence in magnetic confinement fusion devices: the task of efficiently extracting regions of interest. As in many other simulations where large amounts of data are produced, the careful study of "interesting" parts of the data is critical to gain understanding. In this paper, we present an efficient approach for finding these regions of interest. Our approach takes full advantage of the underlying mesh structure in magnetic coordinates to produce a compact representation of the mesh points inside the regions and an efficient connected component labeling algorithm for constructing regions from points. This approach scales linearly with the surface area of the regions of interest instead of the volume, as shown with both computational complexity analysis and experimental measurements. Furthermore, this new approach is hundreds of times faster than a recently published method based on Cartesian coordinates.
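Connected component labeling on a periodic mesh can be sketched with a union-find structure. This is a simplified illustration, not the method from the abstract: it assumes a regular 2D grid that wraps around in both directions, works on individual points rather than the compact representation the abstract mentions, and `label_regions` is a hypothetical name.

```python
from typing import Dict, Iterable, List, Set, Tuple

Point = Tuple[int, int]  # (row, column) on a grid that is periodic in both directions


def label_regions(points: Iterable[Point], n_rows: int, n_cols: int) -> List[Set[Point]]:
    """Group selected grid points into connected regions using 4-neighbor adjacency."""
    selected = set(points)
    parent: Dict[Point, Point] = {p: p for p in selected}

    def find(p: Point) -> Point:
        # Walk to the root, halving the path along the way.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    def union(a: Point, b: Point) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for row, col in selected:
        # Checking the "next" neighbor in each direction covers every adjacent pair;
        # the modulo makes the last row/column adjacent to the first.
        for neighbor in (((row + 1) % n_rows, col), (row, (col + 1) % n_cols)):
            if neighbor in selected:
                union((row, col), neighbor)

    regions: Dict[Point, Set[Point]] = {}
    for p in selected:
        regions.setdefault(find(p), set()).add(p)
    return list(regions.values())
```

On a 4 x 5 grid, the points `(0, 0)`, `(0, 4)`, and `(3, 0)` end up in one region because of the wrap-around, which a labeling that ignores periodicity would split into three.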