60,957 research outputs found

    Distributed Caching for Processing Raw Arrays

    As applications continue to generate multi-dimensional data at exponentially increasing rates, fast analytics to extract meaningful results is becoming extremely important. The database community has developed array databases that alleviate this problem through a series of techniques. In-situ mechanisms provide direct access to raw data in the original format, without loading and partitioning. Parallel processing scales to the largest datasets. In-memory caching reduces latency when the same data are accessed across a workload of queries. However, we are not aware of any work on distributed caching of multi-dimensional raw arrays. In this paper, we introduce a distributed framework for cost-based caching of multi-dimensional arrays in native format. Given a set of files that contain portions of an array and an online query workload, the framework computes an effective caching plan in two stages. First, the plan identifies the cells to be cached locally from each of the input files by continuously refining an evolving R-tree index. In the second stage, an optimal assignment of cells to nodes that collocates dependent cells is determined in order to minimize the overall data transfer. We design cache eviction and placement heuristic algorithms that consider the historical query workload. A thorough experimental evaluation over two real datasets in three file formats confirms the superiority of the proposed framework over existing techniques, by as much as two orders of magnitude, in terms of cache overhead and workload execution time.
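
    As a rough illustration of the second stage described above, the hypothetical Python sketch below greedily places cells that are frequently queried together on the same node, subject to a per-node capacity. The function name assign_cells, the co-access counts, and the scoring rule are invented for this example; it is a toy version of the collocation idea, not the cost-based algorithm evaluated in the paper.

        from collections import defaultdict

        def assign_cells(cells, co_access, nodes, capacity):
            """Greedily place each cell on the node that already holds the most
            co-accessed cells, subject to a per-node capacity (in cells)."""
            placement = {}                       # cell id -> node id
            load = defaultdict(int)              # node id -> number of cells placed
            # Consider the most heavily co-accessed cells first.
            order = sorted(cells, key=lambda c: -sum(co_access.get((c, o), 0) for o in cells))
            for cell in order:
                best_node, best_score = None, -1
                for node in nodes:
                    if load[node] >= capacity:
                        continue
                    # Score: how often this cell is queried together with cells
                    # already placed on this node.
                    score = sum(co_access.get((cell, other), 0)
                                for other, n in placement.items() if n == node)
                    if score > best_score:
                        best_node, best_score = node, score
                if best_node is None:            # every node is full: spill to the least loaded
                    best_node = min(nodes, key=lambda n: load[n])
                placement[cell] = best_node
                load[best_node] += 1
            return placement

        # Tiny example: cells c0/c1 and c2/c3 are frequently queried together.
        cells = ["c0", "c1", "c2", "c3"]
        co_access = {("c0", "c1"): 5, ("c1", "c0"): 5, ("c2", "c3"): 2, ("c3", "c2"): 2}
        print(assign_cells(cells, co_access, nodes=["n0", "n1"], capacity=2))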

    The use of gliders for oceanographic science: the data processing gap

    Autonomous gliders represent a step change in the way oceanographic data can be collected, and as such they are increasingly seen as valuable tools in the oceanographer's arsenal. However, their increase in use has left a gap regarding the conversion of the signals that their sensors collect into scientifically useable data. At present the novelty of gliders means that only a few research groups within the UK are capable of processing glider data, whilst the wider oceanographic community is often unaware that requesting deployment of a glider by MARS does not mean that they will be provided with fully processed and calibrated data following the deployment. This is not a failing of MARS (it is not in their remit), but it does mean that a solution is needed at the UK community level. The solution is also needed quickly given the rapidly growing glider fleet and requests to use it. To illustrate the far from trivial resources and issues involved in solving this problem at a community level, this document briefly summarises the resources and steps involved in carrying glider data through from collection to final product, for the glider-owning research groups within the UK which have the capability. This report does not provide a recommendation on whether such a community facility should be the responsibility of NOC, BODC or MARS, but does provide information on possible protocols and available software that could be part of a solution. This report does, however, recommend that, to support the growing use of the MARS gliders, a permanently staffed group is needed as a priority, to provide the data processing and calibration necessary to allow the translation of glider missions into high-impact scientific publications.

    Serving GODAE Data and Products to the Ocean Community

    The Global Ocean Data Assimilation Experiment (GODAE [http://www.godae.org]) has spanned a decade of rapid technological development. The ever-increasing volume and diversity of oceanographic data produced by in situ instruments, remote-sensing platforms, and computer simulations have driven the development of a number of innovative technologies that are essential for connecting scientists with the data that they need. This paper gives an overview of the technologies that have been developed and applied in the course of GODAE, which now provide users of oceanographic data with the capability to discover, evaluate, visualize, download, and analyze data from all over the world. The key to this capability is the ability to reduce the inherent complexity of oceanographic data by providing a consistent, harmonized view of the various data products. The challenges of data serving have been addressed over the last 10 years through the cooperative skills and energies of many individuals.

    ArrayBridge: Interweaving declarative array processing with high-performance computing

    Scientists are increasingly turning to datacenter-scale computers to produce and analyze massive arrays. Despite decades of database research that extols the virtues of declarative query processing, scientists still write, debug and parallelize imperative HPC kernels even for the most mundane queries. This impedance mismatch has been partly attributed to the cumbersome data loading process; in response, the database community has proposed in situ mechanisms to access data in scientific file formats. Scientists, however, desire more than a passive access method that reads arrays from files. This paper describes ArrayBridge, a bi-directional array view mechanism for scientific file formats that aims to make declarative array manipulations interoperable with imperative file-centric analyses. Our prototype implementation of ArrayBridge uses HDF5 as the underlying array storage library and seamlessly integrates into the SciDB open-source array database system. In addition to fast querying over external array objects, ArrayBridge produces arrays in the HDF5 file format just as easily as it can read from it. ArrayBridge also supports time travel queries from imperative kernels through the unmodified HDF5 API, and automatically deduplicates between array versions for space efficiency. Our extensive performance evaluation at NERSC, a large-scale scientific computing facility, shows that ArrayBridge exhibits performance and I/O scalability that are statistically indistinguishable from those of the native SciDB storage engine.
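
    For readers unfamiliar with the file-centric side of this interoperability, the short Python sketch below shows the kind of in-situ HDF5 access (via h5py) that such imperative analyses rely on: metadata and a single slab are read directly from the file without loading the whole array. The file name and dataset path are placeholders; this is not ArrayBridge's view mechanism, only an assumed example of direct HDF5 access.

        # Minimal illustration of in-situ access to an HDF5 array with h5py.
        # The file name, dataset path, and slice are placeholders.
        import h5py

        with h5py.File("simulation.h5", "r") as f:        # hypothetical file
            dset = f["/timestep_0/temperature"]            # hypothetical dataset
            print(dset.shape, dset.dtype, dset.chunks)     # metadata only, no array data read yet
            window = dset[100:200, 100:200]                # only this slab is read from disk
            print(window.mean())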

    A Survey on Array Storage, Query Languages, and Systems

    Since scientific investigation is one of the most important providers of massive amounts of ordered data, there is a renewed interest in array data processing in the context of Big Data. To the best of our knowledge, a unified resource that summarizes and analyzes array processing research over its long existence is currently missing. In this survey, we provide a guide for past, present, and future research in array processing. The survey is organized along three main topics. Array storage discusses all the aspects related to array partitioning into chunks. The identification of a reduced set of array operators to form the foundation for an array query language is analyzed across multiple such proposals. Lastly, we survey real systems for array processing. The result is a thorough survey on array data storage and processing that should be consulted by anyone interested in this research topic, independent of experience level. The survey is not complete, though. We greatly appreciate pointers towards any work we might have forgotten to mention.
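
    To make the chunking notion concrete, here is a minimal Python sketch of regular (grid-aligned) chunking, the simplest of the partitioning schemes such surveys discuss; the array, chunk shape, and function name are arbitrary examples, not taken from the survey.

        # Toy illustration of regular (grid-aligned) chunking of an n-dimensional array.
        import itertools
        import numpy as np

        def regular_chunks(array, chunk_shape):
            """Yield (origin, chunk) pairs covering the array with an aligned grid."""
            ranges = [range(0, dim, step) for dim, step in zip(array.shape, chunk_shape)]
            for origin in itertools.product(*ranges):
                slices = tuple(slice(o, o + s) for o, s in zip(origin, chunk_shape))
                yield origin, array[slices]

        a = np.arange(36).reshape(6, 6)
        for origin, chunk in regular_chunks(a, (4, 4)):
            print(origin, chunk.shape)     # edge chunks come out smaller than (4, 4)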

    Numerical analysis of a fin-tube plate heat exchanger with winglets

    In this work, a numerical analysis of the heat transfer and flow characteristics of a fin and flat tube heat exchanger with longitudinal vortex generators (LVGs) is presented. A three-dimensional conjugate heat transfer model has been developed and successfully applied. Rectangular winglets were set in pairs with downstream orientation, at impact angles of 20°, 30°, and 40°; the tubes and winglets were placed in a single in-line row, with the air flow in either a forward or a backward arrangement. The Reynolds number ranged from 500 to 5000. The numerical results showed that, within the range of the present study, varying these parameters can increase the heat transfer. The study focuses on the influence of the different vortex generator parameters on the heat transfer and fluid flow characteristics of a single in-line row of circular-tube banks. The average Nusselt number and skin friction coefficient are studied numerically with the commercial computational fluid dynamics (CFD) code ANSYS FLUENT 14. The results showed that both the heat transfer and the skin friction coefficient increase with the Reynolds number. The overall average Nusselt number of the circular tubes increases by 23-31%, 23-43%, and 23-47% at angles of 20°, 30°, and 40°, respectively, in the forward arrangement, and by 23-42%, 23-46%, and 23-52% at the same angles in the backward arrangement, with a corresponding increase in the overall average skin friction coefficient. The results also showed that rectangular winglet pairs (RWPs) can significantly improve the heat transfer performance of fin-and-tube heat exchangers with a moderate pressure loss penalty.

    Image processing for smarter browsing of ocean color data products: investigating algal blooms

    Remote sensing technology continues to play a significant role in the understanding of our environment and the investigation of the Earth. Ocean color is the hue of the water due to the presence of tiny plants containing the pigment chlorophyll, sediments, and colored dissolved organic material, and so can provide valuable information on coastal ecosystems. We propose to make the browsing of Ocean Color data more efficient for users by using image processing techniques to extract useful information that can be accessed through browser searching. Image processing is applied to chlorophyll and sea surface temperature images. The automatic image processing of the visual level 1 and level 2 data allows us to investigate the occurrence of algal blooms. Images with colors in a certain range (red, orange, etc.) are used to flag possible algal blooms and allow us to examine the seasonal variation of algal blooms in Europe (around Ireland and in the Baltic Sea). The yearly seasonal variation of algal blooms in Europe, derived from this image processing for smarter browsing of Ocean Color data, is presented.
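
    The color-range idea can be sketched very simply: pixels whose values fall in a reddish/orange band are flagged as bloom candidates. The thresholds and the stand-in image in the Python example below are invented for illustration; the processing actually applied to the level 1 and level 2 imagery is more involved.

        # Rough sketch of flagging possible bloom pixels by color range.
        import numpy as np

        def possible_bloom_mask(rgb):
            """rgb: H x W x 3 uint8 image; returns a boolean mask of candidate pixels."""
            r = rgb[..., 0].astype(int)
            g = rgb[..., 1].astype(int)
            b = rgb[..., 2].astype(int)
            # Crude red/orange band; thresholds are placeholders.
            return (r > 150) & (g < r - 30) & (b < r - 60)

        img = (np.random.rand(4, 4, 3) * 255).astype(np.uint8)   # stand-in image
        mask = possible_bloom_mask(img)
        print(mask.sum(), "candidate pixels out of", mask.size)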

    Streamlining Sound Speed Profile Pre-Processing: Case Studies and Field Trials

    High rate sound speed profiling systems have the potential to maximize the efficiency of multibeam echosounder systems (MBES) by increasing the accuracy at the outer edges of the swath, where refraction effects are at their worst. In some cases, high rate sampling on the order of tens of casts per hour is required to capture the spatio-temporal oceanographic variability, and this increased sampling rate can challenge the data acquisition workflow if refraction corrections are to be applied in real-time. Common bottlenecks result from sound speed profile (SSP) pre-processing requirements, e.g. file format conversion, cast extension, reduction of the number of points in the cast, filtering, etc. Without the ability to quickly pre-process SSP data, the MBES operator can quickly become overwhelmed with SSP-related tasks, potentially to the detriment of their other duties. A series of algorithms are proposed in which SSPs are automatically pre-processed to meet the input criteria of MBES acquisition systems; specifically, the problems of cast extrapolation and thinning are addressed. The algorithmic performance will be assessed in terms of sounding uncertainty through a series of case studies in a variety of oceanographic conditions and water depths. Results from a field trial in the French Mediterranean will be used to assess the improvement in real-time MBES acquisition workflow and survey accuracy, and will also highlight where further improvements can be made in the pre-processing pipeline.
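
    Two of the pre-processing steps mentioned above, thinning a cast to fewer points and extending it to a deeper depth, can be sketched in a few lines of Python. The tolerance-based thinning and the constant-value extension below are placeholder strategies chosen for illustration, not the algorithms assessed in the paper.

        # Simplified sketch of SSP thinning and cast extension.
        import numpy as np

        def thin_profile(depth, speed, tol=0.2):
            """Drop points that linear interpolation between neighbours already
            reproduces to within `tol` m/s (placeholder thinning rule)."""
            keep = [0]
            for i in range(1, len(depth) - 1):
                d0, d1 = depth[keep[-1]], depth[i + 1]
                pred = np.interp(depth[i], [d0, d1], [speed[keep[-1]], speed[i + 1]])
                if abs(pred - speed[i]) > tol:
                    keep.append(i)
            keep.append(len(depth) - 1)
            return depth[keep], speed[keep]

        def extend_profile(depth, speed, target_depth):
            """Extend the cast to `target_depth` by holding the last speed constant."""
            if depth[-1] >= target_depth:
                return depth, speed
            return np.append(depth, target_depth), np.append(speed, speed[-1])

        z = np.array([0.0, 5.0, 10.0, 20.0, 50.0, 100.0])      # invented example cast
        c = np.array([1510.0, 1509.8, 1509.5, 1505.0, 1498.0, 1492.0])
        z2, c2 = thin_profile(z, c)
        z3, c3 = extend_profile(z2, c2, 200.0)
        print(len(z), "->", len(z2), "points; extended to", z3[-1], "m")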