106 research outputs found

    Storage and Querying of Large Persistent Arrays

    Scientific and analytical applications today are becoming increasingly data-intensive. Many such applications deal with data that is multidimensional in nature. Traditionally, relational database systems have been used by many data-intensive applications, and the relational paradigm has proved to be both natural and efficient. However, for multidimensional data, when the number of dimensions becomes large, relational databases are inefficient both in terms of storage and query response time. In this thesis, we explore linearised storage, and indexed and skiplist-based retrieval on persistent arrays. The application programs are provided with a logical view of a multidimensional array. The techniques have been implemented in a home-grown database management system called MuBase.
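    Linearised storage means mapping each n-dimensional array index to a single offset in a one-dimensional persistent store. The sketch below assumes a row-major (last dimension varies fastest) layout; the abstract does not say which ordering MuBase uses, and the function names are hypothetical.

```python
from math import prod
from typing import Sequence, Tuple


def row_major_offset(index: Sequence[int], shape: Sequence[int]) -> int:
    """Map an n-dimensional index to its position in a row-major linear layout."""
    if len(index) != len(shape):
        raise ValueError("index and shape must have the same rank")
    offset = 0
    for i, extent in zip(index, shape):
        if not 0 <= i < extent:
            raise IndexError(f"index {i} out of bounds for extent {extent}")
        # Horner-style accumulation: ((i0 * n1 + i1) * n2 + i2) ...
        offset = offset * extent + i
    return offset


def offset_to_index(offset: int, shape: Sequence[int]) -> Tuple[int, ...]:
    """Invert row_major_offset: recover the n-dimensional index from an offset."""
    if not 0 <= offset < prod(shape):
        raise IndexError("offset out of bounds")
    index = []
    # Peel off the fastest-varying (last) dimension first.
    for extent in reversed(shape):
        offset, i = divmod(offset, extent)
        index.append(i)
    return tuple(reversed(index))
```

    With a fixed element size, a point lookup then becomes a single seek to `offset * element_size`, which is what lets an index or skiplist operate on plain integer keys.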

    A Survey on Array Storage, Query Languages, and Systems

    Since scientific investigation is one of the most important providers of massive amounts of ordered data, there is a renewed interest in array data processing in the context of Big Data. To the best of our knowledge, a unified resource that summarizes and analyzes array processing research over its long existence is currently missing. In this survey, we provide a guide for past, present, and future research in array processing. The survey is organized along three main topics. Array storage discusses all the aspects related to array partitioning into chunks. The identification of a reduced set of array operators to form the foundation for an array query language is analyzed across multiple such proposals. Lastly, we survey real systems for array processing. The result is a thorough survey on array data storage and processing that should be consulted by anyone interested in this research topic, independent of experience level. The survey is not complete, though. We greatly appreciate pointers towards any work we might have forgotten to mention. Comment: 44 pages
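    Partitioning an array into chunks comes down to two mappings: which chunk holds a cell, and which chunks a range query must read. The sketch below assumes the simplest scheme, regular aligned chunks of a fixed shape; the survey covers other schemes (irregular or overlapping chunks) that this does not model. Function names are hypothetical.

```python
from itertools import product
from typing import Iterator, Sequence, Tuple


def chunk_of(index: Sequence[int], chunk_shape: Sequence[int]) -> Tuple[int, ...]:
    """Coordinates of the chunk that holds a given cell (regular, aligned chunks)."""
    return tuple(i // c for i, c in zip(index, chunk_shape))


def local_index(index: Sequence[int], chunk_shape: Sequence[int]) -> Tuple[int, ...]:
    """Position of the cell inside its chunk."""
    return tuple(i % c for i, c in zip(index, chunk_shape))


def chunks_for_box(lo: Sequence[int], hi: Sequence[int],
                   chunk_shape: Sequence[int]) -> Iterator[Tuple[int, ...]]:
    """Yield every chunk that intersects the inclusive query box [lo, hi]."""
    ranges = [range(low // c, high // c + 1)
              for low, high, c in zip(lo, hi, chunk_shape)]
    return product(*ranges)
```

    For example, a 10x10 query box over 8x8 chunks touches four chunks, so only those four need to be fetched from storage.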

    Big Data Analytics for Earth Sciences: the EarthServer approach

    Big Data Analytics is an emerging field since massive storage and computing capabilities have been made available by advanced e-infrastructures. Earth and Environmental sciences are likely to benefit from Big Data Analytics techniques supporting the processing of the large number of Earth Observation datasets currently acquired and generated through observations and simulations. However, Earth Science data and applications present specificities in terms of relevance of the geospatial information, wide heterogeneity of data models and formats, and complexity of processing. Therefore, Big Earth Data Analytics requires specifically tailored techniques and tools. The EarthServer Big Earth Data Analytics engine offers a solution for coverage-type datasets, built around a high performance array database technology, and the adoption and enhancement of standards for service interaction (OGC WCS and WCPS). The EarthServer solution, led by the collection of requirements from scientific communities and international initiatives, provides a holistic approach that ranges from query languages and scalability up to mobile access and visualization. The result is demonstrated and validated through the development of lighthouse applications in the Marine, Geology, Atmospheric, Planetary and Cryospheric science domains

    Evaluation of standards and techniques for retrieval of geospatial raster data : a study for the ICOS Carbon Portal

    Geospatial raster data represent the world as a surface whose geographic information varies continuously. These data can be grid-based data like Digital Terrain Elevation Data (DTED) or geographic image data like multispectral images. The Integrated Carbon Observation System (ICOS) European project was launched to measure greenhouse gas emissions. The outputs of these measurements are data in both geospatial vector (raw data) and raster (elaborated data) formats. Using these measurements, scientists create flux maps over Europe. The flux maps are important for many groups, such as researchers, stakeholders and public users. In this regard, the ICOS Carbon Portal (ICOS CP) is looking for an adequate way to make the ICOS elaborated data available to all of these groups in an online environment. Among other things, ICOS CP wants to design a geoportal that lets users download the modelled geospatial raster data in different formats and geographic extents. The Open Geospatial Consortium (OGC) Web Coverage Service (WCS) defines a geospatial web service that serves geospatial raster data such as flux maps in any desired subset in space and time. This study presents two techniques for designing a geoportal compatible with WCS. This geoportal should be able to retrieve the ICOS data in both NetCDF and GeoTIFF formats, as well as allow retrieval of subsets in time and space. In the first technique, a geospatial raster database (Rasdaman) is used to store the data. The Rasdaman OGC component (Petascope), acting as the server tool, connects the database to the client side through the WCS protocol. In the second technique, an advanced file-based system (NetCDF) is used to maintain the data. THREDDS, as the WCS server, ships the data to the client side through the WCS protocol. Both techniques returned good results for downloading the data in the desired formats and subsets.

    Geospatial data refer to an object or phenomenon located in a specific place in space, in relation to other objects. They are linked to geometry and topology. Geospatial raster data are a subset of geospatial data. The challenges of working with geospatial raster data relate to three important components: I) storage and management systems, II) standardized services and III) software interfaces for geospatial raster data. Each component plays its own part in improving interaction with geospatial raster data. A proper storage and management system makes it easy to classify, search and retrieve the data. A standardized service is needed to unify, download, process and share these data with other users. The last challenge is choosing a suitable software interface to support the standardized services on the web. The aim is to give users the ability to download geospatial raster data in different formats for any desired space and time subset. In this regard, two different techniques are evaluated for connecting the three main components. In the first technique, a geospatial raster database is used to store the data, and this database is connected to the software interface through a standardized service. In the second technique, an advanced file-based system is used to maintain the data, and the server ships the data to the software interface through a standardized service. Although both techniques have their own difficulties, they returned good results. Users can download the data in the desired formats on the web, and they can download the data for any specific area and time.
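    Both techniques expose the data through WCS, so a client retrieves a space subset in a given format with a GetCoverage request. The sketch below builds a WCS 2.0.1 GetCoverage URL in the key-value-pair (KVP) encoding. The endpoint, coverage identifier and axis labels are hypothetical: real axis labels depend on the coverage's CRS, a time subset would be one more SUBSET on the coverage's time axis, and the accepted FORMAT values depend on the server (for GeoTIFF, `image/tiff` would be the typical choice).

```python
from typing import Dict, Tuple
from urllib.parse import urlencode


def wcs_getcoverage_url(endpoint: str, coverage_id: str,
                        subsets: Dict[str, Tuple[str, str]], fmt: str) -> str:
    """Build a WCS 2.0.1 GetCoverage request (KVP encoding) that trims each axis."""
    params = [
        ("SERVICE", "WCS"),
        ("VERSION", "2.0.1"),
        ("REQUEST", "GetCoverage"),
        ("COVERAGEID", coverage_id),
    ]
    # One SUBSET parameter per trimmed axis, e.g. SUBSET=Lat(40,60).
    for axis, (low, high) in subsets.items():
        params.append(("SUBSET", f"{axis}({low},{high})"))
    params.append(("FORMAT", fmt))
    # urlencode accepts a list of pairs, so the SUBSET key may repeat.
    return f"{endpoint}?{urlencode(params)}"


url = wcs_getcoverage_url(
    "https://example.org/rasdaman/ows",   # hypothetical endpoint
    "ICOS_flux_map",                      # hypothetical coverage id
    {"Lat": ("40", "60"), "Long": ("-10", "30")},
    "application/netcdf",
)
print(url)
```

    The same request works against either back end in principle, which is the point of standardizing on WCS: the client does not need to know whether Petascope or THREDDS answers it.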

    BIG GEO DATA MANAGEMENT: AN EXPLORATION WITH SOCIAL MEDIA AND TELECOMMUNICATIONS OPEN DATA

    The term Big Data has recently been used to describe large, highly varied, complex data sets that are created and updated at high speed and require faster processing, namely a reduced time to filter and analyse relevant data. These data are also increasingly becoming Open Data (data that can be freely distributed) made public by governments, agencies and private enterprises, among others. At least two issues can obstruct the availability and use of Open Big Datasets. Firstly, gathering and geoprocessing these datasets is very computationally intensive; hence, it is necessary to integrate high-performance solutions, preferably Internet-based, to achieve the goals. Secondly, the problems of heterogeneity and inconsistency in geospatial data are well known and affect the data integration process, but they are particularly problematic for Big Geo Data. Therefore, Big Geo Data integration will be one of the most challenging issues to solve. With these applications, we demonstrate that it is possible to provide processed Big Geo Data to common users using open geospatial standards and technologies. NoSQL databases like MongoDB and frameworks like RASDAMAN could offer different functionalities that facilitate working with larger data volumes and more heterogeneous geospatial data sources.

    City Focus: A web-based interactive 2D and 3D GIS application to find the best place in a city, using open data and open source software

    City Focus is a web-based interactive 2D and 3D GIS application for finding the best place in a city, whether to live or for a shorter stay. The user can select among different criteria and decide their importance by assigning a weight to each of them. The application provides thematic maps showing which places best fit the user's preferences. The resulting map is computed through map algebra by means of the Web Coverage Processing Service (WCPS) provided by the RASDAMAN Database Management System. Data visualization is mainly based on the NASA Web WorldWind open-source virtual globe. The app exclusively uses open data as well as Free and Open Source Software (FOSS), enabling continuous improvements while minimizing development costs.
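    The weighted-criteria map algebra can be illustrated as a per-cell weighted sum of criterion grids. In City Focus this runs server-side as a WCPS query; the plain-Python stand-in below only shows the arithmetic. It assumes every criterion is already scored on a common scale (e.g. 0 to 1, higher is better), and the criterion names in the test are hypothetical.

```python
from typing import Dict, List

Grid = List[List[float]]


def weighted_overlay(layers: Dict[str, Grid], weights: Dict[str, float]) -> Grid:
    """Per-cell weighted sum of criterion grids, with weights normalised to sum to 1."""
    if not layers or set(layers) != set(weights):
        raise ValueError("every criterion layer needs exactly one weight")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    first = next(iter(layers.values()))
    rows, cols = len(first), len(first[0])
    result = [[0.0] * cols for _ in range(rows)]
    for name, grid in layers.items():
        if len(grid) != rows or any(len(row) != cols for row in grid):
            raise ValueError(f"layer {name!r} has a different shape")
        w = weights[name] / total
        for r in range(rows):
            for c in range(cols):
                result[r][c] += w * grid[r][c]
    return result
```

    Normalising the user's weights means a user can type "3" and "1" instead of fractions; the highest-scoring cells in the result are the places that best fit the chosen preferences.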

    SciQL, A query language for science applications

    Scientific applications are still poorly served by contemporary relational database systems. At best, the system provides a bridge towards an external library using user-defined functions, explicit import/export facilities or linked-in Java/C# interpreters. The time has come to rectify this with SciQL, an SQL-based query language for science applications with arrays as first-class citizens. It provides a seamless symbiosis of array, set and sequence interpretation, using a clear separation of the mathematical object from its underlying storage representation. The language extends value-based grouping in SQL with structural grouping, i.e., fixed-sized and unbounded groups based on explicit relationships between its index attributes. This leads to a generalization of window-based query processing. The SciQL architecture benefits from a column-store system with an adaptive storage scheme, including keeping multiple representations around to reduce impedance mismatch. This paper focuses on the language features, their architectural consequences and extensive examples of their intended use.
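    Structural grouping forms groups from index relationships rather than from equal values. As a rough illustration, the sketch below emulates one fixed-size variant in Python: for every cell (x, y), it averages the tile spanning rows x to x+tile_h-1 and columns y to y+tile_w-1, clipping tiles at the array border. This is not SciQL syntax or its evaluation strategy; the function name and the clipping rule are assumptions.

```python
from typing import List

Grid = List[List[float]]


def anchored_tile_mean(grid: Grid, tile_h: int, tile_w: int) -> Grid:
    """For every cell (x, y), average rows x..x+tile_h-1 and columns y..y+tile_w-1.

    Tiles that run past the array border are clipped, so edge groups are smaller.
    """
    if tile_h <= 0 or tile_w <= 0:
        raise ValueError("tile sizes must be positive")
    rows, cols = len(grid), len(grid[0])
    out: Grid = []
    for x in range(rows):
        out_row = []
        for y in range(cols):
            cells = [grid[i][j]
                     for i in range(x, min(x + tile_h, rows))
                     for j in range(y, min(y + tile_w, cols))]
            out_row.append(sum(cells) / len(cells))
        out.append(out_row)
    return out
```

    Each group is defined only by its members' positions, which is why the same mechanism generalizes SQL window functions from one ordered dimension to several.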

    RAM: array processing over a relational DBMS

    Developing multimedia applications in relational databases is hindered by a mismatch in computational frameworks. Efficient manipulation of multimedia data calls for array-based processing, which at best is available as a database add-on, not supported by the query optimizer. As a result, array-based processing ends up in dedicated programs outside the DBMS: non-reusable black boxes. The goal of our research is to reduce this gap between user needs and system functionality by developing a seamless integration of array processing in a relational algebra engine. The paper introduces a declarative language for array expressions based on the array comprehension, and its mapping to a relational kernel in a prototype implementation. The layered architecture of the resulting array database management system allows the use of structural knowledge available in the array data type. This additional source of information can be exploited for query optimization, which is demonstrated with a case study. The experiments show how the performance of a standard tool for matrix computations can be achieved without sacrificing data independence, while highlighting a critical aspect of the proposed DBMS architecture.
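    One common way to map arrays onto a relational kernel is to store each array as a relation of (index, value) tuples, so that array operations become joins and aggregations. The sketch below shows matrix multiplication in that style: a hash join on A.col = B.row, then GROUP BY (A.row, B.col) with SUM. It is a generic illustration of the idea, not RAM's actual mapping, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[int, int, float]


def to_relation(matrix: List[List[float]]) -> List[Triple]:
    """Store a dense matrix as a relation of (row, col, value) tuples."""
    return [(i, j, v) for i, row in enumerate(matrix) for j, v in enumerate(row)]


def relational_matmul(a: List[Triple], b: List[Triple]) -> List[Triple]:
    """C = A x B: join on A.col = B.row, group by (A.row, B.col), SUM(A.v * B.v)."""
    b_by_row: Dict[int, List[Tuple[int, float]]] = defaultdict(list)
    for k, j, v in b:          # build side of the hash join, keyed on B.row
        b_by_row[k].append((j, v))
    sums: Dict[Tuple[int, int], float] = defaultdict(float)
    for i, k, av in a:         # probe side: match A.col against B.row
        for j, bv in b_by_row.get(k, []):
            sums[(i, j)] += av * bv
    return sorted((i, j, v) for (i, j), v in sums.items())
```

    Because the join key is an array index, an optimizer that knows the relation came from a dense array can replace the hash join with positional access, which is the kind of structural knowledge the abstract refers to.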

    Big Geospatial Data Analysis with Array Technologies for Agricultural Applications

    165 pages. US Landsat and EU Sentinel missions provide massive multitemporal remote sensing data. Therefore, the development of efficient technologies for their direct manipulation and processing is of fundamental importance. Towards this direction, we have designed, developed and evaluated a WebGIS system for the online analysis of open remote sensing data and for precision agriculture applications. In particular, the core functionality consists of the rasdaman Array Database Management System for storage, and the Open Geospatial Consortium Web Coverage Processing Service for data querying. Various queries have been designed and implemented in order to access and process multispectral satellite imagery. The web client, which is based on the OpenLayers and GeoExt JavaScript libraries, exploits these queries to enable online, ad hoc spatial and spectral analysis of remote sensing data. In its current form, the developed framework fully covers Greece with Landsat 8 multispectral data, which are automatically collected, pre-processed and catalogued on our hardware for demonstration purposes. The developed queries, which focus on agricultural applications, can efficiently estimate vegetation coverage, canopy and water stress over agricultural and forest areas. The online delivered remote sensing products have been evaluated and compared with similar processes performed with standard desktop remote sensing and GIS software. Αθανάσιος K. Κάρμα
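    Vegetation-coverage queries of this kind are typically built on spectral indices. The abstract does not name the indices used, so the sketch below takes the standard Normalized Difference Vegetation Index, NDVI = (NIR - Red) / (NIR + Red), as an assumed example; for Landsat 8 OLI, NIR is band 5 and Red is band 4. Inputs are assumed to be reflectance grids of equal shape.

```python
import math
from typing import List

Grid = List[List[float]]


def ndvi(nir: Grid, red: Grid) -> Grid:
    """NDVI = (NIR - Red) / (NIR + Red), computed per pixel.

    For Landsat 8 OLI, NIR is band 5 and Red is band 4. Pixels where
    NIR + Red == 0 get NaN instead of raising ZeroDivisionError.
    """
    if len(nir) != len(red) or any(len(a) != len(b) for a, b in zip(nir, red)):
        raise ValueError("NIR and Red grids must have the same shape")
    out: Grid = []
    for nir_row, red_row in zip(nir, red):
        out.append([(n - r) / (n + r) if n + r != 0 else math.nan
                    for n, r in zip(nir_row, red_row)])
    return out
```

    In the described system the equivalent per-pixel expression would be evaluated inside rasdaman through a WCPS query, so only the result travels to the web client.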

    SciQL, Bridging the Gap between Science and Relational DBMS

    Scientific discoveries increasingly rely on the ability to efficiently grind massive amounts of experimental data using database technologies. To bridge the gap between the needs of the Data-Intensive Research fields and the current DBMS technologies, we propose SciQL (pronounced as ‘cycle’), the first SQL-based query language for scientific applications with both tables and arrays as first-class citizens. It provides a seamless symbiosis of array, set and sequence interpretations. A key innovation is the extension of the value-based grouping of SQL:2003 with structural grouping, i.e., fixed-sized and unbounded groups based on explicit relationships between element positions. This leads to a generalisation of window-based query processing with wide applicability in science domains. This paper describes the main language features of SciQL and illustrates them using time-series concepts.
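    For a time series, fixed-size structural groups are windows defined by element position alone. The Python sketch below computes a mean over such windows; `step=1` gives overlapping (sliding) windows and `step=size` gives non-overlapping (tumbling) ones. It illustrates the grouping concept only and is not SciQL syntax; the function name is hypothetical.

```python
from typing import List, Sequence


def windowed_mean(series: Sequence[float], size: int, step: int = 1) -> List[float]:
    """Mean over fixed-size windows defined purely by element position.

    step=1 gives overlapping (sliding) windows; step=size gives
    non-overlapping (tumbling) windows. Incomplete trailing windows are dropped.
    """
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    return [sum(series[s:s + size]) / size
            for s in range(0, len(series) - size + 1, step)]
```

    Dropping incomplete trailing windows is a design choice made here for simplicity; a language could equally emit partial groups at the boundary.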