
    AsterixDB: A Scalable, Open Source BDMS

    AsterixDB is a new, full-function BDMS (Big Data Management System) with a feature set that distinguishes it from other platforms in today's open source Big Data ecosystem. Its features make it well suited to applications like web data warehousing, social data storage and analysis, and other Big Data use cases. AsterixDB has a flexible NoSQL-style data model; a query language that supports a wide range of queries; a scalable runtime; partitioned, LSM-based data storage and indexing (including B+-tree, R-tree, and text indexes); support for external as well as natively stored data; a rich set of built-in types; support for fuzzy, spatial, and temporal types and queries; a built-in notion of data feeds for ingestion; and transaction support akin to that of a NoSQL store. Development of AsterixDB began in 2009 and led to a mid-2013 initial open source release. This paper is the first complete description of the resulting open source AsterixDB system. Covered herein are the system's data model, its query language, and its software architecture. Also included are a summary of the current status of the project and a first glimpse into how AsterixDB performs when compared to alternative technologies (a parallel relational DBMS, a popular NoSQL store, and a popular Hadoop-based SQL data analytics platform) on tasks that all of these technologies can do. The paper closes with a brief description of some initial trials that the system has undergone and the lessons learned (and plans laid) based on those early "customer" engagements.
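
    The abstract contains no code, but as a rough sketch of the kind of query AsterixDB accepts, the Python snippet below posts a SQL++ statement to the system's HTTP query service. The local endpoint is AsterixDB's documented default; the TweetMessages dataset and its fields are hypothetical, chosen only to echo the social-data use case above.

        import json
        import requests  # third-party: pip install requests

        # A SQL++ query over a hypothetical social-media dataset; the dataset
        # name and fields are illustrative, not taken from the paper.
        statement = """
            SELECT t.user.name AS author, COUNT(*) AS cnt
            FROM TweetMessages t
            WHERE t.send_time >= datetime("2024-01-01T00:00:00")
            GROUP BY t.user.name
            ORDER BY cnt DESC
            LIMIT 10;
        """

        # AsterixDB's HTTP query service (default port 19002) takes the
        # statement as a form parameter and returns a JSON response whose
        # "results" field holds the query output.
        resp = requests.post("http://localhost:19002/query/service",
                             data={"statement": statement})
        resp.raise_for_status()
        for row in resp.json()["results"]:
            print(json.dumps(row))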

    Optimization Technique for Efficient Dynamic Query Forms with Keyword Search and NoSQL

    Modern web databases, as well as scientific databases, maintain tremendous volumes of heterogeneous, unstructured data. Traditional data mining technologies cannot mine this data effectively. These real-world databases may contain hundreds or even thousands of relations and attributes. Recent trends such as Big Data and cloud computing have led to the adoption of NoSQL, which simply means "Not Only SQL". Most current web applications are hosted in the cloud and made available through the internet, which creates an explosion in the number of concurrent users. We therefore propose a technique for handling unstructured data, named Dynamic Query Forms with NoSQL. The system presents a dynamic query form interface for exploring an organization's database. A document-oriented NoSQL database is used; specifically MongoDB, which supports dynamic queries that do not require predefined map-reduce functions. Generation of a query form is an iterative process guided by the user: at each step the system automatically generates a ranked list of form components, and the user adds the desired component to the query form and submits queries to view the results. Two traditional measures, precision and recall, are used to evaluate the quality of query results, and an overall performance measure is derived as the F-score. DOI: 10.17762/ijritcc2321-8169.15070
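
    As a toy illustration of the approach described above (not the authors' implementation), the Python sketch below assembles a MongoDB query from user-selected form components with pymongo and scores the result set with precision, recall, and F-score; the collection, fields, and ground-truth set are hypothetical.

        from pymongo import MongoClient

        # Hypothetical document store; names are illustrative only.
        coll = MongoClient("mongodb://localhost:27017")["webdb"]["articles"]

        # Each form component the user has added maps to one query clause;
        # MongoDB's dynamic queries let us AND them together without any
        # predefined map-reduce function.
        form_components = [
            {"category": "databases"},
            {"year": {"$gte": 2015}},
            {"$text": {"$search": "nosql"}},  # needs a text index on the collection
        ]
        query = {"$and": form_components} if form_components else {}
        result_ids = {doc["_id"] for doc in coll.find(query)}

        # Precision and recall against the documents the user actually wanted
        # (ground truth would come from user feedback), combined into F-score.
        relevant_ids = set()  # fill with known-relevant _id values
        tp = len(result_ids & relevant_ids)
        precision = tp / len(result_ids) if result_ids else 0.0
        recall = tp / len(relevant_ids) if relevant_ids else 0.0
        fscore = 2 * precision * recall / (precision + recall) if tp else 0.0
        print(f"P={precision:.2f} R={recall:.2f} F={fscore:.2f}")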

    Overview of Indexes Used in NOSQL Databases of MongoDB Architecture

    Present-day transactions result in petabytes of collected data, and the credit goes almost entirely to the booming ICT industry. This data can reveal hidden patterns for enterprises and the research industry, helping them improve on their traditional methods. However, the data is unstructured and requires innovative new technologies in any architecture that handles big data. In this paper, we explore NoSQL database handling techniques, and specifically the indexes that reduce the time complexity of handling unstructured data. The paper is divided into five sections: the first compares DBMS and DSMS, followed by a literature review, an introduction to the MongoDB architecture, and an overview of NoSQL databases. The fourth section covers the types of databases and their index types, and the fifth section describes a performance comparison of MongoDB with RDBMSs.
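
    The index types surveyed above are easiest to see in code. The pymongo sketch below creates single-field, compound, and text indexes on a hypothetical collection, then uses explain() to check whether a query actually hits an index; the collection and field names are assumptions, not from the paper.

        import pymongo
        from pymongo import MongoClient

        coll = MongoClient("mongodb://localhost:27017")["shop"]["orders"]

        # Single-field index: a B-tree on one key.
        coll.create_index([("customer_id", pymongo.ASCENDING)])

        # Compound index: serves queries that filter or sort on both keys.
        coll.create_index([("customer_id", pymongo.ASCENDING),
                           ("order_date", pymongo.DESCENDING)])

        # Text index: enables $text keyword search over unstructured fields.
        coll.create_index([("description", pymongo.TEXT)])

        # explain() reveals whether the winning plan is an index scan (IXSCAN)
        # or a full collection scan (COLLSCAN); avoiding the latter is where
        # the time-complexity savings discussed above come from.
        plan = coll.find({"customer_id": 42}).explain()
        print(plan["queryPlanner"]["winningPlan"])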

    Translation of Heterogeneous Databases into RDF, and Application to the Construction of a SKOS Taxonomical Reference

    While the data deluge accelerates, most of the data produced remains locked in deep Web databases. For the linked open data to benefit from the potential represented by this huge amount of data, it is crucial to come up with solutions to expose heterogeneous databases as linked data. The xR2RML mapping language is an endeavor towards this goal: it is designed to map various types of databases to RDF, by flexibly adapting to heterogeneous query languages and data models while remaining free from any specific language. It extends R2RML, the W3C recommendation for the mapping of relational databases to RDF, and relies on RML for the handling of various data formats. In this paper we present xR2RML, we analyse the data models of several modern databases as well as the formats in which query results are returned, and we show how xR2RML translates any result data element into RDF, relying on existing languages such as XPath and JSONPath when necessary. We illustrate some features of xR2RML, such as the generation of RDF collections and containers and the ability to deal with mixed data formats. We also describe a real-world use case in which we applied xR2RML to build a SKOS thesaurus aimed at supporting studies on the History of Zoology, Archaeozoology, and Conservation Biology.
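
    xR2RML is a declarative mapping language, so no short snippet can reproduce it; the Python sketch below only illustrates the core idea it builds on, evaluating a per-format path expression (here JSONPath, via jsonpath-ng) against a query result and materializing RDF triples (via rdflib). The JSON document, vocabulary, and URIs are hypothetical, loosely echoing the SKOS thesaurus use case.

        from jsonpath_ng import parse              # pip install jsonpath-ng
        from rdflib import Graph, Literal, Namespace, URIRef
        from rdflib.namespace import SKOS

        # A hypothetical JSON result returned by a document-database query.
        result = {"taxa": [
            {"id": "felis_catus", "label": "Felis catus"},
            {"id": "canis_lupus", "label": "Canis lupus"},
        ]}

        EX = Namespace("http://example.org/taxon/")
        g = Graph()
        g.bind("skos", SKOS)

        # One hand-coded mapping rule: a JSONPath expression selects the data
        # elements, and each match is translated into RDF triples.
        for match in parse("$.taxa[*]").find(result):
            node = EX[match.value["id"]]
            g.add((node, SKOS.prefLabel, Literal(match.value["label"])))
            g.add((node, SKOS.inScheme, URIRef("http://example.org/thesaurus")))

        print(g.serialize(format="turtle"))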

    Survey of time series database technology

    This report has been prepared by Epimorphics Ltd. as part of the ENTRAIN project (NERC grant number NE/S016244/1), a feasibility project within the “NERC Constructing a Digital Environment Strategic Priorities Fund Programme”. The Centre for Ecology and Hydrology (CEH) is a research organisation focusing on land and freshwater ecosystems and their interaction with the atmosphere. The organisation manages a number of sensor networks to monitor the environment, and also handles large databases of third-party data (e.g. river flows measured by the Environment Agency and its equivalents in Scotland and Wales). Data from these networks is stored and made available to users, both internally (through direct query of databases) and externally (via web services). The ENTRAIN project aims to address a number of issues in relation to sensor data storage and integration, using a number of hydrological datasets to help define use cases: COSMOS-UK (a network of ~50 sites measuring soil moisture and meteorological variables at 1-30 minute resolutions); the CEH Greenhouse Gas (GHG) network (~15 sites measuring sub-second fluxes of gases and moisture, subsequently processed up to 30-minute aggregations); and the Thames Initiative (a database of weekly and hourly water quality samples from sites around the Thames basin). In addition, this report considers the UK National River Flow Archive, a database of daily river flows and catchment rainfall derived by regional environmental agencies from 15-minute measurements of river levels and flows. CEH commissioned this report to survey alternative technologies for storing sensor data that scale better, could manage larger data volumes more easily and less expensively, and might be readily deployed on different infrastructures.
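
    The report itself is a survey rather than code, but the processing step it mentions for the GHG network, raising sub-second fluxes to 30-minute aggregations, can be sketched in a few lines of pandas; the variable name, frequency, and synthetic data below are stand-ins for real sensor readings.

        import numpy as np
        import pandas as pd

        # Synthetic stand-in for high-frequency flux measurements; in practice
        # these would be read from the sensor database.
        idx = pd.date_range("2024-01-01", periods=7200, freq="s")  # 2 h at 1 Hz
        raw = pd.DataFrame(
            {"co2_flux": np.random.default_rng(0).normal(size=len(idx))},
            index=idx)

        # Aggregate the second-level series up to the 30-minute resolution the
        # report describes, keeping a sample count for quality control.
        agg = raw["co2_flux"].resample("30min").agg(["mean", "std", "count"])
        print(agg)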

    Database management system performance comparisons: A systematic literature review

    Efficiency has been a pivotal aspect of the software industry since its inception: a system that serves the end user quickly, and the service provider cost-efficiently, benefits all parties. A database management system (DBMS) is an integral part of effectively all software systems, so it is logical that many studies have compared the performance of different DBMSs in hopes of finding the most efficient one. This study systematically synthesizes the results and approaches of studies that compare DBMS performance and provides recommendations for industry and research. The results show that performance is usually tested in a way that does not reflect real-world use cases, and that tests are typically reported in insufficient detail for replication or for drawing conclusions from the stated results.
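
    The review's central complaint, that benchmarks rarely reflect real workloads and are reported in too little detail to replicate, implies a minimal standard of measurement. The Python harness below is one hedged sketch of that standard (warm-up runs, repeated samples, distribution-aware reporting); the timed callable is a placeholder for a real DBMS query.

        import statistics
        import time

        def bench(run_query, warmups=5, repeats=50):
            """Time a query callable with warm-ups; report robust statistics."""
            for _ in range(warmups):     # fill caches and buffer pools first
                run_query()
            samples = []
            for _ in range(repeats):
                start = time.perf_counter()
                run_query()
                samples.append(time.perf_counter() - start)
            return {
                "median_s": statistics.median(samples),
                "p95_s": statistics.quantiles(samples, n=20)[-1],
                "stdev_s": statistics.stdev(samples),
                "repeats": repeats,       # reporting these parameters is the
                "warmups": warmups,       # detail the review finds missing
            }

        # Placeholder workload; a real comparison would run identical workloads,
        # data, and hardware across every DBMS under test.
        print(bench(lambda: sum(range(100_000))))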

    QueRIE: Collaborative Database Exploration

    Interactive database exploration is a key task in information mining. However, users who lack SQL expertise or familiarity with the database schema face great difficulties in performing this task. To aid these users, we developed the QueRIE system for personalized query recommendations. QueRIE continuously monitors the user’s querying behavior and finds matching patterns in the system’s query log, in an attempt to identify previous users with similar information needs. Subsequently, QueRIE uses these “similar” users and their queries to recommend queries that the current user may find interesting. In this work we describe an instantiation of the QueRIE framework in which the active user’s session is represented by a set of query fragments. The recorded fragments are used to identify similar query fragments in previously recorded sessions, which are in turn assembled into potentially interesting queries for the active user. We show through experimentation that the proposed method generates meaningful recommendations on real-life traces from the SkyServer database, and we propose a scalable design that enables the incremental update of similarities, making real-time computation on large amounts of data feasible. Finally, we compare this fragment-based instantiation with our previously proposed tuple-based instantiation, discussing the advantages and disadvantages of each approach.
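
    As a toy sketch of the fragment-based instantiation described above (not the QueRIE code), the Python snippet below represents each session as a bag of query fragments, ranks logged sessions by cosine similarity to the active one, and surfaces fragments the most similar user employed; the fragments and sessions are invented, merely echoing SkyServer-style tables.

        from collections import Counter
        from math import sqrt

        def cosine(a: Counter, b: Counter) -> float:
            dot = sum(a[f] * b[f] for f in a.keys() & b.keys())
            na = sqrt(sum(v * v for v in a.values()))
            nb = sqrt(sum(v * v for v in b.values()))
            return dot / (na * nb) if na and nb else 0.0

        # Previously recorded sessions from the query log (hypothetical);
        # fragment extraction from SQL text is out of scope here.
        log = {
            "u1": Counter({"FROM photoobj": 3, "WHERE ra BETWEEN": 2,
                           "JOIN specobj": 1}),
            "u2": Counter({"FROM photoobj": 1, "WHERE type =": 4}),
        }
        active = Counter({"FROM photoobj": 2, "WHERE ra BETWEEN": 1})

        # Rank logged sessions by similarity, then recommend the fragments the
        # closest user employed but the active user has not yet tried.
        best = max(log, key=lambda u: cosine(active, log[u]))
        print(best, "->", set(log[best]) - set(active))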

    Incorporating Census Data into a Geospatial Student Database

    The University of New Mexico (UNM) stores data on students, faculty, and staff at the University. The data is used to generate reports and fill surveys for several local, statewide, and nationwide reporting entities. The reports convey statistical and analytical information such as graduation rates, retention, performance, and the ethnicity, age, and gender of students. Furthermore, the Institute of Design and Innovation (IDI) and the Office of Institutional Analytics (OIA) at UNM use the data for various predictive studies aimed at improving student outcomes. This thesis proposes geospatial data as an additional layer of information for the data repository. The paper runs through the general steps involved in setting up a geospatial database using PostgreSQL and geospatial extensions including PostGIS, Tiger Geocoder, and Address Standardizer. With geospatial functionality incorporated into the data repository, the university can know how far away students live, which amenities are in proximity to students, and other geospatial features that describe students’ journeys through college. To demonstrate how the university could exploit geospatial functionality, a dataset of UNM students is spatially joined to socioeconomic data from the United States Census Bureau. The paper shows how to set up a geospatial database and presents various student-related geospatial queries.
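
    A hedged sketch of the central step, spatially joining geocoded student records to census geography with PostGIS, is shown below in Python with psycopg2; the connection string, table names, and columns are hypothetical, not UNM's actual schema.

        import psycopg2  # pip install psycopg2-binary

        conn = psycopg2.connect("dbname=studentdb user=analyst")

        # Join each geocoded student point to the census tract containing it,
        # so student rows pick up the tract's socioeconomic attributes.
        sql = """
            SELECT s.student_id, t.geoid, t.median_household_income
            FROM students AS s
            JOIN census_tracts AS t
              ON ST_Contains(t.geom, s.geom);
        """
        with conn, conn.cursor() as cur:
            cur.execute(sql)
            for student_id, geoid, income in cur.fetchmany(5):
                print(student_id, geoid, income)
        conn.close()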