121 research outputs found

    When Things Matter: A Data-Centric View of the Internet of Things

    With the recent advances in radio-frequency identification (RFID), low-cost wireless sensor devices, and Web technologies, the Internet of Things (IoT) approach has gained momentum in connecting everyday objects to the Internet and facilitating machine-to-human and machine-to-machine communication with the physical world. While IoT offers the capability to connect and integrate both digital and physical entities, enabling a whole new class of applications and services, several significant challenges need to be addressed before these applications and services can be fully realized. A fundamental challenge centers around managing IoT data, typically produced in dynamic and volatile environments, which is not only extremely large in scale and volume but also noisy and continuous. This article surveys the main techniques and state-of-the-art research efforts in IoT from a data-centric perspective, including data stream processing, data storage models, complex event processing, and searching in IoT. Open research issues for IoT data management are also discussed.
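
    The stream-processing techniques this survey covers are easiest to picture as continuous queries over unbounded, noisy input. Below is a minimal sketch in Python of one such building block, a sliding-window aggregate; the window size and the synthetic sensor readings are illustrative assumptions, not taken from the article.

        from collections import deque
        import random

        def sliding_window_average(readings, window_size=10):
            """Emit the running mean over the last `window_size` readings."""
            window = deque(maxlen=window_size)
            for value in readings:
                window.append(value)
                yield sum(window) / len(window)

        # Continuous query over a synthetic, noisy sensor stream.
        stream = (20.0 + random.gauss(0, 1) for _ in range(100))
        for i, smoothed in enumerate(sliding_window_average(stream)):
            if i % 25 == 0:
                print(f"t={i:3d}  smoothed={smoothed:.2f}")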

    Smart Query Answering for Marine Sensor Data

    We review existing query answering systems for sensor data and then propose an extended query answering approach, termed smart query, specifically for marine sensor data. The smart query answering system integrates pattern queries and continuous queries, and considers both streaming data and historical data from marine sensor networks. It also uses query relaxation techniques and semantics from domain knowledge to act as a recommender system. The proposed smart query approach aids in building data and information systems for marine sensor networks.
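
    As one way to picture the query relaxation idea mentioned above: if a range query over the readings returns nothing, domain knowledge can supply a tolerance by which to widen the bounds until a nearby answer appears. The sketch below is a hypothetical minimal version in Python; the field name, sample data, and relaxation step are invented for illustration.

        def range_query(rows, field, lo, hi):
            return [r for r in rows if lo <= r[field] <= hi]

        def relaxed_query(rows, field, lo, hi, step=0.5, max_rounds=5):
            for _ in range(max_rounds + 1):
                hits = range_query(rows, field, lo, hi)
                if hits:
                    return hits, (lo, hi)
                lo, hi = lo - step, hi + step   # relax bounds symmetrically
            return [], (lo, hi)

        readings = [{"temp": 11.2}, {"temp": 13.9}, {"temp": 14.4}]
        hits, bounds = relaxed_query(readings, "temp", 12.5, 13.0)
        print(hits, "matched after relaxing to", bounds)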

    Semantic trajectories: Mobility data computation and annotation

    With the large-scale adoption of GPS-equipped mobile sensing devices, positional data generated by moving objects (e.g., vehicles, people, animals) are easily collected. Such data are typically modeled as streams of spatio-temporal (x, y, t) points, called trajectories. In recent years trajectory management research has progressed significantly towards efficient storage and indexing techniques, as well as suitable knowledge discovery, but these works have focused on the geometric aspect of the raw mobility data. We are now witnessing a growing demand in several application sectors (e.g., from shipment tracking to geo-social networks) for understanding the semantic behavior of moving objects. Semantic behavior refers to the use of semantic abstractions of the raw mobility data, including not only geometric patterns but also knowledge extracted jointly from the mobility data and the underlying geographic and application-domain information. The core contribution of this article lies in a semantic model and a computation and annotation platform for developing a semantic approach that progressively transforms the raw mobility data into semantic trajectories enriched with segmentations and annotations. We also report on a number of experiments with semantic trajectories in different domains.
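
    One step of such a raw-to-semantic pipeline is segmenting the (x, y, t) stream into episodes before annotating them. The following is a minimal sketch, assuming a simple speed threshold to split a track into "stop" and "move" episodes; the threshold and sample track are illustrative, and the article's platform is considerably richer.

        import math

        def segment(points, speed_threshold=0.2):
            """points: list of (x, y, t); returns list of (label, points) episodes."""
            episodes, current, label = [], [points[0]], None
            for (x0, y0, t0), (x1, y1, t1) in zip(points, points[1:]):
                speed = math.hypot(x1 - x0, y1 - y0) / max(t1 - t0, 1e-9)
                new_label = "move" if speed > speed_threshold else "stop"
                if label not in (None, new_label):
                    episodes.append((label, current))   # close finished episode
                    current = [(x0, y0, t0)]
                current.append((x1, y1, t1))
                label = new_label
            episodes.append((label, current))
            return episodes

        track = [(0, 0, 0), (0.1, 0, 10), (5, 0, 20), (10, 0, 30), (10.1, 0, 40)]
        for label, pts in segment(track):
            print(label, len(pts), "points")   # stop 2 / move 3 / stop 2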

    Mynodbcsv: Lightweight Zero-Config Database Solution for Handling Very Large CSV Files

    Volumes of data used in science and industry are growing rapidly. When researchers face the challenge of analyzing them, the data format is often the first obstacle: the lack of standardized ways of exploring different data layouts requires an effort to solve the problem from scratch each time. The possibility of accessing data in a rich, uniform manner, e.g., using the Structured Query Language (SQL), would offer expressiveness and user-friendliness. Comma-separated values (CSV) is one of the most common data storage formats. Despite its simplicity, handling it becomes non-trivial as file sizes grow. Importing CSV files into existing databases is time-consuming and troublesome, or even impossible if their horizontal dimension reaches thousands of columns. Most databases are optimized for handling a large number of rows rather than columns, so performance on datasets with non-typical layouts is often unacceptable. Other challenges include schema creation, updates, and repeated data imports. To address these problems, I present a system for accessing very large CSV-based datasets by means of SQL. It is characterized by: a "no copy" approach, where data stay mostly in the CSV files; "zero configuration", with no need to specify a database schema; a small footprint, written in C++ with boost [1], SQLite [2] and Qt [3], requiring no installation; efficient plan execution through query rewriting, dynamic creation of indices for appropriate columns, and static data retrieval directly from the CSV files; effortless support for millions of columns; easy use of mixed text/number data thanks to per-value typing; and a very simple network protocol that provides an efficient interface for MATLAB and reduces implementation time for other languages. The software is available as freeware, along with educational videos, on its website [4]. It needs no prerequisites to run, as all of the libraries are included in the distribution package. I test it against existing database solutions using a battery of benchmarks and discuss the results.
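
    To picture the "no copy" and dynamic-indexing ideas, the sketch below leaves the data in the CSV file, builds an in-memory index for a column only when it is first queried, and reads matching rows back by byte offset. It is a hypothetical Python illustration, not the C++ system described above; the file name and column are assumptions.

        import csv

        class CsvIndex:
            def __init__(self, path):
                self.path = path
                self.indices = {}            # column name -> {value: [offsets]}
                with open(path, newline="") as f:
                    self.header = next(csv.reader([f.readline()]))

            def _build(self, column):
                col = self.header.index(column)
                idx = {}
                with open(self.path, newline="") as f:
                    f.readline()             # skip header
                    offset = f.tell()
                    for line in iter(f.readline, ""):
                        value = next(csv.reader([line]))[col]
                        idx.setdefault(value, []).append(offset)
                        offset = f.tell()
                self.indices[column] = idx

            def lookup(self, column, value):
                if column not in self.indices:
                    self._build(column)      # dynamic index creation on first use
                with open(self.path, newline="") as f:
                    for off in self.indices[column].get(value, []):
                        f.seek(off)          # retrieval straight from the CSV file
                        row = next(csv.reader([f.readline()]))
                        yield dict(zip(self.header, row))

        # Hypothetical usage, assuming measurements.csv has a "station" column:
        # for row in CsvIndex("measurements.csv").lookup("station", "S1"):
        #     print(row)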

    Integration of Skyline Queries into Spark SQL

    Skyline queries are frequently used in data analytics and multi-criteria decision support applications to filter relevant information from large amounts of data. Apache Spark is a popular framework for processing big, distributed data, and it even provides a convenient SQL-like interface via the Spark SQL module. However, skyline queries are not natively supported and require tedious rewriting to fit the SQL standard or Spark's SQL-like language. The goal of our work is to fill this gap. We thus provide a full-fledged integration of the skyline operator into Spark SQL, which allows for a simple and easy-to-use syntax for expressing skyline queries. Moreover, our empirical results show that this integrated solution for skyline queries by far outperforms a solution based on rewriting into standard SQL.
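
    The skyline operator itself is defined by Pareto dominance: a tuple belongs to the skyline if no other tuple is at least as good in every dimension and strictly better in at least one. Below is a minimal block-nested-loop sketch of that semantics in plain Python, with smaller values taken as better; the hotel data are made up, and the paper's actual Spark SQL syntax is not reproduced here.

        def dominates(a, b):
            """a dominates b if a is <= b everywhere and < b somewhere."""
            return (all(x <= y for x, y in zip(a, b))
                    and any(x < y for x, y in zip(a, b)))

        def skyline(points):
            result = []
            for p in points:
                if any(dominates(q, p) for q in result):
                    continue                           # p is dominated, skip it
                result = [q for q in result if not dominates(p, q)]
                result.append(p)
            return result

        hotels = [(50, 2.0), (80, 0.5), (60, 1.0), (90, 2.5)]  # (price, distance)
        print(skyline(hotels))   # -> [(50, 2.0), (80, 0.5), (60, 1.0)]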

    Referential integrity and dependencies between documents in a document oriented database

    Reliability of foreign keys, which is natural in relational databases, requires additional effort when working with non-relational databases, as non-relational database management systems generally do not support foreign key constraints due to their distributed nature. Referential integrity is an important property whenever documents need to refer to each other, which is the common case. This work discusses an implementation of a verification approach that makes use of the MapReduce programming model in order to detect incorrect references in document-oriented databases that may be caused by errors in program code or incomplete transactions. Furthermore, the method can be applied to the verification of more complex dependencies between documents, such as those that bind aggregated values from certain sets of documents to the values of the documents they refer to.
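
    The verification idea maps naturally onto MapReduce: emit every document id and every outgoing reference under the referenced key, then check in the reducer that each key with incoming references actually has a matching document. The sketch below simulates the two phases in plain Python; the document shape ("_id", "refs") is a hypothetical example, not the schema from the paper.

        from collections import defaultdict

        def map_phase(doc):
            yield (doc["_id"], ("id", doc["_id"]))
            for ref in doc.get("refs", []):
                yield (ref, ("ref", doc["_id"]))

        def reduce_phase(key, values):
            if not any(kind == "id" for kind, _ in values):
                for kind, source in values:
                    if kind == "ref":
                        yield (source, key)   # (referencing doc, missing target)

        docs = [
            {"_id": "a", "refs": ["b", "x"]},
            {"_id": "b", "refs": []},
        ]
        groups = defaultdict(list)
        for doc in docs:
            for key, value in map_phase(doc):
                groups[key].append(value)             # shuffle: group by key
        for key, values in groups.items():
            for broken in reduce_phase(key, values):
                print("dangling reference:", broken)  # -> ('a', 'x')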

    A Novel Trip Planner Using Effective Indexing Structure

    The management of transportation systems has become increasingly important in many real-world applications such as location-based services, supply chain management, traffic control, and so on. These applications typically involve queries over spatial road networks with dynamically changing and complicated traffic conditions. In this paper, we model such a network by a probabilistic time-dependent graph (PT-Graph), whose edges are associated with uncertain delay functions. We propose a useful query over the PT-Graph, namely the trip planner query (TPQ), which retrieves trip plans traversing a set of query points in the PT-Graph with minimum traveling time at high confidence. To tackle the efficiency issue, we present two pruning strategies, time-interval pruning and probabilistic pruning, to effectively discard false alarms of trip plans. Furthermore, we design a pre-computation technique based on a cost model and build an index structure over the pre-computed data to enable pruning via the index. We integrate our proposed pruning methods into an efficient query procedure to answer TPQs. Through extensive experiments, we demonstrate the efficiency and effectiveness of our TPQ answering approach.
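
    The underlying routing problem can be pictured as shortest-path search on a graph whose edge delays depend on departure time. Below is a minimal time-dependent Dijkstra sketch in Python; the toy network and delay functions are made up, and the paper's uncertain delays, confidence bounds, and pruning/index machinery are not modeled.

        import heapq

        def td_dijkstra(graph, source, target, depart=0.0):
            """graph: {u: [(v, delay_fn)]}; delay_fn(t) is travel time leaving at t."""
            best = {source: depart}
            heap = [(depart, source)]
            while heap:
                t, u = heapq.heappop(heap)
                if u == target:
                    return t                          # earliest arrival at target
                if t > best.get(u, float("inf")):
                    continue                          # stale heap entry
                for v, delay_fn in graph.get(u, []):
                    arrive = t + delay_fn(t)
                    if arrive < best.get(v, float("inf")):
                        best[v] = arrive
                        heapq.heappush(heap, (arrive, v))
            return float("inf")

        rush_hour = lambda t: 10 if 8 <= t % 24 < 10 else 4   # congested edge
        flat = lambda t: 6                                    # constant edge
        graph = {"A": [("B", rush_hour), ("C", flat)],
                 "B": [("D", flat)], "C": [("D", flat)]}
        print(td_dijkstra(graph, "A", "D", depart=8.5))       # -> 20.5 (via C)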