281 research outputs found

    Knowledge-infused and Consistent Complex Event Processing over Real-time and Persistent Streams

    Full text link
    Emerging applications in the Internet of Things (IoT) and Cyber-Physical Systems (CPS) present novel challenges to Big Data platforms for performing online analytics. Ubiquitous sensors in IoT deployments generate data streams at high velocity that include information from a variety of domains and accumulate to large volumes on disk. Complex Event Processing (CEP) is recognized as an important real-time computing paradigm for analyzing continuous data streams. However, existing work on CEP is largely limited to relational query processing, exposing two distinct gaps in query specification and execution: (1) infusing the relational query model with higher-level knowledge semantics, and (2) seamless query evaluation across temporal spaces that span past, present and future events. Closing these gaps enables accessible analytics over data streams whose properties come from different disciplines, and helps span the velocity (real-time) and volume (persistent) dimensions. In this article, we introduce a Knowledge-infused CEP (X-CEP) framework that provides domain-aware knowledge query constructs along with temporal operators that allow end-to-end queries to span real-time and persistent streams. We translate this query model into efficient query execution over online and offline data streams, proposing several optimizations to mitigate the overheads introduced by evaluating semantic predicates and by accessing high-volume historic data streams. The proposed X-CEP query model and execution approaches are implemented in our prototype semantic CEP engine, SCEPter. We validate our query model using domain-aware CEP queries from a real-world Smart Power Grid application, and experimentally analyze the benefits of our optimizations for executing these queries, using event streams from a campus-microgrid IoT deployment. Comment: 34 pages, 16 figures, accepted in Future Generation Computer Systems, October 27, 201
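
    As a rough illustration of the kind of query the abstract describes (not the paper's actual X-CEP syntax), the sketch below matches a simple temporal pattern over a single event sequence that concatenates replayed historic events with a live feed. The event schema, the overload threshold, and the window size are all hypothetical.

```python
# Minimal sketch, not X-CEP/SCEPter code: detect a "sustained overload" pattern
# (two power readings above a threshold within 10 minutes, same sensor) over a
# time-ordered sequence of events that may mix archived and real-time data.
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Iterator

@dataclass
class Event:
    timestamp: datetime
    sensor: str
    kw: float                     # observed power draw (hypothetical schema)

def sustained_overload(events: Iterable[Event],
                       threshold_kw: float = 50.0,
                       window: timedelta = timedelta(minutes=10)) -> Iterator[tuple]:
    """Yield (first, second) event pairs from the same sensor that both
    exceed threshold_kw and occur within `window` of each other."""
    last_high: dict[str, Event] = {}
    for ev in events:             # events are assumed to arrive in time order
        if ev.kw > threshold_kw:
            prev = last_high.get(ev.sensor)
            if prev and ev.timestamp - prev.timestamp <= window:
                yield prev, ev
            last_high[ev.sensor] = ev

# A query "spanning past and present" is then the same pattern applied to
# historic events replayed from storage followed by the live stream, e.g.:
#   matches = sustained_overload(itertools.chain(replay_from_archive(), live_stream()))
# where replay_from_archive() and live_stream() are hypothetical event sources.
```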

    Processing Big Data Using Secure HDFS

    Get PDF
    The main objective of this project was to collect data and provide a solution to the problems faced by a large organization that holds data from many diverse fields. The challenge was to understand Hadoop and its key features well enough to implement a Hadoop platform successfully, whose functioning and progress users and clients could then evaluate and analyze. The team applied the DAIMC methodology, which supports a rapid, iterative, results-driven development style, and focused on decision-driven as well as data-driven approaches. It concentrated on the decisions that needed to be made rather than on encompassing all existing data. Throughout, the organization relied on agile development and business opportunity management for a successful implementation.

    On Big Data and Hydroinformatics:12th International Conference on Hydroinformatics (HIC 2016) - Smart Water for the Future

    Get PDF
    Big data has become an increasingly hot concept over the past five years in computer science, e-commerce, and bioinformatics, because more and more data is being collected via the internet, remote sensing networks, wearable devices and the Internet of Things. Big data technology provides techniques and analytical tools to handle large datasets so that creative ideas and new value can be extracted from them. However, the hydroinformatics research community is not yet so familiar with big data. This paper provides readers who are embracing the data-rich era with a timely review of big data and its relevant technology, and then points out its relevance to hydroinformatics in three aspects.

    Transportation data InTegration and ANalytics

    Get PDF
    State transportation agencies regularly collect and store various types of data for different uses such as planning, traffic operations, design, and construction. These large datasets contain treasure troves of information that could be fused and mined, but their size and complexity require advanced tools such as big data analytics, machine learning, and cluster computing. TITAN (Transportation data InTegration and ANalytics) is an initial prototype of an interactive web-based platform that demonstrates the possibilities of such big data software. The current study succeeded in showing a user-friendly, graphical front end and a scalable back end capable of integrating multiple big databases with minimal latency. This thesis documents how the key components of TITAN were designed. Several applications, including mobility, safety, transit performance, and predictive crash analytics, are used to explore the strengths and limitations of the platform. A comparative analysis of the current TITAN platform with traditional database systems such as Oracle and Tableau is also conducted to explain who needs to use the platform and when to use which platform. As TITAN was shown to be feasible and efficient, future research should aim to add more types of data and deploy TITAN in various data-driven decision-making processes.
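
    The abstract does not give TITAN's internals; as a loose illustration of the kind of back-end data fusion it describes, the sketch below joins two hypothetical transportation datasets (crash records and traffic volumes) with Spark. The bucket paths, table layouts, and column names are invented for the example.

```python
# Illustrative only -- not TITAN's actual code. Fuses two hypothetical
# transportation datasets with Spark so a web front end could query the result.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("transport-integration-sketch").getOrCreate()

crashes = spark.read.parquet("s3a://example-bucket/crashes/")           # hypothetical path
volumes = spark.read.parquet("s3a://example-bucket/traffic_volumes/")   # hypothetical path

# Rough exposure-normalized crash rate per road segment: crash counts joined
# to summed traffic volume for the same segment.
crash_rate = (
    crashes.groupBy("segment_id").agg(F.count("*").alias("crash_count"))
    .join(volumes.groupBy("segment_id").agg(F.sum("aadt").alias("total_aadt")),
          on="segment_id", how="inner")
    .withColumn("crashes_per_million_vehicles",
                F.col("crash_count") / F.col("total_aadt") * 1_000_000)
)
crash_rate.orderBy(F.desc("crashes_per_million_vehicles")).show(10)
```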

    Attribute Based Access Control for Big Data Applications by Query Modification

    Get PDF
    We present concepts which can be used for the efficient implementation of Attribute Based Access Control (ABAC) in large applications that may use several data storage technologies, including Hadoop, NoSQL and relational database systems. The ABAC authorization process takes place in two main stages. First, a sequence of permissions is derived which specifies the data the user's transaction is permitted to retrieve. Second, query modification is used to augment the user's transaction with code that implements the ABAC controls; this requires the storage technologies to support a high-level language such as SQL or similar. The modified user transactions are then optimized and processed using the full functionality of the underlying storage systems. We use an extended ABAC model (TCM2) which handles negative permissions and overrides in a single permissions-processing mechanism. We illustrate these concepts with a compelling electronic health records scenario.
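
    The paper's exact TCM2 mechanism is not reproduced here; the sketch below only shows the general idea of query modification — rewriting a user's SQL with predicates derived from ABAC permissions, including negative permissions that override positive ones. The permission format, table, and column names are made up for the example.

```python
# Minimal sketch of ABAC by query modification (not the paper's TCM2 model):
# predicates derived from the user's attributes are appended to the query so
# the storage engine itself enforces the access policy.
from dataclasses import dataclass

@dataclass
class Permission:
    table: str
    predicate: str          # SQL fragment describing the rows it covers
    negative: bool = False  # True = rows matching the predicate are denied

def modify_query(base_query: str, table: str, perms: list[Permission]) -> str:
    """Augment a simple SELECT on `table` with ABAC predicates before execution."""
    allows = [p.predicate for p in perms if p.table == table and not p.negative]
    denies = [f"NOT ({p.predicate})" for p in perms if p.table == table and p.negative]
    clauses = []
    if allows:
        clauses.append("(" + " OR ".join(allows) + ")")  # any positive permission grants access
    clauses.extend(denies)                               # negative permissions override
    if not clauses:
        return base_query + " WHERE 1 = 0"               # no permission -> no rows
    joiner = " AND " if " where " in base_query.lower() else " WHERE "
    return base_query + joiner + " AND ".join(clauses)

perms = [
    Permission("ehr_records", "ward = 'cardiology'"),
    Permission("ehr_records", "sensitivity = 'restricted'", negative=True),
]
print(modify_query("SELECT * FROM ehr_records", "ehr_records", perms))
# SELECT * FROM ehr_records WHERE (ward = 'cardiology') AND NOT (sensitivity = 'restricted')
```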

    Evaluation of Big Data Platforms for Industrial Process Data

    Get PDF
    As the number of IoT devices and the amount of human activity on the Internet have increased rapidly in recent years, the volume of data generated has grown exponentially. Various frameworks and tools such as Cassandra, Hive, and Spark have therefore been developed to store and explore this massive amount of data. The waves of Big Data have also reached industrial businesses: as the number of sensors installed in machines and mills significantly increases, log data is generated by these devices at higher frequencies and enormously complex calculations are applied to it. This thesis evaluates how effectively current Big Data frameworks and tools handle industrial Big Data, especially process data. After surveying several techniques and potential frameworks and tools, the thesis focuses on building a prototype of a data pipeline that must satisfy a set of use cases. The pipeline contains several components, including Spark, Impala, and Sqoop; it uses Parquet as the file format and stores the Parquet files in S3. Several experiments were conducted in AWS to validate the requirements in the use cases. The workload used for these tests was around 690 GB of Parquet files, covering one million channels divided into one thousand groups, with a sampling rate of one data point per second. The results of the experiments show that current Big Data frameworks can fulfill the performance requirements and the features in the use cases, and those of industrial businesses in general.
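
    The thesis's pipeline code is not given in the abstract; the following is a rough sketch of the kind of job such a pipeline would run — reading process data stored as Parquet on S3 with Spark and downsampling one channel group from one-second samples to per-minute aggregates. The bucket name, directory layout, and schema are assumptions for illustration.

```python
# Rough sketch, not the thesis's pipeline: read Parquet process data from S3
# with Spark and aggregate 1 Hz samples to per-minute statistics per channel.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("process-data-sketch").getOrCreate()

samples = spark.read.parquet("s3a://example-plant-data/group_0042/")  # hypothetical layout

per_minute = (
    samples
    .withColumn("minute", F.date_trunc("minute", F.col("timestamp")))
    .groupBy("channel_id", "minute")
    .agg(F.avg("value").alias("avg_value"),
         F.min("value").alias("min_value"),
         F.max("value").alias("max_value"))
)
per_minute.write.mode("overwrite").parquet(
    "s3a://example-plant-data/derived/group_0042_minute/")
```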

    Web Archive Services Framework for Tighter Integration Between the Past and Present Web

    Get PDF
    Web archives have contained the cultural history of the web for many years, but access to them remains limited. Most web archiving research has focused on crawling and preservation activities, with little focus on delivery methods. The current access methods are tightly coupled with web archive infrastructure, hard to replicate or integrate with other web archives, and do not cover all the users' needs. In this dissertation, we focus on access methods for archived web data to enable users, third-party developers, researchers, and others to gain knowledge from the web archives. We build ArcSys, a new service framework that extracts, preserves, and exposes APIs for the web archive corpus. The dissertation introduces a novel categorization technique to divide the archived corpus into four levels. For each level, we propose suitable services and APIs that enable both users and third-party developers to build new interfaces. The first level is the content level, which extracts the content from the archived web data. We develop ArcContent to expose the web archive content processed through various filters. The second level is the metadata level; we extract the metadata from the archived web data and make it available to users. We implement two services, ArcLink for the temporal web graph and ArcThumb for optimizing thumbnail creation in the web archives. The third level is the URI level, which focuses on using the URI HTTP redirection status to enhance the user query. Finally, the highest level in the web archiving service framework pyramid is the archive level. In this level, we define a web archive by the characteristics of its corpus and build Web Archive Profiles. The profiles are used by the Memento Aggregator for query optimization.
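
    None of the ArcSys services are shown in the abstract; as a generic illustration of the URI-level idea — using HTTP datetime negotiation and the resulting redirection to locate an archived copy — the sketch below queries a public Memento TimeGate. The aggregator endpoint is used only as an example and is not part of ArcSys.

```python
# Illustrative sketch (not ArcSys code): ask a Memento TimeGate which archived
# copy of a URI is closest to a requested time, then inspect the HTTP redirect
# (RFC 7089 datetime negotiation) instead of following it.
import requests

def closest_memento(uri: str, accept_datetime: str) -> str | None:
    """Return the URI of the memento the TimeGate redirects to, if any."""
    timegate = "http://timetravel.mementoweb.org/timegate/" + uri
    resp = requests.get(
        timegate,
        headers={"Accept-Datetime": accept_datetime},  # requested point in time
        allow_redirects=False,                         # keep the redirect visible
        timeout=30,
    )
    if resp.status_code in (302, 307):
        return resp.headers.get("Location")
    return None

print(closest_memento("http://example.com/",
                      "Thu, 01 Jan 2015 00:00:00 GMT"))
```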