
    iSAGE: An Incremental Version of SAGE for Online Explanation on Data Streams

    Existing methods for explainable artificial intelligence (XAI), including popular feature importance measures such as SAGE, are mostly restricted to the batch learning scenario. However, machine learning is often applied in dynamic environments, where data arrives continuously and learning must be done in an online manner. We therefore propose iSAGE, a time- and memory-efficient incrementalization of SAGE that is able to react to changes in the model as well as to drift in the data-generating process. We further provide efficient feature removal methods that break (interventional) and retain (observational) feature dependencies. Moreover, we formally analyze our explanation method to show that iSAGE adheres to theoretical properties similar to those of SAGE. Finally, we evaluate our approach in a thorough experimental analysis based on well-established data sets and data streams with concept drift.
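
    As a rough illustration of the idea (a sketch under stated assumptions, not the authors' implementation), the code below estimates SAGE-style feature importance incrementally: each arriving (x, y) pair triggers one permutation-sampling pass with interventional (marginal) feature removal, and the per-feature loss reductions are folded into an exponentially decayed running mean so the estimate can follow model changes and concept drift. All class and parameter names are illustrative assumptions.

        import numpy as np

        class IncrementalSageSketch:
            """Hypothetical sketch of incremental, SAGE-style feature importance.

            Not the iSAGE reference implementation: it combines permutation
            sampling with interventional (marginal) feature removal and an
            exponential moving average so estimates can track model changes
            and concept drift.
            """

            def __init__(self, model, loss_fn, n_features, background, alpha=0.01):
                self.model = model            # any (online) model exposing .predict()
                self.loss_fn = loss_fn        # e.g. squared error or cross-entropy
                self.n_features = n_features
                self.background = background  # samples used to impute removed features
                self.alpha = alpha            # decay rate: larger = faster forgetting
                self.phi = np.zeros(n_features)

            def _loss_with_subset(self, x, y, known):
                # Interventional removal: overwrite unknown features with values
                # from the background samples and average the model's predictions.
                x_rep = np.tile(x, (len(self.background), 1))
                x_rep[:, ~known] = self.background[:, ~known]
                return self.loss_fn(y, self.model.predict(x_rep).mean(axis=0))

            def explain_one(self, x, y):
                # One permutation-sampling update per incoming (x, y) pair.
                known = np.zeros(self.n_features, dtype=bool)
                prev_loss = self._loss_with_subset(x, y, known)
                for j in np.random.permutation(self.n_features):
                    known[j] = True
                    loss = self._loss_with_subset(x, y, known)
                    # Attribute the loss reduction to the newly revealed feature
                    # and fold it into an exponentially decayed running mean.
                    self.phi[j] = (1 - self.alpha) * self.phi[j] + self.alpha * (prev_loss - loss)
                    prev_loss = loss
                return self.phi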

    Semantics-Empowered Big Data Processing with Applications

    We discuss the nature of Big Data and address the role of semantics in analyzing and processing the Big Data that arises in the context of Physical-Cyber-Social Systems. We organize our research around the Five Vs of Big Data, where four of the Vs are harnessed to produce the fifth V: value. To handle the challenge of Volume, we advocate semantic perception, which can convert low-level observational data into higher-level abstractions more suitable for decision-making. To handle the challenge of Variety, we resort to semantic models and annotations of data so that much of the intelligent processing can be done at a level independent of the heterogeneity of data formats and media. To handle the challenge of Velocity, we seek to use a continuous semantics capability to dynamically create event- or situation-specific models and recognize relevant new concepts, entities, and facts. To handle Veracity, we explore the formalization of trust models and approaches to glean trustworthiness. These four Vs of Big Data are harnessed by semantics-empowered analytics to derive value for supporting practical applications transcending the physical-cyber-social continuum.
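
    A minimal sketch of the semantic-perception idea for the Volume challenge is given below: a raw numeric observation is annotated with SOSA-style terms and mapped to a higher-level concept suitable for decision-making. The thresholds, the ex: vocabulary, and the function name are assumptions made for this sketch, not details from the paper.

        # Illustrative "semantic perception": low-level observations become
        # higher-level, decision-ready abstractions with semantic annotations.
        RAINFALL_ABSTRACTIONS = [
            (50.0, "ex:HeavyRain"),    # mm/h thresholds are assumptions
            (10.0, "ex:ModerateRain"),
            (0.1, "ex:LightRain"),
            (0.0, "ex:NoRain"),
        ]

        def abstract_observation(sensor_id: str, rainfall_mm_per_h: float) -> dict:
            """Convert a low-level observation into a semantically annotated abstraction."""
            concept = next(label for threshold, label in RAINFALL_ABSTRACTIONS
                           if rainfall_mm_per_h >= threshold)
            return {
                "@type": "sosa:Observation",        # SOSA/SSN-style terms, used loosely
                "sosa:madeBySensor": sensor_id,
                "sosa:hasSimpleResult": rainfall_mm_per_h,
                "ex:abstraction": concept,          # higher-level concept for decision-making
            }

        print(abstract_observation("ex:rainGauge42", 23.5))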

    Scalability Benchmarking of Cloud-Native Applications Applied to Event-Driven Microservices

    Cloud-native applications constitute a recent trend for designing large-scale software systems. This thesis introduces the Theodolite benchmarking method, allowing researchers and practitioners to conduct empirical scalability evaluations of cloud-native applications, their frameworks, configurations, and deployments. The benchmarking method is applied to event-driven microservices, a specific type of cloud-native application that employs distributed stream processing frameworks to scale with massive data volumes. Extensive experimental evaluations benchmark and compare the scalability of various stream processing frameworks under different configurations and deployments, including different public and private cloud environments. These experiments show that the presented benchmarking method provides statistically sound results in an adequate amount of time. In addition, three case studies demonstrate that the Theodolite benchmarking method can be applied to a wide range of applications beyond stream processing.
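
    The sketch below illustrates the general shape of such a scalability benchmark, assuming placeholder callables (deploy, run_workload, slo_satisfied) rather than Theodolite's actual API: for each tested number of instances, it searches for the highest load intensity at which a service-level objective is still met.

        def benchmark_scalability(instance_counts, load_intensities,
                                  deploy, run_workload, slo_satisfied):
            """Return, per instance count, the maximum load that still meets the SLO."""
            results = {}
            for instances in instance_counts:
                max_sustainable = None
                for load in load_intensities:     # assumed sorted in ascending order
                    deploy(instances)             # e.g. scale the microservice deployment
                    metrics = run_workload(load)  # generate load, collect lag/latency metrics
                    if slo_satisfied(metrics):
                        max_sustainable = load    # SLO met: remember and try a higher load
                    else:
                        break                     # SLO violated: higher loads will fail too
                results[instances] = max_sustainable
            return results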

    Real-Time Big Data Analytics in Smart Cities from LoRa-Based IoT Networks

    The current burst of Internet of Things (IoT) technologies has opened new lines of investigation concerning not only hardware and protocols but also new methods for analyzing the produced data under the constraints of IoT environments: a real-time and a big data approach. The real-time constraint stems from the continuous generation of data by the endpoints connected to an IoT network; given the connectivity and scaling capabilities of such a network, the amount of data to process is so large that big data techniques become essential. In this article, we present a system consisting of two main modules: on the one hand, the infrastructure, a complete LoRa-based network designed, tested, and deployed at Pablo de Olavide University; on the other, the analytics, a big data streaming system that processes the inputs produced by the network to extract useful, valid, and hidden information.
    Funding: Ministerio de Economía y Competitividad TIN2017-88209-C2-1-
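
    The sketch below illustrates the kind of streaming analytics such a system performs; the message format and window length are assumptions for illustration, not details of the deployed system. Readings from LoRa endpoints are grouped into tumbling time windows and averaged per device.

        from collections import defaultdict

        WINDOW_SECONDS = 60  # tumbling-window length (an assumption for this sketch)

        def windowed_averages(messages):
            """Consume (timestamp, device_id, value) tuples in arrival order and
            yield per-device averages whenever a time window closes."""
            window_start = None
            sums, counts = defaultdict(float), defaultdict(int)
            for timestamp, device_id, value in messages:
                if window_start is None:
                    window_start = timestamp
                if timestamp - window_start >= WINDOW_SECONDS:
                    yield window_start, {d: sums[d] / counts[d] for d in sums}
                    window_start = timestamp
                    sums, counts = defaultdict(float), defaultdict(int)
                sums[device_id] += value
                counts[device_id] += 1
            if counts:  # flush the last, possibly partial window
                yield window_start, {d: sums[d] / counts[d] for d in sums}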