121 research outputs found

    Node Classification in Uncertain Graphs

    In many real applications that use and analyze networked data, the links in the network graph may be erroneous or derived from probabilistic techniques. In such cases, the node classification problem can be challenging, since the unreliability of the links may affect the final results of the classification process. If the information about link reliability is not used explicitly, the classification accuracy in the underlying network may be affected adversely. In this paper, we focus on situations that require the analysis of the uncertainty that is present in the graph structure. We study the novel problem of node classification in uncertain graphs, treating uncertainty as a first-class citizen. We propose two techniques based on a Bayes model and automatic parameter selection, and show that incorporating uncertainty in the classification process as a first-class citizen is beneficial. We experimentally evaluate the proposed approach on several real data sets and study the behavior of the algorithms under different conditions. The results demonstrate the effectiveness and efficiency of our approach.
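    The abstract does not spell out the two techniques; as a rough, hypothetical illustration of treating edge uncertainty as a first-class citizen, the sketch below weights each labeled neighbor's vote by the probability that the connecting edge actually exists, in a naive-Bayes-style score (the class-agreement likelihood of 0.9 is an assumed placeholder, not a value from the paper).

        import math

        def classify_uncertain(neighbors, labels, prior):
            """Toy Bayes-style vote over an uncertain neighborhood.

            neighbors : list of (neighbor_id, edge_probability) pairs
            labels    : dict mapping labeled node ids to their class
            prior     : dict mapping class -> prior probability
            Illustrative only; the paper's two techniques are not reproduced here.
            """
            # Start from log-priors to avoid underflow when many neighbors vote.
            score = {c: math.log(p) for c, p in prior.items()}
            for nbr, p_edge in neighbors:
                if nbr not in labels:
                    continue  # unlabeled neighbors contribute no evidence in this toy model
                for c in score:
                    # With probability p_edge the edge exists and the neighbor's label
                    # supports (or contradicts) class c; otherwise the edge is absent
                    # and the evidence falls back to the class prior.
                    agree = 0.9 if labels[nbr] == c else 0.1  # assumed agreement likelihood
                    score[c] += math.log(p_edge * agree + (1.0 - p_edge) * prior[c])
            return max(score, key=score.get)

        # Example: two probable neighbors labeled 'A' outweigh one unlikely neighbor labeled 'B'.
        neighbors = [(1, 0.9), (2, 0.8), (3, 0.2)]
        labels = {1: "A", 2: "A", 3: "B"}
        print(classify_uncertain(neighbors, labels, prior={"A": 0.5, "B": 0.5}))  # -> 'A'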

    Real-Time Data Analytics in Sensor Networks

    The proliferation of Wireless Sensor Networks (WSNs) in the past decade has provided the bridge between the physical and digital worlds, enabling the monitoring and study of physical phenomena at a granularity and level of detail that was never before possible. In this study, we review the efforts of the research community with respect to two important problems in the context of WSNs: real-time collection of the sensed data, and real-time processing of these data series.

    Report on the First International Workshop on Personal Data Analytics in the Internet of Things (PDA@IOT 2014)

    The 1st International Workshop on Personal Data Analytics in the Internet of Things (PDA@IOT), held in conjunction with VLDB 2014, aims at sparking research on data analytics, shifting the focus from business to consumer services. Much of the public and academic discourse about personal data has been dominated by the privacy concerns and risks such data raise for the individual, especially when they are seen as the new oil of the global economy. PDA@IOT focuses instead on how people can effectively exploit the data they massively create in cyber-physical worlds. We believe that the full potential of the IoT goes far beyond connecting “things” to the Internet: it is about using data to create new value for people. In a people-centric computing paradigm, both small-scale personal data and large-scale aggregated data should be exploited to identify unmet needs and proactively offer them to users. PDA@IOT seeks to address the technology barriers that currently prevent personal data processing and analytics solutions from empowering people in personal decision making. The ambition of PDA@IOT is to provide a unique forum for researchers and practitioners who approach personal data from different angles, ranging from data management and processing to data mining and human-data interaction, and to nourish the interdisciplinary synergies required to tackle the challenges and problems emerging in people-centric computing.

    FreSh: A Lock-Free Data Series Index

    We present FreSh, a lock-free data series index that exhibits good performance while being robust. FreSh is based on Refresh, a generic approach we have developed for supporting lock-freedom in an efficient way on top of any locality-aware data series index. We believe Refresh is of independent interest and can be used to obtain well-performing lock-free versions of other locality-aware blocking data structures. To develop FreSh, we first studied in depth the design decisions of current state-of-the-art data series indexes and the principles governing their performance. This led to a theoretical framework that enables the development and analysis of data series indexes in a modular way. The framework allowed us to apply Refresh, repeatedly, to obtain lock-free versions of the different phases of a family of data series indexes. Experiments with several synthetic and real datasets show that FreSh achieves performance as good as that of the state-of-the-art blocking in-memory data series index. This shows that the helping mechanisms of FreSh are lightweight, respecting certain principles that are crucial for performance in locality-aware data structures. This paper was published in SRDS 2023. Comment: 12 pages, 18 figures, Symposium on Reliable Distributed Systems (SRDS 2023).
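    For context, the in-memory data series indexes that work like this is built on are locality-aware because they group series by a coarse summarization before indexing. The sketch below shows only that summarization step (a standard piecewise aggregate approximation followed by a symbolic word), as an assumed illustration of why similar series land in the same index subtree; it says nothing about FreSh's lock-free Refresh mechanism itself.

        import numpy as np

        def paa(series, segments):
            """Piecewise Aggregate Approximation: the mean of each of `segments` equal chunks."""
            chunks = np.array_split(np.asarray(series, dtype=float), segments)
            return np.array([c.mean() for c in chunks])

        def sax_word(series, segments=8, breakpoints=(-0.67, 0.0, 0.67)):
            """Map a z-normalized series to a short symbolic word (4-symbol alphabet here).
            Series that share a word would be placed in the same index subtree."""
            z = (np.asarray(series, dtype=float) - np.mean(series)) / (np.std(series) + 1e-12)
            return tuple(int(np.searchsorted(breakpoints, v)) for v in paa(z, segments))

        # Two similar series map to the same word and would be indexed close together.
        a = np.sin(np.linspace(0, 6, 256))
        b = np.sin(np.linspace(0, 6, 256)) + 0.05
        print(sax_word(a) == sax_word(b))  # typically True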

    A Critical Re-evaluation of Benchmark Datasets for (Deep) Learning-Based Matching Algorithms

    Entity resolution (ER) is the process of identifying records that refer to the same entities within one or across multiple databases. Numerous techniques have been developed to tackle ER challenges over the years, with recent emphasis placed on machine and deep learning methods for the matching phase. However, the quality of the benchmark datasets typically used in the experimental evaluations of learning-based matching algorithms has not been examined in the literature. To cover this gap, we propose four different approaches to assessing the difficulty and appropriateness of 13 established datasets: two theoretical approaches, which involve new measures of linearity and existing measures of complexity, and two practical ones, namely the difference between the best non-linear and linear matchers, and the difference between the best learning-based matcher and a perfect oracle. Our analysis demonstrates that most of the popular datasets pose rather easy classification tasks. As a result, they are not suitable for properly evaluating learning-based matching algorithms. To address this issue, we propose a new methodology for yielding benchmark datasets. We put it into practice by creating four new matching tasks, and we verify that these new benchmarks are more challenging and therefore more suitable for further advancements in the field.
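    As a rough illustration of the "practical" difficulty measures described above, the sketch below compares a linear and a non-linear matcher on feature vectors describing candidate record pairs; the synthetic data and the particular model choices (logistic regression vs. random forest) are placeholder assumptions, not the paper's exact setup.

        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import train_test_split
        from sklearn.metrics import f1_score

        # Placeholder data: each row holds similarity features for a candidate record pair
        # (e.g., name/title/price similarities); y = 1 means the pair is a true match.
        rng = np.random.default_rng(0)
        X = rng.random((2000, 5))
        y = (X[:, 0] * X[:, 1] > 0.3).astype(int)  # a deliberately non-linear matching rule

        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0, stratify=y)

        linear = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
        nonlinear = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

        f1_lin = f1_score(y_te, linear.predict(X_te))
        f1_non = f1_score(y_te, nonlinear.predict(X_te))

        # A small gap suggests the matching task is (nearly) linearly separable, i.e. easy
        # in the sense discussed above; a large gap points to a genuinely harder benchmark.
        print(f"linear F1={f1_lin:.3f}  non-linear F1={f1_non:.3f}  gap={f1_non - f1_lin:.3f}")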