121 research outputs found
Node Classification in Uncertain Graphs
In many real applications that use and analyze networked data, the links in
the network graph may be erroneous, or derived from probabilistic techniques.
In such cases, the node classification problem can be challenging, since the
unreliability of the links may affect the final results of the classification
process. If the information about link reliability is not used explicitly, the
classification accuracy in the underlying network may be affected adversely. In
this paper, we focus on situations that require the analysis of the uncertainty
that is present in the graph structure. We study the novel problem of node
classification in uncertain graphs, by treating uncertainty as a first-class
citizen. We propose two techniques based on a Bayes model and automatic
parameter selection, and show that the incorporation of uncertainty in the
classification process as a first-class citizen is beneficial. We
experimentally evaluate the proposed approach using different real data sets,
and study the behavior of the algorithms under different conditions. The
results demonstrate the effectiveness and efficiency of our approach.
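To make concrete why link reliability matters for node classification, here is a minimal sketch. It is not the Bayes model proposed in the paper: it is a plain probability-weighted neighbor vote, and the function name `classify_node`, the edge format `(u, v, p)`, and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict


def classify_node(node, edges, labels):
    """Pick the label with the largest probability-weighted neighbor vote.

    edges:  iterable of (u, v, p), an undirected edge that exists with probability p
    labels: dict mapping already-labeled nodes to their class
    Returns None when the node has no labeled neighbor.
    """
    scores = defaultdict(float)
    for u, v, p in edges:
        if u == node:
            neighbor = v
        elif v == node:
            neighbor = u
        else:
            continue
        if neighbor in labels:
            # Each labeled neighbor votes with the reliability of its link.
            scores[labels[neighbor]] += p
    if not scores:
        return None
    return max(scores, key=scores.get)


# Toy data (hypothetical): one reliable "red" link, two unreliable "blue" links.
edges = [("a", "x", 0.9), ("b", "x", 0.2), ("c", "x", 0.3)]
labels = {"a": "red", "b": "blue", "c": "blue"}

weighted = classify_node("x", edges, labels)
# Same graph with the uncertainty discarded (every edge treated as certain).
unweighted = classify_node("x", [(u, v, 1.0) for u, v, _ in edges], labels)
```

On this toy graph the two variants disagree: ignoring the probabilities lets the two unreliable links outvote the single reliable one.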
Real-Time Data Analytics in Sensor Networks
The proliferation of Wireless Sensor Networks (WSNs) in the past decade has provided the bridge between the physical and digital worlds, enabling the monitoring and study of physical phenomena at a granularity and level of detail that was never before possible. In this study, we review the efforts of the research community with respect to two important problems in the context of WSNs: real-time collection of the sensed data, and real-time processing of these data series.
Report on the First International Workshop on Personal Data Analytics in the Internet of Things (PDA@IOT 2014)
The 1st International Workshop on Personal Data Analytics in the Internet of Things (PDA@IOT), held in conjunction with VLDB 2014, aims at sparking research on data analytics, shifting the focus from business to consumer services. Much of the public and academic discourse about personal data has been dominated by privacy concerns and the risks they raise to the individual, especially when personal data are seen as the new oil of the global economy. PDA@IOT instead focuses on how persons could effectively exploit the data they massively create in cyber-physical worlds. We believe that the full potential of the IoT goes far beyond connecting “things” to the Internet: it is about using data to create new value for people. In a People-centric computing paradigm, both small-scale personal data and large-scale aggregated data should be exploited to identify unmet needs and proactively offer them to users. PDA@IOT seeks to address current technology barriers that prevent existing personal data processing and analytics solutions from empowering people in personal decision making. The PDA@IOT ambition is to provide a unique forum for researchers and practitioners that approach personal data from different angles, ranging from data management and processing, to data mining and human-data interaction, as well as to nourish the interdisciplinary synergies required to tackle the challenges and problems emerging in People-centric Computing.
FreSh: A Lock-Free Data Series Index
We present FreSh, a lock-free data series index that exhibits good
performance (while being robust). FreSh is based on Refresh, which is a generic
approach we have developed for supporting lock-freedom in an efficient way on
top of any locality-aware data series index. We believe Refresh is of
independent interest and can be used to get well-performing lock-free versions
of other locality-aware blocking data structures. For developing FreSh, we
first studied in depth the design decisions of current state-of-the-art data
series indexes, and the principles governing their performance. This led to a
theoretical framework, which enables the development and analysis of data
series indexes in a modular way. The framework allowed us to apply Refresh,
repeatedly, to get lock-free versions of the different phases of a family of
data series indexes. Experiments with several synthetic and real datasets
illustrate that FreSh achieves performance that is as good as that of the
state-of-the-art blocking in-memory data series index. This shows that the
helping mechanisms of FreSh are light-weight, respecting certain principles
that are crucial for performance in locality-aware data structures. This paper
was published in SRDS 2023. Comment: 12 pages, 18 figures. Conference: Symposium on Reliable Distributed
Systems (SRDS 2023).
A Critical Re-evaluation of Benchmark Datasets for (Deep) Learning-Based Matching Algorithms
Entity resolution (ER) is the process of identifying records that refer to
the same entities within one or across multiple databases. Numerous techniques
have been developed to tackle ER challenges over the years, with recent
emphasis placed on machine and deep learning methods for the matching phase.
However, the quality of the benchmark datasets typically used in the
experimental evaluations of learning-based matching algorithms has not been
examined in the literature. To cover this gap, we propose four different
approaches to assessing the difficulty and appropriateness of 13 established
datasets: two theoretical approaches, which involve new measures of linearity
and existing measures of complexity, and two practical approaches: the
difference between the best non-linear and linear matchers, as well as the
difference between the best learning-based matcher and the perfect oracle. Our
analysis demonstrates that most of the popular datasets pose rather easy
classification tasks. As a result, they are not suitable for properly
evaluating learning-based matching algorithms. To address this issue, we
propose a new methodology for yielding benchmark datasets. We put it into
practice by creating four new matching tasks, and we verify that these new
benchmarks are more challenging and therefore more suitable for further
advancements in the field.
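The two practical measures mentioned in the abstract are differences between scores, which can be sketched as below. This is only an illustration under assumptions: the function name `difficulty_gaps`, the use of F1 as the score, the matcher names, and every number are hypothetical and do not come from the paper.

```python
def difficulty_gaps(linear_f1, nonlinear_f1, oracle_f1=1.0):
    """Compute two gap-style difficulty indicators for one dataset.

    linear_f1, nonlinear_f1: dicts mapping matcher name -> F1 on the dataset
    oracle_f1: score of a perfect oracle (assumed to be 1.0)
    """
    best_linear = max(linear_f1.values())
    best_nonlinear = max(nonlinear_f1.values())
    return {
        # A small value suggests a linear decision boundary is nearly enough.
        "nonlinear_minus_linear": best_nonlinear - best_linear,
        # A small value suggests little room is left for better matchers.
        "oracle_minus_best": oracle_f1 - max(best_linear, best_nonlinear),
    }


# Made-up scores for a single, hypothetical dataset.
gaps = difficulty_gaps(
    linear_f1={"logistic_regression": 0.95, "linear_svm": 0.94},
    nonlinear_f1={"random_forest": 0.96, "mlp": 0.97},
)
```

With these made-up scores both gaps are only a few hundredths, which is the kind of pattern the abstract describes as an easy classification task.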