A framework for distributed managing uncertain data in RFID traceability networks
The ability to track and trace individual items, especially through large-scale and distributed networks, is the key to realizing many important business applications such as supply chain management, asset tracking, and counterfeit detection. Networked RFID (radio frequency identification), which uses the Internet to connect otherwise isolated RFID systems and software, is an emerging technology to support traceability applications. Despite its promising benefits, many challenges remain to be overcome before these benefits can be realized. One significant challenge centers on dealing with the uncertainty of raw RFID data. In this paper, we propose a novel framework to effectively manage the uncertainty of RFID data in large-scale traceability networks. The framework consists of a global object tracking model and a local RFID data cleaning model. In particular, we propose a Markov-based model for tracking objects globally and a particle filter based approach for processing noisy, low-level RFID data locally. Our implementation validates the proposed approach, and the experimental results show its effectiveness.
Jiangang Ma, Quan Z. Sheng, Damith Ranasinghe, Jen Min Chuah and Yanbo W
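The particle filter based cleaning step described above can be sketched as a standard bootstrap filter. The 1-D position model, the noise levels, and the function name below are illustrative assumptions for exposition, not the paper's actual implementation:

```python
import math
import random

def particle_filter(observations, n_particles=500, motion_std=1.0, obs_std=2.0):
    """Bootstrap particle filter: estimate a 1-D object position from a
    stream of noisy readings (a stand-in for raw, low-level RFID data)."""
    # Initialise particles around the first noisy reading.
    particles = [observations[0] + random.gauss(0, obs_std) for _ in range(n_particles)]
    estimates = []
    for z in observations:
        # Predict: propagate each particle with random-walk motion noise.
        particles = [p + random.gauss(0, motion_std) for p in particles]
        # Update: weight particles by the Gaussian likelihood of the reading.
        weights = [math.exp(-0.5 * ((z - p) / obs_std) ** 2) for p in particles]
        total = sum(weights) or 1.0
        weights = [w / total for w in weights]
        # Resample particles in proportion to their weights.
        particles = random.choices(particles, weights=weights, k=n_particles)
        # After resampling, the plain mean serves as the cleaned estimate.
        estimates.append(sum(particles) / n_particles)
    return estimates
```

Each incoming reading is thus replaced by a filtered estimate that smooths out read noise, which is the role the local cleaning model plays before the global Markov tracking model consumes the data.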
A Bi-Criteria Active Learning Algorithm for Dynamic Data Streams
Active learning (AL) is a promising way to efficiently build up training sets with minimal supervision. A learner deliberately queries specific instances to tune the classifier's model using as few labels as possible. The challenge for streaming is that the data distribution may evolve over time and the model must therefore adapt. Another challenge is sampling bias, where the sampled training set does not reflect the underlying data distribution. In the presence of concept drift, sampling bias is more likely to occur, as the training set needs to represent the whole evolving data. To tackle these challenges, we propose a novel bi-criteria AL approach (BAL) that relies on two selection criteria, namely a label uncertainty criterion and a density-based criterion. While the first criterion selects instances that are the most uncertain in terms of class membership, the latter dynamically curbs the sampling bias by weighting the samples to reflect the true underlying distribution. To design and implement these two criteria for learning from streams, BAL adopts a Bayesian online learning approach and combines online classification and online clustering through the use of online logistic regression and online growing Gaussian mixture models, respectively. Empirical results obtained on standard synthetic and real-world benchmarks show the high performance of the proposed BAL method compared to state-of-the-art AL methods.
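The pairing of an uncertainty criterion with a density weight can be sketched as follows. The SGD-trained `OnlineLogit` class, the externally supplied `density` value, and the query threshold are simplified stand-ins for the Bayesian online learner and the growing-Gaussian-mixture density estimate that BAL actually uses:

```python
import math
import random

class OnlineLogit:
    """Online logistic regression trained by SGD (a stand-in for the
    Bayesian online classifier in BAL)."""
    def __init__(self, dim, lr=0.1):
        self.w = [0.0] * dim
        self.lr = lr

    def prob(self, x):
        z = sum(wi * xi for wi, xi in zip(self.w, x))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, y):
        # One SGD step on the logistic loss, y in {0, 1}.
        err = y - self.prob(x)
        self.w = [wi + self.lr * err * xi for wi, xi in zip(self.w, x)]

def bi_criteria_query(model, x, density, threshold=0.2):
    """Query the label iff the instance is both uncertain (class
    probability near 0.5) and lies in a dense region of input space."""
    uncertainty = 1.0 - 2.0 * abs(model.prob(x) - 0.5)  # in [0, 1]
    return uncertainty * density > threshold
```

In a streaming loop, only instances passing `bi_criteria_query` are labelled and used for `update`, so labelling effort concentrates on uncertain points that also matter under the (estimated) data distribution.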
Maritime data integration and analysis: Recent progress and research challenges
The correlated exploitation of heterogeneous data sources offering very large historical as well as streaming data is important to increasing the accuracy of computations when analysing and predicting future states of moving entities. This is particularly critical in the maritime domain, where online tracking, early recognition of events, and real-time forecast of anticipated trajectories of vessels are crucial to safety and operations at sea. The objective of this paper is to review current research challenges and trends tied to the integration, management, analysis, and visualization of objects moving at sea, and to offer a few suggestions for the successful development of maritime forecasting and decision-support systems.
Distributed top-k aggregation queries at large
Top-k query processing is a fundamental building block for efficient ranking in a large number of applications. Efficiency is a central issue, especially for distributed settings, when the data is spread across different nodes in a network. This paper introduces novel optimization methods for top-k aggregation queries in such distributed environments. The optimizations can be applied to all algorithms that fall into the frameworks of the prior TPUT and KLEE methods. The optimizations address three degrees of freedom: 1) hierarchically grouping input lists into top-k operator trees and optimizing the tree structure, 2) computing data-adaptive scan depths for different input sources, and 3) data-adaptive sampling of a small subset of input sources in scenarios with hundreds or thousands of query-relevant network nodes. All optimizations are based on a statistical cost model that utilizes local synopses, e.g., in the form of histograms, efficiently computed convolutions, and estimators based on order statistics. The paper presents comprehensive experiments, with three different real-life datasets and using the ns-2 network simulator for a packet-level simulation of a large Internet-style network.
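The TPUT framework referenced above can be sketched in its basic three-phase form. The per-node dictionary representation and the function name are illustrative assumptions, and the optimizations the paper proposes on top of TPUT are omitted:

```python
def tput_topk(node_lists, k):
    """Simplified TPUT: answer a distributed top-k aggregation query
    (score = sum of per-node scores) in three bounded rounds."""
    m = len(node_lists)
    # Phase 1: each node ships its local top-k (item, score) pairs.
    partial = {}
    for node in node_lists:
        top = sorted(node.items(), key=lambda kv: -kv[1])[:k]
        for item, score in top:
            partial[item] = partial.get(item, 0.0) + score
    # The k-th largest partial sum is a lower bound on the final k-th score.
    tau1 = sorted(partial.values(), reverse=True)[k - 1]
    # Phase 2: any true top-k item has total >= tau1, hence a local score
    # of at least tau1 / m at some node, so it survives this filter.
    threshold = tau1 / m
    candidates = set(partial)
    for node in node_lists:
        for item, score in node.items():
            if score >= threshold:
                candidates.add(item)
    # Phase 3: resolve exact aggregate scores for the candidates only.
    exact = {c: sum(node.get(c, 0.0) for node in node_lists) for c in candidates}
    return sorted(exact.items(), key=lambda kv: -kv[1])[:k]
```

The point of the three phases is that communication is bounded by the candidate set rather than the full lists; the paper's optimizations (operator trees, adaptive scan depths, source sampling) further shrink that cost.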