Error Detection and Impact-Sensitive Instance Ranking in Noisy Datasets

By Xingquan Zhu, Xindong Wu and Ying Yang

Abstract

Given a noisy dataset, locating erroneous instances and attributes and ranking suspicious instances by their impact on system performance is an interesting and important research issue. In this paper we propose an Error Detection and Impact-sensitive instance Ranking (EDIR) mechanism to address this problem. Given a noisy dataset D, we first train a benchmark classifier T from D. Instances that cannot be effectively classified by T are treated as suspicious and forwarded to a subset S. For each attribute A_i, we swap A_i and the class label C to train a classifier AP_i for A_i. Given an instance I_k in S, we use AP_i and the benchmark classifier T to locate erroneous values of each attribute A_i. To quantitatively rank the instances in S, we define an impact measure based on the Information-gain Ratio (IR): we compute IR_i between attribute A_i and the class C and use it as the impact-sensitive weight of A_i. The sum of the impact-sensitive weights of all located erroneous attributes of I_k gives its total impact value. Experimental results demonstrate the effectiveness of our strategies.
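The following Python sketch illustrates the pipeline described in the abstract under simplifying assumptions: it treats training-set misclassifications by T as the suspicious subset S, flags an attribute value as erroneous when its attribute predictor AP_i disagrees with the recorded value (the paper's localization rule combining AP_i and T is more involved), and weights flagged attributes by their information-gain ratio with the class. All function names (edir_rank, gain_ratio) and the use of scikit-learn decision trees are illustrative choices, not the authors' implementation.

# Hypothetical sketch of the EDIR pipeline; not the paper's exact method.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def entropy(values):
    """Shannon entropy (base 2) of a 1-D array of discrete values."""
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gain_ratio(attr, labels):
    """Information-gain ratio IR_i between a discrete attribute A_i and class C."""
    h_c = entropy(labels)
    h_c_given_a = 0.0                      # conditional entropy H(C | A_i)
    for v in np.unique(attr):
        mask = attr == v
        h_c_given_a += mask.mean() * entropy(labels[mask])
    split_info = entropy(attr)
    return (h_c - h_c_given_a) / split_info if split_info > 0 else 0.0

def edir_rank(X, y):
    """Return (ranked suspicious indices, impact scores, flagged attributes)."""
    n, d = X.shape
    # 1. Benchmark classifier T trained from the noisy dataset D.
    #    A shallow tree is used here so T does not simply memorize the noise.
    T = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
    # 2. Suspicious subset S: instances that T cannot classify correctly.
    S = np.where(T.predict(X) != y)[0]
    if S.size == 0:
        return [], {}, {}
    # 3. Impact-sensitive weight of each attribute: IR_i between A_i and C.
    weights = np.array([gain_ratio(X[:, i], y) for i in range(d)])
    flagged, scores = {}, {}
    for i in range(d):
        # Attribute predictor AP_i: swap A_i with the class label C, i.e.
        # predict A_i from the remaining attributes plus C.
        features = np.column_stack([np.delete(X, i, axis=1), y])
        AP_i = DecisionTreeClassifier(random_state=0).fit(features, X[:, i])
        # 4. Flag A_i of a suspicious instance as erroneous when AP_i
        #    disagrees with the recorded value (simplified localization rule).
        pred = AP_i.predict(features[S])
        for k, idx in enumerate(S):
            if pred[k] != X[idx, i]:
                flagged.setdefault(idx, []).append(i)
    # 5. Total impact of an instance: sum of IR_i over its flagged attributes.
    for idx in S:
        scores[idx] = sum(weights[i] for i in flagged.get(idx, []))
    ranking = sorted(S, key=lambda idx: scores[idx], reverse=True)
    return ranking, scores, flagged

if __name__ == "__main__":
    # Toy demo with injected attribute noise; data and noise rate are arbitrary.
    rng = np.random.default_rng(0)
    X = rng.integers(0, 3, size=(200, 4))        # discrete attributes
    y = (X[:, 0] + X[:, 1] > 2).astype(int)      # class depends on A_0 and A_1
    noisy = rng.choice(200, size=10, replace=False)
    X[noisy, 0] = rng.integers(0, 3, size=10)    # corrupt attribute A_0
    ranking, scores, flagged = edir_rank(X, y)
    for idx in ranking[:5]:
        print(f"instance {idx}: impact={scores[idx]:.3f}, attrs={flagged.get(idx, [])}")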

Year: 2009
OAI identifier: oai:CiteSeerX.psu:10.1.1.134.7233
Provided by: CiteSeerX