2 research outputs found

    Classifying Imbalanced Data Sets by a Novel RE-Sample and Cost-Sensitive Stacked Generalization Method

    Learning from imbalanced data sets is considered one of the key topics in the machine learning community. Stacking ensembles are an efficient algorithm for normal, balanced data sets, but they have seldom been applied to imbalanced data. In this paper, we propose a novel RE-sample and Cost-Sensitive Stacked Generalization (RECSG) method based on two-layer learning models. The first step is Level 0 model generalization, which includes data preprocessing and base model training. The second step is Level 1 model generalization, which involves a cost-sensitive classifier and the logistic regression algorithm. In the learning phase, preprocessing techniques can be embedded in imbalanced-data learning methods. In the cost-sensitive algorithm, the cost matrix is combined with both data characteristics and algorithms. In the RECSG method, the ensemble algorithm is combined with imbalanced-data techniques. In experiments on 17 public imbalanced data sets, the proposed method showed better classification performance than other ensemble and single algorithms, as measured by several evaluation metrics (AUC, GeoMean, and AGeoMean). The proposed method is especially efficient when the performance of the base classifier is low. These results demonstrate that the proposed method can be applied to the class imbalance problem.
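
    As a rough illustration of two ideas named in this abstract, the sketch below computes the geometric mean (GeoMean) of sensitivity and specificity and applies a minimum-expected-cost decision rule to predicted positive-class probabilities. It is not the RECSG method itself: the function names, the cost values (3 for a missed positive, 1 for a false alarm), and the toy labels and probabilities are all assumptions made for illustration, since the abstract does not give the paper's cost matrix or data.

```python
import math


def geo_mean(y_true, y_pred):
    """Geometric mean of sensitivity (TPR) and specificity (TNR) for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(tpr * tnr)


def min_cost_label(p_pos, cost_fn, cost_fp):
    """Return the label with the lower expected cost (correct predictions cost 0)."""
    cost_if_predict_neg = p_pos * cost_fn          # risk of missing a positive
    cost_if_predict_pos = (1.0 - p_pos) * cost_fp  # risk of a false alarm
    return 1 if cost_if_predict_pos <= cost_if_predict_neg else 0


# Hypothetical toy data: 2 minority (positive) and 8 majority (negative) samples.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
probs = [0.6, 0.3, 0.4, 0.1, 0.2, 0.1, 0.05, 0.2, 0.1, 0.3]

# Plain 0.5 threshold versus a cost-sensitive rule (assumed costs).
plain = [1 if p >= 0.5 else 0 for p in probs]
costed = [min_cost_label(p, cost_fn=3.0, cost_fp=1.0) for p in probs]

print("GeoMean, plain threshold:", round(geo_mean(y_true, plain), 3))
print("GeoMean, cost-sensitive: ", round(geo_mean(y_true, costed), 3))
```

    On this toy data the cost-sensitive rule trades some specificity for sensitivity on the minority class, which is the kind of trade-off GeoMean is meant to capture.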

    Cost-sensitive stacking for audio tag annotation and retrieval

    Audio tags correspond to keywords that people use to describe different aspects of a music clip, such as the genre, mood, and instrumentation. Since social tags are usually assigned by people with different levels of musical knowledge, they inevitably contain noisy information. By treating the tag counts as costs, we can model the audio tagging problem as a cost-sensitive classification problem. In addition, tag correlation is another useful source of information for automatic audio tagging, since some tags often co-occur. By considering the co-occurrences of tags, we can model the audio tagging problem as a multi-label classification problem. To exploit the tag count and correlation information jointly, we formulate the audio tagging task as a novel cost-sensitive multi-label (CSML) learning problem. The results of audio tag annotation and retrieval experiments demonstrate that the new approach outperforms our MIREX 2009 winning method. Index Terms: audio tag annotation, audio tag retrieval, tag count, cost-sensitive learning, multi-label