92,031 research outputs found

    Cost-Sensitive Classification: Empirical Evaluation of a Hybrid Genetic Decision Tree Induction Algorithm

    Full text link
    This paper introduces ICET, a new algorithm for cost-sensitive classification. ICET uses a genetic algorithm to evolve a population of biases for a decision tree induction algorithm. The fitness function of the genetic algorithm is the average cost of classification when using the decision tree, including both the costs of tests (features, measurements) and the costs of classification errors. ICET is compared here with three other algorithms for cost-sensitive classification - EG2, CS-ID3, and IDX - and also with C4.5, which classifies without regard to cost. The five algorithms are evaluated empirically on five real-world medical datasets. Three sets of experiments are performed. The first set examines the baseline performance of the five algorithms on the five datasets and establishes that ICET performs significantly better than its competitors. The second set tests the robustness of ICET under a variety of conditions and shows that ICET maintains its advantage. The third set looks at ICET's search in bias space and discovers a way to improve the search.Comment: See http://www.jair.org/ for any accompanying file

    A survey of cost-sensitive decision tree induction algorithms

    Get PDF
    The past decade has seen a significant interest on the problem of inducing decision trees that take account of costs of misclassification and costs of acquiring the features used for decision making. This survey identifies over 50 algorithms including approaches that are direct adaptations of accuracy based methods, use genetic algorithms, use anytime methods and utilize boosting and bagging. The survey brings together these different studies and novel approaches to cost-sensitive decision tree learning, provides a useful taxonomy, a historical timeline of how the field has developed and should provide a useful reference point for future research in this field

    HiTrust: building cross-organizational trust relationship based on a hybrid negotiation tree

    Get PDF
    Small-world phenomena have been observed in existing peer-to-peer (P2P) networks which has proved useful in the design of P2P file-sharing systems. Most studies of constructing small world behaviours on P2P are based on the concept of clustering peer nodes into groups, communities, or clusters. However, managing additional multilayer topology increases maintenance overhead, especially in highly dynamic environments. In this paper, we present Social-like P2P systems (Social-P2Ps) for object discovery by self-managing P2P topology with human tactics in social networks. In Social-P2Ps, queries are routed intelligently even with limited cached knowledge and node connections. Unlike community-based P2P file-sharing systems, we do not intend to create and maintain peer groups or communities consciously. In contrast, each node connects to other peer nodes with the same interests spontaneously by the result of daily searches

    Low Size-Complexity Inductive Logic Programming: The East-West Challenge Considered as a Problem in Cost-Sensitive Classification

    Get PDF
    The Inductive Logic Programming community has considered proof-complexity and model-complexity, but, until recently, size-complexity has received little attention. Recently a challenge was issued "to the international computing community" to discover low size-complexity Prolog programs for classifying trains. The challenge was based on a problem first proposed by Ryszard Michalski, 20 years ago. We interpreted the challenge as a problem in cost-sensitive classification and we applied a recently developed cost-sensitive classifier to the competition. Our algorithm was relatively successful (we won a prize). This paper presents our algorithm and analyzes the results of the competition

    CUSBoost: Cluster-based Under-sampling with Boosting for Imbalanced Classification

    Full text link
    Class imbalance classification is a challenging research problem in data mining and machine learning, as most of the real-life datasets are often imbalanced in nature. Existing learning algorithms maximise the classification accuracy by correctly classifying the majority class, but misclassify the minority class. However, the minority class instances are representing the concept with greater interest than the majority class instances in real-life applications. Recently, several techniques based on sampling methods (under-sampling of the majority class and over-sampling the minority class), cost-sensitive learning methods, and ensemble learning have been used in the literature for classifying imbalanced datasets. In this paper, we introduce a new clustering-based under-sampling approach with boosting (AdaBoost) algorithm, called CUSBoost, for effective imbalanced classification. The proposed algorithm provides an alternative to RUSBoost (random under-sampling with AdaBoost) and SMOTEBoost (synthetic minority over-sampling with AdaBoost) algorithms. We evaluated the performance of CUSBoost algorithm with the state-of-the-art methods based on ensemble learning like AdaBoost, RUSBoost, SMOTEBoost on 13 imbalance binary and multi-class datasets with various imbalance ratios. The experimental results show that the CUSBoost is a promising and effective approach for dealing with highly imbalanced datasets.Comment: CSITSS-201

    StackInsights: Cognitive Learning for Hybrid Cloud Readiness

    Full text link
    Hybrid cloud is an integrated cloud computing environment utilizing a mix of public cloud, private cloud, and on-premise traditional IT infrastructures. Workload awareness, defined as a detailed full range understanding of each individual workload, is essential in implementing the hybrid cloud. While it is critical to perform an accurate analysis to determine which workloads are appropriate for on-premise deployment versus which workloads can be migrated to a cloud off-premise, the assessment is mainly performed by rule or policy based approaches. In this paper, we introduce StackInsights, a novel cognitive system to automatically analyze and predict the cloud readiness of workloads for an enterprise. Our system harnesses the critical metrics across the entire stack: 1) infrastructure metrics, 2) data relevance metrics, and 3) application taxonomy, to identify workloads that have characteristics of a) low sensitivity with respect to business security, criticality and compliance, and b) low response time requirements and access patterns. Since the capture of the data relevance metrics involves an intrusive and in-depth scanning of the content of storage objects, a machine learning model is applied to perform the business relevance classification by learning from the meta level metrics harnessed across stack. In contrast to traditional methods, StackInsights significantly reduces the total time for hybrid cloud readiness assessment by orders of magnitude
    • …
    corecore