
    Learning Active Learning from Data

    In this paper, we suggest a novel data-driven approach to active learning (AL). The key idea is to train a regressor that predicts the expected error reduction for a candidate sample in a particular learning state. By formulating the query selection procedure as a regression problem, we are not restricted to working with existing AL heuristics; instead, we learn strategies based on experience from previous AL outcomes. We show that a strategy can be learnt either from simple synthetic 2D datasets or from a subset of domain-specific data. Our method yields strategies that work well on real data from a wide range of domains.
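
    The query-selection idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a plain k-nearest-neighbour average stands in for the regressor, the two features (classifier uncertainty on the candidate, fraction of the pool already labelled) are hypothetical, and the experience values are made up for illustration.

```python
from typing import List, Sequence, Tuple

Features = Tuple[float, ...]
Experience = Tuple[Features, float]  # (features, observed error reduction)


def predict_reduction(history: Sequence[Experience], query: Features, k: int = 3) -> float:
    """Average the error reductions of the k past outcomes closest to `query`."""
    def sq_dist(pair: Experience) -> float:
        return sum((a - b) ** 2 for a, b in zip(pair[0], query))

    nearest = sorted(history, key=sq_dist)[:k]
    return sum(reduction for _, reduction in nearest) / len(nearest)


def select_query(history: Sequence[Experience], candidates: Sequence[Features], k: int = 3) -> int:
    """Return the index of the candidate with the highest predicted error reduction."""
    scores: List[float] = [predict_reduction(history, c, k) for c in candidates]
    return max(range(len(candidates)), key=lambda i: scores[i])


# Made-up experience from earlier AL runs (assumption, for illustration only):
# features = (uncertainty on the candidate, fraction of pool labelled).
history: List[Experience] = [
    ((0.9, 0.1), 0.08),
    ((0.8, 0.2), 0.06),
    ((0.2, 0.1), 0.01),
    ((0.1, 0.5), 0.00),
    ((0.5, 0.3), 0.03),
]
candidates: List[Features] = [(0.15, 0.2), (0.85, 0.2), (0.5, 0.2)]

best = select_query(history, candidates, k=2)
print("query candidate index:", best)
```

    In this toy setting the learned scores happen to favour uncertain candidates, but nothing in the selection rule hard-codes that heuristic; the ranking comes entirely from the recorded outcomes.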

    An Efficient Learning of Constraints For Semi-Supervised Clustering using Neighbour Clustering Algorithm

    Data mining is the process of finding previously unknown and potentially interesting patterns and relations in databases. It is one step in the knowledge discovery in databases (KDD) process. The structures that data mining produces must meet certain conditions before they can be considered knowledge: validity, understandability, utility, novelty, and interestingness. Researchers identify two fundamental goals of data mining: prediction and description. This work addresses the semi-supervised clustering problem, in which it is known (with varying degrees of certainty) that some sample pairs are (or are not) in the same class. It proposes a probabilistic model for semi-supervised clustering based on Shared Semi-supervised Neighbor clustering (SSNC) that provides a principled framework for incorporating supervision into prototype-based clustering and combines the constraint-based and fitness-based approaches in a unified model. The proposed method first performs a constraint-sensitive assignment of instances to clusters, in which points are assigned so that the overall distortion of the points from the cluster centroids is minimized while as few must-link and cannot-link constraints as possible are violated. Experimental results on UCL Machine learning semi-supervised datasets show that the proposed method achieves higher F-measures than many existing semi-supervised clustering methods.
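
    The constraint-sensitive assignment step described in this abstract can be sketched as a penalised nearest-centroid rule. This is an assumption-laden illustration, not the SSNC method itself: the cost (squared distance plus a fixed `penalty` per violated constraint), the function name `assign_point`, and the toy data are all hypothetical.

```python
from typing import Dict, Optional, Sequence, Tuple

Point = Tuple[float, ...]
Pair = Tuple[int, int]  # indices of two constrained points


def assign_point(
    i: int,
    points: Sequence[Point],
    centroids: Sequence[Point],
    labels: Dict[int, int],
    must_link: Sequence[Pair],
    cannot_link: Sequence[Pair],
    penalty: float = 1.0,
) -> int:
    """Pick the cluster minimising distortion plus constraint-violation penalties."""
    def partner_label(pair: Pair) -> Optional[int]:
        # Label of the other point in the pair, or None if the pair does not
        # involve point i or the partner has not been assigned yet.
        if i not in pair:
            return None
        partner = pair[1] if pair[0] == i else pair[0]
        return labels.get(partner)

    best_cluster, best_cost = -1, float("inf")
    for c, centroid in enumerate(centroids):
        # Distortion term: squared Euclidean distance to the centroid.
        cost = sum((a - b) ** 2 for a, b in zip(points[i], centroid))
        for pair in must_link:
            lab = partner_label(pair)
            if lab is not None and lab != c:
                cost += penalty  # must-link partner sits in another cluster
        for pair in cannot_link:
            if partner_label(pair) == c:
                cost += penalty  # cannot-link partner sits in this cluster
        if cost < best_cost:
            best_cluster, best_cost = c, cost
    return best_cluster


# Toy data (assumption): point 2 is slightly closer to centroid 0.
points = [(0.0, 0.0), (1.0, 0.0), (0.4, 0.0)]
centroids = [(0.0, 0.0), (1.0, 0.0)]
labels = {0: 0, 1: 1}

unconstrained = assign_point(2, points, centroids, labels, [], [])
constrained = assign_point(2, points, centroids, labels, [], [(2, 0)])
print(unconstrained, constrained)
```

    With no constraints the point goes to its nearest centroid; adding a cannot-link constraint with point 0 makes the nearer cluster more expensive, so the assignment moves to the other cluster. The size of `penalty` controls how strongly supervision overrides distance.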

    Event-based clustering for reducing labeling costs of event-related microposts

    Automatically identifying the event type of event-related information in the sheer volume of social media data requires machine learning. However, this is highly dependent on (1) the number of correctly labeled instances and (2) labeling costs. Active learning has been proposed to reduce the number of instances to label. Although the thematic dimension is already used, other metadata, such as spatial and temporal information, that would help achieve a more fine-grained clustering is currently not taken into account. In this paper, we present a novel event-based clustering strategy that makes use of temporal, spatial, and thematic metadata to determine instances to label. An evaluation on incident-related tweets shows that our selection strategy for active learning outperforms current state-of-the-art approaches even with few labeled instances.