
    Active Learning Stopping Strategies for Technology-Assisted Sensitivity Review

    Active learning strategies are often deployed in technology-assisted review tasks, such as e-discovery and sensitivity review, to learn a classifier that can assist the reviewers with their task. In particular, an active learning strategy selects the documents that are expected to be the most useful for learning an effective classifier, so that these documents can be reviewed before the less useful ones. However, when reviewing for sensitivity, the order in which the documents are reviewed can affect the reviewers' ability to perform the review. Therefore, when deploying active learning in technology-assisted sensitivity review, we want to know when a sufficiently effective classifier has been learned, such that the active learning can stop and the reviewing order of the documents can be selected by the reviewer instead of the classifier. In this work, we propose two active learning stopping strategies for technology-assisted sensitivity review. We evaluate the effectiveness of our proposed approaches in comparison with three state-of-the-art stopping strategies from the literature. We show that our best-performing approach results in a significantly more effective sensitivity classifier (+6.6% F2) than the best-performing stopping strategy from the literature (McNemar's test, p<0.05).

    Probabilistic active learning: an online framework for structural health monitoring

    A novel, probabilistic framework for the classification, investigation and labelling of data is suggested as an online strategy for Structural Health Monitoring (SHM). A critical issue for data-based SHM is a lack of descriptive labels (for measured data) that correspond to the condition of the monitored system. For many applications, these labels are costly and/or impractical to obtain, and as a result, conventional supervised learning is not feasible. This fact forces a dependence on outlier analysis, or one-class classifiers, in practical applications, as a means of damage detection. The model suggested in this work, however, allows for the definition of a multi-class classifier, to aid both damage detection and identification, while using a limited number of the most informative labelled data. The algorithm is applied to three datasets in the online setting: the Z24 bridge data, a machining (acoustic emission) dataset, and measurements from ground vibration aircraft tests. In the experiments, active learning is shown to improve the online classification performance for damage detection and classification.

    The impact of linkage methods in hierarchical clustering for active learning to rank

    Document ranking is a central problem in many areas, including information retrieval and recommendation. The goal of learning to rank is to automatically create ranking models from training data. The performance of ranking models is strongly affected by the quality and quantity of training data. Collecting large-scale training samples with relevance labels involves human labor which is time-consuming and expensive. Selective sampling and active learning techniques have been developed and proven effective in addressing this problem. However, most active methods do not scale well and need to rebuild the model after selected samples are added to the previous training set. We propose a sampling method which selects a set of instances and labels the full set only once before training the ranking model. Our method is based on hierarchical agglomerative clustering (average linkage) and we also report the performance of other linkage criteria that measure the distance between two clusters of query-document pairs. Another difference from previous hierarchical clustering is that we cluster the instances belonging to the same query, which usually outperforms the baselines.