    Deep Active Learning for Classifying Cancer Pathology Reports

    Background: Automated text classification has many important applications in the clinical setting; however, obtaining labelled data for training machine learning and deep learning models is often difficult and expensive. Active learning techniques may mitigate this challenge by reducing the amount of labelled data required to effectively train a model. In this study, we analyze the effectiveness of 11 active learning algorithms on classifying subsite and histology from cancer pathology reports using a Convolutional Neural Network as the text classification model. Results: We compare the performance of each active learning strategy using two differently sized datasets and two different classification tasks. Our results show that on all tasks and dataset sizes, all active learning strategies except diversity-sampling strategies outperformed random sampling, i.e., no active learning. On our large dataset (15K initial labelled samples, adding 15K additional labelled samples each iteration of active learning), there was no clear winner between the different active learning strategies. On our small dataset (1K initial labelled samples, adding 1K additional labelled samples each iteration of active learning), marginal and ratio uncertainty sampling performed better than all other active learning techniques. We found that compared to random sampling, active learning strongly helps performance on rare classes by focusing on underrepresented classes. Conclusions: Active learning can save annotation cost by helping human annotators efficiently and intelligently select which samples to label. Our results show that a dataset constructed using effective active learning techniques requires less than half the amount of labelled data to achieve the same performance as a dataset constructed using random sampling.
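The abstract reports that margin ("marginal") and ratio uncertainty sampling performed best on the small dataset. As a minimal sketch of how these two scores are commonly computed from a classifier's predicted class probabilities (not the authors' code), this assumes margin = top-1 minus top-2 probability and ratio = top-2 divided by top-1; the helper names `margin_scores`, `ratio_scores`, and `select_batch` are hypothetical:

```python
import numpy as np


def margin_scores(probs: np.ndarray) -> np.ndarray:
    """Top-1 minus top-2 class probability per row; smaller means more uncertain."""
    top2 = np.sort(probs, axis=1)[:, -2:]  # ascending sort: column 1 holds the largest
    return top2[:, 1] - top2[:, 0]


def ratio_scores(probs: np.ndarray) -> np.ndarray:
    """Top-2 divided by top-1 class probability; larger means more uncertain."""
    top2 = np.sort(probs, axis=1)[:, -2:]
    return top2[:, 0] / top2[:, 1]


def select_batch(probs: np.ndarray, k: int, strategy: str = "margin") -> np.ndarray:
    """Return indices of the k most uncertain unlabelled samples (hypothetical helper)."""
    if strategy == "margin":
        order = np.argsort(margin_scores(probs), kind="stable")  # smallest margin first
    elif strategy == "ratio":
        order = np.argsort(-ratio_scores(probs), kind="stable")  # largest ratio first
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return order[:k]


# Example: predicted probabilities for three unlabelled reports over three classes.
probs = np.array([
    [0.90, 0.05, 0.05],
    [0.40, 0.35, 0.25],
    [0.60, 0.30, 0.10],
])
print(select_batch(probs, k=2, strategy="margin"))  # [1 2]
```

In a full active learning loop, `probs` would come from the current model (a CNN in the study) applied to the unlabelled pool, the selected indices would go to annotators, and the model would be retrained; `k` would correspond to the 1K or 15K samples added per iteration.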

    Suggestive Annotation: A Deep Active Learning Framework for Biomedical Image Segmentation

    Image segmentation is a fundamental problem in biomedical image analysis. Recent advances in deep learning have achieved promising results on many biomedical image segmentation benchmarks. However, because biomedical images vary widely (different modalities, image settings, objects, noise, etc.), applying deep learning to a new application usually requires a new set of training data. This can incur a great deal of annotation effort and cost, because only biomedical experts can annotate effectively, and images often contain too many instances (e.g., cells) to annotate. In this paper, we aim to address the following question: with limited effort (e.g., time) for annotation, which instances should be annotated to attain the best performance? We present a deep active learning framework that combines a fully convolutional network (FCN) and active learning to significantly reduce annotation effort by making judicious suggestions on the most effective areas to annotate. We utilize uncertainty and similarity information provided by the FCN and formulate a generalized version of the maximum set cover problem to determine the most representative and uncertain areas for annotation. Extensive experiments using the 2015 MICCAI Gland Challenge dataset and a lymph node ultrasound image segmentation dataset show that, using annotation suggestions by our method, state-of-the-art segmentation performance can be achieved by using only 50% of training data. Comment: Accepted at MICCAI 201
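The abstract combines uncertainty with similarity and casts selection as a generalized maximum set cover problem. The sketch below is one plausible two-stage reading of that idea, not the paper's exact formulation: first keep the most uncertain candidates, then greedily add the candidate that most increases total coverage, where an image's coverage is its highest similarity to any selected image. Precomputed uncertainty scores, cosine similarity on nonnegative feature vectors, and the names `suggest` and `cosine_similarity_matrix` are assumptions.

```python
import numpy as np


def cosine_similarity_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows of a and rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


def suggest(uncertainty: np.ndarray, features: np.ndarray,
            n_uncertain: int, n_select: int) -> np.ndarray:
    """Hypothetical two-stage selection: uncertainty filter, then greedy coverage."""
    # Stage 1: keep the n_uncertain most uncertain images as candidates.
    cand = np.argsort(-uncertainty, kind="stable")[:n_uncertain]
    # Similarity of each candidate to every image in the pool, shape (n_uncertain, N).
    sim = cosine_similarity_matrix(features[cand], features)
    coverage = np.zeros(features.shape[0])  # best similarity to any chosen image so far
    taken = np.zeros(len(cand), dtype=bool)
    chosen = []
    # Stage 2: greedily maximize the sum over images of max similarity to the chosen set.
    for _ in range(min(n_select, len(cand))):
        gains = np.maximum(sim, coverage).sum(axis=1) - coverage.sum()
        gains[taken] = -np.inf  # never pick the same candidate twice
        best = int(np.argmax(gains))
        taken[best] = True
        chosen.append(best)
        coverage = np.maximum(coverage, sim[best])
    return cand[chosen]


# Toy pool: images 0 and 1 look alike, as do images 2 and 3; image 3 is confidently segmented.
features = np.array([[1.0, 0.0], [0.99, 0.14], [0.0, 1.0], [0.1, 0.99]])
uncertainty = np.array([0.9, 0.8, 0.7, 0.1])
print(suggest(uncertainty, features, n_uncertain=3, n_select=2))  # [1 2]
```

Greedy selection suits this kind of coverage objective: with nonnegative similarities it is monotone submodular, so the greedy choice is guaranteed to reach at least a (1 - 1/e) fraction of the optimum.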

    Putting theory into practice: The creation of REALs in the context of today's universities

    Rich Environments for Active Learning (REALs), as described by R. Scott Grabinger and Joanna Dunlap, are comprehensive educational systems based on constructivist principles that present an intellectual and practical challenge to university lecturers. As teachers and researchers, academics are concerned with improving the learning potential of teaching strategies and, to this end, the theory of the REAL provides inspiration and ideas based on sound theoretical principles. Yet in the current pressured climate, having the time and resources to put such an extensive theory into practice can seem little more than a pipe‐dream. It is argued that using a computer‐based application such as the Hypermedia Learning Tutorials (HLTs) as the heart of a REAL allows lecturers to take positive steps towards the creation of comprehensive, flexible, integrated learning environments. The concept of the HLT is discussed and a practical application in the field of advanced second‐language acquisition is described. Based on conceptual analysis and the results of preliminary student evaluation, it is argued that the HLT encompasses, both in theory and in practice, the chief qualities of REALs and can form the basis for their creation in a wide variety of disciplines.

    Engaging and empowering first-year students through curriculum design: perspectives from the literature

    Increasing value is being placed on engaging and empowering first-year students, and first-year curriculum design is a key driver of, and opportunity for, early enculturation into successful learning at university. This paper summarises the literature on first-year curriculum design linked to student engagement and empowerment. We present conceptualisations of ‘curriculum’ and examples from first-year curriculum design. We also note the limited literature where students have been involved in designing first-year curricula. The results of the literature review suggest that key characteristics of engaging first-year curricula include active learning, timely feedback, relevance and challenge. The literature also points to the importance of identifying students' abilities on entry to university as well as being clear about desired graduate attributes and developmental goals. Acknowledging realities and constraints, we present a framework for the first-year curriculum design process based on the literature.

    Low-Cost Learning via Active Data Procurement

    We design mechanisms for online procurement of data held by strategic agents for machine learning tasks. The challenge is to use past data to actively price future data and give learning guarantees even when an agent's cost for revealing her data may depend arbitrarily on the data itself. We achieve this goal by showing how to convert a large class of no-regret algorithms into online posted-price and learning mechanisms. Our results in a sense parallel classic sample complexity guarantees, but with the key resource being money rather than quantity of data: with a budget constraint $B$, we give robust risk (predictive error) bounds on the order of $1/\sqrt{B}$. Because we use an active approach, we can often guarantee to do significantly better by leveraging correlations between costs and data. Our algorithms and analysis go through a model of no-regret learning with $T$ arriving pairs (cost, data) and a budget constraint of $B$. Our regret bounds for this model are on the order of $T/\sqrt{B}$ and we give lower bounds on the same order. Comment: Full version of EC 2015 paper. Color recommended for figures but nonessential. 36 pages, of which 12 appendi
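The abstract describes turning a no-regret learner into a posted-price mechanism under a budget $B$. The sketch below is a toy illustration of that general idea, not the authors' mechanism: online gradient descent for one-dimensional least squares, where each arriving agent is offered a price drawn uniformly from `[0, price_cap)`. It assumes the agent sells whenever the price covers her cost and that the cost is revealed together with the purchased data point; the uniform price distribution, squared loss, step size, and the name `posted_price_ogd` are all hypothetical choices. Dividing each purchased gradient by the purchase probability (importance weighting) makes the expected update equal to the full-information gradient.

```python
import numpy as np


def posted_price_ogd(xs, ys, costs, budget, price_cap=1.0, lr=0.2, seed=0):
    """Toy posted-price learner for 1-D least squares (hypothetical, illustrative only)."""
    rng = np.random.default_rng(seed)
    w, spent, bought = 0.0, 0.0, 0
    for x, y, c in zip(xs, ys, costs):
        if budget - spent < price_cap:
            break  # stop once we can no longer afford every possible price
        price = rng.uniform(0.0, price_cap)  # price drawn independently of the agent
        if price < c:
            continue  # agent declines: no payment, no data
        spent += price
        bought += 1
        q = 1.0 - c / price_cap  # P(price >= c) under the uniform price distribution
        grad = 2.0 * (w * x - y) * x  # gradient of the squared loss (w*x - y)^2
        w -= lr * grad / q  # importance weighting keeps the expected update unbiased
    return w, spent, bought


# Synthetic stream: y = 2x, with agents' costs uniform in [0, 0.5].
data_rng = np.random.default_rng(1)
xs = data_rng.uniform(-1.0, 1.0, 2000)
ys = 2.0 * xs
costs = data_rng.uniform(0.0, 0.5, 2000)
w, spent, bought = posted_price_ogd(xs, ys, costs, budget=50.0)
print(f"w={w:.3f}, spent={spent:.2f}, purchases={bought}")
```

Here `len(xs)` plays the role of $T$ and `budget` the role of $B$. Importance weights grow as the purchase probability shrinks, which is why the choice of price distribution matters; the abstract's point about leveraging correlations between costs and data is not modelled in this toy.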