Deep Active Learning for Classifying Cancer Pathology Reports
Background: Automated text classification has many important applications in the clinical setting; however, obtaining labelled data for training machine learning and deep learning models is often difficult and expensive. Active learning techniques may mitigate this challenge by reducing the amount of labelled data required to effectively train a model. In this study, we analyze the effectiveness of 11 active learning algorithms on classifying subsite and histology from cancer pathology reports using a Convolutional Neural Network as the text classification model.
Results: We compare the performance of each active learning strategy using two differently sized datasets and two different classification tasks. Our results show that on all tasks and dataset sizes, all active learning strategies except diversity-sampling strategies outperformed random sampling, i.e., no active learning. On our large dataset (15K initial labelled samples, adding 15K additional labelled samples each iteration of active learning), there was no clear winner between the different active learning strategies. On our small dataset (1K initial labelled samples, adding 1K additional labelled samples each iteration of active learning), marginal and ratio uncertainty sampling performed better than all other active learning techniques. We found that compared to random sampling, active learning strongly helps performance on rare classes by focusing on underrepresented classes.
Conclusions: Active learning can reduce annotation cost by helping human annotators efficiently and intelligently select which samples to label. Our results show that a dataset constructed using effective active learning techniques requires less than half the amount of labelled data to achieve the same performance as a dataset constructed using random sampling.
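As a rough illustration of the two strategies that won on the small dataset, here is a minimal sketch of margin ("marginal") and ratio uncertainty sampling, assuming a classifier that outputs per-class probabilities; the function and variable names are illustrative, not the study's code:

```python
import numpy as np

def margin_scores(probs):
    """Margin sampling: difference between the top two class
    probabilities; a smaller margin means a more uncertain sample."""
    s = np.sort(probs, axis=1)
    return s[:, -1] - s[:, -2]

def ratio_scores(probs):
    """Ratio sampling: ratio of the top two class probabilities;
    a ratio near 1 means the model can barely separate them."""
    s = np.sort(probs, axis=1)
    return s[:, -1] / s[:, -2]

def select_batch(probs, k, strategy="margin"):
    """Pick the k most uncertain unlabelled samples (lowest score)."""
    scores = margin_scores(probs) if strategy == "margin" else ratio_scores(probs)
    return np.argsort(scores)[:k]

# Toy pool: 4 unlabelled samples, 3 classes
probs = np.array([
    [0.90, 0.05, 0.05],   # confident prediction
    [0.40, 0.35, 0.25],   # top two classes nearly tied
    [0.50, 0.45, 0.05],   # top two classes nearly tied
    [0.70, 0.20, 0.10],
])
picked = select_batch(probs, 2)  # indices of the two most uncertain samples
```

In an active learning loop, the selected indices would be sent to the annotators, the new labels added to the training set, and the model retrained before the next iteration.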
Suggestive Annotation: A Deep Active Learning Framework for Biomedical Image Segmentation
Image segmentation is a fundamental problem in biomedical image analysis.
Recent advances in deep learning have achieved promising results on many
biomedical image segmentation benchmarks. However, due to large variations in
biomedical images (different modalities, image settings, objects, noise, etc.),
applying deep learning to a new application usually requires a new set of
training data. This can incur a great deal of annotation effort and cost,
because only biomedical experts can annotate effectively, and often there are
too many instances in images (e.g., cells) to annotate. In this paper, we aim
to address the following question: With limited effort (e.g., time) for
annotation, what instances should be annotated in order to attain the best
performance? We present a deep active learning framework that combines fully
convolutional network (FCN) and active learning to significantly reduce
annotation effort by making judicious suggestions on the most effective
annotation areas. We utilize uncertainty and similarity information provided by
FCN and formulate a generalized version of the maximum set cover problem to
determine the most representative and uncertain areas for annotation. Extensive
experiments using the 2015 MICCAI Gland Challenge dataset and a lymph node
ultrasound image segmentation dataset show that, using annotation suggestions
by our method, state-of-the-art segmentation performance can be achieved by
using only 50% of the training data.
Comment: Accepted at MICCAI 201
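The representativeness step described above (a generalized maximum set cover over candidate uncertain areas) can be sketched with a standard greedy algorithm. The feature vectors and cosine similarity below are illustrative assumptions, not the authors' exact FCN-based formulation:

```python
import numpy as np

def cosine_sim(a, b):
    """Pairwise cosine similarity between rows of a and rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def suggest_annotations(cand_feats, pool_feats, k):
    """Greedily pick k candidate areas maximizing the summed best
    similarity from each unlabelled pool item to any chosen candidate.
    This coverage objective is monotone submodular, so greedy selection
    enjoys the classic (1 - 1/e) approximation guarantee."""
    sim = cosine_sim(cand_feats, pool_feats)      # candidates x pool
    chosen = []
    best = np.zeros(pool_feats.shape[0])          # current best coverage per pool item
    for _ in range(k):
        gains = np.maximum(sim, best).sum(axis=1) - best.sum()
        gains[chosen] = -np.inf                   # never pick the same area twice
        i = int(np.argmax(gains))
        chosen.append(i)
        best = np.maximum(best, sim[i])
    return chosen

# Toy example: a pool with two clusters and three candidate areas
pool = np.array([[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]])
cands = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.05]])
picked = suggest_annotations(cands, pool, k=2)
```

In the full framework, the candidates would first be filtered by FCN uncertainty, and only then ranked for representativeness as above.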
Putting theory into practice: The creation of REALs in the context of today's universities
Rich Environments for Active Learning (REALs), as described by R. Scott Grabinger and Joanna Dunlap, are comprehensive educational systems based on constructivist principles that present an intellectual and practical challenge to university lecturers. As teachers and researchers, academics are concerned with improving the learning potential of teaching strategies and, to this end, the theory of the REAL provides inspiration and ideas based on sound theoretical principles. Yet in the context of the current pressured climate, having the time and resources to put such an extensive theory into practice can seem little more than a pipe-dream. It is argued that using a computer-based application such as the Hypermedia Learning Tutorials (HLTs) as the heart of a REAL allows lecturers to take positive steps towards the creation of comprehensive, flexible, integrated learning environments. The concept of the HLT is discussed and a practical application in the field of advanced second-language acquisition is described. Based on conceptual analysis and the results of preliminary student evaluation, it is argued that the HLT encompasses both in theory and in practice the chief qualities of REALs and can form the basis for their creation in a wide variety of disciplines.
Engaging and empowering first-year students through curriculum design: perspectives from the literature
Increasing value is being placed on engaging and empowering first-year students, and first-year curriculum design is a key driver and opportunity to ensure early enculturation into successful learning at university. This paper summarises the literature on first-year curriculum design linked to student engagement and empowerment. We present conceptualisations of 'curriculum' and examples from first-year curriculum design. We also note the limited literature where students have been involved in designing first-year curricula. The results of the literature review suggest that key characteristics of engaging first-year curricula include active learning, timely feedback, relevance and challenge. The literature also points to the importance of identifying students' abilities on entry to university as well as being clear about desired graduate attributes and developmental goals. Acknowledging realities and constraints, we present a framework for the first-year curriculum design process based on the literature.
Low-Cost Learning via Active Data Procurement
We design mechanisms for online procurement of data held by strategic agents
for machine learning tasks. The challenge is to use past data to actively price
future data and give learning guarantees even when an agent's cost for
revealing her data may depend arbitrarily on the data itself. We achieve this
goal by showing how to convert a large class of no-regret algorithms into
online posted-price and learning mechanisms. Our results in a sense parallel
classic sample complexity guarantees, but with the key resource being money
rather than quantity of data: under a budget constraint, we give robust risk
(predictive error) bounds. Because we use an active approach, we can often
guarantee to do significantly better by leveraging correlations between costs
and data.
Our algorithms and analysis go through a model of no-regret learning with
arriving (cost, data) pairs and a budget constraint. We give regret bounds
for this model together with matching lower bounds of the same order.
Comment: Full version of EC 2015 paper. Color recommended for figures but
nonessential. 36 pages, of which 12 appendi
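As a loose illustration of converting a no-regret algorithm into a posted-price mechanism, the sketch below runs exponential weights (Hedge) over a finite grid of posted prices: each round a price is sampled, and the agent sells her data point iff her cost is at most the posted price. For simplicity it assumes the agent's cost is observed after each round, which sidesteps the harder partial-information setting the paper actually handles; all names and parameters are hypothetical:

```python
import math
import random

def posted_price_learner(stream, prices, budget, eta=0.5, seed=0):
    """Hedge over a grid of posted prices.

    Posting a price before seeing the agent's report keeps the mechanism
    truthful: the agent's payment never depends on what she reveals.
    Reward of price p on an arriving (cost, value): value - p if the
    agent sells (cost <= p), else 0.
    """
    rng = random.Random(seed)
    w = [1.0] * len(prices)          # one weight per candidate price
    spent, bought = 0.0, []
    for cost, value in stream:
        # Sample a price index proportionally to the weights.
        r = rng.random() * sum(w)
        i, acc = 0, w[0]
        while acc < r:
            i += 1
            acc += w[i]
        p = prices[i]
        if cost <= p and spent + p <= budget:
            spent += p               # pay the posted price, not the cost
            bought.append((cost, value))
        # Full-information Hedge update: every price's reward is known
        # in hindsight because the agent's rule is simply cost <= p.
        for j, q in enumerate(prices):
            reward = (value - q) if cost <= q else 0.0
            w[j] *= math.exp(eta * reward)
        if spent >= budget:
            break
    return bought, spent

# Toy run: cheap data worth 1.0 per point, two candidate prices
stream = [(0.1, 1.0)] * 50
bought, spent = posted_price_learner(stream, prices=[0.05, 0.2], budget=2.0)
```

Here the weights quickly concentrate on the price 0.2, the only one high enough to induce the agent to sell; the budget check caps total spending regardless of how the learning unfolds.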