6,762 research outputs found

    Active Learning for Text Classification

    Text classification approaches are used extensively to solve real-world challenges. The success or failure of a text classification system hangs on the dataset used to train it: without a good dataset it is impossible to build a quality system. This thesis examines the applicability of active learning in text classification for the rapid and economical creation of labelled training data. Four main contributions are made in this thesis. First, we present two novel selection strategies for choosing the most informative examples for manual labelling. One uses an advanced aggregated confidence measurement, instead of the direct output of classifiers, to measure the confidence of each prediction, and queries the examples with the least confidence. The other is a simple but effective exploration-guided active learning selection strategy which uses only the notions of density and diversity, both based on similarity. Second, we propose new methods of using deterministic clustering algorithms to help bootstrap the active learning process. We first illustrate the problems of using non-deterministic clustering to select initial training sets, showing how non-deterministic clustering methods can result in inconsistent behaviour in the active learning process. We then compare various deterministic clustering techniques with commonly used non-deterministic ones, and show that deterministic clustering algorithms are as good as non-deterministic clustering algorithms at selecting initial training examples for the active learning process. More importantly, we show that the use of deterministic approaches stabilises the active learning process. Our third direction is in the area of visualising the active learning process. We use an existing visualisation technique to show that a better understanding of selection strategies can be achieved with the help of visualisation. Finally, for active learning to be practical and useful as a general dataset labelling methodology, it is desirable that an actively labelled dataset can be reused widely rather than being limited to one particular classifier. We compare the reusability of popular active learning methods for text classification and identify the best classifiers to use in active learning for text classification. This thesis is concerned with using active learning methods to label large unlabelled textual datasets. Our domain of interest is text classification, but most of the methods proposed are quite general and so are applicable to other domains having large collections of data with high dimensionality.
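    The least-confidence querying mentioned in this abstract can be illustrated with a minimal sketch. It is the plain baseline that reads confidence directly from classifier output, not the aggregated confidence measure the thesis proposes; the function name `least_confident` and the toy probabilities are assumptions made for illustration.

```python
def least_confident(probabilities, batch_size=1):
    """Return indices of the pool examples whose top class probability is lowest.

    `probabilities` is a list of per-example class probability lists, as a
    probabilistic classifier might produce for the unlabelled pool.
    """
    # Confidence of a prediction = probability assigned to the predicted class.
    scored = [(max(p), i) for i, p in enumerate(probabilities)]
    # Lowest confidence first; ties are broken by pool index.
    scored.sort()
    return [i for _, i in scored[:batch_size]]


# Hypothetical classifier outputs for four unlabelled documents.
pool_probs = [
    [0.95, 0.05],
    [0.55, 0.45],
    [0.70, 0.30],
    [0.51, 0.49],
]

# The two documents the classifier is least sure about are sent for labelling.
query = least_confident(pool_probs, batch_size=2)
print(query)
```

    In an active learning loop, the returned examples would be labelled by hand, added to the training set, and the classifier retrained before the next query.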

    The future of technology enhanced active learning – a roadmap

    The notion of active learning refers to the active involvement of the learner in the learning process, capturing ideas of learning-by-doing and the fact that active participation and knowledge construction lead to deeper and more sustained learning. Interactivity, in particular learner-content interaction, is a central aspect of technology-enhanced active learning. In this roadmap, the pedagogical background is discussed, the essential dimensions of technology-enhanced active learning systems are outlined, and the factors expected to influence these systems now and in the future are identified. A central aim is to address this promising field from a best-practices perspective, clarifying central issues and formulating an agenda for future developments in the form of a roadmap.

    Stacked Denoising Autoencoders and Transfer Learning for Immunogold Particles Detection and Recognition

    In this paper we present a system for the detection of immunogold particles and a Transfer Learning (TL) framework for the recognition of these particles. Immunogold particles are part of a high-magnification method for the selective localization of biological molecules at the subcellular level, visible only through Electron Microscopy. The number of immunogold particles in the cell walls allows differences in cell wall composition to be assessed, providing a tool to analyse the quality of different plants. Quantifying the particles requires laborious manual labeling (or annotation) of images containing hundreds of particles. The system proposed in this paper can significantly reduce the burden of this manual task. For particle detection we use a Laplacian of Gaussian (LoG) filter coupled with a Stacked Denoising Autoencoder (SDA). To improve recognition, we also study the applicability of TL settings for immunogold recognition. TL reuses the learning model of a source problem on other datasets (target problems) containing particles of different sizes. The proposed system was developed to solve a particular problem on maize cells, namely to determine the composition of cell wall ingrowths in endosperm transfer cells. This novel dataset, as well as the code for reproducing our experiments, is made publicly available. We determined that the LoG detector alone attained an accuracy of more than 84% as measured by the F-measure. Developing immunogold recognition with TL also provided superior performance compared with the baseline models, increasing accuracy rates by 10%.

    LEARNING OBJECT. DEFINITION AND CLASSIFICATION

    The current trend in higher education is to include competencies in curricula. This integration can be achieved through competency-based learning, in which a competence is acquired through various learning objects. This paper proposes different dimensions for defining a learning object (LO) and the classifications associated with them, and presents an analysis and synthesis of the results obtained.
    Alarcón Valero, F.; Alemany Díaz, MDM.; Boza, A.; Cuenca, L.; Gordo Monzó, ML.; Fernández-Diego, M.; Ruiz Font, L. (2015). LEARNING OBJECT. DEFINITION AND CLASSIFICATION. EDULEARN Proceedings (Internet). 4479-4488. http://hdl.handle.net/10251/95287

    CASP-DM: Context Aware Standard Process for Data Mining

    We propose an extension of the Cross Industry Standard Process for Data Mining (CRISP-DM) which addresses specific challenges of machine learning and data mining in handling context and model reuse. This new general context-aware process model is mapped to the CRISP-DM reference model, proposing some new or enhanced outputs.

    Content-driven design and architecture of E-learning applications

    E-learning applications combine content with learning technology systems to support the creation of content and its delivery to the learner. In the future, we can expect the distinction between learning content and its supporting infrastructure to become blurred. Content objects will interact with infrastructure services as independent objects. Our solution to the development of e-learning applications, content-driven design and architecture, is based on content-centric ontological modelling and development of architectures. Knowledge and modelling will play an important role in the development of content and architectures. Our approach integrates content with interaction (in technical and educational terms) and services (the principal organization for a system architecture), based on techniques from different fields, including software engineering, learning design, and knowledge engineering.

    EGAL: Exploration Guided Active Learning for TCBR

    The task of building labelled case bases can be approached using active learning (AL), a process which facilitates the labelling of large collections of examples with minimal manual labelling effort. The main challenge in designing AL systems is the development of a selection strategy to choose the most informative examples to manually label. Typical selection strategies use exploitation techniques which attempt to refine uncertain areas of the decision space based on the output of a classifier. Other approaches tend to balance exploitation with exploration, selecting examples from dense and interesting regions of the domain space. In this paper we present a simple but effective exploration-only selection strategy for AL in the textual domain. Our approach is inherently case-based, using only nearest-neighbour-based density and diversity measures. We show that its performance is comparable to that of the more computationally expensive exploitation-based approaches and that it offers the opportunity to be classifier independent.
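    A selection strategy built only on similarity-based density and diversity, as described in this abstract, might look like the sketch below. It is not the paper's exact formulation: the definitions of `density` and `diversity`, the thresholds `alpha` and `beta`, and the toy documents are all assumptions chosen for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def density(i, docs, alpha):
    """Sum of similarities to all other documents at least `alpha` similar."""
    sims = (cosine(docs[i], d) for j, d in enumerate(docs) if j != i)
    return sum(s for s in sims if s >= alpha)


def diversity(i, docs, labelled):
    """One minus the similarity to the nearest labelled document."""
    if not labelled:
        return 1.0
    return 1.0 - max(cosine(docs[i], docs[j]) for j in labelled)


def select(docs, labelled, alpha=0.3, beta=0.5):
    """Pick the densest unlabelled document that is far from labelled ones."""
    candidates = [
        i for i in range(len(docs))
        if i not in labelled and diversity(i, docs, labelled) >= beta
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda i: density(i, docs, alpha))


# Hypothetical bag-of-words documents; document 0 is already labelled.
docs = [
    {"goal": 1, "match": 1},
    {"goal": 1, "match": 1, "team": 1},
    {"stock": 1, "market": 1},
    {"stock": 1, "market": 1, "price": 1},
    {"stock": 1, "price": 1},
    {"recipe": 1},
]
print(select(docs, labelled={0}))
```

    No classifier appears anywhere in the selection step, which is what makes a strategy of this kind classifier independent.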

    A MODEL OF ANALYSIS OF THE E-LEARNING SYSTEM QUALITY

    E-learning training systems are now widespread. The infrastructure provided by the internet has reduced operating costs, to the benefit of trainees. As with the classical training system, the question arises of how to measure the quality of such a system. E-learning is a very specialised environment, with the same actors as the classical system but different types of interaction. Can we say that both systems produce similar results of the same quality? This paper presents a model that can be used to evaluate the quality of this kind of system, using specialised indicators for every aspect that can be measured.
    Keywords: e-learning system, quality, indicators, beneficiary of formation, e-learning activities, e-learning materials

    Semantic modelling of learning objects and instruction

    We introduce an ontology-based semantic modelling framework that addresses subject domain modelling, instruction modelling, and interoperability aspects in the development of complex reusable learning objects. Ontologies are knowledge representation frameworks, ideally suited to support knowledge-based modelling of these learning objects. We illustrate the benefits of semantic modelling for learning object assemblies within the context of standards such as SCORM Sequencing and Navigation and Learning Object Metadata.