371 research outputs found

    An Attention-Based Model for Predicting Contextual Informativeness and Curriculum Learning Applications

    Full text link
    Both humans and machines learn the meaning of unknown words through contextual information in a sentence, but not all contexts are equally helpful for learning. We introduce an effective method for capturing the level of contextual informativeness with respect to a given target word. Our study makes three main contributions. First, we develop models for estimating contextual informativeness, focusing on the instructional aspect of sentences. Our attention-based approach using pre-trained embeddings demonstrates state-of-the-art performance on our single-context dataset and an existing multi-sentence context dataset. Second, we show how our model identifies key contextual elements in a sentence that are likely to contribute most to a reader's understanding of the target word. Third, we examine how our contextual informativeness model, originally developed for vocabulary learning applications for students, can be used for developing better training curricula for word embedding models in batch learning and few-shot machine learning settings. We believe our results open new possibilities for applications that support language learning for both human and machine learners.
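    The abstract describes scoring a context's informativeness for a target word with attention over pre-trained embeddings. As a minimal, hypothetical sketch (not the authors' model), one way to operationalize this is to attend from the target word's vector to each context word's vector and treat a concentrated attention distribution as a sign of an informative context:

    ```python
    import numpy as np

    def attention_informativeness(target_vec, context_vecs):
        # Scaled dot-product attention from the target word to each
        # context word (toy stand-in for a learned attention layer).
        d = target_vec.shape[0]
        scores = context_vecs @ target_vec / np.sqrt(d)
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        # Heuristic: low attention entropy means a few context words
        # relate strongly to the target, suggesting a more
        # instructive sentence; normalize to a [0, 1] score.
        entropy = -np.sum(weights * np.log(weights + 1e-12))
        max_entropy = np.log(len(weights))
        return 1.0 - entropy / max_entropy, weights

    rng = np.random.default_rng(0)
    target = rng.normal(size=16)          # hypothetical target embedding
    context = rng.normal(size=(5, 16))    # hypothetical context embeddings
    score, weights = attention_informativeness(target, context)
    ```

    The per-word attention weights also give the "key contextual elements" the abstract mentions: the highest-weighted context words are the ones the score attributes informativeness to.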

    The Benefits of Word Embeddings Features for Active Learning in Clinical Information Extraction

    Get PDF
    This study investigates the use of unsupervised word embeddings and sequence features for sample representation in an active learning framework built to extract clinical concepts from clinical free text. The objective is to further reduce the manual annotation effort while achieving higher effectiveness compared to a set of baseline features. Unsupervised features are derived from skip-gram word embeddings and a sequence representation approach. The comparative performance of unsupervised features and baseline hand-crafted features in an active learning framework is investigated using a wide range of selection criteria, including least confidence, information diversity, information density and diversity, and domain knowledge informativeness. Two clinical datasets are used for evaluation: the i2b2/VA 2010 NLP challenge and the ShARe/CLEF 2013 eHealth Evaluation Lab. Our results demonstrate significant improvements in terms of effectiveness as well as annotation effort savings across both datasets. Using unsupervised features along with baseline features for sample representation leads to further savings of up to 9% and 10% of the token and concept annotation rates, respectively.
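    Among the selection criteria the abstract lists, least confidence is the simplest to illustrate. A minimal sketch, assuming access to per-sample class probabilities from the current model (the array values below are made up for illustration):

    ```python
    import numpy as np

    def least_confidence_query(probs, k):
        """Pick the k unlabelled samples the model is least confident about.

        probs: (n_samples, n_classes) predicted class probabilities.
        Least confidence = 1 - max class probability; a higher value
        means the model is less sure, so annotating that sample is
        expected to be more valuable.
        """
        confidence = probs.max(axis=1)
        uncertainty = 1.0 - confidence
        # Indices of the k most uncertain samples, most uncertain first.
        return np.argsort(uncertainty)[::-1][:k]

    probs = np.array([
        [0.98, 0.01, 0.01],   # very confident prediction
        [0.40, 0.35, 0.25],   # near-uniform: highly uncertain
        [0.70, 0.20, 0.10],   # moderately confident
    ])
    query = least_confidence_query(probs, 2)   # -> indices [1, 2]
    ```

    The other criteria (information density, diversity) extend this by weighting uncertainty with similarity to the rest of the unlabelled pool, so that queried samples are both uncertain and representative.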

    Neural models of language use: Studies of language comprehension and production in context

    Get PDF
    Artificial neural network models of language are mostly known and appreciated today for providing a backbone for formidable AI technologies. This thesis takes a different perspective. Through a series of studies on language comprehension and production, it investigates whether artificial neural networks—beyond being useful in countless AI applications—can serve as accurate computational simulations of human language use, and thus as a new core methodology for the language sciences

    A framework for developing requirements engineering tools for computational business intelligence

    Get PDF
    Doctoral thesis (Doctor of Engineering), Ritsumeikan University


    Uncertainty Measures and Transfer Learning in Active Learning for Text Classification

    Get PDF
    Deep learning has become a prominent and popular tool in a wide range of applications concerned with processing complex data. However, in order to train a sufficient model for supervised tasks, deep learning relies on vast amounts of labelled data. Even when data itself is easily attainable, acquiring labels can be tedious, expensive, and in need of an expert annotator. Active learning (AL) aims to lower the data requirement in deep learning, and machine learning in general, and consequently reduce labelling cost. By letting the learner actively choose the data it wants to learn from, active learning aspires to label only the most valuable data, and to train a classifier with only a small labelled training set. The idea is that the model is able to single out examples of high informativeness from a pool of unlabelled data, i.e. instances from which the model will gain the most information, which is often linked to model uncertainty. Through this thesis, several aspects of pool-based active learning in text classification are explored, by combining ideas that have shown good results individually. To ensure diverse actively queried samples, both adding randomness to the active selection and clustering of the unlabelled pool have been investigated. Further, seeing that deep models rarely represent model uncertainty, a Bayesian approximation is computed by sampling sub-models through dropout at test time and averaging over their predictions. Lastly, active learning is studied in a transfer learning setting, combined with the previously explored ideas.
    The experiments clearly show how active learning depends on the data and model, as the two models and datasets showed quite dissimilar results. The models in question are a simple CNN for sentence classification and an AWD LSTM with pre-training, both tested on the binary sentiment analysis IMDB movie review dataset and the multi-class AG news corpus. While no AL strategy had an effect on AG, with or without the proposed additions, all variations improved results on IMDB with the CNN. Although clustering appeared to be the preferred choice for the CNN, it had a negative effect when combined with transfer learning and the AWD LSTM. The combination of clustering and Bayesian approximations added nothing beyond raised computational cost, even though each individually improved validation accuracy and loss with the CNN. All in all, no method was markedly better than random sampling; however, many results introduced interesting ideas for further work.
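    The Bayesian approximation described above (dropout kept on at test time, predictions averaged over stochastic passes, i.e. MC dropout) can be sketched with a toy softmax classifier; the dropout layer, weights, and pass count here are all illustrative assumptions, not the thesis's actual models:

    ```python
    import numpy as np

    rng = np.random.default_rng(42)

    def predict_with_dropout(x, W, drop_p, rng):
        # Apply dropout to the input features at *test* time, then a
        # softmax layer; each stochastic pass samples one sub-model.
        mask = rng.random(x.shape) >= drop_p
        h = (x * mask) / (1.0 - drop_p)   # inverted-dropout rescaling
        logits = h @ W
        e = np.exp(logits - logits.max())
        return e / e.sum()

    def mc_dropout_uncertainty(x, W, n_passes=100, drop_p=0.5):
        # Average predictions over many dropout passes; the spread
        # across passes approximates the model's uncertainty.
        preds = np.stack([predict_with_dropout(x, W, drop_p, rng)
                          for _ in range(n_passes)])
        return preds.mean(axis=0), preds.std(axis=0)

    x = rng.normal(size=8)         # one unlabelled sample's features
    W = rng.normal(size=(8, 2))    # toy binary-classifier weights
    mean, std = mc_dropout_uncertainty(x, W)
    ```

    In the pool-based setting, samples whose passes disagree most (large `std`, or high entropy of `mean`) are the ones the active learner queries for annotation.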

    Understanding and Supporting Vocabulary Learners via Machine Learning on Behavioral and Linguistic Data

    Full text link
    This dissertation presents various machine learning applications for predicting different cognitive states of students while they are using a vocabulary tutoring system, DSCoVAR. We conduct four studies, each of which includes a comprehensive analysis of behavioral and linguistic data and provides data-driven evidence for designing personalized features for the system. The first study presents how behavioral and linguistic interactions from the vocabulary tutoring system can be used to predict students' off-task states. The study identifies which predictive features from interaction signals are more important and examines different types of off-task behaviors. The second study investigates how to automatically evaluate students' partial word knowledge from open-ended responses to definition questions. We present a technique that augments modern word-embedding techniques with a classic semantic differential scaling method from cognitive psychology. We then use this interpretable semantic scale method for predicting students' short- and long-term learning. The third and fourth studies show how to develop a model that can generate more efficient training curricula for both human and machine vocabulary learners. The third study illustrates a deep-learning model that scores sentences for a contextual vocabulary learning curriculum. We use pre-trained language models, such as ELMo or BERT, with an additional attention layer to capture how context words are more or less important with respect to the meaning of the target word. The fourth study examines how the contextual informativeness model, originally designed to develop curricula for human vocabulary learning, can also be used for developing curricula for various word embedding models. We find that sentences predicted to be less informative for human learners are also less helpful for machine learning algorithms.
    Having a rich understanding of user behaviors, responses, and learning stimuli is imperative for developing an intelligent online system. Our studies demonstrate interpretable methods with cross-disciplinary approaches to understand the various cognitive states of students during learning. The analysis results provide data-driven evidence for designing personalized features that can maximize learning outcomes. The datasets we collected from the studies will be shared publicly to promote future studies related to online tutoring systems, and these findings can also be applied to represent different user states observed in other online systems. In the future, we believe our findings can help to implement a more personalized vocabulary learning system, to develop a system that uses non-English texts or different types of inputs, and to investigate how the machine learning outputs interact with students.
    PhD dissertation, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/162999/1/sjnam_1.pd
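    The "semantic differential scaling with word embeddings" idea from the second study can be illustrated geometrically: define a scale axis between two pole words (e.g. an antonym pair) and project a response word's embedding onto it. The 2-d vectors and pole words below are hypothetical toys, not the dissertation's actual embeddings or scales:

    ```python
    import numpy as np

    def semantic_scale(word_vec, pole_a, pole_b):
        """Place a word on a semantic differential scale.

        The axis runs from pole_a (e.g. "bad") to pole_b (e.g. "good");
        the score is the projection of the word vector, measured from
        the axis midpoint, onto the unit axis direction. Negative
        scores fall toward pole_a, positive toward pole_b.
        """
        axis = pole_b - pole_a
        axis /= np.linalg.norm(axis)
        midpoint = (pole_a + pole_b) / 2.0
        return float((word_vec - midpoint) @ axis)

    # Toy 2-d "embeddings" chosen to make the geometry visible.
    bad = np.array([0.0, 0.0])
    good = np.array([1.0, 0.0])
    great = np.array([1.2, 0.3])
    score = semantic_scale(great, bad, good)   # -> 0.7, past "good"
    ```

    With real embeddings, such a score gives an interpretable one-dimensional reading of how close a student's open-ended response sits to the target word's meaning along a chosen semantic dimension.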