Challenges and issues in terminology mapping : a digital library perspective
In light of information retrieval problems caused by the use of different subject schemes, this paper provides an overview of the terminology problem within the digital library field. Various proposed solutions are outlined, and issues within one approach, terminology mapping, are highlighted. Desk-based review of existing research. Findings - Discusses benefits of the mapping approach, which include improved retrieval effectiveness for users and an opportunity to overcome problems associated with the use of multilingual schemes. Also describes various drawbacks such as the labour-intensive nature and expense of such an approach, the different levels of granularity in existing schemes, the high maintenance requirements due to scheme updates, and not least the nature of user terminology. General review of mapping techniques as a potential solution to the terminology problem.
Joint morphological-lexical language modeling for processing morphologically rich languages with application to dialectal Arabic
Language modeling for an inflected language
such as Arabic poses new challenges for speech recognition and
machine translation due to its rich morphology. Rich morphology
results in large increases in out-of-vocabulary (OOV) rate and
poor language model parameter estimation in the absence of large
quantities of data. In this study, we present a joint
morphological-lexical language model (JMLLM) that takes
advantage of Arabic morphology. JMLLM combines
morphological segments with the underlying lexical items and
additional available information sources with regard to
morphological segments and lexical items in a single joint model.
Joint representation and modeling of morphological and lexical
items reduces the OOV rate and provides smooth probability
estimates while keeping the predictive power of whole words.
Speech recognition and machine translation experiments in
dialectal Arabic show improvements over word- and
morpheme-based trigram language models. We also show that as the
tightness of integration between different information sources
increases, both speech recognition and machine translation
performance improve.
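The abstract's claim that modeling morphological segments lowers the OOV rate can be illustrated with a small sketch. Everything here is an assumption made for illustration: the hand-segmented, English-like toy tokens (with `+` marking segment boundaries) stand in for the dialectal Arabic data, and the helper names `oov_rate`, `segments`, and `surface` are hypothetical, not taken from the paper.

```python
def oov_rate(train_units, test_units):
    """Fraction of test units that never occur in the training units."""
    vocab = set(train_units)
    unseen = sum(1 for unit in test_units if unit not in vocab)
    return unseen / len(test_units)


def segments(words):
    """Split each hand-segmented word into its morphological segments."""
    return [seg for word in words for seg in word.split("+")]


def surface(words):
    """Drop the boundary markers to recover whole-word forms."""
    return [word.replace("+", "") for word in words]


# Toy, hand-segmented data (an assumption, not the paper's corpus).
TRAIN = ["re+play", "un+lock", "play+ed", "lock+ed", "un+do"]
TEST = ["re+lock", "un+play+ed", "re+do", "play+ed"]

word_oov = oov_rate(surface(TRAIN), surface(TEST))
seg_oov = oov_rate(segments(TRAIN), segments(TEST))
print(f"word-level OOV rate:    {word_oov:.2f}")
print(f"segment-level OOV rate: {seg_oov:.2f}")
```

On this toy data, three of the four test words are unseen as whole words, yet every test segment occurs in training. This is the effect the joint model is described as exploiting while still keeping whole-word information.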
Word service for grades one through six
Thesis (Ed.M.)--Boston University
Masked Language Model Scoring
Pretrained masked language models (MLMs) require finetuning for most NLP
tasks. Instead, we evaluate MLMs out of the box via their pseudo-log-likelihood
scores (PLLs), which are computed by masking tokens one by one. We show that
PLLs outperform scores from autoregressive language models like GPT-2 in a
variety of tasks. By rescoring ASR and NMT hypotheses, RoBERTa reduces an
end-to-end LibriSpeech model's WER by 30% relative and adds up to +1.7 BLEU on
state-of-the-art baselines for low-resource translation pairs, with further
gains from domain adaptation. We attribute this success to PLL's unsupervised
expression of linguistic acceptability without a left-to-right bias, greatly
improving on scores from GPT-2 (+10 points on island effects, NPI licensing in
BLiMP). One can finetune MLMs to give scores without masking, enabling
computation in a single inference pass. In all, PLLs and their associated
pseudo-perplexities (PPPLs) enable plug-and-play use of the growing number of
pretrained MLMs; e.g., we use a single cross-lingual model to rescore
translations in multiple languages. We release our library for language model
scoring at https://github.com/awslabs/mlm-scoring.
Comment: ACL 2020 camera-ready (presented July 2020)
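The pseudo-log-likelihood described in this abstract, in which each token is masked in turn and the log-probabilities of the true tokens are summed, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation and not the API of the mlm-scoring library: a smoothed count model over a three-sentence toy corpus stands in for a pretrained MLM, it conditions only on the two neighbouring tokens rather than the full context, and the names `masked_logprob`, `pseudo_log_likelihood`, and `pseudo_perplexity` are hypothetical.

```python
import math
from collections import Counter

# Toy corpus standing in for a pretrained MLM's knowledge (an assumption).
CORPUS = [
    "the cat sat on the mat".split(),
    "the dog sat on the rug".split(),
    "a cat lay on the mat".split(),
]
VOCAB = sorted({word for sent in CORPUS for word in sent})

# Count each word together with its left and right neighbours.
TRIPLES = Counter()
CONTEXTS = Counter()
for sent in CORPUS:
    padded = ["<s>"] + sent + ["</s>"]
    for i in range(1, len(padded) - 1):
        TRIPLES[(padded[i - 1], padded[i], padded[i + 1])] += 1
        CONTEXTS[(padded[i - 1], padded[i + 1])] += 1


def masked_logprob(tokens, i):
    """Hypothetical stand-in for an MLM: log P(tokens[i] | neighbours),
    estimated from counts with add-one smoothing."""
    padded = ["<s>"] + list(tokens) + ["</s>"]
    left, word, right = padded[i], padded[i + 1], padded[i + 2]
    numerator = TRIPLES[(left, word, right)] + 1
    denominator = CONTEXTS[(left, right)] + len(VOCAB)
    return math.log(numerator / denominator)


def pseudo_log_likelihood(tokens):
    """Mask each position in turn and sum the true tokens' log-probabilities."""
    return sum(masked_logprob(tokens, i) for i in range(len(tokens)))


def pseudo_perplexity(sentences):
    """Exponentiated negative PLL per token over a set of sentences."""
    total = sum(pseudo_log_likelihood(sent) for sent in sentences)
    n_tokens = sum(len(sent) for sent in sentences)
    return math.exp(-total / n_tokens)


fluent = "the cat sat on the mat".split()
scrambled = "mat the on sat cat the".split()
print(pseudo_log_likelihood(fluent), pseudo_log_likelihood(scrambled))
```

With a real MLM, `masked_logprob` would replace position `i` with the mask token and read the model's log-probability for the original token. Here the fluent word order scores a higher PLL than the scrambled one, which mirrors how the scores are used to rescore competing hypotheses.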
Linguistically-driven framework for computationally efficient and scalable sign recognition
We introduce a new general framework for sign recognition from monocular video using limited quantities of annotated data. The novelty of the hybrid framework we describe here is that we exploit state-of-the-art learning methods while also incorporating features based on what we know about the linguistic composition of lexical signs. In particular, we analyze hand shape, orientation, location, and motion trajectories, and then use CRFs to combine this linguistically significant information for purposes of sign recognition. Our robust modeling and recognition of these sub-components of sign production allow an efficient parameterization of the sign recognition problem as compared with purely data-driven methods. This parameterization enables a scalable and extendable time-series learning approach that advances the state of the art in sign recognition, as shown by the results reported here for recognition of isolated, citation-form, lexical signs from American Sign Language (ASL).
Models for Learning (Mod4L) Final Report: Representing Learning Designs
The Mod4L Models of Practice project is part of the JISC-funded Design for Learning Programme. It ran from 1 May to 31 December 2006. The philosophy underlying the project was that a general split is evident in the e-learning community between development of e-learning tools, services and standards, and research into how teachers can use these most effectively, and that this split is impeding uptake of new tools and methods by teachers. To help overcome this barrier and bridge the gap, a need is felt for practitioner-focused resources which describe a range of learning designs and offer guidance on how these may be chosen and applied, how they can support effective practice in design for learning, and how they can support the development of effective tools, standards and systems with a learning design capability (see, for example, Griffiths and Blat 2005, JISC 2006). Practice models, it was suggested, were such a resource.
The aim of the project was to: develop a range of practice models that could be used by practitioners in real life contexts and have a high impact on improving teaching and learning practice.
We worked with two definitions of practice models. Practice models are:
1. Generic approaches to the structuring and orchestration of learning activities. They express elements of pedagogic principle and allow practitioners to make informed choices (JISC 2006).
However effective a learning design may be, it can only be shared with others through a representation. The issue of representation of learning designs is, then, central to the concept of sharing and reuse at the heart of JISC’s Design for Learning programme. Thus practice models should be both representations of effective practice, and effective representations of practice. Hence we arrived at the project working definition of practice models as:
2. Common, but decontextualised, learning designs that are represented in a way that is usable by practitioners (teachers, managers, etc.) (Mod4L working definition, Falconer & Littlejohn 2006).
A learning design is defined as the outcome of the process of designing, planning and orchestrating learning activities as part of a learning session or programme (JISC 2006).
Practice models have many potential uses: they describe a range of learning designs that are found to be effective, and offer guidance on their use; they support sharing, reuse and adaptation of learning designs by teachers, and also the development of tools, standards and systems for planning, editing and running the designs.
The project took a practitioner-centred approach, working in close collaboration with a focus group of 12 teachers recruited across a range of disciplines and from both FE and HE. Focus group members are listed in Appendix 1. Information was gathered from the focus group through two face-to-face workshops, and through their contributions to discussions on the project wiki. This was supplemented by an activity at a JISC pedagogy experts meeting in October 2006, and a part workshop at ALT-C in September 2006. The project interim report of August 2006 contained the outcomes of the first workshop (Falconer and Littlejohn, 2006).
The current report refines the discussion of issues of representing learning designs for sharing and reuse evidenced in the interim report and highlights problems with the concept of practice models (section 2), characterises the requirements teachers have of effective representations (section 3), evaluates a number of types of representation against these requirements (section 4), explores the more technically focused role of sequencing representations and controlled vocabularies (sections 5 & 6), documents some generic learning designs (section 8.2) and suggests ways forward for bridging the gap between teachers and developers (section 2.6).
All quotations are taken from the Mod4L wiki unless otherwise stated.