Search CORE

13,373 research outputs found

Challenges and issues in terminology mapping : a digital library perspective

Author: McCulloch Emma
Nicholson Dennis
Shiri Ali
Publication venue: 'Emerald'
Publication date: 01/01/2004
Field of study

In light of information retrieval problems caused by the use of different subject schemes, this paper provides an overview of the terminology problem within the digital library field. Various proposed solutions are outlined and issues within one approach - terminology mapping are highlighted.Desk-based review of existing research. Findings - Discusses benefits of the mapping approach, which include improved retrieval effectiveness for users and an opportunity to overcome problems associated with the use of multilingual schemes. Also describes various drawbacks such as the labour intensive nature and expense of such an approach, the different levels of granularity in existing schemes, and the high maintenance requirements due to scheme updates, and not least the nature of user terminology. General review of mapping techniques as a potential solution to the terminology problem

E-LIS

Crossref

University of Strathclyde Institutional Repository

Joint morphological-lexical language modeling for processing morphologically rich languages with application to dialectal Arabic

Author: Afify Mohamed
Deng Yonggang
Erdogan Hakan
Erdoğan Hakan
Gao Yuqing
Sarıkaya Ruhi
Sarikaya Ruhi
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2007
Field of study

Language modeling for an inflected language such as Arabic poses new challenges for speech recognition and machine translation due to its rich morphology. Rich morphology results in large increases in out-of-vocabulary (OOV) rate and poor language model parameter estimation in the absence of large quantities of data. In this study, we present a joint morphological-lexical language model (JMLLM) that takes advantage of Arabic morphology. JMLLM combines morphological segments with the underlying lexical items and additional available information sources with regards to morphological segments and lexical items in a single joint model. Joint representation and modeling of morphological and lexical items reduces the OOV rate and provides smooth probability estimates while keeping the predictive power of whole words. Speech recognition and machine translation experiments in dialectal-Arabic show improvements over word and morpheme based trigram language models. We also show that as the tightness of integration between different information sources increases, both speech recognition and machine translation performances improve

CiteSeerX

Crossref

Sabanci University Research Database

Word service for grades one through six

Author: Catterson Jane H.
Hogan Florence P.
Hubbard Frederick G.
Journay Claire L.
Klein Terry F.
Knapp Elaine G.
Noall Mabel S.
Seiders Nancy D.
Smith Muriel V.
Publication venue: Boston University
Publication date: 01/01/1957
Field of study

Thesis (Ed.M.)--Boston Universit

Boston University Institutional Repository (OpenBU)

Masked Language Model Scoring

Author: Kirchhoff Katrin
Liang Davis
Nguyen Toan Q.
Salazar Julian
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2020
Field of study

Pretrained masked language models (MLMs) require finetuning for most NLP tasks. Instead, we evaluate MLMs out of the box via their pseudo-log-likelihood scores (PLLs), which are computed by masking tokens one by one. We show that PLLs outperform scores from autoregressive language models like GPT-2 in a variety of tasks. By rescoring ASR and NMT hypotheses, RoBERTa reduces an end-to-end LibriSpeech model's WER by 30% relative and adds up to +1.7 BLEU on state-of-the-art baselines for low-resource translation pairs, with further gains from domain adaptation. We attribute this success to PLL's unsupervised expression of linguistic acceptability without a left-to-right bias, greatly improving on scores from GPT-2 (+10 points on island effects, NPI licensing in BLiMP). One can finetune MLMs to give scores without masking, enabling computation in a single inference pass. In all, PLLs and their associated pseudo-perplexities (PPPLs) enable plug-and-play use of the growing number of pretrained MLMs; e.g., we use a single cross-lingual model to rescore translations in multiple languages. We release our library for language model scoring at https://github.com/awslabs/mlm-scoring.Comment: ACL 2020 camera-ready (presented July 2020

arXiv.org e-Print Archive

Crossref

Linguistically-driven framework for computationally efficient and scalable sign recognition

Author: Dilsizian Mark
Metaxas Dimitris N.
Neidle Carol
Publication venue: European Language Resources Association (ELRA)
Publication date: 01/01/2018
Field of study

We introduce a new general framework for sign recognition from monocular video using limited quantities of annotated data. The novelty of the hybrid framework we describe here is that we exploit state-of-the art learning methods while also incorporating features based on what we know about the linguistic composition of lexical signs. In particular, we analyze hand shape, orientation, location, and motion trajectories, and then use CRFs to combine this linguistically significant information for purposes of sign recognition. Our robust modeling and recognition of these sub-components of sign production allow an efficient parameterization of the sign recognition problem as compared with purely data-driven methods. This parameterization enables a scalable and extendable time-series learning approach that advances the state of the art in sign recognition, as shown by the results reported here for recognition of isolated, citation-form, lexical signs from American Sign Language (ASL)

Boston University Institutional Repository (OpenBU)

Recommended from our members

Models for Learning (Mod4L) Final Report: Representing Learning Designs

Author: Beetham Helen
Falconer Isobel
Littlejohn Allison
Lockyer Lori
Oliver Ron
Publication venue: Jisc
Publication date: 01/03/2007
Field of study

The Mod4L Models of Practice project is part of the JISC-funded Design for Learning Programme. It ran from 1 May – 31 December 2006. The philosophy underlying the project was that a general split is evident in the e-learning community between development of e-learning tools, services and standards, and research into how teachers can use these most effectively, and is impeding uptake of new tools and methods by teachers. To help overcome this barrier and bridge the gap, a need is felt for practitioner-focused resources which describe a range of learning designs and offer guidance on how these may be chosen and applied, how they can support effective practice in design for learning, and how they can support the development of effective tools, standards and systems with a learning design capability (see, for example, Griffiths and Blat 2005, JISC 2006). Practice models, it was suggested, were such a resource. The aim of the project was to: develop a range of practice models that could be used by practitioners in real life contexts and have a high impact on improving teaching and learning practice. We worked with two definitions of practice models. Practice models are: 1. generic approaches to the structuring and orchestration of learning activities. They express elements of pedagogic principle and allow practitioners to make informed choices (JISC 2006) However, however effective a learning design may be, it can only be shared with others through a representation. The issue of representation of learning designs is, then, central to the concept of sharing and reuse at the heart of JISC’s Design for Learning programme. Thus practice models should be both representations of effective practice, and effective representations of practice. Hence we arrived at the project working definition of practice models as: 2. Common, but decontextualised, learning designs that are represented in a way that is usable by practitioners (teachers, managers, etc).(Mod4L working definition, Falconer & Littlejohn 2006). A learning design is defined as the outcome of the process of designing, planning and orchestrating learning activities as part of a learning session or programme (JISC 2006). Practice models have many potential uses: they describe a range of learning designs that are found to be effective, and offer guidance on their use; they support sharing, reuse and adaptation of learning designs by teachers, and also the development of tools, standards and systems for planning, editing and running the designs. The project took a practitioner-centred approach, working in close collaboration with a focus group of 12 teachers recruited across a range of disciplines and from both FE and HE. Focus group members are listed in Appendix 1. Information was gathered from the focus group through two face to face workshops, and through their contributions to discussions on the project wiki. This was supplemented by an activity at a JISC pedagogy experts meeting in October 2006, and a part workshop at ALT-C in September 2006. The project interim report of August 2006 contained the outcomes of the first workshop (Falconer and Littlejohn, 2006). The current report refines the discussion of issues of representing learning designs for sharing and reuse evidenced in the interim report and highlights problems with the concept of practice models (section 2), characterises the requirements teachers have of effective representations (section 3), evaluates a number of types of representation against these requirements (section 4), explores the more technically focused role of sequencing representations and controlled vocabularies (sections 5 & 6), documents some generic learning designs (section 8.2) and suggests ways forward for bridging the gap between teachers and developers (section 2.6). All quotations are taken from the Mod4L wiki unless otherwise stated

Open Research Online (The Open University)