Biomedical Term Extraction: NLP Techniques in Computational Medicine
Artificial Intelligence (AI), and its branch Natural Language Processing (NLP) in particular, is a main contributor to recent advances in classifying documents and extracting information across many fields. Medicine has attracted particular attention because of the volume of information generated in professional journals and other channels of communication within the medical profession. The typical information extraction task over technical texts is performed with an automatic term recognition extractor. Automatic Term Recognition (ATR) from technical texts is applied to identify key concepts for information retrieval and, secondarily, for machine translation. Term recognition depends on the subject domain and on the lexical patterns of a given language; in our case, Spanish, Arabic and Japanese. In this article, we present the methods and techniques for creating a biomedical corpus of validated terms, together with several tools for optimal exploitation of the information contained in that corpus. The paper also shows how these techniques and tools have been used in a prototype.
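The abstract does not specify the extraction method, but a minimal frequency-based sketch of candidate term recognition (stopword-filtered word bigrams, an assumption for illustration; real ATR systems typically add POS patterns and statistical termhood measures) looks like this:

```python
from collections import Counter
import re

# Illustrative stopword list; a real system would use a full domain-tuned list.
STOPWORDS = {"the", "of", "and", "in", "for", "to", "a", "is", "are", "with", "use"}

def candidate_terms(text, n=2):
    """Collect word bigrams whose tokens are not stopwords,
    counted as candidate multi-word terms."""
    tokens = re.findall(r"[a-z]+", text.lower())
    grams = zip(tokens, tokens[1:])
    return Counter(
        " ".join(g) for g in grams
        if not any(w in STOPWORDS for w in g)
    )

doc = ("term recognition extracts key concepts; "
       "term recognition supports information retrieval; "
       "information retrieval and machine translation use key concepts")
print(candidate_terms(doc).most_common(3))
```

Repeated collocations such as "term recognition" and "information retrieval" surface at the top of the count; a termhood score (e.g. C-value) would then rank and validate them against the corpus.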
Computer tool for use by children with learning difficulties in spelling
The development of a computer tool to be used by children with learning
difficulties in spelling is described in this thesis.
Children with spelling disabilities were observed by the author, and their errors
were recorded. Based on analysis of these errors, a scheme of error
classification was devised. It was hypothesized that there were regularities in
the errors; that the classification scheme describing these errors could provide
adequate information to enable a computer program to 'debug' the children's
errors and to reconstruct the intended words; and that the children would be
able to recognize correct spellings even if they could not produce them.
Two computer programs, the EDITCOST and the PHONCODE programs, were
developed. These incorporated information about the types of errors that were
made by the children, described in terms of the classification scheme. They
were used both to test the hypotheses and as potential components of a larger
program to be used as a compensatory tool.
The main conclusions drawn from this research are:
The errors made by children with learning difficulties in spelling show
regularities both in phoneme-grapheme correspondences and at the level of
orthography.
The classification scheme developed, based on the children's errors, provides a
description of these errors. It provides adequate information to enable a
computer program to 'debug' the children's errors and to reconstruct the
intended words.
Computer tools in the form of interactive spelling correctors are able to offer a
correction for a substantial proportion of the child's errors, and could be
extended to provide more information about the children's errors. They are also
suitable for use with other groups of children.
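The thesis does not disclose the internals of EDITCOST and PHONCODE, but the two-program idea (edit-distance matching plus phonetic coding) can be sketched as follows; the lexicon, phonetic key and tie-breaking rule here are illustrative assumptions, not the author's actual scheme:

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (the EDITCOST-style signal)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def phon_key(word):
    """Crude phonetic key (the PHONCODE-style signal):
    keep the first letter, drop later vowels, collapse doubled letters."""
    key = word[0]
    for ch in word[1:]:
        if ch in "aeiou" or ch == key[-1]:
            continue
        key += ch
    return key

# Toy lexicon for illustration only.
LEXICON = ["because", "friend", "school", "said", "their"]

def correct(misspelling, lexicon=LEXICON):
    """Prefer lexicon words sharing the phonetic key; break ties by edit distance."""
    return min(lexicon,
               key=lambda w: (phon_key(w) != phon_key(misspelling),
                              levenshtein(w, misspelling)))

print(correct("becos"))   # phonetically regular misspelling of "because"
```

The point mirrors the thesis's hypothesis: many of the children's errors preserve the word's phonetic shape, so a phonetic code recovers the intended word even when the letter sequence is quite different.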
A Comparison of Different Machine Transliteration Models
Machine transliteration is a method for automatically converting words in one
language into phonetically equivalent ones in another language. Machine
transliteration plays an important role in natural language applications such
as information retrieval and machine translation, especially for handling
proper nouns and technical terms. Four machine transliteration models --
grapheme-based transliteration model, phoneme-based transliteration model,
hybrid transliteration model, and correspondence-based transliteration model --
have been proposed by several researchers. To date, however, there has been
little research on a framework in which multiple transliteration models can
operate simultaneously. Furthermore, there has been no comparison of the four
models within the same framework and using the same data. We addressed these
problems by 1) modeling the four models within the same framework, 2) comparing
them under the same conditions, and 3) developing a way to improve machine
transliteration through this comparison. Our comparison showed that the hybrid
and correspondence-based models were the most effective and that the four
models can be used in a complementary manner to improve machine transliteration
performance.
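Of the four models compared, the grapheme-based one is the simplest to illustrate: source graphemes map directly to target graphemes with no intermediate phoneme step. The rule table below is a toy English-to-katakana-style romanisation invented for illustration; the paper's models are learned statistically, not hand-written:

```python
# Toy grapheme-to-grapheme rules (illustrative, not from the paper).
RULES = {"sh": "shi", "ch": "chi", "ck": "kku", "a": "a", "b": "bu",
         "c": "ku", "d": "do", "e": "e", "i": "i", "k": "ku",
         "l": "ru", "m": "mu", "n": "n", "o": "o", "p": "pu",
         "r": "ru", "s": "su", "t": "to", "u": "u"}

def transliterate(word, rules=RULES):
    """Greedy longest-match segmentation of the source word into
    graphemes, each rewritten by its target-language rule."""
    out, i = [], 0
    max_len = max(map(len, rules))
    while i < len(word):
        for n in range(max_len, 0, -1):
            chunk = word[i:i + n]
            if chunk in rules:
                out.append(rules[chunk])
                i += n
                break
        else:
            i += 1  # skip characters with no rule
    return "".join(out)

print(transliterate("ship"))
```

A phoneme-based model would first convert "ship" to its pronunciation and map phonemes instead; the hybrid and correspondence-based models, which the paper found most effective, combine both sources of evidence.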
Learner Modelling for Individualised Reading in a Second Language
Extensive reading is an effective language learning technique that involves fast reading of large quantities of easy and interesting second language (L2) text. However, graded readers used by beginner learners are expensive and often dull. The alternative is text written for native speakers (authentic text), which is generally too difficult for beginners. The aim of this research is to overcome this problem by developing a computer-assisted approach that enables learners of all abilities to perform effective extensive reading using freely-available text on the web.
This thesis describes the research, development and evaluation of a complex software system called FERN that combines learner modelling and iCALL with narrow reading of electronic text. The system incorporates four key components: (1) automatic glossing of difficult words in texts, (2) an individualised search engine for locating interesting texts of appropriate difficulty, (3) supplementary exercises for introducing key vocabulary and reviewing difficult words, and (4) reliable monitoring of reading and reporting of progress. FERN was optimised for English speakers learning Spanish, but is easily adapted for learners of other languages.
The suitability of the FERN system was evaluated through corpus analysis, machine translation analysis and a year-long study with a second-year university Spanish class. The machine translation analysis combined with the classroom study demonstrated that the word and phrase error rate generated in FERN is low enough to validate the use of machine translation to automatically generate glosses, but high enough that a translation dictionary is required as a backup. The classroom study demonstrated that, when aided by glosses, students can read at over 100 words per minute if they know 95% of the words, compared with the 98% word knowledge required for effective unaided extensive reading. A corpus analysis demonstrated that beginner learners of Spanish can do effective narrow reading of news articles using FERN after learning only 200–300 high-frequency word families, in addition to familiarity with English-Spanish cognates and proper nouns.
FERN also reliably monitors reading speeds and word counts, and provides motivating progress reports, which enable teachers to set concrete reading goals that dramatically increase the quantity that students read, as demonstrated in the user study.
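The 95%/98% thresholds above are coverage figures: the fraction of running words in a text that the learner already knows. A minimal sketch of that calculation (the tokeniser and sample vocabulary are assumptions for illustration, not FERN's actual learner model):

```python
import re

def coverage(text, known_words):
    """Fraction of running words in `text` that appear in the
    learner's known-word set."""
    tokens = re.findall(r"[a-záéíóúñü]+", text.lower())
    if not tokens:
        return 0.0
    known = sum(t in known_words for t in tokens)
    return known / len(tokens)

# Hypothetical beginner vocabulary and sample sentence.
known = {"el", "la", "de", "en", "y", "casa", "perro"}
text = "El perro está en la casa de la familia"

cov = coverage(text, known)
print(f"{cov:.0%}")  # below the ~95% aided-reading threshold, so gloss heavily
```

A system like FERN would run this kind of check against each candidate web text and surface only those whose coverage, given the learner's current vocabulary model, clears the aided-reading threshold.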
English/Veneto Resource Poor Machine Translation with STILVEN
The paper reports ongoing work on the implementation of a system for
automatic translation from English to Veneto and vice versa. The system
cannot rely on parallel texts, since such manual translations are almost
nonexistent. The project is called STILVEN and is financed by the Regional
Authorities of the Veneto Region in Italy. After the first year of
activities, we produced a prototype which handles Venetian questions whose
structure is very close to English. We present problems related to Veneto,
our basic ideas, their implementation, and the results obtained.