Automatic alignment of hieroglyphs and transliteration
Automatic alignment has important applications in philology, facilitating study of texts on the basis of electronic resources produced by different scholars. A simple technique is presented to realise such alignment for Ancient Egyptian hieroglyphic texts and transliteration. Preliminary experiments with the technique are reported, and plans for future work are discussed.
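The abstract does not spell out the alignment technique, but aligning a sign sequence with a transliteration token sequence can be sketched as classic dynamic-programming sequence alignment. Everything below is illustrative, not the paper's actual method: the Gardiner-style sign codes, the `similar` lookup table, and the scoring parameters are all assumptions.

```python
# Illustrative Needleman-Wunsch-style alignment between a sequence of
# hieroglyph sign codes and a sequence of transliteration tokens.
# The sign codes and the similarity table are hypothetical examples.

def similar(sign, token):
    """Toy similarity test: a hypothetical sign-to-reading lookup table."""
    TABLE = {"G17": "m", "X1": "t"}
    return TABLE.get(sign) == token

def align(signs, translit, match=1, gap=-1):
    """Return an optimal alignment as (sign, token) pairs, None marking gaps."""
    n, m = len(signs), len(translit)
    # score[i][j] = best score aligning signs[:i] with translit[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sim = match if similar(signs[i - 1], translit[j - 1]) else -match
            score[i][j] = max(score[i - 1][j - 1] + sim,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover the pairing.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
                match if similar(signs[i - 1], translit[j - 1]) else -match):
            pairs.append((signs[i - 1], translit[j - 1])); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            pairs.append((signs[i - 1], None)); i -= 1
        else:
            pairs.append((None, translit[j - 1])); j -= 1
    return list(reversed(pairs))

print(align(["G17", "X1"], ["m", "t"]))  # [('G17', 'm'), ('X1', 't')]
```

The quadratic-time table is standard for this family of problems; a real system would replace the toy `similar` function with a learned or dictionary-based sign-to-reading model.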
Dialogue as Data in Learning Analytics for Productive Educational Dialogue
This paper provides a novel, conceptually driven stance on the contemporary analytic challenges faced in the treatment of dialogue as a form of data across on- and offline sites of learning. In prior research, preliminary steps have been taken to detect occurrences of such dialogue using automated analysis techniques. Such advances have the potential to foster effective dialogue using learning analytic techniques that scaffold, give feedback on, and provide pedagogic contexts promoting such dialogue. However, the translation of much prior learning science research to online contexts is complex, requiring the operationalization of constructs theorized in different contexts (often face-to-face), and based on different datasets and structures (often spoken dialogue). In this paper, we explore what could constitute the effective analysis of productive online dialogues, arguing that it requires consideration of three key facets of the dialogue: features indicative of productive dialogue; the unit of segmentation; and the interplay of features and segmentation with the temporal underpinning of learning contexts. The paper thus foregrounds key considerations regarding the analysis of dialogue data in emerging learning analytics environments, both for learning-science and for computationally oriented researchers.
ON MONITORING LANGUAGE CHANGE WITH THE SUPPORT OF CORPUS PROCESSING
One of the fundamental characteristics of language is that it can change over time. One method of monitoring this change is to observe corpora: structured language documentation. Recent developments in technology, especially in the field of Natural Language Processing, allow robust linguistic processing, which supports the description of diverse historical changes in corpora. The intervention of a human linguist is inevitable, as the linguist determines the gold standard, but computer assistance provides considerable support by incorporating computational approaches in exploring corpora, especially historical corpora. This paper proposes a model for corpus development in which corpora are annotated to support further computational operations such as lexicogrammatical pattern matching, automatic retrieval and extraction. The corpus processing operations are performed by local-grammar-based corpus processing software on a contemporary Indonesian corpus. This paper concludes that data collection and data processing in a corpus are equally crucial to monitoring language change, and neither can be set aside.
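The lexicogrammatical pattern matching the abstract mentions can be sketched as a query over a part-of-speech-annotated corpus. This is a minimal illustration only: the `word/TAG` annotation format and the Indonesian example sentence are assumptions, not the paper's actual annotation scheme or software.

```python
# Minimal sketch of lexicogrammatical pattern matching over an annotated
# corpus. Tokens are annotated as word/POS; the tag set and the sample
# sentence ("I eat rice at home" in Indonesian) are illustrative assumptions.
import re

corpus = "saya/PRON makan/VERB nasi/NOUN di/ADP rumah/NOUN"

def find_pattern(tagged_text, pattern):
    """Return all matches of a POS-based regular-expression pattern."""
    return re.findall(pattern, tagged_text)

# Extract every VERB-NOUN sequence, a simple lexicogrammatical pattern.
matches = find_pattern(corpus, r"(\S+)/VERB\s+(\S+)/NOUN")
print(matches)  # [('makan', 'nasi')]
```

Running the same query over corpora from different periods would surface how such patterns shift over time, which is the monitoring use case the paper targets; real local-grammar software would use a richer query language than raw regular expressions.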
The Effects of Fluency-Building Strategies on the Oral Reading Rates of First-Grade Students
The purpose of this study was to determine the effects of explicit fluency-building strategies on the oral reading rates of first-grade students. According to the National Reading Panel (2000), there are five essential components of reading instruction: phonemic awareness, phonics, fluency, vocabulary and comprehension. All components are needed to achieve the complex skill of reading. Because of the reciprocal nature of these skills, a deficit in any reading component can cause difficulties in learning to read (O'Connor, 2007). Reading fluency is therefore critical to proficiency in reading. Specifically, this study investigated whether explicit instruction in fluency-building strategies significantly increased the oral reading rates of first-grade students. The experimental group participated in explicit instruction in fluency strategies for 15-30 minutes a day, five days a week, for sixteen weeks. This treatment occurred within the hours of the regular school day. The target population of this study comprised 56 first-grade students from three multicultural elementary schools in a suburban-rural school district. The measure of the dependent variable, oral reading rate, was the Dynamic Indicators of Basic Early Literacy Skills (DIBELS). The Oral Reading Fluency (ORF) measure was administered twice during the course of this study: pre- and post-treatment. The scores of the DIBELS ORF were analyzed to determine the effect of explicit fluency-building strategies on the reading rates of first-grade students.
The results of this study did not indicate a significant increase in the oral reading rates of the first-grade students who participated in explicit fluency-building instruction. Students in both the experimental and control groups experienced increases in their oral reading rates as measured by the Oral Reading Fluency measure of the DIBELS. The results of this study generated no empirical evidence to support the implementation of explicit research-based fluency strategies; therefore, the null hypothesis was retained. In summary, the purpose of this dissertation was to investigate how fluency-building strategies can be systematically implemented in reading instruction to increase the oral reading rates of first-grade students. Further, this study provided opportunities for students to practice and assimilate fluency strategies.
What Level of Quality can Neural Machine Translation Attain on Literary Text?
Given the rise of a new approach to MT, Neural MT (NMT), and its promising
performance on different text types, we assess the translation quality it can
attain on what is perceived to be the greatest challenge for MT: literary text.
Specifically, we target novels, arguably the most popular type of literary
text. We build a literary-adapted NMT system for the English-to-Catalan
translation direction and evaluate it against a system pertaining to the
previous dominant paradigm in MT: statistical phrase-based MT (PBSMT). To this
end, for the first time we train MT systems, both NMT and PBSMT, on large
amounts of literary text (over 100 million words) and evaluate them on a set of
twelve widely known novels spanning from the 1920s to the present day.
According to the BLEU automatic evaluation metric, NMT is significantly better
than PBSMT (p < 0.01) on all the novels considered. Overall, NMT results in an
11% relative improvement (3 points absolute) over PBSMT. A complementary human
evaluation on three of the books shows that between 17% and 34% of the
translations, depending on the book, produced by NMT (versus 8% and 20% with
PBSMT) are perceived by native speakers of the target language to be of
equivalent quality to translations produced by a professional human translator. Comment: Chapter for the forthcoming book "Translation Quality Assessment: From Principles to Practice" (Springer).
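The figures above allow a quick consistency check: a 3-point absolute BLEU gain described as an 11% relative improvement implies a PBSMT baseline of roughly 27 BLEU. The abstract does not report the absolute scores; this back-calculation is ours.

```python
# Back-calculating the implied PBSMT baseline from the reported figures:
# relative gain = absolute gain / baseline, so baseline = absolute / relative.
absolute_gain = 3.0    # BLEU points (reported)
relative_gain = 0.11   # 11% (reported)

implied_baseline = absolute_gain / relative_gain
print(round(implied_baseline, 1))  # 27.3

# The implied NMT score would then be the baseline plus the absolute gain.
print(round(implied_baseline + absolute_gain, 1))  # 30.3
```

Scores in this range are plausible for literary MT, which supports the internal consistency of the two reported numbers.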
Combining data-driven MT systems for improved sign language translation
In this paper, we investigate the feasibility of combining two data-driven machine translation (MT) systems for the translation of sign languages (SLs). We take the MT systems of two prominent data-driven research groups, the MaTrEx system developed at DCU and the Statistical Machine
Translation (SMT) system developed at RWTH Aachen University, and apply their respective approaches to the task of translating Irish Sign Language and German Sign Language into English and German. In a set of experiments supported by automatic evaluation results, we show that
there is definite value in the prospective merging of MaTrEx's Example-Based MT chunks and distortion-limit increase with RWTH's constraint reordering.
Split and Rephrase
We propose a new sentence simplification task (Split-and-Rephrase) where the
aim is to split a complex sentence into a meaning preserving sequence of
shorter sentences. Like sentence simplification, splitting-and-rephrasing has
the potential of benefiting both natural language processing and societal
applications. Because shorter sentences are generally better processed by NLP
systems, it could be used as a preprocessing step which facilitates and
improves the performance of parsers, semantic role labellers and machine
translation systems. It should also be of use for people with reading
disabilities because it allows the conversion of longer sentences into shorter
ones. This paper makes two contributions towards this new task. First, we
create and make available a benchmark consisting of 1,066,115 tuples mapping a
single complex sentence to a sequence of sentences expressing the same meaning.
Second, we propose five models (from vanilla sequence-to-sequence to
semantically motivated models) to understand the difficulty of the proposed
task. Comment: 11 pages, EMNLP 2017.
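The task maps one complex sentence to a sequence of shorter, meaning-preserving sentences. As a toy illustration only (the paper's models are neural sequence-to-sequence, not rule-based), here is a naive baseline that splits on a coordinating conjunction and rephrases each half as a full sentence; the sample sentence and the subject-copying heuristic are our assumptions.

```python
# Naive rule-based baseline for Split-and-Rephrase: split a sentence at
# " and " and reuse the first word of the first clause as the subject of
# the second. A crude heuristic, shown only to make the task concrete.

def naive_split(sentence, subject=None):
    """Split 'X and Y' into ['X.', 'Subject Y.'] when possible."""
    head, _, tail = sentence.rstrip(".").partition(" and ")
    if not tail:
        return [sentence]  # no conjunction found; leave the sentence alone
    subject = subject or head.split()[0]
    return [head + ".", subject + " " + tail + "."]

print(naive_split("Alan Turing was born in London and died in Wilmslow"))
# ['Alan Turing was born in London.', 'Alan died in Wilmslow.']
```

The example also shows why the task is hard for rules: the copied subject "Alan" drops the surname, and many splits require rephrasing (pronouns, tense, ellipsis) that only a learned model can supply, which motivates the seq2seq approach.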
Examining reading fluency in a foreign language: Effects of text segmentation on L2 readers
Grouping words into meaningful chunks is a fundamental process for fluent reading. The present study is an attempt to understand the relationship between chunking and second language (L2) reading fluency. The effects of text segmentation on comprehension, rate, and regression in L2 reading were investigated using a self-paced reading task in a moving-window condition. The participants were intermediate- and advanced-level Japanese EFL learners. The difficulty of chunking a text negatively affected comprehension and smoothness for the intermediate learners, while the advanced learners were able to overcome chunking difficulty. In this study, although the negative effects of chunking difficulty were observed, the positive effects of assisting chunking were not clearly detected, which was interpreted as suggesting that the relationship between chunking and reading needs to be considered in light of the complex interplay between text difficulty and different aspects of reading.
Desiderata for an Every Citizen Interface to the National Information Infrastructure: Challenges for NLP
In this paper, I provide desiderata for an interface that would enable ordinary people to properly access the capabilities of the NII. I identify some of the technologies that will be needed to achieve these desiderata, and discuss current and future research directions that could lead to the development of such technologies. In particular, I focus on the ways in which theory and techniques from natural language processing could contribute to future interfaces to the NII. Introduction: The evolving national information infrastructure (NII) has made available a vast array of on-line services and networked information resources in a variety of forms (text, speech, graphics, images, video). At the same time, advances in computing and telecommunications technology have made it possible for an increasing number of households to own (or lease or use) powerful personal computers that are connected to this resource. Accompanying this progress is the expectation that people will be able to more…