8,773 research outputs found

    Automatic alignment of hieroglyphs and transliteration

    Automatic alignment has important applications in philology, facilitating the study of texts on the basis of electronic resources produced by different scholars. A simple technique is presented to realise such alignment for Ancient Egyptian hieroglyphic texts and their transliteration. Preliminary experiments with the technique are reported, and plans for future work are discussed.
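The abstract does not specify the alignment technique; a common simple approach to aligning two symbol sequences is dynamic-programming global alignment (Needleman-Wunsch), sketched below. The toy sign codes and letters are invented examples, not data from the paper.

```python
# Minimal sketch of dynamic-programming global alignment between two
# symbol sequences (e.g. sign codes vs. transliteration letters).
# The inputs below are invented, not data from the paper.

def align(a, b, gap=-1):
    """Return the best global alignment score and the aligned pairs."""
    score_of = lambda x, y: 1 if x == y else -1
    n, m = len(a), len(b)
    # score[i][j] = best score aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + score_of(a[i - 1], b[j - 1]),  # pair up
                score[i - 1][j] + gap,  # symbol of `a` left unmatched
                score[i][j - 1] + gap,  # symbol of `b` left unmatched
            )
    # Trace back through the table to recover which symbols were paired.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and score[i][j] == score[i - 1][j - 1] + score_of(a[i - 1], b[j - 1])):
            pairs.append((a[i - 1], b[j - 1])); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            pairs.append((a[i - 1], None)); i -= 1
        else:
            pairs.append((None, b[j - 1])); j -= 1
    return score[n][m], pairs[::-1]

# Toy example: one sequence has an extra separator symbol.
print(align(["p", "t", "h"], ["p", "t", "h", "-"]))
```

The quadratic table makes this practical for sentence-length sequences; real hieroglyph-transliteration alignment would need a similarity function between sign codes and letter groups rather than exact equality.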

    Dialogue as Data in Learning Analytics for Productive Educational Dialogue

    This paper provides a novel, conceptually driven stance on the contemporary analytic challenges faced in treating dialogue as a form of data across on- and offline sites of learning. In prior research, preliminary steps have been taken to detect occurrences of such dialogue using automated analysis techniques. Such advances have the potential to foster effective dialogue using learning analytic techniques that scaffold, give feedback on, and provide pedagogic contexts promoting such dialogue. However, translating much prior learning-science research to online contexts is complex, requiring the operationalization of constructs theorized in different contexts (often face-to-face) and based on different datasets and structures (often spoken dialogue). In this paper, we explore what could constitute the effective analysis of productive online dialogues, arguing that it requires consideration of three key facets of the dialogue: features indicative of productive dialogue; the unit of segmentation; and the interplay of features and segmentation with the temporal underpinning of learning contexts. The paper thus foregrounds key considerations regarding the analysis of dialogue data in emerging learning analytics environments, both for learning-science and for computationally oriented researchers.

    On Monitoring Language Change with the Support of Corpus Processing

    One of the fundamental characteristics of language is that it changes over time. One way to monitor such change is to observe corpora: structured collections of language documentation. Recent developments in technology, especially in the field of Natural Language Processing, allow robust linguistic processing that supports the description of diverse historical changes in corpora. The involvement of a human linguist remains inevitable, as the linguist determines the gold standard, but computer assistance provides considerable support by bringing computational approaches to the exploration of corpora, especially historical corpora. This paper proposes a model for corpus development in which corpora are annotated to support further computational operations such as lexicogrammatical pattern matching and automatic retrieval and extraction. The corpus-processing operations are performed by local-grammar-based corpus-processing software on a contemporary Indonesian corpus. The paper concludes that data collection and data processing in a corpus are equally crucial for monitoring language change, and neither can be set aside.
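The kind of lexicogrammatical pattern matching the abstract mentions can be illustrated with a toy example. The sketch below is not the paper's local-grammar software; the Indonesian sentence, the "word/TAG" encoding, and the pattern are invented for illustration.

```python
import re

# Toy illustration of lexicogrammatical pattern matching over an annotated
# corpus. The annotation scheme here encodes each token as "word/TAG";
# sentence, tags, and pattern are invented examples.
corpus = "saya/PRP sedang/ADV membaca/VB buku/NN baru/JJ"

# Local-grammar-style pattern: a verb followed by a noun, optionally
# followed by an adjective.
pattern = re.compile(r"(\S+)/VB (\S+)/NN(?: (\S+)/JJ)?")

match = pattern.search(corpus)
print([g for g in match.groups() if g])  # -> ['membaca', 'buku', 'baru']
```

Annotating the corpus up front is what makes such queries possible: the same regular expression retrieves every verb-noun(-adjective) sequence without any further linguistic processing at query time.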

    The Effects of Fluency-Building Strategies on the Oral Reading Rates of First-Grade Students

    The purpose of this study was to determine the effects of explicit fluency-building strategies on the oral reading rates of first-grade students. According to the National Reading Panel (2000), there are five essential components of reading instruction: phonemic awareness, phonics, fluency, vocabulary, and comprehension. All components are needed to achieve the complex skill of reading. Because these skills are reciprocal in nature, a deficit in any reading component can cause difficulties in learning to read (O'Connor, 2007). Reading fluency is therefore critical to proficiency in reading. Specifically, this study investigated whether explicit instruction in fluency-building strategies significantly increased the oral reading rates of first-grade students. The experimental group participated in explicit instruction in fluency strategies for 15-30 minutes a day, five days a week, for sixteen weeks. This treatment occurred within the hours of the regular school day. The target population involved 56 first-grade students from three multicultural elementary schools in a suburban-rural school district. The measure of the dependent variable, oral reading rate, was the Dynamic Indicators of Basic Early Literacy Skills (DIBELS). The Oral Reading Fluency (ORF) measure was administered twice during the course of the study: pre- and post-treatment. The DIBELS ORF scores were analyzed to determine the effect of explicit fluency-building strategies on the reading rates of first-grade students. The results did not indicate a significant increase in the oral reading rates of the first-grade students who participated in explicit fluency-building instruction. Students in both the experimental and control groups experienced increases in their oral reading rates as measured by the ORF measure of the DIBELS. The results of this study generated no empirical evidence to support the implementation of explicit research-based fluency strategies; therefore, the null hypothesis was retained. In summary, the purpose of this dissertation was to investigate how fluency-building strategies can be systematically implemented in reading instruction to increase the oral reading achievement rates of first-grade students. Further, the study provided opportunities for students to practice and assimilate fluency strategies.

    What Level of Quality can Neural Machine Translation Attain on Literary Text?

    Given the rise of a new approach to MT, Neural MT (NMT), and its promising performance on different text types, we assess the translation quality it can attain on what is perceived to be the greatest challenge for MT: literary text. Specifically, we target novels, arguably the most popular type of literary text. We build a literary-adapted NMT system for the English-to-Catalan translation direction and evaluate it against a system from the previously dominant paradigm in MT: statistical phrase-based MT (PBSMT). To this end, for the first time we train MT systems, both NMT and PBSMT, on large amounts of literary text (over 100 million words) and evaluate them on a set of twelve widely known novels spanning from the 1920s to the present day. According to the BLEU automatic evaluation metric, NMT is significantly better than PBSMT (p < 0.01) on all the novels considered. Overall, NMT yields an 11% relative improvement (3 points absolute) over PBSMT. A complementary human evaluation on three of the books shows that between 17% and 34% of the translations produced by NMT, depending on the book (versus between 8% and 20% with PBSMT), are perceived by native speakers of the target language to be of equivalent quality to translations produced by a professional human translator. Comment: chapter for the forthcoming book "Translation Quality Assessment: From Principles to Practice" (Springer).
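The two gain figures in the abstract can be reconciled with a quick back-of-envelope check: a 3-point absolute BLEU gain described as an ~11% relative improvement implies a PBSMT baseline of roughly 3 / 0.11 ≈ 27 BLEU. The baseline below is inferred from those two numbers, not stated in the abstract.

```python
# Back-of-envelope check of the abstract's figures: relative improvement
# is absolute gain divided by the baseline score, so the baseline follows
# from the two reported numbers. The resulting BLEU values are inferred,
# not taken from the paper.
absolute_gain = 3.0   # BLEU points (stated)
relative_gain = 0.11  # 11% relative improvement (stated)
pbsmt_bleu = absolute_gain / relative_gain
nmt_bleu = pbsmt_bleu + absolute_gain
print(round(pbsmt_bleu, 1), round(nmt_bleu, 1))  # -> 27.3 30.3
```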

    Combining data-driven MT systems for improved sign language translation

    In this paper, we investigate the feasibility of combining two data-driven machine translation (MT) systems for the translation of sign languages (SLs). We take the MT systems of two prominent data-driven research groups, the MaTrEx system developed at DCU and the Statistical Machine Translation (SMT) system developed at RWTH Aachen University, and apply their respective approaches to the task of translating Irish Sign Language and German Sign Language into English and German. In a set of experiments supported by automatic evaluation results, we show that there is definite value in the prospective merging of MaTrEx’s example-based MT chunks and increased distortion limit with RWTH’s constraint reordering.

    Split and Rephrase

    We propose a new sentence simplification task (Split-and-Rephrase) in which the aim is to split a complex sentence into a meaning-preserving sequence of shorter sentences. Like sentence simplification, splitting-and-rephrasing has the potential to benefit both natural language processing and societal applications. Because shorter sentences are generally better processed by NLP systems, it could be used as a preprocessing step that facilitates and improves the performance of parsers, semantic role labellers, and machine translation systems. It should also be of use to people with reading disabilities, as it allows longer sentences to be converted into shorter ones. This paper makes two contributions towards this new task. First, we create and make available a benchmark consisting of 1,066,115 tuples mapping a single complex sentence to a sequence of sentences expressing the same meaning. Second, we propose five models (from vanilla sequence-to-sequence models to semantically motivated ones) to understand the difficulty of the proposed task. Comment: 11 pages, EMNLP 2017.
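The benchmark's tuples map one complex sentence to a sequence of shorter sentences with the same meaning. A minimal sketch of that data shape, with invented sentences rather than actual benchmark entries:

```python
# Hedged illustration of the Split-and-Rephrase data format: one complex
# sentence paired with a meaning-preserving sequence of shorter sentences.
# The sentences and the helper below are invented for illustration.
example = {
    "complex": "Alan Bean, who was born in Wheeler, graduated in 1955.",
    "simple": [
        "Alan Bean was born in Wheeler.",
        "Alan Bean graduated in 1955.",
    ],
}

def length_reduction(ex):
    """Average simple-sentence length as a fraction of the complex one."""
    words = lambda s: len(s.split())
    avg_simple = sum(map(words, ex["simple"])) / len(ex["simple"])
    return avg_simple / words(ex["complex"])

print(round(length_reduction(example), 2))  # -> 0.55
```

Note that splitting is not just shortening: each output sentence must restate the entities (here, the subject "Alan Bean") that the complex sentence expressed once, which is what makes the task harder than simple segmentation.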

    Examining reading fluency in a foreign language: Effects of text segmentation on L2 readers

    Grouping words into meaningful chunks is a fundamental process in fluent reading. The present study attempts to understand the relationship between chunking and second language (L2) reading fluency. The effects of text segmentation on comprehension, rate, and regression in L2 reading were investigated using a self-paced reading task in a moving-window condition. The participants were intermediate- and advanced-level Japanese EFL learners. The difficulty of chunking a text negatively affected comprehension and smoothness for the intermediate learners, whereas the advanced learners were able to overcome chunking difficulty. Although the negative effects of chunking difficulty were observed, the positive effects of assisting chunking were not clearly detected, which suggests that the relationship between chunking and reading needs to be considered in light of the complex interplay between text difficulty and different aspects of reading.

    Desiderata for an Every Citizen Interface to the National Information Infrastructure: Challenges for NLP

    In this paper, I provide desiderata for an interface that would enable ordinary people to properly access the capabilities of the NII. I identify some of the technologies that will be needed to achieve these desiderata, and discuss current and future research directions that could lead to the development of such technologies. In particular, I focus on the ways in which theory and techniques from natural language processing could contribute to future interfaces to the NII. From the introduction: the evolving national information infrastructure (NII) has made available a vast array of on-line services and networked information resources in a variety of forms (text, speech, graphics, images, video). At the same time, advances in computing and telecommunications technology have made it possible for an increasing number of households to own (or lease or use) powerful personal computers connected to this resource. Accompanying this progress is the expectation that people will be able to…