
    Orthographic Learning in Arabic-Speaking Primary School Students

    The aim of this study was to examine how Arabic-speaking children construct orthographic representations and to identify the cognitive and linguistic abilities that may facilitate novel word learning. The research first examined factors associated with single-word reading and spelling accuracy in Arabic-speaking monolingual children and Arabic-English bilingual children, in order to separate universal from script-dependent predictors. Because Arabic is diglossic (i.e., it has two varieties, one spoken and one used for literary purposes), it was considered important to include print exposure as a measure when investigating factors associated with single-word reading and spelling. Thus, Study 1 involved the development of Title Recognition Tests (TRT) in Arabic and in English. Participants were children from grades three to five; 86 students participated in the development of the lists in Study 1a, and 76 in the development of the revised lists in Study 1b. Both lists were reliable and were used in the subsequent studies. Study 2 examined predictors of single-word reading and spelling (receptive vocabulary, phonological processing, rapid automatized naming (RAN), TRT, and orthographic matching) in 86 third- to fifth-grade bilingual children and 116 third-grade monolingual children. For the bilinguals, phonological awareness (PA) emerged as the strongest predictor of reading and spelling in Arabic. In English, verbal short-term memory (STM) and orthographic matching were predictors for the younger bilinguals. PA was also the strongest predictor of reading and spelling for the monolinguals. In Study 3, novel word learning in Arabic was examined using a paired-associate learning task, with orthography present or absent and with varying visual complexity (ligatures and diacritics). The 116 monolingual children from Study 2 participated, and child-related predictors of novel word learning were examined. Results revealed that the presence of orthography facilitated learning. There was evidence that consonant diacritics are a source of difficulty, but diglossic phonemes may also be responsible for reading difficulties documented in Arabic.

    Redesigning Arabic Learning Books, An exploration of the role of graphic communication and typography as visual pedagogic tools in Arabic-Latin bilingual design

    What are ‘educational typefaces’ and why are they needed today? Do beginners in Arabic need special typefaces that can simplify learning further? If so, what features should they have? Research findings on the complexity of learning Arabic confirm that the majority of language textbooks and pedagogic materials create challenging learning environments owing to poor book design, text-heavy content, and limited use of visuals. The complexity of the data and the insufficient design quality of the learning materials reviewed in this practice-based research demand serious thought about simplification, involving experts in the fields of graphic communication, learning, and typeface design. The study offers solutions to some of the problems that arise in the course of designing language-learning books by reviewing selected English-learning and information-design books and methods of guidance for developing uniform learning material for basic Arabic. Key findings from this study confirm the significant role of Arabic designers and educators in the production of efficient and effective learning materials. Their role involves working closely with Arabic instructors, mastering good language skills, and being aware of the knowledge available; it also involves selecting legible typefaces with distinct design characteristics to help fulfil the various objectives of the learning unit. This study raises awareness of the need for typefaces that can attract people to learn Arabic more easily in a globalized world. The absence of such typefaces led to the exploration of simplified twentieth-century Arabic typefaces that share a similar aim of facilitating reading and writing and resolving script and language complexity issues. This study traces their historical context and examines their functional, technical, and aesthetic features in order to incorporate their thinking and reassign them as learning tools within the right context. The final outcome is an experimental bilingual Arabic-English language book series for Arab and non-Arab adult beginners. The learning tools used to create the book series were tested through workshops in Kuwait and London to measure their level of simplification and accessibility. The workshops confirmed areas of both accessibility and incompatibility across the books' learning material and helped improve the final outcome of the practice. They also established the significant role of educational typefaces, bilingual design, and graphic communication in visual Arabic learning.

    English speakers' common orthographic errors in Arabic as L2 writing system : an analytical case study

    Research involving the Arabic Writing System (WS) is quite limited, and the study of writing errors in Arabic as a second-language writing system (L2WS) against a particular first-language writing system (L1WS) has been relatively neglected. This study attempts to identify, describe, and explain common orthographic errors in Arabic writing amongst English-speaking learners. First, it outlines the characteristics of the Arabic Writing System (AWS) and the available empirical studies of L2WS Arabic. The study embraced the Error Analysis approach, utilising a mixed-method design that deployed quantitative and qualitative tools (writing tests, a questionnaire, and interviews). The data were collected from several institutions around the UK, which collectively accounted for 82 questionnaire responses, 120 different writing samples from 44 intermediate learners, and six teacher interviews. The hypotheses for this research were: a) English-speaking learners of Arabic make common orthographic errors similar to those of Arabic native speakers; b) English-speaking learners share several common orthographic errors with other learners of Arabic as a second/foreign language (AFL); and c) English-speaking learners of Arabic produce their own common orthographic errors which are specifically related to the differences between the two WSs. The results confirmed all three hypotheses. Specifically, English-speaking learners of L2WS Arabic commonly made six error types: letter ductus (letter shape), orthography (spelling), phonology, letter dots, allographemes (i.e. letterform), and direction. Gemination and L1WS-transfer error rates were not found to be major. Another important result showed that five letter groups and two individual letters are particularly challenging for English-speaking learners. The results indicated that the errors were likely to stem from one of four factors: script confusion, orthographic difficulties, phonological realisation, and teaching/learning strategies. These results are generalizable, as the data were collected from several institutions in different parts of the UK. Suggestions, implications, and recommendations for further research are outlined in the conclusion chapter.

    Agile in-litero experiments:how can semi-automated information extraction from neuroscientific literature help neuroscience model building?

    In neuroscience, as in many other scientific domains, the primary form of knowledge dissemination is through published articles in peer-reviewed journals. One challenge for modern neuroinformatics is to design methods that make the knowledge in the tremendous backlog of publications accessible for search, analysis, and integration into computational models. In this thesis, we introduce novel natural language processing (NLP) models and systems to mine the neuroscientific literature. By analogy with in vivo, in vitro, and in silico experiments, we coin the term in litero experiments for the NLP methods developed in this thesis, which aim at analyzing and making accessible the extended body of neuroscientific literature. In particular, we focus on two important neuroscientific entities: brain regions and neural cells. An integrated NLP model is designed to automatically extract brain region connectivity statements from very large corpora. This system is applied to a large corpus of 25M PubMed abstracts and 600K full-text articles. Central to this system is the creation of a searchable database of brain region connectivity statements, allowing neuroscientists to gain an overview of all brain regions connected to a given region of interest. More importantly, the database enables researchers to provide feedback on connectivity results and links back to the original article sentence to provide the relevant context. The database was evaluated by neuroanatomists on real connectomics tasks (targets of the Nucleus Accumbens) and resulted in a significant reduction of effort compared to previous manual methods (from 1 week to 2h). Subsequently, we introduce neuroNER to identify, normalize, and compare instances of neurons in the scientific literature. Our method relies on identifying and analyzing each of the domain features used to annotate a specific neuron mention, like the morphological term 'basket' or the brain region 'hippocampus'.
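A toy version of such connectivity-statement extraction can be sketched in a few lines. The region list, trigger patterns, and function name below are illustrative assumptions for this sketch, not the thesis's actual integrated NLP model:

```python
import re

# Illustrative sketch: find candidate (region, trigger, region) connectivity
# triples in a sentence. Both lexicons below are toy examples.
REGIONS = ["nucleus accumbens", "ventral tegmental area", "hippocampus",
           "prefrontal cortex", "amygdala"]
TRIGGERS = re.compile(r"\b(project(?:s|ed)?(?: to)?|innervates?|"
                      r"connect(?:s|ed)?(?: to| with)?)\b", re.I)

def extract_connectivity(sentence):
    """Return (region_a, trigger, region_b) triples found in a sentence."""
    s = sentence.lower()
    found = sorted((s.find(r), r) for r in REGIONS if r in s)
    triples = []
    for m in TRIGGERS.finditer(s):
        # pair the nearest region to the left of the trigger with the
        # nearest region to its right
        left = [r for pos, r in found if pos < m.start()]
        right = [r for pos, r in found if pos > m.end()]
        if left and right:
            triples.append((left[-1], m.group(1), right[0]))
    return triples

print(extract_connectivity(
    "The ventral tegmental area projects to the nucleus accumbens."))
# → [('ventral tegmental area', 'projects to', 'nucleus accumbens')]
```

A production system would of course need full named-entity recognition and syntactic analysis rather than surface patterns, but the database schema the sketch implies (triple plus source sentence) is the same.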
We apply our method to the same corpus of 25M PubMed abstracts and 600K full-text articles and find over 500K unique neuron type mentions. To demonstrate the utility of our approach, we also apply our method to cross-comparing the NeuroLex and Human Brain Project (HBP) cell type ontologies. By decoupling a neuron mention's identity into its specific compositional features, our method can successfully identify specific neuron types even if they are not explicitly listed within a predefined neuron type lexicon, thus greatly facilitating cross-laboratory studies. In order to build such large databases, several large-scale NLP tools and infrastructures were developed: a robust pipeline to preprocess full-text PDF articles, as well as bluima, an NLP processing pipeline specialized in neuroscience and able to perform text mining at PubMed scale. During the development of those two NLP systems, we recognized the need for novel NLP approaches to rapidly develop custom text-mining solutions. This led to the formalization of the agile text-mining methodology, which improves communication and collaboration between subject matter experts and text miners. Agile text mining is characterized by short development cycles, frequent task redefinition, and continuous performance monitoring through integration tests. To support our approach, we developed Sherlok, an NLP framework designed for the development of agile text-mining applications.
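The compositional view of neuron mentions described above lends itself to a simple sketch. The feature lexicons and function names here are toy assumptions, not neuroNER's actual vocabularies:

```python
# Illustrative sketch: decompose a neuron mention into domain feature slots
# (morphology, brain region, electrophysiology). Lexicons are toy examples.
MORPHOLOGY = {"basket", "pyramidal", "stellate", "chandelier"}
BRAIN_REGION = {"hippocampus", "neocortex", "striatum", "cerebellum"}
EPHYS = {"fast-spiking", "bursting", "regular-spiking"}
STOPWORDS = {"cell", "cells", "neuron", "neurons", "of", "the", "in"}

def decompose(mention):
    """Map each token of a neuron mention to a feature slot."""
    features = {"morphology": [], "brain_region": [], "ephys": [], "other": []}
    for token in mention.lower().split():
        if token in MORPHOLOGY:
            features["morphology"].append(token)
        elif token in BRAIN_REGION:
            features["brain_region"].append(token)
        elif token in EPHYS:
            features["ephys"].append(token)
        elif token not in STOPWORDS:
            features["other"].append(token)
    return features

def same_type(m1, m2):
    """Two mentions denote the same type if their feature sets match."""
    f1, f2 = decompose(m1), decompose(m2)
    return all(sorted(f1[k]) == sorted(f2[k]) for k in f1)
```

Because comparison happens slot by slot, "hippocampus basket cells" and "basket neurons of the hippocampus" compare equal even though neither phrase appears in any predefined neuron type lexicon, which is the point made above about cross-laboratory comparison.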

    Medieval Multilingual Manuscripts

    Medieval manuscripts combining multiple languages, whether in fusion or in collision, provide tangible evidence for linguistic and cultural interactions. Such encounters are documented in this volume through case studies from across Europe and Asia, all the way from Ireland to Japan, exploring the creativity of medieval language use as a function of cross-cultural contact and fluidity in this key period of nation-formation (9th-14th centuries CE).

    Designing Khom Thai Letterforms for Accessibility

    This practice-led research aimed to design letterforms for an ancient Thai script known as Khom Thai, to aid learning of the script by today’s Thai population. Khom is a script that was developed in Thailand around the 15th century. It was widely used as the country’s official script for historical documents and records in Pali, Sanskrit, and Thai until 1945. Now, very few members of the younger generations can read the script, which poses a major obstacle to preserving the knowledge of Khom Thai and severely limits access to the country’s historical documents and heritage. Although there are some relationships between contemporary Thai letters and Khom Thai letters, the unfamiliar letterforms constitute the largest hurdle for Thai readers learning to read the Khom Thai script. This study’s goal was to address this problem by creating three new Khom Thai letterform designs for use as learning materials and writing models for beginners. The study investigated whether Khom Thai letterforms could be redesigned so that modern Thai readers could recognise them more easily. To explore this possibility, three letterform designs, TLK Deva, TLK Brahma, and TLK Manussa, were developed. This practice-led research employed mixed methods, including interviews, a questionnaire, and a letter recognition study. The first section of the research discusses the theoretical framework regarding the role of familiarity in enhancing letter recognition; analyses of Thai, Khom Thai, and Khmer letterforms are also included in this part. The second section describes the design process, which resulted in the three designs. Among the three, TLK Brahma and TLK Deva maintain a close connection to the proportions and writing style of the traditional script and could potentially be used as writing models for those learning the script. By contrast, TLK Manussa is adapted to the characteristics and proportions of the present-day Thai script and is intended to look more familiar to Thai readers. One potential use of TLK Manussa is as a mnemonic aid for learning Khom Thai characters. Interviews were conducted with Khom Thai palaeographic experts to gather opinions on the designs. A questionnaire was also administered to 102 participants to establish which of the three TLK designs had the most familiar characteristics for Thai readers. The results showed that TLK Manussa was the most familiar of the three. After further refinement of the designs, the third section describes the data collection procedures. A short-exposure technique was used with 32 participants who already had some knowledge of Khom Thai to compare letter recognition across the designs and to gather reader feedback. In general, the findings did not indicate any significant differences between the three designs in the accuracy of letter identification. However, certain individual letters that more closely resembled the Thai script received higher scores than unfamiliar characters did. The three TLK designs constitute the primary contribution to knowledge. Further contributions made by this research are its analyses of Khom Thai characters and its systematic guidelines for developing Khom Thai letterforms; these guidelines will aid future type designers of Khom Thai. The study contributes to the field of non-Latin type design research by demonstrating the role of design in enabling contemporary audiences to learn ancient Thai scripts.

    Geometric Layout Analysis of Scanned Documents

    Layout analysis--the division of page images into text blocks and lines and the determination of their reading order--is a major performance-limiting step in large-scale document digitization projects. This thesis addresses the problem in several ways: it presents new performance measures to identify important classes of layout errors, evaluates the performance of state-of-the-art layout analysis algorithms, presents a number of methods to reduce the error rate and the catastrophic failures occurring during layout analysis, and develops a statistically motivated, trainable layout analysis system that addresses the needs of large-scale document analysis applications. An overview of the key contributions of this thesis is as follows. First, this thesis presents an efficient local adaptive thresholding algorithm that yields the same quality of binarization as state-of-the-art local binarization methods, but runs in time close to that of global thresholding methods, independent of the local window size. Tests on the UW-1 dataset demonstrate a 20-fold speedup compared to traditional local thresholding techniques. Then, this thesis presents a new perspective on document image cleanup. Instead of trying to explicitly detect and remove marginal noise, the approach focuses on locating the page frame, i.e. the actual page contents area. A geometric matching algorithm is presented to extract the page frame of a structured document. It is demonstrated that incorporating a page frame detection step into the document processing chain reduces OCR error rates from 4.3% to 1.7% (n=4,831,618 characters) on the UW-III dataset and layout-based retrieval error rates from 7.5% to 5.3% (n=815 documents) on the MARG dataset.
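Local thresholding whose per-pixel cost is independent of the window size is typically achieved with integral images: after one pass of prefix sums, the sum over any window takes four table lookups. The following is a minimal sketch under that standard construction; the parameter names (window, k) and the simple mean-based rule are illustrative assumptions, not the thesis's exact algorithm:

```python
# Sketch of window-size-independent local adaptive thresholding via an
# integral image. Input is a 2-D list of grayscale values (0 = black).
def integral_image(img):
    """2-D prefix sums, padded with a zero row and column."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def local_mean_threshold(img, window=15, k=0.9):
    """Mark a pixel as ink (1) if it is darker than k times its local mean."""
    h, w = len(img), len(img[0])
    ii = integral_image(img)
    r = window // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            y0, y1 = max(0, y - r), min(h, y + r + 1)
            x0, x1 = max(0, x - r), min(w, x + r + 1)
            area = (y1 - y0) * (x1 - x0)
            # window sum in O(1): four lookups into the integral image
            total = ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]
            out[y][x] = 1 if img[y][x] < k * (total / area) else 0
    return out
```

The inner loop does constant work per pixel regardless of `window`, which is where the claimed independence from the local window size comes from.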
The performance of six widely used page segmentation algorithms (x-y cut, smearing, whitespace analysis, constrained text-line finding, docstrum, and Voronoi) on the UW-III database is evaluated in this work using a state-of-the-art evaluation methodology. It is shown that current evaluation scores are insufficient for diagnosing specific errors in page segmentation and fail to identify some classes of serious segmentation errors altogether. Thus, a vectorial score is introduced that is sensitive to, and identifies, the most important classes of segmentation errors (over-, under-, and mis-segmentation) and what page components (lines, blocks, etc.) are affected. Unlike previous schemes, this evaluation method has a canonical representation of ground truth data and guarantees pixel-accurate evaluation results for arbitrary region shapes. Based on a detailed analysis of the errors made by different page segmentation algorithms, this thesis presents a novel combination of the line-based approach by Breuel with the area-based approach of Baird which solves the over-segmentation problem in area-based approaches. This new approach achieves a mean text-line extraction error rate of 4.4% (n=878 documents) on the UW-III dataset, which is the lowest among the analyzed algorithms. This thesis also describes a simple, fast, and accurate system for document image zone classification that results from a detailed comparative analysis of performance of widely used features in document analysis and content-based image retrieval. Using a novel combination of known algorithms, an error rate of 1.46% (n=13,811 zones) is achieved on the UW-III dataset in comparison to a state-of-the-art system that reports an error rate of 1.55% (n=24,177 zones) using more complicated techniques. In addition to layout analysis of Roman script documents, this work also presents the first high-performance layout analysis method for Urdu script. 
For that purpose, a geometric text-line model for Urdu script is presented. It is shown that the method can accurately extract Urdu text-lines from documents of different layouts, such as prose books, poetry books, magazines, and newspapers. Finally, this thesis presents a novel algorithm for probabilistic layout analysis that specifically addresses the needs of large-scale digitization projects. The presented approach models known page layouts as a structural mixture model. A probabilistic matching algorithm is presented that gives multiple interpretations of an input layout with associated probabilities. An algorithm based on A* search is presented for finding the most likely layout of a page, given its structural layout model. For training layout models, an EM-like algorithm is presented that is capable of learning the geometric variability of layout structures from data, without the need for a page segmentation ground truth. Evaluation of the algorithm on documents from the MARG dataset shows an accuracy of above 95% for geometric layout analysis.
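Of the page segmentation algorithms evaluated above, the x-y cut is the easiest to illustrate: it recursively splits a binary page image at interior runs of whitespace, alternating between horizontal and vertical cuts, until no cut is possible. The following toy sketch (an illustration, not the evaluated implementation) returns leaf blocks in reading order:

```python
def xy_cut(img, min_gap=1):
    """Segment a binary image (1 = ink) into leaf blocks via x-y cuts."""
    return _cut(img, 0, len(img), 0, len(img[0]), min_gap)

def _cut(img, y0, y1, x0, x1, min_gap):
    """Return leaf blocks (y0, y1, x0, x1) of the region, in reading order."""
    for axis in (0, 1):  # try horizontal cut lines first, then vertical
        lo, hi = (y0, y1) if axis == 0 else (x0, x1)
        if axis == 0:
            empty = [all(v == 0 for v in img[i][x0:x1]) for i in range(lo, hi)]
        else:
            empty = [all(img[y][i] == 0 for y in range(y0, y1))
                     for i in range(lo, hi)]
        # find maximal interior runs of >= min_gap empty scan lines
        runs, start = [], None
        for i, e in enumerate(empty + [False]):  # sentinel closes a final run
            if e and start is None:
                start = i
            elif not e:
                if (start is not None and i - start >= min_gap
                        and start > 0 and i < len(empty)):
                    runs.append((start, i))
                start = None
        if runs:
            blocks, prev = [], 0
            for s, e in runs:
                blocks += _split(img, axis, y0, y1, x0, x1,
                                 lo + prev, lo + s, min_gap)
                prev = e
            blocks += _split(img, axis, y0, y1, x0, x1, lo + prev, hi, min_gap)
            return blocks
    return [(y0, y1, x0, x1)]  # no whitespace cut possible: leaf block

def _split(img, axis, y0, y1, x0, x1, a, b, min_gap):
    """Recurse into the sub-region [a, b) along the cut axis."""
    if axis == 0:
        return _cut(img, a, b, x0, x1, min_gap)
    return _cut(img, y0, y1, a, b, min_gap)
```

The over-segmentation behaviour discussed above is visible even in this sketch: a too-small `min_gap` will cut through inter-word or inter-line gaps inside a single logical block.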

    The Emergence of Standard English

    Language scholars have traditionally agreed that the development of the English language was largely unplanned. Fisher challenges this view, demonstrating that the standardization of writing and pronunciation was, and still is, made under the control of political and intellectual forces. "There is much to interest scholars of late Middle English language and literature." -- Journal of English and Germanic Philology. "Fisher's argument about Chancery English in the fifteenth century deserves to be widely known." -- Speculum. "The coherence of the story that Fisher traces and the archival materials that he has provided will continue to stimulate scholarly investigation and discovery." -- Studies in the Age of Chaucer.