
    Design of an Offline Handwriting Recognition System Tested on the Bangla and Korean Scripts

    This dissertation presents a flexible and robust offline handwriting recognition system which is tested on the Bangla and Korean scripts. Offline handwriting recognition is one of the most challenging and yet to be solved problems in machine learning. While a few popular scripts (like Latin) have received a lot of attention, many other widely used scripts (like Bangla) have seen very little progress. Features such as connectedness and vowels structured as diacritics make it a challenging script to recognize. A simple and robust design for offline recognition is presented which not only works reliably, but also can be used for almost any alphabetic writing system. The framework has been rigorously tested for Bangla and demonstrated how it can be transformed to apply to other scripts through experiments on the Korean script whose two-dimensional arrangement of characters makes it a challenge to recognize. The base of this design is a character spotting network which detects the location of different script elements (such as characters, diacritics) from an unsegmented word image. A transcript is formed from the detected classes based on their corresponding location information. This is the first reported lexicon-free offline recognition system for Bangla and achieves a Character Recognition Accuracy (CRA) of 94.8%. This is also one of the most flexible architectures ever presented. Recognition of Korean was achieved with a 91.2% CRA. Also, a powerful technique of autonomous tagging was developed which can drastically reduce the effort of preparing a dataset for any script. The combination of the character spotting method and the autonomous tagging brings the entire offline recognition problem very close to a singular solution. Additionally, a database named the Boise State Bangla Handwriting Dataset was developed. 
This is one of the richest offline datasets currently available for Bangla, and it has been made publicly accessible to accelerate research progress. Many other tools were developed and experiments were conducted to more rigorously validate this framework by evaluating the method against external datasets (CMATERdb 1.1.1, Indic Word Dataset and REID2019: Early Indian Printed Documents). Offline handwriting recognition is an extremely promising technology and the outcome of this research moves the field significantly ahead.
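The abstract above describes forming a transcript from spotted script elements and their locations, and reports a Character Recognition Accuracy (CRA). A minimal sketch of that idea follows, under stated assumptions: the names `Detection`, `assemble_transcript` and `character_recognition_accuracy` are hypothetical, the ordering is a plain left-to-right sort (a real system for Bangla or Korean would need script-specific placement rules), and CRA is taken here to be one minus the edit distance divided by the reference length, which may differ from the dissertation's exact definition.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One spotted script element (hypothetical structure, not from the source)."""
    label: str  # predicted character or diacritic
    x: float    # horizontal centre of the detected region


def assemble_transcript(detections):
    """Order detections left to right and join their labels (simplifying assumption)."""
    return "".join(d.label for d in sorted(detections, key=lambda d: d.x))


def edit_distance(a, b):
    """Levenshtein distance between two strings, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution or match
        prev = cur
    return prev[-1]


def character_recognition_accuracy(predicted, reference):
    """Assumed CRA: 1 - edit_distance / len(reference), clipped at 0."""
    if not reference:
        return 1.0 if not predicted else 0.0
    return max(0.0, 1.0 - edit_distance(predicted, reference) / len(reference))


if __name__ == "__main__":
    spotted = [Detection("b", 2.0), Detection("a", 1.0), Detection("c", 3.0)]
    word = assemble_transcript(spotted)
    print(word, character_recognition_accuracy(word, "abc"))
```

Sorting by a single coordinate keeps the sketch short; it is only meant to show how location information can turn a set of detections into a sequence that an accuracy measure can then score.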

    Script Effects as the Hidden Drive of the Mind, Cognition, and Culture

    This open access volume reveals the hidden power of the script we read in and how it shapes and drives our minds, ways of thinking, and cultures. Expanding on the Linguistic Relativity Hypothesis (i.e., the idea that language affects the way we think), this volume proposes the “Script Relativity Hypothesis” (i.e., the idea that the script in which we read affects the way we think) by offering a unique perspective on the effect of script (alphabets, morphosyllabaries, or multi-scripts) on our attention, perception, and problem-solving. Once we become literate, fundamental changes occur in our brain circuitry to accommodate the new demand for resources. The powerful effects of literacy have been demonstrated by research on literate versus illiterate individuals, as well as cross-scriptal transfer, indicating that literate brain networks function differently, depending on the script being read. This book identifies the locus of differences between the Chinese, Japanese, and Koreans, and between the East and the West, as the neural underpinnings of literacy. To support the “Script Relativity Hypothesis”, it reviews a vast corpus of empirical studies, including anthropological accounts of human civilization, social psychology, cognitive psychology, neuropsychology, applied linguistics, second language studies, and cross-cultural communication. It also discusses the impact of reading from screens in the digital age, as well as the impact of bi-script or multi-script use, which is a growing trend around the globe. As a result, our minds, ways of thinking, and cultures are now growing closer together, not farther apart. 
Examines the origin, emergence, and co-evolution of written language, the human mind, and culture within the purview of script effects. Investigates how the scripts we read over time shape our cognition, mind, and thought patterns. Provides a new outlook on the four representative writing systems of the world. Discusses the consequences of literacy for the functioning of the min

    The Nature of Writing – A Theory of Grapholinguistics [book cover]

    Cover illustration: Purgatory: Canto VII – The Rule of the Mountain from A Typographic Dante (2008) by Barrie Tullett (also displayed in Barrie Tullett, Typewriter Art: A Modern Anthology, London: Laurence King Publishing, 2014, p. 167). With kind permission by Barrie Tullett. The text is taken from Dante, The Divine Comedy, translated by Dorothy L. Sayers, Harmondsworth-Middlesex: The Penguin Classics, 1949. On the lower part of the illustration, one can read the concluding verses of the Canto: But now the poet was going on before; / “Forward!” said he; “look how the sun doth stand / Meridian-high, while on the Western shore / Night sets her foot upon Morocco’s strand.”

    English speakers' common orthographic errors in Arabic as L2 writing system : an analytical case study

    PhD Thesis. Research on the Arabic Writing System (WS) is quite limited, and writing errors in L2WS Arabic against a specific L1WS seem to be relatively neglected. This study attempts to identify, describe, and explain common orthographic errors in Arabic writing amongst English-speaking learners. First, it outlines the Arabic Writing System’s (AWS) characteristics and available empirical studies of L2WS Arabic. This study embraced the Error Analysis approach, utilising a mixed-method design that deployed quantitative and qualitative tools (writing tests, questionnaire, and interview). The data were collected from several institutions around the UK, which collectively accounted for 82 questionnaire responses, 120 different writing samples from 44 intermediate learners, and six teacher interviews. The hypotheses for this research were: a) English-speaking learners of Arabic make common orthographic errors similar to those of Arabic native speakers; b) English-speaking learners share several common orthographic errors with other learners of Arabic as a second/foreign language (AFL); and c) English-speaking learners of Arabic produce their own common orthographic errors which are specifically related to the differences between the two WSs. The results confirmed all three hypotheses. Specifically, English-speaking learners of L2WS Arabic commonly made six error types: letter ductus (letter shape), orthography (spelling), phonology, letter dots, allographemes (i.e. letterform), and direction. Gemination and L1WS transfer error rates were not found to be major. Another important result showed that five letter groups in addition to two letters are particularly challenging to English-speaking learners. Study results indicated that error causes were likely to be from one of four factors: script confusion, orthographic difficulties, phonological realisation, and teaching/learning strategies. 
These results are generalizable as the data were collected from several institutions in different parts of the UK. Suggestions and implications as well as recommendations for further research are outlined accordingly in the conclusion chapter.

    Employability and Communication Skills : Triangulating Views of Employers, Lecturers and Undergraduates

    Employability skills are also known as soft skills and transferable skills. Employability refers to skills, understandings, and personal attributes that increase graduates’ chances of employment and success in their chosen occupations (Yorke, 2004). Some of the skills listed under employability skills are resourcefulness, adaptability, and flexibility which are not only needed for adapting to work situations (Curtis & McKenzie, 2002). In a VUCA (volatility, uncertainty, complexity, and ambiguity) environment, there is a limit to what universities can equip graduates with, and they need to be able to continue learning to adjust to new situations and demands. According to the Secretary’s Commission on Achieving Necessary Skills (SCANS) in the USA (1992), employability skills can be divided into four clusters of basic skills, thinking skills, personal qualities, and workplace competence. These skills would give them an edge during interviews and increase their chances of getting employed. Malaysia has been experiencing graduate unemployability. Approximately 60% of graduates remain unemployed for a minimum of a year after graduation (“Graduate Employability”, 2020). There are many factors that contribute to graduate unemployability such as lack of experience, language proficiency, communication skills, problem-solving skills, and critical thinking skills (Hanapi & Nordin, 2014; Lim et al., 2016; Nooriah & Zakiah, 2017; Ooi & Ting, 2017). Employers often specify good communication skills and interpersonal skills as top requirements in job advertisements (Bakar et al., 2007; Ooi & Ting, 2017). However, graduates lack problem-solving skills, communication skills (Hanapi & Nordin, 2014) and technical knowledge (Lim et al., 2016). In a knowledge-based economy, employees need to be independent and self-motivated (Menand, 2014) to acquire the necessary knowledge, information and high skill levels to cope with the fast pace of technological change. 
There is currently a scarcity of findings on whether universities and students are preparing themselves appropriately to meet the expectations of employers. The study investigated the importance of employability and communication skills based on the views of employers, lecturers and students. The research questions were: (1) how good are university students in their employability and communication skills? and (2) do employers and lecturers agree on the most important skills an effective employee should have? The descriptive study involved the use of a questionnaire on employability skills and language skills (listening and speaking, reading and writing). The items were formulated using a five-point rating scale of (1) not at all, (2) to some extent, (3) just enough, (4) to a reasonable extent, and (5) to a great extent. In addition, the questionnaire required lecturers and employers to select the top 10 skills out of the 25 skills listed. The data were collected from 123 students, 26 lecturers from a public university, and 26 employers in Sarawak, East Malaysia. The students were mostly female (74.80% female, 25.20% male) and had weak to moderate language proficiency, measured using the Malaysian University English Test (MUET). There were slightly more males among lecturers (12 female, 14 male) and employers (11 female, 15 male). The average years of work experience for lecturers was 8.7 (range: 1-25) and for employers, the average was 5.6 (range: 1-15). For the analysis, means and frequencies were calculated for comparison of the three perspectives on the importance of communication and employability skills. The results showed that there was a difference among employers, lecturers, and students in their ratings of how good university students are in their employability and communication skills. The students overrated themselves in all three sets of skills. 
Based on the mean scores, the students rated themselves as having a moderate level of employability (M=3.74), reading and writing skills (M=3.75), and listening and speaking skills (M=3.61). The lecturers rated the university students as having a moderate level of skills as well, but the mean scores were slightly lower than the students’ (employability, M=3.54; reading and writing skills, M=3.49; listening and speaking skills, M=3.29). To the employers, only the fresh graduates’ listening and speaking skills were moderate but on the weak side (M=3.15). The employers found the fresh graduates’ reading and writing skills (M=2.97) and employability skills (M=2.92) to be slightly weak. Interestingly, the students and lecturers rated the graduates’ employability skills to be moderate but the employers considered them to be weak. Another contrast was the students’ listening and speaking skills, which the students and lecturers considered to be the lowest level, compared to employability and reading and writing skills. However, the employers considered the fresh graduates’ listening and speaking skills to be better than the other two skills. This comparison shows that there is a mismatch in the ratings of university students’ employability and communication skills given by employers, lecturers, and students. The employers’ expectation was higher than the lecturers’. In other words, most employers expect students to be ready to handle the demands of the workforce upon graduation but sadly, most graduates fell short of their expectations. The employers may feel that they have to spoon-feed the graduates on various matters upon graduation and they prefer employees who have a strong set of communication and employability skills. Next, the results on the ranking of the important skills an effective employee should have also showed a mismatch in the perspectives of employers and lecturers. 
To the employers, the top two skills were time management and problem-solving aptitude, both of which were employability skills. To the lecturers, the top two skills were leadership qualities and teamwork spirit, which were also employability skills. The employers prioritised skills for efficient handling of work situations to meet deadlines but the lecturers focussed on skills for the completion of group work. The mismatch shows that lecturers and universities may have overlooked the need to train students to be versatile to solve problems and complete projects on time. Indeed, students often submit work late and are not independent enough to resolve questions concerning their projects on their own, and constantly have to consult lecturers. To increase graduate employability, universities need to collaborate strategically with the industry to resolve the mismatch of expectations, as other Malaysian studies have also found a mismatch (Nadarajah, 2021; Nesaratnam et al., 2020). However, because of the fast-changing work environment, students need to develop lifelong learning skills so that they can develop their expertise, knowledge base, and a lifelong learning mindset to stay relevant. References Bakar, A. R., Mohamed, S., & Hanafi, I. (2007). Employability skills: Malaysian employers perspectives. The International Journal of Interdisciplinary Social Sciences, 2(1), 263-274. Curtis, D. D., & McKenzie, P. (2002). Employability skills for Australian industry: Literature review and framework development. http://www.voced.edu.au/content/ngv33428 Graduate employability: A priority of the Education Ministry. (2020, February 18). News Straits Times. https://www.nst.com.my/news/nation/2020/02/566731/graduate-employability-priority-education-ministry Hanapi, Z., & Nordin, M. S. (2014). Unemployment among Malaysia graduates: Graduates’ attributes, lecturers’ competency and quality of education. Procedia - Social and Behavioral Sciences, 112, 1056-1063. 
https://doi.org/10.1016/j.sbspro.2014.01.1269 Lim, Y. M, Teck, H. L., Ching, S. Y., & Chui, C. L. (2016). Employability skills, personal qualities, and early employment problems of entry-level auditors: Perspectives from employers, lecturers, auditors, and students. Journal of Education for Business, 91(4), 185-192. https://doi.org/10.1080/08832323.2016.1153998 Menand, H. (2014). Critical instruction, student achievement, and nurturing of global citizens: Global and comparative education in context. In S. A. Lawrence (Ed.), Critical practice in P-12 education (pp. 1-23). Hershey: Information Science Reference. Nadarajah, J. (2021). Measuring the gap in employability skills among Malaysian graduates. International Journal of Modern Trends in Social Sciences, 4(15), 81-87. https://doi.org/10.35631/IJMTSS.415007 Nesaratnam, S., Salleh, W. H. W., Foo, Y. V., Hisham, W. M. W. S. W. (2020). Enhancing English proficiency and communication skills among Malaysian graduates through training and coaching. International Journal of Learning and Development, 10(4), 1-12. https://doi.org/10.5296/ijld.v10i4.17875 Nooriah, Y., & Zakiah, J. (2017). Development of graduates employability: The role of university and challenges. Jurnal Personalia Pelajar, 20, 15-32. Ooi, K. B., & Ting, S. H. (2015). Employers’ emphasis on technical skills and soft skills in job advertisements. The English Teacher, 44(1), 1-12. Secretary’s Commission on Achieving Necessary Skills (SCANS) (1992). Learning a living: a blueprint for high performance. A SCANS report for America 2000. Washington: U.S. Department of Labour. Yorke, M. (2004). Employability in higher education: what it is – what it is not. York: The Higher Education Academy/ESECT

    Consequences of bi-literacy in bilingual individuals: in the healthy and neurologically impaired

    Background. In the current global, cross-cultural scenario, being bilingual or multilingual is a norm rather than an exception. In such an environment an individual may be actively involved in reading and writing in all their languages in addition to speaking them. Regular use of two or more languages is termed bilingualism, and being able to read and write in both of them is referred to as bi-literacy. Research indicates that bilingualism has an impact on language production and cognition, specifically executive functions. Given the impact of literacy and bilingualism, the reasonable question that arises is whether bi-literacy would offer an additional impact on language production and cognition. This becomes even more relevant in a multilingual, multi-cultural society such as India. We examined the impact of bi-literacy on oral language production (at word and connected speech level), comprehension and on non-verbal executive function measures in bi-literate bilingual healthy adults in an immigrant diaspora living in the UK. In addition to English, they were speakers of one of the South Indian languages (Kannada, Malayalam, Tamil and Telugu). The significance of bi-literacy among bilinguals assumes further importance in aphasia (language impairment due to brain damage). For those who have aphasia in one or more languages due to brain damage, the severity of impairment may differ between the languages, and the modalities of language may also be differentially affected. In particular, reading and writing may be impaired differently in the languages used by a bi/multilingual. The manifestation of reading impairments is also dependent on the nature of the script of the language being read [e.g., Raman & Weekes (2005) report differential dyslexia in a Turkish-English speaker who exhibited surface dyslexia in English and deep dysgraphia in Turkish]. 
Our study contributes to the field of bilingual aphasia by focusing specifically on reading, in contrast to the existing literature on aphasia in bilinguals, where the focus has predominantly been on language production and comprehension. Studying reading impairments provides a better understanding of how the reading impairments are manifested in the two languages, which will aid appropriate assessment and intervention. This research investigated the impact of bi-literacy in both populations (healthy adults and neurologically impaired) in two phases: Phase I (in UK) and Phase II (in India). Aim. Phase I investigated the impact of bi-literacy on oral language production (at word level and connected speech), comprehension and non-verbal executive function in bi-literate bilingual healthy adults. Phase II examined the reading impairments in two languages of bilingual persons with aphasia (BPWA). Methods. For Phase I, participants were thirty-four bi-literate bilingual healthy adults with English as their L2 and one of the Dravidian languages (Kannada, Malayalam, Tamil and Telugu) as their L1. We have used the term ‘print exposure’ as a proxy for literacy. They were divided into a high print exposure (HPE, n=22) and a low print exposure (LPE, n=12) group based on their performance on two tasks measuring L2 print exposure: a grammaticality judgement task and a sentence verification task. We also quantified their bilingual characteristics: proficiency, reading and writing characteristics, and dominance. The groups were matched on years of education, age and gender. 
Participants completed a set of oral language production tasks in L2 (at word level), namely verbal fluency, word and non-word repetition; comprehension tasks in L2, namely a synonymy triplets task and a sentence comprehension task (Chapter 2); an oral narrative task in L2 (at connected speech level) (Chapter 3), followed by non-verbal executive function tasks tapping into inhibitory control (Spatial Stroop and Flanker tasks), working memory (visual n-back and auditory n-back) and task switching (colour-shape task) (Chapter 4). For Phase II, we characterized the reading abilities of four BPWA who spoke one of the Dravidian languages (Kannada, Tamil, Telugu) (alpha-syllabic) as their L1 and English (alphabetic) as their L2. We quantified their bilingual characteristics: proficiency, reading and writing characteristics, and dominance. Subtests from the Psycholinguistic Assessment of Language Processing in Aphasia (PALPA; Kay, Lesser & Coltheart, 1992) were used to document the reading profile of BPWA in English and reading subtests from Reading Acquisition Profile (RAP-K; Rao, 1997) and words from Bilingual Aphasia Test - Hindi (BAT; Paradis & Libben, 1987) were used to document the reading profile of BPWA in Kannada and Hindi respectively. Findings. Based on the findings of Phase I (i.e., results from Chapters 2-4), we found prominent differences between HPE and LPE on comprehension measures (synonymy triplets and sentence comprehension tasks). This is in contrast to the results observed in monolingual adults, where semantics is less impacted by print exposure. Moreover, our predictions that HPE will result in better oral language production skills were borne out in specific conditions: semantic fluency and non-word repetition task (at word level) and higher number of words in the narrative, higher verbs per utterance and fewer repetitions (at connected speech level). 
In addition, for the non-verbal executive functions, we found no direct link between print exposure (in L2) and non-verbal executive functions in bi-literate bilinguals except for working memory (auditory N-back task). Additionally, another consistency in our findings is that there seems to be a strong link between print exposure and semantic processing in our research. The findings on the semantic tasks have been consistent across comprehension (synonymy triplets task and sentence comprehension task) and production (semantic fluency) favouring HPE. The findings from Phase II (Chapter 5) reveal differences of reading characteristics in the two languages (with different scripts) of the four BPWA. This research provides preliminary evidence that a script-related difference exists in the manifestation of dyslexia in bi-scriptal BPWA speaking a combination of alphabetic and alpha-syllabic languages. Conclusions. Our research contributes to the existing literature by highlighting the relationship between bi-literacy and language production, comprehension and non-verbal cognition where bi-literacy seems to have a higher impact on language than cognition. The contrary findings from the monolingual and child literature highlight the importance of considering the nuances of bilingual research and specifically challenge the notion that semantic comprehension is not significantly affected by literacy. In the neurologically impaired population, our research provides a comprehensive profiling of reading abilities in BPWA in the Indian population with languages having different scripts. Using this profiling and classification, we are able to affirm the findings previously found in literature emphasizing the importance of script in the assessment of reading abilities in BPWA. Such profiling and classification assist in the development of bilingual models of reading aloud and classifying different types of reading impairments.

    Incorporating Weak Statistics for Low-Resource Language Modeling

    Automatic speech recognition (ASR) requires a strong language model to guide the acoustic model and favor likely utterances. While many tasks enjoy billions of language model training tokens, many domains which require ASR do not have readily available electronic corpora. The only source of useful language modeling data is expensive and time-consuming human transcription of in-domain audio. This dissertation seeks to quickly and inexpensively improve low-resource language modeling for use in automatic speech recognition. This dissertation first considers efficient use of non-professional human labor to best improve system performance, and demonstrates that it is better to collect more data, despite higher transcription error, than to redundantly transcribe data to improve quality. In the process of developing procedures to collect such data, this work also presents an efficient rating scheme to detect poor transcribers without gold standard data. As an alternative to this process, this work generates automatic transcripts with an ASR system and explores efficiently combining these low-quality transcripts with a small amount of high-quality transcripts. Standard n-gram language models are sensitive to the quality of the highest order n-gram and are unable to exploit accurate weaker statistics. Instead, a log-linear language model is introduced, which elegantly incorporates a variety of background models through MAP adaptation. This work introduces marginal class constraints which effectively capture knowledge of transcriber error and improve performance over n-gram features. Finally, this work constrains the language modeling task to keyword search of words unseen in the training text. While overall system performance is good, these words suffer the most due to a low probability in the language model. Semi-supervised learning effectively extracts likely n-grams containing these new keywords from a large corpus of audio. 
By using a search metric that favors recall over precision, this method captures over 80% of the potential gain.
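The abstract does not say which recall-favoring search metric is used. As one illustration of what "favoring recall over precision" can mean, the sketch below computes the standard F-beta score, where a beta above 1 weights recall more heavily. The choice of F-beta and the function name `f_beta` are assumptions made for this example, not the dissertation's method.

```python
def f_beta(precision, recall, beta=2.0):
    """F-beta score: weighted harmonic mean of precision and recall.

    beta > 1 weights recall more heavily, beta < 1 weights precision more heavily.
    Used here only as an illustrative recall-favoring metric (an assumption).
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # With beta=2, a high-recall system scores higher than a high-precision one
    # even though the two have mirrored precision and recall values.
    print(f_beta(precision=0.2, recall=0.8))
    print(f_beta(precision=0.8, recall=0.2))
```

Under such a metric, a keyword search that returns many candidate n-grams, including some false alarms, is preferred over one that returns only a few confident hits.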