Undergraduates' interest towards learning genetics concepts through integrated STEM problem based learning approach
A scientific and innovative society can be produced by prioritizing Science, Technology, Engineering, and Mathematics (STEM), as emphasized by the Malaysian Higher Education Blueprint (2015-2025). STEM needs to be implemented in higher education because universities need to produce competent graduates to support economic growth and sustainable development. Incorporating problem-based instruction into STEM, through teamwork and problem-solving techniques, might make first-year undergraduates more enthusiastic and engage them fully with their learning. This study was conducted to investigate whether an Integrated STEM Problem Based Learning module could enhance and retain interest in genetics concepts among first-year undergraduates. Topics in genetics are considered difficult not only to teach but also to learn. In this research, to overcome difficulties in learning genetics concepts, genetics-related topics were chosen to introduce STEM through a problem-based learning approach, which might help first-year undergraduates acquire deep genetics content knowledge. This is vital for first-year undergraduates, as the knowledge gained in their first semester will be applied in upcoming courses throughout their undergraduate programs of study. A pre-experimental research design with a one-group posttest design was applied. A total of 50 first-year undergraduates from the Faculty of Biology at a public university in Malaysia participated. The Genetics Interest Questionnaire was used to examine whether the STEM Problem Based Learning module could enhance and retain interest in genetics concepts. The research showed that the Integrated STEM problem-based learning approach could enhance and retain interest in learning genetics concepts among first-year undergraduates.
Quantifying the Dialect Gap and its Correlates Across Languages
Historically, researchers and consumers have noticed a decrease in quality
when applying NLP tools to minority variants of languages (e.g., Puerto Rican
Spanish or Swiss German), but studies exploring this have been limited to a
select few languages. Additionally, past studies have mainly been conducted in
a monolingual context, so cross-linguistic trends have not been identified and
tied to external factors. In this work, we conduct a comprehensive evaluation
of the most influential, state-of-the-art large language models (LLMs) across
two high-use applications, machine translation and automatic speech
recognition, to assess their functionality on the regional dialects of several
high- and low-resource languages. Additionally, we analyze how the regional
dialect gap is correlated with economic, social, and linguistic factors. The
impact of training data, including related factors like dataset size and its
construction procedure, is shown to be significant but not consistent across
models or languages, meaning a one-size-fits-all approach cannot be taken in
solving the dialect gap. This work will lay the foundation for furthering the
field of dialectal NLP by laying out evident disparities and identifying
possible pathways for addressing them through mindful data collection.
Comment: Accepted to EMNLP Findings 202
Machine Translation of Arabic Dialects
This thesis discusses different approaches to machine translation (MT) from Dialectal Arabic (DA) to English. These approaches handle the varying stages of Arabic dialects in terms of types of available resources and amounts of training data. The overall theme of this work revolves around building dialectal resources and MT systems or enriching existing ones using the currently available resources (dialectal or standard) in order to quickly and cheaply scale to more dialects without the need to spend years and millions of dollars to create such resources for every dialect.
Unlike for Modern Standard Arabic (MSA), DA-English parallel corpora are scarce and available for only a few dialects. Dialects differ from each other and from MSA in orthography, morphology, phonology, and, to a lesser degree, syntax. This means that combining all available parallel data, from dialects and MSA, to train DA-to-English statistical machine translation (SMT) systems might not provide the desired results. Similarly, translating dialectal sentences with an SMT system trained on that dialect only is also challenging because several factors cause the word choices in a test sentence to differ from those in the SMT training data. Such factors include the level of dialectness (e.g., code switching to MSA versus dialectal training data), topic (sports versus politics), genre (tweets versus newspaper), script (Arabizi versus Arabic), and the timespan of the test data relative to the training data. The work we present utilizes any available Arabic resource, such as a preprocessing tool or a parallel corpus, whether MSA or DA, to improve DA-to-English translation and expand to more dialects and sub-dialects.
The majority of Arabic dialects have no parallel data to English or to any other foreign language. They also have no preprocessing tools such as normalizers, morphological analyzers, or tokenizers. For such dialects, we present an MSA-pivoting approach where DA sentences are translated to MSA first, then the MSA output is translated to English using the wealth of MSA-English parallel data. Since there is virtually no DA-MSA parallel data to train an SMT system, we build a rule-based DA-to-MSA MT system, ELISSA, that uses morpho-syntactic translation rules along with dialect identification and language modeling components. We also present a rule-based approach to quickly and cheaply build a dialectal morphological analyzer, ADAM, which provides ELISSA with dialectal word analyses.
Other Arabic dialects have relatively small amounts of DA-English parallel data, amounting to a few million words on the DA side. Some of these dialects have dialect-dependent preprocessing tools that can be used to prepare the DA data for SMT systems. We present techniques to generate synthetic parallel data from the available DA-English and MSA-English data. We use this synthetic data to build statistical and hybrid versions of ELISSA as well as to improve our rule-based ELISSA-based MSA-pivoting approach. We evaluate our best MSA-pivoting MT pipeline against three direct SMT baselines trained on these three parallel corpora: DA-English data only, MSA-English data only, and the combination of DA-English and MSA-English data. Furthermore, we leverage these four MT systems (the three baselines along with our MSA-pivoting system) in two system combination approaches that benefit from their strengths while avoiding their weaknesses.
Finally, we propose an approach to model dialects from monolingual data and limited DA-English parallel data without the need for any language-dependent preprocessing tools. We learn DA preprocessing rules using word embeddings and expectation maximization. We test this approach by building a morphological segmentation system, and we evaluate its performance on MT against the state-of-the-art dialectal tokenization tool.
Is inverted diglossia coming to Wales? Domain use and language attitudes among Welsh-speaking youth.
Computational Language Assessment in patients with speech, language, and communication impairments
Speech, language, and communication symptoms enable the early detection,
diagnosis, treatment planning, and monitoring of neurocognitive disease
progression. Nevertheless, traditional manual neurologic assessment, the speech
and language evaluation standard, is time-consuming and resource-intensive for
clinicians. We argue that Computational Language Assessment (C.L.A.) is an
improvement over conventional manual neurological assessment. Using machine
learning, natural language processing, and signal processing, C.L.A. i. provides
a neuro-cognitive evaluation of speech, language, and communication in elderly
individuals and individuals at high risk for dementia; ii. facilitates the diagnosis,
prognosis, and therapy efficacy in at-risk and language-impaired populations;
and iii. allows easier extensibility to assess patients from a wide range of
languages. Also, C.L.A. employs Artificial Intelligence models to inform theory
on the relationship between language symptoms and their neural bases. It
significantly advances our ability to optimize the prevention and treatment of
elderly individuals with communication disorders, allowing them to age
gracefully with social engagement.
Comment: 36 pages, 2 figures, to be submite
Perspective Identification in Informal Text
This dissertation studies the problem of identifying the ideological perspective of people as expressed in their written text. One's perspective is often expressed in his/her stance towards polarizing topics. We are interested in studying how nuanced linguistic cues can be used to identify the perspective of a person in informal genres. Moreover, we are interested in exploring the problem from a multilingual perspective, comparing and contrasting the linguistic devices used in English informal-genre datasets discussing American ideological issues and in Arabic discussion fora posts related to Egyptian politics.
Our first and foremost goal is building computational systems that can successfully identify the perspective from which a given informal text is written, while studying which linguistic cues work best for each language and drawing insights into the similarities and differences between the notion of perspective in the two studied languages. We build computational systems that can successfully identify the stance of a person in English informal text that deals with different topics that are determined by one's perspective, such as legalization of abortion, the feminist movement, and gay and gun rights; additionally, we are able to identify a more general notion of perspective (namely, the 2012 choice of presidential candidate), as well as build systems for automatically identifying different elements of a person's perspective given an Egyptian discussion forum comment. The systems utilize several lexical and semantic features for both languages. Specifically, for English we explore the use of word sense disambiguation, opinion features, and latent and frame semantics, as well as Linguistic Inquiry and Word Count features; in Arabic, however, in addition to using sentiment and latent semantics, we study whether linguistic code-switching (LCS) between the standard and dialectal forms of the language can help as a cue for uncovering the perspective from which a comment was written.
This leads us to the challenge of devising computational systems that can handle LCS in Arabic. The Arabic language has a diglossic nature where the standard form of the language (MSA) coexists with the regional dialects (DA) corresponding to the native mother tongue of Arabic speakers in different parts of the Arab world. DA is ubiquitously prevalent in written informal genres and in most cases it is code-switched with MSA. The presence of code-switching degrades the performance of almost any MSA-only trained Natural Language Processing tool when applied to DA or to code-switched MSA-DA content. In order to solve this challenge, we build a state-of-the-art system, AIDA, to computationally handle token and sentence-level code-switching.
On a conceptual level, for handling and processing Egyptian ideological perspectives, we note the lack of a taxonomy for the most common perspectives among Egyptians and the lack of corresponding annotated corpora. In solving this challenge, we develop a taxonomy for the most common community perspectives among Egyptians and use an iterative feedback-loop process to devise guidelines on how to successfully annotate a given online discussion forum post with different elements of a person's perspective. Using the proposed taxonomy and annotation guidelines, we annotate a large set of Egyptian discussion fora posts to identify a comment's perspective as conveyed in the priority expressed by the comment, as well as the stance on major political entities.
English speakers' common orthographic errors in Arabic as L2 writing system : an analytical case study
PhD Thesis. Research on the Arabic Writing System (WS) is quite limited, and research on writing errors in L2WS Arabic against a particular L1WS is relatively neglected. This study attempts to identify, describe, and explain common orthographic errors in Arabic writing amongst English-speaking learners. First, it outlines the Arabic Writing System's (AWS) characteristics and available empirical studies of L2WS Arabic. This study embraced the Error Analysis approach, utilising a mixed-method design that deployed quantitative and qualitative tools (writing tests, questionnaire, and interview). The data were collected from several institutions around the UK, which collectively accounted for 82 questionnaire responses, 120 different writing samples from 44 intermediate learners, and six teacher interviews. The hypotheses for this research were: a) English-speaking learners of Arabic make common orthographic errors similar to those of Arabic native speakers; b) English-speaking learners share several common orthographic errors with other learners of Arabic as a second/foreign language (AFL); and c) English-speaking learners of Arabic produce their own common orthographic errors which are specifically related to the differences between the two WSs. The results confirmed all three hypotheses. Specifically, English-speaking learners of L2WS Arabic commonly made six error types: letter ductus (letter shape), orthography (spelling), phonology, letter dots, allographemes (i.e. letterform), and direction. Gemination and L1WS transfer error rates were not found to be major.
Another important result showed that five letter groups, in addition to two individual letters, are particularly challenging for English-speaking learners. Study results indicated that errors were likely to be caused by one of four factors: script confusion, orthographic difficulties, phonological realisation, and teaching/learning strategies. These results are generalizable as the data were collected from several institutions in different parts of the UK. Suggestions and implications as well as recommendations for further research are outlined accordingly in the conclusion chapter.
Linguistic and Cognitive Measures in Arabic-Speaking English Language Learners (ELLs) and monolingual children with and without Developmental Language Disorder (DLD)
Understanding the current level of language knowledge in English Language Learners (ELLs) can present a challenge. The standardized language tests that are commonly used to assess language tap prior knowledge and experience. ELLs may score poorly on such "knowledge-based" measures because of the low levels of exposure to each of their languages. Considerable overlap has been found on several knowledge-based measures (Paradis, 2010) between ELLs and monolingual children with an unexpected delay in language development known as Developmental Language Disorder (DLD). Measures of cognitive processing, on the other hand, are less dependent on ELLs' linguistic knowledge because they employ nonlinguistic or novel stimuli to tap skills considered to underlie language learning. It has been suggested that processing-dependent tasks such as measures of verbal short-term memory may differentiate ELLs from children with DLD (Kohnert, Windsor, & Yim, 2006; Paradis, Schneider, & Duncan, 2013). This thesis presents three studies that investigated the performance of Arabic-speaking ELLs and monolingual children with and without DLD on linguistic and cognitive measures. Study 1 provided a description of the performance of monolingual Arabic-speaking children on a battery of Arabic language tests. The results of study 1 revealed that the majority of language measures were sensitive to developmental change in younger children between the ages of 6 and 7. Study 2 demonstrated lower standardized scores by ELLs on the Arabic and English knowledge-based language tasks. However, ELLs scored above or at age-level expectations on the cognitive measures, with the exception of an Arabic-nonword repetition task. Study 3 found a significant overlap between ELLs and monolingual Arabic-speaking children with DLD on first language (L1) knowledge-based measures.
With the exception of the Arabic nonword repetition task, verbal short-term and working memory tasks distinguished ELLs from children with underlying language impairment. The results indicated that there is a need to develop language assessment measures that evaluate a broad range of language abilities for Arabic-speaking children. The findings also suggested that, unlike knowledge-based measures, cognitive measures may be valid assessment tools that minimize the role of linguistic knowledge and experiences and help distinguish between ELLs and children with DLD.