    The Grammar of Articles use in Mozambican Portuguese-accented English

    English was introduced into formal education in Mozambique in the late 1990s, in eighth grade. It is an important language in Mozambique because it gives access to new technologies. It is also crucial for social and economic mobility. Proficiency in English is demonstrated mostly in writing. For this reason, article misuse is taken seriously. Lack of mastery of articles can affect learners' academic progress in entrance exams, where a score of at least 50% is needed to progress to the next level. This is the main reason why this thesis is devoted to article usage. A total of 64 questionnaires about article usage were administered to 34 college students and 30 high school students. A total of 7808 tokens of article usage with both noun phrases and acronyms were collected: 1792 on the indefinite article <a>, 1536 on <an>, 3584 on the definite <the>, and 896 on instances where no article is required. Independent-samples t-tests with an alpha level of p < .05 revealed statistically significant differences in article misuse, except with the indefinite article <an> (t = .336, df = 62, p = .738). Overall, high school students slightly outperformed college students. However, the differences in the level of accuracy were less than 5% in all four categories under analysis. These findings have important pedagogical implications regarding the best ways to teach articles to Mozambican students. Throughout the thesis, explanations for these errors and ways to improve students' mastery of English are discussed.
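
    The figures reported in the abstract above can be cross-checked with a few lines of arithmetic. The sketch below only restates the published counts; the per-respondent breakdown is a derived assumption (it presumes every respondent answered the same number of items per category, which the abstract does not state), and the variable names are illustrative.

```python
# Token counts per article category, as reported in the abstract.
tokens = {"a": 1792, "an": 1536, "the": 3584, "no_article": 896}

# Group sizes, as reported in the abstract.
participants = {"college": 34, "high_school": 30}

total_tokens = sum(tokens.values())
n = sum(participants.values())

# Degrees of freedom for an independent-samples t-test: n1 + n2 - 2.
df = n - 2

# ASSUMPTION (not stated in the abstract): each respondent answered the
# same number of items in every category, so the counts divide evenly.
per_respondent = {category: count // n for category, count in tokens.items()}
divides_evenly = all(count % n == 0 for count in tokens.values())

print(total_tokens, n, df)
print(per_respondent, divides_evenly)
```

    The category counts sum to the reported 7808 tokens, and the two group sizes give the reported df = 62.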

    Data-driven machine translation for sign languages

    This thesis explores the application of data-driven machine translation (MT) to sign languages (SLs). The provision of an SL MT system can facilitate communication between Deaf and hearing people by translating information into the native and preferred language of the individual. We begin with an introduction to SLs, focussing on Irish Sign Language, the native language of the Deaf in Ireland. We describe their linguistics and mechanics, including similarities to and differences from spoken languages. Given the lack of a formalised written form of these languages, we outline annotation formats and discuss the issue of data collection. We summarise previous approaches to SL MT, highlighting the pros and cons of each approach. We then discuss initial experiments in the novel area of example-based MT for SLs and give an overview of the problems that arise when automatically translating these manual-visual languages. Following this, we detail our data-driven approach, examining the MT system used and the modifications made for the treatment of SLs and their annotation. Through sets of automatically evaluated experiments in both language directions, we consider the merits of data-driven MT for SLs and outline the mainstream evaluation metrics used. To complete the translation into SLs, we discuss the addition and manual evaluation of a signing avatar for real SL output.

    Polarity-item "anything" in L3 English : Where does transfer come from when the L1 is Catalan and the L2 is Spanish?

    This study explores the source of transfer in third language (L3) English by two distinct groups of Catalan–Spanish bilinguals, simultaneous bilinguals and late bilinguals. Our study addresses two research questions: (1) Does transfer come from the first language (L1), the second language (L2), or both? and (2) Does age of acquisition of the L2 affect how transfer occurs? We compare beginner and advanced English speakers from both L3 groups with beginner and advanced L1-Spanish L2-English speakers, and find that, on an acceptability judgment task that investigates knowledge of the distribution of the polarity item "anything", the two L3 groups demonstrate a different response pattern from the L2 group. The results suggest that both L3 groups transfer from Catalan, and not from their L2, Spanish. Additionally, the cross-sectional nature of the study shows that negative transfer from the initial stages of acquisition is overcome to different extents by the L3 vs. the L2 groups. We conclude that the results show strong evidence against the L2 status factor (Bardel and Falk, 2007, 2012) and the cumulative enhancement (Flynn et al., 2004) models of L3 acquisition, while they can be accounted for by the typological primacy model (Rothman, 2010, 2011, 2015), although other models that predict L1 transfer in L3 acquisition are not ruled out. Further, our findings show no effect of age of acquisition of the L2 on L3 development.

    Computational acquisition of knowledge in small-data environments: a case study in the field of energetics

    The UK’s defence industry is accelerating its implementation of artificial intelligence, including expert systems and natural language processing (NLP) tools designed to supplement human analysis. This thesis examines the limitations of NLP tools in small-data environments (common in defence) in the defence-related energetic-materials domain. A literature review identifies the domain-specific challenges of developing an expert system (specifically an ontology). The absence of domain resources such as labelled datasets and, most significantly, the preprocessing of text resources are identified as challenges. To address the latter, a novel general-purpose preprocessing pipeline specifically tailored for the energetic-materials domain is developed. The effectiveness of the pipeline is evaluated. The use of NLP tools in data-limited environments, either to supplement human analysis or to replace it completely, is then examined in a study of the subjective concept of importance. A methodology for directly comparing the ability of NLP tools and experts to identify important points in the text is presented. Results show that the participants of the study exhibit little agreement, even on which points in the text are important. The NLP tools, the expert (the author of the text being examined) and the participants agree only on general statements. However, as a group, the participants agreed with the expert. In data-limited environments, the extractive-summarisation tools examined cannot identify the important points in a technical document as effectively as an expert. A methodology for the classification of journal articles by the technology readiness level (TRL) of the described technologies in a data-limited environment is proposed. Techniques to overcome challenges with using real-world data, such as class imbalances, are investigated. A methodology to evaluate the reliability of human annotations is presented.
Analysis identifies a lack of agreement and consistency in the expert evaluation of document TRL.

    A generalised alignment template formalism and its application to the inference of shallow-transfer machine translation rules from scarce bilingual corpora

    Statistical and rule-based methods are complementary approaches to machine translation (MT) that have different strengths and weaknesses. This complementarity has, over the last few years, resulted in the consolidation of a growing interest in hybrid systems that combine both data-driven and linguistic approaches. In this paper, we address the situation in which the amount of bilingual resources that is available for a particular language pair is not sufficiently large to train a competitive statistical MT system, but the cost and slow development cycles of rule-based MT systems cannot be afforded either. In this context, we formalise a new method that uses scarce parallel corpora to automatically infer a set of shallow-transfer rules to be integrated into a rule-based MT system, thus avoiding the need for human experts to handcraft these rules. Our work is based on the alignment template approach to phrase-based statistical MT, but the definition of the alignment template is extended to encompass different generalisation levels. It is also greatly inspired by the work of Sánchez-Martínez and Forcada (2009) in which alignment templates were also considered for shallow-transfer rule inference. However, our approach overcomes many relevant limitations of that work, principally those related to the inability to find the correct generalisation level for the alignment templates, and to select the subset of alignment templates that ensures an adequate segmentation of the input sentences by the rules eventually obtained. Unlike previous approaches in literature, our formalism does not require linguistic knowledge about the languages involved in the translation. Moreover, it is the first time that conflicts between rules are resolved by choosing the most appropriate ones according to a global minimisation function rather than proceeding in a pairwise greedy fashion. 
Experiments conducted using five different language pairs with the free/open-source rule-based MT platform Apertium show that translation quality significantly improves when compared to the method proposed by Sánchez-Martínez and Forcada (2009), and is close to that obtained using handcrafted rules. For some language pairs, our approach is even able to outperform them. Moreover, the resulting number of rules is considerably smaller, which eases human revision and maintenance.
Research funded by Universitat d’Alacant through project GRE11-20, by the Spanish Ministry of Economy and Competitiveness through projects TIN2009-14009-C02-01 and TIN2012-32615, by Generalitat Valenciana through grant ACIF/2010/174, and by the European Union Seventh Framework Programme FP7/2007-2013 under grant agreement PIAP-GA-2012-324414 (Abu-MaTran).

    Resource Generation from Structured Documents for Low-density Languages

    The availability and use of electronic resources for both manual and automated language-related processing has increased tremendously in recent years. Nevertheless, many resources still exist only in printed form, restricting their availability and use. This especially holds true for low-density languages, or languages with limited electronic resources. For these documents, automated conversion into electronic resources is highly desirable. This thesis focuses on the semi-automated conversion of printed structured documents (dictionaries in particular) to usable electronic representations. In the first part we present an entry tagging system that recognizes, parses, and tags the entries of a printed dictionary to reproduce the representation. The system uses the consistent layout and structure of the dictionaries, and the features that impose this structure, to capture and recover lexicographic information. We accomplish this by adapting two methods: rule-based and HMM-based. The system is designed to produce results quickly with minimal human assistance and reasonable accuracy. The use of adaptive transformation-based learning as a post-processor at two points in the system yields significant improvements, even with an extremely small amount of user-provided training data. The second part of this thesis presents Morphology Induction from Noisy Data (MIND), a natural language morphology discovery framework that operates on information from limited, noisy data obtained from the conversion process. To use the resulting resources effectively, however, users must be able to search for them using the root form of the morphologically deformed variants found in the text. Stemming and data-driven methods are not suitable when data are sparse. The approach is based on the novel application of string searching algorithms.
The evaluations show that MIND can segment words into roots and affixes from the noisy, limited data contained in a dictionary, and it can extract prefixes, suffixes, circumfixes, and infixes. MIND can also identify morphophonemic changes, i.e., phonemic variations between allomorphs of a morpheme, specifically point-of-affixation stem changes. This, in turn, allows non-native speakers to perform multilingual tasks for applications where response must be rapid, and they have limited knowledge. In addition, this analysis can feed other natural language processing tools requiring lexicons

    A Question of Language Vitality? On Interrogatives in an Endangered Creole

    This dissertation investigates the endangered French-lexifier Creole language of Louisiana, Louisiana Creole (LC). The empirical study contains a sociolinguistic and a structural component. The sociolinguistic part reveals a decline of language proficiency across the generations, a lack of intergenerational language transmission, language attrition on the individual level, and the reduction of the functional domains in which LC is used. It is shown that use of LC is restricted to private casual domains and specific interlocutors. The linguistic focus in this study is on wh-questions as a cross-linguistically well-researched and deeply structural area. Here, a number of recent changes, which are related to the endangerment of LC, and newly documented structures are found. First, the loss of forms and structures that is typically associated with language endangerment is observable in the reduction of complex interrogative pronouns and the loss of the grammatical distinctions between subject and object forms. Second, a large amount of free variation persists in the interrogative system with regard to wh-expressions, where new lexemes appear and existing items occur in new functions, and even concerning structures. In particular, speakers show varying degrees of acceptance for structures that are not traditionally grammatical in LC, such as wh-scope marking, wh in-situ, and island violations. This structural variation is independent of socio-demographic variables and functional factors only play a minor role. The most relevant variation is present between individual speakers, indicating considerable, unsystematic differences between I-grammars as are characteristic of endangered languages. They represent structural effects of language endangerment on a deep syntactic level that is generally thought to be very resistant to language change. 
Third, a previously undocumented type of long-distance question appears which is analyzed as a wh-copying construction in support of the copy-theory of movement