1,605 research outputs found

    Improving Cross-Lingual Transfer Learning for Event Detection

    The widespread adoption of applications powered by Artificial Intelligence (AI) backbones has unquestionably changed the way we interact with the world around us. Applications such as automated personal assistants, automatic question answering, and machine-based translation systems have become mainstays of modern culture thanks to the recent considerable advances in Natural Language Processing (NLP) research. Nonetheless, with over 7000 spoken languages in the world, there still remain a considerable number of marginalized communities that are unable to benefit from these technological advancements largely due to the language they speak. Cross-Lingual Learning (CLL) looks to address this issue by transferring the knowledge acquired from a popular, high-resource source language (e.g., English, Chinese, or Spanish) to a less favored, lower-resourced target language (e.g., Urdu or Swahili). This dissertation leverages the Event Detection (ED) sub-task of Information Extraction (IE) as a testbed and presents three novel approaches that improve cross-lingual transfer learning from distinct perspectives: (1) direct knowledge transfer, (2) hybrid knowledge transfer, and (3) few-shot learning.

    Exploring Affordance and Situated Meaning in Image Captions: A Multimodal Analysis

    This paper explores the grounding issue regarding multimodal semantic representation from a computational cognitive-linguistic view. We annotate images from the Flickr30k dataset with five perceptual properties: Affordance, Perceptual Salience, Object Number, Gaze Cueing, and Ecological Niche Association (ENA), and examine their association with textual elements in the image captions. Our findings reveal that images with Gibsonian affordance show a higher frequency of captions containing 'holding-verbs' and 'container-nouns' compared to images displaying telic affordance. Perceptual Salience, Object Number, and ENA are also associated with the choice of linguistic expressions. Our study demonstrates that comprehensive understanding of objects or events requires cognitive attention, semantic nuances in language, and integration across multiple modalities. We highlight the vital importance of situated meaning and affordance grounding in natural language understanding, with the potential to advance human-like interpretation in various scenarios. Comment: 10 pages, 9 figures

    Workshop Proceedings of the 12th edition of the KONVENS conference

    The 2014 issue of KONVENS is even more a forum for exchange: its main topic is the interaction between Computational Linguistics and Information Science, and the synergies such interaction, cooperation and integrated views can produce. This topic, at the crossroads of different research traditions which deal with natural language as a container of knowledge and with methods to extract and manage knowledge that is linguistically represented, is close to the heart of many researchers at the Institut für Informationswissenschaft und Sprachtechnologie of Universität Hildesheim: it has long been one of the institute’s research topics, and it has received even more attention over the last few years.

    Advances in automatic terminology processing: methodology and applications in focus

    A thesis submitted in partial fulfilment of the requirements of the University of Wolverhampton for the degree of Doctor of Philosophy. The information and knowledge era in which we are living creates challenges in many fields, and terminology is no exception. The challenges include exponential growth in the number of available specialised documents in which terms are presented, and in the number of newly introduced concepts and terms, which are already beyond our (manual) capacity. A promising solution to this ‘information overload’ would be to employ automatic or semi-automatic procedures to enable individuals and/or small groups to efficiently build high-quality terminologies from their own resources which closely reflect their individual objectives and viewpoints. Automatic terminology processing (ATP) techniques have already proved to be quite reliable, and can save human time in terminology processing. However, they are not without weaknesses, one of which is that these techniques often consider terms to be independent lexical units satisfying some criteria, when terms are, in fact, integral parts of a coherent system (a terminology). This observation is supported by the discussion of the notion of terms and terminology and the review of existing approaches in ATP presented in this thesis. In order to overcome the aforementioned weakness, we propose a novel methodology in ATP which is able to extract a terminology as a whole. The proposed methodology is based on knowledge patterns automatically extracted from glossaries, which we consider to be valuable but overlooked resources. These automatically identified knowledge patterns are used to extract terms, their relations and descriptions from corpora. The extracted information can facilitate the construction of a terminology as a coherent system.
The study also aims to discuss applications of ATP, and describes an experiment in which ATP is integrated into a new NLP application: multiple-choice test item generation. The successful integration of the system shows that ATP is a viable technology, and should be exploited more by other NLP applications.

    The efficacy of a language intervention on the acquisition of past tense in children with Down syndrome

    Background: Individuals with Down syndrome (DS) experience difficulties with receptive and expressive grammar and specifically morphosyntax. Despite these difficulties, there have been few studies to evaluate the effectiveness of intervention and limited evidence of generalisation to untaught items. / Aim: To evaluate the efficacy of a language intervention on the acquisition of the regular simple past tense (RSPT) in children with DS aged 7-11 years and to explore whether any gains in the use of this grammatical rule will generalise. / Method: A randomised controlled trial evaluated a 10-week intervention, using explicit and implicit methods, designed for children with DS. Fifty-two children with DS aged 7-11 years were randomly allocated into two groups: 1) intervention group and 2) delayed intervention group. All children were assessed at three timepoints: preintervention (t1), after the intervention group had received the intervention (t2), and 12-14 weeks later (after the delayed intervention group had received the intervention) (t3). The intervention was delivered by trained teaching assistants (TAs) in daily 20-minute sessions. / Results: The intervention group made significantly greater gains at t2 on a composite measure of the use of the RSPT (d=1.63). These gains were maintained 12-14 weeks later at t3 when the delayed intervention group also made similar gains. The use of the RSPT generalised to untaught regular verbs. In addition, the children made errors of overregularisation on irregular verbs demonstrating they had learnt the grammatical rule. Generalisation to other tense morphemes (e.g., the third person singular) did not occur. / Conclusions: An intervention, using explicit and implicit methods, was successful in teaching children with DS to use a grammatical rule. Furthermore, the children were able to generalise this rule to untaught items. 
This provides evidence for intervention targeting morphosyntax and for the feasibility of training TAs to deliver this intervention.

    Figurative Language Detection using Deep Learning and Contextual Features

    The size of data shared over the Internet today is gigantic. A large share of it comes from postings on social networking sites such as Twitter and Facebook. Some of it also comes from online news sites such as CNN and The Onion. This type of data is very good for data analysis since it is highly personalized and specific. For years, researchers in academia and various industries have been analyzing this type of data. The purposes include product marketing, event monitoring, and trend analysis. The most common use of this type of analysis is to find out the sentiments of the public about a certain topic or product. This field is called sentiment analysis. The writers of such posts have no obligation to stick to only literal language. They also have the freedom to use figurative language in their publications. Hence, online posts can be categorized into two types: literal and figurative. Literal posts contain words or sentences that are direct or straight to the point. In contrast, figurative posts contain words, phrases, or sentences that carry different meanings than usual. This can flip the whole polarity of a given post. Because of this, figurative language can jeopardize sentiment analysis work that focuses primarily on the polarity of the posts. This makes figurative language one of the biggest problems in sentiment analysis. Hence, detecting it is crucial. However, the study of figurative language detection is non-trivial. Many existing works have attempted to detect figurative language correctly, using different methodologies. The results are impressive but can still be improved. This thesis offers a new way to solve this problem. There are essentially seven commonly used figurative language categories: sarcasm, metaphor, satire, irony, simile, humor, and hyperbole. This thesis focuses on three categories.
The thesis aims to understand the contextual meaning behind the three figurative language categories, using a combination of a deep learning architecture with manually extracted features, and to explore the use of well-known machine learning classifiers for the detection tasks. In the process, it also aims to rank the features in descending order of importance. The deep learning architecture used in this work is a Convolutional Neural Network, which is combined with manually extracted features that are carefully chosen based on the literature and an understanding of each figurative language category. The findings of this work show improvement in the evaluation metrics when compared to existing works in the same domain. This holds for all of the figurative language categories, supporting the quality of the framework.

    Phonological and phonetic factors affecting the early consonantal development in Setswana

    This dissertation focuses on the phonological and phonetic development of three typically developing children, aged between 1;10 and 3;02, who are learning Setswana as their first language. We provide a detailed analysis of these children’s early speech development patterns, with a primary focus on the potential origins of these patterns. The aim is not to provide normative data, but to understand early patterns of phonological development in Setswana, whose acquisition by young children is relatively under-documented within the literature. Our data display the following trends: (1) early acquisition of obstruent stops, nasals, and N̩C sequences; (2) production of fricatives through various substitution patterns (e.g. stopping, affrication as well as debuccalization); (3) simplification of target affricates (e.g. deaffrication, deaspiration and delabialization). Non-lateral affricates also yielded fewer errors (and earlier mastery) than their lateral counterparts, whose production displayed patterns of delateralization and velarization to velar [k], in addition to deaffrication. The target approximants |j, w| and |l, r| were generally acquired early, with the exception of the rhotic |r|, whose production was the most variable of all consonants documented in this study, also characterized by the lowest accuracy rates for all the children. We analyze these phenomena through current models of phonological emergence (MacWhinney 2015), as conceived within the area of phonology through the A-map model (McAllister Byun, Inkelas & Rose 2016). We highlight how the substitution patterns observed in the data can be captured through a consideration of the auditory properties of the target speech sounds, combined with an understanding of the types of articulatory gestures involved in the production of these sounds.
These considerations in turn highlight some of the most central aspects of the challenges faced by the child in learning these auditory-articulatory mappings. Beyond theoretical issues, this dissertation sets an initial foundation for developing speech-language pathology materials and services for Setswana-learning children, an emerging area of public service in Botswana.

    A computer-assisted approach to the comparison of mainland southeast Asian languages

    This cumulative thesis comprises three separate projects that use a computer-assisted language comparison (CALC) framework to address common obstacles to studying the history of Mainland Southeast Asian (MSEA) languages, such as sparse and non-standardized lexical data and an inadequate method of cognate judgments, and to provide caveats to scholars who will use Bayesian phylogenetic analysis. The first project provides a format that standardizes the sound inventories, regulates language labels, and clarifies lexical items. This standardized format allows us to merge various forms of raw data. The format also summarizes information to assist linguists in researching the relatedness among words and inferring relationships among languages. The second project focuses on increasing the transparency of lexical data and cognate judgments with regard to compound words. The method enables the annotation of each part of a word with semantic meanings and syntactic features. In addition, four different conversion methods were developed to convert morpheme cognates into word cognates for input into the Bayesian phylogenetic analysis. The third project applies the methods used in the first project to create a workflow by merging linguistic data sets and inferring a language tree using a Bayesian phylogenetic algorithm. Furthermore, the project addresses the importance of integrating cross-disciplinary studies into historical linguistic research. Finally, the methods we proposed for managing lexical data for MSEA languages are discussed and summarized from six perspectives. The work can be seen as a milestone in reconstructing human prehistory in an area that has high linguistic and cultural diversity.