

Proceedings of the 22nd Amsterdam Colloquium

Publication venue: ILLC, UvA
Publication date: 01/01/2019

International Migration, Integration and Social Cohesion online publications

A Survey on Semantic Processing Techniques

Authors: Cambria Erik; Chen Guanyi; He Kai; Mao Rui; Ni Jinjie; Yang Zonglin; Zhang Xulang
Publication date: 22/10/2023

Semantic processing is a fundamental research domain in computational linguistics. In the era of powerful pre-trained language models and large language models, the advancement of research in this domain appears to be decelerating. However, the study of semantics is multi-dimensional in linguistics. The research depth and breadth of computational semantic processing can be greatly improved with new technologies. In this survey, we analyze five semantic processing tasks, i.e., word sense disambiguation, anaphora resolution, named entity recognition, concept extraction, and subjectivity detection. We study relevant theoretical research in these fields, advanced methods, and downstream applications. We connect the surveyed tasks with downstream applications because this may inspire future scholars to fuse these low-level semantic processing tasks with high-level natural language processing tasks. The review of theoretical research may also inspire new tasks and technologies in the semantic processing domain. Finally, we compare the different semantic processing techniques and summarize their technical trends, application trends, and future directions. Comment: Published in Information Fusion, Volume 101, 2024, 101988, ISSN 1566-2535. The equal contribution mark is missing from the published version due to the publication policies. Please contact Prof. Erik Cambria for details.

arXiv.org e-Print Archive


Making Worlds Accessible. Essays in Honor of Angelika Kratzer

Authors: Bhatt Rajesh; Frana Ilaria; Menéndez-Benito Paula
Publication venue: ScholarWorks@UMass Amherst
Publication date: 01/01/2020

Every linguist knows how colossal Angelika’s impact on our field is. Hearing about this would not be informative for anybody who might (virtually) pick up this volume, including Angelika herself. So, instead of writing about, say, Angelika’s crucial role in the development of our understanding of modality, we will write about what Angelika means to us, as a teacher, advisor, mentor, colleague, and friend. We know that these words will resonate with many of you (Angelika has meant so much to so many people). We just get to be the lucky ones to tell Angelika publicly.

ScholarWorks@UMass Amherst

Spoken content retrieval: A survey of techniques and technologies

Authors: Ani Nenkova; Kathleen McKeown
Publication venue: Now Publishers
Publication date: 01/01/2012

Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR, encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition, and user interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight into how these fields are integrated to support research and development, thus addressing the core challenges of SCR.

CiteSeerX

Crossref

Irish Universities

DCU Online Research Access Service

Anaphora resolution for Arabic machine translation: a case study of nafs

Author: Hamouda Wafya
Publication venue: Newcastle University
Publication date: 01/01/2014

PhD Thesis. In the age of the internet, email, and social media, there is an increasing need for processing online information, for example, to support education and business. This has led to the rapid development of natural language processing technologies such as computational linguistics, information retrieval, and data mining. As a branch of computational linguistics, anaphora resolution has attracted much interest. This is reflected in the large number of papers on the topic published in journals such as Computational Linguistics. Mitkov (2002) and Ji et al. (2005) have argued that the overall quality of anaphora resolution systems remains low, despite practical advances in the area, and that major challenges include dealing with real-world knowledge and accurate parsing. This thesis investigates the following research question: can an algorithm be found for the resolution of the anaphor nafs in Arabic text which is accurate to at least 90%, scales linearly with text size, and requires a minimum of knowledge resources? A resolution algorithm intended to satisfy these criteria is proposed. Testing on a corpus of contemporary Arabic shows that it does indeed satisfy the criteria.
Egyptian Government

Newcastle University eTheses

The acquisition and use of Mandarin relative clauses by monolingual and bilingual children and adults

Author: Zhang Shijie
Publication venue: Lancaster University
Publication date: 01/01/2022

Children have been found to understand and use relative clauses (RCs) at an early age. However, not all types of RCs are acquired at the same time, nor are they used with the same frequency (e.g., Diessel & Tomasello, 2000, 2005). Using corpus-based and experimental methodologies, the three studies presented in this thesis investigate the acquisition and processing of different types of RCs in Mandarin, aiming to understand the mechanisms involved in the acquisition and processing of RCs of varying degrees of complexity. The first study (Chapter 3) presents a corpus analysis examining the naturalistic production of Mandarin RCs by Mandarin-speaking monolingual and heritage Mandarin-English bilingual children (1;00-5;00). The results show that both monolingual and bilingual children produce more object RCs than subject RCs in Mandarin. This is because Mandarin object RCs resemble simple Subject-Verb-Object (SVO) sentences the children had previously acquired, and occur more frequently than subject RCs in their input. Compared to monolingual children, bilingual children produce more object RCs, suggesting that the acquisition of Mandarin RCs is facilitated not only by SVO transitives in Mandarin, but also by SVO transitives in English. In contrast to the first study, the second study (Chapter 4) reports a subject RC advantage by looking at the comprehension of Mandarin subject and object RCs in heritage Mandarin-English bilingual children (4;00-10;11) and their vocabulary-matched monolingual peers (4;00-5;09). Using a character-sentence matching task, the results reveal that simple SVO transitives hinder children’s comprehension of Mandarin object RCs by misleading them into interpreting the noun phrase occurring first as the head noun.
Compared to monolingual children, bilingual children who are more English-dominant make this type of error more frequently for Mandarin object RCs, suggesting that both English SVO transitives and language dominance contribute to cross-linguistic influence. However, unlike the subject or object RC advantage shown in children, mixed results are found in the writing of adult Mandarin native speakers (L1) and advanced second language learners (L2) in the third study (Chapter 5). Using conditional inference trees and random forests, the results show that both adult Mandarin L1 and L2 speakers’ selection of subject and object RCs heavily depends on the discourse context in which the RCs are situated. The first and second studies (Chapters 3 and 4) are novel in taking Mandarin RCs with omitted head nouns into account. In spontaneous speech (Chapter 3), the results indicate that monolingual and bilingual children as young as two can produce Mandarin RCs with omitted head nouns, and that the omission of a head noun does not influence the subject-object asymmetry. Similarly, the absence of a head noun does not influence monolingual and bilingual children’s comprehension of Mandarin RCs (Chapter 4), suggesting that they are able to recover omitted head nouns from the context provided. In addition, the first and third studies (Chapters 3 and 5) examine the matrix-clause positions in which Mandarin RCs tend to occur. RCs that occur in the non-centre-embedded matrix-clause position (e.g., The goat saw the horse [that hugged the pig]) are expected to be easier to process than RCs in the centre-embedded matrix-clause position (e.g., The horse [that hugged the pig] saw the goat), as they impose a lower working memory load (e.g., Gibson, 1998, 2000). Supporting this assumption, in adult Mandarin L1 and L2 speakers’ writing (Chapter 5), non-centre-embedded RCs occur more often than centre-embedded RCs.
Moreover, the longer the RCs, the more likely they are to be placed in the non-centre-embedded matrix-clause position. However, in children’s spontaneous speech (Chapter 3), neither monolingual nor bilingual children show a preference for non-centre-embedded over centre-embedded RCs, which may relate to the short length of the RCs they produce. The shorter the RCs, the less memory load is needed to process centre-embedded RCs, and therefore the disadvantage of centre-embedded RCs may diminish. The three studies of this thesis present mixed findings regarding Mandarin RC processing, but consistently provide evidence to support the usage-based account. That is, the processing of RCs is shaped by an individual’s age and language experience, including input frequency, the related structures that have been acquired, language dominance, and the discourse contexts in which RCs tend to appear.

Lancaster E-Prints

The evolution of language: Proceedings of the Joint Conference on Language Evolution (JCoLE)

Publication venue: Joint Conference on Language Evolution (JCoLE)
Publication date: 01/01/2022

MPG.PuRe

The future of dialects: Selected papers from Methods in Dialectology XV

Traditional dialects have been encroached upon by the increasing mobility of their speakers and by the onslaught of national languages in education and mass media. Typically, older dialects are “leveling” to become more like national languages. This is regrettable when the last articulate traces of a culture are lost, but it also promotes complex dynamics of interaction as speakers shift from dialect to standard, and to intermediate compromises between the two, in their forms of speech. Varieties of speech thus live on in modern communities, where they still function to mark provenance, but increasingly cultural and social provenance as opposed to pure geography. They arise at times from the need to function throughout the different groups in society, but they may also have roots in immigrants’ speech, and they just as certainly arise from the ineluctable dynamics of groups wishing to express their identity to themselves and to the world. The future of dialects is a selection of the papers presented at Methods in Dialectology XV, held in Groningen, the Netherlands, 11-15 August 2014. While the focus is on methodology, the volume also includes specialized studies on varieties of Catalan, Breton, Croatian, (Belgian) Dutch, English (in the US, the UK and in Japan), German (including Swiss German), Italian (including Tyrolean Italian), Japanese, and Spanish, as well as on heritage languages in Canada.

Language Science Press