Mostly-Unsupervised Statistical Segmentation of Japanese Kanji Sequences

Author: Ando Rie Kubota
Lee Lillian
Publication venue: 'Cambridge University Press (CUP)'
Publication date: 10/05/2002

Given the lack of word delimiters in written Japanese, word segmentation is generally considered a crucial first step in processing Japanese texts. Typical Japanese segmentation algorithms rely either on a lexicon and syntactic analysis or on pre-segmented data; but these are labor-intensive, and the lexico-syntactic techniques are vulnerable to the unknown word problem. In contrast, we introduce a novel, more robust statistical method utilizing unsegmented training data. Despite its simplicity, the algorithm yields performance on long kanji sequences comparable to and sometimes surpassing that of state-of-the-art morphological analyzers over a variety of error metrics. The algorithm also outperforms another mostly-unsupervised statistical algorithm previously proposed for Chinese. Additionally, we present a two-level annotation scheme for Japanese to incorporate multiple segmentation granularities, and introduce two novel evaluation metrics, both based on the notion of a compatible bracket, that can account for multiple granularities simultaneously. Comment: 22 pages. To appear in Natural Language Engineering.

arXiv.org e-Print Archive

CiteSeerX

Crossref

Substring-based Machine Translation

Author: Graham Neubig
Shinsuke Mori
Tatsuya Kawahara
Taro Watanabe
Publication date: 24/04/2020

Machine translation is traditionally formulated as the transduction of strings of words from the source to the target language. As a result, additional lexical processing steps such as morphological analysis, transliteration, and tokenization are required to process the internal structure of words to help cope with data-sparsity issues that occur when simply dividing words according to white spaces. In this paper, we take a different approach: not dividing lexical processing and translation into two steps, but simply viewing translation as a single transduction between character strings in the source and target languages. In particular, we demonstrate that the key to achieving accuracies on a par with word-based translation in the character-based framework is the use of a many-to-many alignment strategy that can accurately capture correspondences between arbitrary substrings. We build on the alignment method proposed in Neubig et al. (2011), improving its efficiency and accuracy with a focus on character-based translation. Using a many-to-many aligner imbued with these improvements, we demonstrate that the traditional framework of phrase-based machine translation sees large gains in accuracy over character-based translation with more naive alignment methods, and achieves comparable results to word-based translation for two distant language pairs.

CiteSeerX

Understanding Semantic Implicit Learning through distributional linguistic patterns: A computational perspective

Author: Alikaniotis Dimitrios
Publication venue: University of Cambridge
Publication date: 27/06/2019

The research presented in this PhD dissertation provides a computational perspective on Semantic Implicit Learning (SIL). It puts forward the idea that SIL does not depend on semantic knowledge as classically conceived but upon semantic-like knowledge gained through distributional analysis of massive linguistic input. Using methods borrowed from the machine learning and artificial intelligence literature, we construct computational models which can simulate the performance observed during behavioural tasks of semantic implicit learning in a human-like way. We link this methodology to the current literature on implicit learning, arguing that this behaviour is a necessary by-product of efficient language processing. Chapter 1 introduces the computational problem posed by implicit learning in general, and semantic implicit learning in particular, as well as the computational framework used to tackle them. Chapter 2 introduces distributional semantics models as a way to learn semantic-like representations from exposure to linguistic input. Chapter 3 reports two studies on large datasets of semantic priming which seek to identify the computational model of semantic knowledge that best fits the data under conditions that resemble SIL tasks. We find that a model which acquires semantic-like knowledge gained through distributional analysis of massive linguistic input provides the best fit to the data. Chapter 4 generalises the results of the previous two studies by looking at the performance of the same models in languages other than English. Chapter 5 applies the results of the two previous chapters to eight datasets of semantic implicit learning. Crucially, these datasets use various semantic manipulations and speakers of different L1s, enabling us to test the predictions of different models of semantics. Chapter 6 examines more closely two assumptions which we have taken for granted throughout this thesis. Firstly, we test whether a simpler model based on phonological information can explain the generalisation patterns observed in the tasks. Secondly, we examine whether our definition of the computational problem in Chapter 5 is reasonable. Chapter 7 summarises and discusses the implications for implicit language learning and computational models of cognition. Furthermore, we offer one more study that seeks to bridge the literature on distributional models of semantics to 'deeper' models of semantics by learning semantic relations. There are two main contributions of this dissertation to the general field of implicit learning research. Firstly, we highlight the superiority of distributional models of semantics in modelling unconscious semantic knowledge. Secondly, we question whether 'deep' semantic knowledge is needed to achieve above-chance performance in SIL tasks. We show how a simple model that learns through distributional analysis of the patterns found in the linguistic input can match the behavioural results in different languages. Furthermore, we link these models to more general problems faced in psycholinguistics such as language processing and learning of semantic relations. Alexandros Onassis Foundation.

Apollo (Cambridge)

Language Learning as Language Use: A Cross-Linguistic Model of Child Language Development

Author: Christiansen Morten H
McCauley Stewart M
Publication venue: 'American Psychological Association (APA)'
Publication date: 01/01/2019

University of Liverpool Repository

A cross-cultural comparison of evaluation in classical concert reviews in British and Hong Kong newspapers

Author: Ha Fong Wa
Publication date: 01/01/2017

The present study investigates the rhetorical acts employed in classical concert reviews (CR) in British English and Hong Kong Chinese newspapers. It focuses on the uses of praise and criticism of different strength levels, targeting various aspects of the concert. It also explores the views of British and Hong Kong music critics on writing CRs, and factors which might affect their evaluation. This study adopted a mixed-method approach which consisted of textual analyses of CRs and semi-structured interviews with music critics. Drawing on a modified version of Hyland’s (2000) framework for evaluation in academic book reviews, 150 CRs selected from each language were examined in terms of dimensions and structural patterns of evaluation, and types of praise and criticism differentiated by their strengths. Semi-structured interviews were conducted with 14 British critics and 12 Hong Kong critics, which revealed their evaluative styles and factors that might affect their evaluation. Textual analysis results indicated more similarities than differences cross-culturally. Both groups were predominantly evaluative and contained more praise than criticism; more CRs opened and closed positively; evaluation focused primarily on performance; praise was less mitigated than criticism; Booster was the most frequently applied strategy to emphasise praise and criticism; Hedge was the predominant evaluation strategy, though each group also had their own favoured individual strategies to mitigate praise and criticism. Cross-cultural differences were observed upon more detailed examination. Chinese reviews contained more rhetorical acts while English reviews praised more. More English reviews were framed with praise. Only Chinese reviews commented on Concert Management. Interview results showed that British and Hong Kong critics shared more common than different views on evaluation. Cross-cultural differences were nevertheless observed concerning their understanding of the role of the critic and consideration for the readers. In closing, a range of implications regarding the analysis and teaching of evaluation were presented.

University of Essex Research Repository

Reservoir SMILES: Towards SensoriMotor Interaction of Language and Embodiment of Symbols with Reservoir Architectures

Author: Hinaut Xavier
Publication venue: HAL CCSD
Publication date: 16/11/2022

Language involves several hierarchical levels of abstraction. Most models focus on a particular level of abstraction, making them unable to model bottom-up and top-down processes. Moreover, we do not know how the brain grounds symbols to perceptions and how these symbols emerge throughout development. Experimental evidence suggests that perception and action shape one another (e.g. motor areas activated during speech perception), but the precise mechanisms involved in this action-perception shaping at various levels of abstraction are still largely unknown. My previous and current work includes the modelling of language comprehension, language acquisition with a robotic perspective, sensorimotor models and extended models of Reservoir Computing to model working memory and hierarchical processing. I propose to create a new generation of neural-based computational models of language processing and production; to use biologically plausible learning mechanisms relying on recurrent neural networks; to create novel sensorimotor mechanisms to account for action-perception shaping; to build hierarchical models from sensorimotor to sentence level; and to embody such models in robots.

INRIA a CCSD electronic archive server

Urban Studies

Publication venue: 'Informa UK Limited'
Publication date: 01/04/2020

This work contains a selection of papers from the International Conference on Urban Studies (ICUS 2017) and is a bi-annual periodical publication containing articles on urban cultural studies based on the international conference organized by the Faculty of Humanities at the Universitas Airlangga, Indonesia. This publication contains studies on issues that have become phenomena in urban life, including linguistics, literature, identity, gender, architecture, media, locality, globalization, the dynamics of urban society and culture, and urban history.

Directory of Open Access Books (DOAB)