An Approach for Chinese-Japanese Named Entity Equivalents Extraction Using Inductive Learning and Hanzi-Kanji Mapping Table
Named Entity translation equivalents extraction plays a critical role in machine translation (MT) and cross-language information retrieval (CLIR). Traditional methods are often based on large-scale parallel or comparable corpora. However, the applicability of these studies is constrained, mainly by the scarcity of parallel corpora of the required scale, especially for the Chinese-Japanese language pair. In this paper, we propose a method that exploits the characteristics of Chinese and Japanese to automatically extract Chinese-Japanese Named Entity (NE) translation equivalents from monolingual corpora based on inductive learning (IL). The method adopts the Chinese Hanzi and Japanese Kanji Mapping Table (HKMT) to calculate the similarity of NE instances between Japanese and Chinese. We then use IL to obtain partial translation rules for NEs by extracting the differing parts from high-similarity NE instances in Chinese and Japanese. Finally, a feedback step updates the Chinese-Japanese NE similarity scores and the rule sets. Experimental results show that our simple, efficient method overcomes the main insufficiency of traditional methods, namely their heavy dependence on bilingual resources. Compared with other methods, ours combines the language features of Chinese and Japanese with IL to extract NE pairs automatically. Using only weakly correlated bilingual text sets and minimal additional knowledge, it effectively reduces both the cost of building the corpus and the need for external resources. Our method may therefore help to build a large-scale Chinese-Japanese NE translation dictionary from monolingual corpora.
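The HKMT-based similarity step and the extraction of differing parts can be sketched roughly as follows. This is a minimal illustration under simplifying assumptions: the tiny mapping table, the Dice-style character-overlap similarity, and the example NEs are illustrative stand-ins, not the authors' actual resources or formulas.

```python
# Tiny stand-in for the Hanzi-Kanji Mapping Table (Japanese Kanji -> Simplified Hanzi).
HKMT = {"東": "东", "京": "京", "大": "大", "学": "学", "図": "图", "書": "书", "館": "馆"}

def map_kanji_to_hanzi(ja_ne: str) -> str:
    """Normalize a Japanese NE into Hanzi via the mapping table."""
    return "".join(HKMT.get(ch, ch) for ch in ja_ne)

def similarity(ja_ne: str, zh_ne: str) -> float:
    """Character-overlap similarity (Dice coefficient) after Kanji -> Hanzi mapping."""
    mapped = map_kanji_to_hanzi(ja_ne)
    common = len(set(mapped) & set(zh_ne))
    return 2 * common / (len(set(mapped)) + len(set(zh_ne)))

def differing_parts(ja_ne: str, zh_ne: str):
    """From a high-similarity pair, extract the non-shared parts as a candidate partial rule."""
    mapped = map_kanji_to_hanzi(ja_ne)
    shared = set(mapped) & set(zh_ne)
    ja_diff = "".join(ch for ch, m in zip(ja_ne, mapped) if m not in shared)
    zh_diff = "".join(ch for ch in zh_ne if ch not in shared)
    return ja_diff, zh_diff

print(similarity("東京大学図書館", "东京大学图书馆"))  # 1.0 for a fully mapped pair
print(differing_parts("東京都", "东京市"))            # the differing suffixes form a rule candidate
```

In the full method, pairs whose similarity exceeds a threshold would feed the IL step, and the extracted differing parts would become partial translation rules refined by the feedback loop.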
Script Effects as the Hidden Drive of the Mind, Cognition, and Culture
This open access volume reveals the hidden power of the script we read in and how it shapes and drives our minds, ways of thinking, and cultures. Expanding on the Linguistic Relativity Hypothesis (i.e., the idea that language affects the way we think), this volume proposes the “Script Relativity Hypothesis” (i.e., the idea that the script in which we read affects the way we think) by offering a unique perspective on the effect of script (alphabets, morphosyllabaries, or multi-scripts) on our attention, perception, and problem-solving. Once we become literate, fundamental changes occur in our brain circuitry to accommodate the new demand for resources. The powerful effects of literacy have been demonstrated by research on literate versus illiterate individuals, as well as cross-scriptal transfer, indicating that literate brain networks function differently, depending on the script being read. This book identifies the locus of differences between the Chinese, Japanese, and Koreans, and between the East and the West, as the neural underpinnings of literacy. To support the “Script Relativity Hypothesis”, it reviews a vast corpus of empirical studies, including anthropological accounts of human civilization, social psychology, cognitive psychology, neuropsychology, applied linguistics, second language studies, and cross-cultural communication. It also discusses the impact of reading from screens in the digital age, as well as the impact of bi-script or multi-script use, which is a growing trend around the globe. As a result, our minds, ways of thinking, and cultures are now growing closer together, not farther apart. 
- Examines the origin, emergence, and co-evolution of written language, the human mind, and culture within the purview of script effects
- Investigates how the scripts we read over time shape our cognition, mind, and thought patterns
- Provides a new outlook on the four representative writing systems of the world
- Discusses the consequences of literacy for the functioning of the mind
The Nature of Writing – A Theory of Grapholinguistics [book cover]
Cover illustration: Purgatory: Canto VII – The Rule of the Mountain from A Typographic Dante (2008) by Barrie Tullett (also displayed in Barrie Tullett, Typewriter Art: A Modern Anthology, London: Laurence King Publishing, 2014, p. 167). With kind permission by Barrie Tullett. The text is taken from Dante, The Divine Comedy, translated by Dorothy L. Sayers, Harmondsworth, Middlesex: The Penguin Classics, 1949. On the lower part of the illustration, one can read the concluding
verses of the Canto:
But now the poet was going on before;
“Forward!” said he; “look how the sun doth stand
Meridian-high, while on the Western shore
Night sets her foot upon Morocco’s strand.
A Corpus-based Approach to the Chinese Word Segmentation
For a society based upon laws and reason, it has become too easy for us to believe
that we live in a world without them. And given that our linguistic wisdom was
originally motivated by the search for rules, it seems strange that we now consider
these rules to be the exceptions and take exceptions as the norm.
The current task of contemporary computational linguistics is to describe these
exceptions. In particular, for most language processing needs it suffices to
describe just the argument and predicate within an elementary sentence, under the
framework of local grammar. Therefore, a corpus-based approach to the Chinese
word segmentation problem is proposed as the first step towards a local grammar
for the Chinese language.
The two main issues with existing lexicon-based approaches are (a) the classification
of unknown character sequences, i.e. sequences that are not listed in
the lexicon, and (b) the disambiguation of situations where two candidate words
overlap.
For (a), we propose an automatic method of enriching the lexicon by comparing
candidate sequences to occurrences of the same strings in a manually segmented
reference corpus, and using machine learning methods to select the optimal
segmentation for them. These methods are developed in the course of the thesis
specifically for this task. The possibility of applying these machine learning
methods to the NP-extraction and alignment domains will also be discussed.
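The lexicon-enrichment idea can be sketched as follows. This is an illustrative simplification, not the thesis's actual learning method: the reference corpus is modelled as lists of gold-standard tokens, and the decision criterion is a plain majority vote over how the candidate string is segmented wherever it occurs.

```python
from collections import Counter

def segmentations_of(candidate: str, reference: list[list[str]]) -> Counter:
    """Collect every token span that exactly tiles `candidate` in the segmented corpus."""
    votes = Counter()
    for tokens in reference:
        joined = "".join(tokens)
        if candidate not in joined:
            continue
        pos = 0
        for i, tok in enumerate(tokens):
            if joined.startswith(candidate, pos):
                # Greedily take consecutive tokens that exactly cover the candidate.
                span, j = [], i
                while j < len(tokens) and "".join(span) != candidate:
                    span.append(tokens[j]); j += 1
                if "".join(span) == candidate:
                    votes[tuple(span)] += 1
            pos += len(tok)
    return votes

def should_add_to_lexicon(candidate: str, reference: list[list[str]]) -> bool:
    """Majority vote: add the candidate as a word if it most often appears as one token."""
    votes = segmentations_of(candidate, reference)
    return bool(votes) and votes.most_common(1)[0][0] == (candidate,)

reference = [["北京", "大学", "很", "好"],
             ["我", "在", "北京大学", "读书"],
             ["北京大学", "图书馆"]]
print(should_add_to_lexicon("北京大学", reference))  # True: one token in 2 of 3 sentences
```

A real implementation would of course replace the majority vote with the learned selection criterion, and would need to handle candidates whose boundaries never align with gold tokens.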
(b) is approached by designing a general processing framework for Chinese text,
which will be called multi-level processing. Under this framework, sentences are
recursively split into fragments according to language-specific, but
domain-independent, heuristics. The resulting fragments then define the ultimate
boundaries between candidate words and therefore resolve any segmentation
ambiguity caused by overlapping sequences. A new shallow semantic annotation is
also proposed under the framework of multi-level processing.
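The fragmentation step can be sketched as a recursive split on successively weaker delimiter levels, so that fragment boundaries can never fall inside a word and overlapping candidates that would cross them are ruled out. The two delimiter levels below (sentence-internal punctuation, then a few function words) are illustrative assumptions, not the thesis's actual heuristics.

```python
import re

# Level 1: sentence-internal punctuation; level 2: a few common function words.
LEVELS = [r"[，、；：]", r"(?:的|了|在)"]

def fragment(text: str, level: int = 0) -> list[str]:
    """Recursively split `text` into fragments, one delimiter level at a time."""
    if level >= len(LEVELS):
        return [text] if text else []
    result = []
    for part in re.split(LEVELS[level], text):
        result.extend(fragment(part, level + 1))
    return result

print(fragment("我在北京大学读书，他学习汉语"))
# → ['我', '北京大学读书', '他学习汉语']
```

In this toy run, no candidate word can span the boundary after 在, so any overlap ambiguity straddling that point is resolved before word-level segmentation even begins.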
A word segmentation algorithm based on these principles has been implemented
and tested; results of the evaluation are given and compared to the performance of
previous approaches as reported in the literature.
The first chapter of this thesis discusses the goals of segmentation and introduces
some background concepts. The second chapter analyses the current state-of-the-art
approaches to Chinese word segmentation. Chapter 3 proposes a new corpus-based
approach to the identification of unknown words. In chapter 4, a new shallow
semantic annotation is proposed under the framework of multi-level processing.
Design revolutions: IASDR 2019 Conference Proceedings. Volume 1: Change, Voices, Open
In September 2019, Manchester School of Art at Manchester Metropolitan University was honoured to host the biennial conference of the International Association of Societies of Design Research (IASDR) under the unifying theme of DESIGN REVOLUTIONS. This was the first time the conference had been held in the UK. Through key research themes across nine conference tracks – Change, Learning, Living, Making, People, Technology, Thinking, Value and Voices – the conference opened up compelling, meaningful and radical dialogue on the role of design in addressing societal and organisational challenges. This Volume 1 includes papers from the Change, Voices and Open tracks of the conference.