Study on the English corresponding unit of Chinese clause
This paper annotates the English corresponding units of Chinese clauses in Chinese-English translation and analyzes them statistically. First, based on Chinese clause segmentation, we segment the English target text into corresponding units (clauses) to obtain a Chinese-to-English clause-aligned parallel corpus. Then, we annotate the grammatical properties of the English corresponding clauses in the corpus. Finally, by statistically analyzing the annotated corpus, we identify the distribution characteristics of these grammatical properties: there are more clauses (1631, 74.41%) than sentences (561, 25.59%); there are more major clauses (1719, 78.42%) than subordinate clauses (473, 21.58%); among subordinate clauses, there are more adverbial clauses (392, 82.88%) than attributive clauses (81, 17.12%) and more non-defining clauses (358, 75.69%) than restrictive relative clauses (115, 24.31%); and there are more simple clauses (1142, 52.1%) than coordinate clauses (1050, 47.9%). © Springer International Publishing AG 2016
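The percentages above follow directly from the raw counts: each pair of categories partitions the same set of units, so each share is a count divided by the pair's sum. The clause/sentence, major/subordinate, and simple/coordinate pairs each sum to 2,192 English units, and both splits of the subordinate clauses sum to 473. The short Python check below recomputes the reported shares; the names `COMPARISONS` and `shares` are illustrative and not taken from the paper.

```python
# Counts reported in the abstract for English corresponding units of Chinese clauses.
# Each tuple is (category_a, count_a, category_b, count_b); the two categories in a
# tuple partition the same population, so shares are computed against their sum.
COMPARISONS = [
    ("clauses", 1631, "sentences", 561),
    ("major clauses", 1719, "subordinate clauses", 473),
    ("adverbial clauses", 392, "attributive clauses", 81),
    ("non-defining clauses", 358, "restrictive relative clauses", 115),
    ("simple clauses", 1142, "coordinate clauses", 1050),
]


def shares(count_a: int, count_b: int) -> tuple[int, float, float]:
    """Return the pair total and each category's percentage share, rounded to 2 decimals."""
    total = count_a + count_b
    return total, round(100 * count_a / total, 2), round(100 * count_b / total, 2)


if __name__ == "__main__":
    for name_a, a, name_b, b in COMPARISONS:
        total, pct_a, pct_b = shares(a, b)
        print(f"{name_a}: {a} ({pct_a}%) vs {name_b}: {b} ({pct_b}%), total {total}")
```

Every recomputed share matches the figure printed in the abstract.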
Chinese-Russian Parallel Discourse Corpus: Alignment of Clauses and Statistical Analysis
Submitted on 19 January, 2018. Accepted on 18 April, 2018.
This paper considers the alignment and annotation of Chinese-Russian parallel texts at the clause level. For this purpose, the author creates a Chinese-Russian Parallel Discourse Corpus. As pilot material, the author chooses an official Chinese document, the 2017 Report on the Work of the Government of the PRC, together with its Russian translation. The article consists of three parts, in which the author 1) describes the principles of aligning the two texts; 2) establishes the principles for annotating the selected units and creates a parallel corpus aligned at the clause level; 3) compiles statistics on the types of Russian syntactic analogues of Chinese clauses (RSACCs) and interprets the results. The first part considers how the texts in the parallel corpus are divided into clauses: the author first segments the Chinese text into clauses and then singles out the corresponding elements in the Russian translation, which are called RSACCs. In the second part, the author describes the principles of analysis and classification of RSACCs. Overall, the annotated material contains 9 types of RSACCs: simple sentences, complex sentences, groups of sentences, fragments of simple sentences, fragments of sentences with homogeneous predicates, fragments of complex sentences, parts of asyndetic complex sentences, parts of compound sentences, and parts of complex sentences.
In the third part, the author concludes that in most cases a Chinese clause corresponds to a single-predicate construction in the Russian translation; in other cases, Chinese clauses are translated by non-predicative phrases, semi-predicative participial and gerund phrases, poly-predicative complex sentences, and groups of sentences.
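To make the corpus design concrete, here is a minimal Python sketch of how one clause-aligned Chinese-Russian record and the per-type RSACC statistics might be represented. The names `RSACCType`, `AlignedClause`, and `type_distribution` are hypothetical; the nine type labels follow the classification in the abstract, while the record layout and the percentage computation are assumptions, not the author's actual tooling.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class RSACCType(Enum):
    """The nine RSACC types named in the abstract (labels paraphrased in English)."""
    SIMPLE_SENTENCE = "simple sentence"
    COMPLEX_SENTENCE = "complex sentence"
    SENTENCE_GROUP = "group of sentences"
    FRAGMENT_OF_SIMPLE = "fragment of a simple sentence"
    FRAGMENT_WITH_HOMOGENEOUS_PREDICATES = "fragment of a sentence with homogeneous predicates"
    FRAGMENT_OF_COMPLEX = "fragment of a complex sentence"
    PART_OF_ASYNDETIC_COMPLEX = "part of an asyndetic complex sentence"
    PART_OF_COMPOUND = "part of a compound sentence"
    PART_OF_COMPLEX = "part of a complex sentence"


@dataclass(frozen=True)
class AlignedClause:
    """Hypothetical record: one Chinese clause paired with the Russian span chosen as its RSACC."""
    zh_clause: str
    ru_span: str
    rsacc_type: RSACCType


def type_distribution(corpus: list[AlignedClause]) -> dict[RSACCType, tuple[int, float]]:
    """Map each RSACC type present in the corpus to (count, percentage of all aligned clauses)."""
    counts = Counter(pair.rsacc_type for pair in corpus)
    total = len(corpus)
    # most_common() orders the result from the most to the least frequent type.
    return {t: (n, round(100 * n / total, 2)) for t, n in counts.most_common()}
```

A frozen dataclass keeps each aligned pair immutable once annotated, and the `Enum` restricts labels to the nine agreed types, so a misspelled type fails at annotation time instead of silently creating a tenth category in the counts.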