8 research outputs found


    Adjunct Researcher, Center for Corpus Development, NINJAL
    The modern colloquial style of Japanese, the prototype of contemporary written Japanese, began to take shape with the genbun itchi movement, a deliberate stylistic reform that sought to unify the spoken and written language. The style is considered to have been established with the completion and spread of the second-period national textbooks (kokutei kyokasho) in the Meiji era. Recent studies have begun to argue in concrete terms that the translation of European and American literature strongly influenced its establishment. To examine that influence quantitatively and in more detail, we compiled a corpus of six translated novels published in the mid-Meiji era (Meiji 16 to 30) and two original novels, one each from the late Meiji era (Meiji 31 to 44) and the Taisho era. This paper gives an overview of the corpus and reports a statistical survey of sentence length and inter-document similarity, pointing out stylistic similarities between the mid-Meiji and late Meiji works. The survey found similar values across the two sets of works for sentence length (the mean number of bunsetsu per sentence), part-of-speech ratios, and Modifier Verb Ratio (MVR), and cosine similarity between documents showed no distinctive pattern. As an exception, the variance in the number of conjunctive particles per sentence decreases with the passage of time, so a difference between the periods does remain. Overall, the methods used here reveal similarities between the translated novels of the mid-Meiji era and the original novels of the period in which the modern colloquial style was established. This supports the assumption that sentences close to the modern colloquial style were already being produced and read in the mid-Meiji era, and it points to the influence that translating Western novels had on Japanese.
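
Two of the measures named in this abstract, cosine similarity between documents and the variance of per-sentence conjunctive-particle counts, can be illustrated with a minimal Python sketch. It assumes raw term-frequency vectors and population variance; the abstract does not say which weighting or variance estimator the authors used, and the function names and toy inputs below are hypothetical.

```python
import math
from collections import Counter
from statistics import pvariance


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity of two token lists, using raw term frequencies (an assumption)."""
    freq_a, freq_b = Counter(tokens_a), Counter(tokens_b)
    # Only tokens present in both documents contribute to the dot product.
    dot = sum(freq_a[t] * freq_b[t] for t in freq_a.keys() & freq_b.keys())
    norm_a = math.sqrt(sum(v * v for v in freq_a.values()))
    norm_b = math.sqrt(sum(v * v for v in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def particle_count_variance(counts_per_sentence):
    """Population variance of the number of conjunctive particles per sentence."""
    return pvariance(counts_per_sentence)


# Toy, made-up inputs; real input would be morphologically analysed text.
doc_a = ["a", "b", "b", "c"]
doc_b = ["b", "c", "c", "d"]
similarity = cosine_similarity(doc_a, doc_b)
spread = particle_count_variance([0, 1, 1, 2, 4])
```

A smaller `spread` for later works than for earlier ones would correspond to the decreasing variance the abstract describes.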

    BCCWJ-TimeBank: Temporal and Event Information Annotation on Japanese Text


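    Development of an Ultra-Large-Scale Corpus with the Web as Its Population: Collection and Organization (Webを母集団とした超大規模コーパスの開発 : 収集と組織化)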

    Center for Corpus Development, NINJAL; Postdoctoral Research Fellow, Center for Corpus Development, NINJAL; Postdoctoral Research Fellow, Center for Corpus Development, NINJAL; Adjunct Researcher, Center for Corpus Development, NINJAL; Department of Corpus Studies, NINJAL
    In 2011, the Center for Corpus Development at the National Institute for Japanese Language and Linguistics (NINJAL) launched an ultra-large-scale corpus project that aims to construct a ten-billion-word corpus with the Web as its population. The work is split into four stages: page collection, organization (linguistic annotation), utilization (release), and preservation. This paper reports on the first two. For page collection, crawling began in the fourth quarter of 2012, and we have crawled 100 million URLs every three months as fixed-point observations. For linguistic annotation, we performed normalization (HTML tag removal and character encoding conversion), sentence extraction, Japanese morphological analysis (word segmentation and part-of-speech tagging), and Japanese dependency analysis on the pages crawled over roughly one year, from the fourth quarter of 2012 to the third quarter of 2013. We describe the problems encountered during these two stages and offer tentative solutions, present basic statistics of the corpus data as constructed at the end of 2013, and discuss what theoretical and applied research the corpus may make possible.
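
The normalization step mentioned in this abstract (HTML tag removal and character encoding conversion) can be sketched with the Python standard library. This is a minimal sketch under stated assumptions, not the project's pipeline: the class `TextExtractor`, the function `normalize_page`, and the Shift_JIS sample page are hypothetical, and the source encoding is assumed to be known in advance.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text and skips the contents of <script> and <style>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style elements.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def normalize_page(raw_bytes, source_encoding):
    """Decode bytes from the given encoding to str, then strip HTML tags."""
    text = raw_bytes.decode(source_encoding, errors="replace")
    parser = TextExtractor()
    parser.feed(text)
    parser.close()
    return "\n".join(parser.chunks)


# Hypothetical page served as Shift_JIS.
sample = "<html><body><p>日本語の文です。</p><script>var x = 1;</script></body></html>".encode("shift_jis")
plain = normalize_page(sample, "shift_jis")
utf8_bytes = plain.encode("utf-8")  # converted to a single target encoding
```

In a real crawl the encoding would have to be detected from HTTP headers or page metadata; that detection is outside this sketch.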


    Adjunct Researcher, Center for Corpus Development, NINJAL; Manpower Group Co., Ltd; Adjunct Researcher, Department of Linguistic Theory and Structure, NINJAL; Adjunct Researcher, Center for Corpus Development, NINJAL; Department of Corpus Studies, NINJAL; (former) Technical Staff, Center for Corpus Development, NINJAL; Postdoctoral Research Fellow, Center for Corpus Development, NINJAL; Ministry of Education, Culture, Sports, Science, and Technology; Department of Corpus Studies, NINJAL; Department of Corpus Studies, NINJAL; Department of Corpus Studies, NINJAL; Department of Corpus Studies, NINJAL; Department of Corpus Studies, NINJAL
    In December 2011, the National Institute for Japanese Language and Linguistics (NINJAL) released a 100-million-word balanced corpus, the Balanced Corpus of Contemporary Written Japanese (BCCWJ) DVD Version 1.0 (Maekawa et al. 2014), which was compiled from 2006 through 2011. The BCCWJ is annotated with sentence boundary information, but the validity of the criteria used to identify those boundaries has been questioned on several points (Konishi et al. 2014; Hasegawa 2014; Tanomura 2014). To address these issues, the Center for Corpus Development at NINJAL carried out a complete survey and correction of the BCCWJ from 2013 to 2014. This article reports on that revision work. The problems in Version 1.0 stem from a string-based definition of sentence boundaries: because the document structure tags, which include sentence boundaries, and the morpheme-level information were prepared in parallel during corpus construction, no morpheme information was available for the sentence delimitation task. For this revision we drew up sentence boundary criteria based on morphological information and attempted to resolve the problems. We present the morpheme-based annotation guidelines, the annotation environment used for the corrections, and the number of corrections made.

    BCCWJ-TimeBank: Temporal and Event Information Annotation on Japanese Text

    No full text
    Temporal information extraction can be split into the following three tasks: temporal expression extraction, time normalisation, and temporal ordering relation resolution. This paper describes a time expression and temporal ordering annotation schema for Japanese, employing the Balanced Corpus of Contemporary Written Japanese, or BCCWJ. The annotation is aimed at allowing the development of better Japanese temporal ordering relation resolution tools. The annotation schema is based on an ISO annotation standard, TimeML. We extract verbal and adjective event expressions as ⟨EVENT⟩ in a subset of BCCWJ. Then, we annotate the temporal ordering relation ⟨TLINK⟩ on pairs of these event expressions and the time expressions annotated in previous work. We identify several issues in the annotation.

    Polydiacetylene Liposomal Aequorin Bioluminescent Device for Detection of Hydrophobic Compounds

    No full text
    In this study, a polydiacetylene liposomal aequorin bioluminescent device (PLABD) that functioned through control of the membrane transport of Ca²⁺ ions was developed for detecting hydrophobic compounds. In the PLABD, aequorin was encapsulated in an internal water phase and a calcium ionophore (CI) was contained in a hydrophobic region. Membrane transport of Ca²⁺ ions across the CI was suppressed by polymerization between diacetylene molecules. On addition of an analyte, the membrane transport of Ca²⁺ ions across the CI increased, and Ca²⁺ ions from the external water phase could diffuse into the internal water phase via the CI, which resulted in bioluminescence of the aequorin. Lidocaine, procaine, and procainamide were used as model compounds to test the validity of the detection mechanism of the PLABD. When each analyte was added to a suspension of the PLABD, bioluminescence from the aequorin in the PLABD was observed, and the level of this bioluminescence increased with increasing analyte concentration. There was a linear relationship between the logarithm of the analyte concentration and the bioluminescence for all analytes, as follows: R = 0.89 from 10 nmol L⁻¹ to 10 mmol L⁻¹ for lidocaine, R = 0.66 from 10 nmol L⁻¹ to 100 μmol L⁻¹ for procaine, and R = 0.74 from 100 nmol L⁻¹ to 100 μmol L⁻¹ for procainamide. Compared to the traditional colorimetric method using polydiacetylene liposomes, the PLABD was superior in both sensitivity and dynamic range. Thus, the PLABD is a valid, simple, and sensitive signal generator for detection of hydrophobic compounds that interact with PLABD membranes.
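
The reported R values describe how linearly the bioluminescence tracks the logarithm of analyte concentration. A minimal Python sketch of that calculation follows, assuming R is the Pearson correlation coefficient computed against log10 of the concentration; the abstract does not state the exact procedure, and the concentrations and signal values below are made up for illustration only.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def log_linear_r(concentrations_mol_per_l, signals):
    """Correlation between log10(concentration) and the measured signal."""
    logs = [math.log10(c) for c in concentrations_mol_per_l]
    return pearson_r(logs, signals)


# Hypothetical values, not data from the study.
concentrations = [1e-8, 1e-7, 1e-6, 1e-5]  # mol/L
signals = [1.0, 2.1, 2.9, 4.2]             # arbitrary luminescence units
r = log_linear_r(concentrations, signals)
```

Taking the logarithm first is what lets a single straight line cover a concentration range spanning several orders of magnitude.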

    Empagliflozin in Patients with Chronic Kidney Disease

    No full text
    Background: The effects of empagliflozin in patients with chronic kidney disease who are at risk for disease progression are not well understood. The EMPA-KIDNEY trial was designed to assess the effects of treatment with empagliflozin in a broad range of such patients. Methods: We enrolled patients with chronic kidney disease who had an estimated glomerular filtration rate (eGFR) of at least 20 but less than 45 ml per minute per 1.73 m² of body-surface area, or who had an eGFR of at least 45 but less than 90 ml per minute per 1.73 m² with a urinary albumin-to-creatinine ratio (with albumin measured in milligrams and creatinine measured in grams) of at least 200. Patients were randomly assigned to receive empagliflozin (10 mg once daily) or matching placebo. The primary outcome was a composite of progression of kidney disease (defined as end-stage kidney disease, a sustained decrease in eGFR to <10 ml per minute per 1.73 m², a sustained decrease in eGFR of ≥40% from baseline, or death from renal causes) or death from cardiovascular causes. Results: A total of 6609 patients underwent randomization. During a median of 2.0 years of follow-up, progression of kidney disease or death from cardiovascular causes occurred in 432 of 3304 patients (13.1%) in the empagliflozin group and in 558 of 3305 patients (16.9%) in the placebo group (hazard ratio, 0.72; 95% confidence interval [CI], 0.64 to 0.82; P<0.001). Results were consistent among patients with or without diabetes and across subgroups defined according to eGFR ranges. The rate of hospitalization from any cause was lower in the empagliflozin group than in the placebo group (hazard ratio, 0.86; 95% CI, 0.78 to 0.95; P=0.003), but there were no significant between-group differences with respect to the composite outcome of hospitalization for heart failure or death from cardiovascular causes (which occurred in 4.0% in the empagliflozin group and 4.6% in the placebo group) or death from any cause (in 4.5% and 5.1%, respectively). The rates of serious adverse events were similar in the two groups. Conclusions: Among a wide range of patients with chronic kidney disease who were at risk for disease progression, empagliflozin therapy led to a lower risk of progression of kidney disease or death from cardiovascular causes than placebo.
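
The event percentages in this abstract follow directly from the reported counts, which a few lines of Python can confirm. The sketch also derives a crude risk ratio and an absolute difference; these two figures are illustrative arithmetic and do not appear in the abstract. The crude risk ratio is not the same quantity as the reported hazard ratio of 0.72, which comes from a time-to-event analysis.

```python
def event_rate(events, total):
    """Proportion of patients in a group who had the primary outcome."""
    return events / total


# Counts as reported in the abstract.
empagliflozin_rate = event_rate(432, 3304)
placebo_rate = event_rate(558, 3305)

# Derived quantities (not reported in the abstract).
absolute_difference_pp = (placebo_rate - empagliflozin_rate) * 100  # percentage points
crude_risk_ratio = empagliflozin_rate / placebo_rate

print(f"empagliflozin: {empagliflozin_rate:.1%}")   # 13.1%
print(f"placebo:       {placebo_rate:.1%}")         # 16.9%
print(f"absolute difference: {absolute_difference_pp:.1f} percentage points")
print(f"crude risk ratio:    {crude_risk_ratio:.2f}")
```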