
    Automatic Pronunciation Assessment -- A Review

    Pronunciation assessment and its application in computer-aided pronunciation training (CAPT) have seen impressive progress in recent years. With the rapid growth of language processing and deep learning over the past few years, an updated review is needed. In this paper, we review methods employed in pronunciation assessment for both phonemic and prosodic aspects. We categorize the main challenges observed in prominent research trends, and highlight existing limitations and available resources. This is followed by a discussion of the remaining challenges and possible directions for future work.
    Comment: 9 pages, accepted to EMNLP Findings
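    The abstract does not name specific scoring methods, but a classic baseline for phonemic assessment in this literature is the Goodness of Pronunciation (GOP) score, which compares the acoustic model's posterior for the canonical phone against the best competing phone in each frame. A minimal sketch (the phone set and posterior values below are illustrative, not taken from the paper):

    ```python
    import math

    def gop_score(frame_posteriors, canonical_phone):
        """Goodness of Pronunciation: average log ratio of the canonical
        phone's posterior to the best-scoring phone over a segment's frames.
        A score near 0 suggests good pronunciation; large negative values
        suggest a likely mispronunciation."""
        total = 0.0
        for frame in frame_posteriors:  # frame: dict mapping phone -> posterior
            p_canonical = frame[canonical_phone]
            p_best = max(frame.values())
            total += math.log(p_canonical / p_best)
        return total / len(frame_posteriors)

    # Illustrative posteriors for a 3-frame segment labeled with phone "ae"
    frames = [
        {"ae": 0.70, "eh": 0.20, "ah": 0.10},
        {"ae": 0.60, "eh": 0.30, "ah": 0.10},
        {"ae": 0.80, "eh": 0.15, "ah": 0.05},
    ]
    print(round(gop_score(frames, "ae"), 4))  # prints 0.0: "ae" is best in every frame
    ```

    In practice the posteriors come from a trained acoustic model and the score is thresholded per phone; the sketch only shows the shape of the computation.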

    A process model for developing learning design patterns with international scope

    This paper investigates the process of identifying design patterns in international collaborative learning environments. In this context, design patterns are structured descriptions of best practice with pre-defined sections such as problem, solution and consequences. We pay special attention to how the scope of a design pattern is identified and articulated. Based on a review of the seminal design patterns literature and current practice in the area of learning design, the lack of a more specific process description for developing patterns with international scope is identified. The paper suggests a process model for developing patterns with international scope. This model is exemplified in a case study that links the analysis of observations in international learning environments to the articulation of design patterns by identifying culturally independent core values that constitute the foundations of a design pattern with international scope. These core values are linked to recurrent learning behaviors and specific artefacts that support learning in the articulation of a design pattern. The findings contribute to a deeper understanding of the pattern scoping and abstraction process in international learning environments.

    Target Text Contraction in English-into-Korean Translations: A Contradiction of Presumed Translation Universals?

    This paper contradicts the prevailing assumption among advocates of translation universals (TUs) that explicitation, a translation behavior which consists of spelling things out rather than leaving them implicit in translation, is a potential TU irrespective of the specific language pair involved in the process of translation. Specifically, a study employing a newly built 517,609-word parallel corpus shows that implicitation, and the target-text (TT) contraction it entails, as well as explicitation, and the TT expansion it entails, were both observed in translations between Korean and English. The significance of the direction of translation within the same language pair was identified, and the validity of the four measurement units devised for this study to capture diverse aspects of explicitation/implicitation, which in turn entail TT expansion/contraction, was verified.
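    The abstract does not define the four measurement units, but the simplest way to quantify TT expansion or contraction is to compare token counts of aligned source and target segments. A minimal sketch under that assumption (the corpus pairs below are hypothetical):

    ```python
    def length_ratio(source_tokens, target_tokens):
        """Target-to-source token ratio for one aligned segment pair:
        > 1 indicates TT expansion, < 1 indicates TT contraction."""
        return len(target_tokens) / len(source_tokens)

    # Hypothetical aligned English -> Korean segment pairs (pre-tokenized)
    parallel_corpus = [
        (["the", "meeting", "was", "postponed"], ["회의가", "연기되었다"]),
        (["she", "smiled"], ["웃었다"]),
    ]

    ratios = [length_ratio(src, tgt) for src, tgt in parallel_corpus]
    mean_ratio = sum(ratios) / len(ratios)
    print(f"mean TT/ST ratio: {mean_ratio:.2f}")  # prints: mean TT/ST ratio: 0.50
    ```

    A corpus-level mean below 1.0 would indicate net contraction in that translation direction; the study's actual units are presumably more fine-grained than raw token ratios.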

    Towards the Global SentiWordNet


    A Survey on Awesome Korean NLP Datasets

    English-based datasets are commonly available from Kaggle, GitHub, or recently published papers. Although benchmark tests on English datasets are sufficient to demonstrate the performance of new models and methods, researchers still need to train and validate models on Korean datasets to produce technologies or products suitable for Korean-language processing. This paper introduces 15 popular Korean NLP datasets with summarized details such as volume, license, repositories, and other research results inspired by the datasets. I also provide detailed descriptions with samples or statistics for each dataset. The main characteristics of the datasets are presented in a single table to give researchers a rapid overview.
    Comment: 11 pages, 1 horizontal page for a large table

    Comparing Rule-based, Feature-based and Deep Neural Methods for De-identification of Dutch Medical Records

    Unstructured information in electronic health records provides an invaluable resource for medical research. To protect patient confidentiality and to conform to privacy regulations, de-identification methods automatically remove personally identifying information from these medical records. However, due to the unavailability of labeled data, most existing research is constrained to English medical text, and little is known about the generalizability of de-identification methods across languages and domains. In this study, we construct a varied dataset consisting of the medical records of 1260 patients by sampling data from nine institutes and three domains of Dutch healthcare. We test the generalizability of three de-identification methods across languages and domains. Our experiments show that an existing rule-based method specifically developed for the Dutch language fails to generalize to this new data. Furthermore, a state-of-the-art neural architecture performs strongly across languages and domains, even with limited training data. Compared to feature-based and rule-based methods, the neural method requires significantly less configuration effort and domain knowledge. We make all code and pre-trained de-identification models available to the research community, allowing practitioners to apply them to their datasets and enabling future benchmarks.
    Comment: Proceedings of the 1st ACM WSDM Health Search and Data Mining Workshop (HSDM2020), 2020
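    A rule-based de-identifier of the kind compared here typically matches PII with hand-written patterns and lookup lists, replacing each hit with a category placeholder. A rough illustration of the idea (the patterns and record below are simplified and hypothetical, not the method evaluated in the paper):

    ```python
    import re

    # Simplified PII patterns; real systems use far richer rules and name lists
    PATTERNS = {
        "DATE":  re.compile(r"\b\d{1,2}-\d{1,2}-\d{4}\b"),
        "PHONE": re.compile(r"\b\d{3}[- ]?\d{3}[- ]?\d{4}\b"),
        "NAME":  re.compile(r"\b(?:Dhr\.|Mevr\.|Dr\.)\s+[A-Z][a-z]+\b"),
    }

    def deidentify(text):
        """Replace each matched PII span with its category placeholder."""
        for label, pattern in PATTERNS.items():
            text = pattern.sub(f"<{label}>", text)
        return text

    record = "Dhr. Jansen was seen on 12-03-2019; contact 020-555-0123."
    print(deidentify(record))
    # prints: <NAME> was seen on <DATE>; contact <PHONE>.
    ```

    The brittleness of such patterns when moved to new institutes or domains is exactly the generalization failure the study reports for the existing Dutch rule-based method.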