    Overview of the NTCIR-11 SpokenQuery&Doc task

    This paper presents an overview of the Spoken Query and Spoken Document retrieval (SpokenQuery&Doc) task at the NTCIR-11 Workshop. The task included spoken query driven spoken content retrieval (SQ-SCR) as the main sub-task, with spoken query driven spoken term detection (SQ-STD) as an additional sub-task. The paper describes the details of each sub-task, the data used, the creation of the speech recognition systems used to produce the transcripts, the design of the retrieval test collections, the metrics used to evaluate the sub-tasks, and a summary of the results of the participants' submissions.
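
    As a concrete illustration of the kind of evaluation such sub-tasks rely on, the sketch below computes mean average precision (MAP), a standard metric for ranked retrieval runs. The run and relevance data are invented for illustration, and MAP is offered as a representative metric rather than the task's exact measure.

```python
# Hypothetical sketch: mean average precision (MAP) for evaluating
# ranked retrieval runs. The runs and relevance judgments below are
# illustrative, not NTCIR data.

def average_precision(ranked_ids, relevant_ids):
    """Average precision for one query: mean of precision@k at each hit."""
    hits, precision_sum = 0, 0.0
    for k, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            precision_sum += hits / k
    return precision_sum / len(relevant_ids) if relevant_ids else 0.0

def mean_average_precision(runs, qrels):
    """MAP over all queries; runs and qrels are keyed by query id."""
    return sum(average_precision(runs[q], qrels[q]) for q in qrels) / len(qrels)

runs = {"q1": ["d3", "d1", "d7"], "q2": ["d2", "d9"]}
qrels = {"q1": {"d1", "d7"}, "q2": {"d9"}}
print(mean_average_precision(runs, qrels))  # ~0.542 on this toy data
```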

    A study of the very high order natural user language (with AI capabilities) for the NASA space station common module

    The requirements are identified for a very high order natural language to be used by crew members on board the Space Station. The hardware facilities, databases, real-time processes, and software support are discussed. The operations and capabilities that will be required in both normal (routine) and abnormal (nonroutine) situations are evaluated. A structure and syntax for an interface (front-end) language to satisfy these requirements are recommended.

    A longitudinal study of the development of fluency of novice Japanese learners: Analysis using objective measures

    Fluency has been studied extensively in ESL and EFL, mainly to determine which spoken features characterize fluent speech by comparing students who participated in study abroad programs with those who did not. These studies were mainly conducted with advanced learners of English as a second or foreign language, and there have been few studies of novice-level learners of foreign languages; fluency studies of Japanese are especially rare. It is necessary to investigate the characteristics of fluency in novice-level learners of Japanese, since Japanese shares very little with English. This study investigated developmental changes in the fluency of learners of Japanese as a foreign language (JFL) over the course of one semester, using objective measures. The research questions were: 1) Which objective measures change in relation to changes in L2 general proficiency over a semester, and how do they change? 2) Which objective measures correlate with subjective ratings obtained from Japanese instructors? The participants were 30 students enrolled in Japanese 101. The objective measures were obtained by annotating audio samples with Praat and Syllable Nuclei, and by parsing the annotations and calculating measures with Fluency Calculator (Fukada, Hirotani & Matsumoto, 2015). The audio data were collected at the beginning and end of the semester with the same set of tasks. The objective measures covered speed, quantity of speech, and pauses, along with several measures concerned with repairs; accuracy was also measured by the number of AS-units with or without errors. The results for the first research question suggested that speed of speech develops steadily from very early stages of the students' language learning process. Silent pause measures indicated that learners became able to pause at grammatical junctures as the semester went on, but the overall pause ratio did not appear to decrease between the collection points. In addition, the two tasks used in this study generated very different results; it is not clear which task better accessed the learners' true fluency, and this should be investigated in future studies. To answer the second research question, correlation coefficients were calculated between the subjective and objective measures. The results indicated that speed-related measures showed high correlations, suggesting that they could be good predictors of oral proficiency. Mean run length also showed steady correlations with the subjective scores at both the first and second collection points, whereas pause-related measures showed quite different correlation values at the two points. Some measures changed between the collection points, so it will be necessary to examine how the relationship between oral proficiency and the objective measures changes with a wider variety of learners in future studies.
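
    To make the objective measures concrete, here is a minimal sketch of how speed-, pause-, and run-based measures can be computed from interval annotations of an audio sample. The (start, end, label) tuple format and the syllable count are assumptions for illustration, not the actual output formats of Praat, Syllable Nuclei, or the Fluency Calculator.

```python
# Illustrative fluency measures from Praat-style interval annotations.
# The interval tuples and syllable count below are made-up example data.

intervals = [
    (0.0, 1.8, "speech"), (1.8, 2.4, "pause"),
    (2.4, 4.1, "speech"), (4.1, 4.5, "pause"),
    (4.5, 5.9, "speech"),
]
syllables = 22  # e.g. counted automatically from syllable nuclei

total_time = intervals[-1][1] - intervals[0][0]
speech_time = sum(e - s for s, e, lab in intervals if lab == "speech")
num_runs = sum(1 for _, _, lab in intervals if lab == "speech")

speech_rate = syllables / total_time          # syllables per second, overall
articulation_rate = syllables / speech_time   # syllables per second, speaking
pause_ratio = 1 - speech_time / total_time    # share of time spent pausing
mean_run_length = syllables / num_runs        # syllables per uninterrupted run

print(f"speech rate {speech_rate:.2f}, articulation {articulation_rate:.2f}, "
      f"pause ratio {pause_ratio:.2f}, mean run length {mean_run_length:.2f}")
```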

    Automatic Construction of Discourse Corpora for Dialogue Translation

    In this paper, a novel approach is proposed to automatically construct a parallel discourse corpus for dialogue machine translation. First, parallel subtitle data and the corresponding monolingual movie script data are crawled and collected from the Internet. Then tags such as speaker and discourse boundary are projected from the script data onto the subtitle data via an information retrieval approach, in order to map monolingual discourse onto bilingual texts. We not only evaluate the mapping results, but also integrate speaker information into the translation. Experiments show that our proposed method achieves 81.79% and 98.64% accuracy on speaker and dialogue boundary annotation, respectively, and that speaker-based language model adaptation obtains an improvement of around 0.5 BLEU points in translation quality. Finally, we publicly release around 100K parallel discourse data with manual speaker and dialogue boundary annotations.
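
    A hedged sketch of the tag-projection step: each script line (which carries a speaker tag) is matched to its most similar subtitle line with a simple TF-IDF retrieval model, and the tag is copied over. The toy data and the plain cosine matcher are illustrative assumptions; the paper's own IR approach may differ.

```python
# Sketch: project speaker tags from script lines onto subtitle lines
# via TF-IDF cosine similarity. Data below is invented for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

script = [("RICK", "here's looking at you kid"),
          ("ILSA", "play it sam play as time goes by")]
subtitles = ["Play it, Sam. Play 'As Time Goes By.'",
             "Here's looking at you, kid."]

vectorizer = TfidfVectorizer().fit([t for _, t in script] + subtitles)
script_vecs = vectorizer.transform([t for _, t in script])
subtitle_vecs = vectorizer.transform(subtitles)

# For every subtitle line, take the speaker of the closest script line.
sims = cosine_similarity(subtitle_vecs, script_vecs)
for line, row in zip(subtitles, sims):
    speaker = script[row.argmax()][0]
    print(f"{speaker}: {line}")
```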

    ParaPhraser: Russian paraphrase corpus and shared task

    The paper describes the results of the First Russian Paraphrase Detection Shared Task held in St. Petersburg, Russia, in October 2016. Research in the area of paraphrase extraction, detection, and generation has been developing successfully for a long time, whereas interest in the problem has only recently surged in the Russian computational linguistics community. We try to close this gap by introducing the ParaPhraser.ru project, dedicated to collecting a Russian paraphrase corpus and organizing a Paraphrase Detection Shared Task that uses the corpus as training data. The participants applied a wide variety of techniques to the problem of paraphrase detection, from rule-based approaches to deep learning, and the results of the task reflect the following tendencies: the best scores are obtained by traditional classifiers combined with fine-grained linguistic features, while complex neural networks, shallow methods, and purely technical methods also demonstrate competitive results.
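
    An illustrative baseline in the spirit of the "traditional classifier plus features" strategy the task report describes: a logistic regression over simple surface-overlap features. The toy English pairs stand in for the Russian corpus, and the tiny feature set is a deliberate simplification of the fine-grained linguistic features the winning systems used.

```python
# Sketch: paraphrase detection as binary classification over simple
# overlap features. Pairs and features are invented for illustration.
from sklearn.linear_model import LogisticRegression

def features(s1, s2):
    t1, t2 = set(s1.lower().split()), set(s2.lower().split())
    jaccard = len(t1 & t2) / len(t1 | t2)        # token overlap
    len_ratio = min(len(t1), len(t2)) / max(len(t1), len(t2))
    return [jaccard, len_ratio]

pairs = [("the cat sat on the mat", "a cat was sitting on the mat", 1),
         ("he bought a new car", "a new car was bought by him", 1),
         ("the market fell sharply", "the weather was sunny today", 0),
         ("she reads every evening", "stocks rallied after the news", 0)]

X = [features(a, b) for a, b, _ in pairs]
y = [label for _, _, label in pairs]
clf = LogisticRegression().fit(X, y)
print(clf.predict([features("the dog sat on the rug",
                            "a dog was sitting on the rug")]))
```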

    Evaluating Information Retrieval and Access Tasks

    This open access book summarizes the first two decades of the NII Testbeds and Community for Information access Research (NTCIR). NTCIR is a series of evaluation forums run by a global team of researchers and hosted by the National Institute of Informatics (NII), Japan. The book is unique in that it discusses not just what was done at NTCIR, but also how it was done and the impact it has achieved. For example, in some chapters the reader sees the early seeds of what eventually grew into the search engines that provide access to content on the World Wide Web, today's smartphones that can tailor what they show to the needs of their owners, and the smart speakers that enrich our lives at home and on the move. We also get glimpses into how new search engines can be built for mathematical formulae, or for the digital record of a lived human life. Key to the success of the NTCIR endeavor was the early recognition that information access research is an empirical discipline, and that evaluation therefore lay at the core of the enterprise. Evaluation is thus at the heart of each chapter in this book; the chapters show, for example, how the recognition that some documents are more important than others has shaped thinking about evaluation design. The thirty-three contributors to this volume speak for the many hundreds of researchers from dozens of countries around the world who together shaped NTCIR as organizers and participants. This book is suitable for researchers, practitioners, and students, and for anyone who wants to learn about past and present evaluation efforts in information retrieval, information access, and natural language processing, as well as those who want to participate in an evaluation task or even to design and organize one.
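
    The observation that some documents are more important than others is commonly operationalized with graded relevance metrics such as nDCG. The sketch below is a generic illustration with made-up relevance grades; individual NTCIR tasks define their own gain values and evaluation depths.

```python
# Sketch: normalized discounted cumulative gain (nDCG) over graded
# relevance judgments. The grades below are invented example data.
import math

def dcg(grades):
    """Gain discounted by log2 of rank (rank 1 -> log2(2))."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(grades))

def ndcg(grades):
    """DCG normalized by the ideal (descending-grade) ordering."""
    ideal = dcg(sorted(grades, reverse=True))
    return dcg(grades) / ideal if ideal else 0.0

# Relevance grades at ranks 1..5 for one query (0 = not, 2 = highly relevant).
print(round(ndcg([2, 0, 1, 2, 0]), 3))
```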

    Theory and Applications for Advanced Text Mining

    Owing to the growth of computer and web technologies, we can easily collect and store large amounts of text data, and we can reasonably believe that these data contain useful knowledge. Text mining techniques have been studied intensively since the late 1990s in order to extract that knowledge from the data. Even though many important techniques have been developed, the text mining research field continues to expand to meet the needs arising from various application fields. This book is composed of 9 chapters introducing advanced text mining techniques, ranging from relation extraction to the processing of under-resourced languages. I believe that this book will provide new knowledge in the text mining field and help many readers open up new research fields.

    Embedding Web-based Statistical Translation Models in Cross-Language Information Retrieval

    Although more and more language pairs are covered by machine translation services, there are still many pairs that lack translation resources. Cross-language information retrieval (CLIR) is an application that needs translation functionality of a relatively low level of sophistication, since current models for information retrieval (IR) are still based on bags of words. The Web provides a vast resource for the automatic construction of parallel corpora, which can be used to train statistical translation models automatically. The resulting translation models can be embedded in several ways in a retrieval model. In this paper, we investigate the problem of automatically mining parallel texts from the Web and different ways of integrating the translation models within the retrieval process. Our experiments on standard test collections for CLIR show that the Web-based translation models can surpass commercial MT systems in CLIR tasks. These results open the perspective of constructing a fully automatic query translation device for CLIR at a very low cost.
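
    One simple way to embed a translation model in a bag-of-words retrieval model is to expand each query term into target-language terms weighted by translation probability and score documents against that expansion. The sketch below illustrates the idea with a tiny invented translation table and a deliberately simplified scoring function; in the paper's setting, such probabilities would be estimated from Web-mined parallel text, and the actual retrieval models investigated are more sophisticated.

```python
# Sketch: probabilistic query translation for CLIR. The translation
# table and documents are invented; a real table would be estimated
# from parallel corpora (e.g. with an IBM Model 1 style aligner).
from collections import Counter

translation_table = {            # P(target_word | source_word), assumed
    "chat": {"cat": 0.7, "chat": 0.3},
    "noir": {"black": 0.8, "dark": 0.2},
}

def score(query_terms, doc_tokens):
    counts, length = Counter(doc_tokens), len(doc_tokens)
    total = 0.0
    for src in query_terms:
        # Expected relative frequency of the source term under translation.
        total += sum(p * counts[tgt] / length
                     for tgt, p in translation_table.get(src, {}).items())
    return total

docs = {"d1": "the black cat sleeps".split(),
        "d2": "a dark room with a chat window".split()}
query = ["chat", "noir"]  # French query against English documents
for doc_id, tokens in docs.items():
    print(doc_id, round(score(query, tokens), 3))
```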