447 research outputs found

    DEVELOPMENT AND EVALUATION OF THE ATOS SPONTANEOUS SPEECH CONVERSATIONAL SYSTEM

    Get PDF
    ABSTRACT In this paper we report our recent development work in Spanish spontaneous speech conversational systems. We describe the Automatic Telephone Operator Service (ATOS) and present the improvements introduced into it to deal with spontaneous speech, which are: (a) a task independent dialogue manager, that can be adapted to a new semantic domain by changing a configuration file. It also generates a prediction about the user's expected utterance to constrain the language model used by the speech recognizer. (b) a language modeling strategy, which allows to adapt the statistical language model to a new task with just few hundreds of sentences. This strategy reduces a 27% the word error rate. We also report the results, conclusions and the speech database collected in the evaluation of the ATOS system, which has been tested by 30 real users

    Dynamic language modeling for European Portuguese

    Get PDF
    Doutoramento em Engenharia InformáticaActualmente muitas das metodologias utilizadas para transcrição e indexação de transmissões noticiosas são baseadas em processos manuais. Com o processamento e transcrição deste tipo de dados os prestadores de serviços noticiosos procuram extrair informação semântica que permita a sua interpretação, sumarização, indexação e posterior disseminação selectiva. Pelo que, o desenvolvimento e implementação de técnicas automáticas para suporte deste tipo de tarefas têm suscitado ao longo dos últimos anos o interesse pela utilização de sistemas de reconhecimento automático de fala. Contudo, as especificidades que caracterizam este tipo de tarefas, nomeadamente a diversidade de tópicos presentes nos blocos de notícias, originam um elevado número de ocorrência de novas palavras não incluídas no vocabulário finito do sistema de reconhecimento, o que se traduz negativamente na qualidade das transcrições automáticas produzidas pelo mesmo. Para línguas altamente flexivas, como é o caso do Português Europeu, este problema torna-se ainda mais relevante. Para colmatar este tipo de problemas no sistema de reconhecimento, várias abordagens podem ser exploradas: a utilização de informações específicas de cada um dos blocos noticiosos a ser transcrito, como por exemplo os scripts previamente produzidos pelo pivot e restantes jornalistas, e outro tipo de fontes como notícias escritas diariamente disponibilizadas na Internet. Este trabalho engloba essencialmente três contribuições: um novo algoritmo para selecção e optimização do vocabulário, utilizando informação morfosintáctica de forma a compensar as diferenças linguísticas existentes entre os diferentes conjuntos de dados; uma metodologia diária para adaptação dinâmica e não supervisionada do modelo de linguagem, utilizando múltiplos passos de reconhecimento; metodologia para inclusão de novas palavras no vocabulário do sistema, mesmo em situações de não existência de dados de adaptação e sem necessidade re-estimação global do modelo de linguagem.Most of today methods for transcription and indexation of broadcast audio data are manual. Broadcasters process thousands hours of audio and video data on a daily basis, in order to transcribe that data, to extract semantic information, and to interpret and summarize the content of those documents. The development of automatic and efficient support for these manual tasks has been a great challenge and over the last decade there has been a growing interest in the usage of automatic speech recognition as a tool to provide automatic transcription and indexation of broadcast news and random and relevant access to large broadcast news databases. However, due to the common topic changing over time which characterizes this kind of tasks, the appearance of new events leads to high out-of-vocabulary (OOV) word rates and consequently to degradation of recognition performance. This is especially true for highly inflected languages like the European Portuguese language. Several innovative techniques can be exploited to reduce those errors. The use of news shows specific information, such as topic-based lexicons, pivot working script, and other sources such as the online written news daily available in the Internet can be added to the information sources employed by the automatic speech recognizer. In this thesis we are exploring the use of additional sources of information for vocabulary optimization and language model adaptation of a European Portuguese broadcast news transcription system. Hence, this thesis has 3 different main contributions: a novel approach for vocabulary selection using Part-Of-Speech (POS) tags to compensate for word usage differences across the various training corpora; language model adaptation frameworks performed on a daily basis for single-stage and multistage recognition approaches; a new method for inclusion of new words in the system vocabulary without the need of additional data or language model retraining

    Introduction

    Get PDF

    Research in the Language, Information and Computation Laboratory of the University of Pennsylvania

    Get PDF
    This report takes its name from the Computational Linguistics Feedback Forum (CLiFF), an informal discussion group for students and faculty. However the scope of the research covered in this report is broader than the title might suggest; this is the yearly report of the LINC Lab, the Language, Information and Computation Laboratory of the University of Pennsylvania. It may at first be hard to see the threads that bind together the work presented here, work by faculty, graduate students and postdocs in the Computer Science and Linguistics Departments, and the Institute for Research in Cognitive Science. It includes prototypical Natural Language fields such as: Combinatorial Categorial Grammars, Tree Adjoining Grammars, syntactic parsing and the syntax-semantics interface; but it extends to statistical methods, plan inference, instruction understanding, intonation, causal reasoning, free word order languages, geometric reasoning, medical informatics, connectionism, and language acquisition. Naturally, this introduction cannot spell out all the connections between these abstracts; we invite you to explore them on your own. In fact, with this issue it’s easier than ever to do so: this document is accessible on the “information superhighway”. Just call up http://www.cis.upenn.edu/~cliff-group/94/cliffnotes.html In addition, you can find many of the papers referenced in the CLiFF Notes on the net. Most can be obtained by following links from the authors’ abstracts in the web version of this report. The abstracts describe the researchers’ many areas of investigation, explain their shared concerns, and present some interesting work in Cognitive Science. We hope its new online format makes the CLiFF Notes a more useful and interesting guide to Computational Linguistics activity at Penn

    Proceedings of the Eighth Italian Conference on Computational Linguistics CliC-it 2021

    Get PDF
    The eighth edition of the Italian Conference on Computational Linguistics (CLiC-it 2021) was held at Università degli Studi di Milano-Bicocca from 26th to 28th January 2022. After the edition of 2020, which was held in fully virtual mode due to the health emergency related to Covid-19, CLiC-it 2021 represented the first moment for the Italian research community of Computational Linguistics to meet in person after more than one year of full/partial lockdown

    Automatic Speech Recognition for Low-resource Languages and Accents Using Multilingual and Crosslingual Information

    Get PDF
    This thesis explores methods to rapidly bootstrap automatic speech recognition systems for languages, which lack resources for speech and language processing. We focus on finding approaches which allow using data from multiple languages to improve the performance for those languages on different levels, such as feature extraction, acoustic modeling and language modeling. Under application aspects, this thesis also includes research work on non-native and Code-Switching speech
    corecore