60 research outputs found

Segment-based automatic language identification

Author: House A. S.
Mermelstein P.
Timothy J. Hazen
Victor W. Zue
Publication venue: 'Acoustical Society of America (ASA)'

Porting the galaxy system to Mandarin Chinese

Author: Wang Chao, 1972-
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/1997

Thesis (M.S.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1997. Includes bibliographical references (leaves 83-86). By Chao Wang. M.S.

DSpace@MIT

Synthetic Data Augmentation for Zero-Shot Cross-Lingual Question Answering

Author: Keraron Rachel
Riabi Arij
Sagot Benoît
Scialom Thomas
Seddah Djamé
Staiano Jacopo
Publication date: 14/10/2021

Coupled with the availability of large-scale datasets, deep learning architectures have enabled rapid progress on the Question Answering task. However, most of those datasets are in English, and the performance of state-of-the-art multilingual models is significantly lower when evaluated on non-English data. Due to high data collection costs, it is not realistic to obtain annotated data for each language one desires to support. We propose a method to improve Cross-lingual Question Answering performance without requiring additional annotated data, leveraging Question Generation models to produce synthetic samples in a cross-lingual fashion. We show that the proposed method significantly outperforms the baselines trained on English data only. We report a new state of the art on four multilingual datasets: MLQA, XQuAD, SQuAD-it and PIAF (fr). Comment: 7 page

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

A characterization of the problem of new, out-of-vocabulary words in continuous-speech recognition and understanding

Author: Hetherington Irvine Lee
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/1995

Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1995. Includes bibliographical references (p. 167-173). By Irvine Lee Hetherington. Ph.D.

DSpace@MIT

Origins of NASA names

Author: Karegeannes C. E.
Wells H. T.
Whiteley S. H.

Names are selected for NASA spaceflight projects and programs from various sources. Some have their foundations in mythology and astrology or legend and folklore. Some have historic connotations; others are based on a description of their mission, often resulting in an acronym. Included are names of launch vehicles, spacecraft, manned spaceflight programs, sounding rockets, and NASA field installations. This study is limited to names of approved projects through 1974; it does not include names of numerous projects which have been or are being studied or projects that were canceled or postponed before reaching actual flight.

University of Michigan Library Repository

NASA Technical Reports Server

A Strategy for Multilingual Spoken Language Understanding Based on Graphs of Linguistic Units

Author: Calvo Lance Marcos
Publication venue: 'Universitat Politecnica de Valencia'
Publication date: 11/04/2016

In this thesis, the problem of multilingual spoken language understanding is addressed using graphs to model and combine the different knowledge sources that take part in the understanding process. As a result of this work, a full multilingual spoken language understanding system has been developed, in which statistical models and graphs of linguistic units are used. One key feature of this system is its ability to combine and process multiple inputs provided by one or more sources such as speech recognizers or machine translators. A graph-based monolingual spoken language understanding system was developed as a starting point. The input to this system is a set of sentences provided by one or more speech recognition systems. First, these sentences are combined by means of a grammatical inference algorithm in order to build a graph of words. Next, the graph of words is processed to construct a graph of concepts by using a dynamic programming algorithm that identifies the lexical structures that represent the different concepts of the task. Finally, the graph of concepts is used to build the best sequence of concepts. The multilingual case arises when the user speaks a language different from the one natively supported by the system. In this thesis, a test-on-source approach was followed: the input sentences are translated into the system's language and then processed by the monolingual system. For this purpose, two speech translation systems were developed. The outputs of these speech translation systems are graphs of words that are then processed by the monolingual graph-based spoken language understanding system. In both the monolingual and the multilingual case, the experimental results show that combining several inputs improves on the results obtained with a single input. In fact, this approach outperforms the current state of the art in many cases when several inputs are combined.

Calvo Lance, M. (2016). A Strategy for Multilingual Spoken Language Understanding Based on Graphs of Linguistic Units [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/62407

Crossref

RiuNet

The BG News November 3, 1995

Author: Bowling Green State University
Publication venue: ScholarWorks@BGSU
Publication date: 03/11/1995

The BGSU campus student newspaper, November 3, 1995. Volume 78, Issue 48.

Bowling Green State University: ScholarWorks@BGSU

Research in the Language, Information and Computation Laboratory of the University of Pennsylvania

Author: Levison Libby
Stone Matthew
Publication venue: ScholarlyCommons
Publication date: 01/03/1995

This report takes its name from the Computational Linguistics Feedback Forum (CLiFF), an informal discussion group for students and faculty. However, the scope of the research covered in this report is broader than the title might suggest; this is the yearly report of the LINC Lab, the Language, Information and Computation Laboratory of the University of Pennsylvania. It may at first be hard to see the threads that bind together the work presented here, work by faculty, graduate students and postdocs in the Computer Science and Linguistics Departments, and the Institute for Research in Cognitive Science. It includes prototypical Natural Language fields such as Combinatorial Categorial Grammars, Tree Adjoining Grammars, syntactic parsing and the syntax-semantics interface, but it extends to statistical methods, plan inference, instruction understanding, intonation, causal reasoning, free word order languages, geometric reasoning, medical informatics, connectionism, and language acquisition. Naturally, this introduction cannot spell out all the connections between these abstracts; we invite you to explore them on your own. In fact, with this issue it’s easier than ever to do so: this document is accessible on the “information superhighway”. Just call up http://www.cis.upenn.edu/~cliff-group/94/cliffnotes.html. In addition, you can find many of the papers referenced in the CLiFF Notes on the net. Most can be obtained by following links from the authors’ abstracts in the web version of this report. The abstracts describe the researchers’ many areas of investigation, explain their shared concerns, and present some interesting work in Cognitive Science. We hope its new online format makes the CLiFF Notes a more useful and interesting guide to Computational Linguistics activity at Penn.

ScholarlyCommons@Penn

Central Florida Future, Vol. 40 No. 08, January 25, 2008

Publication venue: Associated Collegiate Press
Publication date: 25/01/2008

Med school gets another donation; With Libertarian and Presidency for Paul; Union's Qdoba springs gas leak; Cracks at Knightro's 'not structural'.

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)

A narrative inquiry into intercultural collaborations through activities in music education within a large overseas American school system

Author: Black Timothy Michael
Publication date: 03/10/2015

Music educators employed by the Large Overseas American School System (LOASS) at the center of this study live and work within the borders of allied host nations. Their students are dependents of military and civilian personnel stationed on bases situated on allied foreign soil. The researcher explores numerous perceptions of music educators and students who have engaged in intercultural collaboration, an unexplored activity occurring in the context of LOASS. Participants report on particular circumstances and issues surrounding activities in music education that include overseas host nation stakeholders. Contributions to the body of literature include re-envisioning the process through which one becomes intercultural, the role of antenarrative and what it comprises, as well as distinguishing unidirectional musical exchange from the activity of omnidirectional collaboration. Data sources include surveys, interviews, and historical evidence such as photos, school yearbooks and newspaper accounts. Survey results obtained from former LOASS music educators and students inform readers of the depth and breadth of the LOASS system, and the demographics of its participant pools. Interview data were manually coded, and revealed several emergent themes: motivations for initiating collaborative activities and what those activities look like; impact of collaborations on former music teachers, their students and host nation counterparts; barriers which inhibit such collaborations from taking place; strategies for overcoming those barriers, and what participants believe qualifies such collaborations as being successful. Yearbook and photographic relics provided an historical sense of overseas schools’ vision and legacy through writings and pictures archived over a 68-year continuum. In totum, these data comprise an antenarrative ‘story before the story’ from which participants’ narratives emerge and are presented in their own words. 
Framed within this context, the results provide a blueprint of how other members of the music education community can engage in such activities and successfully overcome any potential barriers that may inhibit them. Finally, a number of actionable alternative research methodologies are proffered to future researchers that may address peripheral issues regarding intercultural collaborations through activities in music education worldwide. In doing so, this study may encourage other like-minded music educators and their students to do the same.

Boston University Institutional Repository (OpenBU)