11,157 research outputs found

    Factoid question answering for spoken documents

    In this dissertation, we present a factoid question answering system specifically tailored for Question Answering (QA) on spoken documents. This work explores, for the first time, which techniques can be robustly adapted from the usual QA on written documents to the more difficult spoken-document scenario. More specifically, we study new information retrieval (IR) techniques designed for speech, and use several levels of linguistic information for the speech-based QA task. These include named-entity detection with phonetic information, syntactic parsing applied to speech transcripts, and coreference resolution. Our approach is largely based on supervised machine learning techniques, with special focus on the answer extraction step, and makes little use of handcrafted knowledge. Consequently, it should be easily adaptable to other domains and languages. As part of the work behind this thesis, we promoted and coordinated the creation of an evaluation framework for the task of QA on spoken documents. The framework, named QAst (Question Answering on Speech Transcripts), provides multilingual corpora, evaluation questions, and answer keys. These corpora were used in the QAst evaluations held at the CLEF workshops in 2007, 2008 and 2009, thus helping the development of state-of-the-art techniques for this particular topic. The presented QA system and all its modules are extensively evaluated on the English European Parliament Plenary Sessions (EPPS) corpus, composed of manual transcripts and automatic transcripts obtained by three different Automatic Speech Recognition (ASR) systems that exhibit significantly different word error rates, which allows both a quantitative and a qualitative evaluation of the task. This data belongs to the CLEF 2009 track for QA on speech transcripts. The main results confirm that syntactic information is very useful for learning to rank answer candidates, improving results on both manual and automatic transcripts unless the ASR quality is very low. Overall, the performance of our system is comparable to or better than the state of the art on this corpus, confirming the validity of our approach.

    Religion and Attitudes toward Same-Sex Marriage among U.S. Latinos

    Objectives. This study examines links between multiple aspects of religious involvement and attitudes toward same-sex marriage among U.S. Latinos. The primary focus is on variations by affiliation and participation, but the possible mediating roles of biblical beliefs, clergy cues, and the role of religion in shaping political views are also considered. Methods. We use binary logistic regression models to analyze data from a large nationwide survey of U.S. Latinos conducted by the Pew Hispanic Forum in late 2006. Results. Findings highlight the strong opposition to same-sex marriage among Latino evangelical (or conservative) Protestants and members of sectarian groups (e.g., LDS), even compared with devout Catholics. Although each of the hypothesized mediators is significantly linked with attitudes toward same-sex marriage, controlling for them does not, for the most part, alter the massive affiliation/attendance differences in those attitudes. Conclusions. This study illustrates the importance of religious cleavages in public opinion on social issues within the diverse U.S. Latino population. The significance of religious variations in Hispanic civic life is likely to increase with the growth of the Latino population and the rising numbers of Protestants and sectarians among Latinos.

    Noise-tolerance feasibility for restricted-domain Information Retrieval systems

    Information Retrieval systems normally have to work with rather heterogeneous sources, such as Web sites or documents produced by Optical Character Recognition tools. Converting these sources correctly into flat text files is not a trivial task, since noise may easily be introduced as a result of spelling or typesetting errors. Interestingly, this is not a great drawback when the corpus is sufficiently large, since redundancy helps to overcome noise problems. However, noise becomes a serious problem in restricted-domain Information Retrieval, especially when the corpus is small and has little or no redundancy. This paper devises an approach that adds noise tolerance to Information Retrieval systems. A set of experiments carried out in the agricultural domain demonstrates the effectiveness of the approach presented.

    Destination-Language Proficiency in Cross-National Perspective: A Study of Immigrant Groups in Nine Western Countries

    Immigrants’ destination-language proficiency has been typically studied from a microperspective in a single country. In this article, the authors examine the role of macrofactors in a cross-national perspective. They argue that three groups of macrolevel factors are important: the country immigrants settle in (“destination” effect), the sending nation (“origin” effect), and the combination between origin and destination (“setting” or “community” effect). The authors propose a design that simultaneously observes multiple origin groups in multiple destinations. They present substantive hypotheses about language proficiency and use them to develop a series of macrolevel indicators. The authors collected and standardized 19 existing immigrant surveys for nine Western countries. Using multilevel techniques, their analyses show that origins, destinations, and settings play a significant role in immigrants’ language proficiency.

    Altitude deviations: Breakdowns of an error-tolerant system

    Pilot reports of aviation incidents to the Aviation Safety Reporting System (ASRS) provide a window on the problems occurring in today's airline cockpits. The narratives of 10 pilot reports of errors made in the automation-assisted altitude-change task are used to illustrate some of the issues of pilots interacting with automatic systems. These narratives are then used to construct a description of the cockpit as an information processing system. The analysis concentrates on the error-tolerant properties of the system and on how breakdowns can occasionally occur. An error-tolerant system can detect and correct its internal processing errors. The cockpit system consists of two or three pilots supported by autoflight, flight-management, and alerting systems. These humans and machines have distributed access to clearance information and perform redundant processing of information. Errors can be detected as deviations either from expected behavior or from expected information. Breakdowns in this system can occur when the checking and cross-checking tasks that give the system its error-tolerant properties are not performed because of distractions or other task demands. Recommendations for improving the error tolerance of the cockpit system, based on the analysis, are given.

    Designing Women: Essentializing Femininity in AI Linguistics

    Since the eighties, feminists have considered technology a force capable of subverting sexism because of technology’s ability to produce unbiased logic. Most famously, Donna Haraway’s “A Cyborg Manifesto” posits that the cyborg has the inherent capability to transcend gender because of its removal from social constructs and lack of loyalty to the natural world. But while humanoids and artificial intelligence have been imagined as inherently subversive to gender, current artificial intelligence systems perpetuate gender divides in labor and language as their programmers imbue them with traits considered “feminine.” A majority of 21st-century AI and humanoids are programmed to fit female stereotypes as they fulfill emotional labor and perform pink-collar tasks, whether through roles as therapists, query-fillers, or companions. This paper examines four specific chat-based AI (ELIZA, XiaoIce, Sophia, and Erica) and analyzes how their feminine linguistic patterns are used to maintain the illusion of emotional understanding with regard to the tasks that they perform. Overall, chat-based AI fails to subvert gender roles, as feminine AI are relegated to the realm of emotional intelligence and labor.

    The Future of the Internet III

    Presents survey results on technology experts' predictions about the Internet's social, political, and economic impact as of 2020, including its effects on integrity and tolerance, intellectual property law, and the division between personal and work lives.

    Objective Translational Error and the Cultural Norm of Translation

    Combating e-discrimination in the North West - final report

    The Combating eDiscrimination in the North West project examined over 100 websites advertising job opportunities both regionally and nationally, and found the vast majority to be largely inaccessible. Professional standards, such as using valid W3C code and adhering to the W3C Web Content Accessibility Guidelines, were largely not followed. The project also conducted interviews with both public and private sector web professionals, and focus groups of disabled computer users, to draw a broader picture of the accessibility of jobs websites. Interviews with leading web development companies in the Greater Manchester region showed a prevailing view that making websites accessible should not incur any additional cost, as the expertise to create a site professionally should be in place from the start, and that accessibility will follow from applying professional standards. However, the process of creating a website for the project with one such company showed that following professional standards is not sufficient to catch all the potential problems, and that user testing is an essential adjunct to professional practice. The main findings of the project are thus:
    • Most websites in the job opportunities sector do not follow professional standards of web development, and are largely inaccessible.
    • Professional standards of web development need to be augmented with user testing to ensure proper accessibility.