3,198 research outputs found

    Spoken content retrieval: A survey of techniques and technologies

    Get PDF
    Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition and user interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight on how these fields are integrated to support research and development, thus addressing the core challenges of SCR

    Argumentation Mining in User-Generated Web Discourse

    Full text link
    The goal of argumentation mining, an evolving research field in computational linguistics, is to design methods capable of analyzing people's argumentation. In this article, we go beyond the state of the art in several ways. (i) We deal with actual Web data and take up the challenges given by the variety of registers, multiple domains, and unrestricted noisy user-generated Web discourse. (ii) We bridge the gap between normative argumentation theories and argumentation phenomena encountered in actual data by adapting an argumentation model tested in an extensive annotation study. (iii) We create a new gold standard corpus (90k tokens in 340 documents) and experiment with several machine learning methods to identify argument components. We offer the data, source codes, and annotation guidelines to the community under free licenses. Our findings show that argumentation mining in user-generated Web discourse is a feasible but challenging task.Comment: Cite as: Habernal, I. & Gurevych, I. (2017). Argumentation Mining in User-Generated Web Discourse. Computational Linguistics 43(1), pp. 125-17

    MENGA: a new comprehensive tool for the integration of neuroimaging data and the Allen human brain transcriptome atlas

    Get PDF
    Brain-wide mRNA mappings offer a great potential for neuroscience research as they can provide information about system proteomics. In a previous work we have correlated mRNA maps with the binding patterns of radioligands targeting specific molecular systems and imaged with positron emission tomography (PET) in unrelated control groups. This approach is potentially applicable to any imaging modality as long as an efficient procedure of imaging-genomic matching is provided. In the original work we considered mRNA brain maps of the whole human genome derived from the Allen human brain database (ABA) and we performed the analysis with a specific region-based segmentation with a resolution that was limited by the PET data parcellation. There we identified the need for a platform for imaging-genomic integration that should be usable with any imaging modalities and fully exploit the high resolution mapping of ABA dataset.In this work we present MENGA (Multimodal Environment for Neuroimaging and Genomic Analysis), a software platform that allows the investigation of the correlation patterns between neuroimaging data of any sort (both functional and structural) with mRNA gene expression profiles derived from the ABA database at high resolution.We applied MENGA to six different imaging datasets from three modalities (PET, single photon emission tomography and magnetic resonance imaging) targeting the dopamine and serotonin receptor systems and the myelin molecular structure. We further investigated imaging-genomic correlations in the case of mismatch between selected proteins and imaging targets

    Energy-based Self-attentive Learning of Abstractive Communities for Spoken Language Understanding

    Full text link
    Abstractive community detection is an important spoken language understanding task, whose goal is to group utterances in a conversation according to whether they can be jointly summarized by a common abstractive sentence. This paper provides a novel approach to this task. We first introduce a neural contextual utterance encoder featuring three types of self-attention mechanisms. We then train it using the siamese and triplet energy-based meta-architectures. Experiments on the AMI corpus show that our system outperforms multiple energy-based and non-energy based baselines from the state-of-the-art. Code and data are publicly available.Comment: Update baseline

    The Walking Talking Stick: Understanding Automated Note-Taking in Walking Meetings

    Get PDF
    While walking meetings offer a healthy alternative to sit-down meetings, they also pose practical challenges. Taking notes is difficult while walking, which limits the potential of walking meetings. To address this, we designed the Walking Talking Stick -- a tangible device with integrated voice recording, transcription, and a physical highlighting button to facilitate note-taking during walking meetings. We investigated our system in a three-condition between-subjects user study with thirty pairs of participants (NN=60) who conducted 15-minute outdoor walking meetings. Participants either used clip-on microphones, the prototype without the button, or the prototype with the highlighting button. We found that the tangible device increased task focus, and the physical highlighting button facilitated turn-taking and resulted in more useful notes. Our work demonstrates how interactive artifacts can incentivize users to hold meetings in motion and enhance conversation dynamics. We contribute insights for future systems which support conducting work tasks in mobile environments.Comment: In CHI 202

    Toward summarization of communicative activities in spoken conversation

    Get PDF
    This thesis is an inquiry into the nature and structure of face-to-face conversation, with a special focus on group meetings in the workplace. I argue that conversations are composed of episodes, each of which corresponds to an identifiable communicative activity such as giving instructions or telling a story. These activities are important because they are part of participants’ commonsense understanding of what happens in a conversation. They appear in natural summaries of conversations such as meeting minutes, and participants talk about them within the conversation itself. Episodic communicative activities therefore represent an essential component of practical, commonsense descriptions of conversations. The thesis objective is to provide a deeper understanding of how such activities may be recognized and differentiated from one another, and to develop a computational method for doing so automatically. The experiments are thus intended as initial steps toward future applications that will require analysis of such activities, such as an automatic minute-taker for workplace meetings, a browser for broadcast news archives, or an automatic decision mapper for planning interactions. My main theoretical contribution is to propose a novel analytical framework called participant relational analysis. The proposal argues that communicative activities are principally indicated through participant-relational features, i.e., expressions of relationships between participants and the dialogue. Participant-relational features, such as subjective language, verbal reference to the participants, and the distribution of speech activity amongst the participants, are therefore argued to be a principal means for analyzing the nature and structure of communicative activities. I then apply the proposed framework to two computational problems: automatic discourse segmentation and automatic discourse segment labeling. The first set of experiments test whether participant-relational features can serve as a basis for automatically segmenting conversations into discourse segments, e.g., activity episodes. Results show that they are effective across different levels of segmentation and different corpora, and indeed sometimes more effective than the commonly-used method of using semantic links between content words, i.e., lexical cohesion. They also show that feature performance is highly dependent on segment type, suggesting that human-annotated “topic segments” are in fact a multi-dimensional, heterogeneous collection of topic and activity-oriented units. Analysis of commonly used evaluation measures, performed in conjunction with the segmentation experiments, reveals that they fail to penalize substantially defective results due to inherent biases in the measures. I therefore preface the experiments with a comprehensive analysis of these biases and a proposal for a novel evaluation measure. A reevaluation of state-of-the-art segmentation algorithms using the novel measure produces substantially different results from previous studies. This raises serious questions about the effectiveness of some state-of-the-art algorithms and helps to identify the most appropriate ones to employ in the subsequent experiments. I also preface the experiments with an investigation of participant reference, an important type of participant-relational feature. I propose an annotation scheme with novel distinctions for vagueness, discourse function, and addressing-based referent inclusion, each of which are assessed for inter-coder reliability. The produced dataset includes annotations of 11,000 occasions of person-referring. The second set of experiments concern the use of participant-relational features to automatically identify labels for discourse segments. In contrast to assigning semantic topic labels, such as topical headlines, the proposed algorithm automatically labels segments according to activity type, e.g., presentation, discussion, and evaluation. The method is unsupervised and does not learn from annotated ground truth labels. Rather, it induces the labels through correlations between discourse segment boundaries and the occurrence of bracketing meta-discourse, i.e., occasions when the participants talk explicitly about what has just occurred or what is about to occur. Results show that bracketing meta-discourse is an effective basis for identifying some labels automatically, but that its use is limited if global correlations to segment features are not employed. This thesis addresses important pre-requisites to the automatic summarization of conversation. What I provide is a novel activity-oriented perspective on how summarization should be approached, and a novel participant-relational approach to conversational analysis. The experimental results show that analysis of participant-relational features is

    Accessing spoken interaction through dialogue processing [online]

    Get PDF
    Zusammenfassung Unser Leben, unsere Leistungen und unsere Umgebung, alles wird derzeit durch Schriftsprache dokumentiert. Die rasante Fortentwicklung der technischen Möglichkeiten Audio, Bilder und Video aufzunehmen, abzuspeichern und wiederzugeben kann genutzt werden um die schriftliche Dokumentation von menschlicher Kommunikation, zum Beispiel Meetings, zu unterstĂŒtzen, zu ergĂ€nzen oder gar zu ersetzen. Diese neuen Technologien können uns in die Lage versetzen Information aufzunehmen, die anderweitig verloren gehen, die Kosten der Dokumentation zu senken und hochwertige Dokumente mit audiovisuellem Material anzureichern. Die Indizierung solcher Aufnahmen stellt die Kerntechnologie dar um dieses Potential auszuschöpfen. Diese Arbeit stellt effektive Alternativen zu schlĂŒsselwortbasierten Indizes vor, die SuchraumeinschrĂ€nkungen bewirken und teilweise mit einfachen Mitteln zu berechnen sind. Die Indizierung von Sprachdokumenten kann auf verschiedenen Ebenen erfolgen: Ein Dokument gehört stilistisch einer bestimmten Datenbasis an, welche durch sehr einfache Merkmale bei hoher Genauigkeit automatisch bestimmt werden kann. Durch diese Art von Klassifikation kann eine Reduktion des Suchraumes um einen Faktor der GrĂ¶ĂŸenordnung 4­10 erfolgen. Die Anwendung von thematischen Merkmalen zur Textklassifikation bei einer Nachrichtendatenbank resultiert in einer Reduktion um einen Faktor 18. Da Sprachdokumente sehr lang sein können mĂŒssen sie in thematische Segmente unterteilt werden. Ein neuer probabilistischer Ansatz sowie neue Merkmale (Sprecherinitia­ tive und Stil) liefern vergleichbare oder bessere Resultate als traditionelle schlĂŒsselwortbasierte AnsĂ€tze. Diese thematische Segmente können durch die vorherrschende AktivitĂ€t charakterisiert werden (erzĂ€hlen, diskutieren, planen, ...), die durch ein neuronales Netz detektiert werden kann. Die Detektionsraten sind allerdings begrenzt da auch Menschen diese AktivitĂ€ten nur ungenau bestimmen. Eine maximale Reduktion des Suchraumes um den Faktor 6 ist bei den verwendeten Daten theoretisch möglich. Eine thematische Klassifikation dieser Segmente wurde ebenfalls auf einer Datenbasis durchgefĂŒhrt, die Detektionsraten fĂŒr diesen Index sind jedoch gering. Auf der Ebene der einzelnen Äußerungen können Dialogakte wie Aussagen, Fragen, RĂŒckmeldungen (aha, ach ja, echt?, ...) usw. mit einem diskriminativ trainierten Hidden Markov Model erkannt werden. Dieses Verfahren kann um die Erkennung von kurzen Folgen wie Frage/Antwort­Spielen erweitert werden (Dialogspiele). Dialogakte und ­spiele können eingesetzt werden um Klassifikatoren fĂŒr globale Sprechstile zu bauen. Ebenso könnte ein Benutzer sich an eine bestimmte Dialogaktsequenz erinnern und versuchen, diese in einer grafischen ReprĂ€sentation wiederzufinden. In einer Studie mit sehr pessimistischen Annahmen konnten Benutzer eines aus vier Ă€hnlichen und gleichwahrscheinlichen GesprĂ€chen mit einer Genauigkeit von ~ 43% durch eine graphische ReprĂ€sentation von AktivitĂ€t bestimmt. Dialogakte könnte in diesem Szenario ebenso nĂŒtzlich sein, die Benutzerstudie konnte aufgrund der geringen Datenmenge darĂŒber keinen endgĂŒltigen Aufschluß geben. Die Studie konnte allerdings fĂŒr detailierte Basismerkmale wie FormalitĂ€t und SprecheridentitĂ€t keinen Effekt zeigen. Abstract Written language is one of our primary means for documenting our lives, achievements, and environment. Our capabilities to record, store and retrieve audio, still pictures, and video are undergoing a revolution and may support, supplement or even replace written documentation. This technology enables us to record information that would otherwise be lost, lower the cost of documentation and enhance high­quality documents with original audiovisual material. The indexing of the audio material is the key technology to realize those benefits. This work presents effective alternatives to keyword based indices which restrict the search space and may in part be calculated with very limited resources. Indexing speech documents can be done at a various levels: Stylistically a document belongs to a certain database which can be determined automatically with high accuracy using very simple features. The resulting factor in search space reduction is in the order of 4­10 while topic classification yielded a factor of 18 in a news domain. Since documents can be very long they need to be segmented into topical regions. A new probabilistic segmentation framework as well as new features (speaker initiative and style) prove to be very effective compared to traditional keyword based methods. At the topical segment level activities (storytelling, discussing, planning, ...) can be detected using a machine learning approach with limited accuracy; however even human annotators do not annotate them very reliably. A maximum search space reduction factor of 6 is theoretically possible on the databases used. A topical classification of these regions has been attempted on one database, the detection accuracy for that index, however, was very low. At the utterance level dialogue acts such as statements, questions, backchannels (aha, yeah, ...), etc. are being recognized using a novel discriminatively trained HMM procedure. The procedure can be extended to recognize short sequences such as question/answer pairs, so called dialogue games. Dialog acts and games are useful for building classifiers for speaking style. Similarily a user may remember a certain dialog act sequence and may search for it in a graphical representation. In a study with very pessimistic assumptions users are able to pick one out of four similar and equiprobable meetings correctly with an accuracy ~ 43% using graphical activity information. Dialogue acts may be useful in this situation as well but the sample size did not allow to draw final conclusions. However the user study fails to show any effect for detailed basic features such as formality or speaker identity

    Automatic Summarization

    Get PDF
    It has now been 50 years since the publication of Luhn’s seminal paper on automatic summarization. During these years the practical need for automatic summarization has become increasingly urgent and numerous papers have been published on the topic. As a result, it has become harder to find a single reference that gives an overview of past efforts or a complete view of summarization tasks and necessary system components. This article attempts to fill this void by providing a comprehensive overview of research in summarization, including the more traditional efforts in sentence extraction as well as the most novel recent approaches for determining important content, for domain and genre specific summarization and for evaluation of summarization. We also discuss the challenges that remain open, in particular the need for language generation and deeper semantic understanding of language that would be necessary for future advances in the field
    • 

    corecore