77 research outputs found

    Deriving and Exploiting Situational Information in Speech: Investigations in a Simulated Search and Rescue Scenario

    Get PDF
    The need for automatic recognition and understanding of speech is emerging in tasks involving the processing of large volumes of natural conversations. In application domains such as Search and Rescue, exploiting automated systems for extracting mission-critical information from speech communications has the potential to make a real difference. Spoken language understanding has commonly been approached by identifying units of meaning (such as sentences, named entities, and dialogue acts) for providing a basis for further discourse analysis. However, this fine-grained identification of fundamental units of meaning is sensitive to high error rates in the automatic transcription of noisy speech. This thesis demonstrates that topic segmentation and identification techniques can be employed for information extraction from spoken conversations by being robust to such errors. Two novel topic-based approaches are presented for extracting situational information within the search and rescue context. The first approach shows that identifying the changes in the context and content of first responders' report over time can provide an estimation of their location. The second approach presents a speech-based topological map estimation technique that is inspired, in part, by automatic mapping algorithms commonly used in robotics. The proposed approaches are evaluated on a goal-oriented conversational speech corpus, which has been designed and collected based on an abstract communication model between a first responder and a task leader during a search process. Results have confirmed that a highly imperfect transcription of noisy speech has limited impact on the information extraction performance compared with that obtained on the transcription of clean speech data. This thesis also shows that speech recognition accuracy can benefit from rescoring its initial transcription hypotheses based on the derived high-level location information. A new two-pass speech decoding architecture is presented. In this architecture, the location estimation from a first decoding pass is used to dynamically adapt a general language model which is used for rescoring the initial recognition hypotheses. This decoding strategy has resulted in a statistically significant gain in the recognition accuracy of the spoken conversations in high background noise. It is concluded that the techniques developed in this thesis can be extended to more application domains that deal with large volumes of natural spoken conversations

    LL(O)D and NLP perspectives on semantic change for humanities research

    Get PDF
    CC BY 4.0This paper presents an overview of the LL(O)D and NLP methods, tools and data for detecting and representing semantic change, with its main application in humanities research. The paperā€™s aim is to provide the starting point for the construction of a workflow and set of multilingual diachronic ontologies within the humanities use case of the COST Action Nexus Linguarum, European network for Web-centred linguistic data science, CA18209. The survey focuses on the essential aspects needed to understand the current trends and to build applications in this area of study

    DIR 2011: Dutch_Belgian Information Retrieval Workshop Amsterdam

    Get PDF

    Towards generic relation extraction

    Get PDF
    A vast amount of usable electronic data is in the form of unstructured text. The relation extraction task aims to identify useful information in text (e.g., PersonW works for OrganisationX, GeneY encodes ProteinZ) and recode it in a format such as a relational database that can be more effectively used for querying and automated reasoning. However, adapting conventional relation extraction systems to new domains or tasks requires significant effort from annotators and developers. Furthermore, previous adaptation approaches based on bootstrapping start from example instances of the target relations, thus requiring that the correct relation type schema be known in advance. Generic relation extraction (GRE) addresses the adaptation problem by applying generic techniques that achieve comparable accuracy when transferred, without modification of model parameters, across domains and tasks. Previous work on GRE has relied extensively on various lexical and shallow syntactic indicators. I present new state-of-the-art models for GRE that incorporate governordependency information. I also introduce a dimensionality reduction step into the GRE relation characterisation sub-task, which serves to capture latent semantic information and leads to significant improvements over an unreduced model. Comparison of dimensionality reduction techniques suggests that latent Dirichlet allocation (LDA) ā€“ a probabilistic generative approach ā€“ successfully incorporates a larger and more interdependent feature set than a model based on singular value decomposition (SVD) and performs as well as or better than SVD on all experimental settings. Finally, I will introduce multi-document summarisation as an extrinsic test bed for GRE and present results which demonstrate that the relative performance of GRE models is consistent across tasks and that the GRE-based representation leads to significant improvements over a standard baseline from the literature. Taken together, the experimental results 1) show that GRE can be improved using dependency parsing and dimensionality reduction, 2) demonstrate the utility of GRE for the content selection step of extractive summarisation and 3) validate the GRE claim of modification-free adaptation for the first time with respect to both domain and task. This thesis also introduces data sets derived from publicly available corpora for the purpose of rigorous intrinsic evaluation in the news and biomedical domains

    Metadiscourse Tagging in Academic Lectures

    Get PDF
    This thesis presents a study into the nature and structure of academic lectures, with a special focus on metadiscourse phenomena. Metadiscourse refers to a set of linguistics expressions that signal specific discourse functions such as the Introduction: ā€œToday we will talk about...ā€ and Emphasising: ā€œThis is an important pointā€. These functions are important because they are part of lecturersā€™ strategies in understanding of what happens in a lecture. The knowledge of their presence and identity could serve as initial steps toward downstream applications that will require functional analysis of lecture content such as a browser for lectures archives, summarisation, or an automatic minute-taker for lectures. One challenging aspect for metadiscourse detection and classification is that the set of expressions are semi-fixed, meaning that different phrases can indicate the same function. To that end a four-stage approach is developed to study metadiscourse in academic lectures. Firstly, a corpus of metadiscourse for academic lectures from Physics and Economics courses is built by adapting an existing scheme that describes functional-oriented metadiscourse categories. Second, because producing reference transcripts is a time-consuming task and prone to some errors due to the manual efforts required, an automatic speech recognition (ASR) system is built specifically to produce transcripts of lectures. Since the reference transcripts lack time-stamp information, an alignment system is applied to the reference to be able to evaluate the ASR system. Then, a model is developed using Support Vector Machines (SVMs) to classify metadiscourse tags using both textual and acoustical features. The results show that n-grams are the most inductive features for the task; however, due to data sparsity the model does not generalise for unseen n-grams. This limits its ability to solve the variation issue in metadiscourse expressions. Continuous Bag-of-Words (CBOW) provide a promising solution as this can capture both the syntactic and semantic similarities between words and thus is able to solve the generalisation issue. However, CBOW ignores the word order completely, something which is very important to be retained when classifying metadiscourse tags. The final stage aims to address the issue of sequence modelling by developing a joint CBOW and Convolutional Neural Network (CNN) model. CNNs can work with continuous features such as word embedding in an elegant and robust fashion by producing a fixed-size feature vector that is able to identify indicative local information for the tagging task. The results show that metadiscourse tagging using CNNs outperforms the SVMs model significantly even on ASR outputs, owing to its ability to predict a sequence of words that is more representative for the task regardless of its position in the sentence. In addition, the inclusion of other features such as part-of-speech (POS) tags and prosodic cues improved the results further. These findings are consistent in both disciplines. The final contribution in this thesis is to investigate the suitability of using metadiscourse tags as discourse features in the lecture structure segmentation model, despite the fact that the task is approached as a classification model and most of the state-of-art models are unsupervised. In general, the obtained results show remarkable improvements over the state-of-the-art models in both disciplines
    • ā€¦
    corecore