Improving Long Document Topic Segmentation Models With Enhanced Coherence Modeling
Topic segmentation is critical for obtaining structured documents and
improving downstream tasks such as information retrieval. Due to their ability to
automatically explore clues of topic shift from abundant labeled data, recent
supervised neural models have greatly promoted the development of long document
topic segmentation, but have left the deeper relationship between coherence and
topic segmentation underexplored. Therefore, this paper enhances the ability of
supervised models to capture coherence from both logical structure and semantic
similarity perspectives to further improve the topic segmentation performance,
proposing Topic-aware Sentence Structure Prediction (TSSP) and Contrastive
Semantic Similarity Learning (CSSL). Specifically, the TSSP task is proposed to
force the model to comprehend structural information by learning the original
relations between adjacent sentences in a disarrayed document, which is
constructed by jointly disrupting the original document at topic and sentence
levels. Moreover, we utilize inter- and intra-topic information to construct
contrastive samples and design the CSSL objective to ensure that sentence
representations in the same topic have higher similarity, while those in
different topics are less similar. Extensive experiments show that the
Longformer with our approach significantly outperforms previous state-of-the-art
(SOTA) methods. Our approach improves the F1 of the previous SOTA by 3.42 points (73.74 -> 77.16)
and reduces P_k by 1.11 points (15.0 -> 13.89) on WIKI-727K, and achieves an
average relative reduction of 4.3% in P_k on WikiSection. The average
relative P_k drop of 8.38% on two out-of-domain datasets also demonstrates
the robustness of our approach.
Comment: Accepted by EMNLP 2023. Code is available at
https://github.com/alibaba-damo-academy/SpokenNLP
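As a rough illustration of the disarrayed-document idea in the abstract above, the sketch below shuffles a toy document at both the topic level and the sentence level, then labels each adjacent sentence pair by its original relation. It is a minimal sketch under stated assumptions, not the authors' procedure: the function names `disarray` and `adjacency_labels`, the three label names, and the uniform shuffling strategy are all hypothetical, and the released code may construct and label samples differently.

```python
import random


def disarray(topics, seed=0):
    """Shuffle a document at topic and sentence levels (hypothetical helper).

    topics: list of topics, each a list of sentence strings.
    Returns (sentences, origin), where origin[i] is the
    (topic_index, sentence_index) of sentences[i] in the original document.
    """
    rng = random.Random(seed)
    # Topic-level disruption: permute the order of whole topics.
    topic_order = list(range(len(topics)))
    rng.shuffle(topic_order)
    sentences, origin = [], []
    for t in topic_order:
        # Sentence-level disruption: permute sentences inside each topic.
        sent_order = list(range(len(topics[t])))
        rng.shuffle(sent_order)
        for s in sent_order:
            sentences.append(topics[t][s])
            origin.append((t, s))
    return sentences, origin


def adjacency_labels(origin):
    """Label each adjacent pair by its original relation (assumed label set)."""
    labels = []
    for (t1, s1), (t2, s2) in zip(origin, origin[1:]):
        if t1 != t2:
            labels.append("different_topic")
        elif s2 == s1 + 1:
            labels.append("same_topic_successor")
        else:
            labels.append("same_topic_other")
    return labels


if __name__ == "__main__":
    doc = [["a1", "a2", "a3"], ["b1", "b2"]]
    sents, origin = disarray(doc, seed=1)
    for pair, label in zip(zip(sents, sents[1:]), adjacency_labels(origin)):
        print(pair, label)
```

A classifier trained to predict such labels from sentence representations would have to recover the original structure from the shuffled input, which is the intuition the abstract gives for TSSP.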
Robust audio indexing for Dutch spoken-word collections
Whereas the growth of storage capacity is in accordance with widely acknowledged predictions, the possibilities to index and access the archives created are lagging behind. This is especially the case in the oral history domain, and much of the rich content in these collections runs the risk of remaining inaccessible for lack of robust search technologies. This paper addresses the history and development of robust audio indexing technology for searching Dutch spoken-word collections and compares Dutch audio indexing in the well-studied broadcast news domain with an oral-history case study. It is concluded that despite significant advances in Dutch audio indexing technology and demonstrated applicability in several domains, further research is indispensable for successful automatic disclosure of spoken-word collections.
Dialogue as Data in Learning Analytics for Productive Educational Dialogue
This paper provides a novel, conceptually driven stance on the state of the contemporary analytic challenges faced in the treatment of dialogue as a form of data across on- and offline sites of learning. In prior research, preliminary steps have been taken to detect occurrences of such dialogue using automated analysis techniques. Such advances have the potential to foster effective dialogue using learning analytic techniques that scaffold, give feedback on, and provide pedagogic contexts promoting such dialogue. However, the translation of much prior learning science research to online contexts is complex, requiring the operationalization of constructs theorized in different contexts (often face-to-face), and based on different datasets and structures (often spoken dialogue). In this paper, we explore what could constitute the effective analysis of productive online dialogues, arguing that it requires consideration of three key facets of the dialogue: features indicative of productive dialogue; the unit of segmentation; and the interplay of features and segmentation with the temporal underpinning of learning contexts. The paper thus foregrounds key considerations regarding the analysis of dialogue data in emerging learning analytics environments, both for learning-science and for computationally oriented researchers.
Access to recorded interviews: A research agenda
Recorded interviews form a rich basis for scholarly inquiry. Examples include oral histories, community memory projects, and interviews conducted for broadcast media. Emerging technologies offer the potential to radically transform the way in which recorded interviews are made accessible, but this vision will demand substantial investments from a broad range of research communities. This article reviews the present state of practice for making recorded interviews available and the state of the art for key component technologies. A large number of important research issues are identified, and from that set of issues, a coherent research agenda is proposed.