    Topic Similarity Networks: Visual Analytics for Large Document Sets

    We investigate ways in which to improve the interpretability of LDA topic models by better analyzing and visualizing their outputs. We focus on examining what we refer to as topic similarity networks: graphs in which nodes represent latent topics in text collections and links represent similarity among topics. We describe efficient and effective approaches to both building and labeling such networks. Visualizations of topic models based on these networks are shown to be a powerful means of exploring, characterizing, and summarizing large collections of unstructured text documents. They help to "tease out" non-obvious connections among different sets of documents and provide insights into how topics form larger themes. We demonstrate the efficacy and practicality of these approaches through two case studies: 1) NSF grants for basic research spanning a 14-year period and 2) the entire English portion of Wikipedia. Comment: 9 pages; 2014 IEEE International Conference on Big Data (IEEE BigData 2014).

    Access to recorded interviews: A research agenda

    Recorded interviews form a rich basis for scholarly inquiry. Examples include oral histories, community memory projects, and interviews conducted for broadcast media. Emerging technologies offer the potential to radically transform the way in which recorded interviews are made accessible, but this vision will demand substantial investments from a broad range of research communities. This article reviews the present state of practice for making recorded interviews available and the state of the art for key component technologies. A large number of important research issues are identified, and from that set of issues, a coherent research agenda is proposed.

    Tagging Prosody and Discourse Structure in Elicited Spontaneous Speech

    This paper motivates and describes the annotation and analysis of prosody and discourse structure for several large spoken language corpora. The annotation schemas are of two types: tags for prosody and intonation, and tags for several aspects of discourse structure. The choice of the particular tagging schema in each domain is based in large part on the insights they provide in corpus-based studies of the relationship between discourse structure and the accenting of referring expressions in American English. We first describe these results and show that the same models account for the accenting of pronouns in an extended passage from one of the Speech Warehouse hotel-booking dialogues. We then turn to corpora described in Venditti [Ven00], which adapts the same models to Tokyo Japanese. Japanese is interesting to compare to English, because accent is lexically specified and so cannot mark discourse focus in the same way. Analyses of these corpora show that local pitch range expansion serves the analogous focusing function in Japanese. The paper concludes with a section describing several outstanding questions in the annotation of Japanese intonation which corpus studies can help to resolve. Work reported in this paper was supported in part by a grant from the Ohio State University Office of Research, to Mary E. Beckman and co-principal investigators on the OSU Speech Warehouse project, and by an Ohio State University Presidential Fellowship to Jennifer J. Venditti.

    Integration of Computer Vision and Natural Language Processing in Multimedia Robotics Application

    Computer vision and natural language processing (NLP) are two active areas of machine learning research. Their integration gives rise to a new interdisciplinary field that is attracting growing attention from researchers. Research has been carried out to extract the text associated with an image or a video, which can help make computer vision more effective. Moreover, researchers focus on utilizing NLP to extract the meaning of words through the use of computer vision. This concept is widely used in robotics. Although robots should be able to observe their surroundings through different modes of interaction, natural gestures and spoken language are the most convenient ways for humans to interact with robots. This is possible only if the robots can understand such interactions. In the present paper, the proposed integrated application is used to guide vision-impaired people. Because vision is essential in human life, an alternative source of guidance that helps blind people move around is highly important. For this purpose, the paper uses a smartphone with vision, language, and intelligence capabilities, attached to the blind person, to capture images of their surroundings. The smartphone is connected to a central server based on a Faster Region Convolutional Neural Network (F-RCNN), which detects the objects in each image so that the person can be informed about them and avoid obstacles in their way. The detection results are passed to the smartphone, which produces speech output to guide the blind person.

    A Computational Theory of the Use-Mention Distinction in Natural Language

    To understand the language we use, we sometimes must turn language on itself, and we do this through an understanding of the use-mention distinction. In particular, we are able to recognize mentioned language: that is, tokens (e.g., words, phrases, sentences, letters, symbols, sounds) produced to draw attention to linguistic properties that they possess. Evidence suggests that humans frequently employ the use-mention distinction, and we would be severely handicapped without it; mentioned language frequently occurs for the introduction of new words, attribution of statements, explanation of meaning, and assignment of names. Moreover, just as we benefit from mutual recognition of the use-mention distinction, the potential exists for us to benefit from language technologies that recognize it as well. With a better understanding of the use-mention distinction, applications can be built to extract valuable information from mentioned language, leading to better language learning materials, precise dictionary building tools, and highly adaptive computer dialogue systems. This dissertation presents the first computational study of how the use-mention distinction occurs in natural language, with a focus on occurrences of mentioned language. Three specific contributions are made. The first is a framework for identifying and analyzing instances of mentioned language, in an effort to reconcile elements of previous theoretical work for practical use. Definitions for mentioned language, metalanguage, and quotation have been formulated, and a procedural rubric has been constructed for labeling instances of mentioned language. The second is a sequence of three labeled corpora of mentioned language, containing delineated instances of the phenomenon. The corpora illustrate the variety of mentioned language, and they enable analysis of how the phenomenon relates to sentence structure. Using these corpora, inter-annotator agreement studies have quantified the concurrence of human readers in labeling the phenomenon. The third contribution is a method for identifying common forms of mentioned language in text, using patterns in metalanguage and sentence structure. Although the full breadth of the phenomenon is likely to elude computational tools for the foreseeable future, some specific, common rules for detecting and delineating mentioned language have been shown to perform well.

    New Perspectives in Teaching Pronunciation

    pp.165-18

    Exploring Strategies Improving Ielts Listening Score

    An English proficiency test score, such as an IELTS score, is one of the requirements for applying for scholarships abroad. Listening is one of the sections tested in IELTS. A number of studies on the IELTS test indicate that the listening section is one of the most difficult parts of the test. This research was therefore conducted to identify strategies that successful IELTS test takers used to improve their listening scores. The study employs a qualitative research approach. Five English Education Department alumni were selected purposively to discuss their strategies for improving their listening scores. These alumni had taken the official IELTS test and obtained an IELTS score of 6.5 or higher. To collect the required data, semi-structured interviews were conducted from October to November 2019. Each interview was recorded and lasted about 20 to 40 minutes. The recorded data were partially transcribed: only data relevant to the research questions were retained, and irrelevant data were left in the recording system. The data were then coded and analyzed using the Miles and Huberman (2014) data analysis approach. The findings cover: (1) the difficulties the participants encountered in the official IELTS test, especially in Listening; and (2) the strategies they use in IELTS Listening. In addition, some participants have other strategies for raising their scores, especially in Listening, if they take the IELTS again.