10,159 research outputs found

    Exploratory Analysis of Highly Heterogeneous Document Collections

    We present an effective multifaceted system for exploratory analysis of highly heterogeneous document collections. Our system is based on intelligently tagging individual documents in a purely automated fashion and exploiting these tags in a powerful faceted browsing framework. Tagging strategies employed include both unsupervised and supervised approaches based on machine learning and natural language processing. As one of our key tagging strategies, we introduce the KERA algorithm (Keyword Extraction for Reports and Articles). KERA extracts topic-representative terms from individual documents in a purely unsupervised fashion and is revealed to be significantly more effective than state-of-the-art methods. Finally, we evaluate our system in its ability to help users locate documents pertaining to military critical technologies buried deep in a large heterogeneous sea of information. Comment: 9 pages; KDD 2013: 19th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

    Topic Similarity Networks: Visual Analytics for Large Document Sets

    We investigate ways in which to improve the interpretability of LDA topic models by better analyzing and visualizing their outputs. We focus on examining what we refer to as topic similarity networks: graphs in which nodes represent latent topics in text collections and links represent similarity among topics. We describe efficient and effective approaches to both building and labeling such networks. Visualizations of topic models based on these networks are shown to be a powerful means of exploring, characterizing, and summarizing large collections of unstructured text documents. They help to "tease out" non-obvious connections among different sets of documents and provide insights into how topics form larger themes. We demonstrate the efficacy and practicality of these approaches through two case studies: 1) NSF grants for basic research spanning a 14-year period and 2) the entire English portion of Wikipedia. Comment: 9 pages; 2014 IEEE International Conference on Big Data (IEEE BigData 2014)
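    The idea of a topic similarity network described in this abstract can be illustrated with a minimal sketch. The abstract does not say which similarity measure or linking rule the authors use, so the cosine similarity, the `threshold` parameter, the function names, and the toy topic-word distributions below are all assumptions made for illustration, not the paper's method.

```python
from itertools import combinations
from math import sqrt


def cosine(p, q):
    """Cosine similarity between two equal-length weight vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = sqrt(sum(a * a for a in p)) * sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def build_topic_network(topics, threshold=0.5):
    """Return {(topic_a, topic_b): similarity} for pairs at or above threshold.

    `topics` maps a topic name to its word distribution over a shared
    vocabulary (hypothetical input; in practice this would come from a
    fitted LDA model).
    """
    edges = {}
    for (a, pa), (b, pb) in combinations(topics.items(), 2):
        sim = cosine(pa, pb)
        if sim >= threshold:  # assumed linking rule: keep only strong links
            edges[(a, b)] = sim
    return edges


# Toy topic-word distributions over a four-word vocabulary (made up).
topics = {
    "t0": [0.5, 0.4, 0.1, 0.0],
    "t1": [0.4, 0.5, 0.1, 0.0],
    "t2": [0.0, 0.1, 0.4, 0.5],
}
network = build_topic_network(topics)
for (a, b), sim in network.items():
    print(f"{a} -- {b}: {sim:.3f}")
```

    In this toy input only t0 and t1 share most of their probability mass, so they are the only pair linked; a real network would then be laid out and labeled for visual exploration.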

    mSpace meets EPrints: a Case Study in Creating Dynamic Digital Collections

    In this case study we look at issues involved in (a) generating dynamic digital libraries that are on a particular topic but span heterogeneous collections at distinct sites, (b) supplementing the artefacts in that collection with additional information available either from databases at the artefact's home or from the Web at large, and (c) providing an interaction paradigm that will support effective exploration of this new resource. We describe how we used two available frameworks, mSpace and EPrints, to support this kind of collection building. The result of the study is a set of recommendations to improve the connectivity of remote resources both to one another and to related Web resources, and that will also reduce problems like co-referencing in order to enable the creation of new collections on demand.

    Multimedia search without visual analysis: the value of linguistic and contextual information

    This paper addresses the focus of this special issue by analyzing the potential contribution of linguistic content and other non-image aspects to the processing of audiovisual data. It summarizes the various ways in which linguistic content analysis contributes to enhancing the semantic annotation of multimedia content, and, as a consequence, to improving the effectiveness of conceptual media access tools. A number of techniques are presented, including the time-alignment of textual resources, audio and speech processing, content reduction and reasoning tools, and the exploitation of surface features.

    Drag it together with Groupie: making RDF data authoring easy and fun for anyone

    One of the foremost challenges towards realizing a “Read-write Web of Data” [3] is making it possible for everyday computer users to easily find, manipulate, create, and publish data back to the Web so that it can be made available for others to use. However, many aspects of Linked Data make authoring and manipulation difficult for “normal” (i.e., non-coder) end-users. First, data can be high-dimensional, having arbitrarily many properties per “instance”, and interlinked to arbitrarily many other instances in many different ways. Second, collections of Linked Data tend to be vastly more heterogeneous than in typical structured databases, where instances are kept in uniform collections (e.g., database tables). Third, while highly flexible, the problem of having all structures reduced to a graph is verbosity: even simple structures can appear complex. Finally, many of the concepts involved in linked data authoring - for example, terms used to define ontologies - are highly abstract and foreign to regular citizen-users. To counter this complexity we have devised a drag-and-drop direct manipulation interface that makes authoring Linked Data easy, fun, and accessible to a wide audience. Groupie allows users to author data simply by dragging blobs representing entities into other entities to compose relationships, establishing one relational link at a time. Since the underlying representation is RDF, Groupie facilitates the inclusion of references to entities and properties defined elsewhere on the Web through integration with popular Linked Data indexing services. Finally, to make it easy for new users to build upon others’ work, Groupie provides a communal space where all data sets created by users can be shared, cloned and modified, allowing individual users to help each other model complex domains, thereby leveraging collective intelligence.