7,656 research outputs found

    Evolutionary Subject Tagging in the Humanities; Supporting Discovery and Examination in Digital Cultural Landscapes

    In this paper, the authors attempt to identify problematic issues for subject tagging in the humanities, particularly those associated with information objects in digital formats. In the third major section, they identify a number of assumptions behind the current practice of subject classification that they think should be challenged. They then propose features of classification systems that could increase their effectiveness; these emerged as recurrent themes in many of their conversations with scholars, consultants, and colleagues. Finally, they suggest next steps that they believe will help scholars and librarians develop better subject classification systems to support research in the humanities. (NEH Office of Digital Humanities: Digital Humanities Start-Up Grant HD-51166-10)

    Symbiosis between the TRECVid benchmark and video libraries at the Netherlands Institute for Sound and Vision

    Audiovisual archives are investing in large-scale digitisation efforts of their analogue holdings and, in parallel, ingesting an ever-increasing amount of born-digital files into their digital storage facilities. Digitisation has opened up new access paradigms and boosted re-use of audiovisual content. Query-log analyses show the shortcomings of manual annotation, so archives are complementing these annotations by developing novel search engines that automatically extract information from both the audio and the visual tracks. Over the past few years, the TRECVid benchmark has developed a novel relationship with the Netherlands Institute for Sound and Vision (NISV) which goes beyond the NISV simply providing data and use cases to TRECVid. Prototype and demonstrator systems developed as part of TRECVid are set to become a key driver in improving the quality of search engines at the NISV and will ultimately help other audiovisual archives to offer more efficient and more fine-grained access to their collections. This paper reports the experiences of the NISV in leveraging the activities of the TRECVid benchmark.

    Making money with clouds: Revenue optimization through automated policy decisions

    Business intelligence (BI) systems and tools are broadly adopted in organizations today, supporting activities such as data analysis, managerial decision making, and business-performance measurement. Our research investigates the integration of feedback and recommendation mechanisms (FRM) into BI solutions. We define FRM as textual, visual, and/or graphical cues that are embedded into front-end BI tools and guide the end-user to consider using certain data subsets and analysis forms. Our working hypothesis is that the integration of FRM will improve the usability of BI tools and increase the benefits that end-users and organizations can gain from data resources. Our first research stage focuses on FRM based on assessment of previous usage and the associated value gain. We describe the development of such FRM, and the design of an experiment that will test the usability and the benefits of their integration. Our experiment incorporates value-driven usage metadata, a novel methodology for tracking and communicating the usage of data, linked to a quantitative assessment of the value gained. We describe a high-level architecture for supporting the collection, storage, and presentation of this new metadata form, and a quantitative method for assessing it.
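The mechanism described, logging which data subsets are used together with an assessed value gain, then surfacing high-value subsets as recommendation cues, can be sketched as follows. This is a minimal toy illustration, not the authors' architecture; the class name, dataset names, and scoring scheme are all invented for the example.

```python
from collections import defaultdict

class UsageMetadataTracker:
    """Record which data subsets end-users query and the value gain they
    report, then surface the highest-value subsets as recommendation cues."""

    def __init__(self):
        # dataset name -> list of assessed value scores from past uses
        self._log = defaultdict(list)

    def record_usage(self, dataset, value_score):
        """Log one use of a dataset together with an assessed value gain."""
        self._log[dataset].append(value_score)

    def recommend(self, top_n=1):
        """Return the top-n datasets ranked by total assessed value."""
        ranked = sorted(self._log.items(),
                        key=lambda kv: sum(kv[1]), reverse=True)
        return [name for name, _ in ranked[:top_n]]

tracker = UsageMetadataTracker()
tracker.record_usage("sales_q3", 5.0)
tracker.record_usage("sales_q3", 4.0)
tracker.record_usage("inventory", 2.0)
print(tracker.recommend(top_n=1))  # ['sales_q3']
```

A front-end FRM cue would then be rendered from `recommend()`, e.g. highlighting the data subsets that past users found most valuable.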

    Web archives: the future

    This report is structured, first, to engage in some speculative thought about the possible futures of the web, as an exercise in prompting us to think about what we need to do now in order to make sure that we can reliably and fruitfully use archives of the web in the future. Next, we turn to considering the methods and tools being used to research the live web, as a pointer to the types of things that can be developed to help understand the archived web. Then, we turn to a series of topics and questions that researchers want, or may want, to address using the archived web. In this final section, we identify some of the challenges individuals, organizations, and international bodies can target to increase our ability to explore these topics and answer these questions. We end the report with some conclusions based on what we have learned from this exercise.

    TOME: Interactive TOpic Model and MEtadata Visualization

    As archives are being digitized at an increasing rate, scholars will require new tools to make sense of this expanding amount of material. We propose to build TOME, a tool to support the interactive exploration and visualization of text-based archives. Drawing upon the technique of topic modeling--a computational method for identifying themes that recur across a collection--TOME will visualize the topics that characterize each archive, as well as the relationships between specific topics and related metadata, such as publication date. An archive of 19th-century antislavery newspapers, characterized by diverse authors and shifting political alliances, will serve as our initial dataset; it promises to motivate new methods for visualizing topic models and extending their impact. In turn, by applying our new methods to these texts, we will illuminate how issues of gender and racial identity affected the development of political ideology in the nineteenth century, and into the present day.
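The "themes that recur across a collection" intuition behind topic modeling can be illustrated with a deliberately simple proxy: counting how many documents each non-stopword term appears in. This is a toy sketch only, not the LDA-style inference a tool like TOME would actually use, and the corpus and stopword list are invented for the example.

```python
from collections import Counter
import re

# Tiny invented stopword list; real systems use much larger ones.
STOPWORDS = {"the", "of", "and", "a", "in", "to", "is", "for"}

def recurring_terms(documents, top_n=3):
    """Count how many documents each (non-stopword) term appears in --
    a crude proxy for the recurring themes a topic model would surface."""
    doc_freq = Counter()
    for doc in documents:
        terms = set(re.findall(r"[a-z]+", doc.lower())) - STOPWORDS
        doc_freq.update(terms)
    return [term for term, _ in doc_freq.most_common(top_n)]

corpus = [
    "Abolition of slavery debated in the northern press",
    "The press and the politics of abolition",
    "Shipping news and harbor notices",
]
print(sorted(recurring_terms(corpus, top_n=2)))  # ['abolition', 'press']
```

A real topic model would go further, grouping co-occurring terms into weighted topics and assigning each document a mixture of them, which is what makes linking topics to metadata like publication date possible.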

    Chemical information matters: an e-Research perspective on information and data sharing in the chemical sciences

    Recently, a number of organisations have called for open access to scientific information and especially to the data obtained from publicly funded research, among which the Royal Society report and the European Commission press release are particularly notable. It has long been accepted that building research on the foundations laid by other scientists is both effective and efficient. Regrettably, some disciplines, chemistry being one, have been slow to recognise the value of sharing and have thus been reluctant to curate their data and information in preparation for exchanging it. The very significant increases in both the volume and the complexity of the datasets produced have encouraged the expansion of e-Research, and stimulated the development of methodologies for managing, organising, and analysing "big data". We review the evolution of cheminformatics, the amalgam of chemistry, computer science, and information technology, and assess the wider e-Science and e-Research perspective. Chemical information does matter, as do matters of communicating data and collaborating with data. For chemistry, unique identifiers, structure representations, and property descriptors are essential to the activities of sharing and exchange. Open science entails the sharing of more than mere facts: for example, the publication of negative outcomes can facilitate better understanding of which synthetic routes to choose, an aspiration of the Dial-a-Molecule Grand Challenge. The protagonists of open notebook science go even further and exchange their thoughts and plans. We consider the concepts of preservation, curation, provenance, discovery, and access in the context of the research lifecycle, and then focus on the role of metadata, particularly the ontologies on which the emerging chemical Semantic Web will depend. Among our conclusions, we present our choice of the "grand challenges" for the preservation and sharing of chemical information.

    A Natural Language Processing Pipeline for Detecting Informal Data References in Academic Literature

    Discovering authoritative links between publications and the datasets that they use can be a labor-intensive process. We introduce a natural language processing pipeline that retrieves and reviews publications for informal references to research datasets, which complements the work of data librarians. We first describe the components of the pipeline and then apply it to expand an authoritative bibliography linking thousands of social science studies to the data-related publications in which they are used. The pipeline increases recall for literature to review for inclusion in data-related collections of publications and makes it possible to detect informal data references at scale. We contribute (1) a novel Named Entity Recognition (NER) model that reliably detects informal data references and (2) a dataset connecting items from social science literature with datasets they reference. Together, these contributions enable future work on data reference, data citation networks, and data reuse.
    Comment: 13 pages, 7 figures, 3 tables
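An "informal data reference" is a mention of a dataset in running text without a formal citation. A gazetteer-based toy version of the detection task might look like the sketch below; this is not the paper's trained NER model (which learns such patterns from annotated text rather than enumerating names), and the dataset names and example sentence are invented for illustration.

```python
import re

# Hypothetical gazetteer of dataset name stems. The paper's NER model
# generalises beyond any fixed list; this sketch does not.
KNOWN_DATASETS = ["ANES", "General Social Survey", "Current Population Survey"]

# Informal references typically appear inline: "the ANES data",
# "drawing on the General Social Survey", etc.
PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(name) for name in KNOWN_DATASETS) + r")\b"
)

def find_informal_references(text):
    """Return the dataset names mentioned informally in running text."""
    return sorted({m.group(1) for m in PATTERN.finditer(text)})

sentence = ("We merge the ANES data with responses from the "
            "General Social Survey to study turnout.")
print(find_informal_references(sentence))
# ['ANES', 'General Social Survey']
```

The advantage of a learned NER model over this kind of lookup is recall: it can flag dataset mentions that no curated list anticipates, which is what allows detection "at scale" across a large literature.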