1,785 research outputs found

    Enhancing Search and Browse Using Automated Clustering of Subject Metadata

    The Web puzzle of online information resources often hinders end-users from effective and efficient access to these resources. Clustering resources into appropriate subject-based groupings may help alleviate these difficulties, but will it work with heterogeneous material? The University of Michigan and the University of California Irvine joined forces to test automatically enhancing metadata records using the Topic Modeling algorithm on the varied OAIster corpus. We created labels for the resulting clusters of metadata records, matched the clusters to an in-house classification system, and developed a prototype that would showcase methods for search and retrieval using the enhanced records. Results indicated that while the algorithm was somewhat time-intensive to run and using a local classification scheme had its drawbacks, precise clustering of records was achieved and the prototype interface proved that faceted classification could be powerful in helping end-users find resources. http://deepblue.lib.umich.edu/bitstream/2027.42/58766/1/07hagedorn.pd

    Augmenting Dublin Core digital library metadata with Dewey Decimal Classification

    Purpose – The purpose of this paper is to describe a new approach to a well-known problem for digital libraries: how to search across multiple unrelated libraries with a single query. Design/methodology/approach – The approach involves creating new Dewey Decimal Classification terms and numbers from existing Dublin Core records. In total, 263,550 records were harvested from three digital libraries. Weighted key terms were extracted from the title, description and subject fields of each record. Ranked DDC classes were automatically generated from these key terms by considering DDC hierarchies via a series of filtering and aggregation stages. A mean reciprocal ranking evaluation compared a sample of 49 generated classes against DDC classes created by a trained librarian for the same records. Findings – The best results combined weighted key terms from the title, description and subject fields. Performance declines with increased specificity of DDC level. The results compare favorably with similar studies. Research limitations/implications – The metadata harvest required manual intervention and the evaluation was resource intensive. Future research will look at evaluation methodologies that take account of issues of consistency and ecological validity. Practical implications – The method does not require training data and is easily scalable. The pipeline can be customized for individual use cases, for example, recall or precision enhancing. Social implications – The approach can provide centralized access to information from multiple domains currently provided by individual digital libraries. Originality/value – The approach addresses metadata normalization in the context of web resources. The automatic classification approach accounts for matches within hierarchies, aggregating lower level matches to broader parents and thus approximates the practices of a human cataloger.

    "You Tube and I Find" - personalizing multimedia content access

    Recent growth in broadband access and proliferation of small personal devices that capture images and videos has led to explosive growth of multimedia content available everywhere, from personal disks to the Web. While digital media capture and upload has become nearly universal with newer device technology, there is still a need for better tools and technologies to search large collections of multimedia data and to find and deliver the right content to a user according to her current needs and preferences. A renewed focus on the subjective dimension in the multimedia lifecycle, from creation, distribution, to delivery and consumption, is required to address this need beyond what is feasible today. Integration of the subjective aspects of the media itself (its affective, perceptual, and physiological potential, both intended and achieved), together with those of the users themselves, will allow for personalizing the content access, beyond today's facility. This integration, transforming the traditional multimedia information retrieval (MIR) indexes to more effectively answer specific user needs, will allow a richer degree of personalization predicated on user intention and mode of interaction, relationship to the producer, content of the media, and their history and lifestyle. In this paper, we identify the challenges in achieving this integration, current approaches to interpreting content creation processes, to user modelling and profiling, and to personalized content selection, and we detail future directions. The structure of the paper is as follows: In Section I, we introduce the problem and present some definitions. In Section II, we present a review of the aspects of personalized content and current approaches for the same. Section III discusses the problem of obtaining metadata that is required for personalized media creation and presents eMediate as a case study of an integrated media capture environment. Section IV presents the MAGIC system as a case study of capturing effective descriptive data and putting users first in distributed learning delivery. The aspects of modelling the user are presented in Section V as a case study in using the user's personality as a way to personalize summaries. Finally, Section VI concludes the paper with a discussion on the emerging challenges and the open problems.

    From Keyword Search to Exploration: How Result Visualization Aids Discovery on the Web

    A key to the Web's success is the power of search. The elegant way in which search results are returned is usually remarkably effective. However, for exploratory search in which users need to learn, discover, and understand novel or complex topics, there is substantial room for improvement. Human computer interaction researchers and web browser designers have developed novel strategies to improve Web search by enabling users to conveniently visualize, manipulate, and organize their Web search results. This monograph offers fresh ways to think about search-related cognitive processes and describes innovative design approaches to browsers and related tools. For instance, while keyword search presents users with results for specific information (e.g., what is the capital of Peru), other methods may let users see and explore the contexts of their requests for information (related or previous work, conflicting information), or the properties that associate groups of information assets (group legal decisions by lead attorney). We also consider both the traditional and novel ways in which these strategies have been evaluated. From our review of cognitive processes, browser design, and evaluations, we reflect on the future opportunities and new paradigms for exploring and interacting with Web search results.

    HILT IV : subject interoperability through building and embedding pilot terminology web services

    A report of work carried out within the JISC-funded HILT Phase IV project, the paper looks at the project's context against the background of other recent and ongoing terminologies work, describes its outcome and conclusions, including technical outcomes and terminological characteristics, and considers possible future research and development directions. The Phase IV project has taken HILT to the point where the launch of an operational support service in the area of subject interoperability is a feasible option and where both investigation of specific needs in this area and practical collaborative work are sensible and feasible next steps. Moving forward requires detailed work, not only on terminology interoperability and associated service delivery issues, but also on service and end user needs and engagement, service sustainability issues, and the practicalities of interworking with other terminology services and projects in UK, Europe, and global contexts

    Contexts and Contributions: Building the Distributed Library

    This report updates and expands on A Survey of Digital Library Aggregation Services, originally commissioned by the DLF as an internal report in summer 2003, and released to the public later that year. It highlights major developments affecting the ecosystem of scholarly communications and digital libraries since the last survey and provides an analysis of OAI implementation demographics, based on a comparative review of repository registries and cross-archive search services. Secondly, it reviews the state-of-practice for a cohort of digital library aggregation services, grouping them in the context of the problem space to which they most closely adhere. Based in part on responses collected in fall 2005 from an online survey distributed to the original core services, the report investigates the purpose, function and challenges of next-generation aggregation services. On a case-by-case basis, the advances in each service are of interest in isolation from each other, but the report also attempts to situate these services in a larger context and to understand how they fit into a multi-dimensional and interdependent ecosystem supporting the worldwide community of scholars. Finally, the report summarizes the contributions of these services thus far and identifies obstacles requiring further attention to realize the goal of an open, distributed digital library system

    The best of both worlds: highlighting the synergies of combining manual and automatic knowledge organization methods to improve information search and discovery.

    Research suggests organizations across all sectors waste a significant amount of time looking for information and often fail to leverage the information they have. In response, many organizations have deployed some form of enterprise search to improve the 'findability' of information. Debates persist as to whether thesauri and manual indexing or automated machine learning techniques should be used to enhance discovery of information. In addition, the extent to which a knowledge organization system (KOS) enhances discoveries or indeed blinds us to new ones remains a moot point. The oil and gas industry was used as a case study using a representative organization. Drawing on prior research, a theoretical model is presented which aims to overcome the shortcomings of each approach. This synergistic model could help to re-conceptualize the 'manual' versus 'automatic' debate in many enterprises, accommodating a broader range of information needs. This may enable enterprises to develop more effective information and knowledge management strategies and ease the tension between what are often perceived as mutually exclusive competing approaches. Certain aspects of the theoretical model may be transferable to other industries, which is an area for further research.

    Integration of Heterogeneous Digital Libraries with Semi-automatic Mapping and Browsing: From Formalization to Specification to Visualization

    In this paper, we formalize the digital library (DL) integration problem and propose an overall approach based on the 5S framework. We apply 5S to domain-specific (archaeological) DLs, illustrating our solutions for key problems in DL integration. We use ETANA-DL as a case study to describe the process of semi-automatically generating a union catalog and a unified browsing service in an archaeological DL. A visual schema mapping tool is developed for union catalog creation. A pilot user study aids tool evaluation. Our approach is further validated through application of a general browsing component to two integrated DLs

    CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines

    Based on the information provided by European projects and national initiatives related to multimedia search, as well as domain experts who participated in the CHORUS Think-Tanks and workshops, this document reports on the state of the art related to multimedia content search from a technical and a socio-economic perspective. The technical perspective includes an up-to-date view on content-based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmark initiatives to measure the performance of multimedia search engines. From a socio-economic perspective we inventory the impact and legal consequences of these technical advances and point out future directions of research.

    Still a Lot to Lose: The Role of Controlled Vocabulary in Keyword Searching

    In their 2005 study, Gross and Taylor found that more than a third of records retrieved by keyword searches would be lost without subject headings. A review of the literature since then shows that numerous studies, in various disciplines, have found that a quarter to a third of records returned in a keyword search would be lost without controlled vocabulary. Other writers, though, have continued to suggest that controlled vocabulary be discontinued. Addressing criticisms of the Gross/Taylor study, this study replicates the search process in the same online catalog, but after the addition of automated enriched metadata such as tables of contents and summaries. The proportion of results that would be lost remains high