    A Heuristic Baseline Method for Metadata Extraction from Scanned Electronic Theses and Dissertations

    Extracting metadata from scholarly papers is an important text mining problem. Widely used open-source tools such as GROBID are designed for born-digital scholarly papers but often fail on scanned documents, such as Electronic Theses and Dissertations (ETDs). Here we present preliminary baseline work that uses a heuristic model to extract metadata from the cover pages of scanned ETDs. The process starts by converting scanned pages into images and then into text files using OCR tools. A series of carefully designed regular expressions is then applied to capture patterns for seven metadata fields: titles, authors, years, degrees, academic programs, institutions, and advisors. The method is evaluated on a ground truth dataset comprising rectified metadata provided by the Virginia Tech and MIT libraries. Our heuristic method achieves an accuracy of up to 97% on the fields of the ETD text files and provides a strong baseline for machine-learning-based methods. To the best of our knowledge, this is the first work attempting to extract metadata from non-born-digital ETDs.
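The per-field regular-expression step described in this abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's actual expressions, so every pattern below, the `FIELD_PATTERNS` table, and the `extract_metadata` helper are hypothetical. The sketch assumes the cover page has already been converted to plain text by OCR, and it covers only four of the seven fields.

```python
import re

# Hypothetical patterns for illustration only; the paper's real regular
# expressions are not given in the abstract. Each pattern exposes the
# captured field through a named group called "value".
FIELD_PATTERNS = {
    # First four-digit year starting with 19 or 20.
    "year": re.compile(r"\b(?P<value>(?:19|20)\d{2})\b"),
    # A small, assumed list of degree names.
    "degree": re.compile(
        r"(?P<value>Doctor of Philosophy|Master of Science|Master of Arts)",
        re.IGNORECASE,
    ),
    # Assumes the author's name is on the line following a line reading "by".
    "author": re.compile(
        r"^\s*by\s*\n\s*(?P<value>[^\n]+?)\s*$",
        re.IGNORECASE | re.MULTILINE,
    ),
    # Assumes a line of the form "Advisor: Name" or "Supervisor: Name".
    "advisor": re.compile(
        r"^\s*(?:Advisor|Supervisor)\s*:\s*(?P<value>[^\n]+?)\s*$",
        re.IGNORECASE | re.MULTILINE,
    ),
}


def extract_metadata(ocr_text):
    """Return a dict of field name -> first matched value, or None if no match."""
    result = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = pattern.search(ocr_text)
        result[field] = match.group("value").strip() if match else None
    return result
```

Real OCR output is noisier than these patterns allow for (broken lines, misrecognized characters), so a practical version would likely need more tolerant expressions and several fallbacks per field.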

    Metadata and ontologies for organizing students’ memories and learning: standards and convergence models for context awareness

    This paper focuses on ontologies supporting context awareness and Personal Information Management (PIM) and their applicability to the Memex Metadata (M2) project. M2 is a research project of the University of North Carolina at Chapel Hill to improve students’ digital memories using the tablet PC, Microsoft’s SenseCam technology, and other mobile technologies (e.g., a GPS device) to capture context. The M2 project offers new opportunities for studying students’ learning with digital technologies. This paper introduces the M2 project; discusses e-portfolios and current educational trends related to pervasive computing; reviews relevant ontologies and their relationship to the project’s CAF (Context Awareness Framework); and concludes by identifying future research directions.

    Geoscience after IT: Part L. Adjusting the emerging information system to new technology

    Coherent development depends on following widely used standards that respect our vast legacy of existing entries in the geoscience record. Middleware ensures that we see a coherent view from our desktops of diverse sources of information. Developments specific to managing the written word, map content, and structured data come together in shared metadata linking topics and information types

    Changing Trains at Wigan: Digital Preservation and the Future of Scholarship

    This paper examines the impact of the emerging digital landscape on long term access to material created in digital form and its use for research; it examines challenges, risks and expectations.

    File forensics for RAW camera image formats

    Recent research in multimedia forensics has developed a variety of methods to detect image tampering and to identify the origin of image files. Many of these techniques are based on characteristics of the JPEG format, as it is the most widely used file format for digital images. In recent years, RAW image formats have gained popularity among amateur and professional photographers. This increase in their use, and possible misuse, makes these file formats an important subject for file forensic examination. The aim of this paper is to explore to what extent methods previously developed for images in JPEG format can be applied to RAW image formats.

    A Study on the Open Source Digital Library Software's: Special Reference to DSpace, EPrints and Greenstone

    The richness of knowledge has changed the access methods that all stakeholders use to retrieve key knowledge and relevant information. This paper presents a study of three open source digital library management software packages used to assimilate and disseminate information to a worldwide audience. The methodology involves an online survey and a study of related software documentation and associated technical manuals. Comment: 9 pages, 3 figures, 1 table. Published with International Journal of Computer Applications (IJCA).