11 research outputs found

    Extracting Meaningful Metadata

    The paper identifies the importance of context-based metadata extraction for a more meaningful web. It further discusses a context thesaurus approach to metadata extraction.

    Geração (semi)automática de metadados: um contributo para a recuperação de objectos de aprendizagem

    The amendment of the Portuguese Basic Law of the Education System (Lei de Bases do Sistema Educativo Português), driven by the Bologna Process, opens up several opportunities for using e-Learning not only in continuing education but also in initial training. As in many other institutions, Moodle was the most natural and viable b-Learning option for the Escola Superior de Educação de Bragança. With the recent courses (Bologna Plan), new requirements were identified: the need to obtain current information about the syllabi and contents of the different course units to support decisions about enrolling in new course units; the demand for fast learning; and the need for information to support interdisciplinary processes. This information resides in the e-Learning platform. However, Moodle does not have a search mechanism of its own that can locate and retrieve information about learning resources while ensuring that the learning object itself is not viewed by unauthorised users or students. The (semi)automatic generation of metadata to facilitate the location and retrieval of learning objects was the solution found to meet the identified requirements without affecting the overall structure of the e-Learning system. The main aim of this article is therefore to describe the specification and development activities of that solution.

    Comparing Information Retrieval Effectiveness of Different Metadata Generation Methods

    This study describes an information retrieval experiment comparing the retrieval effectiveness (recall and precision) for queries run against professionally and automatically generated metadata records. The metadata records represented web pages from the National Institute of Environmental Health Sciences. The results of 10 queries were analyzed in terms of recall and precision for this small-scale study. The results of the study suggest that professionally generated metadata records are not significantly better in terms of information retrieval effectiveness than automatically generated metadata records
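
The two effectiveness measures named in this abstract can be illustrated with a minimal sketch. The record identifiers and the relevance judgments below are hypothetical and are not taken from the study; the function only shows the standard definitions of precision (relevant retrieved / all retrieved) and recall (relevant retrieved / all relevant) for a single query.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: identifiers of the records the query returned
    relevant:  identifiers of the records judged relevant to the query
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = len(retrieved & relevant)  # relevant records that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical data: one query run against two sets of metadata records.
relevant = {"r1", "r2", "r3", "r4"}
professional_hits = ["r1", "r2", "r5"]      # assumed result list
automatic_hits = ["r1", "r2", "r3", "r6"]   # assumed result list

print(precision_recall(professional_hits, relevant))
print(precision_recall(automatic_hits, relevant))
```

In an experiment of the kind described, these per-query values would be collected for every query and then compared across the two metadata generation methods.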

    Generation of Classificatory Metadata for Web Resources using Social Tags

    With the increasing popularity of social tagging systems, the potential for using social tags as a source of metadata is being explored. Social tagging systems can simplify the involvement of a large number of users and improve the metadata generation process, especially for semantic metadata. This research aims to find a method to categorize web resources using social tags as metadata. In this research, social tagging systems are a mechanism to allow non-professional catalogers to participate in metadata generation. Because social tags are not from a controlled vocabulary, there are issues that have to be addressed in finding quality terms to represent the content of a resource. This research examines ways to deal with those issues to obtain a set of tags representing the resource from the tags provided by users. Two measurements of the importance of a tag are introduced. Annotation Dominance (AD) measures how much users agree on a tag term. Cross Resources Annotation Discrimination (CRAD) discriminates among tags in the collection; it is designed to remove tags that are used broadly or narrowly in the collection. Further, the study suggests a process to identify and manage compound tags. The research aims to select important annotations (meta-terms) and remove meaningless ones (noise) from the tag set. This study, therefore, suggests two main measurements for obtaining a subset of tags with classification potential. To evaluate the proposed approach to finding classificatory metadata candidates, we rely on users' relevance judgments comparing suggested tag terms and expert metadata terms. Human judges rate, on an n-point scale, how relevant each term is to the given resource.

    Exploring multi-granular documentation strategies for the representation, discovery and use of geographic information

    This thesis explores how digital representations of geography and Geographic Information (GI) may be described, and how these descriptions facilitate the use of the resources they depict. More specifically, it critically examines existing geospatial documentation practices and aims to identify opportunities for refinement therein, whether the documentation is used to signpost the data assets documented, to manage and maintain information assets, or to assist in resource interpretation and discrimination. Documentation of GI can therefore facilitate its utilisation; it can be reasonably expected that, by refining documentation practices, GI holds the potential for being better exploited. The underpinning theme connecting the individual papers of the thesis is one of multi-granular documentation. GI may be recorded at varying degrees of granularity, and yet traditional documentation efforts have predominantly focussed on a solitary level (that of the geospatial data layer). Developing documentation practices to account for other granularities permits the description of GI at different levels of detail and can further assist in realising its potential through better discovery, interpretation and use. One of the aims of the current work is to establish the merit of such multi-granular practices. Over the course of four research papers and a short research article, proprietary as well as open source software approaches are accordingly presented and provide proof-of-concept and conceptual solutions that aim to enhance GI utilisation through improved documentation practices. Presented in the context of an existing body of research, the proposed approaches focus on the technological infrastructure supporting data discovery, the automation of documentation processes and the implications of describing geospatial information resources of varying granularity.
Each paper successively contributes to the notion that geospatial resources are potentially better exploited when documentation practices account for the multi-granular aspects of GI, and the varying ways in which such documentation may be used. In establishing the merit of multi-granular documentation, it is nevertheless recognised in the current work that instituting a comprehensive documentation strategy at several granularities may be unrealistic for some geospatial applications. Pragmatically, the level of effort required would be excessive, making universal adoption impractical. Considering however the ever-expanding volumes of geospatial data gathered and the demand for ways of managing and maintaining the usefulness of potentially unwieldy repositories, improved documentation practices are required. A system of hierarchical documentation, of self-documenting information, would provide for information discovery and retrieval from such expanding resource pools at multiple granularities, improve the accessibility of GI and ultimately, its utilisation

    Automatisches Klassifizieren : Verfahren zur Erschliessung elektronischer Dokumente

    Automatic classification of text documents refers to the computerized allocation of class numbers from existing classification schemes to natural language texts by means of suitable algorithms. Based upon a comprehensive literature review, this thesis establishes an informed and up-to-date view of the applicability of automatic classification for the subject approach to electronic documents, particularly to Web resources. Both methodological aspects and the experiences drawn from relevant projects and applications are covered. Concerning methodology, the present state-of-the-art comprises a number of statistical approaches that rely on machine learning; these methods use pre-classified example documents for establishing a model - the "classifier" - which is then used for classifying new documents. However, the four large-scale projects conducted in the 1990s by the Universities of Lund, Wolverhampton and Oldenburg, and by OCLC (Dublin, OH), still used rather simple and more traditional methodological approaches. These projects are described and analyzed in detail. As they made use of traditional library classifications, their results are significant for LIS, even if no permanent quality services have resulted from these endeavours. The analysis of other relevant applications and projects reveals a number of attempts to use automatic classification for document processing in the fields of patent and media documentation. Here, semi-automatic solutions that support human classifiers are preferred, because the classification results obtained by fully automated systems are as yet unsatisfactory. Other interesting implementations include Web portals, search engines and (commercial) information services, whereas little interest has been shown in the automatic classification of books and bibliographic records.
In the concluding part of the study the author discusses the most significant applications and projects, and also addresses several problems and issues in the context of automatic classification
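
The machine-learning approach summarised in this abstract (train a "classifier" on pre-classified example documents, then apply it to new documents) can be sketched as follows. This is only an illustration under stated assumptions: it uses a multinomial naive Bayes model with add-one smoothing, which is one common statistical technique and not necessarily any method covered in the thesis, and the example texts and class numbers are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Build a model from pre-classified documents.

    examples: list of (text, class_number) pairs.
    """
    class_docs = Counter()              # number of documents per class
    word_counts = defaultdict(Counter)  # word frequencies per class
    vocab = set()
    for text, label in examples:
        class_docs[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return class_docs, word_counts, vocab


def classify(model, text):
    """Return the class number with the highest log-probability score."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in class_docs:
        score = math.log(class_docs[label] / total_docs)  # class prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # words never seen in training are ignored
            # Add-one (Laplace) smoothing avoids zero probabilities.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training set; "020" and "550" are illustrative class numbers.
examples = [
    ("metadata cataloguing library records", "020"),
    ("library classification schemes cataloguing", "020"),
    ("earthquake geology rock strata", "550"),
    ("rock minerals geology survey", "550"),
]
model = train(examples)
print(classify(model, "geology of rock formations"))
```

A semi-automatic workflow of the kind the abstract mentions would present such a suggestion to a human classifier for confirmation rather than assign it directly.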

    Ontology-Based Information Sharing in Weakly Structured Environments

    Harmelen, F.A.H. van [Promotor]; Herzog, O. [Copromotor]

    The development of a model of information seeking behaviour of students in higher education when using internet search engines.

    This thesis develops a model of the Web information seeking behaviour of postgraduate students, with a specific focus on the use of Web search engines. It extends Marchionini's eight-stage model of information seeking, geared towards electronic environments, to holistically encompass the physical, cognitive, affective and social dimensions of Web users' behaviour. The study recognises the uniqueness of the Web environment as a vehicle for information dissemination and retrieval, drawing on the distinction between information searching and information seeking, and emphasises the importance of following user-centred holistic approaches to study information seeking behaviour. It reviews the research in the field and demonstrates that there is no comprehensive model that explains the behaviour of Web users when employing search engines for information retrieval. The methods followed to develop the study are explained with a detailed analysis of the four dimensions of information seeking (physical, cognitive, affective, social). Emphasis is placed on the significance of combined methods (qualitative and quantitative) and the ways in which they can enrich the examination of human behaviour. This is concluded with a discussion of methodological issues. The study is supported by an empirical investigation, which examines the relationship between interactive information retrieval using Web search engines and human information seeking processes. This investigates the influence of cognitive elements (such as learning and problem style, and creative ability) and affective characteristics (e.g. confidence, loyalty, familiarity, ease of use), as well as the role that system experience, domain knowledge and demographics play in information seeking behaviour and in users' overall satisfaction with the retrieval result. The influence of these factors is analysed by identifying users' patterns of behaviour and the tactics adopted to solve specific problems.
The findings of the empirical study are incorporated into an enriched information-seeking model, encompassing the use of search engines, which reveals a complex interplay between physical, cognitive, affective and social elements and shows that none of these characteristics can be seen in isolation when attempting to explain the complex phenomenon of information seeking behaviour. Although the model is presented in a linear fashion, the dynamic, reiterative and circular character of the information seeking process is explained through an emphasis on transition patterns between the different stages. The research concludes with a discussion of problems encountered by Web information seekers, which provides a detailed analysis of the reasons why users express satisfaction or dissatisfaction with the results of Web searching. Areas in which Web search engines can be improved, and issues related to the need for students to be given additional training and support, are identified. These include planning and organising information, recognising different dimensions of information intents and needs, emphasising the importance of variety in Web information seeking, promoting effective formulation of queries and ranking, reducing overload of information and assisting effective selection of Web sites and critical examination of results.

    Metadatos y recuperación de información: estándares, problemas y aplicabilidad en bibliotecas digitales

    Doctoral Programme in Documentation. Chair: Mercedes Caridad Sebastián. - Secretary: Antonio Hernández Pérez. - Members: José Carlos Rovira Soler, Eulalia Fuentes i Pujol, José Antonio Gómez Hernánde