Extracting Meaningful Metadata
The paper identifies the importance of context-based metadata
extraction for a more meaningful web. It further discusses
a context thesaurus approach to metadata extraction
Geração (semi)automática de metadados: um contributo para a recuperação de objectos de aprendizagem
The amendment of the Basic Law of the Portuguese Education System, driven by the Bologna Process, opens up several
opportunities for using e-Learning, not only in continuing education but also in initial training. As in many other
institutions, Moodle was the most natural and viable b-Learning option for the Escola Superior de Educação de
Bragança. With the recent (Bologna Plan) courses, new requirements were identified: the need to obtain up-to-date
information on the syllabi and content of the different course units to support decisions about enrolling in new
units; the demand for rapid learning; and the need for information to support interdisciplinary processes. This
information resides in the e-Learning platform. However, Moodle does not have its own search mechanism that makes it
possible to locate and retrieve information about learning resources while ensuring that the learning object itself
is not viewed by unauthorised users or students. The (semi)automatic generation of metadata to facilitate the
location and retrieval of learning objects was the solution found to meet the identified requirements without
affecting the overall structure of the e-Learning system. The main aim of this article is therefore to describe the
specification and development activities of the solution found
Comparing Information Retrieval Effectiveness of Different Metadata Generation Methods
This study describes an information retrieval experiment comparing the retrieval effectiveness (recall and precision) for queries run against professionally and automatically generated metadata records. The metadata records represented web pages from the National Institute of Environmental Health Sciences. The results of 10 queries were analyzed in terms of recall and precision for this small-scale study. The results of the study suggest that professionally generated metadata records are not significantly better in terms of information retrieval effectiveness than automatically generated metadata records
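For readers unfamiliar with the two effectiveness measures named in this abstract, the standard definitions can be sketched as follows. This is a minimal illustration only: the function name and the sample document identifiers are hypothetical and are not taken from the study.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: identifiers of the records the query returned
    relevant:  identifiers of the records judged relevant to the query
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant records that were retrieved
    # Precision: share of retrieved records that are relevant.
    precision = hits / len(retrieved) if retrieved else 0.0
    # Recall: share of relevant records that were retrieved.
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: four records returned, three judged relevant overall.
p, r = precision_recall(["a", "b", "c", "d"], ["a", "c", "e"])
print(p, r)
```

In an experiment of the kind described, the same query would be run once against each set of metadata records and the two resulting (precision, recall) pairs compared.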
Generation of Classificatory Metadata for Web Resources using Social Tags
With the increasing popularity of social tagging systems, the potential for using social tags as a source of metadata is being explored. Social tagging systems can simplify the involvement of a large number of users and improve the metadata generation process, especially for semantic metadata. This research aims to find a method to categorize web resources using social tags as metadata. In this research, social tagging systems are a mechanism to allow non-professional catalogers to participate in metadata generation. Because social tags are not from a controlled vocabulary, there are issues that have to be addressed in finding quality terms to represent the content of a resource. This research examines ways to deal with those issues to obtain a set of tags representing the resource from the tags provided by users. Two measures of the importance of a tag are introduced. Annotation Dominance (AD) is a measurement of how much a tag term is agreed to by users. The other is Cross Resources Annotation Discrimination (CRAD), a measurement to discriminate tags in the collection. It is designed to remove tags that are used too broadly or too narrowly in the collection. Further, the study suggests a process to identify and to manage compound tags. The research aims to select important annotations (meta-terms) and remove meaningless ones (noise) from the tag set. This study, therefore, suggests two main measurements for getting a subset of tags with classification potential. To evaluate the proposed approach to find classificatory metadata candidates, we rely on users' relevance judgments comparing suggested tag terms and expert metadata terms. Human judges rate how relevant each term is on an n-point scale based on the relevance of each of the terms for the given resource
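The abstract does not give formulas for AD or CRAD, so the following is only a rough sketch of the two ideas under stated assumptions. It assumes AD can be read as the share of a resource's taggers who used a given tag, and it replaces CRAD with a much simpler stand-in that drops tags applied to too many or too few resources in the collection. The function names, thresholds and sample data are all hypothetical.

```python
from collections import Counter


def annotation_dominance(user_tags):
    """Assumed reading of AD: fraction of users who applied each tag.

    user_tags: mapping of user -> iterable of tags given to ONE resource.
    """
    n_users = len(user_tags)
    if n_users == 0:
        return {}
    # Count each tag at most once per user.
    counts = Counter(tag for tags in user_tags.values() for tag in set(tags))
    return {tag: count / n_users for tag, count in counts.items()}


def filter_by_spread(resource_tags, low=0.3, high=0.9):
    """Simplified stand-in for CRAD (not the study's formula).

    Keeps tags whose share of resources in the collection lies in [low, high],
    discarding tags used too broadly or too narrowly.
    resource_tags: mapping of resource -> set of tags.
    """
    n_resources = len(resource_tags)
    if n_resources == 0:
        return set()
    spread = Counter(tag for tags in resource_tags.values() for tag in set(tags))
    return {tag for tag, n in spread.items() if low <= n / n_resources <= high}


# Hypothetical data: four users tagging one resource.
ad = annotation_dominance({
    "u1": ["python", "tutorial"],
    "u2": ["python"],
    "u3": ["python", "code"],
    "u4": ["tutorial"],
})

# Hypothetical collection of four resources.
kept = filter_by_spread({
    "r1": {"web", "python"},
    "r2": {"web", "python"},
    "r3": {"web", "maps"},
    "r4": {"web", "maps", "xyzzy"},
})
print(ad, kept)
```

In this toy collection "web" appears on every resource and "xyzzy" on only one, so both fall outside the assumed band, while "python" and "maps" remain as candidate classificatory terms.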
Exploring multi-granular documentation strategies for the representation, discovery and use of geographic information
This thesis explores how digital representations of geography and Geographic
Information (GI) may be described, and how these descriptions facilitate the use of
the resources they depict. More specifically, it critically examines existing geospatial
documentation practices and aims to identify opportunities for refinement therein,
whether when used to signpost those data assets documented, for managing and
maintaining information assets, or to assist in resource interpretation and
discrimination. Documentation of GI can therefore facilitate its utilisation; it can be
reasonably expected that by refining documentation practices, GI holds the potential
to be better exploited. The underpinning theme connecting the individual papers
of the thesis is one of multi-granular documentation. GI may be recorded at varying
degrees of granularity, and yet traditional documentation efforts have predominantly
focussed on a solitary level (that of the geospatial data layer). Developing
documentation practices to account for other granularities permits the description of
GI at different levels of detail and can further assist in realising its potential through
better discovery, interpretation and use. One of the aims of the current work is to
establish the merit of such multi-granular practices. Over the course of four research
papers and a short research article, proprietary as well as open source software
approaches are accordingly presented and provide proof-of-concept and conceptual
solutions that aim to enhance GI utilisation through improved documentation
practices. Presented in the context of an existing body of research, the proposed
approaches focus on the technological infrastructure supporting data discovery, the
automation of documentation processes and the implications of describing geospatial
information resources of varying granularity. Each paper successively contributes to
the notion that geospatial resources are potentially better exploited when
documentation practices account for the multi-granular aspects of GI, and the
varying ways in which such documentation may be used. In establishing the merit of
multi-granular documentation, it is nevertheless recognised in the current work that
instituting a comprehensive documentation strategy at several granularities may be
unrealistic for some geospatial applications. Pragmatically, the level of effort
required would be excessive, making universal adoption impractical. Considering
however the ever-expanding volumes of geospatial data gathered and the demand for
ways of managing and maintaining the usefulness of potentially unwieldy
repositories, improved documentation practices are required. A system of
hierarchical documentation, of self-documenting information, would provide for
information discovery and retrieval from such expanding resource pools at multiple
granularities, improve the accessibility of GI and ultimately, its utilisation
Automatisches Klassifizieren : Verfahren zur Erschliessung elektronischer Dokumente
Automatic classification of text documents refers to the computerized allocation of class numbers from existing classification schemes to natural language texts by means of suitable algorithms. Based upon a comprehensive literature review, this thesis establishes an informed and up-to-date view of the applicability of automatic classification for the subject approach to electronic documents, particularly to Web resources. Both methodological aspects and the experiences drawn from relevant projects and applications are covered. Concerning methodology, the present state-of-the-art comprises a number of statistical approaches that rely on machine learning; these methods use pre-classified example documents for establishing a model - the "classifier" - which is then used for classifying new documents. However, the four large-scale projects conducted in the 1990s by the Universities of Lund, Wolverhampton and Oldenburg, and by OCLC (Dublin, OH), still used rather simple and more traditional methodological approaches. These projects are described and analyzed in detail. As they made use of traditional library classifications, their results are significant for LIS, even if no permanent quality services have resulted from these endeavours. The analysis of other relevant applications and projects reveals a number of attempts to use automatic classification for document processing in the fields of patent and media documentation. Here, semi-automatic solutions that support human classifiers are preferred, because fully automated systems do not yet deliver satisfactory classification results. Other interesting implementations include Web portals, search engines and (commercial) information services, whereas little interest has been shown in the automatic classification of books and bibliographic records.
In the concluding part of the study the author discusses the most significant applications and projects, and also addresses several problems and issues in the context of automatic classification
Ontology-Based Information Sharing in Weakly Structured Environments
Harmelen, F.A.H. van [Promotor]; Herzog, O. [Copromotor]
The development of a model of information seeking behaviour of students in higher education when using internet search engines.
This thesis develops a model of the Web information seeking behaviour of postgraduate students, with a specific focus on their use of Web search engines. It extends Marchionini's eight stage model of information seeking, geared towards electronic environments, to holistically encompass the physical, cognitive, affective and social dimensions of Web users' behaviour. The study recognises the uniqueness of the Web environment as a vehicle for information dissemination and retrieval, drawing on the distinction between information searching and information seeking, and emphasises the importance of following user-centred holistic approaches to study information seeking behaviour. It reviews the research in the field and demonstrates that there is no comprehensive model that explains the behaviour of Web users when employing search engines for information retrieval. The methods followed to develop the study are explained with a detailed analysis of the four dimensions of information seeking (physical, cognitive, affective, social). Emphasis is placed on the significance of combined methods (qualitative and quantitative) and the ways in which they can enrich the examination of human behaviour. This is concluded with a discussion of methodological issues. The study is supported by an empirical investigation, which examines the relationship between interactive information retrieval using Web search engines and human information seeking processes. This investigates the influence of cognitive elements (such as learning and problem style, and creative ability) and affective characteristics (e.g. confidence, loyalty, familiarity, ease of use), as well as the role that system experience, domain knowledge and demographics play in information seeking behaviour and in users' overall satisfaction with the retrieval result. The influence of these factors is analysed by identifying users' patterns of behaviour and the tactics they adopt to solve specific problems.
The findings of the empirical study are incorporated into an enriched information-seeking model, encompassing use of search engines, which reveals a complex interplay between physical, cognitive, affective and social elements and shows that none of these characteristics can be seen in isolation when attempting to explain the complex phenomenon of information seeking behaviour. Although the model is presented in a linear fashion, the dynamic, reiterative and circular character of the information seeking process is explained through an emphasis on transition patterns between the different stages. The research concludes with a discussion of problems encountered by Web information seekers, which provides a detailed analysis of the reasons why users express satisfaction or dissatisfaction with the results of Web searching. Areas in which Web search engines can be improved, and issues related to the need for students to be given additional training and support, are also identified. These include planning and organising information, recognising different dimensions of information intents and needs, emphasising the importance of variety in Web information seeking, promoting effective formulation of queries and ranking, reducing overload of information and assisting effective selection of Web sites and critical examination of results
Metadatos y recuperación de información: estándares, problemas y aplicabilidad en bibliotecas digitales
Doctoral Programme in Documentation. Chair: Mercedes Caridad Sebastián. - Secretary: Antonio Hernández Pérez. - Committee members: José Carlos Rovira Soler, Eulalia Fuentes i Pujol, José Antonio Gómez Hernánde