20 research outputs found

    Optimising metadata to make high-value content more accessible to Google users

    Purpose: This paper shows how information in digital collections that have been catalogued using high-quality metadata can be retrieved more easily by users of search engines such as Google. Methodology/approach: The research and proposals described arose from an investigation into the observed phenomenon that pages from the Glasgow Digital Library (gdl.cdlr.strath.ac.uk) were regularly appearing near the top of Google search results shortly after publication, without any deliberate effort to achieve this. The reasons for this phenomenon are now well understood and are described in the second part of the paper. The first part provides context with a review of the impact of Google and a summary of recent initiatives by commercial publishers to make their content more visible to search engines. Findings/practical implications: The literature research provides firm evidence of a trend amongst publishers to ensure that their online content is indexed by Google, in recognition of its popularity with Internet users. The practical research demonstrates how search engine accessibility can be compatible with use of established collection management principles and high-quality metadata. Originality/value: The concept of data shoogling is introduced, involving some simple techniques for metadata optimisation. Details of its practical application are given, to illustrate how those working in academic, cultural and public-sector organisations could make their digital collections more easily accessible via search engines, without compromising any existing standards and practices.

    Propagation de métadonnées par l'analyse des liens

    http://www.emse.fr/~mbeig/PUBLIS/2003-jft-p257-prime.pdf
    International audience

    Web Document Models for Web Information Retrieval

    http://www.emse.fr/OSWIR05/2005-oswir-p19-beigbeder.pdf
    International audience
    This paper presents different Web document models that relate to the hypertext nature of the Web. The Web graph is the best-known and most widely used data extracted from Web hypertext. We survey how it has been used in work related to information retrieval. Finally, we present some considerations on integrating this work into a Web search engine.

    Recurrent neural network learning for text routing

    Virtual WWW Documents: a Concept to Explicit the Structure of WWW Sites

    http://www.emse.fr/~beigbeder/PUBLIS/1999-BCS-IRSG-p185-doan-v1.pdf
    International audience
    This paper presents a new concept, the virtual WWW document (VWD): a set of WWW pages representing a logical information space, generally dealing with one particular domain. The VWD is described using metadata in XML syntax and is accessed through a metadata.class file stored at the root level of a WWW site. We suggest how the VWD can improve information retrieval on the WWW and reduce the network load generated by robots. We describe a prototype implemented in Java, within an application in the environmental domain. The exchange of such metadata takes place within a flexible architecture based on two kinds of robots, generalists and specialists, which collect and organize this metadata in order to locate resources on the WWW. They contribute to the overall self-organizing information process by exchanging their indices, thereby forwarding their knowledge to each other.

    Construction et utilisation de contextes autour des noeuds d'un hypertexte pour la recherche d'information

    http://dn.revuesonline.com/article.jsp?articleId=5190
    We hypothesise that converting a document into hypertext atomises its information, in the sense that the resulting hypertext nodes are not self-sufficient enough to be understood on their own. Under this hypothesis, the content of a node alone is not sufficient to index it for inclusion in an information retrieval system. We implemented and tested a method for building contexts around the nodes of a hypertext using an automatic classification method. This method is based on a similarity measure between nodes that takes into account both the structural aspects of the hypertext, namely the links between nodes, and the textual content of the nodes. Our information retrieval system indexes both the nodes and their contexts. The query model we use has two levels: a topic level and a context level.
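
The abstract above does not give the formula for its node similarity measure. As a rough illustration only, the sketch below blends a link-based score with a text-based score. Every name here (`cosine`, `link_overlap`, `combined_similarity`, the weight `alpha`) and the choice of Jaccard overlap and bag-of-words cosine are assumptions made for the example, not the authors' method.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words term counts."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def link_overlap(u: str, v: str, links: dict) -> float:
    """Jaccard overlap of the two nodes' neighbourhoods (each node included).

    `links` is assumed to map a node to the set of nodes it is linked with.
    """
    near_u = links.get(u, set()) | {u}
    near_v = links.get(v, set()) | {v}
    return len(near_u & near_v) / len(near_u | near_v)


def combined_similarity(u, v, texts, links, alpha=0.5):
    """Weighted blend of structural and textual similarity (alpha is assumed)."""
    text_sim = cosine(Counter(texts[u].lower().split()),
                      Counter(texts[v].lower().split()))
    return alpha * link_overlap(u, v, links) + (1 - alpha) * text_sim


# Hypothetical three-node hypertext.
texts = {"a": "web retrieval", "b": "web retrieval", "c": "cooking recipes"}
links = {"a": {"b"}, "b": {"a"}, "c": set()}

sim_ab = combined_similarity("a", "b", texts, links)  # linked, same vocabulary
sim_ac = combined_similarity("a", "c", texts, links)  # unlinked, no shared terms
```

A pairwise score of this kind could then be fed to any clustering routine to group nodes into contexts; which clustering method the paper used is not stated in the abstract.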

    Comparing Information Retrieval Effectiveness of Different Metadata Generation Methods

    This study describes an information retrieval experiment comparing the retrieval effectiveness (recall and precision) for queries run against professionally and automatically generated metadata records. The metadata records represented web pages from the National Institute of Environmental Health Sciences. The results of 10 queries were analyzed in terms of recall and precision for this small-scale study. The results of the study suggest that professionally generated metadata records are not significantly better in terms of information retrieval effectiveness than automatically generated metadata records.
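
For readers unfamiliar with the two measures named in this abstract, the following minimal sketch computes them for a single query using the standard definitions: precision is the share of retrieved records that are relevant, and recall is the share of relevant records that were retrieved. The function name and the record identifiers are made up for illustration and are not taken from the study.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: identifiers the system returned
    relevant:  identifiers judged relevant for the query
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = len(retrieved & relevant)  # relevant records that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical record identifiers for one query.
retrieved = ["r1", "r2", "r3", "r4"]
relevant = ["r2", "r4", "r7"]
precision, recall = precision_recall(retrieved, relevant)
```

Here two of the four retrieved records are relevant, so precision is 2/4, and two of the three relevant records were found, so recall is 2/3.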

    A Web Semântica e suas contribuições para a ciência da informação

    This article presents the updating process that the World Wide Web is undergoing in its transition to what has been called the “Semantic Web”. To that end, it seeks to identify the technologies, the associated organisations, and the philosophical and conceptual foundations underlying this new web. The article also seeks to present the existing overlaps with information science and the possibilities for broadening the scope of its traditional research objects with the contribution of the new standards and technologies being developed within the Semantic Web.