
    Construction and use of contexts around the nodes of a hypertext for information retrieval

    http://dn.revuesonline.com/article.jsp?articleId=5190
    We hypothesize that converting a document into hypertext atomizes its information, in the sense that the resulting hypertext nodes are not self-sufficient enough to be understood on their own. Under this hypothesis, the content of a node alone is not sufficient to index it for use in an information retrieval system. We implemented and tested a method for building contexts around the nodes of a hypertext using automatic classification (clustering). The classification relies on a similarity measure between nodes that takes into account both the structural aspects of the hypertext, namely the links between nodes, and the textual content of the nodes. Our information retrieval system indexes both the nodes and their contexts. The query model we use has two levels: a topic level and a context level.
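    The abstract does not give the similarity measure itself, so the following is only a minimal sketch of one way to combine link structure and textual content. The weighting parameter `alpha`, the use of cosine similarity over raw term counts, the Jaccard overlap of link neighbourhoods, and every function name are assumptions made for illustration, not the authors' formulation.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def link_overlap(x: str, y: str, links: dict) -> float:
    """Jaccard overlap of the two nodes' neighbourhoods (assumed structural measure).

    Each node is counted as part of its own neighbourhood, so two nodes that
    link to each other already share members.
    """
    nx = links.get(x, set()) | {x}
    ny = links.get(y, set()) | {y}
    return len(nx & ny) / len(nx | ny)


def node_similarity(x: str, y: str, texts: dict, links: dict, alpha: float = 0.5) -> float:
    """Weighted mix of textual and structural similarity (alpha is hypothetical)."""
    tx = Counter(texts[x].lower().split())
    ty = Counter(texts[y].lower().split())
    return alpha * cosine(tx, ty) + (1 - alpha) * link_overlap(x, y, links)
```

    A pairwise score of this kind could then be passed to any standard clustering procedure to group nodes into contexts; which procedure the authors used is not stated in the abstract.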

    Adding eScience Assets to the Data Web

    Aggregations of Web resources are increasingly important in scholarship as it adopts new methods that are data-centric, collaborative, and network-based. The same notion of aggregations of resources is common to the mashed-up, socially networked information environment of Web 2.0. We present a mechanism to identify and describe aggregations of Web resources that has resulted from the Open Archives Initiative - Object Reuse and Exchange (OAI-ORE) project. The OAI-ORE specifications are based on the principles of the Architecture of the World Wide Web, the Semantic Web, and the Linked Data effort. Therefore, their incorporation into the cyberinfrastructure that supports eScholarship will ensure the integration of the products of scholarly research into the Data Web.
    Comment: 10 pages, 7 figures. Proceedings of Linked Data on the Web (LDOW2009) Workshop

    Selective web information retrieval

    This thesis proposes selective Web information retrieval, a framework formulated in terms of statistical decision theory. The main component of the framework is a decision mechanism that selects an appropriate retrieval approach on a per-query basis. The selection is based on the outcome of an experiment, which is performed before the final ranking of the retrieved documents and which extracts features from a sample of the set of retrieved documents. This thesis investigates three broad types of experiments. The first counts the occurrences of query terms in the retrieved documents, indicating the extent to which the query topic is covered in the document collection. The second considers the distribution of retrieved documents in larger aggregates of related Web documents, such as whole Web sites or directories within Web sites. The third estimates the usefulness of the hyperlink structure among a sample of the set of retrieved Web documents. The proposed experiments are evaluated in the context of both informational and navigational search tasks with an optimal Bayesian decision mechanism, where it is assumed that relevance information exists. This thesis further investigates the implications of applying selective Web information retrieval in an operational setting, where the tuning of a decision mechanism is based on limited existing relevance information and the information retrieval system’s input is a stream of queries related to mixed informational and navigational search tasks. First, the experiments are evaluated using different training and testing query sets, as well as a mixture of different types of queries. Second, query sampling is introduced in order to approximate the queries that a retrieval system receives, and to tune an ad-hoc decision mechanism with a broad set of automatically sampled queries.
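    The per-query selection described above can be illustrated with a small sketch. It assumes that the outcome of an experiment has been discretized into a bin label, that training queries record which retrieval approach performed best, and that the decision uses a 0-1 loss, so that picking the approach with the highest posterior minimizes expected loss. The names `train` and `select_approach`, the bin labels, and the Laplace smoothing are all hypothetical; the thesis's actual mechanism may differ.

```python
from collections import Counter, defaultdict


def train(samples, smoothing=1.0):
    """Estimate priors and likelihoods from (feature_bin, best_approach) pairs."""
    prior = Counter()            # how often each approach was the best one
    cond = defaultdict(Counter)  # per approach: counts of observed feature bins
    bins = set()
    for feature_bin, approach in samples:
        prior[approach] += 1
        cond[approach][feature_bin] += 1
        bins.add(feature_bin)
    return prior, cond, bins, smoothing


def select_approach(model, feature_bin):
    """Return the approach with the highest posterior for the observed bin."""
    prior, cond, bins, s = model
    total = sum(prior.values())

    def posterior(approach):
        # Laplace-smoothed likelihood P(bin | approach) times prior P(approach).
        likelihood = (cond[approach][feature_bin] + s) / (prior[approach] + s * len(bins))
        return likelihood * prior[approach] / total

    # Sorting first makes tie-breaking deterministic.
    return max(sorted(prior), key=posterior)
```

    For example, with training pairs in which link-based ranking was usually best when the experiment outcome fell in a "high" bin, `select_approach(model, "high")` would return the link-based approach, while other bins would fall back to content-only ranking.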