
    The Best Trail Algorithm for Assisted Navigation of Web Sites

    We present an algorithm called the Best Trail Algorithm, which helps solve the hypertext navigation problem by automating the construction of memex-like trails through the corpus. The algorithm performs a probabilistic best-first expansion of a set of navigation trees to find relevant and compact trails. We describe the implementation of the algorithm, scoring methods for trails, filtering algorithms, and a new metric called "potential gain", which measures the potential of a page for future navigation opportunities. Comment: 11 pages, 11 figures.
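The best-first expansion of navigation trails can be sketched as follows. This is a minimal, deterministic simplification (the published algorithm is probabilistic); the toy link graph, relevance scores, and length-discounted trail score are illustrative assumptions:

```python
import heapq

def best_trail(graph, relevance, start, max_len=4):
    """Greedy best-first search for a high-scoring navigation trail.

    A trail's score is the summed relevance of its pages divided by
    its length, so compact, relevant trails are preferred.
    """
    def score(trail):
        return sum(relevance.get(p, 0.0) for p in trail) / len(trail)

    best = (start,)
    # Max-heap via negated scores (heapq is a min-heap).
    frontier = [(-score(best), best)]
    while frontier:
        neg, trail = heapq.heappop(frontier)
        if -neg > score(best):
            best = trail
        if len(trail) >= max_len:
            continue
        for nxt in graph.get(trail[-1], []):
            if nxt not in trail:          # keep trails acyclic
                new = trail + (nxt,)
                heapq.heappush(frontier, (-score(new), new))
    return list(best)

graph = {"home": ["a", "b"], "a": ["c"], "b": ["c"], "c": []}
relevance = {"home": 0.1, "a": 0.9, "b": 0.2, "c": 0.8}
print(best_trail(graph, relevance, "home"))
```

Here the trail through the highly relevant pages "a" and "c" wins over the detour through "b".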

    Using Search Engine Technology to Improve Library Catalogs

    This chapter outlines how search engine technology can be used in online public access library catalogs (OPACs) to improve users’ experiences, to identify users’ intentions, and to show how sophisticated ranking criteria can be applied in the library context. A review of the literature and of current OPAC developments forms the basis of recommendations on how to improve OPACs. The findings are that the major shortcomings of current OPACs are that they are not sufficiently user-centered and that their results presentations lack sophistication, and that these shortcomings are not addressed in current 2.0 developments. It is argued that OPAC development should be made search-centered before additional features are added. While the recommendations on ranking functionality and the use of user intentions are only conceptual and not yet applied to a library catalog, practitioners will find recommendations for developing better OPACs in this chapter. In short, readers will find a systematic view of how search engines’ strengths can be applied to improving libraries’ online catalogs.

    Finding cultural heritage images through a Dual-Perspective Navigation Framework

    With the increasing volume of digital images, techniques for improving image findability are receiving heightened attention. The cultural heritage sector, with its vast resource of images, has realized the value of social tags and started using tags in parallel with controlled vocabularies to increase the odds of users finding images of interest. The research presented in this paper develops the Dual-Perspective Navigation Framework (DPNF), which integrates controlled vocabularies and social tags to represent the aboutness of an item more comprehensively, so that information scent can be maximized to facilitate resource findability. DPNF utilizes the mechanisms of faceted browsing and tag-based navigation to offer a seamless interaction between experts’ subject headings and public tags during image search. In a controlled user study, participants effectively completed more exploratory tasks with the DPNF interface than with the tag-only interface. DPNF was also more efficient than both single-descriptor interfaces (the subject-heading-only and tag-only interfaces): participants completed exploratory tasks in significantly less time, with fewer interface interactions and less backtracking, and without an extra workload. In addition, participants were more satisfied with the DPNF interface than with the others. The findings of this study can help interface designers decide what information is most helpful to users and how to facilitate search tasks. The framework also maximizes end users’ chances of finding target images by engaging image information from two sources: professionals’ descriptions of items in a collection and the crowd’s assignment of social tags.
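The dual-perspective idea, pooling experts' subject headings with crowd tags before filtering, can be sketched minimally: an image matches when its combined descriptors cover every requested facet value. The tiny catalog, field names, and facet values below are invented for illustration:

```python
def find_images(collection, facets):
    """Return ids of images whose pooled descriptors (expert subject
    headings plus social tags) cover all requested facet values."""
    hits = []
    for img_id, meta in collection.items():
        descriptors = set(meta["headings"]) | set(meta["tags"])
        if set(facets) <= descriptors:
            hits.append(img_id)
    return hits

collection = {
    "img1": {"headings": ["Bridges"], "tags": ["sunset", "golden gate"]},
    "img2": {"headings": ["Bridges"], "tags": ["fog"]},
}
print(find_images(collection, ["Bridges", "sunset"]))
```

A query mixing a controlled heading ("Bridges") with a social tag ("sunset") finds img1, which neither descriptor source could have matched on its own.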

    Applying Wikipedia to Interactive Information Retrieval

    There are many opportunities to improve the interactivity of information retrieval systems beyond the ubiquitous search box. One idea is to use knowledge bases—e.g. controlled vocabularies, classification schemes, thesauri and ontologies—to organize, describe and navigate the information space. These resources are popular in libraries and specialist collections, but have proven too expensive and narrow to be applied to everyday web-scale search. Wikipedia has the potential to bring structured knowledge into more widespread use. This online, collaboratively generated encyclopaedia is one of the largest and most consulted reference works in existence. It is broader, deeper and more agile than the knowledge bases put forward to assist retrieval in the past. Rendering this resource machine-readable is a challenging task that has captured the interest of many researchers. Many see it as a key step required to break the knowledge acquisition bottleneck that crippled previous efforts. This thesis claims that the roadblock can be sidestepped: Wikipedia can be applied effectively to open-domain information retrieval with minimal natural language processing or information extraction. The key is to focus on gathering and applying human-readable rather than machine-readable knowledge. To demonstrate this claim, the thesis tackles three separate problems: extracting knowledge from Wikipedia; connecting it to textual documents; and applying it to the retrieval process. First, we demonstrate that a large thesaurus-like structure can be obtained directly from Wikipedia, and that accurate measures of semantic relatedness can be efficiently mined from it. Second, we show that Wikipedia provides the necessary features and training data for existing data mining techniques to accurately detect and disambiguate topics when they are mentioned in plain text. Third, we provide two systems and user studies that demonstrate the utility of the Wikipedia-derived knowledge base for interactive information retrieval.
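One family of relatedness measures that can be mined efficiently from Wikipedia compares the sets of articles linking to two concepts, in the style of a normalized link-overlap distance. This is an illustrative sketch, not the thesis's exact formulation; the link sets and article count are assumed inputs:

```python
import math

def link_relatedness(links_a, links_b, total_articles):
    """Relatedness of two concepts from the overlap of their in-link
    sets, normalized against the size of the whole article collection
    (higher overlap relative to set sizes -> closer to 1.0)."""
    a, b = set(links_a), set(links_b)
    inter = a & b
    if not inter:
        return 0.0
    num = math.log(max(len(a), len(b))) - math.log(len(inter))
    den = math.log(total_articles) - math.log(min(len(a), len(b)))
    return 1.0 - num / den

a = {"feline", "big_cat", "rainforest", "predator"}   # in-links of A
b = {"big_cat", "rainforest"}                         # in-links of B
print(round(link_relatedness(a, b, 1_000_000), 3))
```

Because half of A's in-links are shared with B, the two concepts score as strongly related; disjoint link sets score 0.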

    Context-aware Document-clustering Technique

    Document clustering is an intentional act that should reflect individuals’ preferences with regard to the semantic coherency or relevant categorization of documents and should conform to the context of a target task under investigation. Thus, effective document-clustering techniques need to take into account a user’s categorization context defined by or relevant to the target task under consideration. However, existing document-clustering techniques generally anchor in pure content-based analysis and therefore are not able to facilitate context-aware document clustering. In response, we propose a Context-Aware document-Clustering (CAC) technique that takes into consideration a user’s categorization preference (expressed as a list of anchoring terms) relevant to the context of a target task and subsequently generates a set of document clusters from this specific contextual perspective. Our empirical evaluation results suggest that our proposed CAC technique outperforms the pure content-based document-clustering technique.
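The idea of steering clustering with anchoring terms can be illustrated minimally: assign each document to the anchoring term it mentions most often, rather than clustering on content similarity alone. This is far simpler than the CAC technique itself, and the documents and terms below are invented:

```python
from collections import Counter

def context_aware_clusters(docs, anchoring_terms):
    """Group documents around user-supplied anchoring terms: each
    document joins the cluster of the term most frequent in it."""
    clusters = {t: [] for t in anchoring_terms}
    for doc_id, text in docs.items():
        tf = Counter(text.lower().split())
        best = max(anchoring_terms, key=lambda t: tf.get(t, 0))
        clusters[best].append(doc_id)
    return clusters

docs = {
    "d1": "jaguar speed engine jaguar car",
    "d2": "jaguar habitat rainforest animal",
    "d3": "engine car maintenance",
}
print(context_aware_clusters(docs, ["car", "animal"]))
```

The same corpus would cluster differently under the anchoring terms ["speed", "habitat"]: the context, not just the content, shapes the partition.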

    Collaborative Filtering-based Context-Aware Document-Clustering (CF-CAC) Technique

    Document clustering is an intentional act that should reflect an individual’s preference with regard to the semantic coherency or relevant categorization of documents and should conform to the context of a target task under investigation. Thus, effective document clustering techniques need to take into account a user’s categorization context. In response, Yang & Wei (2007) propose a Context-Aware document Clustering (CAC) technique that takes into consideration a user’s categorization preference relevant to the context of a target task and subsequently generates a set of document clusters from this specific contextual perspective. However, the CAC technique encounters the problem of small-sized anchoring terms. To overcome this shortcoming, we extend the CAC technique and propose a Collaborative Filtering-based Context-Aware document-Clustering (CF-CAC) technique that considers not only a target user’s but also other users’ anchoring terms when approximating the categorization context of the target user. Our empirical evaluation results suggest that our proposed CF-CAC technique outperforms the CAC technique.
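The collaborative step can be sketched as borrowing anchoring terms from the most similar other users. Similarity here is plain Jaccard overlap of term sets, a simplification of whatever weighting CF-CAC actually uses, and all term sets below are invented:

```python
def expand_anchoring_terms(target_terms, other_users, k=1):
    """Grow a target user's (possibly too-small) anchoring-term set
    with the terms of the k most similar users, where similarity is
    the Jaccard overlap of term sets."""
    target = set(target_terms)
    def jaccard(other):
        other = set(other)
        union = target | other
        return len(target & other) / len(union) if union else 0.0
    ranked = sorted(other_users, key=jaccard, reverse=True)
    expanded = set(target)
    for terms in ranked[:k]:
        expanded |= set(terms)
    return expanded

target = ["clustering"]
others = [["clustering", "categorization"], ["recipes", "cooking"]]
print(sorted(expand_anchoring_terms(target, others)))
```

The user who shares "clustering" contributes "categorization", while the unrelated user's terms are ignored; the enlarged set then feeds the CAC step as usual.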

    Open Data

    Open data is freely usable, reusable, or redistributable by anybody, provided there are safeguards in place that protect the data’s integrity and transparency. This book describes how data retrieved from public open data repositories can improve the learning qualities of digital networking, particularly performance and reliability. Chapters address such topics as knowledge extraction, Open Government Data (OGD), public dashboards, intrusion detection, and artificial intelligence in healthcare.

    Design and evaluation of improvement method on the web information navigation - A stochastic search approach

    With the advent of the fast-growing Internet and World Wide Web (the Web), more and more companies enhance their business competitiveness by conducting electronic commerce. At the same time, more and more people gather or process information by surfing the Web. However, due to unbalanced Web traffic and poorly organized information, users suffer from slow communication and disordered information. To improve the situation, information providers can analyze traffic and Uniform Resource Locator (URL) counters to adjust the layering and organization of information; nevertheless, heterogeneous navigation patterns and dynamically fluctuating Web traffic complicate the improvement process. Alternatively, improvement can be made by giving direct guidance to surfers navigating the Web sites. In this paper, information retrieval on a Web site is modeled as a Markov chain associated with the corresponding dynamic Web traffic and designated information pages. We consider four models of information retrieval based on combinations of the level of skill or experience of the surfers and the degree of navigation support provided by the sites. Simulation is conducted to evaluate the performance of the different types of navigation guidance. In addition, we evaluate the four models of information retrieval in terms of complexity and applicability. The paper concludes with a research summary and a direction for future research efforts. © 2009 Elsevier B.V. All rights reserved.
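The Markov-chain view of navigation can be made concrete with expected hitting times: how many clicks a surfer following the transition probabilities needs, on average, to first reach a designated information page. The toy site structure and probabilities below are assumptions for illustration, not the paper's simulation setup:

```python
def expected_steps(transitions, target, iterations=200):
    """Expected number of clicks to first reach `target` for a surfer
    following the Markov transition probabilities, computed by
    fixed-point iteration on h(s) = 1 + sum_t P(s,t) * h(t),
    with h(target) = 0."""
    h = {s: 0.0 for s in transitions}
    for _ in range(iterations):
        for s in transitions:
            if s == target:
                continue
            h[s] = 1.0 + sum(p * h[t] for t, p in transitions[s].items())
    return h

transitions = {
    "home": {"products": 0.5, "about": 0.5},
    "about": {"home": 1.0},          # dead-end page, surfer backtracks
    "products": {"products": 1.0},   # designated (absorbing) page
}
h = expected_steps(transitions, "products")
print(h["home"], h["about"])
```

An unskilled surfer starting at "home" needs 3 clicks on average; navigation guidance that raised the home-to-products probability would lower this hitting time, which is the quantity the guidance models aim to improve.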

    Professional Search in Pharmaceutical Research

    In the mid-90s, visiting libraries – as a means of retrieving the latest literature – was still a common necessity among professionals. Nowadays, professionals simply access information by ‘googling’. Indeed, the name of the Web search engine market leader “Google” became a synonym for searching and retrieving information. Despite the increased popularity of search as a method for retrieving relevant information, at the workplace search engines still do not deliver satisfying results to professionals. Search engines, for instance, ignore that the relevance of answers (the satisfaction of a searcher’s needs) depends not only on the query (the information request) and the document corpus, but also on the working context (the user’s personal needs, education, etc.). In effect, an answer which might be appropriate to one user might not be appropriate to another user, even though the query and the document corpus are the same for both. Personalization services that address this context are therefore becoming more and more popular and are an active field of research. This is only one of several challenges encountered in ‘professional search’: How can the working context of the searcher be incorporated in the ranking process; how can unstructured free-text documents be enriched with semantic information so that the information need can be expressed precisely at query time; how and to what extent can a company’s knowledge be exploited for search purposes; how should data from distributed sources be accessed through a single entry point. This thesis is devoted to ‘professional search’, i.e. search at the workplace, especially in industrial research and development. We contribute by compiling and developing several approaches for facing the challenges mentioned above.
    The approaches are implemented in the prototype YASA (Your Adaptive Search Agent), which provides meta-search, adaptive ranking of search results, and guided navigation, and which uses domain knowledge to drive the search process. YASA is deployed in the pharmaceutical research department of Roche in Penzberg – a major pharmaceutical company – where the applied methods were empirically evaluated. Being confronted with mostly unstructured free-text documents and having barely any explicit metadata at hand, we faced a serious challenge: incorporating semantics (i.e. formal knowledge representation) into the search process can only be as good as the underlying data. Nonetheless, we are able to demonstrate that this issue can be largely compensated for by incorporating automatic metadata extraction techniques. The metadata we were able to extract automatically was not perfectly accurate, nor did the ontology we applied contain considerably “rich semantics”. Nonetheless, our results show that even the little semantics incorporated into the search process suffices to achieve a significant improvement in search and retrieval. We thus contribute to the research field of context-based search by incorporating the working context into the search process – an area which has not yet been well studied.
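The compensating role of automatic metadata extraction can be sketched with a simple rule-based extractor: even imperfect patterns yield structured fields that a ranker or facet filter can exploit. The field names and regular expressions below are invented for illustration and are not YASA's actual rules:

```python
import re

def extract_metadata(text):
    """Pull a coarse document type and a project code out of free
    text with hand-written patterns. Imperfect by design: documents
    matching no rule simply yield an empty metadata record."""
    meta = {}
    m = re.search(r"\b(SOP|report|protocol)\b", text, re.IGNORECASE)
    if m:
        meta["doc_type"] = m.group(1).lower()
    m = re.search(r"\bproject[:\s]+([A-Z]{2,}-\d+)", text)
    if m:
        meta["project"] = m.group(1)
    return meta

print(extract_metadata("Final report for project RD-1234, draft v2"))
```

Extracted fields like these can then be indexed alongside the full text, which is the sense in which "a little semantics" already improves retrieval even when the extraction is not perfectly accurate.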