    Deriving query suggestions for site search

    Modern search engines have been moving away from simplistic interfaces that aimed at satisfying a user's need with a single-shot query. Interactive features are now integral parts of web search engines. However, generating good query modification suggestions remains a challenging issue. Query log analysis is one of the major strands of work in this direction. Although much research has been performed on query logs collected on the web as a whole, query log analysis to enhance search on smaller and more focused collections has attracted less attention, despite its increasing practical importance. In this article, we report on a systematic study of different query modification methods applied to a substantial query log collected on a local website that already uses an interactive search engine. We conducted experiments in which we asked users to assess the relevance of potential query modification suggestions that have been constructed using a range of log analysis methods and different baseline approaches. The experimental results demonstrate the usefulness of log analysis to extract query modification suggestions. Furthermore, our experiments demonstrate that a more fine-grained approach than grouping search requests into sessions allows for extraction of better refinement terms from query log files. © 2013 ASIS&T
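    The abstract does not say which log analysis method produced the best refinement terms. As an illustration only, the sketch below uses a unit finer than a whole session: adjacent query pairs from the same user, issued within a short time gap, where the later query keeps every term of the earlier one and adds new ones. The added words are counted as candidate refinement suggestions. The log format, the function names (`extract_refinements`, `suggest`) and the 300-second gap are hypothetical assumptions, not the authors' method.

```python
from collections import Counter, defaultdict


def normalise(query):
    # Lower-case and collapse whitespace so equivalent queries share one key.
    return " ".join(query.lower().split())


def extract_refinements(log, max_gap=300):
    """log: iterable of (user_id, timestamp_in_seconds, query_string) tuples (assumed format)."""
    by_user = defaultdict(list)
    for user, ts, query in log:
        by_user[user].append((ts, normalise(query).split()))

    refinements = defaultdict(Counter)
    for entries in by_user.values():
        entries.sort(key=lambda entry: entry[0])  # chronological order per user
        # Look only at adjacent query pairs instead of whole sessions.
        for (t1, q1), (t2, q2) in zip(entries, entries[1:]):
            if t2 - t1 > max_gap:
                continue  # too far apart to count as a refinement
            if not set(q1) < set(q2):
                continue  # keep only pairs where q2 keeps every term of q1 and adds more
            added = " ".join(term for term in q2 if term not in q1)
            refinements[" ".join(q1)][added] += 1
    return refinements


def suggest(refinements, query, k=3):
    # Most frequent additions observed after this exact (normalised) query.
    counts = refinements.get(normalise(query), Counter())
    return [terms for terms, _ in counts.most_common(k)]


log = [
    ("u1", 0, "library opening hours"),
    ("u1", 40, "library opening hours weekend"),
    ("u2", 10, "Library opening hours"),
    ("u2", 70, "library opening hours weekend"),
    ("u3", 5, "library opening hours"),
    ("u3", 50, "library opening hours exams"),
    ("u3", 2000, "library opening hours christmas"),
]
refs = extract_refinements(log)
print(suggest(refs, "library opening hours"))  # ['weekend', 'exams']
```

    In this toy log the "christmas" query is ignored because it arrives long after the previous query and does not extend it, which is the kind of noise a whole-session grouping would let through.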

    Using NLP to build the hypertextual network of a back-of-the-book index

    Building on the idea that back-of-the-book indexes are traditional devices for navigating large documents, we have developed a method to build a hypertextual network that aids navigation within a document. Building such a hypertextual network requires selecting a list of descriptors, identifying the relevant text segments to associate with each descriptor, and finally ranking the descriptors and reference segments in order of relevance. We propose a specific document segmentation method and a relevance measure for information ranking. The algorithms are tested on 4 corpora (of different types and domains) without human intervention or any semantic knowledge.

    HILT : High-Level Thesaurus Project. Phase IV and Embedding Project Extension : Final Report

    Most national and institutional service providers, usually for very good local reasons, use different subject schemes to describe their resources. In this environment, ensuring that Higher Education (HE) and Further Education (FE) users of the JISC IE can find appropriate learning, research and information resources by subject search and browse is a major challenge facing the JISC domain (and, indeed, other domains beyond JISC). Encouraging the use of standard terminologies in some services (institutional repositories, for example) is a related challenge. Under the auspices of the HILT project, JISC has been investigating mechanisms to assist the community with this problem through a JISC Shared Infrastructure Service. Such a service would help optimise the value obtained from expenditure on content and services by facilitating subject-search-based resource sharing to benefit users in the learning and research communities. The project has been through a number of phases; work from earlier phases has been reported both in published work elsewhere and in project reports (see the project website: http://hilt.cdlr.strath.ac.uk/). HILT Phase IV had two elements: the core project, whose focus was 'to research, investigate and develop pilot solutions for problems pertaining to cross-searching multi-subject scheme information environments, as well as providing a variety of other terminological searching aids', and a short extension covering the pilot embedding of routines that interact with HILT M2M services in the user interfaces of various information services serving the JISC community. Both elements contributed to the developments summarised in this report.

    Ordinary Search Engine Users Carrying Out Complex Search Tasks

    Web search engines have become the dominant tools for finding information on the Internet. Due to their popularity, users apply them to a wide range of search needs, from simple look-ups to rather complex information tasks. This paper presents the results of a study investigating the characteristics of these complex information needs in the context of Web search engines. The study aims to find out (1) what makes complex search tasks distinct from simple tasks and whether simple measures can describe their complexity, (2) whether search success for a task can be predicted by means of unique measures, and (3) whether successful searchers behave differently from unsuccessful ones. The study involved 60 participants who carried out a set of 12 search tasks with current commercial search engines. Their behavior was logged with the Search-Logger tool. The results confirm that complex tasks differ significantly from simple tasks in their characteristics. Yet it seems to be difficult to distinguish successful from unsuccessful search behaviors. Good searchers can be differentiated from bad searchers by means of measurable parameters. The implications of these findings for search engine vendors are discussed. Comment: 60 pages

    The best of both worlds: highlighting the synergies of combining manual and automatic knowledge organization methods to improve information search and discovery.

    Research suggests organizations across all sectors waste a significant amount of time looking for information and often fail to leverage the information they have. In response, many organizations have deployed some form of enterprise search to improve the 'findability' of information. Debates persist as to whether thesauri and manual indexing or automated machine learning techniques should be used to enhance discovery of information. In addition, the extent to which a knowledge organization system (KOS) enhances discoveries or indeed blinds us to new ones remains a moot point. The oil and gas industry served as a case study, drawing on a representative organization. Drawing on prior research, a theoretical model is presented which aims to overcome the shortcomings of each approach. This synergistic model could help to re-conceptualize the 'manual' versus 'automatic' debate in many enterprises, accommodating a broader range of information needs. This may enable enterprises to develop more effective information and knowledge management strategies and ease the tension between what are often perceived as mutually exclusive, competing approaches. Certain aspects of the theoretical model may be transferable to other industries, which is an area for further research.

    Concept-based Interactive Query Expansion Support Tool (CIQUEST)

    This report describes a three-year project (2000-03) undertaken in the Information Studies Department at The University of Sheffield and funded by Resource, The Council for Museums, Archives and Libraries. The overall aim of the research was to provide user support for query formulation and reformulation in searching large-scale textual resources, including those of the World Wide Web. More specifically, the objectives were: to investigate and evaluate methods for the automatic generation and organisation of concepts derived from retrieved document sets, based on statistical methods for term weighting; and to conduct user-based evaluations on the understanding, presentation and retrieval effectiveness of concept structures in selecting candidate terms for interactive query expansion. The TREC test collection formed the basis for the seven evaluative experiments conducted in the course of the project. These formed four distinct phases in the project plan. In the first phase, a series of experiments was conducted to investigate further techniques for concept derivation and hierarchical organisation and structure. The second phase was concerned with user-based validation of the concept structures. Results of phases 1 and 2 informed the design of the test system, and its user interface was developed in phase 3. The final phase entailed a user-based summative evaluation of the CiQuest system. The main findings demonstrate that concept hierarchies can effectively be generated from sets of retrieved documents and displayed to searchers in a meaningful way. The approach provides the searcher with an overview of the contents of the retrieved documents, which in turn facilitates the viewing of documents and selection of the most relevant ones. Concept hierarchies are a good source of terms for query expansion and can improve precision. The extraction of descriptive phrases as an alternative source of terms was also effective. With respect to presentation, cascading menus were easy to browse for selecting terms and for viewing documents. In conclusion, the project dissemination programme and future work are outlined.
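    The report summary does not spell out how the concept hierarchies were built. One statistical technique for deriving a hierarchy from a set of retrieved documents is document-frequency subsumption: term x is taken as a parent of term y if x occurs in most documents that contain y, but not the other way round. The sketch below is a minimal illustration under assumed inputs (documents already reduced to sets of candidate terms); the function name `subsumption_pairs` and the 0.8 threshold are assumptions, not necessarily what CiQuest used.

```python
from itertools import permutations


def subsumption_pairs(docs, terms, threshold=0.8):
    """Return (parent, child) pairs; docs is a list of term sets, one per retrieved document."""
    # Document ids containing each candidate term.
    postings = {t: {i for i, doc in enumerate(docs) if t in doc} for t in terms}
    pairs = []
    for x, y in permutations(terms, 2):
        dx, dy = postings[x], postings[y]
        if not dx or not dy:
            continue
        both = len(dx & dy)
        # x subsumes y if P(x|y) >= threshold and P(y|x) < 1.
        if both / len(dy) >= threshold and both / len(dx) < 1.0:
            pairs.append((x, y))
    return pairs


docs = [
    {"retrieval", "query", "expansion"},
    {"retrieval", "query"},
    {"retrieval", "indexing"},
    {"retrieval", "query", "expansion"},
]
terms = ["retrieval", "query", "expansion", "indexing"]
print(subsumption_pairs(docs, terms))
# [('retrieval', 'query'), ('retrieval', 'expansion'), ('retrieval', 'indexing'), ('query', 'expansion')]
```

    Keeping, for each child, only its most specific parent (here "query" rather than "retrieval" for "expansion") turns these pairs into a tree that could be browsed as cascading menus.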

    Integration of distributed terminology resources to facilitate subject cross-browsing for library portal systems

    With the increase in the number of distributed library information resources, users may have to interact with different user interfaces, learn to switch their mental models between these interfaces, and familiarise themselves with the controlled vocabularies used by different resources. For this reason, library professionals have developed library portals to integrate these distributed information resources and to help end-users cross-access them via a single access point in their own library. There are two important subject-based services that a library portal system might be able to provide. The first is a federated search service, in which a user can input a query to cross-search a number of information resources. The second is a subject cross-browsing service, which can offer a knowledge navigation tree linking the subject schemes used by distributed resources. However, the development of subject cross-searching and browsing services has been impeded by the heterogeneity of the KOS (Knowledge Organisation Systems) used by different information resources. Due to the lack of mappings between different KOS, it is impossible to offer a subject cross-browsing service for a library portal system. [Continues.]

    Multi-Faceted Search and Navigation of Biological Databases

    Ontology-Driven Search and Triage: Design of a Web-Based Visual Interface for MEDLINE

    Background: Diverse users need to search health and medical literature to satisfy open-ended goals such as making evidence-based decisions and updating their knowledge. However, doing so is challenging due to at least two major difficulties: (1) articulating information needs using accurate vocabulary and (2) dealing with large document sets returned from searches. Common search interfaces such as PubMed do not provide adequate support for exploratory search tasks. Objective: Our objective was to improve support for exploratory search tasks by combining two strategies in the design of an interactive visual interface: (1) using a formal ontology to help users build domain-specific knowledge and vocabulary and (2) providing multistage triaging support to help mitigate the information overload problem. Methods: We developed a Web-based tool, Ontology-Driven Visual Search and Triage Interface for MEDLINE (OVERT-MED), to test our design ideas. We implemented a custom searchable index of MEDLINE, which comprises approximately 25 million document citations. We chose a popular biomedical ontology, the Human Phenotype Ontology (HPO), to test our solution to the vocabulary problem. We implemented multistage triaging support in OVERT-MED, with the aid of interactive visualization techniques, to help users deal with large document sets returned from searches. Results: Formative evaluation suggests that the design features in OVERT-MED are helpful in addressing the two major difficulties described above. Using a formal ontology seems to help users articulate their information needs with more accurate vocabulary. In addition, multistage triaging combined with interactive visualizations shows promise in mitigating the information overload problem. Conclusions: Our strategies appear to be valuable in addressing the two major problems in exploratory search. Although we tested OVERT-MED with a particular ontology and document collection, we anticipate that our strategies can be transferred successfully to other contexts.