    Out of the box phrase indexing

    We present a method for optimizing inverted-index-based search engines with respect to phrase-querying performance. Our approach adds carefully selected two-term phrases to an existing index. While competitive previous work is mainly based on the analysis of query logs, our approach works out of the box, using just the information already contained in the index. Even so, our method can compete with previous work in terms of querying performance, and it can actually outperform it on difficult queries. Moreover, our selection process gives performance guarantees for arbitrary queries. As a further step, we propose using a phrase index as a substitute for the positional index of an in-memory search engine containing only short documents. We confirm all of our considerations by experiments on a high-performance main-memory search engine. However, we believe that our approach can be applied to classical disk-based systems as well.
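    The underlying data structure is easy to sketch. The following Python illustration is a minimal sketch under stated assumptions, not the authors' implementation: a toy inverted index is augmented with posting lists for a chosen set of two-term phrases, a two-word phrase query is served directly from such a list when it was materialised, and otherwise falls back to positional intersection. All class and method names are hypothetical.

        from collections import defaultdict

        class PhraseIndex:
            """Toy inverted index with extra posting lists for selected two-term phrases."""

            def __init__(self, selected_phrases):
                self.term_postings = defaultdict(dict)    # term -> {doc_id: [positions]}
                self.phrase_postings = defaultdict(list)  # "a b" -> [doc_id, ...]
                self.selected = set(selected_phrases)

            def add_document(self, doc_id, tokens):
                for pos, term in enumerate(tokens):
                    self.term_postings[term].setdefault(doc_id, []).append(pos)
                for a, b in zip(tokens, tokens[1:]):      # all adjacent word pairs
                    phrase = f"{a} {b}"
                    if phrase in self.selected:
                        postings = self.phrase_postings[phrase]
                        if not postings or postings[-1] != doc_id:
                            postings.append(doc_id)

            def phrase_query(self, a, b):
                phrase = f"{a} {b}"
                if phrase in self.selected:               # fast path: materialised phrase list
                    return list(self.phrase_postings[phrase])
                # Fallback: positional intersection of the two single-term lists.
                hits = []
                pa, pb = self.term_postings[a], self.term_postings[b]
                for doc_id in pa.keys() & pb.keys():
                    follows = set(pb[doc_id])
                    if any(p + 1 in follows for p in pa[doc_id]):
                        hits.append(doc_id)
                return sorted(hits)

        index = PhraseIndex(selected_phrases={"search engine"})
        index.add_document(1, "a fast search engine".split())
        index.add_document(2, "engine search tips".split())
        print(index.phrase_query("search", "engine"))     # [1], served from the phrase list

    A real system would select the phrases by a cost model over the index itself (which is where the performance guarantees come from) and would use compressed posting lists; the sketch only shows where the materialised lists plug in.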

    Optimising Information Retrieval from the Web in Low-bandwidth Environments

    The Internet has the potential to deliver information to Web users who have no other way of reaching those resources. However, information on the Web is scattered without any proper semantics for classifying it, which makes information discovery difficult. To ease querying of this vast body of information, developers have built tools, amongst which are search engines and Web directories. For these tools to give optimal results, however, two factors need due attention: users' ability to use the tools, and the bandwidth available in these environments. Unfortunately, an initial study showed that neither factor was present in Mauritius, where low bandwidth prevails. This study therefore helps us get a better idea of how users use search tools. To achieve this, we designed a survey in which Web users were asked about their skills in using search tools. Then, a jump page using the search boxes of different search engines was developed to provide directed guidance for effective searching in low-bandwidth environments. We then conducted a further evaluation, using a sample of users, to see whether there were any changes in the way users accessed the search tools. The results show that users were initially unaware of the specificities of the different search tools, which prevented efficient use. During the survey, however, they were educated in how to use those tools, and this proved fruitful when the further evaluation was performed. Efficient use of the search tools thus helped reduce traffic flow in low-bandwidth environments.

    Optimising metadata to make high-value content more accessible to Google users

    Purpose: This paper shows how information in digital collections that have been catalogued using high-quality metadata can be retrieved more easily by users of search engines such as Google. Methodology/approach: The research and proposals described arose from an investigation into the observed phenomenon that pages from the Glasgow Digital Library (gdl.cdlr.strath.ac.uk) were regularly appearing near the top of Google search results shortly after publication, without any deliberate effort to achieve this. The reasons for this phenomenon are now well understood and are described in the second part of the paper. The first part provides context, with a review of the impact of Google and a summary of recent initiatives by commercial publishers to make their content more visible to search engines. Findings/practical implications: The literature research provides firm evidence of a trend amongst publishers to ensure that their online content is indexed by Google, in recognition of its popularity with Internet users. The practical research demonstrates how search-engine accessibility can be compatible with the use of established collection management principles and high-quality metadata. Originality/value: The concept of data shoogling is introduced, involving some simple techniques for metadata optimisation. Details of its practical application are given, to illustrate how those working in academic, cultural and public-sector organisations could make their digital collections more easily accessible via search engines, without compromising any existing standards and practices.
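    One simple form such metadata optimisation can take is to surface a collection's catalogue metadata as HTML meta elements, where search-engine crawlers can pick it up alongside the page body. The sketch below is an illustrative assumption rather than the paper's own recipe; the field-to-meta mapping and the sample record are hypothetical.

        import html

        def render_meta_tags(record):
            """Render a catalogue record as HTML meta elements for crawlers."""
            # Hypothetical mapping from internal catalogue fields to meta names
            # (Dublin Core names plus the plain description tag used for snippets).
            field_to_meta = {
                "title": "DC.title",
                "creator": "DC.creator",
                "subject": "DC.subject",
                "description": "description",
            }
            tags = []
            for field, meta_name in field_to_meta.items():
                value = record.get(field)
                if value:
                    tags.append(f'<meta name="{meta_name}" content="{html.escape(value)}">')
            return "\n".join(tags)

        print(render_meta_tags({
            "title": "A digitised collection page",
            "creator": "Glasgow Digital Library",
            "description": "High-quality metadata exposed where crawlers can index it.",
        }))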

    Extending information retrieval system model to improve interactive web searching.

    The research set out with the broad objective of developing new tools to support Web information searching. A survey showed that a substantial number of interactive search tools were being developed, but that little work examined how these new developments fitted into the general aim of helping people find information. As a result, it proved difficult to compare and analyse how tools help and affect users, and where they belong in a general scheme of information search tools. A key reason for the lack of better information-searching tools was identified in the ill-suited nature of existing information retrieval system models. The traditional information retrieval model is therefore extended by synthesising work in information retrieval and information seeking research. The purpose of this new holistic search model is to assist information system practitioners in identifying, hypothesising, designing and evaluating Web information searching tools. Using the model, a term relevance feedback tool called ‘Tag and Keyword’ (TKy) was developed in a Web browser, with the hypothesis that it could improve query reformulation and reduce unnecessary browsing. The tool was tested in a laboratory experiment, and quantitative analysis showed statistically significant increases in query reformulation and reductions in Web browsing (per query). Subjects interviewed after the experiment reported that they found the tool useful and that it saved time. Interestingly, exploratory analysis of the collected data identified three distinct ways in which subjects used the TKy tool. The research developed a holistic search model for Web searching and demonstrated that it can be used to hypothesise, design and evaluate information searching tools; information system practitioners using it can better understand the context in which their search tools are developed and how these relate to users’ search processes and other search tools.
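    The query-reformulation step behind a term relevance feedback tool of this kind can be sketched in a few lines. This is an illustration under assumed behaviour, not the thesis's TKy implementation; the function name and the policy of preferring recently tagged terms are hypothetical.

        def reformulate(query, tagged_terms, max_added=3):
            """Extend the current query with terms the user tagged as relevant
            while viewing results, most recently tagged first."""
            present = set(query.lower().split())
            additions = []
            for term in reversed(tagged_terms):   # prefer recently tagged terms
                t = term.lower()
                if t not in present and t not in additions:
                    additions.append(t)
                if len(additions) == max_added:
                    break
            return query + " " + " ".join(additions) if additions else query

        # A user searching for "information seeking" tags terms on a result page:
        print(reformulate("information seeking", ["behaviour", "model"]))
        # -> "information seeking model behaviour"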

    SWI-Prolog and the Web

    Where Prolog is commonly seen as a component in a Web application that is either embedded or communicates using a proprietary protocol, we propose an architecture where Prolog communicates with the other components of a Web application using the standard HTTP protocol. By avoiding embedding in external Web servers, development and deployment become much easier. To support this architecture, in addition to the transfer protocol, we must also support parsing, representing and generating the key Web document types such as HTML, XML and RDF. This paper motivates the design decisions in the libraries and extensions to Prolog for handling Web documents and protocols. The design has been guided by the requirement to handle large documents efficiently. The described libraries support a wide range of Web applications, from HTML and XML documents to Semantic Web RDF processing. (31 pages, 24 figures and 2 tables; to appear in Theory and Practice of Logic Programming (TPLP).)
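    A practical consequence of this architecture is that any other component can talk to the Prolog process with an ordinary HTTP client. The Python sketch below shows that consumer side of the wire; the endpoint path, query parameter and JSON reply shape are hypothetical and not part of the SWI-Prolog libraries the paper describes.

        import json
        from urllib.parse import urlencode
        from urllib.request import urlopen

        def ask_prolog(base_url, goal_params):
            """Query a Prolog process exposed as a plain HTTP service.

            Because the server speaks standard HTTP, no embedding or proprietary
            protocol is needed; any language with an HTTP client can take part.
            """
            url = base_url + "/query?" + urlencode(goal_params)
            with urlopen(url) as response:        # ordinary GET request
                return json.load(response)        # assume the service replies in JSON

        # e.g. bindings = ask_prolog("http://localhost:8080", {"goal": "ancestor(X, john)"})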

    Automatic and Efficient Cleansing of Illustration Images in Web

    The scope and nature of image data are crucial for understanding and determining the complexity of image search design. Interest in image retrieval has increased largely due to the rapid growth of the World Wide Web, which offers a huge number of high-quality images across many categories. When a search query is issued, the retrieval system returns both relevant and irrelevant images to the user. To satisfy users' requirements and return relevant results, many interactive and automatic methods exist. Interactive methods are capable of building large collections of images with ground-truth labels, but they depend heavily on human effort, while automatic methods leverage an object-category model trained on textual and visual features. The objective of this work is to review both the interactive and the automatic methods proposed for generating a large number of images for a specified object class.

    Feasibility report: Delivering case-study based learning using artificial intelligence and gaming technologies

    This document describes an investigation into the technical feasibility of a game to support learning based on case studies. Information systems students using the game will conduct fact-finding interviews with virtual characters. We survey relevant technologies in computational linguistics and games, assess the applicability of the various approaches, and propose an architecture for the game based on existing techniques, together with a phased plan for its development.

    Algorithms and Data Structures for In-Memory Text Search Engines


    Dynamically typed languages

    Dynamically typed languages such as Python and Ruby have experienced a rapid growth in popularity in recent times. However, there is much confusion as to what makes these languages interesting relative to statically typed languages, and little knowledge of their rich history. In this chapter I explore the general topic of dynamically typed languages: how they differ from statically typed languages, their history, and their defining features.
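    The defining feature at issue, types belonging to run-time values rather than to variables, fits in a few lines of Python; this snippet is my illustration, not an example taken from the chapter.

        def describe(x):
            # No declared parameter type: any value that supports len() is accepted.
            return f"{type(x).__name__} of length {len(x)}"

        v = "hello"            # v is bound to a str...
        print(describe(v))     # -> str of length 5
        v = [1, 2, 3]          # ...and may later be rebound to a list
        print(describe(v))     # -> list of length 3

        try:
            describe(42)       # ints have no len(); the error surfaces only at run time
        except TypeError as err:
            print("run-time type error:", err)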