31,939 research outputs found

    Text Extraction and Web Searching in a Non-Latin Language

    Recent studies of queries submitted to Internet search engines have shown that non-English queries and unclassifiable queries have nearly tripled during the last decade. Most search engines were originally engineered for English. They do not take full account of inflectional semantics, nor of features such as diacritics or the use of capitals, which are common in languages other than English. The literature concludes that searching using non-English and non-Latin-based queries results in lower success and requires additional user effort to achieve acceptable precision. The primary aim of this research study is to develop an evaluation methodology for identifying the shortcomings and measuring the effectiveness of search engines with non-English queries. It also proposes a number of solutions for the existing situation. A Greek query log is analyzed considering the morphological features of the Greek language. In addition, a text extraction experiment revealed some problems related to the encoding and the morphological and grammatical differences among semantically equivalent Greek terms. A first stopword list for Greek based on a domain-independent collection has been produced and its application in Web searching has been studied. The effect of lemmatization of query terms and the factors influencing text-based image retrieval in Greek are also studied. Finally, an instructional strategy is presented for teaching non-English students how to effectively utilize search engines. The evaluation of the capabilities of the search engines showed that international and nationwide search engines ignore most of the linguistic idiosyncrasies of Greek and other complex European languages. There is a lack of freely available non-English resources to work with (test corpora, linguistic resources, etc.). The research showed that the application of standard IR techniques, such as stopword removal, stemming, lemmatization and query expansion, in Greek Web searching increases precision.
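    As an illustration of the diacritic and capitalization issues the abstract describes, the following is a minimal sketch of query-term normalization for Greek. It is not the method used in the study; the function name `normalize_greek` is hypothetical, and the approach (Unicode decomposition, removal of combining marks, case folding) is only one common way to make accented and unaccented or upper- and lower-case variants compare equal.

```python
import unicodedata


def normalize_greek(term: str) -> str:
    """Return a case- and accent-insensitive form of a Greek term (illustrative only)."""
    # Decompose accented letters into base letter + combining mark (e.g. ή -> η + U+0301).
    decomposed = unicodedata.normalize("NFD", term)
    # Drop the combining marks, leaving only the base letters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Case-fold so that capitalized and lower-case spellings match.
    folded = stripped.casefold()
    # Explicitly map final sigma to regular sigma so word-final forms match all-caps input.
    return folded.replace("ς", "σ")


if __name__ == "__main__":
    # Accented mixed-case and unaccented all-caps spellings map to the same key.
    print(normalize_greek("Αθήνα") == normalize_greek("ΑΘΗΝΑ"))
```

    A search index and the incoming query would both need to be passed through the same function for the variants to match; stemming or lemmatization, which the abstract also mentions, would require language-specific resources that are not sketched here.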

    Performance Evaluation of Selected Search Engines

    Search engines have become an integral part of daily internet usage. The search engine is the first stop for web users when they are looking for a product. Information retrieval may be viewed as a problem of classifying items into one of two classes corresponding to interesting and uninteresting items respectively. A natural performance metric in this context is classification accuracy, defined as the fraction of the system's interesting/uninteresting predictions that agree with the user's assessments. On the other hand, the field of information retrieval has two classical performance evaluation metrics: precision, the fraction of the items retrieved by the system that are interesting to the user, and recall, the fraction of the items of interest to the user that are retrieved by the system. Measuring the information retrieval effectiveness of World Wide Web search engines is costly because of the human relevance judgments involved. However, for both business enterprises and individuals it is important to know the most effective Web search engines, since such search engines help their users find a higher number of relevant Web pages with less effort. Furthermore, this information can be used for several practical purposes. This study evaluates the performance of three Web search engines. A set of measurements is proposed for evaluating Web search engine performance.
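    The three metrics defined in this abstract can be sketched in a few lines. This is an illustrative example only, not the measurement procedure proposed in the study; the function name `evaluate` and the toy document identifiers are assumptions, and the retrieved and relevant sets are assumed to be subsets of the collection.

```python
def evaluate(retrieved, relevant, collection):
    """Return (precision, recall, accuracy) for one query (illustrative only).

    retrieved:  items the system returned (predicted "interesting")
    relevant:   items the user judges interesting
    collection: all items; assumed to contain both sets above
    """
    retrieved, relevant, collection = set(retrieved), set(relevant), set(collection)
    true_pos = len(retrieved & relevant)               # retrieved and interesting
    true_neg = len(collection - retrieved - relevant)  # neither retrieved nor interesting
    # Precision: fraction of retrieved items that are interesting to the user.
    precision = true_pos / len(retrieved) if retrieved else 0.0
    # Recall: fraction of interesting items that were retrieved.
    recall = true_pos / len(relevant) if relevant else 0.0
    # Accuracy: fraction of all interesting/uninteresting predictions that agree with the user.
    accuracy = (true_pos + true_neg) / len(collection) if collection else 0.0
    return precision, recall, accuracy


if __name__ == "__main__":
    # Toy data: 10 documents, 4 retrieved, 3 relevant, 2 in common.
    print(evaluate({0, 1, 2, 3}, {2, 3, 4}, range(10)))
```

    On the Web the full collection, and therefore the true set of relevant items, is not known, which is one reason recall and accuracy are hard to measure for search engines and why human relevance judgments on the returned results are needed.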

    Quantitative evaluation of recall and precision of CAT Crawler, a search engine specialized on retrieval of Critically Appraised Topics

    BACKGROUND: Critically Appraised Topics (CATs) are a useful tool that helps physicians to make clinical decisions as healthcare moves towards the practice of Evidence-Based Medicine (EBM). The fast-growing World Wide Web has provided a place for physicians to share their appraised topics online, but an increasing amount of time is needed to find a particular topic within such a rich repository. METHODS: A web-based application, namely the CAT Crawler, was developed by Singapore's Bioinformatics Institute to allow physicians to adequately access available appraised topics on the Internet. A meta-search engine, as the core component of the application, finds relevant topics following keyword input. The primary objective of the work presented here is to evaluate the quantity and quality of search results obtained from the meta-search engine of the CAT Crawler by comparing them with those obtained from two individual CAT search engines. From the CAT libraries at these two sites, all possible keywords were extracted using a keyword extractor. Of those common to both libraries, ten were randomly chosen for evaluation. All ten were submitted to the two search engines individually, and through the meta-search engine of the CAT Crawler. Search results were evaluated for relevance both by medical amateurs and professionals, and the respective recall and precision were calculated. RESULTS: While achieving an identical recall, the meta-search engine showed a precision of 77.26% (±14.45) compared to the individual search engines' 52.65% (±12.0) (p < 0.001). CONCLUSION: The results demonstrate the validity of the CAT Crawler meta-search engine approach. The improved precision due to inherent filters underlines the practical usefulness of this tool for clinicians.

    Query recovery of short user queries: on query expansion with stopwords

    User queries to search engines are observed to predominantly contain inflected content words but lack stopwords and capitalization. Thus, they often resemble natural language queries after case folding and stopword removal. Query recovery aims to generate a linguistically well-formed query from a given user query as input to support natural language processing tasks and cross-language information retrieval (CLIR). The evaluation of query translation shows that translation scores (NIST and BLEU) decrease after case folding, stopword removal, and stemming. A baseline method for query recovery reconstructs capitalization and stopwords, which considerably increases translation scores and significantly increases mean average precision for a standard CLIR task.

    A sentence-based image search engine

    Nowadays people increasingly search for relevant images directly through search engines such as Google, Yahoo or Bing, and these image search engines have dedicated extensive research effort to the problem of keyword-based image retrieval. However, the most widely used keyword-based image search engine, Google, is reported to have a precision of only 39%, and all of these systems have limitations in creating sentence-based queries for images. This thesis studies a practical image search scenario in which many people are frustrated by having to find images for a speech or presentation through trial and error using only keywords. This thesis proposes and realizes a sentence-based image search engine (SISE) that offers the option of querying images by sentence. Users can naturally create sentence-based queries simply by inputting one or several sentences to retrieve a list of images that match their ideas well. The SISE relies on automatic concept detection and tagging techniques to provide support for searching visual content using sentence-based queries. The SISE gathered thousands of input sentences from TED talks, covering many areas such as science, economy, politics and education. The comprehensive evaluation of this system focused on the usability aspect (perceived image usefulness). The final comprehensive precision reached 60.7%. The SISE is found to be able to retrieve matching images for a wide variety of topics, across different areas, and to provide subjectively more useful results than keyword-based image search engines.