    Large-scale cluster-based retrieval experiments on Turkish texts

    We present cluster-based retrieval (CBR) experiments on the largest available Turkish document collection. Our experiments evaluate retrieval effectiveness and efficiency on both an automatically generated clustering structure and a manual classification of documents. In particular, we compare CBR effectiveness with full-text search (FS) and evaluate several implementation alternatives for CBR. Our findings reveal that CBR yields effectiveness figures comparable to those of FS. Furthermore, by using a specifically tailored cluster-skipping inverted index, we significantly improve the in-memory query processing efficiency of CBR in comparison to other traditional CBR techniques and even FS.

    Locality sensitive batch selection for triplet networks

    Triplet networks are deep metric learners which learn to optimise a feature space using similarity knowledge gained from training on triplets of data simultaneously. The architecture relies on the triplet loss function to optimise its weights based upon the distance between triplet members. Composition of input triplets therefore directly impacts the quality of the learned representations, meaning that a training scheme which optimises their formation is crucial. However, an exhaustive search for the best triplets is prohibitive unless the search for triplets is confined to smaller training regions or batches. Accordingly, current triplet mining approaches use informed selection applied only to a random minibatch, but the resulting view fails to exploit areas of complexity in the feature space. In this work, we introduce a locality-sensitive batching strategy, which uses the locality of examples to create batches as an alternative to the commonly adopted random minibatching. Our results demonstrate that this method offers better performance on three image and two text classification tasks with statistical significance. Importantly, most of these gains are incrementally realised with as little as 25% of the training iterations.
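
    The triplet loss mentioned in this abstract can be illustrated with a minimal sketch. It assumes the common hinge formulation max(0, d(a, p) - d(a, n) + margin) with Euclidean distance; the margin value and the function names are illustrative assumptions, and the sketch does not reproduce the paper's locality-sensitive batching strategy.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss (assumed formulation, not taken from the paper).

    The loss is zero once the negative is at least `margin` farther
    from the anchor than the positive is.
    """
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(0.0, gap + margin)


anchor = [0.0, 0.0]
positive = [3.0, 4.0]        # distance 5 from the anchor
easy_negative = [6.0, 8.0]   # distance 10: already far enough away
hard_negative = [0.0, 1.0]   # distance 1: closer than the positive

easy = triplet_loss(anchor, positive, easy_negative)
hard = triplet_loss(anchor, positive, hard_negative)
```

    The easy triplet contributes no loss while the hard one does, which is why the composition of triplets within a batch matters for training.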

    Efficient processing of category-restricted queries for web directories

    We show that a cluster-skipping inverted index (CS-IIS) is a practical and efficient file structure to support category-restricted queries for searching Web directories. The query processing strategy with CS-IIS improves CPU time efficiency without imposing any limitations on the directory size. © 2008 Springer-Verlag Berlin Heidelberg
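
    As a rough illustration of the idea, the sketch below groups each term's postings by category so that query processing can skip whole groups that fall outside the category restriction. The nested-list layout, the function names, and the raw term-frequency scoring are assumptions made for this example; the actual CS-IIS file structure is not specified in the abstract.

```python
from collections import defaultdict


def build_index(docs, cluster_of):
    """Build {term: [(cluster_id, [(doc_id, tf), ...]), ...]}.

    Postings of each term are grouped by cluster and the groups are
    sorted by cluster id, emulating skip elements with nested lists.
    """
    grouped = defaultdict(lambda: defaultdict(dict))
    for doc_id, text in docs.items():
        for term in text.lower().split():
            bucket = grouped[term][cluster_of[doc_id]]
            bucket[doc_id] = bucket.get(doc_id, 0) + 1
    return {
        term: [(cid, sorted(postings.items()))
               for cid, postings in sorted(clusters.items())]
        for term, clusters in grouped.items()
    }


def search(index, query, allowed_clusters):
    """Score documents by summed term frequency, skipping other clusters."""
    scores = defaultdict(int)
    for term in query.lower().split():
        for cluster_id, postings in index.get(term, []):
            if cluster_id not in allowed_clusters:
                continue  # skip the whole posting group for this cluster
            for doc_id, tf in postings:
                scores[doc_id] += tf
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical toy collection with two categories, "A" and "B".
docs = {
    1: "turkish text retrieval",
    2: "text clustering",
    3: "web directory search",
    4: "text search text",
}
cluster_of = {1: "A", 2: "A", 3: "B", 4: "B"}
index = build_index(docs, cluster_of)
results = search(index, "text search", {"B"})
```

    Restricting the query to category "B" means the postings of documents 1 and 2 are never visited, even though both contain the term "text".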

    Machine learning based medical information retrieval systems

    As many fields progress with the assistance of cognitive computing, health care is also adapting, providing many benefits to all users. However, advancements in this area are hindered by several challenges, such as the gap between user queries and the knowledge base, query mismatches, and the range of domain knowledge among users. In this research, we explore existing methodologies and examine real-life applications that are used in the medical field today. We also look into specific challenges and the techniques that can be used to overcome them, specifically in relation to cognitive computing in the medical domain. Future information retrieval (IR) models that can be tailored to medically intensive applications and can handle large amounts of data are explored as well. The purpose of this work is to give the reader an in-depth understanding of how artificial intelligence is used in the medical field today, as well as future possibilities in the domain. The models and techniques designed and discussed in this research can provide a framework, or starting point, for those interested in effectively developing, maintaining, and using these models to improve the quality of health care. Furthermore, we explore the development process of such a model and discuss its steps, including data collection, processing, model creation, and improvement.

    Information retrieval on Turkish texts

    In this study, we investigate information retrieval (IR) on Turkish texts using a large-scale test collection that contains 408,305 documents and 72 ad hoc queries. We examine the effects of several stemming options and query-document matching functions on retrieval performance. We show that a simple word truncation approach, a word truncation approach that uses language-dependent corpus statistics, and an elaborate lemmatizer-based stemmer provide similar retrieval effectiveness in Turkish IR. We investigate the effects of a range of search conditions on the retrieval performance; these include scalability issues, query and document length effects, and the use of a stop-word list in indexing. © 2007 Wiley Periodicals, Inc.
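
    The simple word truncation approach can be sketched as keeping a fixed-length prefix of each word. The prefix length of 5 and the function names below are assumptions chosen for illustration; the abstract does not state which lengths were evaluated.

```python
def truncate_stem(word, prefix_len=5):
    """Return the first `prefix_len` characters of the lowercased word.

    Note: str.lower() does not apply the Turkish dotted/dotless I
    casing rules, so the examples below use lowercase input only.
    """
    return word.lower()[:prefix_len]


def index_terms(text, prefix_len=5):
    """Whitespace-tokenize `text` and truncate every token."""
    return [truncate_stem(token, prefix_len) for token in text.split()]


# Inflected forms of "kitap" (book) collapse to the same index term.
words = ["kitap", "kitaplar", "kitaplarımız"]
stems = {truncate_stem(w) for w in words}
terms = index_terms("kitaplar ve kitaplarımız")
```

    Because Turkish is agglutinative and suffixes attach to the end of the word, a fixed prefix often retains the root, which is consistent with the abstract's finding that truncation performs similarly to a lemmatizer-based stemmer.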

    Improving the efficiency of search engines : strategies for focused crawling, searching, and index pruning

    Ankara: The Department of Computer Engineering and the Institute of Engineering and Science of Bilkent University, 2009. Thesis (Ph.D.) -- Bilkent University, 2009. Includes bibliographical references (leaves 157-169). Altıngövde, İsmail Sengör.
    Search engines are the primary means of retrieval for text data that is abundantly available on the Web. A standard search engine should carry out three fundamental tasks: crawling the Web, indexing the crawled content, and processing the queries using the index. Devising efficient methods for these tasks is an important research topic. In this thesis, we introduce efficient strategies related to all three tasks. Most of the proposed strategies are applicable when a grouping of documents in its broadest sense (i.e., automatically obtained classes or clusters, or manually edited categories) is readily available or can be constructed in a feasible manner. Additionally, we introduce static index pruning strategies that are based on query views. For the crawling task, we propose a rule-based focused crawling strategy that exploits interclass rules among the document classes in a topic taxonomy. These rules capture the probability of having hyperlinks between two classes. The rule-based crawler can tunnel toward on-topic pages by following a path of off-topic pages, and thus yields a higher harvest rate for crawling on-topic pages. In the context of the indexing and query processing tasks, we concentrate on conducting efficient search, again using document groups, i.e., clusters or categories. In typical cluster-based retrieval (CBR), the clusters that are most similar to a given free-text query are determined first, and then documents from these clusters are selected to form the final ranked output. For efficient CBR, we first identify and evaluate some alternative query processing strategies. Next, we introduce a new index organization, the so-called cluster-skipping inverted index structure (CS-IIS). It is shown that typical CBR with CS-IIS outperforms previous CBR strategies (with an ordinary index) for a number of datasets and under varying search parameters. An enhanced version of CS-IIS is further proposed, in which all the information needed to compute query-cluster similarities during query evaluation is stored. We introduce an incremental-CBR strategy that operates on top of this latter index structure, and demonstrate its search efficiency for different scenarios. Finally, we exploit query views that are obtained from search engine query logs to tailor more effective static pruning techniques, which also relates to the indexing task. In particular, the query view approach is incorporated into a set of existing pruning strategies, as well as some new variants that we propose. We show that query view based strategies significantly outperform the existing approaches in terms of query output quality, for both disjunctive and conjunctive evaluation of queries.
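
    The two-stage flow of typical cluster-based retrieval described in this abstract (select the best-matching clusters, then rank documents drawn from them) can be sketched as follows. The term-overlap similarity, the cluster representation as concatenated member text, and all names and parameters are assumptions made for this toy example, not the thesis's actual matching functions.

```python
from collections import Counter


def term_overlap(query_terms, text_terms):
    """Toy similarity: total occurrences of the query terms in the text."""
    counts = Counter(text_terms)
    return sum(counts[term] for term in query_terms)


def typical_cbr(query, docs, cluster_of, top_clusters=1, top_docs=10):
    """Two-stage retrieval: pick the best clusters, then rank their documents."""
    query_terms = query.lower().split()

    # Stage 1: represent each cluster by the concatenated terms of its
    # members and rank clusters by similarity to the query.
    cluster_terms = {}
    for doc_id, text in docs.items():
        cluster_terms.setdefault(cluster_of[doc_id], []).extend(text.lower().split())
    ranked = sorted(
        cluster_terms,
        key=lambda c: (-term_overlap(query_terms, cluster_terms[c]), c),
    )
    best = set(ranked[:top_clusters])

    # Stage 2: score only the documents that belong to the selected clusters.
    scored = []
    for doc_id, text in docs.items():
        if cluster_of[doc_id] in best:
            score = term_overlap(query_terms, text.lower().split())
            if score > 0:
                scored.append((doc_id, score))
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_docs]


# Hypothetical toy collection with two clusters.
docs = {
    1: "focused crawling of web pages",
    2: "web crawling rules",
    3: "index pruning for search",
    4: "static index pruning",
}
cluster_of = {1: "crawl", 2: "crawl", 3: "index", 4: "index"}
ranking = typical_cbr("index pruning", docs, cluster_of)
```

    Documents outside the selected clusters are never scored in the second stage, which is the step an index organized by cluster is meant to make cheap.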