359 research outputs found

    Performance comparison of clustered and replicated information retrieval systems

    The amount of information available over the Internet is increasing daily, as are the importance and scale of Web search engines. Systems based on a single centralised index present several problems (such as a lack of scalability), which lead to the use of distributed information retrieval systems to effectively search for and locate the required information. A distributed retrieval system can be clustered and/or replicated. In this paper, using simulations, we present a detailed performance analysis, in terms of both throughput and response time, of a clustered system compared to a replicated system. In addition, we consider the effect of changes in the query topics over time. We show that the performance obtained with a clustered system does not improve on that of the best replicated system. Indeed, the main advantage of a clustered system is the reduction of network traffic. However, the use of a switched network eliminates the bottleneck in the network, markedly improving the performance of the replicated systems. Moreover, we illustrate the negative effect on performance of changes in the query topics over time when a distributed clustered system is used. In contrast, the performance of a distributed replicated system is query independent.
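    To make the contrast concrete, the sketch below (Python, purely illustrative; all node counts, topics, and probabilities are assumptions, not taken from the paper) routes a stream of queries either round-robin across full-index replicas or by topic to clusters, showing why a drift in query topics skews the load of a clustered configuration but not of a replicated one.

    import random
    from collections import Counter

    # Illustrative sketch only: compare how evenly query load spreads over
    # nodes in a replicated versus a clustered configuration.
    N_NODES = 2
    TOPIC_TO_CLUSTER = {"politics": 0, "sports": 1}
    queries = ["politics" if random.random() < 0.7 else "sports" for _ in range(10_000)]

    def replicated_load(qs):
        # Each replica holds the whole index; queries are balanced round-robin,
        # so the load is independent of the query topics.
        return Counter(i % N_NODES for i, _ in enumerate(qs))

    def clustered_load(qs):
        # Each cluster holds only the documents of one topic; a query goes to
        # the cluster covering its topic, so a topic drift skews the load.
        return Counter(TOPIC_TO_CLUSTER[t] for t in qs)

    print("replicated:", replicated_load(queries))
    print("clustered: ", clustered_load(queries))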

    Selective web information retrieval

    This thesis proposes selective Web information retrieval, a framework formulated in terms of statistical decision theory, with the aim of applying an appropriate retrieval approach on a per-query basis. The main component of the framework is a decision mechanism that selects an appropriate retrieval approach for each query. The selection of a particular retrieval approach is based on the outcome of an experiment, which is performed before the final ranking of the retrieved documents. The experiment is a process that extracts features from a sample of the set of retrieved documents. This thesis investigates three broad types of experiment. The first counts the occurrences of query terms in the retrieved documents, indicating the extent to which the query topic is covered in the document collection. The second considers information from the distribution of retrieved documents over larger aggregates of related Web documents, such as whole Web sites, or directories within Web sites. The third estimates the usefulness of the hyperlink structure among a sample of the set of retrieved Web documents. The proposed experiments are evaluated in the context of both informational and navigational search tasks with an optimal Bayesian decision mechanism, where it is assumed that relevance information exists. The thesis further investigates the implications of applying selective Web information retrieval in an operational setting, where the tuning of the decision mechanism is based on limited existing relevance information and the retrieval system's input is a stream of queries related to mixed informational and navigational search tasks. First, the experiments are evaluated using different training and testing query sets, as well as a mixture of different types of queries. Second, query sampling is introduced in order to approximate the queries that a retrieval system receives, and to tune an ad-hoc decision mechanism with a broad set of automatically sampled queries.
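    As a rough illustration of the decision-mechanism idea (a minimal Python sketch; the feature, threshold, and approach names are assumptions rather than the thesis's actual configuration), one can extract a single feature from a sample of the retrieved documents and use it to choose between two retrieval approaches:

    def query_term_coverage(query_terms, sampled_docs):
        # Fraction of sampled documents containing every query term.
        if not sampled_docs:
            return 0.0
        hits = sum(all(t in doc for t in query_terms) for doc in sampled_docs)
        return hits / len(sampled_docs)

    def select_approach(query_terms, sampled_docs, threshold=0.3):
        # High coverage suggests the query topic is well represented in the
        # collection, so a content-only ranking may suffice; otherwise fall
        # back to an approach that also uses link/URL evidence. The threshold
        # is illustrative and would normally be tuned on training queries.
        feature = query_term_coverage(query_terms, sampled_docs)
        return "content_only" if feature >= threshold else "content_plus_link_evidence"

    sample = [{"glasgow", "retrieval", "web"}, {"web", "search"}, {"decision", "theory"}]
    print(select_approach({"web", "retrieval"}, sample))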

    University of Glasgow at WebCLEF 2005: experiments in per-field normalisation and language specific stemming

    We participated in the WebCLEF 2005 monolingual task. In this task, a search system aims to retrieve relevant documents from a multilingual corpus of Web documents from the Web sites of European governments. Both the documents and the queries are written in a wide range of European languages. A challenge in this setting is to detect the language of documents and topics, and to process them appropriately. We develop a language-specific technique for applying the correct stemming approach, as well as for removing the correct stopwords from the queries. We represent documents using three fields, namely the content, the title, and the anchor text of incoming hyperlinks. We use a technique called per-field normalisation, which extends the Divergence From Randomness (DFR) framework, to normalise the term frequencies and to combine them across the three fields. We also employ the length of the URL path of Web documents. The ranking is based on combinations of the language-specific stemming, where applied, and the per-field normalisation. We use our Terrier platform for all our experiments. The overall performance of our techniques is outstanding, achieving the top four performing runs overall, as well as the top performing run without metadata in the monolingual task. The best run uses only per-field normalisation, without applying stemming.
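    A minimal sketch of the per-field normalisation idea follows (Python, illustrative only): term frequencies in each field are normalised with respect to that field's length, in the style of DFR Normalisation 2, and then combined with per-field weights. The weights and c parameters shown are assumptions, not the values used in the submitted runs.

    import math

    def normalised_tf(tf, field_len, avg_field_len, c):
        # Normalisation 2 style: damp the term frequency according to how long
        # this field is relative to the average field length in the collection.
        if field_len == 0:
            return 0.0
        return tf * math.log2(1.0 + c * avg_field_len / field_len)

    def combined_tfn(per_field_stats, weights, c_params):
        # per_field_stats: {field: (tf, field_length, average_field_length)}
        return sum(
            weights[f] * normalised_tf(tf, flen, avg_flen, c_params[f])
            for f, (tf, flen, avg_flen) in per_field_stats.items()
        )

    stats = {"content": (3, 800, 500.0), "title": (1, 6, 8.0), "anchor": (2, 20, 15.0)}
    weights = {"content": 1.0, "title": 2.0, "anchor": 1.5}
    c_params = {"content": 1.0, "title": 10.0, "anchor": 5.0}
    print(round(combined_tfn(stats, weights, c_params), 3))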

    Is Meta-Learning the Right Approach for the Cold-Start Problem in Recommender Systems?

    Recommender systems have become fundamental building blocks of modern online products and services, and have a substantial impact on user experience. In the past few years, deep learning methods have attracted a great deal of research attention and are now heavily used in modern real-world recommender systems. Nevertheless, dealing with recommendations in the cold-start setting, e.g., when a user has had only limited interactions with the system, is a problem that remains far from solved. Meta-learning techniques, and in particular optimization-based meta-learning, have recently become the most popular approaches in the academic research literature for tackling the cold-start problem in deep learning models for recommender systems. However, current meta-learning approaches are not practical for real-world recommender systems, which have billions of users and items and strict latency requirements. In this paper we show that it is possible to obtain similar, or higher, performance on commonly used benchmarks for the cold-start problem without using meta-learning techniques. In more detail, we show that, when tuned correctly, standard and widely adopted deep learning models perform just as well as newer meta-learning models. We further show that an extremely simple modular approach using common representation learning techniques can perform comparably to meta-learning techniques specifically designed for the cold-start setting, while being much more easily deployable in real-world applications.
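    A minimal sketch of the kind of simple modular baseline argued for above (Python with NumPy; the pooling strategy and all numbers are assumptions, not the authors' exact architecture): represent a cold-start user as the average of the embeddings of the few items they have interacted with, then rank candidates by dot product.

    import numpy as np

    rng = np.random.default_rng(0)
    item_embeddings = rng.normal(size=(1000, 32))   # stand-in for pretrained item vectors

    def cold_start_user_vector(interacted_item_ids):
        # With only a handful of interactions, pool the corresponding item vectors.
        return item_embeddings[interacted_item_ids].mean(axis=0)

    def recommend(interacted_item_ids, k=5):
        user_vec = cold_start_user_vector(interacted_item_ids)
        scores = item_embeddings @ user_vec
        scores[interacted_item_ids] = -np.inf       # never re-recommend seen items
        return np.argsort(-scores)[:k]

    print(recommend([3, 17, 42]))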

    A governance framework for development and assessment of national action plans on antimicrobial resistance

    Strengthening governance is an essential strategy for tackling antimicrobial resistance (AMR) at all levels: global, national, regional, and local. To date, no systematic approach to the governance of national action plans on AMR exists. To address this issue, we aimed to develop the first governance framework to offer guidance for both the development and the assessment of national action plans on AMR. We reviewed existing reviews of health system governance frameworks to inform the basic structure of our framework, together with international guidance documents from WHO, the Food and Agriculture Organization, the World Organisation for Animal Health, and the European Commission, and we sought the input of 25 experts from international organisations, government ministries, policy institutes, and academic institutions to develop and refine our framework. The framework consists of 18 domains with 52 indicators, organised into three governance areas: policy design, implementation tools, and monitoring and evaluation. To account for the dynamic nature of AMR, the framework is conceptualised as a cyclical process, which is responsive to the context and allows for continuous improvement and adaptation of national action plans on AMR.

    Statistical Copolymers of N–Vinylpyrrolidone and 2–Chloroethyl Vinyl Ether via Reversible Addition–Fragmentation Chain Transfer Radical Polymerization: Synthesis, Characterization, and Thermal Properties.

    In the present research work, the synthesis of statistical copolymers P(NVP–stat–CEVE), where NVP is N–vinylpyrrolidone and CEVE is 2–chloroethyl vinyl ether, via reversible addition–fragmentation chain transfer (RAFT) radical polymerization is reported. Initially, their statistical copolymerization was studied at different concentrations and temperatures, in the presence and absence of lithium hydroxide (LiOH), using the chain transfer agents (CTAs) [(O–ethylxanthyl)methyl]benzene (CTA–1) and O–ethyl S–(phthalimidymethyl) xanthate (CTA–3), in order to find the optimal polymerization conditions. Statistical copolymers were then successfully synthesized at different feed ratios of the two monomers using CTA–1, and their reactivity ratios were calculated using various linear graphical methods as well as the COPOINT program. The synthesized polymers were characterized by several methods: size exclusion chromatography (SEC) was performed for their molecular characterization, and their composition was determined by nuclear magnetic resonance (NMR) spectroscopy. Finally, their thermal properties were studied by differential scanning calorimetry (DSC), and the kinetics of their thermal degradation were examined by thermogravimetric analysis (TGA) and differential thermogravimetry (DTG), applying the methodologies of Ozawa–Flynn–Wall (OFW) and Kissinger–Akahira–Sunose (KAS).
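    The abstract does not name the specific linear methods used, but as a purely illustrative example of this kind of calculation, the Fineman-Ross linearisation can be applied to feed and copolymer composition data in a few lines of Python. The data points below are illustrative values, not measurements from this work.

    import numpy as np

    # x = [M1]/[M2] in the feed, y = d[M1]/d[M2] in the copolymer (e.g. from NMR).
    x = np.array([0.25, 0.67, 1.00, 1.50, 4.00])
    y = np.array([0.40, 0.80, 1.10, 1.60, 3.50])

    # Fineman-Ross linearisation: G = r1 * H - r2
    G = x * (y - 1.0) / y
    H = x ** 2 / y
    slope, intercept = np.polyfit(H, G, 1)   # slope = r1, intercept = -r2
    print(f"r1 ≈ {slope:.2f}, r2 ≈ {-intercept:.2f}")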

    Index ordering by query-independent measures

    Conventional approaches to information retrieval search through all applicable entries in an inverted file for a particular collection in order to find the documents with the highest scores. For particularly large collections this can be extremely time consuming. A solution to this problem is to search only a limited portion of the collection at query time, in order to speed up the retrieval process while also limiting the loss in retrieval efficacy (in terms of the accuracy of results). We achieve this by first identifying the most “important” documents within the collection, and then sorting the documents within the inverted file lists in order of this “importance”. In this way we limit the amount of information to be searched at query time by eliminating documents of lesser importance, which not only makes the search more efficient but also limits the loss in retrieval accuracy. Our experiments, carried out on the TREC Terabyte collection, report significant savings, in terms of the number of postings examined, without significant loss of effectiveness, for several measures of importance used in isolation and in combination. Our results point to several ways in which the computational cost of searching large collections of documents can be significantly reduced.
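    The idea can be sketched in a few lines of Python (an illustrative toy, not the system used in the experiments): postings are ordered by a query-independent importance score at index-build time, and query processing examines only a fixed budget of postings per term.

    # Query-independent importance, e.g. a link- or length-based prior (toy values).
    importance = {1: 0.9, 2: 0.2, 3: 0.7, 4: 0.5}

    inverted_index = {
        "retrieval": [1, 2, 3, 4],
        "terabyte": [2, 3],
    }

    # Build time: sort each posting list by descending importance instead of doc id.
    ordered_index = {
        term: sorted(postings, key=lambda d: importance[d], reverse=True)
        for term, postings in inverted_index.items()
    }

    def search(query_terms, postings_budget=2):
        # Query time: examine only the first `postings_budget` postings per term.
        scores = {}
        for term in query_terms:
            for doc in ordered_index.get(term, [])[:postings_budget]:
                scores[doc] = scores.get(doc, 0) + 1   # toy scoring function
        return sorted(scores, key=scores.get, reverse=True)

    print(search(["retrieval", "terabyte"]))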

    Concept Matching for Low-Resource Classification

    We propose a model to tackle classification tasks in the presence of very little training data. To this end, we approximate the notion of an exact match with a theoretically sound mechanism that computes a probability of matching in the input space. Importantly, the model learns to focus on the elements of the input that are relevant for the task at hand; by leveraging highlighted portions of the training data, an error boosting technique guides the learning process. In practice, it increases the error associated with relevant parts of the input by a given factor. Remarkable results on text classification tasks confirm the benefits of the proposed approach in both balanced and unbalanced cases, making it of practical use when labeling new examples is expensive. In addition, by inspecting its weights, it is often possible to gather insights into what the model has learned.
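    A minimal sketch of the error boosting step as described above (Python with NumPy; the boost factor and the per-token error representation are assumptions, not the authors' exact formulation): the loss contribution of input positions highlighted as relevant is scaled up by a constant factor.

    import numpy as np

    def boosted_loss(per_token_errors, highlight_mask, boost=3.0):
        # highlight_mask is 1 where a token was highlighted as relevant, else 0;
        # highlighted errors are amplified by `boost` before averaging.
        weights = np.where(highlight_mask == 1, boost, 1.0)
        return float(np.mean(weights * per_token_errors))

    errors = np.array([0.10, 0.40, 0.05, 0.30])
    mask = np.array([0, 1, 0, 1])   # second and fourth tokens were highlighted
    print(boosted_loss(errors, mask))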