125 research outputs found

    Decoding billions of integers per second through vectorization

    In many important applications -- such as search engines and relational database systems -- data is stored in the form of arrays of integers. Encoding and, most importantly, decoding of these arrays consumes considerable CPU time. Therefore, substantial effort has been made to reduce the costs associated with compression and decompression. In particular, researchers have exploited the superscalar nature of modern processors and SIMD instructions. Nevertheless, we introduce a novel vectorized scheme called SIMD-BP128 that improves over previously proposed vectorized approaches. It is nearly twice as fast as the previously fastest schemes on desktop processors (varint-G8IU and PFOR). At the same time, SIMD-BP128 saves up to 2 bits per integer. For even better compression, we propose another new vectorized scheme (SIMD-FastPFOR) that has a compression ratio within 10% of a state-of-the-art scheme (Simple-8b) while being two times faster during decoding. Comment: for software, see https://github.com/lemire/FastPFor; for data, see http://boytsov.info/datasets/clueweb09gap
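The core idea behind binary packing schemes such as SIMD-BP128 can be illustrated with a minimal scalar sketch (the 128-integer block size, SIMD registers, and the paper's exact storage layout are omitted; function names here are illustrative): sorted integers are delta-encoded, and each block of deltas is stored using just enough bits for its largest value.

```python
# Scalar sketch of binary packing: store each delta in b bits, where b is
# the bit width of the largest delta in the block. Not the paper's layout.

def pack_block(values, b):
    """Pack each value into b bits of a single integer (little-endian)."""
    word = 0
    for i, v in enumerate(values):
        assert v < (1 << b), "value does not fit in b bits"
        word |= v << (i * b)
    return word

def unpack_block(word, b, n):
    """Decode n values of b bits each from the packed integer."""
    mask = (1 << b) - 1
    return [(word >> (i * b)) & mask for i in range(n)]

# Delta-encode a sorted postings list, then bit-pack the gaps.
postings = [3, 7, 8, 20, 21]
gaps = [postings[0]] + [postings[i] - postings[i - 1]
                        for i in range(1, len(postings))]
b = max(gaps).bit_length()           # bit width chosen per block
packed = pack_block(gaps, b)
decoded_gaps = unpack_block(packed, b, len(gaps))

# Prefix-sum the decoded gaps to recover the original postings.
out, acc = [], 0
for g in decoded_gaps:
    acc += g
    out.append(acc)
assert out == postings
```

Vectorized schemes like SIMD-BP128 gain their speed by performing the shifts and masks above on many integers at once inside 128-bit registers.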

    Quality of Information Regarding Repair Restorations on Dentist Websites: Systematic Search and Analysis

    Background: Repairing instead of replacing partially defective dental restorations represents a minimally invasive treatment concept, and repairs are associated with advantages over complete restoration replacement. To participate in the shared decision-making process when facing partially defective restorations, patients need to be aware of the indications, limitations, and advantages or disadvantages of repairs. Patients are increasingly using the internet to obtain such health information. Objective: We aimed to assess the quality of German-language dentist websites on repairs of partially defective restorations. Methods: Three electronic search engines were used to identify German-language websites of dental practices mentioning repairs. Regarding information on repairs, websites were assessed for (1) technical and functional aspects, (2) comprehensiveness of information, and (3) generic quality and risk of bias. Domains 1 and 3 were scored using validated tools (LIDA and DISCERN). Comprehensiveness was assessed using a criterion checklist related to evidence, advantages and disadvantages, restorations and defects suitable for repairs, and information regarding technical implementation. Generalized linear modeling was used to assess the impact of practice-specific parameters (practice location, practice setting, dental society membership, and year of examination or license to practice dentistry) on the quality of information. An overall quality score was calculated by averaging the quality scores of all three domains and used as the primary outcome parameter. Quality scores of all three domains were also assessed individually and used as secondary outcomes. Results: Fifty websites were included. The median score of quality of information was 23.2% (interquartile range [IQR] 21.7%-26.2%). Technical and functional aspects (55.2% [IQR 51.7%-58.6%]) showed significantly higher quality than comprehensiveness of information (8.3% [IQR 8.3%-16.7%]) and generic quality and risk of bias (3.6% [IQR 0.0%-7.1%]; P<.05; generalized linear modeling). Conclusions: The quality of German-language dentist websites on repairs was limited. Despite sufficient technical and functional quality, the provided information was neither comprehensive nor trustworthy. There is great need to improve the quality of information to fully and reliably inform patients, thereby allowing shared decision making.

    Index ordering by query-independent measures

    Conventional approaches to information retrieval search through all applicable entries in an inverted file for a particular collection in order to find the documents with the highest scores. For particularly large collections this may be extremely time consuming. A solution to this problem is to search only a limited portion of the collection at query time, speeding up the retrieval process while also limiting the loss in retrieval efficacy (in terms of accuracy of results). We achieve this by first identifying the most “important” documents within the collection, and then sorting the documents within each inverted file list in order of this “importance”. In this way we limit the amount of information to be searched at query time by eliminating documents of lesser importance, which not only makes the search more efficient but also limits the loss in retrieval accuracy. Our experiments, carried out on the TREC Terabyte collection, report significant savings in terms of the number of postings examined, without significant loss of effectiveness, when based on several measures of importance used in isolation and in combination. Our results point to several ways in which the computational cost of searching large collections of documents can be significantly reduced.
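As a rough illustration of the approach (names and scoring are mine, not the paper's): postings are ordered by a precomputed, query-independent importance score, so that scanning only a prefix of each inverted list at query time still encounters the most important documents first.

```python
# Toy impact-ordered index: postings sorted by a query-independent
# importance score; queries scan only a bounded prefix of each list.

from collections import defaultdict

def build_index(docs, importance):
    """docs: {doc_id: text}; importance: {doc_id: score in [0, 1]}."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        for term in set(text.split()):
            index[term].append(doc_id)
    # Sort each postings list by descending importance instead of doc id.
    for term in index:
        index[term].sort(key=lambda d: importance[d], reverse=True)
    return index

def search(index, terms, budget):
    """Examine at most `budget` postings per term; earlier = more important."""
    seen = set()
    for t in terms:
        seen.update(index.get(t, [])[:budget])
    return seen

docs = {1: "fast search engines", 2: "search theory", 3: "engines history"}
importance = {1: 0.9, 2: 0.5, 3: 0.2}
idx = build_index(docs, importance)
print(search(idx, ["search"], budget=1))   # → {1}
```

With this layout, the `budget` parameter directly trades the number of postings examined against potential loss in effectiveness, which is the trade-off the experiments measure.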

    Neonatal Ventilator Associated Pneumonia: A Quality Improvement Initiative Focusing on Antimicrobial Stewardship

    Background and Aims: Neonatal ventilator-associated pneumonia (VAP) is a common nosocomial infection and a frequent reason for empirical antibiotic therapy in NICUs. Nonetheless, there is no international consensus regarding diagnostic criteria and management. In a first step, we analyzed the diagnostic criteria, risk factors, and therapeutic management of neonatal VAP in a literature review. In a second step, we aimed to compare suspected vs. confirmed neonatal VAP episodes in our unit according to different published criteria and to analyze the interrater reliability of chest x-rays. Additionally, we aimed to evaluate the development of VAP incidence and antibiotic use after implementation of multifaceted quality improvement changes regarding antimicrobial stewardship and infection control (VAP prevention bundle, early-extubation policy, antimicrobial stewardship rounds). Methods: Neonates up to 44 weeks of gestation with suspected VAP, hospitalized at our level-III NICU in Lucerne from September 2014 to December 2017, were enrolled. VAP episodes were analyzed according to 4 diagnostic frameworks. Agreement regarding chest x-ray interpretation by 10 senior physicians was assessed. Annual incidence of suspected and confirmed neonatal VAP episodes and antibiotic days were calculated and compared for the years 2015, 2016, and 2017. Results: 17 studies were identified in our literature review. Overall, CDC guidelines or similar criteria, requiring radiographic changes as the main criterion, are mostly used. Comparison of suspected vs. confirmed neonatal VAP episodes showed great variance (20.4 vs. 4.5/1,000 ventilator-days). The interrater reliability of x-ray interpretation was poor (intra-class correlation 0.25). Implemented changes resulted in a gradual decline in annual VAP incidence and antibiotic days from 2015 to 2017 (28.8 vs. 7.4 suspected episodes/1,000 ventilator-days, 5.5 vs. 0 confirmed episodes/1,000 ventilator-days, and 211 vs. 34.7 antibiotic days/1,000 ventilator-days, respectively). Conclusion: The incidence of suspected VAP and concomitant antibiotic use is much higher than for confirmed VAP; therefore, inclusion of suspected episodes should be considered for accurate evaluation. There is high diagnostic inconsistency and low reliability of interpretation of chest x-rays regarding VAP. Implementation of combined antimicrobial stewardship and infection control measures may lead to an effective decrease in VAP incidence and antibiotic use.

    One-Pass Ranking Models for Low-Latency Product Recommendations

    Purchase logs collected in e-commerce platforms provide rich information about customer preferences. These logs can be leveraged to improve the quality of product recommendations by feeding them to machine-learned ranking models. However, a variety of deployment constraints limit the naïve applicability of machine learning to this problem. First, the amount and the dimensionality of the data make in-memory learning simply not possible. Second, the drift of customers’ preferences over time requires retraining the ranking model regularly with freshly collected data. This limits the time available for training to prohibitively short intervals. Third, ranking in real time is necessary whenever the query complexity prevents us from caching the predictions. This constraint requires minimizing prediction time (or equiva
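The abstract does not give the model itself; as a hedged sketch of what a one-pass ranking model can look like, here is a linear scorer trained with a single pass of stochastic gradient descent over pairwise preferences, which respects the memory and training-time constraints described above. All details (features, loss, learning rate) are illustrative, not the paper's.

```python
# One-pass pairwise ranking sketch: a linear model updated once per
# observed preference pair, never holding the full log in memory.

import math

def sgd_pairwise(pairs, dim, lr=0.1):
    """pairs: iterable of (preferred_features, other_features); single pass."""
    w = [0.0] * dim
    for pos, neg in pairs:
        # Logistic loss on the score margin; one gradient step per pair.
        margin = sum(wi * (p - n) for wi, p, n in zip(w, pos, neg))
        g = -1.0 / (1.0 + math.exp(margin))      # d(loss)/d(margin)
        for i in range(dim):
            w[i] -= lr * g * (pos[i] - neg[i])
    return w

# Purchased item scores high on feature 0; the skipped item on feature 1.
pairs = [([1.0, 0.0], [0.0, 1.0])] * 20
w = sgd_pairwise(pairs, dim=2)
score = lambda x: sum(wi * xi for wi, xi in zip(w, x))
assert score([1.0, 0.0]) > score([0.0, 1.0])
```

Because the model is a plain dot product, prediction is cheap enough for the real-time ranking constraint, and training touches each log record exactly once.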

    Multi-User File System Search

    Information retrieval research usually deals with globally visible, static document collections. Practical applications, in contrast, like file system search and enterprise search, have to cope with highly dynamic text collections and have to take into account user-specific access permissions when generating the results to a search query. The goal of this thesis is to close the gap between information retrieval research and the requirements imposed by these real-life applications. The algorithms and data structures presented in this thesis can be used to implement a file system search engine that is able to react to changes in the file system by updating its index data in real time. File changes (insertions, deletions, or modifications) are reflected by the search results within a few seconds.
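One common way to realize such real-time updates (a simplification of mine, not necessarily the thesis's design) is to route recent file changes into a small in-memory delta index that is searched together with the main index and merged into it periodically:

```python
# Toy real-time file-system index: changes become visible immediately via
# an in-memory delta index; the on-disk main index is patched lazily.

class FSIndex:
    def __init__(self):
        self.main = {}       # term -> set of paths (stand-in for on-disk index)
        self.delta = {}      # recent changes, searchable immediately
        self.hidden = set()  # paths whose main-index postings are stale

    def _forget(self, path):
        self.hidden.add(path)
        for paths in self.delta.values():
            paths.discard(path)

    def on_file_changed(self, path, terms):
        self._forget(path)                    # hide any stale postings
        for t in terms:
            self.delta.setdefault(t, set()).add(path)

    def on_file_deleted(self, path):
        self._forget(path)

    def search(self, term):
        live = self.main.get(term, set()) - self.hidden
        return live | self.delta.get(term, set())

    def merge(self):
        # Periodic merge: fold the delta into the main index, drop tombstones.
        for t, paths in self.delta.items():
            self.main.setdefault(t, set()).update(paths)
        self.main = {t: p - self.hidden for t, p in self.main.items()}
        self.delta.clear()
        self.hidden.clear()

fs = FSIndex()
fs.on_file_changed("/home/user/notes.txt", ["todo"])
assert fs.search("todo") == {"/home/user/notes.txt"}
fs.on_file_changed("/home/user/notes.txt", ["done"])   # file modified
assert fs.search("todo") == set()
assert fs.search("done") == {"/home/user/notes.txt"}
```

The delta index keeps update latency in the seconds range, while the expensive rewrite of the large main index is deferred to merge time.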

    Indexing time vs. query time: tradeoffs in dynamic information retrieval systems

    We examine issues in the design of fully dynamic information retrieval systems with support for instantaneous document insertions and deletions. We present one such system and discuss some of the major design decisions. These decisions affect both the indexing and the query processing efficiency of our system and thus represent genuine trade-offs between indexing and query processing performance. Two aspects of the retrieval system – fast, incremental updates and garbage collection for delayed document deletions – are discussed in detail, with a focus on the respective trade-offs. Depending on the relative number of queries and update operations, different strategies lead to optimal overall performance. Special attention is given to a particular case of dynamic search systems – desktop and file system search. As one of the main results of this paper, we demonstrate how security mechanisms necessary for multiuser support can be extended to realize efficient document deletions.
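The final point can be sketched as follows (a toy model under my own naming, not the paper's implementation): a deleted document is simply made invisible to every user through the same filtering step that enforces access permissions, and its postings are physically removed later by garbage collection.

```python
# Deletion reusing the security filter: a tombstoned document fails the
# visibility check for all users; garbage collection purges it later.

class SearchIndex:
    def __init__(self):
        self.postings = {}      # term -> list of doc ids
        self.acl = {}           # doc id -> set of users allowed to see it
        self.deleted = set()    # tombstones, purged by garbage collection

    def add(self, doc_id, terms, allowed_users):
        self.acl[doc_id] = set(allowed_users)
        for t in terms:
            self.postings.setdefault(t, []).append(doc_id)

    def delete(self, doc_id):
        # Delayed deletion: the document becomes invisible to everyone.
        self.deleted.add(doc_id)

    def search(self, term, user):
        return [d for d in self.postings.get(term, [])
                if d not in self.deleted and user in self.acl[d]]

    def garbage_collect(self):
        # Rebuild postings lists without tombstoned documents.
        for t in self.postings:
            self.postings[t] = [d for d in self.postings[t]
                                if d not in self.deleted]
        self.deleted.clear()

idx = SearchIndex()
idx.add(1, ["report"], ["alice"])
idx.add(2, ["report"], ["alice", "bob"])
idx.delete(1)
assert idx.search("report", "alice") == [2]   # deleted doc filtered out
idx.garbage_collect()
assert idx.postings["report"] == [2]
```

Because every query already passes through the permission filter, adding tombstones to that filter makes logical deletion essentially free, and the physical cleanup can be batched to whenever update load is low.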