201 research outputs found

    Comprehensive characterization of an open source document search engine

    This work performs a thorough characterization and analysis of the open source Lucene search library. The article describes in detail the architecture, functionality, and micro-architectural behavior of the search engine, and investigates prominent research issues in online document search. In particular, we study how intra-server index partitioning affects response time and throughput, explore the potential use of low power servers for document search, and examine the sources of performance degradation and the causes of tail latencies. Some of our main conclusions are the following: (a) intra-server index partitioning can reduce tail latencies, but with diminishing benefits as incoming query traffic increases, (b) low power servers, given enough partitioning, can provide the same average and tail response times as conventional high performance servers, (c) index search is a CPU-intensive, cache-friendly application, and (d) C-states are the main culprits for performance degradation in document search. Web of Science162art. no. 1
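
As a rough illustration of intra-server index partitioning, the sketch below splits a toy index into partitions, searches them in parallel within one process, and merges the partial top-k lists. This is not Lucene's API: `search_partition`, `partitioned_search`, the dictionary-based "index", and the term-frequency score are all assumptions made for the example.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def search_partition(partition, term, k):
    """Return the top-k (score, doc_id) hits of one partition.

    `partition` is a hypothetical toy index: a dict of doc_id -> text.
    The score is the raw term frequency, chosen only for illustration.
    """
    scored = (
        (text.lower().split().count(term), doc_id)
        for doc_id, text in partition.items()
    )
    return heapq.nlargest(k, (hit for hit in scored if hit[0] > 0))


def partitioned_search(partitions, term, k=3):
    """Search every partition concurrently, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = list(pool.map(lambda p: search_partition(p, term, k), partitions))
    # Each partition already returned at most k hits, so the merge is cheap.
    return heapq.nlargest(k, (hit for part in partials for hit in part))


if __name__ == "__main__":
    partitions = [
        {"a": "lucene search search"},
        {"b": "search engine", "c": "no match"},
    ]
    print(partitioned_search(partitions, "search", k=2))
```

The sketch only shows the structure (fan out to partitions, merge top-k). It says nothing about the latency or throughput effects the article measures, and Python threads would not give real CPU parallelism for this workload.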

    Partitioning of the molecular density matrix over atoms and bonds

    A double-index atomic partitioning of the molecular first-order density matrix is proposed. Contributions diagonal in the atomic indices correspond to atomic density matrices, whereas off-diagonal contributions carry information about the bonds. The resulting matrices have good localization properties, in contrast to single-index atomic partitioning schemes of the molecular density matrix. It is shown that the electron density assigned to individual atoms, when derived from the density matrix partitioning, can be made consistent with well-known partitions of the electron density over AIM basins, either with sharp or with fuzzy boundaries. The method is applied to a test set of about 50 molecules, representative of various types of chemical binding. A close correlation is observed between the trace of the bond matrices and the SEDI (shared electron density index) bond index. Comment: 25 pages, 8 figures, preprint

    Inverted Index Partitioning Strategies for a Distributed Search Engine

    One of the greatest challenges in information retrieval is to develop an intelligent system for user and machine interaction that supports users in their quest for relevant information. The dramatic increase in the amount of Web content gives rise to the need for a large-scale distributed information retrieval system, targeted to support millions of users and terabytes of data. To retrieve information from such a large amount of data efficiently, the index is split among the servers of a distributed information retrieval system. Thus, partitioning the index among these collaborating nodes plays an important role in enhancing the performance of a distributed search engine. The two widely known inverted index partitioning schemes for a distributed information retrieval system are document partitioning and term partitioning. In a document partitioned system, each server hosts a subset of the documents in the collection and executes every query against its local sub-collection. In a term partitioned index, each node is responsible for a subset of the terms in the collection and serves them to a central node as they are required for query evaluation. In this thesis, we introduce the Document over Term inverted index distribution scheme, which splits a set of nodes into several groups (sub-clusters) and then performs document partitioning between the groups and term partitioning within each group. As this approach is based on the term and document index partitioning approaches, we also refer to it as a Hybrid Inverted Index. This approach retains the disk access benefits of term partitioning and the benefits of document partitioning: shared computational load, scalability, maintainability, and availability. We also introduce the Document over Document index partitioning scheme, based on the document partitioning approach. In this approach, a set of nodes is split into groups, and the documents in the collection are partitioned between groups and also within each group. This strategy retains all the benefits of the document partitioning approach, but reduces the computational load more effectively and uses resources more efficiently. We compare distributed index approaches experimentally and show that, in terms of efficiency and scalability, document partition based approaches perform significantly better than the others. The Document over Term partitioning offers efficient utilization of search servers and lowers disk access, but suffers from load imbalance. The Document over Document partitioning emerged as the preferred method under high workload.
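
The Document over Term layout described in this abstract can be sketched as follows: documents are assigned to a group, and inside each group every term is assigned to one node. This is a minimal sketch under stated assumptions, not the thesis's implementation. The hash-based assignment, the helper names (`stable_hash`, `build_hybrid_index`, `search`), and the conjunctive query model are all hypothetical.

```python
from collections import defaultdict
from zlib import crc32


def stable_hash(key: str) -> int:
    """Deterministic hash (assumption: any stable hash would do)."""
    return crc32(key.encode("utf-8"))


def build_hybrid_index(docs, num_groups, nodes_per_group):
    """Build index[group][node][term] -> list of doc ids.

    Documents are partitioned between groups; within a group,
    terms are partitioned across that group's nodes.
    """
    index = [
        [defaultdict(list) for _ in range(nodes_per_group)]
        for _ in range(num_groups)
    ]
    for doc_id, text in sorted(docs.items()):
        group = stable_hash(doc_id) % num_groups          # document partitioning
        for term in set(text.lower().split()):
            node = stable_hash(term) % nodes_per_group    # term partitioning
            index[group][node][term].append(doc_id)
    return index


def search(index, query_terms):
    """Conjunctive query: every group answers for its own documents."""
    results = set()
    for group in index:
        postings = []
        for term in query_terms:
            node = stable_hash(term) % len(group)  # only one node holds this term
            postings.append(set(group[node].get(term, [])))
        if postings:
            results |= set.intersection(*postings)
    return sorted(results)


if __name__ == "__main__":
    docs = {
        "d1": "distributed search engine",
        "d2": "inverted index search",
        "d3": "term partitioning index",
    }
    index = build_hybrid_index(docs, num_groups=2, nodes_per_group=2)
    print(search(index, ["index", "search"]))
```

In this toy version a query touches every group but only the nodes that own its terms. Popular terms concentrate work on a few nodes, which is one way the load imbalance mentioned in the abstract could arise.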

    Performing Nonlinear Blind Source Separation with Signal Invariants

    Given a time series of multicomponent measurements x(t), the usual objective of nonlinear blind source separation (BSS) is to find a "source" time series s(t), comprised of statistically independent combinations of the measured components. In this paper, the source time series is required to have a density function in (s,ds/dt)-space that is equal to the product of density functions of individual components. This formulation of the BSS problem has a solution that is unique, up to permutations and component-wise transformations. Separability is shown to impose constraints on certain locally invariant (scalar) functions of x, which are derived from local higher-order correlations of the data's velocity dx/dt. The data are separable if and only if they satisfy these constraints, and, if the constraints are satisfied, the sources can be explicitly constructed from the data. The method is illustrated by using it to separate two speech-like sounds recorded with a single microphone. Comment: 8 pages, 3 figures

    Index partitioning in Oracle8i: analysis and evaluation of performance aspects

    Large tables require an adapted physical database design. In addition to table partitioning, which has already proven itself, index partitioning has become available. This work investigates whether partitioning indexes yields appreciable performance gains in query processing. To this end, measurements are carried out on an Oracle8i DBMS, version 8.1.5. The measurement data indicate performance gains in some cases. In many other cases, however, performance degrades. Moreover, the optimizer of the DBMS used appears to have fundamental problems handling index partitioning.

    A Running Time Improvement for Two Thresholds Two Divisors Algorithm

    Chunking algorithms play an important role in data de-duplication systems. The Basic Sliding Window (BSW) algorithm is the first prototype of a content-based chunking algorithm that can handle most types of data. The Two Thresholds Two Divisors (TTTD) algorithm was proposed to improve on the BSW algorithm by controlling the variation in chunk size. In this project, we investigate and compare the BSW and TTTD algorithms with respect to several factors through a series of systematic experiments. Up to now, no paper has conducted such an experimental evaluation of these two algorithms; this is the first contribution of this paper. Based on our analyses and the experimental results, we provide a running time improvement for the TTTD algorithm. Compared with the original TTTD algorithm, our new solution reduces the total running time by about 7% and the number of large-sized chunks by about 50%, and brings the average chunk size closer to the expected chunk size. These significant results are the second contribution of this project.
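
The abstract names TTTD without spelling out the procedure. The sketch below follows the scheme as it is commonly described: no cut before a minimum size, a main divisor that triggers a cut, a backup divisor that records a fallback cut point, and a forced cut at a maximum size. It is not the improved variant this paper proposes. The parameter values, the simple polynomial window hash (standing in for a proper rolling fingerprint), and the names `window_hash` and `tttd_chunks` are assumptions for illustration.

```python
def window_hash(data: bytes, end: int, window: int) -> int:
    """Hash of the `window` bytes ending at index `end` (inclusive).

    Recomputed from scratch for clarity; a real implementation would
    update a rolling fingerprint incrementally instead.
    """
    h = 0
    for b in data[max(0, end - window + 1): end + 1]:
        h = (h * 257 + b) & 0xFFFFFFFF
    return h


def tttd_chunks(data: bytes, t_min=64, t_max=512, d_main=48, d_backup=24, window=16):
    """Split `data` into content-defined chunks (illustrative parameters)."""
    chunks = []
    start = 0      # first byte of the current chunk
    backup = -1    # last position that matched the backup divisor
    i = 0
    while i < len(data):
        size = i - start + 1
        if size >= t_min:                      # never cut below the minimum
            h = window_hash(data, i, window)
            if h % d_backup == d_backup - 1:
                backup = i                     # remember a fallback breakpoint
            cut = None
            if h % d_main == d_main - 1:
                cut = i                        # main divisor matched
            elif size >= t_max:
                cut = backup if backup >= 0 else i   # forced cut at the maximum
            if cut is not None:
                chunks.append(data[start:cut + 1])
                start = cut + 1
                i = cut                        # resume right after the cut
                backup = -1
        i += 1
    if start < len(data):
        chunks.append(data[start:])            # trailing chunk may be short
    return chunks


if __name__ == "__main__":
    sample = bytes((i * 131 + (i // 7) * 17) % 251 for i in range(5000))
    sizes = [len(c) for c in tttd_chunks(sample)]
    print(len(sizes), min(sizes), max(sizes))
```

Because the backup breakpoint is preferred over a hard cut at the maximum size, fewer chunks end at an arbitrary position, which is the chunk-size control the abstract attributes to TTTD.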

    Sentence mood constitution and indefinite noun phrases

    Sentence mood in German is a complex category that is determined by various components of the grammatical system. In particular, verbal mood, the position of the finite verb, and the wh-characteristics of the so-called 'Vorfeld' phrase are responsible for the constitution of sentence mood in German. This article proposes a theory of sentence mood constitution in German and investigates its interaction with the pronominal binding of indefinite noun phrases, which are semantically analyzed as choice functions. It is shown that the semantic objects determined by sentence mood define different kinds of domains, which have to be uniquely accessible as the range of the choice function. The various properties of the pronominal binding of indefinites can be derived from the interplay of the proposed theoretical notions.