2,061 research outputs found

    Intersection caching techniques in search engines

    Get PDF
    Search Engines process huge amounts of data (i.e., web pages) to build sophisticated structures that support search. The ever-growing number of users and the amount of information available on the web impose performance challenges, and query processing consumes a significant amount of computational resources. In this context, many optimization techniques are required to cope with heavy query traffic. One of the most important mechanisms to address this kind of problem is caching, which basically consists of keeping in memory previously used items, based on their frequency, recency or cost patterns.

    A typical search engine architecture is composed of a front-end node (broker) that provides the user interface and a cluster of a large number of search nodes that store the data and support search. In this context, caching can be implemented at several levels. At the broker level, a results cache is generally maintained; it stores the final lists of results corresponding to the most frequent or recent queries. In addition, a posting list cache is implemented at each search node, which keeps in a memory buffer the posting lists of popular or valuable terms, to avoid accessing disk. Complementarily, an intersection cache may be implemented to obtain additional performance gains. This cache attempts to exploit frequently occurring pairs of terms by keeping the results of intersecting the corresponding inverted lists in the memory of the search node (saving not only disk access time but CPU time as well).

    All cache types are managed using a so-called "replacement policy", which decides whether to evict some entry from the cache when a new item has to be inserted. This policy ideally evicts entries that are unlikely to be used again, or those that are expected to provide less benefit.

    This thesis focuses on the intersection cache level, using cost-aware replacement policies, that is, policies that take into account the cost of the items when choosing a victim for eviction. We carefully design, analyze and evaluate static, dynamic and hybrid cost-aware eviction policies, adapting some cost-aware policies from other domains to ours. These policies are combined with different query-evaluation strategies, some of them originally designed to take advantage of the existence of an intersection cache by reordering the query terms in a particular way.

    We also explore the possibility of reserving the whole memory space allocated to caching at the search nodes for a new integrated cache, and propose a static cache (namely, Integrated Cache) that replaces both the posting list and intersection caches. We design a specific cache management strategy that avoids the duplication of cached terms, and we also consider different strategies to populate this new Integrated Cache.

    Finally, we propose, design and evaluate an admission policy for the intersection cache based on Machine Learning principles that minimizes the caching of pairs of terms that do not provide enough benefit. We model this as a classification problem with the aim of identifying those pairs of terms that appear infrequently in the query stream.

    The experimental results of applying all the proposed approaches using real datasets on a simulation framework show that we obtain useful improvements over the baseline, even in this already hyper-optimized field.

    Fil: Tolosa, Gabriel Hernán. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales; Argentina
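
    To make the cost-aware eviction idea concrete, here is a minimal Python sketch of a GreedyDual-style intersection cache. It is an illustration under our own assumptions (the class name, the cost model, and counting capacity in entries rather than bytes are all hypothetical), not the thesis' implementation: each cached term-pair intersection carries a credit equal to its estimated recomputation cost, the entry with the least remaining credit is evicted first, and survivors are aged by the evicted amount.

        class CostAwareIntersectionCache:
            """Toy GreedyDual-style cost-aware cache for term-pair intersections."""

            def __init__(self, capacity):
                self.capacity = capacity   # max number of cached intersections
                self.credit = {}           # (t1, t2) -> remaining credit
                self.cost = {}             # (t1, t2) -> estimated recomputation cost
                self.data = {}             # (t1, t2) -> cached intersection result

            def get(self, pair):
                if pair in self.data:
                    # A hit restores the entry's credit to its full cost.
                    self.credit[pair] = self.cost[pair]
                    return self.data[pair]
                return None  # miss: the caller intersects the lists and may put()

            def put(self, pair, postings, cost):
                if len(self.data) >= self.capacity:
                    # Evict the entry with the least remaining credit ...
                    victim = min(self.credit, key=self.credit.get)
                    floor = self.credit[victim]
                    # ... and age every survivor by that amount (the textbook
                    # GreedyDual formulation; real code uses a global offset).
                    for k in self.credit:
                        self.credit[k] -= floor
                    for d in (self.credit, self.cost, self.data):
                        del d[victim]
                self.credit[pair] = cost
                self.cost[pair] = cost
                self.data[pair] = postings

        # Usage with made-up pairs and costs:
        cache = CostAwareIntersectionCache(capacity=2)
        cache.put(("new", "york"), [3, 7, 9], cost=12.0)
        cache.put(("rio", "plata"), [1, 4], cost=3.0)
        cache.put(("buenos", "aires"), [2, 8], cost=9.0)  # evicts ("rio", "plata")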

    Universal Indexes for Highly Repetitive Document Collections

    Get PDF
    Indexing highly repetitive collections has become a relevant problem with the emergence of large repositories of versioned documents, among other applications. These collections may reach huge sizes, but are formed mostly of documents that are near-copies of others. Traditional techniques for indexing these collections fail to properly exploit their regularities in order to reduce space. We introduce new techniques for compressing inverted indexes that exploit this near-copy regularity. They are based on run-length, Lempel-Ziv, or grammar compression of the differential inverted lists, instead of the usual practice of gap-encoding them. We show that, in this highly repetitive setting, our compression methods significantly reduce the space obtained with classical techniques, at the price of moderate slowdowns. Moreover, our best methods are universal, that is, they do not need to know the versioning structure of the collection, nor that a clear versioning structure even exists. We also introduce compressed self-indexes in the comparison. These are designed for general strings (not only natural language texts) and represent the text collection plus the index structure (not an inverted index) in integrated form. We show that these techniques can compress much further, using a small fraction of the space required by our new inverted indexes. Yet, they are orders of magnitude slower.

    Comment: This research has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie Actions H2020-MSCA-RISE-2015 BIRDS GA No. 69094
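
    To see why run-length compression pays off in this setting, note that near-copies of a document tend to receive consecutive identifiers, so the gap-encoded posting lists contain long runs of 1s. The following toy Python sketch (our simplified reading of the idea, not the paper's code) contrasts plain gap-encoding with run-length encoding of the gaps:

        def gap_encode(postings):
            """Standard gap-encoding: differences between consecutive doc ids."""
            return [postings[0]] + [b - a for a, b in zip(postings, postings[1:])]

        def run_length_encode(gaps):
            """Collapse runs of equal gaps into [value, run_length] pairs."""
            runs = []
            for g in gaps:
                if runs and runs[-1][0] == g:
                    runs[-1][1] += 1
                else:
                    runs.append([g, 1])
            return runs

        # Near-copies occupy consecutive doc ids, so the gaps are runs of 1s.
        postings = [4, 5, 6, 7, 8, 20, 21, 22, 23, 57]
        gaps = gap_encode(postings)     # [4, 1, 1, 1, 1, 12, 1, 1, 1, 34]
        print(run_length_encode(gaps))  # [[4, 1], [1, 4], [12, 1], [1, 3], [34, 1]]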

    The impact of using combinatorial optimisation for static caching of posting lists

    Get PDF
    Caching posting lists can reduce the amount of disk I/O required to evaluate a query. Current methods use optimisation procedures for maximising the cache hit ratio. A recent method selects posting lists for static caching in a greedy manner and obtains higher hit rates than standard cache eviction policies such as LRU and LFU. However, a greedy method does not formally guarantee an optimal solution. We investigate whether the use of methods guaranteed, in theory, to find an approximately optimal solution would yield higher hit rates. Thus, we cast the selection of posting lists for caching as an integer linear programming problem and perform a series of experiments using heuristics from combinatorial optimisation (CCO) to find optimal solutions. Using simulated query logs we find that CCO yields comparable results to a greedy baseline using cache sizes between 200 and 1000 MB, with modest improvements for queries of length two to three.
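
    The greedy baseline can be read as a knapsack heuristic: each posting list has a profit (its query frequency) and a weight (its size), and lists are selected by profit-to-weight ratio until the cache budget is exhausted. Below is a minimal Python sketch under those assumptions; the function name and all numbers are ours, purely illustrative:

        def greedy_static_selection(lists, budget_bytes):
            """Pick posting lists for a static cache by frequency/size ratio.

            lists: iterable of (term, query_frequency, size_bytes) triples.
            Returns the set of terms whose posting lists get cached.
            """
            ranked = sorted(lists, key=lambda t: t[1] / t[2], reverse=True)
            cached, used = set(), 0
            for term, freq, size in ranked:
                if used + size <= budget_bytes:
                    cached.add(term)
                    used += size
            return cached

        # Made-up frequencies and sizes: the huge list for "the" loses out.
        lists = [("the", 900, 5_000_000), ("cat", 120, 40_000), ("nebula", 15, 2_000)]
        print(greedy_static_selection(lists, budget_bytes=50_000))
        # -> {'nebula', 'cat'} (set order may vary)

    An exact integer-linear-programming formulation of the same selection would maximise total frequency subject to the size budget, which is what the CCO heuristics above approximate.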

    Using Big Data Analysis to Improve Cache Performance in Search Engines

    Get PDF
    Web Search Engines process huge amounts of data to support search but must run under strong performance requirements (to answer a query in a fraction of a second). To meet that performance they implement different optimization techniques such as caching, which may be applied at several levels. One of these caching levels is the intersection cache, which attempts to exploit frequently occurring pairs of terms by keeping in the memory of the search node the results of intersecting the corresponding inverted lists. In this work we propose an optimization step that uses data mining techniques to decide which items should be cached and which should not. Our preliminary results show that it is possible to achieve extra cost savings in this already hyper-optimized field.

    Sociedad Argentina de Informática e Investigación Operativa (SADIO)
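
    The admission decision described in this abstract can be phrased as binary classification over features of a candidate term pair. The sketch below is a hypothetical illustration using scikit-learn; the feature set, model choice and all numbers are our assumptions, not the paper's:

        from sklearn.tree import DecisionTreeClassifier

        # Hypothetical features per term pair: [pair frequency so far,
        # minimum single-term frequency, combined posting-list length].
        X_train = [
            [40, 300, 120_000],   # frequent pair -> caching it pays off
            [35, 250, 90_000],
            [1, 500, 400_000],    # rare pair -> caching it wastes space
            [2, 80, 10_000],
        ]
        y_train = [1, 1, 0, 0]    # 1 = admit into the intersection cache

        clf = DecisionTreeClassifier(max_depth=3).fit(X_train, y_train)

        def admit(pair_features):
            """Admit the intersection into the cache only if predicted useful."""
            return bool(clf.predict([pair_features])[0])

        print(admit([30, 280, 100_000]))  # likely True for a frequent-looking pair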

    Locality analysis and its hardware implications for graph pattern mining

    Get PDF
    In this work, we have addressed the acceleration of graph pattern mining (GPM) applications from the perspective offered by the near-data processing (NDP) architecture. We have developed a new simulation tool based on the integration of two well-known simulators: ZSim (for the cores and the caches) and Ramulator (for the memory). We had to design this integration ourselves because the implementation available for the joint use of both simulators does not take advantage of the techniques that ZSim uses to reduce the loss of precision. We have then implemented in the simulator a state-of-the-art GPM accelerator based on the NDP architecture (NDMiner). The simulation tool enables detailed profiling of NDMiner, which is very useful for identifying its weak points; in this way, the simulator facilitates the design of strategies that alleviate those bottlenecks and improve the accelerator's performance. After running a series of experiments in simulation, we have elaborated a series of concrete proposals to solve some of the problems detected and to improve NDMiner.

    Frequent itemset mining on multiprocessor systems

    Get PDF
    Frequent itemset mining is an important building block in many data mining applications like market basket analysis, recommendation, web-mining, fraud detection, and gene expression analysis. In many of them, the datasets being mined can easily grow up to hundreds of gigabytes or even terabytes of data. Hence, efficient algorithms are required to process such large amounts of data. In recent years, many frequent-itemset mining algorithms have been proposed, which however (1) often have high memory requirements and (2) do not exploit the large degrees of parallelism provided by modern multiprocessor systems. The high memory requirements arise mainly from inefficient data structures that have only been shown to be sufficient for small datasets. For large datasets, however, the use of these data structures forces the algorithms to go out-of-core, i.e., they have to access secondary memory, which leads to serious performance degradation. Exploiting available parallelism is further required to mine large datasets because the serial performance of processors has almost stopped increasing. Algorithms should therefore exploit the large number of available threads as well as other kinds of parallelism (e.g., vector instruction sets) besides thread-level parallelism.

    In this work, we tackle the high memory requirements of frequent itemset mining twofold: we (1) compress the datasets being mined, because they must be kept in main memory during several mining invocations, and (2) improve existing mining algorithms with memory-efficient data structures. For compressing the datasets, we employ efficient encodings that show good compression performance on a wide variety of realistic datasets, i.e., the size of the datasets is reduced by up to 6.4x. The encodings can further be applied directly while loading the dataset from disk or network. Since encoding and decoding are repeatedly required for loading and mining the datasets, we reduce their costs by providing parallel encodings that achieve high throughputs for both tasks. For a memory-efficient representation of the mining algorithms' intermediate data, we propose compact data structures and even employ explicit compression. Both methods together reduce the intermediate data's size by up to 25x. The smaller memory requirements avoid or delay expensive out-of-core computation when large datasets are mined.

    For coping with the high parallelism provided by current multiprocessor systems, we identify the performance hot spots and scalability issues of existing frequent-itemset mining algorithms. The hot spots, which form basic building blocks of these algorithms, cover (1) counting the frequency of fixed-length strings, (2) building prefix trees, (3) compressing integer values, and (4) intersecting lists of sorted integer values or bitmaps. For all of them, we discuss how to exploit available parallelism and provide scalable solutions. Furthermore, almost all components of the mining algorithms must be parallelized to keep the sequential fraction of the algorithms as small as possible. We integrate the parallelized building blocks and components into three well-known mining algorithms and further analyze the impact of certain existing optimizations. Even single-threaded, our algorithms are often up to an order of magnitude faster than existing highly optimized algorithms, and they scale almost linearly on a large 32-core multiprocessor system. Although our optimizations are intended for frequent-itemset mining algorithms, they can be applied with only minor changes to algorithms used for mining other types of itemsets.
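
    Building block (4), intersecting sorted integer lists, has a standard sequential two-pointer formulation from which parallel and vectorized variants start; a minimal Python version (ours, for illustration only) looks like this:

        def intersect_sorted(a, b):
            """Two-pointer intersection of two sorted integer lists."""
            i = j = 0
            out = []
            while i < len(a) and j < len(b):
                if a[i] == b[j]:
                    out.append(a[i])
                    i += 1
                    j += 1
                elif a[i] < b[j]:
                    i += 1
                else:
                    j += 1
            return out

        print(intersect_sorted([1, 3, 5, 8, 13], [3, 4, 5, 13, 21]))  # [3, 5, 13]

    When one list is much shorter than the other, galloping (exponential) search or bitmap representations typically replace the linear scan; SIMD versions compare blocks of values at once.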

    Efficient and High-Quality Rendering of Higher-Order Geometric Data Representations

    Get PDF
    In computer-aided design (CAD), industrial products are designed using a virtual 3D model. A CAD model typically consists of curves and surfaces in a parametric representation, in most cases non-uniform rational B-splines (NURBS). The same representation is also used for the analysis, optimization and presentation of the model. In each phase of this process, different visualizations are required to provide appropriate user feedback. Designers work with illustrative and realistic renderings, engineers need a comprehensible visualization of the simulation results, and usability studies or product presentations benefit from using a 3D display. However, the interactive visualization of NURBS models and corresponding physical simulations is a challenging task because of the computational complexity and the limited graphics hardware support.

    This thesis proposes four novel rendering approaches that improve the interactive visualization of CAD models and their analysis. The presented algorithms exploit the latest graphics hardware capabilities to advance the state of the art in terms of quality, efficiency and performance. In particular, two approaches describe the direct rendering of the parametric representation without precomputed approximations and time-consuming pre-processing steps. New data structures and algorithms are presented for the efficient partition, classification, tessellation, and rendering of trimmed NURBS surfaces, as well as the first direct isosurface ray-casting approach for NURBS-based isogeometric analysis. The other two approaches introduce the versatile concept of programmable order-independent semi-transparency for the illustrative and comprehensible visualization of depth-complex CAD models, and a novel method for the hybrid reprojection of opaque and semi-transparent image information to accelerate stereoscopic rendering. Both approaches are also applicable to standard polygonal geometry, which makes them a contribution to the computer graphics and virtual reality research communities as well.

    The evaluation is based on real-world NURBS-based models and simulation data. The results show that rendering can be performed directly on the underlying parametric representation with interactive frame rates and subpixel-precise image results. The computational costs of additional visualization effects, such as semi-transparency and stereoscopic rendering, are reduced to maintain interactive frame rates. The benefit of this performance gain was confirmed by quantitative measurements and a pilot user study.
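
    As background on the parametric representation discussed above: a NURBS curve point is a rational, weighted combination of B-spline basis functions given by the textbook Cox-de Boor recursion. The Python sketch below illustrates the underlying math only (it is not the thesis' GPU algorithms, and the control data is made up):

        def bspline_basis(i, p, u, knots):
            """Cox-de Boor recursion for the B-spline basis function N_{i,p}(u)."""
            if p == 0:
                # Half-open interval convention; u at the last knot needs care.
                return 1.0 if knots[i] <= u < knots[i + 1] else 0.0
            left = right = 0.0
            if knots[i + p] != knots[i]:
                left = ((u - knots[i]) / (knots[i + p] - knots[i])
                        * bspline_basis(i, p - 1, u, knots))
            if knots[i + p + 1] != knots[i + 1]:
                right = ((knots[i + p + 1] - u) / (knots[i + p + 1] - knots[i + 1])
                         * bspline_basis(i + 1, p - 1, u, knots))
            return left + right

        def nurbs_point(u, ctrl, weights, p, knots):
            """Evaluate a NURBS curve point as a weighted rational combination."""
            num = [0.0] * len(ctrl[0])
            den = 0.0
            for i, (cp, w) in enumerate(zip(ctrl, weights)):
                n = bspline_basis(i, p, u, knots) * w
                num = [a + n * c for a, c in zip(num, cp)]
                den += n
            return [a / den for a in num]

        # Quadratic NURBS arc with illustrative control points and weights.
        knots = [0, 0, 0, 1, 1, 1]
        ctrl = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
        weights = [1.0, 0.7, 1.0]
        print(nurbs_point(0.5, ctrl, weights, p=2, knots=knots))  # ~[1.0, 0.41]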

    Gunrock: GPU Graph Analytics

    Full text link
    For large-scale graph analytics on the GPU, the irregularity of data access and control flow, and the complexity of programming GPUs, have presented two significant challenges to developing a programmable high-performance graph library. "Gunrock", our graph-processing system designed specifically for the GPU, uses a high-level, bulk-synchronous, data-centric abstraction focused on operations on a vertex or edge frontier. Gunrock achieves a balance between performance and expressiveness by coupling high performance GPU computing primitives and optimization strategies with a high-level programming model that allows programmers to quickly develop new graph primitives with small code size and minimal GPU programming knowledge. We characterize the performance of various optimization strategies and evaluate Gunrock's overall performance on different GPU architectures on a wide range of graph primitives that span from traversal-based algorithms and ranking algorithms, to triangle counting and bipartite-graph-based algorithms. The results show that on a single GPU, Gunrock has on average at least an order of magnitude speedup over Boost and PowerGraph, comparable performance to the fastest GPU hardwired primitives and CPU shared-memory graph libraries such as Ligra and Galois, and better performance than any other GPU high-level graph library.

    Comment: 52 pages, invited paper to ACM Transactions on Parallel Computing (TOPC), an extended version of PPoPP'16 paper "Gunrock: A High-Performance Graph Processing Library on the GPU"
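
    Gunrock's bulk-synchronous, frontier-centric abstraction amounts to repeatedly applying operators such as advance (expand the frontier to its neighbors) and filter (drop already-visited vertices) until the frontier empties. The following CPU-side Python analogy of that loop for BFS is our simplification; Gunrock itself is a CUDA/C++ library:

        def bfs_frontier(adj, source):
            """Frontier-style BFS: each iteration advances, then filters.

            adj: dict mapping vertex -> list of neighbor vertices.
            Returns a dict mapping each reached vertex to its BFS depth.
            """
            depth = {source: 0}
            frontier = [source]
            level = 0
            while frontier:
                level += 1
                # Advance: expand every frontier vertex to its neighbors.
                expanded = [w for v in frontier for w in adj.get(v, [])]
                # Filter: keep only unvisited vertices, deduplicated.
                next_frontier = []
                for w in expanded:
                    if w not in depth:
                        depth[w] = level
                        next_frontier.append(w)
                frontier = next_frontier
            return depth

        adj = {0: [1, 2], 1: [3], 2: [3], 3: []}
        print(bfs_frontier(adj, 0))  # {0: 0, 1: 1, 2: 1, 3: 2}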