164 research outputs found

    Second chance: A hybrid approach for dynamic result caching and prefetching in search engines

    Get PDF
    Cataloged from PDF version of article.Web search engines are known to cache the results of previously issued queries. The stored results typically contain the document summaries and some data that is used to construct the final search result page returned to the user. An alternative strategy is to store in the cache only the result document IDs, which take much less space, allowing results of more queries to be cached. These two strategies lead to an interesting trade-off between the hit rate and the average query response latency. In this work, in order to exploit this trade-off, we propose a hybrid result caching strategy where a dynamic result cache is split into two sections: an HTML cache and a docID cache. Moreover, using a realistic cost model, we evaluate the performance of different result prefetching strategies for the proposed hybrid cache and the baseline HTML-only cache. Finally, we propose a machine learning approach to predict singleton queries, which occur only once in the query stream. We show that when the proposed hybrid result caching strategy is coupled with the singleton query predictor, the hit rate is further improved. © 2013 ACM

    Memory hierarchies for future HPC architectures

    Get PDF
    Efficiently managing the memory subsystem of modern multi/manycore architectures is increasingly becoming a challenge as systems grow in complexity and heterogeneity. In the field of high performance computing (HPC) in particular, where massively parallel architectures are used and input sets of several terabytes are common, careful management of the memory hierarchy is crucial to exploit the full computing power of these systems. The goal of this thesis is to provide computer architects with valuable information to guide the design of future systems, and in particular of those more widely used in the field of HPC, i.e., symmetric multicore processors (SMPs) and GPUs. With that aim, we present an analysis of some of the inefficiencies and shortcomings of current memory management techniques and propose two novel schemes leveraging the opportunities that arise from the use of new and emerging programming models and computing paradigms. The first contribution of this thesis is a block prefetching mechanism for task-based programming models. Using a task-based programming model simplifies parallel programming and allows for better resource utilization in the supercomputers used in the field of HPC, while enabling sophisticated memory management techniques. The scheme proposed relies on a memory-aware runtime system to guide prefetching while avoiding the main drawbacks of traditional prefetching mechanisms, i.e., cache pollution and lack of timeliness. It leverages the information provided by the user about tasks¿ input data to prefetch contiguous blocks of memory that are certain to be useful. The proposed scheme targets SMPs with large cache hierarchies and uses heuristics to dynamically decide the best cache level to prefetch into without evicting useful data. The focus of this thesis then turns to heterogeneous architectures combining GPUs and traditional multicore processors. The current trend towards tighter coupling of GPU and CPU enables new collaborative computations that tax the memory subsystem in a different manner than previous heterogeneous computations did, and requires careful analysis to understand the trade-offs that are to be expected when designing future memory organizations. The second contribution is an in-depth analysis on the impact of sharing the last-level cache between GPU and CPU cores on a system where the GPU is integrated on the same die as the CPU. The analysis focuses on the effect that a shared cache can have on collaborative computations where GPU and CPU threads concurrently work on a problem and share data at fine granularities. The results presented here show that sharing the last-level cache is largely beneficial as it allows for better resource utilization. In addition, the evaluation shows that collaborative computations benefit significantly from the faster CPU-GPU communication and higher cache hit rates that a shared cache level provides. The final contribution of this thesis analyzes the inefficiencies and drawbacks of demand paging as currently implemented in discrete GPUs by NVIDIA. Then, it proposes a novel memory organization and dynamic migration scheme that allows for efficient data sharing between GPU and CPU, specially when executing collaborative computations where data is migrated back and forth between the two separate memories. This scheme migrates data at cache line granularities transparently to the user and operating system, avoiding false sharing and the unnecessary data transfers that occur with demand paging. The results show that the proposed scheme is able to outperform the baseline system by reducing the migration latency of data that is copied multiple times between the two memories. In addition, analysis of different interconnect latencies shows that fine-grained data sharing between GPU and CPU is feasible as long as future interconnect technologies achieve four to five times lower round-trip times than PCI-Express 3.0.La gestión eficiente del subsistema de memoria se ha convertido en un problema complejo a la vez que los sistemas crecen en complejidad y heterogeneidad. En el campo de la computación de altas prestaciones (HPC) en particular, donde arquitecturas masivamente paralelas son usadas y entradas de varios terabytes son comunes, una gestión cuidadosa de la jerarquía de memoria es crucial para conseguir explotar todo el potencial de estas arquitecturas. El objetivo de esta tesis es proporcionar a los arquitectos de computadores información valiosa para el diseño de los sistemas del futuro, y en concreto de los más comúnmente usados en el campo de HPC, los procesadores multinúcleo simétricos (SMP) y las tarjetas gráficas (GPU). Para ello, presentamos un análisis de las ineficiencias y los inconvenientes de los sistemas de gestión de memoria actuales, y proponemos dos técnicas nuevas que aprovechan las oportunidades surgidas del uso de nuevos y emergentes modelos de programación y paradigmas de computación. La primera contribución de esta tesis es un mecanismo de prefetch de bloques para modelos de programación basados en tareas. Usando modelos de programación orientados a tareas simplifica la programación paralela y permite hacer un mejor uso de los recursos en los supercomputadores usados en HPC, mientras permiten el uso de sofisticados mecanismos de gestión de memoria. La técnica propuesta se basa en un sistema de runtime para guiar el prefetch de datos mientras evita los principales inconvenientes tradicionalmente asociados con prefetching, la polución de cache y la medida incorrecta de los tiempos. El mecanismo utiliza la información sobre las entradas de las tareas proporcionada por el usuario para prefetchear bloques contiguos de memoria sobre los que hay certeza que serán utilizados. El mecanismo está dirigido a arquitecturas SMP con amplias jerarquías de cache, y usa heurísticas para decidir dinámicamente en qué nivel de caché colocar los datos sin desplazar datos útiles. El focus de la tesis gira luego a arquitecturas heterogéneas que combinan GPUs con procesadores multinúcleo tradicionales. La actual tendencia a unir GPU y CPU permite el uso de una nueva serie de computaciones colaborativas que afectan al subsistema de memoria de forma diferente que las computaciones heterogéneas anteriores, y requiere de un cuidadoso análisis para entender las consecuencias que esto tiene en el diseño de las organizaciones de memoria futuras. La segunda contribución de la tesis es un análisis detallado del impacto que supone compartir el último nivel de cache entre núcleos de GPU y CPU en sistemas donde la GPU está integrada en el mismo chip que la CPU. El análisis se centra en el efecto que la cache compartida tiene en colaboraciones colaborativas donde hilos de GPU y CPU trabajan concurrentemente en un problema y comparten datos a grano fino. Los resultados presentados en esta tesis muestran que compartir el último nivel de cache es mayormente beneficioso ya que permite un mejor uso de los recursos. Además, la evaluación muestra que las computaciones colaborativas se benefician en gran medida de la comunicación más rápida entre GPU y CPU y las mayores tasas de acierto de cache que un nivel de cache compartido proporcionan

    State Management for Efficient Event Pattern Detection

    Get PDF
    Event Stream Processing (ESP) Systeme überwachen kontinuierliche Datenströme, um benutzerdefinierte Queries auszuwerten. Die Herausforderung besteht darin, dass die Queryverarbeitung zustandsbehaftet ist und die Anzahl von Teilübereinstimmungen mit der Größe der verarbeiteten Events exponentiell anwächst. Die Dynamik von Streams und die Notwendigkeit, entfernte Daten zu integrieren, erschweren die Zustandsverwaltung. Erstens liefern heterogene Eventquellen Streams mit unvorhersehbaren Eingaberaten und Queryselektivitäten. Während Spitzenzeiten ist eine erschöpfende Verarbeitung unmöglich, und die Systeme müssen auf eine Best-Effort-Verarbeitung zurückgreifen. Zweitens erfordern Queries möglicherweise externe Daten, um ein bestimmtes Event für eine Query auszuwählen. Solche Abhängigkeiten sind problematisch: Das Abrufen der Daten unterbricht die Stream-Verarbeitung. Ohne eine Eventauswahl auf Grundlage externer Daten wird das Wachstum von Teilübereinstimmungen verstärkt. In dieser Dissertation stelle ich Strategien für optimiertes Zustandsmanagement von ESP Systemen vor. Zuerst ermögliche ich eine Best-Effort-Verarbeitung mittels Load Shedding. Dabei werden sowohl Eingabeeevents als auch Teilübereinstimmungen systematisch verworfen, um eine Latenzschwelle mit minimalem Qualitätsverlust zu garantieren. Zweitens integriere ich externe Daten, indem ich das Abrufen dieser von der Verwendung in der Queryverarbeitung entkoppele. Mit einem effizienten Caching-Mechanismus vermeide ich Unterbrechungen durch Übertragungslatenzen. Dazu werden externe Daten basierend auf ihrer erwarteten Verwendung vorab abgerufen und mittels Lazy Evaluation bei der Eventauswahl berücksichtigt. Dabei wird ein Kostenmodell verwendet, um zu bestimmen, wann welche externen Daten abgerufen und wie lange sie im Cache aufbewahrt werden sollen. Ich habe die Effektivität und Effizienz der vorgeschlagenen Strategien anhand von synthetischen und realen Daten ausgewertet und unter Beweis gestellt.Event stream processing systems continuously evaluate queries over event streams to detect user-specified patterns with low latency. However, the challenge is that query processing is stateful and it maintains partial matches that grow exponentially in the size of processed events. State management is complicated by the dynamicity of streams and the need to integrate remote data. First, heterogeneous event sources yield dynamic streams with unpredictable input rates, data distributions, and query selectivities. During peak times, exhaustive processing is unreasonable, and systems shall resort to best-effort processing. Second, queries may require remote data to select a specific event for a pattern. Such dependencies are problematic: Fetching the remote data interrupts the stream processing. Yet, without event selection based on remote data, the growth of partial matches is amplified. In this dissertation, I present strategies for optimised state management in event pattern detection. First, I enable best-effort processing with load shedding that discards both input events and partial matches. I carefully select the shedding elements to satisfy a latency bound while striving for a minimal loss in result quality. Second, to efficiently integrate remote data, I decouple the fetching of remote data from its use in query evaluation by a caching mechanism. To this end, I hide the transmission latency by prefetching remote data based on anticipated use and by lazy evaluation that postpones the event selection based on remote data to avoid interruptions. A cost model is used to determine when to fetch which remote data items and how long to keep them in the cache. I evaluated the above techniques with queries over synthetic and real-world data. I show that the load shedding technique significantly improves the recall of pattern detection over baseline approaches, while the technique for remote data integration significantly reduces the pattern detection latency

    Adaptive memory hierarchies for next generation tiled microarchitectures

    Get PDF
    Les últimes dècades el rendiment dels processadors i de les memòries ha millorat a diferent ritme, limitant el rendiment dels processadors i creant el conegut memory gap. Sol·lucionar aquesta diferència de rendiment és un camp d'investigació d'actualitat i que requereix de noves sol·lucions. Una sol·lució a aquest problema són les memòries “cache”, que permeten reduïr l'impacte d'unes latències de memòria creixents i que conformen la jerarquia de memòria. La majoria de d'organitzacions de les “caches” estan dissenyades per a uniprocessadors o multiprcessadors tradicionals. Avui en dia, però, el creixent nombre de transistors disponible per xip ha permès l'aparició de xips multiprocessador (CMPs). Aquests xips tenen diferents propietats i limitacions i per tant requereixen de jerarquies de memòria específiques per tal de gestionar eficientment els recursos disponibles. En aquesta tesi ens hem centrat en millorar el rendiment i la eficiència energètica de la jerarquia de memòria per CMPs, des de les “caches” fins als controladors de memòria. A la primera part d'aquesta tesi, s'han estudiat organitzacions tradicionals per les “caches” com les privades o compartides i s'ha pogut constatar que, tot i que funcionen bé per a algunes aplicacions, un sistema que s'ajustés dinàmicament seria més eficient. Tècniques com el Cooperative Caching (CC) combinen els avantatges de les dues tècniques però requereixen un mecanisme centralitzat de coherència que té un consum energètic molt elevat. És per això que en aquesta tesi es proposa el Distributed Cooperative Caching (DCC), un mecanisme que proporciona coherència en CMPs i aplica el concepte del cooperative caching de forma distribuïda. Mitjançant l'ús de directoris distribuïts s'obté una sol·lució més escalable i que, a més, disposa d'un mecanisme de marcatge més flexible i eficient energèticament. A la segona part, es demostra que les aplicacions fan diferents usos de la “cache” i que si es realitza una distribució de recursos eficient es poden aprofitar els que estan infrautilitzats. Es proposa l'Elastic Cooperative Caching (ElasticCC), una organització capaç de redistribuïr la memòria “cache” dinàmicament segons els requeriments de cada aplicació. Una de les contribucions més importants d'aquesta tècnica és que la reconfiguració es decideix completament a través del maquinari i que tots els mecanismes utilitzats es basen en estructures distribuïdes, permetent una millor escalabilitat. ElasticCC no només és capaç de reparticionar les “caches” segons els requeriments de cada aplicació, sinó que, a més a més, és capaç d'adaptar-se a les diferents fases d'execució de cada una d'elles. La nostra avaluació també demostra que la reconfiguració dinàmica de l'ElasticCC és tant eficient que gairebé proporciona la mateixa taxa de fallades que una configuració amb el doble de memòria.Finalment, la tesi es centra en l'estudi del comportament de les memòries DRAM i els seus controladors en els CMPs. Es demostra que, tot i que els controladors tradicionals funcionen eficientment per uniprocessadors, en CMPs els diferents patrons d'accés obliguen a repensar com estan dissenyats aquests sistemes. S'han presentat múltiples sol·lucions per CMPs però totes elles es veuen limitades per un compromís entre el rendiment global i l'equitat en l'assignació de recursos. En aquesta tesi es proposen els Thread Row Buffers (TRBs), una zona d'emmagatenament extra a les memòries DRAM que permetria guardar files de dades específiques per a cada aplicació. Aquest mecanisme permet proporcionar un accés equitatiu a la memòria sense perjudicar el seu rendiment global. En resum, en aquesta tesi es presenten noves organitzacions per la jerarquia de memòria dels CMPs centrades en la escalabilitat i adaptativitat als requeriments de les aplicacions. Els resultats presentats demostren que les tècniques proposades proporcionen un millor rendiment i eficiència energètica que les millors tècniques existents fins a l'actualitat.Processor performance and memory performance have improved at different rates during the last decades, limiting processor performance and creating the well known "memory gap". Solving this performance difference is an important research field and new solutions must be proposed in order to have better processors in the future. Several solutions exist, such as caches, that reduce the impact of longer memory accesses and conform the system memory hierarchy. However, most of the existing memory hierarchy organizations were designed for single processors or traditional multiprocessors. Nowadays, the increasing number of available transistors has allowed the apparition of chip multiprocessors, which have different constraints and require new ad-hoc memory systems able to efficiently manage memory resources. Therefore, in this thesis we have focused on improving the performance and energy efficiency of the memory hierarchy of chip multiprocessors, ranging from caches to DRAM memories. In the first part of this thesis we have studied traditional cache organizations such as shared or private caches and we have seen that they behave well only for some applications and that an adaptive system would be desirable. State-of-the-art techniques such as Cooperative Caching (CC) take advantage of the benefits of both worlds. This technique, however, requires the usage of a centralized coherence structure and has a high energy consumption. Therefore we propose the Distributed Cooperative Caching (DCC), a mechanism to provide coherence to chip multiprocessors and apply the concept of cooperative caching in a distributed way. Through the usage of distributed directories we obtain a more scalable solution and, in addition, has a more flexible and energy-efficient tag allocation method. We also show that applications make different uses of cache and that an efficient allocation can take advantage of unused resources. We propose Elastic Cooperative Caching (ElasticCC), an adaptive cache organization able to redistribute cache resources dynamically depending on application requirements. One of the most important contributions of this technique is that adaptivity is fully managed by hardware and that all repartitioning mechanisms are based on distributed structures, allowing a better scalability. ElasticCC not only is able to repartition cache sizes to application requirements, but also is able to dynamically adapt to the different execution phases of each thread. Our experimental evaluation also has shown that the cache partitioning provided by ElasticCC is efficient and is almost able to match the off-chip miss rate of a configuration that doubles the cache space. Finally, we focus in the behavior of DRAM memories and memory controllers in chip multiprocessors. Although traditional memory schedulers work well for uniprocessors, we show that new access patterns advocate for a redesign of some parts of DRAM memories. Several organizations exist for multiprocessor DRAM schedulers, however, all of them must trade-off between memory throughput and fairness. We propose Thread Row Buffers, an extended storage area in DRAM memories able to store a data row for each thread. This mechanism enables a fair memory access scheduling without hurting memory throughput. Overall, in this thesis we present new organizations for the memory hierarchy of chip multiprocessors which focus on the scalability and of the proposed structures and adaptivity to application behavior. Results show that the presented techniques provide a better performance and energy-efficiency than existing state-of-the-art solutions

    Memory Management for Emerging Memory Technologies

    Get PDF
    The Memory Wall, or the gap between CPU speed and main memory latency, is ever increasing. The latency of Dynamic Random-Access Memory (DRAM) is now of the order of hundreds of CPU cycles. Additionally, the DRAM main memory is experiencing power, performance and capacity constraints that limit process technology scaling. On the other hand, the workloads running on such systems are themselves changing due to virtualization and cloud computing demanding more performance of the data centers. Not only do these workloads have larger working set sizes, but they are also changing the way memory gets used, resulting in higher sharing and increased bandwidth demands. New Non-Volatile Memory technologies (NVM) are emerging as an answer to the current main memory issues. This thesis looks at memory management issues as the emerging memory technologies get integrated into the memory hierarchy. We consider the problems at various levels in the memory hierarchy, including sharing of CPU LLC, traffic management to future non-volatile memories behind the LLC, and extending main memory through the employment of NVM. The first solution we propose is “Adaptive Replacement and Insertion" (ARI), an adaptive approach to last-level CPU cache management, optimizing the cache miss rate and writeback rate simultaneously. Our specific focus is to reduce writebacks as much as possible while maintaining or improving miss rate relative to conventional LRU replacement policy, with minimal hardware overhead. ARI reduces writebacks on benchmarks from SPEC2006 suite on average by 32.9% while also decreasing misses on average by 4.7%. In a PCM based memory system, this decreases energy consumption by 23% compared to LRU and provides a 49% lifetime improvement beyond what is possible with randomized wear-leveling. Our second proposal is “Variable-Timeslice Thread Scheduling" (VATS), an OS kernel-level approach to CPU cache sharing. With modern, large, last-level caches (LLC), the time to fill the LLC is greater than the OS scheduling window. As a result, when a thread aggressively thrashes the LLC by replacing much of the data in it, another thread may not be able to recover its working set before being rescheduled. We isolate the threads in time by increasing their allotted time quanta, and allowing larger periods of time between interfering threads. Our approach, compared to conventional scheduling, mitigates up to 100% of the performance loss caused by CPU LLC interference. The system throughput is boosted by up to 15%. As an unconventional approach to utilizing emerging memory technologies, we present a Ternary Content-Addressable Memory (TCAM) design with Flash transistors. TCAM is successfully used in network routing but can also be utilized in the OS Virtual Memory applications. Based on our layout and circuit simulation experiments, we conclude that our FTCAM block achieves an area improvement of 7.9× and a power improvement of 1.64× compared to a CMOS approach. In order to lower the cost of Main Memory in systems with huge memory demand, it is becoming practical to extend the DRAM in the system with the less-expensive NVMe Flash, for a much lower system cost. However, given the relatively high Flash devices access latency, naively using them as main memory leads to serious performance degradation. We propose OSVPP, a software-only, OS swap-based page prefetching scheme for managing such hybrid DRAM + NVM systems. We show that it is possible to gain about 50% of the lost performance due to swapping into the NVM and thus enable the utilization of such hybrid systems for memory-hungry applications, lowering the memory cost while keeping the performance comparable to the DRAM-only system

    Memory Management for Emerging Memory Technologies

    Get PDF
    The Memory Wall, or the gap between CPU speed and main memory latency, is ever increasing. The latency of Dynamic Random-Access Memory (DRAM) is now of the order of hundreds of CPU cycles. Additionally, the DRAM main memory is experiencing power, performance and capacity constraints that limit process technology scaling. On the other hand, the workloads running on such systems are themselves changing due to virtualization and cloud computing demanding more performance of the data centers. Not only do these workloads have larger working set sizes, but they are also changing the way memory gets used, resulting in higher sharing and increased bandwidth demands. New Non-Volatile Memory technologies (NVM) are emerging as an answer to the current main memory issues. This thesis looks at memory management issues as the emerging memory technologies get integrated into the memory hierarchy. We consider the problems at various levels in the memory hierarchy, including sharing of CPU LLC, traffic management to future non-volatile memories behind the LLC, and extending main memory through the employment of NVM. The first solution we propose is “Adaptive Replacement and Insertion" (ARI), an adaptive approach to last-level CPU cache management, optimizing the cache miss rate and writeback rate simultaneously. Our specific focus is to reduce writebacks as much as possible while maintaining or improving miss rate relative to conventional LRU replacement policy, with minimal hardware overhead. ARI reduces writebacks on benchmarks from SPEC2006 suite on average by 32.9% while also decreasing misses on average by 4.7%. In a PCM based memory system, this decreases energy consumption by 23% compared to LRU and provides a 49% lifetime improvement beyond what is possible with randomized wear-leveling. Our second proposal is “Variable-Timeslice Thread Scheduling" (VATS), an OS kernel-level approach to CPU cache sharing. With modern, large, last-level caches (LLC), the time to fill the LLC is greater than the OS scheduling window. As a result, when a thread aggressively thrashes the LLC by replacing much of the data in it, another thread may not be able to recover its working set before being rescheduled. We isolate the threads in time by increasing their allotted time quanta, and allowing larger periods of time between interfering threads. Our approach, compared to conventional scheduling, mitigates up to 100% of the performance loss caused by CPU LLC interference. The system throughput is boosted by up to 15%. As an unconventional approach to utilizing emerging memory technologies, we present a Ternary Content-Addressable Memory (TCAM) design with Flash transistors. TCAM is successfully used in network routing but can also be utilized in the OS Virtual Memory applications. Based on our layout and circuit simulation experiments, we conclude that our FTCAM block achieves an area improvement of 7.9× and a power improvement of 1.64× compared to a CMOS approach. In order to lower the cost of Main Memory in systems with huge memory demand, it is becoming practical to extend the DRAM in the system with the less-expensive NVMe Flash, for a much lower system cost. However, given the relatively high Flash devices access latency, naively using them as main memory leads to serious performance degradation. We propose OSVPP, a software-only, OS swap-based page prefetching scheme for managing such hybrid DRAM + NVM systems. We show that it is possible to gain about 50% of the lost performance due to swapping into the NVM and thus enable the utilization of such hybrid systems for memory-hungry applications, lowering the memory cost while keeping the performance comparable to the DRAM-only system

    Improving prefetching mechanisms for tiled CMP platforms

    Get PDF
    Recently, high performance processor designs have evolved toward Chip-Multiprocessor (CMP) architectures to deal with instruction level parallelism limitations and, more important, to manage the power consumption that is becoming unaffordable due to the increased transistor count and clock frequency. At the present moment, this architecture, which implements multiple processing cores on a single die, is commercially available with up to twenty four processors on a single chip and there are roadmaps and research trends that suggest that number of cores will increase in the near future. The increasing on number of cores has converted the interconnection network in a key issue that will have significant impact on performance. Moreover, as the number of cores increases, tiled architectures are foreseen to provide a scalable solution to handle design complexity. Network-on-Chip (NoC) emerges as a solution to deal with growing on-chip wire delays. On the other hand, CMP designs are likely to be equipped with latency hiding techniques like prefetching in order to reduce the negative impact on performance that, otherwise, high cache miss rates would lead to. Unfortunately, the extra number of network messages that prefetching entails can drastically increase power consumption and the latency in the NoC. In this thesis, we do not develop a new prefetching technique for CMPs but propose improvements applicable to any of them. Specifically, we analyze the behavior of the prefetching in the CMPs and its impact to the interconnect. We propose several dynamic management techniques to improve the performance of the prefetching mechanism in the system. Furthermore, we identify the main problems when implementing prefetching in distributed memory systems like tiled architectures and propose directions to solve them. Finally, we propose several research lines to continue the work done in this thesis.Recentment l'arquitectura dels processadors d'altes prestacions ha evolucionat cap a processadors amb diversos nuclis per a concordar amb les limitacions del paral·lelisme a nivell d'instrucció i, mes important encara, per tractar el consum d'energia que ha esdevingut insostenible degut a l'increment de transistors i la freqüència de rellotge. Ara mateix, aquestes arquitectures, que implementes varis nuclis en un sol xip, estan a la venta amb mes de vint-i-quatre processadors en un sol xip i hi ha previsions que suggereixen que aquest nombre de nuclis creixerà en un futur pròxim. Aquest increment del nombre de nuclis, ha convertit la xarxa que els connecta en un punt clau que tindrà un impacte important en el seu rendiment. Una topologia de xarxa que sembla que serà capaç de proveir una solució escalable per aquestes arquitectures ha estat la topologia tile. Les xarxes en el xip (NoC) es presenten com la solució del increment de la latència dels cables del xip. Per altre banda, els dissenys de multiprocessadors seguiran disposant de tècniques de reducció de latència de memòria com el prefetch per tal de reduir l'impacte negatiu en rendiment que, altrament, tindríem degut als elevats temps de latència en fallades a memòria cache. Desafortunadament, el gran nombre de peticions destinades a prefetch, pot augmentar dràsticament la congestió a la xarxa i el consum d'energia. En aquesta tesi, no desenvolupem cap tècnica nova de prefetching, però proposem millores aplicables a qualsevol d'ells. Concretament analitzem el comportament del prefetching en multiprocessadors i el seu impacte a la xarxa. Proposem diverses tècniques de control dinàmic per millor el rendiment del prefetcher al sistema. A més, identifiquem els problemes principals d'implementar el prefetching en els sistemes de memòria distribuïts com els de les arquitectures tile i proposem línies d'investigació per solucionar-los. Finalment, també proposem diverses línies d'investigació per continuar amb el treball fet en aquesta tesi.Postprint (published version

    Prefetching techniques for client server object-oriented database systems

    Get PDF
    The performance of many object-oriented database applications suffers from the page fetch latency which is determined by the expense of disk access. In this work we suggest several prefetching techniques to avoid, or at least to reduce, page fetch latency. In practice no prediction technique is perfect and no prefetching technique can entirely eliminate delay due to page fetch latency. Therefore we are interested in the trade-off between the level of accuracy required for obtaining good results in terms of elapsed time reduction and the processing overhead needed to achieve this level of accuracy. If prefetching accuracy is high then the total elapsed time of an application can be reduced significantly otherwise if the prefetching accuracy is low, many incorrect pages are prefetched and the extra load on the client, network, server and disks decreases the whole system performance. Access pattern of object-oriented databases are often complex and usually hard to predict accurately. The ..
    corecore