With the advent of chip multiprocessors, new techniques have been developed to make parallel programming easier and more reliable. New parallel programming paradigms and new methods of making the execution of programs more efficient and more reliable have been developed. Usually, these improvements require hardware support to avoid a system slowdown.
INTRODUCTION
New parallel programming techniques are needed to take full advantage of emergent multicore architectures with several cores and shared memory. Unlike sequential
The target of FlexSig is to obtain a more flexible signature system than traditional signatures based on Bloom filters, by using all the hardware resources independently of the number of threads running in the system. By using a fixed signature space, FlexSig is able to manage a high number of requests simultaneously when signatures are in high demand, and to achieve a low rate of false positives when there are few concurrent requests in the system. Additionally, it provides fault tolerance because if some signature registers fail, FlexSig can continue operating just by invalidating the faulty registers and redistributing the remaining registers among the threads.
For instance, in a processor with 16 cores and 1 thread per core demanding from 0 to 4 signatures each, with at least one thread requiring at least one signature, the maximum number of signatures required simultaneously is 64 (4 signatures × 16 threads), and the minimum is 1 (only 1 thread running, and it demands just 1 signature). Let us assume that the most usual situation is to need 8 signatures simultaneously; for example, 4 threads with 2 signatures each. Using signatures based on Bloom filters, 64 signatures of a given size (to be determined to minimize the false positive rate) are required to properly manage the worst case. However, with FlexSig it is possible to focus on the most common situation: a register space equivalent to 8 Bloom filters, but with enough flexibility to keep up to 64 concurrent signatures. Of course, the larger the number of signatures, the lower the number of registers assigned to each one. With this configuration, FlexSig behaves efficiently for the most common case of 8 simultaneous signatures. When only 1 signature is required, FlexSig provides a large signature with a size 8 times larger than a regular signature. In the rare case of requiring 64 signatures, FlexSig provides them anyway, but with a smaller size and, therefore, a higher false positive rate.
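The arithmetic of this example can be sketched in a few lines. The sketch assumes, purely for illustration, that the shared space is organized as 64 small registers and that a regular signature corresponds to 8 of them; neither constant nor the function name comes from the paper.

```python
REGISTERS_TOTAL = 64        # assumed: pool of 64 small registers (space of 8 regular signatures)
REGISTERS_PER_REGULAR = 8   # assumed: one regular Bloom-filter signature spans 8 registers


def registers_per_signature(active_signatures: int, total: int = REGISTERS_TOTAL) -> int:
    """Equal share of the register pool given to each active signature."""
    if not 1 <= active_signatures <= total:
        raise ValueError("need between 1 and `total` active signatures")
    return total // active_signatures


for active in (1, 8, 64):
    share = registers_per_signature(active)
    # Print the share and its size relative to a regular signature.
    print(active, share, share / REGISTERS_PER_REGULAR)
```

Under these assumptions a single signature receives the whole pool, 8 signatures receive a regular-sized signature each, and 64 signatures receive one register each.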
FlexSig is evaluated by comparing it with fixed signatures based on Bloom filters in a Transactional Memory (TM) system. The results show that FlexSig outperforms the fixed signatures, with a significant reduction in the false positive rate, especially when the number of threads does not match the maximum number of hardware threads.
There are other papers focused on improving signatures [Yen 2009; Shenghua et al. 2009; Yen et al. 2008; Almeida et al. 2007] and even one proposing a scalable scheme [Almeida et al. 2007] but, to the best of our knowledge, this is the first work focusing on improving the flexibility of hardware signatures.
The rest of the paper is organized as follows. In Section 2, hardware signatures are reviewed. In Sections 3 and 4, our approach to flexible signatures is presented and some implementation issues are discussed. In Section 5, FlexSig is evaluated. Finally, in Section 6 related work is discussed and the conclusions are summarized in Section 7.
BACKGROUND ON HARDWARE SIGNATURES
In this section we focus on hardware signatures, although signatures can be implemented in software or hardware. Signatures are used to keep the set of addresses generated by the different cores or threads in the system and to detect conflicts in the memory accesses among cores/threads. Hardware signatures are composed of an m-bit register and several hash functions, through which the addresses are hash-encoded and stored.
The basic operations that a signature must support are: insert a new address, check whether an address is already in the signature (conflict), and clear. Due to aliasing, a conflict may be detected in the check operation when no actual conflict exists (false positive), but no conflicts are missed; that is, false negatives are not possible. There are different implementations of signatures, but here we focus on signatures based on Bloom filters [Bloom 1970] because they are widely used and are the basis of FlexSig.
True Bloom Filters
Figure 1(a) shows the basic scheme of a signature based on Bloom filters [Bloom 1970]. It is composed of an m-bit register and one or more hash functions [h_1, ..., h_k]. This kind of signature, which is based on a single Bloom filter, is called a True Bloom signature. In an insert operation each hash function receives the address and sets a bit in the register. To check if an address is stored in the signature, the address is hashed and compared with the content of the register. To clear the signature, all bits of the register are set to zero.
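The three operations can be illustrated with a small software model. This is a minimal sketch, not the hardware design: the class name is hypothetical, and multiplicative hashing with random odd constants is an assumption standing in for the hardware hash functions.

```python
import random

MASK64 = (1 << 64) - 1


class TrueBloomSignature:
    """Software model: one m-bit register shared by k hash functions."""

    def __init__(self, m: int, k: int, seed: int = 0):
        self.m = m
        self.register = 0  # the m-bit register, held in a Python int
        rng = random.Random(seed)
        # Assumed stand-in hashes: one random odd 64-bit multiplier per function.
        self.multipliers = [rng.getrandbits(64) | 1 for _ in range(k)]

    def _positions(self, address: int):
        # One bit position in [0, m) per hash function.
        return [(((address * mult) & MASK64) >> 32) % self.m
                for mult in self.multipliers]

    def insert(self, address: int) -> None:
        for pos in self._positions(address):
            self.register |= 1 << pos

    def check(self, address: int) -> bool:
        # True if every hashed bit is set; may be a false positive.
        return all((self.register >> pos) & 1 for pos in self._positions(address))

    def clear(self) -> None:
        self.register = 0
```

Because insert only sets bits and check requires all hashed bits to be set, an inserted address is always found, which is why false negatives cannot occur.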
The most critical design decisions in a true Bloom filter are the size of the register (m) and the number of hash functions (k). Large registers decrease the probability of a false positive, but increase the hardware resources and power required. On the other hand, the probability of false positives depends also on the number of hash functions and the number of elements inserted [Sanchez et al. 2007; Bloom 1970].
The number of false positives is influenced as well by how the hash functions are implemented. Very simple implementations are not efficient in terms of false positives. A very popular and widely used family of hash functions is H3 [Carter and Wegman 1977; Ramakrishna et al. 1997]. H3 requires additional hardware (specifically, for n-bit addresses, a tree of n/2 two-input XOR gates is needed for each bit of the hash function), but it achieves a better false positive rate thanks to uncorrelated and uniformly distributed hash values.
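A software model of an H3-style hash may help to see where the XOR trees come from: each output bit is the parity (XOR) of a randomly chosen subset of the address bits. The sketch below assumes a random binary matrix generated from a seed; the function name `make_h3` and its parameters are illustrative only.

```python
import random


def make_h3(addr_bits: int, out_bits: int, seed: int = 0):
    """Build one H3-style hash: out_bits parity functions over addr_bits inputs."""
    rng = random.Random(seed)
    # One random row per output bit; a 1 in the row selects that address bit.
    rows = [rng.getrandbits(addr_bits) for _ in range(out_bits)]

    def h3(address: int) -> int:
        value = 0
        for i, row in enumerate(rows):
            # XOR of the selected address bits = parity of (address AND row).
            parity = bin(address & row).count("1") & 1
            value |= parity << i
        return value

    return h3


h = make_h3(addr_bits=32, out_bits=6, seed=1)  # 6 output bits index a 64-bit register
```

On average a random row selects half of the address bits, which is consistent with the XOR tree size quoted in the text.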
Parallel Bloom filters
Figure 1(b) shows an improvement on true Bloom filters, the parallel Bloom filters [Sanchez et al. 2007]. In this case, the m-bit register is split into k registers of m/k bits, each with its own hash function. In this way, each hash function operates on a different part of the m-bit register. Hence, the parallel Bloom filter can be seen as k true Bloom filters, each with one hash function and an m/k-bit register. The insert operation hashes the address and sets one bit in each m/k-bit register. The advantage of parallel signatures is that hash functions are simpler and are implemented with fewer resources. Also, each of the individual Bloom filters can be single-ported, thereby greatly reducing the hardware area, power and latency.
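As a rough software illustration of the banked organization, the sketch below keeps k separate registers of m/k bits and sets exactly one bit per register on each insert. The class name and the multiplicative hashing are assumptions made for the example, not the hardware implementation.

```python
import random

MASK64 = (1 << 64) - 1


class ParallelBloomSignature:
    """Software model: k banks of m/k bits, one hash function per bank."""

    def __init__(self, m: int, k: int, seed: int = 0):
        if m % k:
            raise ValueError("m must be a multiple of k")
        self.bank_bits = m // k
        self.banks = [0] * k
        rng = random.Random(seed)
        # Assumed stand-in hashes: one random odd multiplier per bank.
        self.multipliers = [rng.getrandbits(64) | 1 for _ in range(k)]

    def _bit(self, bank: int, address: int) -> int:
        h = ((address * self.multipliers[bank]) & MASK64) >> 32
        return 1 << (h % self.bank_bits)

    def insert(self, address: int) -> None:
        for b in range(len(self.banks)):
            self.banks[b] |= self._bit(b, address)  # one bit per bank

    def check(self, address: int) -> bool:
        return all(self.banks[b] & self._bit(b, address)
                   for b in range(len(self.banks)))

    def clear(self) -> None:
        self.banks = [0] * len(self.banks)
```

Since each hash function touches only its own bank, every bank needs a single access per operation, which is the property that allows single-ported registers.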
The false positive rate depends on the size of the signature m, the number of hash functions k and the number of inserted addresses n, and is given by the following expression [Sanchez et al. 2007]:

$$P = \left(1 - \left(1 - \frac{k}{m}\right)^{n}\right)^{k}$$

It can be seen that a large k degrades the false positive rate quickly as the number of inserted addresses increases. The optimum value for k is between 3 and 8.
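The trade-off in k can be explored numerically. The sketch below uses the standard closed form for a parallel Bloom filter with k banks of m/k bits, P = (1 - (1 - k/m)^n)^k; the function name and the sample values of m, k and n are chosen only for illustration.

```python
def false_positive_rate(m: int, k: int, n: int) -> float:
    """P = (1 - (1 - k/m)^n)^k for k banks of m/k bits after n insertions."""
    if k < 1 or m < k:
        raise ValueError("need 1 <= k <= m")
    bit_set = 1.0 - (1.0 - k / m) ** n  # probability a given bank bit is set
    return bit_set ** k                 # all k probed bits must be set


# Sweep k for a fixed total size to see how the best k shifts with n.
for n in (16, 64, 256, 512):
    rates = {k: false_positive_rate(2048, k, n) for k in (1, 2, 4, 8, 16)}
    print(n, {k: f"{p:.2e}" for k, p in rates.items()})
```

With few insertions a larger k gives a lower rate, whereas with many insertions the small banks saturate and a larger k becomes counterproductive.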
FLEXIBLE SIGNATURES
This section describes the FlexSig scheme. The overall organization and the strategy for allocating signatures are described in detail and an implementation is outlined. Moreover, the software interface and the actions to be taken in case of overflow and hardware failure are discussed.
Overview
FlexSig is based on parallel Bloom filters (Section 2), but introduces mechanisms to use the signature resources as fully as possible and to adapt flexibly to different signature demands, which improves efficiency and reconfigurability. Figure 3 shows the block diagram of FlexSig. It is composed of T Bloom filters, each consisting of an M/T-bit register (M is the total size of FlexSig) and a hash function, and it can host between 1 and T signatures, each signature being composed of one or more Bloom filters. Each Bloom filter has an identifier (ID) of the signature to which it belongs. The number of Bloom filters assigned to each signature depends on the number of signatures allocated. Moreover, the resources assigned to a given signature may change dynamically over time. h_1, h_2, ..., h_T are independent H3 hash functions, each operating on one register. The registers in FlexSig are usually relatively small (for instance, 64 bits), because a signature is composed of several of them. Every time FlexSig receives a request to insert a new address in one of its signatures, each hash function assigned to the signature sets one bit in its register. On the other hand, to check if an address is already in the signature, all the bits read by the corresponding hash functions should be 1. Deallocation requests clear all the IDs and registers assigned to the signature.
Each time a new signature allocation request arrives, FlexSig assigns k Bloom filters, k ≤ T, to the new signature. Then, the k Bloom filters operate as a parallel Bloom filter inside FlexSig. The number of Bloom filters assigned depends on the current resource availability, that is, on the Bloom filters already allocated to previous signatures. If the hardware resources in FlexSig are fully used by previous signatures, FlexSig has to free several Bloom filters that are already assigned in order to allocate the new signature. This means that FlexSig has to dynamically reduce the size of existing signatures by releasing Bloom filters. In this case, the false positive rate may increase, but false negatives are never produced. Figure 4 shows the FlexSig module architecture. As said before, there are T Bloom filters, each one composed of a register and a hash function. Attached to each register there is a thread identifier (thID) and a signature identifier (sID), used to identify the registers assigned to a given signature. Therefore, thID and sID are log2(num_threads) and log2(max_sigs_per_thread) bits wide, respectively. Together, thID and sID form the signature owner identifier (ID).
The minimum number of Bloom filters per signature in FlexSig is T / num_max_concurrent_sigs (the Bloom filters are distributed equally among signatures), and the maximum is T (when only one signature is allocated).
The controller implements the allocation algorithm and the rest of the functions needed for the correct operation of FlexSig. The complexity and efficiency of the controller depend on the algorithm used to allocate and free signature registers. Figure 5 shows how the insertion, check and deallocation requests are performed. The insertion request must include the ID of the signature, so that the address is only inserted in the registers that match this ID. The check operation is very similar to the insertion operation, but it is read-only. The deallocation consists of clearing the registers and IDs.
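The ID-matched insert, check and deallocation operations can be modelled in software as follows. This is a minimal sketch under stated assumptions: the class and method names are hypothetical, owner IDs are (thID, sID) tuples, multiplicative hashing stands in for the H3 hardware hashes, and ownership is set by hand because the allocation policy is a separate concern.

```python
import random

MASK64 = (1 << 64) - 1


class FlexSigRegisters:
    """Software model: T small registers, each tagged with an owner ID."""

    def __init__(self, num_filters: int, reg_bits: int = 64, seed: int = 0):
        self.reg_bits = reg_bits
        self.owner = [None] * num_filters  # (thID, sID) tuple or None if free
        self.reg = [0] * num_filters
        rng = random.Random(seed)
        # Assumed stand-in for the per-register hash functions.
        self.mult = [rng.getrandbits(64) | 1 for _ in range(num_filters)]

    def _bit(self, i: int, address: int) -> int:
        h = ((address * self.mult[i]) & MASK64) >> 32
        return 1 << (h % self.reg_bits)

    def _owned(self, sig_id):
        return [i for i, o in enumerate(self.owner) if o == sig_id]

    def insert(self, sig_id, address: int) -> None:
        # Only registers whose ID matches are written.
        for i in self._owned(sig_id):
            self.reg[i] |= self._bit(i, address)

    def check(self, sig_id, address: int) -> bool:
        # Read-only: every matching register must have its hashed bit set.
        owned = self._owned(sig_id)
        return bool(owned) and all(self.reg[i] & self._bit(i, address) for i in owned)

    def deallocate(self, sig_id) -> None:
        for i in self._owned(sig_id):
            self.owner[i] = None
            self.reg[i] = 0
```

In hardware the ID comparison would presumably be performed in parallel across all registers; the loops here only model the outcome.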
Allocation Algorithm
The allocation algorithm is required to make room for a new signature and to free the space occupied by a signature once it is no longer needed. This algorithm may be very complicated, for example, by defining priorities to assign more or fewer Bloom filters depending on the requirements of the allocated signature. In this work we show a simple allocation algorithm with no priorities. The algorithm with priorities requires a deep study of the design space (number of priorities, resource assignment depending on the priority, etc.) to obtain an efficient implementation, and it will be developed in future work.
To describe the allocation algorithm, the following parameters are defined.
- n_sig: number of signatures allocated in FlexSig.
- n_reg_free: number of free Bloom filters in FlexSig.
- n_to_free: number of Bloom filters to be freed by the allocation algorithm.
Three different situations are possible when a thread tries to reserve space for a new signature in FlexSig: (1) FlexSig is empty, (2) FlexSig is full, or (3) FlexSig is partially full.
(1) FlexSig is empty (n_sig = 0, n_reg_free = T). All the resources of FlexSig are assigned to the new signature. This is one of the basic principles of FlexSig: if there are free resources, take as many as possible.
(2) FlexSig is full (n_sig = T, n_reg_free = 0). The controller must free space in FlexSig when it is necessary to allocate a new signature. Then, the other signatures in FlexSig are made smaller by reducing the number of Bloom filters per signature. The release of one or several Bloom filters assigned to a given signature may increase the false positive rate but false negatives are never produced, because all the hash functions are independent and all the registers in a signature have the information corresponding to every address inserted. The number of Bloom filters to free is given by

$$\mathit{n\_to\_free} = \left\lfloor \frac{T}{\mathit{n\_sig} + 1} \right\rfloor$$

The freed filters are assigned to the new signature. Therefore, the filters are redistributed equally among all the signatures including the new one. Figure 6 illustrates an example of the allocation algorithm in a FlexSig module composed of 16 Bloom filters. Initially, there are 3 signatures allocated: one of them has 6 filters assigned and the other two have 5 filters each. When a new signature is allocated, n_to_free = 4 (16 filters shared among 4 signatures), and the controller clears filters and tries to assign the same number of filters to every signature.
(3) FlexSig is partially full. The controller must decide whether the free resources are enough to allocate a new signature or if additional resources are needed. In the latter case, some filters should be freed and assigned to the new signature. If n_reg_free < n_to_free, the controller frees (n_to_free - n_reg_free) Bloom filters as explained for the case when FlexSig is full. On the other hand, if n_reg_free ≥ n_to_free, all the available Bloom filters are assigned to the new signature.
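The three cases can be sketched as a single routine. The sketch below makes two assumptions that the text does not spell out: the target share of the new signature is taken as floor(T / (n_sig + 1)), which matches the 16-filter example with three existing signatures, and filters are released one at a time from whichever signature is currently largest. The function name and data layout are illustrative.

```python
from collections import Counter


def allocate(owner: list, regs: list, new_id) -> int:
    """Assign Bloom filters to new_id; returns how many it received.

    owner[i] is the signature ID holding filter i (None if free);
    regs[i] is the register content of filter i.
    """
    total = len(owner)
    sizes = Counter(o for o in owner if o is not None)
    if new_id in sizes:
        raise ValueError("signature already allocated")
    if len(sizes) >= total:
        raise OverflowError("no filter can be freed; handled by software")

    granted = [i for i, o in enumerate(owner) if o is None]  # n_reg_free filters
    n_to_free = total // (len(sizes) + 1)                    # assumed fair share

    # Cases (2) and (3): not enough free filters, so shrink other signatures.
    while len(granted) < n_to_free:
        victim = max(sizes, key=sizes.get)  # assumed policy: largest first
        idx = max(i for i, o in enumerate(owner) if o == victim)
        owner[idx] = None
        sizes[victim] -= 1
        granted.append(idx)

    # Case (1) and the rest of (3): every free filter goes to the new signature.
    for i in granted:
        owner[i] = new_id
        regs[i] = 0  # a reassigned register starts empty
    return len(granted)
```

For the example of Figure 6, calling `allocate` on sizes 6, 5 and 5 leaves four signatures of 4 filters each under these assumptions.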
Influence of the Bloom Filters Release on the False Positive Rate
As said before, when a new signature is needed and there is no room to host it, the controller must free some Bloom filters assigned to other signatures and assign them to the new signature. But how does this affect the false positive rate? The probability of a false positive is $P = \left(1 - \left(1 - k/m\right)^{n}\right)^{k}$ (see Section 2), where m is the number of bits of the signature, k the number of registers or hash functions, and n the number of elements inserted in the signature. In FlexSig, the ratio m/k is constant. When the number of Bloom filters assigned to a signature is reduced, m and k are reduced in the same proportion.
Let us illustrate this influence with an example. Figure 7 shows the variation of the false positive rate when the resources allocated to a given signature are reduced: for instance, assume that initially the signature is composed of k = 16 filters with a total register size of m = 2048 and it is reduced to k = 8 filters with m = 1024. If the number of addresses inserted in the signature, n, is low, the reduction in signature size has no practical influence on the false positive rate. However, if n is large, the false positive rate increases significantly.
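A short derivation makes the effect of halving a signature explicit. It assumes the standard parallel Bloom filter expression P = (1 - (1 - k/m)^n)^k and writes c = m/k for the fixed register size (c = 128 in the example above); the symbols c and q are introduced here only for the derivation.

```latex
\begin{align*}
  q(n) &= 1 - \left(1 - \tfrac{1}{c}\right)^{n}, \qquad c = \tfrac{m}{k} = 128, \\
  P_{k}(n) &= q(n)^{k}, \\
  P_{k/2}(n) &= q(n)^{k/2} = \sqrt{P_{k}(n)} .
\end{align*}
% Since 0 <= P_k <= 1, the square root is never smaller than P_k:
% halving the signature can only keep or raise the false positive rate.
```

When n is small, both P_16 and its square root P_8 are negligible in absolute terms, so the reduction goes unnoticed; as n grows and P_16 is no longer tiny, taking the square root produces a clearly larger rate.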
Software Interface
To provide a basic software interface to FlexSig, the following instruction set extensions should be added. This basic instruction set extension allows the proper interface to the basic operations of the module. Moreover, the instruction set can be extended with new instructions for specific purposes. For instance, to support Transactional Memory, it can be extended with functionalities to forward signatures to cores, etc.
Register Grouping
When only one or a few signatures are allocated in FlexSig, the number of Bloom filters per signature is high, which is equivalent to having a high value of k in conventional parallel Bloom filters. However, as we learned from Figure 2(b), the optimum value of k, in terms of the false positive rate, is low (between 3 and 8 for the parameters used in Figure 2(b)). To reduce the false positive rate in these cases, FlexSig can group several Bloom filters so that only one filter of each group is used at a time in operations that involve hash functions (insert and check). A simple implementation consists of selecting the Bloom filter of the group based on the value of a few least significant bits of the address involved in the operation. Figure 8 illustrates the grouping scheme for groups of two elements.
Grouping can be implemented in a static or dynamic way. For a static implementation, the grouping size is chosen before the first allocation, and can only change when FlexSig is totally empty. This forces all signatures to have the same grouping size and simplifies the implementation. If it is implemented dynamically, the grouping size is chosen for each signature when it is allocated, depending on the number of Bloom filters assigned to it. This complicates the logic and can cause other problems with many corner cases. Because of that, we chose to implement static grouping in our evaluation.
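The selection step of the grouping scheme can be sketched as follows. The function name is hypothetical, and using the address modulo the group size as the selector is an assumption that equals the least significant bits when the group size is a power of two.

```python
def filters_used(filter_ids: list, group_size: int, address: int) -> list:
    """Pick one Bloom filter per group for an insert or check on `address`."""
    if group_size < 1 or len(filter_ids) % group_size:
        raise ValueError("filters must split evenly into groups")
    member = address % group_size  # low address bits select the group member
    return [filter_ids[start + member]
            for start in range(0, len(filter_ids), group_size)]


# A signature holding 8 filters with groups of two behaves like k = 4.
print(filters_used(list(range(8)), 2, 0b10))
print(filters_used(list(range(8)), 2, 0b11))
```

With a group size of one the function returns every filter, which corresponds to operating without grouping.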
The maximum level of grouping is a design decision for FlexSig. The specific grouping for each application can be established through the software interface. For the case of our benchmarks (see Section 5.3) with up to 16 threads, we determined that a maximum grouping of two elements is enough to achieve good results. Moreover, this grouping is activated only for applications configured to run with two and four threads.
FlexSig Overflow and Fault Tolerance
FlexSig can host several signatures at the same time. However, an overflow may occur in exceptional (low probability) cases when a new allocation request arrives and the controller cannot free any Bloom filter because FlexSig has reached the maximum number of signatures allowed (that is, when the software tries to allocate more signatures than the total number of Bloom filters available). In this case, the situation is managed by software, as is done, for instance, for conventional signatures in Transactional Memory implementations.
If an application allocates signatures but fails to deallocate them (due to a software bug or fault), FlexSig will have fewer resources to allocate new signatures for the remaining running time of the application (similar to the memory leak problem). This case is very hard to manage in hardware, and therefore it should be handled by software. For instance, a straightforward scheme for Transactional Memory is to clear FlexSig when serial code is executing or when no transactions are running in the system.
Thanks to its flexibility, FlexSig has good fault-tolerance properties regarding the storage of the signatures. If one register fails (a permanent or soft error detected with standard fault detection techniques), the register is marked as invalid if the error is permanent (it is not used any more) or freed if it is a soft error (it can be used in new signature allocations). Moreover, regarding permanent faults, only stuck-at-zero faults would lead to the invalidation of a register (stuck-at-one faults only increase the false positive rate). No special operations are needed for managing this situation, as the only implication of losing one Bloom filter is an increase in the false positive rate. Of course, an exception is raised if the failing Bloom filter is the only one assigned to the signature.
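The register-fault policy described above can be summarized in a small routine. This is only a sketch of the stated rules: the function name, the fault labels and the list-based state are assumptions, and fault detection itself is outside its scope.

```python
def handle_register_fault(owner: list, regs: list, valid: list,
                          idx: int, fault: str) -> None:
    """Apply the fault policy to register idx.

    fault is one of "soft", "stuck_at_0" or "stuck_at_1" (assumed labels).
    """
    if fault not in ("soft", "stuck_at_0", "stuck_at_1"):
        raise ValueError("unknown fault type")
    if fault == "stuck_at_1":
        return  # extra 1s only raise the false positive rate; keep the register

    sig = owner[idx]
    if sig is not None and owner.count(sig) == 1:
        # The signature would lose its only Bloom filter.
        raise RuntimeError("fault in the only filter of a signature")

    owner[idx] = None  # the signature simply continues with fewer filters
    regs[idx] = 0
    if fault == "stuck_at_0":
        valid[idx] = False  # permanent fault: never allocate this register again
```

A register freed after a soft error stays valid and can be handed out again by later allocations, while an invalidated register is excluded from the pool.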
IMPLEMENTATION ISSUES
The FlexSig system can be placed in each core or as a centralized resource attached to the chip interconnection network. The flexibility of FlexSig is achieved at the cost of extra logic compared with conventional parallel Bloom filters. In this sense, a key element is the FlexSig controller, which should support the functionality of FlexSig with a simple architecture to reduce power and area overhead. The controller needs one queue for the incoming requests, because the requests are served sequentially. There are four types of request, Allocate, Deallocate, Check and Insert, each with an ID that the controller uses to take action on the corresponding registers. To implement efficiently the straightforward allocation algorithm described in Section 3.2, some extra registers are needed in the controller to make fast decisions for the allocation operation. There are T counters of log2(T) bits each in the controller to count the number of FlexSig registers allocated to the corresponding ID. There is also a counter that keeps track of empty records. Using this stored information, a finite-state machine performs the allocation operations.
For implementations using a centralized FlexSig for all cores, the module might be a bottleneck. After analyzing the concurrency of the possible arriving requests, we determined that the controller can serve some of them in parallel. Taking into account these rules, the controller can be parallelized in several ways. Figure 9 shows a simple parallel controller proposal. This controller may perform up to P operations in parallel. Basically, it is an in-order issue superscalar engine. The incoming requests are placed in an input queue. The issue logic determines up to P requests to be issued in parallel, following the rules listed above. We have one finite-state machine (and the corresponding counters) to execute the allocate requests, and P very simple circuits to process inserts, checks or deallocates.
Most of the time, the finite-state machine has the calculations ready when a new allocate request arrives, because it recalculates these parameters immediately after the previous allocate or deallocate request. Only when two consecutive allocate requests arrive does the finite-state machine have no time to recalculate the parameters before the second request, incurring some additional delay.
EVALUATION IN A TRANSACTIONAL MEMORY SYSTEM
The aim of the evaluation is to show the effectiveness of the FlexSig system with respect to conventional parallel Bloom filters. In this work we concentrate on Transactional Memory applications, since for many Transactional Memory implementations signatures are a key element. Transactional Memory uses signatures to detect conflicts among transactions. Each transaction inserts its reads and writes in signatures to maintain a summary of its read/write set. Conflicts with the reads/writes of other transactions are detected through the check operation. Since our purpose is to evaluate only signatures, our figures of merit are expressed in terms of false positive rates. A higher false positive rate degrades performance because, for each false positive, the Transactional Memory system has to perform an unnecessary abort (rollback to the initial state and restart of the transaction).
To evaluate FlexSig, we use unified signatures (see Section 5.1), so we only need one signature per transaction for the read and write set, which allows us to implement the simple allocation algorithm described in Section 3.2. The results we obtained with our experimental setup (see below) using separate signatures are worse, and therefore we only report the results for unified signatures.
Unified Signatures: Simplifying FlexSig Implementation in Transactional Memory
Transactional Memory uses two signatures per transaction, one for the read set and another one for the write set. Usually the read set is larger than the write set and therefore, in order to use the resources efficiently, the signature of the read set should be larger than the signature of the write set. However, having signatures of different sizes for the write set and the read set introduces additional difficulties in the allocation algorithm and makes its implementation more complex. Unified signatures [Choi and Draper 2011] use only one signature for both the read set and the write set. This approach may generate read-read conflicts; however, these conflicts rarely lead to a performance loss [Sanyal et al. 2009; Choi and Draper 2011]. Using unified signatures, each thread only needs to allocate one signature per transaction, and the complexity of the controller is reduced. This is the approach we have used for evaluating FlexSig.
Experimental Setup
To evaluate the FlexSig scheme we use a Transactional Memory system with signatures used to track data accesses in transactions. Our aim is not to implement a fully functional Transactional Memory system, but to work out a challenging scenario for FlexSig, and compare it with conventional parallel Bloom filters in the same situation. For the Transactional Memory system we use the software approach RSTM [Spear et al. 2008]. RSTM is a software Transactional Memory system that allows many different configurations. In our evaluation we use lazy acquisition and lazy versioning with extendable timestamps [Riegel et al. 2007] to configure RSTM. We use PIN [keung Luk et al. 2005] to track all transactions and memory accesses of RSTM and to emulate the [Cao Minh et al. 2008], two PARSEC Benchmarks [Bienia et al. 2008] and nine micro benchmarks (included in the RSTM distribution). Table I shows the inputs of the benchmarks. The benchmarks not included in the table run with the default input. For this evaluation, we classify the benchmarks in two categories. One group is composed of benchmarks with a high false positive rate (Benchmark set A), and the other of benchmarks with a modest false positive rate (Benchmark set B). The purpose of this is to run each group of benchmarks with a different signature configuration to show the advantages of FlexSig for workloads with different characteristics. Tables II and III show the characterization of the benchmarks. The parameter #Tx is the number of transactions of the benchmark, TxTime is the percentage of time spent on transactions, and RS and WS are the average number of reads and writes per transaction. The time spent in transactions is, in general, very significant. This parameter is affected by the instrumentation tool, because only transactions are instrumented.
This scenario is a pessimistic approximation, since in a real system the time spent inside the transactions should be less, and therefore, it should be less likely that those transactions demand signatures at the same time in the FlexSig system. Therefore, the results should be better than in the simulated case.
Configuration
For the evaluation we use the configurations shown in Table IV. The hardware configuration for parallel Bloom filters (k and m are the parameters in Figure 1(b)) was chosen specifically to manage up to 16 threads (that is, the conventional signature system has 16 parallel Bloom filters of fixed size). We run experiments with 2, 4, 8 and 16 threads. Two configurations are used for FlexSig: configuration conf1 uses the same resources as the equivalent parallel Bloom filter, and conf2 uses half of the resources. For the benchmarks belonging to set A, the registers are of 512 bits for the unified parallel Bloom filter (a total of 8192 bits for a 16-thread system); for FlexSig we have 32 registers of 128 bits for conf2, and 64 registers of 128 bits for conf1. Similarly, for the benchmarks belonging to set B, the size of the registers for the unified parallel Bloom filter is 32 bits and the corresponding FlexSig configurations conf1 and conf2 are described in Table IV. To group registers (see Section 3.5), we choose groups of one register for 8 and 16 threads, and groups of two registers for executions with 2 and 4 threads. This decision was taken to have an efficient configuration (see Figure 2(b)).
Results
Tables V and VI show the false positive rate of FlexSig with configurations conf1 and conf2 compared with the results obtained with parallel Bloom filters (see Table IV), for the case of 2, 4, 8 and 16 threads. A white cell (in the conf1 and conf2 columns) means that the false positive rate is roughly the same as the one obtained with the parallel Bloom filter, a gray cell means that the false positive rate of FlexSig is better (lower), and a dark gray cell means that the false positive rate of FlexSig is worse (higher). First, we comment on the results with conf1 for both benchmark sets A and B. As Tables V and VI show, FlexSig-conf1 outperforms parallel Bloom filters in almost all the cases. For 2, 4 and 8 threads, the improvement is very high; for instance, for the case of vacation-high running with 2 threads, the false positive rate is reduced 
where num_changes_size is the number of times a signature changes its size (number of registers) before deallocation, and time_interval is the number of time units during which a signature has a size sig_size. Figure 10 shows the improvement in the average signature size of FlexSig-conf1 compared with the equivalent conventional signature. This improvement is achieved because not all the threads use signatures simultaneously, and therefore the threads can take resources that other threads do not use. The signature size for FlexSig depends basically on the concurrent nature of the benchmark (fewer concurrent threads lead to a better performance of FlexSig). In a similar way, Figure 11 shows the average signature size improvement for FlexSig-conf2 compared with the equivalent conventional signatures. Despite the fact that the resources are half of those of conf1, the improvement achieved shows a similar behavior (Figure 10). The best result in terms of FlexSig average signature size improvement is for streamcluster, with an improvement of more than 50%, due to the low transaction concurrency in this benchmark. In this case, the improvement of the signature size does not imply a significant reduction of the false positive rate because this rate is already very low in absolute terms.
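The variables defined above suggest a time-weighted average of the signature size over its lifetime. The sketch below is written under that assumption, since the defining expression is not reproduced in this text; the function name and the input format are hypothetical.

```python
def average_signature_size(intervals: list) -> float:
    """Time-weighted mean size of one signature (assumed definition).

    intervals is a list of (sig_size, time_interval) pairs, one pair per
    period between size changes, in registers and time units respectively.
    """
    total_time = sum(t for _, t in intervals)
    if total_time <= 0:
        raise ValueError("signature must be live for a positive time")
    return sum(size * t for size, t in intervals) / total_time


# A signature that holds 16 registers for 10 time units, then 8 for 30.
print(average_signature_size([(16, 10), (8, 30)]))
```

Weighting by time means that a brief period with a small signature has little influence on the average, which fits the observation that threads benefit whenever other threads are not holding signatures.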
In conclusion, the FlexSig system improves the false positive rate when compared with conventional parallel Bloom filters. In a system configured to run 16 threads, our signature system clearly outperforms the parallel Bloom filter implementation when the number of threads is lower than 16 (for conf1, with the same resources), due to the flexibility of FlexSig to assign the physical registers depending on the demand (number of concurrent threads). General-purpose multicores and multiprocessors are able to run a large number of concurrent threads, but many applications use only a few threads. FlexSig is flexible enough to provide these applications with all the available signature resources to achieve better performance. We used very hard conditions in our evaluation to demonstrate that FlexSig can perform well even in an unfavorable scenario. The benchmarks used are highly concurrent, which means that many transactions use signatures at the same time. Moreover, because of the instrumentation tool, the benchmarks spend more time inside transactions, increasing transactional concurrency.
RELATED WORK
Most of the papers dealing with signatures are focused on improving performance, reducing chip area or reducing the false positive rate [Sanchez et al. 2007; Quislant et al. 2009; Yen 2009; Shenghua et al. 2009; Yen et al. 2008]. However, none of these papers focuses on flexibility and scalability. The Scalable Bloom Filters (SBF) proposed by [Almeida et al. 2007] are an approximation to scalable signatures. They use one signature and, when a fill ratio is reached, another signature is used. SBF was proposed to avoid the problem of oversized signatures, which arises because the size of the signature must be defined in advance based on the number of elements to be stored and the desired upper bound of the false positive rate. The SBF method can reduce the specific size of the signature used. However, it may use several signatures depending on the number of elements to be stored and therefore, in our context, the system has to be oversized anyway (with regard to the number of signatures). FlexSig does not fully avoid the problem of oversized signatures, but it is more flexible and efficient in the sense that it uses as many hardware resources as possible, which has a significant effect on the false positive rate for a Transactional Memory system.
In recent publications we find Transactional Memory systems that are well suited to using FlexSig. Mehrara et al. [2009] propose a Software Transactional Memory system with a centralized conflict detection mechanism (based on software signatures) placed in one core. One way to improve this scheme would be to use FlexSig instead of their software signatures. This would improve performance while maintaining the flexibility of the software signatures. Another example is the scheme proposed by Casper et al. [2011], which describes new centralized hardware outside the processor chip to accelerate Software Transactional Memory systems. This special hardware includes signatures. They also propose two algorithms for conflict detection, one using two signatures per transaction and another using three signatures. FlexSig would allow both to be implemented with the same hardware and would also improve performance.
The idea behind FlexSig is similar to the recent trend of incorporating a shared last level cache in multicore systems. The cache size used by each core varies dynamically depending on the application. This leads to a more flexible system than having a fixed size slice of the last level cache assigned to each core. FlexSig follows this trend for a resource that might be of interest for future multicore implementations.
CONCLUSIONS
In this work we propose a module for hardware signatures to improve conventional signatures in terms of flexibility, scalability and fault tolerance. The main feature of FlexSig is that it can host a high number of signatures for applications with a high number of threads and significant contention, while for cases of low contention or few threads it can achieve a very low false positive rate.
We described the module and its implementation, defined a detailed algorithm to allocate signatures, evaluated FlexSig in the context of a Transactional Memory system and compared it to an implementation with conventional parallel Bloom filters. The evaluation shows that, when the number of threads is low, FlexSig achieves a significant improvement because of the flexibility to use all the available resources. When the number of threads is high, the results are similar to the conventional implementation due to the highly concurrent nature of the benchmarks. However, with the same amount of resources, FlexSig never behaves worse than conventional parallel Bloom filters.
FlexSig makes signatures more flexible to use as a general purpose hardware resource, since it is able to adapt to the concurrent demand of signatures, and decouples, to some extent, the type of benchmark from the hardware.
As future work, we will explore more complicated allocation algorithms using priorities and will test the efficiency of FlexSig in other environments outside Transactional Memory.
