Memory design is facing upcoming challenges due to a combination of technology scaling and higher levels of integration complexity. In particular, memory circuits become vulnerable to transient (soft) errors caused by particle strikes and process spread. In this paper, we propose a new error-tolerance technique, referred to as the soft redundancy, for on-chip memory design. Program runtime variations in memory spatial locality cause wasted memory spaces occupied by irrelevant data. The proposed soft-redundancy allocated memory exploits these wasted memory spaces to achieve efficient memory access and effective error protection in a coherent manner. Simulation results on the SPEC CPU2000 benchmarks demonstrate a 73.7% average error protection coverage ratio on the 23 benchmarks, with average reductions of 52% and 48.3% in memory miss rate and bandwidth requirement, respectively, as compared to the existing techniques.
INTRODUCTION
Technological scaling has driven the design of integrated circuits with exploding system complexity and performance improvement. With the continuing trend towards nanoscale integration, nanometer devices are approaching their physical limits, and precise control over feature size and design uniformity becomes extremely difficult [1]. Design of high-performance integrated systems is thus compelled to contend with a wide range of unmanageable performance spread and reliability degradation introduced by the underlying technology challenges.
These obstacles to the semiconductor roadmap are particularly noticeable in memory circuits such as on-chip caches. Technology scaling reduces the capacitance, the supply voltages, and hence the information-bearing charges at storage nodes. Consequently, memory circuits become vulnerable to transient (soft) errors caused by particle strikes. Timing complexity is also a problem that further complicates memory design. Memory access relies on closely coupled clock waveforms to perform latching, gating, dynamic timing, and sophisticated cycle borrowing. However, clock skew coupled with signal delay variations introduces timing-related transient errors that affect memory robustness. In addition, the minimum-geometry devices that make up the bulk of memory are very sensitive to variations in process, supply voltage and temperature.
The frustrating design complexity and reliability degradation press for new capabilities of error tolerance for on-chip memory design. Traditional approaches include radiation-hardened memory structures [2], double or triple memory redundancy [3], and code checking algorithms [4]. Integrating these techniques into high-performance on-chip memory presents a significant challenge due to the severe constraints on area and timing margins. On a parallel path, research on cache microarchitecture has been focusing on efficiently utilizing memory resources and reducing the bandwidth requirement of data fetching, while not inducing too much overhead or lowering spatial locality. Sub-blocked caches [5] reduce memory traffic by transferring only a single sub-block on a cache miss. Some alternatives [6], [7] dynamically adapt the block size to the spatial locality of the program. Other techniques [8], [9] adopt history-based fetching at the granularity of sub-blocks or words. It should be pointed out that none of the above techniques address both memory robustness and access efficiency.
In this paper, we propose a new error-tolerance technique, referred to as the soft redundancy, for on-chip memory design. We observe that program runtime variations in memory spatial locality cause many irrelevant data to be fetched into the memory and thus waste memory spaces. By freeing these memory spaces, the program creates transient (soft) memory redundancy that can be exploited for tolerance of transient errors. Through adaptively balancing cache resources among different computing tasks, the proposed technique is capable of achieving efficient memory access and effective error protection in a coherent manner. For the implementation of our technique, we develop a design methodology that enables optimal tradeoffs between hardware overheads and error tolerance. Simulation results on the SPEC CPU2000 benchmarks [12] demonstrate a 73.7% average error protection coverage ratio on the 23 benchmarks, with average reductions of 52% and 48.3% in memory miss rate and bandwidth requirement, respectively, as compared to the existing techniques.
In section 2, we develop the soft-redundancy allocated memory microarchitecture for effective error protection. In section 3, we formulate a design methodology for optimizing the performance of soft redundancy allocation. Section 4 presents a statistical analysis of the error tolerance and a comparison to the existing techniques. In section 5, we evaluate the performance of the proposed technique.
SOFT-REDUNDANCY ALLOCATED MEMORY
In most processor architectures, the size of cache lines is fixed, while the spatial locality varies during runtime under different programs. There are always some irrelevant data being fetched into the cache lines, wasting memory spaces.
In this section, we discuss the soft-redundancy allocated memory that achieves efficient memory utilization and effective error protection. The proposed technique aims at freeing the cache line spaces taken by the irrelevant data. The released spaces provide soft (transient) redundancy that is hidden in the cache lines due to variations in memory spatial locality. Thus, instead of fetching the entire cache lines containing many irrelevant data, the proposed technique selectively fetches useful data to the sublines within a cache line. The unused sublines are utilized to store a redundant copy of the useful data, thereby providing effective error protection. Figure 1 [...] for the purpose of demonstration). A his[...]
Figure 2: An example of soft redundancy allocation.
An example of a memory access sequence using the soft redundancy allocation is shown in Figure 2. Each cache line can be operated in one of two allocation modes: error-checking and no-checking. The mode switch is controlled by the information stored in the history table. The detailed operation and soft redundancy allocation are explained below in reference to the steps in Figure 2.
Soft Redundancy Allocation
Initially, the cache line is in the error-checking mode. The flag bit is set to "1", indicating this mode. The history table stores "00-01" for two default sublines "00" and "01". Note that the default sublines can be chosen arbitrarily. Assume that the first miss in this cache line triggers the fetch of data to replace those in subline "01" (step 2 in Fig. 2). Since the subline "01" is already in the history table, the allocation mode will remain at error-checking, but the content of the history table will be updated to "01-00", indicating that subline "01" is the most recently replaced subline. The cache will only fetch data to subline "01", not the entire cache line. Meanwhile, an unused space, e.g., subline "11", is assigned to save a redundant copy of the new data in subline "01".
Assume the next miss causes the replacement of subline "10" (step 3). Since this subline is not listed in the history table, the history table will be updated to "10-01". Since this is the first time of a miss whose entry is not in the history table, the allocation mode will remain at error-checking. New data is fetched into subline "10" and a redundant copy is stored at an unused subline determined by Table 2.
Next, assume the subline "11" is to be replaced because of a new miss (step 4). The history table is thus changed to "11-10". Since this is the second time of a miss whose entry is not in the history table, statistically we can conclude that the program is becoming less predictable with respect to the memory spatial locality during this period. Thus, the allocation mode will be switched to no-checking. The entire cache line will be fetched and no subline will have a redundant copy.
If the subsequent miss occurs in subline "10" (step 5), the program will still fetch the entire cache line and update the history table to "10-11". But because subline "10" is already in the history table, the hit in the history table establishes some confidence in the location where the program is most likely to access.
The allocation mode will switch back to error-checking provided the next miss is also a hit in the history table. For example, a miss in subline "11" leads to an update of the history table to "11-10" (step 6), and this is the second consecutive hit in the history table. Considering this event as evidence of sufficient memory spatial locality, we can assume the subsequent memory accesses are likely to be directed to those sublines recorded in the history table. Thus, new data will be fetched to subline "11" only and a copy of the new data is stored according to Table 2.
In a similar way, a replacement of subline "10" does not change the allocation mode (step 7). It will update the history table to "10-11" and save a redundant copy of subline "10".
The above example demonstrates the main idea of the proposed soft redundancy allocation. By monitoring real-time access patterns recorded in the history table, we can keep track of the locations (sublines) in a cache line that are accessed more frequently by a program. Since these sublines are accessed very often, we assign redundant sublines to them for the purpose of error tolerance. On the other hand, the sublines that are accessed less frequently are most likely the ones storing irrelevant data due to the non-idealities of spatial locality. We can then free these sublines and use them as soft redundancy for the frequently accessed sublines. Hence, the proposed technique is self-adaptive to runtime-varying spatial locality and is able to exploit transient (soft) redundancy for error protection.
Table 1 shows the general procedure for updating the history table and directing the mode transition. The allocation mode decides whether to allocate redundant space for error protection or to fetch the entire cache line for performance. If the cache line is in the error-checking mode, each subline listed in the history table will have a redundant copy in the same cache line. On the other hand, if the cache line is in the no-checking mode, the entire cache line will be fetched without making any redundant copy. The allocation mode switches when enough confidence has been established.
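The walkthrough above can be replayed with a small state machine. The following is a minimal sketch, not the authors' implementation: the class and function names are hypothetical, the history table is modeled as a most-recent-first list of two subline IDs, and it is an assumption that the consecutive-event counter resets whenever the streak is broken or the mode switches.

```python
from dataclasses import dataclass, field

CHECKING = "error-checking"
NO_CHECKING = "no-checking"


@dataclass
class CacheLineState:
    """Allocation state of one cache line (hypothetical model of the example)."""
    history: list = field(default_factory=lambda: ["00", "01"])  # most recent first
    mode: str = CHECKING
    confidence: int = 2  # consecutive events needed before a mode switch
    streak: int = 0      # assumed: reset when the streak breaks or the mode switches

    def on_miss(self, subline: str) -> str:
        """Update the history table and mode for a miss that replaces `subline`."""
        in_history = subline in self.history
        capacity = len(self.history)
        # Move the replaced subline to the front of the history table.
        if in_history:
            self.history.remove(subline)
        self.history.insert(0, subline)
        del self.history[capacity:]
        # In error-checking mode, misses outside the history table count toward
        # a switch; in no-checking mode, hits in the history table do.
        if self.mode == CHECKING:
            counts_toward_switch = not in_history
        else:
            counts_toward_switch = in_history
        self.streak = self.streak + 1 if counts_toward_switch else 0
        if self.streak >= self.confidence:
            self.mode = NO_CHECKING if self.mode == CHECKING else CHECKING
            self.streak = 0
        return self.mode


def replay(sublines, confidence=2):
    """Return (history table, mode) after each miss in the sequence."""
    line = CacheLineState(confidence=confidence)
    trace = []
    for subline in sublines:
        mode = line.on_miss(subline)
        trace.append(("-".join(line.history), mode))
    return trace
```

Calling `replay(["01", "10", "11", "10", "11", "10"])` corresponds to steps 2 through 7 of the example, with the mode leaving error-checking at step 4 and returning at step 6.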
Error Protection
When the program reads data in the cache lines currently in the error-checking mode, errors can be detected by comparing the data with the redundant copy.
In the traditional cache microarchitecture, a memory hit/miss is generated by comparing the desired tag address with the tag address of the cache line. A hit is generated if the two tag addresses match. What is different in the proposed technique is an additional comparison between the two data copies in the same cache line. Mismatches between the two copies indicate errors and hence result in cancellation of the memory access. A memory miss is generated as a result. There is a possible situation in which the redundant copy is corrupted while the original data is correct. A misjudgement on the correctness of the data would then occur, thereby introducing an additional miss with a performance penalty. However, since the probability is relatively low, the performance penalty is negligible.
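The read path described above can be sketched as follows. This is an illustrative model only: `read_subline` and `flip_bits` are hypothetical helpers, a subline is modeled as a `bytes` value, and a missing redundant copy (`None`) stands for the no-checking mode.

```python
from typing import Optional, Tuple


def flip_bits(data: bytes, bit_positions) -> bytes:
    """Return a copy of `data` with the given bit positions inverted."""
    buf = bytearray(data)
    for pos in bit_positions:
        buf[pos // 8] ^= 1 << (pos % 8)
    return bytes(buf)


def read_subline(tag_match: bool, data: bytes,
                 copy: Optional[bytes]) -> Tuple[bool, Optional[bytes]]:
    """Return (hit, data) for a read of one subline.

    A tag mismatch is an ordinary miss. In error-checking mode (a redundant
    copy exists), any mismatch between the two copies cancels the access
    and is reported as a miss so that the data is fetched again.
    """
    if not tag_match:
        return False, None
    if copy is not None and copy != data:
        return False, None
    return True, data
```

Note that a corrupted redundant copy with intact original data also produces a miss in this model, which is the misjudgement case mentioned above.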
The proposed technique can detect multiple errors that occur in any bits. This is a significant improvement over the existing error-control techniques that are only effective for single-bit or double-bit errors. Statistical analysis in section 4 demonstrates about 10X improvement in error detection capability over the existing techniques.
Table 3: Algorithm for determining the optimal parameters $(n_{opt}, c_{opt})$.
DESIGN OPTIMIZATION
In this section, we present an algorithm to determine the optimal configuration of the proposed soft-redundancy allocated memory microarchitecture.
Consider a general case where each cache line contains $m$ words divided into $n$ sublines. Here we assume the minimum size of a subline is a single word. Each cache line also integrates one flag bit for the allocation mode and a history table with $\frac{n}{2}\log_2 n$ bits to record the most recently replaced sublines. The capacity of the history table should be large enough to store the IDs of at most half of the total sublines in each cache line.
If the sublines listed in the history table are hit or missed consecutively for a certain number of times, the allocation mode switches between the no-checking mode and the error-checking mode. Assume that the mode will switch after $c$ times. Thus, the parameter $c$ defines the confidence level in the mode switch. In the no-checking mode, the entire cache line will be fetched and no redundant copy will be made. In the error-checking mode, each subline listed in the history table has a redundant copy in the same cache line. When the program reads a word in a subline listed in the history table, the second copy is compared with it in order to detect possible errors in the original data.
The subline number $n$ is a key parameter in the proposed technique. The value of $n$ needs to satisfy the following condition [...]. The confidence level $c$ also affects the performance of the proposed technique. It can be chosen from a range given by [...]. It is necessary to determine the optimal values of these two parameters by considering the tradeoffs between error protection and design overheads. The timing overheads of the proposed technique can be minimized by executing the required operations (e.g., data comparison and redundant data copying) in parallel with non-critical operations, thereby keeping the operations off the critical timing path.
In the following, we will focus on the hardware overheads in this optimization analysis. A large subline number $n$ provides more flexibility in exploiting the soft redundancy and hence leads to better error protection. But on the other hand, it increases the hardware complexity, especially the size of the history table.
The confidence level $c$ affects the mode switch frequency and thus the coverage of error protection. In the proposed technique, the sublines are not under error protection all the time. When a cache line is in the no-checking mode, the data in that cache line is not protected. [...] numerical values can be obtained by averaging over different programs subject to various memory access patterns.
The hardware overheads of the proposed technique include history tables, additional comparators and control bits. Each cache line employs $\frac{n}{2}\log_2 n$ bits to maintain the access history. At the switch confidence level $c$, each cache line needs $(\log_2 c + 1)$ bits for mode transition control and one bit for allocation mode indication. Each comparator for error detection is a $\frac{16m}{n}$-bit comparator, where $m$ is the number of words in each cache line. Since each comparator is shared by two arbitrarily assigned sublines and each cache line has $n$ sublines, the number of comparator bits is $\frac{n}{2} \times \frac{16m}{n} = 8m$.
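As a quick numerical check of the per-line bit counts listed above, the following sketch adds up the history table, mode transition control, and mode flag bits. The function names are hypothetical, and it is an assumption that $n$ and $c$ are powers of two so that the logarithms are integers.

```python
import math


def storage_bits_per_line(n: int, c: int) -> int:
    """Extra storage bits per cache line for n sublines and confidence level c."""
    if n < 2 or n & (n - 1) or c < 1 or c & (c - 1):
        raise ValueError("n and c are assumed to be powers of two")
    history_table = (n // 2) * int(math.log2(n))  # IDs of half of the sublines
    mode_control = int(math.log2(c)) + 1          # mode transition counter
    mode_flag = 1                                 # allocation mode indication
    return history_table + mode_control + mode_flag


def total_storage_bits(k: int, n: int, c: int) -> int:
    """Extra storage bits for a cache with k lines (comparators not included)."""
    return k * storage_bits_per_line(n, c)
```

For example, `storage_bits_per_line(8, 2)` covers a line with 8 sublines and a confidence level of 2. The comparator bits are logic rather than storage, so they are left out of this count.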
For a cache with $k$ cache lines where each line contains $m$ words and is divided into $n$ sublines with a switch confidence level of $c$, the hardware overhead $C_h(n, c)$ is the total size of the history tables, the comparators, and the control bits, expressed as
$$C_h(n, c) = k\left[\frac{n}{2}\log_2 n + 8m + (\log_2 c + 1) + 1\right]$$
To derive a general solution applicable to different programs with different memory access patterns, we normalize the hardware overhead by its maximum value $C_{\max}$, and the error protection coverage ratio by $R_{\max}$, i.e.,
$$\frac{C_h(n, c)}{C_{\max}}, \qquad \frac{R(n, c)}{R_{\max}}$$
The optimal soft redundancy allocation is the one that achieves the best tradeoff between error protection coverage ratio and hardware overheads. This can be expressed as the following optimization problem: $F(n_{opt}, c_{opt}) = [...]\; F(n, c)$, s.t. [...]
where $\alpha$ and $\beta$ are the weight factors for error protection coverage ratio and hardware overheads, respectively. By changing the values of $\alpha$ and $\beta$, we can adjust the design priority between error protection and hardware overheads. Table 3 summarizes the proposed algorithm for determining the optimal configuration of soft redundancy allocation.
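One way to picture such a weighted tradeoff is an exhaustive search over candidate configurations. The sketch below is not the algorithm of Table 3: the objective (weighted normalized coverage minus weighted normalized overhead), the function names, and the `coverage` callback are all assumptions made for illustration, and the coverage values would in practice come from simulation.

```python
import math
from itertools import product


def overhead_bits(n: int, c: int, m: int) -> int:
    """Per-line overhead: history table, comparator bits (8m), control bits."""
    return (n // 2) * int(math.log2(n)) + 8 * m + int(math.log2(c)) + 2


def choose_configuration(coverage, n_values, c_values, m, alpha=0.5, beta=0.5):
    """Pick the (n, c) pair with the best assumed weighted score.

    `coverage(n, c)` is a caller-supplied estimate of the error protection
    coverage ratio. Both terms are normalized by their maximum over the
    candidate set before weighting.
    """
    candidates = list(product(n_values, c_values))
    ratio = {nc: coverage(*nc) for nc in candidates}
    cost = {nc: overhead_bits(*nc, m) for nc in candidates}
    ratio_max = max(ratio.values())
    cost_max = max(cost.values())

    def score(nc):
        return alpha * ratio[nc] / ratio_max - beta * cost[nc] / cost_max

    return max(candidates, key=score)
```

Setting `alpha` high relative to `beta` favors configurations with more sublines when coverage grows with the subline number, while the opposite weighting favors the cheapest history table.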
STATISTICAL ANALYSIS OF ERROR TOLERANCE
In this section, we perform a statistical analysis to quantify the error tolerance of the proposed soft-redundancy allocated memory.
[...] as single-bit upsets (SBU). As the feature size of semiconductor processes is scaled into the nanometer domain, a single particle strike may potentially destroy the states of multiple memory bits, resulting in multiple-bit upsets (MBU). Thus, we need to evaluate the error protection for both SBU and MBU.
We compare our technique with the parity checking code and the single-error-correction double-error-detection Hamming code. Parity checking code is considered the most effective technique for detecting SBU, whereas Hamming code provides error detection for up to two bits of errors. In our estimation model, each soft error can cause a single-bit error or a double-bit error with the probabilities of $P_s$ and $P_m$ [...] (10), where [...] is the soft error rate (SER). Based on the observation in [10], the probability of SBU is selected as two orders of magnitude larger than that of MBU. In addition, we model the soft errors as independently and identically distributed events, i.e., a memory entry may have multiple errors caused by multiple soft error events.
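The difference in detection capability can be illustrated by enumerating error patterns. The sketch below is a simplified illustration and not the estimation model of this section: it assumes an 8-bit word, considers a single parity bit against a duplicated copy, and all names are hypothetical.

```python
from itertools import combinations

WORD_BITS = 8  # illustrative word width (assumption)


def error_masks(weight: int, width: int = WORD_BITS):
    """Yield every bit-flip pattern with exactly `weight` flipped bits."""
    for positions in combinations(range(width), weight):
        mask = 0
        for pos in positions:
            mask |= 1 << pos
        yield mask


def parity_detects(mask: int) -> bool:
    """A single parity bit only sees an odd number of flipped bits."""
    return bin(mask).count("1") % 2 == 1


def duplicate_detects(data_mask: int, copy_mask: int = 0) -> bool:
    """Comparison with a copy fails only if both copies flip identically."""
    return data_mask != copy_mask


def detected_fraction(detector, weight: int) -> float:
    """Fraction of all error patterns of a given weight that are detected."""
    masks = list(error_masks(weight))
    return sum(detector(mask) for mask in masks) / len(masks)
```

In this simplified setting, parity misses every double-bit pattern, while the copy comparison misses an error only when the data and its copy are corrupted in exactly the same bits.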
EVALUATION AND DISCUSSION
In this section, we evaluate the error tolerance of the proposed soft-redundancy allocated memory microarchitecture. We also compare the memory access performance and bandwidth requirement with the traditional cache systems.
Our simulation results were obtained from a simulator based on the trace-driven simulator Dinero IV [11]. The cache model in this simulator is modified to support the proposed soft redundancy microarchitecture. The proposed technique is evaluated in a direct-mapped cache whose total size is 32KB and each line is 32B. The number of sublines is selected to be 8, and the switch confidence value is set to 2. All the simulations were run on the SPEC CPU2000 [12] trace files collected from the Stream-Based Trace Compression (SBC) [13], where trace files of 23 benchmarks are available.
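To make the simulated geometry concrete, the sketch below derives the number of lines and the subline size from the stated cache parameters and splits an address into tag, line index, subline ID, and byte offset. The `decompose` helper is hypothetical and assumes byte addressing with sublines laid out contiguously within a line.

```python
CACHE_BYTES = 32 * 1024  # total cache size (32KB)
LINE_BYTES = 32          # cache line size (32B)
SUBLINES = 8             # sublines per cache line

NUM_LINES = CACHE_BYTES // LINE_BYTES    # lines in the direct-mapped cache
SUBLINE_BYTES = LINE_BYTES // SUBLINES   # bytes per subline


def decompose(address: int):
    """Split a byte address into (tag, line index, subline ID, byte in subline)."""
    offset = address % LINE_BYTES
    line_index = (address // LINE_BYTES) % NUM_LINES
    tag = address // (LINE_BYTES * NUM_LINES)
    return tag, line_index, offset // SUBLINE_BYTES, offset % SUBLINE_BYTES
```

With these parameters each subline holds 4 bytes, which agrees with the 32-bit comparator per redundancy subline pair reported in the evaluation.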
In section 4, we have shown that the proposed technique achieves an improved error detection capability. Many existing works [14]-[16] on soft errors usually assume certain conditions or target specific architectures. Instead of simulating soft errors directly, we evaluate the error protection ratio $R$ as defined in (3). Table 4 shows the results of the error protection ratio of the 23 workloads selected for simulation.
These results are obtained from (3) using statistical results reported by the simulator. The average error protection ratio of all the 23 benchmarks is 73.7%. In some benchmarks, e.g., art, mcf, and applu, nearly all the memory accesses are protected by our mechanism. In addition, the proposed technique induces very small hardware overheads. The extra hardware includes just 15 bits per cache line and a 32-bit comparator for each redundancy subline pair. It is expected that the proposed technique will possibly introduce higher miss rates caused by the wrong prediction of soft redundancy. But on the other hand, the proposed technique only fetches useful data when possible, thereby reducing the memory bandwidth requirement. As proved in many sub-blocked cache designs [5] which feature a similar cache structure, the reduction in bandwidth requirement can offset the increase in miss rate, leading to an overall performance improvement. In comparison with the sub-blocked cache design with the same sub-block number, our technique achieves an average of 52% reduction in miss rate (see Fig. 3(a)). The bandwidth requirement in terms of total fetched words is also significantly reduced, by 48.3% on average as compared to the traditional cache (see Fig. 3(b)).
Additional power consumption results from the error control operations such as duplicating data and error checking. Our future work will be directed to the tradeoffs between error tolerance and power consumption. Also note that the efficiency of the history-based redundancy allocation determines the error tolerance, performance, and bandwidth requirement. Future work will study how to efficiently exploit soft redundancy in applications that have different memory access spatial locality along with caches that have different cache line sizes. A further improvement of the proposed error-control technique could be a combination of the soft redundancy and error checking codes, thereby providing error checking to cover all the data and meanwhile improving error detection.
CONCLUSIONS
This paper presents a new error-control technique referred to as the soft redundancy for efficient memory access and effective error protection. Statistical analysis reveals about 10X improvement in error detection over the existing error-control techniques. Simulation results demonstrate a 73.7% average error protection ratio on the 23 benchmarks, with average reductions of 52% and 48.3% in memory miss rate and bandwidth requirement, respectively, as compared to the existing techniques. Future work is being directed towards exploiting soft redundancy with thread information for multithreaded computing.
