24 research outputs found

    Speculative Staging for Interpreter Optimization

    Interpreters have a bad reputation for having lower performance than just-in-time compilers. We present a new way of building high-performance interpreters that is particularly effective for executing dynamically typed programming languages. The key idea is to combine speculative staging of optimized interpreter instructions with a novel technique of incrementally and iteratively concerting them at run-time. This paper introduces the concepts behind deriving optimized instructions from existing interpreter instructions---incrementally peeling off layers of complexity. When compiling the interpreter, these optimized derivatives are compiled along with the original interpreter instructions. Therefore, our technique is portable by construction, since it leverages the existing compiler's backend. At run-time we use instruction substitution, from the interpreter's original and expensive instructions to optimized instruction derivatives, to speed up execution. Our technique unites high performance with the simplicity and portability of interpreters---we report that our optimization makes the CPython interpreter up to more than four times faster; our interpreter closes the gap with, and sometimes even outperforms, PyPy's just-in-time compiler.
    Comment: 16 pages, 4 figures, 3 tables. Uses CPython 3.2.3 and PyPy 1.
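    The run-time instruction substitution described in the abstract can be illustrated with a toy stack interpreter in which a generic ADD instruction rewrites itself into a type-specialized derivative once it observes its operands. This is only an illustrative sketch of the general mechanism (often called quickening); all names are hypothetical and none of this is the paper's actual implementation:

```python
# Toy interpreter illustrating run-time instruction substitution:
# a generic instruction replaces itself with a cheaper, type-specialized
# derivative after observing its operands (and substitutes itself back
# if the speculation later fails).

def op_add_generic(frame, pc):
    a, b = frame.stack.pop(), frame.stack.pop()
    # Speculate: both operands were ints, so install the specialized
    # derivative at this bytecode position for future executions.
    if isinstance(a, int) and isinstance(b, int):
        frame.code[pc] = op_add_int
    frame.stack.append(b + a)

def op_add_int(frame, pc):
    # Optimized derivative: assumes int operands; deoptimizes otherwise.
    a, b = frame.stack.pop(), frame.stack.pop()
    if not (isinstance(a, int) and isinstance(b, int)):
        frame.code[pc] = op_add_generic   # speculation failed
    frame.stack.append(b + a)

class Frame:
    def __init__(self, code, stack):
        self.code, self.stack = code, stack

def run(frame):
    for pc, op in enumerate(frame.code):
        op(frame, pc)
    return frame.stack[-1]

frame = Frame([op_add_generic], [1, 2])
run(frame)                          # first run observes two ints...
assert frame.code[0] is op_add_int  # ...and installs the derivative
```

    The substitution happens in place in the code array, so later executions of the same bytecode position dispatch directly to the cheap derivative, which mirrors how the paper's optimized instruction derivatives replace the original expensive instructions at run-time.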

    List Processing in Real Time on a Serial Computer

    Key Words and Phrases: real-time, compacting, garbage collection, list processing, virtual memory, file or database management, storage management, storage allocation, LISP, CDR-coding, reference counting. CR Categories: 3.50, 3.60, 3.73, 3.80, 4.13, 4.22, 4.32, 4.33, 4.35, 4.49.

    This report describes research done at the Artificial Intelligence Laboratory of the Massachusetts Institute of Technology. Support for the laboratory's artificial intelligence research is provided in part by the Advanced Research Projects Agency of the Department of Defense under Office of Naval Research contract N00014-75-C-0522.

    A real-time list processing system is one in which the time required by each elementary list operation (CONS, CAR, CDR, RPLACA, RPLACD, EQ, and ATOM in LISP) is bounded by a (small) constant. Classical list processing systems such as LISP do not have this property because a call to CONS may invoke the garbage collector, which requires time proportional to the number of accessible cells to finish. The space requirement of a classical LISP system with N accessible cells under equilibrium conditions is (1.5+μ)N or (1+μ)N, depending upon whether a stack is required for the garbage collector, where μ>0 is typically less than 2. A list processing system is presented which: 1) is real-time--i.e. T(CONS) is bounded by a constant independent of the number of cells in use; 2) requires space (2+2μ)N, i.e. not more than twice that of a classical system; 3) runs on a serial computer without a time-sharing clock; 4) handles directed cycles in the data structures; 5) is fast--the average time for each operation is about the same as with normal garbage collection; 6) compacts--minimizes the working set; 7) keeps the free pool in one contiguous block--objects of nonuniform size pose no problem; 8) uses one-phase incremental collection--no separate mark, sweep, relocate phases; 9) requires no garbage collector stack; 10) requires no "mark bits", per se; 11) is simple--suitable for microcoded implementation. Extensions of the system to handle a user program stack, compact list representation ("CDR-coding"), arrays of non-uniform size, and hash linking are discussed. CDR-coding is shown to reduce memory requirements for N LISP cells to ≈(1+μ)N. Our system is also compared with another approach to the real-time storage management problem, reference counting, and reference counting is shown to be neither competitive with our system when speed of allocation is critical, nor compatible, in the sense that a system with both forms of garbage collection is worse than our pure one.

    MIT Artificial Intelligence Laboratory; Department of Defense Advanced Research Projects Agency
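    The core of the real-time property is that a constant amount of collector work piggybacks on every CONS. Baker's actual system is an incremental copying collector with a read barrier; the sketch below deliberately substitutes a much simpler incremental mark phase (sweeping omitted, names invented) purely to show how bounding the per-allocation work keeps T(CONS) constant regardless of heap size:

```python
# Toy model of real-time collection: every cons() performs at most K
# steps of tracing work, so allocation time is bounded by a constant
# independent of the number of live cells. (Baker's real system is an
# incremental *copying* collector with a read barrier; this sketch uses
# an incremental mark phase instead and omits sweeping.)

K = 3  # collector steps charged to each allocation (the constant bound)

class Cell:
    def __init__(self, car=None, cdr=None):
        self.car, self.cdr, self.marked = car, cdr, False

class Heap:
    def __init__(self):
        self.cells, self.roots, self.gray = [], [], []

    def start_mark_cycle(self):
        for c in self.cells:
            c.marked = False
        self.gray = list(self.roots)
        for r in self.gray:
            r.marked = True

    def mark_step(self):
        # One bounded unit of tracing: scan a single gray cell.
        if self.gray:
            cell = self.gray.pop()
            for child in (cell.car, cell.cdr):
                if isinstance(child, Cell) and not child.marked:
                    child.marked = True
                    self.gray.append(child)

    def cons(self, car, cdr):
        for _ in range(K):          # constant-bounded GC work per CONS
            self.mark_step()
        cell = Cell(car, cdr)
        self.cells.append(cell)
        return cell
```

    Because the mutator funds the collector a few steps at a time, the mark phase finishes over the course of many allocations instead of in one long proportional-time pause.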

    Garbage Collection Algorithms

    This thesis focuses on an implementation of automatic memory management in the C programming language. The mark-sweep method was modified for use in an uncooperative programming language, which does not share data-type information about the memory slots accessible by the mutator. Due to this fact, decisions on pointer identity are conservative, which guarantees safe collector operation: if a value looks sufficiently like a pointer, it is considered a pointer (although it might not actually be one). Mark bits were moved from objects' headers to bitmaps stored in a separate part of memory, to prevent accidental writes to the user's data by the collector. Finally, the usage of the garbage collector was evaluated in practice.
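    The two mechanisms the abstract describes, the conservative "looks like a pointer" test and mark bits held in a side bitmap rather than in object headers, can be sketched as follows. The heap base, size, and alignment are made-up constants for illustration, not the thesis's values:

```python
# Sketch of conservative pointer identification plus a side mark bitmap.
# (1) Any word whose value falls inside the heap range and is properly
#     aligned is conservatively treated as a pointer.
# (2) Mark bits live in a bitmap outside the heap, so marking never
#     writes into the user's objects.

HEAP_BASE, HEAP_SIZE, ALIGN = 0x10000, 0x8000, 8   # illustrative values

mark_bitmap = bytearray(HEAP_SIZE // ALIGN // 8)   # one bit per slot

def looks_like_pointer(word):
    # Conservative: may claim "pointer" for an integer that merely looks
    # like one, but never misses a genuine in-heap pointer.
    return (HEAP_BASE <= word < HEAP_BASE + HEAP_SIZE
            and word % ALIGN == 0)

def set_mark(addr):
    slot = (addr - HEAP_BASE) // ALIGN
    mark_bitmap[slot // 8] |= 1 << (slot % 8)

def is_marked(addr):
    slot = (addr - HEAP_BASE) // ALIGN
    return bool(mark_bitmap[slot // 8] & (1 << (slot % 8)))
```

    Keeping the bitmap in its own region also means a stray collector bug corrupts at worst the mark state, never the mutator's data.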

    Real-Time Garbage Collection Techniques

    Garbage collection refers to an automatic memory management mechanism in which a garbage collector frees the memory regions allocated by the application that the application no longer references. The central basic techniques of garbage collection are reference counting and tracing collection techniques such as mark-sweep collection and copying collection. In real-time and interactive applications, the execution pauses caused by garbage collection must not be too long. In such applications, collection cannot be implemented as a single atomic operation during which program execution is suspended. Instead, garbage collection can be directed at only a part of the program's memory, or the collection is implemented so that it proceeds concurrently with program execution. True real-time collection techniques schedule the collector's execution so that the pauses caused by collection are precisely predictable. The thesis compared Java garbage collectors under different workloads and with heaps of different sizes. The measurements examined the duration of the benchmark runs, the duration of the collection pauses, and the distribution of the pauses over the program's execution. Significant differences were found between the compared collectors. Java's newer G1 collector performs the marking phase over the whole heap concurrently, and the copying phase is directed at only a small part of the program's memory at a time. In the measurements, the G1 collector was only slightly slower than the older Parallel collector, but the G1 collector's pauses were considerably shorter. When a target duration was set for the G1 collector's pauses, the delays were at most a few tens of milliseconds. With the Shenandoah collector, also included in the comparison, which is designed to guarantee especially short pauses, the delays imposed on program execution were only a few milliseconds.

    Memory Cleanup

    The thesis presents the concept of garbage in computer science, the central concepts and basic methods of garbage collection together with their variants, and modern efficient algorithms. The focus, however, is on the achievements, research topics, and research tools of memory management research in the 2000s. These are exploited by the new CBRC garbage collection algorithm presented in the thesis. In addition, the thesis reviews the programmer's responsibilities under automatic memory management and the garbage-collection-aware facilities available to the programmer in certain programming languages and environments (Java, .NET, C++). Keywords and phrases: garbage collection, memory cleanup, memory management, algorithms, programming languages. CR classes: D.3.4, D.4.2, D.3.

    Fast conservative garbage collection


    An Examination of Deferred Reference Counting and Cycle Detection

    Object-oriented programming languages are becoming increasingly important, as are managed runtime systems. An area of importance in such systems is dynamic automatic memory management. A key function of dynamic automatic memory management is detecting and reclaiming discarded memory regions; this is also referred to as garbage collection. A significant proportion of research has been conducted in the field of memory management, and more specifically garbage collection techniques. In the past, adequate comparisons against a range of competing algorithms and implementations have often been overlooked. JMTk is a flexible memory management toolkit, written in Java, which attempts to provide a testbed for such comparisons. This thesis aims to examine the implementation of one algorithm currently available in JMTk: the deferred reference counter. Other research has shown that the reference counter in JMTk performs poorly both in throughput and responsiveness. Several aspects of the reference counter are tested, including the write barrier, allocation cost, increment and decrement processing, and cycle detection. These examinations found the bump-pointer to be 8% faster than the free-list in raw allocation. The cost of the reference-counting write barrier was determined to be 10% on the PPC architecture and 20% on the i686 architecture. Processing increments in the write barrier was found to be up to 13% faster than buffering them until collection time on a uni-processor platform. Cycle detection was identified as a key area of cost in reference counting. In order to improve the performance of the deferred reference counter and to contribute to the JMTk testbed, a new algorithm for detecting cyclic garbage was described. This algorithm is based on a mark-scan approach to cycle detection. Using this algorithm, two new cycle detectors were implemented and compared to the original trial deletion cycle detector. The semi-concurrent cycle detector had the best throughput, outperforming trial deletion by more than 25% on the javac benchmark. The non-concurrent cycle detector had poor throughput, attributed to poor triggering heuristics. Both new cycle detectors had poor pause times. Even so, the semi-concurrent cycle detector had the lowest pause times on the javac benchmark. The work presented in this thesis contributes an evaluation of components of the reference counter and a comparison between approaches to reference counting implementation. Prior to this work, the cost of the reference counter's components had not been quantified. Additionally, past work presented different approaches to reference counting implementation as a whole, instead of individual components.
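    A deferred reference-counting write barrier of the kind examined here can be sketched as follows: pointer stores log increments and decrements into buffers instead of updating counts immediately, and the buffers are processed in batch at collection time. This is an illustrative sketch, not JMTk's implementation; objects are modeled as dicts and referents as plain keys:

```python
# Sketch of a deferred reference-counting write barrier: the barrier
# buffers count adjustments at each pointer store, and the buffers are
# applied in batch at collection time.

from collections import defaultdict

refcount = defaultdict(int)
inc_buffer, dec_buffer = [], []

def write_barrier(obj, field, new_target):
    old_target = obj.get(field)
    if old_target is not None:
        dec_buffer.append(old_target)   # old referent loses a reference
    if new_target is not None:
        inc_buffer.append(new_target)   # new referent gains one
    obj[field] = new_target

def process_buffers():
    # Applied at collection time; increments are processed first, so an
    # object whose count transiently reaches zero is not reclaimed
    # prematurely.
    for t in inc_buffer:
        refcount[t] += 1
    for t in dec_buffer:
        refcount[t] -= 1
        if refcount[t] == 0:
            pass  # candidate for reclamation (and for cycle detection)
    inc_buffer.clear()
    dec_buffer.clear()
```

    Deferral trades immediacy for throughput: the per-store barrier stays short, and the batched processing at collection time is exactly the component whose cost the thesis measures against processing increments eagerly in the barrier.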

    Exploiting the Weak Generational Hypothesis for Write Reduction and Object Recycling

    Programming languages with automatic memory management are continuing to grow in popularity due to ease of programming. However, these languages tend to allocate objects excessively, leading to inefficient use of memory and large garbage collection and allocation overheads. The weak generational hypothesis notes that objects tend to die young in languages with automatic dynamic memory management. Much work has been done to optimize allocation and garbage collection algorithms based on this observation. Previous work has largely focused on developing efficient software algorithms for allocation and collection. However, much less work has studied architectural solutions. In this work, we propose and evaluate architectural support for assisting allocation and garbage collection. We first study the effects of languages with automatic memory management on the memory system. As objects often die young, it is likely many objects die while in the processor's caches. Writes of dead data back to main memory are unnecessary, as the data will never be used again. To study this, we develop and present architectural support to identify dead objects while they remain resident in cache and eliminate any unnecessary writes. We show that many writes out of the caches are unnecessary and can be avoided using our hardware additions. Next, we study the effects of using dead data in cache to assist with allocation and garbage collection. Logic is developed and presented to allow reuse of cache space found dead to satisfy future allocation requests. We show that dead cache space can be recycled at a high rate, reducing pressure on the allocator and reducing cache miss rates. However, a full implementation of our initial approach is shown to be unscalable. We propose and study limitations to our approach, trading object coverage for scalability. Third, we present a new approach for identifying objects that die young based on a limitation of our previous approach. We show this approach has much lower storage and logic requirements and is scalable, while only slightly decreasing overall object coverage.
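    The write-elimination idea can be modeled in software: a write-back cache that consults a "dead" set (which, in the hardware proposal, would be populated when the allocator frees an object whose line may still be resident) and drops the write-back of dirty lines belonging to dead objects. This is purely an illustrative model with invented structure, not the paper's hardware design:

```python
# Software model of dead-write elimination: dirty cache lines whose
# containing object has already died are dropped on eviction instead of
# being written back, since dead data can never be read again.

class Line:
    def __init__(self, addr):
        self.addr, self.dirty = addr, True

class Cache:
    def __init__(self):
        self.lines = {}        # resident lines, keyed by address
        self.dead = set()      # addresses of lines known to hold dead data
        self.writebacks = 0    # count of writes that reach main memory

    def store(self, addr):
        self.lines[addr] = Line(addr)

    def mark_dead(self, addr):
        # Signaled when the object covering addr is freed while its
        # line may still be resident in the cache.
        self.dead.add(addr)

    def evict(self, addr):
        line = self.lines.pop(addr)
        if line.dirty and addr not in self.dead:
            self.writebacks += 1   # only live data reaches memory
```

    In the same spirit, the paper's recycling extension would hand the dead line back to the allocator instead of merely suppressing the write-back.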

    A Generational and Incremental Garbage Collector Handling Cycles and Large Objects Using Delimited Frames

    In recent years, research has been conducted on several techniques related to garbage collection. Several discoveries central to copying garbage collection have been made. However, improvements are still possible. In this thesis, we introduce new techniques and new algorithms to improve garbage collection. In particular, we introduce a technique that uses delimited frames to mark and trace root pointers. This technique allows efficient computation of the root set. It reuses concepts from two existing techniques, card marking and remembered sets, and uses a bidirectional object layout to improve on these concepts by stabilizing the memory overhead and reducing the workload involved in scanning pointers. We also present an algorithm for recursively marking reachable objects without using a stack (eliminating the usual waste of memory). We adapt this algorithm to implement a depth-first copying garbage collector and improve heap locality. We improve the older-first garbage collection algorithm and its generational version by adding a marking phase that guarantees the collection of all garbage, including cyclic structures spread across several windows. Finally, we introduce a technique for managing large objects. To test our ideas, we designed and implemented, in the free Java virtual machine SableVM, a portable and extensible framework for garbage collection. Within this framework, we implemented semi-space, older-first, and generational collection algorithms. Our experiments show that the delimited-frame technique provides competitive performance on several benchmarks. They also show that, for most benchmarks, our depth-first traversal algorithm improves locality and thus increases performance. Our measurements of overall performance show that, using our techniques, a garbage collector can deliver competitive performance and surpass that of existing garbage collectors on several benchmarks. AUTHOR'S KEYWORDS: garbage collector, virtual machine, Java, SableVM
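    Marking reachable objects recursively without a stack, as this abstract describes, is classically done by pointer reversal (the Deutsch-Schorr-Waite algorithm): the link back to the parent is stored by temporarily overwriting the child pointer being followed, and restored when the traversal retreats. Whether the thesis uses exactly this scheme is not stated; below is a minimal sketch of the classical technique for binary nodes:

```python
# Pointer-reversal (Deutsch-Schorr-Waite) marking: traverses an object
# graph with no auxiliary stack by threading parent links through the
# objects themselves, then restoring every pointer on the way back up.

class Node:
    def __init__(self, left=None, right=None):
        self.left, self.right = left, right
        self.marked = False
        self.state = 0   # 0: unvisited, 1: left reversed, 2: right done

def mark(root):
    if root is None or root.marked:
        return
    root.marked = True
    prev, cur = None, root
    while cur is not None:
        if cur.state == 0:                 # try to descend left
            cur.state = 1
            nxt = cur.left
            if nxt is not None and not nxt.marked:
                nxt.marked = True
                cur.left = prev            # reverse pointer to parent
                prev, cur = cur, nxt
        elif cur.state == 1:               # try to descend right
            cur.state = 2
            nxt = cur.right
            if nxt is not None and not nxt.marked:
                nxt.marked = True
                cur.right = prev
                prev, cur = cur, nxt
        else:                              # both children done: retreat
            cur.state = 0                  # reset scratch state
            if prev is None:
                cur = None                 # back at the root: finished
            elif prev.state == 1:          # parent's left was reversed
                prev.left, cur, prev = cur, prev, prev.left
            else:                          # parent's right was reversed
                prev.right, cur, prev = cur, prev, prev.right
```

    The traversal uses only a constant number of extra words (plus the small per-object state field), which is what eliminates the marking stack and its worst-case memory overhead.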