7 research outputs found

    An Experimental Evaluation of Machine Learning Training on a Real Processing-in-Memory System

    Full text link
    Training machine learning (ML) algorithms is a computationally intensive process, which is frequently memory-bound due to repeatedly accessing large training datasets. As a result, processor-centric systems (e.g., CPU, GPU) suffer from costly data movement between memory units and processing units, which consumes large amounts of energy and execution cycles. Memory-centric computing systems, i.e., with processing-in-memory (PIM) capabilities, can alleviate this data movement bottleneck. Our goal is to understand the potential of modern general-purpose PIM architectures to accelerate ML training. To do so, we (1) implement several representative classic ML algorithms (namely, linear regression, logistic regression, decision tree, and K-Means clustering) on a real-world general-purpose PIM architecture, (2) rigorously evaluate and characterize them in terms of accuracy, performance and scaling, and (3) compare them to their counterpart implementations on CPU and GPU. Our evaluation on a real memory-centric computing system with more than 2500 PIM cores shows that general-purpose PIM architectures can greatly accelerate memory-bound ML workloads when the necessary operations and datatypes are natively supported by PIM hardware. For example, our PIM implementation of decision tree is 27× faster than a state-of-the-art CPU version on an 8-core Intel Xeon, and 1.34× faster than a state-of-the-art GPU version on an NVIDIA A100. Our K-Means clustering on PIM is 2.8× and 3.2× faster than state-of-the-art CPU and GPU versions, respectively. To our knowledge, our work is the first to evaluate ML training on a real-world PIM architecture. We conclude with key observations, takeaways, and recommendations that can inspire users of ML workloads, programmers of PIM architectures, and hardware designers and architects of future memory-centric computing systems.
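    One of the workloads named above, K-Means clustering, illustrates why such training is memory-bound: every iteration streams over the entire dataset twice (assignment, then centroid update). A minimal sketch of the textbook algorithm (Lloyd's iteration) on 1-D data follows; this is an illustration of the access pattern, not the paper's PIM implementation.

```python
import random

def kmeans(points, k, iters=10, seed=0):
    """Plain K-Means (Lloyd's algorithm) on 1-D feature values.

    Each iteration re-reads the full dataset to assign points and
    recompute centroids -- the repeated-dataset access pattern that
    makes this workload memory-bound on processor-centric systems.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize from the data
    for _ in range(iters):
        # Assignment step: full pass over the training data.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: (p - centroids[c]) ** 2)
            clusters[j].append(p)
        # Update step: centroid = mean of its cluster (keep old
        # centroid if a cluster ended up empty).
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return sorted(centroids)
```

    On two well-separated groups the centroids converge to the group means, e.g. `kmeans([1.0, 1.1, 0.9, 10.0, 10.1, 9.9], 2)` yields centroids near 1.0 and 10.0.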

    Full-band simulation of quantum transport in advanced nanodevices

    No full text
    The semiconductor industry, in its continued effort to scale down nanoscale components, needs to predict the physical properties of future devices. As device sizes shrink, the currently prevalent semi-classical models break down, because quantum effects that are usually negligible in larger silicon devices become relevant in smaller and/or III-V based semiconductor devices. Modeling and simulation tools should therefore adequately describe the technological options currently under investigation, and full quantum simulations are necessary for the development of modern field-effect transistors. The purpose of this PhD thesis is to develop tools suitable for such simulations and to use them to study some of the most relevant design options for transistor technology. We used the Non-Equilibrium Green's Functions formalism to simulate charge-carrier transport and investigate field-effect transistors. The semiconductor band structures were calculated within a continuous k·p formalism, but we also developed an atomistic effective pseudopotential method to perform full-band simulations with a variety of ingredients, such as arbitrary crystal orientation, surface roughness, and arbitrary alloy composition in the transistor channel. This pseudopotential method provides accurate results for a wider array of configurations with a smaller parametrization effort than the k·p formalism. We used these simulation tools to evaluate the transport properties of silicon- and InAs-based FinFETs, focusing on the supply-voltage scalability of III-V based devices compared to their silicon counterparts. In particular, the feasibility of obtaining large on-current values in III-V devices is discussed. We then applied the formalism to III-V based gate-all-around (GAA) nanowire tunnel-FETs (TFETs). Tunnel-FETs are a promising architecture for future transistors, facing optimization and performance challenges.
    We aimed at benchmarking the effect of technological boosters on the performance of TFETs, namely strain engineering and III-V heterojunctions. We have shown that these boosters allow TFETs to theoretically outperform standard MOSFET technology, but that strain engineering induces undesirable drawbacks. In order to design high-performance TFETs without the use of strain, we finally introduced novel design options exploiting a molar-fraction grading of a ternary alloy or, alternatively, a quantum well in the source region. These device configurations dramatically change the density of states of the TFET at the source/channel junction and are therefore able to improve the electrical performance of TFETs with respect to conventional MOSFETs.

    Exploiting Hetero-Junctions to Improve the Performance of III–V Nanowire Tunnel-FETs

    No full text
    This paper presents full-quantum 3-D simulations predicting the electrical performance of nanowire tunnel-FETs based on III-V hetero-junctions. Our calculations exploit an eight-band k·p Hamiltonian within the nonequilibrium Green's functions formalism and include phonon scattering. It is shown that the on-current of GaSb/InAs hetero-junction tunnel-FETs is limited by quantum confinement effects on the bandstructure induced by the small nanowire diameter necessary to preserve an optimal electrostatic integrity at short gate lengths. To circumvent this problem, additional on-current improvements with no substantial subthreshold swing degradation can be achieved by engineering the source region through the insertion of an InAs/GaSb/InAs quantum well along the transport direction. Such a design option is predicted to provide on/off-current ratios larger than 10^7 even at V_DD = 300 mV.

    PenMRT: A PENELOPE based high-resolution dose calculation engine for microbeam radiation therapy

    No full text
    In radiotherapy, the compromise between increased tumor control and reduced side effects is quantified by the therapeutic index. To increase the therapeutic index, the use of synchrotron-generated x-rays spatially fractionated into microbeams is being studied. Treatment in microbeam radiation therapy (MRT) is performed via an array of intense parallel microbeams (25-50 μm wide beams replicated with a pitch of 200-400 μm) at high dose rate, to benefit from improved healthy-tissue sparing due to the dose-volume effect and the FLASH effect. The promising preclinical results of MRT [1] encourage its clinical transfer. A safe clinical transfer of MRT needs an adequate treatment planning system (TPS). The calculation core of this TPS must be able to accurately calculate the dose in a human or animal patient while taking into account the specificities of MRT beams (high dose gradients, spatial fractionation and polarization effects). The most advanced dose calculation engine for MRT is a hybrid algorithm which inherits photon transport from the Monte-Carlo (MC) method and electron transport from convolution-based methods [2]. This algorithm is remarkably fast but limited to a macroscopic rendering of the dose, without accounting for the complexity of dose distribution in multidirectional treatments. To overcome the limitations of existing calculation methods, a multi-scale full MC dose calculation engine called penMRT has been developed and benchmarked against an already validated main program in the PENELOPE code package.

    A high resolution dose calculation engine for x‐ray microbeams radiation therapy

    No full text
    Background: Microbeam radiation therapy (MRT) is a treatment modality based on spatial fractionation of synchrotron-generated x-rays into parallel, high-dose microbeams of a few microns width. MRT is still an under-development radiosurgery technique whose promising preclinical results on brain tumors and epilepsy encourage its clinical transfer.
    Purpose: A safe clinical transfer of MRT needs a specific treatment planning system (TPS) that provides accurate dose calculations in human patients, taking into account the properties of MRT beams (high dose gradients, spatial fractionation, polarization effects). So far, the most advanced MRT treatment planning system, based on a hybrid dose calculation algorithm, is limited to a macroscopic rendering of the dose and does not account for the complex dose distribution inherent to MRT delivered as conformal irradiations with multiple incidences. To overcome these limitations, a multi-scale full Monte-Carlo calculation engine called penMRT has been developed and benchmarked against two general-purpose Monte Carlo codes: penmain, based on PENELOPE, and Gate, based on Geant4.
    Methods: PenMRT is based on the PENELOPE (2018) Monte Carlo (MC) code, modified to take into account the voxelized geometry of the patients (CT scans) and offering an adaptive micrometric dose calculation grid independent of the CT size, location and orientation. The implementation of dynamic memory allocation in penMRT makes simulations feasible with a very large number of dose scoring bins. The possibility of using a source-replication approach to simulate arrays of microbeams, and parallelization using OpenMPI, have been added to penMRT to increase the calculation speed for clinical usage. This engine can be implemented in a TPS as a dose calculation core.
    Results: The performance tests highlight the reliability of penMRT for complex irradiation conditions in MRT. The benchmarking against a standard PENELOPE code did not show any significant difference for calculations in centimetric beams, for a single microbeam, or for a microbeam array. The comparisons between penMRT and Gate, as an independent MC code, did not show any difference along the beam paths, whereas in valley regions relative differences between the two codes range from 1% to 7.5%, probably due to differences in the physics lists used by the two codes. The reliability of the source-replication approach has also been tested and validated, with an underestimation of no more than 0.6% in low-dose areas.
    Conclusions: Good agreement (relative differences between 0 and 8%) was found when comparing peak-to-valley dose ratio (PVDR) values calculated with penMRT, for irradiations with a full microbeam array, against values from the literature. The high-resolution dose maps obtained with penMRT are used to extract differential and cumulative dose-volume histograms (DVHs) and to analyze treatment plans with much finer metrics regarding the irradiation complexity. To our knowledge, these are the first high-resolution dose maps and associated DVHs ever obtained for cross-fired microbeam irradiation, bringing a significant added value to the field of treatment planning in spatially fractionated radiation therapy.
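    The peak-to-valley dose ratio (PVDR) used in the validation above is, conceptually, a simple figure of merit: the mean dose at the microbeam peaks divided by the mean dose in the valleys between them. A hypothetical sketch on a toy lateral dose profile (the function, indices and dose values are illustrative and not taken from the paper):

```python
def pvdr(dose_profile, peak_idx, valley_idx):
    """Peak-to-valley dose ratio of a lateral dose profile.

    dose_profile: dose samples across the microbeam array.
    peak_idx / valley_idx: sample indices at microbeam centers and
    at the mid-points between adjacent microbeams, respectively.
    """
    peak = sum(dose_profile[i] for i in peak_idx) / len(peak_idx)
    valley = sum(dose_profile[i] for i in valley_idx) / len(valley_idx)
    return peak / valley

# Toy profile: two ~100 Gy microbeam peaks separated by ~2.5 Gy valleys.
profile = [100.0, 2.5, 98.0, 2.3]
ratio = pvdr(profile, peak_idx=[0, 2], valley_idx=[1, 3])
```

    Here the mean peak dose is 99 Gy and the mean valley dose is 2.4 Gy, giving a PVDR of 41.25; real MRT profiles are computed on micrometric dose grids such as those penMRT produces.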
