75 research outputs found

    The AMD Rome Memory Barrier

    With the rapid growth of AMD as a competitor in the CPU industry, it is imperative that high-performance and architectural engineers analyze new AMD CPUs. By understanding new and unfamiliar architectures, engineers are able to adapt their algorithms to fully utilize new hardware. Furthermore, engineers are able to anticipate the limitations of an architecture and determine when an alternate platform is desirable for a particular workload. This paper presents results which show that the performance of the AMD "Rome" architecture suffers once an application's memory bandwidth exceeds 37.5 GiB/s for integer-heavy applications, or 100 GiB/s for floating-point-heavy workloads. Strong positive correlations between memory bandwidth and CPI are presented, as well as strong positive correlations between increased memory load and time-to-completion of benchmarks from the SPEC CPU2017 benchmark suites.
    Comment: Very, very early draft for IEEE SoutheastCon 2017, 9 pages (need to get down to 8), 6 figures, 7 tables

    AMD Prefetch Attacks through Power and Time

    Modern operating systems fundamentally rely on the strict isolation of user applications from the kernel. This isolation is enforced by the hardware. On Intel CPUs, this isolation has been shown to be imperfect, for instance, with the prefetch side channel. With Meltdown, it was even completely circumvented. Both the prefetch side channel and Meltdown have been mitigated with the same software patch on Intel. As AMD is believed not to be vulnerable to these attacks, this software patch is not active by default on AMD CPUs. In this paper, we show that the isolation on AMD CPUs suffers from the same type of side-channel leakage. We discover timing and power variations of the prefetch instruction that can be observed from unprivileged user space. In contrast to previous work on prefetch attacks on Intel, we show that the prefetch instruction on AMD leaks even more information. We demonstrate the significance of this side channel with multiple case studies in real-world scenarios. We demonstrate the first microarchitectural break of (fine-grained) KASLR on AMD CPUs. We monitor kernel activity, e.g., whether audio is played over Bluetooth, and establish a covert channel. Finally, we even leak kernel memory at 52.85 B/s with simple Spectre gadgets in the Linux kernel. We show that stronger page table isolation should be activated on AMD CPUs by default to mitigate our presented attacks successfully.

    faulTPM: Exposing AMD fTPMs' Deepest Secrets

    Trusted Platform Modules constitute an integral building block of modern security features. Moreover, as Windows 11 made a TPM 2.0 mandatory, they are subject to an ever-increasing academic challenge. While discrete TPMs (dTPMs), as found in higher-end systems, have been susceptible to attacks on their exposed communication interface, more common firmware TPMs (fTPMs) are immune to this attack vector as they do not communicate with the CPU via an exposed bus. In this paper, we analyze a new class of attacks against fTPMs: attacking their Trusted Execution Environment (TEE) can lead to a full TPM state compromise. We experimentally verify this attack by compromising the AMD Secure Processor, which constitutes the TEE for AMD's fTPMs. In contrast to previous dTPM sniffing attacks, this vulnerability exposes the complete internal TPM state of the fTPM. It allows us to extract any cryptographic material stored or sealed by the fTPM regardless of authentication mechanisms such as Platform Configuration Register validation or passphrases with anti-hammering protection. First, we demonstrate the impact of our findings by enabling, to the best of our knowledge, the first attack against Full Disk Encryption (FDE) solutions backed by an fTPM. Furthermore, we lay out how any application relying solely on the security properties of the TPM, like BitLocker's TPM-only protector, can be defeated by an attacker with 2-3 hours of physical access to the target device. Lastly, we analyze the impact of our attack on FDE solutions protected by a TPM and PIN strategy. While a naive implementation also leaves the disk completely unprotected, we find that BitLocker's FDE implementation withholds some protection depending on the complexity of the used PIN. Our results show that when an fTPM's internal state is compromised, a TPM and PIN strategy for FDE is less secure than TPM-less protection with a reasonable passphrase.
    Comment: *Both authors contributed equally. We publish all code necessary to mount the attack under https://github.com/PSPReverse/ftpm_attack. The repository further includes several intermediate results, e.g., flash memory dumps, to retrace the attack process without possessing the target boards and required hardware tools

    Equity research – Advanced Micro Devices (AMD)

    Advanced Micro Devices (AMD) is a Silicon Valley-born semiconductor company that had its IPO in 1972. Despite its rough past, in recent years AMD has been able to gain a competitive advantage over its industry peers, as well as market share, due to its continuously innovative product lines of EPYC and Ryzen processors and Radeon graphics. Furthermore, recently established partnerships and launched products give the company good growth prospects in the near future. For these reasons an Equity Research Report was conducted on AMD in order to arrive at a fair value of the stock. AMD's valuation in the report was assessed through the Discounted Cash Flow (DCF) method, considering various factors that could affect the company's financial statement line items. Among those factors are the Average Selling Price of its products, Unit Shipments, Gross Margin, investments in innovation (R&D), and several partnerships, along with the company's Market Share. A scenario analysis of the effects of the Trade War tariffs on AMD's price was also conducted. The price target for December 31, 2020 was $52.75, leading to a BUY investment recommendation under normal market conditions.

    Software and hardware system for linear algebra problems

    This bachelor's degree project is devoted to the development of a software and hardware system for linear algebra problems. The system is built around the 12-core AMD Ryzen 9 3900X processor, which provides the level of computing power needed to work with large vectors and matrices. The programs for the system are written in the C programming language using PThreads. This approach allows the developed programs to run on UNIX-like operating systems and to use the computing resources efficiently.

    Energy Concerns with HPC Systems and Applications

    For various reasons, including those related to climate change, energy has become a critical concern in all relevant activities and technical designs. For the specific case of computer activities, the problem is exacerbated by the emergence and pervasiveness of so-called intelligent devices. From the application side, we point out the special topic of Artificial Intelligence, which clearly needs efficient computing support in order to succeed in its purpose of being a ubiquitous assistant. There are mainly two contexts where energy is one of the top-priority concerns: embedded computing and supercomputing. For the former, power consumption is critical because the amount of energy that is available to the devices is limited. For the latter, the heat dissipated is a serious source of failure, and the financial cost related to energy is likely to be a significant part of the maintenance budget. On a single computer, the problem is commonly considered through the electrical power consumption. In this paper, written in the form of a survey, we depict the landscape of energy concerns in computer activities, from both the hardware and the software standpoints.
    Comment: 20 pages

    Balancer: bandwidth allocation and cache partitioning for multicore processors

    The management of shared resources in multicore processors is an open problem due to the continuous evolution of these systems. The trend toward increasing the number of cores and organizing them in clusters sets out new challenges not considered in previous works. In this paper, we characterize the use of the shared cache and memory bandwidth of an AMD Rome processor executing multiprogrammed workloads and propose several mechanisms that control the use of these resources to improve the system performance and fairness. Our control mechanisms require no hardware or operating system modifications. We evaluate Balancer on a real system running SPEC CPU2006 and CPU2017 applications. Balancer tuned for performance shows an average increase of 7.1% in system performance and an unfairness reduction of 18.6% with respect to a system without any control mechanism. Balancer tuned for fairness decreases the performance by 1.3% in exchange for a 64.5% reduction of unfairness

    Precise event sampling on AMD versus Intel: quantitative and qualitative comparison

    Precise event sampling is a profiling feature in commodity processors that can sample hardware events and accurately locate the instructions that trigger the events. This feature has been used in a large number of tools to detect application performance issues. Although precise event sampling is readily supported in modern multicore architectures, vendor implementations exhibit great differences that affect their accuracy, stability, overhead, and functionality. This work presents the most comprehensive study to date on benchmarking the event sampling features of Intel PEBS and AMD IBS and performs an in-depth analysis of key differences through a series of microbenchmarks. Our qualitative and quantitative analysis shows that PEBS allows finer-grained and more accurate sampling of hardware events, while IBS offers a richer set of information at each sample, though it suffers from lower accuracy and stability. Moreover, OS signal delivery, which is a common method used by profiling software, adds significant time overhead on top of the overhead incurred by the hardware mechanisms in both PEBS and IBS. We also found that both PEBS and IBS are biased when sampling events across multiple different locations in the code. Lastly, we demonstrate how our findings on microbenchmarks under different thread counts hold for a full-fledged profiling tool that runs on state-of-the-art Intel and AMD machines. Overall, our detailed comparisons serve as a reference and provide useful information for hardware designers and profiling tool developers.

    A Prototype Adaptive Optics Real-Time Control Architecture for Extremely Large Telescopes using Many-Core CPUs

    A proposed solution to the increased computational demands of Extremely Large Telescope (ELT) scale adaptive optics (AO) real-time control (RTC) using many-core CPU technologies is presented. Due to the nearly 4x increase in primary aperture diameter, the next generation of 30-40m class ELTs will require much greater computational power than the current 10m class of telescopes. The computational demands of AO RTC scale to the fourth power of telescope diameter to maintain the spatial sampling required for adequate atmospheric correction. The Intel Xeon Phi is a standard socketed CPU processor which combines many cores with (450GB/s) on-chip high bandwidth memory, properties which are perfectly suited to the highly parallelisable and memory bandwidth intensive workloads of ELT-scale AO RTC. Performance of CPU-based RTC software is analysed and compared for the single conjugate, multi conjugate and laser tomographic types of AO operating on the Xeon Phi and other many-core CPU solutions. This report concludes with an investigation into the potential performance of the CPU-based AO RTC software for the proposed instruments of the next generation Extremely Large Telescope (ELT) and the Thirty Meter Telescope (TMT), and also for some high order AO systems at current observatories.
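
    The fourth-power scaling stated in this abstract can be made concrete with a small calculation. The sketch below is illustrative only and is not taken from the paper: the function name `rtc_compute_scale` and the 10 m reference diameter are assumptions chosen to match the telescope classes mentioned above, and the model is simply demand proportional to D^4.

```python
def rtc_compute_scale(d_new_m: float, d_ref_m: float = 10.0) -> float:
    """Relative AO RTC compute demand, assuming demand scales as diameter**4.

    d_ref_m defaults to 10 m, the current telescope class named in the
    abstract (an assumption made for this illustration).
    """
    return (d_new_m / d_ref_m) ** 4


# Compare the 30 m and 40 m ELT classes against a 10 m telescope.
for diameter in (30.0, 40.0):
    factor = rtc_compute_scale(diameter)
    print(f"{diameter:.0f} m vs 10 m: about {factor:.0f}x the compute demand")
```

    Under this assumption, a 3x to 4x increase in diameter corresponds to roughly a 81x to 256x increase in computational demand, which is why the abstract turns to many-core, high-bandwidth hardware.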