4 research outputs found
GPU NTC Process Variation Compensation with Voltage Stacking
Near-threshold computing (NTC) has the potential to significantly improve efficiency in high-throughput architectures, such as general-purpose computing on graphics processing units (GPGPU). Nevertheless, NTC is more sensitive to process variation (PV), and it complicates power delivery. We propose GPU stacking, a novel method based on voltage stacking, to manage the effects of PV and improve power delivery simultaneously. To evaluate our methodology, we first explore the design space of GPGPUs in the NTC regime to find a suitable baseline configuration and then apply GPU stacking to mitigate the effects of PV. Compared with an equivalent NTC GPGPU without PV management, we achieve 37% more performance on average. When considering high production volume, our approach shifts all the chips closer to the nominal non-PV case, delivering on average (across chips) ~80% of the performance of a nominal NTC GPGPU, whereas without our technique chips would reach ~50% of the nominal performance. We also show that our approach can be applied on top of multifrequency domain designs, improving the overall performance.
Project of a quantum coprocessor for cryptographic algorithms optimization.
The discovery of Shor's algorithm, which factors integers in polynomial time, motivated efforts toward the implementation of a quantum computer. Such a computer is capable of breaking the main public-key cryptosystems in use today (RSA and those based on elliptic curves). These cryptosystems provide a set of security services, such as data confidentiality and integrity and source authentication, and also enable the distribution of a symmetric session key. Breaking them requires a large quantum computer (2000 qubits). Nevertheless, cryptographers have started to look for alternatives. The first answers came from quantum mechanics itself. Despite some interesting properties found in quantum cryptography, a complete cryptosystem seems unattainable, mainly because of digital signatures, which are essential for authentication.
Cryptosystems based on purely classical problems that are believed to be intractable for quantum computers, called post-quantum cryptosystems, were then proposed. These systems still lack practicality, either because of key size or processing time. Among the post-quantum cryptosystems, especially the code-based ones, the McEliece and Niederreiter cryptosystems stand out. By themselves, neither provides digital signatures, but CFS signatures have been proposed as a complement to them. Even though general-purpose quantum computers are still far from reality, it is possible to imagine a small, dedicated quantum circuit. The improvement it brings could make the difference needed to make those signatures practical in a truly post-quantum scenario. In this work, a hybrid quantum/classical architecture is proposed to accelerate post-quantum cryptographic algorithms. Two quantum coprocessors implementing Grover search are proposed: one to assist the decoding of Goppa codes, in the context of the McEliece and Niederreiter cryptosystems; another to assist the search for decodable syndromes, in the context of CFS digital signatures. The results show that, in some cases, the use of a quantum coprocessor allows a reduction of up to 99.7% in key size and of up to 76.2% in processing time. Because it is a specific circuit performing a well-defined function, it can be kept compact (300 qubits, depending on what is accelerated), which additionally shows that, if quantum computers come into existence, they will make post-quantum cryptosystems practical before breaking the current cryptosystems. Additionally, some implementation technologies for quantum computers are studied, in particular linear optics and silicon-based technologies. This study aims to evaluate the feasibility of those technologies as potential candidates for the construction of a complete, personal quantum computer.
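The abstract above does not describe the coprocessor circuits themselves, so the following is only a minimal sketch of textbook Grover search, simulated classically on a list of amplitudes. It assumes a single marked item, and the function name `grover_search` and the example sizes are hypothetical, chosen for illustration. It shows why the search takes roughly (pi/4) * sqrt(N) oracle calls, where an exhaustive classical search needs on the order of N.

```python
import math

def grover_search(n_items, marked):
    """Classically simulate Grover search with one marked index (illustrative only)."""
    # Start in the uniform superposition over all n_items basis states.
    amp = [1.0 / math.sqrt(n_items)] * n_items
    # Near-optimal number of iterations for a single marked item.
    iterations = int(math.floor(math.pi / 4 * math.sqrt(n_items)))
    for _ in range(iterations):
        # Oracle: flip the sign of the marked item's amplitude.
        amp[marked] = -amp[marked]
        # Diffusion: reflect every amplitude about the mean amplitude.
        mean = sum(amp) / n_items
        amp = [2 * mean - a for a in amp]
    # Return the iteration count and the probability of measuring the marked item.
    return iterations, amp[marked] ** 2

# Hypothetical example: 1024 candidates, item 123 is the one being sought.
iterations, p_success = grover_search(1024, marked=123)
print(iterations, p_success)
```

With 1024 candidates the loop runs 25 times, compared with about 512 expected probes for a classical exhaustive search, which illustrates the kind of quadratic speedup a Grover-based coprocessor would rely on.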
Improving the Productivity of Hardware Design
Current hardware development techniques contrast with the agile methods that became popular in modern software development. This gap was mitigated by technology scaling, when performance gains in every generation relied mostly on transistor shrinking. However, with the end of Dennard scaling, the limitations of multicore design, and the emergence of hardware accelerators as an alternative to improve performance, hardware design has become an important bottleneck for chip developers. This is particularly important as application domain experts, who are not hardware designers, turn to hardware accelerators to make new technologies viable. In this dissertation, I discuss efforts to improve hardware design productivity: improving pipeline design and reducing synthesis runtime. Pipeline configuration is typically set very early in the design phase, which makes changes costly. I proposed Fluid Pipelines, a novel design style that allows for changes in the number of pipeline stages late in the design cycle. To accurately evaluate the impact of pipeline changes, a designer needs to wait for synthesis results. I also proposed LiveSynth and SMatch, two incremental techniques that reuse existing synthesis results to drastically reduce synthesis time. Combined with work from others, I expect these techniques to ease design overhead and improve the adoption of domain-specific hardware.