
    Efficient and Robust Simulation, Modeling and Characterization of IC Power Delivery Circuits

    As Moore's Law continues to drive IC technology, power delivery has become one of the most difficult design challenges. Two of the major components in power delivery are DC-DC converters and power distribution networks, both of which are time-consuming to simulate and characterize using traditional approaches. In this dissertation, we propose a complete set of solutions for efficiently analyzing DC-DC converters and power distribution networks by balancing efficiency and accuracy.

    To tackle the problem, we first present a novel envelope-following method based on a numerically robust time-delayed phase condition to track the envelopes of circuit states under a varying switching frequency. By adopting three fast simulation techniques, the proposed method achieves higher speedup without compromising the accuracy of the results. The robustness and efficiency of the proposed method are demonstrated using several DC-DC converter and oscillator circuits modeled with the industry-standard BSIM4 transistor models. A significant runtime speedup of up to 30X with respect to conventional transient analysis is achieved for several DC-DC converters with strongly nonlinear switching characteristics.

    We then take another approach, average modeling, to enhance the efficiency of analyzing DC-DC converters. We propose a multi-harmonic model that not only predicts the DC response but also captures harmonics of arbitrary order. The proposed full-order model retains the inductor current as a state variable and accurately captures the circuit dynamics even in the transient state. Furthermore, by continuously monitoring state variables, the model seamlessly transitions between continuous conduction mode and discontinuous conduction mode. When tested with a system decoupling technique, the proposed model obtains up to 10X runtime speedups over transistor-level simulations with a maximum output voltage error that never exceeds 4%. Based on the multi-harmonic averaged model, we further develop a small-signal model that provides a complete characterization of both DC averages and higher-order harmonic responses. The proposed model captures important high-frequency overshoots and undershoots of the converter response, which are otherwise unaccounted for by existing techniques. In two converter examples, the proposed model corrects the misleading results of existing models by providing a truthful characterization of the overall converter AC response, offering important guidance for converter design and closed-loop control.

    To address the time-consuming simulation of power distribution networks, we present a partition-based iterative method that integrates the block-Jacobi method with the support-graph method. The former is easy to parallelize but lacks direct control over the numerical properties of the produced partitions. In contrast, the latter operates on the maximum spanning tree of the circuit graph, which is optimized for fast numerical convergence, but is difficult to parallelize. In our proposed method, the circuit partitioning is guided by the maximum spanning tree of the underlying circuit graph, providing essential guidance for achieving fast convergence. The resulting block-Jacobi-like preconditioner maximizes the numerical benefit inherited from support graph theory while lending itself to straightforward parallelization as a partition-based method. Experimental results on the IBM power grid suite and synthetic power grid benchmarks show that the proposed method speeds up DC simulation by up to 11.5X over a state-of-the-art direct solver.
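    To make the last idea concrete, below is a minimal sketch of how a block-Jacobi-like preconditioner built from a node partition could plug into a conjugate-gradient solve of the power-grid DC equations G v = b. It is not the dissertation's implementation: the partition is assumed to be given (the proposed method would derive it from the maximum spanning tree of the conductance graph), and the sizes, tolerances, and helper names are illustrative.

```python
# Illustrative sketch: block-Jacobi preconditioned conjugate gradient for G v = b,
# where G is an SPD conductance matrix and `blocks` is a node partition
# (assumed given here; in the proposed method it would follow the maximum
# spanning tree of the circuit graph).
import numpy as np

def block_jacobi_pcg(G, b, blocks, tol=1e-8, max_iter=1000):
    # pre-factor each diagonal block of the preconditioner
    inv_blocks = [np.linalg.inv(G[np.ix_(idx, idx)]) for idx in blocks]

    def apply_preconditioner(r):
        z = np.zeros_like(r)
        for idx, inv in zip(blocks, inv_blocks):
            z[idx] = inv @ r[idx]
        return z

    v = np.zeros_like(b)
    r = b - G @ v
    z = apply_preconditioner(r)
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Gp = G @ p
        alpha = rz / (p @ Gp)
        v += alpha * p
        r -= alpha * Gp
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            break
        z = apply_preconditioner(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return v

if __name__ == "__main__":
    # toy diagonally dominant system split into two blocks
    n = 6
    A = (np.diag(np.full(n, 4.0))
         + np.diag(np.full(n - 1, -1.0), 1)
         + np.diag(np.full(n - 1, -1.0), -1))
    rhs = np.ones(n)
    v = block_jacobi_pcg(A, rhs, blocks=[np.arange(0, 3), np.arange(3, 6)])
    print(np.allclose(A @ v, rhs))
```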


    Where Quantum Complexity Helps Classical Complexity

    Quantum computing offers novel approaches to computational challenges of varying complexity, and adapting problem-solving strategies is crucial to harnessing its full potential. Nonetheless, there are well-defined boundaries to what quantum computing can achieve. This paper concentrates on aggregating prior research efforts dedicated to solving intricate classical computational problems through quantum computing. The objective is to systematically compile an exhaustive inventory of these solutions and to categorize a collection of demanding problems that await further exploration.

    Enhancing Program Soft Error Resilience through Algorithmic Approaches

    The rising count and shrinking feature size of transistors in modern computers are making them increasingly vulnerable to various types of soft faults. This problem is especially acute in high-performance computing (HPC) systems used for scientific computing, because these systems include many thousands of compute cores and nodes, all of which may be utilized in a single large-scale run. The increasing vulnerability of HPC applications to errors induced by soft faults is motivating extensive work on techniques to make these applications more resilient to such faults, ranging from generic techniques such as replication or checkpoint/restart to algorithm-specific error detection and tolerance techniques. Effective use of such techniques requires a detailed understanding of how a given application is affected by soft faults, to ensure that (i) efforts to improve application resilience are spent on the code regions most vulnerable to faults, (ii) the appropriate resilience technique is applied to each code region, and (iii) this understanding is obtained in an efficient manner. This thesis presents two tools: FaultTelescope helps application developers view routine and application vulnerability to soft errors, while ErrorSight helps perform modular fault-characteristics analysis for more complex applications. The thesis also illustrates how these tools can be used in the context of representative applications and kernels. In addition to providing actionable insights into application behavior, the tools automatically select the number of fault-injection experiments required to efficiently generate error profiles of an application, ensuring that the information is statistically well-grounded without performing unnecessary experiments.
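    As an illustration of the kind of statistically grounded fault-injection campaign such tools automate, the sketch below flips one random bit in a kernel's working data per run, classifies the outcome, and stops once the estimated silent-data-corruption rate has a narrow confidence interval. The kernel, thresholds, and stopping rule are illustrative assumptions, not the tools' actual machinery.

```python
# Hypothetical single-bit-flip fault-injection loop with a simple
# confidence-interval stopping rule (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

def kernel(x):
    return float(np.sum(x * x))          # stand-in for an application kernel

def run_with_bit_flip(x):
    faulty = x.copy()
    bits = faulty.view(np.uint64)        # reinterpret float64 payload as raw bits
    i = rng.integers(bits.size)
    bits[i] ^= np.uint64(1) << np.uint64(rng.integers(64))
    return kernel(faulty)

def estimate_sdc_rate(x, rel_tol=1e-6, half_width=0.02, max_runs=20000):
    golden = kernel(x)
    sdc, runs = 0, 0
    p = 0.0
    while runs < max_runs:
        runs += 1
        out = run_with_bit_flip(x)
        if not np.isfinite(out) or abs(out - golden) > rel_tol * abs(golden):
            sdc += 1                     # silent data corruption observed
        p = sdc / runs
        # stop once the 95% normal-approximation interval is narrow enough
        if runs >= 30 and 1.96 * np.sqrt(p * (1 - p) / runs) < half_width:
            break
    return p, runs

print(estimate_sdc_rate(rng.standard_normal(1024)))
```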

    Combined optimization algorithms applied to pattern classification

    Accurate classification by minimizing the error on test samples is the main goal in pattern classification. Combinatorial optimization is a well-known method for solving minimization problems; however, only a few examples of classifiers are described in the literature where combinatorial optimization is used in pattern classification. Recently, there has been growing interest in combining classifiers and improving the consensus of results for greater accuracy. In the light of the "No Free Lunch Theorems", we analyse the combination of simulated annealing, a powerful combinatorial optimization method that produces high-quality results, with the classical perceptron algorithm. This combination is called the LSA machine. Our analysis aims at finding paradigms for problem-dependent parameter settings that ensure high classification results. Our computational experiments on a large number of benchmark problems lead to results that either outperform or are at least competitive with results published in the literature. Apart from parameter settings, our analysis focuses on a difficult problem in computation theory, namely the network complexity problem. The depth-versus-size problem of neural networks is one of the hardest problems in theoretical computing, with very little progress over the past decades. In order to investigate this problem, we introduce a new recursive learning method for training hidden layers in constant-depth circuits. Our findings make contributions to (a) the field of Machine Learning, as the proposed method is applicable to training feedforward neural networks, and to (b) the field of circuit complexity, by proposing an upper bound on the number of hidden units sufficient to achieve a high classification rate. One of the major findings of our research is that the size of the network can be bounded by the input size of the problem, with an approximate upper bound of 8 + √(2^n / n) threshold gates being sufficient for a small error rate, where n := log|S_L| and S_L is the training set.
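    The sketch below gives a minimal flavour of the combination analysed here: simulated annealing searching the weight space of a single perceptron-style threshold unit for a low training error. The cooling schedule, proposal distribution, and function names are illustrative assumptions, not the LSA machine's actual configuration.

```python
# Illustrative simulated-annealing search over perceptron weights.
import numpy as np

def train_sa_perceptron(X, y, steps=5000, t0=1.0, alpha=0.999, seed=0):
    """X: (n_samples, n_features); y: labels in {-1, +1}."""
    rng = np.random.default_rng(seed)
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])    # append a bias input
    w = rng.standard_normal(Xb.shape[1])

    def errors(w):
        return int(np.sum(np.sign(Xb @ w) != y))     # misclassified samples

    e = errors(w)
    best_w, best_e, t = w.copy(), e, t0
    for _ in range(steps):
        cand = w + rng.normal(scale=0.1, size=w.shape)   # random perturbation
        ce = errors(cand)
        # accept improvements always, worse moves with Boltzmann probability
        if ce <= e or rng.random() < np.exp((e - ce) / max(t, 1e-12)):
            w, e = cand, ce
            if e < best_e:
                best_w, best_e = w.copy(), e
        t *= alpha                                       # geometric cooling
    return best_w, best_e
```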

    Apprentissage machine efficace : théorie et pratique [Efficient machine learning: theory and practice]

    Despite constant progress in terms of available computational power, memory, and amount of data, machine learning algorithms need to be efficient in how they use these resources. Although minimizing cost is an obvious major concern, another motivation is to attempt to design algorithms that can learn as efficiently as intelligent species. This thesis tackles the problem of efficient learning through several papers dealing with a wide range of machine learning algorithms: the topic is seen both from the point of view of computational efficiency (processing power and memory required by the algorithms) and of statistical efficiency (number of samples necessary to solve a given learning task). The first contribution of this thesis is in shedding light on various statistical inefficiencies in existing algorithms. Indeed, we show that decision trees do not generalize well on tasks with some particular properties (chapter 3), and that a similar flaw affects typical graph-based semi-supervised learning algorithms (chapter 5); this flaw is a form of curse of dimensionality specific to each of these algorithms. For a subclass of neural networks, called sum-product networks, we prove that using networks with a single hidden layer can be exponentially less efficient than using deep networks (chapter 4). Our analyses help better understand some inherent flaws of these algorithms and steer research towards approaches that may potentially overcome them. We also exhibit computational inefficiencies in popular graph-based semi-supervised learning algorithms (chapter 5) as well as in the learning of mixtures of Gaussians with missing data (chapter 6). In both cases we propose new algorithms that make it possible to scale to much larger datasets. The last two chapters also deal with computational efficiency, but in different ways. Chapter 7 presents a new view on the contrastive divergence algorithm (which has been used for efficient training of restricted Boltzmann machines), providing additional insight into why this algorithm has been so successful. Finally, in chapter 8 we describe an application of machine learning to video games, where computational efficiency is tied to software and hardware engineering constraints which, although often ignored in research papers, are ubiquitous in practice.
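    As a concrete illustration of the algorithm revisited in chapter 7, here is a minimal numpy sketch of one CD-1 (contrastive divergence) update for a binary restricted Boltzmann machine; the layer sizes, learning rate, and function names are illustrative assumptions rather than the thesis's experimental setup.

```python
# Illustrative CD-1 update for a binary RBM with visible bias b, hidden bias c,
# and weight matrix W of shape (n_visible, n_hidden).
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, b, c, lr=0.01):
    """One contrastive-divergence step; v0 is a batch of binary visible vectors."""
    # positive phase: hidden activations driven by the data
    ph0 = sigmoid(v0 @ W + c)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # negative phase: one Gibbs step (reconstruction of the visible layer)
    pv1 = sigmoid(h0 @ W.T + b)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + c)
    # approximate gradient: data statistics minus reconstruction statistics
    n = v0.shape[0]
    W += lr * (v0.T @ ph0 - v1.T @ ph1) / n
    b += lr * (v0 - v1).mean(axis=0)
    c += lr * (ph0 - ph1).mean(axis=0)
    return W, b, c
```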

    Quantum Algorithm Implementations for Beginners

    As quantum computers become available to the general public, the need has arisen to train a cohort of quantum programmers, many of whom have been developing classical computer programs for most of their careers. While currently available quantum computers have fewer than 100 qubits, quantum computing hardware is widely expected to grow in terms of qubit count, quality, and connectivity. This review aims to explain the principles of quantum programming, which are quite different from classical programming, with straightforward algebra that makes understanding the underlying fascinating quantum mechanical principles optional. We give an introduction to quantum computing algorithms and their implementation on real quantum hardware. We survey 20 different quantum algorithms, attempting to describe each in a succinct and self-contained fashion. We show how these algorithms can be implemented on IBM's quantum computer, and in each case we discuss the results of the implementation with respect to differences between the simulator and the actual hardware runs. This article introduces computer scientists, physicists, and engineers to quantum algorithms and provides a blueprint for their implementations.
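    To give a flavour of the "straightforward algebra" such an introduction relies on, the following self-contained numpy sketch prepares a Bell state with a Hadamard and a CNOT and prints the resulting measurement probabilities. It is an illustrative example running as plain matrix arithmetic, not code taken from the article and not an IBM hardware run.

```python
# Two-qubit Bell-state preparation via explicit gate matrices.
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)     # Hadamard gate
I = np.eye(2)
CNOT = np.array([[1, 0, 0, 0],                   # control = first qubit
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=float)

state = np.zeros(4)
state[0] = 1.0                                   # start in |00>
state = np.kron(H, I) @ state                    # Hadamard on qubit 0
state = CNOT @ state                             # entangle the two qubits

probs = np.abs(state) ** 2
for basis, p in zip(["00", "01", "10", "11"], probs):
    print(f"P(|{basis}>) = {p:.2f}")             # 0.50 for |00> and |11>
```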

    Design Techniques for Energy-Quality Scalable Digital Systems

    Energy efficiency is one of the key design goals in modern computing. Increasingly complex tasks are being executed on mobile devices and Internet of Things end-nodes, which are expected to operate for long time intervals, on the order of months or years, with the limited energy budgets provided by small form-factor batteries. Fortunately, many such tasks are error resilient, meaning that they can tolerate some relaxation in the accuracy, precision, or reliability of internal operations without a significant impact on the overall output quality. The error resilience of an application may derive from a number of factors. The processing of analog sensor inputs measuring quantities from the physical world may not always require maximum precision, as the amount of information that can be extracted is limited by the presence of external noise. Outputs destined for human consumption may also tolerate small or occasional errors, thanks to the limited capabilities of our vision and hearing systems. Finally, some computational patterns commonly found in domains such as statistics, machine learning, and operational research naturally tend to reduce or eliminate errors.

    Energy-Quality (EQ) scalable digital systems systematically trade off the quality of computations against energy efficiency, by relaxing the precision, the accuracy, or the reliability of internal software and hardware components in exchange for energy reductions. This design paradigm is believed to offer one of the most promising solutions to the pressing need for low-energy computing. Despite these high expectations, the current state of the art in EQ scalable design suffers from important shortcomings. First, the great majority of techniques proposed in the literature focus only on processing hardware and software components. Nonetheless, for many real devices, processing contributes only a small portion of the total energy consumption, which is dominated by other components (e.g. I/O, memory, or data transfers). Second, in order to fulfill its promises and become widespread in commercial devices, EQ scalable design needs to achieve industrial-level maturity. This involves moving from purely academic research based on high-level models and theoretical assumptions to engineered flows compatible with existing industry standards. Third, the time-varying nature of error tolerance, both among different applications and within a single task, should become more central in the proposed design methods. This involves designing "dynamic" systems, in which the precision or reliability of operations (and consequently their energy consumption) can be tuned at runtime, rather than "static" solutions, in which the output quality is fixed at design time.

    This thesis introduces several new EQ scalable design techniques for digital systems that take the previous observations into account. Besides processing, the proposed methods apply the principles of EQ scalable design also to interconnects and peripherals, which are often relevant contributors to the total energy in sensor nodes and mobile systems respectively. Regardless of the target component, the presented techniques pay special attention to the accurate evaluation of the benefits and overheads deriving from EQ scalability, using industrial-level models, and to the integration with existing standard tools and protocols. Moreover, all the works presented in this thesis allow the dynamic reconfiguration of output quality and energy consumption.

    More specifically, the contribution of this thesis is divided into three parts. In a first body of work, the design of EQ scalable modules for processing hardware data paths is considered. Three design flows are presented, targeting different technologies and exploiting different ways to achieve EQ scalability, i.e. timing-induced errors and precision reduction. These works are inspired by previous approaches from the literature, namely Reduced-Precision Redundancy and Dynamic Accuracy Scaling, which are re-thought to make them compatible with standard Electronic Design Automation (EDA) tools and flows, providing solutions to overcome their main limitations. The second part of the thesis investigates the application of EQ scalable design to serial interconnects, which are the de facto standard for data exchanges between processing hardware and sensors. In this context, two novel bus encodings are proposed, called Approximate Differential Encoding and Serial-T0, which exploit the statistical characteristics of data produced by sensors to reduce the energy consumption on the bus at the cost of controlled data approximations. The two techniques achieve different results for data of different origins, but share the common features of allowing runtime reconfiguration of the allowed error and of being compatible with standard serial bus protocols. Finally, the last part of the manuscript is devoted to the application of EQ scalable design principles to displays, which are often among the most energy-hungry components in mobile systems. The two proposals in this context leverage the emissive nature of Organic Light-Emitting Diode (OLED) displays to save energy by altering the displayed image, thus inducing an output quality reduction that depends on the amount of such alteration. The first technique implements an image-adaptive form of brightness scaling, whose outputs are optimized in terms of the balance between power consumption and similarity with the input. The second approach achieves concurrent power reduction and image enhancement by means of an adaptive polynomial transformation. Both solutions focus on minimizing the overheads associated with a real-time implementation of the transformations in software or hardware, so that these do not offset the savings in the display. For each of these three topics, results show that the aforementioned goal of building EQ scalable systems compatible with existing best practices and mature enough to be integrated in commercial devices can be effectively achieved. Moreover, they also show that very simple and similar principles can be applied to design EQ scalable versions of different system components (processing, peripherals and I/O), and to equip these components with knobs for the runtime reconfiguration of the energy-versus-quality tradeoff.
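    To make the display idea tangible, here is a deliberately simplified stand-in for image-adaptive brightness scaling on an emissive OLED panel: it picks the strongest uniform dimming factor whose mean pixel error stays within a runtime-tunable quality budget. The linear power model (power proportional to summed pixel luminance), the error metric, and the function name are illustrative assumptions, far cruder than the optimization actually proposed in the thesis.

```python
# Illustrative uniform brightness scaling with a mean-error quality budget.
import numpy as np

def scale_for_budget(image, max_mean_error=8.0):
    """image: uint8 array; returns (scaled image, scale factor, relative power saving)."""
    img = image.astype(float)
    mean_lum = img.mean()
    if mean_lum == 0:
        return image, 1.0, 0.0
    # dimming by factor s changes every pixel by (1 - s) * value, so the mean
    # error is (1 - s) * mean_lum; pick the smallest s the budget allows
    s = max(0.0, 1.0 - max_mean_error / mean_lum)
    scaled = np.clip(img * s, 0, 255).astype(np.uint8)
    power_saving = 1.0 - s      # relative reduction under the linear power model
    return scaled, s, power_saving
```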

    A cluster algorithm for graphs

    A cluster algorithm for graphs called the Markov Cluster algorithm (MCL algorithm) is introduced. The algorithm basically provides an interface to an algebraic process defined on stochastic matrices, called the MCL process. The graphs may be both weighted (with nonnegative weights) and directed. Let G be such a graph. The MCL algorithm simulates flow in G by first identifying G in a canonical way with a Markov graph G_1. Flow is then alternately expanded and contracted, leading to a sequence of Markov graphs G_(i). Flow expansion corresponds to taking the k-th power of a stochastic matrix, where k ∈ N. Flow contraction corresponds to a parametrized operator Γ_r, r ≥ 0, which maps the set of (column) stochastic matrices onto itself: the image Γ_r M is obtained by raising each entry in M to the r-th power and rescaling each column to have sum 1 again. The heuristic underlying this approach is the expectation that flow between dense regions which are sparsely connected will evaporate. The invariant limits of the process are easily derived, and in practice the process converges very fast to such a limit, the structure of which has a generic interpretation as an overlapping clustering of the graph G. Overlap is limited to cases where the input graph has a symmetric structure inducing it. The contraction and expansion parameters of the MCL process influence the granularity of the output. The algorithm is space and time efficient and lends itself to drastic scaling. This report describes the MCL algorithm and process, convergence towards equilibrium states, the interpretation of these states as clusterings, and implementation and scalability. The algorithm is introduced by first considering several related proposals towards graph clustering, of both combinatorial and probabilistic nature. This is a revised version of report [1]. A more mathematically oriented account of the MCL process is given in [2], establishing that under certain weak conditions the iterands of the MCL process possess structure admitting a cluster interpretation. Various experiments conducted on a wide range of test graphs are described in [3]; the latter report also describes a generic graph-clustering performance measure and a distance defined on the space of partitions. The work was carried out under project INS-3.2, Concept Building from Key-Phrases in Scientific Documents and Bottom Up Classification Methods in Mathematics.
    [1] A new cluster algorithm for graphs. Technical report INS-R9814, National Research Institute for Mathematics and Computer Science in the Netherlands, Amsterdam, 1998.
    [2] A stochastic uncoupling process for graphs. Technical report INS-R0011, National Research Institute for Mathematics and Computer Science in the Netherlands, Amsterdam, 2000.
    [3] Performance criteria for graph clustering and Markov cluster experiments. Technical report INS-R0012, National Research Institute for Mathematics and Computer Science in the Netherlands, Amsterdam, 2000.
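    Below is a compact numpy sketch of the process described above, alternating expansion (matrix power) and inflation (entrywise power followed by column rescaling) on a column-stochastic matrix until the iterand stabilizes. The parameter values, self-loop convention, and convergence test are illustrative choices rather than the report's reference implementation.

```python
# Illustrative MCL process: expansion and inflation on a column-stochastic matrix.
import numpy as np

def mcl(adjacency, expansion=2, inflation=2.0, iterations=100, tol=1e-9):
    A = adjacency.astype(float) + np.eye(adjacency.shape[0])   # add self-loops
    M = A / A.sum(axis=0)                        # column-stochastic Markov matrix
    for _ in range(iterations):
        M_prev = M
        M = np.linalg.matrix_power(M, expansion)     # expansion: flow spreads out
        M = M ** inflation                           # inflation: strong flow wins
        M = M / M.sum(axis=0)                        # rescale columns to sum 1
        if np.max(np.abs(M - M_prev)) < tol:
            break
    # rows that retain mass act as attractors; the nodes (columns) attracted to
    # the same row form one, possibly overlapping, cluster
    clusters = [np.nonzero(row > 1e-6)[0].tolist() for row in M if row.max() > 1e-6]
    return M, clusters

if __name__ == "__main__":
    # small smoke test: two triangles joined by a single edge
    A = np.array([[0, 1, 1, 0, 0, 0],
                  [1, 0, 1, 0, 0, 0],
                  [1, 1, 0, 1, 0, 0],
                  [0, 0, 1, 0, 1, 1],
                  [0, 0, 0, 1, 0, 1],
                  [0, 0, 0, 1, 1, 0]])
    print(mcl(A)[1])
```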