17 research outputs found

    Interleaving in Systolic-Arrays: a Throughput Breakthrough

    In past years, the most common way to improve computer performance was to increase the clock frequency. In recent years this approach has run into the limits of technology scaling, so computer architectures are shifting toward parallel computing to further improve circuit performance. GPU-based architectures are not the only ones gaining consideration: Systolic Arrays are particularly well suited to certain classes of algorithms. An important point in favor of Systolic Arrays is that, thanks to the regularity of their circuit layout, they map well onto many promising emerging technologies, such as Quantum-dot Cellular Automata and nanoarrays based on Silicon NanoWire or Carbon nanotube Field Effect Transistors. In this work we present a systematic method to improve Systolic Array performance by exploiting Pipelining and Input Data Interleaving. We first tackle the problem from a theoretical point of view, and then apply the method to both CMOS technology and emerging technologies. On CMOS we demonstrate that it is possible to vastly improve the overall throughput of the circuit. By applying this technique to emerging technologies we show that it is possible to overcome some of their limitations and greatly improve throughput, a considerable step toward the post-CMOS era.

    Meso-scale modeling of reaction-diffusion processes using cellular automata


    Architectural Solutions for NanoMagnet Logic

    The successful era of CMOS technology is coming to an end. The limit on the minimum fabrication dimensions of transistors and the increasing leakage power hinder the technological scaling that has characterized the last decades. This problem has been addressed in several ways by changing the architectures implemented in CMOS, adopting parallel processors and thus increasing throughput at the same operating frequency. However, architectural alternatives cannot be the definitive answer to the continuous increase in performance dictated by Moore’s law. The problem must be addressed from a technological point of view. Several alternative technologies that could replace CMOS in the coming years are currently under study. Among them, magnetic technologies such as NanoMagnet Logic (NML) are interesting because they do not dissipate any leakage power. Moreover, magnets have memory capability, so it is possible to merge logic and memory in the same device. However, magnetic circuits, and NML in this specific research, also have some important drawbacks that need to be addressed: first, the circuit clock frequency is limited to 100 MHz to avoid errors in data propagation; second, there is a connection between circuit layout and timing, and in particular longer wires have longer latency. These drawbacks are intrinsic to the technology and cannot be avoided. The only option is to limit their impact from an architectural point of view. The first step in the research path of this thesis is therefore the choice and optimization of architectures able to deal with the problems of NML. 
Systolic Arrays are identified as an ideal solution for this technology, because they are regular structures with local interconnections that limit the long latency of wires; moreover, they are composed of several Processing Elements that work in parallel, and thus exploit parallelization to increase throughput (limiting the impact of the low clock frequency). Through the analysis of Systolic Arrays for NML, several possible improvements have been identified and addressed: 1) a rigorous way to increase throughput with interleaving has been defined, providing equations that estimate the number of operations to be interleaved and the rules for providing inputs; 2) a latency-insensitive circuit has been designed that exploits a data communication protocol between processing elements to avoid data synchronization problems. This feature has been exploited to design a latency-insensitive Systolic Array that is able to execute the Floyd-Steinberg dithering algorithm. All the improvements presented in this framework apply to Systolic Arrays implemented in any technology, so they can also be exploited to increase the performance of today’s CMOS parallel circuits. This research path is presented in Chapter 3. While Systolic Arrays are an interesting solution for NML, their usage could be quite limited because they are normally application-specific. The second research path addresses this problem. A Reconfigurable Systolic Array is presented that can be programmed to execute several algorithms. This architecture has been tested by implementing many algorithms, including FIR and IIR filters, the Discrete Cosine Transform and Matrix Multiplication. This research path is presented in Chapter 4. In common Von Neumann architectures, the logic part of the circuit and the memory part are separated. Today, bus communication between logic and memory represents the bottleneck of the system. 
This problem is addressed by presenting Logic-In-Memory (LIM), an architecture where memory elements are merged with logic ones. This research path aims at defining a real LIM architecture. This has been done in two steps. The first step is an architecture composed of three layers: memory, routing and logic. In the second step the routing plane is no longer present, and its features are inherited by the memory plane. In this solution a pyramidal memory model is used, where memories near logic elements contain the data most likely to be used, and other memory layers contain the remaining data and the instruction set. This circuit has been tested with odd-even sort algorithms and benchmarked against GPUs and ASIC. This research path is presented in Chapter 5. MagnetoElastic NML (ME-NML) is a technological improvement of the NML principle, proposed by researchers of Politecnico di Torino, where the clock system is based on the induced stretch of a piezoelectric substrate when a voltage is applied to its boundaries. The main advantage of this solution is that it consumes much less power than the classic clock implementation. This technology has not yet been investigated from an architectural point of view or with complex circuits. In this research field, a standard methodology for the design of ME-NML circuits has been proposed. It is based on a Standard Cell Library and an enhanced VHDL model. The effectiveness of this methodology has been demonstrated by designing a Galois Field Multiplier. Moreover, the serial-parallel trade-off in ME-NML has been investigated by designing three different solutions for the Multiply and Accumulate structure. This research path is presented in Chapter 6. While ME-NML is an extremely interesting technology, it needs to be combined with other, faster technologies to yield a truly competitive system. Signal interfaces between NML and other technologies (mainly CMOS) have rarely been presented in the literature. 
A mixed-technology multiplexer is designed and presented as the basis for a CMOS to NML interface. The reverse interface (from ME-NML to CMOS) is instead based on a sensing circuit for the Faraday effect: a change in the polarization of a magnet induces an electric field that can be used to generate an input signal for a CMOS circuit. This research path is presented in Chapter 7. The research work presented in this thesis represents a fundamental milestone in the path towards nanotechnologies. The most important achievement is the design and simulation of complex circuits with NML, benchmarking this technology with real application examples. The characterization of a technology on complex functions is a major step that has not yet been addressed in the literature for NML. Indeed, only in this way is it possible to catch in advance any weakness of NanoMagnet Logic that cannot be discovered by considering only small circuits. Moreover, the architectural improvements introduced in this thesis, although technology-driven, can be applied to any technology. We have demonstrated the advantages that derive from applying them to CMOS circuits. This thesis therefore represents a major step in two directions: the first is the enhancement of NML technology; the second is a general improvement of parallel architectures and the development of the new Logic-In-Memory paradigm.
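
The abstract above names odd-even sort as the workload used to test the Logic-In-Memory circuit, but does not show it. As a rough illustration only, the following is the textbook odd-even transposition sort in plain Python, not the thesis's hardware design; the function name `odd_even_sort` is an assumption made here. Within one phase, every compare-exchange touches a disjoint pair of neighbors, which is why the algorithm maps naturally onto an array of elements with only local connections.

```python
def odd_even_sort(values):
    """Textbook odd-even transposition sort (illustrative sketch).

    Alternates between comparing (even, odd) index pairs and (odd, even)
    index pairs. For n items, n phases are enough to sort the list.
    """
    data = list(values)  # work on a copy, leave the input untouched
    n = len(data)
    for phase in range(n):
        start = phase % 2  # even phase starts at index 0, odd phase at index 1
        # Each pair (i, i + 1) below is disjoint from the others in this
        # phase, so in hardware these exchanges could happen in parallel.
        for i in range(start, n - 1, 2):
            if data[i] > data[i + 1]:
                data[i], data[i + 1] = data[i + 1], data[i]
    return data


if __name__ == "__main__":
    print(odd_even_sort([5, 3, 1, 4, 2]))
```

The sequential inner loop here only simulates what a parallel array would do in a single step per phase.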

    Vormen van inzicht


    Computação evolucionária para indução de regras de autômatos celulares multidimensionais

    A cellular automaton is a discrete dynamic system that evolves through the iteration of rules; the values of the system's variables change as a function of their current values. Cellular automata can be applied to solve several complex problems. The task of finding a transition rule that solves a given problem can be generalized as a problem of rule induction for cellular automata. Several approaches based on evolutionary computation techniques have been proposed for this problem. However, they are restricted to specific applications, and there is no generic methodology that can be applied to a large range of problems. The main contribution of this work is a generic methodology for rule induction for cellular automata. To achieve this objective, the research was carried out in four steps. In the first step we evaluated the performance of some dynamic behavior forecasting parameters calculated as a function of the transition rule. The results indicated that those parameters must be used with care, because they can lead to valid but unsatisfactory solutions. We also stress the need for reference parameters, which are not available for the majority of real problems. In the second step we proposed a new method to forecast dynamic behavior. This method considers the transition rule and the initial configuration of the cellular automaton, and uses the qualitative dynamic behavior patterns described by Wolfram as the reference for the forecast. The method was efficient for null behavior rules. Since simulating the dynamics of a system can have a high computational cost, we developed a third methodology: an architecture based on the concept of hardware/software co-design to reduce processing time. This architecture implements the evolution of cellular automata using reconfigurable logic and reduced the processing time by hundreds of times, but some restrictions of the model, such as the limited number of logic cells and the hardware reprogramming required, made its use unfeasible. Given these restrictions, in the fourth step we developed a new parallel architecture based on the master-slave paradigm. In this paradigm, the master process implements the evolutionary algorithm and a set of slave processes divide the task of validating the obtained rules. The system runs on a cluster with 120 processing cores connected by an Ethernet local area network. The co-evolutionary strategy based on an insular model allowed the search for solutions with a better fitness function value. The generic system implemented over a parallel environment was able to solve the problems addressed. An analysis of task distribution among several processors emphasized the benefits of parallel processing. The experiments also indicated a set of reference evolutionary parameters that can be used to configure the system. The contributions of this work were both theoretical, through the evaluations of the parameters and of the different methods for dynamic behavior forecasting, and methodological, through the development of two distinct processing architectures.
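
To make "evolving a cellular automaton under a transition rule" concrete, here is a minimal sketch. It assumes the simplest possible setting, a one-dimensional binary automaton with radius-1 neighborhoods, periodic boundaries and Wolfram's rule numbering; the thesis itself targets multidimensional automata, and the names `step` and `evolve` are hypothetical. Validating a candidate rule, the task the abstract assigns to the slave processes, would amount to running something like `evolve` and scoring the result.

```python
def step(cells, rule):
    """Apply one synchronous update of an elementary cellular automaton.

    cells: list of 0/1 values, treated as a ring (periodic boundary).
    rule:  integer 0..255 in Wolfram's numbering; bit k of `rule` is the
           next state for the neighborhood whose bits (left, center, right)
           spell the number k.
    """
    n = len(cells)
    out = []
    for i in range(n):
        left, center, right = cells[(i - 1) % n], cells[i], cells[(i + 1) % n]
        index = (left << 2) | (center << 1) | right
        out.append((rule >> index) & 1)
    return out


def evolve(cells, rule, steps):
    """Return the list of configurations, starting with the initial one."""
    history = [list(cells)]
    for _ in range(steps):
        history.append(step(history[-1], rule))
    return history


if __name__ == "__main__":
    for row in evolve([0, 0, 0, 1, 0, 0, 0], 90, 3):
        print("".join("#" if c else "." for c in row))
```

Rule 0 sends every configuration to all zeros after one step, which is one simple instance of the null behavior mentioned in the abstract.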

    Bioinformatics

    This book is divided into research areas relevant to Bioinformatics, such as biological networks, next generation sequencing, high performance computing, molecular modeling, structural bioinformatics and intelligent data analysis. Each section introduces the basic concepts and then explains their application to problems of great relevance, so both novice and expert readers can benefit from the information and research presented here.

    Methods for the identification of common RNA motifs

    Löwes B. Methods for the identification of common RNA motifs. Bielefeld: Universität Bielefeld; 2017. For a long time, non-coding RNAs were given less attention than messenger RNAs, even though their existence was proposed at a similar time in 1971, because the research focus was mostly on protein coding genes. With the discovery of catalytically active RNA molecules and micro RNAs, which are involved in the post-transcriptional regulation of gene expression, non-coding RNAs have gained widespread attention. It was revealed early on that non-coding RNAs are often more conserved in structure than in sequence. Since determining the function of non-coding RNAs involves costly and time-consuming laboratory experiments, computational methods can help identify further homologs of experimentally validated RNA families. But a question remains: can we identify potential RNAs with novel functions solely by using *in silico* methods? In this thesis, we evaluate 4,667 viral reference genomes in order to identify common RNA motifs shared by multiple taxonomically distant viruses. One potential mechanism that might explain similar motifs in taxonomically distant viruses that infect common hosts by interacting with their cellular components is convergent evolution. Convergent evolution describes the phenomenon that two species that originate from different ancestors share related or similar traits. By looking for long stretches of exact RNA structure matches with low sequence conservation, we want to maximize the chance that the common motifs are the result of structural convergence due to similar selection criteria in common host organisms. Because of their high mutation rates, viruses are an excellent fit for the discovery of shared RNA motifs that do not involve conserved sequence regions. 
We were able to identify 69 RNA motifs with a length of at least 50 nucleotides that are shared among at least three taxonomically distant viruses and could not be assigned to any of the existing RNA families. The secondary structure of an RNA molecule can be represented as a string. Finding maximal repeats in strings can be done using well-known string matching techniques based on suffix trees and arrays. In contrast to normal RNA sequences, secondary structure strings represent base pairing interactions within a single molecule. Thus, not every substring of the secondary structure defines a well-formed RNA structure. Therefore, we describe a new data structure, the viable suffix tree, that takes the constraints on the RNA secondary structure into account and only returns maximal repeats that are well-formed structures. This data structure is not limited to RNA structures: it can also be used for any other problem domain for which a set of allowed words can be defined, e.g. by using a grammar. However, the overall complexity of constructing the viable suffix tree cannot be lower than the complexity of the word problem for the language of such a grammar. A limitation of exact structure matching is the need for long common stretches of secondary structure that are not allowed to have a mismatch at any position. Therefore, we need to allow small mismatches to find more potential targets, but current state-of-the-art techniques use methods for sequence and structure comparison that are computationally too expensive and exhibit high false positive rates of around 50%. We present a new approach that uses smaller RNA sequence and structure seed motifs that do not require long stretches of the secondary structure to be identical. The sequence and structure motifs can be hashed into integer values, which can be compared much faster. 
An evaluation using the three well understood hammerhead ribozyme families showed that our approach is able to detect 70% to 80% of the hammerhead motifs with a similar false positive rate as the other approaches. Whenever the performance of new and existing tools should be compared, there is a need for a benchmark data set with an underlying gold standard. BRaliBase is a widely used benchmark for assessing the accuracy of RNA secondary structure alignment methods. In most case studies based on the BRaliBase benchmark, one can observe a puzzling drop in accuracy in the 40% to 60% sequence identity range, the so-called “BRaliBase dent”. We show that this dent is due to a bias in the composition of the BRaliBase benchmark, namely the inclusion of a disproportionate number of tRNAs, which exhibit a very conserved secondary structure. Furthermore, we show that a simple sampling approach that restricts the presence of the most abundant RNA families can prevent such artifacts during the performance evaluation
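
The abstract above observes that not every substring of a secondary structure string is a well-formed structure. A small sketch can show what that constraint means, assuming the common dot-bracket notation ("(" and ")" for paired bases, "." for unpaired ones), which the abstract does not name. It checks bracket balance only and is not the viable suffix tree described in the thesis; the function name `is_well_formed` is hypothetical.

```python
def is_well_formed(structure):
    """Check that a dot-bracket string has balanced, properly nested pairs.

    Assumes plain dot-bracket notation without pseudoknot symbols.
    Biological constraints such as minimum loop sizes are not checked.
    """
    depth = 0
    for ch in structure:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # closing bracket with no open partner
                return False
        elif ch != ".":
            raise ValueError(f"unexpected character: {ch!r}")
    return depth == 0  # every opened pair must also be closed


if __name__ == "__main__":
    full = "((..((...))..))"
    print(is_well_formed(full))        # the whole structure
    print(is_well_formed(full[4:11]))  # the inner hairpin "((...))"
    print(is_well_formed(full[0:6]))   # "((..((" leaves pairs open
```

A plain suffix tree would report repeats like the third slice, which is the kind of result the viable suffix tree is described as filtering out.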

    Pattern Discovery from Biosequences

    In this thesis we have developed novel methods for analyzing biological data, the primary sequences of the DNA and proteins, the microarray based gene expression data, and other functional genomics data. The main contribution is the development of the pattern discovery algorithm SPEXS, accompanied by several practical applications for analyzing real biological problems. For performing these biological studies that integrate different types of biological data we have developed a comprehensive web-based biological data analysis environment Expression Profiler (http://ep.ebi.ac.uk/)

    Structural RNA Homology Search and Alignment Using Covariance Models

    Functional RNA elements do not encode proteins, but rather function directly as RNAs. Many different types of RNAs play important roles in a wide range of cellular processes, including protein synthesis, gene regulation, protein transport, splicing, and more. Because important sequence and structural features tend to be evolutionarily conserved, one way to learn about functional RNAs is through comparative sequence analysis - by collecting and aligning examples of homologous RNAs and comparing them. Covariance models (CMs) are powerful computational tools for homology search and alignment that score both the conserved sequence and secondary structure of an RNA family. However, due to the high computational complexity of their search and alignment algorithms, searches against large databases and alignment of large RNAs like small subunit ribosomal RNA (SSU rRNA) are prohibitively slow. Large-scale alignment of SSU rRNA is of particular utility for environmental survey studies of microbial diversity which often use the rRNA as a phylogenetic marker of microorganisms. In this work, we improve CM methods by making them faster and more sensitive to remote homology. To accelerate searches, we introduce a query-dependent banding (QDB) technique that makes scoring sequences more efficient by restricting the possible lengths of structural elements based on their probability given the model. We combine QDB with a complementary filtering method that quickly prunes away database subsequences deemed unlikely to receive high CM scores based on sequence conservation alone. To increase search sensitivity, we apply two model parameterization strategies from protein homology search tools to CMs. As judged by our benchmark, these combined approaches yield about a 250-fold speedup and significant increase in search sensitivity compared with previous implementations. 
To accelerate alignment, we apply a method that uses a fast sequence-based alignment of a target sequence to determine constraints for the more expensive CM sequence- and structure-based alignment. This technique reduces the time required to align one SSU rRNA sequence from about 15 minutes to 1 second with a negligible effect on alignment accuracy. Collectively, these improvements make CMs more powerful and practical tools for RNA homology search and alignment