8 research outputs found

    Hardware co-processor to enable MIMO in next generation wireless networks

    One prevailing technology in wireless communication is Multiple Input, Multiple Output (MIMO) communication. MIMO simultaneously transmits several data streams, each from its own antenna, within the same frequency channel. This technique can increase data bandwidth by up to a factor of the number of transmitting antennas, but at the cost of much higher computational complexity in the wireless receiver. MIMO communication exploits the differing channel effects caused by the physical distances between antennas to differentiate between transmitting antennas, an intrinsically two-dimensional operation. Current Digital Signal Processors (DSPs), on the other hand, are designed to perform computations on one-dimensional vectors of incoming data. To compensate for the lack of native support for these higher-dimensional operations, current base stations are forced to add multiple new processing elements, while many mobile devices cannot support MIMO communication at all. To give wireless clients and stations native support for the two-dimensional operations required by MIMO communication, a hardware co-processor was designed that allows the DSP to offload these operations onto another processor, reducing computation time.
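
    As an illustration (not taken from the thesis) of the kind of two-dimensional, matrix-valued operation such a co-processor would offload, the following Python/NumPy sketch performs zero-forcing MIMO detection; the antenna counts, channel model, and QPSK alphabet are all assumed for the example.

```python
# Illustrative sketch: zero-forcing MIMO detection, one matrix-valued
# operation a MIMO receiver must perform per channel realization.
import numpy as np

rng = np.random.default_rng(0)
n_tx, n_rx = 4, 4                      # 4 transmit / 4 receive antennas (assumed)

H = (rng.standard_normal((n_rx, n_tx)) +
     1j * rng.standard_normal((n_rx, n_tx))) / np.sqrt(2)      # Rayleigh channel
s = rng.choice([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j], size=n_tx)  # QPSK symbols
noise = 0.05 * (rng.standard_normal(n_rx) + 1j * rng.standard_normal(n_rx))

y = H @ s + noise                      # received vector: streams mixed by the channel

# Zero-forcing equalizer: the pseudo-inverse of H undoes the spatial mixing.
# This inversion is the inherently two-dimensional workload that a vector
# DSP, operating on one-dimensional data, handles poorly.
W = np.linalg.pinv(H)
s_hat = W @ y

print(np.round(s_hat, 2))              # close to the transmitted QPSK symbols
```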

    Address optimizations for embedded processors

    Embedded processors, common in electronic devices, perform a limited set of tasks compared to general-purpose processor systems and must use their limited resources efficiently. Optimal utilization of program memory requires a reduction in code size, which can be achieved by eliminating unnecessary address computations, i.e., by generating an optimal offset assignment that exploits built-in addressing modes. Single offset assignment (SOA) solutions, used for processors with one address register, start from the access sequence of variables to determine the optimal assignment. This research uses commutative transformations of statements within the basic block to alter the access sequence. Edges in the access graph are classified as breakable or unbreakable: unbreakable edges are preferred when selecting edges for the assignment, while breakable edges are used to commutatively transform statements so that the assignment cost is reduced. A modify register (MR) in some processors allows the address to be modified by the value in MR in addition to the post-increment/decrement modes. Although finding the most beneficial MR value is common practice, this research shows that modifying the access sequence using edge-fold, node-swap, and path-interleave techniques for an MR value of two has significant benefit. General offset assignment requires the variables in the access sequence to be partitioned among several address registers; using the node degree in the access graph demonstrates greater benefit than using edge weights and variable frequencies. The Static Single Assignment (SSA) form of the basic block introduces new variables into the access graph, making it sparser, and sparser access graphs usually have lower assignment costs; the SSA form also allows variable space to be reused based on variable lifetimes. Offset assignment solutions may be further improved by incremental assignment based on the uncovered edge providing the best cost improvement, a heuristic that considers the improvements due to all uncovered edges. Optimization techniques have primarily been edge-based; a node-based SOA technique has been tested with commutative transformations and shown to outperform edge-based heuristics. The heuristics developed in this research perform address optimizations for embedded processors, employing new techniques that lower address-computation costs.
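
    For context, here is a minimal Python sketch of the classic greedy SOA heuristic that this line of research builds on, computing the cost of an offset assignment from an access sequence. The edge-selection rule below is the standard maximum-weight one, not the breakable/unbreakable refinement the abstract describes.

```python
# A minimal sketch of the classic greedy SOA heuristic (Liao-style).
# Each uncovered edge in the access graph costs one explicit address
# computation; covered edges are handled by free auto-increment/decrement.
from collections import Counter

def soa_cost(access_sequence):
    # Access graph: edge weight = how often two variables are adjacent
    # in the access sequence.
    edges = Counter()
    for a, b in zip(access_sequence, access_sequence[1:]):
        if a != b:
            edges[frozenset((a, b))] += 1

    degree = Counter()
    parent = {v: v for v in set(access_sequence)}

    def find(v):                        # union-find, used to reject cycles
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    uncovered = 0
    # Greedily cover heavy edges first; a covered edge means the two
    # variables end up adjacent in memory.
    for edge, w in sorted(edges.items(), key=lambda e: -e[1]):
        a, b = tuple(edge)
        if degree[a] < 2 and degree[b] < 2 and find(a) != find(b):
            degree[a] += 1; degree[b] += 1
            parent[find(a)] = find(b)   # edge joins two path fragments
        else:
            uncovered += w              # each uncovered adjacency costs one
    return uncovered

print(soa_cost(list("abcdabcd")))       # path a-b-c-d leaves edge (d,a) uncovered: cost 1
```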

    A computational environment to support the teaching of structure processing

    Advisor: José Raimundo de Oliveira. Master's dissertation - Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação. Abstract: Processor technology has grown rapidly in recent years. In contrast, the teaching of computer architecture has struggled to keep up with this evolution: textbooks and lectures still rely on static resources that require long explanations, a dynamic that is at odds with the everyday experience of many students as technology users. This work proposes a computational environment (framework) to support the teaching of processing architecture, called MODPRO. The idea is to provide modules that can be interconnected to form various processing structures, so that the professor can build different scenarios together with the students and present them visually (using animations), from basic components up to more advanced processing structures. MODPRO comprises a simulator, called SIMPRO, which displays the flow of data and signals within the processing structure under study in animated form, step by step or in real time. SIMPRO was developed in JavaScript, uses Cascading Style Sheets, and can be accessed via the web. MODPRO also includes an emulator, called EMUPRO, which contains the same modules as SIMPRO; its distinguishing feature is that it was implemented entirely in hardware, using Altera's QUARTUS II tool, so that students can validate in the laboratory the structures developed in class. Because they are modular, both the SIMPRO simulator and the EMUPRO emulator allow new features (modules) to be added, enabling the teaching and study of different processing structures. (Mestrado, Engenharia de Computação; Mestre em Engenharia Elétrica.)
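
    As a rough illustration of the modular, step-driven simulation idea (SIMPRO itself is written in JavaScript; this sketch uses Python, and all module names are invented for the example):

```python
# Toy sketch of interconnected modules whose signal flow can be advanced
# one clock step at a time, the core idea behind a step-by-step
# processing-structure simulator.
class Register:
    def __init__(self, name):
        self.name, self.value, self._next = name, 0, 0
    def prepare(self, value):          # combinational input is latched...
        self._next = value
    def step(self):                    # ...on the clock edge
        self.value = self._next
        print(f"{self.name} <- {self.value}")

class Adder:
    def __init__(self, a, b, out):
        self.a, self.b, self.out = a, b, out
    def evaluate(self):
        self.out.prepare(self.a.value + self.b.value)

# Wire modules into a tiny accumulator structure: acc <- acc + inc
inc, acc = Register("inc"), Register("acc")
alu = Adder(acc, inc, acc)

inc.prepare(3); inc.step()             # load the increment
for cycle in range(4):                 # advance the structure step by step
    alu.evaluate()
    acc.step()                         # prints acc <- 3, 6, 9, 12
```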

    On the design and implementation of a control system processor

    In general, digital control algorithms are multi-input multi-output (MIMO) recursive digital filters, but control system processing has particular numerical requirements for which standard processor devices are not well suited, especially in systems with high sample rates. There is therefore a clear need to understand the numerical requirements properly, to identify optimised forms for implementing control laws, and to translate these into efficient processor architectures. By taking a considered view of the numerical and calculation requirements of control algorithms, it is possible to consider special-purpose processors that provide well-targeted support for control laws. This thesis describes a compact, high-speed, special-purpose processor which offers a low-cost solution to implementing linear time-invariant controllers. [Continues.]
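
    As an illustration of the per-sample workload such a processor targets, here is a minimal Python/NumPy sketch of a state-space linear time-invariant controller update, x' = Ax + Bu, y = Cx + Du; the matrix values are arbitrary placeholders, not taken from the thesis.

```python
# One state-space controller update per sample period: a fixed,
# multiply-accumulate-dominated recurrence that must complete within
# the sample interval at high sample rates.
import numpy as np

A = np.array([[0.9, 0.1],
              [0.0, 0.8]])             # state transition (assumed values)
B = np.array([[0.0], [1.0]])           # input map
C = np.array([[1.0, 0.0]])             # output map
D = np.array([[0.0]])                  # feedthrough

x = np.zeros((2, 1))                   # controller state

def controller_step(u):
    """Advance the controller by one sample: y = Cx + Du, then x = Ax + Bu."""
    global x
    y = C @ x + D @ u                  # compute actuator output first
    x = A @ x + B @ u                  # then advance the state
    return y

for k in range(5):
    u = np.array([[1.0]])              # unit-step error signal
    print(controller_step(u)[0, 0])
```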

    Efficient VLSI Architectures for Image Compression Algorithms

    An image, in its original form, contains a huge amount of data, which not only demands a large amount of memory for storage but also makes transmission over a limited-bandwidth channel inconvenient. Image compression reduces the image data in either a lossless or a lossy way. While lossless image compression recovers the original image data completely, it provides very low compression. Lossy compression techniques compress the image data by a variable amount, depending on the image quality required by the particular application. Compression is performed in steps such as image transformation, quantization, and entropy coding. JPEG is one of the most widely used image compression standards; it uses the discrete cosine transform (DCT) to transform the image from the spatial to the frequency domain. An image carries little visual information in its high frequencies, so these can be heavily quantized to reduce the size of the transformed representation. Entropy coding follows to further reduce the redundancy in the transformed and quantized image data. Real-time data processing requires high speed, which makes dedicated hardware implementation the preferred choice. A hardware design is favored for its low-cost and low-power implementation; these two factors are also the most important requirements for battery-powered portable devices such as digital cameras. The image transform is computationally intensive, and a complete image compression system involves several intermediate stages between the transform and the final bit-stream, each requiring memory to store intermediate results. The cost and power of the design can be reduced both by implementing the transforms efficiently and by reducing or removing intermediate stages. The proposed research work focuses on efficient hardware implementation of transform-based image compression algorithms by optimizing the architecture of the system. Distributed arithmetic (DA) is an efficient approach to implementing digital signal processing algorithms. DA can be realized in two ways, one through storage of precomputed values in ROMs and another without ROM requirements; ROM-free DA is more efficient. For the image transform, architectures for the one-dimensional discrete Hartley transform (1-D DHT) and the one-dimensional DCT (1-D DCT) have been optimized using the ROM-free DA technique. Further, 2-D separable DHT (SDHT) and 2-D DCT architectures have been implemented in a row-column approach using two 1-D DHTs and two 1-D DCTs, respectively. A finite state machine (FSM) based architecture spanning DCT to quantization has been proposed, using a modified JPEG quantization matrix, which requires no memory for storing the quantization table or DCT coefficients. In addition, quantization is realized without multipliers, which require more area and are power-hungry. For entropy encoding, Huffman coding is more hardware-efficient than arithmetic coding, and the use of a Huffman code table further simplifies the implementation. Strategies have been applied to significantly reduce the memory bits needed to store the Huffman code table, and the complete Huffman coding architecture encodes the transformed coefficients at one bit per clock cycle. A direct DCT implementation has the advantage of being free of the transposition memory needed to store intermediate 1-D DCT results.
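
    As a behavioral reference for the row-column structure described above (not the DA hardware itself), the following Python/SciPy sketch computes a separable 2-D DCT on an 8x8 block and quantizes it with the standard JPEG luminance table; the input block is a toy example.

```python
# Separable 2-D DCT (row-column) followed by JPEG-style quantization.
import numpy as np
from scipy.fft import dct

def dct2(block):
    # 1-D transform over rows, then over columns: exactly the two-pass
    # structure that would otherwise need a transposition buffer in hardware.
    return dct(dct(block, axis=0, norm='ortho'), axis=1, norm='ortho')

# Standard JPEG luminance quantization table (8x8).
Q = np.array([
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99]])

block = np.arange(64, dtype=float).reshape(8, 8) - 128   # toy 8x8 tile
coeffs = np.round(dct2(block) / Q).astype(int)           # heavy quantization
print(coeffs)   # high-frequency entries collapse to zero, ready for entropy coding
```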
    Although recursive algorithms have been a preferred method, they have low accuracy, resulting in image quality degradation. A non-recursive equation for the direct computation of DCT coefficients has been proposed and implemented both in a 0.18 µm ASIC library and on an FPGA. It can compute DCT coefficients in any order, and all intermediate computations are free of fractions, so very high image quality is obtained in terms of PSNR. In addition, increasing the accuracy requires changing only one multiplier and one register bit-width, a very low hardware overhead. The architecture has been implemented to produce DCT coefficients in zig-zag order. The comparison results show that this implementation has less area in terms of gate count and lower power consumption than existing DCT implementations. Using this architecture, the complete JPEG image compression system has been implemented, with the Huffman coding module, one multiplier, and one register as the only additional modules. The intermediate stages (DCT to Huffman encoding) are memory-free, yielding an efficient architecture.
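
    For reference, a small Python sketch of the zig-zag ordering mentioned above; the index rule below is the standard JPEG scan, independent of the proposed architecture.

```python
# JPEG zig-zag scan: traverse anti-diagonals (constant row+col),
# alternating the sweep direction on odd and even diagonals.
def zigzag_indices(n=8):
    """Return (row, col) pairs of an n x n block in JPEG zig-zag order."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],                  # anti-diagonal
                                  rc[0] if (rc[0] + rc[1]) % 2    # odd: downward
                                  else rc[1]))                    # even: upward

print(zigzag_indices()[:10])
# [(0,0), (0,1), (1,0), (2,0), (1,1), (0,2), (0,3), (1,2), (2,1), (3,0)]
```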

    Entrepreneurial Discovery and Information Complexity in Knowledge-Intensive Industries

    Why are some firms better able than others to exploit new opportunities? I posit that differences in the type and level of complexity of the information obtained through the entrepreneurial discovery process may be a meaningful indicator of the likelihood that a firm is able to exploit a new opportunity. Specifically, I investigate knowledge reproduction processes for product replication (internal copying) and imitation (external copying) as a means of exploiting opportunities and building competitive advantage. Integrating concepts from information theory and the knowledge-based view of the firm, I introduce a generalized model and quantitative methods for estimating the inherent complexity of any unit of knowledge, such as a strategy, technology, product, or service, as long as the unit is represented in algorithm form. Modeling organizations as information processing systems, I develop measures of the information complexity of an algorithm representing a unit of knowledge in terms of the minimum amount of data (algorithmic complexity) and the minimum number of instructions (computational complexity) required to fully describe and execute the algorithm. I apply this methodology to construct and analyze a unique historical dataset of 91 firms (diversifying and de novo entrants) and 853 new product introductions (1974-2009) in a knowledge-intensive industry, digital signal processing. I find that: (1) information complexity is negatively and significantly related to product replication and imitation; (2) replicators have the greatest advantage over imitators at moderate levels of information complexity; (3) intellectual property regimes strengthening the patentability of algorithms significantly increase product replication, without significantly decreasing imitation; (4) outbound licensing of patented technologies decreases product replication and increases imitation; (5) products introduced by de novo entrants are less likely to be replicated and more likely to be imitated than products introduced by diversifying entrants; and (6) diversifying entrants have the greatest advantage over de novo entrants at high and low levels of information complexity; neither type of entrant has a significant advantage at moderate levels of complexity. These empirical findings support and extend predictions from earlier simulation studies. The model is applicable to other aspects of organizational strategy and has important implications for researchers, managers, and policymakers. Doctor of Philosophy.
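
    To make the two measures concrete, here is a hedged Python sketch using crude proxies that are illustrative stand-ins, not the dissertation's estimators: compressed length approximates algorithmic complexity (minimum data), and a bytecode instruction count approximates computational complexity (minimum instructions), for a unit of knowledge represented in algorithm form.

```python
# Rough complexity proxies for a unit of knowledge expressed as an algorithm
# (here, Python source text for a toy FIR filter).
import dis
import zlib

algorithm_source = """
def fir_filter(signal, taps):
    out = []
    for i in range(len(taps) - 1, len(signal)):
        acc = 0.0
        for j, t in enumerate(taps):
            acc += t * signal[i - j]
        out.append(acc)
    return out
"""

# Algorithmic-complexity proxy: incompressible bytes in the description.
algorithmic = len(zlib.compress(algorithm_source.encode()))

# Computational-complexity proxy: instructions needed to execute it.
module = compile(algorithm_source, "<knowledge-unit>", "exec")
func_code = next(c for c in module.co_consts if hasattr(c, "co_code"))
computational = sum(1 for _ in dis.get_instructions(func_code))

print(f"algorithmic ~ {algorithmic} bytes, computational ~ {computational} instructions")
```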

    Performance- and energy-efficient compilation for digital SIMD signal processors using genetic algorithms

    In recent years, embedded systems have been used in an ever-growing number of products in our daily lives. These systems are frequently bound to special requirements regarding real-time capability, small size, and, increasingly, low energy consumption. To satisfy these requirements while retaining a high degree of flexibility in system design, digital signal processors (DSPs) are often used for data processing instead of application-specific hardware. With DSPs, specification changes in late development phases generally do not require a costly and time-consuming redevelopment of the hardware. Unfortunately, manually translating an application program into assembly code for the target processor is an extremely time-consuming and error-prone task. For this reason, compilers are needed that can translate a given application into efficient assembly code. Compared with general-purpose processors (GPPs), however, DSPs exhibit special architectural features that conventional compiler techniques exploit insufficiently or not at all. The goal of this work is to develop new compiler techniques for DSPs that improve the quality of compiler-generated code, particularly with respect to execution time and energy consumption. To allow the developed techniques to be reused in other compilers, they build on the new intermediate representation GeLIR (Generic Low-Level Intermediate Representation), also described in this work. The core of this work is a code generator that performs graph-based code selection and additionally solves the phases of code selection, instruction scheduling (including compaction), and register allocation simultaneously, in the sense of phase coupling. Since this amounts to solving an NP-hard optimization problem, the code generator is based on an optimization procedure using a genetic algorithm. In addition, interactions with the subsequent address-code generation are already taken into account while performing the subtasks of code selection, instruction selection, and register allocation. Thanks to the flexible specification of cost functions in genetic optimization procedures, the code generator can, using an energy cost model, perform an energy-efficient selection and scheduling of instructions. As a further focus, optimization techniques for effectively exploiting the parallel data paths and SIMD memory accesses are presented. By integrating the energy cost model into the code generator and the simulator, this work is the first to investigate, with compiler support, the potential of SIMD operations for the energy-efficient execution of DSP programs. The exemplary implementation of the techniques for one DSP architecture and the retargeting of the genetic code generator to a further DSP demonstrate the applicability to real processors.
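
    As a minimal illustration of the genetic optimization loop described above, the following Python sketch evolves an instruction ordering under a toy cost model combining cycles, energy, and dependency penalties. The instruction costs, dependence graph, and GA parameters are all invented for the example; the real code generator additionally couples code selection and register allocation into the same search.

```python
# Toy genetic algorithm for instruction ordering with a combined
# runtime/energy cost function and hard dependency penalties.
import random

random.seed(1)
N_INSTR = 8
# Per-instruction (cycles, energy) costs: placeholder values.
COST = [(random.randint(1, 4), random.uniform(0.5, 2.0)) for _ in range(N_INSTR)]
DEPS = {3: {1}, 5: {2, 3}, 7: {5}}      # instruction -> set of predecessors

def fitness(schedule):
    pos = {instr: i for i, instr in enumerate(schedule)}
    # Hard penalty for violating data dependencies.
    penalty = sum(1000 for i, preds in DEPS.items()
                  for p in preds if pos[p] > pos[i])
    # Toy compaction model: two adjacent independent instructions issue
    # together, so a good ordering saves cycles (and idle energy).
    cycles, k = 0, 0
    while k < N_INSTR:
        a = schedule[k]
        if k + 1 < N_INSTR:
            b = schedule[k + 1]
            if a not in DEPS.get(b, ()) and b not in DEPS.get(a, ()):
                cycles += max(COST[a][0], COST[b][0])
                k += 2
                continue
        cycles += COST[a][0]
        k += 1
    energy = sum(COST[i][1] for i in schedule) + 0.1 * cycles  # static power
    return cycles + energy + penalty    # lower is better

def crossover(a, b):
    """Order crossover: keep a prefix of a, fill the rest in b's order."""
    head = a[:random.randrange(1, N_INSTR)]
    return head + [i for i in b if i not in head]

pop = [random.sample(range(N_INSTR), N_INSTR) for _ in range(30)]
for _ in range(100):
    pop.sort(key=fitness)
    survivors = pop[:10]                # elitist selection
    children = [crossover(random.choice(survivors), random.choice(survivors))
                for _ in range(20)]
    for c in children:                  # mutation: swap two positions
        if random.random() < 0.3:
            i, j = random.sample(range(N_INSTR), 2)
            c[i], c[j] = c[j], c[i]
    pop = survivors + children

best = min(pop, key=fitness)
print(best, round(fitness(best), 2))
```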