38 research outputs found

    Computations of Uniform Recurrence Equations Using Minimal Memory Size

    Get PDF
    International audienceWe consider a system of uniform recurrence equations (URE) of dimension one. We show how its computation can be carried out using minimal memory size with several synchronous processors. This result is then applied to register minimization for digital circuits and parallel computation of task graphs

    Degree of Sequentiality of Weighted Automata

    Get PDF
    Weighted automata (WA) are an important formalism to describe quantitative properties. Obtaining equivalent deterministic machines is a longstanding research problem. In this paper we consider WA with a set semantics, meaning that the semantics is given by the set of weights of accepting runs. We focus on multi-sequential WA that are defined as finite unions of sequential WA. The problem we address is to minimize the size of this union. We call this minimum the degree of sequentiality of (the relation realized by) the WA. For a given positive integer k, we provide multiple characterizations of relations realized by a union of k sequential WA over an infinitary finitely generated group: a Lipschitz-like machine independent property, a pattern on the automaton (a new twinning property) and a subclass of cost register automata. When possible, we effectively translate a WA into an equivalent union of k sequential WA. We also provide a decision procedure for our twinning property for commutative computable groups thus allowing to compute the degree of sequentiality. Last, we show that these results also hold for word transducers and that the associated decision problem is PSPACE -complete

    Opt: A Domain Specific Language for Non-linear Least Squares Optimization in Graphics and Imaging

    Full text link
    Many graphics and vision problems can be expressed as non-linear least squares optimizations of objective functions over visual data, such as images and meshes. The mathematical descriptions of these functions are extremely concise, but their implementation in real code is tedious, especially when optimized for real-time performance on modern GPUs in interactive applications. In this work, we propose a new language, Opt (available under http://optlang.org), for writing these objective functions over image- or graph-structured unknowns concisely and at a high level. Our compiler automatically transforms these specifications into state-of-the-art GPU solvers based on Gauss-Newton or Levenberg-Marquardt methods. Opt can generate different variations of the solver, so users can easily explore tradeoffs in numerical precision, matrix-free methods, and solver approaches. In our results, we implement a variety of real-world graphics and vision applications. Their energy functions are expressible in tens of lines of code, and produce highly-optimized GPU solver implementations. These solver have performance competitive with the best published hand-tuned, application-specific GPU solvers, and orders of magnitude beyond a general-purpose auto-generated solver

    GAIA: Composition, Formation and Evolution of the Galaxy

    Get PDF
    The GAIA astrometric mission has recently been approved as one of the next two `cornerstones' of ESA's science programme, with a launch date target of not later than mid-2012. GAIA will provide positional and radial velocity measurements with the accuracies needed to produce a stereoscopic and kinematic census of about one billion stars throughout our Galaxy (and into the Local Group), amounting to about 1 per cent of the Galactic stellar population. GAIA's main scientific goal is to clarify the origin and history of our Galaxy, from a quantitative census of the stellar populations. It will advance questions such as when the stars in our Galaxy formed, when and how it was assembled, and its distribution of dark matter. The survey aims for completeness to V=20 mag, with accuracies of about 10 microarcsec at 15 mag. Combined with astrophysical information for each star, provided by on-board multi-colour photometry and (limited) spectroscopy, these data will have the precision necessary to quantify the early formation, and subsequent dynamical, chemical and star formation evolution of our Galaxy. Additional products include detection and orbital classification of tens of thousands of extra-Solar planetary systems, and a comprehensive survey of some 10^5-10^6 minor bodies in our Solar System, through galaxies in the nearby Universe, to some 500,000 distant quasars. It will provide a number of stringent new tests of general relativity and cosmology. The complete satellite system was evaluated as part of a detailed technology study, including a detailed payload design, corresponding accuracy assesments, and results from a prototype data reduction development.Comment: Accepted by A&A: 25 pages, 8 figure

    A VLSI DSP DESIGN AND IMPLEMENTATION OF ALL POLE LATTICE FILTER USING RETIMING METHODOLOGY

    Get PDF
    All pole lattice fil ters are used in a variety of signal processing applications that is speech processing, adaptive filters and various other applications. The implementation of lattice f i l t e r requires more clock period hence low speed. There are various transformation technique pr es ent for design of high-speed or low-area or lowpower implementations. This paper presents design of high-speed (smaller clock period) implementation of 8th order all pole lattice filter using the methodology named as Retiming. Retiming reduces the clock period of the circuit, reducing the number of registers in the circuit, reducing the power consumption of the circuit. Therefore, retiming has been used to reduce the clock period of all pole lattice filters and it increases the speed of the system

    Opt: A Domain Specific Language for Non-linear Least Squares Optimization in Graphics and Imaging

    No full text
    Many graphics and vision problems are naturally expressed as optimizations with either linear or non-linear least squares objective functions over visual data, such as images and meshes. The mathematical descriptions of these functions are extremely concise, but their implementation in real code is tedious, especially when optimized for real-time performance in interactive applications. We propose a new language, Opt (available under http://optlang.org), in which a user simply writes energy functions over image- or graph-structured unknowns, and a compiler automatically generates state-of-the-art GPU optimization kernels. The end result is a system in which real-world energy functions in graphics and vision applications are expressible in tens of lines of code. They compile directly into highly-optimized GPU solver implementations with performance competitive with the best published hand-tuned, application-specific GPU solvers, and 1-2 orders of magnitude beyond a general-purpose auto-generated solver

    Optimizations of Cisco’s Embedded Logic Analyzer Module

    Get PDF
    Cisco’s embedded logic analyzer module (ELAM) is a debugging device used for many of Cisco’s application specific integrated chips (ASICs). The ELAM is used to capture data of interest to the user and stored for analysis purposes. The user enters a trigger expression containing data fields of interest in the form of a logical equation. The data fields associated with the trigger expression are stored in a set of Match and Mask (MM) registers. Incoming data packets are matched against these registers, and if the user-specified data pattern is detected, the ELAM triggers and begins a countdown sequence to stop data capture. The current ELAM implementation is restricted in the form of trigger expressions that are allowed and in the allocation of resources. Currently, data fields in the trigger expression can only be logically ANDed together, Match and Mask registers are inefficiently utilized, and a static state machine exists in the ELAM trigger logic. To optimize the usage of the ELAM, a trigger expression is first treated as a Boolean expression so that minimization algorithms can be run. Next, the data stored in the Match and Mask registers is analyzed for redundancies. Finally, a dynamic state machine is programmed with a distinct set of states generated from the trigger expression. This set of states is further minimized. A feasibility study is done to analyze the validity of the results

    Minimizing resources of sweeping and streaming string transducers

    Get PDF
    We consider minimization problems for natural parameters of word transducers: the number of passes performed by two-way transducers and the number of registers used by streaming transducers. We show how to compute in ExpSpace the minimum number of passes needed to implement a transduction given as sweeping transducer, and we provide effective constructions of transducers of (worst-case optimal) doubly exponential size. We then consider streaming transducers where concatenations of registers are forbidden in the register updates. Based on a correspondence between the number of passes of sweeping transducers and the number of registers of equivalent concatenation-free streaming transducers, we derive a minimization procedure for the number of registers of concatenation-free streaming transducers

    Interconnect-aware scheduling and resource allocation for high-level synthesis

    Get PDF
    A high-level architectural synthesis can be described as the process of transforming a behavioral description into a structural description. The scheduling, processor allocation, and register binding are the most important tasks in the high-level synthesis. In the past, it has been possible to focus simply on the delays of the processing units in a high-level synthesis and neglect the wire delays, since the overall delay of a digital system was dominated by the delay of the logic gates. However, with the process technology being scaled down to deep-submicron region, the global interconnect delays can no longer be neglected in VLSI designs. It is, therefore, imperative to include in high-level synthesis the delays on wires and buses used to communicate data between the processing units i.e., inter-processor communication delays. Furthermore, the way the process of register binding is performed also has an impact on the complexity of the interconnect paths required to transfer data between the processing units. Hence, the register binding can no longer ignore its effect on the wiring complexity of resulting designs. The objective of this thesis is to develop techniques for an interconnect-aware high-level synthesis. Under this common theme, this thesis has two distinct focuses. The first focus of this thesis is on developing a new high-level synthesis framework while taking the inter-processor communication delay into consideration. The second focus of this thesis is on the developing of a technique to carry out the register binding and a scheme to reduce the number of registers while taking the complexity of the interconnects into consideration. A novel scheduling and processor allocation technique taking into consideration the inter-processor communication delay is presented. In the proposed technique, the communication delay between a pair of nodes of different types is treated as a non-computing node, whereas that between a pair of nodes of the same type is taken into account by re-adjusting the firing times of the appropriate nodes of the data flow graph (DFG). Another technique for the integration of the placement process into the scheduling and processor allocation in order to determine the actual positions of the processing units in the placement space is developed. The proposed technique makes use of a hybrid library of functional units, which includes both operation-specific and reconfigurable multiple-operation functional units, to maximize the local data transfer. A technique for register binding that results in a reduced number of registers and interconnects is developed by appropriately dividing the lifetime of a token into multiple segments and then binding those having the same source and/or destination into a single register. A node regeneration scheme, in which the idle processing units are utilized to generate multiple copies of the nodes in a given DFG, is devised to reduce the number of registers and interconnects even further. The techniques and schemes developed in this thesis are applied to the synthesis of architectures for a number of benchmark DSP problems and compared with various other commonly used synthesis methods in order to assess their effectiveness. It is shown that the proposed techniques provide superior performance in terms of the iteration period, placement area, and the numbers of the processing units, registers and interconnects in the synthesized architectur
    corecore