    Radix-2^n serial–serial multipliers

    All serial–serial multiplication structures previously reported in the literature have been confined to bit serial–serial multipliers. An architecture for digit serial–serial multipliers is presented. A set of designs is derived from the radix-2^n design procedure, which was first reported by the authors for the design of bit-level pipelined digit serial–parallel structures. One significant aspect of the new designs is that they can be pipelined to the bit level, giving the designer the flexibility to obtain the best trade-off between throughput rate and hardware cost by varying the digit size and the number of pipelining levels. An area-efficient digit serial–serial multiplier is also proposed, which provides a 50% reduction in hardware without degrading the speed performance. This is achieved by exploiting the fact that some cells are idle for most of the multiplication operation. In the new design, the computations of these cells are remapped to other cells, which makes the idle cells redundant. The new designs have been implemented on the S40BG256 device from the SPARTAN family to prove functionality and assess performance.
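
    The idea of feeding both operands one radix-2^n digit at a time can be illustrated with a small behavioural model. This is only a sketch under stated assumptions, not the authors' cell array or pipelining scheme: the function name `digit_serial_serial_multiply` and its parameters are hypothetical, operands are unsigned, and digits arrive least significant first.

```python
def digit_serial_serial_multiply(a, b, n=4, num_digits=4):
    """Behavioural model of a radix-2^n serial-serial multiplication.

    Both unsigned operands are consumed one n-bit digit per step,
    least significant digit first. Hypothetical helper for illustration.
    """
    mask = (1 << n) - 1
    a_seen = 0  # digits of a received in earlier steps
    b_seen = 0  # digits of b received in earlier steps
    acc = 0     # running product
    for k in range(num_digits):
        a_k = (a >> (n * k)) & mask  # digit of a arriving at step k
        b_k = (b >> (n * k)) & mask  # digit of b arriving at step k
        shift = n * k
        # Cross terms: the new digit of each operand times the
        # previously received digits of the other operand.
        acc += (a_k * b_seen + b_k * a_seen) << shift
        # Square term: the two new digits multiplied together.
        acc += (a_k * b_k) << (2 * shift)
        a_seen |= a_k << shift
        b_seen |= b_k << shift
    return acc
```

    Setting n=1 gives the bit serial–serial case; a larger n reduces the number of steps at the cost of a wider digit multiplier per step, which is the throughput/hardware trade-off the abstract refers to.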

    Towards an optimised VLSI design algorithm for the constant matrix multiplication problem

    The efficient design of multiplierless implementations of constant matrix multipliers is hampered by huge solution search spaces, even for small-scale problems. Previous approaches tend to use hill-climbing algorithms, which risk sub-optimal results. The proposed algorithm avoids this by exploring parallel solutions. The computational complexity is tackled by modelling the problem in a format amenable to genetic programming and hardware acceleration. Results show an improvement on state-of-the-art algorithms, with future potential for even greater savings.

    First performance evaluation of a Multi-layer Thick Gaseous Electron Multiplier with in-built electrode meshes - MM-THGEM

    We describe a new micro-pattern gas detector structure comprising a multi-layer hole-type multiplier (M-THGEM) combined with two in-built electrode meshes: the Multi-Mesh THGEM-type multiplier (MM-THGEM). Suitable potential differences applied between the various electrodes provide an efficient collection of ionization electrons within the MM-THGEM holes and a large charge avalanche multiplication between the meshes. Unlike conventional hole-type multipliers (e.g. Gas Electron Multipliers - GEMs, Thick Gas Electron Multipliers - THGEMs, etc.), which are characterized by a variable (dipole-like) field strength inside the avalanche gap, electrons in MM-THGEMs are largely multiplied by a strong uniform field established between the two meshes, as in the parallel-plate avalanche geometry. The presence of the two meshes within the holes allows for the trapping of a large fraction of the positive ions that stream back to the drift region. A gas gain above 10^5 has been achieved for single photo-electron detection with a single MM-THGEM in Ar/(10%)CH4 and He/(10%)CO2, at standard conditions for temperature and pressure. When the MM-THGEM is coupled to a conventional THGEM and used as the first cascade element, the maximum achievable gains reach values above 10^6 in He/(10%)CO2, while the ion backflow (IBF) approaches 1.5% in the case of optimum detector-bias configuration. This IBF value is several times lower than that obtained with a double GEM/THGEM detector (5-10%), and equivalent to the performance attained by a Micromegas detector. Comment: 11 pages, 8 figures. Submitted to JINS

    Parallel Algorithms for Constrained Tensor Factorization via the Alternating Direction Method of Multipliers

    Tensor factorization has proven useful in a wide range of applications, from sensor array processing to communications, speech and audio signal processing, and machine learning. With few recent exceptions, all tensor factorization algorithms were originally developed for centralized, in-memory computation on a single machine; and the few that break away from this mold do not easily incorporate practically important constraints, such as nonnegativity. A new constrained tensor factorization framework is proposed in this paper, building upon the Alternating Direction Method of Multipliers (ADMoM). It is shown that this simplifies computations, bypassing the need to solve constrained optimization problems in each iteration; and it naturally leads to distributed algorithms suitable for parallel implementation on regular high-performance computing (e.g., mesh) architectures. This opens the door for many emerging big data-enabled applications. The methodology is exemplified using nonnegativity as a baseline constraint, but the proposed framework can more or less readily incorporate many other types of constraints. Numerical experiments are very encouraging, indicating that the ADMoM-based nonnegative tensor factorization (NTF) has high potential as an alternative to state-of-the-art approaches. Comment: Submitted to the IEEE Transactions on Signal Processin
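
    How variable splitting bypasses a constrained solve in each iteration can be seen on a much smaller problem than a full tensor factorization. The sketch below applies the standard ADMM iteration to a single nonnegative least-squares problem, min 0.5*||Ax - b||^2 subject to x >= 0. It is not the paper's algorithm; the names `admm_nnls` and `solve`, the penalty `rho` and the iteration count are assumptions chosen for illustration.

```python
def solve(M, v):
    """Solve M y = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [row[:] + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = aug[r][n] - sum(aug[r][c] * y[c] for c in range(r + 1, n))
        y[r] = s / aug[r][r]
    return y


def admm_nnls(A, b, rho=1.0, iters=200):
    """ADMM sketch for min 0.5*||Ax - b||^2 subject to x >= 0."""
    m, n = len(A), len(A[0])
    # A^T A + rho*I and A^T b are fixed across iterations.
    G = [[sum(A[k][i] * A[k][j] for k in range(m)) + (rho if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    Atb = [sum(A[k][i] * b[k] for k in range(m)) for i in range(n)]
    z = [0.0] * n  # copy of x that carries the constraint
    u = [0.0] * n  # scaled dual variable
    for _ in range(iters):
        # x-update: an unconstrained linear solve.
        x = solve(G, [Atb[i] + rho * (z[i] - u[i]) for i in range(n)])
        # z-update: projection onto the nonnegative orthant.
        z = [max(0.0, x[i] + u[i]) for i in range(n)]
        # Dual update: accumulate the disagreement between x and z.
        u = [u[i] + x[i] - z[i] for i in range(n)]
    return z
```

    The constraint appears only in the element-wise projection of the z-update, so swapping in a different constraint would mean swapping in a different projection, while the linear solve stays unchanged.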

    Two-dimensional DCT/IDCT architecture

    A fully parallel architecture for the computation of a two-dimensional (2-D) discrete cosine transform (DCT), based on row-column decomposition, is presented. It uses the same one-dimensional (1-D) DCT unit for the row and column computations and (N^2+N) registers to perform the transposition. It possesses features of regularity and modularity, and is thus well suited for VLSI implementation. It can be used for the computation of either the forward or the inverse 2-D DCT. Each 1-D DCT unit uses N fully parallel vector inner product (VIP) units. The design of the VIP units is based on a systematic design methodology using radix-2^n arithmetic, which allows partitioning of the elements of each vector into small groups. Array multipliers without the final adder are used to produce the different partial product terms. This allows a more efficient use of 4:2 compressors for the accumulation of the products in the intermediate stages and reduces the number of accumulators from N to one. Using this procedure, the 2-D DCT architecture requires fewer than N^2 multipliers (in terms of area occupied) and only 2N adders. It can compute an N × N-point DCT at a rate of one complete transform per N cycles after an appropriate initial delay.
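
    The row-column decomposition itself can be sketched in software: one 1-D DCT routine is applied to every row, the intermediate result is transposed, and the same routine is applied again. This models only the data flow, not the VIP units or register structure of the architecture. The function names are hypothetical, and the orthonormal DCT-II scaling is an assumption, since the abstract does not state a normalisation.

```python
import math


def dct_1d(x):
    """Orthonormal 1-D DCT-II of a sequence (direct O(N^2) evaluation)."""
    N = len(x)
    out = []
    for k in range(N):
        scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        out.append(scale * sum(
            x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * N))
            for i in range(N)))
    return out


def transpose(m):
    """Swap rows and columns (the job of the transposition registers)."""
    return [list(col) for col in zip(*m)]


def dct_2d(block):
    """2-D DCT by row-column decomposition, reusing the same 1-D routine."""
    rows = [dct_1d(r) for r in block]             # row pass
    cols = [dct_1d(c) for c in transpose(rows)]   # column pass
    return transpose(cols)                        # restore orientation
```

    For a constant 4 × 4 block of ones, all the energy lands in the DC coefficient `dct_2d(block)[0][0]`, and the remaining coefficients are zero up to rounding.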