Radix-2^n serial–serial multipliers
All serial–serial multiplication structures previously reported in the literature have been confined to bit serial–serial multipliers. An architecture for digit serial–serial multipliers is presented. A set of designs is derived from the radix-2^n design procedure, which was first reported by the authors for the design of bit-level pipelined digit serial–parallel structures. One significant aspect of the new designs is that they can be pipelined to the bit level, giving the designer the flexibility to obtain the best trade-off between throughput rate and hardware cost by varying the digit size and the number of pipelining levels. Also, an area-efficient digit serial–serial multiplier is proposed which provides a 50% reduction in hardware without degrading the speed performance. This is achieved by exploiting the fact that some cells are idle for most of the multiplication operation. In the new design, the computations of these cells are remapped to other cells, which makes the idle cells redundant. The new designs have been implemented on the S40BG256 device from the SPARTAN family to prove functionality and assess performance.
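As a rough software illustration of what "digit serial–serial" means, the sketch below multiplies two unsigned integers while consuming one radix-2^n digit of each operand per step, least significant digit first. It is a behavioural model only, not the authors' cell-level architecture; the function names and the `width` and `n` parameters are assumptions made for this example.

```python
def to_digits(x: int, n: int, count: int) -> list[int]:
    """Split x into `count` radix-2^n digits, least significant digit first."""
    mask = (1 << n) - 1
    return [(x >> (n * i)) & mask for i in range(count)]


def digit_serial_serial_multiply(a: int, b: int, width: int, n: int) -> int:
    """Multiply unsigned `width`-bit a and b, one n-bit digit of each per step."""
    count = -(-width // n)  # number of digits, i.e. ceil(width / n)
    a_seen = b_seen = product = 0
    pairs = zip(to_digits(a, n, count), to_digits(b, n, count))
    for i, (a_d, b_d) in enumerate(pairs):
        shift = n * i
        # New digits interact with the operand parts already received
        # and with each other:
        # A_i * B_i = A_prev * B_prev + (a_i * B_prev + b_i * A_prev) * R^i
        #             + a_i * b_i * R^(2i), with R = 2^n.
        product += (a_d * b_seen + b_d * a_seen) << shift
        product += (a_d * b_d) << (2 * shift)
        a_seen |= a_d << shift
        b_seen |= b_d << shift
    return product


print(digit_serial_serial_multiply(13, 11, width=4, n=2))  # 143
```

Setting `n=1` gives the bit serial–serial case; a larger `n` needs fewer steps per product at the cost of wider per-step arithmetic, which is the throughput versus hardware trade-off the abstract describes.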
Towards an optimised VLSI design algorithm for the constant matrix multiplication problem
The efficient design of multiplierless implementations of constant matrix multipliers is challenged by the huge solution search spaces, even for small-scale problems. Previous approaches tend to use hill-climbing algorithms, risking sub-optimal results. The proposed algorithm avoids this by exploring parallel solutions. The computational complexity is tackled by modelling the problem in a format amenable to genetic programming and hardware acceleration. Results show an improvement on state-of-the-art algorithms, with future potential for even greater savings.
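To make "multiplierless" concrete, the sketch below evaluates a constant matrix-vector product using only shifts, additions and subtractions, by recoding each constant in canonical signed-digit (CSD) form. This is a baseline illustration, not the search algorithm proposed in the abstract, which additionally looks for sharing between constants; all function names here are assumptions.

```python
def csd_digits(c: int) -> list[int]:
    """Canonical signed-digit form of c >= 0, LSB first, digits in {-1, 0, 1}."""
    digits = []
    while c != 0:
        if c & 1:
            d = 2 - (c & 3)  # +1 when c % 4 == 1, -1 when c % 4 == 3
            c -= d
        else:
            d = 0
        digits.append(d)
        c >>= 1
    return digits


def shift_add_multiply(x: int, c: int) -> int:
    """Compute c * x with shifts and adds/subtracts only."""
    total = 0
    for k, d in enumerate(csd_digits(abs(c))):
        if d == 1:
            total += x << k
        elif d == -1:
            total -= x << k
    return -total if c < 0 else total


def constant_matvec(matrix: list[list[int]], vector: list[int]) -> list[int]:
    """Multiply a constant integer matrix by an integer vector, multiplier-free."""
    return [
        sum(shift_add_multiply(x, c) for c, x in zip(row, vector))
        for row in matrix
    ]


print(csd_digits(7))                                  # [-1, 0, 0, 1], i.e. 8 - 1
print(constant_matvec([[7, 3], [5, -6]], [2, 4]))     # [26, -14]
```

Each nonzero CSD digit costs one adder or subtractor, so the count of nonzero digits across the matrix gives a crude hardware cost that an optimiser would try to reduce.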
First performance evaluation of a Multi-layer Thick Gaseous Electron Multiplier with in-built electrode meshes - MM-THGEM
We describe a new micro-pattern gas detector structure comprising a
multi-layer hole-type multiplier (M-THGEM) combined with two in-built electrode
meshes: the Multi-Mesh THGEM-type multiplier (MM-THGEM). Suitable potential
differences applied between the various electrodes provide an efficient
collection of ionization electrons within the MM-THGEM holes and a large charge
avalanche multiplication between the meshes. Unlike conventional
hole-type multipliers (e.g. Gas Electron Multipliers - GEMs, Thick Gas Electron
Multipliers - THGEMs, etc.), which are characterized by a variable
(dipole-like) field strength inside the avalanche gap, electrons in MM-THGEMs
are largely multiplied by a strong uniform field established between the two
meshes, like in the parallel-plate avalanche geometry. The presence of the two
meshes within the holes allows for the trapping of a large fraction of the
positive ions that stream back to the drift region. A gas gain above 10^5 has
been achieved for single photo-electron detection with a single MM-THGEM in
Ar/(10%)CH4 and He/(10%)CO2, at standard conditions for temperature and
pressure. When the MM-THGEM is coupled to a conventional THGEM and used as
first cascade element, the maximum achievable gains reach values above 10^6 in
He/(10%)CO2, while the ion backflow (IBF) approaches 1.5% in the case of optimum
detector-bias configuration. This IBF value is several times lower than
that obtained by a double GEM/THGEM detector (5-10%), and equivalent to the
performance attained by a Micromegas detector.
Comment: 11 pages, 8 figures. Submitted to JINST
Parallel Algorithms for Constrained Tensor Factorization via the Alternating Direction Method of Multipliers
Tensor factorization has proven useful in a wide range of applications, from
sensor array processing to communications, speech and audio signal processing,
and machine learning. With few recent exceptions, all tensor factorization
algorithms were originally developed for centralized, in-memory computation on
a single machine; and the few that break away from this mold do not easily
incorporate practically important constraints, such as nonnegativity. A new
constrained tensor factorization framework is proposed in this paper, building
upon the Alternating Direction Method of Multipliers (ADMoM). It is shown that
this simplifies computations, bypassing the need to solve constrained
optimization problems in each iteration; and it naturally leads to distributed
algorithms suitable for parallel implementation on regular high-performance
computing (e.g., mesh) architectures. This opens the door for many emerging big
data-enabled applications. The methodology is exemplified using nonnegativity
as a baseline constraint, but the proposed framework can more-or-less readily
incorporate many other types of constraints. Numerical experiments are very
encouraging, indicating that the ADMoM-based nonnegative tensor factorization
(NTF) has high potential as an alternative to state-of-the-art approaches.
Comment: Submitted to the IEEE Transactions on Signal Processing
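The claim that ADMoM avoids solving a constrained problem in each iteration can be seen on a single nonnegative least-squares subproblem, the kind that arises when one factor is updated with the others held fixed. The block below is the standard scaled-form ADMM iteration for that subproblem, not necessarily the exact formulation used in the paper; the symbols A, b, x, z, u and the penalty parameter ρ are assumptions introduced for this sketch.

```latex
% Problem: minimise (1/2) ||A x - b||_2^2  subject to  x >= 0.
% Split the variable: x carries the least-squares term, z carries the constraint.
\begin{aligned}
&\min_{x,\,z}\ \tfrac{1}{2}\lVert A x - b\rVert_2^2 + \iota_{\ge 0}(z)
  \quad \text{s.t.}\quad x = z, \\[4pt]
x^{k+1} &= \left(A^{\top} A + \rho I\right)^{-1}
           \left(A^{\top} b + \rho\,(z^{k} - u^{k})\right), \\
z^{k+1} &= \max\!\left(0,\; x^{k+1} + u^{k}\right), \\
u^{k+1} &= u^{k} + x^{k+1} - z^{k+1}.
\end{aligned}
```

The x-update is an unconstrained linear solve whose matrix can be factorised once and reused, and the nonnegativity constraint reduces to an elementwise projection in the z-update. Swapping in a different constraint would, under this splitting, change only the projection step.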
Two-dimensional DCT/IDCT architecture
A fully parallel architecture for the computation of a two-dimensional (2-D) discrete cosine transform (DCT), based on row-column decomposition, is presented. It uses the same one-dimensional (1-D) DCT unit for the row and column computations and (N^2+N) registers to perform the transposition. It possesses features of regularity and modularity, and is thus well suited for VLSI implementation. It can be used for the computation of either the forward or the inverse 2-D DCT. Each 1-D DCT unit uses N fully parallel vector inner product (VIP) units. The design of the VIP units is based on a systematic design methodology using radix-2^n arithmetic, which allows partitioning of the elements of each vector into small groups. Array multipliers without the final adder are used to produce the different partial product terms. This allows a more efficient use of 4:2 compressors for the accumulation of the products in the intermediate stages and reduces the number of accumulators from N to one. Using this procedure, the 2-D DCT architecture requires less than N^2 multipliers (in terms of area occupied) and only 2N adders. It can compute an N x N-point DCT at a rate of one complete transform per N cycles after an appropriate initial delay.
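Row-column decomposition can be sketched in software as follows: the same 1-D DCT routine is applied to every row, the intermediate result is transposed, and the routine is applied again. This is a floating-point reference model with orthonormal DCT-II scaling, which is an assumption of this example; it says nothing about the radix-2^n VIP hardware described in the abstract.

```python
import math


def dct_1d(v: list[float]) -> list[float]:
    """Orthonormal 1-D DCT-II of a length-N vector."""
    n = len(v)
    out = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        acc = sum(
            x * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i, x in enumerate(v)
        )
        out.append(scale * acc)
    return out


def transpose(m: list[list[float]]) -> list[list[float]]:
    """Swap rows and columns (the role of the transposition registers)."""
    return [list(col) for col in zip(*m)]


def dct_2d(block: list[list[float]]) -> list[list[float]]:
    """2-D DCT via row-column decomposition, reusing the same 1-D unit."""
    rows_done = [dct_1d(row) for row in block]
    cols_done = [dct_1d(col) for col in transpose(rows_done)]
    return transpose(cols_done)


flat = [[1.0] * 4 for _ in range(4)]
coeffs = dct_2d(flat)
print(round(coeffs[0][0], 6))  # 4.0: a constant block has only a DC term
```

Because the two passes use one routine, a hardware design can time-share a single 1-D unit between rows and columns, which is what the abstract's shared 1-D DCT unit and transposition registers provide.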
