24 research outputs found
Fast decimal floating-point division
A new implementation for decimal floating-point (DFP) division is introduced. The algorithm is based on high-radix SRT division (named after D. Sweeney, J. E. Robertson, and T. D. Tocher) with the recurrence in a new decimal signed-digit format. Quotient digits are selected using comparison multiples, where the magnitude of the quotient digit is calculated by comparing the truncated partial remainder with limited-precision multiples of the divisor. The sign is determined concurrently by examining the polarity of the truncated partial remainder. A timing evaluation using logic synthesis shows a significant decrease in division execution time compared with one of the fastest DFP dividers reported in the open literature.
Hooman Nikmehr, Braden Phillips and Cheng-Chew Li
In vitro anti-leishmanial activity of methanolic extracts of Calendula officinalis flowers, Datura stramonium seeds, and Salvia officinalis leaves
Aim: The anti-leishmanial activity of methanolic extracts of Calendula officinalis flowers, Datura stramonium seeds, and Salvia officinalis leaves against the extracellular (promastigote) and intracellular (amastigote) forms of Leishmania major was evaluated in this study. Method: In the first stage, promastigote forms of L. major were treated with different doses of the plant extracts in a 96-well tissue-culture microplate, and IC50 values for each extract were measured with a colorimetric MTT assay. In the second stage, macrophage cells were infected with L. major promastigotes, and the infected macrophages were treated with the plant extracts. The macrophages were then stained with Giemsa, and the numbers of infected macrophages and amastigotes were counted with a light microscope. Results: The plant extracts inhibited the growth of promastigotes and amastigotes of L. major. Inhibitory concentrations (IC50) in the promastigote assay were 108.19, 155.15, and 184.32 μg·mL⁻¹ for C. officinalis flowers, D. stramonium seeds, and S. officinalis leaves, respectively. The extracts also reduced the number of amastigotes in macrophage cells from 264 in the control group to 88, 97, and 102 in the test groups. Although the anti-leishmanial activity of the extracts was not comparable with that of the standard drug, miltefosine, they significantly reduced the number of amastigotes in macrophages in comparison with the control group (P < 0.001). These plant extracts had lower toxicity than miltefosine. Conclusion: This study demonstrates the potential efficacy of the methanolic extracts of C. officinalis flowers, D. stramonium seeds, and S. officinalis leaves for the control of cutaneous leishmaniasis. © 2014 China Pharmaceutical University
A flexible CO-OFDM using reconfigurable multi-precision FFT
To improve the flexibility, energy efficiency, and speed of elastic optical orthogonal frequency division multiplexing systems, a reconfigurable multi-precision fast Fourier transform (FFT) is presented. The minimum required FFT word length for different criteria, namely the bit error rate (BER), FFT size, modulation format, and optical signal-to-noise ratio, is determined using simulations. At run time, system performance is improved significantly by using the reconfigurable multi-precision FFT while the target BER is still satisfied. The proposed design is synthesized on an FPGA, and the results show that energy efficiency and processing speed are improved by about 79% and 78%, respectively, compared with the conventional design.
Hatam Abdoli, Hooman Nikmeh
Hierarchical Bayesian Reservoir Memory
In a quest to model the human brain, we introduce a brain model based on a general framework for the brain called the Memory-Prediction Framework. The model is a hierarchical Bayesian structure that uses Reservoir Computing methods, the state-of-the-art and most biologically plausible Temporal Sequence Processing methods for online and unsupervised learning. The model is therefore called Hierarchical Bayesian Reservoir Memory (HBRM). HBRM uses a simple stochastic gradient descent learning algorithm to learn and organize common multi-scale spatio-temporal patterns and features of the input signals in a hierarchical structure, in an unsupervised manner, to provide robust and real-time prediction of future inputs. We suggest HBRM as a real-time high-dimensional stream processing model for the basic brain computations. In this paper we describe the model and assess its prediction accuracy in a simulated real-world environment.
Ali Nouri and Hooman Nikmeh
A classified and comparative study of 2-D convolvers
Two-dimensional (2-D) convolution is a common operation in a wide range of signal and image processing applications such as edge detection, sharpening, and blurring. In the hardware implementation of these applications, 2-D convolution is one of the most challenging parts because it is both compute-intensive and memory-intensive. To address these challenges, several design techniques such as pipelining, constant multiplication, and time-sharing have been applied in the literature, leading to convolvers with different implementation features. In this paper, based on these design techniques, we classify the convolvers into four classes: Non-Pipelined Convolver, Reduced-Bandwidth Pipelined Convolver, Multiplier-Less Pipelined Convolver, and Time-Shared Convolver. The implementation features of these classes, such as critical path delay, memory bandwidth, and resource utilization, are then discussed analytically for different convolution kernel sizes. Finally, an instance of each class is captured in Verilog and implemented on a Virtex-7 FPGA; the measured features are reported and confirm the analytical discussion.
Mahdi Kalbasi, Hooman Nikmeh
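For reference, the operation that all four convolver classes implement can be written as a direct software loop. The following is a minimal sketch in plain Python, not any of the hardware architectures from the paper; the function name convolve2d_valid and the choice of "valid" output size (no padding) are assumptions made for illustration.

```python
def convolve2d_valid(image, kernel):
    """Direct 2-D convolution of a list-of-lists image with a kernel.

    Illustrative only: produces the 'valid' region (no padding), so the
    output is smaller than the input by (kernel size - 1) in each dimension.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = [[0] * out_w for _ in range(out_h)]
    for r in range(out_h):
        for c in range(out_w):
            acc = 0
            # Each output pixel needs kh * kw multiply-accumulates and
            # kh * kw pixel reads, which is why the operation is both
            # compute-intensive and memory-intensive.
            for i in range(kh):
                for j in range(kw):
                    # True convolution flips the kernel in both dimensions.
                    acc += image[r + i][c + j] * kernel[kh - 1 - i][kw - 1 - j]
            out[r][c] = acc
    return out
```

The inner double loop is the part that hardware designs unroll, pipeline, or time-share; the pixel reads in that loop are what determine the required memory bandwidth.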
A fine-grained pipelined 2-D convolver for high-performance applications
2-D convolution has been used as a subsystem for filtering and enhancement in a wide range of signal and image processing applications. It is both a memory-intensive and a compute-intensive operation. Previous efforts to improve the performance of 2-D convolution have primarily focused on the memory access challenges. However, the convolver's performance is highly dependent on the efficient design of the computation units, which can be enhanced significantly by employing techniques such as pipelining. In this brief, a pipelined architecture with a low pixel access rate is presented for the implementation of 2-D convolution. Compared to conventional convolvers, the proposed design has a fixed-size critical path (independent of the problem and kernel size) that is significantly shorter, specifically for large kernel sizes; the proposed convolver works with a 283-MHz clock frequency on a Xilinx Virtex-7 (XC7V2000t) field-programmable gate array for a 3 × 3 kernel. Additionally, the required pixel access rate of the new scheme is lower than that of the state-of-the-art methods, at only 849 Mb/s for a 3 × 3 kernel and 8-bit pixels. The improvements in the critical path delay and the required pixel access rate are obtained without a significant increase in resource utilization.
Mahdi Kalbasi, Hooman Nikmeh
Advances in Computer Systems Architecture
The original publication is available at www.springerlink.com
Digit recurrence operations mainly use redundant representation, mostly in signed-digit format. These operations generate results one digit per iteration, from left to right. Since the final result has to be converted into 2's complement format, on-the-fly algorithms are employed to eliminate the end-of-operation conversion delay. These algorithms convert the result into the conventional format as the signed digits are generated, without using carry-propagating adders. Some applications, such as rounding non-normalized results, positive or negative offsetting, coding, and cryptography, require not only the original result in 2's complement format but also the original result summed with (n)ulp 1, where n is an integer, in the ordinary representation. This paper proposes a new algorithm, applicable to digit recurrence operations, to generate that value on-the-fly.
Hooman Nikmehr and Cheng-Chew Li
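As background, the classic on-the-fly conversion that this abstract builds on can be sketched as follows. This is the standard two-register scheme (a running value and the same value minus one unit in the last place), not the new algorithm proposed in the paper; the function name on_the_fly_convert and the use of Python integers in place of digit registers are assumptions for illustration.

```python
def on_the_fly_convert(digits, radix=4):
    """Convert signed digits (most significant first) to a conventional value.

    Illustrative sketch of the classic on-the-fly scheme. Two forms are
    kept: q, the value so far, and qm = q - 1. Each step appends one digit
    in the range 0..radix-1 to one of the two forms, so in hardware no
    carry has to propagate through the digits already produced.
    """
    q, qm = 0, -1
    for d in digits:
        if d >= 0:
            new_q = q * radix + d              # append d to q
        else:
            new_q = qm * radix + (radix + d)   # borrow: append radix - |d| to qm
        if d > 0:
            new_qm = q * radix + (d - 1)       # append d - 1 to q
        else:
            new_qm = qm * radix + (radix - 1 + d)  # append radix - 1 - |d| to qm
        q, qm = new_q, new_qm
    return q, qm
```

For example, on_the_fly_convert([1, -2, 0, 3, -1]) returns (139, 138), where 139 is the radix-4 value of the signed-digit string and 138 is that value minus one unit in the last place.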
A decimal carry-free adder
Recently, decimal arithmetic has become attractive in the financial and commercial world, including banking, tax calculation, currency conversion, insurance, and accounting. Although computers still carry out decimal calculations using software libraries and binary floating-point numbers, it is likely that in the near future all processors will be equipped with units that perform decimal operations directly on decimal operands. One critical building block for some complex decimal operations is the decimal carry-free adder. This paper discusses the mathematical framework of the addition, introduces a new signed-digit format for representing decimal numbers, and presents an efficient architectural implementation. Delay estimation analysis shows that the adder offers improved performance over earlier designs.
Hooman Nikmehr, Braden Phillips and Cheng-Chew Li
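The idea of carry-free decimal addition can be illustrated with the classic signed-digit scheme, in which each digit lies in the range -6 to 6. This is a minimal sketch of that textbook technique and not the new signed-digit format introduced in the paper; the function names sd_decimal_add and sd_value and the least-significant-first digit order are assumptions for illustration.

```python
def sd_decimal_add(x, y):
    """Add two radix-10 signed-digit numbers with digits in [-6, 6].

    x and y are equal-length digit lists, least significant digit first.
    Illustrative only: every output digit depends on just two adjacent
    input positions, so no carry ripples along the word.
    """
    n = len(x)
    transfers = [0] * (n + 1)   # transfers[i] is the transfer into position i
    interim = [0] * n
    for i in range(n):
        z = x[i] + y[i]         # position sum, in [-12, 12]
        if z >= 6:
            t = 1
        elif z <= -6:
            t = -1
        else:
            t = 0
        transfers[i + 1] = t
        interim[i] = z - 10 * t  # interim digit, always in [-5, 5]
    # Adding a transfer of -1, 0, or 1 keeps each digit within [-6, 6].
    return [interim[i] + transfers[i] for i in range(n)] + [transfers[n]]


def sd_value(digits):
    """Integer value of a least-significant-first signed-digit list."""
    return sum(d * 10 ** i for i, d in enumerate(digits))
```

Because the interim digit is limited to the range -5 to 5, absorbing the incoming transfer can never generate a second transfer, which is what makes the addition time independent of the word length.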
A novel implementation of radix-4 floating-point division/square-root using comparison multiples
First published online in 2008
A new implementation for minimally redundant radix-4 floating-point SRT div/sqrt (division/square-root) with the recurrence in the signed-digit format is introduced. The implementation is based on the comparison multiples idea. In the proposed approach, the magnitude of the quotient (root) digit is calculated by comparing the truncated partial remainder with two limited-precision multiples of the divisor (partial root). The digit sign is determined by examining the polarity of the truncated partial remainder. A timing evaluation using logic synthesis (Synopsys DC with the Artisan 0.18 μm typical library) shows a latency of 2.5 ns for the recurrence of the proposed div/sqrt. This is less than that of the conventional implementation. © 2008 Elsevier Ltd. All rights reserved.
H. Nikmehr, B. Phillips, C.C. Li
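The digit-selection idea can be illustrated with a simplified radix-4 division recurrence. The sketch below is an assumption-laden model and not the published design: it uses exact rational arithmetic instead of truncated, limited-precision values, covers division only, and chooses 0.5 and 1.5 times the divisor as the two comparison multiples. The function name srt4_divide is hypothetical.

```python
from fractions import Fraction


def srt4_divide(x, d, steps):
    """Radix-4 digit-recurrence division with quotient digits in {-2..2}.

    Simplified illustration: requires d > 0 and |x| <= (2/3) * d.
    Returns the signed quotient digits (most significant first) and the
    final partial remainder w, so that x == d * Q + w / 4**steps, where
    Q is the sum of digits[i] / 4**(i + 1).
    """
    w = Fraction(x)
    d = Fraction(d)
    low, high = d / 2, 3 * d / 2    # the two comparison multiples (assumed)
    digits = []
    for _ in range(steps):
        shifted = 4 * w
        mag = abs(shifted)
        # Digit magnitude: compare the shifted remainder with the multiples.
        if mag < low:
            m = 0
        elif mag < high:
            m = 1
        else:
            m = 2
        # Digit sign: taken from the polarity of the partial remainder.
        q = m if shifted >= 0 else -m
        w = shifted - q * d         # recurrence: w[j+1] = 4 * w[j] - q * d
        digits.append(q)
    return digits, w
```

With these thresholds the partial remainder stays within two thirds of the divisor in magnitude at every step, which is the bound a minimally redundant radix-4 digit set needs for the recurrence to converge.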