Least-biased correction of extended dynamical systems using observational data
We consider dynamical systems evolving near an equilibrium statistical state
where the interest is in modelling long term behavior that is consistent with
thermodynamic constraints. We adjust the distribution using an
entropy-optimizing formulation that can be computed on-the-fly, making
possible partial corrections using incomplete information, for example measured
data or data computed from a different model (or the same model at a different
scale). We employ a thermostatting technique to sample the target distribution
with the aim of capturing relevant statistical features while introducing mild
dynamical perturbation (thermostats). The method is tested for a point vortex
fluid model on the sphere, and we demonstrate both convergence of equilibrium
quantities and the ability of the formulation to balance stationary and
transient-regime errors. Comment: 27 pages
Symbol Synchronization for SDR Using a Polyphase Filterbank Based on an FPGA
This paper proposes a highly efficient symbol synchronization subsystem for Software Defined Radio. The proposed feedback phase-locked loop timing synchronizer is suitable for parallel implementation on an FPGA. The polyphase FIR filter simultaneously performs matched filtering and arbitrary interpolation between acquired samples. The proper sampling instant is determined by selecting a suitable polyphase filterbank using a derived index. This index is determined from the output of either the Zero-Crossing or the Gardner Timing Error Detector. The paper focuses extensively on simulation of the proposed synchronization system. On the basis of this simulation, a complete, fully pipelined VHDL description model is created. This model is composed of a fully parallel polyphase filterbank based on distributed arithmetic, a timing error detector, and an interpolation control block. Finally, RTL synthesis on an Altera Cyclone IV FPGA is presented, and resource utilization is analyzed in comparison with a conventional model.
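As a rough illustration of the Gardner Timing Error Detector named in the abstract above, the sketch below computes the textbook Gardner error for a real-valued signal at two samples per symbol. It is not the paper's VHDL design. The function name `gardner_ted`, the assumption that even indices are symbol strobes, and the sign convention (positive error for late sampling) are all choices made here for illustration.

```python
def gardner_ted(samples):
    """Textbook Gardner timing error, real-valued input, 2 samples per symbol.

    Assumption: samples[0], samples[2], ... are the symbol strobes and the
    odd indices are the midpoints between consecutive strobes.
    Returns one error value per symbol transition.
    """
    errors = []
    for k in range(2, len(samples), 2):
        prev_strobe = samples[k - 2]
        midpoint = samples[k - 1]
        strobe = samples[k]
        # The midpoint should sit on the zero crossing when timing is right,
        # so the product vanishes; its sign indicates early or late sampling.
        errors.append((strobe - prev_strobe) * midpoint)
    return errors


if __name__ == "__main__":
    # Alternating +1/-1 symbols with linear transitions between them.
    print(gardner_ted([1, 0, -1, 0, 1]))               # correctly timed
    print(gardner_ted([0.5, -0.5, -0.5, 0.5, 0.5]))    # sampled half a sample late
    print(gardner_ted([0.5, 0.5, -0.5, -0.5, 0.5]))    # sampled half a sample early
```

In a feedback synchronizer of the kind the abstract describes, an error like this would typically be filtered and used to update the index that selects the polyphase filter branch. That control loop is omitted here.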
Stochastic rounding and reduced-precision fixed-point arithmetic for solving neural ordinary differential equations
Although double-precision floating-point arithmetic currently dominates
high-performance computing, there is increasing interest in smaller and simpler
arithmetic types. The main reasons are potential improvements in energy
efficiency and memory footprint and bandwidth. However, simply switching to
lower-precision types typically results in increased numerical errors. We
investigate approaches to improving the accuracy of reduced-precision
fixed-point arithmetic types, using examples in an important domain for
numerical computation in neuroscience: the solution of Ordinary Differential
Equations (ODEs). The Izhikevich neuron model is used to demonstrate that
rounding has an important role in producing accurate spike timings from
explicit ODE solution algorithms. In particular, fixed-point arithmetic with
stochastic rounding consistently results in smaller errors compared to single
precision floating-point and fixed-point arithmetic with round-to-nearest
across a range of neuron behaviours and ODE solvers. A computationally much
cheaper alternative is also investigated, inspired by the concept of dither,
which is a widely understood mechanism for providing resolution below the least
significant bit (LSB) in digital signal processing. These results will have
implications for the solution of ODEs in other subject areas, and should also
be directly relevant to the huge range of practical problems that are
represented by Partial Differential Equations (PDEs). Comment: Submitted to Philosophical Transactions of the Royal Society
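The effect described in the abstract above can be seen in a toy setting. The sketch below is not the paper's Izhikevich experiment. It assumes a hypothetical fixed-point format with 8 fractional bits and repeatedly adds an increment smaller than half the least significant bit, much as an explicit ODE step adds a small update to a state variable. All function names are illustrative.

```python
import math
import random


def round_to_nearest(x, frac_bits):
    """Quantize x to a grid of 2**-frac_bits using round-half-up."""
    scale = 1 << frac_bits
    return math.floor(x * scale + 0.5) / scale


def round_stochastic(x, frac_bits, rng):
    """Quantize x to the same grid, rounding up with probability equal to
    the fractional residue, so the rounding is unbiased in expectation."""
    scale = 1 << frac_bits
    scaled = x * scale
    low = math.floor(scaled)
    round_up = rng.random() < (scaled - low)
    return (low + int(round_up)) / scale


def accumulate(increment, steps, frac_bits, rng=None):
    """Add `increment` `steps` times, quantizing after every addition.
    Uses stochastic rounding when an rng is given, round-to-nearest otherwise."""
    acc = 0.0
    for _ in range(steps):
        if rng is None:
            acc = round_to_nearest(acc + increment, frac_bits)
        else:
            acc = round_stochastic(acc + increment, frac_bits, rng)
    return acc


if __name__ == "__main__":
    # The exact sum is 1.0; the grid spacing is 1/256, about 0.0039.
    print("round-to-nearest:", accumulate(0.001, 1000, 8))
    print("stochastic:      ", accumulate(0.001, 1000, 8, random.Random(0)))
```

With round-to-nearest, every sub-LSB increment is rounded away and the accumulator never leaves zero. With stochastic rounding, the result fluctuates from run to run but is centred on the exact sum.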
Power-efficient design of 16-bit mixed-operand multipliers
Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2004. Includes bibliographical references (p. 53). Multiplication is an expensive and slow arithmetic operation that plays an important role in many DSP algorithms. It usually lies on the critical-delay path, so it affects system performance and consumes considerable power. Consequently, significant improvements in both power and performance can be achieved in the overall DSP system by carefully designing and optimizing the multiplier. This thesis explores several circuit-level techniques for power-efficient multiplier design, including supply voltage reduction, efficient multiplication algorithms, low-power circuit logic styles, and transistor sizing using dynamic and static tuners. Based on these techniques, several 16-bit multipliers have been designed and implemented in 0.13 µm CMOS technology at supply voltages of 1.5 V and 0.9 V. The multipliers are modified to handle multiplication of two 16-bit operands, each of which can be in either signed-magnitude or two's-complement format. Examining the power-performance characteristics of these multipliers reveals that both array and tree structures are feasible solutions for designing 16-bit multipliers, and that complementary CMOS and single-ended CPL-TG logic styles are promising candidates for power-efficient design. The appropriate choice of structure and logic style depends on the power and performance constraints of the particular design. By Sataporn Pornpromlikit. M.Eng.
Diffuse interface models of locally inextensible vesicles in a viscous fluid
We present a new diffuse interface model for the dynamics of inextensible
vesicles in a viscous fluid. A new feature of this work is the implementation
of the local inextensibility condition in the diffuse interface context. Local
inextensibility is enforced by using a local Lagrange multiplier, which
provides the necessary tension force at the interface. To solve for the local
Lagrange multiplier, we introduce a new equation whose solution essentially
provides a harmonic extension of the local Lagrange multiplier off the
interface while maintaining the local inextensibility constraint near the
interface. To make the method more robust, we develop a local relaxation scheme
that dynamically corrects local stretching/compression errors, thereby
preventing their accumulation. Asymptotic analysis is presented that shows that
our new system converges to a relaxed version of the inextensible sharp
interface model. This is also verified numerically. Although the model does not
depend on dimension, we present numerical simulations only in 2D. To solve the
2D equations numerically, we develop an efficient algorithm combining an
operator splitting approach with adaptive finite elements where the
Navier-Stokes equations are implicitly coupled to the diffuse interface
inextensibility equation. Numerical simulations of a single vesicle in a shear
flow at different Reynolds numbers demonstrate that errors in enforcing local
inextensibility may accumulate and lead to large differences in the dynamics in
the tumbling regime and differences in the inclination angle of vesicles in the
tank-treading regime. The local relaxation algorithm is shown to effectively
prevent this accumulation by driving the system back to its equilibrium state
when errors in local inextensibility arise. Comment: 25 pages
On the initial estimate of interface forces in FETI methods
The Balanced Domain Decomposition (BDD) method and the Finite Element Tearing
and Interconnecting (FETI) method are two commonly used non-overlapping domain
decomposition methods. Due to strong theoretical and numerical similarities,
these two methods are generally considered as being equivalently efficient.
However, for some particular cases, such as for structures with strong
heterogeneities, FETI requires a large number of iterations to compute the
solution compared to BDD. In this paper, the origin of FETI's poor efficiency
in these particular cases is traced back to poor initial estimates of the
interface stresses. To improve the estimation of interface forces, a novel
strategy for splitting interface forces between neighboring substructures is
proposed. The additional computational cost incurred is not significant. This
yields a new initialization for the FETI method and restores numerical
efficiency, making FETI comparable to BDD even for problems where FETI was
performing poorly. Various simple test problems are presented to discuss the
efficiency of the proposed strategy and to illustrate the so-obtained numerical
equivalence between the BDD and FETI solvers.
Low-Power, Low-Cost, & High-Performance Digital Designs: Multi-bit Signed Multiplier design using 32nm CMOS Technology
Binary multipliers are ubiquitous in digital hardware. Digital multipliers, along with adders, play a major role in computing, communication, and control devices. Multipliers are used mainly in digital signal and image processing, the central processing units (CPUs) of computers, high-performance and parallel scientific computing, machine learning, physical-layer design of communication equipment, etc. The widespread presence of, and increasing demand for, low-power, low-cost, and high-performance digital hardware motivated this work on optimized multiplier designs. Two optimized designs are proposed. The first is an optimized 8 x 8 Booth multiplier architecture implemented in 32nm CMOS technology. Synthesis (pre-layout) and post-layout results show that the delay is reduced by 24.7% and 25.6% respectively, the area by 5.5% and 15% respectively, the power consumption by 21.5% and 26.6% respectively, and the area-delay product by 28.8% and 36.8% respectively, compared with the results obtained for the state-of-the-art 8 x 8 Booth multiplier designed in 32nm CMOS technology with a 1.05 V supply voltage at 500 MHz input frequency. The second is a novel radix-8 structure with 3-bit grouping to reduce the number of partial products, along with effective partial-product reduction schemes, for 8 x 8, 16 x 16, 32 x 32, and 64 x 64 signed multipliers. Comparing the performance results of the (synthesized, post-layout) designs of sizes 32 x 32 and 64 x 64 based on the simple novel radix-8 structure with the estimated performance measurements for the optimized Booth multiplier design presented in this work, a reduction in delay of (2.64%, 0.47%) and (2.74%, 18.04%) respectively, and a reduction in area-delay product of (12.12%, -5.17%) and (17.82%, 12.91%) respectively, can be observed. With the use of a higher-radix structure, delay, area, and power consumption can be further reduced.
Appropriate adder deployment, further exploration of optimized grouping or compression strategies, and additional low-power design techniques such as power gating, multi-Vt MOS transistor utilization, and multi-VDD domain creation can, along with higher-radix structures, help realize more efficient multiplier designs.
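To make the partial-product reduction mentioned in the abstract above concrete, the sketch below shows standard radix-8 Booth recoding of a two's-complement multiplier. This is the textbook scheme, not the "novel radix-8 structure" of the work itself, whose details the abstract does not give. The function names are illustrative.

```python
def booth_radix8_digits(y, bits):
    """Recode a signed integer y (representable in `bits`-bit two's
    complement) into radix-8 Booth digits, least significant first.
    Each digit lies in -4..4 and consumes three new multiplier bits."""
    n_digits = -(-bits // 3)  # ceil(bits / 3)

    def bit(i):
        # Python's >> on negative ints sign-extends, so bits above the
        # word width read as copies of the sign bit; bit(-1) is defined as 0.
        return 0 if i < 0 else (y >> i) & 1

    digits = []
    for j in range(n_digits):
        b = 3 * j
        digits.append(-4 * bit(b + 2) + 2 * bit(b + 1) + bit(b) + bit(b - 1))
    return digits


def booth_radix8_multiply(x, y, bits):
    """Multiply x by y by summing one partial product per Booth digit."""
    return sum(d * x * 8 ** j
               for j, d in enumerate(booth_radix8_digits(y, bits)))


if __name__ == "__main__":
    print(booth_radix8_digits(100, 8))
    print(booth_radix8_multiply(-37, 100, 8))
```

For an 8-bit multiplier this gives three partial products instead of eight for plain shift-and-add. The cost is that digits of ±3 require the multiple 3x, which is not a simple shift of x and in hardware usually needs an extra addition.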