Michael Spencer
Guest
Hello,
Has anyone compared FPGA implementations of full-rate digital FIR filters based on the use of Multiplier Blocks vs. traditional FIRs with constant coefficient multipliers? By full rate, I mean: one output result per clock cycle and no interpolation or decimation.
For anyone not familiar, a multiplier block is a network of shifters and adders that performs multiplications by several coefficients efficiently by exploiting common sub-expressions. The multiplier block can be exploited in FIR filters by transposing the standard filter so that the products of all the coefficients with the current input-sample are required simultaneously.
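To make the idea concrete, here is a minimal Python sketch of a transposed-form FIR fed by a small multiplier block. The coefficient set (5, 13, 20, 13, 5) and the function names are made up for illustration and are not taken from the paper; the point is only that every product of the current input sample comes from shifts and two shared additions.

```python
def multiplier_block(x):
    """Return x*5, x*13 and x*20 using shifts and only two additions.

    The coefficient values are hypothetical, chosen so that the
    sub-expression 5x can be shared.
    """
    x5 = (x << 2) + x      # 5x  = 4x + x          (adder 1)
    x13 = (x << 3) + x5    # 13x = 8x + 5x         (adder 2, reuses 5x)
    x20 = x5 << 2          # 20x = 4 * 5x          (shift only, no adder)
    return {5: x5, 13: x13, 20: x20}


def transposed_fir(samples, coeffs=(5, 13, 20, 13, 5)):
    """Transposed direct-form FIR: one output per input sample.

    All coefficient products of the current sample are needed at once,
    which is why a single multiplier block can serve every tap.
    """
    regs = [0] * (len(coeffs) - 1)   # delay registers between the tap adders
    out = []
    for x in samples:
        block = multiplier_block(x)
        prods = [block[c] for c in coeffs]
        out.append(prods[0] + regs[0])
        # Each register takes its tap product plus the next register's old value.
        nxt = regs[1:] + [0]
        regs = [prods[k + 1] + nxt[k] for k in range(len(regs))]
    return out


def direct_fir(samples, coeffs=(5, 13, 20, 13, 5)):
    """Reference convolution with ordinary multiplies, for comparison."""
    return [
        sum(c * samples[n - k] for k, c in enumerate(coeffs) if n - k >= 0)
        for n in range(len(samples))
    ]
```

Feeding an impulse into `transposed_fir` should return the coefficients themselves, and the output should match `direct_fir` for any integer input sequence.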
Also, by representing the coefficients in the Canonical-Signed-Digit number system (a small number of nonzero digits, each +1 or -1) along with common sub-expression sharing, the multiplier block can get even smaller.
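As a rough illustration of the digit savings, the following Python sketch converts an integer coefficient to canonical signed digits (digits in {-1, 0, +1}, no two adjacent nonzeros). The function names are my own, and the recoding shown is the standard non-adjacent-form procedure, not necessarily what the paper's algorithm uses internally.

```python
def to_csd(n):
    """Return the CSD digits of integer n, least significant digit first."""
    digits = []
    while n != 0:
        if n & 1:
            d = 2 - (n % 4)   # +1 if n = 1 (mod 4), -1 if n = 3 (mod 4)
            n -= d            # makes n divisible by 4, so the next digit is 0
        else:
            d = 0
        digits.append(d)
        n //= 2
    return digits


def nonzero_digits(n):
    """Number of nonzero CSD digits; a lone multiplier needs one adder fewer."""
    return sum(1 for d in to_csd(n) if d != 0)


def from_csd(digits):
    """Rebuild the integer from its CSD digits (sanity check)."""
    return sum(d << i for i, d in enumerate(digits))
```

For example, 23 is 10111 in binary (four nonzero bits) but 32 - 8 - 1 in CSD (three nonzero digits), so multiplying by 23 takes two add/subtract operations instead of three.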
For example, the multiplier block for a 100-tap FIR filter (fp=0.10 and fs=0.12) can be realized with only 61 adds (zero explicit multiplications). See filter example #4 in "FIR Filter Synthesis Algorithms for Minimizing the Delay and the Number of Adders," http://ics.kaist.ac.kr/~dk/papers/TCAD2001.pdf
If the adder depth is constrained to a maximum of four, then the authors' algorithm can do the multiplier block in 69 additions.
It would seem that this approach would be very efficient in a target such as the Xilinx Spartan-IIE (with no dedicated multipliers).
Another question: If we only need one result per K clock periods (K ~= 1000 for audio applications), could a multiplier block approach realized with, say, bit-serial addition be more efficient than some other approach such as distributed arithmetic?
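For reference, this is what I mean by bit-serial addition, as a small Python model: one full adder plus a carry flip-flop, consuming one bit of each operand per clock, LSB first. The word width and function name are assumptions for illustration; it models the arithmetic only and says nothing about the actual area comparison with distributed arithmetic.

```python
def bit_serial_add(a, b, width=16):
    """Add two two's-complement words one bit per clock, LSB first.

    Takes `width` clock cycles and wraps around on overflow, as a
    fixed-width hardware adder would.
    """
    carry = 0      # the single carry flip-flop
    result = 0
    for i in range(width):                 # one loop iteration = one clock
        abit = (a >> i) & 1
        bbit = (b >> i) & 1
        s = abit ^ bbit ^ carry            # full-adder sum bit
        carry = (abit & bbit) | (carry & (abit ^ bbit))
        result |= s << i
    if result & (1 << (width - 1)):        # reinterpret as a signed value
        result -= 1 << width
    return result
```

Each adder in the multiplier block would then cost roughly one full adder and one flip-flop, at the price of `width` clocks per addition, which is affordable when only one result is needed every K clocks.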
Comments welcome. Thanks.
-Michael
______________________
Michael E. Spencer, Ph.D.
President
Signal Processing Solutions, Inc.
Web: http://www.spsolutions.com