Kevin Neilson
Guest
Normalisation of FP results also requires a "find first 1" operation.
Again, dedicated hardware is going to be a lot smaller and more
efficient than using LUTs.
Find first 1 can be done using a carry chain which is quite fast. It is
the same function as used in Gray code operations.
It is not something I have looked into, but I'll happily take your word
for it. However, like pretty much /any/ function, it will be smaller
and faster in dedicated hardware than in logic blocks.
I've done it in a Xilinx part, and it's not fast. First you have to go across the routing fabric and through a set of LUTs to get onto the carry chain. The carry chain itself is pretty fast; getting on and off it is slow. After you get off the carry chain, you have to go through the general routing fabric again, and this is where most of your clock cycle gets eaten up. Remember, if you had dedicated hardware, this would be a dedicated route.

Next you get into a second set of LUTs, where you have to AND the data from the carry chain with the original number to get a one-hot bus with only the leading 1 set. Then you have to encode that one-hot bus into a number you can use for your shifter. You may be able to do the encoding in the same set of LUTs; I can't remember.
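For anyone following along, here is a rough software model of those steps (carry chain, AND with the original, encode for the shifter). It is only a sketch, not the actual design: it assumes the carry chain is computing a two's-complement negate, so that `x & -x` isolates a single set bit, and it assumes the input is bit-reversed first so that a chain running from the LSB finds the leading 1. The 8-bit `WIDTH` and all function names are made up for illustration.

```python
WIDTH = 8  # assumed word width, chosen only for illustration


def reverse_bits(x, width=WIDTH):
    """Mirror the bit order so an LSB-first carry chain sees the MSB first."""
    r = 0
    for i in range(width):
        if x & (1 << i):
            r |= 1 << (width - 1 - i)
    return r


def leading_one_onehot(x, width=WIDTH):
    """Return a one-hot word with only the leading 1 of x set (0 if x == 0)."""
    mask = (1 << width) - 1
    r = reverse_bits(x, width)
    # Carry-chain step (assumption): two's-complement negate of the input.
    chain = (-r) & mask
    # Second LUT level: AND the chain output with the original number.
    lowest = r & chain
    # Undo the bit reversal to get the leading 1 of the original word.
    return reverse_bits(lowest, width)


def encode_onehot(onehot):
    """Encode a one-hot word as a bit index (-1 for an all-zero word)."""
    return onehot.bit_length() - 1


def normalise_shift(x, width=WIDTH):
    """Left-shift amount that moves the leading 1 to the MSB, or None for 0."""
    onehot = leading_one_onehot(x, width)
    if onehot == 0:
        return None
    return (width - 1) - encode_onehot(onehot)
```

For example, `normalise_shift(0b00101100)` gives the left-shift count needed to bring the leading 1 up to bit 7. In the fabric, each of these functions corresponds to a LUT level or the carry chain, with a trip through general routing in between, which is where the time goes.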