Qs on HDL library code and pipelining

  • Thread starter Julio Di Egidio

Julio Di Egidio

Guest
Hello everybody, and Happy New Year 2018!

I am new to digital design, here are some basic questions:

In particular for algorithm acceleration (e.g. arithmetic, cryptography,
etc.), does it make sense to think both FPGA and ASIC when writing HDL
library code?

And, about pipelining combinational logic, what maximum gate-delay
granularity would be good (for fmax) on FPGAs? I am guessing a 2-gate
delay maximum granularity before introducing a register does not pay off.

And, would similar considerations and a 2-gate delay pipelining be best
for ASICs, too?

(Essentially, I am trying to find guidelines to write reusable HDL, but
at the moment I am not even sure that such a thing in fact makes sense.)

Thanks very much in advance for any insight,

Julio
 
> In particular for algorithm acceleration (e.g. arithmetic, cryptography,
> etc.), does it make sense to think both FPGA and ASIC when writing HDL
> library code?

I've been making cores to target both ASICs and FPGAs and it's difficult to make it completely portable. For one thing, the ASIC clock (for me) is twice as fast so I have to double the bus widths for the ASIC cores. The path delay in an FPGA can be 80% route time, but in the ASIC it's mostly logic delay. I also have to use specific FPGA primitives which don't exist in the ASIC. One section I constructed out of instantiated DSP48s for the Xilinx, but for the ASIC, I just wrote behavioral code and it synthesizes as a sea of flipflops. (This was a routing-intensive block so it was much easier on the ASIC.)

Portable, parameterizable code is something to strive for, but it's still only somewhat possible with today's tools, and you're going to have to make changes for every target, so you don't want to expend too much effort on it.
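To put rough numbers on the routing-versus-logic point above and on the original pipelining question, here is a minimal sketch of a first-order timing model. Everything in it is an assumption made for illustration: the function name `fmax_mhz`, the linear delay model, and all of the delay values are hypothetical and do not come from any particular device or process. Real figures have to come from the timing reports of your own tools.

```python
def fmax_mhz(levels_per_stage, t_logic_ns, t_route_ns, t_reg_overhead_ns):
    """Estimate fmax (MHz) for one pipeline stage.

    First-order model (an assumption): the clock period is a fixed register
    overhead (clock-to-Q plus setup) plus, for every logic level in the
    stage, one logic delay and one routing delay.
    """
    period_ns = t_reg_overhead_ns + levels_per_stage * (t_logic_ns + t_route_ns)
    return 1000.0 / period_ns


# Hypothetical per-level delays, chosen only to show the trend.
# FPGA: about 80% of each level's delay is routing.
FPGA = dict(t_logic_ns=0.1, t_route_ns=0.4, t_reg_overhead_ns=0.5)
# ASIC: each level's delay is mostly logic.
ASIC = dict(t_logic_ns=0.08, t_route_ns=0.02, t_reg_overhead_ns=0.1)


def sweep(target, levels=(1, 2, 4, 8)):
    """Return {logic levels per stage: estimated fmax in MHz} for a target."""
    return {n: round(fmax_mhz(n, **target), 1) for n in levels}


if __name__ == "__main__":
    for name, target in (("FPGA", FPGA), ("ASIC", ASIC)):
        print(name, sweep(target))
```

In this model the register overhead is paid once per stage no matter how little logic sits between the registers, so halving the logic depth never doubles fmax. With the made-up FPGA numbers, going from 2 levels to 1 buys a factor of 1.5 while doubling the number of registers. That is one way to see why very fine-grained pipelining may not pay off, and why the answer can differ between targets.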
 
On Wednesday, January 3, 2018 at 1:48:22 AM UTC+1, Kevin Neilson wrote:
>> In particular for algorithm acceleration (e.g. arithmetic, cryptography,
>> etc.), does it make sense to think both FPGA and ASIC when writing HDL
>> library code?
>
> I've been making cores to target both ASICs and FPGAs and it's difficult to make it completely portable. For one thing, the ASIC clock (for me) is twice as fast so I have to double the bus widths for the ASIC cores.

You mean because the bus itself stays at the same frequency, right? I have
just started trying to get my head around clock domain crossing and similar.
If you don't mind me taking the chance: why would you double the bus width
in that case, i.e. doesn't that still need a provider that is twice as fast?
(Sorry, I guess I am just missing the particulars of the job involved.)

> The path delay in an FPGA can be 80% route time, but in the ASIC it's mostly logic delay. I also have to use specific FPGA primitives which don't exist in the ASIC. One section I constructed out of instantiated DSP48s for the Xilinx, but for the ASIC, I just wrote behavioral code and it synthesizes as a sea of flipflops. (This was a routing-intensive block so it was much easier on the ASIC.)
>
> Portable, parameterizable code is something to strive for, but it's still only somewhat possible with today's tools, and you're going to have to make changes for every target, so you don't want to expend too much effort on it.

OK, and thanks very much for your feedback, Kevin, appreciated.

It's a fine line then...

Julio
 
> You mean because the bus itself stays at the same frequency, right? I have
> just started trying to get my head around clock domain crossing and similar.
> If you don't mind me taking the chance: why would you double the bus width
> in that case, i.e. doesn't that still need a provider that is twice as fast?
> (Sorry, I guess I am just missing the particulars of the job involved.)

I got that all backwards. I made a core that was to operate on an FPGA and when I ported it to the ASIC I doubled the clock speed and halved the bus width. (Of course I could've used the same clock and bus width but then the gate count would be twice as big as it really needed to be.) One might think that halving the bus width is a simple matter of changing a parameter, but of course it never works that way in hardware design.
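The arithmetic behind that trade can be sketched as follows, assuming a bus that moves one word per clock cycle. The widths, clock rates, and the helper names `throughput_gbps` and `split_word` are hypothetical and are not taken from the actual cores; the low-half-first ordering in `split_word` is likewise just an assumption.

```python
def throughput_gbps(bus_width_bits, clock_mhz):
    """Raw throughput in Gbit/s, assuming one word is transferred per clock."""
    return bus_width_bits * clock_mhz / 1000.0


def split_word(word, narrow_bits):
    """Split one wide word into two narrow words (low half first, by assumption).

    This is the kind of width conversion that is needed wherever a
    half-width datapath has to meet a full-width interface.
    """
    mask = (1 << narrow_bits) - 1
    return [word & mask, (word >> narrow_bits) & mask]


# Hypothetical numbers: a wide, slow FPGA datapath and a narrow, fast ASIC one.
fpga = throughput_gbps(bus_width_bits=128, clock_mhz=250)
asic = throughput_gbps(bus_width_bits=64, clock_mhz=500)

if __name__ == "__main__":
    print("FPGA:", fpga, "Gbit/s  ASIC:", asic, "Gbit/s")
    print([hex(w) for w in split_word(0xAABB, 8)])
```

The throughput is the same in both cases, but the narrow datapath needs roughly half the logic per cycle. The parameter change alone does not cover it, though: anything that assumed a whole word arrives in one cycle (framing, control, word ordering) now sees it spread over two.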

Watch out for clock domain crossings!
 
On Thursday, January 4, 2018 at 8:34:48 PM UTC+1, Kevin Neilson wrote:

> Watch out for clock domain crossings!

Yep, I am carefully following the reference designs I'm finding around! :)
Anyway, I don't see a way to escape the topic even very early in a beginner
course. The simplest top level I am writing has at least 2 clock domains: a
slow "user" domain for user input and output, and a fast "core" domain for
the core logic. Most user inputs I need to bring forward to the core
domain as control signals; likewise, I need to bring outputs from the core
back out to the user for display/monitoring. That seems to me a very basic
scenario... (Anyway, never mind my beginner's adventures, I understand this
is going to take years, but please tell me if I am missing something.)

Julio
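For the scenario described above (a slow control bit brought into a faster clock domain), one common structure is a two-flop synchronizer. The following is only a behavioral sketch in Python of what that structure does cycle by cycle, not HDL and not a replacement for the reference designs; the class name `TwoFlopSynchronizer` is hypothetical. It does not model metastability, and it is only meaningful for a single-bit signal that stays stable for longer than a destination clock period.

```python
class TwoFlopSynchronizer:
    """Cycle-level model of two back-to-back flip-flops in the destination domain."""

    def __init__(self):
        self.stage1 = 0  # first flop: samples the asynchronous input
        self.stage2 = 0  # second flop: drives the synchronized output

    def clock(self, async_in):
        """Advance one destination-clock edge and return the synchronized output."""
        # Both flops update on the same edge, so stage2 takes the value that
        # stage1 held before this edge.
        self.stage2 = self.stage1
        self.stage1 = async_in
        return self.stage2


if __name__ == "__main__":
    sync = TwoFlopSynchronizer()
    user_input = [0, 1, 1, 1, 0, 0]  # value seen at each core-clock edge
    print([sync.clock(v) for v in user_input])
```

The output follows the input two destination-clock edges later. Multi-bit values cannot simply be passed through one such synchronizer per bit, because the bits may arrive on different edges; those usually need a handshake, Gray coding, or an asynchronous FIFO.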
 
