pipelined algorithm, flow control

mnentwig

Guest
Hi,

one RTL coding style for pipelined processing goes as follows:

- set arguments to function with latency, i.e. memory lookup or
multiplication
- set a trigger bit (/multi-bit word) in a parallel shift register
- when the trigger arrives at the output, continue processing
- cascade multiple stages, i.e. first memory lookup, output triggers
multiplication etc

My question is: Is there any commonly accepted and proven way to code this
in RTL? I see the above emerging as a "red thread" in my own code, maybe
there is some "RTL design patterns" or 10-volume "The Art Of RTL Coding"
that would discuss such ideas?
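
The scheme in the list above can be sketched as a cycle-by-cycle behavioral model. This is Python rather than RTL, and everything named here (`PipelinedStage`, `ROM`, `simulate_cascade`, the latencies 2 and 3, the multiply-by-3) is a hypothetical choice for illustration, not something from the post. Each stage keeps a data pipeline and a trigger shift register of the same depth, and the output trigger of the lookup starts the multiplication.

```python
from collections import deque


class PipelinedStage:
    """Behavioral model of a fixed-latency function plus a trigger shift register."""

    def __init__(self, func, latency):
        if latency < 1:
            raise ValueError("latency must be at least one clock")
        self.func = func
        # Data registers; None stands for "don't care".
        self.data = deque([None] * latency, maxlen=latency)
        # Trigger (valid) bits shift in lock-step with the data.
        self.valid = deque([False] * latency, maxlen=latency)

    def clock(self, din, din_valid):
        """Advance one clock edge; return (dout, dout_valid) as seen before the edge."""
        dout, dout_valid = self.data[-1], self.valid[-1]
        # appendleft on a full deque drops the oldest entry on the right.
        self.data.appendleft(self.func(din) if din_valid else None)
        self.valid.appendleft(din_valid)
        return dout, dout_valid


ROM = [10, 20, 30, 40]  # hypothetical lookup table contents


def simulate_cascade(stimulus, cycles):
    """stimulus maps clock cycle -> ROM address; returns [(cycle, result), ...]."""
    lookup = PipelinedStage(lambda addr: ROM[addr], latency=2)
    multiply = PipelinedStage(lambda value: value * 3, latency=3)
    results = []
    for cycle in range(cycles):
        addr = stimulus.get(cycle)
        # The lookup's output trigger directly starts the multiplication.
        looked_up, lookup_valid = lookup.clock(addr, addr is not None)
        product, product_valid = multiply.clock(looked_up, lookup_valid)
        if product_valid:
            results.append((cycle, product))
    return results
```

Calling `simulate_cascade({0: 0, 1: 1, 3: 2}, 10)` yields each result five clocks after its input, i.e. the sum of the two assumed latencies, and the gap at cycle 2 passes through as a bubble.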


---------------------------------------
Posted through http://www.FPGARelated.com
 
In article <pvedndIrNdTRIW7PnZ2dnUVZ_sudnZ2d@giganews.com>,
mnentwig <24789@embeddedrelated> wrote:
> Hi,
>
> one RTL coding style for pipelined processing goes as follows:
>
> - set arguments to function with latency, i.e. memory lookup or
> multiplication
> - set a trigger bit (/multi-bit word) in a parallel shift register
> - when the trigger arrives at the output, continue processing
> - cascade multiple stages, i.e. first memory lookup, output triggers
> multiplication etc
>
> My question is: Is there any commonly accepted and proven way to code this
> in RTL? I see the above emerging as a "red thread" in my own code, maybe
> there is some "RTL design patterns" or 10-volume "The Art Of RTL Coding"
> that would discuss such ideas?

I'm wondering how others handle this too. I've done lots of pipelined designs,
but don't have a consistent design style with regard to these types of things.
I've used spreadsheets (with "time" along one axis and state along the
other), block diagrams, diagramming state values alongside the registers, and
other haphazard strategies.

I don't usually code an explicit handshake - after all, the pipeline delays
are fixed in the end. Calculating the "fixed" value can sometimes be
tricky. But in the end you end up with a fixed delay between "data in valid"
and "data out valid". (This may vary, for instance with a parameterized number
of stages, but is still "fixed" in the end). You do have to be careful with
matching latencies for stuff coming together, which is again tricky, but
fixed.
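
As a rough illustration of the bookkeeping described above, the two calculations (padding branches that merge, and the overall "data in valid" to "data out valid" delay) can be written down as small helpers. The function names and the example latencies are made up for this sketch; in real RTL these would more likely be parameters or generics evaluated at elaboration time.

```python
def padding_registers(branch_latency):
    """Extra delay registers each branch needs so all branches arrive together.

    branch_latency maps a branch name to its latency in clocks.
    """
    target = max(branch_latency.values())
    return {name: target - latency for name, latency in branch_latency.items()}


def valid_delay_depth(stage_latencies):
    """Depth of the valid delay line for stages connected in series."""
    return sum(stage_latencies)
```

For example, with hypothetical branches `{"coeff_rom": 2, "multiplier": 3, "bypass": 0}` the bypass path needs three padding registers and the ROM path one. A chain with a parameterized number of identical stages just passes `[latency] * num_stages` to `valid_delay_depth`.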

So often I lay down the "datapath" with the "din_valid" -> "dout_valid" delay
just set alongside in an SRL, with a tunable depth.

I often struggle with the "definition" of registers with respect to
which registers are just "pipeline" registers, and which are actual
"z^-1" delays of the filter you're trying to design. There's some kind
of trick here that I know I'm just missing. (I'm usually designing with
a non-systolic clock - i.e. my processing clock has no relation to
my sampling "clock").
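
One convention that may help with this distinction (offered as an assumption, not as the trick Mark is looking for): unit-delay (z^-1) registers update only on the sample strobe, while pipeline registers update on every processing clock and carry a valid bit along. The Python model below applies that to a hypothetical filter y[n] = x[n] + x[n-1], with an assumed four processing clocks per sample. All names are illustrative.

```python
def pipelined_fir(samples, clocks_per_sample=4):
    """Model y[n] = x[n] + x[n-1] with one pipeline register on the output.

    Needs at least 2 clocks per sample so the last result leaves the pipeline.
    """
    z1 = 0              # z^-1 state: part of the filter's transfer function
    pipe_data = 0       # pipeline register: adds latency only
    pipe_valid = False  # valid bit travelling with pipe_data
    outputs = []
    for x in samples:
        for phase in range(clocks_per_sample):
            strobe = phase == 0  # a new sample is present on this clock
            # Register outputs as seen during this clock cycle.
            if pipe_valid:
                outputs.append(pipe_data)
            # Next-state logic, all using the values from before the edge.
            pipe_data = x + z1   # pipeline register clocks every cycle
            pipe_valid = strobe  # but is only meaningful after a strobe
            if strobe:
                z1 = x           # z^-1 register updates once per sample
    return outputs


def reference_fir(samples):
    """Direct evaluation of y[n] = x[n] + x[n-1] with x[-1] = 0."""
    previous = [0] + list(samples[:-1])
    return [x + p for x, p in zip(samples, previous)]
```

Both functions return the same sequence. The pipeline register shifts each result by one processing clock but does not change the filter, whereas removing the strobe enable from `z1` would.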

So, not much advice here, just noting that I see the same issues....

Regards,

Mark
 
Hi Mark,

thanks for the comments.
I did one design twice: once with a hard-coded 15-state FSM, the other one
with shift registers. The first one is more readable, the second one
slightly smaller. But this may be because I usually end up with most
registers unused, while LUTs are the bottleneck.

Keeping multiple samples "in flight" for the hardware multiplier is so much
work that I'm looking into bit-serial multipliers, maybe that's more
rapid-prototyping-friendly for audio.
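
For reference, the basic idea behind a serial multiplier can be modelled as shift-and-add with one multiplier bit consumed per clock. This is the serial/parallel form (one operand serial, one parallel), unsigned only, and the function name and 16-bit default width are assumptions for this sketch. Signed audio samples would need sign handling on top of it.

```python
def serial_multiply(a, b, width=16):
    """Unsigned shift-and-add multiply, one bit of b per modelled clock cycle."""
    limit = 1 << width
    if not (0 <= a < limit and 0 <= b < limit):
        raise ValueError("operands must fit in 'width' unsigned bits")
    accumulator = 0
    for cycle in range(width):
        # On each clock, add the shifted multiplicand if the current bit is set.
        if (b >> cycle) & 1:
            accumulator += a << cycle
    return accumulator
```

The result takes `width` clocks but only one adder, so there is a single sample in flight and no latency matching to track, at the cost of throughput.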

Cheers

Markus

 
