Design issue and Synthesis problem

K

kb33

Guest
Hi,

I have been struggling with this bit of code for sometime, trying to
find the best way to write and synthesize it without warnings...but
have not been able to achieve the desired results. Though the module
is long, the main problem/warnings occurs in the attached code - it is
part of a CASE statement, and when the case `SORT_DATA is encountered,
the stored data in the arrays has to be sorted. My specific queries
are the following:

1. Strangely, the first always block in the Combinational logic
synthesizes, but the second one doesn't, even though the two are
supposed to react to the same set of conditions. The synthesis tool
just keeps running infinitely..

reg [`ATTRIB_SZ-1:0] attr_arr [0: `NUM_RSRC-1],
attr_arr_comb [0: `NUM_RSRC-1];
reg [`RSRC_ID_SZ-1:0] rsrc_id_arr [0: `NUM_RSRC-1],
rsrc_id_arr_comb [0: `NUM_RSRC-1];

//FLip-flop declaration....
always @(posedge clk)
begin
for (i = 0; i < `NUM_RSRC; i = i+1)
begin
attr_arr <= #1 attr_arr_comb ;
rsrc_id_arr <= #1 rsrc_id_arr_comb ;
end
end

//Combinational Logic...
always @*
begin
.
.
.
`SORT_DATA:
for (q=0; q<`NUM_RSRC/2 ; q=q+1)
if (((var_a & var_c_arr[q+b_index]) == 0) && (attr_arr[var_c_arr[q
+b_index]] > attr_arr[var_bxorc_arr[q+b_index]]))
begin
attr_arr_comb[var_c_arr[q+b_index]] <= attr_arr[var_bxorc_arr[q
+b_index]];
attr_arr_comb[var_bxorc_arr[q+b_index]] <= attr_arr[var_c_arr[q
+b_index]];
end

else if (((var_a & var_c_arr[q+b_index]) != 0) &&
(attr_arr[var_c_arr[q+b_index]] < attr_arr[var_bxorc_arr[q+b_index]]))
begin
attr_arr_comb[var_c_arr[q+b_index]] <= attr_arr[var_bxorc_arr[q
+b_index]];
attr_arr_comb[var_bxorc_arr[q+b_index]] <= attr_arr[var_c_arr[q
+b_index]];
end

else
begin
attr_arr_comb[var_c_arr[q+b_index]] <= attr_arr[var_c_arr[q
+b_index]];
attr_arr_comb[var_bxorc_arr[q+b_index]] <= attr_arr[var_bxorc_arr[q
+b_index]];
end
.
.
.
end

//This doesn't synthesize....
always @*
begin
.
.
.
`SORT_DATA:
for (n=0; n<`NUM_RSRC/2 ; n=n+1)
if (((var_a & var_c_arr[n+b_index]) == 0) && (attr_arr[var_c_arr[n
+b_index]] > attr_arr[var_bxorc_arr[n+b_index]]))
begin
rsrc_id_arr_comb[var_c_arr[n + b_index]] <=
rsrc_id_arr[var_bxorc_arr[n + b_index]];
rsrc_id_arr_comb[var_bxorc_arr[n + b_index]] <=
rsrc_id_arr[var_c_arr[n + b_index]];
end

else if (((var_a & var_c_arr[n+b_index]) != 0) &&
(attr_arr[var_c_arr[n+b_index]] < attr_arr[var_bxorc_arr[n+b_index]]))
begin
rsrc_id_arr_comb[var_c_arr[n + b_index]] <=
rsrc_id_arr[var_bxorc_arr[n + b_index]];
rsrc_id_arr_comb[var_bxorc_arr[n + b_index]] <=
rsrc_id_arr[var_c_arr[n + b_index]];
end

else
begin
rsrc_id_arr_comb[var_c_arr[n + b_index]] <= rsrc_id_arr[var_c_arr[n
+ b_index]];
rsrc_id_arr_comb[var_bxorc_arr[n + b_index]] <=
rsrc_id_arr[var_bxorc_arr[n + b_index]];
end
.
.
.
end


2. As is clear from the above code, I am using a complex scheme of
referencing other arrays and variables in order to determine the swap
condition. Though this gives me the correct simulation results, I am
not sure if this is a good way to do it. BTW, the values in the
var_c_arr and var_bxorc_arr arrays are static, they are set at the
time of initialization. The variable b_index changes, though.

3. I figured out that when arrays are created structurally instead of
behavior models, the synthesis warnings about the array element not
being in the sensitivity list are eliminated. Hence I try to use the
GENERATE statement to create arrays as often as I can. However, in
this case, the array has to be written into, and then later, the
stored values have to be swapped/sorted....I cannot figure out how to
do them both using a structural model. The behavioral model, of
course, is generating multiple synthesis warnings.

Thanks
kb33
 
On Wed, 9 Apr 2008 11:34:02 -0700 (PDT), kb33 wrote:

I have been struggling with this bit of code for sometime
I struggled with it for a while too.

I don't know how big the number NUM_RSRC really is.
Presumably you are aware that you are asking the
synthesis tool to create a bunch of memories, all
of whose elements can be accessed in a single
clock cycle? That sounds hard to me. I think you
will find a scarily large number of scarily large
multiplexers in the synthesised result.

Just speculating here, but.... you're not by any
chance a programmer who's trying to create hardware
without understanding what the logic is going to
look like? Surely not. That would be unworthy.

My first serious reaction to this code:
Two-level array indirection, such as
a[b[c]]
is unlikely to yield sensible logic unless the
range of both 'c' and 'b[c]' is rather small -
like, small as in "the number of fingers on one hand".

I think you need to tell us more about what
you're trying to do.
--
Jonathan Bromley, Consultant

DOULOS - Developing Design Know-how
VHDL * Verilog * SystemC * e * Perl * Tcl/Tk * Project Services

Doulos Ltd., 22 Market Place, Ringwood, BH24 1AW, UK
jonathan.bromley@MYCOMPANY.com
http://www.MYCOMPANY.com

The contents of this message may contain personal views which
are not the views of Doulos Ltd., unless specifically stated.
 
Jonathan Bromley wrote:
On Wed, 9 Apr 2008 11:34:02 -0700 (PDT), kb33 wrote:

I have been struggling with this bit of code for sometime

I struggled with it for a while too.

I don't know how big the number NUM_RSRC really is.
Presumably you are aware that you are asking the
synthesis tool to create a bunch of memories, all
of whose elements can be accessed in a single
clock cycle? That sounds hard to me. I think you
will find a scarily large number of scarily large
multiplexers in the synthesised result.

Just speculating here, but.... you're not by any
chance a programmer who's trying to create hardware
without understanding what the logic is going to
look like? Surely not. That would be unworthy.

My first serious reaction to this code:
Two-level array indirection, such as
a[b[c]]
is unlikely to yield sensible logic unless the
range of both 'c' and 'b[c]' is rather small -
like, small as in "the number of fingers on one hand".

I think you need to tell us more about what
you're trying to do.
It was my impression that this was a software algorithm that had been
directly converted to HDL. The 'for' loops will be indexed across
silicon, not time, and if NUM_RSRC is big, so are the silicon
requirements. My advice is always: "Draw an RTL block diagram of the
circuit. If you can't draw it, the synthesizer can't either." -Kevin
 
Jonathan Bromley wrote:
(snip)

I don't know how big the number NUM_RSRC really is.
Presumably you are aware that you are asking the
synthesis tool to create a bunch of memories, all
of whose elements can be accessed in a single
clock cycle? That sounds hard to me. I think you
will find a scarily large number of scarily large
multiplexers in the synthesised result.
In FPGAs it isn't hard to get a large number of
medium sized memories. If you can live with synchronous
memories, the block RAMs in many FPGAs are even bigger.

Just speculating here, but.... you're not by any
chance a programmer who's trying to create hardware
without understanding what the logic is going to
look like? Surely not. That would be unworthy.
That does happen somewhat more often then it should.
Especially people who try to use C as a hardware
description language using sequential algorithms.

My first serious reaction to this code:
Two-level array indirection, such as
a[b[c]]
is unlikely to yield sensible logic unless the
range of both 'c' and 'b[c]' is rather small -
like, small as in "the number of fingers on one hand".
Probably two hands by now, and three in the near future.

I think you need to tell us more about what
you're trying to do.
This I do agree with.

-- glen
 
On Fri, 11 Apr 2008 00:42:25 -0800,
glen herrmannsfeldt wrote:

Jonathan Bromley wrote:
I don't know how big the number NUM_RSRC really is.
Presumably you are aware that you are asking the
synthesis tool to create a bunch of memories, all
of whose elements can be accessed in a single
clock cycle? That sounds hard to me. I think you
will find a scarily large number of scarily large
multiplexers in the synthesised result.

In FPGAs it isn't hard to get a large number of
medium sized memories.
Yes, but he needs...

memories, all
of whose elements can be accessed in a single
clock cycle?
I didn't phrase that very well - I meant that
the OP's code requires parallel access to the
entire contents of the whole memory in a single
clock cycle, not that each individual element
requires single-cycle access.

The way the OP had coded his logic would require
all the storage to be in fabric flipflops and
all the addressing to be done by multiplexers,
unless I'm badly misunderstanding something.
--
Jonathan Bromley, Consultant

DOULOS - Developing Design Know-how
VHDL * Verilog * SystemC * e * Perl * Tcl/Tk * Project Services

Doulos Ltd., 22 Market Place, Ringwood, BH24 1AW, UK
jonathan.bromley@MYCOMPANY.com
http://www.MYCOMPANY.com

The contents of this message may contain personal views which
are not the views of Doulos Ltd., unless specifically stated.
 
On Fri, 11 Apr 2008 08:46:21 -0700 (PDT), kb33
<kanchan.devarakonda@gmail.com> wrote:

I am writing the code for a bitonic sorter.
Ahah!

I guess it's not a big deal if one is not too finicky about
the number of cycles it takes
Sure; but a bitonic sort maps nicely on to hardware for
single-cycle (or, at least, pipelined) execution, as you
have correctly identified.

[snip explanation]

I find it a bit strange that the synthesis tool (Synplicity) would
have a problem resolving something like a[b[c]] . Isn't it a common
thing that programmers do?
Yes, but it's NOT a common thing that hardware designers do, because
it represents a lot of switching-around (multiplexing) of values
in zero time.

Think about it this way: In software, you first compute c; then
you use c (or something like it) as an address, to extract b[c];
then you use that value as an address, to extract a[b[c]]. At
each step you only access just one memory location. BUT in
hardware... you're asking for a[b[c]] to be done in zero
simulated time, or - from a synthesis point of view - in the
time between two clocks. The first clock sets up the value
of c. That must be used as an address, the data b[c] extracted,
used as an address into a[], and a[b[c]] extracted... all in
one clock period, asynchronously. I don't know of any modern
FPGAs whose RAM blocks work in this manner; they all have at
least one clock cycle of latency from address to data out.
So the memory must be implemented as "distributed RAM", i.e.
a mass of flipflops and multiplexers. If you rewrite your
code so that each memory lookup is associated with a clock
cycle, you're in with a better chance of getting a compact
RAM-based implementation.

It is entirely possible to implement a bitonic sort in purely
combinational logic. You start by building a 2-in, 2-out sorter
module (that's obviously quite easy) and then you cascade the
things in layers. This bit of ASCII-art is the output of a
program I wrote years ago, implementing a bitonic sort on
eight input values:

|-|-|-|-|-|-|-| // 8 values come in from the top
O=======O | | | // O===O represents a 2-in 2-out sorter;
| O=======O | | // it swaps the two values if necessary
| | O=======O | // to put them into left-to-right order
| | | O=======O
|-|-|-|-|-|-|-| // ok, that's one layer of logic done...
O===O | | | | |
| O===O O===O |
| | | | | O===O
|-|-|-|-|-|-|-| // that layer had 4 sorter blocks in it
| | O===O | | |
| | | O===O | |
|-|-|-|-|-|-|-| // that layer had only 2 sorter blocks
O=O O=O O=O O=O
|-|-|-|-|-|-|-|
| O=====O | | |
| | | O=====O |
|-|-|-|-|-|-|-|
| O=O O=O O=O |
|-|-|-|-|-|-|-| // and now the 8 data points are sorted.

Other than implementing bitonic sort in this manner, I am at a loss as
to how I can speed up things.
I hope the diagram above shows how it might be done. If you imagine
a set of hardware registers at each of the |-|-|-|-| dividers, you
have a pipelined implementation that will take 6 clock cycles from
first data in to first data out, but thenceforward will yield a new
sorted result on each clock. If you don't put in the registers,
you will get a result in just one cycle; but those cycles will
need to be rather slow ones, to account for the propagation
delay through six swap modules.

Writing a suitable "generate" block to implement the slightly
weird layout of a bitonic sort is not trivial, but can be done.
As usual when you're fighting the hardware speed/space tradeoff,
some cunning is required!

One final thought: In hardware, it is sometimes better to use
cruder sorting mechanisms. Particularly if you're sorting
a relatively small number of values, a simple insertion sort
can work well. It all depends on the scale of your problem.
--
Jonathan Bromley, Consultant

DOULOS - Developing Design Know-how
VHDL * Verilog * SystemC * e * Perl * Tcl/Tk * Project Services

Doulos Ltd., 22 Market Place, Ringwood, BH24 1AW, UK
jonathan.bromley@MYCOMPANY.com
http://www.MYCOMPANY.com

The contents of this message may contain personal views which
are not the views of Doulos Ltd., unless specifically stated.
 
On Apr 11, 6:10 am, Jonathan Bromley <jonathan.brom...@MYCOMPANY.com>
wrote:
On Fri, 11 Apr 2008 00:42:25 -0800,

glen herrmannsfeldt wrote:
Jonathan Bromley wrote:
I don't know how big the number NUM_RSRC really is.
Presumably you are aware that you are asking the
synthesis tool to create a bunch of memories, all
of whose elements can be accessed in a single
clock cycle? That sounds hard to me. I think you
will find a scarily large number of scarily large
multiplexers in the synthesised result.

In FPGAs it isn't hard to get a large number of
medium sized memories.

Yes, but he needs...

memories, all
of whose elements can be accessed in a single
clock cycle?

I didn't phrase that very well - I meant that
the OP's code requires parallel access to the
entire contents of the whole memory in a single
clock cycle, not that each individual element
requires single-cycle access.

The way the OP had coded his logic would require
all the storage to be in fabric flipflops and
all the addressing to be done by multiplexers,
unless I'm badly misunderstanding something.
--
Jonathan Bromley, Consultant

DOULOS - Developing Design Know-how
VHDL * Verilog * SystemC * e * Perl * Tcl/Tk * Project Services

Doulos Ltd., 22 Market Place, Ringwood, BH24 1AW, UK
jonathan.brom...@MYCOMPANY.comhttp://www.MYCOMPANY.com

The contents of this message may contain personal views which
are not the views of Doulos Ltd., unless specifically stated.

I appreciate the effort you guys are putting into my code. SO let me
be more explicit about the issue. I am writing the code for a bitonic
sorter. I guess it's not a big deal if one is not too finicky about
the number of cycles it takes to sort an array of values, but I am
trying to do so in as few cycles as possible. And that's where the
problem arises. The two arrays var_c_arr and var_bxorc_arr are the
ones that have static values in them (because as we know, the bitonic
sort algorithm compares values in fixed locations in each cycle). The
attr_arr is the one storing the values to be sorted. Now as far as the
size of NUM_RSRC is concerned, I am currently working with 8. I don't
plan to go too high, because then the size of the var_c_arr and
var_bxorc_arr also goes up too much. I have given below the list of
values as they change (for NUM_RSRC = 8)

var_a var_c var_bxorc b_index q (or n)
2 0 1 0 0
2 2 3 0 1
2 4 5 0 2
2 6 7 0 3

4 0 2 4 0
4 1 3 4 1
4 4 6 4 2
4 5 7 4 3

4 0 1 8 0
4 2 3 8 1
4 4 5 8 2
4 6 7 8 3

8 0 4 12 0
8 1 5 12 1
8 2 6 12 2
8 3 7 12 3

8 0 2 16 0
8 1 3 16 1
8 4 6 16 2
8 5 7 16 3

8 0 1 20 0
8 2 3 20 1
8 4 5 20 2
8 6 7 20 3

I find it a bit strange that the synthesis tool (Synplicity) would
have a problem resolving something like a[b[c]] . Isn't it a common
thing that programmers do?
Other than implementing bitonic sort in this manner, I am at a loss as
to how I can speed up things.

kb33
 
Jonathan Bromley wrote:
(snip)

Think about it this way: In software, you first compute c; then
you use c (or something like it) as an address, to extract b[c];
then you use that value as an address, to extract a[b[c]]. At
each step you only access just one memory location. BUT in
hardware... you're asking for a[b[c]] to be done in zero
simulated time, or - from a synthesis point of view - in the
time between two clocks.
Yes, the time between two clocks. It is the up to the designer
to decide if that is needed or not. If it is...

The first clock sets up the value
of c. That must be used as an address, the data b[c] extracted,
used as an address into a[], and a[b[c]] extracted... all in
one clock period, asynchronously. I don't know of any modern
FPGAs whose RAM blocks work in this manner; they all have at
least one clock cycle of latency from address to data out.
Older FPGA technology would implement a 32 word memory in
CLBs without any added multiplexers. If you need it then
you use it.

So the memory must be implemented as "distributed RAM", i.e.
a mass of flipflops and multiplexers. If you rewrite your
code so that each memory lookup is associated with a clock
cycle, you're in with a better chance of getting a compact
RAM-based implementation.
In addition to RAMs, many logic systems can benefit from
pipelining. This is especially true in FPGAs where the
FF's are there whether you use them or not.

If one needs speed then pipelining is often the way to go,
and all designers should know how to do it. Even so, it
is sometimes necessary to do nested logic functions without
a pipeline stage.

-- glen
 
kb33 wrote:
(snip)

I appreciate the effort you guys are putting into my code. SO let me
be more explicit about the issue. I am writing the code for a bitonic
sorter. I guess it's not a big deal if one is not too finicky about
the number of cycles it takes to sort an array of values, but I am
trying to do so in as few cycles as possible. And that's where the
problem arises.
What some are trying to say is that most people want something done
as fast as possible, not necessarily in as few clock cycles as possible.
In that case, more registers allows for more pipelining and overall
faster operation. There are assumptions in that statement that aren't
true for all problems.

-- glen
 
Hi,

I am able to get the sorted output in 6 clock cycles by using bitonic
sort on 8 values. My problem, I guess, was that I trying to make the
design too scalable and in that process, adding many arrays and thus
making the synthesis complicated. I just tried a crude approach (not
sure if you guys would approve of it!) by having different pieces of
code being plugged in (using `ifdef) for different number of values to
be sorted - 8, 16, 32 (my application may not require sorting more
than this for now). However, I am still getting some latches in the
design - the synthesis report says due to missing if or case
assignments, but I cannot find any missing assignments. The code (for
sorting 8 values) is below. The four segments behave identically,
except for the values of the operands:

//attr_arr_comb ...
always @*
begin
if (~reset_n)
for(j=0; j < `NUM_VAL; j=j+1)
attr_arr_comb [j] <= 0;

else case (fsm_state)
`READ_VALUE:
for(j=0; j < `NUM_VAL; j=j+1)
if (j == nodeid_in)
attr_arr_comb[j] <= input_val;
else
attr_arr_comb[j] <= attr_arr[j];

`SORT_DATA:
begin
//1 out of 4 code segment...
if (~swap_vector[b_index] && (attr_arr[val_c_0] >
attr_arr[val_bxorc_0]))
begin
attr_arr_comb[val_c_0] <= attr_arr[val_bxorc_0];
attr_arr_comb[val_bxorc_0] <= attr_arr[val_c_0];
end
else if (swap_vector[b_index] && (attr_arr[val_c_0] <
attr_arr[val_bxorc_0]))
begin
attr_arr_comb[val_c_0] <= attr_arr[val_bxorc_0];
attr_arr_comb[val_bxorc_0] <= attr_arr[val_c_0];
end
else
begin
attr_arr_comb[val_c_0] <= attr_arr[val_c_0];
attr_arr_comb[val_bxorc_0] <= attr_arr[val_bxorc_0];
end // else: !if(swap_vector[b_index] && (attr_arr[val_c_0] <
attr_arr[val_bxorc_0]))


//2 out of 4 code segment...
if (~swap_vector[b_index + 1] && (attr_arr[val_c_1] >
attr_arr[val_bxorc_1]))
begin
attr_arr_comb[val_c_1] <= attr_arr[val_bxorc_1];
attr_arr_comb[val_bxorc_1] <= attr_arr[val_c_1];
end
else if (swap_vector[b_index + 1] && (attr_arr[val_c_1] <
attr_arr[val_bxorc_1]))
begin
attr_arr_comb[val_c_1] <= attr_arr[val_bxorc_1];
attr_arr_comb[val_bxorc_1] <= attr_arr[val_c_1];
end
else
begin
attr_arr_comb[val_c_1] <= attr_arr[val_c_1];
attr_arr_comb[val_bxorc_1] <= attr_arr[val_bxorc_1];
end // else: !if(swap_vector[b_index + 1] &&
(attr_arr[val_c_1] < attr_arr[val_bxorc_1]))


//3 out of 4 code segment...
if (~swap_vector[b_index + 2] && (attr_arr[val_c_2] >
attr_arr[val_bxorc_2]))
begin
attr_arr_comb[val_c_2] <= attr_arr[val_bxorc_2];
attr_arr_comb[val_bxorc_2] <= attr_arr[val_c_2];
end
else if (swap_vector[b_index + 2] && (attr_arr[val_c_2] <
attr_arr[val_bxorc_2]))
begin
attr_arr_comb[val_c_2] <= attr_arr[val_bxorc_2];
attr_arr_comb[val_bxorc_2] <= attr_arr[val_c_2];
end
else
begin
attr_arr_comb[val_c_2] <= attr_arr[val_c_2];
attr_arr_comb[val_bxorc_2] <= attr_arr[val_bxorc_2];
end // else: !if(swap_vector[b_index + 2] &&
(attr_arr[val_c_2] < attr_arr[val_bxorc_2]))


//4 out of 4 code segment...
if (~swap_vector[b_index + 3] && (attr_arr[val_c_3] >
attr_arr[val_bxorc_3]))
begin
attr_arr_comb[val_c_3] <= attr_arr[val_bxorc_3];
attr_arr_comb[val_bxorc_3] <= attr_arr[val_c_3];
end
else if (swap_vector[b_index + 3] && (attr_arr[val_c_3] <
attr_arr[val_bxorc_3]))
begin
attr_arr_comb[val_c_3] <= attr_arr[val_bxorc_3];
attr_arr_comb[val_bxorc_3] <= attr_arr[val_c_3];
end
else
begin
attr_arr_comb[val_c_3] <= attr_arr[val_c_3];
attr_arr_comb[val_bxorc_3] <= attr_arr[val_bxorc_3];
end // else: !if(swap_vector[b_index + 3] &&
(attr_arr[val_c_3] < attr_arr[val_bxorc_3]))
end // case: `SORT_DATA


`OUTPUT_DATA:
for(j=0; j < `NUM_VAL; j=j+1)
attr_arr_comb[j] <= attr_arr[j];

default:
for(j=0; j < `NUM_VAL; j=j+1)
attr_arr_comb[j] <= 0;

endcase
end // always @ *

The warnings that I am getting from Synplicity are:
@A: CL110 :"C:\Verilog\bsort_arrays.v":125:8:125:9|Too many clocks (>
8) for set/reset analysis of attr_arr_comb_7_, try moving enabling
expressions outside always block
@W: CL118 :"C:\Verilog\bsort_arrays.v":125:8:125:9|Latch generated
from always block for signal attr_arr_comb_7_[3:0], probably caused by
a missing assignment in an if or case stmt

There is, of course, such a warning for each of the stored values that
are being sorted.

kb33
 
kb33 wrote:

I am able to get the sorted output in 6 clock cycles by using bitonic
sort on 8 values. My problem, I guess, was that I trying to make the
design too scalable and in that process, adding many arrays and thus
making the synthesis complicated. I just tried a crude approach (not
sure if you guys would approve of it!) by having different pieces of
code being plugged in (using `ifdef) for different number of values to
be sorted - 8, 16, 32 (my application may not require sorting more
than this for now). However, I am still getting some latches in the
design - the synthesis report says due to missing if or case
assignments, but I cannot find any missing assignments. The code (for
sorting 8 values) is below.
(snip)

I don't see them either, but it is hard to say that they aren't there.

It would help to know val_c_* and val_bxorc_*, though.

If at any time there is an if without else, or case where not all
possible values are given then a latch is generated.

if x then y<=z+1;
else y<=z-1;

generates a multiplexer (and two adders).

if x then y<=z+1;

generates a latch and one adder.

I usually try to use continuous assignment statements where
possible which tends to avoid this problem.

It might work as a systolic array, which might make the structure
easier to see and extend.

-- glen
 
kb33 wrote:
Hi,

I am able to get the sorted output in 6 clock cycles by using bitonic
sort on 8 values. My problem, I guess, was that I trying to make the
design too scalable and in that process, adding many arrays and thus
making the synthesis complicated. I just tried a crude approach (not
sure if you guys would approve of it!) by having different pieces of
code being plugged in (using `ifdef) for different number of values to
be sorted - 8, 16, 32 (my application may not require sorting more
than this for now). However, I am still getting some latches in the
design - the synthesis report says due to missing if or case
assignments, but I cannot find any missing assignments.
This kind of thing can also be done with beautiful code :) that
synthesizes perfectly. However, the proper coding technique
is structural recursion - don't know how feasible / difficult
that is in Verilog.

Jan

--
Jan Decaluwe - Resources bvba - http://www.jandecaluwe.com
Kaboutermansstraat 97, B-3000 Leuven, Belgium
From Python to silicon:
http://myhdl.jandecaluwe.com
 
On Apr 15, 2:18 pm, Jan Decaluwe <j...@jandecaluwe.com> wrote:
This kind of thing can also be done with beautiful code :) that
synthesizes perfectly. However, the proper coding technique
is structural recursion - don't know how feasible / difficult
that is in Verilog.
Verilog does allow recursive module instantiation, with conditional
generates used to terminate the recursion. Is that adequate?
 
sharp@cadence.com wrote:
On Apr 15, 2:18 pm, Jan Decaluwe <j...@jandecaluwe.com> wrote:

This kind of thing can also be done with beautiful code :) that
synthesizes perfectly. However, the proper coding technique
is structural recursion - don't know how feasible / difficult
that is in Verilog.


Verilog does allow recursive module instantiation, with conditional
generates used to terminate the recursion. Is that adequate?
Yes, and I suggest to the OP to look at it from that angle.
Start with from an algorithmic description of bitonic sort
(presumably recursive) and map that to a dataflow structure
that keeps the recursion.

Jan

--
Jan Decaluwe - Resources bvba - http://www.jandecaluwe.com
Kaboutermansstraat 97, B-3000 Leuven, Belgium
From Python to silicon:
http://myhdl.jandecaluwe.com
 

Welcome to EDABoard.com

Sponsor

Back
Top