This question seems simpler than it actually is...

arubin
Hello All,

Let's start with a simple problem: Given a buffer of "Buf_Depth"
elements (let's say they are unsigned vectors with "Data_Width" number
of bits), find the minimum element in the buffer.

Our first solution is straightforward:

-----------------------
PROCESS (Data_Buf)
    VARIABLE min : UNSIGNED(Data_Width-1 DOWNTO 0);
BEGIN

    min := Data_Buf(0);

    FOR i IN 1 TO Buf_Depth-1 LOOP
        IF (Data_Buf(i) < min) THEN
            min := Data_Buf(i);
        END IF;
    END LOOP;

    FinalResult <= min;

END PROCESS;
--------------------------

The nice thing about this solution is that it will work for any value
of "Buf_Depth". Everything seems great until we look at the timing
analysis (after synthesis): the N-1 comparisons are chained in series,
so this design is painfully slow.

To get better performance, we can speed up the design with some
pipelining. Setting "Buf_Depth" = 4, we get:

-----------------------

SIGNAL pipeA, pipeB : UNSIGNED(Data_Width-1 DOWNTO 0);

PROCESS (Clk)
BEGIN
    IF rising_edge(Clk) THEN

        IF (Data_Buf(0) < Data_Buf(1)) THEN
            pipeA <= Data_Buf(0);
        ELSE
            pipeA <= Data_Buf(1);
        END IF;

        IF (Data_Buf(2) < Data_Buf(3)) THEN
            pipeB <= Data_Buf(2);
        ELSE
            pipeB <= Data_Buf(3);
        END IF;

        IF (pipeA < pipeB) THEN
            FinalResult <= pipeA;
        ELSE
            FinalResult <= pipeB;
        END IF;

    END IF;
END PROCESS;

-----------------------

This solution (after synthesis) is very fast, but Buf_Depth is
hard-coded to '4' - i.e. it is not a generic solution.

So here is my question: how can I code up a VHDL block, similar to
the pipelined solution, that can be instantiated for any value of
Buf_Depth?

Thanks,
Amir
 
arubin wrote:
Hello All,

Let's start with a simple problem: Given a buffer of "Buf_Depth"
elements (let's say they are unsigned vectors with "Data_Width" number
of bits), find the minimum element in the buffer.
...
So here is my question: How can I code up a VHDL block similar to
the pipelined solution that can be instantiated for any value of
Buf_Depth?

Thanks,
Amir
This sounds like a good candidate for recursion. I don't know
how you'd code it in VHDL, but the idea would be to implement
a function that returns the minimum element of an array.
Here's some C code that works:

int min(int *v, int n)
{
    int l, r, n2;
    if (n == 1) return *v;       /* one element: it is the minimum */
    n2 = n / 2;
    l = min(v, n2);              /* minimum of the left half */
    r = min(v + n2, n - n2);     /* minimum of the right half */
    return l < r ? l : r;
}

Just convert this to an equivalent VHDL function
declaration.

-Dave

--
David Ashley http://www.xdr.com/dash
Embedded linux, device drivers, system architecture
 
To refine David's suggestion a bit, and to incorporate the pipelining
you want for better performance, you'll probably want to make a
procedure that takes as input both the array whose minimum you want
and another integer parameter (a constant, call it 'N' for now) that
defines how many clock cycles you're willing to wait for the result
to be calculated.

This procedure would then make 'N' calls to David's minimum-value
function, each call working on Buf_Depth / N elements of the array.
After one clock cycle the procedure would have an array of 'N' values,
whose minimum you would then find by repeating the process (as David
mentions, recursion would be handy here).

Obviously a few of those sticky details would need to be worked out but
that would be the basic idea.
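This two-stage scheme is easy to model in plain C to check the arithmetic. Below is only a sketch: `staged_min`, `vmin`, and the two-"clock" structure are illustrative names, and Buf_Depth is assumed to divide evenly into the groups.

```c
/* Minimum of the n values starting at v. */
static int vmin(const int *v, int n)
{
    int m = v[0];
    for (int i = 1; i < n; i++)
        if (v[i] < m)
            m = v[i];
    return m;
}

/* Model of the staged scheme: "clock 1" takes the minimum of each of
 * the n_groups partitions (all groups in parallel in hardware);
 * "clock 2" reduces the n_groups partial minima to the final result.
 * Assumes buf_depth divides evenly by n_groups, and n_groups <= 16. */
int staged_min(const int *buf, int buf_depth, int n_groups)
{
    int partial[16];                         /* stage-1 register bank */
    int per_group = buf_depth / n_groups;

    for (int g = 0; g < n_groups; g++)       /* "clock 1" */
        partial[g] = vmin(buf + g * per_group, per_group);

    return vmin(partial, n_groups);          /* "clock 2" */
}
```

With Buf_Depth = 8 and N = 4, stage one produces four partial minima in parallel and stage two reduces them, mirroring the registered pipeA/pipeB structure in the hand-coded four-element example.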

KJ
 
KJ wrote:
To refine David's suggestion a bit and to incorporate the pipelining
you want to improve performance you'll probably want to make a
procedure...
I think all of you are going about this the wrong way (IMHO). First, I
find a lot of logic designers forget what HDL stands for... Hardware
Description Language. Instead of designing their logic to suit both
the behavior required as well as the practical requirements of the
problem, they often start coding the problem as if they were writing
software. The result is... well I don't know what the result is and
that is the point, neither do they! Of course if you have coded a
problem in the past in a similar style you can anticipate what the
compiler will do with your code. But if you have not written this sort
of code before, you won't know!

I design my logic without writing a line of HDL. Then once I am happy
with the approach, I start writing HDL to "describe" the logic I want.
Every HDL tool I have seen in the last 10 years has provided templates
for HDL that the tool understands and explains how it will map to
hardware. If you use those templates to describe your hardware, you
will get what you have designed and you will understand your result
completely.

The difference between the two approaches above is that one chains N-1
compares in a serial fashion and the other seems to be oriented in a
tree structure. I actually have no idea exactly how the logic will
work out for the serial coding because the compares are all done
against a variable rather than between the input elements. I have
never coded in this style so I can't say how the synthesis will
translate this.

The tree solution can be designed using generate statements and a few
basic calculations to figure out the number of levels (the log base 2
of Buf_Depth, rounded up) and the amount of padding in the first level
(the difference between 2 to the power of the number of levels and
Buf_Depth, i.e. the dummy inputs that round the tree out). If this is
hard to picture, just consider that you are describing a logarithmic
triangle, so to speak. This should be a simple geometry problem.
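Those counts can be sanity-checked with a few lines of C (illustrative names only: `tree_levels` is the pipeline depth of the comparator tree, `tree_padding` the number of dummy inputs needed to round the first level up to a full tree):

```c
/* Comparator levels in a tournament tree over buf_depth elements:
 * ceil(log2(buf_depth)). */
int tree_levels(int buf_depth)
{
    int levels = 0;
    for (int span = 1; span < buf_depth; span <<= 1)
        levels++;
    return levels;
}

/* Dummy inputs needed to pad buf_depth up to a full 2^levels tree. */
int tree_padding(int buf_depth)
{
    return (1 << tree_levels(buf_depth)) - buf_depth;
}
```

The total comparator count is always Buf_Depth - 1 regardless of padding, which matches the 5 comparators reported later in this thread for a 6-element array.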

BTW, this type of solution is not likely to be very practical for a
buffer of any real size, since it will consume O(N) hardware resources
(N-1 comparators).
 
rickman wrote:

I think all of you are going about this the wrong way (IMHO).
I don't...in my opinion ;)

First, I
find a lot of logic designers forget what HDL stands for... Hardware
Description Language.
Yep... and the 'V' in VHDL stands for 'VHSIC', which stands for Very
High Speed Integrated Circuit... not sure what you were getting at,
but this is just as relevant.

Instead of designing their logic to suit both
the behavior required as well as the practical requirements of the
problem, they often start coding the problem as if they were writing
software.
Again, kind of missing the relevance to the original post?

The result is... well I don't know what the result is and
that is the point, neither do they! Of course if you have coded a
problem in the past in a similar style you can anticipate what the
compiler will do with your code. But if you have not written this sort
of code before, you won't know!
I do. I've written min/max functions before. Many times it is simply
to search through some set of constants/generics, find the min/max,
and return a constant... and that's exactly what gets synthesized: a
constant. I've also written much more complicated functions to do all
kinds of things, and again, what got synthesized was right along the
lines of what I wanted and expected to pop out.

Granted, the poster is looking to take the min of signals and
shouldn't get a constant as a result, but one can succinctly code what
he is looking to do, parameterize it to allow multiple clock cycles of
pipelining roughly along the lines I suggested, and have it work out
just fine.

I design my logic without writing a line of HDL. Then once I am happy
with the approach, I start writing HDL to "describe" the logic I want.
Every HDL tool I have seen in the last 10 years has provided templates
for HDL that the tool understands and explains how it will map to
hardware. If you use those templates to describe your hardware, you
will get what you have designed and you will understand your result
completely.
The only thing in the original post was a 'min' function, which should
translate into a subtractor to implement each (if X < Y) test... not
much of a stretch there, in my opinion.

The difference between the two approaches above is that one chains N-1
compares in a serial fashion and the other seems to be oriented in a
tree structure.
Serial versus pipelined is what I would call it but OK.

I actually have no idea exactly how the logic will
work out for the serial coding because the compares are all done
against a variable rather than between the input elements. I have
never coded in this style so I can't say how the synthesis will
translate this.
As with all other code... it will translate it exactly as written. By
that I mean the synthesis output will be functionally equivalent to
the source code; if you're not always getting that, you have a tool
problem.

The tree solution can be designed using generate statements and a few
basic calculations to figure out the number of levels (the log base 2
of Buf_Depth, rounded up) and the amount of padding in the first level
(the difference between 2 to the power of the number of levels and
Buf_Depth, i.e. the dummy inputs that round the tree out). If this is
hard to picture, just consider that you are describing a logarithmic
triangle, so to speak. This should be a simple geometry problem.
Like I said, I did leave some of the nasty details to the original
poster...I just gave an outline of how I would approach it.

BTW, this type of solution is not likely to be very practical for a
buffer of any size since it will consume O(N) hardware resources.
Not true. 'N' will be bounded and will be synthesizable into 'some'
number of logic cells. Presumably the poster already has a family of
parts in mind that he/she is targeting, and will either get it to fit
in the design or say... DANG... I can't do it this way, and then see
whether it would be possible to stream the data in and calculate the
minimum on the fly, which would need only one comparator for the
entire design. But that's not what they were asking for; maybe there
is some reason the data cannot be streamed in but is just 'there', and
now he needs to find its minimum.

My 2 cents.
KJ
 
rickman wrote:
I think all of you are going about this the wrong way (IMHO). First, I
find a lot of logic designers forget what HDL stands for... Hardware
Description Language. Instead of designing their logic to suit both
the behavior required as well as the practical requirements of the
problem, they often start coding the problem as if they were writing
software. The result is... well I don't know what the result is and
that is the point, neither do they!
I think I understand the point you're trying to make, but that's not
where this approach is coming from. I'm not trying to shoehorn C into
VHDL, far from it. What I'm trying to do is express the concept of
automatically building up the comparator tree, yet have the code be
readable and intuitive.

For the exercise I went ahead and did this with a 6-element array.
It simulates fine, and then I synthesized it. The synthesis report
indicates 5 comparators were used, which is exactly what is required
(N-1 comparators for N = 6 elements):

HDL Synthesis Report

Macro Statistics
# Registers : 3
8-bit register : 3
# Comparators : 5
8-bit comparator greater : 2
8-bit comparator less : 3
# Multiplexers : 3
8-bit 4-to-1 multiplexer : 2
8-bit 8-to-1 multiplexer : 1

Below is the code for reference.

Now, I look at the function "min" and it seems perfectly
intuitive what it is doing. And it maps perfectly into VHDL,
and it seems to get the job done.

There is an accepted principle that any algorithm implemented
using recursion can be implemented without recursion. However
I can't imagine a non-recursive form being this nifty.
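For what it's worth, the non-recursive form is a bottom-up pairwise reduction (a C sketch with a made-up name, `min_iterative`; each pass of the outer loop corresponds to one comparator level of the tree, so it runs ceil(log2(n)) times):

```c
/* Non-recursive minimum: repeatedly halve the array by comparing
 * adjacent pairs in place, carrying an odd leftover element through
 * unchanged (a "bye"). Overwrites v; returns the minimum of the
 * original n elements. */
int min_iterative(int *v, int n)
{
    while (n > 1) {
        int half = n / 2;
        for (int i = 0; i < half; i++)    /* one comparator level */
            v[i] = v[2 * i] < v[2 * i + 1] ? v[2 * i] : v[2 * i + 1];
        if (n % 2)                        /* odd element gets a bye */
            v[half] = v[n - 1];
        n = half + n % 2;
    }
    return v[0];
}
```

It does the same work as the recursive version, but the recursive VHDL function reads more naturally over unconstrained array slices.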

Now, in practice, I can't see much use for this sort of thing in the
first place. In actual digital circuit design, I don't think you would
have all the elements of the array built up with their values at once:
you'd need logic to initialize the array elements, and I doubt you'd
fill out the whole array in one clock cycle. So if you're going to
spread the initialization of the array across multiple clock cycles,
you can take advantage of that and spread the mechanism that
determines the minimum value over those very same clock cycles.
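Folding the computation into the fill this way amounts to a running minimum, which needs only one comparator no matter how deep the buffer is. A small C model (the names and the INT_MAX reset, standing in for a register reset to all-ones, are illustrative):

```c
#include <limits.h>

/* Running minimum with a single comparator: compare each arriving
 * element against the registered minimum as the buffer fills. */
typedef struct {
    int current_min;    /* models the minimum register */
} running_min;

void rm_reset(running_min *rm)
{
    rm->current_min = INT_MAX;   /* hardware: reset register to all-ones */
}

void rm_push(running_min *rm, int x)
{
    if (x < rm->current_min)     /* the one comparator */
        rm->current_min = x;
}
```

After pushing one element per "clock cycle", `current_min` holds the minimum of everything seen so far, with no comparator tree at all.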

I look at this as just an interesting exercise. A possible
application might be if you have X devices trying to control
a resource, and each has a priority associated with it, you
want to pick the device with the highest priority.

-Dave

library IEEE;
use ieee.numeric_std.all;
use ieee.std_logic_1164.all;

entity system is
    port (
        clk, rst_p : inout std_logic;   -- inout so the embedded testbench can drive them
        min_out    : out std_logic_vector(7 downto 0));
end system;

architecture behav of system is

    subtype entry is unsigned(7 downto 0);
    type invect is array (natural range <>) of entry;
    signal vec     : invect(0 to 5);
    signal minimum : entry;

    type state is (c0, c1, c2, c3);
    signal mystate : state := c0;

    -- Recursive minimum over any slice of invect: split the slice in
    -- half, take the minimum of each half, return the smaller.
    function min(iv : invect) return entry is
        variable n2   : integer;
        variable l, r : entry;
    begin
        if iv'length = 1 then
            return iv(iv'left);
        end if;
        n2 := (iv'left + iv'right + 1) / 2;
        l  := min(iv(iv'left to n2-1));
        r  := min(iv(n2 to iv'right));
        if l < r then
            return l;
        else
            return r;
        end if;
    end;

begin

    -- Simulation-only clock and reset generation
    -- pragma translate_off
    process
    begin
        clk <= '0';
        for i in 0 to 100 loop
            wait for 5 ns;
            clk <= not clk;
        end loop;
        wait;
    end process;

    process
    begin
        rst_p <= '1';
        wait for 20 ns;
        rst_p <= '0';
        wait;
    end process;
    -- pragma translate_on

    -- Cycle a few test patterns through the array
    process(clk)
    begin
        if clk'event and clk = '1' then
            if rst_p = '1' then
                mystate <= c0;
            end if;
            case mystate is
                when c0 =>
                    vec(0) <= X"22";
                    vec(1) <= X"11";
                    vec(2) <= X"55";
                    vec(3) <= X"77";
                    vec(4) <= X"33";
                    vec(5) <= X"99";
                    mystate <= c1;
                when c1 =>
                    vec(3) <= X"05";
                    mystate <= c2;
                when c2 =>
                    vec(3) <= X"77";
                    vec(1) <= X"aa";
                    mystate <= c3;
                when c3 =>
                    vec(5) <= X"02";
                    mystate <= c0;
            end case;
        end if;
    end process;

    minimum <= min(vec);
    min_out <= std_logic_vector(minimum);

end behav;


--
David Ashley http://www.xdr.com/dash
Embedded linux, device drivers, system architecture
 
"David Ashley" <dash@nowhere.net.dont.email.me> wrote in message
news:5--dnYevwMRJ1oXYnZ2dnUVZ_umdnZ2d@adelphia.com...

I agree with everything you said and pass kudos for posting code and
results...as well as for the DDR thing you had posted a while ago too.

Now, I look at the function "min" and it seems perfectly
intuitive what it is doing. And it maps perfectly into VHDL,
and it seems to get the job done.
And that's what makes it a 'good' implementation in my book. It maps well
to the hardware and is painfully obvious to understand.

There is an accepted principle that any algorithm implemented
using recursion can be implemented without recursion. However
I can't imagine a non-recursive form being this nifty.

Now, in practice, I can't see much use for this sort of thing
in the first place. I think in actual digital circuit design one
wouldn't have all the elements of an array built up with their
values. You'd have to have logic to initialize the array
elements -- I doubt if in one clock cycle you'd fill out the whole
array.
What about the case where, on every clock cycle, you're getting some
sensor inputs and you need to find their minimum as part of the signal
processing? Those sensor inputs would all arrive in parallel.

So if you're going to spread out the initialization of the
array across multiple clock cycles, you can take advantage
of that and spread out the mechanism to determine the
minimum value over those very same clock cycles.
Probably true in many cases, but the original poster may have
something else in mind that would preclude this... like the
above-mentioned sensor inputs. Although, if he hadn't considered that
approach, its mention a couple of times in this thread may tickle a
brain cell on his side into questioning whether the serial approach
might be better, since there is a logic-resource issue to contend with
when doing it in parallel.

I look at this as just an interesting exercise.
Yep, and the good thing is that sometimes you get new insights when
pondering these exercises. Even if not directly applicable now, you've
tickled a brain cell or two that might remember it when you need it.

KJ
 
What about the case where, on every clock cycle, you're getting some
sensor inputs and you need to find their minimum as part of the signal
processing? Those sensor inputs would all arrive in parallel.
This is exactly my scenario. Every clock cycle I grab data from my
sensors and shift the new data into the data buffer (oldest data gets
shifted out). Part of my signal processing involves getting the
minimum value of this data buffer each cycle.
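This datapath can be modeled in C to show why the shifting buffer forces a full recompute each cycle: a lone running minimum would go stale whenever the old minimum is exactly the sample that just shifted out. (`BUF_DEPTH` and the names below are illustrative, not from the poster's design.)

```c
#define BUF_DEPTH 4   /* illustrative depth */

/* Model of the poster's datapath: each cycle, shift in a new sample
 * (the oldest drops out) and recompute the window minimum. In
 * hardware the recompute is the comparator tree discussed above. */
typedef struct {
    int buf[BUF_DEPTH];   /* models the shift register */
} window;

int window_push_min(window *w, int sample)
{
    for (int i = BUF_DEPTH - 1; i > 0; i--)   /* shift, oldest out */
        w->buf[i] = w->buf[i - 1];
    w->buf[0] = sample;

    int m = w->buf[0];                        /* recompute the minimum */
    for (int i = 1; i < BUF_DEPTH; i++)
        if (w->buf[i] < m)
            m = w->buf[i];
    return m;
}
```

Note how, once the smallest sample ages out of the window, the reported minimum rises again; that is exactly the behavior a single running-minimum register cannot reproduce.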

So if you're going to spread out the initialization of the
array across multiple clock cycles, you can take advantage
of that and spread out the mechanism to determine the
minimum value over those very same clock cycles.
I actually tried a few approaches similar to this and couldn't come up
with anything I liked.

I look at this as just an interesting exercise.
I do too. I could have just hard-coded the data buffer size, but I
want a solution flexible enough that I can re-use it with different
buffer sizes (and maybe eventually use different buffer sizes on the
fly). I also wanted to explore what I could do with VHDL a little bit
more.

Yep, and the good thing is that sometimes you get new insights when
pondering these exercises. Even if not directly applicable now, you've
tickled a brain cell or two that might remember it when you need it.
Yes - big thanks to everyone for the insightful comments.
 
