Signal in a Case Statement

Shannon · Jul 29, 2007

The newbie is back with another basic question!

I'm using a massive Case statement as a look-up table. All is working
great thanks to the help I've gotten on here. But now the time has
come to replace the constants I've been using to debug with an array
of vectors to be filled at run-time. Yeah, the comparison values must
be locally static. What do I do now? Is there any way in VHDL to do a
look-up table where the values aren't known until run time?

On a side note I have no idea why the compiler wouldn't be able to use
a signal and still create logic.

Shannon

Jonathan Bromley · Jul 29, 2007

On Sun, 29 Jul 2007 13:56:34 -0000,
Shannon <sgomes@sbcglobal.net> wrote:

> The newbie is back with another basic question!

Not so basic, I think.

> I'm using a massive Case statement

How massive? 20 branches? 200? 2000? The scale may be
important, if this will be implemented in hardware.

> as a look-up table. All is working
> great thanks to the help I've gotten on here. But now the time has
> come to replace the constants I've been using to debug with an array
> of vectors to be filled at run-time. Yeah, the comparison values must
> be locally static. What do I do now? Is there any way in VHDL to do a
> look-up table where the values aren't known until run time?

So, if I understand you correctly, you're trying to do something
like

case input_vector is
  when table(0) => ...;
  when table(1) => ...;
  ...
end case;

This is, in effect, a reverse lookup table - a content-addressable
memory. It requires one fully-programmable equality comparator for
each branch of the case. Alternatively, if you are a software bod
and you know your sorting and searching algorithms, you may be
able to think of a few other ways to do it.

The brute-force approach in VHDL is...

variable result: integer;
...
result := <some default value>;
for i in table'range loop
  if table(i) = input_vector then
    result := i;
    exit;
  end if;
end loop;
case result is
  when 0 => <do table(0) code>;
  when 1 => <do table(1) code>;
  when others => <do code for no match>;
end case;

but that will lead to truly horrible logic. If you are
100% sure that the values will never overlap, you can
improve it considerably by ORing together the results
from all the comparisons, and NOT exiting the loop:

variable result: integer;
variable uns_result: unsigned(7 downto 0); -- or other range
...
result := (others => '0');
for i in table'range loop
  if table(i) = input_vector then
    uns_result := uns_result or to_unsigned(i, uns_result'length);
  end if;
end loop;
result := to_integer(uns_result);
case result is
  when 1 => <do table(1) code>;
  when others => <do code for no match>;
end case;

Note that the default (un-matched) value of "result" is
now 0, so you probably don't want an entry 0 in your
matching table.

HOWEVER... this will still build as many equality comparators
as you have entries in the table - there will then be a mess of
OR gates gathering-up the outputs of these comparators.
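
For anyone who wants to try the OR-of-matches idea in a synthesis tool, here is a minimal self-contained sketch. The package `cam_pkg`, the entity `cam_lookup`, the generic `N_ENTRIES` and the port names are all made up for illustration, and it assumes 8-bit table values, a table held in registers somewhere else, and entries that are guaranteed unique. Index 0 is kept free to mean "no match".

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

package cam_pkg is
  -- Hypothetical table type: 8-bit entries, filled at run time elsewhere.
  type byte_table is array (natural range <>) of unsigned(7 downto 0);
end package cam_pkg;

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use work.cam_pkg.all;

entity cam_lookup is
  generic (N_ENTRIES : positive := 40);  -- assumed to fit in 8 bits
  port (
    table        : in  byte_table(1 to N_ENTRIES);  -- index 0 reserved for "no match"
    input_vector : in  unsigned(7 downto 0);
    index        : out unsigned(7 downto 0)         -- 0 when nothing matches
  );
end entity cam_lookup;

architecture rtl of cam_lookup is
begin
  process (table, input_vector)
    variable uns_result : unsigned(7 downto 0);
  begin
    uns_result := (others => '0');
    -- One equality comparator per entry; no "exit", so no priority chain.
    for i in table'range loop
      if table(i) = input_vector then
        uns_result := uns_result or to_unsigned(i, uns_result'length);
      end if;
    end loop;
    index <= uns_result;
  end process;
end architecture rtl;
```

The resulting `index` can then drive an ordinary case statement with locally static choices. If two entries ever held the same value, the ORed index would be meaningless, so the uniqueness assumption matters.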

Do you *really* need this? Couldn't you find a different
approach, in which the matching function is not expressed
as a huge lookup table?

> On a side note I have no idea why the compiler wouldn't be able to use
> a signal and still create logic.

It could - Verilog can do it - but the logic is likely to be MUCH
more complex. VHDL's rules for "case" ensure that you end up with
pretty simple logic in most realistic situations.

Tell us more about the real problem - what you're trying to
achieve - and you may get some helpful alternative ideas that
may be much more efficient than my brute-force answer.
--
Jonathan Bromley, Consultant

DOULOS - Developing Design Know-how
VHDL * Verilog * SystemC * e * Perl * Tcl/Tk * Project Services

Doulos Ltd., 22 Market Place, Ringwood, BH24 1AW, UK
jonathan.bromley@MYCOMPANY.com
http://www.MYCOMPANY.com

The contents of this message may contain personal views which
are not the views of Doulos Ltd., unless specifically stated.

Shannon · Jul 29, 2007

Fantastic answer Jonathan! You saw right through my obtuse question.

Unfortunately the lookup table is not 'functionizable'. The pairings
are truly random. I can guarantee that there are no overlaps however
so all is well with that. I'm afraid I'm going to have to take some
time and stare at your solution to see if it will work. I'm slow to
pick up new techniques.

You asked how big and right now the table is around 40 items. The
table values are fortunately only single bytes so that is good. I
have to cram this all into a Cyclone.

I was going down the path of a huge 'IF...ELSIF' structure to get this
done. I have a feeling your solution might yield more compact logic.

Thanks again. I'll be back when I understand things more fully.

Shannon

Shannon · Jul 29, 2007

Hehehe...that was quick! I just took a moment to read your answer
more fully. What you have done is an "IF..THEN" structure where the
output is a singular index to a "CASE"-style look-up table. I might
be able to compact it a little bit in my specific case. I think I can
<do stuff on match> right inside the "IF..THEN" loop. We'll see.

Thanks so much for the help. It's amazing what a resource this group
has been to me. I hope I can return the favor soon.

Shannon

David M. Palmer · Jul 29, 2007

In article <1185735513.830489.155950@i13g2000prf.googlegroups.com>,
Shannon <sgomes@sbcglobal.net> wrote:

> Unfortunately the lookup table is not 'functionizable'. The pairings
> are truly random. I can guarantee that there are no overlaps however
> so all is well with that. I'm afraid I'm going to have to take some
> time and stare at your solution to see if it will work. I'm slow to
> pick up new techniques.
>
> You asked how big and right now the table is around 40 items. The
> table values are fortunately only single bytes so that is good. I
> have to cram this all into a Cyclone.

If the input vector is 8 bits, and you are populating 40 items, it may
be best to use a 256-entry block ram as a lookup table. It wastes 84%
of the entries, but that might still be a win compared to having 40
comparators.
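
As a rough sketch of what this could look like in VHDL: the incoming byte is used directly as the read address, so a lookup costs one clock no matter how many entries are populated. The entity name `ram_lookup`, the port names, the separate write port for filling the table at run time, and the 8-bit result width are all assumptions made for illustration. Whether a given tool actually maps this onto block RAM depends on the synthesizer and the device.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity ram_lookup is
  port (
    clk     : in  std_logic;
    -- write side: used to fill the table at run time
    we      : in  std_logic;
    wr_addr : in  unsigned(7 downto 0);
    wr_data : in  unsigned(7 downto 0);
    -- read side: the incoming byte is the address
    key     : in  unsigned(7 downto 0);
    result  : out unsigned(7 downto 0)   -- valid one clock after key
  );
end entity ram_lookup;

architecture rtl of ram_lookup is
  type ram_t is array (0 to 255) of unsigned(7 downto 0);
  -- By convention in this sketch, 0 means "no entry for this key".
  signal ram : ram_t := (others => (others => '0'));
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if we = '1' then
        ram(to_integer(wr_addr)) <= wr_data;  -- one write per clock
      end if;
      result <= ram(to_integer(key));         -- registered read, one per clock
    end if;
  end process;
end architecture rtl;
```

Only one element is written and one is read per clock, which is the access pattern a memory can serve. The 216 unused locations simply keep their "no entry" value.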

--
David M. Palmer dmpalmer@email.com (formerly @clark.net, @ematic.com)

Shannon · Jul 30, 2007

Interesting answer. I looked into using a RAM but it looked like you
needed to do all the addressing and clocking that you would have to do
as if it were an external IC. In my case I have only a couple of
clocks to get things done. Can a RAM be made to do this?

Shannon

Philipp Tölke · Jul 30, 2007

Shannon <sgomes@sbcglobal.net> schrieb:

> On Jul 29, 3:32 pm, "David M. Palmer" <dmpal...@email.com> wrote:
>> If the input vector is 8 bits, and you are populating 40 items, it may
>> be best to use a 256-entry block ram as a lookup table. It wastes 84%
>> of the entries, but that might still be a win compared to having 40
>> comparators.
>
> Interesting answer. I looked into using a RAM but it looked like you
> needed to do all the addressing and clocking that you would have to do
> as if it were an external IC. In my case I have only a couple of
> clocks to get things done. Can a RAM be made to do this?

That's not a problem. I think.

I only used Xilinx, but it infers RAM whenever you do something like:

| signal ram: std_logic_vector(18000000 downto 0);

Other synthesizers should do similar tricks.

But I am not sure how to implement the lookup-table David proposed --
David could you please elaborate? I'm a newbie to VHDL myself...

Have a nice day,
--
Philipp Tölke
PGP: 0x96A1FE7A

Jonathan Bromley · Jul 30, 2007

On Sun, 29 Jul 2007 19:04:24 -0000,
Shannon <sgomes@sbcglobal.net> wrote:

> Hehehe...that was quick! I just took a moment to read your answer
> more fully. What you have done is an "IF..THEN" structure where the
> output is a singular index to a "CASE"-style look-up table. I might
> be able to compact it a little bit in my specific case. I think I can
> <do stuff on match> right inside the "IF..THEN" loop. We'll see.

Yes, but if you do that, then the parallelization of the CASE is lost.
The neat thing about the idea of ORing together all the "result"
indices is that it codifies your design intent, that the matches
are unique, and therefore allows the compiler to build a simple
parallel OR structure instead of a cascade of multiplexers
which will be somewhat larger and MUCH sloooooower. If you
put your actions within the IF, I suspect you will go back to
a cascaded-MUX effort. With 40 stages that could be a killer.

> Thanks so much for the help. It's amazing what a resource this group
> has been to me.

You and me both. Note VERY carefully David Palmer's excellent
suggestion of using a RAM as a lookup table (again, to generate
an index into a CASE) - that is pretty sure to be a win.
Of course, it runs out of steam if you have more than about
8 bits in your lookup values - then you might need to consider
some hash function to do the lookup. The usefulness of that
would be dependent on the distribution of your index values.

Also please note a typo in the second solution I suggested (sorry):

variable result: integer;
variable uns_result: unsigned(7 downto 0); -- or other range
...
result := (others => '0'); <<<<<<<<<<<< OOOOOPS
--- should have been...
uns_result := (others => '0');
--- that'll work better.
for i in table'range loop
...

What's the application? Sounds interesting.
--
Jonathan Bromley, Consultant

Andy · Jul 30, 2007

Xilinx has distributed ram that can be read asynchronously
(combinatorial read path from address to data out). I'm not sure about
Altera cyclone.

Just declaring an array does NOT infer ram. The array must also be
accessed carefully to ensure (and so that the synthesizer knows) that
only one (for single port ram) or two (for dual port ram) elements of
the array are being accessed at a time (i.e. in a clock cycle).
Depending on whether the read logic is registered or not, arrays can
be used to infer distributed (async read) or block (registered read)
rams.
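
To make the distinction concrete, here is a small sketch of the asynchronous-read style: the write is clocked, but the read is a plain concurrent assignment from the addressed element. The entity name `dist_ram_lookup`, the port names and the 16 x 8 size are invented for this example. On a device family with asynchronous-read memory a tool may map this to distributed RAM; on one without, it would presumably be built from registers and multiplexers instead.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity dist_ram_lookup is
  port (
    clk     : in  std_logic;
    we      : in  std_logic;
    addr    : in  unsigned(3 downto 0);   -- single port: shared by read and write
    wr_data : in  unsigned(7 downto 0);
    rd_data : out unsigned(7 downto 0)
  );
end entity dist_ram_lookup;

architecture rtl of dist_ram_lookup is
  type ram_t is array (0 to 15) of unsigned(7 downto 0);
  signal ram : ram_t := (others => (others => '0'));
begin
  -- Synchronous write: only the addressed element changes in a clock cycle.
  process (clk)
  begin
    if rising_edge(clk) then
      if we = '1' then
        ram(to_integer(addr)) <= wr_data;
      end if;
    end if;
  end process;

  -- Asynchronous (combinational) read of the single addressed element.
  rd_data <= ram(to_integer(addr));
end architecture rtl;
```

Moving the read assignment inside the clocked process would turn it into a registered read, which is the form block RAM needs.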

Andy

Shannon · Jul 30, 2007

Thanks for the information Jonathan and Andy. I will look into the
RAM idea Andy.

I'm going to try both versions Jonathan and see if I can understand
why one version will work out better than the other. I still don't
have an intuitive feel for VHDL so what I usually do is compile the
code and then look at an RTL viewer to see what is "really"
generated. Then I can more deeply understand the implications of
various techniques.

On first blush to me it seems MORE compact to do away with the case
statement and put everything in the IF..THEN structure. I trust what
you are saying is true however so I will try it both ways and see if I
can make sense of what's going on.

The application isn't all that exciting. I can't say much because
it's a product after all! But I will say that I'm basically reading
the frequency of an incoming signal and looking up a value based on
that frequency. Unfortunately for me the relationship between
frequency and result isn't obvious at this point in our program. It's
possible that it is exponential but we just don't know yet.

Thanks so much for your help,

Shannon

Shannon · Jul 30, 2007

> The array must also be
> accessed carefully to ensure (and so that the synthesizer knows) that
> only one (for single port ram) or two (for dual port ram) elements of
> the array are being accessed at a time (i.e. in a clock cycle).

Andy, I'm a little confused by the above statement. Can you clarify?
It sounds like you are suggesting with the RAM I could only access one
"table value" per clock?

Shannon

Jonathan Bromley · Jul 30, 2007

On Mon, 30 Jul 2007 13:47:08 -0000,
Shannon <sgomes@sbcglobal.net> wrote:

> I'm basically reading
> the frequency of an incoming signal and looking up a value based on
> that frequency. Unfortunately for me the relationship between
> frequency and result isn't obvious at this point in our program. It's
> possible that it is exponential but we just don't know yet.

Aw shucks, at LAST you've told us what you're really trying to do!
So the lookup value doesn't really control what your code should
DO, but only a VALUE... you have some unknown or unpredictable
function that can only be expressed as a lookup table.

Here's an idea that may make it less clumsy. Your incoming
frequency (ordinate) is 8 bits, you say. OK, so split it
into two pieces - top 4 bits and lower 4 bits. Create a
16-entry lookup table, indexed by the top 4 bits, with TWO
columns: the first "column" (group of bits in each element
of the table) is the correct function value for that value
of the top 4 bits, if the lower 4 bits were stuck at zero.
The second "column" is the first derivative of the function
at that point, suitably scaled. Now you multiply that
derivative by the lower 4 bits of the input, and add it
to the first-approximation value. Bingo: cut-price
first-order linear interpolation. Works nicely for
any reasonably smooth function. If the function happens
to be exponential, you're in even better shape - the first
derivative is obtained from the value by multiplying by
a scale factor, no table required!!! - but that's too
lucky to be true.
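
Written out as a formula, the scheme looks roughly like this. The symbols h, l, y_0 and dy are labels chosen for this note only. The worked numbers assume the sine tables in the VHDL example in this post, where the tabulated function is 4000 sin(pi x / 512), and the exact value is rounded.

```latex
\begin{align*}
x &= 16h + l, \qquad 0 \le h \le 15,\quad 0 \le l \le 15 \\
f(x) &\approx y_0[h] + dy[h]\cdot l,
  \qquad y_0[h] = f(16h),\quad dy[h] \approx f'(16h) \\[6pt]
% worked example: x = 53 = 16*3 + 5
f(53) &\approx y_0[3] + dy[3]\cdot 5 = 1161 + 23\cdot 5 = 1276,
  \qquad 4000\sin\!\left(\tfrac{53\pi}{512}\right) \approx 1278 \\[6pt]
% exponential case: the slope is a constant multiple of the value
f(x) &= A e^{kx} \;\Longrightarrow\; f'(x) = k\,f(x)
\end{align*}
```

The last line is why an exponential would need no slope table: dy[h] could be computed from y_0[h] with a single constant multiply.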

This little example uses constants for the LUT - yes,
I know you need it to be variables - but it illustrates
the idea; it generates a pretty good sine-function
approximation from just 32 numbers. Note also that
the first-differences table has pretty small numbers
in it, and doesn't need much space. I've registered
the LUT outputs so that they could go into blockRAM if
need be.

There are tricks for making the approximation better,
at the cost of a little more logic.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity pipelined_lookup is
  port (
    clk   : in  std_logic;
    freq  : in  unsigned(7 downto 0);
    result: out unsigned(11 downto 0) --- as an example
  );
end;

architecture linear_interpolator of pipelined_lookup is
  subtype uint_12bit is integer range 0 to 4095;
  type lookup_table is array(0 to 15) of uint_12bit;
  --- y0 (approximation) lookup table, sine scaled by 4000
  constant y0: lookup_table :=
    ( 0 => 0
    , 1 => 392
    , 2 => 780
    , 3 => 1161
    , 4 => 1531
    , 5 => 1886
    , 6 => 2222
    , 7 => 2538
    , 8 => 2828
    , 9 => 3092
    ,10 => 3326
    ,11 => 3528
    ,12 => 3696
    ,13 => 3828
    ,14 => 3923
    ,15 => 3981
    );
  --- dy (first derivative) lookup table, cosine suitably scaled
  constant dy: lookup_table :=
    ( 0 => 25
    , 1 => 24
    , 2 => 24
    , 3 => 23
    , 4 => 23
    , 5 => 22
    , 6 => 20
    , 7 => 19
    , 8 => 17
    , 9 => 16
    ,10 => 14
    ,11 => 11
    ,12 => 9
    ,13 => 6
    ,14 => 4
    ,15 => 2
    );

  signal reg_y0, reg_dy: uint_12bit;
  signal delta: integer range 0 to 15;

begin
  process (clk)
    variable index: integer range 0 to 15;
  begin
    if rising_edge(clk) then
      -- first pipeline stage: lookup table
      index := to_integer(freq(7 downto 4));
      reg_y0 <= y0(index);
      reg_dy <= dy(index);
      delta <= to_integer(freq(3 downto 0));
      -- second pipeline stage: interpolation
      -- (result is valid two clocks after freq)
      result <= to_unsigned(reg_y0 + reg_dy * delta, result'length);
    end if;
  end process;
end;
--
Jonathan Bromley, Consultant

Andy · Jul 30, 2007

On Jul 30, 9:54 am, Shannon <sgo...@sbcglobal.net> wrote:

>> The array must also be
>> accessed carefully to ensure (and so that the synthesizer knows) that
>> only one (for single port ram) or two (for dual port ram) elements of
>> the array are being accessed at a time (i.e. in a clock cycle).
>
> Andy, I'm a little confused by the above statement. Can you clarify?
> It sounds like you are suggesting with the RAM I could only access one
> "table value" per clock?

Shannon

Memories (single port) have an address port and a single data port;
you only get one piece of data out of it for each address. So you have
to access one address at a time (i.e. once per clock) and grab the
data on the clock edge for that address. Thus it would take N clocks
to read N locations in a single port RAM. Dual port memories have two
address and two data ports, so you could read N locations in N/2
clocks.

That's not to say that you cannot use arrays and access all elements
in that array in one clock. The synthesis tool will just use luts and
flops to implement the array, instead of using ram resources.

Andy

Shannon · Jul 30, 2007

I apologize for not giving out ALL of the information about my
problem. I have to be a little cagey so that I don't give out too
much proprietary information. It's hard for someone who doesn't fully
understand the problem to know what is relevant information and what
is superfluous.

I will look into your newest solution (I'm amazed at how much someone
will do for a complete stranger!). I have to model some sets of data
we have and see just how well behaved things are. I must say this
latest version seems much more elegant! I'm learning so much!

Shannon

Jonathan Bromley · Jul 31, 2007

On Mon, 30 Jul 2007 18:13:17 -0000,
Shannon <sgomes@sbcglobal.net> wrote:

I will look into your newest solution (I'm amazed at how much
someone will do for a complete stranger!).

Read with your eyes open: I post here for many reasons, not all
of them altruistic. Free publicity is always good. Solving
small but interesting problems is a good way of keeping ageing
neurons active. Finding out about the problems that really
hurt people is a good way of honing our training courses.
Discussing solutions with other experienced folk is a great
way to calibrate my own understanding.

Gratitude is nice, and good for my ego, but hardly justified
when all I did was to write ten lines of VHDL and spend five
minutes with a spreadsheet to get the table values. If you
feel the need to repay your (very small) debt, then keep up
your valuable habit of telling us whether the responses are
useful (or not), and why (or why not).

Thanks for the interesting questions!

Torsten Landschoff · Aug 1, 2007

Hi Jonathan,

On 30 Jul., 18:37, Jonathan Bromley <jonathan.brom...@MYCOMPANY.com>
wrote:

The second "column" is the first derivative of the function
at that point, suitably scaled. Now you multiply that
derivative by the lower 4 bits of the input, and add it
to the first-approximation value. Bingo: cut-price
first-order linear interpolation. Works nicely for

Now that's a nice trick. I'll try to remember it for when I need
something like this. I only wonder if the price for the multiplier
would outweigh the price for the full 256-entry lookup table...

Greetings

Torsten

Jonathan Bromley · Aug 1, 2007

On Wed, 01 Aug 2007 02:07:44 -0700,
Torsten Landschoff <t.landschoff@gmx.de> wrote:

I only wonder if the price for the multiplier
would outweigh the price for the full
256-entry lookup table...

In DSP-intensive designs that might well be an issue.
Personally I've never yet done a design that used all
the available multipliers, but I *have* run out of
memory... And of course the interpolator scales better
to higher precisions, although you might then need
second-order interpolation to get an accurate enough
result.

I guess the real lesson is: the more tricks you have
up your sleeve, the better chance you have of coming
up with a solution to any specific problem. No single
solution is the right answer for every problem.
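
To put rough numbers on the table side of that trade-off, here is a small simulation-only sketch. It assumes 12-bit results as in the posted example and that the dy entries (largest posted value 25) could be stored in 5 bits; the entity and constant names are made up for illustration. The multiplier cost is not modelled, since it depends on the target device.

```vhdl
entity lut_cost_check is
end entity;

architecture sim of lut_cost_check is
  constant RESULT_BITS : natural := 12;  -- width of result in the posted example
  constant DY_BITS     : natural := 5;   -- assumption: dy <= 25 fits in 5 bits

  -- One entry per 8-bit input value.
  constant FULL_TABLE_BITS   : natural := 256 * RESULT_BITS;
  -- Two 16-entry tables: y0 at full width, dy at reduced width.
  constant INTERP_TABLE_BITS : natural := 16 * RESULT_BITS + 16 * DY_BITS;
begin
  process
  begin
    report "full table: " & integer'image(FULL_TABLE_BITS) & " bits";
    report "interpolator tables: " & integer'image(INTERP_TABLE_BITS) & " bits";
    assert FULL_TABLE_BITS = 3072 severity failure;
    assert INTERP_TABLE_BITS = 272 severity failure;
    wait;
  end process;
end architecture;
```

Each extra input bit doubles the full table, while the interpolator's tables stay the same size if the extra bit goes into the interpolated part, at the price of a wider multiplier.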

Andy · Aug 1, 2007

On Aug 1, 4:43 am, Jonathan Bromley <jonathan.brom...@MYCOMPANY.com>
wrote:

On Wed, 01 Aug 2007 02:07:44 -0700,
Torsten Landschoff <t.landsch...@gmx.de> wrote:

I only wonder if the price for the multiplier
would outweigh the price for the full
256-entry lookup table...

In DSP-intensive designs that might well be an issue.
Personally I've never yet done a design that used all
the available multipliers, but I *have* run out of
memory... And of course the interpolator scales better
to higher precisions, although you might then need
second-order interpolation to get an accurate enough
result.

I guess the real lesson is: the more tricks you have
up your sleeve, the better chance you have of coming
up with a solution to any specific problem. No single
solution is the right answer for every problem.

And even if you do run out of multipliers, the small size of the
derivative may make shift-and-add feasible.

Nice trick, John!

Andy
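
The shift-and-add idea can be sketched as below, assuming dy fits in 5 bits (the largest posted value is 25) and delta is the 4-bit low nibble of the input. shift_add_mul is a made-up helper name, and the surrounding testbench only checks it against ordinary multiplication in simulation; it is not taken from any post in this thread.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity shift_add_check is
end entity;

architecture sim of shift_add_check is
  -- dy * delta without a multiplier: one conditional add per bit of delta.
  function shift_add_mul(dy: unsigned(4 downto 0);
                         delta: unsigned(3 downto 0)) return unsigned is
    variable acc: unsigned(8 downto 0) := (others => '0');  -- 31 * 15 < 512
  begin
    for b in delta'range loop
      if delta(b) = '1' then
        acc := acc + shift_left(resize(dy, acc'length), b);
      end if;
    end loop;
    return acc;
  end function;
begin
  process
  begin
    -- Exhaustive comparison against the built-in integer multiply.
    for d in 0 to 31 loop
      for k in 0 to 15 loop
        assert to_integer(shift_add_mul(to_unsigned(d, 5), to_unsigned(k, 4))) = d * k
          report "mismatch for " & integer'image(d) & " * " & integer'image(k)
          severity failure;
      end loop;
    end loop;
    wait;
  end process;
end architecture;
```

With a 4-bit delta this needs at most four additions, which a synthesis tool may well produce on its own from a plain multiply when no hard multiplier is used.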

Torsten Landschoff · Aug 2, 2007

On 1 Aug., 11:43, Jonathan Bromley <jonathan.brom...@MYCOMPANY.com>
wrote:

In DSP-intensive designs that might well be an issue.
Personally I've never yet done a design that used all
the available multipliers, but I *have* run out of
memory... And of course the interpolator scales better
to higher precisions, although you might then need
second-order interpolation to get an accurate enough
result.

At some point I'll try to apply this to my DDS sine/cosine generator.
Currently, it uses a single lookup table with 2^11 values.

I guess the real lesson is: the more tricks you have
up your sleeve, the better chance you have of coming
up with a solution to any specific problem. No single
solution is the right answer for every problem.

I definitely agree on that statement.

Greetings, Torsten

Shannon · Aug 2, 2007

Torsten Landschoff wrote:

At some point I'll try to apply this to my DDS sine/cosine generator.
Currently, it uses a single lookup table with 2^11 values.

2^11 values???? Isn't that at least 200Gigabytes????

Shannon

P.S. I'm still working on the data, so I don't yet know which of the
approaches you guys have given me is going to work best.
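
For scale, 2^11 is 2048 entries, so a table of that depth occupies a few kilobytes rather than gigabytes. The sketch below works that out in simulation; the 16-bit word width is purely an assumption, since the post does not give one, and the names are invented for illustration.

```vhdl
entity dds_table_size_check is
end entity;

architecture sim of dds_table_size_check is
  constant ENTRIES     : natural := 2 ** 11;
  constant WORD_BITS   : natural := 16;  -- assumption; the post gives no width
  constant TOTAL_BYTES : natural := ENTRIES * WORD_BITS / 8;
begin
  process
  begin
    assert ENTRIES = 2048 severity failure;
    assert TOTAL_BYTES = 4096 severity failure;
    report integer'image(ENTRIES) & " entries, "
         & integer'image(TOTAL_BYTES) & " bytes";
    wait;
  end process;
end architecture;
```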
