Quartus II compilation too slow for RAM design

sora

Guest
When I use the Quartus tools to compile my dual-port RAM design and check
that it is correct,
it takes hours to compile and synthesize the source file.

A RAM of only 64 bytes compiles successfully after about 5
minutes.
The time increases roughly linearly each time the RAM size doubles:
128 bytes takes about 10 minutes or more, and so on.

I'm going to design a 4 KB RAM myself for my small project.
Everything else is
fine, except that compiling the RAM consumes hours of
time.

That's pretty frustrating.

************************************************
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;

entity RAM128 is
  generic (
    A     : integer := 7;
    WORDS : integer := 128;
    M     : integer := 8
  );
  port (
    clk       : in  STD_LOGIC;
    TxR       : in  STD_LOGIC;
    TxW       : in  STD_LOGIC;
    AddrTx1   : in  STD_LOGIC_VECTOR(A-1 downto 0);
    AddrTx2   : in  STD_LOGIC_VECTOR(A-1 downto 0);
    DataTxIn  : in  STD_LOGIC_VECTOR(M-1 downto 0);
    DataTxOut : out STD_LOGIC_VECTOR(M-1 downto 0);
    AddrRx1   : in  STD_LOGIC_VECTOR(A-1 downto 0);
    AddrRx2   : in  STD_LOGIC_VECTOR(A-1 downto 0);
    RxW       : in  STD_LOGIC;
    RxR       : in  STD_LOGIC;
    DataRxIn  : in  STD_LOGIC_VECTOR(M-1 downto 0);
    DataRxOut : out STD_LOGIC_VECTOR(M-1 downto 0)
  );
end RAM128;

architecture RAM128_arch of RAM128 is

  subtype cell is std_logic_vector(M-1 downto 0);
  type ramArray is array (0 to WORDS-1) of cell;
  signal ram       : ramArray;
  signal AddrMatch : std_logic;

begin
  AddrMatch <= '1' when (AddrTx1 = AddrRx2) else '0';

  process(clk, AddrTx1, AddrRx2, TxW, RxW, AddrMatch)
  begin
    if (clk'event and clk = '1') then
      if (TxW = '1') and (AddrMatch = '0') then
        ram(CONV_INTEGER(unsigned(AddrTx1))) <= DataTxIn;
      else
        if (TxW = '1') and (AddrMatch = '1') and (RxW = '1') then
          ram(CONV_INTEGER(unsigned(AddrTx1))) <= DataTxIn;
        end if;
      end if;

      if (RxW = '1') and (AddrMatch = '0') then
        ram(CONV_INTEGER(unsigned(AddrRx2))) <= DataRxIn;
      else
        if (TxW = '1') and (AddrMatch = '1') and (RxW = '1') then
          ram(CONV_INTEGER(unsigned(AddrTx1))) <= DataTxIn;
        end if;
      end if;
    end if;
  end process;

  process(RxR, AddrRx1, ram)
  begin
    if (RxR = '1') then
      DataRxOut <= ram(CONV_INTEGER(unsigned(AddrRx1)));
    else
      DataRxOut <= (others => '0');
    end if;
  end process;

  process(TxR, AddrTx2, ram)
  begin
    if (TxR = '1') then
      DataTxOut <= ram(CONV_INTEGER(unsigned(AddrTx2)));
    else
      DataTxOut <= (others => '0');
    end if;
  end process;

end RAM128_arch;
******************************************************************

Can any experts here help me tackle this problem?
Are there any good ideas for reducing the compilation time?
This is driving me crazy!
Please post any ideas you have!
 
I'm just curious: since most modern FPGAs include some kind of RAM, why are
you implementing your own?

In Quartus II, under 'Assignments' -> 'Settings' -> 'Analysis
& Synthesis' (left-hand panel) -> 'More Settings...' (button), the
dialog gives you the option to select how it should generate RAM.
Have you tried playing with these settings?

Derek
 
Yes, I know Quartus provides its own RAM, but it is not flexible enough for my project.
I can't find any dual-port RAM in Quartus II with two write ports,
and my project needs one to transmit and receive data like a bidirectional FIFO.

I haven't tried playing with those settings yet.
I'll try them and let you all know how it goes.
 
Can you double the clock rate and write from each source on alternate
cycles through the RAM's single write port? Implementing your own RAM
will be an enormous waste of resources, especially for 4 KB.

Thomas

www.entner-electronics.com
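
Thomas's suggestion could look roughly like the sketch below. It is only an illustration under stated assumptions: `clk2x` is assumed to run at twice the rate at which the Tx and Rx sides present their writes, both sides are assumed to hold their write signals stable for two fast cycles, and the entity name, port names and the `phase` toggle are all made up for this example. Only one read port is shown to keep it short.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Sketch only: two write sources take turns on the single write port
-- of an inferred RAM, using a clock that runs at twice the data rate.
-- clk2x, phase and all port names are illustrative assumptions.
entity ram_tdm_write is
  generic (
    A : integer := 12;  -- address bits (4096 words)
    M : integer := 8    -- data bits
  );
  port (
    clk2x    : in  std_logic;
    TxW      : in  std_logic;
    AddrTx   : in  std_logic_vector(A-1 downto 0);
    DataTxIn : in  std_logic_vector(M-1 downto 0);
    RxW      : in  std_logic;
    AddrRx   : in  std_logic_vector(A-1 downto 0);
    DataRxIn : in  std_logic_vector(M-1 downto 0);
    RdAddr   : in  std_logic_vector(A-1 downto 0);
    RdData   : out std_logic_vector(M-1 downto 0)
  );
end entity ram_tdm_write;

architecture rtl of ram_tdm_write is
  type ram_t is array (0 to 2**A - 1) of std_logic_vector(M-1 downto 0);
  signal ram   : ram_t;
  signal phase : std_logic := '0';  -- '0': Tx slot, '1': Rx slot
  signal we    : std_logic;
  signal waddr : std_logic_vector(A-1 downto 0);
  signal wdata : std_logic_vector(M-1 downto 0);
begin
  -- Select which source owns the write port in this fast-clock cycle.
  we    <= TxW      when phase = '0' else RxW;
  waddr <= AddrTx   when phase = '0' else AddrRx;
  wdata <= DataTxIn when phase = '0' else DataRxIn;

  process (clk2x)
  begin
    if rising_edge(clk2x) then
      phase <= not phase;
      if we = '1' then
        ram(to_integer(unsigned(waddr))) <= wdata;
      end if;
      -- Registered read, so the array has a chance of mapping to block RAM.
      RdData <= ram(to_integer(unsigned(RdAddr)));
    end if;
  end process;
end architecture rtl;
```

With this arrangement, if both sides write to the same address in the same slow cycle, whichever slot comes second simply overwrites the first, so no address-match logic is needed. How `phase` is aligned to the slow clock is left open here and would have to be settled in a real design.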
 
The Quartus manual gives examples of how to code properly so that memory is
inferred. Perhaps peruse that a bit.

KJ
 
I don't know about Altera FPGAs, but Xilinx (and maybe others) have
distributed (LUT) 16x1 dual-port RAMs with asynchronous (combinatorial) reads.
Your code would probably infer those resources fairly quickly, but 4K
would still be a huge number of them to stitch together. I know I've
goofed up a RAM inference if it takes a while to run in Synplify
(because it cannot use the RAMs for some reason and has to build the memory
out of registers).

Can you redesign to allow a one-clock-cycle delay on your reads, and thus
infer synchronous-read block RAMs, which both Altera and Xilinx have?

Andy
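
For illustration, a synchronous-read memory of the kind Andy describes is often written along the lines of the sketch below: one write port, one read port, and read data that appears one clock after the address. The entity and port names are invented for this example, and whether a given tool maps it to block RAM depends on the tool and the device.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Sketch only: single-clock RAM with one write port and one
-- registered (synchronous) read port. Names are illustrative.
entity ram_sdp is
  generic (
    A : integer := 12;  -- address bits (4096 words)
    M : integer := 8    -- data bits
  );
  port (
    clk   : in  std_logic;
    we    : in  std_logic;
    waddr : in  std_logic_vector(A-1 downto 0);
    wdata : in  std_logic_vector(M-1 downto 0);
    raddr : in  std_logic_vector(A-1 downto 0);
    rdata : out std_logic_vector(M-1 downto 0)
  );
end entity ram_sdp;

architecture rtl of ram_sdp is
  type ram_t is array (0 to 2**A - 1) of std_logic_vector(M-1 downto 0);
  signal ram : ram_t;
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if we = '1' then
        ram(to_integer(unsigned(waddr))) <= wdata;
      end if;
      -- The read is inside the clocked process, so rdata lags raddr
      -- by one cycle. That delay is what block RAMs require.
      rdata <= ram(to_integer(unsigned(raddr)));
    end if;
  end process;
end architecture rtl;
```

The zero-when-not-reading behaviour of the original `RxR`/`TxR` enables could be added as a gate on `rdata` outside this entity, which keeps the memory description itself in a simple form.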


 
Hi Sora,

Your design models 2 simultaneous writes and 2 simultaneous reads.
Under some settings of the control signals, you need simultaneous access to RAM addresses AddrTx1, AddrRx2, AddrRx1 and AddrTx2.
That's a quad-port RAM.
Most FPGA RAMs have only 2 ports (3 at most) that can be accessed simultaneously.

Both the Altera and Xilinx tools complain about this (they cannot extract a RAM for this behavior)
and implement the model with flip-flops, decoders and so on.

So if you want a RAM for this, recode it so that no more than two ports are active.
You can resolve the multiple simultaneous reads by replicating the RAM, but do you really need the
simultaneous writes to AddrTx1 and AddrRx2?

Rob
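
As a rough sketch of the replication idea, assuming the design can be reduced to a single write port: two copies of the memory are written with the same data, and each copy serves one read port. The names below are made up for this example, and the reads are registered on the assumption that a one-cycle read delay is acceptable.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Sketch only: one write port, two independent read ports, built by
-- keeping two identical copies of the memory. Names are illustrative.
entity ram_1w2r is
  generic (
    A : integer := 12;  -- address bits (4096 words)
    M : integer := 8    -- data bits
  );
  port (
    clk     : in  std_logic;
    we      : in  std_logic;
    waddr   : in  std_logic_vector(A-1 downto 0);
    wdata   : in  std_logic_vector(M-1 downto 0);
    raddr_a : in  std_logic_vector(A-1 downto 0);
    rdata_a : out std_logic_vector(M-1 downto 0);
    raddr_b : in  std_logic_vector(A-1 downto 0);
    rdata_b : out std_logic_vector(M-1 downto 0)
  );
end entity ram_1w2r;

architecture rtl of ram_1w2r is
  type ram_t is array (0 to 2**A - 1) of std_logic_vector(M-1 downto 0);
  signal ram_a, ram_b : ram_t;  -- two copies holding the same contents
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- Every write goes to both copies, so they never diverge.
      if we = '1' then
        ram_a(to_integer(unsigned(waddr))) <= wdata;
        ram_b(to_integer(unsigned(waddr))) <= wdata;
      end if;
      -- Each copy then needs only one write port and one read port.
      rdata_a <= ram_a(to_integer(unsigned(raddr_a)));
      rdata_b <= ram_b(to_integer(unsigned(raddr_b)));
    end if;
  end process;
end architecture rtl;
```

The cost is twice the memory bits, which is usually far cheaper than building the array out of flip-flops.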
 
Rob is correct. You cannot have LUT-based RAMs with two write ports (at
least in Xilinx). You can have a write port, a read port, and an
optional extra read port that uses the write port's address.

If Altera has the same restrictions, that is probably why it is
creating registers.

There is a way to implement a memory with arbitrary numbers of read and
write ports, using multiple RAMs and XOR encoding/decoding of the data
between them. However, it does not support simultaneous writes to the
same address from different ports. The number of RAMs required adds up
fast, though. Also, it only works with combinatorial-read RAMs, but the
combinatorial read capability is maintained, so long as you have time
for the XOR decode on the data.

Andy
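
One way to read the XOR scheme is sketched below for two write ports and one read port. This is an interpretation, not necessarily the exact arrangement Andy has in mind: each write port owns a bank, stores its data XORed with the other bank's current word at that address, and a read XORs the two banks together to recover the most recently written value. Each bank is duplicated so that every array has just one write port and one combinatorial read port. All names are invented, and the arrays are initialised to zero on the assumption that the target supports RAM initial values.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Sketch only: 2-write / 1-read memory built from four arrays that
-- each have one write port and one asynchronous read port.
-- Simultaneous writes to the SAME address are not supported.
entity ram_xor_2w1r is
  generic (
    A : integer := 4;  -- address bits
    M : integer := 8   -- data bits
  );
  port (
    clk    : in  std_logic;
    we0    : in  std_logic;
    waddr0 : in  std_logic_vector(A-1 downto 0);
    wdata0 : in  std_logic_vector(M-1 downto 0);
    we1    : in  std_logic;
    waddr1 : in  std_logic_vector(A-1 downto 0);
    wdata1 : in  std_logic_vector(M-1 downto 0);
    raddr  : in  std_logic_vector(A-1 downto 0);
    rdata  : out std_logic_vector(M-1 downto 0)
  );
end entity ram_xor_2w1r;

architecture rtl of ram_xor_2w1r is
  type ram_t is array (0 to 2**A - 1) of std_logic_vector(M-1 downto 0);
  -- Bank 0 is written only by port 0, bank 1 only by port 1.
  -- Each bank has one copy per reader: the other writer and the read port.
  signal b0_for_w1, b0_for_rd : ram_t := (others => (others => '0'));
  signal b1_for_w0, b1_for_rd : ram_t := (others => (others => '0'));
  signal enc0, enc1 : std_logic_vector(M-1 downto 0);
begin
  -- Encode: new data XOR the other bank's current word at this address.
  enc0 <= wdata0 xor b1_for_w0(to_integer(unsigned(waddr0)));
  enc1 <= wdata1 xor b0_for_w1(to_integer(unsigned(waddr1)));

  process (clk)
  begin
    if rising_edge(clk) then
      if we0 = '1' then
        b0_for_w1(to_integer(unsigned(waddr0))) <= enc0;
        b0_for_rd(to_integer(unsigned(waddr0))) <= enc0;
      end if;
      if we1 = '1' then
        b1_for_w0(to_integer(unsigned(waddr1))) <= enc1;
        b1_for_rd(to_integer(unsigned(waddr1))) <= enc1;
      end if;
    end if;
  end process;

  -- Decode: XOR of both banks yields the last value written by either port.
  rdata <= b0_for_rd(to_integer(unsigned(raddr)))
           xor b1_for_rd(to_integer(unsigned(raddr)));
end architecture rtl;
```

To see why it works: after port 0 writes D at an address, bank 0 holds D xor bank1, so bank0 xor bank1 gives D; a later write from port 1 re-encodes against bank 0 in the same way. Adding a second read port would take one more copy of each bank, which shows how quickly the RAM count grows.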


 
