Question about enable on DRAM module.

pallav

Hi all,

I'm playing around with a very simple DRAM module in Verilog.
According to my fictitious specification, I have the following:

"Any type of memory can be added. The memory should expect an enable
input when the cache system requests a read/write. Once the memory has
performed the operation, and driven the necessary output ports, it
asserts done."

Based on that, my code for the memory module is below.

The question I have is regarding data. Currently it's:

assign data = (rwb) ? DRAM[addr] : 32'bz;

Should this be changed to:

assign data = (en & rwb) ? DRAM[addr] : 32'bz;

Similarly, should the writing to DRAM be made to say

always @(posedge ph1)
if (en & ~rwb)
DRAM[addr] <= {byte3, byte2, byte1, byte0};

I'm wondering about a scenario where en is low and rwb/addr change. Then
I'd be incorrectly writing to memory or changing the data port. My
question is: is it the responsibility of the DRAM module to ensure this
doesn't happen, or is it the responsibility of the block sending the
signals to ensure rwb/addr don't change while en is deasserted?

Which would be the preferred way? Thanks for any help.

Kind regards,
--------------------------------------------------------------------------------

`timescale 1ns / 1ps

`define DRAM_TEST_PROGRAM "../tests/mult.bin"

`define MEMSYS_INSTR_PROGRAM "../tests/mult.bin"
`define MEMSYS_DATA_PROGRAM "../tests/fib.bin"

/*
 * Generic DRAM module. Data width is 32 bits and big-endian (address is
 * MSB byte) is assumed.
 *
 * SIZE = 2^{ADDRIDX}.
 */
module dram #(parameter SIZE = 8192, parameter ADDRIDX = 13)
(input ph1, ph2, reset, // non-overlapping clocks/reset
input [ADDRIDX-1:0] addr, // address
inout [31:0] data, // bi-directional data
input [3:0] bytemask, // byte enable mask
input rwb, en, // read/write bar, memory enable
output done); // memory operation complete

reg [31:0] DRAM[SIZE-1:0];

// assign bytes of data to write depending upon bytemask.
wire [7:0] byte3, byte2, byte1, byte0;
assign byte0 = bytemask[0] ? data[7:0] : DRAM[addr][7:0];
assign byte1 = bytemask[1] ? data[15:8] : DRAM[addr][15:8];
assign byte2 = bytemask[2] ? data[23:16] : DRAM[addr][23:16];
assign byte3 = bytemask[3] ? data[31:24] : DRAM[addr][31:24];

// if reading drive data on bus; otherwise bus is high impedance.
assign data = (rwb) ? DRAM[addr] : 32'bz;

// for memory op, simulate 2 cycle delay by creating a 3 state FSM,
// enable done when we reach state 3.
wire [1:0] state;
reg [1:0] nextstate;

dflopr #(2) dramstate(.ph1(ph1), .ph2(ph2), .reset(reset),
                      .d(nextstate), .q(state));

assign done = state[1];
always @(*)
case (state)
2'b00:
if (en)
nextstate <= 2'b01;
else
nextstate <= 2'b00;
2'b01:
if (en)
nextstate <= 2'b10;
else
nextstate <= 2'b00;
default:
nextstate <= 2'b00;
endcase

// write to memory (rwb = 0)
always @(posedge ph1)
if (~rwb)
DRAM[addr] <= {byte3, byte2, byte1, byte0};

// for DRAM testing only
`ifdef DRAM_TEST
initial
begin
$readmemh(`DRAM_TEST_PROGRAM, DRAM);
$display($time, ": Memory initialized with ",
`DRAM_TEST_PROGRAM, ".");
end
`endif

// for MEMSYS testing only
`ifdef MEMSYS_TEST
initial
begin
$readmemh(`MEMSYS_INSTR_PROGRAM, DRAM, 'h100);
$display($time, ": Memory initialized at 0x%h with ", 13'h100,
`MEMSYS_INSTR_PROGRAM, ".");
$readmemh(`MEMSYS_DATA_PROGRAM, DRAM, 'h200);
$display($time, ": Memory initialized at 0x%h with ", 13'h200,
`MEMSYS_DATA_PROGRAM, ".");
end
`endif
endmodule // dram
 
Hi Pallav,

A couple of questions:

Why did you define two clocks? What are they used for?
(btw: most companies enforce that clocks are called ck_??? or clk_???)
I once read some code where the clocks were H1, H2, from the French
"horloge", and at times it was a little bit confusing.

Back to your questions:
1) If you want the 3-state function to work properly then you should
definitely do

assign data = (en & rwb) ? DRAM[addr] : 32'bz;

This is because otherwise the only way to stop driving the output pins
is to put the memory in write mode, and that is unsafe (even if the
write is protected by en). Very often in memory subsystems the same
rd/wr signal is sent to all memory blocks, and only the enable (or chip
select) is used to qualify which memory is active.
On top of that, in your code the write operation is not protected by
en:

always @(posedge ph1)
if (~rwb)
DRAM[addr] <= {byte3, byte2, byte1, byte0};

So the only way of putting the memory in 3state is to corrupt some
location.
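
The point above can be sketched as follows. This is only an illustration under assumptions: the module name dram_gated and the single clock clk are hypothetical (the original uses ph1/ph2), and the byte mask is left out for brevity. Both the bus driver and the write are qualified by en, so a deselected memory neither drives data nor changes its contents.

```verilog
// Hypothetical illustration: module name, clk, and the missing byte mask
// are assumptions, not part of the original post.
module dram_gated #(parameter SIZE = 8192, parameter ADDRIDX = 13)
   (input                clk,   // single clock (assumption)
    input  [ADDRIDX-1:0] addr,  // address
    inout  [31:0]        data,  // bi-directional data
    input                rwb,   // 1 = read, 0 = write
    input                en);   // memory enable / chip select

   reg [31:0] mem [SIZE-1:0];

   // Drive the bus only when selected AND reading; otherwise release it.
   assign data = (en & rwb) ? mem[addr] : 32'bz;

   // Write only when selected AND writing.
   always @(posedge clk)
     if (en & ~rwb)
       mem[addr] <= data;
endmodule
```

With this gating, rwb and addr can toggle freely while en is low without any effect on the bus or on the stored data.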

And as per the last question yes, it is up to the user to make sure
the address does not change until the data has been consumed.

Finally, in this configuration the state machine is totally useless,
because your memory is completely asynchronous on reads and only takes
one clock for a write.
Your done could be just a one-clock delay of (en & ~rwb).
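
A minimal sketch of that last suggestion, assuming a single clock clk and a synchronous reset (both are assumptions, as is the module name done_delay): done is simply a registered copy of (en & ~rwb).

```verilog
// Hypothetical illustration: done as a one-clock delay of (en & ~rwb).
module done_delay
   (input      clk,    // single clock (assumption)
    input      reset,  // synchronous reset (assumption)
    input      en,     // memory enable
    input      rwb,    // 1 = read, 0 = write
    output reg done);  // high one clock after an enabled write

   always @(posedge clk)
     if (reset)
       done <= 1'b0;
     else
       done <= en & ~rwb;
endmodule
```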


If you don't mind, some additional considerations:

1) Your memory read is asynchronous, which means that as soon as your
address/en/rw change then your output changes.
This completely asynchronous behaviour is very unusual for real
memories, for a number of reasons. Some of which are:

- completely asynch read makes it difficult to prepare for the next
data. Everything must remain stable until the data is consumed.
Most asynch memories use the edge of en to latch the address and rwb,
so that they'll be stable internally for the duration of the read.
Memory experts could add other explanations.

- most embedded memories (and even component ones) are fully
synchronous. The output data is sampled by a flop before being sent
out.
This is because at the frequencies needed, it is not possible to do
everything in one clock cycle. You need to sample the outputs to allow
enough time for the data to travel back to the module/asic needing it.

2) It is really unclear to me what the state machine is meant to do.
Why does state "01" go back to "00" (and so never send a done) if
en == 0?
Normally, enable stays up for multiple clock cycles to signify that
multiple data words are being transferred. In your case it seems it
needs to stay up until one clock before done is asserted, but then your
memory really executes two writes.

Feel free to ask for clarification if something I wrote is unclear

Marco
 
Hi Marco,

Thanks a lot for your detailed response. I'll answer many of your
questions inline.

> Why did you define two clocks? What are they used for?
> (btw: most companies enforce that clocks are called ck_??? or clk_???)
> I once read some code where the clocks were H1, H2, from the French
> "horloge", and at times it was a little bit confusing.

I agree ph1/ph2 is a poor naming convention. I have changed them to
clk1/clk2 in all my files.
They are two non-overlapping clocks. Although I mainly use clk1 in
most of the logic blocks, I'm using a master-slave latch configuration
for a D flip-flop. This is because in my layouts I interconnect two
6-transistor CMOS latches to form the D flip-flop. To prevent any race
conditions, the easiest solution is to use non-overlapping clocks, so
anywhere I need a D-flop I need clk1/clk2.
In Verilog, I've defined my d-flop (w/reset) as follows:

module dflopr #(parameter WIDTH = 32)
   (input                  clk1, clk2, reset, // non-overlapping clocks, reset
    input      [WIDTH-1:0] d,                 // in data
    output reg [WIDTH-1:0] q);                // out data

   reg [WIDTH-1:0] master;                    // master storage

   // set 0 on reset. otherwise latch data onto master
   always @(clk2, d, reset)
     if (clk2)
       #1 master <= reset ? 0 : d;

   // transfer data from master to slave
   always @(clk1, master)
     if (clk1)
       #1 q <= master;
endmodule // dflopr

> Back to your questions:
> 1) If you want the 3-state function to work properly then you should
> definitely do
>
> assign data = (en & rwb) ? DRAM[addr] : 32'bz;
>
> This is because otherwise the only way to stop driving the output pins
> is to put the memory in write mode, and that is unsafe (even if the
> write is protected by en). Very often in memory subsystems the same
> rd/wr signal is sent to all memory blocks, and only the enable (or chip
> select) is used to qualify which memory is active.
> On top of that, in your code the write operation is not protected by
> en:
>
> always @(posedge ph1)
>   if (~rwb)
>     DRAM[addr] <= {byte3, byte2, byte1, byte0};
>
> So the only way of putting the memory in 3state is to corrupt some
> location.
>
> And as per the last question yes, it is up to the user to make sure
> the address does not change until the data has been consumed.
>
> Finally, in this configuration the state machine is totally useless,
> because your memory is completely asynchronous on reads and only takes
> one clock for a write.
> Your done could be just a one-clock delay of (en & ~rwb).
>
> If you don't mind, some additional considerations:
>
> 1) Your memory read is asynchronous, which means that as soon as your
> address/en/rw change then your output changes.
> This completely asynchronous behaviour is very unusual for real
> memories, for a number of reasons. Some of which are:
>
> - completely asynch read makes it difficult to prepare for the next
> data. Everything must remain stable until the data is consumed.
> Most asynch memories use the edge of en to latch the address and rwb,
> so that they'll be stable internally for the duration of the read.
> Memory experts could add other explanations.
>
> - most embedded memories (and even component ones) are fully
> synchronous. The output data is sampled by a flop before being sent
> out.
> This is because at the frequencies needed, it is not possible to do
> everything in one clock cycle. You need to sample the outputs to allow
> enough time for the data to travel back to the module/asic needing it.

I see. I think then it is better to use fully synchronous behavior on
both read/write and I should
change my code as follows:

reg [31:0] data_out;

assign data = (rwb) ? data_out : 32'bz;
always @(posedge clk1)
begin
  if (en & rwb)
    data_out <= DRAM[addr];

  if (en & ~rwb)
    DRAM[addr] <= {byte3, byte2, byte1, byte0};
end

So now data_out will only change synchronously if we are reading and
memory is enabled. Even though
output data is asynchronously updated it is being sampled from the
data_out flop. Is this the correct way to
define the behavior?


> 2) It is really unclear to me what the state machine is meant to do.
> Why does state "01" go back to "00" (and so never send a done) if
> en == 0?
> Normally, enable stays up for multiple clock cycles to signify that
> multiple data words are being transferred. In your case it seems it
> needs to stay up until one clock before done is asserted, but then your
> memory really executes two writes.

The state machine doesn't do anything worthwhile. The point of keeping
it there
was to "simulate" a 2 cycle delay before done is asserted to indicate
memory operation complete.
That is, suppose at time 0, a request comes in. The user sets up
address, rwb, and en. All these
signals will not change until the user receives a high done from the
memory module 2 clock cycles later.

So I just have a counter that is going from state[1:0] = 00 -> 01 ->
10 (and done = state[1]). So when it reaches
state 10 (i.e., 2 clock cycles have elapsed), done is asserted. In
light of what you have said, I think a better approach might be like
this:

reg [31:0] data_out;
reg [1:0] counter;

assign data = (rwb) ? data_out : 32'bz;

initial
  counter <= 2'b00;

// negative edge of clk1 increment counter if enable
always @(negedge clk1)
  if (en) counter <= counter + 1;

// positive edge of clk1 do write/read if counter = 10 and mem enabled
always @(posedge clk1)
begin
  if (en & rwb & counter == 2'b10)
  begin
    data_out <= DRAM[addr];
    done <= 1'b1;
    counter <= 2'b00;
  end
  if (en & ~rwb & counter == 2'b10)
  begin
    DRAM[addr] <= {byte3, byte2, byte1, byte0};
    done <= 1'b1;
    counter <= 2'b00;
  end
end

Is this better?

Thanks for all the information you have shared. I'll also browse
through some of the available memory
cores on opencores.org to see how things are implemented for "real"
memories.

Kind regards,
pallav
 
Pallav,

sorry for the delay.


About phi1, phi2 (clk1/clk2): out of curiosity, what is the reason
that makes you use two latches instead of a flop cell?
Is it a full custom layout?
I am asking because all the stdcell libraries only require clk; clk_l
is generated internally in the cell. This has the disadvantage of
making the flop slightly bigger, but it hugely simplifies clock
distribution: there is only one clock signal, with no need to keep it
aligned to a second one (I don't want to think about doing that when
considering OCV and the rest).

Some other answers inline:

> assign data = (rwb) ? data_out : 32'bz;
> always @(posedge clk1)
> begin
>   if (en & rwb)
>     data_out <= DRAM[addr];
>
>   if (en & ~rwb)
>     DRAM[addr] <= {byte3, byte2, byte1, byte0};
> end
>
> So now data_out will only change synchronously if we are reading and
> memory is enabled. Even though
> output data is asynchronously updated it is being sampled from the
> data_out flop. Is this the correct way to
> define the behavior?

I have no knowledge about standalone memories, either sram or dram,
but in an embedded sram this is more or less what would happen:

The address and operation are latched, if qualified by chip select, at
the rising edge.
The read circuitry is always asynchronously selecting the proper read
path, but the sense amps are turned on only if cs is enabled.
The latching is there to guarantee stability of the inputs, so that
the user can change them to prepare for the next operation.

The output data is usually NOT latched: the data out is valid after
the access time.
In reality there is some additional logic in some memories to keep
data out stable when the memory is not accessed or being written.

So in your case the logic would more or less look like

always @(posedge clk)
begin
  r_rden <= (en && rwb);

  if (en)
  begin
    r_addr <= addr;
    r_din  <= din;
    r_rwb  <= rwb;
  end

  if (r_rden)
  begin
    r_dataout <= dataout;
  end
end

(sketch only: declarations and the write path are omitted)

assign dataout = r_rden ? DRAM[r_addr] : r_dataout;

Please note I did not add any 3-state to the output, as most asic
vendors do not like at all having 3state busses inside the chip


Last item:

If you're modeling a fictional memory you're obviously free to do
whatever you like in terms of access method, but a "memory done" signal
is really never seen (or at least I never saw it in the past 9 yrs).
This is because it would drastically limit the memory bandwidth.
In general, people try to build systems that can accept bursts. If the
system is too slow to do everything in one cycle you pipeline it, but
you don't stall the user waiting for the operation to complete.
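
One way to picture the pipelined alternative is the sketch below. It is only an illustration under assumptions: the module name pipelined_read_mem, the rd_req/rd_valid handshake, and the two-cycle latency are all hypothetical choices. A new read can be issued every clock, and a valid flag travels down the pipeline alongside the data instead of a done signal stalling the requester.

```verilog
// Hypothetical illustration: two-stage read pipeline, one request per clock.
module pipelined_read_mem #(parameter SIZE = 8192, parameter ADDRIDX = 13)
   (input                    clk,
    input                    reset,     // synchronous reset (assumption)
    input                    rd_req,    // a new read may be issued every cycle
    input      [ADDRIDX-1:0] addr,
    output reg [31:0]        rd_data,
    output reg               rd_valid); // high when rd_data holds a result

   reg [31:0]        mem [SIZE-1:0];
   reg [ADDRIDX-1:0] addr_q;            // stage 1: captured address
   reg               req_q;             // stage 1: captured request flag

   always @(posedge clk)
     if (reset) begin
        req_q    <= 1'b0;
        rd_valid <= 1'b0;
     end else begin
        // stage 1: capture the incoming request
        req_q <= rd_req;
        if (rd_req)
          addr_q <= addr;
        // stage 2: deliver data for the request captured one cycle earlier
        rd_valid <= req_q;
        if (req_q)
          rd_data <= mem[addr_q];
     end
endmodule
```

Each read takes two clocks to return, but back-to-back requests overlap, so the throughput stays at one word per clock.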

Ciao, Marco.
 
