Can I use Verilog or SystemVerilog to write a state machine


Weng Tianxiang

Hi,

Can I use Verilog or SystemVerilog to write a state machine with a clock gating function?

I know VHDL has no such feature, and I want to know whether Verilog or SystemVerilog has a clock gating feature for state machines.

Thank you.

Weng
 
On 05/01/2019 05:29, Weng Tianxiang wrote:
[snip]

Clock gating can be written in any language you like. It's FPGAs that
don't support clock gating.

Nicolas
 
On Saturday, January 5, 2019 at 3:44:39 AM UTC-8, Nicolas Matringe wrote:
[snip]

Hi Nicolas,

I am asking whether Verilog or SystemVerilog can automatically generate a state machine with a clock gating function, without any extra statements. For example, is there an attribute that, when set, makes the generated state machine include clock gating?

VHDL-2008, at least, does not have that ability.

Thank you.

Weng
 
On 05/01/2019 15:18, Weng Tianxiang wrote:

[snip]

Well then, I'm sorry, but I don't know what that "clock gating function" is.

Nicolas
 
Apparently you cannot, but yes, others can. It can also be written in VHDL, but apparently you don't like how that is done, so you state that it can't be done. Perhaps you should state your problem more clearly.

Kevin
 
Weng Tianxiang <wtxwtx@gmail.com> wrote:

I am asking if Verilog or SystemVerilog has the ability to automatically
generate a state machine with clock gating function without any extra new
statements?

What do you mean 'extra new statements'? This looks to me like clock
gating:


input clk;
input enable;
wire gated;

assign gated = clk & enable;

always @(posedge gated) begin
    // .... state machine here
end


For example, do they have an attribute if the attribute being
set the state machine generated will have the clock gating function?

I don't know what you mean by that. (System)Verilog's abstraction doesn't
generate abstract state machines; it just allows you to write them.
Whatever synthesis tools do with that code is up to them. I presume tools
could pick up the above style if they chose to (I don't know whether any
ASIC tools do, but I expect they would).

Theo
 
On 1/4/19 11:29 PM, Weng Tianxiang wrote:
[snip]

One big question is what you mean by 'clock gating'.

As was mentioned, one option for this is to do something like

assign gatedclk = clk & gate;

or sometimes

assign gatedclk = clk | gate;

and then use gatedclk as the clock. The big issue with this is that
you need to worry about clock skew when you do this, as well as glitches
(the second version works better for gate changing on the rising edge of
clk, but gate needs to be stable before the falling edge).
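One common way around the glitch problem of the AND version is to capture the enable in a latch that is transparent only while clk is low, so the enable seen by the AND gate cannot change during the high phase of the clock. This is the structure of the integrated clock gating (ICG) cells found in many ASIC libraries. The behavioral sketch below assumes an active-high enable produced by logic clocked on the rising edge of clk. The module and port names are hypothetical, and in a real ASIC flow you would instantiate the library's own cell rather than hand-code a latch.

```verilog
// Behavioral sketch of a latch-based clock gate (the structure of a typical
// ICG cell). Module and port names are hypothetical.
module clock_gate_latch (
    input  wire clk,   // free-running clock
    input  wire en,    // enable, assumed to come from posedge-clk logic
    output wire gclk   // gated clock
);
    reg en_latched;

    // Transparent latch: follows en while clk is low, holds while clk is high.
    always @*
        if (!clk)
            en_latched <= en;

    // en_latched cannot change while clk is high, so gclk cannot glitch.
    assign gclk = clk & en_latched;
endmodule
```

This only deals with glitches; the skew and clock-domain concerns raised in this thread still apply to gclk.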

A second thing called 'clock gating' is to condition the transition on
the gate signal, something like

always @(posedge clk) begin
    if (gate) begin
        .... state machine here.
    end
end

This makes the machine run on the original clock, but it will only change
on the cycles where the gate signal is true.
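For comparison, a complete version of the clock-enable style might look like the following. It is only a sketch: the three states, the go input and the module name are hypothetical, chosen to show that the flops stay on the free-running clk and simply hold their value when gate is low. Reset is given priority over gate here, which is a design choice rather than a requirement.

```verilog
// Sketch of the clock-enable style: the flops stay on the free-running clk
// and only advance on cycles where gate is high. Module name, states and the
// go input are hypothetical.
module fsm_with_enable (
    input  wire       clk,
    input  wire       rst,    // synchronous reset
    input  wire       gate,   // clock enable
    input  wire       go,     // example transition condition
    output reg  [1:0] state
);
    localparam IDLE = 2'd0, RUN = 2'd1, DONE = 2'd2;

    always @(posedge clk) begin
        if (rst)
            state <= IDLE;
        else if (gate) begin
            case (state)
                IDLE:    state <= go ? RUN : IDLE;
                RUN:     state <= DONE;
                DONE:    state <= IDLE;
                default: state <= IDLE;
            endcase
        end
        // When gate is low nothing is assigned, so the state simply holds.
    end
endmodule
```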

VHDL can do the same.

There is no need for a 'special statement'; you just do it. If you use
the first version, actually gating the clock, you may want to use
some implementation-defined macro function to buffer the clock and put
it into a low-skew distribution network, as may have been done for the
original clock.
 
On Saturday, January 5, 2019 at 2:35:28 PM UTC-8, Richard Damon wrote:
[snip]

Hi Theo and Richard,

Thank you for your help.

The purpose of clock gating is to save power. Here is why I am asking:

In a CPU, a cache line in the L1, L2 or even L3 cache usually has 64 (2**6) bytes, and each cache line must have a state machine to keep the data coherent in all situations.

For a 6 MB (2**22 + 2**21 bytes) L2 cache (the largest I have seen on the current market), a CPU must have at least (2**16 + 2**15), or about 100,000, state machines, and those ~100,000 state machines don't change state most of the time.
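Spelled out, assuming "6 MB" means 6 × 2^20 bytes and one state machine per 64-byte line, the count is:

```latex
N_{\text{FSM}} = \frac{6 \times 2^{20}\ \text{B}}{2^{6}\ \text{B}}
             = \frac{2^{22} + 2^{21}}{2^{6}}
             = 2^{16} + 2^{15}
             = 65\,536 + 32\,768
             = 98\,304 \approx 10^{5}
```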

In this situation, each of the ~100,000 state machines, each having more than 10 states, must have a clock gating function to save power:

when it will not change state on the next cycle, no clock pulse should be generated, which keeps the state unchanged and saves power.

Do you think this is reasonable?

For an application implemented in an FPGA, the clock gating function may not be necessary, because any normal application implements too few state machines.

Actually, after posting, I realized how to implement this power-saving scheme in VHDL as follows:

type STATE_TYPE is (S0, S1, ..., Sn);

signal WState, WState_NS: STATE_TYPE;

....;
a: process(clk)
begin
    if rising_edge(clk) then
        if SINI then
            WState <= S0;
        elsif WState /= WState_NS then  -- WState /= WState_NS is necessary!
            WState <= WState_NS;
        end if;
    end if;
end process;

b: process(all)
begin
    case WState is
        when S0 =>
            if C00 then
                WState_NS <= S1;
            elsif C01 then
                WState_NS <= S2;
            else
                WState_NS <= S0;
            end if;

        ...;
    end case;
end process;
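For readers working in Verilog, the same two-process scheme might be written as below. This is a sketch, not Weng's code: the module name, the 4-bit encoding and the state values are assumptions, and only the S0 transitions from the VHDL fragment are filled in (all other states just hold). Making the enable an explicit ce wire shows that the CE logic is essentially one comparator per state machine. In simulation the guarded assignment behaves exactly like an unconditional WState <= WState_NS; the guard only affects what the synthesis tool may infer, and whether a given tool turns it into a clock enable or a gated clock is tool-dependent.

```verilog
// Hypothetical Verilog rendering of the two-process VHDL scheme above. The
// module name, 4-bit encoding and state values are assumptions; only the S0
// transitions are filled in, and all other (elided) states simply hold.
module fsm_ce_sketch (
    input  wire       clk,
    input  wire       SINI,    // synchronous initialisation, as in process a
    input  wire       C00,
    input  wire       C01,
    output reg  [3:0] WState,  // 4 bits cover up to 16 states
    output wire       ce       // high only when the state is about to change
);
    localparam [3:0] S0 = 4'd0, S1 = 4'd1, S2 = 4'd2;

    reg [3:0] WState_NS;

    // Next-state logic (process b).
    always @* begin
        case (WState)
            S0:      if (C00)      WState_NS = S1;
                     else if (C01) WState_NS = S2;
                     else          WState_NS = S0;
            default: WState_NS = WState;  // remaining states elided: hold
        endcase
    end

    // The CE logic: one comparator per state machine.
    assign ce = (WState_NS != WState);

    // State register (process a).
    always @(posedge clk)
        if (SINI)
            WState <= S0;
        else if (ce)
            WState <= WState_NS;
endmodule
```

If ce were used to gate the clock instead of as a clock enable, SINI would also have to be ORed into the enable so that initialisation still receives a clock edge.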

Thank you.

Weng
 
On 1/5/19 8:23 PM, Weng Tianxiang wrote:
[snip]

One issue with gated clocks is that each gating of the clock needs to be
considered a different clock domain from every other gating of the clock
and from the ungated clock, because the gating (and rebuffering) of the
clock introduces a delay in the clock, so you need to take precautions
when a signal passes from one domain to another. An FPGA or a gate array
might provide a special circuit to generate a set of gated clocks that
are kept in good enough alignment to not need this, but that would be a
special application macro that needs to be instantiated.

Second, the difference in power consumption between my first and second
methods (actually gating the clock versus using a clock enable) is
primarily the power to drive the clock line, as the clock enable also
keeps the state the same in the 'skipped' clock cycle.
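The point can be made concrete with one common form of the dynamic (switching) power equation, where α is the activity factor (average number of charging transitions per clock cycle), C the switched capacitance, V_DD the supply voltage and f the clock frequency:

```latex
P_{\text{dyn}} \approx \alpha \, C \, V_{DD}^{2} \, f
```

Under a clock enable, a flop's clock pin and its local clock wiring still have α = 1 every cycle, even when the state holds; a gated clock brings that α toward 0 in skipped cycles, at the cost of the extra capacitance of the gate and its enable logic.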
 
On Saturday, January 5, 2019 at 6:28:35 PM UTC-8, Richard Damon wrote:
[snip]

Hi Richard,

There are two things to consider when generating a clock gating function:
1. Generating the CE (clock enable) logic.
2. Making the gated clock signal work properly.

You address part 2, while I am focusing on part 1.

Is it complex to generate the CE logic?

As I understand it, generating a clock pulse consumes more power than skipping it.

I want to know whether each of the ~100,000 state machines in a CPU is actually implemented with a clock gating function.

Based on your code, I think it is reasonable to assume that each of them is.

Only CPU designers know their implementations, and I need that information.

Thank you.

Weng
 
On 1/6/19 12:08 PM, Weng Tianxiang wrote:
[snip]

Actually gating the clock takes a single gate (but in an ASIC that gate can't
drive much logic, so things start to get more complicated). Making it
work correctly is much more complicated, and probably takes you out of the
domain of portable Verilog or VHDL. That is the nature of clock trees.

Thus, step one is in a sense trivial if you are ignoring step two, but
doing step one while ignoring step two is worthless.

I personally don't know whether it is simpler/better to add the clock
enable functionality to the flip flops or gate the clock and deal with
all the timing/buffering issues, and it wouldn't surprise me if it
turned out that which is better very much depends on the process and
other criteria.

The only real answer would be to talk to the process people, but my
guess is that the answer is very much proprietary, and unless it looks
like you are willing and planning to spend the big bucks to actually
do this, they won't waste their time talking about it.
 
On Sunday, January 6, 2019 at 1:20:16 PM UTC-5, Richard Damon wrote:
[snip]

Sure, in full custom ASICs it is not uncommon to gate the clock. In fast chips the clock tree can consume half the dynamic power in the chip, so gating the clock can bring significant power savings. However, if I understand what is going on, the clock gating being described here covers far too small a portion of the chip to be effective on many levels. The OP is talking about 100,000 identical state machines, one for each cache line. I believe what he is calling FSMs are really just a handful of FFs, but I'm not sure. If so, the clock gating logic is nearly as large as the logic it is controlling, and so would consume nearly as much power and area.

Will it be practical to design 100,000 clock gating circuits to control 100,000 tiny FSMs? Maybe I am wrong about the size of the FSMs. Or maybe it would be practical to combine the clock gating to many of the 100,000 FSMs so they are shut off in large blocks? I don't know, but the OP seems preoccupied with the idea of this being a language feature rather than a design feature added by the user. I'm sure he wants to produce an idea using a library or something that he can patent. That seems to be his MO. Oh well....
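The block-level idea can be sketched as follows, under assumptions that are not from the thread: N machines of W state bits each share one clock gate whose enable is the OR of their individual "next state differs" signals, plus a reset that forces the clock on. In any cycle where at least one machine must change, the whole block is clocked, and the machines whose state does not change simply reload their own value. The module and port names are hypothetical, and the latch-plus-AND gate stands in for a library clock gating cell. Whether this saves anything in practice depends on how often some member of a block is active.

```verilog
// Sketch of block-level clock gating: one clock gate shared by N small state
// machines. Names are hypothetical; the latch + AND models a library ICG cell.
module fsm_bank_gated #(
    parameter N = 8,                   // state machines per gated block
    parameter W = 4                    // state bits per machine
) (
    input  wire           clk,         // free-running clock
    input  wire           rst,         // synchronous reset, forces the clock on
    input  wire [N*W-1:0] state_next,  // next state of each machine
    output reg  [N*W-1:0] state,       // current state of each machine
    output wire           bank_en      // high when the block must be clocked
);
    // One comparator per machine...
    wire [N-1:0] change;
    genvar i;
    generate
        for (i = 0; i < N; i = i + 1) begin : g_cmp
            assign change[i] = (state_next[i*W +: W] != state[i*W +: W]);
        end
    endgenerate

    // ...and one N-input OR for the whole block.
    assign bank_en = rst | (|change);

    // Shared latch-based clock gate: en_l only changes while clk is low.
    reg en_l;
    always @*
        if (!clk)
            en_l <= bank_en;
    wire gclk = clk & en_l;

    // Every machine in the block is clocked together; unchanged ones reload
    // their own value.
    always @(posedge gclk)
        state <= rst ? {N*W{1'b0}} : state_next;
endmodule
```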

Rick C.

- Get 6 months of free supercharging
- Tesla referral code - https://ts.la/richard11209
 
On Sunday, January 6, 2019 at 11:23:10 AM UTC-8, gnuarm.del...@gmail.com wrote:
[snip]

Hi Rick,
You misunderstand: although the ~100,000 state machines are coded the same, they have different input and output signals and act differently, so you cannot "combine the clock gating to many of the 100,000 FSMs". Each has more than 10 states, so each state machine needs 4 registers, and each has its own clock gating logic and clock gating device.

Weng
 
On 1/6/19 4:30 PM, Weng Tianxiang wrote:
Hi Rick,
You misunderstand: although the ~100,000 state machines are coded the same, they have different input and output signals and act differently, so you cannot "combine the clock gating to many of the 100,000 FSMs". Each has more than 10 states, so each state machine needs 4 registers, and each has its own clock gating logic and clock gating device.

Weng

If you are really talking about gating 4 FFs, then my guess is that using
clock-enabled FFs would be much simpler, and probably better, than trying
to gate the clock and keep things synchronized.

The big issue would be that to make the gated clocking work you may need
double the clock distribution tree: one for an 'early' clock that is to
be gated, and a second 'late' clock, used by the ungated parts of the
system, that will line up with the gated clocks. This need for the second
clock distribution tree probably eats up more power than you save
by stopping the clock to those flip flops.

The primary alternative to two clocks would be running on opposite edges
(so skew isn't as much of a problem), but that then limits the speed the
system can run at.
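The early/late clock problem can be illustrated with the usual hold-time check. This is a minimal Python sketch, assuming made-up delays in picoseconds (none of these numbers come from the thread): a flop on the ungated clock launches data into a flop whose clock arrives later because it passed through a gating cell.

```python
def hold_slack_ps(launch_clk, capture_clk, clk_to_q, path_delay, hold_time):
    """Hold slack for one register-to-register path, all values in ps.

    New data reaches the capture flop at launch_clk + clk_to_q + path_delay;
    it must not arrive before capture_clk + hold_time. A negative result is
    a hold violation.
    """
    return (launch_clk + clk_to_q + path_delay) - (capture_clk + hold_time)


# Hypothetical numbers, chosen only for illustration.
GATING_DELAY = 80   # extra insertion delay of the gated clock
CLK_TO_Q = 50
PATH_DELAY = 20
HOLD_TIME = 15

# Ungated 'early' clock launching into a gated (delayed) clock: violation.
early = hold_slack_ps(0, GATING_DELAY, CLK_TO_Q, PATH_DELAY, HOLD_TIME)
# Ungated flops moved to a 'late' clock that lines up with the gated one.
late = hold_slack_ps(GATING_DELAY, GATING_DELAY, CLK_TO_Q, PATH_DELAY, HOLD_TIME)

print(early, late)  # -25 55
```

Delaying the clock of the ungated flops removes the violation, which is why the second, 'late' distribution tree comes up; the extra tree is the power cost mentioned above.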
 
I want to use my method in all types of circuits. A clock gating device is basically a latch. An FF with a clock enable input is an FF with a latch. Thank you.
 
On Saturday, January 5, 2019 05:30:08 UTC+1, Weng Tianxiang wrote:
Can I use Verilog or SystemVerilog to write a state machine with clock gating function?

I know VHDL has no such function and want to know if Verilog or SystemVerilog has the clock gating function for a state machine.

All languages support clock gating when it is expressed explicitly, and no language has an implicit statement for it.
This is because the clock is not really anything special in the language [1], and clock gating has several side effects that need to be dealt with during layout. But in many cases you also need to deal with some implications of clock gating in the architectural design phase, when writing the code.

[1] rising_edge(enable) and rising_edge(clock) are no different to the language, but they give very different results with synthesis tools.

bye Thomas
 
On Saturday, January 5, 2019 at 8:23:43 PM UTC-5, Weng Tianxiang wrote:
In the above situation, each of the ~100,000 state machines, each having more than 10 states, must have a clock gating function to save power:

That is your unsubstantiated claim, not a fact.

when a state machine will not change state on the next cycle, no clock pulse should be generated, so the state stays unchanged and power is saved.

Any perceived lower power consumption has very, very little to do with the fact that the state does not change. A flip flop that is clocked but does not happen to change its output does not consume much power. The power is needed to charge/discharge the loads that are being driven. Any decreased power consumption would have to do with the decrease in power in generating the clock input to the flip flop. But shifting from a common clock to adding a gate that generates a clock probably does not lower power, since the same number of clock signals are being generated. If the gated clock routing has higher capacitance than the free-running clock routing, then you can consume more power. This is the result when trying to implement gated clocks in an FPGA. An ASIC will be different.
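The point that power goes into charging and discharging loads is the standard CMOS dynamic-power relation. The form below is textbook material, not something stated in the thread: α is the fraction of cycles in which a node makes a 0→1 transition, C the switched capacitance, V_DD the supply voltage and f the clock frequency.

```latex
P_{\text{dyn}} = \alpha \, C \, V_{DD}^{2} \, f
\qquad
\alpha_{\text{clk pin}} = 1, \qquad
\alpha_{Q} = 0 \ \text{while the state is held}
```

So a flop whose state holds burns essentially nothing in its output load. What a skipped clock pulse saves is the clock-pin and local clock-wire term, and any gate added to produce the gated clock contributes its own αCV²f back.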

For an application implemented in an FPGA chip, the clock gating function may not be necessary because too few state machines are implemented in any normal application.

As I pointed out to you back in 2010 (I think), implementing what you describe in an FPGA results in an increase in power consumption. I provided you with all of the details for your sample design. The results of that analysis are not "because too few state machines are implemented", it is because gated clocks in FPGA use more power, not less. Again, that was with your sample design of that time which appears to be the same thing you are reusing here.

Actually, after posting, I realized how to implement the power-saving scheme in VHDL as follows:

I noticed that you did not show the actual gating of the clock, only the apparent usage of a possibly free running clock.

a: process(clk)
begin
if rising_edge(clk) then

Also, the following 'elsif' is not necessary even though your comment says it is. No worries though, synthesis tools should optimize out the 'elsif' and leave the assignment 'WState <= WState_NS;' on every clock. If the tool somehow leaves it in, then there will be an increase in power consumption due to use of additional logic required to implement 'elsif WState /= WState_NS then'. That increase would need to be counted against any power savings that you think you're achieving. Again, it would probably be worthwhile for you to do some analysis prior to posting and claiming...but after all these years of not acting on this advice it doesn't appear that you're willing to make that behavioral change.
elsif WState /= WState_NS then -- WState /= WState_NS is necessary!
WState <= WState_NS;
end if;
end if;
end process;

I suspect that you did not actually test any of this prior to posting and claiming since the code is not complete and does not compile...as usual.

Kevin
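The observation that the 'elsif WState /= WState_NS' guard does not change behavior can be checked with a cycle-level model. This Python sketch rests on an assumption: Weng's actual state machine was never posted, so next_state is a hypothetical three-state FSM; only the register update mirrors the quoted VHDL.

```python
import random


def next_state(state: int, go: bool) -> int:
    """Hypothetical stand-in for WState_NS (Weng's real FSM was not posted)."""
    if state == 0:
        return 1 if go else 0
    if state == 1:
        return 2
    return 2 if go else 0


def simulate(inputs, guarded: bool):
    """Return the WState value after each rising edge of clk."""
    state = 0
    trace = []
    for go in inputs:
        ns = next_state(state, go)
        if guarded:
            if state != ns:      # elsif WState /= WState_NS then
                state = ns       #   WState <= WState_NS;
        else:
            state = ns           # WState <= WState_NS; on every clock
        trace.append(state)
    return trace


rng = random.Random(2019)
inputs = [rng.random() < 0.3 for _ in range(10_000)]
guarded_trace = simulate(inputs, guarded=True)
plain_trace = simulate(inputs, guarded=False)
print(guarded_trace == plain_trace)  # True
```

Assigning a value equal to the current one is a no-op, so the two traces always match; in hardware the only visible difference is the extra comparator, which is the added logic Kevin mentions.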
 
On Monday, January 7, 2019 at 5:11:05 AM UTC-8, KJ wrote:
On Saturday, January 5, 2019 at 8:23:43 PM UTC-5, Weng Tianxiang wrote:

In the above situation, each of the ~100,000 state machines, each having more than 10 states, must have a clock gating function to save power:

That is your unsubstantiated claim, not a fact.


when a state machine will not change state on the next cycle, no clock pulse should be generated, so the state stays unchanged and power is saved.


Any perceived lower power consumption has very, very little to do with the fact that the state does not change. A flip flop that is clocked but does not happen to change its output does not consume much power. The power is needed to charge/discharge the loads that are being driven. Any decreased power consumption would have to do with the decrease in power in generating the clock input to the flip flop. But shifting from a common clock to adding a gate that generates a clock probably does not lower power, since the same number of clock signals are being generated. If the gated clock routing has higher capacitance than the free-running clock routing, then you can consume more power. This is the result when trying to implement gated clocks in an FPGA. An ASIC will be different.


For an application implemented in an FPGA chip, the clock gating function may not be necessary because too few state machines are implemented in any normal application.

As I pointed out to you back in 2010 (I think), implementing what you describe in an FPGA results in an increase in power consumption. I provided you with all of the details for your sample design. The results of that analysis are not "because too few state machines are implemented", it is because gated clocks in FPGA use more power, not less. Again, that was with your sample design of that time which appears to be the same thing you are reusing here.

Actually, after posting, I realized how to implement the power-saving scheme in VHDL as follows:

I noticed that you did not show the actual gating of the clock, only the apparent usage of a possibly free running clock.

a: process(clk)
begin
if rising_edge(clk) then

Also, the following 'elsif' is not necessary even though your comment says it is. No worries though, synthesis tools should optimize out the 'elsif' and leave the assignment 'WState <= WState_NS;' on every clock. If the tool somehow leaves it in, then there will be an increase in power consumption due to use of additional logic required to implement 'elsif WState /= WState_NS then'. That increase would need to be counted against any power savings that you think you're achieving. Again, it would probably be worthwhile for you to do some analysis prior to posting and claiming...but after all these years of not acting on this advice it doesn't appear that you're willing to make that behavioral change.
elsif WState /= WState_NS then -- WState /= WState_NS is necessary!
WState <= WState_NS;
end if;
end if;
end process;

I suspect that you did not actually test any of this prior to posting and claiming since the code is not complete and does not compile...as usual.

Kevin

Hi,

Several experts have responded to my post. Thank you. Notably, I have not seen Hans of www.ht-lab.com give his opinion. His opinions are usually reasonable and informative, and he knows many things beyond FPGA chips that are outside my knowledge.

Here is the background for the purpose of my post:
1. On 12/31/2018 I filed a non-provisional patent application and requested early publication. The publication will happen about 14 weeks after its filing date.

2. On 01/06/2019 I submitted almost the same version as a regular paper to IEEE Transactions on Circuits and Systems for publication. The review process may take up to 3 months.

Because of the IEEE Transactions' strict restrictions on a paper's originality, I cannot disclose any details about my invention until the journal either agrees to publish my paper (about 3 months from now) or rejects it (within 1 or 2 weeks).

Here are some facts of my invention:
1. The logic used to generate a state machine with clock gating devices is almost the same as what the conventional method would generate, or maybe even simpler.

2. I don't know how a CPU handles the clocking of the 100,000*4 FFs used in the state machines for Cache II control. If CPU designers don't care about the power saving, or have already implemented some such scheme, my invention would be of little value; otherwise it would be worth millions of dollars.

3. The purpose of my post is to test whether such an invention is of any value, not to ask how to implement a state machine with a clock gating function.

4. After my application is published 3 months from now, I will immediately register it for sale at http://www.ast.com/interested-in-selling-to-ast/. I know of the website because Google refers to it and indicates that Google is a member of the site. I expect that Intel, IBM, AMD and Apple may also be members. The site asks for a selling price during registration, so it is important for me to assess my invention's value properly.

5. I think no developers at Intel, IBM, AMD or Apple would visit this website, let alone take part in the discussion of my post.

6. I hope to discuss the invention in more detail 3 months from now, before registering it on the patent-selling website.

7. Xilinx chips have a clock enable signal built into each cell block, with one CE input for the 8 registers in the block. Altera may be similar. So clock enable is not a new thing, and we don't have to pay attention to how the clock trees work. In my opinion, in CPU design, logic design and clock tree design are 2 separate domains, one after the other, and logic designers never have to pay attention to the clock trees.

Thank you.

Weng
 
On Sunday, January 6, 2019 at 4:30:27 PM UTC-5, Weng Tianxiang wrote:
On Sunday, January 6, 2019 at 11:23:10 AM UTC-8, gnuarm.del...@gmail.com wrote:
On Sunday, January 6, 2019 at 1:20:16 PM UTC-5, Richard Damon wrote:
On 1/6/19 12:08 PM, Weng Tianxiang wrote:
On Saturday, January 5, 2019 at 6:28:35 PM UTC-8, Richard Damon wrote:
On 1/5/19 8:23 PM, Weng Tianxiang wrote:
On Saturday, January 5, 2019 at 2:35:28 PM UTC-8, Richard Damon wrote:
On 1/4/19 11:29 PM, Weng Tianxiang wrote:
Hi,

Can I use Verilog or SystemVerilog to write a state machine with clock gating function?

I know VHDL has no such function and want to know if Verilog or SystemVerilog has the clock gating function for a state machine.

Thank you.

Weng


One big question is what you mean by 'clock gating'.

As was mentioned, one option for this is to do something like

assign gatedclk = clk & gate;

or sometimes

assign gatedclk = clk | gate;

and then use gatedclk as the clock. The big issue with this is that
you need to worry about clock skew when you do this, as well as glitches
(the second version works better for gate changing on the rising edge of
clk, but gate needs to be stable before the falling edge.)
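The glitch remark can be seen in a coarse discrete-time model. This Python sketch assumes 8 samples per clock period and an enable that comes from a flop clocked by clk, so it changes 1 sample after each rising edge; for the OR form, gate is taken as the inverse of the enable. The third circuit is a latch-based clock gate (a latch transparent while clk is low, followed by an AND), a common clock-gating-cell structure that the post itself does not describe. Real gate delays are ignored.

```python
TICKS = 8          # samples per clock period (hypothetical resolution)
HIGH = TICKS // 2  # clk is high for the first half of each period


def clock(periods):
    return [1 if t % TICKS < HIGH else 0 for t in range(periods * TICKS)]


def enable_wave(per_cycle, delay=1):
    """Enable from a flop clocked by clk: value k appears `delay` ticks after rising edge k."""
    total = len(per_cycle) * TICKS
    return [per_cycle[(t - delay) // TICKS] if t >= delay else 0 for t in range(total)]


def and_gate(clk, en):
    return [c & e for c, e in zip(clk, en)]        # assign gatedclk = clk & gate;


def or_gate(clk, en):
    return [c | (1 - e) for c, e in zip(clk, en)]  # assign gatedclk = clk | gate; gate = ~en


def latch_icg(clk, en):
    """Latch-based clock gate: the latch is transparent only while clk is low."""
    q, out = 0, []
    for c, e in zip(clk, en):
        if c == 0:
            q = e
        out.append(c & q)
    return out


def rising_edges(wave):
    return [t for t, v in enumerate(wave) if v and (t == 0 or not wave[t - 1])]


def pulse_widths(wave):
    widths, run = [], 0
    for v in wave:
        if v:
            run += 1
        elif run:
            widths.append(run)
            run = 0
    if run:
        widths.append(run)
    return widths


clk = clock(4)
en = enable_wave([1, 0, 1, 1])
and_out, or_out, icg_out = and_gate(clk, en), or_gate(clk, en), latch_icg(clk, en)

print(rising_edges(clk), rising_edges(and_out), rising_edges(or_out), rising_edges(icg_out))
# [0, 8, 16, 24] [1, 8, 17, 24] [0, 8, 24] [8, 24]
print(pulse_widths(and_out), pulse_widths(icg_out))
# [3, 1, 3, 4] [4, 4]
```

With the bare AND, the enable change leaks through as late edges (ticks 1 and 17) and a 1-sample runt pulse. The OR form and the latch-based gate only produce edges that coincide with rising edges of clk.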

A second thing called 'clock gating' is to condition the transition on
the gate signal, something like

always @(posedge clk) begin
  if (gate) begin
    // ... state machine here
  end
end

This makes the machine run on the original clock, but it will only change
on the cycles where the gate signal is true.

VHDL can do the same.

There is no need for a 'special statement', you just do it. If doing
the first version, of actually gating the clock, you may want to use
some implementation defined macro function to buffer the clock and put
it into a low skew distribution network, like may have been done for the
original clock.

Hi Theo and Richard,

Thank you for your help.

The purpose of the clock gating function is to save power. The reason I ask the question is this:

A cache line in Cache I, Cache II or even Cache III in a CPU usually has 64 (2**6) bytes, and each cache line must have a state machine to keep the data coherent in all situations.

For a 6 MB (2**22 + 2**21 bytes) Cache II (the largest I have seen on the current market), a CPU must have at least 2**16 + 2**15 ~= 100,000 state machines, and those ~100,000 state machines don't change state most of the time.
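The counts above can be checked with a few lines of Python. This sketch assumes one state machine per 64-byte line, exactly 11 states (the post only says "more than 10") and binary state encoding; a one-hot encoding of the same FSM would need 11 FFs per machine instead of 4.

```python
CACHE_BYTES = 2**22 + 2**21    # the "6M" Cache II from the post
LINE_BYTES = 2**6              # 64-byte cache line
STATES = 11                    # "more than 10 states"; 11 assumed here

lines = CACHE_BYTES // LINE_BYTES          # one state machine per line
state_bits = (STATES - 1).bit_length()     # minimum FFs with binary encoding
total_ffs = lines * state_bits             # binary-encoded FFs in all FSMs
onehot_ffs = lines * STATES                # same FSMs with one-hot encoding

print(lines, state_bits, total_ffs, onehot_ffs)  # 98304 4 393216 1081344
```

Any state count from 9 to 16 gives the same 4 bits, so the "100,000*4 FFs" figure holds under binary encoding.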

In the above situation, each of the ~100,000 state machines, each having more than 10 states, must have a clock gating function to save power:

when a state machine will not change state on the next cycle, no clock pulse should be generated, so the state stays unchanged and power is saved.

Do you think it is reasonable?

For an application implemented in an FPGA chip, the clock gating function may not be necessary because too few state machines are implemented in any normal application.

Thank you.

Weng


One issue with gated clocks is that each gating of the clock needs to be
considered a different clock domain from every other gating of the clock
and from the ungated clock, because the gating (and rebuffering) of the
clock introduces a delay in the clock, so you need to take precautions
when the signal passes from one domain to another. An FPGA might have,
and a gate array may provide, a special circuit to generate a set of
gated clocks that will be kept in good enough alignment to not need
this, but then that would be a special application macro that needs to
be instantiated.

Second, the difference in power consumption between my first and second
methods (actually gating the clock vs. using a clock enable) is primarily
the power to drive the clock line, as the clock enable also keeps the
state the same in the 'skipped' clock cycle.
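That comparison can be put in numbers with a behavioral model that only counts clock edges arriving at flip-flop clock pins. This Python sketch relies on assumptions: the FSM count, the enable probability and next_state are hypothetical, and the model ignores the power of the clock gating cells themselves and of the clock tree feeding them, which is exactly what KJ and Rick argue can eat the savings.

```python
import random

FFS_PER_FSM = 4   # "each state machine must have 4 registers"


def next_state(state: int) -> int:
    """Hypothetical 4-bit next-state logic, used only when the FSM is enabled."""
    return (state + 1) % 16


def compare(n_fsms=1000, cycles=200, p_enable=0.02, seed=0):
    rng = random.Random(seed)
    ce_state = [0] * n_fsms
    gated_state = [0] * n_fsms
    ce_clock_edges = gated_clock_edges = 0
    for _ in range(cycles):
        for i in range(n_fsms):
            en = rng.random() < p_enable
            # Clock enable: every FF sees every edge; a mux feeds Q back when en is 0.
            ce_clock_edges += FFS_PER_FSM
            if en:
                ce_state[i] = next_state(ce_state[i])
            # Gated clock: the FFs only see an edge on enabled cycles.
            if en:
                gated_clock_edges += FFS_PER_FSM
                gated_state[i] = next_state(gated_state[i])
    return ce_state, gated_state, ce_clock_edges, gated_clock_edges


ce, gated, ce_edges, gated_edges = compare()
# ce == gated is True; gated_edges is roughly p_enable * ce_edges
print(ce == gated, ce_edges, gated_edges)
```

Both styles end in the same state; they differ only in how many clock edges reach the FF clock pins.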

Hi Richard,

There are 2 things to consider when generating a clock gating function:
1. Generate CE logic.
2. Make the gated clock signal work properly.

You address part 2), while I emphasize part 1).

Is it complex to generate CE logic?
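One straightforward way to derive a clock enable (not necessarily the method in Weng's unpublished application) is to compare the next state with the current state, the same test as the 'WState /= WState_NS' condition discussed above. This Python sketch assumes a binary-encoded 4-bit state and gives a rough 2-input-gate count for the comparator only; a real clock gating cell adds its own latch and AND gate on top.

```python
STATE_BITS = 4  # "each state machine must have 4 registers"


def enable(cur: int, nxt: int, width: int = STATE_BITS) -> int:
    """CE = OR-reduction of (cur XOR nxt): 1 exactly when the state will change."""
    diff = (cur ^ nxt) & ((1 << width) - 1)
    en = 0
    for bit in range(width):
        en |= (diff >> bit) & 1
    return en


def two_input_gates(width: int = STATE_BITS) -> int:
    """width XOR2 gates for the bitwise compare plus width - 1 OR2 gates for the reduction."""
    return width + (width - 1)


# Exhaustive check over all 4-bit current/next state pairs.
assert all(enable(c, n) == int(c != n) for c in range(16) for n in range(16))
print(two_input_gates())  # 7
```

Seven gates against 4 FFs is the kind of per-FSM overhead Rick is weighing against the savings.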

In my understanding, generating a clock pulse consumes more power than skipping it.

I want to know whether each of the ~100,000 state machines in a CPU is actually implemented with a clock gating function.

Based on your code, I think it is reasonable to assume that each of the ~100,000 state machines in a CPU is actually implemented with a clock gating function.

Only CPU designers know their implementation. I need the information.

Thank you.

Weng


Actually gating the clock is a single gate (but then in an ASIC it can't
drive much logic, so things start to get more complicated). Making it
work gets things much more complicated, and probably gets you out of the
domain of portable Verilog or VHDL. That is the nature of clock trees.

Thus, step one is in a sense trivial if you are ignoring step two, but
doing step one while ignoring step two is worthless.

I personally don't know whether it is simpler/better to add the clock
enable functionality to the flip flops or gate the clock and deal with
all the timing/buffering issues, and it wouldn't surprise me if it
turned out that which is better very much depends on the process and
other criteria.

The only real answer would be to talk to the process people, but my
guess is that the answer is very much proprietary, and unless it looks
like you are willing and planning on spending the big bucks to actually
do this, they won't waste their time talking about it.

Sure, in full custom ASICs it is not uncommon to gate the clock. In fast chips the clock tree design can consume half the dynamic power in the chip. So gating the clock can bring significant power savings. However, the clock gating being described here is over far too small a portion of the chip to be effective on many levels if I understand what is going on. The OP is talking about 100,000 identical state machines, one for each cache item. I believe what he is talking about as FSMs are really just a handful of FFs but I'm not sure. If so, the clock gating logic is nearly as large and so would consume nearly as much power and area as the logic it is controlling.

Will it be practical to design 100,000 clock gating circuits to control 100,000 tiny FSMs? Maybe I am wrong about the size of the FSMs. Or maybe it would be practical to combine the clock gating to many of the 100,000 FSMs so they are shut off in large blocks? I don't know, but the OP seems preoccupied with the idea of this being a language feature rather than a design feature added by the user. I'm sure he wants to produce an idea using a library or something that he can patent. That seems to be his MO. Oh well...

Rick C.


Hi Rick,
You misunderstand: although the ~100,000 state machines are coded the same, they have different input and output signals and act differently, so you cannot "combine the clock gating to many of the 100,000 FSMs". Each has more than 10 states, so each state machine needs 4 registers, and each has its own clock gating logic and clock gating device.

Do you know what the clock gating circuitry would look like? Try comparing that circuit to the FSM circuit. You will see they are comparable in size, and the gating circuit adds to the timing delay as well.

Please keep in mind that the 4 FFs in a single FSM can be lumped together with the 4 FFs from another FSM in your analysis to consider them to be a single FSM for the purposes of clock gating. When any one FSM is active you can make the entire circuit active. This still retains the clock power savings for all the remaining 99,998 FSMs not in that circuit.
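The grouping idea can be sketched in Python, including the case of several FSMs in one group being active on the same clock. The assumptions are: next_state is a hypothetical 4-bit FSM that holds its state when go is false, and the group enable is taken as the OR of the members' "will change" signals. An idle member of a clocked group simply reloads its own state, so the grouped version ends in the same states as per-FSM gating.

```python
import random

FFS_PER_FSM = 4  # "each state machine must have 4 registers"


def next_state(state: int, go: bool) -> int:
    """Hypothetical FSM: advance when go is true, otherwise hold."""
    return (state + 1) % 16 if go else state


def simulate(n_fsms=1000, group_size=8, cycles=200, p_go=0.01, seed=0):
    rng = random.Random(seed)
    per_fsm = [0] * n_fsms   # one clock gate per FSM
    grouped = [0] * n_fsms   # one clock gate per group of FSMs
    edges_per_fsm = edges_grouped = 0
    for _ in range(cycles):
        go = [rng.random() < p_go for _ in range(n_fsms)]
        for i in range(n_fsms):
            nxt = next_state(per_fsm[i], go[i])
            if nxt != per_fsm[i]:            # this FSM's own enable
                per_fsm[i] = nxt
                edges_per_fsm += FFS_PER_FSM
        for start in range(0, n_fsms, group_size):
            members = range(start, min(start + group_size, n_fsms))
            nxts = [next_state(grouped[i], go[i]) for i in members]
            # Group enable: OR of the members' "will change" signals, so any
            # number of active members in the same group is handled alike.
            if any(n != grouped[i] for n, i in zip(nxts, members)):
                for n, i in zip(nxts, members):
                    grouped[i] = n           # idle members reload their own state
                edges_grouped += FFS_PER_FSM * len(members)
    gates_per_fsm = n_fsms
    gates_grouped = -(-n_fsms // group_size)  # ceiling division
    return per_fsm, grouped, edges_per_fsm, edges_grouped, gates_per_fsm, gates_grouped


per_fsm, grouped, e_fsm, e_grp, g_fsm, g_grp = simulate()
print(per_fsm == grouped, g_fsm, g_grp)  # True 1000 125
print(e_fsm <= e_grp)                    # True
```

The trade is fewer gating cells (125 instead of 1000 here) against more clock edges delivered to idle FFs; where that balance falls depends on real activity and cell costs, which the thread does not give.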

I'm not sure this will provide much in the way of logic savings. But I am confident no one is going to want to implement clock gating circuits for each of the 100,000 FSMs independently. But then it seems they are scrounging around for ways to improve power consumption of CPUs these days, and there are lots of transistors available. I'm also a guy who thought cell phones would not be widely accepted. lol


Rick C.

 
If 2 state machines in one group, as you suggested, are active on the same clock, how do you handle that with your scheme?
 

Welcome to EDABoard.com
