Can I use Verilog or SystemVerilog to write a state machine

On Monday, January 7, 2019 at 2:21:51 PM UTC-5, Weng Tianxiang wrote:
On Monday, January 7, 2019 at 5:11:05 AM UTC-8, KJ wrote:
On Saturday, January 5, 2019 at 8:23:43 PM UTC-5, Weng Tianxiang wrote:

In the above situation, each of the ~100,000 state machines, each with more than 10 states, must have a clock gating function to save power:

That is your unsubstantiated claim, not a fact.


when it will not change states on the next cycle, a clock pulse should not be generated to keep the state unchanged and save power consumption.


Any perceived lower power consumption has very, very little to do with the fact that the state does not change. A flip flop that is clocked but does not happen to change its output does not consume much power. The power is needed to charge/discharge the loads that are being driven. Any decreased power consumption would have to do with the decrease in power in generating the clock input to the flip flop. But shifting from a common clock to adding a gate that generates a clock probably does not lower power since the same number of clock signals are being generated. If the gated clock routing is a higher capacitive route than when using a free-running clock, then you can consume more power. This is the result when trying to implement gated clocks in an FPGA. An ASIC will be different.


For an application implemented in a FPGA chip, the clock gating function may not be necessary because too few state machines are implemented in any normal application.

As I pointed out to you back in 2010 (I think), implementing what you describe in an FPGA results in an increase in power consumption. I provided you with all of the details for your sample design. The results of that analysis are not "because too few state machines are implemented", it is because gated clocks in FPGA use more power, not less. Again, that was with your sample design of that time which appears to be the same thing you are reusing here.

Actually I realized how to implement the power consumption scheme in VHDL as follows after the post is posted:

I noticed that you did not show the actual gating of the clock, only the apparent usage of a possibly free running clock.

a: process(clk)
begin
if rising_edge(clk) then

Also, the following 'elsif' is not necessary even though your comment says it is. No worries though, synthesis tools should optimize out the 'elsif' and leave the assignment 'WState <= WState_NS;' on every clock. If the tool somehow leaves it in, then there will be an increase in power consumption due to use of additional logic required to implement 'elsif WState /= WState_NS then'. That increase would need to be counted against any power savings that you think you're achieving. Again, it would probably be worthwhile for you to do some analysis prior to posting and claiming...but after all these years of not acting on this advice it doesn't appear that you're willing to make that behavioral change.
elsif WState /= WState_NS then -- WState /= WState_NS is necessary!
WState <= WState_NS;
end if;
end if;
end process;

I suspect that you did not actually test any of this prior to posting and claiming since the code is not complete and does not compile...as usual.

Kevin
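
A minimal sketch of the point about the 'elsif' guard, assuming a synchronous reset in place of the branch that is missing from the quoted fragment. The entity name, port names and 4-bit state width are illustrative and not from the original post. Both registers are clocked by the same free-running clk, so neither one gates the clock:

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical wrapper: names and widths are assumptions for illustration.
entity state_reg_compare is
  port (
    clk       : in  std_logic;
    reset     : in  std_logic;  -- assumed synchronous reset
    wstate_ns : in  std_logic_vector(3 downto 0);
    wstate_a  : out std_logic_vector(3 downto 0);
    wstate_b  : out std_logic_vector(3 downto 0)
  );
end entity state_reg_compare;

architecture rtl of state_reg_compare is
  signal a_q, b_q : std_logic_vector(3 downto 0) := (others => '0');
begin
  -- Version A: guarded update, in the style of the quoted fragment.
  guarded : process (clk)
  begin
    if rising_edge(clk) then
      if reset = '1' then
        a_q <= (others => '0');
      elsif a_q /= wstate_ns then
        a_q <= wstate_ns;
      end if;
    end if;
  end process guarded;

  -- Version B: unconditional update on every clock edge.
  plain : process (clk)
  begin
    if rising_edge(clk) then
      if reset = '1' then
        b_q <= (others => '0');
      else
        b_q <= wstate_ns;
      end if;
    end if;
  end process plain;

  wstate_a <= a_q;
  wstate_b <= b_q;
end architecture rtl;
```

Holding a register whose input already equals its output is indistinguishable from loading it, so wstate_a and wstate_b should match on every cycle. That is why a synthesis tool is free to drop the comparison.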

Hi,

There are several experts responding to my post. Thank you. Noticeably I do not find Hans of www.ht-lab.com giving his opinion. Usually his opinion is reasonable and informative and he knows many things outside the FPGA chips beyond my knowledge.

Here is the background for the purpose of my post:
1. On 12/31/2018 I filed a non-provisional patent application and asked for early publication. The publication will happen about 14 weeks after the filing date.

2. On 01/06/2019 I sent almost the same version as a regular paper to IEEE Transactions on Circuits and Systems for publication. The review process may take up to 3 months.

Because of the IEEE Transactions' strict restriction on a paper's originality, I cannot disclose any details about my invention until the journal agrees to publish my paper 3 months later or rejects my paper in 1 or 2 weeks.

Here are some facts of my invention:
1. The logic used to generate a state machine with clock gating devices is almost the same as the conventional method would generate, or maybe even simpler.

I think you missed the mark by a wide margin on this one. The logic needed for the clock gating is this...

elsif WState /= WState_NS then

This is not so trivial compared to the FSM itself, especially in an ASIC. I would estimate it is approximately the same amount of logic in general.


> 2. I don't know how a CPU handles the clocking of the 100,000*4 FFs used in the state machines for the Cache II control. If they don't care about the power saving, or they have already implemented some such scheme, my invention would be of little value; otherwise it would be worth millions of dollars.

For a patent to be valid it has to be non-obvious to a practitioner in the field. I don't know how this is non-obvious to someone in the field of CPU design. You may obtain a patent, but then lose a patent defense case in court. But again, I didn't think cell phones would take off and now I have two.


> 3. My post's purpose is to test if such invention is of any value, not about how to implement a state machine with clock gating function.

What exactly is your "invention"??? Clock gating is nothing new. It is applied to many parts of a CPU. Is your invention the idea of applying it to the individual FSMs in a CPU cache? So if someone instead applies it to groupings of FSMs in a CPU cache they will have worked around your patent.


> 4. After my application is published 3 months later I will immediately register and sell the application at http://www.ast.com/interested-in-selling-to-ast/. I know the website because Google refers to the website and indicates they are a member of the site. I expect that Intel, IBM, AMD, Apple may also be the members of the website. The site asks for the selling price during registration. So it is important for me to assess my invention's value properly.

What value have you assessed so far?


> 5. I think no developers at Intel, IBM, AMD, or Apple would visit this website, not to mention take part in the discussion of my post.

> 6. I hope to discuss the invention in more detail 3 months from now, before I register on the patent selling website.

> 7. Xilinx chips have a clock enable signal built into the cell block, one CE input for 8 registers in the block. Altera may be in the same situation. So clock enable is nothing new and we don't have to pay attention to how the clock trees work. For a CPU design, in my opinion, logic design and clock tree design are 2 separate domains, handled one after another, and logic designers never have to pay attention to the clock trees.

Clock enable and clock gating are not the same thing. Clock enable saves power by not changing the FF state, but if the FF input is the same as the output the state won't change anyway.

Here is something to consider. Clock gating saves power compared to clock enabling by reducing the power consumed in the clock tree. How much of the clock tree will you actually be gating with a fine grained approach? Clock trees are exponential structures with a multiplier for the fan out at each level. With this fine grain approach you are only saving power in the final level and in fact, may be adding a level if your clock gating control is at a finer resolution than the last level of clock drive.

Generally clock gating is used at a high level to gate the clock to sections of a chip. I expect it is seldom if ever used at a low level because the power saved is not optimal and the logic required is maximal.

Rick C.
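
A minimal sketch of the clock-enable side of this distinction, written as the conventional recirculating-mux model. The entity and signal names are illustrative, not from the thread. The clock input is the ordinary free-running clock; only the data seen by the register depends on ce:

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example entity; names are assumptions for illustration.
entity ce_reg is
  port (
    clk : in  std_logic;  -- free-running clock, still toggles every cycle
    ce  : in  std_logic;  -- clock enable
    d   : in  std_logic;
    q   : out std_logic
  );
end entity ce_reg;

architecture rtl of ce_reg is
  signal q_int : std_logic := '0';
  signal d_mux : std_logic;
begin
  -- Recirculating mux: when ce is low the register reloads its own value.
  d_mux <= d when ce = '1' else q_int;

  reg : process (clk)
  begin
    if rising_edge(clk) then
      q_int <= d_mux;
    end if;
  end process reg;

  q <= q_int;
end architecture rtl;
```

On devices whose flip-flops have a dedicated CE pin the mux is typically absorbed into the cell, but the clock net feeding the cell is unchanged either way, which is the difference from gating the clock itself.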

 
In article <dd5eaff5-f99d-48f3-99bc-10dfda031747@googlegroups.com>,
Weng Tianxiang <wtxwtx@gmail.com> wrote:
>If 2 state machines as you suggested may be active on the same clock, how do you handle it using your scheme?

Weng - I find your obsession with "state machines" a bit puzzling. I
seem to recall a poster a few years ago asking about the "largest"
state machine in current designs - was this you? It seems likely,
given that you consider a CPU cache as a large number (~100,000)
of state machines running in parallel.

I've not designed a CPU cache. But I can pretty much guarantee that
whomever designed that CPU cache you're thinking about did NOT model
the design as such (a lot of state machines running in parallel). To
be frank, I can see the entire design being done without implementing
a "state machine" at all.

A state machine is simply a model to make it easier for humans to
understand and design a circuit. It's not necessary at all to apply
this model to any or all digital circuits.

One of my co-workers (for whatever reason) abhors "State machine"
design, and won't use them - at all. That's fine, he models things
differently. And he's a very productive engineer - not hindered one bit
by his lack of use of "state machines".

Conversely, one can model an entire ASIC (or FPGA) design as simply
one large state machine. Or many smaller state machines running in
parallel. (Assume a single clock for this analogy). It's just a
model to aid our (the designers') view of a design.

Take a full schematic of any full ASIC. Draw a random blob around ANY
set of 4-5 FFs. Include some parts of the fanin and fanout logic
of those flip flops. Bam - there's a state machine. Repeat 20,000
times for all FF's in the design. Is this useful - not really - but it
will meet any definition of "State Machine" that you can define.

Regards,

Mark
 
On Tuesday, January 8, 2019 at 12:26:39 PM UTC-8, gtwrek wrote:
In article <dd5eaff5-f99d-48f3-99bc-10dfda031747@googlegroups.com>,
Weng Tianxiang <wtxwtx@gmail.com> wrote:
If 2 state machines as you suggested may be active on the same clock, how do you handle it using your scheme?

Hi Mark,

You really have a good memory!

I posted a post with title: "What is largest number of state machines in a chip" at this FPGA group several years ago.

Here are tons of state machine patents about how to design a L2 cache. I list only the search word "L2 cache inassignee:intel" and you can find through Google there are 4,830 patents filed and issued by Intel, the search word "L2 cache state machine inassignee:intel" and it leads to 4,360, each of them is related to a type of state machines.

I believe that anyone cannot be accounted as a professional digital circuit designer if he does not seriously consider or design a state machine.

One of my hobbies is to look at patents filed by Intel, IBM, AMD, Xilinx and Altera. Reading Xilinx and Altera' patents gives me the knowledge on how they design their FPGA chips. Reading Intel, IBM and AMD' patents gives me the knowledge on how they design something very complex and new technology trend. And through the reading I find many topics for me to further develop.

I disagree with your following opinion:
"I've not designed a CPU cache. But I can pretty much guarantee that
whomever designed that CPU cache you're thinking about did NOT model
the design as such (a lot of state machines running in parallel). To
be frank, I can see the entire design being done without implementing
a "state machine" at all. "

Here is an Intel patent: US8493397B1: "Circuit for placing a cache memory into low power mode in response to special bus cycles executed on the bus"

https://patents.google.com/patent/US8493397?oq=L2+cache+state+machine

https://patents.google.com/patent/US20140156931?oq=L2+cache+state+machine

I agree with your following opinion:
"a lot of state machines running in parallel".

After my invention all state machine design will be benefited to be in lower power status, no matter what type of state machines is, and the logic resource usage is less than a conventional synthesizer would generate.

Rick,
I disagree with your opinion:
"elsif WState /= WState_NS then

This is not so trivial compared to the FSM itself, especially in an ASIC. I would estimate it is approximately the same amount of logic in general. "

In my invention there is no one single logic gate generated for comparison "WState /= WState_NS". Is it obvious to you?

That is the best point of my invention.

Thank you.

Weng
 
In article <d63828fe-673b-4254-a3c1-bd1fa85175ea@googlegroups.com>,
Weng Tianxiang <wtxwtx@gmail.com> wrote:
Here are tons of state machine patents about how to design a L2 cache. I list only the search word "L2 cache inassignee:intel" and you can find through
Google there are 4,830 patents filed and issued by Intel, the search word "L2 cache state machine inassignee:intel" and it leads to 4,360, each of them is
related to a type of state machines.

I believe that anyone cannot be accounted as a professional digital circuit designer if he does not seriously consider or design a state machine.

Proof by counter-example. My coworker is an excellent "professional
digital circuit designer" and has been for over 30 years. He does use a
"state machine" to model any of his designs. He doesn't like the model.
Again, we're talking about using a model as a tool. That model doesn't
work for him. He has others that work quite nicely.

One of my hobbies is to look at patents filed by Intel, IBM, AMD, Xilinx and Altera. Reading Xilinx and Altera' patents gives me the knowledge on how they
design their FPGA chips. Reading Intel, IBM and AMD' patents gives me the knowledge on how they design something very complex and new technology trend. And
through the reading I find many topics for me to further develop.

I disagree with your following opinion:
"I've not designed a CPU cache. But I can pretty much guarantee that
whomever designed that CPU cache you're thinking about did NOT model
the design as such (a lot of state machines running in parallel). To
be frank, I can see the entire design being done without implementing
a "state machine" at all. "

Here is an Intel patent: US8493397B1: "Circuit for placing a cache memory into low power mode in response to special bus cycles executed on the bus"

That's a non sequitur: there's nothing in that patent search that says
the designer is using a state machine model to design the CPU cache.
That's all part of your imagination.

A "digital circuit" is just FF's and combinatorial gates tied together
in clever ways. Whether you apply a "state machine" model to the circuit
is just something between your ears. Most of the tools see just FF's,
gates (or LUTs), and timing paths.

I assert (without any evidence whatsoever) that whomever designed that CPU
cache memory at Intel did NOT model it (in his head or otherwise) as
100,000 or more state machines running in parallel. That's just crazy.

https://patents.google.com/patent/US8493397?oq=L2+cache+state+machine

https://patents.google.com/patent/US20140156931?oq=L2+cache+state+machine

I agree with your following opinion:
A quote from me with a lot of context removed...

After my invention all state machine design will be benefited to be in lower power status, no matter what type of state machines is, and the logic resource
usage is less than a conventional synthesizer would generate.

You're still not hearing me. If you have some Super Snazzy Algorithm
that does some magic low power thing targeting state machines, then the
Super Snazzy Algorithm would also be capable of targeting ANY digital
circuit. (If I recall, most FPGA low power optimizers run rather late
in the implementation process - i.e. after synthesis and "state machine"
optimizations)

As a skeptical engineer (any engineer that's been around for any time
whatsoever fits this description) I have sincere doubts about your Super
Snazzy Algorithm. Bright folks have been designing low-power tools for
quite some time. I doubt there's any room for improvement (at least in
the digital logic sense). And digitally, the problem's not hard to
define at all. The devil is in all the details with respect to timing,
and other optimization metrics. (Hint: if your only metric is "logic
resource usage" then you're not understanding the full problem by a long
shot).

On the other hand maybe you're a digital logic savant, and are seeing
new and creative solutions.

Good luck with your further patent googling, and applications.

Regards,

Mark
 
In article <q137f9$f16$1@dont-email.me>, gtwrek <gtwrek@sonic.net> wrote:
In article <d63828fe-673b-4254-a3c1-bd1fa85175ea@googlegroups.com>,
Weng Tianxiang <wtxwtx@gmail.com> wrote:
Here are tons of state machine patents about how to design a L2 cache. I list only the search word "L2 cache inassignee:intel" and you can find through
Google there are 4,830 patents filed and issued by Intel, the search word "L2 cache state machine inassignee:intel" and it leads to 4,360, each of them is
related to a type of state machines.

I believe that anyone cannot be accounted as a professional digital circuit designer if he does not seriously consider or design a state machine.

Proof by counter-example. My coworker is an excellent "professional
digital circuit designer" and has been for over 30 years. He does NOT use a
^^^
---- Arg! Edit to make my point -------------------------------------

"state machine" to model any of his designs. He doesn't like the model.
Again, we're talking about using a model as a tool. That model doesn't
work for him. He has others that work quite nicely.


 
Hi Mark,

"I assert (without any evidence whatsoever) that whomever designed that CPU
cache memory at Intel did NOT model it (in his head or otherwise) as
100,000 or more state machines running in parallel. That's just crazy. "

Here are the facts, whether you agree or not:
1. 6M L2 cache, the largest L2 cache I can search for with a commercial CPU;

2. Every 64 bytes in L2 cache constitute a cache line;

3. Each L2 cache line works independently;

4. Each L2 cache line has at least one state machine to control its data or instructions in coherence. I will not be surprised that each L2 cache line may have up to 8 state machines to control its working.
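
As a rough check on the ~100,000 figure used throughout this thread, assuming "6M" means 6 MiB and exactly one state machine per 64-byte line (the second assumption is the poster's own, not an established fact about any real cache):

```latex
\[
N_{\text{lines}}
  = \frac{6 \times 2^{20}\ \text{bytes}}{64\ \text{bytes per line}}
  = \frac{6{,}291{,}456}{64}
  = 98{,}304
\]
```

With the upper guess of 8 state machines per line, the count would be 8 x 98,304 = 786,432.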


1. IBM: Cache-coherency protocol with upstream undefined state
https://patents.google.com/patent/US6374330

2. IBM: Cache-coherency protocol with recently read state for data and instructions
https://patents.google.com/patent/US5996049

3. NVidia: State machine control for a pipelined L2 cache to implement memory transfers for a video processor. https://patents.google.com/patent/US8493397

Thank you.

Weng
 
> In my invention there is no one single logic gate generated for comparison "WState /= WState_NS". Is it obvious to you? 

If this is from the state machine code you posted on Jan 5, I already pointed out that the "WState /= WState_NS" is not necessary in that design even though your code comment said it was needed. Logic synthesis will optimize it out. It can do that because your posted design is not an example of a gated clock design.

However, if you move "WState /= WState_NS" to create logic that is used to generate a gated clock in some fashion that is used to clock the state machine, then there will be extra logic generated to implement "WState /= WState_NS" which will consume power.

So what are you talking about...
1. Your earlier posted code that is not of a gated clock design?
2. Some other unpublished gated clock design where you are making unsubstantiated claims?

That is the best point of my invention.
Well that's too bad.

Kevin
 
Kevin,

In my invention, all state machines will be synthesized to have clock gating function, no matter whether or not it is coded to have clock gating device!

Thank you.

Weng
 
> In my invention, all state machines will be synthesized to have clock gating function, no matter whether or not it is coded to have clock gating device! 

Then anything using your invention...
-Will use additional logic. The power consumed by that logic will have to be subtracted out from whatever power savings might get realized from clocking less frequently.
-Will be impossible to get timing closure in an FPGA environment, maybe ASIC tools can handle it.
-Will consume more power in an FPGA, TBD if it will in an ASIC.
-Will not end up saving much power since state machine consume a relatively small portion of the power... the majority of the power is consumed by the data path that is being controlled.

So, function, performance, and power are all negatively impacted. Is anyone here interested?

You also didn't answer my question about if you were referring to your Jan 5 code or some unpublished code...same 'ol story with your ideas.

Kevin Jennings
 
By the way, the whole idea of not clocking a flip flop except when needed to change state is loooooong ago pre-existing knowledge. The storage device is called a toggle flip flop, the ripple counter being the classic example of a function that is easy to understand and uses the device...you did include that in your description of prior art in your patent disclosure, right?

Kevin Jennings
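
For readers who have not met the ripple counter mentioned above, a minimal sketch follows. The entity name, the WIDTH generic and the active-low asynchronous reset are assumptions for illustration. Only stage 0 sees the input clock; every later stage is clocked by the stage before it, so a flip-flop receives an edge only when it is about to toggle:

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example entity; names and reset style are assumptions.
entity ripple_counter is
  generic (WIDTH : positive := 4);
  port (
    clk   : in  std_logic;
    rst_n : in  std_logic;  -- asynchronous, active low
    count : out std_logic_vector(WIDTH - 1 downto 0)
  );
end entity ripple_counter;

architecture rtl of ripple_counter is
  signal q         : std_logic_vector(WIDTH - 1 downto 0);
  signal stage_clk : std_logic_vector(WIDTH - 1 downto 0);
begin
  -- Stage 0 is clocked by the input clock.
  stage_clk(0) <= clk;

  -- Stage i sees a rising edge when q(i-1) falls, which gives an up counter.
  gen_clk : for i in 1 to WIDTH - 1 generate
    stage_clk(i) <= not q(i - 1);
  end generate gen_clk;

  -- One toggle flip-flop per bit.
  gen_ff : for i in 0 to WIDTH - 1 generate
    tff : process (stage_clk(i), rst_n)
    begin
      if rst_n = '0' then
        q(i) <= '0';
      elsif rising_edge(stage_clk(i)) then
        q(i) <= not q(i);
      end if;
    end process tff;
  end generate gen_ff;

  count <= q;
end architecture rtl;
```

The outputs settle one stage after another rather than together, which is the usual reason this structure is avoided in synchronous designs despite its low clock activity.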
 
Hi Kevin,

1. No source code is provided for a testing bench except demonstrating my ideas.

Then anything using your invention...
2. "-Will use additional logic."
No additional logic is used except a clock gating device.

3. " The power consumed by that logic will have to be subtracted out from whatever power savings might get realized from clocking less frequently. "
No additional power is consumed on no additional logic.

4. "-Will be impossible to get timing closure in an FPGA environment"
Wrong! Xilinx has a built-in clock enable input for 8 register in a LUT6 block.

5. "-Will consume more power in an FPGA."
Wrong!

6. "TBD if it will in an ASIC." I don't know what "TBD" stands for.

7. "-Will not end up saving much power since state machine consume a relatively small portion of the power"
It is right if for a single state machine, but not correct when dealing with 100,000 state machines.

8. I just mentioned that skipping a cycle pulse would save power. No more than that is mentioned. It is not my business.

Thank you.

Weng
 
On 1/6/19 8:59 PM, Weng Tianxiang wrote:
I want to use my method in all types of circuits. A clock gating device is basically a latch. A FF with a clock enable input is a FF having a latch. Thank you.

Unless you are using the term differently than I am used to, I would
disagree somewhat.

A "latch" is, in my language, an asynchronous memory unit that copies
its input to its output for one level of the enable, and the output holds
its current value for the other level of the enable. It is one of the
more primitive memory units.

A latch could be used for clock gating, but is highly inefficient for
doing so, as a properly designed clock gate knows what state the
output should be in the gated-off state, so doesn't need the logic to
maintain current state. The clock gating device is basically a GATE.

There may be a way to use a latch to build a gated ff, but again, there
are simpler methods with better timing.
 
On Tuesday, January 8, 2019 at 8:04:01 PM UTC-5, Weng Tianxiang wrote:
Hi Kevin,

1. No source code is provided for a testing bench except demonstrating my ideas.

You stated in an earlier post "In my invention there is no one single logic gate generated for comparison "WState /= WState_NS". Is it obvious to you?" but the code being referenced was not from a gated clock design so there is nothing 'demonstrating your idea' whatever that may be.

Then anything using your invention...
2. "-Will use additional logic."
No additional logic is used except a clock gating device.

Did you not even notice your use of the word 'except' after you typed it?

No matter. So this 'clock gating device', either has only one input (which is the only thing that would not require logic resource to implement) or it has more than one input and can generate the correct gated clock output without any logic resources, which means it works by magic. The absurdity meter is pegged at the highest setting with this claim of yours.

3. " The power consumed by that logic will have to be subtracted out from whatever power savings might get realized from clocking less frequently. "
No additional power is consumed on no additional logic.

Well of course. Why would the 'operates by magic' clock gating device which is only needed with your "invention" require any power in order to operate? Absurdity meter has gone off scale.

4. "-Will be impossible to get timing closure in an FPGA environment"
Wrong! Xilinx has a built-in clock enable input for 8 register in a LUT6 block.

You've been told this before by others, but a clock enable input is not the same thing as a gated clock. Specifically, in typical electrical engineering parlance, a 'clock enable' signal modifies the data input to a flip flop, not the clock input. 'Clock enable' signals do not modify the clock in any way. Do some more research, this is a pretty basic logic design concept.

5. "-Will consume more power in an FPGA."
Wrong!

I am correct and I sent you the full details back in 2010. The governing NDA for that work is no longer in force but I won't post all the details here that back my claim in order to avoid embarrassing you any further. If you would like to post your actual design, methods and measurements here to provide evidence to justify your stance, feel free. Simply making statements and claims is not evidence.

6. "TBD if it will in an ASIC." I don't know what "TBD" stands for.

You seem to have a lot of outages of Google at your place.

7. "-Will not end up saving much power since state machine consume a relatively small portion of the power"
It is right if for a single state machine, but not correct when dealing with 100,000 state machines.

No, the number of state machines does not matter since they will (or should) be controlling much larger stuff that would consume the bulk of the power. If you have 100,000 state machines controlling 10,000 things in a data path, you likely have incompetently designed state machines.

8. I just mentioned that skipping a cycle pulse would save power. No more than that is mentioned. It is not my business.

Yes, you stated that but can provide no evidence to back that claim. Without that, you're just making unfounded statements, many of which are clearly incorrect and have been pointed out to you...for many years now.

Kevin Jennings
 
Hi Richard,

I don't think so:
"The clock gating device is basically a GATE"!

Kevin,
"No, the number of state machines does not matter since they will (or should) be controlling much larger stuff that would consume the bulk of the power. If you have 100,000 state machines controlling 10,000 things in a data path, you likely have incompetently designed state machines. "

One state machine controls the status of a 64-byte L2 cache line, and 100,000 state machines fully control the status of a 6M L2 cache. They do not control the data path! Their states affect how each L2 cache line behaves.

If you have time, have a look at the following 2 patents; at least you can then understand what each of those 100,000 state machines is and how it works.

Thank you.

Weng
 
Hi Kevin,

I thank you for your help many years ago.

It is not correct:
"a 'clock enable' signal modifies the data input to a flip flop, not the clock input. 'Clock enable' signals do not modify the clock in any way."

When a CLOCK ENABLE is deasserted, no clock pulse will feed a FF, and the FF will keep unchanged on the next cycle. If a CLOCK ENABLE is asserted, a clock pulse will feed a FF, and the FF will be updated on the next cycle.

Thank you.

Weng
 
Hi Richard and Kevin,

Here is a copy from Wikipedia "clock gating":
https://en.wikipedia.org/wiki/Clock_gating

Clock gating is a popular technique used in many synchronous circuits for reducing dynamic power dissipation. Clock gating saves power by adding more logic to a circuit to prune the clock tree. Pruning the clock disables portions of the circuitry so that the flip-flops in them do not have to switch states. Switching states consumes power. When not being switched, the switching power consumption goes to zero, and only leakage currents are incurred.[1]

Clock gating works by taking the enable conditions attached to registers, and uses them to gate the clocks. A design must contain these enable conditions in order to use and benefit from clock gating. This clock gating process can also save significant die area as well as power, since it removes large numbers of muxes and replaces them with clock gating logic. This clock gating logic is generally in the form of "integrated clock gating" (ICG) cells. However, the clock gating logic will change the clock tree structure, since the clock gating logic will sit in the clock tree.

Clock gating logic can be added into a design in a variety of ways:

Coded into the register transfer level (RTL) code as enable conditions that can be automatically translated into clock gating logic by synthesis tools (fine grain clock gating).
Inserted into the design manually by the RTL designers (typically as module level clock gating) by instantiating library specific integrated clock gating (ICG) cells to gate the clocks of specific modules or registers.
Semi-automatically inserted into the RTL by automated clock gating tools. These tools either insert ICG cells into the RTL, or add enable conditions into the RTL code. These typically also offer sequential clock gating optimisations.

Any RTL modifications to improve clock gating will result in functional changes to the design (since the registers will now hold different values) which need to be verified.

Sequential clock gating is the process of extracting/propagating the enable conditions to the upstream/downstream sequential elements, so that additional registers can be clock gated.

Although asynchronous circuits by definition do not have a "clock", the term perfect clock gating is used to illustrate how various clock gating techniques are simply approximations of the data-dependent behavior exhibited by asynchronous circuitry. As the granularity on which you gate the clock of a synchronous circuit approaches zero, the power consumption of that circuit approaches that of an asynchronous circuit: the circuit only generates logic transitions when it is actively computing.[2]

Chips intended to run on batteries or with very low power, such as those used in mobile phones, wearable devices, etc., would implement several forms of clock gating together. At one end is the manual gating of clocks by software, where a driver enables or disables the various clocks used by a given idle controller. On the other end is automatic clock gating, where the hardware can be told to detect whether there's any work to do, and turn off a given clock if it is not needed. These forms interact with each other and may be part of the same enable tree. For example, an internal bridge or bus might use automatic gating so that it is gated off until the CPU or a DMA engine needs to use it, while several of the peripherals on that bus might be permanently gated off if they are unused on that board.

Weng
 
On Tue, 8 Jan 2019 21:10:58 -0800 (PST)
Weng Tianxiang <wtxwtx@gmail.com> wrote:

It is not correct:
"a 'clock enable' signal modifies the data input to a flip flop, not the clock input. 'Clock enable' signals do not modify the clock in any way."

I heard long ago that the 'clock enable' signal in Xilinx
FPGAs does not affect the clock signal. This is likely to allow
sharing clock edge detection, and to minimise the routing to a
block of flops with shared clock signal. Patent here [1].

> When a CLOCK ENABLE is deasserted, no clock pulse will feed a FF, and the FF will keep unchanged on the next cycle. If a CLOCK ENABLE is asserted, a clock pulse will feed a FF, and the FF will be updated on the next cycle.

This is from a logic user's guide, with simplified explanation
based on the implementation of a single flip-flop, and is not
intended to be a circuit description.

Suggestion: 'The hardest thing to know is (the extent of) what
we do not know.'

Thank you.

Weng

Jan Coombs
--

[1] Clock enable control circuit for flip flops
United States Patent 6466049 [2002]
http://www.freepatentsonline.com/6466049.html
 
In article <20190109105441.5be0a472@t530>,
jenfhaomndgfwutc@murmic.plus.com says...
I heard long ago that the 'clock enable' signal in Xilinx
FPGAs does not affect the clock signal. This is likely to allow
sharing clock edge detection, and to minimise the routing to a
block of flops with shared clock signal. Patent here [1].

Xilinx recommends that clock gating be fed through a BUFGCE to prevent skew
and timing issues (you also gain good fanout, of course) if feeding large
enough numbers of blocks.
Vivado automatically moves the gating to the enable path for flip-flops
or latches (this can be manually overridden, though I've not done that yet).

As for writing patents based on other people's patents, this thread confirms
the obvious:

To quote Daniel Whitehall:
"Discovery requires experimentation"
Marvel's Agents of S.H.I.E.L.D.

john

=========================
http://johntech.co.uk
=========================
 
On Wednesday, 9 January 2019 at 06:31:03 UTC+1, Weng Tianxiang wrote:
Hi Kevin,

I thank you for your help many years ago.

It is not correct:
"a 'clock enable' signal modifies the data input to a flip flop, not the clock input. 'Clock enable' signals do not modify the clock in any way."

When a CLOCK ENABLE is deasserted, no clock pulse will feed a FF, and the FF will keep unchanged on the next cycle. If a CLOCK ENABLE is asserted, a clock pulse will feed a FF, and the FF will be updated on the next cycle.

In theory a "clock enable" gates the clock line, but in reality it usually switches only the data path to the FF.
In most technologies, the enable of a FF with clock enable is used synchronously.
If you zoom into a typical clock-enable FF, you will find the following hardware implemented.
(view in a fixed-width font)

        +-----------------------------+
        |                             |
        |   +-----+     +-------+     |
        +---|     |     |       |     |
            | MUX |-----|D     Q|-----+------
D ----------|     |     |       |
            +-----+     |  FF   |
               |        |       |
Enable --------+        |       |
                        |       |
Clock ------------------|>      |
                        +-------+

A clock tree is the tree of buffers (inverters) between the clock source and each FF, and the gating is often performed on a dedicated branch of the clock tree that is not a leaf.
It is of course possible, and most flexible, to gate the clock directly before the FF (and therefore at the end of the leaf), but this has the least power-saving effect and the worst impact on resource usage.
The best effect is gained when gating as close as possible to the clock source.
On the other hand, this is not trivial: the clock tree without any clock gate might connect 8 FFs that are functionally close together on the same leaf of the clock tree, but if only one of these 8 FFs should be gated, then you need to move the gated FF from the non-gated branch to a gated branch. That might connect this FF to other FFs that are physically located further away, increasing routing effort and routing delay.

In many cases, the switching power of a clock tree with clock gating only at the FF itself is no smaller than that of the same tree with synchronous data gating: in both implementations the FF holds its outputs constant when "gated", and the load presented by a clock gate at the FF is the same as the load of the FF itself.
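The argument about where the gate sits in the tree can be made concrete with a toy count of clocked loads. The following is only a sketch under loudly stated assumptions: the function `clocked_loads`, the two-level tree, and the rule that every buffer, gate input and FF clock pin counts as one unit load are all invented for illustration and are not measured data. All FFs on a branch are assumed to share one enable.

```python
def clocked_loads(branches, scheme):
    """Count unit loads toggled by the clock in one cycle (toy model).

    branches: list of (n_ffs, enabled) tuples, one per clock-tree branch.
    scheme:   "data_enable" - free-running clock, mux on each D input
              "leaf_gate"   - one clock gate directly in front of each FF
              "branch_gate" - one clock gate at the root of each branch
    Assumption: every buffer, gate input and FF clock pin is one unit load.
    """
    loads = 1  # the root buffer always toggles
    for n_ffs, enabled in branches:
        loads += 1  # branch buffer, or the input of the branch clock gate
        if scheme == "data_enable":
            loads += n_ffs              # every FF clock pin toggles
        elif scheme == "leaf_gate":
            loads += n_ffs              # every leaf gate input toggles
            if enabled:
                loads += n_ffs          # FF pins behind open gates
        elif scheme == "branch_gate":
            if enabled:
                loads += n_ffs          # FF pins behind the open branch gate
        else:
            raise ValueError(f"unknown scheme: {scheme}")
    return loads


# Four branches of 8 FFs each; only the second branch is active this cycle.
tree = [(8, False), (8, True), (8, False), (8, False)]
data_enable = clocked_loads(tree, "data_enable")
leaf_gate = clocked_loads(tree, "leaf_gate")
branch_gate = clocked_loads(tree, "branch_gate")
```

In this toy model, gating at the leaf toggles no fewer loads than the synchronous enable, while gating at the branch removes most of them. Real numbers depend on the technology and on the actual capacitances.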

From a timing point of view, the synchronous enable has a strong advantage over clock gating and is therefore easier to handle in layout.

regards,

Thomas
 
On Tuesday, January 8, 2019 at 6:23:40 PM UTC-5, Weng Tianxiang wrote:
Kevin,

In my invention, all state machines will be synthesized to have clock gating function, no matter whether or not it is coded to have clock gating device!

Then your invention will optimally use the toggle flip flop as the fundamental storage device. There are several flavors of basic flip flops: SR (set-reset), JK (improved set-reset), T (toggle) and D. The industry has long since settled on using essentially only the D type and presumably has optimized that one. So to use your invention one would have to either use a non-optimal flip flop or construct it from the D type, which presumably would be less optimal than if it were a true T type.
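For reference, constructing a T type from a D type is the textbook feedback D = T XOR Q. The following is a minimal behavioural sketch in Python; the class names `DFlipFlop` and `TFlipFlopFromD` are hypothetical and say nothing about how any particular synthesis tool or library cell implements it.

```python
class DFlipFlop:
    """Ideal D flip-flop: Q takes the value of D on each clock edge."""

    def __init__(self):
        self.q = 0

    def clock(self, d):
        self.q = d & 1
        return self.q


class TFlipFlopFromD:
    """T flip-flop built from a D flip-flop using D = T XOR Q."""

    def __init__(self):
        self._dff = DFlipFlop()

    @property
    def q(self):
        return self._dff.q

    def clock(self, t):
        # T = 1 toggles the stored bit, T = 0 holds it.
        return self._dff.clock(self._dff.q ^ (t & 1))


tff = TFlipFlopFromD()
trace = [tff.clock(t) for t in [1, 0, 1, 1, 0]]
```

With T = 0 the stored bit does not change, which is the same "hold" behaviour that a gated clock provides by suppressing the clock pulse.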

If the industry had settled on using only T flip flops then we would all be doing gated clock designs now. But just because it hasn't does not mean that the T flip flop and the associated gated clock logic required to use that flip flop type is not already existing prior art. It is simply prior art that is not widely used. A single logic description can be synthesized to use any of the basic flip flop types inherent in the underlying hardware. So the mapping of some VHDL/Verilog source code to be implemented using T flip flops as storage is not novel.

While nearly every invention is a new, novel use that builds on prior art, your apparent claim here, "all state machines will be synthesized to have clock gating function", is nothing more than stating that "all state machines will be synthesized using T flip flops", which is neither new nor novel. The limitation to "all state machines" rather than "all memory storage" is a restriction over what already exists, so that is not novel either.

Kevin Jennings
 
