My invention: Coding wave-pipelined circuits with buffering

Weng Tianxiang · Jan 11, 2018

Hi,

A wave-pipelined circuit has the same logic as its pipelined counterpart, except that the wave-pipelined circuit has only one stage: a critical path from the input register through a piece of computational logic to the output register, with no intermediate registers.

The core idea of my invention is: a designer provides the least information and logic code about the critical path, and leaves all the complex logic design to a synthesizer and a system library, which is what an HDL should do.

Coding takes 3 steps:
1. Write a Critical Path Component (CPC) with a defined interface;

2. Instantiate a Wave-Pipelining Component (WPC) provided by a system library;

3. Use one of 3 link statements to link the CPC instantiation with its paired WPC instantiation, specifying what your target is.

Here is the complete code for a 64*64-bit signed integer multiplier, C <= A*B.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
-- use work.wave_pipeline_package.all; -- author's system library (WPCs, link statements);
--                                     -- not included here, and CPC_1_2 itself uses nothing from it

-- CPC code for wave-pipelined 64-bit signed integer multiplier C <= A*B
-- CPC_1_2 is linked with SMB by link1() / link2() if "wave" is accepted in VHDL
-- link1(): generation would fail if the circuit cannot accept 1 data per cycle
-- link2(): generation never fails and the circuit is capable of accepting 1 data per
--          INPUT_CLOCK_NUMBER cycles

entity CPC_1_2 is
  generic (
    input_data_width  : positive := 64;  -- optional
    output_data_width : positive := 128  -- optional; must equal 2 * input_data_width
  );
  port (
    CLK    : in  std_logic;
    WE_i   : in  std_logic;                              -- '1': write enable to input registers A & B
    Da_i   : in  signed(input_data_width-1 downto 0);    -- input data A
    Db_i   : in  signed(input_data_width-1 downto 0);    -- input data B
    WE_o_i : in  std_logic;                              -- '1': write enable to output register C
    Dc_o   : out unsigned(output_data_width-1 downto 0)  -- output data C
  );
end CPC_1_2;

architecture A_CPC_1_2 of CPC_1_2 is
  signal Ra : signed(input_data_width-1 downto 0);   -- input register A
  signal Rb : signed(input_data_width-1 downto 0);   -- input register B
  signal Rc : signed(output_data_width-1 downto 0);  -- output register C
  signal Cl : signed(output_data_width-1 downto 0);  -- combinational logic
begin
  -- combinational logic output, key part of CPC
  -- (numeric_std "*" returns input_data_width + input_data_width bits)
  Cl   <= Ra * Rb;
  Dc_o <= unsigned(Rc);  -- output through output register

  p_1 : process(CLK)
  begin
    if rising_edge(CLK) then
      if WE_i = '1' then    -- WE_i = '1': latch input data
        Ra <= Da_i;
        Rb <= Db_i;
      end if;

      if WE_o_i = '1' then  -- WE_o_i = '1': latch output data
        Rc <= Cl;
      end if;
    end if;
  end process;

  --------------------------------------------------------------------------------

end A_CPC_1_2;
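The three steps above hand the timing work to the synthesizer, but some control logic still has to pulse WE_i and WE_o_i at the right moments. The following is a minimal sketch of that bookkeeping only, assuming the latency (LATENCY clock periods from the input-register edge to the output-register edge) and the input interval (INPUT_CLOCK_NUMBER, the name used in the comments above) are already known. The entity name wpc_enable_sketch and its READY/VALID_IN/VALID_OUT handshake are hypothetical; this is not the author's SMB or any other WPC from his library, and it does nothing to guarantee the path timing itself.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical, minimal WPC-like enable generator (NOT the author's SMB).
-- Assumes LATENCY and INPUT_CLOCK_NUMBER are fixed and known in advance.
entity wpc_enable_sketch is
  generic (
    LATENCY            : positive := 5;  -- clock periods from input-register edge to output-register edge
    INPUT_CLOCK_NUMBER : positive := 1   -- accept at most one pair every INPUT_CLOCK_NUMBER cycles
  );
  port (
    CLK       : in  std_logic;
    RST       : in  std_logic;  -- synchronous, active high
    VALID_IN  : in  std_logic;  -- a new (A, B) pair is present on Da_i/Db_i
    READY     : out std_logic;  -- '1': a new pair may enter this cycle
    WE_i      : out std_logic;  -- to CPC: latch Ra/Rb on the next edge
    WE_o_i    : out std_logic;  -- to CPC: latch Rc on the next edge
    VALID_OUT : out std_logic   -- '1': Dc_o holds a new result
  );
end entity;

architecture rtl of wpc_enable_sketch is
  signal cnt     : natural range 0 to INPUT_CLOCK_NUMBER - 1 := 0;     -- cycles left before next accept
  signal sr      : std_logic_vector(LATENCY downto 0) := (others => '0');  -- one bit per wave in flight
  signal ready_s : std_logic;
  signal we_s    : std_logic;
begin
  ready_s <= '1' when cnt = 0 else '0';
  we_s    <= VALID_IN and ready_s;

  READY     <= ready_s;
  WE_i      <= we_s;
  WE_o_i    <= sr(LATENCY - 1);  -- high in the cycle before the edge LATENCY edges after launch
  VALID_OUT <= sr(LATENCY);      -- high once Rc has captured that wave

  process (CLK)
  begin
    if rising_edge(CLK) then
      if RST = '1' then
        cnt <= 0;
        sr  <= (others => '0');
      else
        sr <= sr(LATENCY - 1 downto 0) & we_s;  -- shift every launch along with its wave
        if we_s = '1' then
          cnt <= INPUT_CLOCK_NUMBER - 1;        -- block further inputs for the interval
        elsif cnt /= 0 then
          cnt <= cnt - 1;
        end if;
      end if;
    end if;
  end process;
end architecture;
```

With LATENCY => 1 and INPUT_CLOCK_NUMBER => 1 this degenerates to an ordinary single-cycle register-to-register design; larger values only change how many launches the shift register tracks at once.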

In summary, once HDLs adopt my system, writing a wave-pipelined circuit will be as simple as writing a one-cycle logic circuit.

Thank you.

Weng

Weng Tianxiang · Jan 12, 2018

On Wednesday, January 10, 2018 at 5:56:45 PM UTC-8, Weng Tianxiang wrote:

...

Hi,

The following information is from Wikipedia:

1. The Intel 8087, announced in 1980, was the first x87 floating-point coprocessor for the 8086 line of microprocessors.

2. MMX is a single instruction, multiple data (SIMD) instruction set designed by Intel, introduced in 1997 with its P5-based Pentium line of microprocessors, designated as "Pentium with MMX Technology".[1] It developed out of a similar unit introduced on the Intel i860,[2] and earlier the Intel i750 video pixel processor. MMX is a processor supplementary capability that is supported on recent IA-32 processors by Intel and other vendors.

MMX has subsequently been extended by several programs by Intel and others: 3DNow!, Streaming SIMD Extensions (SSE), and ongoing revisions of Advanced Vector Extensions (AVX).

The 8087's 64-bit floating-point multiplier needs 5 cycles to process one set of data and accepts one set of input data per cycle.

The MMX 64-bit floating-point multiplier needs 4 cycles to process one set of data and accepts one set of input data per 2 cycles.

Because each multiplication needs one multiplicand A and one multiplier B to produce the result C, many benchmarks naturally claim that the MMX 64-bit floating-point multiplier is 20% faster than the 8087's (4 cycles vs. 5 cycles).
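Separating latency from throughput makes this comparison concrete. Using only the cycle counts stated above (taken as given, not checked against the real chips) and writing n for the number of operand pairs, which is my own symbol, the total cycles to finish a stream are the latency plus one initiation interval per additional pair:

```latex
T_{8087}(n) = 5 + (n-1)\cdot 1 = n + 4, \qquad
T_{\mathrm{MMX}}(n) = 4 + (n-1)\cdot 2 = 2n + 2

T_{\mathrm{MMX}}(n) < T_{8087}(n) \iff 2n + 2 < n + 4 \iff n < 2
```

So with these figures the 4-vs-5 comparison holds only for a single isolated multiplication; the two tie at n = 2, and for n of 3 or more the unit that accepts one pair per cycle finishes first, approaching twice the throughput.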

With my invention, any college student with knowledge of HDL could write an MMX wave-pipelined 64-bit floating-point multiplier within half an hour under the following conditions:

1. My invented system is fully accepted into HDL;

2. Synthesizer vendors have updated their products to handle the generation of the related wave-pipelined circuits.
All the related technology and algorithms are available off the shelf.

3. It takes time.

One wonderful wave-pipelined circuit, I think, may be a 16-channel FFT processor using wave-pipelining technology: the benefits are a higher running frequency and large savings in logic area and power consumption.

Thank you.

Weng

Rick C. Hodgin · Jan 13, 2018

Do you have a YouTube example? And an example that will
synthesize in Icarus? So we can see how your method compares to a
standard example.

--
Rick C. Hodgin

Weng Tianxiang · Jan 13, 2018

On Saturday, January 13, 2018 at 1:31:17 PM UTC-8, Rick C. Hodgin wrote:

> Do you have a YouTube example? And an example that will
> synthesize in Icarus? So we can see how your method compares to a
> standard example.

Hi Rick,

Actually, I have been issued 3 patents on the subject:

1. 9,747,252: Systematic method of coding wave-pipelined circuits in HDL.
2. 9,734,127: Systematic method of synthesizing wave-pipelined circuits in HDL.
3. 9,575,929: Apparatus of wave-pipelined circuits.

All 3 patents have the same specification, drawings and abstract, with different claims.

I have also filed a new non-provisional patent application, 15,861,093 ("application" hereafter), "Coding wave-pipelined circuits with buffering function in HDL", with the USPTO on 2018/01/03.

The non-provisional patent application 15,861,093 has a *.txt (*.vhd) source file attached, so nothing is secret: anyone interested in the subject can email me for what he wants and I will email the files to him, even though the full application set will only be published 18 months later.

The following is part of my promotional material sent to some big companies:

"The new application can be viewed to some extent as the logical continuation of the 3 patents, but legally it is a brand-new invention devoted mainly to coding the buffering function for wave-pipelined circuits in HDL, a topic never mentioned in the 3 patents, while still paying great attention to improving the 3 patents to make them more robust, friendlier and more complete from a coding designer's point of view."

The 3 previous patents had a first version of the source code attached; the new application provides the second version. With the 2nd version of the VHDL source code you can use a VHDL-2002 or later simulator to simulate all the workings and generate waveforms. The source file is also well commented, with debugging code inserted.

Please email me what you want me to send:
For the 3 patents:
1.1. Specification.

1.2. 3 sets of claims.

1.3. Drawings.

1.4. Source code.

1.5. ZIP file of all the above.

For the new application:
2.1. Specification.

2.2. Claims.

2.3. Drawings.

2.4. Abstract.

2.5. Source code.

2.6. ZIP file of all the above.

For the new application, the specification has 81 pages, the 48 claims take 15 pages, and the drawings take 24 pages.

If you are short of time, the best way to learn all the working structures is to read only 2.1. Specification, 2.3. Drawings, and 2.4. Abstract.

Because the targets of my patents and the new application are a) to make my invented system part of HDL (not only VHDL, but all HDLs), and b) to make the source code part of the system library in HDL, I am willing to distribute my code and all related files to anyone who is really interested in how I did it.

Through CPC_1_2 you may see that my scheme needs the least logic information and coding from a designer to resolve a very difficult problem, one that has been open for almost 50 years.

My Email address is wtx wtx @ gmail . com (please remove spaces between characters)

Thank you.

Weng

Weng Tianxiang · Jan 17, 2018

On Wednesday, January 10, 2018 at 5:56:45 PM UTC-8, Weng Tianxiang wrote:

...

Hi,

Here is more information on the WPC (Wave-Pipelining Component), provided by a system library (which I wrote).

1. Only 2 WPCs are needed to cover all wave-pipelined circuits:
a) one is used when only one critical path is used;
b) the other is used when more than one copy of the same critical path is used.

2. Based on my classification, all wave-pipelined circuits have one of 5 types of structure:
a) A one-cycle non-pipelined circuit: it is coded as a wave-pipelined circuit, but finally turns out to be a regular 1-cycle circuit.

b) A wave-pipelined circuit that can accept one input data item per cycle with one critical path.

c) A wave-pipelined circuit that can accept one input data item per multiple cycles with one critical path.

d) A wave-pipelined circuit that can accept one input data item per cycle with more than one critical path, each critical path having its own input register and output register.

e) A wave-pipelined circuit that can accept one input data item per cycle with more than one critical path, each critical path having its own input register and sharing a single output register.

3. The method guarantees a 100% success rate in generating a specific wave-pipelined circuit.

Thank you.

Weng

Jan Coombs · Jan 17, 2018

On Sat, 13 Jan 2018 13:31:14 -0800 (PST)
"Rick C. Hodgin" <rick.c.hodgin@gmail.com> wrote:

> Do you have a YouTube example? And an example that will
> synthesize in Icarus? So we can see how your method compares to a
> standard example.

There is perhaps some explanation in "Wave-Pipelining: A
Tutorial and Research Survey"[1], and "DESIGN AND TIMING
ANALYSIS OF WAVE PIPELINED CIRCUITS"[2].

Jan Coombs
--

[1] IEEE Transactions on VLSI Systems
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.90.1783&rep=rep1&type=pdf

[2] Recep Ozgun's MSc thesis
https://soar.wichita.edu/bitstream/handle/10057/383/t06064.pdf?sequence=3

Weng Tianxiang · Jan 17, 2018

On Tuesday, January 16, 2018 at 11:40:55 PM UTC-8, Jan Coombs wrote:

...

Hi Jan,

I appreciate your efforts to dig deep into my inventions. I would like to patiently answer all reasonable technical questions.

Your reference [1] is exactly what inspired me to resolve the open problem: designing both a coding method and a synthesizing method so that any logic design engineer, including college students with basic knowledge of HDL, can code and generate a wave-pipelined circuit.

All the published material I have read centers on how to eliminate data contamination, a phenomenon never encountered in non-wave-pipelined circuit design.

Data contamination occurs when later-entered data catches up with earlier-entered data and corrupts it.
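For reference, the contamination condition is usually expressed as a pair of timing inequalities. What follows is the simplified textbook form (it ignores clock skew and jitter, which fuller treatments add) and is not part of the author's method, which deliberately leaves timing to the synthesizer. The symbols are my own: T is the clock period, N the number of clock periods between the edge that launches data item k from the input register and the edge that captures it in the output register, D_max and D_min the longest and shortest delays from the input-register clock edge to the output-register input, and t_su, t_h the output register's setup and hold times.

```latex
D_{\max} + t_{su} \le N\,T
\quad \text{(item $k$ has settled before its capture edge)}

T + D_{\min} \ge N\,T + t_h
\;\Longleftrightarrow\;
D_{\min} - t_h \ge (N-1)\,T
\quad \text{(item $k+1$ does not disturb it before the hold time ends)}

\text{Subtracting the second from the first:}\quad
T \ge (D_{\max} - D_{\min}) + t_{su} + t_h
```

So the clock period is bounded by the spread of path delays rather than by the total delay, which is why wave-pipelining work concentrates on balancing paths. Across process, voltage and temperature, D_max comes from the slowest corner and D_min from the fastest, which widens the spread.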

What my inventions do is build a bridge between code designers and synthesizers, so that a wave-pipelined circuit can be coded and generated in the easiest way:

If a code designer provides all the necessary and sufficient information to a synthesizer, the synthesizer should and can generate the wave-pipelined circuit as specified.

Your reference [1] (1998), on page 142 below Table 1, states that "Last, due to a lack of commercial tools that are directly applicable to designs using wave-pipelining, each group has more or less developed in-house design analysis and optimization tools which enable VLSI design using wave-pipelining."

So at the beginning of my project I assumed that if a new wave-pipelining part of the HDL standards were well designed and laid out, any synthesizer vendor would have the ability to generate a wave-pipelined circuit. The assumption was also based on Table 1 on page 142 of your reference [1] (1998), which lists 30 wave-pipelined circuits (20 years ago), none of whose authors had any relationship with a professional synthesizer vendor.

Furthermore, during development I found that no matter how many types of wave-pipelined circuits there have been or will be, every wave-pipelined circuit comprises two parts: one is the critical path, represented by a CPC (Critical Path Component); all the remaining logic, the WPC (Wave-Pipelining Component), is always the same for a group of wave-pipelined circuits and depends on what target a designer wants for his circuit.

In my design, no timing related to a wave-pipelined circuit ever appears, because timing is within the scope of the synthesizer's operation and has nothing to do with coding.

No commercial synthesizer in the world can directly generate a wave-pipelined circuit. To prove my WPCs are correct, I coded a CPC that does nothing but pass the data through the critical path while obeying critical-path behavior: if the critical path needs 5 cycles for signals to travel, its output becomes available in 6 cycles, and if the critical path is blocked, a later-entered data item would have a chance to corrupt an earlier-entered one if the design were not right. So essentially I used no sophisticated tools and no timing analysis.
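A pass-through test in this spirit can be approximated in plain VHDL simulation with transport delays. The sketch below is a hypothetical, simulation-only stand-in for CPC_1_2 (the entity cpc_wave_model and the generics CP_DMIN and CP_DMAX are my own names with assumed values), not the author's test CPC, and its output is kept signed for simplicity. Each new operand pair makes the output unknown from CP_DMIN to CP_DMAX after launch and then shows the product. Because transport assignments keep earlier pending transactions, several data items can be in flight at once; when a later item's unknown window starts before an earlier item has settled, VHDL's transport rule deletes the earlier pending value, which is the data contamination described above.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical, simulation-only model of a CPC_1_2-style critical path (not synthesizable).
entity cpc_wave_model is
  generic (
    input_data_width : positive := 64;
    CP_DMIN          : time     := 42 ns;  -- assumed shortest path delay (must be < CP_DMAX)
    CP_DMAX          : time     := 48 ns   -- assumed longest path delay
  );
  port (
    CLK    : in  std_logic;
    WE_i   : in  std_logic;
    Da_i   : in  signed(input_data_width - 1 downto 0);
    Db_i   : in  signed(input_data_width - 1 downto 0);
    WE_o_i : in  std_logic;
    Dc_o   : out signed(2 * input_data_width - 1 downto 0)
  );
end entity;

architecture sim of cpc_wave_model is
  constant ALL_X : signed(2 * input_data_width - 1 downto 0) := (others => 'X');
  signal Ra, Rb  : signed(input_data_width - 1 downto 0) := (others => '0');
  signal Cl, Rc  : signed(2 * input_data_width - 1 downto 0) := (others => '0');
begin
  -- Every change of (Ra, Rb) launches a wave: Cl is unknown from +CP_DMIN to +CP_DMAX,
  -- then shows the product. Transport delay keeps older waves that are still pending,
  -- but deletes any pending value scheduled at or after the new wave's first transaction.
  Cl   <= transport ALL_X after CP_DMIN, Ra * Rb after CP_DMAX;
  Dc_o <= Rc;

  process (CLK)
  begin
    if rising_edge(CLK) then
      if WE_i = '1' then    -- launch: latch input data
        Ra <= Da_i;
        Rb <= Db_i;
      end if;
      if WE_o_i = '1' then  -- capture: latch output data
        Rc <= Cl;
      end if;
    end if;
  end process;
end architecture;
```

With a 10 ns clock, CP_DMIN = 42 ns and CP_DMAX = 48 ns, a product launched at one edge is stable in Cl when the fifth following edge captures it; widening the spread beyond one clock period (for example 40 ns to 52 ns) makes the earlier product vanish from Cl entirely.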

Thank you.

Weng

Weng Tianxiang · Jan 19, 2018

Hi,

I have said that the core idea of my invention is: a designer provides the least information and logic code about the critical path, and leaves all the complex logic design to a synthesizer and a system library, which is what an HDL should do.

Here are the key technical points I used to fully develop my technique, assuming you are an experienced HDL code designer.

Even though the technique is tricky, it is easy to understand once you fully grasp the concepts in this post and the next; for 80% of the engineers here, each takes 20 minutes or more.

Here I am using the 64*64-bit signed multiplier as the example target circuit.

1. If my CPC_1_2 code is presented to a synthesizer, the first question you may ask is how to code your WPC (Wave-Pipelining Component). For clarity, I have copied the CPC_1_2 code here again.

By the way, I claim that nobody can further simplify the CPC_1_2 code and still deliver full information about a critical path to a synthesizer for generating a wave-pipelined circuit! If you can, please challenge my claim.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity CPC_1_2 is
  generic (
    input_data_width  : positive := 64;  -- optional
    output_data_width : positive := 128  -- optional; must equal 2 * input_data_width
  );
  port (
    CLK    : in  std_logic;
    WE_i   : in  std_logic;                              -- '1': write enable to input registers A & B
    Da_i   : in  signed(input_data_width-1 downto 0);    -- input data A
    Db_i   : in  signed(input_data_width-1 downto 0);    -- input data B
    WE_o_i : in  std_logic;                              -- '1': write enable to output register C
    Dc_o   : out unsigned(output_data_width-1 downto 0)  -- output data C
  );
end CPC_1_2;

architecture A_CPC_1_2 of CPC_1_2 is
  signal Ra : signed(input_data_width-1 downto 0);   -- input register A
  signal Rb : signed(input_data_width-1 downto 0);   -- input register B
  signal Rc : signed(output_data_width-1 downto 0);  -- output register C
  signal Cl : signed(output_data_width-1 downto 0);  -- combinational logic
begin
  Cl   <= Ra * Rb;       -- combinational logic output, key part of CPC
  Dc_o <= unsigned(Rc);  -- output through output register

  p_1 : process(CLK)
  begin
    if rising_edge(CLK) then
      if WE_i = '1' then    -- WE_i = '1': latch input data
        Ra <= Da_i;
        Rb <= Db_i;
      end if;

      if WE_o_i = '1' then  -- WE_o_i = '1': latch output data
        Rc <= Cl;
      end if;
    end if;
  end process;
end A_CPC_1_2;

2. Assume 3 situations:
a) If you know that each data item needs 5 cycles to pass through the 64*64-bit signed multiplier and that the circuit can accept one data item per cycle, you should know how to code the WPC for the circuit, because we have already assumed that the synthesizer is capable of generating the wave-pipelined circuit, leaving the most difficult task to the synthesizer. By definition, a WPC contains all the remaining logic of the circuit except CPC_1_2.

b) If you know that each data item needs 5 cycles to pass through the 64*64-bit signed multiplier and that the circuit can accept one data item per 2 cycles, you should know how to code the WPC for the circuit.

c) If you know that each data item needs 5 cycles to pass through the 64*64-bit signed multiplier and that the circuit can accept one data item per 2 cycles, but the designer wants the circuit to accept one data item per cycle, you should know how to code the WPC for the circuit with 2 copies of the critical path, each alternately accepting an input data item every 2 cycles. Actually, all CPCs follow one of 2 code patterns: CPC_1_2 is one of them, and the other, CPC_3, is slightly more complex but is also an off-the-shelf coding pattern. In this situation CPC_3 code would replace CPC_1_2, with the same input and output interfaces.
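Situation c) can be pictured with a small dispatcher. The sketch below is hypothetical (the entity rr2_dispatch_sketch, its ports and the generic LATENCY are my own names) and is not the author's CPC_3 or WPC. It alternates input pairs between two CPC_1_2-style copies, so each copy sees at most one pair every 2 cycles while the pair of copies accepts one per cycle; it pulses each copy's output-register enable LATENCY clock edges after that copy's launch, and tells a shared output multiplexer which copy's Dc_o holds the newest result. It assumes each copy keeps its own output register, as in structure d) of the classification earlier in the thread.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical 2-way round-robin dispatcher for two CPC_1_2-style copies (not CPC_3).
entity rr2_dispatch_sketch is
  generic (LATENCY : positive := 5);  -- clock edges from launch to capture, per copy
  port (
    CLK, RST  : in  std_logic;                     -- RST: synchronous, active high
    VALID_IN  : in  std_logic;                     -- one (A, B) pair offered per cycle
    WE_i      : out std_logic_vector(1 downto 0);  -- input-register enables of copy 1 / copy 0
    WE_o_i    : out std_logic_vector(1 downto 0);  -- output-register enables of copy 1 / copy 0
    SEL       : out std_logic;                     -- '1': copy 1's Dc_o drives the shared output
    VALID_OUT : out std_logic                      -- '1': the selected Dc_o holds a new result
  );
end entity;

architecture rtl of rr2_dispatch_sketch is
  type sr_t is array (0 to 1) of std_logic_vector(LATENCY downto 0);
  signal turn : std_logic := '0';                   -- copy that receives the next pair
  signal we_s : std_logic_vector(1 downto 0);
  signal sr   : sr_t := (others => (others => '0')); -- waves in flight, per copy
begin
  we_s(0) <= VALID_IN and not turn;
  we_s(1) <= VALID_IN and turn;
  WE_i    <= we_s;

  WE_o_i(0) <= sr(0)(LATENCY - 1);
  WE_o_i(1) <= sr(1)(LATENCY - 1);
  SEL       <= sr(1)(LATENCY);                      -- copy 1 captured on the last edge
  VALID_OUT <= sr(0)(LATENCY) or sr(1)(LATENCY);

  process (CLK)
  begin
    if rising_edge(CLK) then
      if RST = '1' then
        turn <= '0';
        sr   <= (others => (others => '0'));
      else
        for k in 0 to 1 loop
          sr(k) <= sr(k)(LATENCY - 1 downto 0) & we_s(k);
        end loop;
        if VALID_IN = '1' then
          turn <= not turn;                         -- alternate copies pair by pair
        end if;
      end if;
    end if;
  end process;
end architecture;
```

Because consecutive pairs go to different copies, the two copies never finish on the same edge, so at most one of sr(0)(LATENCY) and sr(1)(LATENCY) is set at a time and SEL is unambiguous.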

Now the problem comes: how do you know all 3 unknown parameters before you code the WPC for the 64*64-bit signed multiplier? I think this is the key reason why so many wave-pipelined circuits have been built, yet none of their designers could resolve the 50-year-old open problem.

And the circuit may, should and can be any type of pipelined circuit!

To be continued.

I would like to listen to your questions and comments!

Weng

rickman · Jan 19, 2018

Weng Tianxiang wrote on 1/10/2018 8:56 PM:

...

What is SMB?

I think I understand the concept of wave pipelining. It is just eliminating
the intermediate registers of a pipeline circuit and designing the
combinational logic so that the delays are even enough across the many paths
so the output can be clocked at a given time and will receive a stable
result from the input N clocks earlier. In other words, the logic is
designed so that the changes rippling through the logic never catch up to
the changes created by the data entered 1 clock cycle earlier. Nice if you
can do it.

I can see where this would be useful in an ASIC. In ASICs FFs and logic
compete for space within the chip. In FPGAs the ratio between FFs and logic
is fixed and predetermined. So using logic without using the FFs that are
already there is not of much value.

--

Rick C

Viewed the eclipse at Wintercrest Farms,
on the centerline of totality since 1998

Jan Coombs · Jan 20, 2018

On Fri, 19 Jan 2018 17:42:57 -0500
rickman <gnuarm.deletethisbit@gmail.com> wrote:

...

> I think I understand the concept of wave pipelining. It is
> just eliminating the intermediate registers of a pipeline
> circuit and designing the combinational logic so that the
> delays are even enough across the many paths so the output can
> be clocked at a given time and will receive a stable result
> from the input N clocks earlier. In other words, the logic is
> designed so that the changes rippling through the logic never
> catch up to the changes created by the data entered 1 clock
> cycle earlier. Nice if you can do it.

Thanks, interesting, but sounds complex to get reliable
operation.

> I can see where this would be useful in an ASIC. In ASICs FFs
> and logic compete for space within the chip. In FPGAs the
> ratio between FFs and logic is fixed and predetermined. So
> using logic without using the FFs that are already there is
> not of much value.

Generally true, but

1) You might be able to combine three stages that require 2/3 of
a clock cycle for maximum propagation delay, and get the result
in the time of two clock cycles.
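The arithmetic behind this point, with T the clock period, ignoring register overhead (clock-to-output and setup times) and assuming the paths are balanced enough to avoid data contamination:

```latex
\underbrace{3 \times \tfrac{2}{3}\,T}_{\text{logic merged, no intermediate registers}} = 2T
\qquad \text{vs.} \qquad
\underbrace{3 \times T}_{\text{three registered stages}} = 3T
```

Removing the two intermediate registers would therefore save one clock of latency.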

2) If the Microsemi/Actel Igloo/Smartfusion FPGAs are used then
each tile can be a latch or a LUT, so flops are not wasted.

Either way there must be a great deal of complex floor planning
and/or timing constraints needed to make this work. Automating
this would be amazing?

Jan Coombs

rickman · Jan 21, 2018

Jan Coombs wrote on 1/20/2018 2:20 PM:

> On Fri, 19 Jan 2018 17:42:57 -0500
> rickman <gnuarm.deletethisbit@gmail.com> wrote:
>
> ...
>
>> I think I understand the concept of wave pipelining. It is
>> just eliminating the intermediate registers of a pipeline
>> circuit and designing the combinational logic so that the
>> delays are even enough across the many paths so the output can
>> be clocked at a given time and will receive a stable result
>> from the input N clocks earlier. In other words, the logic is
>> designed so that the changes rippling through the logic never
>> catch up to the changes created by the data entered 1 clock
>> cycle earlier. Nice if you can do it.
>
> Thanks, interesting, but sounds complex to get reliable
> operation.
>
>> I can see where this would be useful in an ASIC. In ASICs FFs
>> and logic compete for space within the chip. In FPGAs the
>> ratio between FFs and logic is fixed and predetermined. So
>> using logic without using the FFs that are already there is
>> not of much value.
>
> Generally true, but
>
> 1) You might be able to combine three stages that require 2/3 of
> a clock cycle for maximum propagation delay, and get the result
> in the time of two clock cycles.

If your stages are only using 2/3 of a clock, you can regroup the logic to
make it 1 clock each in two stages. There is supposed to be software to
handle that for you, although I've never used it.

> 2) If the Microsemi/Actel Igloo/Smartfusion FPGAs are used then
> each tile can be a latch or a LUT, so flops are not wasted.

There's your first mistake, no one uses Actel/Microsemi FPGAs. They long
for the day they are as big as Lattice, lol!

> Either way there must be a great deal of complex floor planning
> and/or timing constraints needed to make this work. Automating
> this would be amazing?

Isn't that what the OP is claiming? I'm surprised he could make this work
over PVT. The actual stable time has to be on a clock edge, the same clock
edge under all conditions. I wouldn't want to try that manually in a simple
circuit.

--

Rick C


Weng Tianxiang · Jan 21, 2018

On Saturday, January 20, 2018 at 4:17:02 PM UTC-8, rickman wrote:

Jan Coombs wrote on 1/20/2018 2:20 PM:
On Fri, 19 Jan 2018 17:42:57 -0500
rickman <gnuarm.deletethisbit@gmail.com> wrote:

...

I think I understand the concept of wave pipelining. It is
just eliminating the intermediate registers of a pipeline
circuit and designing the combinational logic so that the
delays are even enough across the many paths so the output can
be clocked at a given time and will receive a stable result
from the input N clocks earlier. In other words, the logic is
designed so that the changes rippling through the logic never
catch up to the changes created by the data entered 1 clock
cycle earlier. Nice if you can do it.

Thanks, interesting, but sounds complex to get reliable
operation.

I can see where this would be useful in an ASIC. In ASICs FFs
and logic compete for space within the chip. In FPGAs the
ratio between FFs and logic are fixed and predetermined. So
using logic without using the FFs that are already there is
not of much value.

Generally true, but

1) You might be able to combine three stages that require 2/3 of
a clock cycle for maximum propagation delay, and get the result
in in the time of two clock cycles.

If your stages are only using 2/3 of a clock, you can regroup the logic to
make it 1 clock each in two stages. There is supposed to be software to
handle that for you although I've never used it.

2) If the Microsemi/Actel Igloo/Smartfusion FPGAs are used then
each tile can be a latch or a LUT, so flops are not wasted.

There's your first mistake, no one uses Actel/Microsemi FPGAs. They long
for the day they are as big as Lattice, lol!

Either way there must be a great deal of complex floor planning
and/or timing constraints needed to make this work. Automating
this would be amazing?

Isn't that what the OP is claiming? I'm surprised he could make this work
over PVT. The actual stable time has to be on a clock edge, the same clock
edge under all conditions. I wouldn't want to try that manually in a simple
circuit.

Rick,

SMB stands for Series Master component with Buffering function, one of the 2 WPCs (Wave-Pipelining Components).

I don't understand what you are saying:
"Isn't that what the OP is claiming? I'm surprised he could make this work
over PVT. "

What do OP and PVT stand for?

My attention in this topic is centered on introducing my inventions to the public and asking for critical comments, challenges, or skepticism from a technical point of view, not specifically on whether or not they are useful.

Personally, I have never had a chance to write a pipelined circuit, not to mention design a wave-pipelined circuit.

What I did is the result of my observation that such an important problem can be perfectly resolved by my insight as a person outside the wave-pipelined design circle, based only on one reference:
[1] IEEE Transactions on VLSI Systems
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.90.1783&rep=rep1&type=pdf .

Weng

Jan Coombs · Jan 21, 2018

On Sun, 21 Jan 2018 08:22:45 -0800 (PST)
Weng Tianxiang <wtxwtx@gmail.com> wrote:

[much irrelevant stuff snipped - please help with this]

My attention in this topic is centered on introducing my
inventions to the public and asking for critical comments,
challenges, or skepticism from a technical point of view, not
specifically on whether or not they are useful.

I was unable to quickly understand the "2 fast reading
materials" which you sent me.

Personally, I have never had a chance to write a pipelined circuit,
not to mention design a wave-pipelined circuit.

Why do you have patents? A patent should disclose the method of
the novelty, so it would need an implementation. Perhaps this is
what I am missing?

What I did is the result of my observation that such an
important problem can be perfectly resolved by my insight as a
person outside the wave-pipelined design circle, based only
on one reference: [1] IEEE Transactions on VLSI Systems
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.90.1783&rep=rep1&type=pdf .

Perhaps if you follow wave-pipelined techniques to the limit, you
will find yourself looking at asynchronous (or self clocked)
logic. There is also much historical work on this, and it may
be easier to test on FPGA chips[1].

Jan Coombs
--
[1] or at least drum up some business for Microsemi/Actel

HT-Lab · Jan 21, 2018

On 21/01/2018 00:16, rickman wrote:

Jan Coombs wrote on 1/20/2018 2:20 PM:
On Fri, 19 Jan 2018 17:42:57 -0500
rickman <gnuarm.deletethisbit@gmail.com> wrote:

...

I think I understand the concept of wave pipelining. It is
just eliminating the intermediate registers of a pipeline
circuit and designing the combinational logic so that the
delays are even enough across the many paths so the output can
be clocked at a given time and will receive a stable result
from the input N clocks earlier. In other words, the logic is
designed so that the changes rippling through the logic never
catch up to the changes created by the data entered 1 clock
cycle earlier. Nice if you can do it.

Thanks, interesting, but sounds complex to get reliable
operation.

I can see where this would be useful in an ASIC. In ASICs FFs
and logic compete for space within the chip. In FPGAs the
ratio between FFs and logic is fixed and predetermined. So
using logic without using the FFs that are already there is
not of much value.

Generally true, but

1) You might be able to combine three stages that require 2/3 of
a clock cycle for maximum propagation delay, and get the result
in the time of two clock cycles.

If your stages are only using 2/3 of a clock, you can regroup the logic
to make it 1 clock each in two stages. There is supposed to be software
to handle that for you although I've never used it.

2) If the Microsemi/Actel Igloo/Smartfusion FPGAs are used then
each tile can be a latch or a LUT, so flops are not wasted.

There's your first mistake, no one uses Actel/Microsemi FPGAs. They
long for the day they are as big as Lattice, lol!

Microsemi has been at the number 3 spot for as long as I have used FPGAs
(+/- 28 years, starting with Actel's A1010). They are twice as large as Lattice.

Here is a reference:

https://www.eetimes.com/author.asp?doc_id=1331443

Hans
www.ht-lab.com

Either way there must be a great deal of complex floor planning
and/or timing constraints needed to make this work. Automating
this would be amazing?

Isn't that what the OP is claiming? I'm surprised he could make this
work over PVT. The actual stable time has to be on a clock edge, the
same clock edge under all conditions. I wouldn't want to try that
manually in a simple circuit.

Richard Damon · Jan 21, 2018

On 1/21/18 11:22 AM, Weng Tianxiang wrote:

What do OP and PVT stand for?

OP = Original Poster, the person who started the topic

PVT = Process / Voltage / Temperature (I presume)

The issue being that gate delay isn't a hard fixed value, but changes
slightly (or not so slightly) from device to device and under varying
operating conditions, which brings into question the designing of a
gate tree that presents results stably and reliably two clock cycles
after application, even with the inputs changing after one clock cycle.
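A minimal worked form of this constraint, using the standard single-stage wave-pipelining timing model (an assumption on my part, not something stated in this thread; clock skew and internal-node constraints are ignored). Let wave i enter the input register at t = 0 and wave i+1 at t = T, let the output register capture wave i N clock periods later, and let D_max, D_min, T_s and T_h be the slowest path delay, the fastest path delay, and the output register's setup and hold times (symbols introduced here for illustration). Wave i must have settled before the capturing edge, and wave i+1 must not reach the output until the hold window has passed:

$$
\begin{aligned}
D_{\max} + T_s &\le N\,T \\
T + D_{\min} &\ge N\,T + T_h \\
\Longrightarrow\quad T &\ge (D_{\max} - D_{\min}) + T_s + T_h
\end{aligned}
$$

Combining the two inequalities eliminates N: the clock period is bounded by the spread between the slowest and fastest paths rather than by the slowest path alone, so any device-to-device or operating-condition change in either delay eats directly into that margin.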

Weng Tianxiang · Jan 21, 2018

On Sunday, January 21, 2018 at 8:44:09 AM UTC-8, Jan Coombs wrote:

On Sun, 21 Jan 2018 08:22:45 -0800 (PST)
Weng Tianxiang <wtxwtx@gmail.com> wrote:

[much irrelevant stuff snipped - please help with this]

My attention in this topic is centered on introducing my
inventions to the public and asking for critical comments,
challenges, or skepticism from a technical point of view, not
specifically on whether or not they are useful.

I was unable to quickly understand the "2 fast reading
materials" which you sent me.

Personally, I have never had a chance to write a pipelined circuit,
not to mention design a wave-pipelined circuit.

Why do you have patents? A patent should disclose the method of
the novelty, so it would need an implementation. Perhaps this is
what I am missing?

What I did is the result of my observation that such an
important problem can be perfectly resolved by my insight as a
person outside the wave-pipelined design circle, based only
on one reference: [1] IEEE Transactions on VLSI Systems
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.90.1783&rep=rep1&type=pdf .

Perhaps if you follow wave-pipelined techniques to the limit, you
will find yourself looking at asynchronous (or self clocked)
logic. There is also much historical work on this, and it may
be easier to test on FPGA chips[1].

Jan Coombs
--
[1] or at least drum up some business for Microsemi/Actel

Jan,

I don't think you are right: "Perhaps if you follow wave-pipelined techniques to the limit, you will find yourself looking at asynchronous (or self clocked) logic."

I have studied asynchronous circuits, but found them to be a dead end because of their structural inefficiency and the current commercial trend. Coding or synthesizing a wave-pipelined circuit has nothing to do with doing the same for an asynchronous circuit, and the former is much more complex!

Based on my observation, synthesizing a wave-pipelined circuit requires much more complex algorithms, which have been maturing since 1969.

My design never considers PVT; that belongs to another specialty field, and I have zero knowledge of it.

From my point of view, building a bridge between the code designer and the synthesizer is a very important step toward popularizing wave-pipelined circuit technology:

In 1980 Intel developed and released the 8087 with a 32-bit floating-point multiplier; more than 10 years later, in 1997, they announced MMX technology, including a second version of a 64-bit floating-point multiplier. From my point of view, that second version of the 64-bit floating-point multiplier using MMX technology is nothing but a wave-pipelined circuit.

Regular engineers never have a chance to implement a wave-pipelined circuit because of the complexity of all the related PVT issues.

But under my scheme, the most complex part of generating a wave-pipelined circuit is fully left to synthesizer manufacturers, and an HDL code designer only focuses on how to code it, with zero knowledge of how a wave-pipelined circuit is synthesized and generated. Hopefully this leads to a situation where any college student with basic HDL knowledge can generate the second version of a 64-bit floating-point multiplier within half an hour.

As for the 2 fast reading materials, please contact me by private email and let me know what you want: specification, drawings, or source code in VHDL. Sorry, I mistakenly thought you were a lawyer, not an engineer.

Thank you.

Weng

rickman · Jan 22, 2018

Weng Tianxiang wrote on 1/21/2018 11:22 AM:

On Saturday, January 20, 2018 at 4:17:02 PM UTC-8, rickman wrote:
Jan Coombs wrote on 1/20/2018 2:20 PM:
On Fri, 19 Jan 2018 17:42:57 -0500
rickman <gnuarm.deletethisbit@gmail.com> wrote:

...

I think I understand the concept of wave pipelining. It is
just eliminating the intermediate registers of a pipeline
circuit and designing the combinational logic so that the
delays are even enough across the many paths so the output can
be clocked at a given time and will receive a stable result
from the input N clocks earlier. In other words, the logic is
designed so that the changes rippling through the logic never
catch up to the changes created by the data entered 1 clock
cycle earlier. Nice if you can do it.

Thanks, interesting, but sounds complex to get reliable
operation.

I can see where this would be useful in an ASIC. In ASICs FFs
and logic compete for space within the chip. In FPGAs the
ratio between FFs and logic is fixed and predetermined. So
using logic without using the FFs that are already there is
not of much value.

Generally true, but

1) You might be able to combine three stages that require 2/3 of
a clock cycle for maximum propagation delay, and get the result
in the time of two clock cycles.

If your stages are only using 2/3 of a clock, you can regroup the logic to
make it 1 clock each in two stages. There is supposed to be software to
handle that for you although I've never used it.

2) If the Microsemi/Actel Igloo/Smartfusion FPGAs are used then
each tile can be a latch or a LUT, so flops are not wasted.

There's your first mistake, no one uses Actel/Microsemi FPGAs. They long
for the day they are as big as Lattice, lol!

Either way there must be a great deal of complex floor planning
and/or timing constraints needed to make this work. Automating
this would be amazing?

Isn't that what the OP is claiming? I'm surprised he could make this work
over PVT. The actual stable time has to be on a clock edge, the same clock
edge under all conditions. I wouldn't want to try that manually in a simple
circuit.

Rick,

SMB stands for Series Master component with Buffering function, one of the 2 WPCs (Wave-Pipelining Components).

I don't understand what you are saying:
"Isn't that what the OP is claiming? I'm surprised he could make this work
over PVT. "

What do OP and PVT stand for?

OP means "original poster" and is a common abbreviation in newsgroups. PVT
means Process, Voltage, Temperature, the three main factors causing
variations in delay times in silicon chips. If you don't account for these
effects in your timing calculations, your wave pipelining idea won't work. If
you aren't aware of this, I suspect you don't really understand how to
design FPGA devices. It isn't all textbook analysis.

My attention in this topic is centered on introducing my inventions to the public and asking for critical comments, challenges, or skepticism from a technical point of view, not specifically on whether or not they are useful.

Personally, I have never had a chance to write a pipelined circuit, not to mention design a wave-pipelined circuit.

What I did is the result of my observation that such an important problem can be perfectly resolved by my insight as a person outside the wave-pipelined design circle, based only on one reference:
[1] IEEE Transactions on VLSI Systems
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.90.1783&rep=rep1&type=pdf .

Then I think you have not solved anything. The problem with wave pipelining
is that the timing can vary so much that the output of the combinational
circuit won't be stable during the clock edges. If you haven't tested your
ideas by designing a circuit and running it on an FPGA, you don't know whether
any of this will work in the real world.

--

Rick C

rickman · Jan 22, 2018

Weng Tianxiang wrote on 1/21/2018 2:15 PM:

On Sunday, January 21, 2018 at 8:44:09 AM UTC-8, Jan Coombs wrote:
On Sun, 21 Jan 2018 08:22:45 -0800 (PST)
Weng Tianxiang <wtxwtx@gmail.com> wrote:

[much irrelevant stuff snipped - please help with this]

My attention in this topic is centered on introducing my
inventions to the public and asking for critical comments,
challenges, or skepticism from a technical point of view, not
specifically on whether or not they are useful.

I was unable to quickly understand the "2 fast reading
materials" which you sent me.

Personally, I have never had a chance to write a pipelined circuit,
not to mention design a wave-pipelined circuit.

Why do you have patents? A patent should disclose the method of
the novelty, so it would need an implementation. Perhaps this is
what I am missing?

What I did is the result of my observation that such an
important problem can be perfectly resolved by my insight as a
person outside the wave-pipelined design circle, based only
on one reference: [1] IEEE Transactions on VLSI Systems
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.90.1783&rep=rep1&type=pdf .

Perhaps if you follow wave-pipelined techniques to the limit, you
will find yourself looking at asynchronous (or self clocked)
logic. There is also much historical work on this, and it may
be easier to test on FPGA chips[1].

Jan Coombs
--
[1] or at least drum up some business for Microsemi/Actel

Jan,

I don't think you are right: "Perhaps if you follow wave-pipelined techniques to the limit, you will find yourself looking at asynchronous (or self clocked) logic."

I have studied asynchronous circuits, but found them to be a dead end because of their structural inefficiency and the current commercial trend. Coding or synthesizing a wave-pipelined circuit has nothing to do with doing the same for an asynchronous circuit, and the former is much more complex!

Based on my observation, synthesizing a wave-pipelined circuit requires much more complex algorithms, which have been maturing since 1969.

My design never considers PVT; that belongs to another specialty field, and I have zero knowledge of it.

From my point of view, building a bridge between the code designer and the synthesizer is a very important step toward popularizing wave-pipelined circuit technology:

In 1980 Intel developed and released the 8087 with a 32-bit floating-point multiplier; more than 10 years later, in 1997, they announced MMX technology, including a second version of a 64-bit floating-point multiplier. From my point of view, that second version of the 64-bit floating-point multiplier using MMX technology is nothing but a wave-pipelined circuit.

Regular engineers never have a chance to implement a wave-pipelined circuit because of the complexity of all the related PVT issues.

But under my scheme, the most complex part of generating a wave-pipelined circuit is fully left to synthesizer manufacturers, and an HDL code designer only focuses on how to code it, with zero knowledge of how a wave-pipelined circuit is synthesized and generated. Hopefully this leads to a situation where any college student with basic HDL knowledge can generate the second version of a 64-bit floating-point multiplier within half an hour.

The multiplier is not a good example to use, as many FPGAs contain multiplier
blocks. But then they are pipelined and so won't work in a non-pipelined
solution, so maybe you can show your technique even if it has little
practical value in this case.

The problem is "the most complex part of generating a wave-pipelined circuit
is fully left to synthesizer manufacturers". Your method leaves me
wondering what your software is doing??? Asking the synthesizer companies
to solve your problems of making it work is a bit of a stretch. What makes
you think they will even take on your idea rather than provide their own
solution?

If your patent only covers the idea of writing simple HDL to describe the
circuit desired and leaving the implementation details to the synthesis
companies, I don't think you have actually patented anything. This part is
very obvious. The *real* work is in synthesizing a circuit that will work
in the FPGA.

--

Rick C

rickman · Jan 22, 2018

Richard Damon wrote on 1/21/2018 1:24 PM:

On 1/21/18 11:22 AM, Weng Tianxiang wrote:
What do OP and PVT stand for?

OP = Original Poster, the person who started the topic

PVT = Process / Voltage / Temperature (I presume)

The issue being that gate delay isn't a hard fixed value, but changes
slightly (or not so slightly) from device to device and under varying
operating conditions, which brings into question the designing of a gate
tree that presents results stably and reliably two clock cycles after
application, even with the inputs changing after one clock cycle.

MUCH more than slightly. The number I have been told is that 2:1 is not
uncommon. That's why overclockers can get CPU chips to run *much* faster
than they are rated. They provide excellent cooling, tweak the PSU
voltage and hand-select their chips.

This is also why we use synchronous logic with registers for pipelines.
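To put a rough number on the 2:1 figure, here is a back-of-the-envelope bound under deliberately simple assumptions (mine, not from the thread): every path delay scales by the same factor across PVT, the slow corner is exactly twice the fast corner, the clock period T and the latency of N cycles are fixed at design time, and setup, hold and skew are ignored. With D_min and D_max as the fastest and slowest path delays at the fast corner (hypothetical symbols), the slow corner must still settle by the capturing edge, and the fast corner must not let the next wave arrive early:

$$
2\,D_{\max} \le N\,T, \qquad D_{\min} \ge (N-1)\,T
$$

$$
(N-1)\,T \le D_{\min} \le D_{\max} \le \tfrac{N}{2}\,T \;\Longrightarrow\; N \le 2
$$

More generally, a slow-to-fast ratio r gives N <= r/(r-1), for example 3 waves at r = 1.5 and 5 at r = 1.25. Under these assumptions, even perfectly balanced logic could keep at most two data waves in flight at a 2:1 spread.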

--

Rick C

rickman · Jan 22, 2018

HT-Lab wrote on 1/21/2018 1:19 PM:

On 21/01/2018 00:16, rickman wrote:
Jan Coombs wrote on 1/20/2018 2:20 PM:
On Fri, 19 Jan 2018 17:42:57 -0500
rickman <gnuarm.deletethisbit@gmail.com> wrote:

...

I think I understand the concept of wave pipelining. It is
just eliminating the intermediate registers of a pipeline
circuit and designing the combinational logic so that the
delays are even enough across the many paths so the output can
be clocked at a given time and will receive a stable result
from the input N clocks earlier. In other words, the logic is
designed so that the changes rippling through the logic never
catch up to the changes created by the data entered 1 clock
cycle earlier. Nice if you can do it.

Thanks, interesting, but sounds complex to get reliable
operation.

I can see where this would be useful in an ASIC. In ASICs FFs
and logic compete for space within the chip. In FPGAs the
ratio between FFs and logic is fixed and predetermined. So
using logic without using the FFs that are already there is
not of much value.

Generally true, but

1) You might be able to combine three stages that require 2/3 of
a clock cycle for maximum propagation delay, and get the result
in the time of two clock cycles.

If your stages are only using 2/3 of a clock, you can regroup the logic to
make it 1 clock each in two stages. There is supposed to be software to
handle that for you although I've never used it.

2) If the Microsemi/Actel Igloo/Smartfusion FPGAs are used then
each tile can be a latch or a LUT, so flops are not wasted.

There's your first mistake, no one uses Actel/Microsemi FPGAs. They long
for the day they are as big as Lattice, lol!

Microsemi has been at the number 3 spot for as long as I have used FPGAs (+/- 28
years, starting with Actel's A1010). They are twice as large as Lattice.

Here is a reference:

https://www.eetimes.com/author.asp?doc_id=1331443

There's some BS somewhere...

http://www.fpgadeveloper.com/2011/07/list-and-comparison-of-fpga-companies.html

More importantly, look at the numbers in your link. The Actel/Microsemi
numbers are going in the wrong direction! X, A and L are headed upward
year-to-year and Actel is headed down!

While looking this up I found a link indicating the JTAG interface of the
ProASIC3 devices has a back door which would allow their security to be
bypassed. Security was their claim to fame and this could be a major blow
to the company.

--

Rick C
