Generating a desired synthesizable binary pulse train on FPGA

chaitanya163 · Jul 22, 2014

Hello everyone,

I am new to VHDL programming and FPGAs.
I have a Virtex-4 FPGA and I wish to generate a binary pulse train of 16
pulses from the FPGA using VHDL. My desired pulse train will be
like "1011100111101110" (the minimum pulse width should be 30 ns).
I have a 100 MHz clock and I am able to divide the clock frequency to
get a 10 MHz clock (the clock frequency required for my application). Also,
I am aware that the "wait for" statement cannot be used for
synthesis, as it can only be used in test benches and simulation.

So I am struggling with this problem. I am wondering if I can use an "after
X ns" clause in my VHDL code, or if there is any other way to do it.

I will be very thankful for any feedback or advice. Your
response will truly be appreciated. Kindly provide your valuable
suggestions.

Thanking you
Regards
Chaitanya Mauskar

---------------------------------------
Posted through http://www.FPGARelated.com

glen herrmannsfeldt · Jul 22, 2014

rickman <gnuarm@gmail.com> wrote:

> On 7/22/2014 12:49 PM, chaitanya163 wrote:
>
>> I am new to VHDL programming and FPGA.

(snip)

>> So I am struggling with this problem. I am wondering if I can use "after
>> Xns" command in my VHDL code or if there is any other way to do it.

(snip)

> In order to use an HDL (Hardware Description Language) you need to
> understand hardware enough that you can then use the HDL to describe it.
> Think about how you would do this in hardware if you were drawing a
> schematic. Then you can figure out how to describe that circuit in
> hardware.

I used to say that you should think about how you would build it using
TTL gates, but maybe not everyone knows about TTL by now.

You want to think about AND gates and flip-flops. For problems like
yours, the most important part could be a shift register, which
is a series of flip-flops.

-- glen

Aylons Hazzud · Jul 22, 2014

On Tuesday, July 22, 2014 16:27:41 UTC-3, Jon Elson wrote:

> chaitanya163 wrote:
>
> You will not be able to get 30 ns pulses with a 10 MHz clock.
> At least one part of your logic will have to run at a higher clock
> rate. If you run the logic at 100 MHz, then you will have 10 ns
> resolution on the bit timing.

I think 30 ns is the minimum pulse width; the pulses may be longer. If so, a 10 MHz clock (100 ns per bit) will meet his specification.
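A quick way to check this timing argument is to compute the shortest high or low run of the pattern in nanoseconds. The Python sketch below is a worked calculation, not HDL, and its function names are illustrative. It assumes each bit lasts a whole number of clock periods. The shortest run in "1011100111101110" is a single bit, so the minimum pulse width equals the bit period:

- 100 ns with one bit per 10 MHz clock
- 10 ns with one bit per 100 MHz clock
- 30 ns with one bit every three 100 MHz clocks

```python
from fractions import Fraction
from itertools import groupby

PATTERN = "1011100111101110"  # the OP's 16-bit pulse train


def run_lengths(bits):
    """Split a bit string into (level, run length) pairs, e.g. '1100' -> [('1', 2), ('0', 2)]."""
    return [(level, len(list(group))) for level, group in groupby(bits)]


def min_pulse_ns(bits, clk_mhz, clocks_per_bit=1):
    """Shortest high or low interval in ns when every bit lasts
    clocks_per_bit periods of a clk_mhz clock."""
    bit_ns = Fraction(1000, clk_mhz) * clocks_per_bit  # one clock period is 1000/f_MHz ns
    shortest_run = min(length for _, length in run_lengths(bits))
    return shortest_run * bit_ns


for clk_mhz, clocks_per_bit in [(10, 1), (100, 1), (100, 3)]:
    ns = min_pulse_ns(PATTERN, clk_mhz, clocks_per_bit)
    print(f"{clk_mhz} MHz, {clocks_per_bit} clock(s)/bit: min pulse {ns} ns, "
          f"meets 30 ns: {ns >= 30}")
```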

rickman · Jul 22, 2014

On 7/22/2014 12:49 PM, chaitanya163 wrote:

(snip)

> So I am struggling with this problem. I am wondering if I can use "after
> Xns" command in my VHDL code or if there is any other way to do it.

(snip)

In order to use an HDL (Hardware Description Language) you need to
understand hardware enough that you can then use the HDL to describe it.
Think about how you would do this in hardware if you were drawing a
schematic. Then you can figure out how to describe that circuit in
hardware.

So how would you design a circuit using gates and FFs to do this task?

--

Rick

mnentwig · Jul 23, 2014

Hi,

I can give you a quick-and-dirty skeleton in Verilog, just to get you
started.

module myPulse(input wire clk, input wire rst, output wire sig);

always @(posedge clk) begin

end

endmodule


mnentwig · Jul 23, 2014

Hi,

(browser got trigger-happy, ignore my unfinished previous post if there is
any)

Here's a quick-and-dirty skeleton in Verilog. There are many ways to
approach this; for example, use a state machine if it needs to be more
complex.

This one holds the output at "1" (the first bit of the pattern) as long as
rst is asserted. When rst goes low, the output plays back the sequence and
then continues with 0.

module myPulse(input wire clk, input wire rst, output wire pulseOut);

   // 16-bit pattern register; its MSB drives the output
   reg [15:0] myReg = 0;
   assign pulseOut = myReg[15];

   always @(posedge clk) begin
      if (rst) begin
         myReg <= 16'b1011100111101110; // (re)load the pattern
      end else begin
         myReg <= myReg << 1;          // one bit per clock, zeros shift in
      end
   end
endmodule
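To see what the Verilog skeleton above actually emits, here is a cycle-level Python model of it. It is not synthesizable, and the function name is illustrative. The model samples rst at each rising edge and records pulseOut after the edge.

The model makes one thing visible: the register shifts on every clock, so each bit lasts one clock period. At 100 MHz that is 10 ns per bit, below the 30 ns minimum. The module would therefore need to be clocked at about 33 MHz or less (for example, from the OP's 10 MHz clock) or given a clock enable.

```python
PATTERN = 0b1011100111101110   # value loaded while rst is high
WIDTH = 16
MASK = (1 << WIDTH) - 1


def simulate_mypulse(rst_per_edge):
    """Return pulseOut (myReg[15]) after each rising edge of clk."""
    my_reg = 0                                       # reg [15:0] myReg = 0;
    outputs = []
    for rst in rst_per_edge:
        if rst:
            my_reg = PATTERN                         # myReg <= 16'b1011100111101110;
        else:
            my_reg = (my_reg << 1) & MASK            # myReg <= myReg << 1; (LSB fills with 0)
        outputs.append((my_reg >> (WIDTH - 1)) & 1)  # assign pulseOut = myReg[15];
    return outputs


# one edge with rst high, then 17 edges with rst low
print("".join(map(str, simulate_mypulse([1] + [0] * 17))))  # 101110011110111000
```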


Jon Elson · Jul 23, 2014

chaitanya163 wrote:

> My desired pulse train will be
> like "1011100111101110". (min pulse width should be 30ns).
> I have a clock of 100 MHz and I am able to divide the clock frequency to
> get the clock of 10MHz (clock frequency required for my application).

You will not be able to get 30 ns pulses with a 10 MHz clock.
At least one part of your logic will have to run at a higher clock
rate. If you run the logic at 100 MHz, then you will have 10 ns
resolution on the bit timing. You could have a 4-bit counter
running at 100 MHz/3 = 33 MHz or 30 ns period, and the desired bit
pattern entered into a 16:1 multiplexer. The counter selects the
inputs in the proper sequence to the multiplexer. Of course, the
synthesis tools will do a massive optimization of your description
and probably reduce it to about 5 LUTs or so.
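Jon's counter-plus-multiplexer idea can be modelled the same way. In this cycle-level Python sketch (names illustrative), everything runs on the 100 MHz clock. A divide-by-three enable stands in for the 33 MHz counter clock, which is a common way to avoid creating a second clock domain. The 4-bit counter selects one of the 16 pattern bits, so each bit lasts 3 x 10 ns = 30 ns. Because the counter simply wraps around, the pattern repeats every 48 clocks.

```python
PATTERN = "1011100111101110"  # mux input 0 is sent first


def counter_mux(n_clocks, clocks_per_bit=3):
    """Mux output for each 100 MHz clock period."""
    prescale = 0   # divide-by-clocks_per_bit counter that generates the enable
    select = 0     # 4-bit counter driving the 16:1 mux select lines
    outputs = []
    for _ in range(n_clocks):
        outputs.append(int(PATTERN[select]))    # combinational 16:1 mux
        # rising edge: update the registers
        if prescale == clocks_per_bit - 1:      # enable pulse, once every 3 clocks
            prescale = 0
            select = (select + 1) % 16          # 4-bit counter wraps 15 -> 0
        else:
            prescale += 1
    return outputs


out = counter_mux(48)
print("".join(map(str, out[:12])))  # first four bits, three clocks each: 111000111111
```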

Jon

Jon Elson · Jul 23, 2014

mnentwig wrote:

> Hi,
>
(snip)
>
> module myPulse(input wire clk, input wire rst, output wire pulseOut);
(snip)
> myReg <= myReg << 1;
(snip)
> endmodule

Ummm, that looks like Verilog, the OP requested VHDL. While the
VHDL would not be vastly different, I don't think you can do the
<< operator quite so concisely in VHDL. I think you can do a
loop over the bits and assign myReg<n> <- myReg<n+1>

Jon

rickman · Jul 23, 2014

On 7/22/2014 3:30 PM, Jon Elson wrote:

> mnentwig wrote:
(snip)
>> myReg <= myReg << 1;
(snip)
> Ummm, that looks like Verilog, the OP requested VHDL. While the
> VHDL would not be vastly different, I don't think you can do the
> << operator quite so concisely in VHDL. I think you can do a
> loop over the bits and assign myReg<n> <- myReg<n+1>

I'm sure that would work, but I find constructing a loop to be a bit
wordy. Here is a one-line shift register.

myReg <= myReg(myReg'high-1 downto 0) & '0';

That's not so bad, is it?
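The three ways of writing the shift mentioned here describe the same hardware: Verilog's `<< 1`, Rick's concatenation, and a per-bit loop. The Python sketch below checks this on the OP's pattern. The register is held as a list whose element 0 is the MSB (myReg'high), and the function names are illustrative.

One detail is worth noting. With VHDL's `downto` numbering, a shift toward the MSB assigns each bit from the one below it: bit n gets bit n-1, with '0' entering at bit 0.

```python
def shift_concat(reg):
    """myReg <= myReg(myReg'high-1 downto 0) & '0';  (reg[0] is the MSB)"""
    return reg[1:] + [0]


def shift_loop(reg):
    """Per-bit form: bit n takes bit n-1 (downto numbering), '0' enters at bit 0."""
    nxt = [0] * len(reg)              # last list element = bit 0 = '0'
    for i in range(len(reg) - 1):
        nxt[i] = reg[i + 1]           # list index i+1 is one bit less significant
    return nxt


def shift_int(value, width=16):
    """myReg <= myReg << 1;  (result truncated to the register width)"""
    return (value << 1) & ((1 << width) - 1)


def to_int(reg):
    return int("".join(map(str, reg)), 2)


bits = [int(c) for c in "1011100111101110"]
print(shift_concat(bits) == shift_loop(bits),
      to_int(shift_concat(bits)) == shift_int(to_int(bits)))  # True True
```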

However, this is not an efficient use of FPGA resources, since it uses up
16 FFs along with the control logic, if any. If it were any larger, I
would directly address an array constant instead, using a four-bit
counter and a single LUT used as memory.

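library ieee;
use ieee.std_logic_1164.all;

entity SerialOut is  -- entity name is illustrative
  port (clk, rst : in  std_logic;
        Dout     : out std_logic);
end entity SerialOut;

architecture rtl of SerialOut is
  constant SerialDataLength : integer := 16;
  -- Pattern to send; index 0 (leftmost) goes out first.
  constant SerialData : std_logic_vector (0 to SerialDataLength-1) :=
    ('0', '1', '0', '1', '0', '1', '0', '1',
     '0', '1', '0', '1', '0', '1', '0', '1');
  signal AddrReg : integer range 0 to SerialDataLength-1;
  signal Start   : std_logic;  -- high until the last bit has been sent
  signal CntrEn  : std_logic;  -- address counter enable
begin

  AddrGen : process (clk, rst) begin
    if (rst = '1') then
      Start   <= '1';
      CntrEn  <= '0';
      AddrReg <= 0;
    elsif (rising_edge(clk)) then
      CntrEn <= Start;

      if (AddrReg = SerialDataLength-1) then  -- last bit reached: stop
        Start  <= '0';
        CntrEn <= '0';
      end if;

      if (CntrEn = '1') then
        AddrReg <= (AddrReg + 1) mod SerialDataLength;
      end if;
    end if;
  end process AddrGen;

  -- Output the addressed bit while Start is high, then hold Dout low.
  Dout <= SerialData (AddrReg) when (Start = '1') else '0';

end architecture rtl;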

This should give you four FF/LUTs for the address register, two for
the control logic and one for the mux selecting the output, for a total
of seven LUT/FFs, less than half of what it takes for the shift
register. For longer shift registers the savings are more
pronounced.
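Here is a cycle-level Python model of Rick's AddrGen process. It is a sketch with a few assumptions:

- Dout follows SerialData(AddrReg) only while Start is high.
- The OP's pattern, sent leftmost character first, replaces the alternating placeholder.
- Names other than Rick's signals are illustrative.

The model mirrors VHDL signal semantics: every assignment in the clocked branch uses the values from before the edge. It shows the pattern going out once and Dout then staying low. It also shows the first bit held for one extra clock after reset, because CntrEn is itself registered.

```python
PATTERN = "1011100111101110"      # assumed data; character 0 goes out first
LENGTH = len(PATTERN)             # SerialDataLength


def addr_gen(n_edges):
    """Dout while reset is asserted, then after each of n_edges rising edges."""
    start, cntr_en, addr = 1, 0, 0            # values forced by rst = '1'

    def dout():
        # Dout <= SerialData(AddrReg) when Start = '1' else '0'
        return int(PATTERN[addr]) if start else 0

    outputs = [dout()]
    for _ in range(n_edges):
        # right-hand sides all read the pre-edge values, as VHDL signals do
        next_start, next_cntr_en, next_addr = start, start, addr   # CntrEn <= Start;
        if addr == LENGTH - 1:                                     # last address reached
            next_start, next_cntr_en = 0, 0
        if cntr_en:
            next_addr = (addr + 1) % LENGTH
        start, cntr_en, addr = next_start, next_cntr_en, next_addr
        outputs.append(dout())
    return outputs


print("".join(map(str, addr_gen(18))))  # 1101110011110111000
```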

--

Rick

Aleksandar Kuktin · Jul 23, 2014

On Tue, 22 Jul 2014 18:56:05 +0000, glen herrmannsfeldt wrote:

> I used to say that you should think about how you would build it using
> TTL gates, but maybe not everyone knows about TTL by now.

Indeed.

mnentwig · Jul 25, 2014

Hi,

> However, this is not an efficient use of resources in an FPGA using up
> 16 FFs along with the control logic, if any. If it were any larger I
> would use a direct address of an array constant would use a four bit
> counter and a single LUT used as memory.

would this still apply if my design uses proportionally more LUTs than
registers?
For example, here is a synthesis report for a minimal "medium" ZPU
processor on Spartan 6 LX9 (that is most enthusiastically blinking its LED
as I write this):

Slice Logic Utilization:
Number of Slice Registers: 284 out of 11,440 2%
Number used as Flip Flops: 284
...
Number of Slice LUTs: 934 out of 5,720 16%
Number used as logic: 915 out of 5,720 15%
Number used as Memory: 9 out of 1,440 1%

This is not to argue the point, I just want to understand the possible
trade-offs. For example, I wonder if it would make sense to replace small
counters with one-hot shift registers in such a situation?


rickman · Jul 25, 2014

On 7/25/2014 11:39 AM, mnentwig wrote:

> Hi,
>
>> However, this is not an efficient use of resources in an FPGA using up
>> 16 FFs along with the control logic, if any. If it were any larger I
>> would use a direct address of an array constant would use a four bit
>> counter and a single LUT used as memory.
>
> would this still apply if my design uses proportionally more LUTs than
> registers?
(snip)
> This is not to argue the point, I just want to understand the possible
> trade-offs. For example, I wonder if it would make sense to replace small
> counters with one-hot shift registers in such a situation?

First, my comment was about going the other direction, from a long shift
register to an encoded counter and memory. You are asking if it makes
sense to go from a state encoded counter to a one-hot register. I don't
see how that can save resources of any type. The one-hot register will
need at minimum one LUT per FF.

A counter is a very efficient use of the FPGA resources; however, a bare
counter is not a useful FSM. To be useful there need to be inputs, which add
logic to the counter. In the simplest case this input is just a hold input,
which comes free apart from the logic to generate the hold signal. In a
more general case the counter will need to jump around rather than just
progress through the states linearly. In this case the FSM is not
just a counter anymore and the LUT count increases.

So to answer your question, "it depends". lol But in general I would
not expect a one-hot implementation to use any fewer LUTs at the expense
of more FFs, but it is possible.
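To make the counter-versus-one-hot comparison concrete, the Python sketch below (illustrative names) counts flip-flops for both encodings. It also steps a plain 16-state binary counter alongside a 16-bit one-hot ring and checks that the position of the hot bit always equals the binary count.

The sketch covers only the simple free-running case, where the ring itself needs no next-state logic. It does not model LUT usage once enables, resets, or jumps between states are added.

```python
def ff_count(n_states):
    """Flip-flops needed for n_states: (binary encoded, one-hot)."""
    binary = max(1, (n_states - 1).bit_length())   # ceil(log2(n_states))
    return binary, n_states                         # one FF per state for one-hot


def step_binary(count, n_states):
    return (count + 1) % n_states    # incrementer plus wrap: needs LUTs


def step_one_hot(ring):
    return ring[-1:] + ring[:-1]     # rotate: the hot bit moves one place, no logic


print(ff_count(16), ff_count(1024))  # (4, 16) (10, 1024)

count, ring = 0, [1] + [0] * 15
for _ in range(40):                  # runs past the 15 -> 0 wrap twice
    count, ring = step_binary(count, 16), step_one_hot(ring)
    assert ring.index(1) == count
```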

I've been watching the ZPU over the years and I would like to know what
your LUT count includes. Does that include I/O such as a UART? Any
idea how much is just for the CPU? Early on the ZPU people claimed a
*very* low LUT count of around 500 or less, IIRC. I believe the Spartan
6 has 6-input LUTs, so your LUT count is hard to compare to LUT
counts using 4-input LUTs. Still, 900 is a fair amount more than 500.
I assume you have optimized for performance at the expense of size?

--

Rick

mnentwig · Jul 26, 2014

Hi,

> I don't
> see how that can save resources of any type. The one-hot register will
> need at minimum one LUT per FF.

isn't a one-hot counter just a simple ring shift register? I can build it
from FFs without any further logic.
A simple experiment:

reg [1023:0] test = 1024'd1;
always @(posedge clk) begin
   test <= {test[1022:0], test[1023]}; // rotate the single 1 toward the MSB
   LED <= |test[1023:1];               // OR of every bit except bit 0
end

The final "or" forces (mostly) use of physical FFs instead of LUTs in shift
register configuration.

Number of Slice Registers: 1,252 out of 11,440 10%
Number used as Flip Flops: 1,252
Number of Slice LUTs: 573 out of 5,720 10%
Number used as logic: 216 out of 5,720 3%
Number used as Memory: 44 out of 1,440 3%
Number used as Shift Register: 44
Number used exclusively as route-thrus: 313
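A cycle-level Python model of the 1024-bit experiment above shows what the LED output does. It is a sketch, and the function name is illustrative. The single '1' circulates through the ring, and the OR over bits 1023..1 is low only on the one clock in every 1024 when the hot bit sits in bit 0. Both nonblocking assignments read the value of test from before the edge, which the model reproduces by computing the LED first.

```python
WIDTH = 1024
MASK = (1 << WIDTH) - 1


def ring_led(n_edges):
    """LED value after each rising edge."""
    test = 1                                   # reg [1023:0] test = 1024'd1;
    leds = []
    for _ in range(n_edges):
        led = int((test >> 1) != 0)            # LED <= |test[1023:1];  (pre-edge value)
        # test <= {test[1022:0], test[1023]};  rotate left by one bit
        test = ((test << 1) | (test >> (WIDTH - 1))) & MASK
        leds.append(led)
    return leds


leds = ring_led(2 * WIDTH)
print(leds.count(0), [i for i, v in enumerate(leds) if v == 0])  # 2 [0, 1024]
```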

> I've been watching the ZPU over the years and I would like to know what
> your LUT count includes.

the one in my previous mail includes only the processor with on-chip RAM
and a single "GPIO" on the bus for the LED. It's the so-called "medium"
variant with some options changed. I use a simple "for" loop as benchmark
that controls the LED and it manages around 2M hardware writes / second.

There is also the "small" ZPU which is about half the size:
Number of Slice Registers: 258 out of 11,440 2%
Number of Slice LUTs: 596 out of 5,720 10%
This one includes a UART, 500 LUTs after setting options sounds correct.
It is, however, very slow, maybe 10 % of "medium".
I haven't optimized the settings; for example, LUT sharing might reduce size
further.
There are newer variants (ZPUino, "extreme" core) that are probably faster,
especially with external memory.

If anybody knows a good, free CPU, I'd love to hear about it. Those two work
pretty well for me.
Faster CPUs exist, for example MICO32 was mentioned. I did some trials with
that one, but it used too much space on the LX9, maybe three times as big
as the "medium" ZPU if I remember correctly.

I don't use a CPU for high-performance computing, but mainly to change
functionality quickly without rebuilding RTL: compiling my test code,
merging it into the bitstream and uploading takes only 750 ms.


rickman · Jul 26, 2014

On 7/26/2014 2:24 AM, mnentwig wrote:

> Hi,
>
>> I don't
>> see how that can save resources of any type. The one-hot register will
>> need at minimum one LUT per FF.
>
> isn't a one-hot counter just a simple ring shift register? I can build it
> from FFs without any further logic.

That's only if it is a simple counter with no other transitions or
controls other than an enable. Usually they need some sort of sync
reset which may or may not be supported by the FF primitive without a LUT.

(snip)

> There is also the "small" ZPU which is about half the size:
> Number of Slice Registers: 258 out of 11,440 2%
> Number of Slice LUTs: 596 out of 5,720 10%
> This one includes a UART, 500 LUTs after setting options sounds correct.
> It is, however, very slow, maybe 10 % of "medium".

Yes, this is the one that I thought was impressive in terms of the tiny
size, but as you note, at a price of extreme lack of speed. I believe
the slowness comes from the architecture rather than the clock being a
lot slower. That is, the clock is still a reasonable speed, but it
needs a lot more of them to get the work done because of having fewer
data paths.

> I haven't optimized the settings, for example LUT sharing might reduce size
> further.

LUT sharing? Is that where the logic is broken into pieces which can be
shared between different paths when there is some overlap? I've never
bothered with that as I think the savings are typically pretty small.

> There are newer variants (ZPUino, "extreme" core) that are probably faster,
> especially with external memory.
>
> If anybody knows a good, free CPU, I'd love to hear about. Those two work
> pretty well for me.
> Faster CPUs exist, for example MICO32 was mentioned. I did some trials with
> that one, but it used too much space on the LX9, maybe three times as big
> as the "medium" ZPU if I remember correctly.
>
> I don't use a CPU for high-performance computing, but mainly to change
> functionality quickly without rebuilding RTL: Compiling my test code,
> merging it to the bitstream and uploading takes only 750 ms,

I'm not familiar with the MICO32... do you mean the one from Lattice,
maybe named MICRO32? I don't recall for sure. Just about any standard
RISC CPU will be a lot bigger than the ZPU. OpenCores has one they call
OpenRISC which has been around a while. I think it is fairly large
though. ZPU was designed specifically to be as small as possible for
code that needs very little speed. Then they decided to develop a few
faster variants which are totally binary compatible. I think they
achieved their objective and I have heard of it being used in some
business apps.

The other day I did see another soft core that is supported by a C
compiler, at least a beta version. I don't recall the name, but I
expect I could come up with it if you are interested. Everything else I
have seen is a stack processor intended to run a Forth-like language.
That can make for a very simple machine... like the ZPU.

--

Rick

rickman · Jul 26, 2014

Here is the info on the YARD-1 processor I was trying to remember. He
is doing an LCC backend so it has a C compiler, albeit in the early
stages still... This is the only other (than the ZPU) open source
softcore CPU I know of with C support.

To: <fpga-cpu@yahoogroups.com>
From: "brimdavis@aol.com [fpga-cpu]" <fpga-cpu@yahoogroups.com>
Subject: [fpga-cpu] State of the YARD, July 2014

Another in an occasional series of updates on the YARD-1 processor.

Cleanup:
Since my last status post[1], I've made some headway in cleaning up
the code and documentation; the repository now contains all the core
sources and some demo designs, in addition to the cross assembler tools
and ISA verification code.

Things are working well enough to use for small assembly projects,
although not all processor features are implemented or working yet.

Docs:
Google recently disabled the Downloads feature of Google Code, so I've
added a wiki page[2] directly linking to the documentation files in the
repository.

I've also added some wiki pages[3] summarizing the build results for
the Xilinx Spartan3 and Lattice XO2 demo designs.

ISA Changes:
Other than some minor encoding changes, the only instruction set
alterations of note were the {reluctant} replacement of the nifty bit
counting instructions with register-register 8|16 bit sign|zero
extending MOVes to better support LCC's code generator for char and
short operations on registers.

LCC:
The experimental YARD LCC port[4] now has a nearly complete (but not
well tested) integer back-end, but neither floating point support nor a
C library as of yet.

-Brian

[1] 2011 status post
http://groups.yahoo.com/neo/groups/fpga-cpu/conversations/topics/3362

[2] doc wiki link
http://code.google.com/p/yard-1/wiki/Documentation_Links

[3] build wiki links
http://code.google.com/p/yard-1/wiki/Lattice_XP2_Brevia
http://code.google.com/p/yard-1/wiki/Digilent_S3_Starter_Board

[4] lcc-homebrew link
http://code.google.com/p/lcc-homebrew


--

Rick

mnentwig · Jul 26, 2014

Hi,

> That's only if it is a simple counter with no other transitions or
> controls other than an enable. Usually they need some sort of sync
> reset which may or may not be supported by the FF primitive without
> a LUT.

thanks. Maybe I'll just leave it to the synthesis tool...

> Yes, this is the one that I thought was impressive in terms of the tiny
> size, but as you note, at a price of extreme lack of speed. I believe
> the slowness comes from the architecture rather than the clock being a
> lot slower. That is, the clock is still a reasonable speed, but it
> needs a lot more of them to get the work done because of having fewer
> data paths.

Yes, the achievable clock speed is even marginally higher for the small one
(~110 MHz vs 100 MHz, possibly faster if I tweaked the settings).
It doesn't have registers, so every operand goes to the stack, if I
remember correctly. The "medium" variant has a hardware cache for the last
two levels.

> LUT sharing? Is that where the logic is broken into pieces which can be
> shared between different paths when there is some overlap? I've never
> bothered with that as I think the savings are typically pretty small.

There is an option to duplicate registers to reduce routing delay. But what
I meant is to put several independent logic functions into the same LUT,
i.e. four-input plus two-input to make it smaller. I haven't really read
the manual too carefully here. The one optimization option that I found
important is pipeline register balancing.

This is the MICO32 I meant:
http://en.wikipedia.org/wiki/LatticeMico32

I just got feedback in another forum that the openRisc processor was too
limited in terms of clock speed.
There is also an ARM clone (amber), but it seems quite big, 90 % of an LX9
(compared to 20 and 10 % for the ZPUs).

I'll have a look at the YARD processor, thanks. I'd never heard of it
before.

For example, Ettus uses ZPUs in their SDR products, so I think I'm on the
right track with the ZPU. It doesn't have to be perfect; it still beats the
alternative of running a separate MBED or raspberry board with a SPI link
to the FPGA.

Cheers

Markus


mnentwig · Jul 27, 2014

Hi,

>> So instead it seems they give you a 4LUT and a 2LUT. Better than
>> nothing.

that's how I understand it, yes. Anyway, I'll come back to the options once
I have some code that is worth optimizing...

A genuine ARM, with the hardware multiplier option, would be nice. Those do
one 32x32=>32 bit multiplication per clock cycle. But I think an FPGA
can't do that because I have to cascade two 18x18 multipliers and that
needs pipeline registers or a slower clock.
So I'll use the softcore for control purposes, and do the "heavy lifting"
in RTL. Too bad, there is a lot of audio C code out there that could be
adapted.

> BTW, never use clock speed alone as a measure of performance. I can't
> say if the openrisc processor is fast or not. I find it funny that you
> would consider using the ZPU if you are looking for speed. I believe
> the ZPU is the slowest processor I have ever seen.

Right. The reason is simply that I want to run it synchronously with the
DSP stuff at around 100 MHz (at least unless someone comes up with a better
plan). That means it will limit the maximum clock frequency of the whole
design.

Even if demoted to front panel controller, the ZPU would still be my choice
over the 8051 simply because it's 32 bit (got the T-shirt for "8051 front
panel control" in hand-crafted assembler, a long time ago...)

-Markus


rickman · Jul 27, 2014

On 7/27/2014 8:00 AM, mnentwig wrote:

> A genuine ARM, with the hardware multiplier option, would be nice. Those do
> one 32x32=>32 bit multiplication per clock cycle. But, I think an FPGA
> can't do that because I have to cascade two 18x18 multipliers and that
> needs pipeline registers or a slower clock.

I think an ARM CPU would be rather large although they have the M1 (or
is it the M0?) intended for FPGA use. I wonder if anyone has cloned
that yet?

Why would you need it to be cycle accurate? The multiplier is already
pipelined even if you just use one by itself. It comes with an output
register like the block memory so you can't send the results anywhere
until the next clock cycle. Using four of them to produce a 64 bit
result and save the result in a register would take 2 clocks; one for
the multiplies and one for the adds and save... unless you do some
hardware register renaming... set a flag that says the output of the
multiplier is Rxx instead of the register file. Hmmmm... I need to
think about that one. It takes an extra mux which is not cheap in FPGAs
though. The ARM has any number of multi-clock cycle instructions, why
couldn't the multiply be one of them?
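The arithmetic behind using four multipliers is the usual split of each 32-bit operand into 16-bit halves; an unsigned 16-bit half fits the 18x18 signed multipliers mentioned above. The Python sketch below handles unsigned operands only, and its names are illustrative.

It forms the four partial products, which could be computed in parallel in one clock, then combines them with shifts and adds (the second clock in Rick's description). For the 32x32=>32 case only the low word is needed. Since the high-by-high product only affects bits 32 and up, three multipliers suffice there.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def mul32x32(a, b):
    """64-bit product of unsigned 32-bit a and b from four 16x16 partial products."""
    a_lo, a_hi = a & MASK16, a >> 16
    b_lo, b_hi = b & MASK16, b >> 16
    # clock 1: four independent multipliers
    p_ll, p_lh = a_lo * b_lo, a_lo * b_hi
    p_hl, p_hh = a_hi * b_lo, a_hi * b_hi
    # clock 2: shift and add the partial products
    return p_ll + ((p_lh + p_hl) << 16) + (p_hh << 32)


def mul32x32_low(a, b):
    """Low 32 bits only: a_hi * b_hi never reaches them, so three multipliers do."""
    a_lo, a_hi = a & MASK16, a >> 16
    b_lo, b_hi = b & MASK16, b >> 16
    return (a_lo * b_lo + ((a_lo * b_hi + a_hi * b_lo) << 16)) & MASK32


for a, b in [(0xFFFFFFFF, 0xFFFFFFFF), (0x12345678, 0x9ABCDEF0), (7, 6)]:
    assert mul32x32(a, b) == a * b
    assert mul32x32_low(a, b) == (a * b) & MASK32
```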

I have this problem in my stack CPU design. It was originally done in
an older part where the block RAM can be run async and so a read can be
written to the top of stack in one clock cycle - *all* instructions are
1 clock cycle, this is a primary design goal. With a sync RAM the data
is not available until the next clock cycle, so I have to find tricks to
make it work. One is to use two instructions to read memory, one to
start the read and one to grab the output - repercussions for
exceptions, now there is another register to save. Or I have considered
grabbing the input to the address register rather than the output and
doing a read on every clock cycle... somewhat wasteful of power and I
intend to use this in a low power design.
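The read-latency problem, and the trick of feeding the block RAM from the address register's input rather than its output, can be shown with a small cycle-level model. In this Python sketch, all names are hypothetical, a sequentially incrementing address stands in for real next-address logic, and power is not modelled. The address register and the RAM's output register load on the same edge.

- Fed from the register's output, the RAM data lags the address by one clock.
- Fed from the register's input, data and address line up, at the cost of a RAM read on every clock.

```python
def simulate(mem, n_edges, ram_fed_from):
    """Return (address register, RAM output register) after each rising edge.

    ram_fed_from: "q" = RAM address comes from the address register output,
                  "d" = RAM address comes from the register input (next address).
    """
    addr_q = 0          # address register
    ram_q = mem[0]      # synchronous RAM output register (assumed to start holding mem[0])
    trace = []
    for _ in range(n_edges):
        addr_d = (addr_q + 1) % len(mem)                # next-address logic (assumed: increment)
        ram_addr = addr_d if ram_fed_from == "d" else addr_q
        addr_q, ram_q = addr_d, mem[ram_addr]           # both registers load on the same edge
        trace.append((addr_q, ram_q))
    return trace


mem = [100 + i for i in range(8)]
print(simulate(mem, 3, "q"))   # [(1, 100), (2, 101), (3, 102)]  data lags address
print(simulate(mem, 3, "d"))   # [(1, 101), (2, 102), (3, 103)]  data matches address
```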

> So I'll use the softcore for control purposes, and do the "heavy lifting"
> in RTL. Too bad, there is a lot of audio C code out there that could be
> adapted.
>
>> BTW, never use clock speed alone as a measure of performance. I can't
>> say if the openrisc processor is fast or not. I find it funny that you
>> would consider using the ZPU if you are looking for speed. I believe
>> the ZPU is the slowest processor I have ever seen.
>
> Right. The reason is simply that I want to run it synchronously with the
> DSP stuff at around 100 MHz (at least unless someone comes up with a better
> plan). That means, it will limit the maximum clock frequency of the whole
> design.
>
> Even if demoted to front panel controller, the ZPU would still be my choice
> over the 8051 simply because it's 32 bit (got the T-shirt for "8051 front
> panel control" in hand-crafted assembler, a long time ago...)

I wouldn't consider using an 8051 myself if there were good
alternatives. But I am in the stack processor crowd (which the ZPU is a
member of oddly enough) and am happy programming in Forth or something
like it. I like working close to the hardware and I find it very useful
to have a processor with all instructions 1 clock cycle long. The ZPU
would drive me batty and I would never want to program it in C.

--

Rick

rickman · Jul 28, 2014

On 7/27/2014 8:00 AM, mnentwig wrote:

> Right. The reason is simply that I want to run it synchronously with the
> DSP stuff at around 100 MHz (at least unless someone comes up with a better
> plan). That means, it will limit the maximum clock frequency of the whole
> design.
>
> Even if demoted to front panel controller, the ZPU would still be my choice
> over the 8051 simply because it's 32 bit (got the T-shirt for "8051 front
> panel control" in hand-crafted assembler, a long time ago...)

BTW, do you know about the ZPU mailing list?

zylin-zpu mailing list
zylin-zpu@zylin.com
http://zylin.com/mailman/listinfo/zylin-zpu_zylin.com

--

Rick

mnentwig · Jul 30, 2014

Yes, thanks. I registered once; maybe I'll have a look through the
archives.
