EDK : FSL macros defined by Xilinx are wrong

Jim Granville · Apr 21, 2006

Eric Smith wrote:

Mike Treseler <mike_treseler@comcast.net> writes:

Problem 2.

The average software designer couldn't describe
two gates and a flip-flop in VHDL or Verilog.

Problem 3.

The average software designer couldn't describe two gates
and a flip-flop in C (or any other programming language), but
would instead describe something that synthesizes to a large
collection of gates and flip-flops.

3b, Without realising it.

-jg

Brian Davis · Apr 21, 2006

john wrote:

It is for a bidirectional signal: input is registered into IOB, output is also registered there,
but the duplicated tristate_enable registers don't want to go inside the OLOGIC (Virtex 4).
Each of them is not that far, but not into the IOB!

Last time I tried this with XST 6.3 / Spartan-3, I had to try a few
coding variants before all the data registers and tristate controls
were properly stuffed into the IOBs from non-structural HDL code.

Below are some simplified (hand-edited, uncompiled!) code snippets
from a S3 eval kit RAM test that I posted last fall; for the whole
thing see:
ftp://members.aol.com/fpgastuff/ram_test.zip

Code Snippets:

<ports>

ram_addr : out std_logic_vector(17 downto 0);
ram_dat : inout std_logic_vector(15 downto 0);

<signals>

--
-- internal ram signals
--
signal addr : std_logic_vector(17 downto 0);
signal din : std_logic_vector(15 downto 0);
signal ram_dat_reg : std_logic_vector(15 downto 0);

signal wdat_oe_l : std_logic;

--
-- IOB attribute needed to replicate tristate enable FFs in each IOB
--
attribute iob of wdat_oe_l : signal is "true";

<code>

--
-- output data bus tristate
--
-- XST seems to want tristates coded like this to push both
-- the tristate control register and the data register into IOB
-- ( had previously been coded as clocked tristate assignment )
--
ram_dat <= ram_dat_reg when wdat_oe_l = '0' else ( others => 'Z' );

--
-- registered RAM I/O
--
process(clk)
begin

  if rising_edge(clk) then

    --
    -- IOB registers
    --
    ram_dat_reg <= tdat(15 downto 0);
    ram_addr    <= taddr;

    --
    -- registered tristate control signal
    -- coded this way, with IOB attribute on wdat_oe_l, so
    -- XST will replicate tristate control and push into IOBs
    --
    if (done_p1 = '0') and (read_write_p1 = '0') then
      wdat_oe_l <= '0';
    else
      wdat_oe_l <= '1';
    end if;

    --
    -- register input data
    --
    din <= ram_dat;

  end if;

end process;

Apr 21, 2006

3b, Without realising it.

The interesting point in this process is that the tools are evolving
to hide design issues that are seldom a worry for typical cases like
reconfigurable computing on FPGA compute engines.

Every programmer decides how big each and every variable should be.
For a machine that has a few gigabytes of memory and a 64-bit native
word size, using a 64-bit variable may be either free, or faster, as it
may not take an extra step to sign-extend the value on register load.

When programmers move to smaller processors, they quickly learn that
when programming a PIC micro, 64-bit word sizes just don't work well.

When programmers encounter FPGA compute engines, the same processes
quickly come into play, and a short mentoring of the newbies to size
variables by the bit, or to be careful and use char, int, long, and
long long properly, isn't that difficult, or even unexpected.
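
A rough sketch of what sizing variables by the bit can look like in plain C. The helper name bits_needed and the struct small_counter are illustrative assumptions, not anything from FpgaC or TMCC; standard C only allows bit widths on struct members, so the 4-bit value lives in a struct here.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest number of bits that can hold every value in 0..max_value. */
unsigned bits_needed(uint64_t max_value)
{
    unsigned bits = 1;              /* even the value 0 occupies one bit */
    while (max_value >>= 1)
        bits++;
    return bits;
}

/* A counter that only ever holds 0..15, sized by the bit. */
struct small_counter {
    unsigned int count : 4;
};

/* Increment and return the new value; a 4-bit unsigned field wraps 15 -> 0. */
unsigned bump(struct small_counter *c)
{
    c->count = c->count + 1u;
    return c->count;
}

/* Host-side check of the two helpers above. */
void sizing_self_check(void)
{
    struct small_counter c = { 15 };
    assert(bits_needed(15) == 4);
    assert(bits_needed(16) == 5);
    assert(bump(&c) == 0);
}
```

On a desktop compiler the 4-bit field still occupies a full storage unit; the width only starts to pay off when the target maps each bit to real hardware.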

If the FPGA is a single toy-sized FPGA, it's no different than
programming a PIC micro: resources are tight, and the programmer will
adapt.

If the FPGA system is 4,096 tightly interconnected XC4VLX200s and the
application isn't particularly large, I suspect the programmer writing
applications for this FPGA-based supercomputer will not have to worry
about fit. If they are fine-tuning the bread-and-butter simulations at
places like Sandia Labs, I suspect the programmers will have more than
enough experience and skill to size variables properly and be very much
in tune with space-time tradeoffs for applications far more complex
than even a typical programmer would consider.

It's reconfigurable computing projects where libraries of designs
become very useful, particularly for SoC designs, which used to be an
EE design task and are rapidly becoming mainstream, so that software
engineers are the most likely target as the market continues to mature
and expand.

There will be some dinos that stand in the tar pits admiring the bits
as the sun sets on that segment of their employment history.

Eric Smith · Apr 21, 2006

air_bits@yahoo.com writes:

There is a small setup overhead for the main, but for example
this certainly does NOT synthesize "to a large collection of
gates and flip-flops" as you so errantly assert cluelessly:

main()
{
    int a:1,b:1,c:1,d:1;
    #pragma inputport (a);
    #pragma inputport (b);
    #pragma inputport (c);
    #pragma inputport (d);

    int sum_of_products:1;
    #pragma outputport (sum_of_products);

    while(1) {
        sum_of_products = (a&b) | (c&d);
    }
}

Why should a C programmer expect that to synthesize any flip-flops
at all? It looks purely combinatorial.

How would you write it if you did NOT want a flip-flop, but only
a combinatorial output?

Anyhow, I wasn't suggesting that the language couldn't represent a few
gates and a flip-flop. My point is that C programmers don't think
in those terms, so anything they write is likely to result in really
inefficient hardware designs.

For example, typical C code for a discrete cosine transform can be found
here:

http://www.bath.ac.uk/elec-eng/pages/sipg/resource/c/fastdct.c

But I suspect that code will synthesize to something at least an
order of magnitude larger and an order of magnitude slower than a
typical HDL implementation.

That doesn't mean that you couldn't write a DCT in C that would
synthesize to something efficient; it just means that a normal C
programmer *wouldn't* do that. You'd have to train the C programmer
to be hardware designer first, and by the time you've done that
there's little point to using C as the HDL, since the whole point
of using C as an HDL was to take advantage of the near-infinite
pool of C programmers.

Eric

Eric Smith · Apr 21, 2006

I wrote:

Problem 3.
The average software designer couldn't describe two gates
and a flip-flop in C (or any other programming language), but
would instead describe something that synthesizes to a large
collection of gates and flip-flops.

Jim Granville <no.spam@designtools.co.nz> writes:

3b, Without realising it.

Exactly so. It's perhaps less commonly seen in C, since C *only* has
low-level constructs, but the vast majority of C++ and Java programmers
seem to have no conception of what the compiler is likely to emit for
the programming constructs they use.

A former coworker once tried to write C++ code to talk to Dallas one-wire
devices. He spent days trying to debug it before someone took pity on
him and pointed out that by the time the constructor for one of his
objects executed the entire transaction had timed out.

Eric

Apr 21, 2006

Actually you are quite wrong on this point. Programmers write tight
and efficient code for embedded micro applications all the time.

Exactly the same processes come into play as we mentor programmers
on projects that use an FPGA as the compute engine.

Good tools, reasonable training/mentoring, and design standards avoid
gross mistakes when they matter.

When they don't, who cares?

Apr 21, 2006

So what's the point, other than a gross management failure to properly
train and mentor the responsible engineers .... maybe so they can
beat their chests?

I've seen similar failures with EEs who thought they could write
programs just because they had learned to code in C, or Fortran, or
Basic, and then fell flat on their faces trying to implement trivial
algorithms that any third-year Computer Science student could do in
their sleep.

I've seen both EE and CS students fresh out of school lock up when
faced with their first real development job, simply because they
lacked the confidence and experience (and mentoring management) to
learn their first job.

So, what's your point really? .... that FPGAs are only for REAL
EXPERIENCED EEs?

sorry ...

Eric Smith · Apr 21, 2006

I wrote:

For example, typical C code for a discrete cosine transform can be found
here:

http://www.bath.ac.uk/elec-eng/pages/sipg/resource/c/fastdct.c

But I suspect that code will synthesize to something at least an
order of magnitude larger and an order of magnitude slower than a
typical HDL implementation.

To be fair, I wasn't trying to compare floating point C code to integer-only
HDL code. So I would make the same claim for a typical C DCT implementation
that does NOT use floating point:

http://www.brouhaha.com/~eric/software/jpeg-6b/jfdctint.c

Eric Smith · Apr 21, 2006

air_bits@yahoo.com writes:

Actually you are quite wrong on this point. Programmers write tight
and efficient code for embedded micro applications all the time.

Certainly. That's what I do for a living. But the tight and efficient
code I write for execution on a microcontroller is not likely to
synthesize to an efficient FPGA implementation, because that wasn't
what I was targeting. When I design hardware, I use very different
optimization criteria.

Jim Granville · Apr 21, 2006

air_bits@yahoo.com wrote:

There is a small setup overhead for the main, but for example
this certainly does NOT synthesize "to a large collection of
gates and flip-flops" as you so errantly assert cluelessly:

main()
{
    int a:1,b:1,c:1,d:1;
    #pragma inputport (a);
    #pragma inputport (b);
    #pragma inputport (c);
    #pragma inputport (d);

    int sum_of_products:1;
    #pragma outputport (sum_of_products);

    while(1) {
        sum_of_products = (a&b) | (c&d);
    }
}
snip XNF

That's a good example; can you add to this, code for a HC161/HC163,
which is a 4 bit binary UP counter, with Sync preload, and Async/Sync
reset ?

Are there other output choices, besides XNF ?

How do you verify operation ?

-jg
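
One way to pin down the behaviour being asked for, before worrying about any particular C-to-gates tool, is a plain C reference model of the counter that runs on a host. The sketch below assumes the usual '163-style priority (synchronous reset over synchronous preload over count) and adds a separate asynchronous reset in the style of the '161. The real parts use active-low reset and load pins and two count-enable inputs; this model uses active-high controls and a single enable for brevity. All type and function names are made up for illustration, and this is standard C, not FpgaC input.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* State of a 4-bit binary up counter (hypothetical model, not tool input). */
typedef struct {
    uint8_t q;                      /* only the low 4 bits are used */
} counter4;

/* Asynchronous reset ('161 style): takes effect without a clock edge. */
void counter4_async_reset(counter4 *c)
{
    c->q = 0;
}

/* One rising clock edge ('163 style), all control inputs active high here.
 * Priority: synchronous reset, then synchronous preload, then count. */
void counter4_clock(counter4 *c, bool sync_reset, bool load,
                    uint8_t preload, bool enable)
{
    if (sync_reset)
        c->q = 0;
    else if (load)
        c->q = preload & 0x0F;
    else if (enable)
        c->q = (uint8_t)((c->q + 1) & 0x0F);
}

/* Terminal count: high while the counter sits at 15 and counting is enabled. */
bool counter4_tc(const counter4 *c, bool enable)
{
    return enable && c->q == 0x0F;
}

/* Host-side check of one sequence: preload 14, count to 15, wrap to 0. */
void counter4_self_check(void)
{
    counter4 c;
    counter4_async_reset(&c);
    counter4_clock(&c, false, true, 14, true);   /* preload wins over count */
    assert(c.q == 14);
    counter4_clock(&c, false, false, 0, true);
    assert(c.q == 15 && counter4_tc(&c, true));
    counter4_clock(&c, false, false, 0, true);   /* wraps to zero */
    assert(c.q == 0);
}
```

A model like this also gives one possible answer to the verification question: the same stimulus sequence can be replayed against whatever netlist the tool emits and the outputs compared.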

Apr 21, 2006

Eric Smith writes:

How would you write it if you did NOT want a flip-flop, but only
a combinatorial output?

Depends on the tool. TMCC/FpgaC registers the state of every
variable by default, and it takes a minor edit of the output netlist
to remove the register. Or since it's open source, add a pragma
option for output port to skip the register and customize the compiler
for the project if there were more than a few that a script would
easily fix.

Ray Andraka · Apr 21, 2006

For what it is worth, I've seen some pretty dismal FPGA designs come out
of people who are supposedly digital designers too. It generally does
take someone seasoned to turn out an FPGA design that uses the resources
somewhat efficiently, that is going to be reliable, and that isn't going
to spend a large part of the time to market in some lab chasing down
countless naive design errors.

--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930 Fax 401/884-7950
email ray@andraka.com
http://www.andraka.com

"They that give up essential liberty to obtain a little
temporary safety deserve neither liberty nor safety."
-Benjamin Franklin, 1759

Eric Smith · Apr 21, 2006

air_bits@yahoo.com writes:

Depends on the tool. TMCC/FpgaC registers the state of every
variable by default, and it takes a minor edit of the output netlist
to remove the register. Or since it's open source, add a pragma
option for output port to skip the register and customize the compiler
for the project if there were more than a few that a script would
easily fix.

How many real-world designs have NO combinatorial outputs?

How does TMCC/FpgaC know *which* clock to use to register a given
variable?

John Adair · Apr 21, 2006

I'll add to that and say that the USB controller in the PC is probably
sitting on a PCI bus, or is at least affected by the traffic on it. It is
also worth saying that the fastest USB (2.0 high speed) is 480 MBit/s
before overheads, and 32bit/33MHz PCI is 1056 MBit/s before overheads.
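
The raw figures are easy to sanity-check. A small sketch follows; the function names are illustrative, and these are signalling rates before any protocol overhead, so real throughput will be lower.

```c
#include <assert.h>

/* Raw parallel-bus rate in Mbit/s: width in bits times clock in MHz. */
unsigned bus_mbit_per_s(unsigned width_bits, unsigned clock_mhz)
{
    return width_bits * clock_mhz;
}

/* Convert Mbit/s to MByte/s (8 bits per byte, overheads ignored). */
unsigned mbit_to_mbyte(unsigned mbit_per_s)
{
    return mbit_per_s / 8;
}

/* Check the numbers quoted in this post. */
void bandwidth_self_check(void)
{
    assert(bus_mbit_per_s(32, 33) == 1056);  /* 32-bit/33 MHz PCI */
    assert(mbit_to_mbyte(1056) == 132);      /* PCI, raw */
    assert(mbit_to_mbyte(480) == 60);        /* USB 2.0 high speed, raw */
}
```

The rates people report actually achieving in this thread (about 80MB/s for bus-mastering PCI, 30MB/sec for USB 2.0 through an FX2) sit well below both raw ceilings.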

John Adair
Enterpoint Ltd. - Home of Broaddown2. The Ultimate Spartan3 Development
Board.
http://www.enterpoint.co.uk

"Nial Stewart" <nial@nialstewartdevelopments.co.uk> wrote in message
news:43746c57$0$355$da0feed9@news.zen.co.uk...

Hi
Does anyone have some advice for the fastest way to get many MBytes of
data from a Spartan3 fifo to the hard disk of a PC via usb. I assume
that it is a combination of the best USB interface next to the FPGA and
perhaps a USB chipset in the PC that can do some very clever DMA.
I don't want to mess with custom RAID stuff I just want to dump it to a
standard hard disk & controller.
Any pointers appreciated.
Colin

Colin,

I think a PCI interface capable of bus mastering will still be faster than
a USB2.0 interface.

The PCI performance is slightly non-deterministic, as it depends on what
else is on your bus, but from what I've read here over the years I think
80MB/s is a reasonable expectation.

I half remember reading that although USB 2.0 gives you 460(?) Mb/s,
this doesn't directly translate to ~460/8 MB/s of data throughput
(i.e. 60MB/s).

Nial.

-------------------------------------------------------------
Nial Stewart Developments Ltd
FPGA and High Speed Digital Design
www.nialstewartdevelopments.co.uk

Kryten · Apr 21, 2006

"Ray Andraka" <ray@andraka.com> wrote in message
news:F0Scf.2230$Mi5.2093@dukeread07...

This is true provided you access every single row,
well at least every row you have data in, within
the refresh time. This can be used to advantage in video frame buffers,
for example as long as the frame time does not exceed the refresh time.
So yes, it can be useful. It doesn't save a lot of memory bandwidth or
time, but it can substantially simplify the DRAM controller in your
design.

Good point.

The BBC micro interleaved access with the 6845 CRTC, and the regular
sequential video accesses did the refresh.

johnp · Apr 21, 2006

We recently implemented a design that moved data to a PC using USB 2.0.

We used a Spartan3 coupled to a Cypress FX2 part and were able to
achieve 30MB/sec transfer rates. It took careful driver development on
the PC to get the performance level up high enough.

It was a fun project, but the Windows driver development took a lot
longer than expected - it was tricky to get the performance up to where
we needed it.

John Providenza

Mike Treseler · Apr 21, 2006

Martin Thompson wrote:

Personally I'd love to get
away from writing low-level VHDL for *some* of the things I do, but I
wouldn't fancy writing an SDRAM controller in a C-like HDL either.

There is a middle ground already available
using standard vhdl synthesis that
doesn't cost a dime or a gate or a flop.

I use single process entities
with no signal declarations.
I declare process variables for every
local register and output port.
I use functions to create values
and procedures to collect and name
all repeated command sequences.

I use exactly this template of
three top procedures in every
new design entity:

begin -- process template
   if reset = '1' then
      init_all_regs;        -- no port init required
   elsif rising_edge(clock) then
      update_process_regs;
   end if;
   update_output_ports;     -- wires ok for reset or clk
end process template;

This style feels somewhat C-like but the
template keeps me in "think hardware" mode.

-- Mike Treseler

Apr 21, 2006

Mike Treseler wrote:

This style feels somewhat C-like but the
template keeps me in "think hardware" mode.

Yep. Then the big difference starts, debugging in "think hardware"
mode, or debugging in "think software" mode using source level
debuggers with break points, single stepping and variable watching
at a high level.

Apr 21, 2006

Jim Granville wrote:

It would require more of the designer, but what about a pragma like
the ASM one, in many C's ? - it would accept VHDL (or verilog), and
pass on to the downstream tools.
The advantage is it would understand the variable names, and scopes, of
the other source ( much like in line asm does now ? ).
If a LOT of HDL code was needed, then separate code modules would be
better.

That's a hard one, and I've considered it several times, then backed
off to using perl post-processing of the XNF.

My current feeling is that it's probably wrong to do this in FpgaC,
except maybe for the XNF outputs. The real problem here is that any
feature such as an ASM probably needs to be reasonably representable in
any of the output forms for the netlist. As we move from XNF to EDIF
netlists, it gets even more difficult, because small changes need to be
reflected into several different portions of the EDIF output, whereas
XNF at least is pretty linear in that respect, just as machine language
statements would be into an assembler. To be truthful, to make ASM
work, the logic optimizations would have to be turned off, and the
resulting netlist would get pretty bloated. There would also have to be
some specific hacks to make the internal symbols visible, since there
are no architecturally specified resources like machine registers that
ASM would target in a traditional ISA machine.

The last argument against it is that it breaks the reason for making
FpgaC as close as possible to std C on a CPU/memory: the code is no
longer directly testable on a traditional programming platform. So from
this perspective it's better to push things that don't directly fit the
std C model into another HDL (Celoxica-C, VHDL, Verilog) and debug
those parts separately using hardware simulators and hardware
debuggers, and protect the ability to debug the rest of the C HLL
system with stubs for the hardware interfaces and use more efficient
HLL debugging tools.
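
As a rough illustration of that workflow, here is the sum-of-products example from earlier in the thread recast as standard C so it can be exercised with a host compiler. The FpgaC port pragmas are replaced by plain function arguments, which stand in for the hardware interface stubs; the function names are made up for this sketch and nothing here is FpgaC-specific.

```c
#include <assert.h>

/* Same logic as the FpgaC example: sum_of_products = (a&b) | (c&d).
 * Inputs are treated as single bits; the ports become plain arguments. */
unsigned sum_of_products(unsigned a, unsigned b, unsigned c, unsigned d)
{
    return ((a & b) | (c & d)) & 1u;
}

/* Exhaustively compare against an independently written reference
 * expression over all 16 input combinations. */
void check_sum_of_products(void)
{
    for (unsigned v = 0; v < 16; v++) {
        unsigned a = (v >> 3) & 1u, b = (v >> 2) & 1u;
        unsigned c = (v >> 1) & 1u, d = v & 1u;
        unsigned expected = (a && b) || (c && d);
        assert(sum_of_products(a, b, c, d) == expected);
    }
}
```

With only four one-bit inputs the check can be exhaustive; for wider designs the same stub approach would have to fall back on directed or random stimulus.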

It would not surprise me to see one or more people fork FpgaC to become
an HDL for specific projects, or even take TMCC, which was intended as
an HDL, and similarly extend it. If it makes sense at some point, maybe
the FpgaC team will go ahead and do it, creating a second FpgaC_HDL
tool chain.

There is likely to always be this dual-headed monster: is FpgaC an
HLL for reconfigurable computing, or an HDL for hardware design on
FPGAs? My perspective is that doing the HDL side right requires a
lot of optimizations for target FPGAs that are not particularly
general. I believe it is possible to do FpgaC for reconfigurable
computing uses that is a lot more general, less tied to target FPGAs,
and this is the use that is not well supported by other HLL/HDL tools,
particularly when you consider test and debugging and the relatively
horrible mess of using gate/timing simulators to debug HLL algorithms
and multiple threaded applications with communications.

Noway2 · Apr 21, 2006

Can you get some testing documentation from the battery company that
certifies that it meets the requirements?

I know that FM likes to do their own testing, but they may be willing
to accept data that proves that it meets their requirements, especially
if fault testing is hampered by a safety feature of the device and to
bypass it would be dangerous.
