PHB FPGA question...

OK, pointy-haired boss question.

Given a ZYNQ 7020, speed grade 1. A 3.3 volt i/o bank gets a clock
from an LVDS input. We have a resync flop in an i/o cell, clocked by
this, with a D input from somewhere. Output is the strongest/fastest
3.3 volt option.

About what would be the typical prop delay from the clock to the
output pin?

Online search yields a lot of words and no numbers. Experts say useful
things like \"it depends.\" The tools apparently give a range of timings
over worst-case supply voltage, process, and temperature that vary by
about 4:1 with no typical.

Second question: has anyone ever pushed an FPGA core voltage up to get
more speed? In one little test I did, on an Artix 7, a simple case
changed chip prop delay by 1 ns, from about 8.5 to about 7.5 ns, with
a 70 mV core supply increase. That delay was essentially all
combinational.






--

John Larkin Highland Technology, Inc

The best designs are necessarily accidental.
 
On 27.12.20 at 21:50, jlarkin@highlandsniptechnology.com wrote:
OK, pointy-haired boss question.

Given a ZYNQ 7020, speed grade 1. A 3.3 volt i/o bank gets a clock
from an LVDS input. We have a resync flop in an i/o cell, clocked by
this, with a D input from somewhere. Output is the strongest/fastest
3.3 volt option.

Maybe LVDS is not the way to go. The LV means they want it somewhat
fast, but power still does matter. The game is different with LVPECL
or CML.


About what would be the typical prop delay from the clock to the
output pin?

Online search yields a lot of words and no numbers. Experts say useful
things like \"it depends.\" The tools apparently give a range of timings
over worst-case supply voltage, process, and temperature that vary by
about 4:1 with no typical.

Because it really depends. The clock could be routed in a thousand
different ways on the chip. It is usually best to use one of the
(typically four) global clock nets, but there may be local ones that
are placed nicely relative to your outputs.
And avoid tri-state buffers. They are sloooow, if only because of the
logic they contain.

I have not used Zynqs, mostly Virtexes.

The canonical way to get short clock-to-out delays is to use
a global clock net without any logic in front of it, and then write
a constraints file with the specs you need. Leave the work to the
router.
Don't be too greedy in the first round; you can put on the
thumbscrews later. First make sure that what you spec is
what you actually want. Sometimes it is not intuitive to specify that.

You can see in the static timing report where the picoseconds are
lost and where further effort is futile.
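
To make that concrete, here is a minimal Verilog sketch of the
structure Gerhard describes (my sketch, not his code), assuming the
Xilinx 7-series library primitives IBUFDS and BUFG; module and port
names are invented:

  // LVDS clock in, straight onto a global clock net, one resync flop
  // driving the output pin.
  module resync_out (
      input  wire clk_p, clk_n,   // LVDS clock pair
      input  wire d_in,           // data "from somewhere"
      output reg  q_out           // the fast 3.3 V output
  );
      wire clk_i, clk_g;

      IBUFDS u_ibuf (.I(clk_p), .IB(clk_n), .O(clk_i)); // differential input buffer
      BUFG   u_bufg (.I(clk_i), .O(clk_g));             // global clock net, no logic in front

      always @(posedge clk_g)
          q_out <= d_in;                                // the resync flop
  endmodule

The constraints file would then create a clock on clk_p and put an
output-delay (or max-delay) constraint on q_out, and the timing report
shows how the clock-to-pin budget is actually spent.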



Second question: has anyone ever pushed an FPGA core voltage up to get
more speed? In one little test I did, on an Artix 7, a simple case
changed chip prop delay by 1 ns, from about 8.5 to about 7.5 ns, with
a 70 mV core supply increase. That delay was essentially all
combinational.

Never tried that. But I have built pipelines 24 stages deep.
Do less combinatorial stuff in one stage and start early/parallel
enough.
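
As a toy sketch of that idea (my example, not Gerhard's): register
intermediate results so each stage does only a little combinational
work, at the cost of latency:

  module pipelined_sum (
      input  wire        clk,
      input  wire [15:0] a, b, c, d,
      output reg  [17:0] sum
  );
      reg [16:0] ab, cd;          // stage-1 partial sums

      always @(posedge clk) begin
          ab  <= a + b;           // stage 1: two short adds in parallel
          cd  <= c + d;
          sum <= ab + cd;         // stage 2: combine, one extra cycle of latency
      end
  endmodule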

I think that if pushing the core voltage were possible to do
reliably, Xilinx would spec it that way and ask for more money.

Gerhard
 
On Sun, 27 Dec 2020 23:12:15 +0100, Gerhard Hoffmann <dk4xp@arcor.de>
wrote:

On 27.12.20 at 21:50, jlarkin@highlandsniptechnology.com wrote:
OK, pointy-haired boss question.

Given a ZYNQ 7020, speed grade 1. A 3.3 volt i/o bank gets a clock
from an LVDS input. We have a resync flop in an i/o cell, clocked by
this, with a D input from somewhere. Output is the strongest/fastest
3.3 volt option.

Maybe LVDS is not the way to go. The LV means they want it somewhat
fast, but power still does matter. The game is different with LVPECL
or CML.

The clock into the FPGA is from a differential PECL comparator, so
using an LVDS input makes sense. The output will be 3.3 V CMOS.

Rumor has it that, in general, LVDS i/o is about a ns faster than
CMOS.


About what would be the typical prop delay from the clock to the
output pin?

Online search yields a lot of words and no numbers. Experts say useful
things like \"it depends.\" The tools apparently give a range of timings
over worst-case supply voltage, process, and temperature that vary by
about 4:1 with no typical.

Because it really depends.

That's the standard answer: it depends.

There are a zillion appnotes and class notes, none of which include
the word "nanosecond."





The clock could be routed in a thousand
different ways on the chip. It is usually best to use one of the
(typically four) global clock nets, but there may be local ones that
are placed nicely relative to your outputs.
And avoid tri-state buffers. They are sloooow, if only because of the
logic they contain.

I have not used Zynqs, mostly Virtexes.

The canonical way to get short clock-to-out delays is to use
a global clock net without any logic in front of it, and then write
a constraints file with the specs you need. Leave the work to the
router.
Don't be too greedy in the first round; you can put on the
thumbscrews later. First make sure that what you spec is
what you actually want. Sometimes it is not intuitive to specify that.

You can see in the static timing report where the picoseconds are
lost and where further effort is futile.



Second question: has anyone ever pushed an FPGA core voltage up to get
more speed? In one little test I did, on an Artix 7, a simple case
changed chip prop delay by 1 ns, from about 8.5 to about 7.5 ns, with
a 70 mV core supply increase. That delay was essentially all
combinational.

Never tried that. But I have built pipelines 24 stages deep.
Do less combinatorial stuff in one stage and start early/parallel
enough.

I want my output to transition immediately after the first clock edge.
Or maybe before.


--

John Larkin Highland Technology, Inc

The best designs are necessarily accidental.
 
On Sunday, December 27, 2020 at 1:51:05 PM UTC-7, jla...@highlandsniptechnology.com wrote:
OK, pointy-haired boss question.

Given a ZYNQ 7020, speed grade 1. A 3.3 volt i/o bank gets a clock
from an LVDS input. We have a resync flop in an i/o cell, clocked by
this, with a D input from somewhere. Output is the strongest/fastest
3.3 volt option.

About what would be the typical prop delay from the clock to the
output pin?

Online search yields a lot of words and no numbers. Experts say useful
things like \"it depends.\" The tools apparently give a range of timings
over worst-case supply voltage, process, and temperature that vary by
about 4:1 with no typical.

Second question: has anyone ever pushed an FPGA core voltage up to get
more speed? In one little test I did, on an Artix 7, a simple case
changed chip prop delay by 1 ns, from about 8.5 to about 7.5 ns, with
a 70 mV core supply increase. That delay was essentially all
combinational.






--

John Larkin Highland Technology, Inc

The best designs are necessarily accidental.

I'm not exactly sure what your goal is, but if you want to subtract
out the clock routing delay, use an MMCM so that the clock at the
flip-flop has nearly the same phase as the clock at the input pin. You
can also make sure that the flip-flop is an output flop packed in the
IOB, so that the flop-to-pin delay is short and more deterministic; a
directive in the HDL can force the flop into the IOB. Also set a
constraint for the maximum output delay; otherwise the tools assume
they have an entire clock period to get the signal from the flop to
the pin. And if the input is really asynchronous, you really ought to
use a 2-flop synchronizer.
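
A rough Verilog sketch of those suggestions (names and values are
invented; IOB and ASYNC_REG are standard Vivado synthesis attributes,
and the constraint in the comment is only an example):

  module iob_resync (
      input  wire clk,        // from the input buffer / MMCM
      input  wire d_async,    // asynchronous data input
      output wire q_out
  );
      (* ASYNC_REG = "TRUE" *) reg sync1, sync2; // 2-flop synchronizer
      (* IOB = "TRUE" *)       reg q_iob;        // ask the tools to pack this flop into the IOB

      always @(posedge clk) begin
          sync1 <= d_async;
          sync2 <= sync1;
          q_iob <= sync2;
      end

      assign q_out = q_iob;

      // Example XDC constraint (value made up) so the tools do not assume
      // they have a whole clock period from the flop to the pin:
      //   set_output_delay -clock ext_clk -max 3.0 [get_ports q_out]
  endmodule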

Some of the Xilinx parts have a separate column in the datasheet for a
lower core voltage, which saves power but degrades timing. I
definitely wouldn't try increasing the voltage beyond the spec. It
might work, for a while... The best way to get more speed would
probably be to ensure that the junction temperature stays low.
 