pipelined divider

ykagarwal

Guest
I would like to know which is the best algorithm for making
a pipelined divider in hardware: Newton-Raphson,
Goldschmidt, or SRT (is that possible?),
given that I have enough space for as many as 5 radix-4
SRT dividers in a Xilinx Virtex-II FPGA.

thanks in advance--
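
On the "is it possible?" part: a digit-recurrence divider can be pipelined by unrolling it, one stage per quotient digit, with a register between stages. The sketch below is a behavioural model of that idea only. It uses plain radix-2 restoring division rather than SRT (no redundant digit set, no quotient-selection table), and WIDTH, StageReg, stage and run_pipeline are names made up for this illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

WIDTH = 8  # quotient bits; one pipeline stage per bit (illustrative value)

@dataclass
class StageReg:
    rem: int       # partial remainder so far
    quo: int       # quotient bits produced so far
    dividend: int  # full dividend, carried along the pipe
    divisor: int   # divisor, carried along the pipe

def stage(r: StageReg, bit_index: int) -> StageReg:
    """One restoring step: shift in the next dividend bit, then trial-subtract."""
    rem = (r.rem << 1) | ((r.dividend >> bit_index) & 1)
    if rem >= r.divisor:
        return StageReg(rem - r.divisor, (r.quo << 1) | 1, r.dividend, r.divisor)
    return StageReg(rem, r.quo << 1, r.dividend, r.divisor)

def run_pipeline(pairs: List[Tuple[int, int]]) -> List[Tuple[int, int, int]]:
    """Accept one (dividend, divisor) pair per clock; return (cycle, quotient, remainder)."""
    regs: List[Optional[StageReg]] = [None] * WIDTH
    results = []
    # Trailing None entries are bubbles that flush the last operands out of the pipe.
    stream = [StageReg(0, 0, n, d) for n, d in pairs] + [None] * (WIDTH - 1)
    for cycle, new_in in enumerate(stream):
        prev = [new_in] + regs[:-1]  # what each stage sees at this clock edge
        regs = [stage(p, WIDTH - 1 - i) if p is not None else None
                for i, p in enumerate(prev)]
        if regs[-1] is not None:
            results.append((cycle, regs[-1].quo, regs[-1].rem))
    return results

if __name__ == "__main__":
    for cycle, q, r in run_pipeline([(200, 7), (255, 16), (9, 3)]):
        print(cycle, q, r)
```

The model assumes unsigned dividends below 2**WIDTH and nonzero divisors. After the initial latency of WIDTH clocks a new result comes out every clock, which is the property a pipelined divider is after; a radix-4 SRT version would retire two quotient bits per stage and so need about half as many stages.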
 
"ykagarwal" <yog_aga@yahoo.co.in> wrote in message
news:4d05e2c6.0309090919.261490a1@posting.google.com...
would like to know which is the best algorithm to
make a pipelined divider in hardware. newton raphson,
goldshmit .. srt(is it possible?)
if i have space as much as to have as much as 5 radix-4
srt dividers in a xilinx v2 fpga..
Pipelined dividers have been used on machines like the IBM 360/91 and the
Cray-1, and have been well described in pipelined computer architecture
books for many years since those machines were built.

Though in both cases they are used for floating point, where the
requirements are different. The 360/91, for example, rounds the low bit
instead of truncating, which is what the architecture specifies and what
would be usual in fixed point. I don't know how hard that would be to change.

-- glen
 
"Glen Herrmannsfeldt" <gah@ugcs.caltech.edu> wrote in message news:<F7q7b.408266$uu5.74285@sccrnsc04>...
"ykagarwal" <yog_aga@yahoo.co.in> wrote in message
news:4d05e2c6.0309090919.261490a1@posting.google.com...
would like to know which is the best algorithm to
make a pipelined divider in hardware. newton raphson,
goldshmit .. srt(is it possible?)
if i have space as much as to have as much as 5 radix-4
srt dividers in a xilinx v2 fpga..

Pipelined dividers have been used on machines like the IBM 360/91 and the
Cray-1, and are well described in pipelined computer architecture books for
many years after those machines were built.

Though in both cases they are used for floating point, where the
requirements are different. The 360/91, for example, rounds the low bit
instead of truncating as the architecture specifies, and would be usual in
fixed point. I don't know how hard that would be to change.

-- glen
Well, my requirement is also for double precision. Could you
suggest a pipelined computer architecture book for this purpose?
Anyway, what the best approach is, that's
what I want to explore first.

The Xilinx coregen divider core doesn't offer that much width in its
pipelined divider; I don't know why.
Maybe the Xilinx gurus can explain. Does anybody know which algorithm
they are using?

regards
--yka
 
Look up online arithmetic.

Steve

 
"ykagarwal" <yog_aga@yahoo.co.in> wrote in message
news:4d05e2c6.0309092246.2ead33f0@posting.google.com...

(snip regarding pipelined divider)

well my requirement is too for double precision .. would u like to
suggest me a pipelined
comp arch book for this purpose.. anyway what is the best way, that's
what i want to explore first.
The one I have here is "The Architecture of Pipelined Computers" by Kogge.

Xilinx coregen divider core doesn't offer that much width in its
pipelined divider .. don't know why
may be xilinx gurus can justify .. anybody knows which algorithm they
are using ?
I don't know that, either. It might be because they didn't imagine anyone
wanting to put something like that into an FPGA. They are likely pretty
big, but in some cases it might be worth the size.

-- glen
 
Ray Andraka <ray@andraka.com> wrote in message news:<3F6B8B64.F3EBCA2B@andraka.com>...
Depends on how clever the designer is. I'd wager that better than 95% of the
hardware engineers today couldn't design the 360/91 from scratch with 10 times
the logic resources of the original.

Glen Herrmannsfeldt wrote:


I do wonder how many Virtex devices it would take to implement a 360/91.

-- glen

--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930 Fax 401/884-7950
email ray@andraka.com
http://www.andraka.com

"They that give up essential liberty to obtain a little
temporary safety deserve neither liberty nor safety."
-Benjamin Franklin, 1759
hello,

The 360/91 machine and its associated history are really an inspiration to
younger designers like me,
and your comments too :)


unnecessarily jumped
--yka
 
Changing times... Logic resources are cheap compared to a designer's
time (and time to market considerations). Same argument can be made
with software. How many current software engineers could write a full
game (or complete programming language) that fits on an 8kbyte
cartridge?

It's certainly an interesting question.

Jake


 
But my PC, which runs the latest version of the CAD tools I bought 10 years
ago, has the power dissipation of a small heater, and the software runs
slower. Good thing I don't live in California, where there's not enough
power :)

Simon
 
passing thought ~~~

there exists one ultimate natural machine,
design of which can't even be copied :)

philosophy is junk, isn't it.
--yka
 
Not yet, anyway.

ykagarwal wrote:

there exists one ultimate natural machine,
design of which can't even be copied :)
--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930 Fax 401/884-7950
email ray@andraka.com
http://www.andraka.com

"They that give up essential liberty to obtain a little
temporary safety deserve neither liberty nor safety."
-Benjamin Franklin, 1759
 
"Glen Herrmannsfeldt" <gah@ugcs.caltech.edu> wrote in message news:<HHM7b.410438$Ho3.64641@sccrnsc03>...
"ykagarwal" <yog_aga@yahoo.co.in> wrote in message
news:4d05e2c6.0309092246.2ead33f0@posting.google.com...

(snip regarding pipelined divider)

well my requirement is too for double precision .. would u like to
suggest me a pipelined
comp arch book for this purpose.. anyway what is the best way, that's
what i want to explore first.

The one I have here is "The Architecture of Pipelined Computers" by Kogge.

Xilinx coregen divider core doesn't offer that much width in its
pipelined divider .. don't know why
may be xilinx gurus can justify .. anybody knows which algorithm they
are using ?

I don't know that, either. It might be because they didn't imagine anyone
wanting to put something like that into an FPGA. They are likely pretty
big, but in some cases it might be worth the size.

-- glen
Fine, thanks. I could find the book here (probably a rather old edition),
but it has no detail about pipelined dividers as such.
If somebody comes across something, please suggest it.
And Xilinx should probably provide at least a sequential version for
larger widths
(I've made one anyway).

--yka
 
Check these IEEE references:

Efficient designs of unified 2's complement division and square root
algorithm and architecture
Sau-Gee Chen; Chieh-Chih Li;
TENCON '94. IEEE Region 10's Ninth Annual International Conference.
Theme: 'Frontiers of Computer Technology'. Proceedings of 1994 , 22-26
Aug. 1994
Page(s): 943 -947 vol.2

A new pipelined divider with a small lookup table
Jong-Chul Jeong; Woong Jeong; Hyun-Jae Woo; Seung-Ho Kwak; Woo-Chan
Park; Moon-Key Lee; Tak-don Han;
ASIC, 2002. Proceedings. 2002 IEEE Asia-Pacific Conference on , 6-8
Aug. 2002
Page(s): 33 -36


Efficient semisystolic architectures for finite-field arithmetic
Jain, S.K.; Song, L.; Parhi, K.K.;
Very Large Scale Integration (VLSI) Systems, IEEE Transactions on ,
Volume: 6 Issue: 1 , March 1998
Page(s): 101 -113
 
"ykagarwal" <yog_aga@yahoo.co.in> wrote in message
news:4d05e2c6.0309110200.71793e02@posting.google.com...

(snip)

fine, thanks i cud find the book (bit old edition probably)
here but there is no detail abt pipelined divider as such ..
anyway if somebody comes across the thing may suggest.
and xilinx probably shud give a sequential version at least for
larger width
(i've made it anyway)
The references for the 360/91 are in the IBM Journal of Research and
Development, I believe Vol. 11,
January 1967.

-- glen
 
soar2morrow@yahoo.com (Tom Seim) wrote in message news:<6c71b322.0309111000.5458aeee@posting.google.com>...
Check these IEEE references:

Efficient designs of unified 2's complement division and square root
algorithm and architecture
Sau-Gee Chen; Chieh-Chih Li;
TENCON '94. IEEE Region 10's Ninth Annual International Conference.
Theme: 'Frontiers of Computer Technology'. Proceedings of 1994 , 22-26
Aug. 1994
Page(s): 943 -947 vol.2

A new pipelined divider with a small lookup table
Jong-Chul Jeong; Woong Jeong; Hyun-Jae Woo; Seung-Ho Kwak; Woo-Chan
Park; Moon-Key Lee; Tak-don Han;
ASIC, 2002. Proceedings. 2002 IEEE Asia-Pacific Conference on , 6-8
Aug. 2002
Page(s): 33 -36


Efficient semisystolic architectures for finite-field arithmetic
Jain, S.K.; Song, L.; Parhi, K.K.;
Very Large Scale Integration (VLSI) Systems, IEEE Transactions on ,
Volume: 6 Issue: 1 , March 1998
Page(s): 101 -113
Thanks for the pointers; I have found some of them. I'm looking into
NR and its variants, and whether it's possible to fit it into some 3000 slices
in a Virtex-II. Maybe I'll have to increase the number of iterations per division step.
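
For reference, the basic Newton-Raphson reciprocal iteration is x <- x * (2 - d * x), which squares the error term e = 1 - d*x on every step, so the number of correct bits roughly doubles each time. The sketch below models this with exact rationals. The function names, the 8-bit table size and the midpoint-based table entries are assumptions made for illustration, not anything from the thread or from a Xilinx core, and a real datapath would also have to budget for rounding in each multiplier.

```python
from fractions import Fraction

def seed_reciprocal(d: Fraction, lut_bits: int = 8) -> Fraction:
    """Table-lookup stand-in: approximate 1/d for d in [1, 2) from its top fraction bits."""
    idx = int((d - 1) * 2**lut_bits)                     # which of 2**lut_bits intervals d is in
    mid = 1 + Fraction(2 * idx + 1, 2**(lut_bits + 1))   # midpoint of that interval
    scale = 2**(lut_bits + 2)                            # table entries quantised to lut_bits+2 bits
    return Fraction(round((1 / mid) * scale), scale)

def nr_reciprocal(d: Fraction, iterations: int, lut_bits: int = 8) -> Fraction:
    """Refine the seed with x <- x * (2 - d * x); each pass is two dependent multiplies."""
    x = seed_reciprocal(d, lut_bits)
    for _ in range(iterations):
        x = x * (2 - d * x)
    return x

if __name__ == "__main__":
    d = Fraction(3, 2)
    for n in range(4):
        err = abs(1 - d * nr_reciprocal(d, n))
        print(n, float(err))
```

With a seed good to about 8 bits, three iterations take the relative error below 2**-53 in this idealised model. A larger table trades block RAM for fewer multiplier stages, which is the trade-off to weigh against the slice budget.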
 
"ykagarwal" <yog_aga@yahoo.co.in> wrote in message
news:4d05e2c6.0309112255.3dbc30e4@posting.google.com...
soar2morrow@yahoo.com (Tom Seim) wrote in message
news:<6c71b322.0309111000.5458aeee@posting.google.com>...
Check these IEEE references:

Efficient designs of unified 2's complement division and square root
algorithm and architecture
Sau-Gee Chen; Chieh-Chih Li;
TENCON '94. IEEE Region 10's Ninth Annual International Conference.
Theme: 'Frontiers of Computer Technology'. Proceedings of 1994 , 22-26
Aug. 1994
Page(s): 943 -947 vol.2

A new pipelined divider with a small lookup table
Jong-Chul Jeong; Woong Jeong; Hyun-Jae Woo; Seung-Ho Kwak; Woo-Chan
Park; Moon-Key Lee; Tak-don Han;
ASIC, 2002. Proceedings. 2002 IEEE Asia-Pacific Conference on , 6-8
Aug. 2002
Page(s): 33 -36


Efficient semisystolic architectures for finite-field arithmetic
Jain, S.K.; Song, L.; Parhi, K.K.;
Very Large Scale Integration (VLSI) Systems, IEEE Transactions on ,
Volume: 6 Issue: 1 , March 1998
Page(s): 101 -113

thanks for the pointers .. i have found some of them. looking into the
NR and its variants .. whether it's possible to fit it into some 3000
slices
in virtex-ii .. may be i'll have to increase no of iteration per div step
...

The 360/91 was built from transistors glued onto ceramic substrates, and
wired together. It did double precision floating point divide in 18 clock
cycles, though. I think it is three clock cycles per iteration, so six
iterations.

I do wonder how many Virtex devices it would take to implement a 360/91.

-- glen
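
As far as I know, the 360/91 divider used Goldschmidt's multiplicative scheme: numerator and denominator are both multiplied by the same factor F = 2 - D on each iteration, so the denominator converges to 1 and the numerator converges to the quotient. A minimal sketch with exact rationals follows. The function name, the 8-bit seed and the way the seed is formed are assumptions for illustration, and the sketch makes no attempt to reproduce the 360/91's actual table size or cycle counts.

```python
from fractions import Fraction

def goldschmidt_divide(n: Fraction, d: Fraction, iterations: int, seed_bits: int = 8):
    """Approximate n/d for d in [1, 2); returns (numerator, denominator) after iterating."""
    # Seed factor: a rounded reciprocal of d truncated to seed_bits fraction bits,
    # standing in for a small ROM lookup.
    d_trunc = Fraction(int(d * 2**seed_bits), 2**seed_bits)
    f = Fraction(round(2**seed_bits / d_trunc), 2**seed_bits)
    num, den = n * f, d * f
    for _ in range(iterations):
        f = 2 - den                      # in hardware, roughly a complement of den
        num, den = num * f, den * f      # the two multiplies do not depend on each other
    return num, den

if __name__ == "__main__":
    n, d = Fraction(7, 4), Fraction(3, 2)
    for k in range(4):
        num, den = goldschmidt_divide(n, d, k)
        print(k, float(num), float(1 - den))
```

Because the two multiplies in each iteration are independent, they can share one pipelined multiplier back to back or run in parallel, whereas the two multiplies of a Newton-Raphson step have to run in sequence. The price is that rounding errors are not self-correcting here, so a real design needs guard bits.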
 
"Glen Herrmannsfeldt" <gah@ugcs.caltech.edu> wrote in message news:<bXf8b.419911$YN5.284114@sccrnsc01>...
"ykagarwal" <yog_aga@yahoo.co.in> wrote in message
news:4d05e2c6.0309112255.3dbc30e4@posting.google.com...
soar2morrow@yahoo.com (Tom Seim) wrote in message
news:<6c71b322.0309111000.5458aeee@posting.google.com>...
Check these IEEE references:

Efficient designs of unified 2's complement division and square root
algorithm and architecture
Sau-Gee Chen; Chieh-Chih Li;
TENCON '94. IEEE Region 10's Ninth Annual International Conference.
Theme: 'Frontiers of Computer Technology'. Proceedings of 1994 , 22-26
Aug. 1994
Page(s): 943 -947 vol.2

A new pipelined divider with a small lookup table
Jong-Chul Jeong; Woong Jeong; Hyun-Jae Woo; Seung-Ho Kwak; Woo-Chan
Park; Moon-Key Lee; Tak-don Han;
ASIC, 2002. Proceedings. 2002 IEEE Asia-Pacific Conference on , 6-8
Aug. 2002
Page(s): 33 -36


Efficient semisystolic architectures for finite-field arithmetic
Jain, S.K.; Song, L.; Parhi, K.K.;
Very Large Scale Integration (VLSI) Systems, IEEE Transactions on ,
Volume: 6 Issue: 1 , March 1998
Page(s): 101 -113

thanks for the pointers .. i have found some of them. looking into the
NR and its variants .. whether it's possible to fit it into some 3000
slices
in virtex-ii .. may be i'll have to increase no of iteration per div step
..

The 360/91 was built from transistors glued onto ceramic substrates, and
wired together. It did double precision floating point divide in 18 clock
cycles, though. I think it is three clock cycles per iteration, so six
iterations.

I do wonder how many Virtex devices it would take to implement a 360/91.

-- glen
Hello,
just curious: how much hardware did your implementation take?

I'm now thinking of 3rd/4th-order NR with a 14/11-bit LUT approximation and
an unrolled loop (not independent squaring/cubing units), giving a fully
pipelined design with some tolerable latency. I don't know
whether it will fit.
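
One common reading of "k-th order NR" for reciprocals is the step x <- x * (1 + e + e^2 + ... + e^(k-1)) with e = 1 - d*x, which turns the error e into exactly e^k. Whether that is the scheme meant above is an assumption on my part, and high_order_step is a name invented for this sketch.

```python
from fractions import Fraction

def high_order_step(d: Fraction, x: Fraction, order: int) -> Fraction:
    """One order-k reciprocal refinement: the error 1 - d*x becomes (1 - d*x)**order."""
    e = 1 - d * x
    series = sum(e**j for j in range(order))  # 1 + e + ... + e**(order-1)
    return x * series

if __name__ == "__main__":
    d = Fraction(3, 2)
    x0 = Fraction(170, 256)          # seed with error exactly 2**-8
    for order in (2, 3, 4):
        x1 = high_order_step(d, x0, order)
        print(order, 1 - d * x1)
```

In this idealised model a seed good to b bits gives about k*b bits after a single order-k step, before any rounding in the multipliers. Order 2 is the ordinary Newton-Raphson step. Higher orders need the powers of e, which is where the extra squaring and cubing hardware comes from.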
 
