Design of a Router

O. Olson

Hi,

I am not sure if I am explaining this correctly. And yes, this is
homework (a school project).

I need to design a four-way router in Verilog, i.e. four inputs and
four outputs. (Actually it is a five-way router, because one set of
inputs and outputs connects to the “internal node”.) This is for an
FPGA in a mesh of FPGAs, if you want to look at it that way. Right now
I am not too concerned about speed or area, but optimizing for these
would be good and I will look at them in my final version. Any ideas on
how to design this? I would be grateful if there is something free out
there.

My plans for now:
1. From the above I have 5 input and 5 output ports. Each port will
have 3 signals: one to carry the actual data (a bus of a certain
width) and two for signaling:
• Valid signal – from sender to receiver, indicating the data is valid
• Stop signal – from receiver to sender, telling it to stop
transmitting data (possibly due to temporary congestion)
2. The data will be transmitted in fixed-size packets. I intend to
use wormhole routing, i.e. a long packet can block other packets.
3. Right now I am not concerned about the special cases of
deadlock. I will look at them once I get a basic implementation
working.
4. Each node/router will have an ID. The first two elements of the
packet will contain the destination node ID. From the destination ID
each router will know where to forward the packet, i.e. up, down, left
or right.

My questions/problems:
1. How do I handle the case where two packets arrive at two input ports
of a router but want to go to the same output? I am not concerned
about the delay, i.e. temporarily blocking one of them. I would like to
know how to record that one of them has been blocked and restart it
later. I would prefer some sort of round-robin, but how do I do this in
Verilog (preferably fast and synthesizing into a small area)?
2. I might need to buffer some of the inputs. I would be grateful if
you could show me a way to implement a buffer in Verilog that does not
carry a big penalty in synthesis.

If I was unclear in any of the above, let me know. Also let me know if
you want to see the code that I have so far. I thought that this might
be unnecessary for now.

Thanks to you guys for your help.
O.O.
 
O. Olson wrote:

1. How do I handle the case when two packets arrive at two input ports
of a router – but want to go to the same output?
Arbitration logic and a sync FIFO.

2. I might need to buffer some of the inputs – I would be grateful if
you could show me some sort of a way to implement a buffer in Verilog,
that would not have a big penalty during synthesis.
A sync FIFO.
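
To make the "sync FIFO" suggestion a little more concrete, here is a minimal behavioral sketch in Python, not synthesizable Verilog. It models a single-clock FIFO whose `stop` flag goes high while there is still a little room left, matching the stop signal described in the original post. The class name, the `stop_margin` parameter and the `clock()` interface are all assumptions made for illustration; nothing here comes from the posters.

```python
from collections import deque


class SyncFifo:
    """Behavioral model of a single-clock FIFO (hypothetical names throughout)."""

    def __init__(self, depth, stop_margin=1):
        self.depth = depth
        # 'stop' is raised once this many (or fewer) free slots remain,
        # so a word already in flight from the sender still fits.
        self.stop_margin = stop_margin
        self.mem = deque()

    @property
    def empty(self):
        return len(self.mem) == 0

    @property
    def full(self):
        return len(self.mem) == self.depth

    @property
    def stop(self):
        return self.depth - len(self.mem) <= self.stop_margin

    def clock(self, wr_valid=False, wr_data=None, rd_en=False):
        """One clock edge. Flags are sampled before the edge, as in hardware.

        Returns the word read on this edge, or None if nothing was read.
        """
        do_read = rd_en and not self.empty
        do_write = wr_valid and not self.full
        rd_data = self.mem.popleft() if do_read else None
        if do_write:
            self.mem.append(wr_data)
        return rd_data
```

In a Verilog version the deque would typically become a small memory with read and write pointers, and `stop` would be derived from a fill counter; the model is only meant to pin down the intended behavior before writing RTL.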
 
On Jul 1, 10:34 am, Mike Treseler <mike_trese...@comcast.net> wrote:
arbitration logic and a sync fifo.

a sync fifo
Thanks Mike. Could you provide me some pointers on how to implement
this?
O.O.
 
You might want to look now into the issues you plan to "deal with
later", as they might make you decide you need to redesign
everything from scratch.

For example, you can make a really simple and "dumb" router where you
issue stop to all ports and then open one at a time; if it has a
packet, forward it, then close the port and move on to the next, and so
on.
This is really not efficient, but it is probably the simplest design
with the least code. It also handles the least data and gives the
lowest effective bandwidth (BW).

At the other end of the scale, you can have buffers big enough to cover
the worst-case time each port can be blocked, plus any burst you want
to absorb. This can be the most efficient, with maximum BW, but it
needs much more logic and bigger buffers.

Also, if those buffers are big enough, they might require external
memory, especially if you don't want to use a huge FPGA just to get
enough memory blocks.

The first option you might even be able to handle in a CPLD.

Also, you say two elements will hold the ID. Since you use the word
"elements" and not "bits" (4 ports would need only 2 bits), it might
indicate that several IDs sit behind each port, so that, for example,
all packets with destination 1, 2, 3, 5, 7, 8 and 54 should go to
port 2. In that case you will also need some sort of lookup table,
whose size depends on what those two elements mean.

Also, how do you fill this lookup table? Do you get it pre-programmed,
or do you learn it, so that when a packet arrives with source, say, 50
on port 1, you learn that and from then on send all packets with
destination 50 to port 1?

In that case, what should happen when a packet has a destination that
is not in your lookup table? Do you broadcast it to all ports?
If so, you need to be able to handle broadcasting while some ports
might be busy and some not.

In routing, do you need to support, for example, simultaneous transfers
1->2 and 3->4, or, while one packet is being transferred, can (or must)
all other ports wait?

You might therefore want to first define all the conditions better. It
might be less "exciting" than quickly writing some Verilog, but writing
any Verilog before knowing what you want will most likely result in
spending more time overall.

have fun

BW.
 
Thanks Berty for your detailed post. I am sorry I did not notice it
earlier because I was busy with other things.

On Jul 1, 10:50 pm, Berty <wooster.be...@gmail.com> wrote:
You might want to look now into the issues you plan to "deal with
later", as they might make you decide you need to redesign
everything from scratch.

For example, you can make a really simple and "dumb" router where you
issue stop to all ports and then open one at a time; if it has a
packet, forward it, then close the port and move on to the next, and so
on.
This is really not efficient, but it is probably the simplest design
with the least code. It also handles the least data and gives the
lowest effective bandwidth (BW).

Yes, I am not looking for the “simple and dumb” router that you
describe. I would like to include as much efficiency as possible
without making it too complicated to design and especially to debug. I
would ideally like the router to be as small and as fast as possible,
but this may make the design too complicated.

At the other end of the scale, you can have buffers big enough to cover
the worst-case time each port can be blocked, plus any burst you want
to absorb. This can be the most efficient, with maximum BW, but it
needs much more logic and bigger buffers.

Also, if those buffers are big enough, they might require external
memory, especially if you don't want to use a huge FPGA just to get
enough memory blocks.

I have not yet decided how large the buffers in the router are going
to be, but I think I will be tweaking this in the future.


You have raised good questions regarding the routing, so I need to
describe my plans in more detail to help you understand my problems.
This description may be a bit unclear because I don’t have much
experience in this area yet.

For now I assume that this is for an FPGA, with a mesh of routers
connected to each other horizontally and vertically. Each router knows
its own coordinates. When an incoming packet arrives with the
coordinates of the destination node, the current router knows whether
the destination node is to the left, right, top or bottom of it, and
can forward the packet in that direction accordingly. To keep this
simple, I am assuming that the X direction is traversed first and then
the Y.
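
The X-then-Y decision described above can be sketched as a small Python model (a behavioral illustration only, not Verilog). The port names, the orientation of the axes, and the helper `trace` are assumptions introduced here; the posts do not fix any of them.

```python
# Unit step taken through each output port. Which way +x and +y point
# is an assumption made for this sketch.
STEP = {"RIGHT": (1, 0), "LEFT": (-1, 0), "UP": (0, 1), "DOWN": (0, -1)}


def xy_route(cur, dest):
    """Pick the output port at router `cur` for a packet headed to `dest`.

    Both arguments are (x, y) tuples. X is corrected first, then Y;
    when both match, the packet is delivered to the internal node.
    """
    cx, cy = cur
    dx, dy = dest
    if dx > cx:
        return "RIGHT"
    if dx < cx:
        return "LEFT"
    if dy > cy:
        return "UP"
    if dy < cy:
        return "DOWN"
    return "LOCAL"


def trace(src, dest):
    """Follow a packet hop by hop and list the ports it leaves through."""
    hops, cur = [], src
    port = xy_route(cur, dest)
    while port != "LOCAL":
        hops.append(port)
        cur = (cur[0] + STEP[port][0], cur[1] + STEP[port][1])
        port = xy_route(cur, dest)
    return hops
```

For example, `trace((2, 0), (0, 1))` walks LEFT, LEFT, UP. In hardware each router would only evaluate `xy_route` once per packet, as two comparisons against its own coordinates.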

The packets would not be completely buffered. They would only be
buffered enough that the header containing the destination node
coordinates can be read. Because they are only buffered to that
extent, I am effectively assuming wormhole routing, so there can be
deadlocks in this scheme.

In addition to my description of the interconnecting signals in my
previous post, I would like to say that the “data” bus would be of
width 4 or 8 depending upon my needs. This would be set via a
parameter in my code.

My problem is how to handle synchronization at the output ports, i.e.
how to handle two input ports wanting to send their packets through
the same output port. As mentioned above, I can use the stop signal to
tell the other routers to stop transmitting data, which would
eliminate the need for buffering. My question is how to ensure that
two input ports do not write at the same time, i.e. some kind of
synchronization?

Thanks a lot.
O.O.
 
O. Olson wrote:

Thanks Mike. Could you provide me some pointers on how to implement
this?

http://yuba.stanford.edu/cs344_public/docs/CS344%20Intro.ppt
 
First, a side note: I seem to recall a part from National (I believe it
was a cross-connect) that used a nice X-Y scheme where, as the packet
passes along the X axis, every part deducts one from its address, and
similarly on the Y; when X and Y both show 0, the part knows the packet
belongs to it.
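
A rough Python sketch of how such a count-down header could work is shown here. It is only a guess at the idea, not a description of the National part: the signed offsets, the port names and the function `hop` are all assumptions. The point of the scheme is that a router does not need to know its own coordinates.

```python
def hop(offset):
    """One router's decision in a count-down addressing scheme.

    `offset` is the (dx, dy) hop count still remaining in the header.
    Signed values (negative = LEFT/DOWN) are an assumption of this sketch.
    Returns (output_port, updated_offset).
    """
    dx, dy = offset
    if dx != 0:
        step = 1 if dx > 0 else -1
        # Move one hop along X and count that hop off the header.
        return ("RIGHT" if dx > 0 else "LEFT"), (dx - step, dy)
    if dy != 0:
        step = 1 if dy > 0 else -1
        return ("UP" if dy > 0 else "DOWN"), (dx, dy - step)
    # Both counts are zero: the packet belongs to this node.
    return "LOCAL", (0, 0)
```

Starting from an offset of `(-2, 1)`, three calls give LEFT, LEFT, UP, and a fourth call on `(0, 0)` reports LOCAL.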

As for handling two ports sending to the same destination: in the
completely generic solution, both sources can have an infinite number
of packets to send, which means that whatever buffer you pick will
simply delay the stop.

On the other hand, if you know that realistically, say, port A sends to
port B no more than 10 consecutive packets of size X, or that this is
what you want to allow before you issue a stop, then this will
determine the size of your buffer. (Just keep in mind that if you have
4 ports, you can have up to 3 inputs targeting the same output, or 4,
depending on your system, e.g. whether you support A sending to A,
maybe for loopback purposes.)

Since you say that the X and Y will be in the header, you still need to
decide how each router learns where to forward.

Keep in mind that the table most likely will not be the same on all
the routers.

For example, say a packet comes into the system and needs to go LEFT at
the first router, LEFT again at the next router, and then UP at the
third router.
Even though the same packet goes through all 3 routers with the same
header, the first two routers forward it LEFT while the third forwards
it UP.

Either way, regardless of the packet forwarding table and learning
mechanism, the classic router will usually be based on one of the two:

1. A buffer on each of the inputs, which stores at least the beginning
of the packet while the header is extracted and parsed and the
destination is decided.
Then the packet is routed to the output buffer, which sends the packet
out.
The input buffer gives you some limited ability to avoid using the
stop when the destination is full, while the output buffer gives some
limited protection against the "head of the line" problem.
In addition you can have, say, in your case, for simplicity, 4 buffers
for each output, so each input can write to the output, and then some
sort of arbiter (based on round robin, priority, BW, etc.) reads from
these buffers. These output buffers will also signal to the input
whether and when a stop should be generated.


2. In a smarter system, the packet is stored as it arrives in a big
pool of memory, usually external, and the header plus a descriptor of
where the packet was stored is sent to the parser, which then sends
the descriptor to the output processor, which reads the packet from
memory.
This requires a memory handler to keep track of which parts of memory
are in use, but it also lets you get by with less memory overall,
assuming that in a real system not all the ports send packets all the
time.

The second case has many advantages, which also include being able to
see more of the system in order to decide whether to discard a packet,
or to drop a packet that has held up the system for a certain time, as
in some of the early-discard mechanisms, and so on.

The second option, while superior, might be a bit harder to do, so you
might want to simply have 16 buffers to give you a full mesh
(including loopback), where each input gets the packet and, as soon as
it figures out where it should go, writes to one of its output buffers,
say 1a, 1b, 1c, 1d. Similarly the second input will write to 2a, 2b,
2c, 2d, and so on.
On the output you will have a simple round-robin arbiter. For example,
for output 1 it will arbitrate between buffers 1a, 2a, 3a and 4a, and
send a packet if there is one or move on to the next. (Don't forget to
include a gap if packets come with gaps, since the gap is not stored
when you read from the output buffer.)
Moreover, based on the size of the buffer, you issue a stop when there
is space for only one more packet: input 1 looks at buffers 1a, 1b, 1c,
1d, and if in any of them you are writing the last packet that fits,
you issue a stop at the end of that packet and release it as soon as
they all have at least one packet of space.
This will give you a relatively simple solution with buffers only on
the output and a delay on the input to extract and parse the header,
and you still support mesh, loopback and also simultaneous transfers
between ports that are not in use, such as 1->2 and 3->4 at the same
time.
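
The round-robin arbiter on each output can be modeled in a few lines of Python (a behavioral sketch, not Verilog). The class name and its `grant` method are assumptions for illustration. Each request bit stands for "this input's buffer for my output holds a packet", e.g. buffers 1a, 2a, 3a and 4a for output 1.

```python
class RoundRobinArbiter:
    """Grants one requester at a time, rotating priority after each grant."""

    def __init__(self, n):
        self.n = n
        # Index of the most recent winner. Starting at n - 1 gives
        # requester 0 the highest priority on the first arbitration.
        self.last = n - 1

    def grant(self, requests):
        """`requests` is a list of n booleans. Returns the winner or None.

        The search starts just after the previous winner, so a busy
        input cannot starve the others.
        """
        for i in range(1, self.n + 1):
            idx = (self.last + i) % self.n
            if requests[idx]:
                self.last = idx
                return idx
        return None
```

With requests from inputs 0 and 2 held high, successive calls grant 0, 2, 0, 2 and so on. With whole packets stored in the buffers, `grant` would be called once per packet rather than once per word, so that a packet is never interleaved with another on the same output.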

As for being 4-bit or 8-bit, you might want to consider a design based
on 8 bits with a small interface on the input and output: when the
input is 8 bits, just let it through, while if it is 4 bits, you take
two 4-bit values, make an 8-bit value out of them and send it into
your system.

This will require a data-enable bit, which in 4-bit mode will be
toggling, but it will give you a more generic solution for both cases.
Also, if in 4-bit mode the data is not byte-aligned, you may need
2 bits of data enable to handle the last data.
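
The 4-bit to 8-bit adapter can be sketched in Python as follows (a behavioral model only). The class name, the choice of putting the first nibble in the high half, and the 2-bit mask returned by `flush` are assumptions; the post leaves these details open.

```python
class NibblePacker:
    """Pairs 4-bit inputs into bytes for an 8-bit internal datapath.

    Assumption: the first nibble of each pair becomes the high half.
    """

    def __init__(self):
        self.high = None  # first nibble of a pair, waiting for its partner

    def push(self, nibble):
        """Accept one nibble. Returns (byte, enable).

        `enable` is True only on every second nibble, which is the
        toggling data-enable behavior in 4-bit mode.
        """
        nibble &= 0xF
        if self.high is None:
            self.high = nibble
            return None, False
        byte = (self.high << 4) | nibble
        self.high = None
        return byte, True

    def flush(self):
        """End of packet. Returns (byte, mask) for a leftover odd nibble.

        The 2-bit mask marks which halves are valid: 0b10 means only the
        high nibble carries data, 0b00 means nothing was pending.
        """
        if self.high is None:
            return None, 0b00
        byte = self.high << 4
        self.high = None
        return byte, 0b10
```

Pushing 0xA and then 0x5 produces the byte 0xA5 with enable set on the second push; a packet ending on an odd nibble is handled by `flush`.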

Have fun

BW
 
Thank you for your detailed post Berty. I think I would need to read
this more than once – to ensure that I don’t misunderstand anything. I
am just about to sleep right now – and I would let you know if I have
any questions once I get back home tomorrow evening.

Thanks again.
O.O.
 
Thanks for your detailed post Berty.

On Jul 6, 2:56 pm, Berty <wooster.be...@gmail.com> wrote:
First, a side note: I seem to recall a part from National (I believe it
was a cross-connect) that used a nice X-Y scheme where, as the packet
passes along the X axis, every part deducts one from its address, and
similarly on the Y; when X and Y both show 0, the part knows the packet
belongs to it.

I think I intend to do something similar to this. However, instead of
a subtraction I would do a comparison with the X, Y coordinates of the
current router to determine where the packet goes.

I think I will go with option I rather than option II, because I feel
that it would be easier to implement. I might keep option II for the
next version or something like that.

Thanks again for your input. I will start coding this over the next
week or two and will post further questions to the Verilog groups. I
hope you will still be around.
O.O.
 
