A question on simulating concurrent processes


Jie Zhang

Guest
The following sentence is taken from page 73 of "The Verilog Hardware
Description Language, third edition"

.... the simulator is simulating concurrent processes in a sequential
manner and only switching between simulating the concurrent processes
when a wait for a FALSE condition, delay, or event control is encountered.
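
In Verilog terms, those switch points are the wait, delay, and event
controls. A minimal self-contained sketch of all three (my own
illustration, not from the book):

module suspension_points;
  reg clk, enable;

  initial begin
    clk = 0; enable = 0;
    #3 enable = 1;     // releases the wait below
    #20 $finish;
  end

  always #5 clk = ~clk;

  // A process suspends (and the simulator switches to another process)
  // at exactly these kinds of statements:
  always begin
    wait (enable);     // wait: blocks while the condition is FALSE
    #10;               // delay control
    @(posedge clk);    // event control
    $display("resumed at time %0t", $time);
  end
endmodule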

Is this simulator behavior defined in the Verilog language spec? Or
can a Verilog simulator use time slicing to schedule the concurrent
processes?

Thanks.

Jie
 
Jie:
This behavior is not defined by Verilog -- it is a practical issue. If
you had multiple processors you could simulate concurrent processes
concurrently (assuming you could sequence them correctly -- I'm sure
some simulators do this; it can't be that hard).
This is a big issue relating to the use of blocking and non-blocking
assignments. You need to use non-blocking assignments that trigger
"concurrently" off an event (like a clock edge) to correctly model
concurrent processes -- otherwise the arbitrary execution order of the
simulator could generate very strange results.
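
A minimal sketch of the classic swap race (my illustration, not from
the original post): with blocking assignments, the result depends on
which always block the simulator happens to run first; with
non-blocking assignments, both right-hand sides are sampled before
either left-hand side updates, so the swap is order-independent.

module swap_race;
  reg clk, a, b;

  initial begin
    clk = 0; a = 0; b = 1;
    #12 $display("a=%b b=%b", a, b);  // expect a=1 b=0 after one edge
    $finish;
  end

  always #5 clk = ~clk;

  // Racy version -- whichever block runs first overwrites the value
  // the other block was about to read:
  //   always @(posedge clk) a = b;
  //   always @(posedge clk) b = a;

  // Correct version -- order of execution does not matter:
  always @(posedge clk) a <= b;
  always @(posedge clk) b <= a;
endmodule
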
--
bj Porcella
http://pages.sbcglobal.net/bporcella/
"Jie Zhang" <zhangjie@magima.com.cn> wrote in message
news:c1mb38$1n7b$1@mail.cn99.com...
The following sentence is taken from page 73 of "The Verilog Hardware
Description Language, third edition"

... the simulator is simulating concurrent processes in a sequential
manner and only switching between simulating the concurrent processes
when a wait for a FALSE condition, delay, or event control is encountered.

Is this behavior of simulator defined in the language spec of verilog?
Or can we have a verilog simulator using time slicing to schedule the
concurrent processes?

Thanks.

Jie
 
Jie Zhang <zhangjie@magima.com.cn> wrote in message news:<c1mb38$1n7b$1@mail.cn99.com>...
Is this simulator behavior defined in the Verilog language spec?
No. The language spec specifically allows processes to be suspended in
favor of other processes in the simulation.

Or can a Verilog simulator use time slicing to schedule the
concurrent processes?
You can. It is hard to see why you would want to. It would make
the simulation less efficient by adding unnecessary context switches.
It could make simulation behavior nonrepeatable in the presence of
race conditions. And since the concurrent processes share data, you
would have to deal with "critical regions" where it was not safe to
context switch between them.
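
For instance, here is a sketch of my own (not from the original posts)
of the kind of shared-data race this scheduling freedom permits. Each
always block performs a read-modify-write of count in two statements;
because the standard allows a simulator to suspend a process at any
point in favor of another, the final value may legally be 1 or 2. A
typical run-to-completion simulator prints 2, but a time-sliced one
could interleave the reads and writes and lose an update.

module shared_counter;
  integer count, t1, t2;
  reg go;

  initial begin
    count = 0; go = 0;
    #1 go = 1;
    #1 $display("count = %0d", count);  // usually 2; 1 is also legal
    $finish;
  end

  // Two concurrent read-modify-write sequences on shared data: a
  // "critical region" that Verilog gives you no way to lock.
  always @(posedge go) begin
    t1 = count;
    count = t1 + 1;
  end

  always @(posedge go) begin
    t2 = count;
    count = t2 + 1;
  end
endmodule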

While time slicing on a single processor is pointless, it has often
been suggested that running concurrent processes in parallel on a
multiprocessor system might provide increased performance. It sounds
good in theory, but doesn't actually help in practice.
 
sharp@cadence.com (Steven Sharp) wrote in message news:<3a8e124e.0402271520.2ad55f61@posting.google.com>...

While time slicing on a single processor is pointless, it has often
been suggested that running concurrent processes in parallel on a
multiprocessor system might provide increased performance. It sounds
good in theory, but doesn't actually help in practice.
If you are referring to the inner workings of HDL simulation on a
single CPU, then slicing the processes beyond what the compiler has
already done is indeed pointless -- but then again, maybe not.

If the CPU it's running on is single-threaded and all ops proceed
through the CPU without interference or hazards, then that's fine. But
if the CPU is highly threaded, as is likely to be the case for most
high-performance future CPUs (and can switch HT threads in zero time),
then what? Will you support that, or will you just use one of the
threads and coast along at 1/n of the cycles? If you have N actual
CPUs that do not have process communication in the CSP model, that's
almost the same as having a simulation of a single CPU but N times
faster, with N threads. More likely those N CPUs will also be n-way
threaded too.



If you were referring to the general case of large parallel software
problems and not HDL internals, then I would have to disagree with
you. The Transputer model and Occam demonstrated quite well how to
divide large problems across multiple CPUs. It helped a lot if the
process activity was well understood and balanced fairly across the
CPUs, and if the messaging between CPUs reflected local "wiring"
between processes. If arbitrary parallel software is partitioned badly
across multiple CPUs or hardware threads, that's not unlike a
place-and-route tool placing cells in all the wrong places, maximizing
wiring and delays instead of minimizing them.

It turns out most of the best developers in the Transputer/Occam
parallel space had to think like hardware engineers and include
partitioning and placement as part of their design. Many of those
people moved into the FPGA space to do equivalent development for many
of the apps Transputers once claimed. Indeed, Inmos stated that
processes model hardware; from that, one would have to treat them like
hardware entities when designing. I think quite a few software people
(especially the C crowd) failed at that part, hence the huge
disappointment.

Now what happens when multiple CPUs, each very low cost and also
highly threaded, become affordable? The CPU I am working on is 4-way
hardware threaded and at 300MHz may cost lunch money for an sp3-50. If
I have lots of them in a PC case, what do I program them in? Well, the
answer isn't C, is it. I would prefer a combination of Cxxx, Occam
(just the par/seq/alt/?! part), and Verilog. Since the CPU will
support in hardware the same message-passing process communication as
the Transputers, that can be used for HDL simulation too, but a better
scheduler for a large timing wheel is needed (the original Transputer
scheduler was too simple for hardware simulation; it was tried, and
proved pointless when faster x86s came along).

Now, the internal HDL processes as compiled will run until a fragment
completes, or until branching and resuming soon afterwards. Also, it
seems as if most everything you could do in Occam can be done with
Verilog's non-synthesizable event model; it just seems right to the
hardware world, but unknown to the software world. And in reverse,
what can be done with HDL event code can be done with Occam or, say,
Handel-C. But as you said, HDL event code would permit critical
regions and data sharing, which would be unwelcome in the Occam CSP
world.

It's a mixed-up world.

johnjakson_usa_com
 
Steven Sharp wrote:

Jie Zhang <zhangjie@magima.com.cn> wrote in message news:<c1mb38$1n7b$1@mail.cn99.com>...

Or can a Verilog simulator use time slicing to schedule the
concurrent processes?


You can. It is hard to see why you would want to. It would make
the simulation less efficient by adding unnecessary context switches.
It could make simulation behavior nonrepeatable in the presence of
race conditions. And since the concurrent processes share data, you
would have to deal with "critical regions" where it was not safe to
context switch between them.
Consider the following two simple concurrent processes:

module p1;
  always
  begin
    #1 $display("p1");
  end
endmodule

module p2;
  always
  begin
    #1 $display("p2");
  end
endmodule

If the simulator schedules p1 first, p2 will never get a chance to run.
It appears that p1 and p2 are not concurrent, but sequential. If the
simulator schedules the concurrent processes sequentially, how can the
concurrent behavior be correctly simulated?

Jie
 
Jie Zhang <zhangjie@magima.com.cn> wrote in message news:<4042AEC9.7030803@magima.com.cn>...
If the simulator schedules p1 first, p2 will never get a chance to run.
It appears that p1 and p2 are not concurrent, but sequential. If the
simulator schedules the concurrent processes sequentially, how can the
concurrent behavior be correctly simulated?
Just because two processes are concurrent does not mean that they can
run independently without synchronization. When a process executes a
delay control (e.g. #1), it must suspend execution until simulation
time has advanced to an appropriate time. A process cannot execute
behavior for a given simulation time until all behavior for earlier
simulation times has been executed. At the start, both p1 and p2
have behavior to be executed at time 0. If p1 is scheduled first,
it will run and reach the #1. It cannot proceed until time 1, which
cannot happen until p2 has executed all its behavior for time 0 and
is ready to proceed to time 1.

In a sequential simulation, when a process suspends itself to wait
for a delay or an event, the simulator switches to a different process
that is ready to run. When there are no processes ready to run at the
current time, it advances time to the earliest time at which anything
needs to run.
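
Concretely (my illustration, not from the original posts), a bounded
version of the p1/p2 example shows that neither process starves,
because time cannot advance past t until every process has finished
its time-t behavior:

module both;
  always #1 $display("p1 at time %0t", $time);
  always #1 $display("p2 at time %0t", $time);
  initial #4 $finish;
endmodule

// Typical output (the order of p1 and p2 within a time step is
// simulator-dependent):
//   p1 at time 1
//   p2 at time 1
//   p1 at time 2
//   p2 at time 2
//   p1 at time 3
//   p2 at time 3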

Note that a concurrent simulation would still have to synchronize
these processes. When one of the processes reaches a #1, it still
has to wait for the other process to reach the appropriate simulation
time before it can proceed. (Yes, it is possible that in some
situations, a process might be able to run ahead, but that is only
if the result is guaranteed to be equivalent to synchronized execution).

It sounds like you have some basic misunderstandings about the
semantics of the Verilog language.
 
Steven Sharp wrote:
It sounds like you have some basic misunderstandings about the
semantics of the Verilog language.
I see now. :)

Thank you for your enlightening replies!

Jie
 
