`begin_keywords/`end_keywords?


Evan Lavelle

Does anyone actually implement `begin_keywords/`end_keywords? To
handle these, your lexer has to dynamically change its keyword list on
the fly, which precludes anything which is lex-like. The spec also
says "The directives do not affect the semantics, tokens, and other
aspects of the Verilog language." So, presumably, the user could
request 1995 keywords, use 'generate' as an identifier, and then expect
a generate statement to work?

These can also only be specified "outside of a design element (module,
primitive, or configuration)". A lexer doesn't have any semantic
information - how is it going to know?

Evan
 

Evan Lavelle wrote:
Does anyone actually implement `begin_keywords/`end_keywords? To
handle these, your lexer has to dynamically change its keyword list on
the fly, which precludes anything which is lex-like.
It's nowhere near as bad as you think. Lexers typically match
identifiers then test the identifier against a table to see if
it is a keyword. If it is, it returns the keyword parse code to
the parser; otherwise it returns the IDENTIFIER parse code along
with the actual string.

This could be implemented in Icarus Verilog in an afternoon, for
example. Really, it's no big deal. (It's just that no-one's asked
for it.)

The spec also
says "The directives do not affect the semantics, tokens, and other
aspects of the Verilog language." So, presumably, the user could
request 1995 keywords, use 'generate' as an identifier, and then expect
a generate statement to work?
That seems far-fetched. If a "generate" keyword (which is lexically
distinct from an identifier with the same letters) is never found,
then the generate syntax will not be recognized. But you can put
the "generate" keyword back into the lexicon whenever you want.

These can also only be specified "outside of a design element (module,
primitive, or configuration)". A lexer doesn't have any semantic
information - how is it going to know?
You're thinking too hard. If the generate keyword disappears from
the lexicon, then generate blocks are not parsed. Simple.

Also, the `begin_keywords/`end_keywords would be handled by the
lexical analyzer, so it can easily know when the keyword table
is changing. The parser would never see this, and would never care.

- --
Steve Williams "The woods are lovely, dark and deep.
steve at icarus.com But I have promises to keep,
http://www.icarus.com and lines to code before I sleep,
http://www.picturel.com And lines to code before I sleep."
 
On Fri, 04 May 2007 09:20:43 -0700, Stephen Williams
<spamtrap@icarus.com> wrote:

Evan Lavelle wrote:
Does anyone actually implement `begin_keywords/`end_keywords? To
handle these, your lexer has to dynamically change its keyword list on
the fly, which precludes anything which is lex-like.

It's nowhere near as bad as you think. Lexers typically match
identifiers then test the identifier against a table to see if
it is a keyword. If it is, it returns the keyword parse code to
the parser; otherwise it returns the IDENTIFIER parse code along
with the actual string.

This could be implemented in Icarus Verilog in an afternoon, for
example. Really, it's no big deal. (It's just that no-one's asked
for it.)
I thought you used flex? You don't have any access to tables of this
sort in (f)lex. You could potentially change start states, and match
some keywords in only some states, but that doesn't affect the issue
with the parser (below). You can't hack the lex output code, either:

switch( yy_act )
....
case 38:
YY_RULE_SETUP
#line 115 "vlog_lexer.l"
return GENERATE;

presumably 'yy_act' is generated by some automaton that has matched
"generate", but it's certainly not obvious how it did it.

The spec also
says "The directives do not affect the semantics, tokens, and other
aspects of the Verilog language." So, presumably, the user could
request 1995 keywords, use 'generate' as an identifier, and then expect
a generate statement to work?

That seems far-fetched. If a "generate" keyword (which is lexically
distinct from an identifier with the same letters) is never found,
then the generate syntax will not be recognized. But you can put
the "generate" keyword back into the lexicon whenever you want.
It *is* far-fetched, but it seems to be the intent. Note that I'm
suggesting that the user switches off V2K keywords, for example, and
then uses generate, for example, as an identifier, *and* also expects
generate statements to work, even though the keyword's gone. That's
the bit that's hard.

I say "seems to be the intent" because the spec says "The directives
do not affect the semantics, tokens, and other aspects of the Verilog
language." If the intent had simply been to operate a V2K compiler in
1995 mode, then it would have been simpler to state that a V2K
compiler should have the ability to operate in 1995 mode when required.
Instead, this section talks about accepting or rejecting keywords,
rather than actually supporting 1995, 2001, 2001-noconfig, and 2005
functionality. And, if the intent had instead actually been to allow a
V2K compiler to operate in these other modes, then this surely has to
be a *static* requirement (a command-line switch, for example).
Specifying that this has to be done at runtime, potentially many times
over in a stack of different modes (2nd para from the bottom on p361),
is just bizarre.

These can also only be specified "outside of a design element (module,
primitive, or configuration)". A lexer doesn't have any semantic
information - how is it going to know?

You're thinking too hard. If the generate keyword disappears from
the lexicon, then generate blocks are not parsed. Simple.

Also, the `begin_keywords/`end_keywords would be handled by the
lexical analyzer, so it can easily know when the keyword table
is changing. The parser would never see this, and would never care.
Yes, but the lexer normally wouldn't know that it's in a module,
primitive, or whatever; it just handles the tokens. If you really
wanted to follow the letter of the LRM and check that the
`begin/end_keywords were in the right part of the source code, the
lexer would have to add a note to the AST that it had found the
directives, and then later semantic analysis would have to check that
they were in the appropriate part of the AST, which seems a bit
far-fetched.

Evan
 


I still think a mountain is being made out of a mole-hill.
See below.

Evan Lavelle wrote:
On Fri, 04 May 2007 09:20:43 -0700, Stephen Williams
<spamtrap@icarus.com> wrote:

Evan Lavelle wrote:
Does anyone actually implement `begin_keywords/`end_keywords? To
handle these, your lexer has to dynamically change its keyword list on
the fly, which precludes anything which is lex-like.
It's nowhere near as bad as you think. Lexers typically match
identifiers then test the identifier against a table to see if
it is a keyword. If it is, it returns the keyword parse code to
the parser; otherwise it returns the IDENTIFIER parse code along
with the actual string.

This could be implemented in Icarus Verilog in an afternoon, for
example. Really, it's no big deal. (It's just that no-one's asked
for it.)

I thought you used flex? You don't have any access to tables of this
sort in (f)lex. You could potentially change start states, and match
some keywords in only some states, but that doesn't affect the issue
with the parser (below). You can't hack the lex output code, either:
The flex-based lexer in Icarus Verilog matches identifiers. Then
it looks up the identifier in a keyword table, and if it finds a
match in that table it returns the lexical code for that keyword.
Creating a lexical pattern for every keyword would be inefficient.

It *is* far-fetched, but it seems to be the intent. Note that I'm
suggesting that the user switches off V2K keywords, for example, and
then uses generate, for example, as an identifier, *and* also expects
generate statements to work, even though the keyword's gone. That's
the bit that's hard.
I see no problem. In the stretch of code where the keywords of generate
syntax are not enabled, generate-specific keywords are not matched
as keywords. So? Shouldn't you expect this to work:

generate
  genvar i;
  `begin_keywords "1364-1995"
  for (i = 0; i < 10; i = i+1) begin : generate
    assign foo = genvar;
  end
  `end_keywords
endgenerate

The purpose of turning off 2001/2005 keywords is to eliminate name
collisions for programs written before the keywords came to exist,
and not to turn off semantic features that use those keywords.


Yes, but the lexer normally wouldn't know that it's in a module,
primitive, or whatever; it just handles the tokens.
What does that have to do with anything? Between the `begin_keywords
and `end_keywords, only the specified set of keywords is interpreted
as keywords. How can that be complicated?

- --
Steve Williams "The woods are lovely, dark and deep.
steve at icarus.com But I have promises to keep,
http://www.icarus.com and lines to code before I sleep,
http://www.picturel.com And lines to code before I sleep."
 
On Fri, 04 May 2007 11:06:56 -0700, Stephen Williams
<spamtrap@icarus.com> wrote:

I still think a mountain is being made out of a mole-hill.
On reflection, I think you're right.

The flex-based lexer in Icarus Verilog matches identifiers. Then
it looks up the identifier in a keyword table, and if it finds a
match in that table it returns the lexical code for that keyword.
Creating a lexical pattern for every keyword would be inefficient.
Interesting; you're manually checking identifiers for keywords, rather
than using a pattern for each keyword? I'd never even thought of
matching keywords that way, and I haven't seen anyone else do that.
Lex claims to be very efficient, and I'm pretty sure it is. On the
other hand, I'm sure that you can be very efficient as well... :)

I see no problem. In the stretch of code where the keywords of generate
syntax are not enabled, generate-specific keywords are not matched
as keywords. So? Shouldn't you expect this to work:

generate
  genvar i;
  `begin_keywords "1364-1995"
  for (i = 0; i < 10; i = i+1) begin : generate
    assign foo = genvar;
  end
  `end_keywords
endgenerate

Yes, I would. But my problem is with

`begin_keywords "1364-1995"
// possibly hundreds of source files later:
generate // seen as an identifier, not a keyword: error
...
endgenerate

It seems to me that the wording in the LRM indicates that you still
have to support generate even in this case, which is next to
impossible. But, as you say,

The purpose of turning off 2001/2005 keywords is to eliminate name
collisions for programs written before the keywords came to exist,
Which is why I (no longer) think this is a problem. If someone wants
to compile old code in a 2001/2005 analyser, and they may have used
identifiers which are now keywords, then they'll need to do this (or,
much better, they specify something equivalent on the command line and
don't modify the source). If they run into the scenario above, then I
guess that's their problem, and they have to expect a syntax error,
whatever the LRM may or may not say.

Yes, but the lexer normally wouldn't know that it's in a module,
primitive, or whatever; it just handles the tokens.

What does that have to do with anything? Between the `begin_keywords
and `end_keywords, only the specified set of keywords is interpreted
as keywords. How can that be complicated?
The LRM specifies that these configuration keywords "must be outside
of a design element". In other words, you have to check where they
are, and report an error if they're in the wrong place. How are you
going to find or report this error if all `begin/end_keyword
processing is carried out in the lexer? The lexer doesn't, or
shouldn't, know if the current position in the token stream is inside
a module, primitive, or configuration.

Evan
 
Evan Lavelle wrote:
Does anyone actually implement `begin_keywords/`end_keywords?
NC-Verilog does. As Stephen Williams says, it is not hard.

To
handle these, your lexer has to dynamically change its keyword list on
the fly, which precludes anything which is lex-like.
Again as Stephen says, compilers generally use a FSM to recognize
identifiers and then do a table lookup to recognize keywords. I think
every production compiler I have ever worked on has done it this way.
And if you want a compiler that has different keywords in different
modes, independent of whether that is controlled by a directive or a
command line option, this is the only practical approach.

The spec also
says "The directives do not affect the semantics, tokens, and other
aspects of the Verilog language." So, presumably, the user could
request 1995 keywords, use 'generate' as an identifier, and then expect
a generate statement to work?
The "generate" keyword is a poor example, since it is completely
optional in Verilog-2005.

But if you request 1995 keywords, then "generate" is an identifier and
not a keyword. Any grammar productions that involve the generate
keyword can no longer be matched. That will probably make any attempt
to use it as a keyword produce a syntax error.

The intent was to avoid having anything but keywords changed by this
directive. If the directive changed other things to try to get an
exact match to the earlier language behavior, then the specification
of the directive would need to list all of its effects. It would also
be hard to implement.

Since I proposed the addition of this (though not the exact wording),
I can verify that this was the intent.

These can also only be specified "outside of a design element (module,
primitive, or configuration)". A lexer doesn't have any semantic
information - how is it going to know?
There are other directives with this restriction, so if you want to
enforce it, directive processing has to know this.

If you process directives in the parser, then you need a mechanism for
the parser to communicate the keyword set change to the lexer. This
could be access to the lexer keyword table, or a shared variable that
tells the lexer what the current keyword set is, or a nicely
encapsulated function in the lexer that the parser can call to tell it
that the dialect has changed.
 
sharp@cadence.com wrote:
Evan Lavelle wrote:
The spec also
says "The directives do not affect the semantics, tokens, and other
aspects of the Verilog language."

Since I proposed the addition of this (though not the exact wording),
I can verify that this was the intent.
Re-reading this, I realized it could be misinterpreted.

The part I meant that I proposed was the restriction of the directive
to affecting keywords, i.e. the intent (but not the wording) of the
statement quoted from the LRM.

I was not the original proposer of the directives, though I was
involved in the discussions.
 
I can't comment on the begin_keywords discussion from which the
following quote was extracted.

Evan Lavelle <nospam@nospam.com> writes:

Interesting; you're manually checking identifiers for keywords, rather
than using a pattern for each keyword? I'd never even thought of
matching keywords that way, and I haven't seen anyone else do that.
Lex claims to be very efficient, and I'm pretty sure it is. On the
other hand, I'm sure that you can be very efficient as well... :)
Actually, this "manual" checking can be automated. When you lex an
identifier, you immediately look it up in the appropriate symbol table
and if it is a keyword immediately create a keyword token (instead of an
identifier token). This is a standard technique, which to the best of
my knowledge was pioneered by Frank Deremer (of SLR and LALR fame) and
was "well-known" in the mid 70's when it was taught to me in compiler
class.

It's actually a very effective technique and has the nice property
that it is easy to customize and extend. For example, to add new
keywords, just add new strings into the symbol table. To change
languages, simply translate the strings. You can easily read the
strings in from a file.

My experience with the technique is so positive that we incorporated
it as a "built-in" feature in Yacc++ so that all our customers would
be encouraged to use it.
 
On 4 May 2007 16:12:24 -0700, sharp@cadence.com wrote:

There are other directives with this restriction, so if you want to
enforce it, directive processing has to know this.

If you process directives in the parser, then you need a mechanism for
the parser to communicate the keyword set change to the lexer. This
could be access to the lexer keyword table, or a shared variable that
tells the lexer what the current keyword set is, or a nicely
encapsulated function in the lexer that the parser can call to tell it
that the dialect has changed.
I can't do directive processing in the parser because (as you know, of
course) the directives aren't part of the main language grammar, and
so must be gone by parse time. I could instead try lexical feedback
from the parser to the lexer (i.e. pass current context information
from the parser back to the lexer), but this doesn't work well in
lex/yacc (since lex may already have a lookahead token by the time
that yacc tells it where it is, and lex has no way to unget tokens [I
had a quick look at yacc++, but I didn't notice that it had fixed this
problem - Chris?]). I've probably missed something, but it seems to me
that if you want to enforce this restriction you need to do some
potentially nasty messing about.

In the worst case, you may need to keep a table of source code line
start and end numbers for interesting things such as modules and
primitives, taking into account `include directives, and ignoring
`line directives, and then you need to find out where your other
directives fit into this table.

Anyway, this is only a simple parser for extracting module
information, so I'm not going to lose any sleep over it.

Thanks -

Evan
 
Evan Lavelle <nospam@nospam.com> writes:

I can't do directive processing in the parser because (as you know, of
course) the directives aren't part of the main language grammar, and
so must be gone by parse time. I could instead try lexical feedback
from the parser to the lexer (ie. pass current context information
from the parser to the lexer (i.e. pass current context information
lex/yacc (since lex may already have a lookahead token by the time
that yacc tells it where it is, and lex has no way to unget tokens [I
had a quick look at yacc++, but I didn't notice that it had fixed this
problem - Chris?]). I've probably missed something, but it seems to me
that if you want to enforce this restriction you need to do some
potentially nasty messing about.
Oh, my gosh--so much to answer.

1) In Yacc++, when you have action code in the parser, the parser
won't request a token until after it has processed the action code,
to avoid this "lexical feedback" problem.

2) We don't have an "unget" function in a Yacc++ lexer to take back a
token. However, if one really wanted, one could add one (or hire
us to add one, we do custom work like that), the source code to the
library is included.

3) We have a specific feature "ignore" that allows one to specify
rules (non-terminals to be precise) that are parsed but whose
results don't participate in the rest of the parse. That feature
was specifically designed to handle "preprocessors".

In fact, I believe it was this last feature that I used when writing a
Verilog grammar in Yacc++. I also used lexer classes (the Yacc++
version of start states) to handle the 'b/'o/'d/'h numbers since the
grammar makes the digits following them a separate "token" (and you
can use `defines to get macro expansion). There were some other
"tricks" I used to get the grammar to parse, but it wasn't that bad.
The grammar reads pretty close to the 1995 standard (on which it was
based), although some 2001 features have since been added and I'm not
currently enhancing it as the customer is not taking their use farther
at the moment. (It was used in a Carbon Design like tool.)

Hope this helps,
-Chris

*****************************************************************************
Chris Clark                    Internet : compres@world.std.com
Compiler Resources, Inc.       Web Site : http://world.std.com/~compres
23 Bailey Rd                   voice    : (508) 435-5016
Berlin, MA 01503 USA           fax      : (978) 838-0263 (24 hours)
------------------------------------------------------------------------------
 
