Lexing the ' char

Ole Nielsby

I'm writing a lexer for VHDL and I don't know how to treat the ' char.
It can be used both for character literals and as an operator similar to
:: or . in C++, if I understand correctly. How would a lexer decide?
 
On Nov 2, 7:58 am, "Ole Nielsby"
<ole.niel...@tekare-you-spamminglogisk.dk> wrote:
I'm writing a lexer for VHDL and I don't know how to treat the ' char.
It can be used both for character literals and as an operator similar to
:: or . in C++, if I understand correctly. How would a lexer decide?

This is where you need a few characters of lookahead in your lex
buffer. If you match (TICK, char, TICK) you have a character literal.
Otherwise it's the TICK token (attribute or type qualifier).

- Kenn
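
The lookahead test described above could be sketched in C roughly as follows. The names classify_tick, is_graphic and the tick_kind enum are hypothetical, invented for this illustration, and is_graphic is only a rough stand-in for the LRM's definition of a graphic character.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token kinds for this sketch (not from any real lexer). */
enum tick_kind { TICK_CHAR_LITERAL, TICK_APOSTROPHE };

/* Assumption: treat any non-control byte as a graphic character. */
int is_graphic(char c)
{
    unsigned char u = (unsigned char)c;
    return u >= 0x20 && u != 0x7f;
}

/*
 * Classify the apostrophe at src[pos] using two characters of lookahead.
 * src must be NUL-terminated and src[pos] must be an apostrophe.
 */
enum tick_kind classify_tick(const char *src, size_t pos)
{
    assert(src != NULL && src[pos] == '\'');
    /* src[pos + 1] is tested first, so we never read past the NUL. */
    if (is_graphic(src[pos + 1]) && src[pos + 2] == '\'')
        return TICK_CHAR_LITERAL;   /* matched TICK, char, TICK */
    return TICK_APOSTROPHE;         /* attribute or qualifier tick */
}
```

With this sketch, 'a' is classified as a character literal and the tick in clk'event as a plain apostrophe. Note that lookahead alone also reads the tick in std_logic_vector'('a','b','c') as the start of the literal '(', so some extra context is needed for qualified expressions.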
 
<kennheinrich@sympatico.ca> wrote:
Ole Nielsby wrote:
I'm writing a lexer for VHDL and I don't know how to treat the ' char.
It can be used both for character literals and as an operator similar to
:: or . in C++, if I understand correctly. How would a lexer decide?

This is where you need a few characters of lookahead in your lex
buffer. If you match (TICK, char, TICK) you have a character literal.
Otherwise it's the TICK token (attribute or type qualifier).

Thanks. That's what I already implemented but I wasn't sure...
 
On Nov 3, 12:58 am, "Ole Nielsby"
<ole.niel...@tekare-you-spamminglogisk.dk> wrote:
I'm writing a lexer for VHDL and I don't know how to treat the ' char.
It can be used both for character literals and as an operator similar to
:: or . in C++, if I understand correctly. How would a lexer decide?
    case '\'':  /* IR1045 check */
        if (last_token == DELIM_RIGHT_PAREN ||
            last_token == DELIM_RIGHT_BRACKET ||
            last_token == KEYWD_ALL ||
            last_token == IDENTIFIER_TOKEN ||
            last_token == STR_LIT_TOKEN ||
            last_token == CHAR_LIT_TOKEN ||
            !(buff_ptr < BUFSIZ-2))
            token_flag = DELIM_APOSTROPHE;
        else if (is_graphic_char(NEXT_CHAR) &&
                 line_buff[buff_ptr+2] == '\'') {
    CHARACTER_LITERAL:
            buff_ptr += 3;  /* leading and trailing \' plus the char */
            last_token = CHAR_LIT_TOKEN;
            token_strlen = 3;
            return (last_token);
        }
        else
            token_flag = DELIM_APOSTROPHE;
        break;

See Issue Report IR1045:
http://www.eda-stds.org/isac/IRs-VHDL-93/IR1045.txt

As you can see from the above code fragment, the last token can be
captured and used to disambiguate something like:

foo <= std_logic_vector'('a','b','c');

without a large look ahead or backtracking.
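
To see the effect on that example, here is a cut-down, self-contained sketch of the same idea. It is not the lexer the fragment above comes from: the token names, struct lexer and next_token are hypothetical, keywords such as all are lumped in with identifiers, and string literals and brackets are left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token kinds for this sketch. */
enum tok { T_NONE, T_IDENT, T_CHAR_LIT, T_APOSTROPHE,
           T_LPAREN, T_RPAREN, T_OTHER, T_EOF };

struct lexer {
    const char *src;  /* NUL-terminated input */
    size_t pos;       /* index of the next unread character */
    enum tok last;    /* most recently returned token */
};

int ident_char(char c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/* After these tokens a tick cannot open a character literal. */
int tick_must_be_delimiter(enum tok last)
{
    return last == T_IDENT || last == T_RPAREN || last == T_CHAR_LIT;
}

enum tok next_token(struct lexer *lx)
{
    assert(lx != NULL && lx->src != NULL);
    const char *s = lx->src;
    enum tok t;

    while (s[lx->pos] == ' ' || s[lx->pos] == '\t')
        lx->pos++;                       /* skip separators */

    char c = s[lx->pos];
    if (c == '\0') {
        t = T_EOF;
    } else if (isalpha((unsigned char)c)) {
        while (ident_char(s[lx->pos]))
            lx->pos++;
        t = T_IDENT;
    } else if (c == '\'') {
        /* Previous token is checked first, then two chars of lookahead. */
        if (!tick_must_be_delimiter(lx->last) &&
            s[lx->pos + 1] != '\0' && s[lx->pos + 2] == '\'') {
            lx->pos += 3;                /* both ticks plus the char */
            t = T_CHAR_LIT;
        } else {
            lx->pos += 1;
            t = T_APOSTROPHE;
        }
    } else {
        lx->pos++;
        t = (c == '(') ? T_LPAREN : (c == ')') ? T_RPAREN : T_OTHER;
    }
    lx->last = t;
    return t;
}
```

Fed std_logic_vector'('a','b','c'), this sketch returns an identifier, an apostrophe and a left parenthesis before the first character literal, because the tick follows an identifier. A lexer that only looked ahead would have taken '(' as a character literal instead.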

Mind you, you could try to argue that LRM 13.2:

...

"In some cases an explicit separator is required to separate adjacent
lexical elements (namely when, without separation, interpretation as a
single lexical element is possible). A separator is either a space
character (SPACE or NBSP), a format effector, or the end of a line. A
space character (SPACE or NBSP) is a separator except within a
comment, a string literal, or a space character literal."

could simply require the inclusion of disambiguating whitespace. The
accepted practice would be against you, however.
 
