FPGA implementation of a lexer and parser - feasible?

E

Ekalavya Nishada

Guest
Greetings,

I am new to hardware design and hoping to get a reality check on
building a lexical analyzer and parser using FPGAs. I can see the
following options.

1. A hardwired implementation of some or all of lexer&parser to
maximize the performance.

2. Build a CPU with instructions optimized for interpreting parse
tables and drive it with the output of a parser generator similar to
YACC.

3. Have a RISC-like CPU implemented on the FPGA executing a parser.

I see no advantage to option 3 over doing the same thing on a general
purpose CPU. I don't know if the second option has any potential for
increased performance. As far as the first option, I am wondering if
the FPGAs currently available have the capacity to implement the
hardwired parser for a language of low to moderate complexity and how
hard it might be model it using a language like VHDL.

Please post any thoughts/comments/pointers about the various options.

Thanks
 
In article <6ae7649c.0309261310.11abb60b@posting.google.com>,
Ekalavya Nishada <enishada@yahoo.com> wrote:
Greetings,

I am new to hardware design and hoping to get a reality check on
building a lexical analyzer and parser using FPGAs. I can see the
following options.

1. A hardwired implementation of some or all of lexer&parser to
maximize the performance.

2. Build a CPU with instructions optimized for interpreting parse
tables and drive it with the output of a parser generator similar to
YACC.

3. Have a RISC-like CPU implemented on the FPGA executing a parser.

I see no advantage to option 3 over doing the same thing on a general
purpose CPU. I don't know if the second option has any potential for
increased performance. As far as the first option, I am wondering if
the FPGAs currently available have the capacity to implement the
hardwired parser for a language of low to moderate complexity and how
hard it might be model it using a language like VHDL.

Please post any thoughts/comments/pointers about the various options.

Well, it's an ambitious project, anyway....

So are you trying to have a hardcoded interpreter? From what I think
you're saying, you want to build this parser and then (I'm guessing) you
want to either produce some bytecode for a VM (or in this case a real
machine (RM) implemented in the FPGA) or build some kind of AST and walk
it in your FPGA (?). Otherwise I can't see how it makes sense to just
have a parser in an FPGA.

Fist off, what's the all-fire hurry? Parsing a simple language is pretty
quick as is in software. It would be _lot_ easier to implement your CPU
in the FPGA and then use software to parse and compile the frontend
language into machine code that runs on your CPU.

Seconly, assuming that I'm guessing wrong and you just want a parser for a
programming language implemented in an FPGA: I think this could be pretty
difficult, but perhaps not impossible especially if you have some large
amount of memory available either inside of or external to your FPGA.
You'd have a bytestream coming in which represents characters of your
tokens, a tokenizer (a state machine) and then another big state machine
that implements your parser. But again, after you've parsed this
language, what do you intend to do with it?

Phil
 
"Phil Tomson" <ptkwt@aracnet.com> wrote in message
news:bl2dji0svq@enews1.newsguy.com...
In article <6ae7649c.0309261310.11abb60b@posting.google.com>,
Ekalavya Nishada <enishada@yahoo.com> wrote:
Greetings,

I am new to hardware design and hoping to get a reality check on
building a lexical analyzer and parser using FPGAs. I can see the
following options.
(snip of options)

Please post any thoughts/comments/pointers about the various options.

Well, it's an ambitious project, anyway....

So are you trying to have a hardcoded interpreter? From what I think
you're saying, you want to build this parser and then (I'm guessing) you
want to either produce some bytecode for a VM (or in this case a real
machine (RM) implemented in the FPGA) or build some kind of AST and walk
it in your FPGA (?). Otherwise I can't see how it makes sense to just
have a parser in an FPGA.

Fist off, what's the all-fire hurry? Parsing a simple language is pretty
quick as is in software. It would be _lot_ easier to implement your CPU
in the FPGA and then use software to parse and compile the frontend
language into machine code that runs on your CPU.
It would be nice to know the reason for the question. As far as I know,
parsing of existing languages isn't limiting compilation times. Most
languages were designed when machines were much slower than they are today,
and if it was a problem then, they might have designed the language to help
speed up parsing. I know many cases where that wasn't even true.

Seconly, assuming that I'm guessing wrong and you just want a parser for a
programming language implemented in an FPGA: I think this could be pretty
difficult, but perhaps not impossible especially if you have some large
amount of memory available either inside of or external to your FPGA.
You'd have a bytestream coming in which represents characters of your
tokens, a tokenizer (a state machine) and then another big state machine
that implements your parser. But again, after you've parsed this
language, what do you intend to do with it?
As a large part of parsing is finite state machines, and they are pretty
easy to build in FPGA's, though with external RAM if they get really huge,
that part seems pretty easy. Now, why would you want to do that? The
state tables could be generated externally, on a more ordinary machine.
Storing a symbol table would be a little more challenging, but I don't think
even that should be too hard. Hash algorithms should be easy to implement,
for example, or trees if that turns out better.

One possibility that I could think of would be in pattern matching. If you
consider a program like grep a parser, which signals the point at which it
recognizes a correct match, then one could use it for high speed pattern
matching.

-- glen
 
Ekalavya Nishada wrote:
Greetings,

I am new to hardware design and hoping to get a reality check on
building a lexical analyzer and parser using FPGAs.
Jan Gray, begetter of the excellent but more or less defunct
fpgacpu.org site, has discussed this. Search in Google.
 
ptkwt@aracnet.com (Phil Tomson) wrote in message >
Well, it's an ambitious project, anyway....

So are you trying to have a hardcoded interpreter? From what I think
you're saying, you want to build this parser and then (I'm guessing) you
want to either produce some bytecode for a VM (or in this case a real
machine (RM) implemented in the FPGA) or build some kind of AST and walk
it in your FPGA (?). Otherwise I can't see how it makes sense to just
have a parser in an FPGA.

I apologize for not providing the full context. What I am thinking of
is a parser for XML data. I am aware of the possibility of using
regular expressions but the power of regular expressions is not
sufficient for the processing I am thinking of. What I envision is a
lot of XML data being quickly parsed and some action taken.

Fist off, what's the all-fire hurry? Parsing a simple language is pretty
quick as is in software. It would be _lot_ easier to implement your CPU
in the FPGA and then use software to parse and compile the frontend
language into machine code that runs on your CPU.

I agree with you here. It does not make sense to optimize parsing when
the time spent there is a small portion of the total time as in
compilers or interpreters. But, if the tokenizing and parsing time
dominate, then it makes a lot of sense to make it as fast as possible.
I expect most of the data processing time to be spent in parsing and
comparitively little in processing the parser output. (No hard data on
that; that is my hypothesis still)

I might also add that I am drawn to FPGAs due to the possibility of
doing things in parallel. Processing of UTF encoded Unicode and
tokenizng should be an order of magnitude faster than doing them on
general purpose computers. Then there is always the possibility of
doing pattern matches on the parse tree in parallel.

Seconly, assuming that I'm guessing wrong and you just want a parser for a
programming language implemented in an FPGA: I think this could be pretty
difficult, but perhaps not impossible especially if you have some large
amount of memory available either inside of or external to your FPGA.
You'd have a bytestream coming in which represents characters of your
tokens, a tokenizer (a state machine) and then another big state machine
that implements your parser. But again, after you've parsed this
language, what do you intend to do with it?

As should be evident by now, it is not a programming language parser.
My question was really about the feasibility of doing a hardwired
parser. I just wanted to know if I am completely crazy or just a
little :)

Thanks!
 
"Glen Herrmannsfeldt" <gah@ugcs.caltech.edu> wrote in message news:<RK3db.592141$Ho3.113886@sccrnsc03>...
"Phil Tomson" <ptkwt@aracnet.com> wrote in message
news:bl2dji0svq@enews1.newsguy.com...
In article <6ae7649c.0309261310.11abb60b@posting.google.com>,
Ekalavya Nishada <enishada@yahoo.com> wrote:

It would be nice to know the reason for the question. As far as I know,
parsing of existing languages isn't limiting compilation times. Most
languages were designed when machines were much slower than they are today,
and if it was a problem then, they might have designed the language to help
speed up parsing. I know many cases where that wasn't even true.

Apologies again as I have not been clear in my questions. As you
mention later in your response, I am exploring this as an approach to
parsing of XML data. More to the point, my question was to ask the
FPGA experts here a) is it feasible to build a hardwired parser for a
small language? b) what sort of performance improvements one might see
even while interpreting parser tables as compared to doing the same in
a regular CPU?

One possibility that I could think of would be in pattern matching. If you
consider a program like grep a parser, which signals the point at which it
recognizes a correct match, then one could use it for high speed pattern
matching.

-- glen
Please see my response to Phil's questions also.

Thanks!
 
enishada@yahoo.com (Ekalavya Nishada) wrote in message news:<6ae7649c.0309261310.11abb60b@posting.google.com>...
Greetings,

I am new to hardware design and hoping to get a reality check on
building a lexical analyzer and parser using FPGAs. I can see the
following options.

1. A hardwired implementation of some or all of lexer&parser to
maximize the performance.

Cool. But why? :) Don't programmers spend 99% of their time debugging
instead of compiling?
 
I apologize for not providing the full context. What I am thinking of
is a parser for XML data. I am aware of the possibility of using
regular expressions but the power of regular expressions is not
sufficient for the processing I am thinking of. What I envision is a
lot of XML data being quickly parsed and some action taken.
Where is all your XML data coming from? How are you going to get
it to the FPGA?

I'm not a parsing wizard. I'm pretty sure you could do much of
the work in an FPGA. But what do you do when you find a
name that needs a new entry in a symbol table as compared to
an atom that can be turned into a simple code number?

Assume the data is coming off the net. That's slow. You can
parse it with the CPU. Assume it's coming off the disk. Why
not parse it once and cache the answer?



I might also add that I am drawn to FPGAs due to the possibility of
doing things in parallel. Processing of UTF encoded Unicode and
tokenizng should be an order of magnitude faster than doing them on
general purpose computers. Then there is always the possibility of
doing pattern matches on the parse tree in parallel.
You can't do things in parallel unless you have parallel paths
through the bottleneck. What's the bottleneck? CPU? Memory?
With an FPGA, it would probably be getting data to/from the FPGA.

--
The suespammers.org mail server is located in California. So are all my
other mailboxes. Please do not send unsolicited bulk e-mail or unsolicited
commercial e-mail to my suespammers.org address or any of my other addresses.
These are my opinions, not necessarily my employer's. I hate spam.
 
Apologies again as I have not been clear in my questions. As you
mention later in your response, I am exploring this as an approach to
parsing of XML data. More to the point, my question was to ask the
FPGA experts here a) is it feasible to build a hardwired parser for a
small language? b) what sort of performance improvements one might see
even while interpreting parser tables as compared to doing the same in
a regular CPU?

Yes this is doable and you get really good results.
http://www.eetimes.com/story/OEG20020819S0033
http://tarari.com/products-xml.html
http://tarari.com/about-technology.html

Regular CPUs just can't keep up.

Steve
 
"Ekalavya Nishada" <enishada@yahoo.com> wrote in message
news:6ae7649c.0309270903.22994d03@posting.google.com...
"Glen Herrmannsfeldt" <gah@ugcs.caltech.edu> wrote in message
news:<RK3db.592141$Ho3.113886@sccrnsc03>...
"Phil Tomson" <ptkwt@aracnet.com> wrote in message
news:bl2dji0svq@enews1.newsguy.com...
In article <6ae7649c.0309261310.11abb60b@posting.google.com>,
Ekalavya Nishada <enishada@yahoo.com> wrote:

It would be nice to know the reason for the question. As far as I
know,
parsing of existing languages isn't limiting compilation times. Most
languages were designed when machines were much slower than they are
today,
and if it was a problem then, they might have designed the language to
help
speed up parsing. I know many cases where that wasn't even true.

Apologies again as I have not been clear in my questions. As you
mention later in your response, I am exploring this as an approach to
parsing of XML data. More to the point, my question was to ask the
FPGA experts here a) is it feasible to build a hardwired parser for a
small language? b) what sort of performance improvements one might see
even while interpreting parser tables as compared to doing the same in
a regular CPU?
That makes a lot of sense. Somehow "lexer and parser" implies "programming
language" to many people, as it is a common use for them.

Many XML parsers are pretty slow, though I don't know that they have to be.

-- glen
 
"Tim" <tim@rockylogic.com.nooospam.com> wrote in message
news:bl3jrf$g63$1$8302bc10@news.demon.co.uk...
Ekalavya Nishada wrote:
Greetings,

I am new to hardware design and hoping to get a reality check on
building a lexical analyzer and parser using FPGAs.

Jan Gray, begetter of the excellent but more or less defunct
fpgacpu.org site, has discussed this. Search in Google.
Thanks Tim. Say not defunct -- think of it as on extended hiatus. (Hint:
try googling jan gray msdn).

Anyway, Ekalavya, if you visit fpgacpu.org and type 'parsing' into the
Search box you'll find a few entries that may be of interest.

Jan Gray, Gray Research LLC
FPGA CPU "News": www.fpgacpu.org
 

Welcome to EDABoard.com

Sponsor

Back
Top