Recently I found a bug in KDevelop3’s qmake parser, and the same bug existed in the KDevelop4 version. Personally I was quite satisfied with the KDevelop4 parser, but this particular problem wasn’t fixable in a sane way there. So I decided it’s time again for another complete rewrite. IIRC this is the third time KDevelop4’s qmake parser has been fully rewritten.
But this time I wanted to look into alternatives to the flex/bison combo. One reason is that somehow I can’t think in LALR as well as I can in LL; the other is that getting C++ code out of these two is not really easy, and chaining them together in particular doesn’t work well. I looked at other options, including but not limited to bison++, bisonc++, antlr and Coco.
It turned out that Coco was the most promising from that list (antlr needs a runtime lib, and bisonc++ had some issues I don’t recall at the moment), but I soon figured out that I had to rewrite its Lexer template. That worked OK; however, the parser still needed some of the old functions that were defined in the lexer template (for allocating, copying and otherwise handling the wchar_t arrays holding the actual text). After some fiddling with the generator I decided to drop this one too, because it was costing too much time.
So now I was basically back at flex/bison, but the Coco lexer I wrote used Qt to do the reading and handling of the input text, which meant complete Unicode support for free. I thought that’s a pretty important feature, and when I looked at how the flex-based lexers of the KDevelop4 language plugins handle it, I ran away screaming. The only option left was a hand-written lexer together with a parser generator.
From there it was only a small jump to kdevelop-pg, a parser generator written by Roberto Raggi and used to create parsers for Java, Ruby, CSharp and Python (right, the GSoC project for KDevelop). Using kdevelop-pg for one more parser also helps keep it updated when needed, and this work has already exposed some flaws in its design that have since been fixed.
After about two weeks I now have a lexer that should be pretty OK (as in, it doesn’t choke on the Qt4 .pro files, although I’m not 100% sure yet that it produces the right token stream) and a parser. It was quite an interesting experience, especially writing the lexer, as this is my first hand-written lexer (I actually wrote two along the way, because the first approach I took didn’t work out, as usual). And I think the code is even pretty readable, not even close to a beast like a flex-generated lexer :)
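To give an idea of what a hand-written lexer for this kind of input looks like, here is a minimal sketch for a tiny qmake-like subset: `VAR = a b`, the `+=`, `-=`, `*=` and `~=` operators, `#` comments and backslash line continuations. It is not the actual KDevelop lexer: the token kinds and the `lex` function are hypothetical, and it works on plain `std::string` bytes rather than Qt’s Unicode-aware strings.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical token kinds for a tiny qmake-like subset (not KDevelop's token set).
enum class TokenKind { Identifier, Op, Value, Newline, Error, End };

struct Token {
    TokenKind kind;
    std::string text;
};

static bool isIdentChar(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_' || c == '.';
}

// Minimal hand-written lexer sketch. Byte-based, so no Unicode handling.
std::vector<Token> lex(const std::string& src) {
    std::vector<Token> tokens;
    std::size_t i = 0;
    bool afterOp = false;  // after an assignment operator, every word is a value
    while (i < src.size()) {
        char c = src[i];
        bool hasNext = i + 1 < src.size();
        if (c == '\\' && hasNext && src[i + 1] == '\n') {
            i += 2;  // line continuation: the statement goes on on the next line
        } else if (c == '\n') {
            tokens.push_back({TokenKind::Newline, "\n"});
            afterOp = false;
            ++i;
        } else if (c == ' ' || c == '\t' || c == '\r') {
            ++i;  // insignificant whitespace
        } else if (c == '#') {
            while (i < src.size() && src[i] != '\n') ++i;  // skip comment, keep the newline
        } else if (!afterOp && (c == '=' || ((c == '+' || c == '-' || c == '*' || c == '~') &&
                                             hasNext && src[i + 1] == '='))) {
            std::size_t len = (c == '=') ? 1 : 2;
            tokens.push_back({TokenKind::Op, src.substr(i, len)});
            i += len;
            afterOp = true;
        } else if (!afterOp && isIdentChar(c)) {
            std::size_t start = i;
            while (i < src.size() && isIdentChar(src[i])) ++i;
            tokens.push_back({TokenKind::Identifier, src.substr(start, i - start)});
        } else if (afterOp) {
            // A value ends at whitespace, a comment or a line continuation.
            std::size_t start = i;
            while (i < src.size() && !std::isspace(static_cast<unsigned char>(src[i])) &&
                   src[i] != '#' &&
                   !(src[i] == '\\' && i + 1 < src.size() && src[i + 1] == '\n'))
                ++i;
            tokens.push_back({TokenKind::Value, src.substr(start, i - start)});
        } else {
            tokens.push_back({TokenKind::Error, std::string(1, c)});  // unexpected character
            ++i;
        }
    }
    tokens.push_back({TokenKind::End, ""});
    return tokens;
}
```

For example, `lex("SOURCES += main.cpp \\\n    util.cpp # comment\n")` yields an identifier, the `+=` operator, two values, a newline and the end token. The `afterOp` flag is one simple way to make the lexer context-sensitive, so that something like `FOO=1` on the right-hand side stays a single value.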
The next step (apart from fixing the parser bugs, which show up when trying real-world files) is to use the generated AST to build up the easier-to-use hand-written AST I already have (although that one has to be changed here and there, especially to carry location information). This shouldn’t be too hard, as kdevelop-pg has some nice support for adding special members to AST nodes and also provides a default visitor which walks through the AST. I already implemented a debug visitor (hand-written, as the generated one doesn’t give me enough information) which produces a lot of output when the parser is run in debug mode.
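The visitor pattern mentioned above can be sketched as follows: a default visitor whose methods only walk the children, and a debug visitor that overrides them to print details while still delegating to the default traversal. The node types, the `line` member and all class and method names are hypothetical illustrations, not the classes kdevelop-pg actually generates.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical AST for a qmake-like subset (not kdevelop-pg's generated node types).
struct AssignmentAst {
    std::string variable;
    std::string op;
    std::vector<std::string> values;
    int line = 0;  // location information, as the hand-written AST needs
};

struct ProjectAst {
    std::vector<AssignmentAst> statements;
};

// Default visitor: walks the whole tree and does nothing else, so subclasses
// only override the nodes they are interested in.
class DefaultVisitor {
public:
    virtual ~DefaultVisitor() = default;
    virtual void visitProject(const ProjectAst& node) {
        for (const auto& stmt : node.statements) visitAssignment(stmt);
    }
    virtual void visitAssignment(const AssignmentAst&) {}
};

// Debug visitor: prints each node with the details we care about.
class DebugVisitor : public DefaultVisitor {
public:
    explicit DebugVisitor(std::ostream& out) : out_(out) {}
    void visitProject(const ProjectAst& node) override {
        out_ << "Project (" << node.statements.size() << " statements)\n";
        DefaultVisitor::visitProject(node);  // reuse the default traversal
    }
    void visitAssignment(const AssignmentAst& node) override {
        out_ << "  line " << node.line << ": " << node.variable << ' ' << node.op;
        for (const auto& v : node.values) out_ << " [" << v << ']';
        out_ << '\n';
    }

private:
    std::ostream& out_;
};
```

Writing to a `std::ostream` instead of directly to stdout keeps the debug output easy to redirect or to check in tests. The same split (generic traversal in the base class, node-specific work in overrides) would apply to a visitor that builds the hand-written AST from the generated one.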
Why is this an odyssey? Because when I decided to do a rewrite, fellow developers pointed out that kdev-pg plus a hand-written lexer might be a good idea, and I had even thought about this myself. Also, I finally got back to my idea of having a language plugin for the QMake language and discussed that on the kdevelop list…