GRK --- Grammar Recovery Kit

GRK is a software toolkit for semi-automatic grammar recovery. The toolkit goes beyond the basic idea of obtaining a relatively correct and complete grammar from a language reference: GRK also addresses the reverse engineering and re-engineering of the language reference itself. This is demonstrated for IBM's VS Cobol II, for which an authoritative definition is derived semi-automatically from IBM's application programming reference. This authoritative definition comprises all the text from IBM's reference as well as comments on the performed adaptations. Furthermore, GRK provides some simple means for grammar deployment, that is, it derives actual parsers. In the case of VS Cobol II, two parsers are derived: a slow, Prolog-based parser, and a fast, BTYACC-based parser (the latter with the help of GDK).
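Grammar deployment in its simplest form means making a grammar executable. GRK's slow parser is Prolog-based, and Prolog's backtracking lets a grammar run almost directly as a top-down parser. The Python sketch below imitates that idea. It is not GRK's generated code. The grammar representation is hypothetical: a dict maps each nonterminal to a list of alternatives, and terminals are written in single quotes. The toy MOVE grammar is also hypothetical. Like any naive top-down parser, the sketch would loop on left-recursive rules.

```python
from typing import Dict, Iterator, List

# Hypothetical representation: nonterminal -> list of alternatives,
# each alternative a sequence of symbols; terminals are quoted, e.g. "'MOVE'".
Grammar = Dict[str, List[List[str]]]


def is_terminal(sym: str) -> bool:
    return len(sym) >= 2 and sym.startswith("'") and sym.endswith("'")


def parse_symbol(g: Grammar, sym: str, tokens: List[str], pos: int) -> Iterator[int]:
    """Yield every position at which `sym` can end when started at `pos`."""
    if is_terminal(sym):
        if pos < len(tokens) and tokens[pos] == sym[1:-1]:
            yield pos + 1
        return
    for alt in g[sym]:  # try alternatives in order, backtracking like Prolog
        yield from parse_seq(g, alt, tokens, pos)


def parse_seq(g: Grammar, seq: List[str], tokens: List[str], pos: int) -> Iterator[int]:
    """Yield every end position for the symbol sequence `seq`."""
    if not seq:
        yield pos
        return
    for mid in parse_symbol(g, seq[0], tokens, pos):
        yield from parse_seq(g, seq[1:], tokens, mid)


def recognizes(g: Grammar, start: str, tokens: List[str]) -> bool:
    """True if some derivation of `start` consumes all tokens."""
    return any(end == len(tokens) for end in parse_symbol(g, start, tokens, 0))


# Toy grammar loosely modelled on a MOVE statement (not IBM's grammar).
TOY = {
    "move-statement": [["'MOVE'", "sending", "'TO'", "receivers"]],
    "sending": [["identifier"], ["literal"]],
    "receivers": [["identifier"], ["identifier", "receivers"]],
    "identifier": [["'A'"], ["'B'"], ["'C'"]],
    "literal": [["'0'"], ["'1'"]],
}

print(recognizes(TOY, "move-statement", ["MOVE", "A", "TO", "B", "C"]))  # True
print(recognizes(TOY, "move-statement", ["MOVE", "1", "TO"]))            # False
```

This kind of backtracking interpreter is convenient for testing a grammar while it is still being recovered. It can be exponentially slow, however, which is one reason a generated LR-style parser is the faster option for real code.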

Because the documents that one can use for grammar recovery differ widely, there is no definitive answer to the question of tool support. The present GRK includes a number of tools of general use, e.g., a transformation tool for grammars, but several other tools are biased towards IBM standards such as the VS Cobol II application programming reference, e.g., a specific tool for diagram extraction.
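The following minimal Python sketch illustrates what a grammar transformation operator does: it consistently renames a nonterminal and checks its preconditions first. The operator name `rename`, its signature, and the dict-based grammar representation are assumptions made for this sketch. They are not GRK's transformation language.

```python
from typing import Dict, List

# Hypothetical representation: nonterminal -> list of alternatives;
# terminals are written in single quotes, e.g. "'MOVE'".
Grammar = Dict[str, List[List[str]]]


def rename(g: Grammar, old: str, new: str) -> Grammar:
    """Rename nonterminal `old` to `new` in its definition and in all uses."""
    # Preconditions: `old` must be defined, and `new` must be fresh.
    if old not in g:
        raise ValueError(f"{old} is not defined")
    if new in g or any(new in alt for alts in g.values() for alt in alts):
        raise ValueError(f"{new} is already in use")

    def sub(sym: str) -> str:
        return new if sym == old else sym

    # Build a new grammar; the input grammar is left unchanged.
    return {sub(lhs): [[sub(s) for s in alt] for alt in alts]
            for lhs, alts in g.items()}


g = {
    "move-stmt": [["'MOVE'", "id", "'TO'", "id"]],
    "id": [["'A'"], ["'B'"]],
}
print(rename(g, "id", "identifier"))
```

The sketch returns a fresh grammar instead of mutating its input. That keeps each step of a longer transformation script easy to replay and compare.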

GRK is being developed by Ralf Lämmel at the Free University, Amsterdam, and CWI. It is free software. Version 1.0 was released on June 4, 2003. GRK is implemented in SWI-Prolog, and gmake is used to glue together all components. The GRK functionality can be seen as a careful implementation of the approach that Ralf Lämmel and Chris Verhoef describe in their paper "Semi-Automatic Grammar Recovery", which appeared in Software: Practice and Experience (SP&E) in 2001. GRK goes beyond that approach in that it also supports document re-engineering. GRK is part of a larger effort at VU and CWI in Amsterdam on what we call engineering of grammarware.
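A recurring question during recovery is how complete the grammar currently is. The Python sketch below shows one simple diagnostic. It lists two kinds of nonterminals:

- "top" nonterminals, which are defined but never used (start symbols or dead rules);
- "bottom" nonterminals, which are used but never defined (gaps still to be filled).

The grammar representation and the function name are assumptions for illustration. They are not taken from GRK or from the paper.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representation: nonterminal -> list of alternatives;
# terminals are written in single quotes, e.g. "'MOVE'".
Grammar = Dict[str, List[List[str]]]


def is_terminal(sym: str) -> bool:
    return len(sym) >= 2 and sym.startswith("'") and sym.endswith("'")


def top_and_bottom(g: Grammar) -> Tuple[Set[str], Set[str]]:
    """Return (defined but unused, used but undefined) nonterminals."""
    used = {s for alts in g.values() for alt in alts for s in alt
            if not is_terminal(s)}
    defined = set(g)
    top = defined - used      # start symbols, or rules nothing refers to
    bottom = used - defined   # references the grammar cannot resolve yet
    return top, bottom


g = {
    "program": [["statement", "'.'"]],
    "statement": [["move-statement"], ["add-statement"]],
    "move-statement": [["'MOVE'", "identifier", "'TO'", "identifier"]],
    "identifier": [["'A'"], ["'B'"]],
}
top, bottom = top_and_bottom(g)
print(sorted(top), sorted(bottom))  # ['program'] ['add-statement']
```

A nonterminal that is used only inside its own rules counts as "used" here, so it does not show up as a top nonterminal. A stricter check would compute reachability from the start symbol.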


GRK tools


The VS Cobol II case

The distribution includes a version of IBM's application programming reference for VS Cobol II as downloaded from the freely accessible IBM BookManager® BookServer Library. Several other documents and files are generated from this document. There are the following steps: The result of preparation is called the reverse-engineered IBM document, as it is useful on its own as a self-contained version of IBM's application programming reference with a normalised notation. For example, all diagrams are labelled by appropriate names, and notational anomalies were eliminated. The result of all steps up to inlining is called the re-engineered IBM document because several transformations were applied to the syntax diagrams. All the transformation scripts are included in the distribution, but we link them here for convenience:
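One of the steps named above is inlining. The precise operators are defined by the transformation scripts in the distribution. As an assumption, the Python sketch below treats inlining as the classic unfolding of a non-recursive nonterminal with a single alternative: each use is replaced by that right-hand side, and the rule is dropped. The function name and the grammar representation are hypothetical.

```python
from typing import Dict, List

# Hypothetical representation: nonterminal -> list of alternatives;
# terminals are written in single quotes, e.g. "'MOVE'".
Grammar = Dict[str, List[List[str]]]


def inline(g: Grammar, name: str) -> Grammar:
    """Replace every use of `name` by its only alternative and drop its rule."""
    if name not in g:
        raise ValueError(f"{name} is not defined")
    if len(g[name]) != 1:
        raise ValueError(f"{name} must have exactly one alternative")
    body = g[name][0]
    if name in body:
        raise ValueError(f"{name} is directly recursive")

    result: Grammar = {}
    for lhs, alts in g.items():
        if lhs == name:
            continue  # the inlined rule disappears
        new_alts = []
        for alt in alts:
            new_alt: List[str] = []
            for sym in alt:
                # Splice the body in place of each occurrence of `name`.
                if sym == name:
                    new_alt.extend(body)
                else:
                    new_alt.append(sym)
            new_alts.append(new_alt)
        result[lhs] = new_alts
    return result


g = {
    "move-statement": [["'MOVE'", "sending", "'TO'", "identifier"]],
    "sending": [["identifier"]],
    "identifier": [["'A'"], ["'B'"]],
}
print(inline(g, "sending"))
```

Restricting the sketch to single-alternative nonterminals avoids multiplying alternatives at every use site. Inlining a nonterminal with several alternatives would require distributing them over each enclosing alternative.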

Deliverables

Transformation scripts

Download and installation

The installation of GRK and SWI-Prolog is straightforward.
The generated Prolog-based Cobol parser can be readily used.
Using the fast, BTYACC-based parser relies on the following:


Useful links


Lessons learned


Acknowledgements

I am grateful for the collaboration with Jan Kort on the subject of providing tooling for treating grammars as engineering artifacts. This activity contributes to an overall effort on engineering of grammarware. In this context, I am grateful for the collaboration with Paul Klint, Steven Klusener, and Chris Verhoef. I am also very grateful for discussions and comments from colleagues, and I apologise for any omission in the following list: Mark van den Brand, Jim Cordy, Jan Heering, Niels Veerman, Ernst-Jan Verhoeven, Joost Visser.


Page last updated November 26, 2003.
Send your email remarks to Ralf Lämmel.