CHANGES_FROM_1.33
- WITH predicate: line 11 v4.g
- WITHOUT predicate: line 12 v4.g
- The original context set for the predicate:
- A X
- The lookahead set for the alt WITHOUT the semantic predicate:
- A
- The intersection of the two sets
- A
- The original predicate:
- pred << isUpper(LATEXT(1))>>?
- depth=k=1 rule b line 15 v4.g
- set context:
- A X
- tree context: null
- The new (modified) form of the predicate:
- pred << isUpper(LATEXT(1))>>?
- depth=k=1 rule b line 15 v4.g
- set context:
- X
- tree context: null
- #endif
- --------------------------------------------------------------
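- The set arithmetic in the report above can be sketched in C with one
- bit per token. This is only an illustration of the reported sets,
- not antlr's own code; the token codes and function names are
- hypothetical.

```c
#include <assert.h>

/* Hypothetical token codes for the sketch: one bit per token. */
enum { TOK_A = 1, TOK_X = 2 };

/* Tokens that can start both the predicated and the unpredicated alt. */
static unsigned context_overlap(unsigned pred_context, unsigned other_lookahead)
{
    return pred_context & other_lookahead;
}

/* Context left for the predicate once the overlap is removed. */
static unsigned reduced_context(unsigned pred_context, unsigned other_lookahead)
{
    unsigned overlap = context_overlap(pred_context, other_lookahead);
    unsigned reduced = pred_context & ~overlap;
    assert((reduced & other_lookahead) == 0); /* nothing shared remains */
    return reduced;
}
```

- With the original context {A, X} and the lookahead {A} of the
- alternative without the predicate, the overlap is {A} and the
- modified context is {X}, as in the report.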
- The bad news about -mrhoist:
- (a) -mrhoist does not analyze predicates with lookahead
- depth > 1.
- (b) -mrhoist does not look past a guarded predicate to
- find context which might cover other predicates.
- For these cases you might want to use syntactic predicates.
- When a semantic predicate fails during guess mode the guess
- fails and the next alternative is tried.
- Limitation (a) is illustrated by the following example:
- start : (stmt)* EOF ;
- stmt : cast
- | expr
- ;
- cast : <<isTypename(LATEXT(2))>>? LP ID RP ;
- expr : LP ID RP ;
- This is not much different from the first example, except that
- it requires two tokens of lookahead context to determine what
- to do. This predicate is NOT suppressed because the current version
- is unable to handle predicates with depth > 1.
- A predicate can be combined with other predicates during hoisting.
- In those cases the depth=1 predicates are still handled. Thus,
- in the following example the isUpper() predicate will be suppressed
- by line #4 when hoisted from "bizarre" into "start", but will still
- be present in "bizarre" in order to predict "stmt".
- start : (bizarre)* EOF ; // #1
- // #2
- bizarre : stmt // #3
- | A // #4
- ;
- stmt : cast
- | expr
- ;
- cast : <<isTypename(LATEXT(2))>>? LP ID RP ;
- expr : LP ID RP
- | <<isUpper(LATEXT(1))>>? A
- ;
- Limitation (b) is illustrated by the following example of a
- context guarded predicate:
- rule : (A)? <<p>>? // #1
- (A // #2
- |B // #3
- ) // #4
- | <<q>>? B // #5
- ;
- Recall that this means that when the lookahead is NOT A then
- the predicate "p" is ignored and it attempts to match "A|B".
- Ideally, the "B" at line #3 should suppress predicate "q".
- However, the current version does not attempt to look past
- the guard predicate to find context which might suppress other
- predicates.
- In some cases -mrhoist will lead to the reporting of ambiguities
- which were not visible before:
- start : (a)* "@";
- a : bc | d;
- bc : b | c ;
- b : <<isUpper(LATEXT(1))>>? A;
- c : A ;
- d : A ;
- In this case there is a true ambiguity in "a" between "bc" and "d"
- which can both match "A". Without -mrhoist the predicate in "b"
- is hoisted into "a" and there is no ambiguity reported. However,
- with -mrhoist, the predicate in "b" is suppressed by "c" (as it
- should be) making the ambiguity in "a" apparent.
- The motivations for these changes were hoisting problems reported
- by Reinier van den Born (reinier@vnet.ibm.com) and several others.
- #116. (Changed in 1.33MR10) C++ mode: tracein/traceout rule name is (const char *)
- The prototype for C++ mode routine tracein (and traceout) has changed from
- "char *" to "const char *".
- #115. (Changed in 1.33MR10) Using guess mode with exception handlers in C mode
- The definition of the C mode macros zzmatch_wsig and zzsetmatch_wsig
- neglected to consider guess mode. When control passed to the rule's
- parse exception handler the routine would exit without ever closing the
- guess block. This would lead to unpredictable behavior.
- In 1.33MR10 the behavior of exceptions in C mode and C++ mode should be
- identical.
- #114. (Changed in 1.33MR10) difference in [zz]resynch() between C and C++ modes
- There was a slight difference in the way C and C++ mode resynchronized
- following a parsing error. The C routine would sometimes skip an extra
- token before attempting to resynchronize.
- The C routine was changed to match the C++ routine.
- #113. (Changed in 1.33MR10) new context guarded pred: (g)? && <<p>>? expr
- The existing context guarded predicate:
- rule : (guard)? => <<p>>? expr
- | next_alternative
- ;
- generates code which resembles:
- if (lookahead(expr) && (!guard || pred)) {
- expr()
- } else ....
- This is not suitable for some applications because it allows
- expr() to be invoked when the predicate is false. This is
- intentional because it is meant to mimic automatically computed
- predicate context.
- The new context guarded predicate uses the guard information
- differently because it has a different goal. Consider:
- rule : (guard)? && <<p>>? expr
- | next_alternative
- ;
- The new style of context guarded predicate is equivalent to:
- rule : <<guard==true && pred>>? expr
- | next_alternative
- ;
- It generates code which resembles:
- if (lookahead(expr) && guard && pred) {
- expr();
- } else ...
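- The difference between the two guard forms can be checked with a
- small C sketch. The function names are hypothetical, and the three
- int arguments stand in for the generated lookahead test, the guard
- match, and the predicate; this is not code emitted by antlr.

```c
#include <assert.h>

/* (guard)? => <<p>>? expr : the predicate only matters when the guard matches. */
static int enter_with_arrow_guard(int lookahead_ok, int guard, int pred)
{
    return lookahead_ok && (!guard || pred);
}

/* (guard)? && <<p>>? expr : both the guard and the predicate must hold. */
static int enter_with_and_guard(int lookahead_ok, int guard, int pred)
{
    return lookahead_ok && guard && pred;
}

/* Count the input combinations (out of 8) on which the two forms disagree. */
static int count_disagreements(void)
{
    int n = 0, la, g, p;
    for (la = 0; la <= 1; la++)
        for (g = 0; g <= 1; g++)
            for (p = 0; p <= 1; p++)
                if (enter_with_arrow_guard(la, g, p) !=
                    enter_with_and_guard(la, g, p))
                    n++;
    assert(n <= 8);
    return n;
}
```

- The two forms differ only when the lookahead matches and the guard
- does not: the "=>" form then enters expr regardless of the
- predicate, while the "&&" form never does.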
- Both forms of guarded predicates severely restrict the form of
- the context guard: it can contain no rule references, no
- (...)*, no (...)+, and no {...}. It may contain token and
- token class references, and alternation ("|").
- Addition for 1.33MR11: in the token expression all tokens must
- be at the same height of the token tree:
- (A ( B | C))? && ... is ok (all height 2)
- (A ( B | ))? && ... is not ok (some 1, some 2)
- (A B C D | E F G H)? && ... is ok (all height 4)
- (A B C D | E )? && ... is not ok (some 4, some 1)
- This restriction is required in order to properly compute the lookahead
- set for expressions like:
- rule1 : (A B C)? && <<pred>>? rule2 ;
- rule2 : (A|X) (B|Y) (C|Z);
- This addition was suggested by Reinier van den Born (reinier@vnet.ibm.com).
- #112. (Changed in 1.33MR10) failed validation predicate in C guess mode
- John Lilley (jlilley@empathy.com) suggested that a failed validation
- predicate should abort a guess rather than report an error.
- This was installed in C++ mode (Item #4). Only now was it noticed
- that the fix was never installed for C mode.
- #111. (Changed in 1.33MR10) moved zzTRACEIN to before init action
- When the antlr -gd switch is present antlr generates calls to
- zzTRACEIN at the start of a rule and zzTRACEOUT at the exit
- from a rule. Prior to 1.33MR10 the call to zzTRACEIN was
- after the init-action, which could cause confusion because the
- init-actions were reported with the name of the enclosing rule,
- rather than the active rule.
- #110. (Changed in 1.33MR10) antlr command line copied to generated file
- The antlr command line is now copied to the generated file near
- the start.
- #109. (Changed in 1.33MR10) improved trace information
- The quality of the trace information provided by the "-gd"
- switch has been improved significantly. Here is an example
- of the output from a test program. It shows the rule name,
- the first token of lookahead, the call depth, and the guess
- status:
- exit rule gusxx {"?"} depth 2
- enter rule gusxx {"?"} depth 2
- enter rule gus1 {"o"} depth 3 guessing
- guess done - returning to rule gus1 {"o"} at depth 3
- (guess mode continues - an enclosing guess is still active)
- guess done - returning to rule gus1 {"Z"} at depth 3
- (guess mode continues - an enclosing guess is still active)
- exit rule gus1 {"Z"} depth 3 guessing
- guess done - returning to rule gusxx {"o"} at depth 2 (guess mode ends)
- enter rule gus1 {"o"} depth 3
- guess done - returning to rule gus1 {"o"} at depth 3 (guess mode ends)
- guess done - returning to rule gus1 {"Z"} at depth 3 (guess mode ends)
- exit rule gus1 {"Z"} depth 3
- line 1: syntax error at "Z" missing SC
- ...
- Rule trace reporting is controlled by the value of the integer
- [zz]traceOptionValue: when it is positive tracing is enabled,
- otherwise it is disabled. Tracing during guess mode is controlled
- by the value of the integer [zz]traceGuessOptionValue. When
- it is positive AND [zz]traceOptionValue is positive rule trace
- is reported in guess mode.
- The values of [zz]traceOptionValue and [zz]traceGuessOptionValue
- can be adjusted by subroutine calls listed below.
- Depending on the presence or absence of the antlr -gd switch
- the variable [zz]traceOptionValueDefault is set to 0 or 1. When
- the parser is initialized or [zz]traceReset() is called the
- value of [zz]traceOptionValueDefault is copied to [zz]traceOptionValue.
- The value of [zz]traceGuessOptionValue is always initialized to 1,
- but, as noted earlier, nothing will be reported unless
- [zz]traceOptionValue is also positive.
- When the parser state is saved/restored the values of the trace
- variables are also saved/restored. If a restore causes a change in
- reporting behavior from on to off or vice versa this will be reported.
- When the -gd option is selected, the macro "#define zzTRACE_RULES"
- is added to appropriate output files.
- C++ mode
- --------
- int traceOption(int delta)
- int traceGuessOption(int delta)
- void traceReset()
- int traceOptionValueDefault
- C mode
- --------
- int zzTraceOption(int delta)
- int zzTraceGuessOption(int delta)
- void zzTraceReset()
- int zzTraceOptionValueDefault
- The argument "delta" is added to the traceOptionValue. To
- turn on trace when inside a particular rule one uses:
- rule : <<traceOption(+1);>>
- (
- rest-of-rule
- )
- <<traceOption(-1);>>
- ; /* fail clause */ <<traceOption(-1);>>
- One can use the same idea to turn *off* tracing within a
- rule by using a delta of (-1).
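- The rules for when trace output appears can be summarized in a
- small C model. The struct and function names are hypothetical
- stand-ins for the [zz]trace variables and routines; the real
- routines return int, and their return values are not modeled here.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int option_default;  /* stands in for [zz]traceOptionValueDefault */
    int option_value;    /* stands in for [zz]traceOptionValue */
    int guess_value;     /* stands in for [zz]traceGuessOptionValue */
} trace_model;

/* Parser initialization: the default depends on the -gd switch. */
static void trace_init(trace_model *t, int gd_switch)
{
    assert(t != NULL);
    t->option_default = gd_switch ? 1 : 0;
    t->option_value = t->option_default;
    t->guess_value = 1;
}

/* [zz]traceReset(): copy the default back into the option value. */
static void trace_reset(trace_model *t)
{
    t->option_value = t->option_default;
}

/* The delta argument is simply added to the current value. */
static void trace_add(trace_model *t, int delta)       { t->option_value += delta; }
static void trace_guess_add(trace_model *t, int delta) { t->guess_value += delta; }

/* Is a rule entry/exit reported, given whether a guess is active? */
static int trace_reports(const trace_model *t, int guessing)
{
    if (t->option_value <= 0) return 0;
    return guessing ? t->guess_value > 0 : 1;
}
```

- Because the values are counters rather than flags, matching +1 and
- -1 calls nest: an inner rule can lower the value without losing
- the setting made by an outer rule.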
- An improvement in the rule trace was suggested by Sramji
- Ramanathan (ps@kumaran.com).
- #108. A Note on Deallocation of Variables Allocated in Guess Mode
- NOTE
- ------------------------------------------------------
- This mechanism only works for heap allocated variables
- ------------------------------------------------------
- The rewrite of the trace provides the machinery necessary
- to properly free variables or undo actions following a
- failed guess.
- The macro zzUSER_GUESS_HOOK(guessSeq,zzrv) is expanded
- as part of the zzGUESS macro. When a guess is opened
- the value of zzrv is 0. When a longjmp() is executed to
- undo the guess, the value of zzrv will be 1.
- The macro zzUSER_GUESS_DONE_HOOK(guessSeq) is expanded
- as part of the zzGUESS_DONE macro. This is executed
- whether the guess succeeds or fails as part of closing
- the guess.
- The guessSeq is a sequence number which is assigned to each
- guess and is incremented by 1 for each guess which becomes
- active. It is needed by the user to associate the start of
- a guess with the failure and/or completion (closing) of a
- guess.
- Guesses are nested. They must be closed in the reverse
- of the order that they are opened.
- In order to free memory used by a variable during a guess
- a user must write a routine which can be called to
- register the variable along with the current guess sequence
- number provided by the zzUSER_GUESS_HOOK macro. If the guess
- fails, all variables tagged with the corresponding guess
- sequence number should be released. This is ugly, but
- it would require a major rewrite of antlr 1.33 to use
- some mechanism other than setjmp()/longjmp().
- The order of calls for a *successful* guess would be:
- zzUSER_GUESS_HOOK(guessSeq,0);
- zzUSER_GUESS_DONE_HOOK(guessSeq);
- The order of calls for a *failed* guess would be:
- zzUSER_GUESS_HOOK(guessSeq,0);
- zzUSER_GUESS_HOOK(guessSeq,1);
- zzUSER_GUESS_DONE_HOOK(guessSeq);
- The default definitions of these macros are empty strings.
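- The registration scheme described above might look like the
- following C sketch. All names here (tracked_alloc, track_alloc,
- drop_guess, my_guess_hook) are hypothetical, and the use of 2 as
- the "guess closed" code is only a convention chosen for the
- sketch. Blocks are freed when their guess fails and merely
- forgotten when the guess is closed.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_TRACKED 64   /* arbitrary limit for the sketch */

typedef struct {
    int seq;     /* guess sequence number active at allocation time */
    void *ptr;   /* heap block to release if that guess fails */
} tracked_alloc;

static tracked_alloc tracked[MAX_TRACKED];
static int tracked_count = 0;

/* Remember that ptr was allocated while guess number seq was active. */
static void track_alloc(int seq, void *ptr)
{
    assert(tracked_count < MAX_TRACKED);
    tracked[tracked_count].seq = seq;
    tracked[tracked_count].ptr = ptr;
    tracked_count++;
}

/* Drop every entry tagged with seq; free the blocks only when do_free is set. */
static void drop_guess(int seq, int do_free)
{
    int i, kept = 0;
    for (i = 0; i < tracked_count; i++) {
        if (tracked[i].seq != seq) {
            tracked[kept++] = tracked[i];
        } else if (do_free) {
            free(tracked[i].ptr);
        }
    }
    tracked_count = kept;
}

/* zzrv: 0 = guess opened, 1 = guess failed, 2 = guess closed. */
static void my_guess_hook(int guessSeq, int zzrv)
{
    if (zzrv == 1) {
        drop_guess(guessSeq, 1);   /* failed: release the blocks */
    } else if (zzrv == 2) {
        drop_guess(guessSeq, 0);   /* closed: the owner keeps the blocks */
    }
}
```

- An init-action would call track_alloc() with the sequence number
- of the innermost open guess right after allocating. Whether blocks
- from a successful inner guess should instead be re-tagged with the
- enclosing guess is a design decision this sketch leaves open.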
- Here is an example in C++ mode. The zzUSER_GUESS_HOOK and
- zzUSER_GUESS_DONE_HOOK macros and myGuessHook() routine
- can be used without change in both C and C++ versions.
- ----------------------------------------------------------------------
- <<
- #include "AToken.h"
- typedef ANTLRCommonToken ANTLRToken;
- #include "DLGLexer.h"
- int main() {
- {
- DLGFileInput in(stdin);
- DLGLexer lexer(&in,2000);
- ANTLRTokenBuffer pipe(&lexer,1);
- ANTLRCommonToken aToken;
- P parser(&pipe);
- lexer.setToken(&aToken);
- parser.init();
- parser.start();
- };
- fclose(stdin);
- fclose(stdout);
- return 0;
- }
- >>
- <<
- char *s=NULL;
- #undef zzUSER_GUESS_HOOK
- #define zzUSER_GUESS_HOOK(guessSeq,zzrv) myGuessHook(guessSeq,zzrv);
- #undef zzUSER_GUESS_DONE_HOOK
- #define zzUSER_GUESS_DONE_HOOK(guessSeq) myGuessHook(guessSeq,2);
- void myGuessHook(int guessSeq,int zzrv) {
- if (zzrv == 0) {
- fprintf(stderr,"User hook: starting guess #%d\n",guessSeq);
- } else if (zzrv == 1) {
- free (s);
- s=NULL;
- fprintf(stderr,"User hook: failed guess #%d\n",guessSeq);
- } else if (zzrv == 2) {
- free (s);
- s=NULL;
- fprintf(stderr,"User hook: ending guess #%d\n",guessSeq);
- };
- }
- >>
- #token A "a"
- #token "[\t\ \n]" <<skip();>>
- class P {
- start : (top)+
- ;
- top : (which) ? <<fprintf(stderr,"%s is a which\n",s); free(s); s=NULL; >>
- | other <<fprintf(stderr,"%s is an other\n",s); free(s); s=NULL; >>
- ; <<if (s != NULL) free(s); s=NULL; >>
- which : which2
- ;
- which2 : which3
- ;
- which3
- : (label)? <<fprintf(stderr,"%s is a label\n",s);>>
- | (global)? <<fprintf(stderr,"%s is a global\n",s);>>
- | (exclamation)? <<fprintf(stderr,"%s is an exclamation\n",s);>>
- ;
- label : <<s=strdup(LT(1)->getText());>> A ":" ;
- global : <<s=strdup(LT(1)->getText());>> A "::" ;
- exclamation : <<s=strdup(LT(1)->getText());>> A "!" ;
- other : <<s=strdup(LT(1)->getText());>> "other" ;
- }
- ----------------------------------------------------------------------
- This is a silly example, but illustrates the idea. For the input
- "a ::" with tracing enabled the output begins:
- ----------------------------------------------------------------------
- enter rule "start" depth 1
- enter rule "top" depth 2
- User hook: starting guess #1
- enter rule "which" depth 3 guessing
- enter rule "which2" depth 4 guessing
- enter rule "which3" depth 5 guessing
- User hook: starting guess #2
- enter rule "label" depth 6 guessing
- guess failed
- User hook: failed guess #2
- guess done - returning to rule "which3" at depth 5 (guess mode continues
- - an enclosing guess is still active)
- User hook: ending guess #2
- User hook: starting guess #3
- enter rule "global" depth 6 guessing
- exit rule "global" depth 6 guessing
- guess done - returning to rule "which3" at depth 5 (guess mode continues
- - an enclosing guess is still active)
- User hook: ending guess #3
- enter rule "global" depth 6 guessing
- exit rule "global" depth 6 guessing
- exit rule "which3" depth 5 guessing
- exit rule "which2" depth 4 guessing
- exit rule "which" depth 3 guessing
- guess done - returning to rule "top" at depth 2 (guess mode ends)
- User hook: ending guess #1
- enter rule "which" depth 3
- .....
- ----------------------------------------------------------------------
- Remember:
- (a) Only init-actions are executed during guess mode.
- (b) A rule can be invoked multiple times during guess mode.
- (c) If the guess succeeds the rule will be called once more
- without guess mode so that normal actions will be executed.
- This means that the init-action might need to distinguish
- between guess mode and non-guess mode using the variable
- [zz]guessing.
- #107. (Changed in 1.33MR10) construction of ASTs in guess mode
- Prior to 1.33MR10, when using automatic AST construction in C++
- mode for a rule, an AST would be constructed for elements of the
- rule even while in guess mode. In MR10 this no longer occurs.
- #106. (Changed in 1.33MR10) guess variable confusion
- In C++ mode a guess which failed always restored the parser state
- using zzGUESS_DONE as part of zzGUESS_FAIL. Prior to 1.33MR10,
- C mode required an explicit call to zzGUESS_DONE after the
- call to zzGUESS_FAIL.
- Consider:
- rule : (alpha)? beta
- | ...
- ;
- The generated code resembles:
- zzGUESS
- if (!zzrv && LA(1)==ID) { <==== line #1
- alpha
- zzGUESS_DONE
- beta
- } else {
- if (! zzrv) zzGUESS_DONE <==== line #2a
- ....
- However, in some cases line #2 was rendered:
- if (guessing) zzGUESS_DONE <==== line #2b
- This would work for simple test cases, but would fail in
- some cases where there was a guess while another guess was active.
- One kind of failure would be to match up the zzGUESS_DONE at line
- #2b with the "outer" guess which was still active. The outer
- guess would "succeed" when only the inner guess should have
- succeeded.
- In 1.33MR10 the behavior of zzGUESS and zzGUESS_FAIL in C and
- C++ mode should be identical.
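- The nesting failure can be reproduced with a toy model in C. The
- counter and function names are hypothetical; the model only tracks
- how many guesses are open and assumes, as in 1.33MR10, that a
- failed guess has already been closed before control reaches the
- else-branch with zzrv set to 1.

```c
#include <assert.h>

static int open_guesses = 0;  /* plays the role of the guessing state */

static void guess_open(void) { open_guesses++; }
static void guess_done(void) { assert(open_guesses > 0); open_guesses--; }

/* Else-branch as in line #2b: consults the global guessing state. */
static void else_branch_2b(int zzrv)
{
    (void)zzrv;
    if (open_guesses) guess_done();
}

/* Else-branch as in line #2a: consults only the local zzrv. */
static void else_branch_2a(int zzrv)
{
    if (!zzrv) guess_done();
}

/* An outer guess stays open while an inner guess fails.
   Returns the number of guesses still open afterwards. */
static int run_nested_failure(void (*else_branch)(int))
{
    int zzrv;
    open_guesses = 0;
    guess_open();   /* outer guess, still active */
    guess_open();   /* inner guess */
    guess_done();   /* the failure of the inner guess closes it ... */
    zzrv = 1;       /* ... and control comes back with zzrv == 1 */
    else_branch(zzrv);
    return open_guesses;
}
```

- With the #2a form the outer guess is still open afterwards; with
- the #2b form it has been closed by mistake.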
- The same problem appears in 1.33 vanilla in some places. For
- example:
- start : { (sub)? } ;
- or:
- start : (
- B
- | ( sub )?
- | C
- )+
- ;
- generates incorrect code.
- The general principle is:
- (a) use [zz]guessing only when deciding between a call to zzFAIL
- or zzGUESS_FAIL
- (b) use zzrv in all other cases
- This problem was discovered while testing changes to item #105.
- I believe this is now fixed. My apologies.
- #105. (Changed in 1.33MR10) guess block as single alt of (...)+
- Prior to 1.33MR10 the following constructs:
- rule_plus : (
- (sub)?
- )+
- ;
- rule_star : (
- (sub)?
- )*
- ;
- generated incorrect code for the guess block (which could result
- in runtime errors) because of an incorrect optimization of a
- block with only a single alternative.
- The fix caused some changes to the fix described in Item #49
- because there are now three code generation sequences for (...)+
- blocks containing a guess block:
- a. single alternative which is a guess block
- b. multiple alternatives in which the last is a guess block
- c. all other cases
- Forms like "rule_star" can have unexpected behavior when there
- is a syntax error: if the subrule "sub" is not matched *exactly*
- then "rule_star" will consume no tokens.
- Reported by Esa Pulkkinen (esap@cs.tut.fi).
- #104. (Changed in 1.33MR10) -o option for dlg
- There was a problem with the code added by item #74 to handle the
- -o option of dlg. This should fix it.
- #103. (Changed in 1.33MR10) ANDed semantic predicates
- Rescinded.
- The optimization was a mistake.
- The resulting problem is described in Item #150.
- #102. (Changed in 1.33MR10) allow "class parser : .... {"
- The syntax of the class statement ("class parser-name {")
- has been extended to allow for the specification of base
- classes. An arbitrary number of tokens may now appear
- between the class name and the "{". They are output
- again when the class declaration is generated. For
- example:
- class Parser : public MyBaseClassANTLRparser {
- This was suggested by a user, but I don't have a record
- of who it was.
- #101. (Changed in 1.33MR10) antlr -info command line switch
- -info
- p - extra predicate information in generated file
- t - information about tnode use:
- at the end of each rule in generated file
- summary on stderr at end of program
- m - monitor progress
- prints name of each rule as it is started
- flushes output at start of each rule
- f - first/follow set information to stdout
- 0 - no operation (added in 1.33MR11)
- The options may be combined and may appear in any order.
- For example:
- antlr -info ptm -CC -gt -mrhoist on mygrammar.g
- #100a. (Changed in 1.33MR10) Predicate tree simplification
- When the same predicates can be referenced in more than one
- alternative of a block, large predicate trees can be formed.
- The difference that these optimizations make is so dramatic
- that I have decided to use them even when -mrhoist is not selected.
- Consider the following grammar:
- start : ( all )* ;
- all : a
- | d
- | e
- | f
- ;
- a : c A B
- | c A C
- ;
- c : <<AAA(LATEXT(2))>>?
- ;
- d : <<BBB(LATEXT(2))>>? B C
- ;
- e : <<CCC(LATEXT(2))>>? B C
- ;
- f : e X Y
- ;
- In rule "a" there is a reference to rule "c" in both alternatives.
- The length of the predicate AAA is k=2 and it can be followed in
- alternative 1 only by (A B) while in alternative 2 it can be
- followed only by (A C). Thus they do not have identical context.
- In rule "all" the alternatives which refer to rules "e" and "f" allow
- elimination of the duplicate reference to predicate CCC.
- The table below summarizes the kinds of simplification performed by
- 1.33MR10. In the table, X, Y, and Z stand for single predicates
- (not trees).
- (OR X (OR Y (OR Z))) => (OR X Y Z)
- (AND X (AND Y (AND Z))) => (AND X Y Z)
- (OR X (... (OR X Y) ... )) => (OR X (... Y ... ))
- (AND X (... (AND X Y) ... )) => (AND X (... Y ... ))
- (OR X (... (AND X Y) ... )) => (OR X (... ... ))
- (AND X (... (OR X Y) ... )) => (AND X (... ... ))
- (AND X) => X
- (OR X) => X
- In a test with a complex grammar for a real application, a predicate
- tree with six OR nodes and 12 leaves was reduced to "(OR X Y Z)".
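- Two of the rules in the table, flattening nested nodes of the same
- operator and unwrapping a node with a single operand, can be
- sketched in C as follows. The data structure and names are
- hypothetical and far simpler than antlr's predicate trees, and the
- duplicate-removal rules from the table are not implemented.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_KIDS 8   /* arbitrary limit for the sketch */

enum pred_kind { PRED_LEAF, PRED_AND, PRED_OR };

typedef struct pred {
    enum pred_kind kind;
    char name;                    /* used by leaves only */
    int nkids;
    struct pred *kids[MAX_KIDS];
} pred;

/* Splice same-operator children into the parent and unwrap one-child nodes. */
static pred *simplify(pred *p)
{
    pred *flat[MAX_KIDS];
    int n = 0, i, j;

    if (p->kind == PRED_LEAF) return p;
    for (i = 0; i < p->nkids; i++) {
        pred *c = simplify(p->kids[i]);
        if (c->kind == p->kind) {          /* (OR X (OR Y Z)) => (OR X Y Z) */
            for (j = 0; j < c->nkids; j++) {
                assert(n < MAX_KIDS);
                flat[n++] = c->kids[j];
            }
        } else {
            assert(n < MAX_KIDS);
            flat[n++] = c;
        }
    }
    if (n == 1) return flat[0];            /* (AND X) => X, (OR X) => X */
    p->nkids = n;
    for (i = 0; i < n; i++) p->kids[i] = flat[i];
    return p;
}

/* Append a textual form of the tree to buf, which must start out empty. */
static void show(const pred *p, char *buf, size_t cap)
{
    int i;
    size_t len = strlen(buf);

    if (p->kind == PRED_LEAF) {
        snprintf(buf + len, cap - len, "%c", p->name);
        return;
    }
    snprintf(buf + len, cap - len, "(%s", p->kind == PRED_AND ? "AND" : "OR");
    for (i = 0; i < p->nkids; i++) {
        len = strlen(buf);
        snprintf(buf + len, cap - len, " ");
        show(p->kids[i], buf, cap);
    }
    len = strlen(buf);
    snprintf(buf + len, cap - len, ")");
}
```

- Simplifying children before the parent means a chain such as
- (OR X (OR Y (OR Z))) collapses from the inside out in one pass.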
- In 1.33MR10 there is a greater effort to release memory used
- by predicates once they are no longer in use.
- #100b. (Changed in 1.33MR10) Suppression of extra predicate tests
- The following optimizations require that -mrhoist be selected.
- It is relatively easy to optimize the code generated for predicate
- gates when they are of the form:
- (AND X Y Z ...)
- or (OR X Y Z ...)
- where X, Y, Z, and "..." represent individual predicates (leaves) not
- predicate trees.
- If the predicate is an AND the contexts of the X, Y, Z, etc. are
- ANDed together to create a single Tree context for the group and
- context tests for the individual predicates are suppressed:
- Optimization 1: (AND X Y Z ...)
- Suppose the context for Xtest is LA(1)==LP and the context for
- Ytest is LA(1)==LP && LA(2)==ID.
- Without the optimization the code would resemble:
- if (lookaheadContext &&
- !(LA(1)==LP && LA(1)==LP && LA(2)==ID) ||
- ( (! LA(1)==LP || Xtest) &&
- (! (LA(1)==LP && LA(2)==ID) || Ytest)
- )) {...
- With the -mrhoist optimization the code would resemble:
- if (lookaheadContext &&
- ! (LA(1)==LP && LA(2)==ID) || (Xtest && Ytest) {...
- Optimization 2: (OR X Y Z ...) with identical contexts
- Suppose the context for Xtest is LA(1)==ID and for Ytest
- the context is also LA(1)==ID.
- Without the optimization the code would resemble:
- if (lookaheadContext &&
- ! (LA(1)==ID || LA(1)==ID) ||
- (LA(1)==ID && Xtest) ||
- (LA(1)==ID && Ytest) {...
- With the -mrhoist optimization the code would resemble:
- if (lookaheadContext &&
- (! LA(1)==ID) || (Xtest || Ytest) {...
- Optimization 3: (OR X Y Z ...) with distinct contexts
- Suppose the context for Xtest is LA(1)==ID and for Ytest
- the context is LA(1)==LP.
- Without the optimization the code would resemble:
- if (lookaheadContext &&
- ! (LA(1)==ID || LA(1)==LP) ||
- (LA(1)==ID && Xtest) ||
- (LA(1)==LP && Ytest) {...
- With the -mrhoist optimization the code would resemble:
- if (lookaheadContext &&
- (zzpf=0,
- (LA(1)==ID && (zzpf=1) && Xtest) ||
- (LA(1)==LP && (zzpf=1) && Ytest) ||
- !zzpf) {
- These may appear to be of similar complexity at first,
- but the non-optimized version contains two tests of each
- context while the optimized version contains only one
- such test, as well as eliminating some of the inverted
- logic (" !(...) || ").
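- That the two forms of Optimization 3 accept exactly the same
- inputs can be checked with a short C sketch. The token codes and
- function names are hypothetical, the lookaheadContext factor is
- left out because it is common to both forms, and the gate is
- written with the parentheses it is assumed to have.

```c
#include <assert.h>

enum { TOK_ID = 1, TOK_LP = 2, TOK_OTHER = 3 };  /* hypothetical token codes */

/* Unoptimized gate: each context is tested twice. */
static int gate_plain(int la1, int xtest, int ytest)
{
    return !(la1 == TOK_ID || la1 == TOK_LP) ||
           (la1 == TOK_ID && xtest) ||
           (la1 == TOK_LP && ytest);
}

/* -mrhoist style gate: zzpf records whether any context matched. */
static int gate_mrhoist(int la1, int xtest, int ytest)
{
    int zzpf;
    int result = (zzpf = 0,
                  (la1 == TOK_ID && (zzpf = 1) && xtest) ||
                  (la1 == TOK_LP && (zzpf = 1) && ytest) ||
                  !zzpf);
    /* zzpf is set exactly when one of the contexts matched */
    assert(zzpf == (la1 == TOK_ID || la1 == TOK_LP));
    return result;
}

/* Count inputs on which the two gates give different answers. */
static int count_gate_mismatches(void)
{
    int la1, x, y, n = 0;
    for (la1 = TOK_ID; la1 <= TOK_OTHER; la1++)
        for (x = 0; x <= 1; x++)
            for (y = 0; y <= 1; y++)
                if (gate_plain(la1, x, y) != gate_mrhoist(la1, x, y))
                    n++;
    return n;
}
```

- The final "!zzpf" term plays the part of the inverted context test
- in the plain form: when no context matched, no predicate applies
- and the gate stays open.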
- Optimization 4: Computation of predicate gate trees
- When generating code for the gates of predicate expressions
- antlr 1.33 vanilla uses a recursive procedure to generate
- "&&" and "||" expressions for testing the lookahead. As each
- layer of the predicate tree is exposed a new set of "&&" and
- "||" expressions on the lookahead are generated. In many
- cases the lookahead being tested has already been tested.
- With -mrhoist a lookahead tree is computed for the entire
- lookahead expression. This means that predicates with identical
- context or context which is a subset of another predicate's
- context disappear.
- This is especially important for predicates formed by rules
- like the following:
- upperCaseVowel : <<isUpperCase(LATEXT(1))>>? vowel;
- vowel : <<isVowel(LATEXT(1))>>? LETTERS;
- These predicates are combined using AND since both must be
- satisfied for rule upperCaseVowel. They have identical
- context which makes this optimization very effective.
- The effect of Items #100a and #100b together can be dramatic. In
- a very large (but real world) grammar one particular predicate
- expression was reduced from an (unreadable) 50 predicate leaves,
- 195 LA(1) terms, and 5500 characters to an (easily comprehensible)
- 3 predicate leaves (all different) and a *single* LA(1) term.
- #99. (Changed in 1.33MR10) Code generation for expression trees
- Expression trees are used for k>1 grammars and predicates with
- lookahead depth >1. This optimization must be enabled using
- "-mrhoist on". (Clarification added for 1.33MR11).
- In the processing of expression trees, antlr can generate long chains
- of token comparisons. Prior to 1.33MR10 there were many redundant
- parentheses which caused problems for compilers which could handle
- expressions of only limited complexity. For example, to test an
- expression tree (root R A B C D), antlr would generate something
- resembling:
- (LA(1)==R && (LA(2)==A || (LA(2)==B || (LA(2)==C || LA(2)==D))))
- If there were twenty tokens to test then there would be twenty
- parentheses at the end of the expression.
- In 1.33MR10 the generated code for tree expressions resembles:
- (LA(1)==R && (LA(2)==A || LA(2)==B || LA(2)==C || LA(2)==D))
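- A flat chain of this shape is easy to produce by emitting the "||"
- separator between tests instead of opening a new parenthesis for
- each one. The following C sketch shows the idea; emit_flat_test and
- its parameters are hypothetical and are not taken from the antlr
- sources.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write "(LA(1)==root && (LA(2)==t0 || LA(2)==t1 ...))" into buf. */
static void emit_flat_test(char *buf, size_t cap, const char *root,
                           const char *const *second, int n)
{
    int i;
    size_t len;

    assert(n > 0 && cap > 0);
    snprintf(buf, cap, "(LA(1)==%s && (", root);
    for (i = 0; i < n; i++) {
        len = strlen(buf);
        /* separator goes between tests, so nesting depth never grows */
        snprintf(buf + len, cap - len, "%sLA(2)==%s",
                 i ? " || " : "", second[i]);
    }
    len = strlen(buf);
    snprintf(buf + len, cap - len, "))");
}
```

- However many tokens are tested at depth 2, the expression ends
- with the same two closing parentheses.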
- For "complex" expressions the output is indented to reflect the LA
- number being tested:
- (LA(1)==R
- && (LA(2)==A || LA(2)==B || LA(2)==C || LA(2)==D
- || LA(2)==E || LA(2)==F)
- || LA(1)==S
- && (LA(2)==G || LA(2)==H))
- Suggested by S. Bochnak (S.Bochnak@microTool.com.pl).
- #98. (Changed in 1.33MR10) Option "-info p"
- When the user selects option "-info p" the program will generate
- detailed information about predicates. If the user selects
- "-mrhoist on" additional detail will be provided explaining
- the promotion and suppression of predicates. The output is part
- of the generated file and sandwiched between #if 0/#endif statements.
- Consider the following k=1 grammar:
- start : ( all ) * ;
- all : ( a
- | b
- )
- ;
- a : c B
- ;
- c : <<LATEXT(1)>>?
- | B
- ;
- b : <<LATEXT(1)>>? X
- ;
- Below is an excerpt of the output for rule "start" for the three
- predicate options (off, on, and maintenance release style hoisting).
- For those who do not wish to use the "-mrhoist on" option for code
- generation the option can be used in a "diagnostic" mode to provide
- valuable information:
- a. where one should insert null actions to inhibit hoisting
- b. a chain of rule references which shows where predicates are
- being hoisted
- ======================================================================
- Example of "-info p" with "-mrhoist on"
- ======================================================================
- #if 0
- Hoisting of predicate suppressed by alternative without predicate.
- The alt without the predicate includes all cases where the
- predicate is false.
- WITH predicate: line 11 v36.g
- WITHOUT predicate: line 12 v36.g
- The context set for the predicate:
- B
- The lookahead set for alt WITHOUT the semantic predicate:
- B
- The predicate:
- pred << LATEXT(1)>>? depth=k=1 rule c line 11 v36.g
- set context:
- B
- tree context: null
- Chain of referenced rules:
- #0 in rule start (line 1 v36.g) to rule all
- #1 in rule all (line 3 v36.g) to rule a
- #2 in rule a (line 8 v36.g) to rule c
- #3 in rule c (line 11 v36.g)
- #endif
- &&
- #if 0
- pred << LATEXT(1)>>? depth=k=1 rule b line 15 v36.g
- set context:
- X
- tree context: null
- #endif
- ======================================================================
- Example of "-info p" with the default -prc setting ( "-prc off")
- ======================================================================
- #if 0
- OR
- pred << LATEXT(1)>>? depth=k=1 rule c line 11 v36.g
- set context:
- nil
- tree context: null
- pred << LATEXT(1)>>? depth=k=1 rule b line 15 v36.g
- set context:
- nil
- tree context: null
- #endif
- ======================================================================
- Example of "-info p" with "-prc on" and "-mrhoist off"
- ======================================================================
- #if 0
- OR
- pred << LATEXT(1)>>? depth=k=1 rule c line 11 v36.g
- set context:
- B
- tree context: null
- pred << LATEXT(1)>>? depth=k=1 rule b line 15 v36.g
- set context:
- X
- tree context: null
- #endif
- ======================================================================
- #97. (Fixed in 1.33MR10) "Predicate applied for more than one ... "
- In 1.33 vanilla, the grammar listed below produced this message for
- the first alternative (only) of rule "b":
- warning: predicate applied for >1 lookahead 1-sequences
- [you may only want one lookahead 1-sequence to apply.
- Try using a context guard '(...)? =>'
- In 1.33MR10 the message is issued for both alternatives.
- top : (a)*;
- a : b | c ;
- b : <<PPP(LATEXT(1))>>? ( AAA | BBB )
- | <<QQQ(LATEXT(1))>>? ( XXX | YYY )
- ;
- c : AAA | XXX;
- #96. (Fixed in 1.33MR10) Guard predicates ignored when -prc off
- Prior to 1.33MR10, guard predicate code was not generated unless
- "-prc on" was selected.
- This was incorrect, since "-prc off" (the default) is supposed to
- disable only AUTOMATIC computation of predicate context, not the
- programmer specified context supplied by guard predicates.
- #95. (Fixed in 1.33MR10) Predicate guard context length was k, not max(k,ck)
- Prior to 1.33MR10, predicate guards were computed to k tokens rather
- than max(k,ck). Consider the following grammar:
- a : ( A B C)? => <<AAA(LATEXT(1))>>? (A|X) (B|Y) (C|Z) ;
- The code generated by 1.33 vanilla with "-k 1 -ck 3 -prc on"
- for the predicate in "a" resembles:
- if ( (! LA(1)==A) || AAA(LATEXT(1))) {...
- With 1.33MR10 and the same options the code resembles:
- if ( (! (LA(1)==A && LA(2)==B && LA(3)==C)) || AAA(LATEXT(1))) {...
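- The practical effect of the longer guard can be seen by evaluating
- both tests on the same lookahead. This is a sketch in C; the token
- codes and function names are hypothetical and the int argument aaa
- stands in for the result of AAA(LATEXT(1)).

```c
#include <assert.h>
#include <stddef.h>

enum { TOK_A = 1, TOK_B, TOK_C, TOK_X, TOK_Y, TOK_Z };  /* hypothetical codes */

/* Guard cut off after k=1 token, as generated by 1.33 vanilla. */
static int gate_depth1(const int *la, int aaa)
{
    assert(la != NULL);
    return !(la[0] == TOK_A) || aaa;
}

/* Guard computed to max(k,ck)=3 tokens, as generated by 1.33MR10. */
static int gate_depth3(const int *la, int aaa)
{
    assert(la != NULL);
    return !(la[0] == TOK_A && la[1] == TOK_B && la[2] == TOK_C) || aaa;
}
```

- For the lookahead "A Y Z" with a false predicate, the depth-1 gate
- rejects the alternative even though the guard (A B C) does not
- match, while the depth-3 gate lets it through.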
- #94. (Fixed in 1.33MR10) Predicates followed by rule references
- Prior to 1.33MR10, a semantic predicate which referenced a token
- which was off the end of the rule caused an incomplete context
- to be computed (with "-prc on") for the predicate under some
- circumstances. In some cases this manifested itself as illegal C code
- (e.g. "LA(2)==[Ep](1)") in the k=2 examples below:
- all : ( a ) *;
- a : <<AAA(LATEXT(2))>>? ID X
- | <<BBB(LATEXT(2))>>? Y
- | Z
- ;
- This might also occur when the semantic predicate was followed
- by a rule reference which was shorter than the length of the
- semantic predicate:
- all : ( a ) *;
- a : <<AAA(LATEXT(2))>>? ID X
- | <<BBB(LATEXT(2))>>? y
- | Z
- ;
- y : Y ;
- Depending on circumstance, the resulting context might be too
- generous because it was too short, or too restrictive because
- of missing alternatives.
- #93. (Changed in 1.33MR10) Definition of Purify macro
- Ofer Ben-Ami (gremlin@cs.huji.ac.il) has supplied a definition
- for the Purify macro:
- #define PURIFY(r, s) memset((char *) &(r), '\0', (s));
- Note: This may not be the right thing to do for C++ objects that
- have constructors. Reported by Bonny Rais (bonny@werple.net.au).
- For those cases one should #define PURIFY to an empty macro in the
- #header or #first actions.
- #92. (Fixed in 1.33MR10) Guarded predicates and hoisting
- When a guarded predicate participates in hoisting it is linked into
- a predicate expression tree. Prior to 1.33MR10 this link was never
- cleared and the next time the guard was used to construct a new
- tree the link could contain a spurious reference to another element
- which had previously been joined to it in the semantic predicate tree.
- For example:
- start : ( all ) *;
- all : ( a | b ) ;
- start2 : ( all2 ) *;
- all2 : ( a ) ;
- a : (A)? => <<AAA(LATEXT(1))>>? A ;
- b : (B)? => <<BBB(LATEXT(1))>>? B ;
- Prior to 1.33MR10 the code for "start2" would include a spurious
- reference to the BBB predicate which was left from constructing
- the predicate tree for rule "start" (i.e. or(AAA,BBB) ).
- In 1.33MR10 this problem is avoided by cloning the original guard
- each time it is linked into a predicate tree.
- #91. (Changed in 1.33MR10) Extensive changes to semantic pred hoisting
- ============================================
- This has been rendered obsolete by Item #117
- ============================================
- #90. (Fixed in 1.33MR10) Semantic pred with LT(i) and i>max(k,ck)
- There is a bug in antlr 1.33 vanilla and all maintenance releases
- prior to 1.33MR10 which allows semantic predicates to reference
- an LT(i) or LATEXT(i) where i is larger than max(k,ck). When
- this occurs antlr will attempt to mark the ith element of an array
- in which there are only max(k,ck) elements. The result cannot
- be predicted.
- Using LT(i) or LATEXT(i) for i>max(k,ck) is reported as an error
- in 1.33MR10.
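The kind of range check involved can be sketched as follows. This is not antlr's code: mark_lookahead() and its parameters are hypothetical, and the sketch only shows why an index above max(k,ck) must be rejected before the array is touched.

```c
#include <assert.h>
#include <stdio.h>

/* Larger of the two lookahead depths, as in max(k,ck). */
static int max_depth(int k, int ck)
{
    return k > ck ? k : ck;
}

/* Marks lookahead position i (1-based) as used in an array holding
 * max(k,ck) elements. Returns 0 on success, or -1 when i lies outside
 * 1..max(k,ck); in that case the array is left untouched. */
int mark_lookahead(unsigned char *used, int k, int ck, int i)
{
    if (i < 1 || i > max_depth(k, ck)) {
        fprintf(stderr, "error: LT(%d) exceeds max(k,ck)=%d\n",
                i, max_depth(k, ck));
        return -1;
    }
    used[i - 1] = 1;
    return 0;
}
```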
- #89. Rescinded
- #88. (Fixed in 1.33MR10) Tokens used in semantic predicates in guess mode
- Consider the behavior of a semantic predicate during guess mode:
- rule : a:A (
- <<test($a)>>? b:B
- | c:C
- );
- Prior to MR10 the assignment of the token or attribute to
- $a did not occur during guess mode, which would cause the
- semantic predicate to misbehave because $a would be null.
- In 1.33MR10 a semantic predicate with a reference to an
- element label (such as $a) forces the assignment to take
- place even in guess mode.
- In order to work, this fix REQUIRES use of the $label format
- for token pointers and attributes referenced in semantic
- predicates.
- The fix does not apply to semantic predicates using the
- numeric form to refer to attributes (e.g. <<test($1)>>?).
- The user will receive a warning for this case.
- Reported by Rob Trout (trout@mcs.cs.kent.edu).
- #87. (Fixed in 1.33MR10) Malformed guard predicates
- Context guard predicates may contain only references to
- tokens. They may not contain references to (...)+ and
- (...)* blocks. This is now checked. This replaces the
- fatal error message in item #78 with an appropriate
- (non-fatal) error message.
- In theory, context guards should be allowed to reference
- rules. However, I have not had time to fix this.
- Evaluation of the guard takes place before all rules have
- been read, making it difficult to resolve a forward reference
- to rule "zzz" - it hasn't been read yet ! To postpone evaluation
- of the guard until all rules have been read is too much
- for the moment.
- #86. (Fixed in 1.33MR10) Unequal set size in set_sub
- Routine set_sub() in pccts/support/set/set.h did not work
- correctly when the sets were of unequal sizes. Rewrote
- set_equ to make it simpler and remove unnecessary and
- expensive calls to set_deg(). This routine was not used
- in 1.33 vanilla.
- #85. (Changed in 1.33MR10) Allow redefinition of MaxNumFiles
- Raised the maximum number of input files to 99 from 20.
- Put a #ifndef/#endif around the "#define MaxNumFiles 99".
- #84. (Fixed in 1.33MR10) Initialize zzBadTok in macro zzRULE
- Initialize zzBadTok to NULL in zzRULE macro of AParser.h
- in order to get rid of warning messages.
- #83. (Fixed in 1.33MR10) False warnings with -w2 for #tokclass
- When -w2 is selected antlr gives inappropriate warnings about
- #tokclass names not having any associated regular expressions.
- Since a #tokclass is not a "real" token it will never have an
- associated regular expression and there should be no warning.
- Reported by Derek Pappas (derek.pappas@eng.sun.com)
- #82. (Fixed in 1.33MR10) Computation of follow sets with multiple cycles
- Reinier van den Born (reinier@vnet.ibm.com) reported a problem
- in the computation of follow sets by antlr. The problem (bug)
- exists in 1.33 vanilla and all maintenance releases prior to 1.33MR10.
- The problem involves the computation of follow sets when there are
- cycles - rules which have mutual references. I believe the problem
- is restricted to cases where there is more than one cycle AND
- elements of those cycles have rules in common. Even when this
- occurs it may not affect the code generated - but it might. It
- might also lead to undetected ambiguities.
- There were no changes in antlr or dlg output from the revised version.
- The following fragment demonstrates the problem by giving different
- follow sets (option -pa) for var_access when built with k=1 and ck=2 on
- 1.33 vanilla and 1.33MR10:
- echo_statement : ECHO ( echo_expr )*
- ;
- echo_expr : ( command )?
- | expression
- ;
- command : IDENTIFIER
- { concat }
- ;
- expression : operand ( OPERATOR operand )*
- ;
- operand : value
- | START command END
- ;
- value : concat
- | TYPE operand
- ;
- concat : var_access { CONCAT value }
- ;
- var_access : IDENTIFIER { INDEX }
- ;
- #81. (Changed in 1.33MR10) C mode use of attributes and ASTs
- Reported by Isaac Clark (irclark@mindspring.com).
- C mode code ignores attributes returned by rules which are
- referenced using element labels when ASTs are enabled (-gt option).
- 1. start : r:rule t:Token <<$start=$r;>>
- The $r reference will not work when combined with
- the -gt option.
- 2. start : t:Token <<$start=$t;>>
- The $t reference works in all cases.
- 3. start : rule <<$0=$1;>>
- Numeric labels work in all cases.
- With MR10 the user will receive an error message for case 1 when
- the -gt option is used.
- #80. (Fixed in 1.33MR10) (...)? as last alternative of block
- A construct like the following:
- rule : a
- | (b)?
- ;
- does not make sense because there is no alternative when
- the guess block fails. This is now reported as a warning
- to the user.
- Previously, there was a code generation error for this case:
- the guess block was not "closed" when the guess failed.
- This could cause an infinite loop or other problems. This
- is now fixed.
- Example problem:
- #header<<
- #include <stdio.h>
- #include "charptr.h"
- >>
- <<
- #include "charptr.c"
- main ()
- {
- ANTLR(start(),stdin);
- }
- >>
- #token "[\ \t]+" << zzskip(); >>
- #token "[\n]" << zzline++; zzskip(); >>
- #token Word "[a-z]+"
- #token Number "[0-9]+"
- start : (test1)?
- | (test2)?
- ;
- test1 : (Word Word Word Word)?
- | (Word Word Word Number)?
- ;
- test2 : (Word Word Number Word)?
- | (Word Word Number Number)?
- ;
- Test data which caused infinite loop:
- a 1 a a
- #79. (Changed in 1.33MR10) Use of -fh with multiple parsers
- Previously, antlr always used the pre-processor symbol
- STDPCCTS_H as a gate for the file stdpccts.h. This
- caused problems when there were multiple parsers defined
- because they used the same gate symbol.
- In 1.33MR10, the -fh filename is used to generate the
- gate file for stdpccts.h. For instance:
- antlr -fh std_parser1.h
- generates the pre-processor symbol "STDPCCTS_std_parser1_H".
- Reported by Ramanathan Santhanam (ps@kumaran.com).
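A rough sketch of how such a gate symbol could be derived from the -fh filename is shown below. Only the single mapping "std_parser1.h" to "STDPCCTS_std_parser1_H" comes from this item; dropping the extension and replacing other characters with '_' are assumptions, and make_gate_symbol() is a hypothetical helper.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Builds "STDPCCTS_<stem>_H" from a header filename. The stem is the
 * filename up to its last '.'; characters that cannot appear in a C
 * identifier are replaced by '_' (an assumption of this sketch).
 * Returns 0 on success, -1 if the output buffer is too small. */
int make_gate_symbol(const char *filename, char *out, size_t outsize)
{
    const char *dot = strrchr(filename, '.');
    size_t stem = dot ? (size_t) (dot - filename) : strlen(filename);
    size_t need = strlen("STDPCCTS_") + stem + strlen("_H") + 1;
    size_t i, n;

    if (need > outsize)
        return -1;
    strcpy(out, "STDPCCTS_");
    n = strlen(out);
    for (i = 0; i < stem; i++) {
        unsigned char c = (unsigned char) filename[i];
        out[n++] = (isalnum(c) || c == '_') ? (char) c : '_';
    }
    strcpy(out + n, "_H");
    return 0;
}
```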
- #78. (Changed in 1.33MR9) Guard predicates that refer to rules
- ------------------------
- Please refer to Item #87
- ------------------------
- Guard predicates are processed during an early phase
- of antlr (during parsing) before all data structures
- are completed.
- There is an apparent bug in earlier versions of 1.33
- which caused guard predicates which contained references
- to rules (rather than tokens) to reference a structure
- which hadn't yet been initialized.
- In some cases (perhaps all cases) references to rules
- in guard predicates resulted in the use of "garbage".
- #79. (Changed in 1.33MR9) Jeff Vincent (JVincent@novell.com)
- Previously, the maximum length file name was set
- arbitrarily to 300 characters in antlr, dlg, and sorcerer.
- The config.h file now attempts to define the maximum length
- filename using _MAX_PATH from stdlib.h before falling back
- to using the value 300.
- #78. (Changed in 1.33MR9) Jeff Vincent (JVincent@novell.com)
- Put #ifndef/#endif around definition of ZZLEXBUFSIZE in
- antlr.
- #77. (Changed in 1.33MR9) Arithmetic overflow for very large grammars
- In routine HandleAmbiguities() antlr attempts to compute the
- number of possible elements in a set that is order of
- number-of-tokens raised to the number-of-lookahead-tokens power.
- For large grammars or large lookahead (e.g. -ck 7) this can
- cause arithmetic overflow.
- With 1.33MR9, arithmetic overflow in this computation is reported
- the first time it happens. The program continues to run and
- the program branches based on the assumption that the computed
- value is larger than any number computed by counting actual cases
- because 2**31 is larger than the number of bits in most computers.
- Before 1.33MR9 overflow was not reported. The behavior following
- overflow is not predictable by anyone but the original author.
- NOTE
- In 1.33MR10 the warning message is suppressed.
- The code which detects the overflow allows the
- computation to continue without an error. The
- error message itself made users worry.
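One common way to detect this kind of overflow is to test before each multiplication. The sketch below is not the code in HandleAmbiguities(); checked_power() is a hypothetical helper, and returning INT_MAX on overflow is an assumption chosen to mimic "larger than any number computed by counting actual cases".

```c
#include <assert.h>
#include <limits.h>

/* Computes base**exp for base >= 1 and exp >= 0. When the result
 * would not fit in an int, sets *overflowed to 1 and returns INT_MAX
 * so that callers comparing against real counts see "very large". */
int checked_power(int base, int exp, int *overflowed)
{
    int result = 1;
    int i;

    *overflowed = 0;
    for (i = 0; i < exp; i++) {
        if (result > INT_MAX / base) { /* result * base would overflow */
            *overflowed = 1;
            return INT_MAX;
        }
        result *= base;
    }
    return result;
}
```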
- #76. (Changed in 1.33MR9) Jeff Vincent (JVincent@novell.com)
- Jeff Vincent has convinced me to make ANTLRCommonToken and
- ANTLRCommonNoRefCountToken use variable length strings
- allocated from the heap rather than fixed length strings.
- By suitable definition of setText(), the copy constructor,
- and operator =() it is possible to maintain "copy" semantics.
- By "copy" semantics I mean that when a token is copied from
- an existing token it receives its own, distinct, copy of the
- text allocated from the heap rather than simply a pointer
- to the original token's text.
- ============================================================
- W * A * R * N * I * N * G
- ============================================================
- It is possible that this may cause problems for some users.
- For those users I have included the old version of AToken.h as
- pccts/h/AToken_traditional.h.
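The "copy" semantics described in this item can be illustrated with a C analogue. The real change is to the C++ classes in AToken.h; DemoToken and the demo_token_* functions below are hypothetical and only show a copied token owning its own heap-allocated text.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical token with heap-allocated, variable-length text. */
typedef struct {
    int type;
    char *text; /* owned by the token; NULL when unset */
} DemoToken;

/* Replaces the token's text with a private copy of s.
 * Returns 0 on success, -1 if memory could not be allocated. */
int demo_token_set_text(DemoToken *t, const char *s)
{
    char *copy = malloc(strlen(s) + 1);

    if (copy == NULL)
        return -1;
    strcpy(copy, s);
    free(t->text); /* release the old text only after the copy exists */
    t->text = copy;
    return 0;
}

/* Copies src into dst; dst receives its own distinct text buffer. */
int demo_token_copy(DemoToken *dst, const DemoToken *src)
{
    dst->type = src->type;
    return demo_token_set_text(dst, src->text ? src->text : "");
}

/* Releases the token's text. */
void demo_token_free(DemoToken *t)
{
    free(t->text);
    t->text = NULL;
}
```

Because each token owns its buffer, changing or freeing the original after the copy leaves the copy intact.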
- #75. (Changed in 1.33MR9) Bruce Guenter (bruceg@qcc.sk.ca)
- Make DLGStringInput const correct. Since this is infrequently
- subclassed, it should affect few users, I hope.
- #74. (Changed in 1.33MR9) -o (output directory) option
- Antlr does not properly handle the -o output directory option
- when the filename of the grammar contains a directory part. For
- example:
- antlr -o outdir pccts_src/myfile.g
- causes antlr to create a file called "outdir/pccts_src/myfile.cpp".
- It SHOULD create outdir/myfile.cpp
- The suggested code fix has been installed in antlr, dlg, and
- Sorcerer.
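The intended behavior can be sketched as follows. This is not the installed fix: make_output_path() is a hypothetical helper, and the sketch assumes '/' is the only directory separator.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Builds "<outdir>/<stem><ext>", where <stem> is the grammar filename
 * with its directory part and extension removed.
 * Returns 0 on success, -1 if the result does not fit in out. */
int make_output_path(const char *outdir, const char *grammar,
                     const char *ext, char *out, size_t outsize)
{
    const char *base = strrchr(grammar, '/');
    const char *dot;
    int stem, n;

    base = base ? base + 1 : grammar; /* drop the directory part */
    dot = strrchr(base, '.');
    stem = dot ? (int) (dot - base) : (int) strlen(base);
    n = snprintf(out, outsize, "%s/%.*s%s", outdir, stem, base, ext);
    return (n < 0 || (size_t) n >= outsize) ? -1 : 0;
}
```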
- #73. (Changed in 1.33MR9) Hoisting of semantic predicates and -mrhoist
- ============================================
- This has been rendered obsolete by Item #117
- ============================================
- #72. (Changed in 1.33MR9) virtual saveState()/restoreState()/guess_XXX
- The following methods in ANTLRParser were made virtual at
- the request of S. Bochnak (S.Bochnak@microTool.com.pl):
- saveState() and restoreState()
- guess(), guess_fail(), and guess_done()
- #71. (Changed in 1.33MR9) Access to omitted command line argument
- If a switch requiring arguments is the last thing on the
- command line, and the argument is omitted, antlr would core.
- antlr test.g -prc
- instead of
- antlr test.g -prc off
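A guard of the following shape avoids reading past the end of argv. It is a sketch only; switch_argument() is a hypothetical name and does not reflect how antlr's option table is actually organized.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the word following the switch at argv[i], or NULL when the
 * switch is the last thing on the command line. */
const char *switch_argument(int argc, char **argv, int i)
{
    if (i + 1 >= argc)
        return NULL; /* argument omitted: caller should report an error */
    return argv[i + 1];
}
```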
- #70. (Changed in 1.33MR9) Addition of MSVC .dsp and .mak build files
- The following MSVC .dsp and .mak files for pccts and sorcerer
- were contributed by Stanislaw Bochnak (S.Bochnak@microTool.com.pl)
- and Jeff Vincent (JVincent@novell.com)
- PCCTS Distribution Kit
- ----------------------
- pccts/PCCTSMSVC50.dsw
- pccts/antlr/AntlrMSVC50.dsp
- pccts/antlr/AntlrMSVC50.mak
- pccts/dlg/DlgMSVC50.dsp
- pccts/dlg/DlgMSVC50.mak
- pccts/support/msvc.dsp
- Sorcerer Distribution Kit
- -------------------------
- pccts/sorcerer/SorcererMSVC50.dsp
- pccts/sorcerer/SorcererMSVC50.mak
- pccts/sorcerer/lib/msvc.dsp
- #69. (Changed in 1.33MR9) Change "unsigned int" to plain "int"
- Declaration of max_token_num in misc.c as "unsigned int"
- caused comparison between signed and unsigned ints giving
- warning message without any special benefit.
- #68. (Changed in 1.33MR9) Add void return for dlg internal_error()
- Get rid of "no return value" message in internal_error()
- in file dlg/support.c and dlg/dlg.h.
- #67. (Changed in Sor) sor.g: lisp() has no return value
- Added a "void" for the return type.
- #66. (Added to Sor) sor.g: ZZLEXBUFSIZE enclosed in #ifndef/#endif
- A user needed to be able to change the ZZLEXBUFSIZE for
- sor. Put the definition of ZZLEXBUFSIZE inside #ifndef/#endif
- #65. (Changed in 1.33MR9) PCCTSAST::deepCopy() and ast_dup() bug
- Jeff Vincent (JVincent@novell.com) found that deepCopy()
- made new copies of only the direct descendants. No new
- copies were made of sibling nodes. Sibling pointers are
- set to zero by shallowCopy().
- PCCTS_AST::deepCopy() has been changed to make a
- deep copy in the traditional sense.
- The deepCopy() routine depends on the behavior of
- shallowCopy(). In all sor examples I've found,
- shallowCopy() zeroes the right and down pointers.
- Original Tree    Original deepCopy()    Revised deepCopy()
- -------------    -------------------    ------------------
- a->b->c          A                      A
- |                |                      |
- d->e->f          D                      D->E->F
- |                |                      |
- g->h->i          G                      G->H->I
-    |                                       |
-    j->k                                    J->K
- While comparing deepCopy() for C++ mode with ast_dup for
- C mode I found a problem with ast_dup().
- Routine ast_dup() has been changed to make a deep copy
- in the traditional sense.
- Original Tree    Original ast_dup()     Revised ast_dup()
- -------------    ------------------     -----------------
- a->b->c          A->B->C                A
- |                |                      |
- d->e->f          D->E->F                D->E->F
- |                |                      |
- g->h->i          G->H->I                G->H->I
-    |                |                      |
-    j->k             J->K                   J->K
- I believe this affects transform mode sorcerer programs only.
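The revised behavior shown in the diagrams (the root is copied without its own siblings, while every descendant is copied together with its siblings) can be sketched in C. DemoAST and the demo_* functions are hypothetical; this is not the code of ast_dup() or deepCopy(), and allocation failures are handled only minimally.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical child-sibling tree node. */
typedef struct DemoAST {
    struct DemoAST *right; /* next sibling */
    struct DemoAST *down;  /* first child */
    int token;
} DemoAST;

/* Copies one node; like shallowCopy(), zeroes right and down. */
DemoAST *demo_shallow_copy(const DemoAST *t)
{
    DemoAST *u = malloc(sizeof *u);

    if (u == NULL)
        return NULL;
    u->token = t->token;
    u->right = NULL;
    u->down = NULL;
    return u;
}

/* Copies a node, its descendants, and its following siblings. */
static DemoAST *demo_copy_with_siblings(const DemoAST *t)
{
    DemoAST *u;

    if (t == NULL)
        return NULL;
    u = demo_shallow_copy(t);
    if (u == NULL)
        return NULL;
    u->down = demo_copy_with_siblings(t->down);
    u->right = demo_copy_with_siblings(t->right);
    return u;
}

/* Deep copy of the subtree rooted at t; the root's siblings are omitted. */
DemoAST *demo_deep_copy(const DemoAST *t)
{
    DemoAST *u;

    if (t == NULL)
        return NULL;
    u = demo_shallow_copy(t);
    if (u != NULL)
        u->down = demo_copy_with_siblings(t->down);
    return u;
}

/* Counts the nodes reachable through down and right pointers. */
int demo_count(const DemoAST *t)
{
    return t == NULL ? 0 : 1 + demo_count(t->down) + demo_count(t->right);
}

/* Frees a tree produced by demo_deep_copy(). */
void demo_free(DemoAST *t)
{
    if (t == NULL)
        return;
    demo_free(t->down);
    demo_free(t->right);
    free(t);
}
```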
- #64. (Changed in 1.33MR9) antlr/hash.h prototype for killHashTable()
- #63. (Changed in 1.33MR8) h/charptr.h does not zero pointer after free
- The charptr.h routine now zeroes the pointer after free().
- Reported by Jens Tingleff (jensting@imaginet.fr)
- #62. (Changed in 1.33MR8) ANTLRParser::resynch had static variable
- The static variable "consumed" in ANTLRParser::resynch was
- changed into an instance variable of the class with the
- name "resynchConsumed".
- Reported by S.Bochnak@microTool.com.pl
- #61. (Changed in 1.33MR8) Using rule>[i,j] when rule has no return values
- Previously, the following code would cause antlr to core when
- it tried to generate code for rule1 because rule2 had no return
- values ("upward inheritance"):
- rule1 : <<int i; int j>>
- rule2 > [i,j]
- ;
- rule2 : Anything ;
- Reported by S.Bochnak@microTool.com.pl
- Verified correct operation of antlr MR8 when missing or extra
- inheritance arguments for all combinations. When there are
- missing or extra arguments code will still be generated even
- though this might cause the invocation of a subroutine with
- the wrong number of arguments.
- #60. (Changed in 1.33MR7) Major changes to exception handling
- There were significant problems in the handling of exceptions
- in 1.33 vanilla. The general problem is that it can only
- process one level of exception handler. For example, a named
- exception handler, an exception handler for an alternative, or
- an exception for a subrule always went to the rule's exception
- handler if there was no "catch" which matched the exception.
- In 1.33MR7 the exception handlers properly "nest". If an
- exception handler does not have a matching "catch" then the
- nextmost outer exception handler is checked for an appropriate
- "catch" clause, and so on until an exception handler with an
- appropriate "catch" is found.
- There are still undesirable features in the way exception
- handlers are implemented, but I do not have time to fix them
- at the moment:
- The exception handlers for alternatives are outside the
- block containing the alternative. This makes it impossible
- to access variables declared in a block or to resume the
- parse by "falling through". The parse can still be easily
- resumed in other ways, but not in the most natural fashion.
- This results in an inconsistency between named exception
- handlers and exception handlers for alternatives. When
- an exception handler for an alternative "falls through"
- it goes to the nextmost outer handler - not the "normal
- action".
- A major difference between 1.33MR7 and 1.33 vanilla is
- the default action after an exception is caught:
- 1.33 Vanilla
- ------------
- In 1.33 vanilla the signal value is set to zero ("NoSignal")
- and the code drops through to the code following the exception.
- For named exception handlers this is the "normal action".
- For alternative exception handlers this is the rule's handler.
- 1.33MR7
- -------
- In 1.33MR7 the signal value is NOT automatically set to zero.
- There are two cases:
- For named exception handlers: if the signal value has been
- set to zero the code drops through to the "normal action".
- For all other cases the code branches to the nextmost outer
- exception handler until it reaches the handler for the rule.
- The following macros have been defined for convenience:
- C/C++ Mode Name
- --------------------
- (zz)suppressSignal
- set signal & return signal arg to 0 ("NoSignal")
- (zz)setSignal(intValue)
- set signal & return signal arg to some value
- (zz)exportSignal
- copy the signal value to the return signal arg
- I'm not sure why PCCTS makes a distinction between the local
- signal value and the return signal argument, but I'm loath
- to change the code. The burden of copying the local signal
- value to the return signal argument can be given to the
- default signal handler, I suppose.
- #59. (Changed in 1.33MR7) Prototypes for some functions
- Added prototypes for the following functions to antlr.h
- zzconsumeUntil()
- zzconsumeUntilToken()
- #58. (Changed in 1.33MR7) Added definition of zzbufsize to dlgauto.h
- #57. (Changed in 1.33MR7) Format of #line directive
- Previously, the -gl directive for line 1234 would
- resemble: "# 1234 filename.g". This caused problems
- for some compilers/pre-processors. In MR7 it generates
- "#line 1234 filename.g".
- #56. (Added in 1.33MR7) Jan Mikkelsen <janm@zeta.org.au>
- Move PURIFY macro invocation to after rule's init action.
- #55. (Fixed in 1.33MR7) Uninitialized variables in ANTLRParser
- Member variables inf_labase and inf_last were not initialized.
- (See item #50.)
- #54. (Fixed in 1.33MR6) Brad Schick (schick@interaccess.com)
- Previously, the following constructs generated the same
- code:
- rule1 : (A B C)?
- | something-else
- ;
- rule2 : (A B C)? ()
- | something-else
- ;
- In all versions of pccts rule1 guesses (A B C) and then
- consumes all three tokens if the guess succeeds. In MR6
- rule2 guesses (A B C) but consumes NONE of the tokens
- when the guess succeeds because "()" matches epsilon.
- #53. (Explanation for 1.33MR6) What happens after an exception is caught ?
- The Book is silent about what happens after an exception
- is caught.
- The following code fragment prints "Error Action" followed
- by "Normal Action".
- test : Word ex:Number <<printf("Normal Action\n");>>
- exception[ex]
- catch NoViableAlt:
- <<printf("Error Action\n");>>
- ;
- The reason for "Normal Action" is that the normal flow of the
- program after a user-written exception handler is to "drop through".
- In the case of an exception handler for a rule this results in
- the execution of a "return" statement. In the case of an
- exception handler attached to an alternative, rule, or token
- this is the code that would have executed had there been no
- exception.
- The user can achieve the desired result by using a "return"
- statement.
- test : Word ex:Number <<printf("Normal Action\n");>>
- exception[ex]
- catch NoViableAlt:
- <<printf("Error Action\n"); return;>>
- ;
- The most powerful mechanism for recovery from parse errors
- in pccts is syntactic predicates because they provide
- backtracking. Exceptions allow "return", "break",
- "consumeUntil(...)", "goto _handler", "goto _fail", and
- changing the _signal value.
- #52. (Fixed in 1.33MR6) Exceptions without syntactic predicates
- The following generates bad code in 1.33 if no syntactic
- predicates are present in the grammar.
- test : Word ex:Number <<printf("Normal Action\n");>>
- exception[ex]
- catch NoViableAlt:
- <<printf("Error Action\n");>>
- There is a reference to a guess variable. In C mode
- this causes a compiler error. In C++ mode it generates
- an extraneous check on member "guessing".
- In MR6 correct code is generated for both C and C++ mode.
- #51. (Added to 1.33MR6) Exception operator "@" used without exceptions
- In MR6 added a warning when the exception operator "@" is
- used and no exception group is defined. This is probably
- a case where "\@" or "@" is meant.
- #50. (Fixed in 1.33MR6) Gunnar Rønning (gunnar@candleweb.no)
- http://www.candleweb.no/~gunnar/
- Routines zzsave_antlr_state and zzrestore_antlr_state don't
- save and restore all the data needed when switching states.
- Suggested patch applied to antlr.h and err.h for MR6.
- #49. (Fixed in 1.33MR6) Sinan Karasu (sinan@boeing.com)
- Generated code failed to turn off guess mode when leaving a
- (...)+ block which contained a guess block. The result was
- an infinite loop. For example:
- rule : (
- (x)?
- | y
- )+
- Suggested code fix implemented in MR6. Replaced
- ... else if (zzcnt>1) break;
- with:
- C++ mode:
- ... else if (zzcnt>1) {if (!zzrv) zzGUESS_DONE; break;};
- C mode:
- ... else if (zzcnt>1) {if (zzguessing) zzGUESS_DONE; break;};
- #48. (Fixed in 1.33MR6) Invalid exception element causes core
- A label attached to an invalid construct can cause
- pccts to crash while processing the exception associated
- with the label. For example:
- rule : t:(B C)
- exception[t] catch MismatchedToken: <<printf(...);>>
- Version MR6 generates the message:
- reference in exception handler to undefined label 't'
- #47. (Fixed in 1.33MR6) Manuel Ornato
- Under some circumstances involving a k >1 or ck >1
- grammar and a loop block (i.e. (...)* ) pccts will
- fail to detect a syntax error and loop indefinitely.
- The problem did not exist in 1.20, but has existed
- from 1.23 to the present.
- Fixed in MR6.
- ---------------------------------------------------
- Complete test program
- ---------------------------------------------------
- #header<<
- #include <stdio.h>
- #include "charptr.h"
- >>
- <<
- #include "charptr.c"
- main ()
- {
- ANTLR(global(),stdin);
- }
- >>
- #token "[\ \t]+" << zzskip(); >>
- #token "[\n]" << zzline++; zzskip(); >>
- #token B "b"
- #token C "c"
- #token D "d"
- #token E "e"
- #token LP "("
- #token RP ")"
- #token ANTLREOF "@"
- global : (
- (E liste)
- | liste
- | listed
- ) ANTLREOF
- ;
- listeb : LP ( B ( B | C )* ) RP ;
- listec : LP ( C ( B | C )* ) RP ;
- listed : LP ( D ( B | C )* ) RP ;
- liste : ( listeb | listec )* ;
- ---------------------------------------------------
- Sample data causing infinite loop
- ---------------------------------------------------
- e (d c)
- ---------------------------------------------------
- #46. (Fixed in 1.33MR6) Robert Richter
- (Robert.Richter@infotech.tu-chemnitz.de)
- This item from the list of known problems was
- fixed by item #18 (below).
- #45. (Fixed in 1.33MR6) Brad Schick (schick@interaccess.com)
- The dependency scanner in VC++ mistakenly sees a
- reference to an MPW #include file even though properly
- #ifdef/#endif in config.h. The suggested workaround
- has been implemented:
- #ifdef MPW
- .....
- #define MPW_CursorCtl_Header <CursorCtl.h>
- #include MPW_CursorCtl_Header
- .....
- #endif
- #44. (Fixed in 1.33MR6) cast malloc() to (char *) in charptr.c
- Added (char *) cast for systems where malloc returns "void *".
- #43. (Added to 1.33MR6) Bruce Guenter (bruceg@qcc.sk.ca)
- Add setLeft() and setUp() methods to ASTDoublyLinkedBase
- for symmetry with setRight() and setDown() methods.
- #42. (Fixed in 1.33MR6) Jeff Katcher (jkatcher@nortel.ca)
- C++ style comment in antlr.c corrected.
- #41. (Added in 1.33MR6) antlr -stdout
- Using "antlr -stdout ..." forces the text that would
- normally go to the grammar.c or grammar.cpp file to
- stdout.
- #40. (Added in 1.33MR6) antlr -tab to change tab stops
- Using "antlr -tab number ..." changes the tab stops
- for the grammar.c or grammar.cpp file. The number
- must be between 0 and 8. Using 0 gives tab characters,
- values between 1 and 8 give the appropriate number of
- space characters.
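The rule for the -tab value can be sketched as a small C helper. make_indent() is a hypothetical function written for this note; antlr's own output routine is not shown in this document.

```c
#include <assert.h>
#include <string.h>

/* Writes the indentation for `level` tab stops into buf.
 * tab == 0 emits one tab character per stop; tab in 1..8 emits that
 * many spaces per stop. Returns the number of characters written, or
 * -1 if tab is out of range, level is negative, or buf is too small. */
int make_indent(char *buf, size_t bufsize, int level, int tab)
{
    size_t width, total;

    if (tab < 0 || tab > 8 || level < 0)
        return -1;
    width = (tab == 0) ? 1 : (size_t) tab;
    total = width * (size_t) level;
    if (total + 1 > bufsize)
        return -1;
    memset(buf, tab == 0 ? '\t' : ' ', total);
    buf[total] = '\0';
    return (int) total;
}
```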
- #39. (Fixed in 1.33MR5) Jan Mikkelsen <janm@zeta.org.au>
- Commas in function prototype still not correct under
- some circumstances. Suggested code fix installed.
- #38. (Fixed in 1.33MR5) ANTLRTokenBuffer constructor
- Have ANTLRTokenBuffer ctor initialize member "parser" to null.
- #37. (Fixed in 1.33MR4) Bruce Guenter (bruceg@qcc.sk.ca)
- In ANTLRParser::FAIL(int k,...) released memory pointed to by
- f[i] (as well as f itself). Should only free f itself.
- #36. (Fixed in 1.33MR3) Cortland D. Starrett (cort@shay.ecn.purdue.edu)
- Neglected to properly declare isDLGmaxToken() when fixing problem
- reported by Andreas Magnusson.
- Undo "_retv=NULL;" change which caused problems for return values
- from rules whose return values weren't pointers.
- Failed to create bin directory if it didn't exist.
- #35. (Fixed in 1.33MR2) Andreas Magnusson
- (Andreas.Magnusson@mailbox.swipnet.se)
- Repair bug introduced by 1.33MR1 for #tokdefs. The original fix
- placed "DLGmaxToken=9999" and "DLGminToken=0" in the TokenType enum
- in order to fix a problem with an aggressive compiler assigning an 8
- bit enum which might be too narrow. This caused #tokdefs to assume
- that there were 9999 real tokens. The repair to the fix causes antlr to
- ignore TokenTypes "DLGmaxToken" and "DLGminToken" in a #tokdefs file.
- #34. (Added to 1.33MR1) Add public DLGLexerBase::set_line(int newValue)
- Previously there was no public function for changing the line
- number maintained by the lexer.
- #33. (Fixed in 1.33MR1) Franklin Chen (chen@adi.com)
- Accidental use of EXIT_FAILURE rather than PCCTS_EXIT_FAILURE
- in pccts/h/AParser.cpp.
- #32. (Fixed in 1.33MR1) Franklin Chen (chen@adi.com)
- In PCCTSAST.cpp lines 405 and 466: Change
- free (t)
- to
- free ( (char *)t );
- to match prototype.
- #31. (Added to 1.33MR1) Pointer to parser in ANTLRTokenBuffer
- Pointer to parser in DLGLexerBase
- The ANTLRTokenBuffer class now contains a pointer to the
- parser which is using it. This is established by the
- ANTLRParser constructor calling ANTLRTokenBuffer::
- setParser(ANTLRParser *p).
- When ANTLRTokenBuffer::setParser(ANTLRParser *p) is
- called it saves the pointer to the parser and then
- calls ANTLRTokenStream::setParser(ANTLRParser *p)
- so that the lexer can also save a pointer to the
- parser.
- There is also a function getParser() in each class
- with the obvious purpose.
- It is possible that these functions will return NULL
- under some circumstances (e.g. a non-DLG lexer is used).
- #30. (Added to 1.33MR1) function tokenName(int token) standard
- The generated parser class now includes the
- function:
- static const ANTLRChar * tokenName(int token)
- which returns a pointer to the "name" corresponding
- to the token.
- The base class (ANTLRParser) always includes the
- member function:
- const ANTLRChar * parserTokenName(int token)
- which can be accessed by objects which have a pointer
- to an ANTLRParser, but do not know the name of the
- parser class (e.g. ANTLRTokenBuffer and DLGLexerBase).
- #29. (Added to 1.33MR1) Debugging DLG lexers
- If the pre-processor symbol DEBUG_LEXER is defined
- then DLexerBase will include code for printing out
- key information about tokens which are recognized.
- The debug feature of the lexer is controlled by:
- int previousDebugValue=lexer.debugLexer(newValue);
- a value of 0 disables output
- a value of 1 enables output
- Even if the lexer debug code is compiled into DLexerBase
- it must be enabled before any output is generated. For
- example:
- DLGFileInput in(stdin);
- MyDLG lexer(&in,2000);
- lexer.setToken(&aToken);
- #if DEBUG_LEXER
- lexer.debugLexer(1); // enable debug information
- #endif
- #28. (Added to 1.33MR1) More control over DLG header
- Version 1.33MR1 adds the following directives to PCCTS
- for C++ mode:
- #lexprefix <<source code>>
- Adds source code to the DLGLexer.h file
- after the #include "DLexerBase.h" but
- before the start of the class definition.
- #lexmember <<source code>>
- Adds source code to the DLGLexer.h file
- as part of the DLGLexer class body. It
- appears immediately after the start of
- the class and a "public:" statement.
- #27. (Fixed in 1.33MR1) Comments in DLG actions
- Previously, DLG would not recognize comments as a special case.
- Thus, ">>" in the comments would cause errors. This is fixed.
- #26. (Fixed in 1.33MR1) Removed static variables from error routines
- Previously, the existence of statically allocated variables
- in some of the parser's member functions posed a danger when
- there was more than one parser active.
- Replaced with dynamically allocated/freed variables in 1.33MR1.
- #25. (Fixed in 1.33MR1) Use of string literals in semantic predicates
- Previously, it was not possible to place a string literal in
- a semantic predicate because it was not properly "stringized"
- for the report of a failed predicate.
- #24. (Fixed in 1.33MR1) Continuation lines for semantic predicates
- Previously, it was not possible to continue semantic
- predicates across a line because it was not properly
- "stringized" for the report of a failed predicate.
- rule : <<ifXYZ()>>?[ a very
- long statement ]
- #23. (Fixed in 1.33MR1) {...} envelope for failed semantic predicates
- Previously, there was a code generation error for failed
- semantic predicates:
- rule : <<xyz()>>?[ stmt1; stmt2; ]
- which generated code which resembled:
- if (! xyz()) stmt1; stmt2;
- It now puts the statements in a {...} envelope:
- if (! xyz()) { stmt1; stmt2; };
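The difference between the two generated shapes can be checked with a few lines of C. The functions below are hypothetical stand-ins: always_true() plays the role of xyz() with a predicate that succeeds, and each increment of fail_count plays the role of one fail-action statement.

```c
#include <assert.h>

static int fail_count;

/* Stand-in for a semantic predicate that succeeds. */
static int always_true(void)
{
    return 1;
}

/* Old shape: only the first statement is guarded by the "if". */
int unguarded_shape(void)
{
    fail_count = 0;
    if (!always_true())
        fail_count++;
    fail_count++; /* runs even though the predicate succeeded */
    return fail_count;
}

/* Fixed shape: both statements sit inside the {...} envelope. */
int enveloped_shape(void)
{
    fail_count = 0;
    if (!always_true()) {
        fail_count++;
        fail_count++;
    }
    return fail_count;
}
```

With a predicate that succeeds, the old shape still executes the second fail-action statement, while the enveloped shape executes neither.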
- #22. (Fixed in 1.33MR1) Continuation of #token across lines using "\"
- Previously, it was not possible to continue a #token regular
- expression across a line. The trailing "\" and newline caused
- a newline to be inserted into the regular expression by DLG.
- Fixed in 1.33MR1.
- #21. (Fixed in 1.33MR1) Use of ">>" (right shift operator) in DLG actions
- It is now possible to use the C++ right shift operator ">>"
- in DLG actions by using the normal escapes:
- #token "shift-right" << value=value \>\> 1;>>
- #20. (Version 1.33/19-Jan-97) Karl Eccleson <karle@microrobotics.co.uk>
- P.A. Keller (P.A.Keller@bath.ac.uk)
- There is a problem due to using exceptions with the -gh option.
- Suggested fix now in 1.33MR1.
- #19. (Fixed in 1.33MR1) Tom Piscotti and John Lilley
- There were problems suppressing messages to stderr and stdout
- when running in a window environment because some functions
- which use fprintf were not virtual.
- Suggested change now in 1.33MR1.
- I believe all functions containing error messages (excluding those
- indicating internal inconsistency) have been placed in functions
- which are virtual.
- #18. (Version 1.33/ 22-Nov-96) John Bair (jbair@iftime.com)
- Under some combination of options a required "return _retv" is
- not generated.
- Suggested fix now in 1.33MR1.
- #17. (Version 1.33/3-Sep-96) Ron House (house@helios.usq.edu.au)
- The routine ASTBase::preorder_action omits two "tree->"
- prefixes, which results in the preorder_action belonging
- to the wrong node being invoked.
- Suggested fix now in 1.33MR1.
- #16. (Version 1.33/7-Jun-96) Eli Sternheim <eli@interhdl.com>
- Routine consumeUntilToken() does not check for end-of-file
- condition.
- Suggested fix now in 1.33MR1.
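The missing end-of-file test can be illustrated with a small C loop over a token array. This is a sketch, not the pccts routine: consume_until_token() here is a hypothetical stand-alone function, and the DEMO_* token codes are invented for the example.

```c
#include <assert.h>

/* Hypothetical token codes for this illustration. */
enum { DEMO_EOF = 0, DEMO_SEMI = 1, DEMO_WORD = 2 };

/* Advances *pos until tokens[*pos] equals target. Stops at DEMO_EOF
 * instead of running past the end of the input.
 * Returns 1 if the target was found, 0 if end-of-file came first. */
int consume_until_token(const int *tokens, int *pos, int target)
{
    while (tokens[*pos] != target) {
        if (tokens[*pos] == DEMO_EOF)
            return 0; /* without this check the loop never terminates */
        (*pos)++;
    }
    return 1;
}
```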
- #15. (Version 1.33/8 Apr 96) Asgeir Olafsson <olafsson@cstar.ac.com>
- Problem with tree duplication of doubly linked ASTs in ASTBase.cpp.
- Suggested fix now in 1.33MR1.
- #14. (Version 1.33/28-Feb-96) Andreas.Magnusson@mailbox.swipnet.se
- Problem with definition of operator = (const ANTLRTokenPtr rhs).
- Suggested fix now in 1.33MR1.
- #13. (Version 1.33/13-Feb-96) Franklin Chen (chen@adi.com)
- Sun C++ Compiler 3.0.1 can't compile testcpp/1 due to goto in
- block with destructors.
- Apparently fixed. Can't locate "goto".
- #12. (Version 1.33/10-Nov-95) Minor problems with 1.33 code
- The following items have been fixed in 1.33MR1:
- 1. pccts/antlr/main.c line 142
- "void" appears in classic C code
- 2. no makefile in support/genmk
- 3. EXIT_FAILURE/_SUCCESS instead of PCCTS_EXIT_FAILURE/_SUCCESS
- pccts/h/PCCTSAST.cpp
- pccts/h/DLexerBase.cpp
- pccts/testcpp/6/test.g
- 4. use of "signed int" isn't accepted by AT&T cfront
- pccts/h/PCCTSAST.h line 42
- 5. in call to ANTLRParser::FAIL the var arg err_k is passed as
- "int" but is declared "unsigned int".
- 6. I believe that a failed validation predicate still does not
- get put in a "{...}" envelope, despite the release notes.
- 7. The #token ">>" appearing in the DLG grammar description
- causes DLG to generate the string literal ">>" which
- is non-conforming and will cause some compilers to
- complain (scan.c function act10 line 143 of source code).
- #11. (Version 1.32b6) Dave Kuhlman (dkuhlman@netcom.com)
- Problem with file close in gen.c. Already fixed in 1.33.
- #10. (Version 1.32b6/29-Aug-95)
- pccts/antlr/main.c contains C++ style comments on lines 149
- and 176, which cause problems for most C compilers.
- Already fixed in 1.33.
- #9. (Version 1.32b4/14-Mar-95) dlgauto.h #include "config.h"
- The file pccts/h/dlgauto.h should probably contain a #include
- "config.h" as it uses the #define symbol __USE_PROTOS.
- Added to 1.33MR1.
- #8. (Version 1.32b4/6-Mar-95) Michael T. Richter (mtr@igs.net)
- In C++ output mode anonymous tokens from in-line regular expressions
- can create enum values which are too wide for the datatype of the enum
- assigned by the C++ compiler.
- Fixed in 1.33MR1.
- #7. (Version 1.32b4/6-Mar-95) C++ does not imply __STDC__
- In err.h the combination of # directives assumes that a C++
- compiler has __STDC__ defined. This is not necessarily true.
- This problem also appears in the use of __USE_PROTOS which
- is appropriate for both Standard C and C++ in antlr/gen.c
- and antlr/lex.c
- Fixed in 1.33MR1.
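
The kind of guard this item calls for might look like the sketch below. The macro DEMO_USE_PROTOS and the function demoAdd are hypothetical names chosen for illustration (PCCTS itself uses __USE_PROTOS); the point is that the test mentions __cplusplus explicitly instead of relying on a C++ compiler to define __STDC__.

```cpp
#include <cassert>

// Enable prototypes for Standard C and, independently, for C++.
#if defined(__STDC__) || defined(__cplusplus)
#define DEMO_USE_PROTOS 1
#else
#define DEMO_USE_PROTOS 0
#endif

#if DEMO_USE_PROTOS
int demoAdd(int a, int b)        /* prototype-style definition */
#else
int demoAdd(a, b) int a; int b;  /* K&R-style definition for classic C */
#endif
{
    return a + b;
}

// Small self-check: under a C++ compiler the prototype branch is taken.
void demoCheck() {
    assert(DEMO_USE_PROTOS == 1);
    assert(demoAdd(2, 3) == 5);
}
```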
- #6. (Version 1.32 ?/15-Feb-95) Name conflict for "TokenType"
- Already fixed in 1.33.
- #5. (23-Jan-95) Douglas_Cuthbertson.JTIDS@jtids_qmail.hanscom.af.mil
- The fail action following a semantic predicate is not enclosed in
- "{...}". This can lead to problems when the fail action contains
- more than one statement.
- Fixed in 1.33MR1.
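
Why the missing braces matter can be shown with a small sketch. The macro FAIL_ACTION and both functions are hypothetical stand-ins for generated code, not actual ANTLR output; the user's fail action is assumed to consist of two statements.

```cpp
#include <cassert>

// A two-statement fail action, as a user might write it.
#define FAIL_ACTION errors++; recovered = true

// Unbraced expansion: only the first statement is guarded by the `if`,
// so `recovered = true` runs even when the predicate succeeds.
int runUnbraced(bool predOk) {
    int errors = 0;
    bool recovered = false;
    if (!predOk) FAIL_ACTION;
    return errors * 10 + (recovered ? 1 : 0);
}

// Braced expansion: the whole fail action is guarded.
int runBraced(bool predOk) {
    int errors = 0;
    bool recovered = false;
    if (!predOk) { FAIL_ACTION; }
    return errors * 10 + (recovered ? 1 : 0);
}

// Self-check: the two forms differ only when the predicate succeeds.
void demoFailAction() {
    assert(runUnbraced(false) == runBraced(false));
    assert(runUnbraced(true) != runBraced(true));
}
```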
- #4. (Version 1.33/31-Mar-96) jlilley@empathy.com (John Lilley)
- Put briefly, a semantic predicate ought to abort a guess if it fails.
- Correction suggested by J. Lilley has been added to 1.33MR1.
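
The intended behavior can be sketched as follows. GuessDemo and its members are hypothetical, and the sketch uses a plain boolean return to abandon the guess rather than whatever mechanism the generated parser actually uses. It only illustrates the rule that a predicate which fails while guessing makes the guess fail, so that the next alternative is tried without reporting an error.

```cpp
#include <cassert>
#include <string>

// Hypothetical parser state for this sketch.
struct GuessDemo {
    bool guessing = false;
    int failedPredicates = 0;   // predicate failures reported as errors

    // Evaluate a semantic predicate. While guessing, a false predicate
    // only aborts the guess; outside a guess it is reported as an error.
    bool checkPredicate(bool value) {
        if (!value && !guessing) ++failedPredicates;
        return value;
    }

    // The first alternative ("cast") is attempted speculatively;
    // the second alternative ("expr") is the fallback.
    std::string parse(bool isTypename) {
        assert(!guessing);      // nested guesses are not modeled here
        guessing = true;
        bool guessOk = checkPredicate(isTypename);
        guessing = false;
        return guessOk ? "cast" : "expr";
    }
};
```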
- #3. (Version 1.33) P.A.Keller@bath.ac.uk
- Extra commas are placed in the K&R style argument list for rules
- when using both exceptions and ASTs.
- Fixed in 1.33MR1.
- #2. (Version 1.32b6/2-Oct-95) Brad Schick <schick@interaccess.com>
- Construct #[] generates zzastnew() in C++ mode.
- Already fixed in 1.33.
- #1. (Version 1.33) Bob Bailey (robert@oakhill.sps.mot.com)
- Previously, config.h assumed that all PC systems required
- "short" file names. The user can now override that
- assumption with "#define LONGFILENAMES".
- Added to 1.33MR1.