852dfa76
json: Clean up how lexer consumes "end of input"
Markus Armbruster authored
    
    
    When the lexer isn't in its start state at the end of input, it's
    working on a token.  To flush it out, it needs to transit to its start
    state on "end of input" lookahead.
    
    There are two ways to the start state, depending on the current state:
    
    * If the lexer is in a TERMINAL(JSON_FOO) state, it can emit a
      JSON_FOO token.
    
    * Else, it can go to IN_ERROR state, and emit a JSON_ERROR token.
    
    There are complications, however:
    
    * The transition to IN_ERROR state consumes the input character and
      adds it to the JSON_ERROR token.  The latter is inappropriate for
      the "end of input" character, so we suppress that.  See also recent
      commit a2ec6be7 "json: Fix lexer to include the bad character in
      JSON_ERROR token".
    
    * The transition to a TERMINAL(JSON_FOO) state doesn't consume the
      input character.  In that case, the lexer normally loops until it is
      consumed.  We have to suppress that for the "end of input" input
      character.  If we didn't, the lexer would consume it by entering
      IN_ERROR state, emitting a bogus JSON_ERROR token.  We fixed that in
      commit bd3924a3.
    
    However, simply breaking the loop this way assumes that the lexer
    needs exactly one state transition to reach its start state.  That
    assumption is correct now, but it's unclean, and I'll soon break it.
Clean up: instead of breaking the loop after one iteration, break it
after it has reached the start state.
    
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
    Message-Id: <20180831075841.13363-3-armbru@redhat.com>